pigibrack/docs/llm_sexp_tools.md
2026-04-04 10:23:48 +02:00
LLM Tool Design for Scheme/Guile — Prior Art & Sharp Tool Set

Executive Summary

Text-based read/write tools break predictably when LLMs edit Lisp code: paren balancing errors cascade into multi-turn repair loops. The prior art is clear — structural tools that target forms by name and identity, not text position, dramatically improve edit reliability. However, not every operation needs structural treatment; a sharp tool set avoids over-engineering by applying structure only where it genuinely beats plain text.[^1]


Prior Art: What Has Been Tried

The Failure Mode of Plain Text Editing

The canonical failure is well-documented across tools and user reports. Aider's SEARCH/REPLACE format requires exact whitespace, indentation, and content match — even minor discrepancies cause silent failures. An analysis of a real coding session logged 39 editing failures: 13 from redundant edits where the LLM lost track of applied changes, 8 from context/state mismatches, and 6 from SEARCH blocks that didn't exactly match the file. For Lisp specifically, parenthesis balancing errors compound this: Claude Code issue #1827 reports Claude "spending many tool invocations trying to solve imbalanced parentheses" after a single edit, as each repair attempt may introduce new imbalances.[^2][^3][^1]

Aider's designers distilled the lesson: "HIGH LEVEL — encourage GPT to structure edits as new versions of substantive code blocks (functions, methods), not as a series of surgical/minimal changes to individual lines". Unified diffs, which present whole-function replacements, outperform surgical line edits because LLMs reason better about complete units of code.[^4]

clojure-mcp — The Closest Analog

The most relevant prior art is bhauman/clojure-mcp, an MCP server that connects a Clojure nREPL to LLMs. Its editing pipeline addresses the s-expression problem directly:[^5][^6]

  1. clojure_edit — targets top-level forms by type and name, not text position. Operations: replace, insert_before, insert_after. The LLM says "replace the function named process-data," not "replace lines 42–67."
  2. clojure_edit_replace_sexp — targets sub-expressions within a function via structural navigation.
  3. Pipeline: incoming code → lint (clj-kondo) → paren repair (parinfer) → structural patch (clj-rewrite) → format (cljfmt). Syntactic validity is a precondition, not an afterthought.[^6]
  4. clojure_eval — evaluates in the live nREPL; REPL feedback is the fast correction loop that makes the whole system work.

The reported outcome: "edit acceptance rates significantly" higher than text-based tools; "buttery smooth" editing. The key lesson from clojure-mcp's documentation is that the REPL is the killer feature — LLMs are excellent at iterating in a REPL where each expression is independently evaluated and any error is immediately visible. The limitation is that clojure-mcp requires high-capability models (Claude 3.7/4.1, Gemini 2.5, o3/o4-mini) — structural editing tools expose poor reasoning in weaker models.[^7][^5]

A separate open issue in the Claude Code repo confirms the gap: there is no production-grade structural editing tool for Lisp dialects other than Clojure, and users resort to workarounds.[^1]

Tree-sitter-Based Indexing Tools

Several tools use tree-sitter for structural analysis rather than editing:

  • CodeRLM indexes a codebase with tree-sitter and exposes tools: init, structure, search, impl, callers, grep. It replaces the glob/grep/read exploration cycle with index-backed lookups. In tests, it found semantic bugs (duplicated code with identical names, orphaned code) that text-search missed, and completed codebase analysis in 3 minutes vs 8 minutes for native tools.[^8]
  • Codebase-Memory builds a persistent tree-sitter knowledge graph (66 languages, including Scheme) via MCP with 14 typed structural query tools. It achieves a 10× token reduction and 2.1× fewer tool calls vs iterative file-reading for cross-file structural queries.[^9]
  • mcp-server-tree-sitter bridges tree-sitter parsing to MCP, enabling agents to rename functions and trace call hierarchies across files.[^10]
  • VT Code (Rust, terminal) combines tree-sitter and ast-grep for structural edits, previewing changes before application.[^11]

For Scheme specifically, a tree-sitter-scheme grammar exists with explicit Guile support. It parses code as lists by default — appropriate for s-expression-level operations — with custom queries available for construct-level analysis (defines, lambdas, etc.).[^12]

Research: AST Guidance for LLMs

Academic work confirms the structural advantage. AST-guided fine-tuning of LLMs reduces the training/testing accuracy gap from 29.5% to 23.1% by embedding structural knowledge that generalizes better. The AST-T5 pretraining approach outperforms text-only models on code repair and transpilation. A 2026 paper on LLM code summarization found that serialized ASTs reduce average input length by 28.6% and training time by 11.3% while achieving comparable summary quality — an efficiency argument for giving LLMs structure rather than raw text.[^13][^14][^15]

The Codebase-Memory authors articulate the key distinction: "The MCP Agent excels at cross-file structural queries, hub detection, caller ranking, and dependency chain traversal, where pre-materialized graph edges avoid the linear token cost of iterative file exploration". For within-file operations on Scheme, the equivalent is form-level targeting.[^9]

LLMs in a Persistent Lisp REPL (Research Architecture)

A 2025 paper proposes embedding LLMs within a persistent Lisp REPL, where the model generates <lisp>...</lisp> tagged expressions that middleware intercepts and evaluates. The REPL maintains state across turns, supports introspection, macro expansion, and dynamic redefinition. This architecture maps directly onto Guile: Guile's REPL is first-class, with (system repl) accessible programmatically and support for define-syntax, eval, and runtime introspection. The key insight is that Scheme/Lisp REPLs are a natural interface for agentic loops — expressions are the unit of evaluation, and the REPL gives immediate correctness feedback.[^16]


When Structural Tools Beat Plain Text — And When They Don't

| Operation | Plain Text | Structural | Verdict |
| --- | --- | --- | --- |
| Read small file (<100 LOC) | Full content in context | Overkill | Plain text wins |
| Read large file (>300 LOC) | Wastes tokens on irrelevant forms | Collapsed signature view | Structural wins |
| Write new file | file_write is sufficient | N/A | Plain text wins |
| Replace a top-level define | SEARCH/REPLACE, fragile on whitespace | Form-by-name replace, guaranteed valid | Structural wins |
| Edit a comment or string | Text is fine | Structural offers no help | Plain text wins |
| Insert a new define after another | file_edit with text anchor, fragile | insert_after by form name, robust | Structural wins |
| Sub-expression surgery (e.g., change 3rd arg of nested call) | Fragile | Fragile; LLMs struggle to specify exact paths | Both lose; use REPL instead |
| Find all callers of a function | grep (misses aliases, shadowing) | Symbol-aware lookup | Structural wins |
| Check paren balance | Manual count, error-prone | Parser guarantee | Structural wins |
| Evaluate and test a form | Not applicable | REPL eval | REPL always wins here |

The Sharp Tool Set for Guile/Scheme

The following is a minimal, high-ROI tool set designed from an LLM perspective. Every tool is evaluated against whether it is a genuine improvement over plain file read/write.

Tier 1: Always Needed

read_module(path) Returns a collapsed view of a Scheme source file: one line per top-level form showing only the head — (define (foo x y) ...), (define-record-type <point> ...), (define-syntax when ...). Full content for forms under a configurable line threshold (e.g., ≤5 lines). This directly mirrors the read_file collapsed view in clojure-mcp and saves tokens in proportion to file size. Beats plain read: yes, for files above ~100 LOC.[^5]
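A minimal sketch of the collapsed view using Guile's native reader (assumes the file is plain readable s-expression source; the line-count threshold and verbatim inlining of short forms are omitted, and collapsed-view is a hypothetical helper name):

```scheme
;; Collapse each top-level form to its head, e.g.
;; (define (foo x y) body ...) => (define (foo x y) ...).
(define (collapsed-view path)
  (call-with-input-file path
    (lambda (port)
      (let loop ((form (read port)) (acc '()))
        (if (eof-object? form)
            (reverse acc)
            (loop (read port)
                  (cons (if (and (pair? form) (pair? (cdr form)))
                            (list (car form) (cadr form) '...)
                            form)
                        acc)))))))
```

Note that read discards comments and exact formatting, so a production tool would pair this with source locations rather than round-tripping through the reader.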

read_form(path, name) Returns the full source text of a single top-level form identified by its defined name. The LLM calls this after read_module identifies which form to read in detail. This is the "drill down" step. Beats plain read: yes — isolates exactly what the LLM needs without surrounding context noise.


replace_form(path, name, new_source) Replaces the entire top-level form named name with new_source. The tool:

  1. Parses new_source to verify it is syntactically valid (balanced parens/brackets/quotes); applies parinfer-style repair when the source is nearly balanced.
  2. Locates the existing form by symbol, not line number.
  3. Replaces in-place, preserving surrounding whitespace.
  4. Returns the repaired+formatted source if any correction was made.

This is the core of clojure-mcp's clojure_edit, adapted for Scheme. LLM generates the full new form (guided by read_form output), not a diff. Beats plain write: yes — structural location, guaranteed valid output, no paren disasters.[^6]
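Step 2 — locating by symbol rather than line number — can be sketched with port positions around read. Both form-name and find-form-span are hypothetical helpers; form-name only recognizes a few common define shapes:

```scheme
;; Return the defined name of a top-level form, or #f.
(define (form-name form)
  (and (pair? form)
       (memq (car form) '(define define* define-syntax define-record-type))
       (let ((head (cadr form)))
         (if (pair? head) (car head) head))))

;; Return (start . end) byte offsets of the form named `name`, or #f.
(define (find-form-span path name)
  (call-with-input-file path
    (lambda (port)
      (let loop ()
        (let* ((start (ftell port))
               (form  (read port)))
          (cond ((eof-object? form) #f)
                ((eq? (form-name form) name) (cons start (ftell port)))
                (else (loop))))))))
```

The start offset includes any leading whitespace that read skips; a real implementation would trim it before splicing in new_source.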

insert_form(path, anchor_name, position, new_source) Inserts a new top-level form before or after the form named anchor_name. Includes the same validation pipeline as replace_form. Beats plain file_edit: yes — text-based insertion anchored to a line number breaks if any prior edit shifts lines.

delete_form(path, name) Removes a top-level form by name. Cleaner than text-based deletion which can accidentally remove surrounding blank lines or leave orphaned comments. Beats plain edit: yes.

eval_expr(expr, namespace?) Evaluates a Scheme expression in a running Guile REPL (e.g., via a REPL server socket or embedded Guile). Returns stdout, return value, and any error with stack trace. Optionally scoped to a loaded module. This is the feedback loop that makes all editing tools safe — after any replace_form, the LLM calls eval_expr to verify. The REPL is the most important single tool in the set. Not a comparison to plain text — it's irreplaceable.[^16][^5]
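The eval side is small in Guile. A sketch that reads one expression from a string and evaluates it in a named module, returning either the value or the raised error (eval-expr is a hypothetical name; output capture and stack traces are omitted):

```scheme
;; Evaluate `src` in the module named by `module-name` (a list such
;; as '(guile-user)); returns (ok <value>) or (error <key> <args>).
(define (eval-expr src module-name)
  (catch #t
    (lambda ()
      (let ((form (call-with-input-string src read)))
        (list 'ok (eval form (resolve-module module-name)))))
    (lambda (key . args)
      (list 'error key args))))
```

For example, (eval-expr "(+ 1 2)" '(guile-user)) yields (ok 3), while an unbound variable comes back as an error tuple instead of aborting the tool.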

Tier 2: High Value for Non-Trivial Codebases

check_syntax(source_or_path) Parses a string or file and returns: {valid: bool, errors: [...], repaired_source: string}. Uses tree-sitter-scheme or Guile's own (read) in a sandboxed context. The LLM can call this before submitting edits or when unsure about balance. Beats nothing (additive) — but extremely useful as a pre-flight check.[^12]
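Using Guile's reader as the validity oracle, the pre-flight check is a loop that reads until EOF and reports any read-error. A sketch (the repaired_source field would come from a separate parinfer-style pass, not shown here):

```scheme
;; Return (valid) or (invalid <args>) for a source string.
(define (check-syntax src)
  (catch 'read-error
    (lambda ()
      (call-with-input-string src
        (lambda (port)
          (let loop ((form (read port)))
            (if (eof-object? form)
                '(valid)
                (loop (read port)))))))
    (lambda (key . args)
      (list 'invalid args))))
```

An unclosed paren surfaces as a read-error at end of input, so imbalance anywhere in the string is caught before any file is touched.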

find_references(path_or_module, symbol) Returns all top-level forms (and optionally sub-expressions) that reference a given symbol. Uses tree-sitter structural queries rather than grep, so it can distinguish let-bound shadowing and macro-introduced bindings from genuine references to the target symbol. Returns {form_name, path, line, context_snippet} per hit. Beats grep: yes, for refactoring and impact analysis.[^10]
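As a baseline beneath the tree-sitter version, a naive occurrence scan over a read form is a few lines. This hypothetical helper deliberately ignores shadowing — which is precisely what the structural query adds on top:

```scheme
;; Does `form` contain `sym` anywhere as a symbol occurrence?
;; (Does not descend into vectors; sufficient for a sketch.)
(define (references? form sym)
  (cond ((eq? form sym) #t)
        ((pair? form) (or (references? (car form) sym)
                          (references? (cdr form) sym)))
        (else #f)))
```

Filtering collapsed-view output through this predicate gives a grep-equivalent hit list; the tree-sitter layer then prunes false positives from shadowed bindings.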

list_module_exports(path) For files with (define-module ...) and explicit #:export, returns the exported API surface. Useful for the LLM to understand what is safe to rename vs what is a public API. Beats manual reading: marginal — worth having if Guile modules are in scope.
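If the module is loadable, Guile can answer this directly from the module's public interface (a sketch; module-exports is a hypothetical name, and the module name is given as a list):

```scheme
;; Exported symbols of a module, e.g. (module-exports '(ice-9 q)).
(define (module-exports name)
  (module-map (lambda (sym var) sym)
              (resolve-interface name)))
```

For files that cannot be loaded (work in progress), the same information can be scraped from the #:export clause of the define-module form instead.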

Tier 3: Situational

macro_expand(expr, namespace) Calls (macroexpand expr) in the REPL, returns expanded form. Useful when the LLM needs to reason about macro-generated code. Unique capability — no text equivalent.

load_module(path) Re-loads a file into the running Guile REPL session (i.e., (load path) or (use-modules ...)). Used after a sequence of edits to verify the whole module compiles and no unbound-variable errors occur. Beats eval_expr for whole-module validation.

What to Explicitly Omit

Sub-expression path addressing (e.g., "replace the 3rd element of the 2nd let binding"): LLMs consistently fail to specify correct structural coordinates for deeply nested forms. clojure-mcp experimented with this; the guidance now is to use the REPL instead — evaluate, observe, generate a new complete form, replace it. The REPL feedback loop is more robust than surgical sub-expression addressing.[^5]

Raw AST JSON dump: Serialized AST trees are verbose and waste context. LLMs do not need to see the full AST — they need to see the source text of the relevant form. Academic work confirms that serialized ASTs don't outperform plain source for LLM comprehension tasks.[^15]

Paredit-style operations (slurp, barf, transpose-sexp): These are human-interactive operations. LLMs do not navigate code incrementally the way a human using paredit does — they reason about complete transformations. Exposing slurp/barf as tools adds complexity without benefit.[^17]


Implementation Notes for Guile

Guile provides first-class programmatic REPL access and a native s-expression reader, which means the "validation pipeline" that clojure-mcp builds from external tools (clj-kondo, parinfer, clj-rewrite) can be implemented using Guile itself:

  • Parse/validate: (with-exception-handler ... (lambda () (read (open-input-string src))) ...) — Guile's native read is the ground truth for s-expression validity.
  • Format: (pretty-print form) from (use-modules (ice-9 pretty-print)).
  • REPL socket: Guile supports spawning a REPL on a TCP socket via (run-server (make-tcp-server-socket #:port 7000)) — or on a Unix-domain socket via make-unix-domain-server-socket — from (use-modules (system repl server)), making eval_expr trivially implementable.
  • tree-sitter-scheme (Guile grammar #7) provides structural querying for find_references without running Guile itself — useful for static analysis of files that may not be loadable (e.g., incomplete work-in-progress).[^12]
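Putting the REPL-socket bullet into runnable form, the sidecar Guile process needs only this (port 7000 follows the example above; spawn-server is the non-blocking variant of run-server, so the process stays responsive):

```scheme
(use-modules (system repl server))

;; Start a REPL server in a background thread; the eval_expr tool
;; then connects to localhost:7000 and sends expressions.
(spawn-server (make-tcp-server-socket #:port 7000))
```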

The pi-coding-agent extension system (TypeScript-based) can host these tools as tool definitions that call out to a sidecar Guile process or tree-sitter library, keeping the structural intelligence in the language that understands Scheme natively.


Conclusion

The evidence from clojure-mcp, CodeRLM, Codebase-Memory, and Aider's failure analysis converges on three principles: (1) target forms by name/identity, never by line number or exact text match; (2) validate and auto-repair on input, not after the fact; (3) use a live REPL as the primary correctness signal. For Guile/Scheme, a seven-tool set — read_module, read_form, replace_form, insert_form, delete_form, eval_expr, check_syntax — covers the 95% case. Sub-expression addressing and raw AST exposure should be omitted; they add LLM-facing complexity that empirically leads to more errors, not fewer. The REPL is the structural editor.[^1][^5]


References

  1. Structural editing tools for s-expr languages · Issue #1827 · anthropics/claude-code - Claude has excess trouble with editing LISPs' code, since they require keeping parentheses balanced....

  2. Code Surgery: How AI Assistants Make Precise Edits to Your Files - Detailed Error Reporting. Aider excels at providing highly informative feedback when edits ...

  3. Aider analysis of editing failures from · Issue #3895 - GitHub - Unexplained Failures: A significant number of failures occurred where Aider's feedback indicated the...

  4. Unified diffs make GPT-4 Turbo 3X less lazy - Aider - Aider now asks GPT-4 Turbo to use unified diffs to edit your code. This dramatically improves GPT-4 ...

  5. bhauman/clojure-mcp - Clojure MCP. Contribute to bhauman/clojure-mcp development by creating an account on GitHub.

  6. ClojureMCP (Clojure MCP Server) by bhauman | AI Coding Workflows - ClojureMCP is an MCP (Model Context Protocol) server for Clojure that connects LLM clients (Claude C...

  7. clojure-mcp AI Agents Free Tier and OneKey Router Discounted ... - clojure-mcp from AI Hub Admin, Insights of Top Ranking AI & Robotics Applications.

  8. CodeRLM Tree-sitter-backed code indexing for LLM agents - I've been building a tool that changes how LLM coding agents explore codebases, and I wanted to shar...

  9. Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM ... - An MCP-based tool interface exposing 14 typed structural queries (call-path tracing, impact analysis...

  10. mcp-server-tree-sitter: The Ultimate Guide for AI Engineers - Tree-sitter: A powerful and efficient parser generator that builds a concrete, incremental Abstract ...

  11. Rust terminal coding agent for structural edits (Tree-sitter/ast-grep) - It combines Tree-sitter parsing with ast-grep patterns for safe refactors, plus tool calls (read/sea...

  12. 6cdh/tree-sitter-scheme - GitHub - This parser doesn't parse language constructs. Instead, it parses code as lists. If you want languag...

  13. An AST-guided LLM Approach for SVRF Code Synthesis - arXiv - Through data augmentation techniques using our internal LLM tools, we expanded this to 741 diverse e...

  14. [PDF] Advancing Large Language Models for Code Using Code - EECS

  15. [PDF] Code vs Serialized AST Inputs for LLM-Based Code Summarization - Experimental results show that, for method-level code summarization, serialized ASTs can achieve sum...

  16. From Tool Calling to Symbolic Thinking: LLMs in a Persistent Lisp ... - This work proposes an alternative path: empowering language models to use a Lisp REPL as a persisten...

  17. Paredit, a Visual Guide - Calva User Guide - Calva Paredit helps you navigate, select and edit Clojure code in a structural way. LISP isn't line ...