LLM Tool Design for Scheme/Guile — Prior Art & Sharp Tool Set
Executive Summary
Text-based read/write tools break predictably when LLMs edit Lisp code: paren balancing errors cascade into multi-turn repair loops. The prior art is clear — structural tools that target forms by name and identity, not text position, dramatically improve edit reliability. However, not every operation needs structural treatment; a sharp tool set avoids over-engineering by applying structure only where it genuinely beats plain text.[^1]
Prior Art: What Has Been Tried
The Failure Mode of Plain Text Editing
The canonical failure is well-documented across tools and user reports. Aider's SEARCH/REPLACE format requires exact whitespace, indentation, and content match — even minor discrepancies cause silent failures. An analysis of a real coding session logged 39 editing failures: 13 from redundant edits where the LLM lost track of applied changes, 8 from context/state mismatches, and 6 from SEARCH blocks that didn't exactly match the file. For Lisp specifically, parenthesis balancing errors compound this: Claude Code issue #1827 reports Claude "spending many tool invocations trying to solve imbalanced parentheses" after a single edit, as each repair attempt may introduce new imbalances.[^2][^3][^1]
Aider's designers distilled the lesson: "HIGH LEVEL — encourage GPT to structure edits as new versions of substantive code blocks (functions, methods), not as a series of surgical/minimal changes to individual lines". Unified diffs, which present whole-function replacements, outperform surgical line edits because LLMs reason better about complete units of code.[^4]
clojure-mcp — The Closest Analog
The most relevant prior art is bhauman/clojure-mcp, an MCP server that connects a Clojure nREPL to LLMs. Its editing pipeline addresses the s-expression problem directly:[^5][^6]
- clojure_edit — targets top-level forms by type and name, not text position. Operations: replace, insert_before, insert_after. The LLM says "replace the function named process-data," not "replace lines 42–67."
- clojure_edit_replace_sexp — targets sub-expressions within a function via structural navigation.
- Pipeline: incoming code → lint (clj-kondo) → paren repair (parinfer) → structural patch (clj-rewrite) → format (cljfmt). Syntactic validity is a precondition, not an afterthought.[^6]
- clojure_eval — evaluates in the live nREPL; REPL feedback is the fast correction loop that makes the whole system work.
The reported outcome: edit acceptance rates "significantly" higher than with text-based tools, and editing described as "buttery smooth". The key lesson from clojure-mcp's documentation is that the REPL is the killer feature — LLMs are excellent at iterating in a REPL where each expression is independently evaluated and any error is immediately visible. The limitation is that clojure-mcp requires high-capability models (Claude 3.7/4.1, Gemini 2.5, o3/o4-mini) — structural editing tools expose poor reasoning in weaker models.[^7][^5]
A separate open issue in the Claude Code repo confirms the gap: there is no production-grade structural editing tool for Lisp dialects other than Clojure, and users resort to workarounds.[^1]
Tree-sitter-Based Indexing Tools
Several tools use tree-sitter for structural analysis rather than editing:
- CodeRLM indexes a codebase with tree-sitter and exposes tools: init, structure, search, impl, callers, grep. It replaces the glob/grep/read exploration cycle with index-backed lookups. In tests, it found semantic bugs (duplicated code with identical names, orphaned code) that text-search missed, and completed codebase analysis in 3 minutes vs 8 minutes for native tools.[^8]
- Codebase-Memory builds a persistent tree-sitter knowledge graph (66 languages, including Scheme) via MCP with 14 typed structural query tools. It achieves a 10× token reduction and 2.1× fewer tool calls vs iterative file-reading for cross-file structural queries.[^9]
- mcp-server-tree-sitter bridges tree-sitter parsing to MCP, enabling agents to rename functions and trace call hierarchies across files.[^10]
- VT Code (Rust, terminal) combines tree-sitter and ast-grep for structural edits, previewing changes before application.[^11]
For Scheme specifically, a tree-sitter-scheme grammar exists with explicit Guile support. It parses code as lists by default — appropriate for s-expression-level operations — with custom queries available for construct-level analysis (defines, lambdas, etc.).[^12]
Research: AST Guidance for LLMs
Academic work confirms the structural advantage. AST-guided fine-tuning of LLMs reduces the training–testing accuracy gap from 29.5% to 23.1% by embedding structural knowledge that generalizes better. The AST-T5 pretraining approach outperforms text-only models on code repair and transpilation. A 2026 paper on LLM code summarization found that serialized ASTs reduce average input length by 28.6% and training time by 11.3% while achieving comparable summary quality — an efficiency argument for giving LLMs structure rather than raw text.[^13][^14][^15]
The Codebase-Memory authors articulate the key distinction: "The MCP Agent excels at cross-file structural queries, hub detection, caller ranking, and dependency chain traversal, where pre-materialized graph edges avoid the linear token cost of iterative file exploration". For within-file operations on Scheme, the equivalent is form-level targeting.[^9]
LLMs in a Persistent Lisp REPL (Research Architecture)
A 2025 paper proposes embedding LLMs within a persistent Lisp REPL, where the model generates <lisp>...</lisp> tagged expressions that middleware intercepts and evaluates. The REPL maintains state across turns, supports introspection, macro expansion, and dynamic redefinition. This architecture maps directly onto Guile: Guile's REPL is first-class, with (system repl) accessible programmatically and support for define-syntax, eval, and runtime introspection. The key insight is that Scheme/Lisp REPLs are a natural interface for agentic loops — expressions are the unit of evaluation, and the REPL gives immediate correctness feedback.[^16]
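The interception step of this architecture can be sketched in a few lines. The tag name comes from the paper's description; the function name and regex-based extraction here are illustrative assumptions, not the paper's implementation:

```python
import re

# Hypothetical middleware step: pull <lisp>...</lisp> spans out of a model
# response so each expression can be forwarded to a persistent Guile REPL.
LISP_TAG = re.compile(r"<lisp>(.*?)</lisp>", re.DOTALL)

def extract_lisp_exprs(model_output):
    """Return the Lisp expressions the model asked the middleware to evaluate."""
    return [m.strip() for m in LISP_TAG.findall(model_output)]

reply = ("Let me check.\n<lisp>(+ 1 2)</lisp>\n"
         "and redefine:\n<lisp>(define (f x) (* x x))</lisp>")
print(extract_lisp_exprs(reply))  # → ['(+ 1 2)', '(define (f x) (* x x))']
```

Each extracted expression is then evaluated in order, and the results are fed back into the model's context, closing the loop.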
When Structural Tools Beat Plain Text — And When They Don't
| Operation | Plain Text | Structural | Verdict |
|---|---|---|---|
| Read small file (<100 LOC) | Full content in context | Overkill | Plain text wins |
| Read large file (>300 LOC) | Wastes tokens on irrelevant forms | Collapsed signature view | Structural wins |
| Write new file | file_write is sufficient | N/A | Plain text wins |
| Replace a top-level define | SEARCH/REPLACE, fragile on whitespace | Form-by-name replace, guaranteed valid | Structural wins |
| Edit a comment or string | Text is fine | Structural offers no help | Plain text wins |
| Insert a new define after another | file_edit with text anchor, fragile | insert_after by form name, robust | Structural wins |
| Sub-expression surgery (e.g., change 3rd arg of nested call) | Fragile | Fragile — LLMs struggle to specify exact paths | Both lose; use REPL instead |
| Find all callers of a function | grep (misses aliases, shadowing) | Symbol-aware lookup | Structural wins |
| Check paren balance | Manual count, error-prone | Parser guarantee | Structural wins |
| Evaluate and test a form | Not applicable | REPL eval | REPL always wins here |
The Sharp Tool Set for Guile/Scheme
The following is a minimal, high-ROI tool set designed from an LLM perspective. Every tool is evaluated against whether it is a genuine improvement over plain file read/write.
Tier 1: Always Needed
read_module(path)
Returns a collapsed view of a Scheme source file: one line per top-level form showing only the head — (define (foo x y) ...), (define-record-type <point> ...), (define-syntax when ...). Full content for forms under a configurable line threshold (e.g., ≤5 lines). This directly mirrors the read_file collapsed view in clojure-mcp and saves tokens in proportion to file size. Beats plain read: yes, for files above ~100 LOC.[^5]
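A minimal sketch of the collapsing pass, assuming a naive paren-depth scanner (a production version would use tree-sitter-scheme or Guile's read, and would also handle ; comments, which this sketch ignores):

```python
def collapse_module(source, threshold=5):
    """Collapse Scheme source to one summary line per long top-level form.

    Forms at or under `threshold` lines are kept verbatim; longer forms are
    reduced to their head line plus an ellipsis marker.  Sketch only: scans
    paren depth and string literals, ignores ; comments and #| |# blocks."""
    forms, depth, start, in_string = [], 0, None, False
    for i, ch in enumerate(source):
        if in_string:
            if ch == '"' and source[i - 1] != '\\':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == '(':
            if depth == 0:
                start = i
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth == 0:
                forms.append(source[start:i + 1])
    out = []
    for form in forms:
        lines = form.splitlines()
        out.append(form if len(lines) <= threshold
                   else lines[0].rstrip() + " ...)")
    return "\n".join(out)
```

For a file where (define (bar y) ...) runs six lines, the collapsed view keeps short forms whole and shows only the head line of bar, which is exactly the drill-down cue read_form needs.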
read_form(path, name)
Returns the full source text of a single top-level form identified by its defined name. The LLM calls this after read_module identifies which form to read in detail. This is the "drill down" step. Beats plain read: yes — isolates exactly what the LLM needs without surrounding context noise.
replace_form(path, name, new_source)
Replaces the entire top-level form named name with new_source. The tool:
- Parses new_source to verify it is syntactically valid (balanced parens/brackets/quotes); applies parinfer-style repair if close.
- Locates the existing form by symbol, not line number.
- Replaces in-place, preserving surrounding whitespace.
- Returns the repaired and formatted source if any correction was made.
This is the core of clojure-mcp's clojure_edit, adapted for Scheme. LLM generates the full new form (guided by read_form output), not a diff. Beats plain write: yes — structural location, guaranteed valid output, no paren disasters.[^6]
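The locate-by-name step might look like the following sketch. The helper names and the crude balance check are assumptions; a real implementation would use parinfer-style repair and a proper reader rather than paren counting:

```python
import re

def top_level_forms(source):
    """Yield (start, end) spans of top-level forms, skipping string literals."""
    depth, start, in_string = 0, None, False
    for i, ch in enumerate(source):
        if in_string:
            if ch == '"' and source[i - 1] != '\\':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == '(':
            if depth == 0:
                start = i
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth == 0:
                yield start, i + 1

# Matches (define foo ...), (define (foo ...) ...), (define-syntax foo ...)
DEFINED_NAME = re.compile(r"^\(\s*define[^\s()]*\s+\(?\s*([^\s()]+)")

def replace_form(source, name, new_source):
    """Replace the top-level form defining `name` with `new_source`.

    Location is structural (by defined name), never by line number; the
    replacement is rejected up front if its parens are unbalanced."""
    if new_source.count("(") != new_source.count(")"):  # crude pre-flight check
        raise ValueError("new_source has unbalanced parentheses")
    for start, end in top_level_forms(source):
        m = DEFINED_NAME.match(source[start:end])
        if m and m.group(1) == name:
            return source[:start] + new_source + source[end:]
    raise KeyError(f"no top-level form defines {name!r}")
```

The LLM never sees line numbers: it names the form, supplies the complete replacement, and the tool handles placement.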
insert_form(path, anchor_name, position, new_source)
Inserts a new top-level form before or after the form named anchor_name. Includes the same validation pipeline as replace_form. Beats plain file_edit: yes — text-based insertion anchored to a line number breaks if any prior edit shifts lines.
delete_form(path, name)
Removes a top-level form by name. Cleaner than text-based deletion which can accidentally remove surrounding blank lines or leave orphaned comments. Beats plain edit: yes.
eval_expr(expr, namespace?)
Evaluates a Scheme expression in a running Guile REPL (e.g., via guild socket or embedded Guile). Returns stdout, return value, and any error with stack trace. Optionally scoped to a loaded module. This is the feedback loop that makes all editing tools safe — after any replace_form, the LLM calls eval_expr to verify. The REPL is the most important single tool in the set. Not a comparison to plain text — it's irreplaceable.[^16][^5]
Tier 2: High Value for Non-Trivial Codebases
check_syntax(source_or_path)
Parses a string or file and returns: {valid: bool, errors: [...], repaired_source: string}. Uses tree-sitter-scheme or Guile's own (read) in a sandboxed context. The LLM can call this before submitting edits or when unsure about balance. Beats nothing (additive) — but extremely useful as a pre-flight check.[^12]
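A deliberately naive sketch of this contract, assuming "repair" means appending missing closers (real repair would defer to parinfer, tree-sitter-scheme, or Guile's read):

```python
def check_syntax(source):
    """Report unbalanced delimiters and offer a trivial repair.

    Sketch only: tracks (), [], and string literals; a production check
    would use tree-sitter-scheme or Guile's own reader."""
    pairs = {')': '(', ']': '['}
    closers = {'(': ')', '[': ']'}
    stack, errors, in_string = [], [], False
    for i, ch in enumerate(source):
        if in_string:
            if ch == '"' and source[i - 1] != '\\':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in '([':
            stack.append(ch)
        elif ch in ')]':
            if stack and stack[-1] == pairs[ch]:
                stack.pop()
            else:
                errors.append(f"unmatched {ch!r} at offset {i}")
    if in_string:
        errors.append("unterminated string literal")
    if stack:
        errors.append(f"{len(stack)} unclosed delimiter(s)")
    repaired = source + "".join(closers[c] for c in reversed(stack))
    return {"valid": not errors, "errors": errors, "repaired_source": repaired}
```

Running it on a form missing its final paren returns valid: False plus the closed-up source, which the LLM can submit directly to replace_form.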
find_references(path_or_module, symbol)
Returns all top-level forms (and optionally sub-expressions) that reference a given symbol. Uses tree-sitter structural queries rather than grep, so it handles let-bound shadowing and macro-introduced bindings differently from symbol references. Returns {form_name, path, line, context_snippet} per hit. Beats grep: yes, for refactoring and impact analysis.[^10]
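Even without full scope analysis, whole-token matching already improves on substring grep, as this sketch illustrates. The input is a list of top-level form strings (e.g., from read_module); string literals are not excluded, which a tree-sitter version would handle:

```python
import re

# Scheme-ish tokens: runs of characters that are not delimiters/whitespace.
TOKEN = re.compile(r"[^\s()\[\]\"';]+")
HEAD = re.compile(r"^\(\s*define[^\s()]*\s+\(?\s*([^\s()]+)")

def find_references(forms, symbol):
    """Report which top-level forms mention `symbol` as a whole token.

    Unlike substring grep, 'helper' will not match inside 'helper-2'."""
    hits = []
    for form in forms:
        tokens = TOKEN.findall(form)
        if symbol in tokens:
            head = HEAD.match(form)
            hits.append({"form_name": head.group(1) if head else None,
                         "count": tokens.count(symbol)})
    return hits

forms = [
    "(define (foo x) (helper x))",
    "(define (bar y) (helper (helper y)))",
    "(define (helper-2 z) z)",
]
print(find_references(forms, "helper"))
```

Note that helper-2 is correctly excluded: grep -c helper would count it, inflating the impact analysis.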
list_module_exports(path)
For files with (define-module ...) and explicit #:export, returns the exported API surface. Useful for the LLM to understand what is safe to rename vs what is a public API. Beats manual reading: marginal — worth having if Guile modules are in scope.
Tier 3: Situational
macro_expand(expr, namespace)
Calls (macroexpand expr) in the REPL, returns expanded form. Useful when the LLM needs to reason about macro-generated code. Unique capability — no text equivalent.
load_module(path)
Re-loads a file into the running Guile REPL session (i.e., (load path) or (use-modules ...)). Used after a sequence of edits to verify the whole module compiles and no unbound-variable errors occur. Beats eval_expr for whole-module validation.
What to Explicitly Omit
Sub-expression path addressing (e.g., "replace the 3rd element of the 2nd let binding"): LLMs consistently fail to specify correct structural coordinates for deeply nested forms. clojure-mcp experimented with this; the guidance now is to use the REPL instead — evaluate, observe, generate a new complete form, replace it. The REPL feedback loop is more robust than surgical sub-expression addressing.[^5]
Raw AST JSON dump: Serialized AST trees are verbose and waste context. LLMs do not need to see the full AST — they need to see the source text of the relevant form. Academic work confirms that serialized ASTs don't outperform plain source for LLM comprehension tasks.[^15]
Paredit-style operations (slurp, barf, transpose-sexp): These are human-interactive operations. LLMs do not navigate code incrementally the way a human using paredit does — they reason about complete transformations. Exposing slurp/barf as tools adds complexity without benefit.[^17]
Implementation Notes for Guile
Guile provides first-class programmatic REPL access and a native s-expression reader, which means the "validation pipeline" that clojure-mcp builds from external tools (clj-kondo, parinfer, clj-rewrite) can be implemented using Guile itself:
- Parse/validate: (with-exception-handler ... (lambda () (read (open-input-string src))) ...) — Guile's native read is the ground truth for s-expression validity.
- Format: (pretty-print form) from (use-modules (ice-9 pretty-print)).
- REPL socket: Guile can serve a REPL over a TCP or Unix-domain socket, e.g. (run-server (make-tcp-server-socket #:port 7000)) from (use-modules (system repl server)), making eval_expr trivially implementable.
- tree-sitter-scheme (Guile grammar #7) provides structural querying for find_references without running Guile itself — useful for static analysis of files that may not be loadable (e.g., incomplete work-in-progress).[^12]
The pi-coding-agent extension system (TypeScript-based) can host these tools as tool definitions that call out to a sidecar Guile process or tree-sitter library, keeping the structural intelligence in the language that understands Scheme natively.
Conclusion
The evidence from clojure-mcp, CodeRLM, Codebase-Memory, and Aider's failure analysis converges on three principles: (1) target forms by name/identity, never by line number or exact text match; (2) validate and auto-repair on input, not after the fact; (3) use a live REPL as the primary correctness signal. For Guile/Scheme, a seven-tool set — read_module, read_form, replace_form, insert_form, delete_form, eval_expr, check_syntax — covers the 95% case. Sub-expression addressing and raw AST exposure should be omitted; they add LLM-facing complexity that empirically leads to more errors, not fewer. The REPL is the structural editor.[^1][^5]
References
1. Structural editing tools for s-expr languages · Issue #1827 · anthropics/claude-code - Claude has excess trouble with editing LISPs' code, since they require keeping parentheses balanced...
2. Code Surgery: How AI Assistants Make Precise Edits to Your Files - Detailed Error Reporting: Aider excels at providing highly informative feedback when edits ...
3. Aider analysis of editing failures · Issue #3895 · GitHub - Unexplained Failures: A significant number of failures occurred where Aider's feedback indicated the...
4. Unified diffs make GPT-4 Turbo 3X less lazy - Aider - Aider now asks GPT-4 Turbo to use unified diffs to edit your code. This dramatically improves GPT-4 ...
5. bhauman/clojure-mcp - Clojure MCP. Contribute to bhauman/clojure-mcp development by creating an account on GitHub.
6. ClojureMCP (Clojure MCP Server) by bhauman | AI Coding Workflows - ClojureMCP is an MCP (Model Context Protocol) server for Clojure that connects LLM clients (Claude C...
7. clojure-mcp AI Agents Free Tier and OneKey Router Discounted ... - clojure-mcp from AI Hub Admin, Insights of Top Ranking AI & Robotics Applications.
8. CodeRLM – Tree-sitter-backed code indexing for LLM agents - I've been building a tool that changes how LLM coding agents explore codebases, and I wanted to shar...
9. Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM ... - An MCP-based tool interface exposing 14 typed structural queries (call-path tracing, impact analysis...
10. mcp-server-tree-sitter: The Ultimate Guide for AI Engineers - Tree-sitter: A powerful and efficient parser generator that builds a concrete, incremental Abstract ...
11. Rust terminal coding agent for structural edits (Tree-sitter/ast-grep) - It combines Tree-sitter parsing with ast-grep patterns for safe refactors, plus tool calls (read/sea...
12. 6cdh/tree-sitter-scheme - GitHub - This parser doesn't parse language constructs. Instead, it parses code as lists. If you want languag...
13. An AST-guided LLM Approach for SVRF Code Synthesis - arXiv - Through data augmentation techniques using our internal LLM tools, we expanded this to 741 diverse e...
14. [PDF] Advancing Large Language Models for Code Using Code - EECS
15. [PDF] Code vs Serialized AST Inputs for LLM-Based Code Summarization - Experimental results show that, for method-level code summarization, serialized ASTs can achieve sum...
16. From Tool Calling to Symbolic Thinking: LLMs in a Persistent Lisp ... - This work proposes an alternative path: empowering language models to use a Lisp REPL as a persisten...
17. Paredit, a Visual Guide - Calva User Guide - Calva Paredit helps you navigate, select and edit Clojure code in a structural way. LISP isn't line ...