# LLM Tool Design for Scheme/Guile — Prior Art & Sharp Tool Set

## Executive Summary
Text-based read/write tools break predictably when LLMs edit Lisp code: paren balancing errors cascade into multi-turn repair loops. The prior art is clear — structural tools that target forms by name and identity, not text position, dramatically improve edit reliability. However, not every operation needs structural treatment; a sharp tool set avoids over-engineering by applying structure only where it genuinely beats plain text.[^1]

***

## Prior Art: What Has Been Tried

### The Failure Mode of Plain Text Editing

The canonical failure is well-documented across tools and user reports. Aider's SEARCH/REPLACE format requires exact whitespace, indentation, and content match — even minor discrepancies cause silent failures. An analysis of a real coding session logged 39 editing failures: 13 from redundant edits where the LLM lost track of applied changes, 8 from context/state mismatches, and 6 from SEARCH blocks that didn't exactly match the file. For Lisp specifically, parenthesis balancing errors compound this: Claude Code issue #1827 reports Claude "spending many tool invocations trying to solve imbalanced parentheses" after a single edit, as each repair attempt may introduce new imbalances.[^2][^3][^1]
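
The fragility is mechanical, not model-specific: exact-match replacement fails on any whitespace drift. A minimal illustration (hypothetical snippet, not taken from Aider's codebase):

```python
FILE = """(define (area r)
    (* 3.14159 r r))
"""

# The model remembers the body with 2-space indentation,
# but the file on disk uses 4 spaces.
search_block = """(define (area r)
  (* 3.14159 r r))
"""

# Exact-match replacement silently finds nothing to change.
assert search_block not in FILE

# The same edit succeeds only when whitespace matches byte-for-byte.
exact = """(define (area r)
    (* 3.14159 r r))
"""
assert exact in FILE
```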
Aider's designers distilled the lesson: "HIGH LEVEL — encourage GPT to structure edits as new versions of substantive code blocks (functions, methods), not as a series of surgical/minimal changes to individual lines". Unified diffs, which present whole-function replacements, outperform surgical line edits because LLMs reason better about complete units of code.[^4]

### clojure-mcp — The Closest Analog

The most relevant prior art is `bhauman/clojure-mcp`, an MCP server that connects a Clojure nREPL to LLMs. Its editing pipeline addresses the s-expression problem directly:[^5][^6]

1. **`clojure_edit`** — targets top-level forms by *type and name*, not text position. Operations: `replace`, `insert_before`, `insert_after`. The LLM says "replace the function named `process-data`," not "replace lines 42–67."

2. **`clojure_edit_replace_sexp`** — targets sub-expressions within a function via structural navigation.

3. **Pipeline**: incoming code → lint (clj-kondo) → paren repair (parinfer) → structural patch (clj-rewrite) → format (cljfmt). Syntactic validity is a *precondition*, not an afterthought.[^6]

4. **`clojure_eval`** — evaluates in the live nREPL; REPL feedback is the fast correction loop that makes the whole system work.

The reported outcome: edit acceptance rates "significantly" higher than with text-based tools, and "buttery smooth" editing. The key lesson from clojure-mcp's documentation is that **the REPL is the killer feature** — LLMs are excellent at iterating in a REPL where each expression is independently evaluated and any error is immediately visible. The limitation is that clojure-mcp requires high-capability models (Claude 3.7/4.1, Gemini 2.5, o3/o4-mini) — structural editing tools expose poor reasoning in weaker models.[^7][^5]
A separate open issue in the Claude Code repo confirms the gap: there is no production-grade structural editing tool for Lisp dialects other than Clojure, and users resort to workarounds.[^1]

### Tree-sitter-Based Indexing Tools

Several tools use tree-sitter for structural *analysis* rather than editing:
- **CodeRLM** indexes a codebase with tree-sitter and exposes tools: `init`, `structure`, `search`, `impl`, `callers`, `grep`. It replaces the glob/grep/read exploration cycle with index-backed lookups. In tests, it found semantic bugs (duplicated code with identical names, orphaned code) that text-search missed, and completed codebase analysis in 3 minutes vs 8 minutes for native tools.[^8]

- **Codebase-Memory** builds a persistent tree-sitter knowledge graph (66 languages, including Scheme) via MCP with 14 typed structural query tools. It achieves a **10× token reduction** and **2.1× fewer tool calls** vs iterative file-reading for cross-file structural queries.[^9]

- **mcp-server-tree-sitter** bridges tree-sitter parsing to MCP, enabling agents to rename functions and trace call hierarchies across files.[^10]

- **VT Code** (Rust, terminal) combines tree-sitter and ast-grep for structural edits, previewing changes before application.[^11]

For Scheme specifically, a `tree-sitter-scheme` grammar exists with explicit Guile support. It parses code as lists by default — appropriate for s-expression-level operations — with custom queries available for construct-level analysis (defines, lambdas, etc.).[^12]

### Research: AST Guidance for LLMs

Academic work confirms the structural advantage. AST-guided fine-tuning of LLMs reduces the training–testing accuracy gap from 29.5% to 23.1% by embedding structural knowledge that generalizes better. The AST-T5 pretraining approach outperforms text-only models on code repair and transpilation. A 2026 paper on LLM code summarization found that serialized ASTs reduce average input length by 28.6% and training time by 11.3% while achieving comparable summary quality — an efficiency argument for giving LLMs structure rather than raw text.[^13][^14][^15]
The Codebase-Memory authors articulate the key distinction: "The MCP Agent excels at cross-file structural queries, hub detection, caller ranking, and dependency chain traversal, where pre-materialized graph edges avoid the linear token cost of iterative file exploration". For within-file operations on Scheme, the equivalent is form-level targeting.[^9]

### LLMs in a Persistent Lisp REPL (Research Architecture)

A 2025 paper proposes embedding LLMs within a persistent Lisp REPL, where the model generates `<lisp>...</lisp>`-tagged expressions that middleware intercepts and evaluates. The REPL maintains state across turns, supports introspection, macro expansion, and dynamic redefinition. This architecture maps directly onto Guile: Guile's REPL is first-class, with `(system repl)` accessible programmatically and support for `define-syntax`, `eval`, and runtime introspection. The key insight is that Scheme/Lisp REPLs are a *natural* interface for agentic loops — expressions are the unit of evaluation, and the REPL gives immediate correctness feedback.[^16]
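
The interception step can be sketched as a tag scanner over model output; this is an illustrative reconstruction of the paper's convention, not its actual middleware:

```python
import re

LISP_TAG = re.compile(r"<lisp>(.*?)</lisp>", re.DOTALL)

def extract_exprs(model_output: str) -> list[str]:
    """Pull out each tagged expression for the middleware to evaluate."""
    return [m.strip() for m in LISP_TAG.findall(model_output)]

reply = "Let me check. <lisp>(+ 1 2)</lisp> and then <lisp>(map car xs)</lisp>"
assert extract_exprs(reply) == ["(+ 1 2)", "(map car xs)"]
```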

***

## When Structural Tools Beat Plain Text — And When They Don't

| Operation | Plain Text | Structural | Verdict |
|---|---|---|---|
| Read small file (<100 LOC) | Full content in context | Overkill | **Plain text wins** |
| Read large file (>300 LOC) | Wastes tokens on irrelevant forms | Collapsed signature view | **Structural wins** |
| Write new file | `file_write` is sufficient | N/A | **Plain text wins** |
| Replace a top-level `define` | SEARCH/REPLACE, fragile on whitespace | Form-by-name replace, guaranteed valid | **Structural wins** |
| Edit a comment or string | Text is fine | Structural offers no help | **Plain text wins** |
| Insert a new `define` after another | `file_edit` with text anchor, fragile | `insert_after` by form name, robust | **Structural wins** |
| Sub-expression surgery (e.g., change 3rd arg of nested call) | Fragile | Fragile — LLMs struggle to specify exact paths | **Both lose; use REPL instead** |
| Find all callers of a function | `grep` (misses aliases, shadowing) | Symbol-aware lookup | **Structural wins** |
| Check paren balance | Manual count, error-prone | Parser guarantee | **Structural wins** |
| Evaluate and test a form | Not applicable | REPL eval | **REPL always wins here** |

***

## The Sharp Tool Set for Guile/Scheme

The following is a minimal, high-ROI tool set designed from an LLM perspective. Every tool is evaluated against whether it is a genuine improvement over plain file read/write.

### Tier 1: Always Needed

**`read_module(path)`**
Returns a *collapsed* view of a Scheme source file: one line per top-level form showing only the head — `(define (foo x y) ...)`, `(define-record-type <point> ...)`, `(define-syntax when ...)`. Full content for forms under a configurable line threshold (e.g., ≤5 lines). This directly mirrors the `read_file` collapsed view in clojure-mcp and saves tokens in proportion to file size. **Beats plain read: yes, for files above ~100 LOC.**[^5]
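
A minimal sketch of the collapsed view using a paren-depth scan; a real implementation would use Guile's `read` or tree-sitter, and would handle strings and comments containing parentheses:

```python
def collapse(source: str) -> list[str]:
    """One summary line per top-level form: the head line, with ' ...'
    appended when the form spans multiple lines."""
    out, depth, first, nlines = [], 0, None, 0
    for line in source.splitlines():
        stripped = line.strip()
        if depth == 0 and not stripped:
            continue                      # skip blank lines between forms
        if depth == 0:
            first, nlines = stripped, 0   # start of a new top-level form
        nlines += 1
        depth += line.count("(") - line.count(")")
        if depth == 0 and first is not None:
            out.append(first if nlines == 1 else first + " ...")
            first = None
    return out

src = """(define pi 3.14159)

(define (area r)
  (* pi r r))
"""
assert collapse(src) == ["(define pi 3.14159)", "(define (area r) ..."]
```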
**`read_form(path, name)`**
Returns the full source text of a single top-level form identified by its defined name. The LLM calls this after `read_module` identifies which form to read in detail. This is the "drill down" step. **Beats plain read: yes — isolates exactly what the LLM needs without surrounding context noise.**
**`replace_form(path, name, new_source)`**

Replaces the entire top-level form named `name` with `new_source`. The tool:

1. Parses `new_source` to verify it is syntactically valid (balanced parens/brackets/quotes); applies parinfer-style repair if close.

2. Locates the existing form by symbol, not line number.

3. Replaces in-place, preserving surrounding whitespace.

4. Returns the repaired+formatted source if any correction was made.

This is the core of clojure-mcp's `clojure_edit`, adapted for Scheme. LLM generates the full new form (guided by `read_form` output), not a diff. **Beats plain write: yes — structural location, guaranteed valid output, no paren disasters.**[^6]
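
The locate-and-splice core can be sketched as follows; the balance check and `define`-head pattern are deliberately naive stand-ins for the full parse/repair pipeline described above:

```python
import re

def balanced(src: str) -> bool:
    # Naive stand-in for a real reader: ignores strings and comments.
    return src.count("(") == src.count(")")

def top_level_spans(source):
    """Yield (name, start, end) for each top-level form, where `name` is
    the defined symbol of a define-style head (else None)."""
    depth, start = 0, None
    for i, ch in enumerate(source):
        if ch == "(":
            if depth == 0:
                start = i
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0 and start is not None:
                form = source[start:i + 1]
                m = re.match(r"\(\s*define\S*\s+\(?\s*([^\s()]+)", form)
                yield (m.group(1) if m else None, start, i + 1)

def replace_form(source: str, name: str, new_source: str) -> str:
    """Validate the new form, then splice it over the old one, located by name."""
    if not balanced(new_source):
        raise ValueError("new form has unbalanced parentheses")
    for form_name, start, end in top_level_spans(source):
        if form_name == name:
            return source[:start] + new_source.strip() + source[end:]
    raise KeyError(f"no top-level form named {name!r}")

src = "(define pi 3.14159)\n\n(define (area r)\n  (* pi r r))\n"
out = replace_form(src, "area", "(define (area r) (* pi r r 1.0))")
assert "(define (area r) (* pi r r 1.0))" in out
assert "(define pi 3.14159)" in out
```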
**`insert_form(path, anchor_name, position, new_source)`**
Inserts a new top-level form `before` or `after` the form named `anchor_name`. Includes the same validation pipeline as `replace_form`. **Beats plain file_edit: yes — text-based insertion anchored to a line number breaks if any prior edit shifts lines.**
**`delete_form(path, name)`**
Removes a top-level form by name. Cleaner than text-based deletion which can accidentally remove surrounding blank lines or leave orphaned comments. **Beats plain edit: yes.**
**`eval_expr(expr, namespace?)`**
Evaluates a Scheme expression in a running Guile REPL (e.g., via a `(system repl server)` socket or an embedded Guile). Returns stdout, return value, and any error with stack trace. Optionally scoped to a loaded module. This is the feedback loop that makes all editing tools safe — after any `replace_form`, the LLM calls `eval_expr` to verify. The REPL is the most important single tool in the set. **Not a comparison to plain text — it's irreplaceable.**[^16][^5]

### Tier 2: High Value for Non-Trivial Codebases

**`check_syntax(source_or_path)`**
Parses a string or file and returns: `{valid: bool, errors: [...], repaired_source: string}`. Uses tree-sitter-scheme or Guile's own `(read)` in a sandboxed context. The LLM can call this before submitting edits or when unsure about balance. **Beats nothing (additive) — but extremely useful as a pre-flight check.**[^12]
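
A sketch of the pre-flight check with string- and comment-awareness — still a stand-in for Guile's `read` or tree-sitter as ground truth, and without the repair step:

```python
def check_syntax(src: str) -> dict:
    """Balance check that skips delimiters inside strings and ; comments."""
    stack, errors = [], []
    i, n = 0, len(src)
    in_string = in_comment = False
    while i < n:
        ch = src[i]
        if in_string:
            if ch == "\\":
                i += 1                    # skip escaped character
            elif ch == '"':
                in_string = False
        elif in_comment:
            if ch == "\n":
                in_comment = False
        elif ch == '"':
            in_string = True
        elif ch == ";":
            in_comment = True
        elif ch in "([":
            stack.append((ch, i))
        elif ch in ")]":
            if not stack:
                errors.append(f"unmatched '{ch}' at {i}")
            elif "([".index(stack.pop()[0]) != ")]".index(ch):
                errors.append(f"mismatched '{ch}' at {i}")
        i += 1
    errors += [f"unclosed '{c}' at {pos}" for c, pos in stack]
    if in_string:
        errors.append("unterminated string")
    return {"valid": not errors, "errors": errors}

assert check_syntax('(define (f x) "a)b" x)')["valid"]
assert not check_syntax("(define (f x) (+ x 1)")["valid"]
```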
**`find_references(path_or_module, symbol)`**
Returns all top-level forms (and optionally sub-expressions) that reference a given symbol. Uses tree-sitter structural queries rather than grep, so it can distinguish `let`-bound shadowing and macro-introduced bindings from genuine references to the target symbol. Returns `{form_name, path, line, context_snippet}` per hit. **Beats grep: yes, for refactoring and impact analysis.**[^10]
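
To make the contrast with grep concrete, here is a naive whole-token scanner per top-level form — everything it lacks (scope, shadowing, macro hygiene) is exactly what the tree-sitter query layer adds:

```python
import re

SYMBOL = re.compile(r"[^\s()\[\]\";]+")

def find_references(source: str, symbol: str):
    """Report which top-level forms mention `symbol` as a whole token."""
    hits, depth, start = [], 0, None
    for i, ch in enumerate(source):
        if ch == "(":
            if depth == 0:
                start = i
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0 and start is not None:
                form = source[start:i + 1]
                if symbol in SYMBOL.findall(form):
                    hits.append({"form": form.splitlines()[0],
                                 "line": source.count("\n", 0, start) + 1})
    return hits

src = "(define pi 3.14159)\n(define (area r)\n  (* pi r r))\n"
refs = find_references(src, "pi")
assert [h["line"] for h in refs] == [1, 2]
```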
**`list_module_exports(path)`**
For files with `(define-module ...)` and explicit `#:export`, returns the exported API surface. Useful for the LLM to understand what is safe to rename vs what is a public API. **Beats manual reading: marginal — worth having if Guile modules are in scope.**
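
A sketch of the extraction, assuming exports are declared inline via `#:export (...)`; `export` forms in the module body and re-exports are out of scope:

```python
import re

def list_module_exports(source: str) -> list[str]:
    """Collect symbols from every #:export (...) clause in the file."""
    exports = []
    for m in re.finditer(r"#:export\s*\(([^)]*)\)", source):
        exports += m.group(1).split()
    return exports

src = """(define-module (geometry circle)
  #:use-module (ice-9 format)
  #:export (area circumference))
"""
assert list_module_exports(src) == ["area", "circumference"]
```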

### Tier 3: Situational

**`macro_expand(expr, namespace)`**
Calls `(macroexpand expr)` in the REPL, returns expanded form. Useful when the LLM needs to reason about macro-generated code. **Unique capability — no text equivalent.**
**`load_module(path)`**
Re-loads a file into the running Guile REPL session (e.g., `(load path)`, or the REPL's `,reload` meta-command for a module). Used after a sequence of edits to verify the whole module compiles and no unbound-variable errors occur. **Beats eval_expr for whole-module validation.**

### What to Explicitly Omit

**Sub-expression path addressing** (e.g., "replace the 3rd element of the 2nd `let` binding"): LLMs consistently fail to specify correct structural coordinates for deeply nested forms. clojure-mcp experimented with this; the guidance now is to use the REPL instead — evaluate, observe, generate a new complete form, replace it. The REPL feedback loop is more robust than surgical sub-expression addressing.[^5]
**Raw AST JSON dump**: Serialized AST trees are verbose and waste context. LLMs do not need to see the full AST — they need to see the *source text* of the relevant form. Academic work confirms that serialized ASTs don't outperform plain source for LLM comprehension tasks.[^15]
**Paredit-style operations** (slurp, barf, transpose-sexp): These are human-interactive operations. LLMs do not navigate code incrementally the way a human using paredit does — they reason about complete transformations. Exposing slurp/barf as tools adds complexity without benefit.[^17]

***

## Implementation Notes for Guile

Guile provides first-class programmatic REPL access and a native s-expression reader, which means the "validation pipeline" that clojure-mcp builds from external tools (`clj-kondo`, `parinfer`, `clj-rewrite`) can be implemented using Guile itself:
- **Parse/validate**: `(with-exception-handler ... (lambda () (read (open-input-string src))) ...)` — Guile's native `read` is the ground truth for s-expression validity.
- **Format**: `(pretty-print form)` from `(use-modules (ice-9 pretty-print))`.
- **REPL socket**: Guile can serve a REPL over a socket via `(use-modules (system repl server))` — `(run-server (make-tcp-server-socket #:port 7000))` for TCP, or `(run-server (make-unix-domain-server-socket))` for a Unix domain socket — making `eval_expr` trivially implementable.
|
||
- **tree-sitter-scheme** (Guile grammar #7) provides structural querying for `find_references` without running Guile itself — useful for static analysis of files that may not be loadable (e.g., incomplete work-in-progress).[^12]
The pi-coding-agent extension system (TypeScript-based) can host these tools as tool definitions that call out to a sidecar Guile process or tree-sitter library, keeping the structural intelligence in the language that understands Scheme natively.

***

## Conclusion

The evidence from clojure-mcp, CodeRLM, Codebase-Memory, and Aider's failure analysis converges on three principles: (1) target forms by name/identity, never by line number or exact text match; (2) validate and auto-repair on input, not after the fact; (3) use a live REPL as the primary correctness signal. For Guile/Scheme, a seven-tool set — `read_module`, `read_form`, `replace_form`, `insert_form`, `delete_form`, `eval_expr`, `check_syntax` — covers the 95% case. Sub-expression addressing and raw AST exposure should be omitted; they add LLM-facing complexity that empirically leads to more errors, not fewer. The REPL is the structural editor.[^1][^5]

---

## References

1. [Structural editing tools for s-expr languages · Issue #1827 · anthropics/claude-code](https://github.com/anthropics/claude-code/issues/1827)
2. [Code Surgery: How AI Assistants Make Precise Edits to Your Files](https://fabianhertwig.com/blog/coding-assistants-file-edits/)
3. [Aider analysis of editing failures · Issue #3895 · Aider-AI/aider](https://github.com/Aider-AI/aider/issues/3895)
4. [Unified diffs make GPT-4 Turbo 3X less lazy — Aider](https://aider.chat/docs/unified-diffs.html)
5. [bhauman/clojure-mcp — GitHub](https://github.com/bhauman/clojure-mcp)
6. [ClojureMCP (Clojure MCP Server) by bhauman — Augment Code](https://www.augmentcode.com/mcp/clojure-mcp-server)
7. [clojure-mcp — DeepNLP AI agent directory](https://www.deepnlp.org/store/ai-agent/autonomous-agent/pub-bhauman/clojure-mcp)
8. [CodeRLM – Tree-sitter-backed code indexing for LLM agents — Hacker News](https://news.ycombinator.com/item?id=46974515)
9. [Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM Agents — arXiv](https://arxiv.org/html/2603.27277v1)
10. [mcp-server-tree-sitter: The Ultimate Guide for AI Engineers](https://skywork.ai/skypage/en/mcp-server-tree-sitter-The-Ultimate-Guide-for-AI-Engineers/1972133047164960768)
11. [VT Code: Rust terminal coding agent for structural edits (Tree-sitter/ast-grep) — Reddit](https://www.reddit.com/r/rust/comments/1o9ak42/vt_code_rust_terminal_coding_agent_for_structural/)
12. [6cdh/tree-sitter-scheme — GitHub](https://github.com/6cdh/tree-sitter-scheme)
13. [An AST-guided LLM Approach for SVRF Code Synthesis — arXiv](https://arxiv.org/html/2507.00352v1)
14. [Advancing Large Language Models for Code Using Code (PDF) — UC Berkeley EECS](https://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-50.pdf)
15. [Code vs Serialized AST Inputs for LLM-Based Code Summarization (PDF)](https://paul-harvey.org/publication/2026-llm-ast-code-summary/2026-llm-ast-code-summary.pdf)
16. [From Tool Calling to Symbolic Thinking: LLMs in a Persistent Lisp REPL — arXiv](https://arxiv.org/html/2506.10021v1)
17. [Paredit, a Visual Guide — Calva User Guide](https://calva.io/paredit/)