First iteration

2026-04-01 23:35:50 +02:00
commit d0115672dd
29 changed files with 3553 additions and 0 deletions
--- a/docs/INSPIRATION.md
+++ b/docs/INSPIRATION.md
@@ -0,0 +1,813 @@
+# Inspiration: Existing Lisp Linters, Formatters & Static Analysers
+
+Survey of reference tools in `./refs/` — what they do, how they work, and what
+we can steal for a Guile linter/formatter.
+
+---
+
+## Table of Contents
+
+| Tool | Ecosystem | Type | Language |
+|------|-----------|------|----------|
+| [Eastwood](#eastwood) | Clojure | Linter (bug-finder) | Clojure/JVM |
+| [fmt](#fmt) | Racket | Formatter | Racket |
+| [Kibit](#kibit) | Clojure | Linter (idiom suggester) | Clojure |
+| [Mallet](#mallet) | Common Lisp | Linter + formatter + fixer | Common Lisp |
+| [OCICL Lint](#ocicl-lint) | Common Lisp | Linter + fixer | Common Lisp |
+| [racket-review](#racket-review) | Racket | Linter | Racket |
+| [SBLint](#sblint) | Common Lisp (SBCL) | Compiler-driven linter | Common Lisp |
+
+---
+
+## Eastwood
+
+**Repo:** `refs/eastwood/` — Clojure linter (v1.4.3, by Jonas Enlund)
+
+### What it does
+
+A **bug-finding linter** for Clojure. Focuses on detecting actual errors
+(wrong arity, undefined vars, misplaced docstrings) rather than enforcing style.
+Achieves high accuracy by using the same compilation infrastructure as the
+Clojure compiler itself.
+
+### How it works
+
+```
+File discovery (tools.namespace)
+  → Topological sort by :require/:use deps
+  → For each namespace:
+      Parse → Macroexpand → AST (tools.analyzer.jvm) → eval
+      → Run linter functions over AST nodes
+      → Filter warnings by config
+  → Report
+```
+
+Key: uses `tools.analyzer.jvm/analyze+eval` — it actually **compiles and
+evaluates** source code to build an AST. This gives compiler-grade accuracy but
+means it can only lint code that successfully loads.
+
+### Architecture
+
+- **`lint.clj`** — Central coordinator: linter registry, namespace ordering,
+  main analysis loop
+- **`analyze-ns.clj`** — AST generation via tools.analyzer
+- **`passes.clj`** — Custom analysis passes (reflection validation, def-name
+  propagation)
+- **`linters/*.clj`** — Individual linter implementations (~8 files)
+- **`reporting-callbacks.clj`** — Output formatters (multimethod dispatch)
+- **`util.clj`** — Config loading, AST walking, warning filtering
+
+### Rules (25+)
+
+| Category | Examples |
+|----------|----------|
+| Arity | `:wrong-arity` — function called with wrong arg count |
+| Definitions | `:def-in-def`, `:redefd-vars`, `:misplaced-docstrings` |
+| Unused | `:unused-private-vars`, `:unused-fn-args`, `:unused-locals`, `:unused-namespaces` |
+| Suspicious | `:constant-test`, `:suspicious-expression`, `:suspicious-test` |
+| Style | `:unlimited-use`, `:non-dynamic-earmuffs`, `:local-shadows-var` |
+| Interop | `:reflection`, `:boxed-math`, `:performance` |
+| Types | `:wrong-tag`, `:deprecations` |
+
+### Configuration
+
+Rules are suppressed via **Clojure code** (not YAML/JSON):
+
+```clojure
+(disable-warning
+ {:linter :suspicious-expression
+  :for-macro 'clojure.core/let
+  :if-inside-macroexpansion-of #{'clojure.core/when-first}
+  :within-depth 6
+  :reason "False positive from when-first expansion"})
+```
+
+Builtin config files ship for `clojure.core`, contrib libs, and popular
+third-party libraries. Users add their own via `:config-files` option.
+
+### What we can learn
+
+- **Macroexpansion-aware suppression** — Can distinguish user code from
+  macro-generated code; suppression rules can target specific macro expansions.
+  Critical for any Lisp linter.
+- **Topological namespace ordering** — Analyse dependencies before dependents.
+  Relevant if we want cross-module analysis.
+- **Linter registry pattern** — Each linter is a map `{:name :fn :enabled-by-default :url}`.
+  Simple, extensible.
+- **Warning filtering pipeline** — Raw warnings → handle result → remove ignored
+  faults → remove excluded kinds → filter by config → final warnings. Clean
+  composable chain.
+- **Metadata preservation through AST transforms** — Custom `postwalk` that
+  preserves metadata. Essential for accurate source locations.
+
+---
+
+## fmt
+
+**Repo:** `refs/fmt/` — Racket code formatter (v0.0.3, by Sorawee Porncharoenwase)
+
+### What it does
+
+An **extensible code formatter** for Racket. Reads source, reformats according
+to style conventions using **cost-based optimal layout selection**. Supports
+custom formatting rules via pluggable formatter maps.
+
+### How it works
+
+Clean **4-stage pipeline**:
+
+```
+Source string
+  → [1] Tokenize (syntax-color/module-lexer)
+  → [2] Read/Parse → tree of node/atom/wrapper structs
+  → [3] Realign (fix sexp-comments, quotes)
+  → [4] Pretty-print (pretty-expressive library, cost-based)
+  → Formatted string
+```
+
+The pretty-printer is the `pretty-expressive` library, a descendant of the
+Wadler/Leijen family. Unlike Wadler's greedy algorithm, it evaluates multiple
+layout alternatives and selects the one with the lowest cost vector.
+
+### Architecture
+
+- **`tokenize.rkt`** (72 lines) — Lexer wrapper around Racket's `syntax-color`
+- **`read.rkt`** (135 lines) — Token stream → tree IR; preserves comments
+- **`realign.rkt`** (75 lines) — Post-process sexp-comments and quote prefixes
+- **`conventions.rkt`** (640 lines) — **All formatting rules** for 100+ Racket forms
+- **`core.rkt`** (167 lines) — `define-pretty` DSL, AST-to-document conversion
+- **`main.rkt`** (115 lines) — Public API, cost factory, entry point
+- **`params.rkt`** (38 lines) — Configuration parameters (width, indent, etc.)
+- **`raco.rkt`** (148 lines) — CLI interface (`raco fmt`)
+
+### Formatting rules (100+)
+
+Rules are organised by form type in `conventions.rkt`:
+
+| Category | Forms |
+|----------|-------|
+| Control flow | `if`, `when`, `unless`, `cond`, `case-lambda` |
+| Definitions | `define`, `define-syntax`, `lambda`, `define/contract` |
+| Bindings | `let`, `let*`, `letrec`, `parameterize`, `with-handlers` |
+| Loops | `for`, `for/list`, `for/fold`, `for/hash` (15+ variants) |
+| Modules | `module`, `begin`, `class`, `interface` |
+| Macros | `syntax-rules`, `match`, `syntax-parse`, `syntax-case` |
+| Imports | `require`, `provide` — vertically stacked |
+
+### Configuration
+
+**Pluggable formatter maps** — a function `(string? → procedure?)`:
+
+```racket
+;; .fmt.rkt
+(define (the-formatter-map s)
+  (case s
+    [("my-form") (format-uniform-body/helper 4)]
+    [else #f]))  ; delegate to standard
+```
+
+Formatter maps compose via `compose-formatter-map` (chain of responsibility).
+
+**Runtime parameters:**
+
+| Parameter | Default | Purpose |
+|-----------|---------|---------|
+| `current-width` | 102 | Page width limit |
+| `current-limit` | 120 | Computation width limit |
+| `current-max-blank-lines` | 1 | Max consecutive blank lines |
+| `current-indent` | 0 | Extra indentation |
+
+### Cost-based layout selection
+
+The pretty-printer evaluates layout alternatives using a **3-dimensional cost
+vector** `[badness, height, characters]`:
+
+- **Badness** — Quadratic penalty for exceeding page width
+- **Height** — Number of lines used
+- **Characters** — Total character count (tiebreaker)
+
+This means the formatter selects the layout that is **optimal under this cost
+function**, not just the first one that fits.
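+
+A small Guile sketch of the cost comparison, assuming the three components are
+compared lexicographically. It enumerates candidate layouts explicitly, which
+`pretty-expressive` does not do; `layout-cost`, `cost<?`, and `best-layout` are
+hypothetical names.
+
+```scheme
+;; Cost of a candidate layout (a list of lines) as (badness height chars).
+;; Badness is the sum of squared overflows past WIDTH (an assumption
+;; modelled on the description above).
+(define (layout-cost lines width)
+  (list (apply + (map (lambda (line)
+                        (let ((over (max 0 (- (string-length line) width))))
+                          (* over over)))
+                      lines))
+        (length lines)
+        (apply + (map string-length lines))))
+
+;; Lexicographic comparison of two cost vectors of equal length.
+(define (cost<? a b)
+  (cond ((null? a) #f)
+        ((< (car a) (car b)) #t)
+        ((> (car a) (car b)) #f)
+        (else (cost<? (cdr a) (cdr b)))))
+
+;; Pick the cheapest of a non-empty list of candidate layouts.
+(define (best-layout candidates width)
+  (let loop ((best (car candidates)) (rest (cdr candidates)))
+    (cond ((null? rest) best)
+          ((cost<? (layout-cost (car rest) width) (layout-cost best width))
+           (loop (car rest) (cdr rest)))
+          (else (loop best (cdr rest))))))
+```
+
+Given the one-line and two-line layouts of `(define (f x) (+ x 1))`, a width of
+80 selects the one-line form on height, while a width of 15 selects the
+two-line form because the one-line form overflows.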
+
+### What we can learn
+
+- **Cost-based layout is the gold standard** for formatter quality. Worth
+  investing in an optimal pretty-printer (Wadler/Leijen family) rather than
+  ad-hoc heuristics.
+- **Staged pipeline** (tokenize → parse → realign → pretty-print) is clean,
+  testable, and easy to reason about. Each stage has well-defined I/O.
+- **Form-specific formatting rules** (`define-pretty` DSL) — each Scheme
+  special form gets a dedicated formatter. Extensible via user-provided maps.
+- **Comment preservation as metadata** — Comments are attached to AST nodes, not
+  discarded. Essential for a practical formatter.
+- **Pattern-based extraction** — `match/extract` identifies which elements can
+  stay inline vs. must be on separate lines. Smart structural analysis.
+- **Memoisation via weak hash tables** — Performance optimisation for AST
+  traversal without memory leaks.
+- **Config file convention** — `.fmt.rkt` in project root, auto-discovered. We
+  should do the same (`.gulie.scm` or similar).
+
+---
+
+## Kibit
+
+**Repo:** `refs/kibit/` — Clojure idiom suggester (v0.1.11, by Jonas Enlund)
+
+### What it does
+
+A **static code analyser** that identifies non-idiomatic Clojure code and
+suggests more idiomatic replacements. Example: `(if x y nil)` → `(when x y)`.
+Supports auto-replacement via `--replace` flag.
+
+**Status:** Maintenance mode. Authors recommend **Splint** as successor
+(faster, more extensible).
+
+### How it works
+
+```
+Source file
+  → Parse with edamame (side-effect-free reader)
+  → Extract S-expressions
+  → Tree walk (depth-first via clojure.walk/prewalk)
+  → Match each node against rules (core.logic unification)
+  → Simplify (iterative rewriting until fixpoint)
+  → Report or Replace (via rewrite-clj zippers)
+```
+
+The key insight: rules are expressed as **logic programming patterns** using
+`clojure.core.logic`. Pattern variables (`?x`, `?y`) unify against arbitrary
+subexpressions.
+
+### Architecture
+
+- **`core.clj`** (33 lines) — Core simplification logic (tiny!)
+- **`check.clj`** (204 lines) — Public API for checking expressions/files
+- **`check/reader.clj`** (189 lines) — Source parsing with alias tracking
+- **`rules.clj`** (39 lines) — Rule aggregation and indexing
+- **`rules/*.clj`** (~153 lines) — Rule definitions by category
+- **`reporters.clj`** (59 lines) — Output formatters (text, markdown)
+- **`replace.clj`** (134 lines) — Auto-replacement via rewrite-clj zippers
+- **`driver.clj`** (144 lines) — CLI entry point, file discovery
+
+Total: ~1,105 lines. Remarkably compact.
+
+### Rules (~60)
+
+Rules are defined via the `defrules` macro:
+
+```clojure
+(defrules rules
+  ;; Control structures
+  [(if ?x ?y nil)         (when ?x ?y)]
+  [(if ?x nil ?y)         (when-not ?x ?y)]
+  [(if (not ?x) ?y ?z)   (if-not ?x ?y ?z)]
+  [(do ?x)               ?x]
+
+  ;; Arithmetic
+  [(+ ?x 1)              (inc ?x)]
+  [(- ?x 1)              (dec ?x)]
+
+  ;; Collections
+  [(not (empty? ?x))     (seq ?x)]
+  [(into [] ?coll)       (vec ?coll)]
+
+  ;; Equality
+  [(= ?x nil)            (nil? ?x)]
+  [(= 0 ?x)             (zero? ?x)])
+```
+
+Categories: **control structures**, **arithmetic**, **collections**,
+**equality**, **miscellaneous** (string ops, Java interop, threading macros).
+
+### Auto-replacement
+
+Uses **rewrite-clj zippers** — functional tree navigation that preserves
+whitespace, comments, and formatting when applying replacements. Navigate to the
+target node, swap it, regenerate text.
+
+### What we can learn
+
+- **Logic programming for pattern matching** is beautifully expressive for
+  "suggest X instead of Y" rules. `core.logic` unification makes patterns
+  concise and bidirectional. We could use Guile's pattern matching or even a
+  miniKanren implementation.
+- **Rule-as-data pattern** — Rules are just vectors `[pattern replacement]`.
+  Easy to add, easy to test, easy for users to contribute.
+- **Iterative rewriting to fixpoint** — Apply rules until nothing changes.
+  Catches nested patterns that only become apparent after an inner rewrite.
+- **Zipper-based source rewriting** — Preserves formatting/comments when
+  applying fixes. Critical for auto-fix functionality.
+- **Side-effect-free parsing** — Using edamame instead of `clojure.core/read`
+  avoids executing reader macros. Important for security and for analysing code
+  with unknown dependencies.
+- **Guard-based filtering** — Composable predicates that decide whether to
+  report a suggestion. Users can plug in custom guards.
+- **Two resolution modes** — `:toplevel` (entire defn) vs `:subform` (individual
+  expressions). Different granularity for different use cases.
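+
+The rule-as-data and fixpoint ideas translate directly to Guile's
+`(ice-9 match)`. The sketch below is not Kibit's implementation (no
+unification, no guards, no quote awareness), and the four rules are
+illustrative Scheme analogues chosen for this example.
+
+```scheme
+(use-modules (ice-9 match))
+
+;; One rewriting step at the root of an expression.  Returns the
+;; expression unchanged when no rule applies.
+(define (rewrite-once expr)
+  (match expr
+    (('if test then #f)     `(and ,test ,then))
+    (('if ('not test) a b)  `(if ,test ,b ,a))
+    (('+ x 1)               `(1+ ,x))
+    (('begin x)             x)
+    (_ expr)))
+
+;; Rewrite children first, then the root, repeating until nothing changes.
+(define (simplify expr)
+  (let* ((inner (if (list? expr) (map simplify expr) expr))
+         (next (rewrite-once inner)))
+    (if (equal? next inner)
+        inner
+        (simplify next))))
+```
+
+`(simplify '(begin (if (not ready) #f (+ n 1))))` returns `(and ready (1+ n))`:
+the `and` rewrite only becomes possible after the `not` has been eliminated,
+which is the case the fixpoint loop exists for.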
+
+---
+
+## Mallet
+
+**Repo:** `refs/mallet/` — Common Lisp linter + formatter + fixer (~15,800 LOC)
+
+### What it does
+
+A **production-grade linter** for Common Lisp with 40+ rules across 10
+categories, auto-fixing, a powerful configuration system (presets with
+inheritance), and multiple suppression mechanisms. Targets SBCL.
+
+### How it works
+
+**Three-phase pipeline:**
+
+```
+File content
+  → [1] Tokenize (hand-written tokenizer, preserves all tokens incl. comments)
+  → [2] Parse (Eclector reader with parse-result protocol → forms with precise positions)
+  → [3] Rule checking (text rules, token rules, form rules)
+  → Suppression filtering
+  → Auto-fix & formatting
+  → Report
+```
+
+**Critical design decision:** Symbols are stored as **strings**, not interned.
+This means the parser never needs to resolve packages — safe to analyse code
+with unknown dependencies.
+
+### Architecture
+
+| Module | Lines | Purpose |
+|--------|-------|---------|
+| `main.lisp` | ~600 | CLI parsing, entry point |
+| `engine.lisp` | ~900 | Linting orchestration, suppression filtering |
+| `config.lisp` | ~1,200 | Config files, presets, path-specific overrides |
+| `parser/reader.lisp` | ~800 | Eclector integration, position tracking |
+| `parser/tokenizer.lisp` | ~200 | Hand-written tokenizer |
+| `suppression.lisp` | ~600 | Suppression state management |
+| `formatter.lisp` | ~400 | Output formatters (text, JSON, line) |
+| `fixer.lisp` | ~300 | Auto-fix application |
+| `rules/` | ~5,500 | 40+ individual rule implementations |
+
+### Rules (40+)
+
+| Category | Rules | Examples |
+|----------|-------|----------|
+| Correctness | 2 | `ecase` with `otherwise`, missing `otherwise` |
+| Suspicious | 5 | Runtime `eval`, symbol interning, `ignore-errors` |
+| Practice | 6 | Avoid `:use` in `defpackage`, one package per file |
+| Cleanliness | 4 | Unused variables, unused loop vars, unused imports |
+| Style | 5 | `when`/`unless` vs `if` without else, needless `let*` |
+| Format | 6 | Line length, trailing whitespace, tabs, blank lines |
+| Metrics | 3 | Function length, cyclomatic complexity, comment ratio |
+| ASDF | 8 | Component strings, redundant prefixes, secondary systems |
+| Naming | 4 | `*special*` and `+constant+` conventions |
+| Documentation | 4 | Missing docstrings (functions, packages, variables) |
+
+Rules are **classes** inheriting from a base `rule` class with generic methods:
+
+```lisp
+(defclass if-without-else-rule (base:rule)
+  ()
+  (:default-initargs
+   :name :missing-else
+   :severity :warning
+   :category :style
+   :type :form))
+
+(defmethod base:check-form ((rule if-without-else-rule) form file)
+  ...)
+```
+
+### Configuration system
+
+**Layered presets with inheritance:**
+
+```lisp
+(:mallet-config
+ (:extends :strict)
+ (:ignore "**/vendor/**")
+ (:enable :cyclomatic-complexity :max 15)
+ (:disable :function-length)
+ (:set-severity :metrics :info)
+ (:for-paths ("tests")
+  (:enable :line-length :max 120)
+  (:disable :unused-variables)))
+```
+
+Built-in presets: `:default`, `:strict`, `:all`, `:none`.
+
+Precedence: CLI flags > config file > preset inheritance > built-in defaults.
+
+### Suppression mechanisms (3 levels)
+
+1. **Declarations** — `#+mallet (declaim (mallet:suppress-next :rule-name))`
+2. **Inline comments** — `; mallet:suppress rule-name`
+3. **Region-based** — `; mallet:disable rule-name` / `; mallet:enable rule-name`
+
+Mallet also detects **stale suppressions**: it warns when a suppression doesn't match any violation.
+
+### Auto-fix
+
+Fixes are collected, sorted bottom-to-top (to preserve line numbers), and
+applied in a single pass. Fix types: `:replace-line`, `:delete-range`,
+`:delete-lines`, `:replace-form`.
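+
+A minimal Guile sketch of why bottom-to-top order matters. It is not Mallet's
+fixer: the fix shape `(line-number . replacement)` and the name `apply-fixes`
+are assumptions made for this example.
+
+```scheme
+(use-modules (srfi srfi-1))
+
+;; A fix is (line-number . replacement): a string replaces the line,
+;; #f deletes it.  Line numbers are 1-based.
+(define (apply-fixes lines fixes)
+  ;; Highest line first, so earlier line numbers stay valid.
+  (let ((ordered (sort fixes (lambda (a b) (> (car a) (car b))))))
+    (fold (lambda (fix acc)
+            (let* ((idx (- (car fix) 1))
+                   (before (take acc idx))
+                   (after (drop acc (+ idx 1))))
+              (if (cdr fix)
+                  (append before (list (cdr fix)) after)
+                  (append before after))))
+          lines
+          ordered)))
+```
+
+`(apply-fixes '("a  " "" "" "b") '((1 . "a") (3 . #f)))` returns
+`("a" "" "b")`. Deleting line 3 before touching line 1 means no fix ever sees a
+shifted line number.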
+
+### What we can learn
+
+- **Symbols as strings** is a crucial insight for Lisp linters. Avoids
+  package/module resolution entirely. We should do the same for Guile — parse
+  symbols without interning them.
+- **Eclector-style parse-result protocol** — Every sub-expression gets precise
+  line/column info. Invest in this early; it's the foundation of accurate error
+  reporting.
+- **Three rule types** (text, token, form) — Clean separation. Text rules don't
+  need parsing, token rules don't need a full AST, form rules get the full tree.
+  Efficient and composable.
+- **Preset inheritance with path-specific overrides** — Powerful configuration
+  that scales from solo projects to monorepos. `:for-paths` is particularly
+  useful (different rules for `src/` vs `tests/`).
+- **Multiple suppression mechanisms** — Comment-based, declaration-based,
+  region-based. Users need all three for real-world use.
+- **Stale suppression detection** — Prevents suppression comments from
+  accumulating after the underlying issue is fixed. Brilliant.
+- **Rule class pattern** — Base class plus generic methods scales cleanly to
+  40+ rules. Each rule is self-contained with its own severity, category, and
+  check method.
+- **Bottom-to-top fix application** — Simple trick that avoids line number
+  invalidation when applying multiple fixes to the same file.
+
+---
+
+## OCICL Lint
+
+**Repo:** `refs/ocicl/` — Common Lisp linter (part of the OCICL package manager)
+
+### What it does
+
+A **129-rule linter with auto-fix** for Common Lisp, integrated into the OCICL
+package manager as a subcommand (`ocicl lint`). Supports dry-run mode,
+per-line suppression, and `.ocicl-lint.conf` configuration.
+
+### How it works
+
+**Three-pass analysis:**
+
+```
+File content
+  → [Pass 1] Line-based rules (text-level: whitespace, tabs, line length)
+  → [Pass 2] AST-based rules (via rewrite-cl zippers: naming, bindings, packages)
+  → [Pass 3] Single-pass visitor rules (pattern matching: 50+ checks in one traversal)
+  → Suppression filtering (per-line ; lint:suppress comments)
+  → Auto-fix (via fixer registry)
+  → Report
+```
+
+### Architecture
+
+```
+lint/
+├── linter.lisp        — Main orchestrator, issue aggregation, output formatting
+├── config.lisp        — .ocicl-lint.conf parsing
+├── parsing.lisp       — rewrite-cl wrapper (zipper API)
+├── fixer.lisp         — Auto-fix infrastructure with RCS/backup support
+├── main.lisp          — CLI entry point
+├── rules/
+│   ├── line-based.lisp    — Text-level rules (9 rules)
+│   ├── ast.lisp           — AST-based rules (naming, lambda lists, bindings)
+│   └── single-pass.lisp   — Pattern matching rules (50+ in one walk)
+└── fixes/
+    ├── whitespace.lisp    — Formatting fixes
+    └── style.lisp         — Style rule fixes
+```
+
+### Rules (129)
+
+| Category | Count | Examples |
+|----------|-------|---------|
+| Formatting | 9 | Trailing whitespace, tabs, line length, blank lines |
+| File structure | 3 | SPDX headers, package declarations, reader errors |
+| Naming | 6 | Underscores, `*special*` style, `+constant+` style, vague names |
+| Boolean/conditionals | 18 | `(IF test T NIL)` → `test`, `(WHEN (NOT x) ...)` → `(UNLESS x ...)` |
+| Logic simplification | 12 | Flatten nested `AND`/`OR`, redundant conditions |
+| Arithmetic | 4 | `(+ x 1)` → `(1+ x)`, `(= x 0)` → `(zerop x)` |
+| List operations | 13 | `FIRST`/`REST` vs `CAR`/`CDR`, `(cons x nil)` → `(list x)` |
+| Comparison | 5 | `EQL` vs `EQ`, string equality, membership testing |
+| Sequence operations | 6 | `-IF-NOT` variants, `ASSOC` patterns |
+| Advanced/safety | 26 | Library suggestions, destructive ops on constants |
+
+### Configuration
+
+INI-style `.ocicl-lint.conf`:
+
+```ini
+max-line-length = 180
+suppress-rules = rule1, rule2, rule3
+suggest-libraries = alexandria, uiop, serapeum
+```
+
+Per-line suppression:
+
+```lisp
+(some-code) ; lint:suppress rule-name1 rule-name2
+(other-code) ; lint:suppress  ;; suppress ALL rules on this line
+```
+
+### Fixer registry
+
+```lisp
+(register-fixer "rule-name" #'fixer-function)
+```
+
+Fixers are decoupled from rule detection. Each fixer takes `(content issue)` and
+returns modified content or NIL. Supports RCS backup before modification.
+
+### What we can learn
+
+- **Single-pass visitor for pattern rules** — 50+ pattern checks in one tree
+  traversal. Much faster than running each rule separately. Good model for
+  performance-sensitive linting.
+- **Quote awareness** — Detects quoted contexts (`'x`, `quote`, backtick) to
+  avoid false positives inside macro templates. We'll need the same for Guile.
+- **Fixer registry pattern** — Decouples detection from fixing. Easy to add
+  auto-fix for a rule without touching the rule itself.
+- **Library suggestion rules** — "You could use `(alexandria:when-let ...)`
+  instead of this pattern." Interesting category that could work for Guile
+  (SRFI suggestions, etc.).
+- **Three-pass architecture** — Line-based first (fastest, no parsing needed),
+  then AST, then pattern matching. Each pass adds cost; skip what you don't need.
+
+---
+
+## racket-review
+
+**Repo:** `refs/racket-review/` — Racket linter (v0.2, by Bogdan Popa)
+
+### What it does
+
+A **surface-level linter** for Racket modules. Intentionally does NOT expand
+macros — analyses syntax only, optimised for **speed**. Designed for tight
+editor integration (ships with Flycheck for Emacs).
+
+### How it works
+
+```
+File → read-syntax (Racket's built-in reader)
+  → Validate as module form (#lang)
+  → Walk syntax tree via syntax-parse
+  → Track scopes, bindings, provides, usages
+  → Report problems
+```
+
+The entire rule system is built on Racket's `syntax/parse` — pattern matching
+on syntax objects with guard conditions and side effects.
+
+### Architecture
+
+Remarkably compact:
+
+| File | Lines | Purpose |
+|------|-------|---------|
+| `lint.rkt` | 1,130 | **All linting rules** + semantic tracking |
+| `problem.rkt` | 26 | Problem data structure |
+| `cli.rkt` | 25 | CLI interface |
+| `ext.rkt` | 59 | Extension mechanism |
+
+### Semantic tracking
+
+Maintains multiple **parameter-based state machines**:
+
+- **Scope stack** — Hierarchical scope with parent links, binding hash at each level
+- **Binding info** — Per-identifier: syntax object, usage count, check flag,
+  related identifiers
+- **Provide tracking** — What's explicitly `provide`d vs `all-defined-out`
+- **Punted bindings** — Forward references resolved when definition is encountered
+- **Savepoints** — Save/restore state for tentative matching in complex patterns
+
+### Rules
+
+**Errors (23 patterns):**
+- Identifier already defined in same scope
+- `if` missing else branch
+- `let`/`for` missing body
+- `case` clauses not quoted literals
+- Wrong `match` fallthrough pattern (use `_`, not `else`)
+- Provided but not defined
+
+**Warnings (17+ patterns):**
+- Identifier never used
+- Brackets: `let` bindings should use `[]`, not `()`
+- Requires not sorted (for-syntax first, then alphabetical)
+- Cond without else clause
+- Nested if (flatten to cond)
+- `racket/contract` → use `racket/contract/base`
+
+### Suppression
+
+```racket
+#|review: ignore|#   ;; Ignore entire file
+;; noqa              ;; Ignore this line
+;; review: ignore    ;; Ignore this line
+```
+
+### Extension mechanism
+
+Plugins register via Racket's package system:
+
+```racket
+(define review-exts
+  '((module-path predicate-proc lint-proc)))
+```
+
+Extensions receive a `current-reviewer` parameter with API:
+`recur`, `track-error!`, `track-warning!`, `track-binding!`, `push-scope!`,
+`pop-scope!`, `save!`, `undo!`.
+
+### What we can learn
+
+- **Surface-level analysis is fast and useful** — No macro expansion means
+  instant feedback. Catches the majority of real mistakes. Good default for
+  editor integration; deeper analysis can be opt-in.
+- **syntax-parse as rule DSL** — Pattern matching on syntax objects is a natural
+  fit for Lisp linters. Guile has `syntax-case` and `match`, which serve a
+  similar role.
+- **Scope tracking with punted bindings** — Handles forward references in a
+  single pass. Elegant solution for `letrec`-style bindings and mutual recursion.
+- **Savepoints for tentative matching** — Save/restore state when the parser
+  enters a complex branch. If the branch fails, roll back. Useful for `cond`,
+  `match`, etc.
+- **Plugin API via reviewer parameter** — Extensions get a well-defined API
+  surface. Clean contract between core and plugins.
+- **Snapshot-based testing** — 134 test files with `.rkt`/`.rkt.out` pairs.
+  Lint a file, compare output to expected. Simple, maintainable, high coverage.
+- **Bracket style enforcement** — Racket uses `[]` for bindings, `()` for
+  application. Guile's reader accepts `[]` but has no such convention; we could
+  still enforce consistent bracket usage or other parenthesis conventions.
+
+---
+
+## SBLint
+
+**Repo:** `refs/sblint/` — SBCL compiler-driven linter (~650 LOC)
+
+### What it does
+
+A **compiler-assisted linter** for Common Lisp. Doesn't implement its own rules —
+instead, it **compiles code through SBCL** and surfaces all compiler diagnostics
+(errors, warnings, style notes) with proper file locations.
+
+### How it works
+
+```
+Source code
+  → Resolve ASDF dependencies (topological sort)
+  → Load dependencies via Quicklisp
+  → Compile project via SBCL (handler-bind captures conditions)
+  → Extract file/position from compiler internals (Swank protocol)
+  → Convert byte offset → line:column
+  → Deduplicate and report
+```
+
+No custom parser. No AST. Just the compiler.
+
+### Architecture
+
+| File | Lines | Purpose |
+|------|-------|---------|
+| `run-lint.lisp` | 277 | Core logic: lint file/system/directory |
+| `compiler-aux.lisp` | 33 | SBCL introspection bridge |
+| `asdf.lisp` | 153 | Dependency resolution graph |
+| `file-position.lisp` | 18 | Byte offset → line:column conversion |
+| `quicklisp.lisp` | 41 | Auto-install missing dependencies |
+| `sblint.ros` | — | CLI entry point (Roswell script) |
+
+### What it catches
+
+Whatever SBCL catches:
+- Undefined variables and functions
+- Type mismatches (with SBCL's type inference)
+- Style warnings (ANSI compliance, naming)
+- Reader/syntax errors
+- Dead code paths
+- Unused declarations
+
+Filters out: redefinition warnings, Quicklisp dependency warnings, SBCL
+contrib warnings.
+
+### What we can learn
+
+- **Leverage the host compiler** — Guile itself has `compile` and can produce
+  warnings. We should capture Guile's own compiler diagnostics (undefined
+  variables, unused imports, etc.) as a baseline — it's "free" accuracy.
+- **Condition-based error collection** — CL's condition system (≈ Guile's
+  exception/handler system) lets you record diagnostics without stopping
+  compilation: `handler-bind` runs its handlers without unwinding the stack.
+  Guile's `with-exception-handler` can do the same for continuable exceptions.
+- **Dependency-aware compilation** — Load dependencies first, then compile
+  project. Catches "symbol not found" errors that surface-level analysis misses.
+- **Deduplication** — Multiple compilation passes can report the same issue.
+  Hash table dedup is simple and effective.
+- **Minimal is viable** — 650 LOC total. A compiler-driven linter layer could
+  be our first deliverable, augmented with custom rules later.
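+
+A possible starting point for the compiler-driven layer, as a hedged sketch. It
+assumes the compiler writes its warnings to `current-warning-port` and that the
+`#:warnings` names listed are recognised by the installed Guile;
+`compiler-warnings` is a hypothetical helper, and the exact warning text is
+version-dependent.
+
+```scheme
+(use-modules (system base compile))
+
+;; Compile DATUM to bytecode (without running it) and return whatever
+;; the compiler wrote to the warning port, as a string.
+(define (compiler-warnings datum)
+  (call-with-output-string
+    (lambda (port)
+      (parameterize ((current-warning-port port))
+        (compile datum
+                 #:to 'bytecode
+                 #:opts '(#:warnings (unbound-variable
+                                      unused-variable
+                                      arity-mismatch)))))))
+```
+
+The returned string would still need to be parsed into structured diagnostics;
+warnings for forms read from a datum rather than a file may lack a useful
+source location.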
+
+---
+
+## Cross-cutting themes
+
+### Parsing strategies
+
+| Strategy | Used by | Pros | Cons |
+|----------|---------|------|------|
+| Host compiler | SBLint, Eastwood | Maximum accuracy, type checking | Requires loading code, slow |
+| Custom reader with positions | Mallet, fmt | Full control, no side effects | Must maintain parser |
+| Language's built-in reader | racket-review | Free, well-tested | May lack position info |
+| Side-effect-free reader lib | Kibit (edamame) | Safe, preserves metadata | External dependency |
+| Zipper-based AST | OCICL (rewrite-cl) | Preserves formatting for fixes | Complex API |
+
+**For Guile:** We should explore whether `(ice-9 read)` or Guile's reader
+provides sufficient source location info. If not, a custom reader (or a reader
+wrapper that annotates with positions) is needed. Guile's `read-syntax` (if
+available) or source properties on read forms could be the answer.
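+
+A quick way to check what the built-in reader records is to read forms from a
+port and inspect their source properties. This is an exploratory sketch;
+`read-with-positions` is a hypothetical name, and which datums carry
+properties depends on the Guile version and reader options.
+
+```scheme
+;; Read every top-level form from TEXT and pair it with the source
+;; properties the reader recorded for it (an alist, possibly empty).
+(define (read-with-positions text filename)
+  (call-with-input-string text
+    (lambda (port)
+      (set-port-filename! port filename)
+      (let loop ((acc '()))
+        (let ((form (read port)))
+          (if (eof-object? form)
+              (reverse acc)
+              (loop (cons (cons form (source-properties form)) acc))))))))
+```
+
+Calling it on a two-form string returns two entries. Comments and whitespace
+never appear in the result, which is the limitation a custom reader would have
+to address.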
+
+### Rule definition patterns
+
+| Pattern | Used by | Character |
+|---------|---------|-----------|
+| Logic programming (unification) | Kibit | Elegant, concise; slow |
+| OOP classes + generic methods | Mallet | Scales well, self-contained rules |
+| Registry maps | Eastwood | Simple, data-driven |
+| Syntax-parse patterns | racket-review, fmt | Natural for Lisps |
+| Single-pass visitor | OCICL | High performance |
+| Compiler conditions | SBLint | Zero-effort, limited scope |
+
+**For Guile:** A combination seems right — `match`/`syntax-case` patterns for
+the rule DSL (natural in Scheme), with a registry for rule metadata (name,
+severity, category, enabled-by-default).
+
+### Configuration patterns
+
+| Feature | Mallet | OCICL | Eastwood | Kibit | racket-review | fmt |
+|---------|--------|-------|----------|-------|---------------|-----|
+| Config file | `.mallet.lisp` | `.ocicl-lint.conf` | Clojure maps | `project.clj` | - | `.fmt.rkt` |
+| Presets | Yes (4) | - | - | - | - | - |
+| Preset inheritance | Yes | - | - | - | - | - |
+| Path-specific rules | Yes | - | - | - | - | - |
+| Inline suppression | Yes (3 mechanisms) | Yes | Yes | - | Yes | - |
+| Stale suppression detection | Yes | - | - | - | - | - |
+| CLI override | Yes | Yes | Yes | Yes | - | Yes |
+
+**For Guile:** Mallet's configuration system is the most sophisticated and
+worth emulating — presets, inheritance, path-specific overrides, and stale
+suppression detection.
+
+### Auto-fix patterns
+
+| Tool | Fix mechanism | Preserves formatting? |
+|------|--------------|----------------------|
+| Kibit | rewrite-clj zippers | Yes |
+| Mallet | Bottom-to-top line replacement | Partial |
+| OCICL | Fixer registry + zipper AST | Yes |
+
+**For Guile:** Zipper-based AST manipulation (or Guile's SXML tools) for
+formatting-preserving fixes. The fixer registry pattern (OCICL) keeps rule
+detection and fixing decoupled.
+
+### Output formats
+
+The linters all support, at minimum: `file:line:column: severity: message`
+
+Additional formats: JSON (Mallet), Markdown (Kibit), line-only for CI (Mallet).
+
+---
+
+## Feature wishlist for gulie
+
+Based on this survey, the features worth cherry-picking:
+
+### Must-have (core)
+
+1. **Guile compiler diagnostics** — Capture Guile's own warnings as baseline (SBLint approach)
+2. **Custom reader with source positions** — Every form, subform, and token gets line:column
+3. **Staged pipeline** — Text rules → token rules → form rules (Mallet/OCICL)
+4. **Pattern-based rule DSL** — Using Guile's `match` or `syntax-case` (Kibit/racket-review inspiration)
+5. **Rule registry** — `{name, severity, category, enabled-by-default, check-fn}` (Eastwood)
+6. **Standard output format** — `file:line:column: severity: rule: message`
+7. **Inline suppression** — `; gulie:suppress rule-name` (Mallet/OCICL)
+
+### Should-have (v1)
+
+8. **Config file** — `.gulie.scm` with presets and rule enable/disable (Mallet)
+9. **Auto-fix infrastructure** — Fixer registry, bottom-to-top application (OCICL/Mallet)
+10. **Idiom suggestions** — Pattern → replacement rules (Kibit style)
+11. **Unused binding detection** — Scope tracking with forward reference handling (racket-review)
+12. **Quote/unquote awareness** — Don't lint inside quoted forms (OCICL)
+13. **Snapshot-based testing** — `.scm`/`.expected` pairs (racket-review)
+
+### Nice-to-have (v2+)
+
+14. **Code formatter** — Cost-based optimal layout (fmt)
+15. **Pluggable formatter maps** — Per-form formatting rules (fmt)
+16. **Path-specific rule overrides** — Different rules for `src/` vs `tests/` (Mallet)
+17. **Stale suppression detection** (Mallet)
+18. **Editor integration** — Flycheck/flymake for Emacs (racket-review)
+19. **Macroexpansion-aware analysis** — Suppress false positives from macro output (Eastwood)
+20. **Cyclomatic complexity and other metrics** (Mallet)
--- a/docs/PLAN.md
+++ b/docs/PLAN.md
@@ -0,0 +1,470 @@
+# Gulie — Guile Linter/Formatter: Architecture & Implementation Plan
+
+## Context
+
+No linter, formatter, or static analyser exists for Guile Scheme. We're building
+one from scratch, called **gulie**. The tool is written in Guile itself, reusing
+as much of Guile's infrastructure as possible (reader, compiler, Tree-IL
+analyses, warning system). The design draws on patterns observed in 7 reference
+tools (see `docs/INSPIRATION.md`).
+
+Guile 3.0.11 is available in the devenv. No source code exists yet.
+
+---
+
+## High-level architecture
+
+Two independent passes, extensible to three:
+
+```
+                         .gulie.sexp (config)
+                              |
+  file.scm ──┬──> [Tokenizer] ──> tokens ──> [CST parser] ──> CST
+              |         |
+              |   [Pass 1: Surface]  line rules + CST rules
+              |         |
+              |    diagnostics-1
+              |
+              └──> [Guile reader] ──> s-exprs ──> [Guile compiler] ──> Tree-IL
+                        |
+                  [Pass 2: Semantic]  built-in analyses + custom Tree-IL rules
+                        |
+                   diagnostics-2
+                        |
+              [merge + suppress + sort + report/fix]
+```
+
+**Why two passes?** Guile's reader (`ice-9/read.scm:949-973`) irrecoverably
+strips comments, whitespace, and datum comments in `next-non-whitespace`. There
+is no way to get formatting info AND semantic info from one parse. Accepting this
+and building two clean, independent passes is simpler than fighting the reader.
+
+---
+
+## Module structure
+
+```
+gulie/
+  bin/gulie                        # CLI entry point (executable Guile script)
+  gulie/
+    cli.scm                        # (gulie cli) — arg parsing, dispatch
+    config.scm                     # (gulie config) — .gulie.sexp loading, defaults, merging
+    diagnostic.scm                 # (gulie diagnostic) — record type, sorting, formatting
+    tokenizer.scm                  # (gulie tokenizer) — hand-written lexer, preserves everything
+    cst.scm                        # (gulie cst) — token stream → concrete syntax tree
+    compiler.scm                   # (gulie compiler) — Guile compile wrapper, warning capture
+    rule.scm                       # (gulie rule) — rule record, registry, define-rule macros
+    engine.scm                     # (gulie engine) — orchestrator: file discovery, pass sequencing
+    fixer.scm                      # (gulie fixer) — fix application (bottom-to-top edits)
+    suppression.scm                # (gulie suppression) — ; gulie:suppress parsing/filtering
+    formatter.scm                  # (gulie formatter) — cost-based optimal pretty-printer
+    rules/
+      surface.scm                  # (gulie rules surface) — trailing-ws, line-length, tabs, blanks
+      indentation.scm              # (gulie rules indentation) — indent checking vs CST
+      comments.scm                 # (gulie rules comments) — comment style conventions
+      semantic.scm                 # (gulie rules semantic) — wrappers around Guile's analyses
+      idiom.scm                    # (gulie rules idiom) — pattern-based suggestions via match
+      module-form.scm              # (gulie rules module-form) — define-module checks
+  test/
+    test-tokenizer.scm
+    test-cst.scm
+    test-rules-surface.scm
+    test-rules-semantic.scm
+    fixtures/
+      clean/                       # .scm files producing zero diagnostics
+      violations/                  # .scm + .expected pairs (snapshot testing)
+```
+
+17 modules plus the `bin/gulie` entry script. Each has one clear job.
+
+---
+
+## Key components
+
+### Tokenizer (`gulie/tokenizer.scm`)
+
+Hand-written character-by-character state machine. Must handle the same lexical
+syntax as Guile's reader but **preserve** what the reader discards.
+
+```scheme
+(define-record-type <token>
+  (make-token type text line column)
+  token?
+  (type   token-type)      ;; symbol (see list below)
+  (text   token-text)      ;; string: exact source text
+  (line   token-line)      ;; integer: 1-based
+  (column token-column))   ;; integer: 0-based
+```
+
+Token types (~15): `open-paren`, `close-paren`, `symbol`, `number`, `string`,
+`keyword`, `boolean`, `character`, `prefix` (`'`, `` ` ``, `,`, `,@`, `#'`,
+etc.), `special` (`#;`, `#(`, `#vu8(`, etc.), `line-comment`, `block-comment`,
+`whitespace`, `newline`, `dot`.
+
+**Critical invariant:** `(string-concatenate (map token-text (tokenize input)))` must
+reproduce the original input exactly. This is our primary roundtrip test.
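+
+A minimal sketch of that roundtrip check, using a deliberately tiny stand-in
+lexer rather than the planned `(gulie tokenizer)`. The names `toy-tokenize` and
+`roundtrips?` are hypothetical, and the toy lexer knows nothing about strings,
+characters, or block comments:
+
+```scheme
+;; Toy lexer: splits INPUT into a list of token texts without dropping any
+;; character. It only knows parens, `;` comments, whitespace runs, and atoms.
+(define (toy-tokenize input)
+  (define len (string-length input))
+  (define (scan-while pred start)
+    (let loop ((i start))
+      (if (and (< i len) (pred (string-ref input i)))
+          (loop (+ i 1))
+          i)))
+  (define (atom-char? c)
+    (not (or (char-whitespace? c) (memv c '(#\( #\) #\;)))))
+  (let loop ((i 0) (acc '()))
+    (if (>= i len)
+        (reverse acc)
+        (let* ((c (string-ref input i))
+               (end (cond
+                     ((memv c '(#\( #\))) (+ i 1))
+                     ((char=? c #\;)
+                      (scan-while (lambda (ch) (not (char=? ch #\newline))) i))
+                     ((char-whitespace? c) (scan-while char-whitespace? i))
+                     (else (scan-while atom-char? i)))))
+          (loop end (cons (substring input i end) acc))))))
+
+;; The invariant: concatenating the token texts gives back the input.
+(define (roundtrips? input)
+  (string=? input (string-concatenate (toy-tokenize input))))
+```
+
+Every branch consumes at least one character and every character lands in
+exactly one token, which is what makes the concatenation lossless. The real
+tokenizer would keep the same property while also attaching types and positions.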
+
+Estimated size: ~200-250 lines. Reference: Mallet's tokenizer (163 lines CL).
+
+### CST (`gulie/cst.scm`)
+
+Trivial parenthesised tree built from the token stream:
+
+```scheme
+(define-record-type <cst-node>
+  (make-cst-node open close children)
+  cst-node?
+  (open     cst-node-open)       ;; <token> for ( [ {
+  (close    cst-node-close)      ;; <token> for ) ] }
+  (children cst-node-children))  ;; list of <cst-node> | <token>
+```
+
+The `children` field is a flat list of interleaved atoms (tokens) and nested
+nodes. Comments and whitespace are children like anything else.
+
+The first non-whitespace symbol child of a `<cst-node>` identifies the form
+(`define`, `let`, `cond`, etc.) — enough for indentation rules.
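+
+A rough sketch of the tree builder and the form-head lookup. To stay short it
+works on plain token texts (strings) instead of `<token>` records and only
+handles round parens. The names `<node>`, `parse-children`, and `form-head` are
+hypothetical:
+
+```scheme
+(use-modules (srfi srfi-1)    ; find
+             (srfi srfi-9))   ; define-record-type
+
+(define-record-type <node>
+  (make-node open close children)
+  node?
+  (open node-open) (close node-close) (children node-children))
+
+;; Reads children until a ")" or the end of TOKENS.
+;; Returns two values: the children and the unconsumed tokens.
+(define (parse-children tokens)
+  (let loop ((tokens tokens) (acc '()))
+    (cond
+     ((null? tokens) (values (reverse acc) '()))
+     ((string=? (car tokens) ")") (values (reverse acc) tokens))
+     ((string=? (car tokens) "(")
+      (call-with-values (lambda () (parse-children (cdr tokens)))
+        (lambda (children rest)
+          (if (null? rest)
+              ;; Input ended before the closer: record #f as the close token.
+              (loop '() (cons (make-node "(" #f children) acc))
+              (loop (cdr rest)
+                    (cons (make-node "(" (car rest) children) acc))))))
+     (else (loop (cdr tokens) (cons (car tokens) acc))))))
+
+;; First child that is an atom and neither whitespace nor a `;` comment.
+(define (form-head node)
+  (find (lambda (child)
+          (and (string? child)
+               (not (string-every char-whitespace? child))
+               (not (string-prefix? ";" child))))
+        (node-children node)))
+```
+
+Whitespace and comments stay in `children`, so the tree can still be flattened
+back to the original token sequence.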
+
+Estimated size: ~80-100 lines.
+
+### Compiler wrapper (`gulie/compiler.scm`)
+
+Wraps Guile's compile pipeline to capture warnings as structured diagnostics:
+
+```scheme
+;; Key Guile APIs we delegate to:
+;; - (system base compile): read-and-compile, compile, default-warning-level
+;; - (language tree-il analyze): make-analyzer, analyze-tree
+;; - (system base message): %warning-types, current-warning-port
+```
+
+Strategy: call `read-and-compile` with `#:to 'tree-il` and `#:warning-level 2`
+while redirecting `current-warning-port` to a string port, then parse the
+warning output into `<diagnostic>` records. Alternatively, invoke `make-analyzer`
+directly and hook the warning printers.
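+
+A hedged sketch of the capture-and-parse half of that strategy. It assumes
+warning lines are shaped like `;;; FILE:LINE:COL: warning: MESSAGE` and that
+`current-warning-port` can be rebound with `parameterize`. The helper names are
+hypothetical, and the thunk in the usage note stands in for the real compile
+call:
+
+```scheme
+(use-modules (ice-9 regex)    ; string-match, match:substring
+             (srfi srfi-1))   ; filter-map
+
+;; Run THUNK with the warning port redirected to a string port and
+;; return everything that was written to it.
+(define (capture-warnings thunk)
+  (call-with-output-string
+    (lambda (port)
+      (parameterize ((current-warning-port port))
+        (thunk)))))
+
+;; ASSUMPTION: one warning per line, optionally prefixed with ";;; ".
+;; Returns (file line column message) or #f for lines that do not match.
+(define (parse-warning-line line)
+  (let* ((text (if (string-prefix? ";;; " line) (substring line 4) line))
+         (m (string-match "^(.+):([0-9]+):([0-9]+): warning: (.*)$" text)))
+    (and m
+         (list (match:substring m 1)
+               (string->number (match:substring m 2))
+               (string->number (match:substring m 3))
+               (match:substring m 4)))))
+
+(define (captured-diagnostics thunk)
+  (filter-map parse-warning-line
+              (string-split (capture-warnings thunk) #\newline)))
+```
+
+For example, `(captured-diagnostics (lambda () (format (current-warning-port)
+";;; demo.scm:3:2: warning: unused variable~%")))` yields one parsed entry. Text
+parsing like this is fragile if messages span several lines, which is one
+argument for the `make-analyzer` alternative.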
+
+Guile's built-in analyses (all free):
+- `unused-variable-analysis`
+- `unused-toplevel-analysis`
+- `unused-module-analysis`
+- `shadowed-toplevel-analysis`
+- `make-use-before-definition-analysis` (unbound variables)
+- `arity-analysis` (wrong arg count)
+- `format-analysis` (format string validation)
+
+### Rule system (`gulie/rule.scm`)
+
+```scheme
+(define-record-type <rule>
+  (make-rule name description severity category type check-proc fix-proc)
+  rule?
+  (name        rule-name)         ;; symbol
+  (description rule-description)  ;; string
+  (severity    rule-severity)     ;; 'error | 'warning | 'info
+  (category    rule-category)     ;; 'format | 'style | 'correctness | 'idiom
+  (type        rule-type)         ;; 'line | 'cst | 'tree-il
+  (check-proc  rule-check-proc)  ;; procedure (signature depends on type)
+  (fix-proc    rule-fix-proc))   ;; procedure | #f
+```
+
+Three rule types with different check signatures:
+- **`'line`** — `(lambda (file line-num line-text config) -> diagnostics)` — fastest, no parsing
+- **`'cst`** — `(lambda (file cst config) -> diagnostics)` — needs tokenizer+CST
+- **`'tree-il`** — `(lambda (file tree-il env config) -> diagnostics)` — needs compilation
+
+Global registry: `*rules*` alist, populated at module load time via
+`register-rule!`. Convenience macros: `define-line-rule`, `define-cst-rule`,
+`define-tree-il-rule`.
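+
+A minimal sketch of the registry with one line rule, assuming a reduced rule
+record (four fields instead of seven) and plain lists standing in for
+`<diagnostic>` records. `run-line-rules` is a hypothetical helper, not part of
+the plan:
+
+```scheme
+(use-modules (srfi srfi-1)    ; filter, append-map, iota
+             (srfi srfi-9))   ; define-record-type
+
+;; Reduced rule record: only the fields this sketch needs.
+(define-record-type <rule>
+  (make-rule name severity type check-proc)
+  rule?
+  (name rule-name) (severity rule-severity)
+  (type rule-type) (check-proc rule-check-proc))
+
+(define *rules* '())
+
+(define (register-rule! rule)
+  (set! *rules* (acons (rule-name rule) rule *rules*)))
+
+;; A 'line rule: flags lines that end in whitespace.
+(register-rule!
+ (make-rule 'trailing-whitespace 'warning 'line
+            (lambda (file line-num line-text config)
+              (let ((trimmed (string-trim-right line-text)))
+                (if (< (string-length trimmed) (string-length line-text))
+                    (list (list file line-num (string-length trimmed)
+                                'warning 'trailing-whitespace
+                                "trailing whitespace"))
+                    '())))))
+
+;; Run every registered 'line rule over every line of TEXT (1-based lines).
+(define (run-line-rules file text config)
+  (let ((lines (string-split text #\newline))
+        (line-rules (filter (lambda (r) (eq? (rule-type r) 'line))
+                            (map cdr *rules*))))
+    (append-map
+     (lambda (line-num line-text)
+       (append-map (lambda (r)
+                     ((rule-check-proc r) file line-num line-text config))
+                   line-rules))
+     (iota (length lines) 1)
+     lines)))
+```
+
+Registering at module load time means the engine only has to import the rule
+modules; it never needs a hard-coded rule list.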
+
+### Diagnostic record (`gulie/diagnostic.scm`)
+
+```scheme
+(define-record-type <diagnostic>
+  (make-diagnostic file line column severity rule message fix)
+  diagnostic?
+  (file     diagnostic-file)      ;; string
+  (line     diagnostic-line)      ;; integer, 1-based
+  (column   diagnostic-column)    ;; integer, 0-based
+  (severity diagnostic-severity)  ;; symbol
+  (rule     diagnostic-rule)      ;; symbol
+  (message  diagnostic-message)   ;; string
+  (fix      diagnostic-fix))      ;; <fix> | #f
+```
+
+Standard output: `file:line:column: severity: rule: message`
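+
+A small sketch of the formatter and a sort predicate for that record. The record
+definition is repeated so the block stands alone. Ordering by file, then line,
+then column is an assumption, and `format-diagnostic` and `diagnostic<?` are
+hypothetical names:
+
+```scheme
+(use-modules (srfi srfi-9))
+
+(define-record-type <diagnostic>
+  (make-diagnostic file line column severity rule message fix)
+  diagnostic?
+  (file diagnostic-file) (line diagnostic-line) (column diagnostic-column)
+  (severity diagnostic-severity) (rule diagnostic-rule)
+  (message diagnostic-message) (fix diagnostic-fix))
+
+;; Render one diagnostic as file:line:column: severity: rule: message
+(define (format-diagnostic d)
+  (format #f "~a:~a:~a: ~a: ~a: ~a"
+          (diagnostic-file d) (diagnostic-line d) (diagnostic-column d)
+          (diagnostic-severity d) (diagnostic-rule d) (diagnostic-message d)))
+
+;; Order by file name, then line, then column.
+(define (diagnostic<? a b)
+  (let ((fa (diagnostic-file a)) (fb (diagnostic-file b)))
+    (cond ((string<? fa fb) #t)
+          ((string<? fb fa) #f)
+          ((< (diagnostic-line a) (diagnostic-line b)) #t)
+          ((> (diagnostic-line a) (diagnostic-line b)) #f)
+          (else (< (diagnostic-column a) (diagnostic-column b))))))
+```
+
+With this in place, the merge step can be as simple as
+`(map format-diagnostic (sort diagnostics diagnostic<?))`.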
+
+### Config (`gulie/config.scm`)
+
+File: `.gulie.sexp` in project root (plain s-expression, read with `(read)`,
+never evaluated):
+
+```scheme
+((line-length . 80)
+ (indent . 2)
+ (enable trailing-whitespace line-length unused-variable arity-mismatch)
+ (disable tabs)
+ (rules
+   (line-length (max . 100)))
+ (indent-rules
+   (with-syntax . 1)
+   (match . 1))
+ (ignore "build/**" ".direnv/**"))
+```
+
+Precedence: CLI flags > config file > built-in defaults.
+
+`--init` generates a template with all rules listed and commented.
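+
+One way the precedence could be implemented, sketched with plain alists. The
+defaults shown (72 columns, 2-space indent) and the names `read-config-string`,
+`merge-config`, and `config-ref` are assumptions for illustration:
+
+```scheme
+;; ASSUMPTION: illustrative built-in defaults.
+(define %defaults '((line-length . 72) (indent . 2)))
+
+;; Parse config text with `read` only; nothing is ever evaluated.
+(define (read-config-string text)
+  (call-with-input-string text read))
+
+;; LAYERS are alists, highest precedence first. Because `assq` returns
+;; the first match, appending in that order implements the precedence.
+(define (merge-config . layers)
+  (apply append layers))
+
+(define (config-ref config key)
+  (let ((pair (assq key config)))
+    (and pair (cdr pair))))
+```
+
+Usage: `(merge-config cli-options file-config %defaults)` followed by
+`(config-ref merged 'line-length)`. Entries that are lists rather than pairs,
+such as `(enable ...)`, come back from `config-ref` as the list of rule names.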
+
+### Suppression (`gulie/suppression.scm`)
+
+```scheme
+;; gulie:suppress trailing-whitespace   — suppress on next line
+(define x    "messy")
+
+(define x    "messy") ; gulie:suppress  — suppress on this line
+
+;; gulie:disable line-length            — region disable
+... code ...
+;; gulie:enable line-length             — region enable
+```
+
+Parsed from raw text before rules run. Produces a suppression map that filters
+diagnostics after all rules have emitted.
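+
+A sketch of the single-line cases, working on raw line text. It ignores
+`gulie:disable`/`gulie:enable` regions and would misfire on a marker inside a
+string literal, so treat it as an illustration only. All names are hypothetical:
+
+```scheme
+(define %marker "gulie:suppress")
+
+;; #f when LINE has no marker, otherwise the list of rule names after it.
+;; An empty list means "suppress every rule".
+(define (suppressed-rules line)
+  (let ((idx (string-contains line %marker)))
+    (and idx
+         (map string->symbol
+              (string-tokenize
+               (substring line (+ idx (string-length %marker))))))))
+
+;; Alist of (target-line . rules). A comment on its own line targets the
+;; next line; a trailing comment targets its own line. Lines are 1-based.
+(define (suppression-map lines)
+  (let loop ((lines lines) (n 1) (acc '()))
+    (if (null? lines)
+        (reverse acc)
+        (let* ((line (car lines))
+               (rules (suppressed-rules line))
+               (own-line? (string-prefix? ";" (string-trim line))))
+          (loop (cdr lines) (+ n 1)
+                (if rules
+                    (cons (cons (if own-line? (+ n 1) n) rules) acc)
+                    acc))))))
+
+(define (suppressed? smap line rule)
+  (let ((entry (assv line smap)))
+    (and entry
+         (or (null? (cdr entry))
+             (and (memq rule (cdr entry)) #t)))))
+```
+
+The engine would then drop every diagnostic for which `suppressed?` returns
+true before sorting and reporting.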
+
+---
+
+## Indentation rules
+
+The key data is `scheme-indent-function` values from `.dir-locals.el` — an
+integer N meaning "N arguments on first line, then body indented +2":
+
+```scheme
+(define *default-indent-rules*
+  '((define . 1) (define* . 1) (define-public . 1) (define-syntax . 1)
+    (define-module . 0) (lambda . 1) (lambda* . 1)
+    (let . 1) (let* . 1) (letrec . 1) (letrec* . 1)
+    (if . #f) (cond . 0) (case . 1) (when . 1) (unless . 1)
+    (match . 1) (syntax-case . 2) (with-syntax . 1)
+    (begin . 0) (do . 2) (parameterize . 1) (guard . 1)))
+```
+
+Overridable via config `indent-rules`. The indentation checker walks the CST,
+identifies the form by its first symbol child, looks up the rule, and compares
+actual indentation to expected.
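+
+A sketch of the lookup step. The body offset of +2 comes from the table
+semantics above. The +4 offset for a distinguished argument that starts its own
+line follows the usual Emacs convention and is an assumption here, as is the
+name `expected-indent`:
+
+```scheme
+;; RULES         alist like *default-indent-rules*
+;; HEAD          symbol naming the form, e.g. 'let
+;; OPEN-COL      column of the form's opening paren (0-based)
+;; FIRST-ARG-COL column of the first argument on the opening line
+;; ARG-INDEX     0-based index of the argument that starts the checked line
+(define (expected-indent rules head open-col first-arg-col arg-index)
+  (let ((n (assq-ref rules head)))
+    (cond
+     ;; Body argument of a special form: two columns past the paren.
+     ((and (integer? n) (>= arg-index n)) (+ open-col 2))
+     ;; ASSUMPTION: a distinguished argument on its own line gets +4.
+     ((integer? n) (+ open-col 4))
+     ;; No rule (absent or #f): align with the first argument.
+     (else first-arg-col))))
+```
+
+For `(let . 1)` with the paren at column 4, a body line is expected at column 6.
+For `(if . #f)`, continuation lines align under the first argument.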
+
+---
+
+## Formatting conventions (Guile vs Guix)
+
+Both use a 2-space indent and the same special-form conventions. Key differences:
+- **Guile:** 72-char fill column, `;;; {Section}` headers
+- **Guix:** 78-80 char fill column, `;;` headers
+
+Our default config targets Guile conventions. A Guix preset can override
+`line-length` and comment style.
+
+---
+
+## Formatter: cost-based optimal pretty-printing
+
+The formatter (`gulie/formatter.scm`) is a later-phase component that
+**rewrites** files with correct layout, as opposed to the indentation checker
+which merely **reports** violations.
+
+### Why cost-based?
+
+When deciding where to break lines in a long expression, there are often multiple
+valid options. A greedy approach (fill as much as fits, then break) produces
+mediocre output: it can't "look ahead" to see that an earlier break would give
+a better overall layout. Cost-based printers built on the Wadler/Leijen document
+model, such as fmt's `pretty-expressive`, evaluate alternative layouts and select
+the one with the lowest cost.
+
+### The algorithm (Wadler/Leijen, as used by fmt's `pretty-expressive`)
+
+The pretty-printer works with an abstract **document** type:
+
+```
+doc = text(string)       — literal text
+    | line               — line break (or space if flattened)
+    | nest(n, doc)       — increase indent by n
+    | concat(doc, doc)   — concatenation
+    | alt(doc, doc)      — choose better of two layouts
+    | group(doc)         — try flat first, break if doesn't fit
+```
+
+The key operator is `alt(a, b)`, which offers two candidate layouts. Rather than
+greedily taking A unless it overflows the page width, the algorithm evaluates
+both alternatives and picks the one with the lower **cost vector**:
+
+```
+cost = [badness, height, characters]
+
+  badness    — quadratic penalty for exceeding page width
+  height     — number of lines used
+  characters — total chars (tiebreaker)
+```
+
+Cost vectors are compared lexicographically, so the output is optimal with
+respect to this cost: the layout that minimises overflow first, then uses the
+fewest lines.
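+
+The comparison can be illustrated on two already-rendered candidates. This
+sketch assumes badness is the sum of squared per-line overflow (the plan only
+says "quadratic") and compares whole layouts, whereas a real solver works
+incrementally on the document tree. All names are hypothetical:
+
+```scheme
+;; LINES is a list of strings, WIDTH the page width.
+;; Returns (badness height characters).
+(define (layout-cost lines width)
+  (list (apply + (map (lambda (l)
+                        (let ((over (max 0 (- (string-length l) width))))
+                          (* over over)))
+                      lines))
+        (length lines)
+        (apply + (map string-length lines))))
+
+;; Lexicographic comparison of two cost vectors of equal length.
+(define (cost<? a b)
+  (cond ((null? a) #f)
+        ((< (car a) (car b)) #t)
+        ((> (car a) (car b)) #f)
+        (else (cost<? (cdr a) (cdr b)))))
+
+;; alt(a, b) on rendered layouts: keep A unless B is strictly cheaper.
+(define (pick-layout a b width)
+  (if (cost<? (layout-cost b width) (layout-cost a width)) b a))
+```
+
+With a width of 10, the one-line layout `("(foo bar baz qux)")` overflows by 7
+columns (badness 49), so a three-line alternative with badness 0 wins. With a
+width of 20 neither overflows and the one-line layout wins on height.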
+
+### How it fits our architecture
+
+```
+CST (from tokenizer + cst.scm)
+  → [doc generator] convert CST nodes to abstract doc, using form-specific rules
+  → [layout solver] evaluate alternatives, select optimal layout
+  → [renderer] emit formatted text with comments preserved
+```
+
+The **doc generator** uses the same form-identification logic as the indentation
+checker (first symbol child of a CST node) to apply form-specific layout rules.
+For example:
+
+- `define` — name on first line, body indented
+- `let` — bindings as aligned block, body indented
+- `cond` — each clause on its own line
+
+These rules are data (the `indent-rules` table extended with layout hints),
+making the formatter configurable just like the checker.
+
+### Implementation approach
+
+We can either:
+1. **Port `pretty-expressive`** from Racket — the core algorithm is ~300 lines,
+   well-documented in academic papers
+2. **Upgrade Guile's `(ice-9 pretty-print)`** — it already knows form-specific
+   indentation rules but uses greedy layout; we'd replace the layout engine with
+   cost-based selection
+
+Option 1 is cleaner (purpose-built). Option 2 reuses more existing code but
+would be a heavier modification. We'll decide when we reach that phase.
+
+### Phase note
+
+The formatter is **Phase 6** work. Phases 0-4 deliver a useful checker without
+it. The indentation checker (Phase 4) validates existing formatting; the
+formatter (Phase 6) rewrites it. The checker comes first because it's simpler
+and immediately useful in CI.
+
+---
+
+## CLI interface
+
+```
+gulie [OPTIONS] [FILE|DIR...]
+
+  --check           Report issues, exit non-zero on findings (default)
+  --fix             Fix mode: auto-fix what's possible, report the rest
+  --format          Format mode: rewrite files with optimal layout
+  --init            Generate .gulie.sexp template
+  --pass PASS       Run only: surface, semantic, all (default: all)
+  --rule RULE       Enable only this rule (repeatable)
+  --disable RULE    Disable this rule (repeatable)
+  --severity SEV    Minimum severity: error, warning, info
+  --output FORMAT   Output: standard (default), json, compact
+  --config FILE     Config file path (default: auto-discover)
+  --list-rules      List all rules and exit
+  --version         Print version
+```
+
+Exit codes: 0 = clean, 1 = findings, 2 = config error, 3 = internal error.
+
+---
+
+## Implementation phases
+
+### Phase 0: Skeleton
+- `bin/gulie` — shebang script, loads CLI module
+- `(gulie cli)` — basic arg parsing (`--check`, `--version`, file args)
+- `(gulie diagnostic)` — record type + standard formatter
+- `(gulie rule)` — record type + registry + `register-rule!`
+- `(gulie engine)` — discovers `.scm` files, runs line rules, reports
+- One trivial rule: `trailing-whitespace` (line rule)
+- **Verification:** `gulie --check some-file.scm` reports trailing whitespace
+
+### Phase 1: Tokenizer + CST + surface rules
+- `(gulie tokenizer)` — hand-written lexer
+- `(gulie cst)` — token → tree
+- Surface rules: `line-length`, `no-tabs`, `blank-lines` (alongside `trailing-whitespace` from Phase 0)
+- Comment rule: `comment-semicolons` (check `;`/`;;`/`;;;` usage)
+- Roundtrip test: tokenize → concat = original
+- Snapshot tests for each rule
+
+### Phase 2: Semantic rules (compiler pass)
+- `(gulie compiler)` — `read-and-compile` wrapper, warning capture
+- Semantic rules wrapping Guile's built-in analyses:
+  `unused-variable`, `unused-toplevel`, `unbound-variable`, `arity-mismatch`,
+  `format-string`, `shadowed-toplevel`, `unused-module`
+- **Verification:** run against Guile and Guix source files, check false-positive rate
+
+### Phase 3: Config + suppression
+- `(gulie config)` — `.gulie.sexp` loading + merging
+- `(gulie suppression)` — inline comment suppression
+- `--init` command
+- Rule enable/disable via config and CLI
+
+### Phase 4: Indentation checking
+- `(gulie rules indentation)` — CST-based indent checker
+- Default indent rules for standard Guile forms
+- Configurable `indent-rules` in `.gulie.sexp`
+
+### Phase 5: Fix mode + idiom rules
+- `(gulie fixer)` — bottom-to-top edit application
+- Auto-fix for: trailing whitespace, line-length (where possible)
+- `(gulie rules idiom)` — `match`-based pattern suggestions on Tree-IL
+- `(gulie rules module-form)` — `define-module` form checks (sorted imports, etc.)
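+
+The bottom-to-top idea behind `(gulie fixer)` can be sketched as follows. It
+assumes each edit is a `(start end replacement)` list of character offsets into
+the original text and that edits do not overlap. `apply-edits` is a
+hypothetical name:
+
+```scheme
+;; Apply EDITS to TEXT starting from the highest offset, so that the
+;; offsets of edits earlier in the text stay valid throughout.
+(define (apply-edits text edits)
+  (let loop ((text text)
+             (edits (sort edits (lambda (a b) (> (car a) (car b))))))
+    (if (null? edits)
+        text
+        (let ((edit (car edits)))
+          (loop (string-append (substring text 0 (car edit))
+                               (caddr edit)
+                               (substring text (cadr edit)))
+                (cdr edits))))))
+```
+
+Applied to `"abc   \ndef  \n"` with the edits `(3 6 "")` and `(10 12 "")`, this
+removes both runs of trailing spaces without recomputing any offsets.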
+
+### Phase 6: Formatter (cost-based optimal layout)
+- `(gulie formatter)` — Wadler/Leijen pretty-printer with cost-based selection
+- Abstract document type: `text`, `line`, `nest`, `concat`, `alt`, `group`
+- Form-specific layout rules (reuse indent-rules table + layout hints)
+- Comment preservation through formatting
+- `--format` CLI mode
+- **Verification:** format Guile/Guix source files, diff against originals,
+  verify roundtrip stability (format twice = same output)
+
+### Phase 7: Cross-module analysis (future)
+- Load multiple modules, walk dependency graph
+- Unused exports, cross-module arity checks
+- `--pass cross-module` CLI option
+
+---
+
+## Testing strategy
+
+1. **Roundtrip test** (tokenizer): tokenize → concat must equal original input
+2. **Snapshot tests**: `fixtures/violations/rule-name.scm` + `.expected` pairs
+3. **Clean file tests**: `fixtures/clean/*.scm` must produce zero diagnostics
+4. **Unit tests**: `(srfi srfi-64)` for tokenizer, CST, config, diagnostics
+5. **Real-world corpus**: run against `test/guix/` and `refs/guile/module/` for
+   false-positive rate validation
+6. **Formatter idempotency**: `format(format(x)) = format(x)` for all test files
+
+---
+
+## Key design decisions
+
+| Decision | Rationale |
+|----------|-----------|
+| Hand-written tokenizer, not extending Guile's reader | The reader is ~1000 lines of nested closures not designed for extension. A clean 200-line tokenizer is easier to write/test. |
+| Two independent passes, not a unified AST | Reader strips comments irrecoverably. Accepting this gives clean separation. |
+| Delegate to Guile's built-in analyses | They're battle-tested, handle macroexpansion edge cases, and are maintained upstream. |
+| `(ice-9 match)` for idiom rules, not logic programming | Built-in, fast, sufficient. miniKanren can be added later if needed. |
+| S-expression config, not YAML/TOML | Zero deps. Our users write Scheme. `(read)` does the parsing. |
+| Flat CST (parens + interleaved tokens), not rich AST | Enough for indentation/formatting checks. No overengineering. |
+| Cost-based optimal layout for the formatter | Greedy formatters produce mediocre output. Cost-based selection over a Wadler/Leijen-style document model is cleaner and provably optimal for its cost function. Worth the investment when we reach that phase. |
+| Checker first, formatter later | Checking is simpler, immediately useful in CI, and validates the tokenizer/CST infrastructure that the formatter will build on. |
+
+---
+
+## Critical files to reference during implementation
+
+- `refs/guile/module/ice-9/read.scm:949-973` — what the reader discards (our tokenizer must keep)
+- `refs/guile/module/language/tree-il/analyze.scm:1461-1479` — `make-analyzer` API
+- `refs/guile/module/system/base/compile.scm:298-340` — `read-and-compile` / `compile`
+- `refs/guile/module/system/base/message.scm:83-220` — `%warning-types` definitions
+- `refs/guile/module/language/tree-il.scm` — Tree-IL node types and traversal
+- `refs/guile/module/ice-9/pretty-print.scm` — existing pretty-printer (form-specific rules to extract)
+- `refs/mallet/src/parser/tokenizer.lisp` — reference tokenizer (163 lines)
+- `refs/fmt/conventions.rkt` — form-specific formatting rules (100+ forms)
+- `refs/fmt/main.rkt` — cost-based layout selection implementation