First iteration

commit d0115672dd
2026-04-01 23:35:50 +02:00
29 changed files with 3553 additions and 0 deletions

.envrc Normal file

@@ -0,0 +1,12 @@
#!/usr/bin/env bash
export DIRENV_WARN_TIMEOUT=20s
eval "$(devenv direnvrc)"
# `use devenv` supports the same options as the `devenv shell` command.
#
# To silence all output, use `--quiet`.
#
# Example usage: use devenv --quiet --impure --option services.postgres.enable:bool true
use devenv

.gitignore vendored Normal file

@@ -0,0 +1,13 @@
# Devenv
.devenv*
devenv.local.nix
devenv.local.yaml
# direnv
.direnv
# pre-commit
.pre-commit-config.yaml
# Temporary
/refs/

README.md Normal file

@@ -0,0 +1,215 @@
# gulie
A linter, static analyser, and formatter for [Guile Scheme](https://www.gnu.org/software/guile/).
```
$ gulie gulie/
gulie/engine.scm:97:80: warning: line-length: line exceeds 80 characters (82)
gulie/tokenizer.scm:131:0: warning: trailing-whitespace: trailing whitespace
```
## Why
No linter, formatter, or static analysis tool exists for Guile Scheme. gulie
fills that gap with a two-pass architecture that catches both surface-level
formatting issues and deep semantic problems.
## Features
- **Surface rules** (no parsing needed): trailing whitespace, line length, tabs,
excessive blank lines, comment style conventions
- **Semantic rules** (via Guile's compiler): unused variables, unbound
variables, arity mismatches, format string errors, shadowed top-levels,
unused modules
- **Inline suppression**: `; gulie:suppress rule-name` on a line, or region
disable/enable blocks
- **Auto-fix mode**: `--fix` applies automatic corrections where available
- **Configuration**: `.gulie.sexp` in your project root, overridable via CLI
- **CI friendly**: exit code 0 for clean, 1 for findings
## Requirements
- [Guile](https://www.gnu.org/software/guile/) 3.0 or later
## Installation
Clone the repository and ensure `bin/gulie` is on your `PATH`, or run it
directly:
```sh
bin/gulie --check .
```
## Usage
```
gulie [OPTIONS] [FILE|DIR...]
Options:
-h, --help Show help message
-v, --version Print version
--check Check mode (default): report issues, exit non-zero on findings
--fix Fix mode: auto-fix what's possible, report the rest
--init Generate .gulie.sexp template in current directory
--pass PASS Run only: surface, semantic, all (default: all)
--config FILE Config file path (default: auto-discover .gulie.sexp)
--rule RULE Enable only this rule
--disable RULE Disable this rule
--severity SEV Minimum severity: error, warning, info
--output FORMAT Output format: standard (default), json, compact
--list-rules List all available rules
```
### Examples
Check a single file:
```sh
gulie mylib.scm
```
Check an entire project:
```sh
gulie src/
```
Auto-fix trailing whitespace and other fixable issues:
```sh
gulie --fix src/
```
Generate a config template:
```sh
gulie --init
```
## Configuration
gulie looks for `.gulie.sexp` in the current directory and parent directories.
Generate a template with `gulie --init`.
```scheme
((line-length . 80)
(indent . 2)
(max-blank-lines . 2)
(enable trailing-whitespace line-length no-tabs blank-lines
comment-semicolons unused-variable unbound-variable
arity-mismatch)
(disable)
(rules
(line-length (max . 100)))
(indent-rules
(define . 1) (let . 1) (lambda . 1)
(with-syntax . 1) (match . 1))
(ignore "build/**" ".direnv/**"))
```
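
Auto-discovery walks upward from the current directory. A minimal sketch of how that lookup could work (an illustration, not necessarily gulie's actual implementation):

```scheme
;; Return the nearest .gulie.sexp at or above `dir`, or #f if none exists.
(define (find-config dir)
  (let ((candidate (string-append dir "/.gulie.sexp")))
    (cond ((file-exists? candidate) candidate)
          ((string=? dir "/") #f)            ; reached the filesystem root
          (else (find-config (dirname dir))))))
```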
## Inline Suppression
Suppress a rule on the current line:
```scheme
(define x "messy") ; gulie:suppress trailing-whitespace
```
Suppress all rules on the next line:
```scheme
;; gulie:suppress
(define intentionally-long-variable-name "value")
```
Region disable/enable:
```scheme
;; gulie:disable line-length
(define long-line ...............................................)
(define another .................................................)
;; gulie:enable line-length
```
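
Parsing a suppression comment is straightforward. A sketch of the line-level case, assuming `(ice-9 regex)` (illustrative only, not gulie's internals):

```scheme
(use-modules (ice-9 regex))

;; Return 'all when the comment names no rules, a list of rule names when
;; it does, or #f when the line carries no suppression comment.
(define (suppressed-rules line)
  (let ((m (string-match "; *gulie:suppress *(.*)$" line)))
    (and m
         (let ((rules (string-trim-both (match:substring m 1))))
           (if (string-null? rules)
               'all
               (string-split rules #\space))))))
```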
## Architecture
gulie uses a two-pass design:
```
.gulie.sexp
|
file.scm --+--> [Tokenizer] --> tokens --> [CST parser] --> CST
| |
| [Pass 1: Surface] line rules + CST rules
| |
| diagnostics-1
|
+--> [Guile compiler] --> Tree-IL --> CPS
|
[Pass 2: Semantic] Guile's built-in analyses
|
diagnostics-2
|
[merge + suppress + sort + report]
```
**Pass 1** uses a hand-written tokenizer that preserves all whitespace, comments,
and exact source text. The critical invariant:
`(string-concatenate (map token-text (tokenize input)))` reproduces the input
exactly. This feeds a lightweight concrete syntax tree for formatting checks.
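The invariant can be asserted directly, assuming the `(gulie tokenizer)` module exports the `tokenize` and `token-text` procedures named above:

```scheme
(use-modules (gulie tokenizer))

;; Roundtrip check: re-concatenating all token texts must reproduce the
;; input byte-for-byte, including whitespace and comments.
(define input "(define (f x)\n  (+ x 1)) ; comment\n")
(unless (string=? (string-concatenate (map token-text (tokenize input)))
                  input)
  (error "tokenizer lost source text"))
```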
**Pass 2** delegates to Guile's own compiler and analysis infrastructure:
`unused-variable-analysis`, `arity-analysis`, `format-analysis`, and others.
These are battle-tested and handle macroexpansion correctly.
The two passes are independent because Guile's reader irrecoverably strips
comments and whitespace — there is no way to get formatting info and semantic
info from a single parse.
## Rules
| Rule | Type | Category | Description |
|------|------|----------|-------------|
| `trailing-whitespace` | line | format | Trailing spaces or tabs |
| `line-length` | line | format | Line exceeds maximum width |
| `no-tabs` | line | format | Tab characters in source |
| `blank-lines` | line | format | Excessive consecutive blank lines |
| `comment-semicolons` | cst | style | Comment style conventions (`;`/`;;`/`;;;`) |
| `unused-variable` | semantic | correctness | Unused local variable |
| `unused-toplevel` | semantic | correctness | Unused top-level definition |
| `unused-module` | semantic | correctness | Unused module import |
| `unbound-variable` | semantic | correctness | Reference to undefined variable |
| `arity-mismatch` | semantic | correctness | Wrong number of arguments |
| `shadowed-toplevel` | semantic | correctness | Top-level binding shadows import |
| `format-string` | semantic | correctness | Format string validation |
## Module Structure
```
gulie/
cli.scm Command-line interface
config.scm Configuration loading and merging
diagnostic.scm Diagnostic record type and formatting
tokenizer.scm Hand-written lexer preserving all tokens
cst.scm Token stream to concrete syntax tree
compiler.scm Guile compiler wrapper for semantic analysis
rule.scm Rule record type and registry
engine.scm Orchestrator: file discovery, pass sequencing
suppression.scm Inline suppression parsing and filtering
rules/
surface.scm Line-based formatting rules
comments.scm Comment style rules
```
## Testing
```sh
guile --no-auto-compile -L . -s test/run-tests.scm
```
84 tests covering tokenizer roundtrip, CST parsing, surface rules, suppression,
and semantic analysis.
## Licence
[TODO: add licence]

bin/gulie Executable file

@@ -0,0 +1,24 @@
#!/usr/bin/env -S guile --no-auto-compile -e main -s
!#
;;; gulie — a linter and formatter for Guile Scheme
;; Add project root to load path
(let ((dir (dirname (dirname (current-filename)))))
(set! %load-path (cons dir %load-path)))
;; Load rule modules (registering rules as a side effect)
(use-modules (gulie rules surface))
;; Load optional modules if available
(false-if-exception (use-modules (gulie tokenizer)))
(false-if-exception (use-modules (gulie cst)))
(false-if-exception (use-modules (gulie rules comments)))
(false-if-exception (use-modules (gulie rules indentation)))
(false-if-exception (use-modules (gulie compiler)))
(false-if-exception (use-modules (gulie rules semantic)))
;; Run
(use-modules (gulie cli))
(define (main args)
(exit ((@@ (gulie cli) main) args)))

devenv.lock Normal file

@@ -0,0 +1,123 @@
{
"nodes": {
"devenv": {
"locked": {
"dir": "src/modules",
"lastModified": 1775040883,
"owner": "cachix",
"repo": "devenv",
"rev": "c277ffa27759cd230089700da568864446528e80",
"type": "github"
},
"original": {
"dir": "src/modules",
"owner": "cachix",
"repo": "devenv",
"type": "github"
}
},
"flake-compat": {
"flake": false,
"locked": {
"lastModified": 1767039857,
"owner": "NixOS",
"repo": "flake-compat",
"rev": "5edf11c44bc78a0d334f6334cdaf7d60d732daab",
"type": "github"
},
"original": {
"owner": "NixOS",
"repo": "flake-compat",
"type": "github"
}
},
"git-hooks": {
"inputs": {
"flake-compat": "flake-compat",
"gitignore": "gitignore",
"nixpkgs": [
"nixpkgs"
]
},
"locked": {
"lastModified": 1775036584,
"owner": "cachix",
"repo": "git-hooks.nix",
"rev": "4e0eb042b67d863b1b34b3f64d52ceb9cd926735",
"type": "github"
},
"original": {
"owner": "cachix",
"repo": "git-hooks.nix",
"type": "github"
}
},
"gitignore": {
"inputs": {
"nixpkgs": [
"git-hooks",
"nixpkgs"
]
},
"locked": {
"lastModified": 1762808025,
"owner": "hercules-ci",
"repo": "gitignore.nix",
"rev": "cb5e3fdca1de58ccbc3ef53de65bd372b48f567c",
"type": "github"
},
"original": {
"owner": "hercules-ci",
"repo": "gitignore.nix",
"type": "github"
}
},
"nixpkgs": {
"inputs": {
"nixpkgs-src": "nixpkgs-src"
},
"locked": {
"lastModified": 1774287239,
"owner": "cachix",
"repo": "devenv-nixpkgs",
"rev": "fa7125ea7f1ae5430010a6e071f68375a39bd24c",
"type": "github"
},
"original": {
"owner": "cachix",
"ref": "rolling",
"repo": "devenv-nixpkgs",
"type": "github"
}
},
"nixpkgs-src": {
"flake": false,
"locked": {
"lastModified": 1769922788,
"narHash": "sha256-H3AfG4ObMDTkTJYkd8cz1/RbY9LatN5Mk4UF48VuSXc=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "207d15f1a6603226e1e223dc79ac29c7846da32e",
"type": "github"
},
"original": {
"owner": "NixOS",
"ref": "nixpkgs-unstable",
"repo": "nixpkgs",
"type": "github"
}
},
"root": {
"inputs": {
"devenv": "devenv",
"git-hooks": "git-hooks",
"nixpkgs": "nixpkgs",
"pre-commit-hooks": [
"git-hooks"
]
}
}
},
"root": "root",
"version": 7
}

devenv.nix Normal file

@@ -0,0 +1,48 @@
{ pkgs, lib, config, inputs, ... }:
{
# https://devenv.sh/basics/
# env.GREET = "devenv";
# https://devenv.sh/packages/
packages = with pkgs; [
guile
];
# https://devenv.sh/languages/
# languages.rust.enable = true;
# https://devenv.sh/processes/
# processes.dev.exec = "${lib.getExe pkgs.watchexec} -n -- ls -la";
# https://devenv.sh/services/
# services.postgres.enable = true;
# https://devenv.sh/scripts/
# scripts.hello.exec = ''
# echo hello from $GREET
# '';
# https://devenv.sh/basics/
# enterShell = ''
# hello # Run scripts directly
# git --version # Use packages
# '';
# https://devenv.sh/tasks/
# tasks = {
# "myproj:setup".exec = "mytool build";
# "devenv:enterShell".after = [ "myproj:setup" ];
# };
# https://devenv.sh/tests/
# enterTest = ''
# echo "Running tests"
# git --version | grep --color=auto "${pkgs.git.version}"
# '';
# https://devenv.sh/git-hooks/
# git-hooks.hooks.shellcheck.enable = true;
# See full reference at https://devenv.sh/reference/options/
}

devenv.yaml Normal file

@@ -0,0 +1,15 @@
# yaml-language-server: $schema=https://devenv.sh/devenv.schema.json
inputs:
nixpkgs:
url: github:cachix/devenv-nixpkgs/rolling
# If you're using non-OSS software, you can set allowUnfree to true.
# allowUnfree: true
# If you're willing to use a package that's vulnerable
# permittedInsecurePackages:
# - "openssl-1.1.1w"
# If you have more than one devenv you can merge them
#imports:
# - ./backend

docs/INSPIRATION.md Normal file

@@ -0,0 +1,813 @@
# Inspiration: Existing Lisp Linters, Formatters & Static Analysers
Survey of reference tools in `./refs/` — what they do, how they work, and what
we can steal for a Guile linter/formatter.
---
## Table of Contents
| Tool | Ecosystem | Type | Language |
|------|-----------|------|----------|
| [Eastwood](#eastwood) | Clojure | Linter (bug-finder) | Clojure/JVM |
| [fmt](#fmt) | Racket | Formatter | Racket |
| [Kibit](#kibit) | Clojure | Linter (idiom suggester) | Clojure |
| [Mallet](#mallet) | Common Lisp | Linter + formatter + fixer | Common Lisp |
| [OCICL Lint](#ocicl-lint) | Common Lisp | Linter + fixer | Common Lisp |
| [racket-review](#racket-review) | Racket | Linter | Racket |
| [SBLint](#sblint) | Common Lisp (SBCL) | Compiler-driven linter | Common Lisp |
---
## Eastwood
**Repo:** `refs/eastwood/` — Clojure linter (v1.4.3, by Jonas Enlund)
### What it does
A **bug-finding linter** for Clojure. Focuses on detecting actual errors
(wrong arity, undefined vars, misplaced docstrings) rather than enforcing style.
Achieves high accuracy by using the same compilation infrastructure as the
Clojure compiler itself.
### How it works
```
File discovery (tools.namespace)
→ Topological sort by :require/:use deps
→ For each namespace:
Parse → Macroexpand → AST (tools.analyzer.jvm) → eval
→ Run linter functions over AST nodes
→ Filter warnings by config
→ Report
```
Key: uses `tools.analyzer.jvm/analyze+eval` — it actually **compiles and
evaluates** source code to build an AST. This gives compiler-grade accuracy but
means it can only lint code that successfully loads.
### Architecture
- **`lint.clj`** — Central coordinator: linter registry, namespace ordering,
main analysis loop
- **`analyze-ns.clj`** — AST generation via tools.analyzer
- **`passes.clj`** — Custom analysis passes (reflection validation, def-name
propagation)
- **`linters/*.clj`** — Individual linter implementations (~8 files)
- **`reporting-callbacks.clj`** — Output formatters (multimethod dispatch)
- **`util.clj`** — Config loading, AST walking, warning filtering
### Rules (25+)
| Category | Examples |
|----------|----------|
| Arity | `:wrong-arity` — function called with wrong arg count |
| Definitions | `:def-in-def`, `:redefd-vars`, `:misplaced-docstrings` |
| Unused | `:unused-private-vars`, `:unused-fn-args`, `:unused-locals`, `:unused-namespaces` |
| Suspicious | `:constant-test`, `:suspicious-expression`, `:suspicious-test` |
| Style | `:unlimited-use`, `:non-dynamic-earmuffs`, `:local-shadows-var` |
| Interop | `:reflection`, `:boxed-math`, `:performance` |
| Types | `:wrong-tag`, `:deprecations` |
### Configuration
Rules are suppressed via **Clojure code** (not YAML/JSON):
```clojure
(disable-warning
{:linter :suspicious-expression
:for-macro 'clojure.core/let
:if-inside-macroexpansion-of #{'clojure.core/when-first}
:within-depth 6
:reason "False positive from when-first expansion"})
```
Builtin config files ship for `clojure.core`, contrib libs, and popular
third-party libraries. Users add their own via `:config-files` option.
### What we can learn
- **Macroexpansion-aware suppression** — Can distinguish user code from
macro-generated code; suppression rules can target specific macro expansions.
Critical for any Lisp linter.
- **Topological namespace ordering** — Analyse dependencies before dependents.
Relevant if we want cross-module analysis.
- **Linter registry pattern** — Each linter is a map `{:name :fn :enabled-by-default :url}`.
Simple, extensible.
- **Warning filtering pipeline** — Raw warnings → handle result → remove ignored
faults → remove excluded kinds → filter by config → final warnings. Clean
composable chain.
- **Metadata preservation through AST transforms** — Custom `postwalk` that
preserves metadata. Essential for accurate source locations.
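
A Guile analogue of that filtering chain could be as small as a fold over predicates (an invented sketch, not Eastwood's code):

```scheme
(use-modules (srfi srfi-1))

;; Each stage is a predicate deciding whether a warning survives;
;; composing them is just repeated filtering.
(define (filter-warnings warnings . stages)
  (fold (lambda (keep? ws) (filter keep? ws)) warnings stages))

;; e.g. (filter-warnings ws not-ignored? not-excluded-kind? enabled?)
```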
---
## fmt
**Repo:** `refs/fmt/` — Racket code formatter (v0.0.3, by Sorawee Porncharoenwase)
### What it does
An **extensible code formatter** for Racket. Reads source, reformats according
to style conventions using **cost-based optimal layout selection**. Supports
custom formatting rules via pluggable formatter maps.
### How it works
Clean **4-stage pipeline**:
```
Source string
→ [1] Tokenize (syntax-color/module-lexer)
→ [2] Read/Parse → tree of node/atom/wrapper structs
→ [3] Realign (fix sexp-comments, quotes)
→ [4] Pretty-print (pretty-expressive library, cost-based)
→ Formatted string
```
The pretty-printer uses the **Wadler/Leijen optimal layout algorithm** via the
`pretty-expressive` library. It evaluates multiple layout alternatives and
selects the one with the lowest cost vector.
### Architecture
- **`tokenize.rkt`** (72 lines) — Lexer wrapper around Racket's `syntax-color`
- **`read.rkt`** (135 lines) — Token stream → tree IR; preserves comments
- **`realign.rkt`** (75 lines) — Post-process sexp-comments and quote prefixes
- **`conventions.rkt`** (640 lines) — **All formatting rules** for 100+ Racket forms
- **`core.rkt`** (167 lines) — `define-pretty` DSL, AST-to-document conversion
- **`main.rkt`** (115 lines) — Public API, cost factory, entry point
- **`params.rkt`** (38 lines) — Configuration parameters (width, indent, etc.)
- **`raco.rkt`** (148 lines) — CLI interface (`raco fmt`)
### Formatting rules (100+)
Rules are organised by form type in `conventions.rkt`:
| Category | Forms |
|----------|-------|
| Control flow | `if`, `when`, `unless`, `cond`, `case-lambda` |
| Definitions | `define`, `define-syntax`, `lambda`, `define/contract` |
| Bindings | `let`, `let*`, `letrec`, `parameterize`, `with-handlers` |
| Loops | `for`, `for/list`, `for/fold`, `for/hash` (15+ variants) |
| Modules | `module`, `begin`, `class`, `interface` |
| Macros | `syntax-rules`, `match`, `syntax-parse`, `syntax-case` |
| Imports | `require`, `provide` — vertically stacked |
### Configuration
**Pluggable formatter maps** — a function `(string? → procedure?)`:
```racket
;; .fmt.rkt
(define (the-formatter-map s)
(case s
[("my-form") (format-uniform-body/helper 4)]
[else #f])) ; delegate to standard
```
Formatter maps compose via `compose-formatter-map` (chain of responsibility).
**Runtime parameters:**
| Parameter | Default | Purpose |
|-----------|---------|---------|
| `current-width` | 102 | Page width limit |
| `current-limit` | 120 | Computation width limit |
| `current-max-blank-lines` | 1 | Max consecutive blank lines |
| `current-indent` | 0 | Extra indentation |
### Cost-based layout selection
The pretty-printer evaluates layout alternatives using a **3-dimensional cost
vector** `[badness, height, characters]`:
- **Badness** — Quadratic penalty for exceeding page width
- **Height** — Number of lines used
- **Characters** — Total character count (tiebreaker)
This means the formatter provably selects the **optimal layout** within the
configured width, not just the first one that fits.
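The comparison itself is lexicographic, so any badness outranks extra height, which in turn outranks character count. An illustrative sketch (fmt relies on the `pretty-expressive` library, not code like this):

```scheme
;; a and b are cost vectors (badness height chars), compared left to right.
(define (cost<? a b)
  (cond ((null? a) #f)
        ((< (car a) (car b)) #t)
        ((> (car a) (car b)) #f)
        (else (cost<? (cdr a) (cdr b)))))

;; A layout that stays within the width beats a shorter one that overflows:
;; (cost<? '(0 3 40) '(1 2 35)) is true.
```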
### What we can learn
- **Cost-based layout is the gold standard** for formatter quality. Worth
investing in an optimal pretty-printer (Wadler/Leijen family) rather than
ad-hoc heuristics.
- **Staged pipeline** (tokenize → parse → realign → pretty-print) is clean,
testable, and easy to reason about. Each stage has well-defined I/O.
- **Form-specific formatting rules** (`define-pretty` DSL) — each Scheme
special form gets a dedicated formatter. Extensible via user-provided maps.
- **Comment preservation as metadata** — Comments are attached to AST nodes, not
discarded. Essential for a practical formatter.
- **Pattern-based extraction** — `match/extract` identifies which elements can
stay inline vs. must be on separate lines. Smart structural analysis.
- **Memoisation via weak hash tables** — Performance optimisation for AST
traversal without memory leaks.
- **Config file convention** — `.fmt.rkt` in project root, auto-discovered. gulie
  does the same with `.gulie.sexp`.
---
## Kibit
**Repo:** `refs/kibit/` — Clojure idiom suggester (v0.1.11, by Jonas Enlund)
### What it does
A **static code analyser** that identifies non-idiomatic Clojure code and
suggests more idiomatic replacements. Example: `(if x y nil)` → `(when x y)`.
Supports auto-replacement via `--replace` flag.
**Status:** Maintenance mode. The authors recommend **Splint** as its successor
(faster, more extensible).
### How it works
```
Source file
→ Parse with edamame (side-effect-free reader)
→ Extract S-expressions
→ Tree walk (depth-first via clojure.walk/prewalk)
→ Match each node against rules (core.logic unification)
→ Simplify (iterative rewriting until fixpoint)
→ Report or Replace (via rewrite-clj zippers)
```
The key insight: rules are expressed as **logic programming patterns** using
`clojure.core.logic`. Pattern variables (`?x`, `?y`) unify against arbitrary
subexpressions.
### Architecture
- **`core.clj`** (33 lines) — Core simplification logic (tiny!)
- **`check.clj`** (204 lines) — Public API for checking expressions/files
- **`check/reader.clj`** (189 lines) — Source parsing with alias tracking
- **`rules.clj`** (39 lines) — Rule aggregation and indexing
- **`rules/*.clj`** (~153 lines) — Rule definitions by category
- **`reporters.clj`** (59 lines) — Output formatters (text, markdown)
- **`replace.clj`** (134 lines) — Auto-replacement via rewrite-clj zippers
- **`driver.clj`** (144 lines) — CLI entry point, file discovery
Total: ~1,105 lines. Remarkably compact.
### Rules (~60)
Rules are defined via the `defrules` macro:
```clojure
(defrules rules
;; Control structures
[(if ?x ?y nil) (when ?x ?y)]
[(if ?x nil ?y) (when-not ?x ?y)]
[(if (not ?x) ?y ?z) (if-not ?x ?y ?z)]
[(do ?x) ?x]
;; Arithmetic
[(+ ?x 1) (inc ?x)]
[(- ?x 1) (dec ?x)]
;; Collections
[(not (empty? ?x)) (seq ?x)]
[(into [] ?coll) (vec ?coll)]
;; Equality
[(= ?x nil) (nil? ?x)]
[(= 0 ?x) (zero? ?x)])
```
Categories: **control structures**, **arithmetic**, **collections**,
**equality**, **miscellaneous** (string ops, Java interop, threading macros).
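The rule-as-data idea translates directly to Guile with `(ice-9 match)`. A minimal sketch with invented Scheme counterparts of Kibit's rules:

```scheme
(use-modules (ice-9 match))

;; Each clause is a pattern → rewrite pair, as in Kibit's defrules.
(define (simplify-once expr)
  (match expr
    (('if c e #f) `(when ,c ,e))
    (('+ x 1)     `(1+ ,x))
    (('- x 1)     `(1- ,x))
    (_ expr)))

;; Rewrite to fixpoint, as Kibit does. A real linter would also walk
;; subforms (Kibit uses prewalk); this only rewrites the top level.
(define (simplify expr)
  (let ((next (simplify-once expr)))
    (if (equal? next expr) expr (simplify next))))
```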
### Auto-replacement
Uses **rewrite-clj zippers** — functional tree navigation that preserves
whitespace, comments, and formatting when applying replacements. Navigate to the
target node, swap it, regenerate text.
### What we can learn
- **Logic programming for pattern matching** is beautifully expressive for
"suggest X instead of Y" rules. `core.logic` unification makes patterns
concise and bidirectional. We could use Guile's pattern matching or even a
miniKanren implementation.
- **Rule-as-data pattern** — Rules are just vectors `[pattern replacement]`.
Easy to add, easy to test, easy for users to contribute.
- **Iterative rewriting to fixpoint** — Apply rules until nothing changes.
Catches nested patterns that only become apparent after an inner rewrite.
- **Zipper-based source rewriting** — Preserves formatting/comments when
applying fixes. Critical for auto-fix functionality.
- **Side-effect-free parsing** — Using edamame instead of `clojure.core/read`
avoids executing reader macros. Important for security and for analysing code
with unknown dependencies.
- **Guard-based filtering** — Composable predicates that decide whether to
report a suggestion. Users can plug in custom guards.
- **Two resolution modes** — `:toplevel` (entire defn) vs `:subform` (individual
expressions). Different granularity for different use cases.
---
## Mallet
**Repo:** `refs/mallet/` — Common Lisp linter + formatter + fixer (~15,800 LOC)
### What it does
A **production-grade linter** for Common Lisp with 40+ rules across 7
categories, auto-fixing, a powerful configuration system (presets with
inheritance), and multiple suppression mechanisms. Targets SBCL.
### How it works
**Three-phase pipeline:**
```
File content
→ [1] Tokenize (hand-written tokenizer, preserves all tokens incl. comments)
→ [2] Parse (Eclector reader with parse-result protocol → forms with precise positions)
→ [3] Rule checking (text rules, token rules, form rules)
→ Suppression filtering
→ Auto-fix & formatting
→ Report
```
**Critical design decision:** Symbols are stored as **strings**, not interned.
This means the parser never needs to resolve packages — safe to analyse code
with unknown dependencies.
### Architecture
| Module | Lines | Purpose |
|--------|-------|---------|
| `main.lisp` | ~600 | CLI parsing, entry point |
| `engine.lisp` | ~900 | Linting orchestration, suppression filtering |
| `config.lisp` | ~1,200 | Config files, presets, path-specific overrides |
| `parser/reader.lisp` | ~800 | Eclector integration, position tracking |
| `parser/tokenizer.lisp` | ~200 | Hand-written tokenizer |
| `suppression.lisp` | ~600 | Suppression state management |
| `formatter.lisp` | ~400 | Output formatters (text, JSON, line) |
| `fixer.lisp` | ~300 | Auto-fix application |
| `rules/` | ~5,500 | 40+ individual rule implementations |
### Rules (40+)
| Category | Rules | Examples |
|----------|-------|----------|
| Correctness | 2 | `ecase` with `otherwise`, missing `otherwise` |
| Suspicious | 5 | Runtime `eval`, symbol interning, `ignore-errors` |
| Practice | 6 | Avoid `:use` in `defpackage`, one package per file |
| Cleanliness | 4 | Unused variables, unused loop vars, unused imports |
| Style | 5 | `when`/`unless` vs `if` without else, needless `let*` |
| Format | 6 | Line length, trailing whitespace, tabs, blank lines |
| Metrics | 3 | Function length, cyclomatic complexity, comment ratio |
| ASDF | 8 | Component strings, redundant prefixes, secondary systems |
| Naming | 4 | `*special*` and `+constant+` conventions |
| Documentation | 4 | Missing docstrings (functions, packages, variables) |
Rules are **classes** inheriting from a base `rule` class with generic methods:
```lisp
(defclass if-without-else-rule (base:rule)
()
(:default-initargs
:name :missing-else
:severity :warning
:category :style
:type :form))
(defmethod base:check-form ((rule if-without-else-rule) form file)
...)
```
### Configuration system
**Layered presets with inheritance:**
```lisp
(:mallet-config
(:extends :strict)
(:ignore "**/vendor/**")
(:enable :cyclomatic-complexity :max 15)
(:disable :function-length)
(:set-severity :metrics :info)
(:for-paths ("tests")
(:enable :line-length :max 120)
(:disable :unused-variables)))
```
Built-in presets: `:default`, `:strict`, `:all`, `:none`.
Precedence: CLI flags > config file > preset inheritance > built-in defaults.
### Suppression mechanisms (3 levels)
1. **Declarations** — `#+mallet (declaim (mallet:suppress-next :rule-name))`
2. **Inline comments** — `; mallet:suppress rule-name`
3. **Region-based** — `; mallet:disable rule-name` / `; mallet:enable rule-name`

Mallet also performs **stale suppression detection** — it warns when a
suppression no longer matches any violation.
### Auto-fix
Fixes are collected, sorted bottom-to-top (to preserve line numbers), and
applied in a single pass. Fix types: `:replace-line`, `:delete-range`,
`:delete-lines`, `:replace-form`.
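
The bottom-to-top trick is easy to see with line deletions. A sketch using an invented representation (each fix is just a line number to delete):

```scheme
(use-modules (srfi srfi-1))

;; Applying deletions from the last line upward means no deletion ever
;; shifts the line numbers of the fixes still pending.
(define (apply-deletions lines line-numbers)   ; lines: list of strings
  (fold (lambda (n ls)
          (append (list-head ls n) (list-tail ls (1+ n))))
        lines
        (sort line-numbers >)))
```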
### What we can learn
- **Symbols as strings** is a crucial insight for Lisp linters. Avoids
package/module resolution entirely. We should do the same for Guile — parse
symbols without interning them.
- **Eclector-style parse-result protocol** — Every sub-expression gets precise
line/column info. Invest in this early; it's the foundation of accurate error
reporting.
- **Three rule types** (text, token, form) — Clean separation. Text rules don't
need parsing, token rules don't need a full AST, form rules get the full tree.
Efficient and composable.
- **Preset inheritance with path-specific overrides** — Powerful configuration
that scales from solo projects to monorepos. `:for-paths` is particularly
useful (different rules for `src/` vs `tests/`).
- **Multiple suppression mechanisms** — Comment-based, declaration-based,
region-based. Users need all three for real-world use.
- **Stale suppression detection** — Prevents suppression comments from
accumulating after the underlying issue is fixed. Brilliant.
- **Rule metaclass pattern** — Base class + generic methods scales cleanly to
40+ rules. Each rule is self-contained with its own severity, category, and
check method.
- **Bottom-to-top fix application** — Simple trick that avoids line number
invalidation when applying multiple fixes to the same file.
---
## OCICL Lint
**Repo:** `refs/ocicl/` — Common Lisp linter (part of the OCICL package manager)
### What it does
A **129-rule linter with auto-fix** for Common Lisp, integrated into the OCICL
package manager as a subcommand (`ocicl lint`). Supports dry-run mode,
per-line suppression, and `.ocicl-lint.conf` configuration.
### How it works
**Three-pass analysis:**
```
File content
→ [Pass 1] Line-based rules (text-level: whitespace, tabs, line length)
→ [Pass 2] AST-based rules (via rewrite-cl zippers: naming, bindings, packages)
→ [Pass 3] Single-pass visitor rules (pattern matching: 50+ checks in one traversal)
→ Suppression filtering (per-line ; lint:suppress comments)
→ Auto-fix (via fixer registry)
→ Report
```
### Architecture
```
lint/
├── linter.lisp — Main orchestrator, issue aggregation, output formatting
├── config.lisp — .ocicl-lint.conf parsing
├── parsing.lisp — rewrite-cl wrapper (zipper API)
├── fixer.lisp — Auto-fix infrastructure with RCS/backup support
├── main.lisp — CLI entry point
├── rules/
│ ├── line-based.lisp — Text-level rules (9 rules)
│ ├── ast.lisp — AST-based rules (naming, lambda lists, bindings)
│ └── single-pass.lisp — Pattern matching rules (50+ in one walk)
└── fixes/
├── whitespace.lisp — Formatting fixes
└── style.lisp — Style rule fixes
```
### Rules (129)
| Category | Count | Examples |
|----------|-------|---------|
| Formatting | 9 | Trailing whitespace, tabs, line length, blank lines |
| File structure | 3 | SPDX headers, package declarations, reader errors |
| Naming | 6 | Underscores, `*special*` style, `+constant+` style, vague names |
| Boolean/conditionals | 18 | `(IF test T NIL)` → `test`, `(WHEN (NOT x) ...)` → `(UNLESS x ...)` |
| Logic simplification | 12 | Flatten nested `AND`/`OR`, redundant conditions |
| Arithmetic | 4 | `(+ x 1)` → `(1+ x)`, `(= x 0)` → `(zerop x)` |
| List operations | 13 | `FIRST`/`REST` vs `CAR`/`CDR`, `(cons x nil)` → `(list x)` |
| Comparison | 5 | `EQL` vs `EQ`, string equality, membership testing |
| Sequence operations | 6 | `-IF-NOT` variants, `ASSOC` patterns |
| Advanced/safety | 26 | Library suggestions, destructive ops on constants |
### Configuration
INI-style `.ocicl-lint.conf`:
```ini
max-line-length = 180
suppress-rules = rule1, rule2, rule3
suggest-libraries = alexandria, uiop, serapeum
```
Per-line suppression:
```lisp
(some-code) ; lint:suppress rule-name1 rule-name2
(other-code) ; lint:suppress ;; suppress ALL rules on this line
```
### Fixer registry
```lisp
(register-fixer "rule-name" #'fixer-function)
```
Fixers are decoupled from rule detection. Each fixer takes `(content issue)` and
returns modified content or NIL. Supports RCS backup before modification.
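
The registry pattern carries over to Guile almost verbatim (invented names; OCICL's fixers return NIL to decline, here rendered as `#f`):

```scheme
;; Detection emits issues; a separately registered fixer may rewrite
;; the file content for a given rule.
(define fixers (make-hash-table))

(define (register-fixer rule fn) (hash-set! fixers rule fn))

(define (apply-fix rule content issue)
  (let ((fn (hash-ref fixers rule)))
    ;; #f from the fixer (or no fixer at all) leaves content unchanged.
    (if fn (or (fn content issue) content) content)))
```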
### What we can learn
- **Single-pass visitor for pattern rules** — 50+ pattern checks in one tree
traversal. Much faster than running each rule separately. Good model for
performance-sensitive linting.
- **Quote awareness** — Detects quoted contexts (`'x`, `quote`, backtick) to
avoid false positives inside macro templates. We'll need the same for Guile.
- **Fixer registry pattern** — Decouples detection from fixing. Easy to add
auto-fix for a rule without touching the rule itself.
- **Library suggestion rules** — "You could use `(alexandria:when-let ...)`
instead of this pattern." Interesting category that could work for Guile
(SRFI suggestions, etc.).
- **Three-pass architecture** — Line-based first (fastest, no parsing needed),
then AST, then pattern matching. Each pass adds cost; skip what you don't need.
---
## racket-review
**Repo:** `refs/racket-review/` — Racket linter (v0.2, by Bogdan Popa)
### What it does
A **surface-level linter** for Racket modules. Intentionally does NOT expand
macros — analyses syntax only, optimised for **speed**. Designed for tight
editor integration (ships with Flycheck for Emacs).
### How it works
```
File → read-syntax (Racket's built-in reader)
→ Validate as module form (#lang)
→ Walk syntax tree via syntax-parse
→ Track scopes, bindings, provides, usages
→ Report problems
```
The entire rule system is built on Racket's `syntax/parse` — pattern matching
on syntax objects with guard conditions and side effects.
### Architecture
Remarkably compact:
| File | Lines | Purpose |
|------|-------|---------|
| `lint.rkt` | 1,130 | **All linting rules** + semantic tracking |
| `problem.rkt` | 26 | Problem data structure |
| `cli.rkt` | 25 | CLI interface |
| `ext.rkt` | 59 | Extension mechanism |
### Semantic tracking
Maintains multiple **parameter-based state machines**:
- **Scope stack** — Hierarchical scope with parent links, binding hash at each level
- **Binding info** — Per-identifier: syntax object, usage count, check flag,
related identifiers
- **Provide tracking** — What's explicitly `provide`d vs `all-defined-out`
- **Punted bindings** — Forward references resolved when definition is encountered
- **Savepoints** — Save/restore state for tentative matching in complex patterns
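The scope-stack idea maps directly onto Guile. A minimal sketch (hypothetical helper names; hash tables keyed by identifier, values are usage counts):

```scheme
;; A scope stack is a list of hash tables, innermost scope first.
;; Each binding maps name -> usage count; count 0 at scope exit = unused.
(define (push-scope scopes) (cons (make-hash-table) scopes))

(define (bind! scopes name)
  (hash-set! (car scopes) name 0))

(define (use! scopes name)
  ;; Walk outward until the name is found, then bump its count.
  (let loop ((ss scopes))
    (cond ((null? ss) 'unbound)  ; racket-review would punt this binding
          ((hash-ref (car ss) name)
           => (lambda (n) (hash-set! (car ss) name (+ n 1)) 'ok))
          (else (loop (cdr ss))))))

(define (unused-in-scope scopes)
  (hash-fold (lambda (name count acc)
               (if (zero? count) (cons name acc) acc))
             '() (car scopes)))
```

A forward reference surfaces here as `'unbound`; racket-review instead records it as a punted binding and resolves it when the definition is encountered.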
### Rules
**Errors (23 patterns):**
- Identifier already defined in same scope
- `if` missing else branch
- `let`/`for` missing body
- `case` clauses not quoted literals
- Wrong match fallthrough pattern (`_` not `else`)
- Provided but not defined
**Warnings (17+ patterns):**
- Identifier never used
- Brackets: `let` bindings should use `[]`, not `()`
- Requires not sorted (for-syntax first, then alphabetical)
- Cond without else clause
- Nested if (flatten to cond)
- `racket/contract` → use `racket/contract/base`
### Suppression
```racket
#|review: ignore|# ;; Ignore entire file
;; noqa ;; Ignore this line
;; review: ignore ;; Ignore this line
```
### Extension mechanism
Plugins register via Racket's package system:
```racket
(define review-exts
'((module-path predicate-proc lint-proc)))
```
Extensions receive a `current-reviewer` parameter with API:
`recur`, `track-error!`, `track-warning!`, `track-binding!`, `push-scope!`,
`pop-scope!`, `save!`, `undo!`.
### What we can learn
- **Surface-level analysis is fast and useful** — No macro expansion means
instant feedback. Catches the majority of real mistakes. Good default for
editor integration; deeper analysis can be opt-in.
- **syntax-parse as rule DSL** — Pattern matching on syntax objects is a natural
fit for Lisp linters. Guile has `syntax-case` and `match` which serve a
similar role.
- **Scope tracking with punted bindings** — Handles forward references in a
single pass. Elegant solution for `letrec`-style bindings and mutual recursion.
- **Savepoints for tentative matching** — Save/restore state when the parser
enters a complex branch. If the branch fails, roll back. Useful for `cond`,
`match`, etc.
- **Plugin API via reviewer parameter** — Extensions get a well-defined API
surface. Clean contract between core and plugins.
- **Snapshot-based testing** — 134 test files with `.rkt`/`.rkt.out` pairs.
Lint a file, compare output to expected. Simple, maintainable, high coverage.
- **Bracket style enforcement** — Racket uses `[]` for bindings, `()` for
application. Guile doesn't have this, but we could enforce consistent bracket
usage or other parenthesis conventions.
---
## SBLint
**Repo:** `refs/sblint/` — SBCL compiler-driven linter (~650 LOC)
### What it does
A **compiler-assisted linter** for Common Lisp. Doesn't implement its own rules —
instead, it **compiles code through SBCL** and surfaces all compiler diagnostics
(errors, warnings, style notes) with proper file locations.
### How it works
```
Source code
→ Resolve ASDF dependencies (topological sort)
→ Load dependencies via Quicklisp
→ Compile project via SBCL (handler-bind captures conditions)
→ Extract file/position from compiler internals (Swank protocol)
→ Convert byte offset → line:column
→ Deduplicate and report
```
No custom parser. No AST. Just the compiler.
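The offset-to-position step is small enough to sketch in Guile (hypothetical helper; character-based, so a faithful port would still need to account for the encoding when the compiler reports byte offsets):

```scheme
;; Convert a character offset into a 1-based line and 0-based column.
(define (offset->line/col text offset)
  (let loop ((i 0) (line 1) (col 0))
    (cond ((= i offset) (cons line col))
          ((char=? (string-ref text i) #\newline)
           (loop (+ i 1) (+ line 1) 0))
          (else (loop (+ i 1) line (+ col 1))))))

(offset->line/col "abc\ndef" 5)   ;; => (2 . 1)
```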
### Architecture
| File | Lines | Purpose |
|------|-------|---------|
| `run-lint.lisp` | 277 | Core logic: lint file/system/directory |
| `compiler-aux.lisp` | 33 | SBCL introspection bridge |
| `asdf.lisp` | 153 | Dependency resolution graph |
| `file-position.lisp` | 18 | Byte offset → line:column conversion |
| `quicklisp.lisp` | 41 | Auto-install missing dependencies |
| `sblint.ros` | — | CLI entry point (Roswell script) |
### What it catches
Whatever SBCL catches:
- Undefined variables and functions
- Type mismatches (with SBCL's type inference)
- Style warnings (ANSI compliance, naming)
- Reader/syntax errors
- Dead code paths
- Unused declarations
Filters out: redefinition warnings, Quicklisp dependency warnings, SBCL
contrib warnings.
### What we can learn
- **Leverage the host compiler** — Guile itself has `compile` and can produce
warnings. We should capture Guile's own compiler diagnostics (undefined
variables, unused imports, etc.) as a baseline — it's "free" accuracy.
- **Condition-based error collection** — CL's condition system (≈ Guile's
exception/handler system) lets you catch errors without stopping compilation.
`handler-bind` continues execution after catching. Guile's `with-exception-handler`
can do the same.
- **Dependency-aware compilation** — Load dependencies first, then compile
project. Catches "symbol not found" errors that surface-level analysis misses.
- **Deduplication** — Multiple compilation passes can report the same issue.
Hash table dedup is simple and effective.
- **Minimal is viable** — 650 LOC total. A compiler-driven linter layer could
be our first deliverable, augmented with custom rules later.
---
## Cross-cutting themes
### Parsing strategies
| Strategy | Used by | Pros | Cons |
|----------|---------|------|------|
| Host compiler | SBLint, Eastwood | Maximum accuracy, type checking | Requires loading code, slow |
| Custom reader with positions | Mallet, fmt | Full control, no side effects | Must maintain parser |
| Language's built-in reader | racket-review | Free, well-tested | May lack position info |
| Side-effect-free reader lib | Kibit (edamame) | Safe, preserves metadata | External dependency |
| Zipper-based AST | OCICL (rewrite-cl) | Preserves formatting for fixes | Complex API |
**For Guile:** We should explore whether Guile's reader (`ice-9/read.scm`)
provides sufficient source location info. If not, we need a custom reader, or a
wrapper that annotates read forms with positions. Guile's `read-syntax` (if
available) or source properties on read forms could be the answer.
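A quick probe of what the stock reader already records (sketch; `read-enable` switches position tracking on in case the default has it off):

```scheme
;; Check Guile's built-in source tracking on a read form.
(read-enable 'positions)

(define form
  (let ((port (open-input-string "(define (f x)\n  (+ x 1))")))
    (set-port-filename! port "demo.scm")
    (read port)))

;; For pairs, the reader attaches an alist of source properties,
;; e.g. ((filename . "demo.scm") (line . 0) (column . 0)).
;; Note: line and column are 0-based, unlike typical linter output.
(source-properties form)
```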
### Rule definition patterns
| Pattern | Used by | Character |
|---------|---------|-----------|
| Logic programming (unification) | Kibit | Elegant, concise; slow |
| OOP classes + generic methods | Mallet | Scales well, self-contained rules |
| Registry maps | Eastwood | Simple, data-driven |
| Syntax-parse patterns | racket-review, fmt | Natural for Lisps |
| Single-pass visitor | OCICL | High performance |
| Compiler conditions | SBLint | Zero-effort, limited scope |
**For Guile:** A combination seems right — `match`/`syntax-case` patterns for
the rule DSL (natural in Scheme), with a registry for rule metadata (name,
severity, category, enabled-by-default).
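For a feel of what that DSL could look like, two toy idiom checks in plain `(ice-9 match)` (illustrative patterns, not a committed rule set):

```scheme
(use-modules (ice-9 match))

;; Each clause pairs a source pattern with a suggested replacement.
(define (check-idiom form)
  (match form
    (('if test then)   `(suggest (when ,test ,then)))
    (('car ('cdr x))   `(suggest (cadr ,x)))
    (_ #f)))

(check-idiom '(car (cdr lst)))   ;; => (suggest (cadr lst))
(check-idiom '(+ 1 2))           ;; => #f
```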
### Configuration patterns
| Feature | Mallet | OCICL | Eastwood | Kibit | racket-review | fmt |
|---------|--------|-------|----------|-------|---------------|-----|
| Config file | `.mallet.lisp` | `.ocicl-lint.conf` | Clojure maps | `project.clj` | - | `.fmt.rkt` |
| Presets | Yes (4) | - | - | - | - | - |
| Preset inheritance | Yes | - | - | - | - | - |
| Path-specific rules | Yes | - | - | - | - | - |
| Inline suppression | Yes (3 mechanisms) | Yes | Yes | - | Yes | - |
| Stale suppression detection | Yes | - | - | - | - | - |
| CLI override | Yes | Yes | Yes | Yes | - | Yes |
**For Guile:** Mallet's configuration system is the most sophisticated and
worth emulating — presets, inheritance, path-specific overrides, and stale
suppression detection.
### Auto-fix patterns
| Tool | Fix mechanism | Preserves formatting? |
|------|--------------|----------------------|
| Kibit | rewrite-clj zippers | Yes |
| Mallet | Bottom-to-top line replacement | Partial |
| OCICL | Fixer registry + zipper AST | Yes |
**For Guile:** Zipper-based AST manipulation (or Guile's SXML tools) for
formatting-preserving fixes. The fixer registry pattern (OCICL) keeps rule
detection and fixing decoupled.
### Output formats
All tools support at minimum: `file:line:column: severity: message`
Additional formats: JSON (Mallet), Markdown (Kibit), line-only for CI (Mallet).
---
## Feature wishlist for gulie
Based on this survey, the features worth cherry-picking:
### Must-have (core)
1. **Guile compiler diagnostics** — Capture Guile's own warnings as baseline (SBLint approach)
2. **Custom reader with source positions** — Every form, subform, and token gets line:column
3. **Staged pipeline** — Text rules → token rules → form rules (Mallet/OCICL)
4. **Pattern-based rule DSL** — Using Guile's `match` or `syntax-case` (Kibit/racket-review inspiration)
5. **Rule registry** — `{name, severity, category, enabled-by-default, check-fn}` (Eastwood)
6. **Standard output format** — `file:line:column: severity: rule: message`
7. **Inline suppression** — `; gulie:suppress rule-name` (Mallet/OCICL)
### Should-have (v1)
8. **Config file** — `.gulie.sexp` with presets and rule enable/disable (Mallet)
9. **Auto-fix infrastructure** — Fixer registry, bottom-to-top application (OCICL/Mallet)
10. **Idiom suggestions** — Pattern → replacement rules (Kibit style)
11. **Unused binding detection** — Scope tracking with forward reference handling (racket-review)
12. **Quote/unquote awareness** — Don't lint inside quoted forms (OCICL)
13. **Snapshot-based testing** — `.scm`/`.expected` pairs (racket-review)
### Nice-to-have (v2+)
14. **Code formatter** — Cost-based optimal layout (fmt)
15. **Pluggable formatter maps** — Per-form formatting rules (fmt)
16. **Path-specific rule overrides** — Different rules for `src/` vs `tests/` (Mallet)
17. **Stale suppression detection** (Mallet)
18. **Editor integration** — Flycheck/flymake for Emacs (racket-review)
19. **Macroexpansion-aware analysis** — Suppress false positives from macro output (Eastwood)
20. **Cyclomatic complexity and other metrics** (Mallet)

docs/PLAN.md Normal file
# Gulie — Guile Linter/Formatter: Architecture & Implementation Plan
## Context
No linter, formatter, or static analyser exists for Guile Scheme. We're building
one from scratch, called **gulie**. The tool is written in Guile itself, reusing
as much of Guile's infrastructure as possible (reader, compiler, Tree-IL
analyses, warning system). The design draws on patterns observed in 7 reference
tools (see `docs/INSPIRATION.md`).
Guile 3.0.11 is available in the devenv. No source code exists yet.
---
## High-level architecture
Two independent passes, extensible to three:
```
.gulie.sexp (config)
|
file.scm ──┬──> [Tokenizer] ──> tokens ──> [CST parser] ──> CST
| |
| [Pass 1: Surface] line rules + CST rules
| |
| diagnostics-1
|
└──> [Guile reader] ──> s-exprs ──> [Guile compiler] ──> Tree-IL
|
[Pass 2: Semantic] built-in analyses + custom Tree-IL rules
|
diagnostics-2
|
[merge + suppress + sort + report/fix]
```
**Why two passes?** Guile's reader (`ice-9/read.scm:949-973`) irrecoverably
strips comments, whitespace, and datum comments in `next-non-whitespace`. There
is no way to get formatting info AND semantic info from one parse. Accepting this
and building two clean, independent passes is simpler than fighting the reader.
---
## Module structure
```
gulie/
bin/gulie # CLI entry point (executable Guile script)
gulie/
cli.scm # (gulie cli) — arg parsing, dispatch
config.scm # (gulie config) — .gulie.sexp loading, defaults, merging
diagnostic.scm # (gulie diagnostic) — record type, sorting, formatting
tokenizer.scm # (gulie tokenizer) — hand-written lexer, preserves everything
cst.scm # (gulie cst) — token stream → concrete syntax tree
compiler.scm # (gulie compiler) — Guile compile wrapper, warning capture
rule.scm # (gulie rule) — rule record, registry, define-rule macros
engine.scm # (gulie engine) — orchestrator: file discovery, pass sequencing
fixer.scm # (gulie fixer) — fix application (bottom-to-top edits)
suppression.scm # (gulie suppression) — ; gulie:suppress parsing/filtering
formatter.scm # (gulie formatter) — cost-based optimal pretty-printer
rules/
surface.scm # (gulie rules surface) — trailing-ws, line-length, tabs, blanks
indentation.scm # (gulie rules indentation) — indent checking vs CST
comments.scm # (gulie rules comments) — comment style conventions
semantic.scm # (gulie rules semantic) — wrappers around Guile's analyses
idiom.scm # (gulie rules idiom) — pattern-based suggestions via match
module-form.scm # (gulie rules module-form) — define-module checks
test/
test-tokenizer.scm
test-cst.scm
test-rules-surface.scm
test-rules-semantic.scm
fixtures/
clean/ # .scm files producing zero diagnostics
violations/ # .scm + .expected pairs (snapshot testing)
```
~16 source files. Each has one clear job.
---
## Key components
### Tokenizer (`gulie/tokenizer.scm`)
Hand-written character-by-character state machine. Must handle the same lexical
syntax as Guile's reader but **preserve** what the reader discards.
```scheme
(define-record-type <token>
(make-token type text line column)
token?
(type token-type) ;; symbol (see list below)
(text token-text) ;; string: exact source text
(line token-line) ;; integer: 1-based
(column token-column)) ;; integer: 0-based
```
Token types (~15): `open-paren`, `close-paren`, `symbol`, `number`, `string`,
`keyword`, `boolean`, `character`, `prefix` (`'`, `` ` ``, `,`, `,@`, `#'`,
etc.), `special` (`#;`, `#(`, `#vu8(`, etc.), `line-comment`, `block-comment`,
`whitespace`, `newline`, `dot`.
**Critical invariant:** `(string-concatenate (map token-text (tokenize input)))` must
reproduce the original input exactly. This is our primary roundtrip test.
Estimated size: ~200-250 lines. Reference: Mallet's tokenizer (163 lines CL).
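Once `tokenize` and `token-text` exist, the invariant is a one-assertion SRFI-64 test (sketch):

```scheme
(use-modules (srfi srfi-64))

(test-begin "tokenizer-roundtrip")
(let ((input "(define x ; comment\n  42)\n"))
  (test-equal input
    (string-concatenate (map token-text (tokenize input)))))
(test-end "tokenizer-roundtrip")
```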
### CST (`gulie/cst.scm`)
Trivial parenthesised tree built from the token stream:
```scheme
(define-record-type <cst-node>
(make-cst-node open close children)
cst-node?
(open cst-node-open) ;; <token> for ( [ {
(close cst-node-close) ;; <token> for ) ] }
(children cst-node-children)) ;; list of <cst-node> | <token>
```
Children is a flat list of interleaved atoms (tokens) and nested nodes. Comments
and whitespace are children like anything else.
The first non-whitespace symbol child of a `<cst-node>` identifies the form
(`define`, `let`, `cond`, etc.) — enough for indentation rules.
Estimated size: ~80-100 lines.
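Form identification then reduces to a small helper over the `<token>` accessors (sketch):

```scheme
;; Trivia tokens carry no structure; skip them when looking for the head.
(define (trivia? tok)
  (memq (token-type tok)
        '(whitespace newline line-comment block-comment)))

;; Return the symbol naming the form, or #f when the head is not a
;; symbol token, e.g. ((lambda (x) x) 1).
(define (cst-node-head-symbol node)
  (let loop ((children (cst-node-children node)))
    (cond ((null? children) #f)
          ((and (token? (car children)) (trivia? (car children)))
           (loop (cdr children)))
          ((and (token? (car children))
                (eq? (token-type (car children)) 'symbol))
           (string->symbol (token-text (car children))))
          (else #f))))
```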
### Compiler wrapper (`gulie/compiler.scm`)
Wraps Guile's compile pipeline to capture warnings as structured diagnostics:
```scheme
;; Key Guile APIs we delegate to:
;; - (system base compile): read-and-compile, compile, default-warning-level
;; - (language tree-il analyze): make-analyzer, analyze-tree
;; - (system base message): %warning-types, current-warning-port
```
Strategy: call `read-and-compile` with `#:to 'tree-il` and `#:warning-level 2`
while redirecting `current-warning-port` to a string port, then parse the
warning output into `<diagnostic>` records. Alternatively, invoke `make-analyzer`
directly and hook the warning printers.
Guile's built-in analyses (all free):
- `unused-variable-analysis`
- `unused-toplevel-analysis`
- `unused-module-analysis`
- `shadowed-toplevel-analysis`
- `make-use-before-definition-analysis` (unbound variables)
- `arity-analysis` (wrong arg count)
- `format-analysis` (format string validation)
### Rule system (`gulie/rule.scm`)
```scheme
(define-record-type <rule>
(make-rule name description severity category type check-proc fix-proc)
rule?
(name rule-name) ;; symbol
(description rule-description) ;; string
(severity rule-severity) ;; 'error | 'warning | 'info
(category rule-category) ;; 'format | 'style | 'correctness | 'idiom
(type rule-type) ;; 'line | 'cst | 'tree-il
(check-proc rule-check-proc) ;; procedure (signature depends on type)
(fix-proc rule-fix-proc)) ;; procedure | #f
```
Three rule types with different check signatures:
- **`'line`** — `(lambda (file line-num line-text config) -> diagnostics)` — fastest, no parsing
- **`'cst`** — `(lambda (file cst config) -> diagnostics)` — needs tokenizer+CST
- **`'tree-il`** — `(lambda (file tree-il env config) -> diagnostics)` — needs compilation
Global registry: `*rules*` alist, populated at module load time via
`register-rule!`. Convenience macros: `define-line-rule`, `define-cst-rule`,
`define-tree-il-rule`.
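For concreteness, `trailing-whitespace` registered as a `'line` rule with the records above (sketch; the `define-line-rule` sugar would hide this boilerplate):

```scheme
(register-rule!
 (make-rule 'trailing-whitespace
            "trailing whitespace at end of line"
            'warning 'format 'line
            (lambda (file line-num line-text config)
              (let ((trimmed (string-trim-right line-text)))
                (if (string=? trimmed line-text)
                    '()
                    (list (make-diagnostic file line-num
                                           (string-length trimmed)
                                           'warning 'trailing-whitespace
                                           "trailing whitespace" #f)))))
            #f))  ; no fix-proc yet
```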
### Diagnostic record (`gulie/diagnostic.scm`)
```scheme
(define-record-type <diagnostic>
(make-diagnostic file line column severity rule message fix)
diagnostic?
(file diagnostic-file) ;; string
(line diagnostic-line) ;; integer, 1-based
(column diagnostic-column) ;; integer, 0-based
(severity diagnostic-severity) ;; symbol
(rule diagnostic-rule) ;; symbol
(message diagnostic-message) ;; string
(fix diagnostic-fix)) ;; <fix> | #f
```
Standard output: `file:line:column: severity: rule: message`
### Config (`gulie/config.scm`)
File: `.gulie.sexp` in project root (plain s-expression, read with `(read)`,
never evaluated):
```scheme
((line-length . 80)
(indent . 2)
(enable trailing-whitespace line-length unused-variable arity-mismatch)
(disable tabs)
(rules
(line-length (max . 100)))
(indent-rules
(with-syntax . 1)
(match . 1))
(ignore "build/**" ".direnv/**"))
```
Precedence: CLI flags > config file > built-in defaults.
`--init` generates a template with all rules listed and commented.
### Suppression (`gulie/suppression.scm`)
```scheme
;; gulie:suppress trailing-whitespace — suppress on next line
(define x "messy")
(define x "messy") ; gulie:suppress — suppress on this line
;; gulie:disable line-length — region disable
... code ...
;; gulie:enable line-length — region enable
```
Parsed from raw text before rules run. Produces a suppression map that filters
diagnostics after all rules have emitted.
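Extracting a directive from a single line is a one-regexp job (sketch; the real module additionally tracks `disable`/`enable` regions across lines):

```scheme
(use-modules (ice-9 regex))

;; A bare "gulie:suppress" suppresses everything on the line;
;; an optional trailing rule name narrows it to that rule.
(define suppress-re
  (make-regexp ";+ *gulie:suppress( +([a-z0-9-]+))?"))

(define (line-suppression line)
  (let ((m (regexp-exec suppress-re line)))
    (and m
         (if (match:substring m 2)
             (string->symbol (match:substring m 2))
             'all))))

(line-suppression "(define x 1) ; gulie:suppress line-length")
;; => line-length
```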
---
## Indentation rules
The key data is `scheme-indent-function` values from `.dir-locals.el` — an
integer N meaning "N arguments on first line, then body indented +2":
```scheme
(define *default-indent-rules*
'((define . 1) (define* . 1) (define-public . 1) (define-syntax . 1)
(define-module . 0) (lambda . 1) (lambda* . 1)
(let . 1) (let* . 1) (letrec . 1) (letrec* . 1)
(if . #f) (cond . 0) (case . 1) (when . 1) (unless . 1)
(match . 1) (syntax-case . 2) (with-syntax . 1)
(begin . 0) (do . 2) (parameterize . 1) (guard . 1)))
```
Overridable via config `indent-rules`. The indentation checker walks the CST,
identifies the form by its first symbol child, looks up the rule, and compares
actual indentation to expected.
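The expected-column computation is then simple arithmetic (a deliberate simplification: arguments still on the opening line, and alignment styles other than "under the first argument", are ignored here):

```scheme
;; RULE is the integer from the table above; #f means "align with the
;; first argument", as for `if`.
(define (expected-indent open-column head-length arg-index rule indent-width)
  (if (or (not rule) (< arg-index rule))
      (+ open-column 1 head-length 1)  ; under the first argument
      (+ open-column indent-width)))   ; body argument: fixed small indent

;; (let ((x 1))   ; opens at column 0, head "let" (3 chars), rule 1
;;   body)        ; argument 1 is body, so expected column is 0 + 2
(expected-indent 0 3 1 1 2)   ;; => 2
```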
---
## Formatting conventions (Guile vs Guix)
Both use 2-space indent, same special-form conventions. Key difference:
- **Guile:** 72-char fill column, `;;; {Section}` headers
- **Guix:** 78-80 char fill column, `;;` headers
Our default config targets Guile conventions. A Guix preset can override
`line-length` and comment style.
---
## Formatter: cost-based optimal pretty-printing
The formatter (`gulie/formatter.scm`) is a later-phase component that
**rewrites** files with correct layout, as opposed to the indentation checker
which merely **reports** violations.
### Why cost-based?
When deciding where to break lines in a long expression, there are often multiple
valid options. A greedy approach (fill as much as fits, then break) produces
mediocre output — it can't "look ahead" to see that a break earlier would produce
a better overall layout. The Wadler/Leijen family of algorithms evaluates
alternative layouts and selects the optimal one.
### The algorithm (Wadler/Leijen, as used by fmt's `pretty-expressive`)
The pretty-printer works with an abstract **document** type:
```
doc = text(string) — literal text
| line — line break (or space if flattened)
| nest(n, doc) — increase indent by n
| concat(doc, doc) — concatenation
| alt(doc, doc) — choose better of two layouts
| group(doc) — try flat first, break if doesn't fit
```
The key operator is `alt(a, b)` — "try layout A, but if it overflows the page
width, use layout B instead." The algorithm evaluates both alternatives and
picks the one with the lower **cost vector**:
```
cost = [badness, height, characters]
badness — quadratic penalty for exceeding page width
height — number of lines used
characters — total chars (tiebreaker)
```
This produces provably optimal output: the layout that minimises overflow while
using the fewest lines.
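Cost comparison itself is lexicographic over that vector (sketch):

```scheme
;; Compare cost vectors (badness height characters) lexicographically.
(define (cost<? a b)
  (cond ((or (null? a) (null? b)) #f)
        ((< (car a) (car b)) #t)
        ((> (car a) (car b)) #f)
        (else (cost<? (cdr a) (cdr b)))))

;; Staying within the page width beats a shorter overflowing layout:
(cost<? '(0 5 120) '(9 2 80))   ;; => #t
```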
### How it fits our architecture
```
CST (from tokenizer + cst.scm)
→ [doc generator] convert CST nodes to abstract doc, using form-specific rules
→ [layout solver] evaluate alternatives, select optimal layout
→ [renderer] emit formatted text with comments preserved
```
The **doc generator** uses the same form-identification logic as the indentation
checker (first symbol child of a CST node) to apply form-specific layout rules.
For example:
- `define` — name on first line, body indented
- `let` — bindings as aligned block, body indented
- `cond` — each clause on its own line
These rules are data (the `indent-rules` table extended with layout hints),
making the formatter configurable just like the checker.
### Implementation approach
We can either:
1. **Port `pretty-expressive`** from Racket — the core algorithm is ~300 lines,
well-documented in academic papers
2. **Upgrade Guile's `(ice-9 pretty-print)`** — it already knows form-specific
indentation rules but uses greedy layout; we'd replace the layout engine with
cost-based selection
Option 1 is cleaner (purpose-built). Option 2 reuses more existing code but
would be a heavier modification. We'll decide when we reach that phase.
### Phase note
The formatter is **Phase 6** work. Phases 0-4 deliver a useful checker without
it. The indentation checker (Phase 4) validates existing formatting; the
formatter (Phase 6) rewrites it. The checker comes first because it's simpler
and immediately useful in CI.
---
## CLI interface
```
gulie [OPTIONS] [FILE|DIR...]
--check Report issues, exit non-zero on findings (default)
--fix Fix mode: auto-fix what's possible, report the rest
--format Format mode: rewrite files with optimal layout
--init Generate .gulie.sexp template
--pass PASS Run only: surface, semantic, all (default: all)
--rule RULE Enable only this rule (repeatable)
--disable RULE Disable this rule (repeatable)
--severity SEV Minimum severity: error, warning, info
--output FORMAT Output: standard (default), json, compact
--config FILE Config file path (default: auto-discover)
--list-rules List all rules and exit
--version Print version
```
Exit codes: 0 = clean, 1 = findings, 2 = config error, 3 = internal error.
---
## Implementation phases
### Phase 0: Skeleton
- `bin/gulie` — shebang script, loads CLI module
- `(gulie cli)` — basic arg parsing (`--check`, `--version`, file args)
- `(gulie diagnostic)` — record type + standard formatter
- `(gulie rule)` — record type + registry + `register-rule!`
- `(gulie engine)` — discovers `.scm` files, runs line rules, reports
- One trivial rule: `trailing-whitespace` (line rule)
- **Verification:** `gulie --check some-file.scm` reports trailing whitespace
### Phase 1: Tokenizer + CST + surface rules
- `(gulie tokenizer)` — hand-written lexer
- `(gulie cst)` — token → tree
- Surface rules: `trailing-whitespace`, `line-length`, `no-tabs`, `blank-lines`
- Comment rule: `comment-semicolons` (check `;`/`;;`/`;;;` usage)
- Roundtrip test: tokenize → concat = original
- Snapshot tests for each rule
### Phase 2: Semantic rules (compiler pass)
- `(gulie compiler)` — `read-and-compile` wrapper, warning capture
- Semantic rules wrapping Guile's built-in analyses:
`unused-variable`, `unused-toplevel`, `unbound-variable`, `arity-mismatch`,
`format-string`, `shadowed-toplevel`, `unused-module`
- **Verification:** run against Guile and Guix source files, check false-positive rate
### Phase 3: Config + suppression
- `(gulie config)` — `.gulie.sexp` loading + merging
- `(gulie suppression)` — inline comment suppression
- `--init` command
- Rule enable/disable via config and CLI
### Phase 4: Indentation checking
- `(gulie rules indentation)` — CST-based indent checker
- Default indent rules for standard Guile forms
- Configurable `indent-rules` in `.gulie.sexp`
### Phase 5: Fix mode + idiom rules
- `(gulie fixer)` — bottom-to-top edit application
- Auto-fix for: trailing whitespace, line-length (where possible)
- `(gulie rules idiom)` — `match`-based pattern suggestions on Tree-IL
- `(gulie rules module-form)` — `define-module` form checks (sorted imports, etc.)
### Phase 6: Formatter (cost-based optimal layout)
- `(gulie formatter)` — Wadler/Leijen pretty-printer with cost-based selection
- Abstract document type: `text`, `line`, `nest`, `concat`, `alt`, `group`
- Form-specific layout rules (reuse indent-rules table + layout hints)
- Comment preservation through formatting
- `--format` CLI mode
- **Verification:** format Guile/Guix source files, diff against originals,
verify roundtrip stability (format twice = same output)
### Phase 7: Cross-module analysis (future)
- Load multiple modules, walk dependency graph
- Unused exports, cross-module arity checks
- `--pass cross-module` CLI option
---
## Testing strategy
1. **Roundtrip test** (tokenizer): tokenize → concat must equal original input
2. **Snapshot tests**: `fixtures/violations/rule-name.scm` + `.expected` pairs
3. **Clean file tests**: `fixtures/clean/*.scm` must produce zero diagnostics
4. **Unit tests**: `(srfi srfi-64)` for tokenizer, CST, config, diagnostics
5. **Real-world corpus**: run against `test/guix/` and `refs/guile/module/` for
false-positive rate validation
6. **Formatter idempotency**: `format(format(x)) = format(x)` for all test files
---
## Key design decisions
| Decision | Rationale |
|----------|-----------|
| Hand-written tokenizer, not extending Guile's reader | The reader is ~1000 lines of nested closures not designed for extension. A clean 200-line tokenizer is easier to write/test. |
| Two independent passes, not a unified AST | Reader strips comments irrecoverably. Accepting this gives clean separation. |
| Delegate to Guile's built-in analyses | They're battle-tested, handle macroexpansion edge cases, and are maintained upstream. |
| `(ice-9 match)` for idiom rules, not logic programming | Built-in, fast, sufficient. miniKanren can be added later if needed. |
| S-expression config, not YAML/TOML | Zero deps. Our users write Scheme. `(read)` does the parsing. |
| Flat CST (parens + interleaved tokens), not rich AST | Enough for indentation/formatting checks. No overengineering. |
| Cost-based optimal layout for the formatter | Greedy formatters produce mediocre output. Wadler/Leijen is cleaner and provably correct. Worth the investment when we reach that phase. |
| Checker first, formatter later | Checking is simpler, immediately useful in CI, and validates the tokenizer/CST infrastructure that the formatter will build on. |
---
## Critical files to reference during implementation
- `refs/guile/module/ice-9/read.scm:949-973` — what the reader discards (our tokenizer must keep)
- `refs/guile/module/language/tree-il/analyze.scm:1461-1479` — `make-analyzer` API
- `refs/guile/module/system/base/compile.scm:298-340` — `read-and-compile` / `compile`
- `refs/guile/module/system/base/message.scm:83-220` — `%warning-types` definitions
- `refs/guile/module/language/tree-il.scm` — Tree-IL node types and traversal
- `refs/guile/module/ice-9/pretty-print.scm` — existing pretty-printer (form-specific rules to extract)
- `refs/mallet/src/parser/tokenizer.lisp` — reference tokenizer (163 lines)
- `refs/fmt/conventions.rkt` — form-specific formatting rules (100+ forms)
- `refs/fmt/main.rkt` — cost-based layout selection implementation

gulie/cli.scm Normal file
;;; (gulie cli) — command-line argument parsing and main dispatch
(define-module (gulie cli)
#:use-module (ice-9 getopt-long)
#:use-module (ice-9 format)
#:use-module (srfi srfi-1)
#:use-module (gulie config)
#:use-module (gulie engine)
#:use-module (gulie rule)
#:use-module (gulie diagnostic)
#:export (main))
(define version "0.1.0")
(define option-spec
'((help (single-char #\h) (value #f))
(version (single-char #\v) (value #f))
(check (value #f))
(fix (value #f))
(init (value #f))
(pass (value #t))
(config (value #t))
(rule (value #t))
(disable (value #t))
(severity (value #t))
(output (value #t))
(list-rules (value #f))))
(define (show-help)
(display "gulie — a linter and formatter for Guile Scheme\n\n")
(display "Usage: gulie [OPTIONS] [FILE|DIR...]\n\n")
(display "Options:\n")
(display " -h, --help Show this help message\n")
(display " -v, --version Print version\n")
(display " --check Check mode (default): report issues\n")
(display " --fix Fix mode: auto-fix what's possible\n")
(display " --init Generate .gulie.sexp template\n")
(display " --pass PASS Run only: surface, semantic, all (default: all)\n")
(display " --config FILE Config file path\n")
(display " --rule RULE Enable only this rule (repeatable)\n")
(display " --disable RULE Disable this rule\n")
(display " --severity SEV Minimum severity: error, warning, info\n")
(display " --output FORMAT Output format: standard, json, compact\n")
(display " --list-rules List all available rules\n"))
(define (show-version)
(format #t "gulie ~a~%" version))
(define (list-all-rules)
(let ((rules (all-rules)))
(if (null? rules)
(display "No rules registered.\n")
(for-each
(lambda (r)
(format #t " ~20a ~8a ~10a ~a~%"
(rule-name r)
(rule-severity r)
(rule-category r)
(rule-description r)))
(sort rules (lambda (a b)
(string<? (symbol->string (rule-name a))
(symbol->string (rule-name b)))))))))
(define (main args)
(let* ((options (getopt-long args option-spec))
(rest (option-ref options '() '())))
(cond
((option-ref options 'help #f)
(show-help)
0)
((option-ref options 'version #f)
(show-version)
0)
((option-ref options 'list-rules #f)
;; Ensure rules are loaded
(list-all-rules)
0)
((option-ref options 'init #f)
(let ((path ".gulie.sexp"))
(if (file-exists? path)
(begin
(format (current-error-port) "~a already exists~%" path)
2)
(begin
(call-with-output-file path generate-template)
(format #t "Generated ~a~%" path)
0))))
(else
(let* ((config-path (option-ref options 'config #f))
(user-config (load-config config-path))
(config (merge-configs default-config user-config))
(paths (if (null? rest) (list ".") rest))
(ignore-pats (config-ignore-patterns config))
(files (discover-scheme-files paths ignore-pats)))
(if (null? files)
(begin
(display "No Scheme files found.\n" (current-error-port))
0)
(let ((count (lint-files files config)))
(if (> count 0) 1 0))))))))

gulie/compiler.scm Normal file
;;; (gulie compiler) — Guile compiler wrapper for semantic analysis
;;;
;;; Wraps Guile's compile pipeline to capture compiler warnings
;;; (unused variables, arity mismatches, format string errors, etc.)
;;; as structured <diagnostic> records.
(define-module (gulie compiler)
#:use-module (system base compile)
#:use-module (system base message)
#:use-module (ice-9 regex)
#:use-module (ice-9 match)
#:use-module (srfi srfi-1)
#:use-module (gulie diagnostic)
#:export (compile-and-capture-warnings))
;; Regex to parse Guile's warning format:
;; ;;; file:line:column: warning: message
(define *warning-re*
(make-regexp "^;;; ([^:]+):([0-9]+):([0-9]+): warning: (.+)$"))
(define (parse-warning-line text file)
"Parse a warning line from Guile's compiler output into a <diagnostic>."
(let ((m (regexp-exec *warning-re* text)))
(if m
(let ((wfile (match:substring m 1))
(wline (string->number (match:substring m 2)))
(wcol (string->number (match:substring m 3)))
(wmsg (match:substring m 4)))
(make-diagnostic
(if (string=? wfile "<unknown-location>") file wfile)
wline wcol
'warning
(classify-warning wmsg)
wmsg
#f))
#f)))
(define (classify-warning msg)
"Derive a rule name symbol from a warning message."
(cond
((string-contains msg "unused variable") 'unused-variable)
((string-contains msg "unused local top-level") 'unused-toplevel)
((string-contains msg "unused module") 'unused-module)
((string-contains msg "shadows previous") 'shadowed-toplevel)
((string-contains msg "unbound variable") 'unbound-variable)
((string-contains msg "wrong number of arguments") 'arity-mismatch)
((string-contains msg "used before definition") 'use-before-definition)
((string-contains msg "macro") 'macro-use-before-definition)
((string-contains msg "format") 'format-string)
((string-contains msg "non-idempotent") 'non-idempotent-definition)
((string-contains msg "duplicate datum") 'duplicate-case-datum)
((string-contains msg "cannot be meaningfully") 'bad-case-datum)
(else 'compiler-warning)))
(define (compile-and-capture-warnings file text config)
"Compile TEXT (as if from FILE) and capture all compiler warnings.
Returns a list of <diagnostic> records."
(let* ((warning-output (open-output-string))
(diagnostics '()))
(parameterize ((current-warning-port warning-output))
(catch #t
(lambda ()
(let ((port (open-input-string text)))
(set-port-filename! port file)
;; Compile to CPS — analyses run during tree-il→CPS lowering
(read-and-compile port
#:from 'scheme
#:to 'cps
#:warning-level 3
#:env (make-fresh-user-module))))
(lambda (key . args)
;; Compilation errors become diagnostics too
(let ((msg (call-with-output-string
(lambda (p)
(display key p)
(display ": " p)
(for-each (lambda (a) (display a p) (display " " p))
args)))))
(set! diagnostics
(cons (make-diagnostic file 1 0 'error 'compile-error msg #f)
diagnostics))))))
;; Parse captured warnings
(let* ((output (get-output-string warning-output))
(lines (string-split output #\newline)))
(for-each
(lambda (line)
(when (> (string-length line) 0)
(let ((diag (parse-warning-line line file)))
(when diag
(set! diagnostics (cons diag diagnostics))))))
lines))
diagnostics))
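;; Sketch of intended use (the source text is illustrative; the config
;; argument is currently unused by this function):
;;   (compile-and-capture-warnings
;;    "example.scm"
;;    "(define (f x) (+ x y))"   ;; y is unbound
;;    '())
;;   => a list of <diagnostic> records, one per captured warning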

gulie/config.scm Normal file

@@ -0,0 +1,129 @@
;;; (gulie config) — configuration loading and merging
;;;
;;; Reads .gulie.sexp from project root (or CLI-specified path),
;;; merges with built-in defaults, and provides config accessors.
(define-module (gulie config)
#:use-module (ice-9 rdelim)
#:use-module (srfi srfi-1)
  #:export (default-config
            load-config
            merge-configs
            find-config-file
            config-ref
            config-line-length
            config-indent-width
            config-max-blank-lines
            config-enabled-rules
            config-disabled-rules
            config-ignore-patterns
            config-indent-rules
            generate-template))
(define default-config
'((line-length . 80)
(indent . 2)
(max-blank-lines . 2)
(enable . ())
(disable . ())
(ignore . ())
(indent-rules
(define . 1)
(define* . 1)
(define-public . 1)
(define-syntax . 1)
(define-syntax-rule . 1)
(define-module . 0)
(define-record-type . 1)
(lambda . 1)
(lambda* . 1)
(let . 1)
(let* . 1)
(letrec . 1)
(letrec* . 1)
(let-values . 1)
(if . special)
(cond . 0)
(case . 1)
(when . 1)
(unless . 1)
(match . 1)
(match-lambda . 0)
(match-lambda* . 0)
(syntax-case . 2)
(syntax-rules . 1)
(with-syntax . 1)
(begin . 0)
(do . 2)
(parameterize . 1)
(guard . 1)
(with-exception-handler . 1)
(call-with-values . 1)
(receive . 2)
(use-modules . 0)
(with-fluids . 1)
(dynamic-wind . 0))))
(define (config-ref config key . default)
"Look up KEY in CONFIG alist, returning DEFAULT if not found."
(let ((pair (assq key config)))
(if pair
(cdr pair)
(if (null? default) #f (car default)))))
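;; The optional-default behaviour in practice:
;;   (config-ref '((line-length . 100)) 'line-length)  => 100
;;   (config-ref '((line-length . 100)) 'indent)       => #f
;;   (config-ref '((line-length . 100)) 'indent 2)     => 2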
(define (config-line-length config)
(or (config-ref config 'line-length) 80))
(define (config-indent-width config)
(or (config-ref config 'indent) 2))
(define (config-max-blank-lines config)
(or (config-ref config 'max-blank-lines) 2))
(define (config-enabled-rules config)
(or (config-ref config 'enable) '()))
(define (config-disabled-rules config)
(or (config-ref config 'disable) '()))
(define (config-ignore-patterns config)
(or (config-ref config 'ignore) '()))
(define (config-indent-rules config)
(or (config-ref config 'indent-rules) '()))
(define (load-config path)
"Load a .gulie.sexp config file at PATH. Returns an alist."
(if (and path (file-exists? path))
(call-with-input-file path
(lambda (port)
(let ((data (read port)))
(if (list? data) data '()))))
'()))
(define (merge-configs base override)
  "Merge OVERRIDE config on top of BASE.  Override wins for scalar values;
lists are replaced, not appended.  Non-destructive: neither alist is
mutated (the previous assq-set! on a shallow list-copy mutated pairs
shared with BASE)."
  (fold (lambda (pair result)
          (acons (car pair) (cdr pair)
                 (alist-delete (car pair) result eq?)))
        base
        override))
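;; For example, a project override replaces only the keys it names:
;;   (merge-configs '((line-length . 80) (indent . 2))
;;                  '((line-length . 100)))
;;   => an alist in which line-length maps to 100 and indent to 2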
(define (find-config-file start-dir)
"Search upward from START-DIR for .gulie.sexp. Returns path or #f."
(let lp ((dir start-dir))
(let ((candidate (string-append dir "/.gulie.sexp")))
(cond
((file-exists? candidate) candidate)
((string=? dir "/") #f)
(else (lp (dirname dir)))))))
(define (generate-template port)
"Write a template .gulie.sexp to PORT."
(display ";;; gulie configuration\n" port)
(display ";;; Place this file as .gulie.sexp in your project root.\n" port)
(display ";;; All fields are optional — defaults are shown below.\n\n" port)
(write default-config port)
(newline port))
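;; A hand-written .gulie.sexp can be much smaller than the template;
;; for example, a project might override just two keys:
;;   ((line-length . 100)
;;    (disable . (no-tabs)))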

gulie/cst.scm Normal file

@@ -0,0 +1,95 @@
;;; (gulie cst) — concrete syntax tree from token stream
;;;
;;; Builds a tree that mirrors the parenthesised structure of the source
;;; while preserving ALL tokens (whitespace, comments, atoms).
(define-module (gulie cst)
#:use-module (srfi srfi-9)
#:use-module (srfi srfi-1)
#:use-module (srfi srfi-11)
#:use-module (gulie tokenizer)
#:export (<cst-node>
make-cst-node
cst-node?
cst-node-open
cst-node-close
cst-node-children
parse-cst
cst-form-name
cst-significant-children))
;; A parenthesised group in the CST.
(define-record-type <cst-node>
(make-cst-node open close children)
cst-node?
(open cst-node-open) ;; <token> for ( [ {
(close cst-node-close) ;; <token> for ) ] } or #f if unmatched
(children cst-node-children)) ;; list of <cst-node> | <token>
(define (parse-cst tokens)
"Parse a flat list of tokens into a CST.
Returns a <cst-node> with a synthetic root (no open/close parens)
whose children are the top-level forms."
(let lp ((remaining tokens) (children '()))
(if (null? remaining)
(make-cst-node #f #f (reverse children))
(let ((tok (car remaining))
(rest (cdr remaining)))
(case (token-type tok)
((open-paren)
;; Recursively parse until matching close
(let-values (((node remaining*) (parse-group tok rest)))
(lp remaining* (cons node children))))
((close-paren)
;; Unmatched close paren at top level — keep as-is
(lp rest (cons tok children)))
(else
(lp rest (cons tok children))))))))
(define (parse-group open-token tokens)
"Parse tokens after an open paren until the matching close.
Returns (values <cst-node> remaining-tokens)."
(let lp ((remaining tokens) (children '()))
(if (null? remaining)
;; Unclosed paren
(values (make-cst-node open-token #f (reverse children))
'())
(let ((tok (car remaining))
(rest (cdr remaining)))
(case (token-type tok)
((close-paren)
(values (make-cst-node open-token tok (reverse children))
rest))
((open-paren)
(let-values (((node remaining*) (parse-group tok rest)))
(lp remaining* (cons node children))))
(else
(lp rest (cons tok children))))))))
(define (cst-form-name node)
"For a <cst-node>, return the first significant symbol child as a string,
or #f if the node has no symbol children."
(let lp ((children (cst-node-children node)))
(if (null? children)
#f
(let ((child (car children)))
(cond
((and (token? child)
(eq? (token-type child) 'symbol))
(token-text child))
((and (token? child)
(memq (token-type child) '(whitespace newline line-comment
block-comment prefix)))
(lp (cdr children)))
(else #f))))))
(define (cst-significant-children node)
"Return children of NODE that are not whitespace, newlines, or comments."
(filter (lambda (child)
(or (cst-node? child)
(and (token? child)
(not (memq (token-type child)
'(whitespace newline line-comment
block-comment))))))
(cst-node-children node)))
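;; Illustrative shape: parsing the tokens of "(+ 1 2)" yields a root
;; node whose single child is a <cst-node> with open "(" and close ")",
;; and whose children are the tokens +, " ", 1, " ", 2.  For that node:
;;   (cst-form-name node)             => "+"
;;   (cst-significant-children node)  => the three non-whitespace tokens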

gulie/diagnostic.scm Normal file

@@ -0,0 +1,91 @@
;;; (gulie diagnostic) — diagnostic record type, sorting, formatting
;;;
;;; A <diagnostic> represents a single finding from a rule check.
;;; Diagnostics are the universal currency between rules, the engine,
;;; suppression filtering, and the reporter.
(define-module (gulie diagnostic)
#:use-module (srfi srfi-9)
#:use-module (ice-9 format)
#:export (<diagnostic>
make-diagnostic
diagnostic?
diagnostic-file
diagnostic-line
diagnostic-column
diagnostic-severity
diagnostic-rule
diagnostic-message
diagnostic-fix
<fix>
make-fix
fix?
fix-type
fix-line
fix-column
fix-end-line
fix-end-column
fix-replacement
diagnostic<?
format-diagnostic
format-diagnostics))
;; A single finding from a rule check.
(define-record-type <diagnostic>
(make-diagnostic file line column severity rule message fix)
diagnostic?
(file diagnostic-file) ;; string: file path
(line diagnostic-line) ;; integer: 1-based line number
(column diagnostic-column) ;; integer: 0-based column
(severity diagnostic-severity) ;; symbol: error | warning | info
(rule diagnostic-rule) ;; symbol: rule name
(message diagnostic-message) ;; string: human-readable message
(fix diagnostic-fix)) ;; <fix> | #f
;; An auto-fix that can be applied to resolve a diagnostic.
(define-record-type <fix>
(make-fix type line column end-line end-column replacement)
fix?
(type fix-type) ;; symbol: replace-line | delete-range | replace-range
(line fix-line) ;; integer: 1-based
(column fix-column) ;; integer: 0-based
(end-line fix-end-line) ;; integer: 1-based
(end-column fix-end-column) ;; integer: 0-based
(replacement fix-replacement)) ;; string | #f
(define (diagnostic<? a b)
"Compare two diagnostics by file, then line, then column."
(let ((fa (diagnostic-file a))
(fb (diagnostic-file b)))
(cond
((string<? fa fb) #t)
((string>? fa fb) #f)
(else
(let ((la (diagnostic-line a))
(lb (diagnostic-line b)))
(cond
((< la lb) #t)
((> la lb) #f)
(else (< (diagnostic-column a) (diagnostic-column b)))))))))
(define (severity->string sev)
(symbol->string sev))
(define (format-diagnostic diag)
"Format a diagnostic as file:line:column: severity: rule: message"
(format #f "~a:~a:~a: ~a: ~a: ~a"
(diagnostic-file diag)
(diagnostic-line diag)
(diagnostic-column diag)
(severity->string (diagnostic-severity diag))
(diagnostic-rule diag)
(diagnostic-message diag)))
(define (format-diagnostics diags port)
"Write all diagnostics to PORT, sorted by location."
(for-each (lambda (d)
(display (format-diagnostic d) port)
(newline port))
(sort diags diagnostic<?)))
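;; For example:
;;   (format-diagnostic
;;    (make-diagnostic "a.scm" 3 0 'warning 'no-tabs
;;                     "tab character found" #f))
;;   => "a.scm:3:0: warning: no-tabs: tab character found"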

gulie/engine.scm Normal file

@@ -0,0 +1,139 @@
;;; (gulie engine) — orchestrator: file discovery, pass sequencing
;;;
;;; The engine ties everything together: discovers files, runs rules
;;; by type (line → cst → tree-il), collects diagnostics, applies
;;; suppression, and reports results.
(define-module (gulie engine)
  #:use-module (ice-9 ftw)
  #:use-module (ice-9 textual-ports)
  #:use-module (ice-9 regex)
  #:use-module (srfi srfi-1)
  #:use-module (gulie diagnostic)
  #:use-module (gulie rule)
  #:use-module (gulie config)
  #:use-module (gulie suppression)
  #:export (lint-file
            lint-files
            discover-scheme-files))
(define (read-file-to-string path)
  "Read the entire file at PATH into a string, preserving its exact
contents, including any trailing newline."
  (call-with-input-file path get-string-all))
(define (run-line-rules file lines config)
"Run all line-type rules against LINES. Returns list of diagnostics."
(let ((line-rules (rules-of-type 'line))
(diagnostics '())
(consecutive-blanks 0))
(when (not (null? line-rules))
(let lp ((remaining lines) (line-num 1))
(when (not (null? remaining))
(let* ((line-text (car remaining))
(is-blank (or (string-null? line-text)
(string-every char-whitespace? line-text))))
(set! consecutive-blanks
(if is-blank (1+ consecutive-blanks) 0))
(let ((augmented-config
(cons (cons '%consecutive-blanks consecutive-blanks)
config)))
(for-each
(lambda (rule)
(let ((results ((rule-check-proc rule)
file line-num line-text augmented-config)))
(set! diagnostics (append results diagnostics))))
line-rules))
(lp (cdr remaining) (1+ line-num))))))
diagnostics))
(define (run-cst-rules file cst config)
"Run all cst-type rules against CST. Returns list of diagnostics."
(let ((cst-rules (rules-of-type 'cst)))
(if (null? cst-rules)
'()
(append-map
(lambda (rule)
((rule-check-proc rule) file cst config))
cst-rules))))
(define (lint-file file config)
"Lint a single FILE with CONFIG. Returns a sorted list of diagnostics."
(let* ((text (read-file-to-string file))
(lines (string-split text #\newline))
(diagnostics '()))
;; Pass 1: line-based surface rules
(set! diagnostics (append (run-line-rules file lines config)
diagnostics))
;; Pass 1b: CST rules (if tokenizer is loaded)
;; Dynamically check if tokenizer module is available
(let ((tok-mod (resolve-module '(gulie tokenizer) #:ensure #f)))
(when tok-mod
(let ((tokenize (module-ref tok-mod 'tokenize))
(cst-mod (resolve-module '(gulie cst) #:ensure #f)))
(when cst-mod
(let* ((parse-cst (module-ref cst-mod 'parse-cst))
(tokens (tokenize text file))
(cst (parse-cst tokens)))
(set! diagnostics (append (run-cst-rules file cst config)
diagnostics)))))))
;; Pass 2: semantic rules (if compiler module is loaded)
(let ((comp-mod (resolve-module '(gulie compiler) #:ensure #f)))
(when comp-mod
(let ((compile-and-capture (module-ref comp-mod 'compile-and-capture-warnings)))
(set! diagnostics (append (compile-and-capture file text config)
diagnostics)))))
;; Filter suppressions
(let ((suppressions (parse-suppressions text)))
(set! diagnostics (filter-suppressions diagnostics suppressions)))
;; Sort by location
(sort diagnostics diagnostic<?)))
(define (lint-files files config)
"Lint multiple FILES. Returns total diagnostic count."
(let ((total 0))
(for-each
(lambda (file)
(let ((diags (lint-file file config)))
(set! total (+ total (length diags)))
(format-diagnostics diags (current-output-port))))
files)
total))
(define (scheme-file? path)
"Is PATH a Scheme source file?"
(let ((ext (and (string-index path #\.)
(substring path (1+ (string-rindex path #\.))))))
(and ext (member ext '("scm" "sld" "sls" "ss")))))
(define (discover-scheme-files paths ignore-patterns)
  "Expand PATHS into a list of Scheme source files, recursing into
directories.  IGNORE-PATTERNS is a list of strings matched by simple
substring containment against each path."
(define (ignored? file)
(any (lambda (pat)
(string-contains file pat))
ignore-patterns))
(append-map
(lambda (path)
(cond
((and (file-exists? path) (not (file-is-directory? path)))
(if (and (scheme-file? path) (not (ignored? path)))
(list path)
'()))
((file-is-directory? path)
(let ((files '()))
(ftw path
(lambda (filename statinfo flag)
(when (and (eq? flag 'regular)
(scheme-file? filename)
(not (ignored? filename)))
(set! files (cons filename files)))
#t))
(sort files string<?)))
(else '())))
paths))
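;; Typical driver usage (sketch; a real CLI would also locate and load
;; the user's .gulie.sexp):
;;   (use-modules (gulie engine) (gulie config))
;;   (let ((files (discover-scheme-files '("gulie/") '())))
;;     (exit (if (zero? (lint-files files default-config)) 0 1)))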

gulie/rule.scm Normal file

@@ -0,0 +1,61 @@
;;; (gulie rule) — rule record type, registry, convenience macros
;;;
;;; Rules are the units of analysis. Each rule has a name, metadata,
;;; and a check procedure whose signature depends on the rule type.
(define-module (gulie rule)
#:use-module (srfi srfi-9)
#:use-module (srfi srfi-1)
#:export (<rule>
make-rule
rule?
rule-name
rule-description
rule-severity
rule-category
rule-type
rule-check-proc
rule-fix-proc
register-rule!
all-rules
rules-of-type
find-rule
clear-rules!))
;; A lint/format rule.
(define-record-type <rule>
(make-rule name description severity category type check-proc fix-proc)
rule?
(name rule-name) ;; symbol
(description rule-description) ;; string
(severity rule-severity) ;; symbol: error | warning | info
(category rule-category) ;; symbol: format | style | correctness | idiom
(type rule-type) ;; symbol: line | cst | tree-il
(check-proc rule-check-proc) ;; procedure
(fix-proc rule-fix-proc)) ;; procedure | #f
;; Global rule registry.
(define *rules* '())
(define (register-rule! rule)
"Register a rule in the global registry."
(set! *rules* (cons rule *rules*)))
(define (all-rules)
"Return all registered rules."
(reverse *rules*))
(define (rules-of-type type)
"Return all registered rules of the given TYPE."
(filter (lambda (r) (eq? (rule-type r) type))
*rules*))
(define (find-rule name)
"Find a rule by NAME, or #f."
(find (lambda (r) (eq? (rule-name r) name))
*rules*))
(define (clear-rules!)
"Clear all registered rules. Useful for testing."
(set! *rules* '()))
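;; A minimal line rule might be registered like this (sketch; the
;; rule name and check are invented for illustration):
;;   (register-rule!
;;    (make-rule
;;     'no-todo "Flag TODO markers in comments" 'info 'style 'line
;;     (lambda (file line-num line-text config)
;;       (let ((pos (string-contains line-text "TODO")))
;;         (if pos
;;             (list (make-diagnostic file line-num pos
;;                                    'info 'no-todo "TODO marker" #f))
;;             '())))
;;     #f))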

gulie/rules/comments.scm Normal file

@@ -0,0 +1,71 @@
;;; (gulie rules comments) — comment style conventions
;;;
;;; Checks that comments follow standard Scheme conventions:
;;; ; — inline comments (after code on same line)
;;; ;; — line comments (own line, aligned with code)
;;; ;;; — section/file-level comments
;;; ;;;; — file headers
(define-module (gulie rules comments)
#:use-module (gulie rule)
#:use-module (gulie diagnostic))
;; Count leading semicolons in a comment string.
(define (count-semicolons text)
(let lp ((i 0))
(if (and (< i (string-length text))
(char=? (string-ref text i) #\;))
(lp (1+ i))
i)))
;; Is a line (before the comment) only whitespace?
(define (comment-only-line? line-text comment-col)
(let lp ((i 0))
(cond
((>= i comment-col) #t)
((char-whitespace? (string-ref line-text i)) (lp (1+ i)))
(else #f))))
(register-rule!
(make-rule
'comment-semicolons
"Check comment semicolon count follows conventions"
'info 'style 'line
(lambda (file line-num line-text config)
(let ((pos (string-index line-text #\;)))
(if (not pos)
'()
;; Check if the semicolon is inside a string (rough heuristic:
;; count quotes before the semicolon position)
(let ((quotes-before (let lp ((i 0) (count 0) (in-escape #f))
(cond
((>= i pos) count)
((and (not in-escape) (char=? (string-ref line-text i) #\\))
(lp (1+ i) count #t))
((and (not in-escape) (char=? (string-ref line-text i) #\"))
(lp (1+ i) (1+ count) #f))
(else (lp (1+ i) count #f))))))
(if (odd? quotes-before)
;; Inside a string — not a real comment
'()
(let* ((semis (count-semicolons (substring line-text pos)))
(own-line? (comment-only-line? line-text pos)))
(cond
;; Inline comment (after code) should use single ;
;; But we don't enforce this strictly — just flag ;;; or more inline
((and (not own-line?) (>= semis 3))
(list (make-diagnostic
file line-num pos
'info 'comment-semicolons
"inline comments should use ; or ;; not ;;;"
#f)))
;; Own-line comment with single ; (should be ;;)
((and own-line? (= semis 1) (> (string-length line-text) (1+ pos))
(not (char=? (string-ref line-text (1+ pos)) #\!)))
(list (make-diagnostic
file line-num pos
'info 'comment-semicolons
"line comments should use ;; not ;"
#f)))
(else '()))))))))
#f))
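;; Behaviour in brief (illustrative source lines):
;;   "(display x)  ;;; note"  is flagged: inline comments use ; or ;;
;;   "; standalone remark"    is flagged: own-line comments use ;;
;;   ";; standalone remark"   passes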

gulie/rules/surface.scm Normal file

@@ -0,0 +1,93 @@
;;; (gulie rules surface) — surface-level line rules
;;;
;;; These rules operate on raw text lines. They need no parsing —
;;; just the file path, line number, and line content.
(define-module (gulie rules surface)
#:use-module (gulie rule)
#:use-module (gulie diagnostic))
;;; trailing-whitespace — trailing spaces or tabs at end of line
(register-rule!
(make-rule
'trailing-whitespace
"Line has trailing whitespace"
'warning 'format 'line
(lambda (file line-num line-text config)
(let ((trimmed (string-trim-right line-text)))
(if (and (not (string=? line-text trimmed))
(> (string-length line-text) 0))
(list (make-diagnostic
file line-num
(string-length trimmed)
'warning 'trailing-whitespace
"trailing whitespace"
(make-fix 'replace-line line-num 0
line-num (string-length line-text)
trimmed)))
'())))
#f))
;;; line-length — line exceeds maximum width
(define (config-max-line-length config)
(or (assq-ref config 'line-length) 80))
(register-rule!
(make-rule
'line-length
"Line exceeds maximum length"
'warning 'format 'line
(lambda (file line-num line-text config)
(let ((max-len (config-max-line-length config)))
(if (> (string-length line-text) max-len)
(list (make-diagnostic
file line-num max-len
'warning 'line-length
(format #f "line exceeds ~a characters (~a)"
max-len (string-length line-text))
#f))
'())))
#f))
;;; no-tabs — tab characters in source
(register-rule!
(make-rule
'no-tabs
"Tab character found in source"
'warning 'format 'line
(lambda (file line-num line-text config)
(let ((pos (string-index line-text #\tab)))
(if pos
(list (make-diagnostic
file line-num pos
'warning 'no-tabs
"tab character found; use spaces for indentation"
#f))
'())))
#f))
;;; blank-lines — excessive consecutive blank lines
(register-rule!
(make-rule
'blank-lines
"Excessive consecutive blank lines"
'warning 'format 'line
(lambda (file line-num line-text config)
;; This rule uses a stateful approach: the engine tracks consecutive
;; blank lines and passes the count via config. See engine.scm for
;; the blank-line counting logic.
(let ((max-blanks (or (assq-ref config 'max-blank-lines) 2))
(consecutive (or (assq-ref config '%consecutive-blanks) 0)))
(if (and (string-every char-whitespace? line-text)
(> consecutive max-blanks))
(list (make-diagnostic
file line-num 0
'warning 'blank-lines
(format #f "more than ~a consecutive blank lines" max-blanks)
#f))
'())))
#f))

gulie/suppression.scm Normal file

@@ -0,0 +1,139 @@
;;; (gulie suppression) — inline suppression via comments
;;;
;;; Parses ; gulie:suppress and ; gulie:disable/enable directives
;;; from raw source text. Returns a suppression map used to filter
;;; diagnostics after all rules have run.
(define-module (gulie suppression)
#:use-module (ice-9 regex)
#:use-module (ice-9 rdelim)
#:use-module (srfi srfi-1)
#:use-module (gulie diagnostic)
#:export (parse-suppressions
filter-suppressions))
;; A suppression entry.
;; line: 1-based line number this applies to
;; rules: list of rule name symbols, or #t for "all rules"
;; kind: 'this-line | 'next-line | 'region-start | 'region-end
(define (make-suppression line rules kind)
(list line rules kind))
(define (suppression-line s) (car s))
(define (suppression-rules s) (cadr s))
(define (suppression-kind s) (caddr s))
;; [[:space:]] is used instead of \s, which is a GNU extension not
;; guaranteed by POSIX extended regular expressions.
(define *suppress-re*
  (make-regexp ";+[[:space:]]*gulie:suppress[[:space:]]*(.*)$"))
(define *disable-re*
  (make-regexp ";+[[:space:]]*gulie:disable[[:space:]]+(.+)$"))
(define *enable-re*
  (make-regexp ";+[[:space:]]*gulie:enable[[:space:]]+(.+)$"))
(define (parse-rule-names str)
  "Parse space-separated rule names from STR.  Empty → #t (all rules)."
  (let ((trimmed (string-trim-both str)))
    (if (string-null? trimmed)
        #t
        (map string->symbol
             (remove string-null?
                     (string-split trimmed #\space))))))
(define (parse-suppressions text)
"Parse suppression directives from source TEXT.
Returns a list of (line rules kind) entries."
  (let ((lines (string-split text #\newline)))
    (let lp ((lines lines) (line-num 1) (acc '()))
(if (null? lines)
(reverse acc)
(let ((line (car lines)))
(cond
;; ; gulie:suppress [rules...] — on the SAME line if code precedes it,
;; or on the NEXT line if the line is comment-only
((regexp-exec *suppress-re* line)
=> (lambda (m)
(let* ((rules (parse-rule-names (match:substring m 1)))
(trimmed (string-trim line))
;; If line starts with ;, it's comment-only → suppress next line
(kind (if (char=? (string-ref trimmed 0) #\;)
'next-line
'this-line))
(target-line (if (eq? kind 'next-line)
(1+ line-num)
line-num)))
(lp (cdr lines) (1+ line-num)
(cons (make-suppression target-line rules kind) acc)))))
;; ; gulie:disable rule — region start
((regexp-exec *disable-re* line)
=> (lambda (m)
(let ((rules (parse-rule-names (match:substring m 1))))
(lp (cdr lines) (1+ line-num)
(cons (make-suppression line-num rules 'region-start) acc)))))
;; ; gulie:enable rule — region end
((regexp-exec *enable-re* line)
=> (lambda (m)
(let ((rules (parse-rule-names (match:substring m 1))))
(lp (cdr lines) (1+ line-num)
(cons (make-suppression line-num rules 'region-end) acc)))))
(else
(lp (cdr lines) (1+ line-num) acc))))))))
(define (build-suppression-set suppressions)
"Build a procedure (line rule-name) -> #t if suppressed."
;; Point suppressions: hash of line-num -> rules
(let ((point-map (make-hash-table))
(regions '()))
;; Collect point suppressions and regions
(for-each
(lambda (s)
(case (suppression-kind s)
((this-line next-line)
(hashv-set! point-map (suppression-line s)
(suppression-rules s)))
((region-start)
(set! regions (cons s regions)))))
suppressions)
;; Build region intervals
(let ((region-intervals
(let lp ((remaining (reverse regions)) (intervals '()))
(if (null? remaining)
intervals
(let* ((start (car remaining))
;; Find matching end
(end-entry (find (lambda (s)
(and (eq? (suppression-kind s) 'region-end)
(> (suppression-line s) (suppression-line start))
(equal? (suppression-rules s)
(suppression-rules start))))
suppressions)))
(lp (cdr remaining)
(cons (list (suppression-line start)
(if end-entry (suppression-line end-entry) 999999)
(suppression-rules start))
intervals)))))))
;; Return predicate
(lambda (line rule-name)
(or
;; Check point suppressions
(let ((rules (hashv-ref point-map line)))
(and rules
(or (eq? rules #t)
(memq rule-name rules))))
;; Check region suppressions
(any (lambda (interval)
(and (>= line (car interval))
(<= line (cadr interval))
(let ((rules (caddr interval)))
(or (eq? rules #t)
(memq rule-name rules)))))
region-intervals))))))
(define (filter-suppressions diagnostics suppressions)
"Remove diagnostics that are suppressed."
(if (null? suppressions)
diagnostics
(let ((suppressed? (build-suppression-set suppressions)))
(filter (lambda (d)
(not (suppressed? (diagnostic-line d) (diagnostic-rule d))))
diagnostics))))
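;; Directive forms recognised by this module (illustrative lines):
;;   (some-long-call)  ; gulie:suppress line-length    suppresses this line
;;   ;; gulie:suppress trailing-whitespace             suppresses the next line
;;   ;; gulie:disable no-tabs                          starts a region
;;   ;; gulie:enable no-tabs                           ends that region
;; A bare "gulie:suppress" with no rule names suppresses all rules.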

gulie/tokenizer.scm Normal file

@@ -0,0 +1,348 @@
;;; (gulie tokenizer) — hand-written lexer preserving all tokens
;;;
;;; Tokenises Guile Scheme source code into a flat list of <token>
;;; records, preserving whitespace, comments, and exact source text.
;;;
;;; Critical invariant:
;;; (string-concatenate (map token-text (tokenize input)))
;;; ≡ input
(define-module (gulie tokenizer)
#:use-module (srfi srfi-9)
#:use-module (srfi srfi-11)
#:export (<token>
make-token
token?
token-type
token-text
token-line
token-column
tokenize))
(define-record-type <token>
(make-token type text line column)
token?
(type token-type) ;; symbol
(text token-text) ;; string: exact source text
(line token-line) ;; integer: 1-based
(column token-column)) ;; integer: 0-based
(define (tokenize input filename)
  "Tokenise INPUT string.  Returns a list of <token> records.
FILENAME is accepted for future error reporting; it is not used yet."
(let ((port (open-input-string input))
(tokens '())
(line 1)
(col 0))
(define (peek) (peek-char port))
(define (advance!)
(let ((ch (read-char port)))
(cond
((eof-object? ch) ch)
((char=? ch #\newline)
(set! line (1+ line))
(set! col 0)
ch)
(else
(set! col (1+ col))
ch))))
(define (emit! type text start-line start-col)
(set! tokens (cons (make-token type text start-line start-col)
tokens)))
(define (collect-while first pred)
"Collect chars starting with FIRST while PRED holds on peek."
(let lp ((acc (list first)))
(let ((ch (peek)))
(if (and (not (eof-object? ch)) (pred ch))
(begin (advance!) (lp (cons ch acc)))
(list->string (reverse acc))))))
(define (char-hex? ch)
(or (char-numeric? ch)
(memv ch '(#\a #\b #\c #\d #\e #\f
#\A #\B #\C #\D #\E #\F))))
(define (delimiter? ch)
(or (eof-object? ch)
(char-whitespace? ch)
(memv ch '(#\( #\) #\[ #\] #\{ #\} #\" #\; #\#))))
(define (read-string-literal)
"Read string body after opening quote. Returns full string including quotes."
(let lp ((acc (list #\")))
(let ((ch (advance!)))
(cond
((eof-object? ch)
(list->string (reverse acc)))
((char=? ch #\\)
(let ((next (advance!)))
(if (eof-object? next)
(list->string (reverse (cons ch acc)))
(lp (cons next (cons ch acc))))))
((char=? ch #\")
(list->string (reverse (cons ch acc))))
(else
(lp (cons ch acc)))))))
(define (read-line-comment first)
"Read from FIRST (;) to end of line, not including the newline."
(collect-while first (lambda (ch) (not (char=? ch #\newline)))))
(define (read-block-comment)
"Read block comment body after #|. Returns full text including #| and |#."
(let lp ((acc (list #\| #\#)) (depth 1))
(let ((ch (advance!)))
(cond
((eof-object? ch)
(list->string (reverse acc)))
((and (char=? ch #\|) (eqv? (peek) #\#))
(let ((hash (advance!)))
(if (= depth 1)
(list->string (reverse (cons hash (cons ch acc))))
(lp (cons hash (cons ch acc)) (1- depth)))))
((and (char=? ch #\#) (eqv? (peek) #\|))
(let ((pipe (advance!)))
(lp (cons pipe (cons ch acc)) (1+ depth))))
(else
(lp (cons ch acc) depth))))))
(define (read-character-literal)
"Read character literal after #\\. Returns full text including #\\."
(let ((ch (advance!)))
(cond
((eof-object? ch) "#\\")
((and (char-alphabetic? ch)
(let ((pk (peek)))
(and (not (eof-object? pk))
(char-alphabetic? pk))))
(string-append "#\\"
(collect-while ch char-alphabetic?)))
((and (char=? ch #\x)
(let ((pk (peek)))
(and (not (eof-object? pk))
(char-hex? pk))))
(string-append "#\\"
(collect-while ch
(lambda (c) (or (char-hex? c) (char=? c #\x))))))
(else
(string-append "#\\" (string ch))))))
(define (read-shebang)
"Read shebang/directive after #!. Returns full text."
(let lp ((acc (list #\! #\#)))
(let ((c (advance!)))
(cond
((eof-object? c)
(list->string (reverse acc)))
((and (char=? c #\!) (eqv? (peek) #\#))
(let ((h (advance!)))
(list->string (reverse (cons h (cons c acc))))))
(else
(lp (cons c acc)))))))
(define (read-other-sharp next)
"Read # sequence where NEXT is first char after #. Returns full text."
(let ((text (string-append "#"
(collect-while next
(lambda (c) (not (delimiter? c)))))))
(if (eqv? (peek) #\()
(begin (advance!)
(values 'special (string-append text "(")))
(values 'special text))))
;; Main tokenisation loop
(let lp ()
(let ((ch (peek)))
(cond
((eof-object? ch)
(reverse tokens))
;; Newline
((char=? ch #\newline)
(let ((sl line) (sc col))
(advance!)
(emit! 'newline "\n" sl sc)
(lp)))
;; Whitespace (non-newline)
((char-whitespace? ch)
(let ((sl line) (sc col))
(advance!)
(emit! 'whitespace
(collect-while ch
(lambda (c) (and (char-whitespace? c)
(not (char=? c #\newline)))))
sl sc)
(lp)))
;; Line comment
((char=? ch #\;)
(let ((sl line) (sc col))
(advance!)
(emit! 'line-comment (read-line-comment ch) sl sc)
(lp)))
;; String
((char=? ch #\")
(let ((sl line) (sc col))
(advance!)
(emit! 'string (read-string-literal) sl sc)
(lp)))
;; Open paren
((memv ch '(#\( #\[ #\{))
(let ((sl line) (sc col))
(advance!)
(emit! 'open-paren (string ch) sl sc)
(lp)))
;; Close paren
((memv ch '(#\) #\] #\}))
(let ((sl line) (sc col))
(advance!)
(emit! 'close-paren (string ch) sl sc)
(lp)))
;; Quote
((char=? ch #\')
(let ((sl line) (sc col))
(advance!)
(emit! 'prefix "'" sl sc)
(lp)))
;; Quasiquote
((char=? ch #\`)
(let ((sl line) (sc col))
(advance!)
(emit! 'prefix "`" sl sc)
(lp)))
;; Unquote / unquote-splicing
((char=? ch #\,)
(let ((sl line) (sc col))
(advance!)
(cond
((eqv? (peek) #\@)
(advance!)
(emit! 'prefix ",@" sl sc))
(else
(emit! 'prefix "," sl sc)))
(lp)))
;; Sharp sequences
((char=? ch #\#)
(let ((sl line) (sc col))
(advance!) ;; consume #
(let ((next (peek)))
(cond
;; Block comment #|...|#
((eqv? next #\|)
(advance!)
(emit! 'block-comment (read-block-comment) sl sc)
(lp))
;; Datum comment #;
((eqv? next #\;)
(advance!)
(emit! 'special "#;" sl sc)
(lp))
;; Boolean #t, #f, #true, #false
((or (eqv? next #\t) (eqv? next #\f))
(advance!)
(let* ((rest (if (and (not (eof-object? (peek)))
(char-alphabetic? (peek)))
(collect-while next char-alphabetic?)
(string next)))
(text (string-append "#" rest)))
(emit! 'boolean text sl sc)
(lp)))
;; Character literal #\x
((eqv? next #\\)
(advance!)
(emit! 'character (read-character-literal) sl sc)
(lp))
;; Keyword #:foo
((eqv? next #\:)
(advance!)
(let ((name (if (and (not (eof-object? (peek)))
(not (delimiter? (peek))))
(collect-while (advance!)
(lambda (c) (not (delimiter? c))))
"")))
(emit! 'keyword (string-append "#:" name) sl sc)
(lp)))
;; Syntax shorthands: #', #`, #,, #,@
((eqv? next #\')
(advance!)
(emit! 'prefix "#'" sl sc)
(lp))
((eqv? next #\`)
(advance!)
(emit! 'prefix "#`" sl sc)
(lp))
((eqv? next #\,)
(advance!)
(cond
((eqv? (peek) #\@)
(advance!)
(emit! 'prefix "#,@" sl sc))
(else
(emit! 'prefix "#," sl sc)))
(lp))
;; Vector #(
((eqv? next #\()
(advance!)
(emit! 'special "#(" sl sc)
(lp))
;; Shebang #!...!#
((eqv? next #\!)
(advance!)
(emit! 'block-comment (read-shebang) sl sc)
(lp))
;; Other # sequences: #vu8(, #*, etc.
((and (not (eof-object? next))
(not (delimiter? next)))
(advance!)
(let-values (((type text) (read-other-sharp next)))
(emit! type text sl sc)
(lp)))
;; Bare # at delimiter boundary
(else
(emit! 'symbol "#" sl sc)
(lp))))))
;; Dot
       ((char=? ch #\.)
        (let ((sl line) (sc col))
          (advance!)
          (if (delimiter? (peek))
              (begin
                (emit! 'dot "." sl sc)
                (lp))
              (let ((text (collect-while ch
                                         (lambda (c) (not (delimiter? c))))))
                ;; A dot-initial token may still be a number, e.g. .5
                (emit! (if (string->number text) 'number 'symbol)
                       text sl sc)
                (lp)))))
;; Everything else: symbol or number
(else
(let ((sl line) (sc col))
(advance!)
(let ((text (collect-while ch
(lambda (c) (not (delimiter? c))))))
(emit! (if (string->number text) 'number 'symbol)
text sl sc)
(lp)))))))))
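A usage sketch of the #-dispatch above, assuming only the `(gulie tokenizer)` exports exercised by the test suite (`tokenize`, `token-type`, `token-text`); a caller can walk the token stream directly:

```scheme
(use-modules (gulie tokenizer))

;; Print one line per token for a snippet that exercises the
;; #-dispatch branches: syntax shorthand, keyword, boolean, character.
(for-each
 (lambda (tok)
   (format #t "~a\t~s~%" (token-type tok) (token-text tok)))
 (tokenize "#'(a . b) #:key #t #\\space" "example.scm"))
;; The emitted types should include prefix, open-paren, symbol, dot,
;; close-paren, keyword, boolean, and character, plus whitespace
;; tokens, since the tokenizer preserves every input character.
```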

test/fixtures/clean/well-formatted.scm
;;; A well-formatted Guile source file.
;;; This should produce zero surface diagnostics.
(define-module (test well-formatted)
#:export (greet add))
;; Greet a person by name.
(define (greet name)
(string-append "Hello, " name "!"))
;; Add two numbers.
(define (add a b)
(+ a b))

test/fixtures/violations/semantic.scm
(define-module (test semantic)
#:use-module (ice-9 format))
(define (foo x y)
(let ((unused 42))
(+ x 1)))
(define (bar a)
(baz a))

test/fixtures/violations/surface.scm
(define x 42)
(define y "hello")
(define z (+ x y))
;; This line is fine
(define (long-function-name-that-exceeds-the-default-eighty-character-limit arg1 arg2 arg3 arg4 arg5)
(+ arg1 arg2))
;; After too many blank lines
(define w 99)

test/run-tests.scm
#!/usr/bin/env -S guile --no-auto-compile -s
!#
;;; Test runner for gulie
;; Add project root to load path
(let ((dir (dirname (dirname (current-filename)))))
(set! %load-path (cons dir %load-path)))
(use-modules (srfi srfi-64))
;; Configure test runner for CI-friendly output
(test-runner-current
(let ((runner (test-runner-simple)))
(test-runner-on-final! runner
(lambda (runner)
(let ((pass (test-runner-pass-count runner))
(fail (test-runner-fail-count runner))
(skip (test-runner-skip-count runner)))
(newline)
(format #t "Results: ~a passed, ~a failed, ~a skipped~%"
pass fail skip)
(when (> fail 0)
(exit 1)))))
runner))
;; Load and run all test files (paths relative to project root)
(let ((root (dirname (dirname (current-filename)))))
(define (load-test name)
(load (string-append root "/test/" name)))
(load-test "test-tokenizer.scm")
(load-test "test-cst.scm")
(load-test "test-rules.scm")
(load-test "test-suppression.scm")
(load-test "test-compiler.scm"))

test/test-compiler.scm
;;; Tests for (gulie compiler) — semantic analysis pass
(use-modules (srfi srfi-64)
(srfi srfi-1)
(gulie compiler)
(gulie diagnostic))
(test-begin "compiler")
(test-group "unused-variable"
(let ((diags (compile-and-capture-warnings
"test.scm"
"(define (foo x)\n (let ((unused 42))\n x))\n"
'())))
(test-assert "detects unused variable"
(any (lambda (d) (eq? (diagnostic-rule d) 'unused-variable))
diags))))
(test-group "unbound-variable"
(let ((diags (compile-and-capture-warnings
"test.scm"
"(define (foo x)\n (+ x unknown-thing))\n"
'())))
(test-assert "detects unbound variable"
(any (lambda (d) (eq? (diagnostic-rule d) 'unbound-variable))
diags))))
(test-group "arity-mismatch"
(let ((diags (compile-and-capture-warnings
"test.scm"
"(define (foo x) x)\n(define (bar) (foo 1 2 3))\n"
'())))
(test-assert "detects arity mismatch"
(any (lambda (d) (eq? (diagnostic-rule d) 'arity-mismatch))
diags))))
(test-group "clean-code"
(let ((diags (compile-and-capture-warnings
"test.scm"
"(define (foo x) (+ x 1))\n"
'())))
;; May have unused-toplevel but no real errors
(test-assert "no compile errors"
(not (any (lambda (d) (eq? (diagnostic-severity d) 'error))
diags)))))
(test-group "syntax-error"
(let ((diags (compile-and-capture-warnings
"test.scm"
"(define (foo x) (+ x"
'())))
(test-assert "catches syntax error"
(any (lambda (d) (eq? (diagnostic-severity d) 'error))
diags))))
(test-end "compiler")

test/test-cst.scm
;;; Tests for (gulie cst)
(use-modules (srfi srfi-1)
(srfi srfi-64)
(gulie tokenizer)
(gulie cst))
(test-begin "cst")
(test-group "basic-parsing"
(let* ((tokens (tokenize "(define x 42)" "test.scm"))
(cst (parse-cst tokens)))
(test-assert "root is cst-node" (cst-node? cst))
(test-assert "root has no open paren" (not (cst-node-open cst)))
(let ((sig (cst-significant-children cst)))
(test-equal "one top-level form" 1 (length sig))
(test-assert "top-level is cst-node" (cst-node? (car sig))))))
(test-group "form-name"
(let* ((tokens (tokenize "(define x 42)" "test.scm"))
(cst (parse-cst tokens))
(form (car (cst-significant-children cst))))
(test-equal "form name is define" "define" (cst-form-name form))))
(test-group "nested-forms"
(let* ((tokens (tokenize "(let ((x 1)) (+ x 2))" "test.scm"))
(cst (parse-cst tokens))
(form (car (cst-significant-children cst))))
(test-equal "form name is let" "let" (cst-form-name form))
;; Should have nested cst-nodes for ((x 1)) and (+ x 2)
(let ((inner-nodes (filter cst-node? (cst-node-children form))))
(test-assert "has nested nodes" (>= (length inner-nodes) 2)))))
(test-group "multiple-top-level"
(let* ((tokens (tokenize "(define a 1)\n(define b 2)\n(define c 3)" "test.scm"))
(cst (parse-cst tokens))
(sig (cst-significant-children cst)))
(test-equal "three top-level forms" 3 (length sig))))
(test-group "comments-preserved"
(let* ((tokens (tokenize ";; header\n(define x 1)\n" "test.scm"))
(cst (parse-cst tokens))
(children (cst-node-children cst)))
;; Should include the comment as a token child
(test-assert "has comment token"
(any (lambda (c)
(and (token? c) (eq? (token-type c) 'line-comment)))
children))))
(test-group "prefix-handling"
(let* ((tokens (tokenize "'(1 2 3)" "test.scm"))
(cst (parse-cst tokens))
(children (cst-node-children cst)))
(test-assert "has prefix token"
(any (lambda (c)
(and (token? c) (eq? (token-type c) 'prefix)))
children))))
(test-end "cst")

test/test-rules.scm
;;; Tests for rule modules
(use-modules (srfi srfi-64)
(srfi srfi-1)
(gulie rule)
(gulie diagnostic)
(gulie rules surface)
(gulie rules comments))
(test-begin "rules")
;;; Surface rules
(test-group "trailing-whitespace"
(let ((rule (find-rule 'trailing-whitespace)))
(test-assert "rule registered" rule)
(test-equal "clean line produces no diagnostics"
'()
((rule-check-proc rule) "f.scm" 1 "(define x 42)" '()))
(let ((diags ((rule-check-proc rule) "f.scm" 1 "(define x 42) " '())))
(test-equal "trailing spaces detected" 1 (length diags))
(test-equal "correct column"
(string-length "(define x 42)")
(diagnostic-column (car diags))))
(test-equal "empty line no diagnostic"
'()
((rule-check-proc rule) "f.scm" 1 "" '()))))
(test-group "line-length"
(let ((rule (find-rule 'line-length)))
(test-assert "rule registered" rule)
(test-equal "short line ok"
'()
((rule-check-proc rule) "f.scm" 1 "(define x 42)" '()))
(let* ((long-line (make-string 81 #\x))
(diags ((rule-check-proc rule) "f.scm" 1 long-line '())))
(test-equal "long line detected" 1 (length diags)))
(let* ((config '((line-length . 120)))
(line (make-string 100 #\x))
(diags ((rule-check-proc rule) "f.scm" 1 line config)))
(test-equal "respects config" 0 (length diags)))))
(test-group "no-tabs"
(let ((rule (find-rule 'no-tabs)))
(test-assert "rule registered" rule)
(test-equal "no tabs ok"
'()
((rule-check-proc rule) "f.scm" 1 " (define x 1)" '()))
(let ((diags ((rule-check-proc rule) "f.scm" 1 "\t(define x 1)" '())))
(test-equal "tab detected" 1 (length diags)))))
(test-group "blank-lines"
(let ((rule (find-rule 'blank-lines)))
(test-assert "rule registered" rule)
(test-equal "normal blank ok"
'()
((rule-check-proc rule) "f.scm" 5 ""
'((max-blank-lines . 2) (%consecutive-blanks . 1))))
(let ((diags ((rule-check-proc rule) "f.scm" 5 ""
'((max-blank-lines . 2) (%consecutive-blanks . 3)))))
(test-equal "excessive blanks detected" 1 (length diags)))))
;;; Comment rules
(test-group "comment-semicolons"
(let ((rule (find-rule 'comment-semicolons)))
(test-assert "rule registered" rule)
(test-equal "double semicolon on own line ok"
'()
((rule-check-proc rule) "f.scm" 1 " ;; good comment" '()))
;; Single semicolon on own line
(let ((diags ((rule-check-proc rule) "f.scm" 1 " ; bad comment" '())))
(test-equal "single ; on own line flagged" 1 (length diags)))))
;;; Diagnostic formatting
(test-group "diagnostic-format"
(let ((d (make-diagnostic "foo.scm" 10 5 'warning 'test-rule "oops" #f)))
(test-equal "format matches expected"
"foo.scm:10:5: warning: test-rule: oops"
(format-diagnostic d))))
(test-group "diagnostic-sorting"
(let ((d1 (make-diagnostic "a.scm" 10 0 'warning 'r "m" #f))
(d2 (make-diagnostic "a.scm" 5 0 'warning 'r "m" #f))
(d3 (make-diagnostic "b.scm" 1 0 'warning 'r "m" #f)))
(let ((sorted (sort (list d1 d2 d3) diagnostic<?)))
(test-equal "first is a.scm:5" 5 (diagnostic-line (car sorted)))
(test-equal "second is a.scm:10" 10 (diagnostic-line (cadr sorted)))
(test-equal "third is b.scm" "b.scm" (diagnostic-file (caddr sorted))))))
(test-end "rules")

test/test-suppression.scm
;;; Tests for (gulie suppression)
(use-modules (srfi srfi-64)
(gulie suppression)
(gulie diagnostic))
(test-begin "suppression")
(test-group "parse-inline-suppress"
(let ((supps (parse-suppressions
"(define x 1) ; gulie:suppress trailing-whitespace\n")))
(test-equal "one suppression" 1 (length supps))
(test-equal "this-line kind" 'this-line (caddr (car supps)))))
(test-group "parse-next-line-suppress"
(let ((supps (parse-suppressions
";; gulie:suppress line-length\n(define x 1)\n")))
(test-equal "one suppression" 1 (length supps))
(test-equal "targets line 2" 2 (car (car supps)))))
(test-group "parse-suppress-all"
(let ((supps (parse-suppressions
"(define x 1) ; gulie:suppress\n")))
(test-equal "one suppression" 1 (length supps))
(test-eq "all rules" #t (cadr (car supps)))))
(test-group "filter-diagnostics"
(let ((diags (list (make-diagnostic "f.scm" 1 0 'warning 'trailing-whitespace "tw" #f)
(make-diagnostic "f.scm" 2 0 'warning 'line-length "ll" #f)))
(supps (parse-suppressions
"(define x 1) ; gulie:suppress trailing-whitespace\n(define y 2)\n")))
(let ((filtered (filter-suppressions diags supps)))
(test-equal "one diagnostic filtered" 1 (length filtered))
(test-eq "remaining is line-length" 'line-length
(diagnostic-rule (car filtered))))))
(test-group "region-suppression"
(let ((supps (parse-suppressions
";; gulie:disable line-length\n(define x 1)\n(define y 2)\n;; gulie:enable line-length\n(define z 3)\n")))
;; Should have region-start and region-end
(test-assert "has region entries" (>= (length supps) 2))))
(test-end "suppression")
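Read together, these assertions pin down the suppression entry shape: a list `(target-line rules kind)`, where `rules` is either a list of rule symbols or `#t` for all rules, and `kind` is a tag such as `this-line`. A minimal sketch of the glue a driver might use; the function name here is hypothetical, while `parse-suppressions` and `filter-suppressions` are the exports tested above:

```scheme
(use-modules (gulie suppression))

;; Hypothetical driver glue: scan the raw source text for
;; `; gulie:suppress ...` markers and `gulie:disable`/`gulie:enable`
;; regions, then drop any diagnostics they cover.
(define (apply-suppressions source-text diagnostics)
  (filter-suppressions diagnostics (parse-suppressions source-text)))
```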

test/test-tokenizer.scm
;;; Tests for (gulie tokenizer)
(use-modules (srfi srfi-1)
(srfi srfi-64)
(ice-9 rdelim)
(gulie tokenizer))
(test-begin "tokenizer")
;;; Roundtrip invariant — the most critical test
(test-group "roundtrip"
(define (roundtrip-ok? input)
(let* ((tokens (tokenize input "test.scm"))
(result (string-concatenate (map token-text tokens))))
(string=? input result)))
(test-assert "empty input"
(roundtrip-ok? ""))
(test-assert "simple expression"
(roundtrip-ok? "(define x 42)"))
(test-assert "nested expressions"
(roundtrip-ok? "(define (foo x)\n (+ x 1))\n"))
(test-assert "string with escapes"
(roundtrip-ok? "(define s \"hello \\\"world\\\"\")"))
(test-assert "line comment"
(roundtrip-ok? ";; a comment\n(define x 1)\n"))
(test-assert "block comment"
(roundtrip-ok? "#| block\ncomment |#\n(define x 1)"))
(test-assert "nested block comment"
(roundtrip-ok? "#| outer #| inner |# outer |#"))
(test-assert "datum comment"
(roundtrip-ok? "#;(skip this) (keep this)"))
(test-assert "character literals"
(roundtrip-ok? "(list #\\space #\\newline #\\a #\\x41)"))
(test-assert "keywords"
(roundtrip-ok? "(foo #:bar #:baz)"))
(test-assert "booleans"
(roundtrip-ok? "(list #t #f #true #false)"))
(test-assert "vectors"
(roundtrip-ok? "#(1 2 3)"))
(test-assert "quasiquote and unquote"
(roundtrip-ok? "`(a ,b ,@c)"))
(test-assert "syntax shorthands"
(roundtrip-ok? "#'x #`x #,x #,@x"))
(test-assert "dot notation"
(roundtrip-ok? "(a . b)"))
(test-assert "numbers"
(roundtrip-ok? "(+ 1 2.5 -3 +4 1/3 #xff)"))
(test-assert "square brackets"
(roundtrip-ok? "(let ([x 1] [y 2]) (+ x y))"))
(test-assert "multiline with mixed content"
(roundtrip-ok? "(define-module (foo bar)
#:use-module (ice-9 format)
#:export (baz))
;;; Section header
(define (baz x)
;; body comment
(format #t \"value: ~a\\n\" x))
"))
;; Real-world file roundtrip
(test-assert "real guile source file"
(let ((text (call-with-input-file "refs/guile/module/ice-9/pretty-print.scm"
(lambda (port)
(let lp ((acc '()))
(let ((ch (read-char port)))
(if (eof-object? ch)
(list->string (reverse acc))
(lp (cons ch acc)))))))))
(roundtrip-ok? text))))
;;; Token type classification
(test-group "token-types"
(define (first-token-type input)
(token-type (car (tokenize input "test.scm"))))
(test-eq "symbol" 'symbol (first-token-type "foo"))
(test-eq "number" 'number (first-token-type "42"))
(test-eq "string" 'string (first-token-type "\"hello\""))
(test-eq "open-paren" 'open-paren (first-token-type "("))
(test-eq "close-paren" 'close-paren (first-token-type ")"))
(test-eq "boolean-true" 'boolean (first-token-type "#t"))
(test-eq "boolean-false" 'boolean (first-token-type "#f"))
(test-eq "keyword" 'keyword (first-token-type "#:foo"))
(test-eq "character" 'character (first-token-type "#\\a"))
(test-eq "line-comment" 'line-comment (first-token-type ";; hi"))
(test-eq "prefix-quote" 'prefix (first-token-type "'"))
(test-eq "prefix-quasiquote" 'prefix (first-token-type "`"))
(test-eq "dot" 'dot (first-token-type ". "))
(test-eq "newline" 'newline (first-token-type "\n")))
;;; Source location tracking
(test-group "source-locations"
(let ((tokens (tokenize "(define\n x\n 42)" "test.scm")))
(test-equal "first token line" 1 (token-line (car tokens)))
(test-equal "first token col" 0 (token-column (car tokens)))
;; Find 'x' token
(let ((x-tok (find (lambda (t) (and (eq? (token-type t) 'symbol)
(string=? (token-text t) "x")))
tokens)))
(test-assert "found x token" x-tok)
(test-equal "x on line 2" 2 (token-line x-tok))
(test-equal "x at column 2" 2 (token-column x-tok)))))
(test-end "tokenizer")