First iteration
813
docs/INSPIRATION.md
Normal file
# Inspiration: Existing Lisp Linters, Formatters & Static Analysers

Survey of reference tools in `./refs/` — what they do, how they work, and what
we can steal for a Guile linter/formatter.

---

## Table of Contents

| Tool | Ecosystem | Type | Language |
|------|-----------|------|----------|
| [Eastwood](#eastwood) | Clojure | Linter (bug-finder) | Clojure/JVM |
| [fmt](#fmt) | Racket | Formatter | Racket |
| [Kibit](#kibit) | Clojure | Linter (idiom suggester) | Clojure |
| [Mallet](#mallet) | Common Lisp | Linter + formatter + fixer | Common Lisp |
| [OCICL Lint](#ocicl-lint) | Common Lisp | Linter + fixer | Common Lisp |
| [racket-review](#racket-review) | Racket | Linter | Racket |
| [SBLint](#sblint) | Common Lisp (SBCL) | Compiler-driven linter | Common Lisp |

---

## Eastwood

**Repo:** `refs/eastwood/` — Clojure linter (v1.4.3, by Jonas Enlund)

### What it does

A **bug-finding linter** for Clojure. Focuses on detecting actual errors
(wrong arity, undefined vars, misplaced docstrings) rather than enforcing style.
Achieves high accuracy by using the same compilation infrastructure as the
Clojure compiler itself.

### How it works

```
File discovery (tools.namespace)
  → Topological sort by :require/:use deps
  → For each namespace:
      Parse → Macroexpand → AST (tools.analyzer.jvm) → eval
  → Run linter functions over AST nodes
  → Filter warnings by config
  → Report
```

Key: uses `tools.analyzer.jvm/analyze+eval` — it actually **compiles and
evaluates** source code to build an AST. This gives compiler-grade accuracy but
means it can only lint code that successfully loads, and any top-level side
effects in that code run during linting.

### Architecture

- **`lint.clj`** — Central coordinator: linter registry, namespace ordering,
  main analysis loop
- **`analyze_ns.clj`** — AST generation via tools.analyzer
- **`passes.clj`** — Custom analysis passes (reflection validation, def-name
  propagation)
- **`linters/*.clj`** — Individual linter implementations (~8 files)
- **`reporting_callbacks.clj`** — Output formatters (multimethod dispatch)
- **`util.clj`** — Config loading, AST walking, warning filtering

### Rules (25+)

| Category | Examples |
|----------|----------|
| Arity | `:wrong-arity` — function called with wrong arg count |
| Definitions | `:def-in-def`, `:redefd-vars`, `:misplaced-docstrings` |
| Unused | `:unused-private-vars`, `:unused-fn-args`, `:unused-locals`, `:unused-namespaces` |
| Suspicious | `:constant-test`, `:suspicious-expression`, `:suspicious-test` |
| Style | `:unlimited-use`, `:non-dynamic-earmuffs`, `:local-shadows-var` |
| Interop | `:reflection`, `:boxed-math`, `:performance` |
| Types | `:wrong-tag`, `:deprecations` |

### Configuration

Rules are suppressed via **Clojure code** (not YAML/JSON):

```clojure
(disable-warning
 {:linter :suspicious-expression
  :for-macro 'clojure.core/let
  :if-inside-macroexpansion-of #{'clojure.core/when-first}
  :within-depth 6
  :reason "False positive from when-first expansion"})
```

Built-in config files ship for `clojure.core`, contrib libs, and popular
third-party libraries. Users add their own via the `:config-files` option.

### What we can learn

- **Macroexpansion-aware suppression** — Can distinguish user code from
  macro-generated code; suppression rules can target specific macro expansions.
  Critical for any Lisp linter.
- **Topological namespace ordering** — Analyse dependencies before dependents.
  Relevant if we want cross-module analysis.
- **Linter registry pattern** — Each linter is a map `{:name :fn :enabled-by-default :url}`.
  Simple, extensible.
- **Warning filtering pipeline** — Raw warnings → handle result → remove ignored
  faults → remove excluded kinds → filter by config → final warnings. Clean
  composable chain.
- **Metadata preservation through AST transforms** — Custom `postwalk` that
  preserves metadata. Essential for accurate source locations.

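
The registry and filtering chain above can be sketched in a few lines. This is a minimal illustration in Python (used here only because it is compact to run; the real tool would be Scheme), not Eastwood's code: `Finding`, `Linter`, `register`, `run` and the `constant-test` rule body are all hypothetical names, and forms are assumed to arrive as `(line, nested-list)` pairs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Finding:
    linter: str
    line: int
    message: str


@dataclass(frozen=True)
class Linter:
    name: str
    fn: Callable[[list], Iterable[Finding]]
    enabled_by_default: bool = True


REGISTRY = {}  # linter name -> Linter (the "registry of maps" idea)


def register(linter):
    REGISTRY[linter.name] = linter


def constant_test(forms):
    # Hypothetical rule: flag (if #t ...) and (if #f ...).
    for line, form in forms:
        if isinstance(form, list) and form[:1] == ["if"] and form[1:2] in (["#t"], ["#f"]):
            yield Finding("constant-test", line, f"test is always {form[1]}")


register(Linter("constant-test", constant_test))


def run(forms, disabled=frozenset(), ignored_lines=frozenset()):
    # Raw warnings from every linter that is enabled by default.
    findings = [f for l in REGISTRY.values() if l.enabled_by_default for f in l.fn(forms)]
    # Each stage is a plain list -> list function, so stages compose freely.
    stages = [
        lambda fs: [f for f in fs if f.line not in ignored_lines],
        lambda fs: [f for f in fs if f.linter not in disabled],
    ]
    for stage in stages:
        findings = stage(findings)
    return findings
```
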
---

## fmt

**Repo:** `refs/fmt/` — Racket code formatter (v0.0.3, by Sorawee Porncharoenwase)

### What it does

An **extensible code formatter** for Racket. Reads source, reformats according
to style conventions using **cost-based optimal layout selection**. Supports
custom formatting rules via pluggable formatter maps.

### How it works

Clean **4-stage pipeline**:

```
Source string
  → [1] Tokenize (syntax-color/module-lexer)
  → [2] Read/Parse → tree of node/atom/wrapper structs
  → [3] Realign (fix sexp-comments, quotes)
  → [4] Pretty-print (pretty-expressive library, cost-based)
  → Formatted string
```

The pretty-printer is the `pretty-expressive` library, a cost-based printer in
the Wadler/Leijen combinator tradition. Unlike Wadler's greedy algorithm, it
evaluates multiple layout alternatives and selects the one with the lowest cost
vector.

### Architecture

- **`tokenize.rkt`** (72 lines) — Lexer wrapper around Racket's `syntax-color`
- **`read.rkt`** (135 lines) — Token stream → tree IR; preserves comments
- **`realign.rkt`** (75 lines) — Post-process sexp-comments and quote prefixes
- **`conventions.rkt`** (640 lines) — **All formatting rules** for 100+ Racket forms
- **`core.rkt`** (167 lines) — `define-pretty` DSL, AST-to-document conversion
- **`main.rkt`** (115 lines) — Public API, cost factory, entry point
- **`params.rkt`** (38 lines) — Configuration parameters (width, indent, etc.)
- **`raco.rkt`** (148 lines) — CLI interface (`raco fmt`)

### Formatting rules (100+)

Rules are organised by form type in `conventions.rkt`:

| Category | Forms |
|----------|-------|
| Control flow | `if`, `when`, `unless`, `cond`, `case-lambda` |
| Definitions | `define`, `define-syntax`, `lambda`, `define/contract` |
| Bindings | `let`, `let*`, `letrec`, `parameterize`, `with-handlers` |
| Loops | `for`, `for/list`, `for/fold`, `for/hash` (15+ variants) |
| Modules | `module`, `begin`, `class`, `interface` |
| Macros | `syntax-rules`, `match`, `syntax-parse`, `syntax-case` |
| Imports | `require`, `provide` — vertically stacked |

### Configuration

**Pluggable formatter maps** — a function `(string? → procedure?)`:

```racket
;; .fmt.rkt
(define (the-formatter-map s)
  (case s
    [("my-form") (format-uniform-body/helper 4)]
    [else #f]))  ; delegate to standard
```

Formatter maps compose via `compose-formatter-map` (chain of responsibility).

**Runtime parameters:**

| Parameter | Default | Purpose |
|-----------|---------|---------|
| `current-width` | 102 | Page width limit |
| `current-limit` | 120 | Computation width limit |
| `current-max-blank-lines` | 1 | Max consecutive blank lines |
| `current-indent` | 0 | Extra indentation |

### Cost-based layout selection

The pretty-printer evaluates layout alternatives using a **3-dimensional cost
vector** `[badness, height, characters]`:

- **Badness** — Quadratic penalty for exceeding page width
- **Height** — Number of lines used
- **Characters** — Total character count (tiebreaker)

This means the formatter provably selects the **optimal layout** with respect
to this cost, not just the first one that fits within the configured width.

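
To make the cost vector concrete, here is a minimal sketch in Python. It is not how `pretty-expressive` works internally (the library does not enumerate finished strings), and it assumes the vector is compared lexicographically, which is what the "tiebreaker" description suggests. `cost` and `best_layout` are hypothetical names.

```python
def cost(layout, width):
    """Return (badness, height, characters) for one candidate layout."""
    lines = layout.split("\n")
    # Quadratic penalty for every column past the page width.
    badness = sum(max(0, len(line) - width) ** 2 for line in lines)
    return (badness, len(lines), sum(len(line) for line in lines))


def best_layout(candidates, width):
    # Python tuples compare lexicographically: badness, then height, then size.
    return min(candidates, key=lambda c: cost(c, width))


one_line = "(define (f x) (+ x 1))"
two_lines = "(define (f x)\n  (+ x 1))"
```

With a wide page the single-line form wins on height; with a narrow page its overflow penalty makes the two-line form cheaper, even though that form is taller.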
### What we can learn

- **Cost-based layout is the gold standard** for formatter quality. Worth
  investing in an optimal pretty-printer (such as `pretty-expressive`) rather
  than ad-hoc heuristics.
- **Staged pipeline** (tokenize → parse → realign → pretty-print) is clean,
  testable, and easy to reason about. Each stage has well-defined I/O.
- **Form-specific formatting rules** (`define-pretty` DSL) — each special form
  gets a dedicated formatter. Extensible via user-provided maps.
- **Comment preservation as metadata** — Comments are attached to AST nodes, not
  discarded. Essential for a practical formatter.
- **Pattern-based extraction** — `match/extract` identifies which elements can
  stay inline vs. must be on separate lines. Smart structural analysis.
- **Memoisation via weak hash tables** — Performance optimisation for AST
  traversal without memory leaks.
- **Config file convention** — `.fmt.rkt` in project root, auto-discovered. We
  should do the same (for example `.gulie.scm`).

---

## Kibit

**Repo:** `refs/kibit/` — Clojure idiom suggester (v0.1.11, by Jonas Enlund)

### What it does

A **static code analyser** that identifies non-idiomatic Clojure code and
suggests more idiomatic replacements. Example: `(if x y nil)` → `(when x y)`.
Supports auto-replacement via the `--replace` flag.

**Status:** Maintenance mode. Authors recommend **Splint** as successor
(faster, more extensible).

### How it works

```
Source file
  → Parse with edamame (side-effect-free reader)
  → Extract S-expressions
  → Tree walk (depth-first via clojure.walk/prewalk)
  → Match each node against rules (core.logic unification)
  → Simplify (iterative rewriting until fixpoint)
  → Report or Replace (via rewrite-clj zippers)
```

The key insight: rules are expressed as **logic programming patterns** using
`clojure.core.logic`. Pattern variables (`?x`, `?y`) unify against arbitrary
subexpressions.

### Architecture

- **`core.clj`** (33 lines) — Core simplification logic (tiny!)
- **`check.clj`** (204 lines) — Public API for checking expressions/files
- **`check/reader.clj`** (189 lines) — Source parsing with alias tracking
- **`rules.clj`** (39 lines) — Rule aggregation and indexing
- **`rules/*.clj`** (~153 lines) — Rule definitions by category
- **`reporters.clj`** (59 lines) — Output formatters (text, markdown)
- **`replace.clj`** (134 lines) — Auto-replacement via rewrite-clj zippers
- **`driver.clj`** (144 lines) — CLI entry point, file discovery

Total: ~1,105 lines. Remarkably compact.

### Rules (~60)

Rules are defined via the `defrules` macro:

```clojure
(defrules rules
  ;; Control structures
  [(if ?x ?y nil) (when ?x ?y)]
  [(if ?x nil ?y) (when-not ?x ?y)]
  [(if (not ?x) ?y ?z) (if-not ?x ?y ?z)]
  [(do ?x) ?x]

  ;; Arithmetic
  [(+ ?x 1) (inc ?x)]
  [(- ?x 1) (dec ?x)]

  ;; Collections
  [(not (empty? ?x)) (seq ?x)]
  [(into [] ?coll) (vec ?coll)]

  ;; Equality
  [(= ?x nil) (nil? ?x)]
  [(= 0 ?x) (zero? ?x)])
```

Categories: **control structures**, **arithmetic**, **collections**,
**equality**, **miscellaneous** (string ops, Java interop, threading macros).

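
A rule table like the one above needs surprisingly little machinery to interpret. The sketch below is a simplification in Python, not Kibit's implementation: it does one-way pattern matching rather than full `core.logic` unification, represents S-expressions as nested lists of strings, and only handles fixed-length patterns. `match_pattern`, `substitute`, `rewrite_once` and `simplify` are hypothetical names, and the `when-not` rule is added for illustration.

```python
RULES = [
    (["if", "?x", "?y", "nil"], ["when", "?x", "?y"]),
    (["when", ["not", "?x"], "?y"], ["when-not", "?x", "?y"]),
    (["+", "?x", "1"], ["inc", "?x"]),
    (["do", "?x"], "?x"),
]


def is_var(p):
    return isinstance(p, str) and p.startswith("?")


def match_pattern(pattern, form, env):
    """Return bindings extending env if form matches pattern, else None."""
    if is_var(pattern):
        if pattern in env:  # a repeated variable must bind the same subform
            return env if env[pattern] == form else None
        return {**env, pattern: form}
    if isinstance(pattern, list):
        if not isinstance(form, list) or len(pattern) != len(form):
            return None
        for p, f in zip(pattern, form):
            env = match_pattern(p, f, env)
            if env is None:
                return None
        return env
    return env if pattern == form else None


def substitute(template, env):
    if is_var(template):
        return env[template]
    if isinstance(template, list):
        return [substitute(t, env) for t in template]
    return template


def rewrite_once(form):
    # Rewrite children first, then try each rule on the resulting node.
    if isinstance(form, list):
        form = [rewrite_once(f) for f in form]
    for pattern, replacement in RULES:
        env = match_pattern(pattern, form, {})
        if env is not None:
            return substitute(replacement, env)
    return form


def simplify(form):
    # Iterate to a fixpoint: a rewrite can expose a new match one level up.
    while True:
        new = rewrite_once(form)
        if new == form:
            return form
        form = new
```

For example, `(if (not a) (do (+ n 1)) nil)` first becomes `(when (not a) (inc n))`, and only on the second pass does the `when-not` rule apply, which is why the fixpoint loop matters.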
### Auto-replacement

Uses **rewrite-clj zippers** — functional tree navigation that preserves
whitespace, comments, and formatting when applying replacements. Navigate to the
target node, swap it, regenerate text.

### What we can learn

- **Logic programming for pattern matching** is beautifully expressive for
  "suggest X instead of Y" rules. `core.logic` unification makes patterns
  concise and bidirectional. We could use Guile's `(ice-9 match)` or even a
  miniKanren implementation.
- **Rule-as-data pattern** — Rules are just vectors `[pattern replacement]`.
  Easy to add, easy to test, easy for users to contribute.
- **Iterative rewriting to fixpoint** — Apply rules until nothing changes.
  Catches nested patterns that only become apparent after an inner rewrite.
- **Zipper-based source rewriting** — Preserves formatting/comments when
  applying fixes. Critical for auto-fix functionality.
- **Side-effect-free parsing** — Using edamame instead of `clojure.core/read`
  avoids read-time evaluation (`#=`). Important for security and for analysing
  code with unknown dependencies.
- **Guard-based filtering** — Composable predicates that decide whether to
  report a suggestion. Users can plug in custom guards.
- **Two resolution modes** — `:toplevel` (entire defn) vs `:subform` (individual
  expressions). Different granularity for different use cases.

---

## Mallet

**Repo:** `refs/mallet/` — Common Lisp linter + formatter + fixer (~15,800 LOC)

### What it does

A **production-grade linter** for Common Lisp with 40+ rules across 7
categories, auto-fixing, a powerful configuration system (presets with
inheritance), and multiple suppression mechanisms. Targets SBCL.

### How it works

**Three-phase pipeline:**

```
File content
  → [1] Tokenize (hand-written tokenizer, preserves all tokens incl. comments)
  → [2] Parse (Eclector reader with parse-result protocol → forms with precise positions)
  → [3] Rule checking (text rules, token rules, form rules)
  → Suppression filtering
  → Auto-fix & formatting
  → Report
```

**Critical design decision:** Symbols are stored as **strings**, not interned.
This means the parser never needs to resolve packages — safe to analyse code
with unknown dependencies.

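
The "symbols as strings, every node positioned" idea can be shown with a toy reader. This is a sketch in Python under heavy assumptions, not Mallet's or Eclector's design: it handles only parentheses, atoms and `;` comments (no strings, characters or reader macros), and unlike Mallet's tokenizer it drops comments. `Node` and `read_forms` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str      # "list" or "atom"
    text: str      # raw token text for atoms (never interned); "" for lists
    line: int      # 1-based
    column: int    # 0-based
    children: list = field(default_factory=list)


def read_forms(source):
    pos, line, col = 0, 1, 0
    stack = [Node("list", "", 0, 0)]  # synthetic root collecting top-level forms
    while pos < len(source):
        ch = source[pos]
        if ch == "\n":
            pos, line, col = pos + 1, line + 1, 0
        elif ch.isspace():
            pos, col = pos + 1, col + 1
        elif ch == ";":  # line comment: skip to end of line
            while pos < len(source) and source[pos] != "\n":
                pos, col = pos + 1, col + 1
        elif ch == "(":
            node = Node("list", "", line, col)
            stack[-1].children.append(node)
            stack.append(node)
            pos, col = pos + 1, col + 1
        elif ch == ")":
            if len(stack) == 1:
                raise SyntaxError(f"{line}:{col}: unexpected ')'")
            stack.pop()
            pos, col = pos + 1, col + 1
        else:  # atom: keep the text exactly as written, e.g. "foo:bar"
            start = pos
            while pos < len(source) and not source[pos].isspace() and source[pos] not in "();":
                pos += 1
            stack[-1].children.append(Node("atom", source[start:pos], line, col))
            col += pos - start
    if len(stack) != 1:
        raise SyntaxError("unbalanced '('")
    return stack[0].children
```

Because `foo:bar` stays a plain string, the reader never asks whether a package `foo` exists, which is the property the design decision above is after.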
### Architecture

| Module | Lines | Purpose |
|--------|-------|---------|
| `main.lisp` | ~600 | CLI parsing, entry point |
| `engine.lisp` | ~900 | Linting orchestration, suppression filtering |
| `config.lisp` | ~1,200 | Config files, presets, path-specific overrides |
| `parser/reader.lisp` | ~800 | Eclector integration, position tracking |
| `parser/tokenizer.lisp` | ~200 | Hand-written tokenizer |
| `suppression.lisp` | ~600 | Suppression state management |
| `formatter.lisp` | ~400 | Output formatters (text, JSON, line) |
| `fixer.lisp` | ~300 | Auto-fix application |
| `rules/` | ~5,500 | 40+ individual rule implementations |

### Rules (40+)

| Category | Rules | Examples |
|----------|-------|----------|
| Correctness | 2 | `ecase` with `otherwise`, missing `otherwise` |
| Suspicious | 5 | Runtime `eval`, symbol interning, `ignore-errors` |
| Practice | 6 | Avoid `:use` in `defpackage`, one package per file |
| Cleanliness | 4 | Unused variables, unused loop vars, unused imports |
| Style | 5 | `when`/`unless` vs `if` without else, needless `let*` |
| Format | 6 | Line length, trailing whitespace, tabs, blank lines |
| Metrics | 3 | Function length, cyclomatic complexity, comment ratio |
| ASDF | 8 | Component strings, redundant prefixes, secondary systems |
| Naming | 4 | `*special*` and `+constant+` conventions |
| Documentation | 4 | Missing docstrings (functions, packages, variables) |

Rules are **classes** inheriting from a base `rule` class with generic methods:

```lisp
(defclass if-without-else-rule (base:rule)
  ()
  (:default-initargs
   :name :missing-else
   :severity :warning
   :category :style
   :type :form))

(defmethod base:check-form ((rule if-without-else-rule) form file)
  ...)
```

### Configuration system

**Layered presets with inheritance:**

```lisp
(:mallet-config
 (:extends :strict)
 (:ignore "**/vendor/**")
 (:enable :cyclomatic-complexity :max 15)
 (:disable :function-length)
 (:set-severity :metrics :info)
 (:for-paths ("tests")
  (:enable :line-length :max 120)
  (:disable :unused-variables)))
```

Built-in presets: `:default`, `:strict`, `:all`, `:none`.

Precedence: CLI flags > config file > preset inheritance > built-in defaults.

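
The layering can be modelled as a fold over rule maps. The sketch below is an illustration in Python, not Mallet's resolver: the preset contents, the `tests/*` glob and the helper names (`preset_rules`, `apply_directives`, `rules_for`) are all assumptions, and the CLI layer and `:set-severity` are left out.

```python
from fnmatch import fnmatch

# Hypothetical presets; names and rule options are illustrative only.
PRESETS = {
    "default": {"extends": None,
                "rules": {"line-length": {"max": 100}, "unused-variables": {}}},
    "strict": {"extends": "default",
               "rules": {"cyclomatic-complexity": {"max": 10}}},
}

# Mirrors the shape of the config example above.
CONFIG = {
    "extends": "strict",
    "directives": [("enable", "cyclomatic-complexity", {"max": 15}),
                   ("disable", "function-length")],
    "for_paths": [("tests/*", [("enable", "line-length", {"max": 120}),
                               ("disable", "unused-variables")])],
}


def preset_rules(name):
    """Resolve a preset by merging it over its parent chain."""
    preset = PRESETS[name]
    base = preset_rules(preset["extends"]) if preset["extends"] else {}
    return {**base, **preset["rules"]}


def apply_directives(rules, directives):
    rules = dict(rules)  # never mutate the inherited map
    for op, rule, *opts in directives:
        if op == "enable":
            rules[rule] = opts[0] if opts else {}
        elif op == "disable":
            rules.pop(rule, None)
    return rules


def rules_for(path, config):
    # Lowest precedence first: preset chain, then file-level directives,
    # then any path-specific overrides whose glob matches.
    rules = apply_directives(preset_rules(config["extends"]), config["directives"])
    for pattern, directives in config.get("for_paths", []):
        if fnmatch(path, pattern):
            rules = apply_directives(rules, directives)
    return rules
```

Each layer only ever overrides the one beneath it, so the precedence order falls out of the order in which the maps are merged.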
### Suppression mechanisms (3 levels)

1. **Declarations** — `#+mallet (declaim (mallet:suppress-next :rule-name))`
2. **Inline comments** — `; mallet:suppress rule-name`
3. **Region-based** — `; mallet:disable rule-name` / `; mallet:enable rule-name`

On top of these, **stale suppression detection** warns when a suppression
doesn't match any violation.

### Auto-fix

Fixes are collected, sorted bottom-to-top (so that applying one fix never shifts
the line numbers of the fixes still to come), and applied in a single pass. Fix
types: `:replace-line`, `:delete-range`, `:delete-lines`, `:replace-form`.

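
The bottom-to-top trick is easy to demonstrate. This is a minimal sketch in Python, assuming fixes are non-overlapping line ranges; the tuple shape and the name `apply_fixes` are hypothetical and do not correspond to Mallet's fix types.

```python
def apply_fixes(text, fixes):
    """Apply (start_line, end_line, replacement_lines) fixes, 1-based inclusive.

    Fixes are applied from the bottom of the file upwards, so a fix that
    changes the number of lines never moves a fix that is still pending.
    """
    lines = text.split("\n")
    for start, end, replacement in sorted(fixes, key=lambda f: f[0], reverse=True):
        lines[start - 1:end] = replacement
    return "\n".join(lines)
```

Here a fix that expands line 1 into two lines and a fix that deletes lines 3 to 4 can both use line numbers from the original file.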
### What we can learn

- **Symbols as strings** is a crucial insight for Lisp linters. Avoids
  package/module resolution entirely. We should do the same for Guile — parse
  symbols without interning them.
- **Eclector-style parse-result protocol** — Every sub-expression gets precise
  line/column info. Invest in this early; it's the foundation of accurate error
  reporting.
- **Three rule types** (text, token, form) — Clean separation. Text rules don't
  need parsing, token rules don't need a full AST, form rules get the full tree.
  Efficient and composable.
- **Preset inheritance with path-specific overrides** — Powerful configuration
  that scales from solo projects to monorepos. `:for-paths` is particularly
  useful (different rules for `src/` vs `tests/`).
- **Multiple suppression mechanisms** — Comment-based, declaration-based,
  region-based. Users need all three for real-world use.
- **Stale suppression detection** — Prevents suppression comments from
  accumulating after the underlying issue is fixed. Brilliant.
- **Rule base-class pattern** — Base class + generic methods scales cleanly to
  40+ rules. Each rule is self-contained with its own severity, category, and
  check method.
- **Bottom-to-top fix application** — Simple trick that avoids line number
  invalidation when applying multiple fixes to the same file.

---

## OCICL Lint

**Repo:** `refs/ocicl/` — Common Lisp linter (part of the OCICL package manager)

### What it does

A **129-rule linter with auto-fix** for Common Lisp, integrated into the OCICL
package manager as a subcommand (`ocicl lint`). Supports dry-run mode,
per-line suppression, and `.ocicl-lint.conf` configuration.

### How it works

**Three-pass analysis:**

```
File content
  → [Pass 1] Line-based rules (text-level: whitespace, tabs, line length)
  → [Pass 2] AST-based rules (via rewrite-cl zippers: naming, bindings, packages)
  → [Pass 3] Single-pass visitor rules (pattern matching: 50+ checks in one traversal)
  → Suppression filtering (per-line ; lint:suppress comments)
  → Auto-fix (via fixer registry)
  → Report
```

### Architecture

```
lint/
├── linter.lisp — Main orchestrator, issue aggregation, output formatting
├── config.lisp — .ocicl-lint.conf parsing
├── parsing.lisp — rewrite-cl wrapper (zipper API)
├── fixer.lisp — Auto-fix infrastructure with RCS/backup support
├── main.lisp — CLI entry point
├── rules/
│   ├── line-based.lisp — Text-level rules (9 rules)
│   ├── ast.lisp — AST-based rules (naming, lambda lists, bindings)
│   └── single-pass.lisp — Pattern matching rules (50+ in one walk)
└── fixes/
    ├── whitespace.lisp — Formatting fixes
    └── style.lisp — Style rule fixes
```

### Rules (129)

| Category | Count | Examples |
|----------|-------|---------|
| Formatting | 9 | Trailing whitespace, tabs, line length, blank lines |
| File structure | 3 | SPDX headers, package declarations, reader errors |
| Naming | 6 | Underscores, `*special*` style, `+constant+` style, vague names |
| Boolean/conditionals | 18 | `(IF test T NIL)` → `test`, `(WHEN (NOT x) ...)` → `(UNLESS x ...)` |
| Logic simplification | 12 | Flatten nested `AND`/`OR`, redundant conditions |
| Arithmetic | 4 | `(+ x 1)` → `(1+ x)`, `(= x 0)` → `(zerop x)` |
| List operations | 13 | `FIRST`/`REST` vs `CAR`/`CDR`, `(cons x nil)` → `(list x)` |
| Comparison | 5 | `EQL` vs `EQ`, string equality, membership testing |
| Sequence operations | 6 | `-IF-NOT` variants, `ASSOC` patterns |
| Advanced/safety | 26 | Library suggestions, destructive ops on constants |

### Configuration

INI-style `.ocicl-lint.conf`:

```ini
max-line-length = 180
suppress-rules = rule1, rule2, rule3
suggest-libraries = alexandria, uiop, serapeum
```

Per-line suppression:

```lisp
(some-code)  ; lint:suppress rule-name1 rule-name2
(other-code) ; lint:suppress  ;; suppress ALL rules on this line
```

### Fixer registry

```lisp
(register-fixer "rule-name" #'fixer-function)
```

Fixers are decoupled from rule detection. Each fixer takes `(content issue)` and
returns modified content or NIL. Supports RCS backup before modification.

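
The same contract, `(content issue)` in and new content or nothing out, can be sketched as follows. This is an illustration in Python rather than OCICL's code: the issue dictionary shape, the `trailing-whitespace` fixer body and the `fix_all` driver (including its `dry_run` flag) are assumptions.

```python
FIXERS = {}  # rule name -> fixer function


def register_fixer(rule_name, fixer):
    FIXERS[rule_name] = fixer


def fix_trailing_whitespace(content, issue):
    """Return content with the issue's line stripped, or None if nothing changed."""
    lines = content.split("\n")
    index = issue["line"] - 1
    stripped = lines[index].rstrip()
    if stripped == lines[index]:
        return None
    lines[index] = stripped
    return "\n".join(lines)


register_fixer("trailing-whitespace", fix_trailing_whitespace)


def fix_all(content, issues, dry_run=False):
    """Run the registered fixer for each issue; rules without a fixer are skipped."""
    applied = []
    for issue in issues:
        fixer = FIXERS.get(issue["rule"])
        if fixer is None:
            continue
        new_content = fixer(content, issue)
        if new_content is not None:
            applied.append(issue["rule"])
            if not dry_run:
                content = new_content
    return content, applied
```

Because detection only produces issues and the registry is looked up by rule name, a rule can ship without a fixer and gain one later without being touched.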
### What we can learn

- **Single-pass visitor for pattern rules** — 50+ pattern checks in one tree
  traversal. Much faster than running each rule separately. Good model for
  performance-sensitive linting.
- **Quote awareness** — Detects quoted contexts (`'x`, `quote`, backtick) to
  avoid false positives inside macro templates. We'll need the same for Guile.
- **Fixer registry pattern** — Decouples detection from fixing. Easy to add
  auto-fix for a rule without touching the rule itself.
- **Library suggestion rules** — "You could use `(alexandria:when-let ...)`
  instead of this pattern." Interesting category that could work for Guile
  (SRFI suggestions, etc.).
- **Three-pass architecture** — Line-based first (fastest, no parsing needed),
  then AST, then pattern matching. Each pass adds cost; skip what you don't need.

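
A single-pass visitor with quote awareness can be sketched like this. It is a simplification in Python, not OCICL's visitor: forms are nested lists of strings, checks are indexed by head symbol, and everything under `quote` or `quasiquote` is skipped wholesale (a real implementation would still descend into unquoted parts of a quasiquote). All names are hypothetical.

```python
def check_if_t_nil(form):
    # (if test t nil) is just test
    if len(form) == 4 and form[2:] == ["t", "nil"]:
        return "if-t-nil"


def check_plus_one(form):
    # (+ x 1) or (+ 1 x) could be (1+ x)
    if len(form) == 3 and "1" in form[1:]:
        return "plus-one"


# Checks are indexed by head symbol, so each node only runs relevant checks.
CHECKS = {"if": [check_if_t_nil], "+": [check_plus_one]}


def visit(form, issues=None):
    """Walk the tree once, running every applicable check at each node."""
    issues = [] if issues is None else issues
    if not isinstance(form, list) or not form:
        return issues
    head = form[0]
    if head in ("quote", "quasiquote"):
        return issues  # quoted data is not code: do not lint inside it
    if isinstance(head, str):
        for check in CHECKS.get(head, []):
            rule = check(form)
            if rule:
                issues.append(rule)
    for child in form:
        visit(child, issues)
    return issues
```
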
---

## racket-review

**Repo:** `refs/racket-review/` — Racket linter (v0.2, by Bogdan Popa)

### What it does

A **surface-level linter** for Racket modules. Intentionally does NOT expand
macros — analyses syntax only, optimised for **speed**. Designed for tight
editor integration (ships a Flycheck checker for Emacs).

### How it works

```
File → read-syntax (Racket's built-in reader)
  → Validate as module form (#lang)
  → Walk syntax tree via syntax-parse
  → Track scopes, bindings, provides, usages
  → Report problems
```

The entire rule system is built on Racket's `syntax/parse` — pattern matching
on syntax objects with guard conditions and side effects.

### Architecture

Remarkably compact:

| File | Lines | Purpose |
|------|-------|---------|
| `lint.rkt` | 1,130 | **All linting rules** + semantic tracking |
| `problem.rkt` | 26 | Problem data structure |
| `cli.rkt` | 25 | CLI interface |
| `ext.rkt` | 59 | Extension mechanism |

### Semantic tracking

Maintains multiple **parameter-based state machines**:

- **Scope stack** — Hierarchical scope with parent links, binding hash at each level
- **Binding info** — Per-identifier: syntax object, usage count, check flag,
  related identifiers
- **Provide tracking** — What's explicitly `provide`d vs `all-defined-out`
- **Punted bindings** — Forward references resolved when definition is encountered
- **Savepoints** — Save/restore state for tentative matching in complex patterns

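
The scope stack and punted bindings can be sketched together. This is a deliberately simplified illustration in Python, not racket-review's data structures: bindings store only a usage count, punted uses are kept in one flat table regardless of the scope they occurred in, and `Scope`, `Tracker` and their methods are hypothetical names.

```python
class Scope:
    def __init__(self, parent=None):
        self.parent = parent
        self.bindings = {}  # name -> usage count

    def lookup(self, name):
        """Return the nearest enclosing scope that binds name, or None."""
        scope = self
        while scope is not None:
            if name in scope.bindings:
                return scope
            scope = scope.parent
        return None


class Tracker:
    def __init__(self):
        self.scope = Scope()
        self.punted = {}    # name -> uses seen before any definition
        self.problems = []

    def push_scope(self):
        self.scope = Scope(self.scope)

    def pop_scope(self):
        for name, uses in self.scope.bindings.items():
            if uses == 0:
                self.problems.append(f"{name} is never used")
        self.scope = self.scope.parent

    def define(self, name):
        if name in self.scope.bindings:
            self.problems.append(f"{name} is already defined")
            return
        # A definition adopts any forward references that were punted earlier.
        self.scope.bindings[name] = self.punted.pop(name, 0)

    def use(self, name):
        owner = self.scope.lookup(name)
        if owner is not None:
            owner.bindings[name] += 1
        else:
            self.punted[name] = self.punted.get(name, 0) + 1

    def finish(self):
        self.pop_scope()
        self.problems += [f"{name} is not defined" for name in self.punted]
        return self.problems
```

A use that precedes its definition is parked in `punted` and credited to the binding once the definition shows up, so forward references cost no second pass.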
### Rules

**Errors (23 patterns):**
- Identifier already defined in same scope
- `if` missing else branch
- `let`/`for` missing body
- `case` clauses not quoted literals
- Wrong `match` fallthrough pattern (should be `_`, not `else`)
- Provided but not defined

**Warnings (17+ patterns):**
- Identifier never used
- Brackets: `let` bindings should use `[]`, not `()`
- Requires not sorted (for-syntax first, then alphabetical)
- `cond` without else clause
- Nested `if` (flatten to `cond`)
- `racket/contract` → use `racket/contract/base`

### Suppression

```racket
#|review: ignore|#    ;; Ignore entire file
;; noqa               ;; Ignore this line
;; review: ignore     ;; Ignore this line
```

### Extension mechanism

Plugins register via Racket's package system:

```racket
(define review-exts
  '((module-path predicate-proc lint-proc)))
```

Extensions receive a `current-reviewer` parameter with API:
`recur`, `track-error!`, `track-warning!`, `track-binding!`, `push-scope!`,
`pop-scope!`, `save!`, `undo!`.

### What we can learn

- **Surface-level analysis is fast and useful** — No macro expansion means
  instant feedback. Catches the majority of real mistakes. Good default for
  editor integration; deeper analysis can be opt-in.
- **syntax-parse as rule DSL** — Pattern matching on syntax objects is a natural
  fit for Lisp linters. Guile has `syntax-case` and `match` which serve a
  similar role.
- **Scope tracking with punted bindings** — Handles forward references in a
  single pass. Elegant solution for `letrec`-style bindings and mutual recursion.
- **Savepoints for tentative matching** — Save/restore state when the parser
  enters a complex branch. If the branch fails, roll back. Useful for `cond`,
  `match`, etc.
- **Plugin API via reviewer parameter** — Extensions get a well-defined API
  surface. Clean contract between core and plugins.
- **Snapshot-based testing** — 134 test files with `.rkt`/`.rkt.out` pairs.
  Lint a file, compare output to expected. Simple, maintainable, high coverage.
- **Bracket style enforcement** — Racket uses `[]` for bindings, `()` for
  application. Guile's reader accepts `[]` as plain parentheses but has no such
  convention, so we could enforce consistent bracket usage or other parenthesis
  conventions.

---

## SBLint

**Repo:** `refs/sblint/` — SBCL compiler-driven linter (~650 LOC)

### What it does

A **compiler-assisted linter** for Common Lisp. Doesn't implement its own rules —
instead, it **compiles code through SBCL** and surfaces all compiler diagnostics
(errors, warnings, style notes) with proper file locations.

### How it works

```
Source code
  → Resolve ASDF dependencies (topological sort)
  → Load dependencies via Quicklisp
  → Compile project via SBCL (handler-bind captures conditions)
  → Extract file/position from compiler internals (Swank protocol)
  → Convert byte offset → line:column
  → Deduplicate and report
```

No custom parser. No AST. Just the compiler.

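
The offset conversion step is small enough to show in full. This is a sketch in Python, not SBLint's `file-position.lisp`: it works on character offsets in an already-decoded string (a byte offset would first need mapping through the file's encoding), and `line_starts` and `offset_to_line_col` are hypothetical names.

```python
from bisect import bisect_right


def line_starts(text):
    """Return the offset at which each line begins (line 1 starts at 0)."""
    starts = [0]
    for index, ch in enumerate(text):
        if ch == "\n":
            starts.append(index + 1)
    return starts


def offset_to_line_col(starts, offset):
    """Map a 0-based offset to a 1-based (line, column) pair."""
    line = bisect_right(starts, offset)  # number of line starts <= offset
    return line, offset - starts[line - 1] + 1
```

Building `starts` once per file and binary-searching it keeps each lookup logarithmic, which matters when one compilation produces many diagnostics for the same file.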
### Architecture

| File | Lines | Purpose |
|------|-------|---------|
| `run-lint.lisp` | 277 | Core logic: lint file/system/directory |
| `compiler-aux.lisp` | 33 | SBCL introspection bridge |
| `asdf.lisp` | 153 | Dependency resolution graph |
| `file-position.lisp` | 18 | Byte offset → line:column conversion |
| `quicklisp.lisp` | 41 | Auto-install missing dependencies |
| `sblint.ros` | — | CLI entry point (Roswell script) |

### What it catches

Whatever SBCL catches:
- Undefined variables and functions
- Type mismatches (with SBCL's type inference)
- Style warnings (ANSI compliance, naming)
- Reader/syntax errors
- Dead code paths
- Unused declarations

Filters out: redefinition warnings, Quicklisp dependency warnings, SBCL
contrib warnings.

### What we can learn

- **Leverage the host compiler** — Guile itself has `compile` and can produce
  warnings. We should capture Guile's own compiler diagnostics (undefined
  variables, unused imports, etc.) as a baseline — it's "free" accuracy.
- **Condition-based error collection** — CL's condition system (≈ Guile's
  exception/handler system) lets you record warnings without stopping
  compilation: a `handler-bind` handler runs without unwinding the stack, so the
  compiler can carry on afterwards. Guile's `with-exception-handler` offers the
  same for continuable exceptions, although Guile's compiler prints its warnings
  to `(current-warning-port)`, so we would likely need to capture that port too.
- **Dependency-aware compilation** — Load dependencies first, then compile
  project. Catches "symbol not found" errors that surface-level analysis misses.
- **Deduplication** — Multiple compilation passes can report the same issue.
  Hash table dedup is simple and effective.
- **Minimal is viable** — 650 LOC total. A compiler-driven linter layer could
  be our first deliverable, augmented with custom rules later.

---

## Cross-cutting themes

### Parsing strategies

| Strategy | Used by | Pros | Cons |
|----------|---------|------|------|
| Host compiler | SBLint, Eastwood | Maximum accuracy, type checking | Requires loading code, slow |
| Custom reader with positions | Mallet, fmt | Full control, no side effects | Must maintain parser |
| Language's built-in reader | racket-review | Free, well-tested | May lack position info |
| Side-effect-free reader lib | Kibit (edamame) | Safe, preserves metadata | External dependency |
| Zipper-based AST | OCICL (rewrite-cl) | Preserves formatting for fixes | Complex API |

**For Guile:** We should explore whether `(ice-9 read)` or Guile's reader
provides sufficient source location info. If not, a custom reader (or a reader
wrapper that annotates with positions) is needed. Guile's `read-syntax` (if
available) or source properties on read forms could be the answer.

### Rule definition patterns
|
||||
|
||||
| Pattern | Used by | Character |
|
||||
|---------|---------|-----------|
|
||||
| Logic programming (unification) | Kibit | Elegant, concise; slow |
|
||||
| OOP classes + generic methods | Mallet | Scales well, self-contained rules |
|
||||
| Registry maps | Eastwood | Simple, data-driven |
|
||||
| Syntax-parse patterns | racket-review, fmt | Natural for Lisps |
|
||||
| Single-pass visitor | OCICL | High performance |
|
||||
| Compiler conditions | SBLint | Zero-effort, limited scope |
|
||||
|
||||
**For Guile:** A combination seems right — `match`/`syntax-case` patterns for
|
||||
the rule DSL (natural in Scheme), with a registry for rule metadata (name,
|
||||
severity, category, enabled-by-default).
|
||||
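To make the `match` half of that combination concrete, here is a small sketch of Kibit-style idiom rules written with `(ice-9 match)`. The procedure name `suggest-idiom` and the two rewrites are illustrative assumptions, not a proposed rule set.

```scheme
(use-modules (ice-9 match))

;; Sketch of pattern -> replacement idiom rules using (ice-9 match).
;; `suggest-idiom` and both rewrites are illustrative only.
(define (suggest-idiom form)
  "Return a suggested replacement for FORM, or #f if no rule applies."
  (match form
    ;; (if test then #f) can be written (and test then)
    (('if test then #f) `(and ,test ,then))
    ;; (+ x 1) and (+ 1 x) can be written (1+ x)
    ((or ('+ x 1) ('+ 1 x)) `(1+ ,x))
    (_ #f)))

(write (suggest-idiom '(if (pair? x) (car x) #f))) (newline)
(write (suggest-idiom '(+ n 1)))                   (newline)
(write (suggest-idiom '(display "hi")))            (newline)
```

Each clause is plain data plus a template, so the rule metadata (name, severity, category) can live in a separate registry entry that points at a procedure like this one.
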
### Configuration patterns

| Feature | Mallet | OCICL | Eastwood | Kibit | racket-review | fmt |
|---------|--------|-------|----------|-------|---------------|-----|
| Config file | `.mallet.lisp` | `.ocicl-lint.conf` | Clojure maps | `project.clj` | - | `.fmt.rkt` |
| Presets | Yes (4) | - | - | - | - | - |
| Preset inheritance | Yes | - | - | - | - | - |
| Path-specific rules | Yes | - | - | - | - | - |
| Inline suppression | Yes (3 mechanisms) | Yes | Yes | - | Yes | - |
| Stale suppression detection | Yes | - | - | - | - | - |
| CLI override | Yes | Yes | Yes | Yes | - | Yes |

**For Guile:** Mallet's configuration system is the most sophisticated and
worth emulating — presets, inheritance, path-specific overrides, and stale
suppression detection.

### Auto-fix patterns

| Tool | Fix mechanism | Preserves formatting? |
|------|--------------|----------------------|
| Kibit | rewrite-clj zippers | Yes |
| Mallet | Bottom-to-top line replacement | Partial |
| OCICL | Fixer registry + zipper AST | Yes |

**For Guile:** Zipper-based AST manipulation (or Guile's SXML tools) for
formatting-preserving fixes. The fixer registry pattern (OCICL) keeps rule
detection and fixing decoupled.

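The bottom-to-top idea from the table is easy to show on character offsets: if edits are applied from the end of the text towards the start, earlier offsets never shift. This is a minimal sketch under the assumption that an edit is a `(start end replacement)` list and that edits do not overlap; `apply-edits` is a hypothetical name.

```scheme
(use-modules (ice-9 match) (srfi srfi-1))

;; Each edit is (start end replacement): replace the characters in
;; [start, end) of the text with the replacement string.
(define (apply-edits text edits)
  "Apply non-overlapping EDITS to TEXT, last edit first."
  (fold (lambda (edit text)
          (match edit
            ((start end replacement)
             (string-append (substring text 0 start)
                            replacement
                            (substring text end)))))
        text
        ;; Sort by descending start offset so that applying one edit never
        ;; shifts the offsets of the edits that are still pending.
        (sort edits (lambda (a b) (> (car a) (car b))))))

(display (apply-edits "(if x y #f)  \n"
                      '((0 11 "(and x y)")   ; idiom fix
                        (11 13 ""))))        ; trailing whitespace
```
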
### Output formats

All tools support at minimum: `file:line:column: severity: message`

Additional formats: JSON (Mallet), Markdown (Kibit), line-only for CI (Mallet).

---

## Feature wishlist for gulie

Based on this survey, the features worth cherry-picking:

### Must-have (core)

1. **Guile compiler diagnostics** — Capture Guile's own warnings as baseline (SBLint approach)
2. **Custom reader with source positions** — Every form, subform, and token gets line:column
3. **Staged pipeline** — Text rules → token rules → form rules (Mallet/OCICL)
4. **Pattern-based rule DSL** — Using Guile's `match` or `syntax-case` (Kibit/racket-review inspiration)
5. **Rule registry** — `{name, severity, category, enabled-by-default, check-fn}` (Eastwood)
6. **Standard output format** — `file:line:column: severity: rule: message`
7. **Inline suppression** — `; gulie:suppress rule-name` (Mallet/OCICL)

### Should-have (v1)

8. **Config file** — `.gulie.scm` with presets and rule enable/disable (Mallet)
9. **Auto-fix infrastructure** — Fixer registry, bottom-to-top application (OCICL/Mallet)
10. **Idiom suggestions** — Pattern → replacement rules (Kibit style)
11. **Unused binding detection** — Scope tracking with forward reference handling (racket-review)
12. **Quote/unquote awareness** — Don't lint inside quoted forms (OCICL)
13. **Snapshot-based testing** — `.scm`/`.expected` pairs (racket-review)

### Nice-to-have (v2+)

14. **Code formatter** — Cost-based optimal layout (fmt)
15. **Pluggable formatter maps** — Per-form formatting rules (fmt)
16. **Path-specific rule overrides** — Different rules for `src/` vs `tests/` (Mallet)
17. **Stale suppression detection** (Mallet)
18. **Editor integration** — Flycheck/flymake for Emacs (racket-review)
19. **Macroexpansion-aware analysis** — Suppress false positives from macro output (Eastwood)
20. **Cyclomatic complexity and other metrics** (Mallet)

470
docs/PLAN.md
Normal file
@@ -0,0 +1,470 @@
# Gulie — Guile Linter/Formatter: Architecture & Implementation Plan

## Context

No linter, formatter, or static analyser exists for Guile Scheme. We're building
one from scratch, called **gulie**. The tool is written in Guile itself, reusing
as much of Guile's infrastructure as possible (reader, compiler, Tree-IL
analyses, warning system). The design draws on patterns observed in 7 reference
tools (see `docs/INSPIRATION.md`).

Guile 3.0.11 is available in the devenv. No source code exists yet.

---

## High-level architecture

Two independent passes, extensible to three:

```
                      .gulie.sexp (config)
                             |
file.scm ──┬──> [Tokenizer] ──> tokens ──> [CST parser] ──> CST
           |                                                 |
           |                  [Pass 1: Surface]  line rules + CST rules
           |                                                 |
           |                                           diagnostics-1
           |
           └──> [Guile reader] ──> s-exprs ──> [Guile compiler] ──> Tree-IL
                                                                       |
                              [Pass 2: Semantic]  built-in analyses + custom Tree-IL rules
                                                                       |
                                                                 diagnostics-2
                                                                       |
                                              [merge + suppress + sort + report/fix]
```

**Why two passes?** Guile's reader (`ice-9/read.scm:949-973`) irrecoverably
strips comments, whitespace, and datum comments in `next-non-whitespace`. There
is no way to get formatting info AND semantic info from one parse. Accepting this
and building two clean, independent passes is simpler than fighting the reader.

---

## Module structure

```
gulie/
  bin/gulie                 # CLI entry point (executable Guile script)
  gulie/
    cli.scm                 # (gulie cli) — arg parsing, dispatch
    config.scm              # (gulie config) — .gulie.sexp loading, defaults, merging
    diagnostic.scm          # (gulie diagnostic) — record type, sorting, formatting
    tokenizer.scm           # (gulie tokenizer) — hand-written lexer, preserves everything
    cst.scm                 # (gulie cst) — token stream → concrete syntax tree
    compiler.scm            # (gulie compiler) — Guile compile wrapper, warning capture
    rule.scm                # (gulie rule) — rule record, registry, define-rule macros
    engine.scm              # (gulie engine) — orchestrator: file discovery, pass sequencing
    fixer.scm               # (gulie fixer) — fix application (bottom-to-top edits)
    suppression.scm         # (gulie suppression) — ; gulie:suppress parsing/filtering
    formatter.scm           # (gulie formatter) — cost-based optimal pretty-printer
    rules/
      surface.scm           # (gulie rules surface) — trailing-ws, line-length, tabs, blanks
      indentation.scm       # (gulie rules indentation) — indent checking vs CST
      comments.scm          # (gulie rules comments) — comment style conventions
      semantic.scm          # (gulie rules semantic) — wrappers around Guile's analyses
      idiom.scm             # (gulie rules idiom) — pattern-based suggestions via match
      module-form.scm       # (gulie rules module-form) — define-module checks
  test/
    test-tokenizer.scm
    test-cst.scm
    test-rules-surface.scm
    test-rules-semantic.scm
    fixtures/
      clean/                # .scm files producing zero diagnostics
      violations/           # .scm + .expected pairs (snapshot testing)
```

17 source files, plus the `bin/gulie` entry script. Each has one clear job.

---

## Key components

### Tokenizer (`gulie/tokenizer.scm`)

Hand-written character-by-character state machine. Must handle the same lexical
syntax as Guile's reader but **preserve** what the reader discards.

```scheme
(define-record-type <token>
  (make-token type text line column)
  token?
  (type   token-type)     ;; symbol (see list below)
  (text   token-text)     ;; string: exact source text
  (line   token-line)     ;; integer: 1-based
  (column token-column))  ;; integer: 0-based
```

Token types (~15): `open-paren`, `close-paren`, `symbol`, `number`, `string`,
`keyword`, `boolean`, `character`, `prefix` (`'`, `` ` ``, `,`, `,@`, `#'`,
etc.), `special` (`#;`, `#(`, `#vu8(`, etc.), `line-comment`, `block-comment`,
`whitespace`, `newline`, `dot`.

**Critical invariant:** `(string-concatenate (map token-text (tokenize input)))` must
reproduce the original input exactly. This is our primary roundtrip test.

Estimated size: ~200-250 lines. Reference: Mallet's tokenizer (163 lines CL).

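The roundtrip invariant can be demonstrated with a deliberately reduced lexer. The sketch below is an assumption-laden stand-in for the real tokenizer: it only distinguishes parens, newlines, horizontal whitespace, line comments, and a catch-all `atom` type, and it does not understand strings, block comments, or `#` syntax. What it does show is the property that matters: every input character ends up in exactly one token.

```scheme
(use-modules (srfi srfi-9))

(define-record-type <token>
  (make-token type text line column)
  token?
  (type   token-type)
  (text   token-text)
  (line   token-line)     ; 1-based
  (column token-column))  ; 0-based

(define (delimiter? c)
  (or (char-whitespace? c) (memv c '(#\( #\) #\;))))

;; Simplified sketch: 'atom lumps together symbols, numbers, strings, etc.
(define (tokenize str)
  (let ((len (string-length str)))
    (let loop ((i 0) (line 1) (col 0) (acc '()))
      ;; Index of the first character at or after START failing PRED.
      (define (scan-while pred start)
        (let scan ((j start))
          (if (and (< j len) (pred (string-ref str j)))
              (scan (+ j 1))
              j)))
      ;; Emit the token covering [i, end) and continue on the same line.
      (define (emit type end)
        (loop end line (+ col (- end i))
              (cons (make-token type (substring str i end) line col) acc)))
      (if (>= i len)
          (reverse acc)
          (let ((c (string-ref str i)))
            (cond
             ((char=? c #\newline)
              (loop (+ i 1) (+ line 1) 0
                    (cons (make-token 'newline "\n" line col) acc)))
             ((char=? c #\() (emit 'open-paren (+ i 1)))
             ((char=? c #\)) (emit 'close-paren (+ i 1)))
             ((char=? c #\;)
              (emit 'line-comment
                    (scan-while (lambda (ch) (not (char=? ch #\newline))) i)))
             ((char-whitespace? c)
              (emit 'whitespace
                    (scan-while (lambda (ch)
                                  (and (char-whitespace? ch)
                                       (not (char=? ch #\newline))))
                                i)))
             (else
              (emit 'atom
                    (scan-while (lambda (ch) (not (delimiter? ch)))
                                (+ i 1))))))))))

(define input "(define x 1) ; note\n  (f  x)\n")
(display (string=? input
                   (string-concatenate (map token-text (tokenize input)))))
(newline)
```

Because each branch consumes at least one character and copies it verbatim into the token text, the roundtrip holds even on input the sketch classifies badly, which is why the invariant is a useful first test before the token types are refined.
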
### CST (`gulie/cst.scm`)

Trivial parenthesised tree built from the token stream:

```scheme
(define-record-type <cst-node>
  (make-cst-node open close children)
  cst-node?
  (open     cst-node-open)       ;; <token> for ( [ {
  (close    cst-node-close)      ;; <token> for ) ] }
  (children cst-node-children))  ;; list of <cst-node> | <token>
```

Children is a flat list of interleaved atoms (tokens) and nested nodes. Comments
and whitespace are children like anything else.

The first non-whitespace symbol child of a `<cst-node>` identifies the form
(`define`, `let`, `cond`, etc.) — enough for indentation rules.

Estimated size: ~80-100 lines.

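A recursive-descent pass over the token list is enough to build this tree. In the sketch below a token is just a `(type . text)` pair rather than the `<token>` record, purely to keep the example self-contained; `parse-items` and `cst-form-name` are hypothetical names, and unbalanced input is handled only by leaving the close slot as `#f`.

```scheme
(use-modules (srfi srfi-1) (srfi srfi-9) (ice-9 receive))

(define-record-type <cst-node>
  (make-cst-node open close children)
  cst-node?
  (open     cst-node-open)
  (close    cst-node-close)
  (children cst-node-children))

;; Assumption of this sketch: a token is a (type . text) pair.
(define (parse-items tokens)
  "Consume TOKENS up to the next unmatched close paren (or the end).
Return three values: the parsed children, the close token or #f, and the
remaining tokens."
  (let loop ((tokens tokens) (acc '()))
    (cond
     ((null? tokens) (values (reverse acc) #f '()))
     ((eq? (caar tokens) 'close-paren)
      (values (reverse acc) (car tokens) (cdr tokens)))
     ((eq? (caar tokens) 'open-paren)
      (receive (children close rest) (parse-items (cdr tokens))
        (loop rest
              (cons (make-cst-node (car tokens) close children) acc))))
     (else (loop (cdr tokens) (cons (car tokens) acc))))))

(define (cst-form-name node)
  "Return the text of the first symbol child of NODE, or #f."
  (let ((sym (find (lambda (child)
                     (and (pair? child) (eq? (car child) 'symbol)))
                   (cst-node-children node))))
    (and sym (cdr sym))))

;; Tokens for: (let ((x 1)) x)
(define tokens
  '((open-paren . "(") (symbol . "let") (whitespace . " ")
    (open-paren . "(") (open-paren . "(") (symbol . "x") (whitespace . " ")
    (number . "1") (close-paren . ")") (close-paren . ")")
    (whitespace . " ") (symbol . "x") (close-paren . ")")))

(receive (children close rest) (parse-items tokens)
  (display (cst-form-name (car children)))
  (newline))
```
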
### Compiler wrapper (`gulie/compiler.scm`)

Wraps Guile's compile pipeline to capture warnings as structured diagnostics:

```scheme
;; Key Guile APIs we delegate to:
;; - (system base compile): read-and-compile, compile, default-warning-level
;; - (language tree-il analyze): make-analyzer, analyze-tree
;; - (system base message): %warning-types, current-warning-port
```

Strategy: call `read-and-compile` with `#:to 'tree-il` and `#:warning-level 2`
while redirecting `current-warning-port` to a string port, then parse the
warning output into `<diagnostic>` records. Alternatively, invoke `make-analyzer`
directly and hook the warning printers.

Guile's built-in analyses (all free):
- `unused-variable-analysis`
- `unused-toplevel-analysis`
- `unused-module-analysis`
- `shadowed-toplevel-analysis`
- `make-use-before-definition-analysis` (unbound variables)
- `arity-analysis` (wrong arg count)
- `format-analysis` (format string validation)

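The warning-capture half of that strategy can be sketched with `compile` and a string port. `capture-warnings` is a hypothetical helper name. The sketch compiles all the way to bytecode (without evaluating the form) on the assumption that running the full pipeline is what triggers the Tree-IL analyses; whether stopping at `'tree-il` is enough, and which warning level each analysis needs, should be checked against `compile.scm` and `message.scm`.

```scheme
(use-modules (system base compile))

;; Sketch: compile FORM and return whatever Guile wrote to the warning
;; port. Nothing is evaluated; the bytecode result is simply discarded.
(define (capture-warnings form)
  (call-with-output-string
    (lambda (port)
      (parameterize ((current-warning-port port))
        (compile form
                 #:to 'bytecode
                 #:warning-level 2)))))

;; A call with the wrong number of arguments to a local procedure is the
;; kind of thing the arity analysis is meant to flag.
(display (capture-warnings '(let ((f (lambda (x) x))) (f 1 2))))
```

The returned string still has to be parsed into `file:line:column` plus message, which is the fragile part of this approach and the main argument for hooking the analyses directly instead.
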
### Rule system (`gulie/rule.scm`)

```scheme
(define-record-type <rule>
  (make-rule name description severity category type check-proc fix-proc)
  rule?
  (name        rule-name)         ;; symbol
  (description rule-description)  ;; string
  (severity    rule-severity)     ;; 'error | 'warning | 'info
  (category    rule-category)     ;; 'format | 'style | 'correctness | 'idiom
  (type        rule-type)         ;; 'line | 'cst | 'tree-il
  (check-proc  rule-check-proc)   ;; procedure (signature depends on type)
  (fix-proc    rule-fix-proc))    ;; procedure | #f
```

Three rule types with different check signatures:
- **`'line`** — `(lambda (file line-num line-text config) -> diagnostics)` — fastest, no parsing
- **`'cst`** — `(lambda (file cst config) -> diagnostics)` — needs tokenizer+CST
- **`'tree-il`** — `(lambda (file tree-il env config) -> diagnostics)` — needs compilation

Global registry: `*rules*` alist, populated at module load time via
`register-rule!`. Convenience macros: `define-line-rule`, `define-cst-rule`,
`define-tree-il-rule`.

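As a rough illustration of how the registry and one convenience macro could fit together, the sketch below registers a `trailing-whitespace` line rule. The macro's argument order is an assumption (the plan does not fix it), and diagnostics are plain lists here rather than `<diagnostic>` records so the block stays self-contained.

```scheme
(use-modules (srfi srfi-1) (srfi srfi-9))

(define-record-type <rule>
  (make-rule name description severity category type check-proc fix-proc)
  rule?
  (name        rule-name)
  (description rule-description)
  (severity    rule-severity)
  (category    rule-category)
  (type        rule-type)
  (check-proc  rule-check-proc)
  (fix-proc    rule-fix-proc))

(define *rules* '())

(define (register-rule! rule)
  (set! *rules* (acons (rule-name rule) rule *rules*)))

;; Hypothetical shape for `define-line-rule`; the real macro may differ.
(define-syntax define-line-rule
  (syntax-rules ()
    ((_ name description severity category
        (file line-num line-text config) body ...)
     (register-rule!
      (make-rule 'name description severity category 'line
                 (lambda (file line-num line-text config) body ...)
                 #f)))))

(define-line-rule trailing-whitespace
  "Line ends with whitespace" 'warning 'format
  (file line-num line-text config)
  (let ((trimmed (string-length (string-trim-right line-text))))
    (if (< trimmed (string-length line-text))
        ;; A plain list stands in for the <diagnostic> record here.
        (list (list file line-num trimmed 'warning 'trailing-whitespace
                    "trailing whitespace"))
        '())))

;; Run every registered line rule over one line of text.
(define (check-line file line-num line-text config)
  (append-map (lambda (entry)
                ((rule-check-proc (cdr entry)) file line-num line-text config))
              (filter (lambda (entry) (eq? (rule-type (cdr entry)) 'line))
                      *rules*)))

(write (check-line "a.scm" 3 "(foo)  " '()))
(newline)
```
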
### Diagnostic record (`gulie/diagnostic.scm`)

```scheme
(define-record-type <diagnostic>
  (make-diagnostic file line column severity rule message fix)
  diagnostic?
  (file     diagnostic-file)      ;; string
  (line     diagnostic-line)      ;; integer, 1-based
  (column   diagnostic-column)    ;; integer, 0-based
  (severity diagnostic-severity)  ;; symbol
  (rule     diagnostic-rule)      ;; symbol
  (message  diagnostic-message)   ;; string
  (fix      diagnostic-fix))      ;; <fix> | #f
```

Standard output: `file:line:column: severity: rule: message`

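Formatting and ordering follow directly from the record. The sketch below assumes diagnostics are sorted by file, then line, then column; `format-diagnostic` and `diagnostic<?` are hypothetical names.

```scheme
(use-modules (srfi srfi-9))

(define-record-type <diagnostic>
  (make-diagnostic file line column severity rule message fix)
  diagnostic?
  (file     diagnostic-file)
  (line     diagnostic-line)
  (column   diagnostic-column)
  (severity diagnostic-severity)
  (rule     diagnostic-rule)
  (message  diagnostic-message)
  (fix      diagnostic-fix))

(define (format-diagnostic d)
  "Render D as file:line:column: severity: rule: message."
  (format #f "~a:~a:~a: ~a: ~a: ~a"
          (diagnostic-file d) (diagnostic-line d) (diagnostic-column d)
          (diagnostic-severity d) (diagnostic-rule d)
          (diagnostic-message d)))

(define (diagnostic<? a b)
  "Order by file name, then line, then column."
  (let ((fa (diagnostic-file a)) (fb (diagnostic-file b)))
    (cond ((string<? fa fb) #t)
          ((string<? fb fa) #f)
          ((< (diagnostic-line a) (diagnostic-line b)) #t)
          ((> (diagnostic-line a) (diagnostic-line b)) #f)
          (else (< (diagnostic-column a) (diagnostic-column b))))))

(define diagnostics
  (list (make-diagnostic "b.scm" 1 0 'warning 'no-tabs "tab character" #f)
        (make-diagnostic "a.scm" 3 10 'warning 'trailing-whitespace
                         "trailing whitespace" #f)))

(for-each (lambda (d) (display (format-diagnostic d)) (newline))
          (sort diagnostics diagnostic<?))
```
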
### Config (`gulie/config.scm`)

File: `.gulie.sexp` in project root (plain s-expression, read with `(read)`,
never evaluated):

```scheme
((line-length . 80)
 (indent . 2)
 (enable trailing-whitespace line-length unused-variable arity-mismatch)
 (disable tabs)
 (rules
  (line-length (max . 100)))
 (indent-rules
  (with-syntax . 1)
  (match . 1))
 (ignore "build/**" ".direnv/**"))
```

Precedence: CLI flags > config file > built-in defaults.

`--init` generates a template with all rules listed and commented.

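The precedence rule can be expressed as a lookup over layered alists, highest priority first. This is a minimal sketch: `read-config`, `config-ref`, and `%default-config` are hypothetical names, and the config is read from a string port here instead of a file.

```scheme
(define %default-config
  '((line-length . 80) (indent . 2)))

(define (read-config port)
  "Read one s-expression from PORT without evaluating it."
  (let ((datum (read port)))
    (if (list? datum)
        datum
        (error "config must be a list of entries" datum))))

(define (config-ref key . layers)
  "Look KEY up in LAYERS, highest precedence first. Return #f if absent."
  (let loop ((layers layers))
    (cond ((null? layers) #f)
          ((assq key (car layers)) => cdr)
          (else (loop (cdr layers))))))

(define file-config
  (call-with-input-string "((line-length . 100) (disable tabs))" read-config))

;; Stand-in for options parsed from the command line.
(define cli-config '((line-length . 120)))

(write (config-ref 'line-length cli-config file-config %default-config))
(newline)
(write (config-ref 'indent cli-config file-config %default-config))
(newline)
(write (config-ref 'disable cli-config file-config %default-config))
(newline)
```

Since the file is only ever passed to `read`, a malicious or broken config can at worst fail the `list?` check; no code in it is executed.
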
### Suppression (`gulie/suppression.scm`)

```scheme
;; gulie:suppress trailing-whitespace — suppress on next line
(define x "messy")

(define x "messy") ; gulie:suppress — suppress on this line

;; gulie:disable line-length — region disable
... code ...
;; gulie:enable line-length — region enable
```

Parsed from raw text before rules run. Produces a suppression map that filters
diagnostics after all rules have emitted.

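A first cut at the single-line forms can be done with a regular expression over raw lines. The sketch below is an assumption about one workable shape, not the planned implementation: it ignores the `gulie:disable`/`gulie:enable` region forms, and because it scans raw text it would also match the marker inside a string literal. All procedure names are hypothetical.

```scheme
(use-modules (ice-9 regex))

;; Matches the marker and an optional rule name after it.
(define suppress-rx (make-regexp "gulie:suppress[ \t]*([a-z0-9-]*)"))

(define (comment-only-line? line)
  (string-prefix? ";" (string-trim line)))

(define (parse-suppressions text)
  "Return a list of (line-number . rule) pairs. RULE is a symbol, or #t
when every rule is suppressed. Line numbers are 1-based."
  (let loop ((lines (string-split text #\newline)) (n 1) (acc '()))
    (if (null? lines)
        (reverse acc)
        (let* ((line (car lines))
               (m (regexp-exec suppress-rx line)))
          (loop (cdr lines)
                (+ n 1)
                (if m
                    (let* ((name (match:substring m 1))
                           (rule (if (string-null? name)
                                     #t
                                     (string->symbol name)))
                           ;; A comment on its own line targets the next
                           ;; line; a trailing comment targets its own.
                           (target (if (comment-only-line? line) (+ n 1) n)))
                      (cons (cons target rule) acc))
                    acc))))))

(define (suppressed? suppressions line rule)
  "True if RULE is suppressed on LINE."
  (let loop ((s suppressions))
    (cond ((null? s) #f)
          ((and (= (caar s) line)
                (or (eq? (cdar s) #t) (eq? (cdar s) rule)))
           #t)
          (else (loop (cdr s))))))

(define text
  (string-append "(define a 1)\n"
                 ";; gulie:suppress line-length\n"
                 "(define b 2)\n"
                 "(define c 3) ; gulie:suppress\n"))

(write (parse-suppressions text))
(newline)
```
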
---

## Indentation rules

The key data is `scheme-indent-function` values from `.dir-locals.el` — an
integer N meaning "N arguments on first line, then body indented +2":

```scheme
(define *default-indent-rules*
  '((define . 1) (define* . 1) (define-public . 1) (define-syntax . 1)
    (define-module . 0) (lambda . 1) (lambda* . 1)
    (let . 1) (let* . 1) (letrec . 1) (letrec* . 1)
    (if . #f) (cond . 0) (case . 1) (when . 1) (unless . 1)
    (match . 1) (syntax-case . 2) (with-syntax . 1)
    (begin . 0) (do . 2) (parameterize . 1) (guard . 1)))
```

Overridable via config `indent-rules`. The indentation checker walks the CST,
identifies the form by its first symbol child, looks up the rule, and compares
actual indentation to expected.

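The lookup-and-compare step reduces to a small pure function. The sketch below follows the usual Emacs convention as an assumption: body forms sit at the open paren plus 2, "distinguished" arguments that wrap onto their own line sit at plus 4, and forms with no rule (or a `#f` rule) align under the first argument. `expected-indent` and its parameters are hypothetical.

```scheme
(define indent-rules
  '((define . 1) (let . 1) (when . 1) (cond . 0) (begin . 0) (do . 2)
    (if . #f)))

;; OPEN-COL      column of the form's opening paren
;; FIRST-ARG-COL column of the first argument on the head line
;; ARG-INDEX     0-based index of the argument starting the checked line
(define (expected-indent rules form-name open-col first-arg-col arg-index)
  (let ((n (assq-ref rules form-name)))
    (cond ((not n) first-arg-col)            ; no rule, or #f: align
          ((< arg-index n) (+ open-col 4))   ; distinguished argument
          (else (+ open-col 2)))))           ; body form

;; (let ((x 1))
;;   body)           body is argument 1 of `let`
(display (expected-indent indent-rules 'let 0 5 1)) (newline)

;; (if test
;;     then)         `if` has rule #f, so align under `test`
(display (expected-indent indent-rules 'if 0 4 1)) (newline)
```

A checker would call this for every child that starts a line and emit a diagnostic when the actual column differs.
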
---

## Formatting conventions (Guile vs Guix)

Both use 2-space indent, same special-form conventions. Key difference:
- **Guile:** 72-char fill column, `;;; {Section}` headers
- **Guix:** 78-80 char fill column, `;;` headers

Our default config targets Guile conventions. A Guix preset can override
`line-length` and comment style.

---

## Formatter: cost-based optimal pretty-printing

The formatter (`gulie/formatter.scm`) is a later-phase component that
**rewrites** files with correct layout, as opposed to the indentation checker
which merely **reports** violations.

### Why cost-based?

When deciding where to break lines in a long expression, there are often multiple
valid options. A greedy approach (fill as much as fits, then break) produces
mediocre output — it can't "look ahead" to see that a break earlier would produce
a better overall layout. The Wadler/Leijen family of algorithms evaluates
alternative layouts and selects the optimal one.

### The algorithm (Wadler/Leijen, as used by fmt's `pretty-expressive`)

The pretty-printer works with an abstract **document** type:

```
doc = text(string)       — literal text
    | line               — line break (or space if flattened)
    | nest(n, doc)       — increase indent by n
    | concat(doc, doc)   — concatenation
    | alt(doc, doc)      — choose better of two layouts
    | group(doc)         — try flat first, break if doesn't fit
```

The key operator is `alt(a, b)` — "try layout A, but if it overflows the page
width, use layout B instead." The algorithm evaluates both alternatives and
picks the one with the lower **cost vector**:

```
cost = [badness, height, characters]

badness    — quadratic penalty for exceeding page width
height     — number of lines used
characters — total chars (tiebreaker)
```

This produces provably optimal output: the layout that minimises overflow while
using the fewest lines.

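The cost comparison itself is small enough to sketch on already-rendered layouts. This is a simplification made for illustration: each candidate is a list of output lines, badness is the sum of squared per-line overflow, and the vectors are compared lexicographically. It follows the three-component vector described above and is not taken from `pretty-expressive`'s source, whose cost function may differ in detail. All names are hypothetical.

```scheme
(use-modules (srfi srfi-1))

(define (layout-cost lines width)
  "Return (badness height characters) for a layout given as a list of
output lines."
  (list (fold (lambda (line acc)
                (let ((over (max 0 (- (string-length line) width))))
                  (+ acc (* over over))))
              0 lines)
        (length lines)
        (fold (lambda (line acc) (+ acc (string-length line))) 0 lines)))

(define (cost<? a b)
  "Lexicographic comparison of two cost vectors of equal length."
  (cond ((null? a) #f)
        ((< (car a) (car b)) #t)
        ((> (car a) (car b)) #f)
        (else (cost<? (cdr a) (cdr b)))))

(define (choose-layout width a b)
  "Return whichever of layouts A and B has the lower cost; A wins ties."
  (if (cost<? (layout-cost b width) (layout-cost a width)) b a))

(define flat   '("(define (add a b) (+ a b))"))
(define broken '("(define (add a b)" "  (+ a b))"))

;; With a 20-column page the flat layout overflows by 6 columns, so its
;; badness is 36 and the two-line layout is chosen despite being taller.
(write (layout-cost flat 20))   (newline)
(write (layout-cost broken 20)) (newline)
(write (choose-layout 20 flat broken)) (newline)
```
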
### How it fits our architecture

```
CST (from tokenizer + cst.scm)
  → [doc generator]   convert CST nodes to abstract doc, using form-specific rules
  → [layout solver]   evaluate alternatives, select optimal layout
  → [renderer]        emit formatted text with comments preserved
```

The **doc generator** uses the same form-identification logic as the indentation
checker (first symbol child of a CST node) to apply form-specific layout rules.
For example:

- `define` — name on first line, body indented
- `let` — bindings as aligned block, body indented
- `cond` — each clause on its own line

These rules are data (the `indent-rules` table extended with layout hints),
making the formatter configurable just like the checker.

### Implementation approach

We can either:
1. **Port `pretty-expressive`** from Racket — the core algorithm is ~300 lines,
   well-documented in academic papers
2. **Upgrade Guile's `(ice-9 pretty-print)`** — it already knows form-specific
   indentation rules but uses greedy layout; we'd replace the layout engine with
   cost-based selection

Option 1 is cleaner (purpose-built). Option 2 reuses more existing code but
would be a heavier modification. We'll decide when we reach that phase.

### Phase note

The formatter is **Phase 6** work. Phases 0-4 deliver a useful checker without
it. The indentation checker (Phase 4) validates existing formatting; the
formatter (Phase 6) rewrites it. The checker comes first because it's simpler
and immediately useful in CI.

---

## CLI interface

```
gulie [OPTIONS] [FILE|DIR...]

  --check           Report issues, exit non-zero on findings (default)
  --fix             Fix mode: auto-fix what's possible, report the rest
  --format          Format mode: rewrite files with optimal layout
  --init            Generate .gulie.sexp template
  --pass PASS       Run only: surface, semantic, all (default: all)
  --rule RULE       Enable only this rule (repeatable)
  --disable RULE    Disable this rule (repeatable)
  --severity SEV    Minimum severity: error, warning, info
  --output FORMAT   Output: standard (default), json, compact
  --config FILE     Config file path (default: auto-discover)
  --list-rules      List all rules and exit
  --version         Print version
```

Exit codes: 0 = clean, 1 = findings, 2 = config error, 3 = internal error.

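Guile ships `(ice-9 getopt-long)`, which covers most of this interface. The sketch below handles only a subset of the flags and deliberately leaves out the repeatable `--rule`/`--disable` options, which would need extra handling; `option-spec` and `parse-args` are hypothetical names.

```scheme
(use-modules (ice-9 getopt-long))

;; Subset of the planned options; repeatable flags are not covered here.
(define option-spec
  '((check   (value #f))
    (fix     (value #f))
    (format  (value #f))
    (output  (value #t))
    (config  (value #t))
    (version (value #f))))

(define (parse-args args)
  "ARGS is the full command line, program name first. Return an alist
with the selected mode, output format, and file arguments."
  (let ((opts (getopt-long args option-spec)))
    `((mode   . ,(cond ((option-ref opts 'fix #f) 'fix)
                       ((option-ref opts 'format #f) 'format)
                       (else 'check)))
      (output . ,(option-ref opts 'output "standard"))
      ;; Non-option arguments are stored under the '() key.
      (files  . ,(option-ref opts '() '())))))

(write (parse-args '("gulie" "--fix" "--output=json" "a.scm" "b.scm")))
(newline)
```

In `bin/gulie` the argument would be `(command-line)`, and the resulting alist could serve as the highest-precedence config layer.
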
---

## Implementation phases

### Phase 0: Skeleton
- `bin/gulie` — shebang script, loads CLI module
- `(gulie cli)` — basic arg parsing (`--check`, `--version`, file args)
- `(gulie diagnostic)` — record type + standard formatter
- `(gulie rule)` — record type + registry + `register-rule!`
- `(gulie engine)` — discovers `.scm` files, runs line rules, reports
- One trivial rule: `trailing-whitespace` (line rule)
- **Verification:** `gulie --check some-file.scm` reports trailing whitespace

### Phase 1: Tokenizer + CST + surface rules
- `(gulie tokenizer)` — hand-written lexer
- `(gulie cst)` — token → tree
- Surface rules: `trailing-whitespace`, `line-length`, `no-tabs`, `blank-lines`
- Comment rule: `comment-semicolons` (check `;`/`;;`/`;;;` usage)
- Roundtrip test: tokenize → concat = original
- Snapshot tests for each rule

### Phase 2: Semantic rules (compiler pass)
- `(gulie compiler)` — `read-and-compile` wrapper, warning capture
- Semantic rules wrapping Guile's built-in analyses:
  `unused-variable`, `unused-toplevel`, `unbound-variable`, `arity-mismatch`,
  `format-string`, `shadowed-toplevel`, `unused-module`
- **Verification:** run against Guile and Guix source files, check false-positive rate

### Phase 3: Config + suppression
- `(gulie config)` — `.gulie.sexp` loading + merging
- `(gulie suppression)` — inline comment suppression
- `--init` command
- Rule enable/disable via config and CLI

### Phase 4: Indentation checking
- `(gulie rules indentation)` — CST-based indent checker
- Default indent rules for standard Guile forms
- Configurable `indent-rules` in `.gulie.sexp`

### Phase 5: Fix mode + idiom rules
- `(gulie fixer)` — bottom-to-top edit application
- Auto-fix for: trailing whitespace, line-length (where possible)
- `(gulie rules idiom)` — `match`-based pattern suggestions on Tree-IL
- `(gulie rules module-form)` — `define-module` form checks (sorted imports, etc.)

### Phase 6: Formatter (cost-based optimal layout)
- `(gulie formatter)` — Wadler/Leijen pretty-printer with cost-based selection
- Abstract document type: `text`, `line`, `nest`, `concat`, `alt`, `group`
- Form-specific layout rules (reuse indent-rules table + layout hints)
- Comment preservation through formatting
- `--format` CLI mode
- **Verification:** format Guile/Guix source files, diff against originals,
  verify roundtrip stability (format twice = same output)

### Phase 7: Cross-module analysis (future)
- Load multiple modules, walk dependency graph
- Unused exports, cross-module arity checks
- `--pass cross-module` CLI option

---

## Testing strategy

1. **Roundtrip test** (tokenizer): tokenize → concat must equal original input
2. **Snapshot tests**: `fixtures/violations/rule-name.scm` + `.expected` pairs
3. **Clean file tests**: `fixtures/clean/*.scm` must produce zero diagnostics
4. **Unit tests**: `(srfi srfi-64)` for tokenizer, CST, config, diagnostics
5. **Real-world corpus**: run against `test/guix/` and `refs/guile/module/` for
   false-positive rate validation
6. **Formatter idempotency**: `format(format(x)) = format(x)` for all test files

---

## Key design decisions

| Decision | Rationale |
|----------|-----------|
| Hand-written tokenizer, not extending Guile's reader | The reader is ~1000 lines of nested closures not designed for extension. A clean 200-line tokenizer is easier to write/test. |
| Two independent passes, not a unified AST | Reader strips comments irrecoverably. Accepting this gives clean separation. |
| Delegate to Guile's built-in analyses | They're battle-tested, handle macroexpansion edge cases, and are maintained upstream. |
| `(ice-9 match)` for idiom rules, not logic programming | Built-in, fast, sufficient. miniKanren can be added later if needed. |
| S-expression config, not YAML/TOML | Zero deps. Our users write Scheme. `(read)` does the parsing. |
| Flat CST (parens + interleaved tokens), not rich AST | Enough for indentation/formatting checks. No overengineering. |
| Cost-based optimal layout for the formatter | Greedy formatters produce mediocre output. Wadler/Leijen is cleaner and provably correct. Worth the investment when we reach that phase. |
| Checker first, formatter later | Checking is simpler, immediately useful in CI, and validates the tokenizer/CST infrastructure that the formatter will build on. |

---

## Critical files to reference during implementation

- `refs/guile/module/ice-9/read.scm:949-973` — what the reader discards (our tokenizer must keep)
- `refs/guile/module/language/tree-il/analyze.scm:1461-1479` — `make-analyzer` API
- `refs/guile/module/system/base/compile.scm:298-340` — `read-and-compile` / `compile`
- `refs/guile/module/system/base/message.scm:83-220` — `%warning-types` definitions
- `refs/guile/module/language/tree-il.scm` — Tree-IL node types and traversal
- `refs/guile/module/ice-9/pretty-print.scm` — existing pretty-printer (form-specific rules to extract)
- `refs/mallet/src/parser/tokenizer.lisp` — reference tokenizer (163 lines)
- `refs/fmt/conventions.rkt` — form-specific formatting rules (100+ forms)
- `refs/fmt/main.rkt` — cost-based layout selection implementation