# Tribes local-control API The local-control broker is a small Guile daemon listening on a Unix-domain socket. It fronts every operator action that a Tribes deployment can take on its own host: - **resolve** a `SystemTarget` into a build plan. - **prepare** a build (pull channels + `guix system build`) without activating it. - **commit** a previously-prepared generation (`guix system switch-generation`). - **rollback** to a retained store path or, failing that, rebuild from a plan and switch. - **abort** an in-flight job. - inspect **status** and **generations**. This document specifies the wire schema. The BEAM client at `tribes/lib/tribes/local_control.ex` should be updated to match it. ## Transport - HTTP/1.1 over a Unix-domain socket. The path is configurable via `TRIBES_LOCAL_CONTROL_SOCKET` (default `/var/run/tribes/local-control.sock`). - Permissions: socket owned by `root:tribes`, mode `0660`. - Request bodies are JSON (`Content-Type: application/json`). - Responses are JSON. ## Concurrency model The broker runs a single POSIX worker thread. The HTTP request thread is never blocked on a long-running Guix call: any operation that may exceed about a second (`prepare`, `commit`, `rollback`) is enqueued on the worker and returns `202 Accepted` immediately. The caller then polls `GET /v1/deployment/status` for completion. There is at most one job in flight at any time. A new submission with the same `plan_hash` as the running job is **idempotent**: the broker returns the in-flight snapshot rather than queuing a duplicate. A submission with a different `plan_hash` while another job runs returns `409 busy`. ## Endpoints ### `GET /v1/deployment` and `GET /v1/deployment/status` Returns a status snapshot. Polling interval recommendation: 1 s during an active job, with linear back-off to 5 s after the first minute of polling. Snapshot fields: - `schemaVersion` — string, currently `"2"`. - `ok` — boolean. - `status` — high-level state. One of: `idle | queued | running | pulling | building | switching | completed | failed | aborted`. - `phase` — fine-grained phase identical to `status` for in-flight jobs; `ready` after a successful `prepare`, `active` after a successful `commit`/`rollback`. - `job_id` — opaque identifier of the in-flight or last-completed job. `"job-N"` where N is monotonic for the broker process lifetime. - `plan_hash` — the plan hash this job is operating on. - `started_at`, `last_event_at` — RFC 3339 timestamps. - `store_path` — the deployment target's `/gnu/store/...-system` path: the prepared store path after `prepare`, or the selected profile store path after `commit`/`rollback`. - `selectedSystem` — canonical `/gnu/store/...-system` path currently selected by `/var/guix/profiles/system`. - `runningSystem` — canonical `/gnu/store/...-system` path currently exposed by `/run/current-system`. - `generation_number` — the system profile generation number. - `gc_pinned` — boolean. `true` when the broker holds a GC root via `--root=` so the prepared system is not collected before a `commit`. - `built_at`, `activated_at` — RFC 3339 timestamps when present. - `code` — typed error code on failure (see *Error taxonomy*). - `reason` — human-readable error message on failure. - `plugins` — array of plugin names in the deployed plan. ### `GET /v1/deployment/generations` Returns the list of recorded generations in newest-first order. Each entry: ```json { "store_path": "/gnu/store/...-system", "generation_number": 42, "plan_hash": "plan-abcd...", "status": "active" | "ready" | "superseded", "gc_pinned": true, "built_at": "2026-04-25T13:01:02Z", "activated_at": "2026-04-25T13:01:42Z" } ``` ### `POST /v1/deployment/resolve` Synchronous. Body: a `SystemTarget` JSON object. Response: - `200` with `{ "schemaVersion": "2", "ok": true, "plan": { ... } }` on success. The `plan` object includes a `plan_hash` and is suitable for feeding into `prepare`. - `409` with the resolver error envelope on capability/manifest/trust failures. ### `POST /v1/deployment/prepare` Asynchronous. Body: a plan object containing `plan_hash` and `resolved_plugins`. - `202` with `{ "schemaVersion": "2", "status": "queued", "job_id": "...", "plan_hash": "...", "started_at": "..." }` on accept (or on idempotent re-submit of the running job). - `409` with `{ "ok": false, "status": "busy", "reason": "deployment already in progress", "job_id": "...", "plan_hash": "...", ... }` when another plan is already in flight. - `400` on validation error. The job pulls channels, runs `guix system build --root=...`, registers the resulting GC root, and records a `ready` generation. The final snapshot is visible at `GET /v1/deployment/status`. ### `POST /v1/deployment/commit` Asynchronous. Body: `{ "plan_hash": "..." }`. - `202` on accept. The job switches the system profile to the previously-prepared generation, then re-runs activation and Guix's normal Shepherd service-definition upgrade step inside the pulled/current Guix profile used for the prepare build. Activation runs with `GUIX_NEW_SYSTEM` set to the selected generation so `/run/current-system` follows the profile. Like upstream `guix system reconfigure`, this does not imply that every already-running service process was restarted. Tribes may then schedule an asynchronous `tribes` service restart as part of higher-level rollout convergence, while `tribes-local-control` self-update remains a separate deferred concern. On success the snapshot reaches `phase: "active"` with `status: "completed"`. - `409` if no generation is prepared for that `plan_hash`. The snapshot's error code is `generation_not_prepared`. - `409 busy` if another job is in flight. ### `POST /v1/deployment/rollback` Asynchronous. Body: ```json { "store_path": "/gnu/store/...-system", "plan": { ...optional fallback plan... } } ``` The broker walks these cases in order: 1. The requested `store_path` is the selected system → just record the activation, no build, no switch. 2. We have a recorded local-control generation number for that `store_path` → switch to it directly. 3. The `store_path` appears in Guix's system profile links (`/var/guix/profiles/system-*-link`), even if local-control did not record it → switch to that profile generation directly. This covers the installed baseline generation used by emergency/public rollback. 4. The store path is gone but `plan` is supplied → re-prepare and commit. If none apply the snapshot reports `code: "rollback_infeasible"`. Current limitation: rollback does not run core/plugin down migrations. The public Tribes admin rollback flow currently omits the fallback `plan` on purpose so explicit rollback to a baseline generation cannot replay the rollout being rolled back. ### `POST /v1/deployment/abort` Synchronous. Marks the in-flight job as aborted and writes a snapshot with `status: "aborted"`. (v1: does not yet SIGTERM a running helper subprocess — the operation completes when the helper next checks back in.) ## Error taxonomy Every failed operation returns a `code` matching one of these tokens: - `channel_untrusted` — channel references a signer not in the `TrustedSigner` table. - `signature_invalid` — a channel's commit signature failed verification. - `channel_commit_unreachable` — the configured commit cannot be fetched from the channel URL. - `missing_capability` — a plugin requires a capability that no other plugin provides. - `capability_cycle` — the plugin capability graph contains a cycle. - `duplicate_plugin` — the system target lists the same plugin twice. - `manifest_invalid` — a requested plugin name is unknown to the channel registry. - `host_api_mismatch` — the resolved plan needs a host API version the node cannot honour. - `migration_target_conflict` — two plugins disagree about a migration target version. - `build_failed` — `guix system build` returned non-zero. - `switch_failed` — `guix system switch-generation` returned non-zero. - `rollback_infeasible` — the broker cannot reach the requested store path by either retained generation or rebuild. - `helper_crashed` — `tribes-guix-helper` exited without emitting a structured terminal frame. - `busy` — another job is in flight; the request was rejected. - `invalid_request` — payload missed a required field or violated a limit. ## Helper protocol (internal) The broker spawns `tribes-guix-helper` for every long operation and parses its stdout as NDJSON. The helper emits one of: ```json {"event":"phase","phase":"pulling","ts":"..."} {"event":"phase","phase":"building","ts":"...","derivation":"/gnu/store/..."} {"event":"done","store_path":"/gnu/store/...","generation_number":42,"ts":"..."} {"event":"error","code":"channel_commit_unreachable","message":"...","details":{...},"ts":"..."} ``` The broker uses the last `event: "phase"` frame to update its snapshot in real time, and the final `done` or `error` frame to compute the operation result. If the helper exits without a terminal frame the broker synthesizes `{ "code": "helper_crashed", "details": { "exit_status": N, "signal": S } }`. This protocol is not part of the public API; it exists so the broker can stay small while still surfacing typed errors instead of regex-parsing `guix` stderr.