Tribes local-control API

The local-control broker is a small Guile daemon listening on a Unix-domain socket. It fronts every operator action that a Tribes deployment can take on its own host:

  • resolve a SystemTarget into a build plan.
  • prepare a build (pull channels + guix system build) without activating it.
  • commit a previously-prepared generation (guix system switch-generation).
  • roll back to a retained store path or, failing that, rebuild from a plan and switch.
  • abort an in-flight job.
  • inspect status and generations.

This document specifies the wire schema. The BEAM client at tribes/lib/tribes/local_control.ex should be updated to match it.

Transport

  • HTTP/1.1 over a Unix-domain socket. The path is configurable via TRIBES_LOCAL_CONTROL_SOCKET (default /var/run/tribes/local-control.sock).
  • Permissions: socket owned by root:tribes, mode 0660.
  • Request bodies are JSON (Content-Type: application/json).
  • Responses are JSON.
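For illustration, here is a minimal Python client for this transport that uses only the standard library. It is a sketch, not the BEAM client. DEFAULT_SOCKET, socket_path, UnixHTTPConnection and call are hypothetical names, and the 10 s socket timeout is an arbitrary choice.

```python
from __future__ import annotations

import http.client
import json
import os
import socket

# Default taken from this document; override with TRIBES_LOCAL_CONTROL_SOCKET.
DEFAULT_SOCKET = "/var/run/tribes/local-control.sock"


def socket_path() -> str:
    """Return the broker socket path, honouring the environment override."""
    return os.environ.get("TRIBES_LOCAL_CONTROL_SOCKET", DEFAULT_SOCKET)


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP/1.1 connection that dials a Unix-domain socket instead of TCP."""

    def __init__(self, path: str, timeout: float = 10.0) -> None:
        # "localhost" only ends up in the Host: header; no TCP lookup happens.
        super().__init__("localhost", timeout=timeout)
        self.unix_path = path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.unix_path)
        self.sock = sock


def call(method: str, path: str, body: dict | None = None) -> tuple[int, dict]:
    """Send one request with an optional JSON body; return (status, JSON reply)."""
    conn = UnixHTTPConnection(socket_path())
    try:
        headers = {}
        payload = None
        if body is not None:
            payload = json.dumps(body).encode("utf-8")
            headers["Content-Type"] = "application/json"
        conn.request(method, path, body=payload, headers=headers)
        resp = conn.getresponse()
        raw = resp.read()
        return resp.status, json.loads(raw) if raw else {}
    finally:
        conn.close()
```

For example, call("GET", "/v1/deployment/status") would return the HTTP status and the decoded snapshot. Overriding only connect() keeps http.client's request framing and response parsing and swaps out just the socket type.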

Concurrency model

The broker runs a single POSIX worker thread. The HTTP request thread is never blocked on a long-running Guix call: any operation that may exceed about a second (prepare, commit, rollback) is enqueued on the worker and returns 202 Accepted immediately. The caller then polls GET /v1/deployment/status for completion.

There is at most one job in flight at any time. A new submission with the same plan_hash as the running job is idempotent: the broker returns the in-flight snapshot rather than queuing a duplicate. A submission with a different plan_hash while another job is running returns 409 with status busy.
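The admission rule above can be modelled as a single lock-guarded slot. The following Python sketch is illustrative only, since the broker itself is a Guile daemon. Job, JobSlot, submit and finish are hypothetical names, and the sketch omits the worker thread that actually runs the queued job.

```python
from __future__ import annotations

import itertools
import threading
from dataclasses import dataclass

# Statuses from the snapshot schema that mean a job is still in flight.
IN_FLIGHT = {"queued", "running", "pulling", "building", "switching"}


@dataclass
class Job:
    job_id: str
    plan_hash: str
    status: str = "queued"


class JobSlot:
    """Hypothetical model of the broker's one-job-in-flight admission rule."""

    def __init__(self) -> None:
        self._lock = threading.Lock()    # HTTP threads may submit concurrently
        self._ids = itertools.count(1)   # "job-N", monotonic per process
        self.current: Job | None = None

    def submit(self, plan_hash: str) -> tuple[int, Job]:
        """Return (HTTP status, job snapshot) for a new submission."""
        with self._lock:
            cur = self.current
            if cur is not None and cur.status in IN_FLIGHT:
                if cur.plan_hash == plan_hash:
                    return 202, cur      # idempotent: reuse the running job
                return 409, cur          # busy: a different plan is running
            job = Job(job_id=f"job-{next(self._ids)}", plan_hash=plan_hash)
            self.current = job           # the worker thread would pick this up
            return 202, job

    def finish(self, status: str) -> None:
        """Called by the worker when the job reaches a terminal status."""
        with self._lock:
            if self.current is not None:
                self.current.status = status
```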

Endpoints

GET /v1/deployment and GET /v1/deployment/status

Returns a status snapshot. Recommended polling interval: 1 s while a job is active; after the first minute of polling, back off linearly to 5 s.
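One way to implement this recommendation is shown in the Python sketch below. The recommendation does not say how long the ramp from 1 s to 5 s should take, so the 60 s ramp_s default is an assumption. The helper names are also assumptions, as is treating completed, failed and aborted (from the status values listed under the snapshot fields) as terminal. fetch_status, sleep and clock are injected so the loop can be exercised without a live broker.

```python
import time

TERMINAL = {"completed", "failed", "aborted"}


def poll_interval(elapsed_s: float, ramp_s: float = 60.0) -> float:
    """1 s for the first minute, then a linear ramp up to a 5 s ceiling.

    ramp_s (how long the ramp from 1 s to 5 s takes) is an assumption.
    """
    if elapsed_s <= 60.0:
        return 1.0
    fraction = min((elapsed_s - 60.0) / ramp_s, 1.0)
    return 1.0 + 4.0 * fraction


def wait_for_job(fetch_status, job_id, sleep=time.sleep, clock=time.monotonic):
    """Poll until the snapshot for job_id reaches a terminal status."""
    start = clock()
    while True:
        snap = fetch_status()  # e.g. GET /v1/deployment/status
        if snap.get("job_id") == job_id and snap.get("status") in TERMINAL:
            return snap
        sleep(poll_interval(clock() - start))
```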

Snapshot fields:

  • schemaVersion — string, currently "2".
  • ok — boolean.
  • status — high-level state. One of: idle | queued | running | pulling | building | switching | completed | failed | aborted.
  • phase — fine-grained phase. For in-flight jobs it equals status; after a successful prepare it is ready, and after a successful commit or rollback it is active.
  • job_id — opaque identifier of the in-flight or last-completed job, of the form "job-N", where N increases monotonically over the broker process lifetime.
  • plan_hash — the plan hash this job is operating on.
  • started_at, last_event_at — RFC 3339 timestamps.
  • store_path — the deployment target's /gnu/store/...-system path: the prepared store path after prepare, or the selected profile store path after commit/rollback.
  • selectedSystem — canonical /gnu/store/...-system path currently selected by /var/guix/profiles/system.
  • runningSystem — canonical /gnu/store/...-system path currently exposed by /run/current-system.
  • generation_number — the system profile generation number.
  • gc_pinned — boolean. true when the broker holds a GC root via --root= so the prepared system is not collected before a commit.
  • built_at, activated_at — RFC 3339 timestamps when present.
  • code — typed error code on failure (see Error taxonomy).
  • reason — human-readable error message on failure.
  • plugins — array of plugin names in the deployed plan.
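A client may want to validate snapshots before acting on them. This Python sketch checks only a subset of the fields listed above. Snapshot and from_json are hypothetical names. Rejecting unknown status or phase values, rather than tolerating them for forward compatibility, is a design choice this document does not mandate.

```python
from __future__ import annotations

from dataclasses import dataclass, field

STATUSES = {"idle", "queued", "running", "pulling", "building",
            "switching", "completed", "failed", "aborted"}
# phase equals status while a job runs, then becomes ready or active.
PHASES = STATUSES | {"ready", "active"}


@dataclass
class Snapshot:
    ok: bool
    status: str
    phase: str | None = None
    job_id: str | None = None
    plan_hash: str | None = None
    store_path: str | None = None
    code: str | None = None
    reason: str | None = None
    plugins: list[str] = field(default_factory=list)

    @classmethod
    def from_json(cls, obj: dict) -> Snapshot:
        """Validate a decoded snapshot and keep the fields this sketch uses."""
        if obj.get("schemaVersion") != "2":
            raise ValueError(f"unsupported schemaVersion: {obj.get('schemaVersion')!r}")
        if obj.get("status") not in STATUSES:
            raise ValueError(f"unknown status: {obj.get('status')!r}")
        phase = obj.get("phase")
        if phase is not None and phase not in PHASES:
            raise ValueError(f"unknown phase: {phase!r}")
        return cls(
            ok=bool(obj.get("ok")),
            status=obj["status"],
            phase=phase,
            job_id=obj.get("job_id"),
            plan_hash=obj.get("plan_hash"),
            store_path=obj.get("store_path"),
            code=obj.get("code"),
            reason=obj.get("reason"),
            plugins=list(obj.get("plugins", [])),
        )
```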

GET /v1/deployment/generations

Returns the list of recorded generations in newest-first order. Each entry:

{
  "store_path": "/gnu/store/...-system",
  "generation_number": 42,
  "plan_hash": "plan-abcd...",
  "status": "active" | "ready" | "superseded",
  "gc_pinned": true,
  "built_at": "2026-04-25T13:01:02Z",
  "activated_at": "2026-04-25T13:01:42Z"
}
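Because entries arrive newest-first, simple client-side questions reduce to a linear scan in which the first match is the most recent. The helpers below are a hypothetical Python sketch. For example, prepared_for could serve as a pre-check before POST /v1/deployment/commit, although the broker performs its own check.

```python
from __future__ import annotations


def prepared_for(generations: list[dict], plan_hash: str) -> dict | None:
    """Newest generation in status "ready" built from plan_hash, or None.

    The list is newest-first, so the first match is the most recent one.
    """
    for gen in generations:
        if gen["plan_hash"] == plan_hash and gen["status"] == "ready":
            return gen
    return None


def active(generations: list[dict]) -> dict | None:
    """The entry currently marked "active", if any."""
    return next((g for g in generations if g["status"] == "active"), None)
```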

POST /v1/deployment/resolve

Synchronous. Body: a SystemTarget JSON object. Response:

  • 200 with { "schemaVersion": "2", "ok": true, "plan": { ... } } on success. The plan object includes a plan_hash and is suitable for feeding into prepare.
  • 409 with the resolver error envelope on capability/manifest/trust failures.

POST /v1/deployment/prepare

Asynchronous. Body: a plan object containing plan_hash and resolved_plugins.

  • 202 with { "schemaVersion": "2", "status": "queued", "job_id": "...", "plan_hash": "...", "started_at": "..." } on accept (or on idempotent re-submit of the running job).
  • 409 with { "ok": false, "status": "busy", "reason": "deployment already in progress", "job_id": "...", "plan_hash": "...", ... } when another plan is already in flight.
  • 400 on validation error.

The job pulls channels, runs guix system build --root=..., registers the resulting GC root, and records a ready generation. The final snapshot is visible at GET /v1/deployment/status.
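On the client side, the resolve-then-prepare sequence might look like the following Python sketch. It assumes the transport is injected as call(method, path, body), which returns (status, decoded JSON). LocalControlError and resolve_and_prepare are hypothetical names.

```python
from __future__ import annotations


class LocalControlError(Exception):
    """Raised when the broker rejects a request (hypothetical client type)."""

    def __init__(self, status: int, body: dict) -> None:
        label = body.get("code") or body.get("status")
        super().__init__(f"HTTP {status}: {label}: {body.get('reason')}")
        self.status = status
        self.body = body


def resolve_and_prepare(call, system_target: dict) -> dict:
    """Resolve a SystemTarget, then queue prepare for the resulting plan.

    Returns the 202 acknowledgement; the caller then polls its job_id.
    """
    status, body = call("POST", "/v1/deployment/resolve", system_target)
    if status != 200 or not body.get("ok"):
        raise LocalControlError(status, body)   # 409: resolver error envelope
    plan = body["plan"]                          # carries plan_hash

    status, body = call("POST", "/v1/deployment/prepare", plan)
    if status == 202:
        return body                              # queued, or same plan re-submitted
    raise LocalControlError(status, body)        # 409 busy or 400 validation error
```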

POST /v1/deployment/commit

Asynchronous. Body: { "plan_hash": "..." }.

  • 202 on accept. The job switches the system profile to the previously-prepared generation, then re-runs activation and Guix's normal Shepherd service-definition upgrade step inside the same pulled (current) Guix profile that was used for the prepare build. Activation runs with GUIX_NEW_SYSTEM set to the selected generation, so /run/current-system follows the profile. As with upstream guix system reconfigure, this does not guarantee that every already-running service process is restarted. Tribes may then schedule an asynchronous restart of the tribes service as part of higher-level rollout convergence; self-update of tribes-local-control remains a separate, deferred concern. On success the snapshot reaches phase: "active" with status: "completed".
  • 409 if no generation is prepared for that plan_hash. The snapshot's error code is generation_not_prepared.
  • 409 busy if another job is in flight.
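A client-side commit step could look like the following Python sketch. It makes several assumptions: the 202 commit acknowledgement carries a job_id, as the prepare acknowledgement does; a 409 response body carries a code or status field; and both call and wait_for_job are injected callables. CommitFailed and commit_and_wait are hypothetical names.

```python
from __future__ import annotations


class CommitFailed(Exception):
    """Hypothetical client error carrying the broker's code or status."""


def commit_and_wait(call, wait_for_job, plan_hash: str) -> dict:
    """Queue a commit for an already-prepared plan and wait for the outcome.

    call(method, path, body) -> (http_status, json); wait_for_job(job_id) -> snapshot.
    """
    status, body = call("POST", "/v1/deployment/commit", {"plan_hash": plan_hash})
    if status == 409:
        # generation_not_prepared (prepare first) or busy (retry later);
        # reading the token from the response body is an assumption.
        raise CommitFailed(body.get("code") or body.get("status") or "conflict")
    if status != 202:
        raise CommitFailed(f"unexpected HTTP {status}")
    snap = wait_for_job(body["job_id"])
    if snap.get("status") == "completed" and snap.get("phase") == "active":
        return snap  # profile switched and activation re-run
    raise CommitFailed(snap.get("code") or snap.get("status", "unknown"))
```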

POST /v1/deployment/rollback

Asynchronous. Body:

{
  "store_path": "/gnu/store/...-system",
  "plan": { ...optional fallback plan... }
}

The broker walks these cases in order:

  1. The requested store_path is the selected system → just record the activation, no build, no switch.
  2. The broker has a recorded local-control generation number for that store_path → switch to it directly.
  3. The store_path appears in Guix's system profile links (/var/guix/profiles/system-*-link), even if local-control did not record it → switch to that profile generation directly. This covers the installed baseline generation used by emergency/public rollback.
  4. The store path is gone but plan is supplied → re-prepare and commit.

If none of these applies, the snapshot reports code: "rollback_infeasible".
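The four cases can be expressed as an ordered decision function. In this hypothetical Python sketch, profile_links builds the case-3 lookup by resolving each /var/guix/profiles/system-N-link symlink, and plan_rollback returns an action tag rather than performing the action. The tag names are invented. The sketch also does not check whether a recorded store path still exists on disk.

```python
from __future__ import annotations

import os
import re

# Guix names system profile generations system-<N>-link.
LINK_RE = re.compile(r"^system-(\d+)-link$")


def profile_links(profile_dir: str = "/var/guix/profiles") -> dict[str, int]:
    """Map canonical store path -> generation number from system-*-link entries."""
    result: dict[str, int] = {}
    for name in os.listdir(profile_dir):
        m = LINK_RE.match(name)
        if m:
            target = os.path.realpath(os.path.join(profile_dir, name))
            result[target] = int(m.group(1))
    return result


def plan_rollback(store_path, selected_system, recorded, links, fallback_plan=None):
    """Apply the four rollback cases in order; return (action, argument)."""
    if store_path == selected_system:
        return ("record_activation", None)            # case 1: no build, no switch
    if store_path in recorded:
        return ("switch", recorded[store_path])       # case 2: local-control record
    if store_path in links:
        return ("switch", links[store_path])          # case 3: Guix profile link
    if fallback_plan is not None:
        return ("prepare_and_commit", fallback_plan)  # case 4: rebuild from plan
    return ("fail", "rollback_infeasible")
```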

Current limitation: rollback does not run core or plugin down migrations. The public Tribes admin rollback flow currently omits the fallback plan deliberately, so that an explicit rollback to a baseline generation cannot replay the rollout being rolled back.

POST /v1/deployment/abort

Synchronous. Marks the in-flight job as aborted and writes a snapshot with status: "aborted". In v1 it does not yet send SIGTERM to a running helper subprocess; the operation completes only when the helper next checks back in.
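The v1 semantics amount to cooperative cancellation: the abort is recorded immediately, but the running work notices it only at its next check-in. The following hypothetical Python sketch uses a threading.Event as the abort flag and treats each helper frame as a check-in point, which is an assumption about what "checks back in" means.

```python
import threading


def handle_abort(abort: threading.Event, snapshot: dict) -> dict:
    """Synchronous part: flag the job and publish an aborted snapshot at once."""
    abort.set()
    snapshot["status"] = "aborted"
    return snapshot


def run_until_aborted(frames, abort: threading.Event, on_frame) -> bool:
    """Feed helper frames to on_frame; return True if an abort was observed.

    Nothing here interrupts a frame that is still being produced (no SIGTERM),
    so the abort only takes effect at the next frame boundary.
    """
    for frame in frames:
        if abort.is_set():
            return True
        on_frame(frame)
    return abort.is_set()
```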

Error taxonomy

Every failed operation returns a code matching one of these tokens:

  • channel_untrusted — channel references a signer not in the TrustedSigner table.
  • signature_invalid — a channel's commit signature failed verification.
  • channel_commit_unreachable — the configured commit cannot be fetched from the channel URL.
  • missing_capability — a plugin requires a capability that no other plugin provides.
  • capability_cycle — the plugin capability graph contains a cycle.
  • duplicate_plugin — the system target lists the same plugin twice.
  • manifest_invalid — a requested plugin name is unknown to the channel registry.
  • host_api_mismatch — the resolved plan needs a host API version the node cannot honour.
  • migration_target_conflict — two plugins disagree about a migration target version.
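  • build_failed — guix system build returned non-zero.
  • switch_failed — guix system switch-generation returned non-zero.
  • generation_not_prepared — commit was requested for a plan_hash that has no prepared generation.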
  • rollback_infeasible — the broker cannot reach the requested store path by either retained generation or rebuild.
  • helper_crashed — tribes-guix-helper exited without emitting a structured terminal frame.
  • busy — another job is in flight; the request was rejected.
  • invalid_request — payload missed a required field or violated a limit.
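On the client, the taxonomy maps naturally onto an enumeration. This Python sketch is hypothetical. It also includes generation_not_prepared, which the commit endpoint names, and it deliberately raises on unknown tokens so that schema drift between broker and client surfaces instead of being hidden.

```python
from __future__ import annotations

from enum import Enum


class ErrorCode(str, Enum):
    CHANNEL_UNTRUSTED = "channel_untrusted"
    SIGNATURE_INVALID = "signature_invalid"
    CHANNEL_COMMIT_UNREACHABLE = "channel_commit_unreachable"
    MISSING_CAPABILITY = "missing_capability"
    CAPABILITY_CYCLE = "capability_cycle"
    DUPLICATE_PLUGIN = "duplicate_plugin"
    MANIFEST_INVALID = "manifest_invalid"
    HOST_API_MISMATCH = "host_api_mismatch"
    MIGRATION_TARGET_CONFLICT = "migration_target_conflict"
    BUILD_FAILED = "build_failed"
    SWITCH_FAILED = "switch_failed"
    ROLLBACK_INFEASIBLE = "rollback_infeasible"
    HELPER_CRASHED = "helper_crashed"
    BUSY = "busy"
    INVALID_REQUEST = "invalid_request"
    GENERATION_NOT_PREPARED = "generation_not_prepared"  # named by commit's 409


def error_code(snapshot: dict) -> ErrorCode | None:
    """Return the typed code of a failed snapshot, or None if it has none.

    ErrorCode(raw) raises ValueError for a token outside the taxonomy.
    """
    raw = snapshot.get("code")
    return None if raw is None else ErrorCode(raw)
```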

Helper protocol (internal)

The broker spawns tribes-guix-helper for every long operation and parses its stdout as NDJSON. Each line the helper emits is one of the following frames:

{"event":"phase","phase":"pulling","ts":"..."}
{"event":"phase","phase":"building","ts":"...","derivation":"/gnu/store/..."}
{"event":"done","store_path":"/gnu/store/...","generation_number":42,"ts":"..."}
{"event":"error","code":"channel_commit_unreachable","message":"...","details":{...},"ts":"..."}

The broker uses the most recent "event":"phase" frame to update its snapshot in real time, and the final done or error frame to compute the operation result. If the helper exits without a terminal frame, the broker synthesizes { "code": "helper_crashed", "details": { "exit_status": N, "signal": S } }.
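The frame handling described above can be sketched as a fold over the helper's stdout lines. In this hypothetical Python sketch, fold_helper_output and the shapes of its result dictionaries are invented. Skipping non-JSON lines is an assumption. returncode follows Python's subprocess convention, where a negative value -S means the helper was killed by signal S.

```python
from __future__ import annotations

import json


def fold_helper_output(lines, returncode: int, on_phase=lambda frame: None) -> dict:
    """Turn helper NDJSON lines plus its exit status into an operation result."""
    terminal = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            frame = json.loads(line)
        except json.JSONDecodeError:
            continue                     # assumption: ignore non-JSON noise
        if not isinstance(frame, dict):
            continue
        event = frame.get("event")
        if event == "phase":
            on_phase(frame)              # update the live snapshot
        elif event in ("done", "error"):
            terminal = frame             # the last terminal frame wins
    if terminal is None:
        return {
            "code": "helper_crashed",
            "details": {
                "exit_status": returncode if returncode >= 0 else None,
                "signal": -returncode if returncode < 0 else None,
            },
        }
    if terminal["event"] == "error":
        return {"code": terminal["code"], "reason": terminal.get("message"),
                "details": terminal.get("details", {})}
    return {"store_path": terminal["store_path"],
            "generation_number": terminal.get("generation_number")}
```

In the broker, lines would come from the helper's stdout and returncode from waiting on the process; how the helper is invoked is not specified in this document.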

This protocol is not part of the public API; it exists so the broker can stay small while still surfacing typed errors instead of regex-parsing guix stderr.