Tribes local-control API
The local-control broker is a small Guile daemon listening on a Unix-domain socket. It fronts every operator action that a Tribes deployment can take on its own host:
- resolve a SystemTarget into a build plan.
- prepare a build (pull channels + guix system build) without activating it.
- commit a previously-prepared generation (guix system switch-generation).
- rollback to a retained store path or, failing that, rebuild from a plan and switch.
- abort an in-flight job.
- inspect status and generations.
This document specifies the wire schema. The BEAM client at
tribes/lib/tribes/local_control.ex should be updated to match it.
Transport
- HTTP/1.1 over a Unix-domain socket. The path is configurable via TRIBES_LOCAL_CONTROL_SOCKET (default /var/run/tribes/local-control.sock).
- Permissions: socket owned by root:tribes, mode 0660.
- Request bodies are JSON (Content-Type: application/json).
- Responses are JSON.
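A client reaches this transport by speaking ordinary HTTP/1.1 over an AF_UNIX stream socket. The following is a minimal Python sketch, not the BEAM client itself. The names socket_path, UnixHTTPConnection and request_json are hypothetical helpers introduced for illustration; only the environment variable, default path and content type come from this document.

```python
import http.client
import json
import os
import socket

# Default path as documented in the Transport section.
DEFAULT_SOCKET = "/var/run/tribes/local-control.sock"


def socket_path(env=os.environ):
    """Return the configured socket path, falling back to the documented default."""
    return env.get("TRIBES_LOCAL_CONTROL_SOCKET", DEFAULT_SOCKET)


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP/1.1 connection that dials a Unix-domain socket instead of TCP."""

    def __init__(self, path, timeout=10.0):
        # The host only feeds the Host header; "localhost" is a placeholder.
        super().__init__("localhost", timeout=timeout)
        self._path = path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self._path)
        self.sock = sock


def request_json(method, url, body=None, path=None):
    """Send one request with an optional JSON body; return (status, decoded_body)."""
    conn = UnixHTTPConnection(path or socket_path())
    try:
        payload = None if body is None else json.dumps(body)
        headers = {} if body is None else {"Content-Type": "application/json"}
        conn.request(method, url, body=payload, headers=headers)
        resp = conn.getresponse()
        return resp.status, json.loads(resp.read())
    finally:
        conn.close()
```

With a broker running, request_json("GET", "/v1/deployment/status") would return the HTTP status and the decoded snapshot. The calling process needs membership in the tribes group (or root) to open the socket under the permissions listed above.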
Concurrency model
The broker runs a single POSIX worker thread. The HTTP request thread is
never blocked on a long-running Guix call: any operation that may exceed
about a second (prepare, commit, rollback) is enqueued on the worker
and returns 202 Accepted immediately. The caller then polls
GET /v1/deployment/status for completion.
There is at most one job in flight at any time. A new submission with the
same plan_hash as the running job is idempotent: the broker returns
the in-flight snapshot rather than queuing a duplicate. A submission with a
different plan_hash while another job runs returns 409 busy.
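The admission rule above can be sketched as a pure function. This is an illustration in Python rather than the broker's Guile code; the function name admit and the shape of the running argument are assumptions, while the status codes and busy envelope fields follow this document.

```python
def admit(running, plan_hash):
    """Decide the HTTP outcome for a new job submission.

    `running` is the in-flight job snapshot (a dict with at least
    "job_id" and "plan_hash"), or None when no job is in flight.
    Returns a (http_status, body) pair.
    """
    if running is None:
        # Nothing in flight: accept and queue the job.
        return 202, {"status": "queued", "plan_hash": plan_hash}
    if running["plan_hash"] == plan_hash:
        # Idempotent re-submit: hand back the in-flight snapshot unchanged.
        return 202, running
    # A different plan is already in flight.
    return 409, {
        "ok": False,
        "status": "busy",
        "reason": "deployment already in progress",
        "job_id": running["job_id"],
        "plan_hash": running["plan_hash"],
    }
```

Because a re-submit with the same plan_hash returns the existing snapshot, a client that times out on a submission can safely retry it without risking a duplicate build.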
Endpoints
GET /v1/deployment and GET /v1/deployment/status
Returns a status snapshot. Polling interval recommendation: 1 s during an active job, with linear back-off to 5 s after the first minute of polling.
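One way to read the polling recommendation is as a delay schedule keyed on how long the client has been polling. The sketch below is an interpretation, not a normative rule: the document does not say how quickly the delay should ramp from 1 s to 5 s, so the 60-second ramp (ramp_s) is an assumption, and poll_delay is a hypothetical helper name.

```python
def poll_delay(elapsed_s, ramp_s=60.0):
    """Seconds to wait before the next status poll.

    elapsed_s: seconds since polling started.
    ramp_s:    assumed length of the linear ramp from 1 s to 5 s
               (not specified by the API document).
    """
    if elapsed_s <= 60.0:
        # First minute: poll once per second.
        return 1.0
    # After the first minute: grow linearly, capped at 5 s.
    progress = min((elapsed_s - 60.0) / ramp_s, 1.0)
    return 1.0 + 4.0 * progress
```

Under these assumptions a client polling for 90 seconds waits 3 s between requests, and any client polling for two minutes or longer waits the full 5 s.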
Snapshot fields:
- schemaVersion: string, currently "2".
- ok: boolean.
- status: high-level state. One of: idle | queued | running | pulling | building | switching | completed | failed | aborted.
- phase: fine-grained phase, identical to status for in-flight jobs; ready after a successful prepare, active after a successful commit/rollback.
- job_id: opaque identifier of the in-flight or last-completed job. "job-N" where N is monotonic for the broker process lifetime.
- plan_hash: the plan hash this job is operating on.
- started_at, last_event_at: RFC 3339 timestamps.
- store_path: the deployment target's /gnu/store/...-system path: the prepared store path after prepare, or the selected profile store path after commit/rollback.
- selectedSystem: canonical /gnu/store/...-system path currently selected by /var/guix/profiles/system.
- runningSystem: canonical /gnu/store/...-system path currently exposed by /run/current-system.
- generation_number: the system profile generation number.
- gc_pinned: boolean. true when the broker holds a GC root via --root= so the prepared system is not collected before a commit.
- built_at, activated_at: RFC 3339 timestamps when present.
- code: typed error code on failure (see Error taxonomy).
- reason: human-readable error message on failure.
- plugins: array of plugin names in the deployed plan.
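A polling client mostly needs to know whether a snapshot is still in flight, finished, or failed. The following Python sketch classifies a snapshot using only the status, phase and code fields listed above. The function name summarize and its return strings are hypothetical, and it assumes that a successful prepare also ends with status "completed" (the document states this explicitly only for commit).

```python
IN_FLIGHT = {"queued", "running", "pulling", "building", "switching"}


def summarize(snapshot):
    """Reduce a status snapshot to a short outcome string."""
    status = snapshot.get("status")
    if status == "idle":
        return "idle"
    if status in IN_FLIGHT:
        return "pending"
    if status == "completed":
        # phase tells a finished prepare ("ready") apart from a
        # finished commit/rollback ("active").
        return snapshot.get("phase", "completed")
    if status in ("failed", "aborted"):
        code = snapshot.get("code")
        return f"{status}:{code}" if code else status
    raise ValueError(f"unknown status: {status!r}")
```

A caller would keep polling while the result is "pending" and stop on anything else.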
GET /v1/deployment/generations
Returns the list of recorded generations in newest-first order. Each entry:
{
"store_path": "/gnu/store/...-system",
"generation_number": 42,
"plan_hash": "plan-abcd...",
"status": "active" | "ready" | "superseded",
"gc_pinned": true,
"built_at": "2026-04-25T13:01:02Z",
"activated_at": "2026-04-25T13:01:42Z"
}
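As a small illustration of consuming this list, the Python sketch below picks out the active entry and the store paths that remain available as rollback targets. The helper names and the sample store paths are hypothetical; the field names and the newest-first ordering come from the description above.

```python
def active_generation(generations):
    """Return the entry whose status is "active", or None if there is none."""
    return next((g for g in generations if g["status"] == "active"), None)


def retained_store_paths(generations):
    """Store paths of non-active entries, preserving newest-first order."""
    return [g["store_path"] for g in generations if g["status"] != "active"]


# Hypothetical sample data in the documented entry shape.
sample = [
    {"store_path": "/gnu/store/ccc-system", "generation_number": 43, "status": "ready"},
    {"store_path": "/gnu/store/bbb-system", "generation_number": 42, "status": "active"},
    {"store_path": "/gnu/store/aaa-system", "generation_number": 41, "status": "superseded"},
]
```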
POST /v1/deployment/resolve
Synchronous. Body: a SystemTarget JSON object. Response:
- 200 with { "schemaVersion": "2", "ok": true, "plan": { ... } } on success. The plan object includes a plan_hash and is suitable for feeding into prepare.
- 409 with the resolver error envelope on capability/manifest/trust failures.
POST /v1/deployment/prepare
Asynchronous. Body: a plan object containing plan_hash and
resolved_plugins.
- 202 with { "schemaVersion": "2", "status": "queued", "job_id": "...", "plan_hash": "...", "started_at": "..." } on accept (or on idempotent re-submit of the running job).
- 409 with { "ok": false, "status": "busy", "reason": "deployment already in progress", "job_id": "...", "plan_hash": "...", ... } when another plan is already in flight.
- 400 on validation error.
The job pulls channels, runs guix system build --root=..., registers the
resulting GC root, and records a ready generation. The final snapshot is
visible at GET /v1/deployment/status.
POST /v1/deployment/commit
Asynchronous. Body: { "plan_hash": "..." }.
- 202 on accept. The job switches the system profile to the previously-prepared generation, then re-runs activation and Guix's normal Shepherd service-definition upgrade step inside the pulled/current Guix profile used for the prepare build. Activation runs with GUIX_NEW_SYSTEM set to the selected generation so /run/current-system follows the profile. Like upstream guix system reconfigure, this does not imply that every already-running service process was restarted. Tribes may then schedule an asynchronous tribes service restart as part of higher-level rollout convergence, while tribes-local-control self-update remains a separate deferred concern. On success the snapshot reaches phase: "active" with status: "completed".
- 409 if no generation is prepared for that plan_hash. The snapshot's error code is generation_not_prepared.
- 409 busy if another job is in flight.
POST /v1/deployment/rollback
Asynchronous. Body:
{
"store_path": "/gnu/store/...-system",
"plan": { ...optional fallback plan... }
}
The broker walks these cases in order:
- The requested store_path is the selected system → just record the activation, no build, no switch.
- We have a recorded local-control generation number for that store_path → switch to it directly.
- The store_path appears in Guix's system profile links (/var/guix/profiles/system-*-link), even if local-control did not record it → switch to that profile generation directly. This covers the installed baseline generation used by emergency/public rollback.
- The store path is gone but plan is supplied → re-prepare and commit.
If none apply the snapshot reports code: "rollback_infeasible".
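The case order above can be sketched as a single decision function. This is a Python illustration, not the broker's implementation: rollback_strategy, its argument shapes (dicts mapping store paths to generation numbers) and the returned action labels are all hypothetical. Only the ordering of the checks and the rollback_infeasible code come from this document.

```python
def rollback_strategy(store_path, selected_system, recorded, profile_links, plan=None):
    """Pick a rollback action by walking the documented cases in order.

    recorded:      {store_path: generation_number} known to local-control.
    profile_links: {store_path: generation_number} found under
                   /var/guix/profiles/system-*-link.
    Returns (action, generation_number_or_None).
    """
    if store_path == selected_system:
        # Already selected: record the activation only.
        return ("record_only", None)
    if store_path in recorded:
        return ("switch_recorded", recorded[store_path])
    if store_path in profile_links:
        # Covers generations local-control never recorded, such as the
        # installed baseline.
        return ("switch_profile_link", profile_links[store_path])
    if plan is not None:
        # Store path is gone: re-prepare from the fallback plan, then commit.
        return ("rebuild", None)
    return ("rollback_infeasible", None)
```

Checking the recorded generations before the profile links means the broker's own bookkeeping wins whenever both sources know the store path.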
Current limitation: rollback does not run core/plugin down migrations. The
public Tribes admin rollback flow currently omits the fallback plan on
purpose so explicit rollback to a baseline generation cannot replay the rollout
being rolled back.
POST /v1/deployment/abort
Synchronous. Marks the in-flight job as aborted and writes a snapshot with
status: "aborted". (v1: does not yet SIGTERM a running helper subprocess —
the operation completes when the helper next checks back in.)
Error taxonomy
Every failed operation returns a code matching one of these tokens:
- channel_untrusted: channel references a signer not in the TrustedSigner table.
- signature_invalid: a channel's commit signature failed verification.
- channel_commit_unreachable: the configured commit cannot be fetched from the channel URL.
- missing_capability: a plugin requires a capability that no other plugin provides.
- capability_cycle: the plugin capability graph contains a cycle.
- duplicate_plugin: the system target lists the same plugin twice.
- manifest_invalid: a requested plugin name is unknown to the channel registry.
- host_api_mismatch: the resolved plan needs a host API version the node cannot honour.
- migration_target_conflict: two plugins disagree about a migration target version.
- build_failed: guix system build returned non-zero.
- switch_failed: guix system switch-generation returned non-zero.
- rollback_infeasible: the broker cannot reach the requested store path by either retained generation or rebuild.
- helper_crashed: tribes-guix-helper exited without emitting a structured terminal frame.
- busy: another job is in flight; the request was rejected.
- invalid_request: payload missed a required field or violated a limit.
Helper protocol (internal)
The broker spawns tribes-guix-helper for every long operation and parses
its stdout as NDJSON. The helper emits one of:
{"event":"phase","phase":"pulling","ts":"..."}
{"event":"phase","phase":"building","ts":"...","derivation":"/gnu/store/..."}
{"event":"done","store_path":"/gnu/store/...","generation_number":42,"ts":"..."}
{"event":"error","code":"channel_commit_unreachable","message":"...","details":{...},"ts":"..."}
The broker uses the last event: "phase" frame to update its snapshot in
real time, and the final done or error frame to compute the operation
result. If the helper exits without a terminal frame the broker synthesizes
{ "code": "helper_crashed", "details": { "exit_status": N, "signal": S } }.
This protocol is not part of the public API; it exists so the broker can
stay small while still surfacing typed errors instead of regex-parsing
guix stderr.
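The frame handling described in this section can be sketched as a fold over the helper's stdout lines. The Python below is an illustration only (the broker itself is written in Guile). The function name fold_helper_output and the shape of the returned dict are assumptions, while the frame fields and the helper_crashed fallback follow the protocol above.

```python
import json


def fold_helper_output(lines, exit_status=None, signal=None):
    """Reduce NDJSON helper frames to a single operation result.

    Tracks the last "phase" frame and stops at the first terminal
    ("done" or "error") frame. If the stream ends without one, a
    helper_crashed error is synthesized.
    """
    phase = None
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between frames
        frame = json.loads(raw)
        event = frame.get("event")
        if event == "phase":
            phase = frame["phase"]
        elif event == "done":
            return {
                "ok": True,
                "phase": phase,
                "store_path": frame["store_path"],
                "generation_number": frame.get("generation_number"),
            }
        elif event == "error":
            return {
                "ok": False,
                "phase": phase,
                "code": frame["code"],
                "reason": frame.get("message"),
            }
    return {
        "ok": False,
        "phase": phase,
        "code": "helper_crashed",
        "details": {"exit_status": exit_status, "signal": signal},
    }
```

A real broker would update its snapshot as each phase frame arrives rather than only at the end; this sketch collapses that into the returned phase value for brevity.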