guix-tribes/docs/LOCAL_CONTROL_API.md
self 7dec823794 chore: sync supertest dev channel to master
Source: guix-tribes master 2ea4cae872
Base: previous supertest-dev 4fee530b68
Mode: tree sync, preserving dev channel authorization
2026-06-08 08:02:39 +02:00

Tribes local-control API

The local-control broker is a small Guile daemon listening on a Unix-domain socket. It fronts every operator action that a Tribes deployment can take on its own host:

  • resolve a SystemTarget into a build plan.
  • prepare a build (pull channels + guix system build) without activating it.
  • commit a previously-prepared generation (guix system switch-generation).
  • rollback to a retained store path or, failing that, rebuild from a plan and switch.
  • abort an in-flight job.
  • discover channel update candidates from Guix's existing Git checkouts.
  • inspect status and generations.

This document specifies the wire schema. The BEAM client at tribes/lib/tribes/local_control.ex should be updated to match it.

Transport

  • HTTP/1.1 over a Unix-domain socket. The path is configurable via TRIBES_LOCAL_CONTROL_SOCKET (default /var/run/tribes/local-control.sock).
  • Permissions: socket owned by root:tribes, mode 0660.
  • Request bodies are JSON (Content-Type: application/json).
  • Responses are JSON.
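A client for this transport can be sketched with the Python standard library alone. This is a minimal illustration, not the BEAM client: the names `UnixHTTPConnection`, `socket_path`, and `request` are hypothetical, and it assumes the client reads the same TRIBES_LOCAL_CONTROL_SOCKET variable as the broker.

```python
import http.client
import json
import os
import socket

# Default socket path taken from the Transport section.
DEFAULT_SOCKET = "/var/run/tribes/local-control.sock"


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP/1.1 connection that dials a Unix-domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10.0):
        # "localhost" only fills the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def socket_path(env=os.environ):
    """Resolve the socket path (assumption: the client honours the same variable)."""
    return env.get("TRIBES_LOCAL_CONTROL_SOCKET", DEFAULT_SOCKET)


def request(method, path, body=None, path_to_socket=None):
    """Send one JSON request and return (http_status, decoded_json_body)."""
    conn = UnixHTTPConnection(path_to_socket or socket_path())
    try:
        payload = None if body is None else json.dumps(body)
        headers = {"Content-Type": "application/json"} if payload else {}
        conn.request(method, path, body=payload, headers=headers)
        resp = conn.getresponse()
        return resp.status, json.loads(resp.read() or b"null")
    finally:
        conn.close()
```

With a broker running, `request("GET", "/v1/deployment/status")` would return the status code and the snapshot described below. The calling user must be in the tribes group to open the socket.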

Concurrency model

The broker runs a single POSIX worker thread. The HTTP request thread is never blocked on a long-running Guix call: any operation that may exceed about a second (prepare, commit, rollback) is enqueued on the worker and returns 202 Accepted immediately. The caller then polls GET /v1/deployment/status for completion.

There is at most one job in flight at any time. A new submission with the same plan_hash as the running job is idempotent: the broker returns the in-flight snapshot rather than queuing a duplicate. A submission with a different plan_hash while another job runs returns 409 busy.
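The admission rule above can be written as a small pure function. This is a sketch of the decision only, assuming the running job is represented as a dict with `job_id` and `plan_hash`; the function name `admit` and the exact body of the idempotent reply are illustrative, not the broker's Guile code.

```python
def admit(running, incoming_plan_hash):
    """Decide how to answer a new job submission.

    running: None when no job is in flight, otherwise a dict holding at
             least "job_id" and "plan_hash" (hypothetical representation).
    Returns (http_status, response_body).
    """
    if running is None:
        # Nothing in flight: accept and enqueue on the worker.
        return 202, {"status": "queued", "plan_hash": incoming_plan_hash}
    if running["plan_hash"] == incoming_plan_hash:
        # Same plan as the running job: idempotent, return the in-flight snapshot.
        return 202, dict(running)
    # A different plan while a job runs is rejected as busy.
    return 409, {
        "ok": False,
        "status": "busy",
        "reason": "deployment already in progress",
        "job_id": running["job_id"],
        "plan_hash": running["plan_hash"],
    }
```

Keeping the rule free of I/O makes it easy to check the three outcomes (accept, idempotent re-submit, busy) in isolation.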

Endpoints

GET /v1/deployment and GET /v1/deployment/status

Returns a status snapshot. Polling interval recommendation: 1 s during an active job, with linear back-off to 5 s after the first minute of polling.
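One way to read the polling recommendation is sketched below. The text does not say how long the back-off ramp lasts, so the 60-second ramp (`ramp_s`) is an assumption, as is the function name `poll_interval`.

```python
def poll_interval(elapsed_s, ramp_s=60.0):
    """Seconds to wait before the next status poll.

    elapsed_s: seconds since polling started.
    ramp_s:    assumed duration of the linear ramp from 1 s to 5 s.
    """
    if elapsed_s <= 60.0:
        # First minute: poll once per second.
        return 1.0
    # After the first minute: grow linearly, capped at 5 s.
    progress = min((elapsed_s - 60.0) / ramp_s, 1.0)
    return 1.0 + 4.0 * progress
```

Under these assumptions a caller waits 1 s between polls for the first minute, 3 s at the 90-second mark, and 5 s from two minutes onwards.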

Snapshot fields:

  • schemaVersion — string, currently "2".
  • ok — boolean.
  • status — high-level state. One of: idle | queued | running | pulling | building | switching | completed | failed | aborted.
  • phase — fine-grained phase, identical to status for in-flight jobs; ready after a successful prepare and active after a successful commit/rollback.
  • job_id — opaque identifier of the in-flight or last-completed job. "job-N" where N is monotonic for the broker process lifetime.
  • plan_hash — the plan hash this job is operating on.
  • started_at, last_event_at — RFC 3339 timestamps.
  • store_path — the deployment target's /gnu/store/...-system path: the prepared store path after prepare, or the selected profile store path after commit/rollback.
  • selectedSystem — canonical /gnu/store/...-system path currently selected by /var/guix/profiles/system.
  • runningSystem — canonical /gnu/store/...-system path currently exposed by /run/current-system.
  • generation_number — the system profile generation number.
  • gc_pinned — boolean. true when the broker holds a GC root via --root= so the prepared system is not collected before a commit.
  • built_at, activated_at — RFC 3339 timestamps when present.
  • code — typed error code on failure (see Error taxonomy).
  • reason — human-readable error message on failure.
  • plugins — array of plugin names in the deployed plan.

GET /v1/deployment/generations

Returns the current system channel provenance plus the list of recorded generations in newest-first order. The top-level current_channels field is parsed from /run/current-system/channels.scm when present and lets callers identify the initial installed channel pins before local-control has prepared its first generation.

Each generation entry:

{
  "store_path": "/gnu/store/...-system",
  "generation_number": 42,
  "plan_hash": "plan-abcd...",
  "status": "active" | "ready" | "superseded",
  "gc_pinned": true,
  "built_at": "2026-04-25T13:01:02Z",
  "activated_at": "2026-04-25T13:01:42Z",
  "channels": [
    {
      "channel_id": "guix-tribes",
      "name": "tribes",
      "url": "https://git.example.test/tribes/guix-tribes.git",
      "branch": "master",
      "commit": "abc123...",
      "position": 10
    }
  ]
}

channels is present for generations prepared by local-control from a plan that included resolved_channels. After guix pull succeeds, local-control records the pulled profile's guix describe --format=json commit for each matching channel, so branch-based plans become exact generation pins. Active generation channels are the preferred source for the currently installed channel commit; callers can fall back to top-level current_channels for the initial non-local-control install.
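The preference order described above (active generation first, then current_channels) might look like this on the caller side. The sketch assumes the generation list sits under a top-level `generations` key and that current_channels entries carry the same `name` and `commit` fields as generation channels; neither is confirmed by this document.

```python
def current_channel_commits(response):
    """Map channel name to the currently installed commit.

    Prefers the channels of the active generation and falls back to the
    top-level current_channels for the initial non-local-control install.
    """
    generations = response.get("generations", [])  # assumed key name
    active = next((g for g in generations if g.get("status") == "active"), None)
    channels = (active or {}).get("channels") or response.get("current_channels") or []
    return {c["name"]: c["commit"] for c in channels if "name" in c and "commit" in c}
```

An empty result means neither source was available, for example when /run/current-system/channels.scm is absent and no generation has been prepared yet.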

POST /v1/channels/updates

Synchronous. Discovers update candidates for configured channels by using the Guix channel Git checkouts under $XDG_CACHE_HOME/guix/checkouts or $HOME/.cache/guix/checkouts. The endpoint does not maintain its own checkout or update database; it locates the checkout whose remote.origin.url matches the requested channel URL, runs git fetch --tags --prune origin, and inspects Git refs directly.

Body:

{
  "mode": "semver_tags",
  "limit": 20,
  "channels": [
    {
      "id": "...",
      "name": "tribes",
      "url": "https://git.example.test/tribes/guix-tribes.git",
      "branch": "master",
      "current_commit": "abc123..."
    }
  ]
}

Response:

{
  "schemaVersion": "1",
  "ok": true,
  "mode": "semver_tags",
  "channels": [
    {
      "id": "...",
      "name": "tribes",
      "url": "https://git.example.test/tribes/guix-tribes.git",
      "branch": "master",
      "ok": true,
      "current_commit": "abc123...",
      "branch_head": "def456...",
      "candidates": [
        {
          "tag": "v1.2.3",
          "commit": "def456...",
          "short_commit": "def4567",
          "subject": "release 1.2.3",
          "message": "release 1.2.3\n",
          "committed_at": "2026-06-07T10:00:00+00:00"
        }
      ]
    }
  ]
}

Supported modes:

  • semver_tags — default. Candidates are tags matching vMAJOR.MINOR.PATCH with optional prerelease/build suffixes, reachable from the configured branch head, and descendants of current_commit when one is provided.
  • commits — advanced mode. Candidates are recent branch commits after current_commit when it is an ancestor of the branch head, otherwise recent commits from the branch head.
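The tag-name part of semver_tags can be approximated with a regular expression. This is a sketch under stated assumptions: the exact pattern the broker uses is not given here, the newest-first ordering and the reading of `limit` as a cap on the result are guesses, and the reachability and ancestry checks (which need Git) are left out. `SEMVER_TAG` and `semver_candidates` are hypothetical names.

```python
import re

# vMAJOR.MINOR.PATCH with optional "-prerelease" and "+build" suffixes.
SEMVER_TAG = re.compile(
    r"v([0-9]+)\.([0-9]+)\.([0-9]+)"
    r"(?:-[0-9A-Za-z.-]+)?"
    r"(?:\+[0-9A-Za-z.-]+)?"
)


def semver_candidates(tags, limit=20):
    """Keep tags that look like vMAJOR.MINOR.PATCH, highest version first."""
    matched = []
    for tag in tags:
        m = SEMVER_TAG.fullmatch(tag)
        if m:
            # Sort on the numeric core only; suffixes are ignored for ordering.
            matched.append((tuple(int(part) for part in m.groups()), tag))
    matched.sort(key=lambda item: item[0], reverse=True)
    return [tag for _, tag in matched[:limit]]
```

Comparing the numeric tuple rather than the raw string keeps v1.10.0 ahead of v1.2.3, which a plain lexical sort would get wrong.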

Guix channel authentication remains enforced later by deployment/prepare; this endpoint is discovery only.

Per-channel failures are returned inline with ok: false and an error code, e.g. checkout_not_found, fetch_failed, branch_not_found, or unsupported_mode.

POST /v1/deployment/resolve

Synchronous. Body: a SystemTarget JSON object. Response:

  • 200 with { "schemaVersion": "2", "ok": true, "plan": { ... } } on success. The plan object includes a plan_hash and is suitable for feeding into prepare.
  • 409 with the resolver error envelope on capability/manifest/trust failures.

POST /v1/deployment/prepare

Asynchronous. Body: a plan object containing plan_hash and resolved_plugins.

  • 202 with { "schemaVersion": "2", "status": "queued", "job_id": "...", "plan_hash": "...", "started_at": "..." } on accept (or on idempotent re-submit of the running job).
  • 409 with { "ok": false, "status": "busy", "reason": "deployment already in progress", "job_id": "...", "plan_hash": "...", ... } when another plan is already in flight.
  • 400 on validation error.

The job pulls channels, runs guix system build --root=..., pre-realizes the target system closure and the store inputs needed for the post-switch Shepherd service-definition upgrade, registers the resulting GC root, and records a ready generation. Keeping this work in prepare means missing substitutes or unexpectedly large local builds fail before the system profile is switched. The final snapshot is visible at GET /v1/deployment/status.

POST /v1/deployment/commit

Asynchronous. Body: { "plan_hash": "..." }.

  • 202 on accept. The job switches the system profile to the previously-prepared generation, then re-runs activation and Guix's normal Shepherd service-definition upgrade step inside the pulled/current Guix profile used for the prepare build. Activation runs with GUIX_NEW_SYSTEM set to the selected generation so /run/current-system follows the profile, and the NBDE boot-store activation hook copies GRUB-referenced /gnu/store items into /boot for nodes whose real store is on encrypted root. Like upstream guix system reconfigure, this does not imply that every already-running service process was restarted. Tribes may then schedule an asynchronous tribes service restart as part of higher-level rollout convergence, while tribes-local-control self-update remains a separate deferred concern. On later boots, tribes-boot-start starts the app only after Legion-managed secret files exist, keeping the first secrets-free boot quiet while allowing reboot recovery. On success the snapshot reaches phase: "active" with status: "completed".
  • 409 if no generation is prepared for that plan_hash. The snapshot's error code is generation_not_prepared.
  • 409 busy if another job is in flight.

POST /v1/deployment/rollback

Asynchronous. Body:

{
  "store_path": "/gnu/store/...-system",
  "plan": { ...optional fallback plan... }
}

The broker walks these cases in order:

  1. The requested store_path is the selected system → just record the activation, no build, no switch.
  2. We have a recorded local-control generation number for that store_path → switch to it directly.
  3. The store_path appears in Guix's system profile links (/var/guix/profiles/system-*-link), even if local-control did not record it → switch to that profile generation directly. This covers the installed baseline generation used by emergency/public rollback.
  4. The store path is gone but a fallback plan is supplied → re-prepare and commit.

If none apply, the snapshot reports code: "rollback_infeasible".
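The case analysis above reduces to an ordered chain of checks. The sketch below only selects the strategy; the function name, the returned labels, and the way recorded generations and profile links are passed in (as collections of store paths) are illustrative assumptions.

```python
def rollback_strategy(store_path, selected_system, recorded_paths, profile_link_paths, plan=None):
    """Pick the rollback path for store_path, checking the cases in order.

    recorded_paths:     store paths local-control has a generation number for.
    profile_link_paths: store paths behind /var/guix/profiles/system-*-link.
    plan:               optional fallback plan from the request body.
    """
    if store_path == selected_system:
        return "record_activation"           # case 1: no build, no switch
    if store_path in recorded_paths:
        return "switch_recorded_generation"  # case 2
    if store_path in profile_link_paths:
        return "switch_profile_generation"   # case 3: e.g. installed baseline
    if plan is not None:
        return "reprepare_and_commit"        # case 4
    return "rollback_infeasible"
```

Because the public admin rollback flow omits the fallback plan, a request for a store path that is no longer retained ends in rollback_infeasible rather than a rebuild.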

Current limitation: rollback does not run core/plugin down migrations. The public Tribes admin rollback flow currently omits the fallback plan on purpose so explicit rollback to a baseline generation cannot replay the rollout being rolled back.

POST /v1/deployment/abort

Synchronous. Marks the in-flight job as aborted and writes a snapshot with status: "aborted". (v1: does not yet SIGTERM a running helper subprocess — the operation completes when the helper next checks back in.)

Error taxonomy

Every failed operation returns a code matching one of these tokens:

  • channel_untrusted — channel references a signer not in the TrustedSigner table.
  • signature_invalid — a channel's commit signature failed verification.
  • channel_commit_unreachable — the configured commit cannot be fetched from the channel URL.
  • missing_capability — a plugin requires a capability that no other plugin provides.
  • host_capability_missing — the pinned host and built-in plugin manifests have an unsatisfied capability contract.
  • capability_cycle — the plugin capability graph contains a cycle.
  • duplicate_plugin — the system target lists the same plugin twice.
  • manifest_invalid — a requested plugin name is unknown to the channel registry.
  • host_api_mismatch — the resolved plan needs a host API version the node cannot honour.
  • migration_target_conflict — two plugins disagree about a migration target version.
  • build_failed — guix system build returned non-zero.
  • system_closure_preload_failed — the prepared system's referenced store closure could not be realized before switching.
  • service_upgrade_preload_failed — the post-switch Shepherd service-definition upgrade inputs could not be realized before switching.
  • switch_failed — guix system switch-generation returned non-zero.
  • rollback_infeasible — the broker cannot reach the requested store path by either retained generation or rebuild.
  • helper_crashed — tribes-guix-helper exited without emitting a structured terminal frame.
  • busy — another job is in flight; the request was rejected.
  • invalid_request — the payload is missing a required field or violates a limit.

Helper protocol (internal)

The broker spawns tribes-guix-helper for every long operation and parses its stdout as NDJSON. The helper emits one of:

{"event":"phase","phase":"pulling","ts":"..."}
{"event":"phase","phase":"building","ts":"...","derivation":"/gnu/store/..."}
{"event":"done","store_path":"/gnu/store/...","generation_number":42,"ts":"..."}
{"event":"error","code":"channel_commit_unreachable","message":"...","details":{...},"ts":"..."}

The broker uses the last event: "phase" frame to update its snapshot in real time, and the final done or error frame to compute the operation result. If the helper exits without a terminal frame, the broker synthesizes { "code": "helper_crashed", "details": { "exit_status": N, "signal": S } }.
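The frame handling can be illustrated with a short fold over the helper's output lines. This is a sketch in Python rather than the broker's Guile code; `fold_helper_output` is a hypothetical name, and malformed lines are not handled here (json.loads would raise).

```python
import json


def fold_helper_output(lines, exit_status=None, signal=None):
    """Reduce NDJSON helper output to (last_phase, terminal_frame).

    lines: iterable of stdout lines, one JSON object per line.
    exit_status, signal: how the helper exited, used only when no
    terminal frame was seen.
    """
    phase = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        frame = json.loads(line)
        event = frame.get("event")
        if event == "phase":
            # The latest phase frame drives the live snapshot.
            phase = frame.get("phase")
        elif event in ("done", "error"):
            # First terminal frame decides the operation result.
            return phase, frame
    # No terminal frame: synthesize the helper_crashed error.
    return phase, {
        "code": "helper_crashed",
        "details": {"exit_status": exit_status, "signal": signal},
    }
```

A caller can tell the outcomes apart by checking whether the returned frame has event "done" or carries a code.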

This protocol is not part of the public API; it exists so the broker can stay small while still surfacing typed errors instead of regex-parsing guix stderr.