# Tribes local-control API

The local-control broker is a small Guile daemon listening on a Unix-domain
socket. It fronts every operator action that a Tribes deployment can take on
its own host:

- **resolve** a `SystemTarget` into a build plan.
- **prepare** a build (pull channels + `guix system build`) without
  activating it.
- **commit** a previously-prepared generation
  (`guix system switch-generation`).
- **rollback** to a retained store path or, failing that, rebuild from a
  plan and switch.
- **abort** an in-flight job.
- discover channel update candidates from Guix's existing Git checkouts.
- inspect **status** and **generations**.

This document specifies the wire schema. The BEAM client at
`tribes/lib/tribes/local_control.ex` should be updated to match it.

## Transport

- HTTP/1.1 over a Unix-domain socket. The path is configurable via
  `TRIBES_LOCAL_CONTROL_SOCKET` (default `/var/run/tribes/local-control.sock`).
- Permissions: socket owned by `root:tribes`, mode `0660`.
- Request bodies are JSON (`Content-Type: application/json`).
- Responses are JSON.
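
The transport rules above can be exercised from any language that can open a Unix-domain socket. The following is a minimal Python sketch, not the BEAM client: the names `socket_path`, `UnixHTTPConnection`, and `request` are hypothetical helpers introduced here for illustration, and the 10 s timeout is an arbitrary assumption.

```python
import http.client
import json
import os
import socket

DEFAULT_SOCKET = "/var/run/tribes/local-control.sock"


def socket_path(environ=os.environ):
    """Resolve the broker socket path, honouring TRIBES_LOCAL_CONTROL_SOCKET."""
    return environ.get("TRIBES_LOCAL_CONTROL_SOCKET", DEFAULT_SOCKET)


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that dials a Unix-domain socket instead of TCP."""

    def __init__(self, path, timeout=10.0):
        # "localhost" only fills the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self._path = path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self._path)
        self.sock = sock


def request(method, path, payload=None, sock_path=None):
    """Send one JSON request and return (status_code, decoded_body)."""
    conn = UnixHTTPConnection(sock_path or socket_path())
    try:
        body = None if payload is None else json.dumps(payload)
        headers = {"Content-Type": "application/json"} if body else {}
        conn.request(method, path, body=body, headers=headers)
        resp = conn.getresponse()
        return resp.status, json.loads(resp.read() or b"{}")
    finally:
        conn.close()
```

With a broker running, `request("GET", "/v1/deployment/status")` would return the status code and the decoded snapshot. The calling process needs to be `root` or in the `tribes` group to open the socket.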

## Concurrency model

The broker runs a single POSIX worker thread. The HTTP request thread is
never blocked on a long-running Guix call: any operation that may exceed
about a second (`prepare`, `commit`, `rollback`) is enqueued on the worker
and returns `202 Accepted` immediately. The caller then polls
`GET /v1/deployment/status` for completion.

There is at most one job in flight at any time. A new submission with the
same `plan_hash` as the running job is **idempotent**: the broker returns
the in-flight snapshot rather than queuing a duplicate. A submission with a
different `plan_hash` while another job runs returns `409 busy`.
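
The admission rule can be summarised as a small decision function. This is a toy model of the behaviour described above, not the broker's Guile code; the function name `admit` and the outcome labels `queued` and `in_flight` are hypothetical.

```python
from typing import Optional, Tuple


def admit(running_plan_hash: Optional[str],
          submitted_plan_hash: str) -> Tuple[int, str]:
    """Return (HTTP status, outcome) for a new asynchronous submission."""
    if running_plan_hash is None:
        # Nothing in flight: the job is enqueued on the worker.
        return 202, "queued"
    if running_plan_hash == submitted_plan_hash:
        # Idempotent re-submit: the caller gets the in-flight snapshot.
        return 202, "in_flight"
    # A different plan is already running.
    return 409, "busy"
```

A client can therefore retry a `prepare` with the same `plan_hash` after a dropped connection without risking a duplicate build.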

## Endpoints

### `GET /v1/deployment` and `GET /v1/deployment/status`

Returns a status snapshot. Recommended polling interval: 1 s while a job is
active, backing off linearly to 5 s after the first minute of polling.
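
One way to read the polling recommendation is sketched below. The text does not say how long the linear back-off takes, so the 60 s ramp (`ramp_s`) is an assumption, as is the function name `poll_interval`.

```python
def poll_interval(elapsed_s: float, ramp_s: float = 60.0) -> float:
    """Seconds to wait before the next status poll.

    Polls every second for the first minute, then grows linearly
    to a 5 s ceiling over `ramp_s` seconds (an assumed ramp length).
    """
    if elapsed_s <= 60.0:
        return 1.0
    progress = min((elapsed_s - 60.0) / ramp_s, 1.0)
    return 1.0 + 4.0 * progress
```

Under these assumptions a caller that has been polling for 90 s waits 3 s between requests, and anything past two minutes settles at 5 s.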

Snapshot fields:

- `schemaVersion` — string, currently `"2"`.
- `ok` — boolean.
- `status` — high-level state. One of:
  `idle | queued | running | pulling | building | switching | completed |
  failed | aborted`.
- `phase` — fine-grained phase, identical to `status` for in-flight jobs;
  `ready` after a successful `prepare`, `active` after a successful
  `commit`/`rollback`.
- `job_id` — opaque identifier of the in-flight or last-completed job. It has
  the form `"job-N"`, where N increases monotonically for the lifetime of the
  broker process.
- `plan_hash` — the plan hash this job is operating on.
- `started_at`, `last_event_at` — RFC 3339 timestamps.
- `store_path` — the deployment target's `/gnu/store/...-system` path:
  the prepared store path after `prepare`, or the selected profile store path
  after `commit`/`rollback`.
- `selectedSystem` — canonical `/gnu/store/...-system` path currently selected
  by `/var/guix/profiles/system`.
- `runningSystem` — canonical `/gnu/store/...-system` path currently exposed by
  `/run/current-system`.
- `generation_number` — the system profile generation number.
- `gc_pinned` — boolean. `true` when the broker holds a GC root via
  `--root=` so the prepared system is not collected before a `commit`.
- `built_at`, `activated_at` — RFC 3339 timestamps when present.
- `code` — typed error code on failure (see *Error taxonomy*).
- `reason` — human-readable error message on failure.
- `plugins` — array of plugin names in the deployed plan.
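
A client typically needs only a few predicates over this snapshot. The sketch below is illustrative: treating `completed`, `failed`, and `aborted` as the terminal statuses is an inference from the state names, and the helper names are hypothetical.

```python
# Statuses assumed to mean "the job will not change any further".
TERMINAL_STATUSES = frozenset({"completed", "failed", "aborted"})


def is_terminal(snapshot: dict) -> bool:
    """True once the job in the snapshot has stopped making progress."""
    return snapshot.get("status") in TERMINAL_STATUSES


def failure(snapshot: dict):
    """Return (code, reason) for a failed snapshot, otherwise None."""
    if snapshot.get("status") != "failed":
        return None
    return snapshot.get("code"), snapshot.get("reason")


def selection_pending(snapshot: dict) -> bool:
    """True when the selected system profile differs from the running system."""
    selected = snapshot.get("selectedSystem")
    running = snapshot.get("runningSystem")
    return bool(selected and running and selected != running)
```

`selection_pending` compares the two canonical store paths, which is one way to notice that `/var/guix/profiles/system` and `/run/current-system` have diverged.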

### `GET /v1/deployment/generations`

Returns the current system channel provenance plus the list of recorded
generations in newest-first order. The top-level `current_channels` field is
parsed from `/run/current-system/channels.scm` when present and lets callers
identify the initial installed channel pins before local-control has prepared
its first generation.

Each generation entry:

```json
{
  "store_path": "/gnu/store/...-system",
  "generation_number": 42,
  "plan_hash": "plan-abcd...",
  "status": "active",
  "gc_pinned": true,
  "built_at": "2026-04-25T13:01:02Z",
  "activated_at": "2026-04-25T13:01:42Z",
  "channels": [
    {
      "channel_id": "guix-tribes",
      "name": "tribes",
      "url": "https://git.example.test/tribes/guix-tribes.git",
      "branch": "master",
      "commit": "abc123...",
      "position": 10
    }
  ]
}
```

`status` is one of `active`, `ready`, or `superseded`.

`channels` is present for generations prepared by local-control from a plan
that included `resolved_channels`. After `guix pull` succeeds, local-control
records the pulled profile's `guix describe --format=json` commit for each
matching channel, so branch-based plans become exact generation pins. Active
generation `channels` are the preferred source for the currently installed
channel commit; callers can fall back to top-level `current_channels` for the
initial non-local-control install.
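
The lookup order described above (active generation first, then `current_channels`) might look like this on the client side. Two things are assumptions here: that the generation list sits under a top-level `generations` key, which the text does not name, and that `current_channels` entries carry the same `name` and `commit` fields as generation channels.

```python
from typing import Optional


def current_commit(response: dict, channel_name: str) -> Optional[str]:
    """Find the installed commit for a channel in a generations response."""
    # Preferred source: channels recorded on the active generation.
    for generation in response.get("generations", []):  # key name is assumed
        if generation.get("status") != "active":
            continue
        for channel in generation.get("channels", []):
            if channel.get("name") == channel_name:
                return channel.get("commit")
    # Fallback: pins parsed from the initial, non-local-control install.
    for channel in response.get("current_channels", []):
        if channel.get("name") == channel_name:
            return channel.get("commit")
    return None
```

The result is a natural value for `current_commit` in a later `POST /v1/channels/updates` request.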

### `POST /v1/channels/updates`

Synchronous. Discovers update candidates for configured channels by using the
Guix channel Git checkouts under `$XDG_CACHE_HOME/guix/checkouts` or
`$HOME/.cache/guix/checkouts`. The endpoint does not maintain its own checkout
or update database; it locates the checkout whose `remote.origin.url` matches
the requested channel URL, runs `git fetch --tags --prune origin`, and inspects
Git refs directly.

Body:

```json
{
  "mode": "semver_tags",
  "limit": 20,
  "channels": [
    {
      "id": "...",
      "name": "tribes",
      "url": "https://git.example.test/tribes/guix-tribes.git",
      "branch": "master",
      "current_commit": "abc123..."
    }
  ]
}
```

Response:

```json
{
  "schemaVersion": "1",
  "ok": true,
  "mode": "semver_tags",
  "channels": [
    {
      "id": "...",
      "name": "tribes",
      "url": "https://git.example.test/tribes/guix-tribes.git",
      "branch": "master",
      "ok": true,
      "current_commit": "abc123...",
      "branch_head": "def456...",
      "candidates": [
        {
          "tag": "v1.2.3",
          "commit": "def456...",
          "short_commit": "def4567",
          "subject": "release 1.2.3",
          "message": "release 1.2.3\n",
          "committed_at": "2026-06-07T10:00:00+00:00"
        }
      ]
    }
  ]
}
```

Supported modes:

- `semver_tags` — default. Candidates are tags matching `vMAJOR.MINOR.PATCH`
  with optional prerelease/build suffixes, reachable from the configured branch
  head, and descendants of `current_commit` when one is provided.
- `commits` — advanced mode. Candidates are recent branch commits after
  `current_commit` when it is an ancestor of the branch head, otherwise recent
  commits from the branch head.

Guix channel authentication remains enforced later by `deployment/prepare`;
this endpoint is discovery only.

Per-channel failures are returned inline with `ok: false` and an error code,
e.g. `checkout_not_found`, `fetch_failed`, `branch_not_found`, or
`unsupported_mode`.
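
To make the `semver_tags` tag shape concrete, here is a hedged sketch of a matcher for `vMAJOR.MINOR.PATCH` with optional prerelease and build suffixes. The broker's actual pattern and ordering are not specified in this document; the regular expression follows Semantic Versioning conventions as an assumption, and the reachability checks against the branch head and `current_commit` need Git and are not modelled.

```python
import re

# Assumed pattern: "v" + three dot-separated integers without leading zeros,
# then optional "-prerelease" and "+build" suffixes.
SEMVER_TAG = re.compile(
    r"^v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-([0-9A-Za-z.-]+))?(?:\+([0-9A-Za-z.-]+))?$"
)


def parse_tag(tag):
    """Return (major, minor, patch, prerelease, build) or None if no match."""
    match = SEMVER_TAG.match(tag)
    if match is None:
        return None
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    return major, minor, patch, match.group(4), match.group(5)


def release_tags(tags):
    """Keep semver-style tags, highest (major, minor, patch) first."""
    parsed = [(parse_tag(tag), tag) for tag in tags]
    keep = [(version[:3], tag) for version, tag in parsed if version is not None]
    return [tag for _, tag in sorted(keep, reverse=True)]
```

Sorting by the numeric triple rather than the tag string keeps `v1.10.0` ahead of `v1.9.9`.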

### `POST /v1/deployment/resolve`

Synchronous. Body: a `SystemTarget` JSON object. Response:

- `200` with `{ "schemaVersion": "2", "ok": true, "plan": { ... } }` on
  success. The `plan` object includes a `plan_hash` and is suitable for
  feeding into `prepare`.
- `409` with the resolver error envelope on capability/manifest/trust
  failures.

### `POST /v1/deployment/prepare`

Asynchronous. Body: a plan object containing `plan_hash` and
`resolved_plugins`.

- `202` with `{ "schemaVersion": "2", "status": "queued", "job_id": "...",
  "plan_hash": "...", "started_at": "..." }` on accept (or on idempotent
  re-submit of the running job).
- `409` with `{ "ok": false, "status": "busy", "reason": "deployment already in
  progress", "job_id": "...", "plan_hash": "...", ... }` when another
  plan is already in flight.
- `400` on validation error.

The job pulls channels, runs `guix system build --root=...`, pre-realizes the
target system closure and the store inputs needed for the post-switch Shepherd
service-definition upgrade, registers the resulting GC root, and records a
`ready` generation. Keeping this work in `prepare` means missing substitutes or
unexpectedly large local builds fail before the system profile is switched. The
final snapshot is visible at `GET /v1/deployment/status`.

### `POST /v1/deployment/commit`

Asynchronous. Body: `{ "plan_hash": "..." }`.

- `202` on accept. The job switches the system profile to the
  previously-prepared generation, then re-runs activation and Guix's normal
  Shepherd service-definition upgrade step inside the pulled/current Guix
  profile used for the prepare build. Activation runs with `GUIX_NEW_SYSTEM`
  set to the selected generation so `/run/current-system` follows the
  profile, and the NBDE boot-store activation hook copies GRUB-referenced
  `/gnu/store` items into `/boot` for nodes whose real store is on encrypted
  root. Like upstream `guix system reconfigure`, this does not imply that
  every already-running service process was restarted. Tribes may then
  schedule an asynchronous `tribes` service restart as part of higher-level
  rollout convergence, while `tribes-local-control` self-update remains a
  separate deferred concern. On later boots, `tribes-boot-start` starts the
  app only after Legion-managed secret files exist, keeping the first
  secrets-free boot quiet while allowing reboot recovery. On success the
  snapshot reaches `phase: "active"` with `status: "completed"`.
- `409` if no generation is prepared for that `plan_hash`. The snapshot's
  error code is `generation_not_prepared`.
- `409 busy` if another job is in flight.
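
Put together, `resolve`, `prepare`, and `commit` form a three-step rollout. The sketch below drives that flow through an injected `request(method, path, payload)` callable so it can run against a fake transport. Several details are assumptions rather than documented behaviour: that a successful `prepare` ends with a terminal status and `phase: "ready"`, that the `commit` `202` body carries a `job_id` like the `prepare` one, and the fixed 1 s poll delay. The names `wait_for_job` and `deploy` are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

Request = Callable[[str, str, Optional[dict]], Tuple[int, Dict[str, Any]]]
TERMINAL = {"completed", "failed", "aborted"}


def wait_for_job(request: Request, job_id: Optional[str],
                 sleep: Callable[[float], None] = time.sleep,
                 max_polls: int = 3600) -> Dict[str, Any]:
    """Poll the status endpoint until the given job reaches a terminal status."""
    for _ in range(max_polls):
        _, snap = request("GET", "/v1/deployment/status", None)
        # Ignore terminal snapshots that still describe an earlier job.
        same_job = job_id is None or snap.get("job_id") == job_id
        if same_job and snap.get("status") in TERMINAL:
            return snap
        sleep(1.0)
    raise TimeoutError("job did not finish within the polling budget")


def deploy(request: Request, target: dict,
           sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Resolve, prepare, and commit a SystemTarget; return the final snapshot."""
    code, resolved = request("POST", "/v1/deployment/resolve", target)
    if code != 200:
        raise RuntimeError(f"resolve failed: {resolved}")
    plan = resolved["plan"]

    code, accepted = request("POST", "/v1/deployment/prepare", plan)
    if code != 202:
        raise RuntimeError(f"prepare rejected ({code}): {accepted}")
    snap = wait_for_job(request, accepted.get("job_id"), sleep)
    if snap.get("phase") != "ready":
        raise RuntimeError(f"prepare failed: {snap.get('code')}")

    code, accepted = request("POST", "/v1/deployment/commit",
                             {"plan_hash": plan["plan_hash"]})
    if code != 202:
        raise RuntimeError(f"commit rejected ({code}): {accepted}")
    snap = wait_for_job(request, accepted.get("job_id"), sleep)
    if snap.get("phase") != "active":
        raise RuntimeError(f"commit failed: {snap.get('code')}")
    return snap
```

Matching on `job_id` before trusting a terminal snapshot guards against reading the previous job's result right after a new submission is accepted.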

### `POST /v1/deployment/rollback`

Asynchronous. Body:

```json
{
  "store_path": "/gnu/store/...-system",
  "plan": { ...optional fallback plan... }
}
```

The broker walks these cases in order:

1. The requested `store_path` is the selected system → just record the
   activation, no build, no switch.
2. The broker has a recorded local-control generation number for that
   `store_path` → switch to it directly.
3. The `store_path` appears in Guix's system profile links
   (`/var/guix/profiles/system-*-link`), even if local-control did not record
   it → switch to that profile generation directly. This covers the installed
   baseline generation used by emergency/public rollback.
4. The store path is gone but `plan` is supplied → re-prepare and commit.

If none apply, the snapshot reports `code: "rollback_infeasible"`.

Current limitation: rollback does not run core/plugin down migrations. The
public Tribes admin rollback flow currently omits the fallback `plan` on
purpose so explicit rollback to a baseline generation cannot replay the rollout
being rolled back.
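
The four rollback cases form an ordered decision chain, which the following sketch models as a pure function. It is an illustration of the ordering only: the parameter names and every return label except `rollback_infeasible` are hypothetical, and the real broker inspects the store and profile links itself.

```python
from typing import Dict, Optional, Set


def rollback_action(store_path: str,
                    selected_system: Optional[str],
                    recorded: Dict[str, int],
                    profile_links: Set[str],
                    has_plan: bool) -> str:
    """Pick the rollback strategy for `store_path`, checking cases in order."""
    if store_path == selected_system:
        return "record_activation"           # case 1: already selected
    if store_path in recorded:
        return "switch_recorded_generation"  # case 2: local-control knows it
    if store_path in profile_links:
        return "switch_profile_generation"   # case 3: Guix profile link exists
    if has_plan:
        return "prepare_and_commit"          # case 4: rebuild from the plan
    return "rollback_infeasible"
```

Because the public admin flow omits `plan`, it can only ever reach the first three cases or `rollback_infeasible`.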

### `POST /v1/deployment/abort`

Synchronous. Marks the in-flight job as aborted and writes a snapshot with
`status: "aborted"`. (v1: does not yet SIGTERM a running helper subprocess;
the operation completes when the helper next checks back in.)

## Error taxonomy

Every failed operation returns a `code` matching one of these tokens:

- `channel_untrusted` — channel references a signer not in the
  `TrustedSigner` table.
- `signature_invalid` — a channel's commit signature failed verification.
- `channel_commit_unreachable` — the configured commit cannot be fetched
  from the channel URL.
- `missing_capability` — a plugin requires a capability that no other
  plugin provides.
- `host_capability_missing` — the pinned host and built-in plugin manifests
  have an unsatisfied capability contract.
- `capability_cycle` — the plugin capability graph contains a cycle.
- `duplicate_plugin` — the system target lists the same plugin twice.
- `manifest_invalid` — a requested plugin name is unknown to the channel
  registry.
- `host_api_mismatch` — the resolved plan needs a host API version the
  node cannot honour.
- `migration_target_conflict` — two plugins disagree about a migration
  target version.
- `build_failed` — `guix system build` returned non-zero.
- `system_closure_preload_failed` — the prepared system's referenced store
  closure could not be realized before switching.
- `service_upgrade_preload_failed` — the post-switch Shepherd
  service-definition upgrade inputs could not be realized before switching.
- `generation_not_prepared` — `commit` was requested for a `plan_hash` that
  has no prepared generation.
- `switch_failed` — `guix system switch-generation` returned non-zero.
- `rollback_infeasible` — the broker cannot reach the requested store
  path by either retained generation or rebuild.
- `helper_crashed` — `tribes-guix-helper` exited without emitting a
  structured terminal frame.
- `busy` — another job is in flight; the request was rejected.
- `invalid_request` — payload missed a required field or violated a limit.
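
Because the codes are a closed set of tokens, a client can mirror them as an enumeration and treat anything else as unknown. The sketch below lists the tokens from this section plus `generation_not_prepared` from the `commit` endpoint; the per-channel codes of `POST /v1/channels/updates` are left out. The names `ErrorCode` and `parse_code` are hypothetical.

```python
from enum import Enum
from typing import Optional


class ErrorCode(str, Enum):
    CHANNEL_UNTRUSTED = "channel_untrusted"
    SIGNATURE_INVALID = "signature_invalid"
    CHANNEL_COMMIT_UNREACHABLE = "channel_commit_unreachable"
    MISSING_CAPABILITY = "missing_capability"
    HOST_CAPABILITY_MISSING = "host_capability_missing"
    CAPABILITY_CYCLE = "capability_cycle"
    DUPLICATE_PLUGIN = "duplicate_plugin"
    MANIFEST_INVALID = "manifest_invalid"
    HOST_API_MISMATCH = "host_api_mismatch"
    MIGRATION_TARGET_CONFLICT = "migration_target_conflict"
    BUILD_FAILED = "build_failed"
    SYSTEM_CLOSURE_PRELOAD_FAILED = "system_closure_preload_failed"
    SERVICE_UPGRADE_PRELOAD_FAILED = "service_upgrade_preload_failed"
    GENERATION_NOT_PREPARED = "generation_not_prepared"
    SWITCH_FAILED = "switch_failed"
    ROLLBACK_INFEASIBLE = "rollback_infeasible"
    HELPER_CRASHED = "helper_crashed"
    BUSY = "busy"
    INVALID_REQUEST = "invalid_request"


def parse_code(raw: Optional[str]) -> Optional[ErrorCode]:
    """Map a wire token to ErrorCode; unknown or missing tokens yield None."""
    try:
        return ErrorCode(raw)
    except ValueError:
        return None
```

Returning `None` for unrecognised tokens lets an older client keep working if the broker later adds a code.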

## Helper protocol (internal)

The broker spawns `tribes-guix-helper` for every long operation and parses
its stdout as NDJSON. The helper emits one of:

```json
{"event":"phase","phase":"pulling","ts":"..."}
{"event":"phase","phase":"building","ts":"...","derivation":"/gnu/store/..."}
{"event":"done","store_path":"/gnu/store/...","generation_number":42,"ts":"..."}
{"event":"error","code":"channel_commit_unreachable","message":"...","details":{...},"ts":"..."}
```

The broker uses the last `event: "phase"` frame to update its snapshot in
real time, and the final `done` or `error` frame to compute the operation
result. If the helper exits without a terminal frame the broker synthesizes
`{ "code": "helper_crashed", "details": { "exit_status": N, "signal": S } }`.
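
The frame handling described above amounts to a fold over the helper's stdout lines. The sketch below shows that reduction in Python for illustration only; the broker itself is written in Guile, and the function name `fold_frames` and the shape of its return value are hypothetical.

```python
import json
from typing import Iterable, Optional


def fold_frames(lines: Iterable[str], exit_status: Optional[int] = None,
                signal: Optional[int] = None) -> dict:
    """Reduce NDJSON helper output to the last phase and the terminal frame."""
    phase = None
    result = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between frames
        frame = json.loads(line)
        event = frame.get("event")
        if event == "phase":
            phase = frame.get("phase")  # the latest phase wins
        elif event in ("done", "error"):
            result = frame              # terminal frame decides the outcome
    if result is None:
        # No terminal frame was seen: synthesize the crash report.
        result = {"event": "error", "code": "helper_crashed",
                  "details": {"exit_status": exit_status, "signal": signal}}
    return {"phase": phase, "result": result}
```

A real implementation would update the snapshot as each `phase` frame arrives rather than after the stream ends; the fold only shows which frames matter.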

This protocol is not part of the public API; it exists so the broker can
stay small while still surfacing typed errors instead of regex-parsing
`guix` stderr.