# tribes-supertest

Real integration scenarios for Tribes deployments, driven through a checked-out Legion CLI.

## Overview

This repo runs real deployment scenarios against cloud infrastructure and verifies what the deployed nodes actually booted and exposed.

- `../legion_kk` is the only required external project.
- Legion is treated as the deployment authority.
- supertest records what Legion asked Guix to install and what the nodes actually booted.

The runner currently invokes Legion's headless CLI directly:

- CLI entrypoint: `../legion_kk/src/engine/cli-main.ts`
- State: isolated per run under `.state/supertest/...`
- Artifacts: command logs, sanitized Legion state, and remote node diagnostics

## Requirements

- This repo checked out locally
- `../legion_kk` checked out beside it
- Node/npm available in the current shell
- Legion dependencies available in `../legion_kk`
- A usable Legion kexec installer default, normally the generated mirror pin in `../legion_kk`

Cloud/provider credentials must be present in the environment:

- `LEGION_UNLOCK_PASSWORD`
- `HCLOUD_TOKEN`
- `OVH_APP_KEY` or `OVH_APPLICATION_KEY`
- `OVH_APP_SECRET` or `OVH_APPLICATION_SECRET`
- `OVH_CONSUMER_KEY`
- `SCW_ACCESS_KEY`
- `SCW_SECRET_KEY`
- `SCW_DEFAULT_PROJECT_ID`

Optional but commonly useful:

- `SCW_DEFAULT_ZONE`
- `OVH_ENDPOINT`
- `SUPERTEST_KEEP_NODES=1`
- `SUPERTEST_KEXEC_IMAGE=/abs/path/to/guix-kexec-installer.tar.gz` or `SUPERTEST_KEXEC_IMAGE=https://mirror.example/tribes-1/guix-kexec-installer-x86_64-linux-latest.tar.gz`
- `SUPERTEST_CERT_MODE=self-signed` (test-only mode to skip ACME and keep self-signed edge certs)
- `hcloud` and `scw` CLI tooling in the shell for manual inspection or intervention

## Install

Install this repo's dependencies:

```bash
npm install
```

If you want to check the project locally before a live run:

```bash
npm run typecheck
npm test
npm run build
```

Check the Guix substitute servers before spending cloud time:

```bash
npm run preflight:substitutes
npm run preflight:substitutes -- --plugin sender
```

Delete leftover cloud resources without using Legion state:

```bash
scripts/cleanup-cloud-resources
scripts/cleanup-cloud-resources --dry-run
```

## Basic Usage

List scenarios:

```bash
npm run scenario:list
```

Run the single-node scenario:

```bash
npm run scenario:single-node-init
```

Run the manual single-node scenario against an existing host:

```bash
SUPERTEST_MANUAL_HOST_IP=203.0.113.10 \
SUPERTEST_MANUAL_USERNAME=ubuntu \
SUPERTEST_MANUAL_PASSWORD=secret \
npm run scenario:manual-node-init
```

Run the single-node plugin rollout/rollback scenario:

```bash
npm run scenario:single-node-plugin-rollout-rollback
```

Run the single-node Sender ingest/HLS scenario:

```bash
npm run scenario:single-node-sender
```

Run the clustered Sender fanout/reboot scenario:

```bash
npm run scenario:cluster-sender-fanout-reboot
```

Run the cluster lifecycle scenario:

```bash
npm run scenario:cluster-lifecycle
```

Run the clustered plugin sync split-brain scenario:

```bash
npm run scenario:cluster-plugin-rollout-sync-split-brain
```

Keep created nodes around for inspection:

```bash
SUPERTEST_KEEP_NODES=1 npm run scenario:cluster-lifecycle
```

Run one scenario directly with the generic entrypoint:

```bash
npm run scenario -- single-node-init
npm run scenario -- manual-node-init
npm run scenario -- single-node-plugin-rollout-rollback
npm run scenario -- single-node-sender
npm run scenario -- cluster-sender-fanout-reboot
npm run scenario -- cluster-plugin-rollout-sync-split-brain
npm run scenario -- cluster-lifecycle
```

## Alias Groups

Aliases run several scenarios sequentially and fail fast on the first failing scenario. All scenarios in an alias share one `SUPERTEST_RUN_ID`, so their artifacts land in sibling directories under the same run. Each group is the minimal set of scenarios that preserves its coverage.

```bash
npm run alias:tribes   # Tribes core: no-plugin + supertest scenarios
npm run alias:sender   # Sender ingest/HLS scenarios
npm run alias:kobold   # Trust-backed Kobold dataset scenarios
npm run alias:all      # deduplicated union of every named group
```

- `tribes` (Tribes core): `single-node-init`, `cluster-plugin-integrated-rollout`, `cluster-plugin-rollout-sync-split-brain`, `cluster-lifecycle`. `single-node-plugin-rollout-rollback` is omitted because its coverage is a subset of the cluster rollout scenarios (its rollback executor-status check is folded into `cluster-plugin-rollout-sync-split-brain`); run it on its own with `npm run scenario:single-node-plugin-rollout-rollback`.
- `sender`: `single-node-sender`, `cluster-sender-fanout-reboot` (both kept; the single-node run uniquely covers external endpoint types, the direct Vinyl HLS metric, audio ingest, and a Hetzner-init origin).
- `kobold`: `cluster-kobold-public-private`.
- `all`: the deduplicated union of the named groups. This excludes `manual-node-init` (needs a manually supplied host, `SUPERTEST_MANUAL_*`) and the redundant `single-node-plugin-rollout-rollback`.

Equivalent to `npm run scenario -- group `. See `npm run scenario` (no arguments) for the current grouping.

## Dev Branch Helper

For rapid `guix-tribes` dev-channel iteration, use:

```bash
scripts/test-dev-branch --plugin supertest single-node-plugin-rollout-rollback
```

The helper updates and pushes the hard-coded `guix-tribes` `supertest-dev` branch, exports the required `SUPERTEST_GUIX_TRIBES_*` environment, and runs the scenario with `SUPERTEST_CERT_MODE=self-signed` by default. It verifies that the pinned `tribes` and optional `tribes-plugin-$NAME` commits are already reachable from their `origin` remotes; it does not push those source repos.

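If the helper rejects a pinned commit, the same reachability question can be asked by hand. The sketch below is one way to do that with plain `git`; `commit_on_origin` is a hypothetical helper name, and the check performed inside `scripts/test-dev-branch` may be implemented differently. It assumes `git fetch origin` has been run recently in the checkout.

```shell
# Hypothetical helper, not part of this repo: succeeds when COMMIT is contained
# in at least one remote-tracking branch of the "origin" remote.
# Usage: commit_on_origin <repo-dir> <commit>
commit_on_origin() {
  repo="$1"
  commit="$2"
  # "git branch -r --contains" lists remote-tracking branches that include the commit.
  # An unknown commit makes git fail with no output, so grep also fails.
  git -C "$repo" branch -r --contains "$commit" 2>/dev/null | grep -q 'origin/'
}
```

For example, `commit_on_origin ../tribes "$(git -C ../tribes rev-parse HEAD)" || echo "push tribes first"` would flag a local-only commit, assuming the `tribes` checkout lives at `../tribes`.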
Useful subcommands:

```bash
scripts/test-dev-branch prepare --plugin supertest
scripts/test-dev-branch reset
scripts/test-dev-branch env
scripts/test-dev-branch ssh
scripts/test-dev-branch rpc -- 'Node.self()'
```

Use `--tribes-repo` or `--plugin-repo` to point at a clean worktree when the main checkout contains unrelated local work.

## Implemented Scenarios

- `single-node-init`
  Provisions one Hetzner init node and captures deployed channels, service status, and NBDE state.
- `manual-node-init`
  Imports one existing Ubuntu-compatible host from `SUPERTEST_MANUAL_HOST_IP`, `SUPERTEST_MANUAL_USERNAME`, and `SUPERTEST_MANUAL_PASSWORD`, then captures deployed channels, service status, and NBDE state.
- `single-node-plugin-rollout-rollback`
  Provisions one Hetzner init node, applies a plugin rollout through the public admin API, and rolls back to the pre-rollout generation.
- `single-node-sender`
  Provisions one Hetzner init node, installs the `sender` plugin when needed, starts RTMP ingest through Legion, publishes an audio test stream, verifies HLS playlist and segment output, and stops the stream.
- `cluster-sender-fanout-reboot`
  Builds a three-node OVH/Hetzner/Scaleway cluster, installs the `sender` plugin when needed, starts one origin plus two HLS edges through Legion, publishes a 6 Mbit/s video test stream, runs HLS viewers against every node, verifies Sender viewer-count rollups through the admin API, reboots one edge, and verifies recovery. Set `SUPERTEST_SENDER_VIEWERS_PER_NODE` to override the default of 3 viewers per node.
- `cluster-kobold-public-private`
  Builds a two-node Hetzner/Scaleway cluster, rolls out Trust-backed Kobold (the preview must resolve both `tribe-one-kobold` and `tribe-one-trust`), creates public and private datasets, and verifies Trust-gated access: public read-only after a Trust hello, private denial before grants, explicit private read/write grants, and that private datasets stay local to their origin.
- `cluster-plugin-integrated-rollout`
  Rolls out the `supertest` plugin on a Hetzner init node, seeds plugin data, then adds Scaleway and OVH join nodes and verifies the new nodes install with the active plugin target already integrated (schema ready, seed data synced, and the plugin-enabled system target present on every node).
- `cluster-plugin-rollout-sync-split-brain`
  Builds a three-node Hetzner/Scaleway/OVH cluster, applies the `supertest` plugin rollout, validates synced table writes and cluster pubsub across a temporary sync-port partition/rejoin, and rolls back every node.
- `cluster-lifecycle`
  Builds a mixed Hetzner/Scaleway cluster, removes a node, reconciles NBDE, adds a replacement node, and reconciles again.

Rollout cross-repo implementation notes + progress tracker:

- `docs/ROLLOUT_CROSS_REPO_PLAN_PROGRESS.md`

## Artifacts

Each run writes to:

```text
.state/supertest/-/
```

Inside a scenario directory you will typically find:

- `config-summary.json`
- `scenario.json`
- `legion-checkout.json`
- `commands/`
- `snapshots/`

Snapshots include:

- `nodes-list.json`
- `providers-list.json`
- `legion-state.raw.json`
- `legion-state.sanitized.json`
- `remote//...`

Remote diagnostics currently include checks such as:

- `guix system describe`
- `herd status tribes`
- `herd status postgres`
- `curl http://127.0.0.1:4000/healthz`
- `/root/legion/tribes-admin.sh bootstrap-ready`
- node public key
- LUKS UUID
- local boot-key presence
- clevis bindings
- peer Tang reachability

## Important Notes

- supertest uses isolated Legion state per run. It does not reuse your normal Legion desktop state.
- `guix system describe` is the relevant proof for the installed system channels.
- `guix describe` is not sufficient for that check and should not be used for scenario assertions here.
- With `SUPERTEST_KEEP_NODES=1`, cleanup is skipped on purpose.

## Cleanup

If a kept run leaves nodes behind, destroy them using the same isolated Legion state directory that created them.

Example:

```bash
env \
  LEGION_STATE_DIR=.state/supertest//single-node-init/legion-state \
  LEGION_CACHE_DIR=.state/supertest//single-node-init/legion-cache \
  LEGION_APP_ROOT=../legion_kk \
  LEGION_UNLOCK_PASSWORD="$LEGION_UNLOCK_PASSWORD" \
  node --import tsx ../legion_kk/src/engine/cli-main.ts \
  nodes destroy --materialize
```

To inspect remaining tracked nodes in that state:

```bash
env \
  LEGION_STATE_DIR=.state/supertest//single-node-init/legion-state \
  LEGION_CACHE_DIR=.state/supertest//single-node-init/legion-cache \
  LEGION_APP_ROOT=../legion_kk \
  LEGION_UNLOCK_PASSWORD="$LEGION_UNLOCK_PASSWORD" \
  node --import tsx ../legion_kk/src/engine/cli-main.ts \
  nodes list --json
```

If a run has to be aborted hard, or Legion state no longer matches provider reality, wipe provider-side resources directly:

```bash
scripts/cleanup-cloud-resources
```

This helper does not read or delete Legion state or supertest artifacts. It uses the provider CLIs from the current shell to remove billable runtime resources such as instances, volumes, snapshots, IP allocations, and load balancers. By default it preserves non-test SSH keys; use `--all-keys` if the provider credentials' project should be fully cleared.

## Manual Intervention

The normal control path is Legion's CLI, but it is useful to have provider tooling available for inspection or emergency cleanup.

- `hcloud`
  Useful for checking server status, IPs, volumes, rescue state, and deleting instances directly if Legion state and provider reality diverge.
- `scw`
  Useful for checking Scaleway instances, IPs, volumes, bootscripts, and deleting or inspecting resources outside the test runner.

These tools are optional for normal runs, but they are practical when:

- a run is kept with `SUPERTEST_KEEP_NODES=1`
- a deployment fails halfway through and you want provider-side visibility
- Legion state needs to be compared against provider-side reality
- cleanup has to be completed manually

## Useful Environment Variables

- `SUPERTEST_RUN_ID`
  Override the generated run id.
- `SUPERTEST_ARTIFACT_ROOT`
  Override the artifact directory root.
- `SUPERTEST_TRIBE_NAME`
  Override the configured tribe name.
- `SUPERTEST_TRIBE_DOMAIN`
  Override the configured tribe domain.
- `SUPERTEST_ACME_EMAIL`
  Override the ACME email passed to Legion.
- `SUPERTEST_CERT_MODE`
  `acme` (default) or `self-signed`. When set to `self-signed`, supertest injects `LEGION_TEST_CERT_MODE=self-signed` for Legion CLI commands.
- `SUPERTEST_HETZNER_INSTANCE`
  Override the Hetzner instance/offer selection.
- `SUPERTEST_OVH_INSTANCE`
  Override the OVH instance/offer selection.
- `SUPERTEST_SCALEWAY_INSTANCE`
  Override the Scaleway instance/offer selection.
- `SUPERTEST_HETZNER_BOOT_MODE`
  Override Hetzner boot mode.
- `SUPERTEST_SCALEWAY_BOOT_MODE`
  Override Scaleway boot mode.
- `SUPERTEST_BOOTSTRAP_PASSWORD_ENV`
  Override the env var name used for Legion `config init`.
- `SUPERTEST_PLUGIN_NAME`
  Override the plugin name used by plugin rollout scenarios (defaults to `supertest`).
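
Because live runs spend cloud time, it can help to confirm that the credential variables listed under Requirements are set before starting a scenario. The following is a minimal sketch of such a check; `missing_credentials` is a hypothetical helper that is not shipped with this repo, and it only tests that the variables are non-empty, not that the credentials are valid.

```shell
# Hypothetical helper: prints each unset credential variable on its own line
# and returns non-zero if any are missing. The function body runs in a
# subshell, so its working variables do not leak into the caller.
missing_credentials() (
  missing=""
  # Variables that must each be set.
  for name in LEGION_UNLOCK_PASSWORD HCLOUD_TOKEN OVH_CONSUMER_KEY \
              SCW_ACCESS_KEY SCW_SECRET_KEY SCW_DEFAULT_PROJECT_ID; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  # Either spelling is accepted for the OVH application key and secret.
  [ -n "${OVH_APP_KEY:-}${OVH_APPLICATION_KEY:-}" ] ||
    missing="$missing OVH_APP_KEY|OVH_APPLICATION_KEY"
  [ -n "${OVH_APP_SECRET:-}${OVH_APPLICATION_SECRET:-}" ] ||
    missing="$missing OVH_APP_SECRET|OVH_APPLICATION_SECRET"
  if [ -n "$missing" ]; then
    # Intentionally unquoted: split the space-separated list into one name per line.
    printf '%s\n' $missing
    exit 1  # leaves only the function's subshell and becomes its return status
  fi
)
```

A typical use would be `missing_credentials && npm run scenario:single-node-init`, which skips the run and lists the gaps when something is unset.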