tribes-supertest
Real integration scenarios for Tribes deployments, driven through a checked-out Legion CLI.
Overview
This repo runs real deployment scenarios against cloud infrastructure and verifies what the deployed nodes actually booted and exposed.
- ../legion_kk is the only required external project.
- Legion is treated as the deployment authority.
- supertest records what Legion asked Guix to install and what the nodes actually booted.
The runner currently invokes Legion's headless CLI directly:
- CLI entrypoint: ../legion_kk/src/engine/cli-main.ts
- State: isolated per run under .state/supertest/...
- Artifacts: command logs, sanitized Legion state, and remote node diagnostics
Requirements
- This repo checked out locally
- ../legion_kk checked out beside it
- Node/npm available in the current shell
- Legion dependencies available in ../legion_kk
- A usable Legion kexec installer default, normally the generated mirror pin in ../legion_kk
Cloud/provider credentials must be present in the environment:
- LEGION_UNLOCK_PASSWORD
- HCLOUD_TOKEN
- OVH_APP_KEY or OVH_APPLICATION_KEY
- OVH_APP_SECRET or OVH_APPLICATION_SECRET
- OVH_CONSUMER_KEY
- SCW_ACCESS_KEY
- SCW_SECRET_KEY
- SCW_DEFAULT_PROJECT_ID
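A pre-flight check can catch a missing credential before any cloud time is spent. The following is a minimal sketch, not part of supertest: `missingCredentials` and `REQUIRED_CREDENTIALS` are hypothetical names, and the only facts taken from this README are the variable names and the two OVH alternatives.

```typescript
// Hypothetical pre-flight helper; supertest itself may validate differently.
type CredentialEnv = Record<string, string | undefined>;

// Each inner array lists interchangeable names; at least one must be set.
const REQUIRED_CREDENTIALS: string[][] = [
  ["LEGION_UNLOCK_PASSWORD"],
  ["HCLOUD_TOKEN"],
  ["OVH_APP_KEY", "OVH_APPLICATION_KEY"],
  ["OVH_APP_SECRET", "OVH_APPLICATION_SECRET"],
  ["OVH_CONSUMER_KEY"],
  ["SCW_ACCESS_KEY"],
  ["SCW_SECRET_KEY"],
  ["SCW_DEFAULT_PROJECT_ID"],
];

// Returns one readable entry per credential that is unset or blank.
function missingCredentials(env: CredentialEnv): string[] {
  return REQUIRED_CREDENTIALS
    .filter((names) => !names.some((name) => (env[name] ?? "").trim() !== ""))
    .map((names) => names.join(" or "));
}
```

Calling `missingCredentials(process.env)` in a Node shell returns an empty array when everything is present; otherwise each entry names the variable (or pair of alternatives) still to be exported.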
Optional but commonly useful:
- SCW_DEFAULT_ZONE
- OVH_ENDPOINT
- SUPERTEST_KEEP_NODES=1
- SUPERTEST_KEXEC_IMAGE=/abs/path/to/guix-kexec-installer.tar.gz or SUPERTEST_KEXEC_IMAGE=https://mirror.example/tribes-1/guix-kexec-installer-x86_64-linux-latest.tar.gz
- SUPERTEST_CERT_MODE=self-signed (test-only mode to skip ACME and keep self-signed edge certs)
- hcloud and scw CLI tooling in the shell for manual inspection or intervention
Install
Install this repo's dependencies:
npm install
If you want to check the project locally before a live run:
npm run typecheck
npm test
npm run build
Check the Guix substitute servers before spending cloud time:
node --import tsx scripts/check-substitutes.ts
node --import tsx scripts/check-substitutes.ts --plugin sender
Delete leftover cloud resources without using Legion state:
scripts/cleanup-cloud-resources
scripts/cleanup-cloud-resources --dry-run
Basic Usage
Build the CLI, then list registered scenarios, topologies, and blocks:
npm run build
./dist/cli.js scenario list
./dist/cli.js topology list
./dist/cli.js block list
Run a scenario or a topology directly:
./dist/cli.js scenario run single-node-init
./dist/cli.js topology run cluster-lifecycle
Keep created nodes around for inspection:
./dist/cli.js scenario run cluster-lifecycle --keep-nodes
Groups run several scenarios sequentially and fail fast on the first failing
scenario. All scenarios in a group share one SUPERTEST_RUN_ID, so their
artifacts land in sibling directories under the same run.
./dist/cli.js group run tribes
./dist/cli.js group run all
See ./dist/cli.js help for generated listings and the current group
membership.
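The group semantics described above (sequential execution, one shared run id, stop on the first failure) can be sketched as follows. This is an illustration only: `runGroup` and the `runScenario` callback are hypothetical names, and the callback is synchronous to keep the sketch short, whereas a real runner would await each scenario.

```typescript
// Hypothetical sketch of fail-fast group execution; not the actual supertest runner.
type ScenarioResult = { scenario: string; ok: boolean };
type RunScenario = (scenario: string, runId: string) => boolean;

function runGroup(
  scenarios: string[],
  runId: string,
  runScenario: RunScenario,
): ScenarioResult[] {
  const results: ScenarioResult[] = [];
  for (const scenario of scenarios) {
    // Every scenario receives the same run id, mirroring the shared SUPERTEST_RUN_ID.
    const ok = runScenario(scenario, runId);
    results.push({ scenario, ok });
    if (!ok) break; // fail fast: later scenarios are never started
  }
  return results;
}
```

The returned list contains only the scenarios that actually ran, so a result list shorter than the group indicates where the run stopped.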
Dev Branch Helper
For rapid guix-tribes dev-channel iteration, use:
scripts/test-dev-branch --plugin supertest single-node-plugin-rollout-rollback
The helper updates and pushes the hard-coded guix-tribes supertest-dev
branch, exports the required SUPERTEST_GUIX_TRIBES_* environment, and runs
the scenario with SUPERTEST_CERT_MODE=self-signed by default. It verifies that
the pinned tribes and optional tribes-plugin-$NAME commits are already
reachable from their origin remotes; it does not push those source repos.
Useful subcommands:
scripts/test-dev-branch prepare --plugin supertest
scripts/test-dev-branch reset
scripts/test-dev-branch env
scripts/test-dev-branch ssh <node>
scripts/test-dev-branch rpc <node> -- 'Node.self()'
Use --tribes-repo or --plugin-repo to point at a clean worktree when the
main checkout contains unrelated local work.
Implemented Scenarios
- single-node-init: Provisions one Hetzner init node and captures deployed channels, service status, and NBDE state.
- manual-node-init: Imports one existing Ubuntu-compatible host from SUPERTEST_MANUAL_HOST_IP, SUPERTEST_MANUAL_USERNAME, and SUPERTEST_MANUAL_PASSWORD, then captures deployed channels, service status, and NBDE state.
- single-node-plugin-rollout-rollback: Provisions one Hetzner init node, applies a plugin rollout through the public admin API, and rolls back to the pre-rollout generation.
- single-node-sender: Provisions one Hetzner init node, installs the sender plugin when needed, starts RTMP ingest through Legion, publishes an audio test stream, verifies HLS playlist and segment output, and stops the stream.
- cluster-sender-fanout-reboot: Builds a three-node OVH/Hetzner/Scaleway cluster, installs the sender plugin when needed, starts one origin plus two HLS edges through Legion, publishes a 6 Mbit/s video test stream, runs HLS viewers against every node, verifies Sender viewer-count rollups through the admin API, reboots one edge, and verifies recovery. Set SUPERTEST_SENDER_VIEWERS_PER_NODE to override the default of 3 viewers per node.
- cluster-kobold-public-private: Builds a two-node Hetzner/Scaleway cluster, rolls out Trust-backed Kobold (the preview must resolve both tribe-one-kobold and tribe-one-trust), creates public and private datasets, and verifies Trust-gated access: public read-only after a Trust hello, private denial before grants, explicit private read/write grants, and that private datasets stay local to their origin.
- cluster-plugin-integrated-rollout: Rolls out the supertest plugin on a Hetzner init node, seeds plugin data, then adds Scaleway and OVH join nodes and verifies the new nodes install with the active plugin target already integrated (schema ready, seed data synced, and the plugin-enabled system target present on every node).
- cluster-plugin-rollout-sync-split-brain: Builds a three-node Hetzner/Scaleway/OVH cluster, applies the supertest plugin rollout, validates synced table writes and cluster pubsub across a temporary sync-port partition/rejoin, and rolls back every node.
- cluster-lifecycle: Builds a mixed Hetzner/Scaleway cluster, removes a node, reconciles NBDE, adds a replacement node, and reconciles again.
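For cluster-sender-fanout-reboot, the viewer load scales with the per-node override. The sketch below shows one way the default of 3 and the SUPERTEST_SENDER_VIEWERS_PER_NODE override could combine. The function names are hypothetical, and rejecting anything other than a positive integer is an assumption, not documented supertest behavior.

```typescript
// Hypothetical helper; only the variable name and the default of 3 come from the README.
const DEFAULT_VIEWERS_PER_NODE = 3;

function viewersPerNode(env: Record<string, string | undefined>): number {
  const raw = (env["SUPERTEST_SENDER_VIEWERS_PER_NODE"] ?? "").trim();
  if (raw === "") return DEFAULT_VIEWERS_PER_NODE;
  // Assumption: the override must be a positive integer.
  if (!/^[1-9]\d*$/.test(raw)) {
    throw new Error(`SUPERTEST_SENDER_VIEWERS_PER_NODE must be a positive integer, got "${raw}"`);
  }
  return Number.parseInt(raw, 10);
}

// Total concurrent viewers when every node in the cluster serves HLS viewers.
function totalViewers(nodeCount: number, env: Record<string, string | undefined>): number {
  return nodeCount * viewersPerNode(env);
}
```

With the three-node cluster and no override, this works out to 9 viewers in total; setting the variable to 5 raises it to 15.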
Rollout cross-repo implementation notes + progress tracker:
docs/ROLLOUT_CROSS_REPO_PLAN_PROGRESS.md
Artifacts
Each run writes to:
.state/supertest/<run-id>-<scenario>/
Inside a scenario directory you will typically find:
- config-summary.json
- scenario.json
- legion-checkout.json
- commands/
- snapshots/
Snapshots include:
- nodes-list.json
- providers-list.json
- legion-state.raw.json
- legion-state.sanitized.json
- remote/<node-id>/...
Remote diagnostics currently include checks such as:
- guix system describe
- herd status tribes
- herd status postgres
- curl http://127.0.0.1:4000/healthz
- /root/legion/tribes-admin.sh bootstrap-ready
- node public key
- LUKS UUID
- local boot-key presence
- clevis bindings
- peer Tang reachability
Important Notes
- supertest uses isolated Legion state per run. It does not reuse your normal Legion desktop state.
- guix system describe is the relevant proof for the installed system channels. guix describe is not sufficient for that check and should not be used for scenario assertions here.
- With SUPERTEST_KEEP_NODES=1, cleanup is skipped on purpose.
Cleanup
If a kept run leaves nodes behind, destroy them using the same isolated Legion state directory that created them.
Example:
env \
LEGION_STATE_DIR=.state/supertest/<run>/single-node-init/legion-state \
LEGION_CACHE_DIR=.state/supertest/<run>/single-node-init/legion-cache \
LEGION_APP_ROOT=../legion_kk \
LEGION_UNLOCK_PASSWORD="$LEGION_UNLOCK_PASSWORD" \
node --import tsx ../legion_kk/src/engine/cli-main.ts \
nodes destroy --materialize <node-id>
To inspect remaining tracked nodes in that state:
env \
LEGION_STATE_DIR=.state/supertest/<run>/single-node-init/legion-state \
LEGION_CACHE_DIR=.state/supertest/<run>/single-node-init/legion-cache \
LEGION_APP_ROOT=../legion_kk \
LEGION_UNLOCK_PASSWORD="$LEGION_UNLOCK_PASSWORD" \
node --import tsx ../legion_kk/src/engine/cli-main.ts \
nodes list --json
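Both commands above share the same environment and entrypoint, so a small helper can derive them from the scenario directory of a kept run. This is a sketch under stated assumptions: `legionEnvForScenarioDir` and `legionCommand` are hypothetical names, and the legion-state and legion-cache subdirectory names are taken from the examples above rather than from supertest's source.

```typescript
// Hypothetical helpers mirroring the manual cleanup commands; not part of supertest.
function legionEnvForScenarioDir(
  scenarioDir: string,
  unlockPassword: string,
): Record<string, string> {
  const dir = scenarioDir.replace(/\/+$/, ""); // drop trailing slashes
  return {
    LEGION_STATE_DIR: `${dir}/legion-state`,
    LEGION_CACHE_DIR: `${dir}/legion-cache`,
    LEGION_APP_ROOT: "../legion_kk",
    LEGION_UNLOCK_PASSWORD: unlockPassword,
  };
}

// Builds the argv for Legion's headless CLI as invoked in the examples above.
function legionCommand(args: string[]): string[] {
  return ["node", "--import", "tsx", "../legion_kk/src/engine/cli-main.ts", ...args];
}
```

The resulting environment and argv can be handed to a process spawner of your choice; using the same directory for both the list and destroy calls keeps them pointed at the state that created the nodes.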
If a run has to be aborted hard, or Legion state no longer matches provider reality, wipe provider-side resources directly:
scripts/cleanup-cloud-resources
This helper does not read or delete Legion state or supertest artifacts. It
uses the provider CLIs from the current shell to remove billable runtime
resources such as instances, volumes, snapshots, IP allocations, and load
balancers. By default it preserves non-test SSH keys; use --all-keys if the
provider credentials' project should be fully cleared.
Manual Intervention
The normal control path is Legion's CLI, but it is useful to have provider tooling available for inspection or emergency cleanup.
- hcloud: Useful for checking server status, IPs, volumes, rescue state, and deleting instances directly if Legion state and provider reality diverge.
- scw: Useful for checking Scaleway instances, IPs, volumes, bootscripts, and deleting or inspecting resources outside the test runner.
These tools are optional for normal runs, but they are practical when:
- a run is kept with SUPERTEST_KEEP_NODES=1
- a deployment fails halfway through and you want provider-side visibility
- Legion state needs to be compared against provider-side reality
- cleanup has to be completed manually
Useful Environment Variables
- SUPERTEST_RUN_ID: Override the generated run id.
- SUPERTEST_ARTIFACT_ROOT: Override the artifact directory root.
- SUPERTEST_TRIBE_NAME: Override the configured tribe name.
- SUPERTEST_CERT_MODE: acme (default) or self-signed. The current baseline scenarios do not create a managed domain; ACME mode validates the IP-only certificate path. When set to self-signed, supertest injects LEGION_TEST_CERT_MODE=self-signed for Legion CLI commands.
- SUPERTEST_HETZNER_INSTANCE: Override the Hetzner instance/offer selection.
- SUPERTEST_OVH_INSTANCE: Override the OVH instance/offer selection.
- SUPERTEST_SCALEWAY_INSTANCE: Override the Scaleway instance/offer selection.
- SUPERTEST_HETZNER_BOOT_MODE: Override Hetzner boot mode.
- SUPERTEST_SCALEWAY_BOOT_MODE: Override Scaleway boot mode.
- SUPERTEST_PLUGIN_NAME: Override the plugin name used by plugin rollout scenarios (defaults to supertest).
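The SUPERTEST_CERT_MODE behavior can be read as a two-step mapping: resolve the mode, then decide what to inject for Legion. The sketch below illustrates that reading. `resolveCertMode` and `legionCertEnv` are hypothetical names, and treating an empty value as the default and rejecting unknown values are assumptions rather than documented behavior.

```typescript
// Hypothetical sketch of cert-mode handling; supertest's real code may differ.
type CertMode = "acme" | "self-signed";

function resolveCertMode(env: Record<string, string | undefined>): CertMode {
  const raw = env["SUPERTEST_CERT_MODE"];
  // Assumption: unset or empty falls back to the documented default, acme.
  if (raw === undefined || raw === "" || raw === "acme") return "acme";
  if (raw === "self-signed") return "self-signed";
  // Assumption: any other value is treated as a configuration error.
  throw new Error(`Unsupported SUPERTEST_CERT_MODE: ${raw}`);
}

// Extra environment passed to Legion CLI commands for the resolved mode.
function legionCertEnv(mode: CertMode): Record<string, string> {
  return mode === "self-signed" ? { LEGION_TEST_CERT_MODE: "self-signed" } : {};
}
```

In acme mode nothing extra is injected, so Legion follows its normal certificate path; only self-signed mode adds LEGION_TEST_CERT_MODE to the command environment.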