Adding a Provider
This document explains how to add a new provider to Legion's current architecture.
It is written against the current codebase, not an idealized future state. That means it includes the boring synchronization work too: shared types, provider-module wiring, catalog policy, headless CLI, optional renderer support, tests, and docs.
The fastest way to get this right is:
- Treat OVH as the current reference for a "full integrated provider".
- Treat Hetzner as the simpler reference for a "well-exercised provider".
- Reuse the shared deployment/runtime path. Do not create a separate deployment implementation for the new provider.
Scope
Legion has provider modules with different surfaces:
- API-backed providers expose cloud surfaces such as compute, DNS, domain, or firewall adapters.
- manual is also a provider module. Its implementation is user-mediated instead of API-mediated, so it currently has metadata, manual server import, and user-task firewall capabilities but no cloud adapter.
This guide is mostly for adding a new API-backed cloud provider, meaning:
- real credentials
- compute catalog support
- compute provisioning
- optional DNS service integration
- shared node deployment via the existing NBDE path
Current architecture
The important boundary is:
- provider-specific API logic lives in src/main/cloud/providers/*.ts
- each provider registers through a ProviderModule from src/main/cloud/providers/*-module.ts
- shared provider mutation recovery lives in src/main/cloud/providers/api-resilience.ts
- provider lookup, ordering, and enable/disable filtering live in src/main/cloud/providers/registry.ts
- shared orchestration stays generic in:
  - src/main/ops/engine-ops.ts
  - src/main/resources/*/reconcile.ts
  - src/main/cloud/provider-broker.ts
  - src/main/cloud/provider-persistence.ts
  - src/main/deployment/runtime.ts
  - src/main/deployment/service.ts
- shared types and provider unions live in src/shared/app.ts
- catalog policy and deployment defaults are attached to the provider module
- catalog policy types live in src/shared/provider-catalog-policy.ts
- deployment profile types live in src/shared/provider-deployment-policy.ts
- the headless engine surface lives in src/engine/runtime.ts
- CLI wiring currently goes through:
  - src/main/cli/cli.ts
  - src/main/cli/node-cli-service.ts
That means adding a provider should mostly be:
- one new provider API implementation file, if it talks to an external API
- one new provider module file
- adding that module to the registry's module list
- provider credential surface wiring
- tests
The deployment flow after the VM exists should remain shared.
Step 1. Extend the shared provider model
Start in src/shared/app.ts.
Update the provider unions first:
- ProviderKind
- IntegratedProviderKind
- any derived unions that should include the new provider:
  - ServerProviderKind
  - StandardDnsProviderKind
Then add the new provider credential interface and include it in ProviderCredentials.
Use OVH and Hetzner as the shape references:
- HetznerProviderCredentials
- OvhProviderCredentials
You will also want to review any request types that implicitly assume the existing kinds, especially:
- ProductCatalogRequest
- ProviderConfig
- ProviderDescriptor
If the provider needs additional project-level data like projectId, keep it explicit and stable in the shared type.
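As a minimal sketch of this step, assuming a hypothetical provider kind acme with an API token and an explicit projectId, the change might look like the following. The interfaces are simplified stand-ins: the field names are assumptions, and the real types in src/shared/app.ts carry more members.

```typescript
// Simplified stand-ins for types in src/shared/app.ts; field names are assumptions.
interface HetznerProviderCredentials {
  kind: "hetzner";
  apiToken: string;
}

interface OvhProviderCredentials {
  kind: "ovh";
  applicationKey: string;
  applicationSecret: string;
  consumerKey: string;
}

// Hypothetical new provider: project-level data such as projectId stays explicit.
interface AcmeProviderCredentials {
  kind: "acme";
  apiToken: string;
  projectId: string;
}

type ProviderCredentials =
  | HetznerProviderCredentials
  | OvhProviderCredentials
  | AcmeProviderCredentials;

// Derived here for brevity; the real ProviderKind union is declared explicitly.
type ProviderKind = ProviderCredentials["kind"];

const PROVIDER_KINDS: readonly ProviderKind[] = ["hetzner", "ovh", "acme"];

// An exhaustive switch makes the compiler flag every place that still
// assumes the old set of kinds once the union grows.
function describeCredentials(credentials: ProviderCredentials): string {
  switch (credentials.kind) {
    case "hetzner":
      return "hetzner token";
    case "ovh":
      return "ovh application key";
    case "acme":
      return `acme token for project ${credentials.projectId}`;
    default: {
      const unreachable: never = credentials;
      throw new Error(`Unhandled provider kind: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

The never assignment in the default branch is a common TypeScript idiom rather than something this guide prescribes. It turns a forgotten union member into a compile error instead of a runtime surprise.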
Step 2. Add the provider module contract implementation
Create a provider-module file in src/main/cloud/providers.
Current examples:
- src/main/cloud/providers/hetzner-module.ts
- src/main/cloud/providers/ovh-module.ts
- src/main/cloud/providers/scaleway-module.ts
- src/main/cloud/providers/mock-module.ts
- src/main/cloud/providers/manual-module.ts
The module implements ProviderModule from src/main/cloud/providers/provider-module.ts.
It owns:
- kind
- descriptor
- capabilities
- credential defaults, normalization, resolution, and configured checks
- optional catalogPolicy
- optional deploymentProfile
- optional API-backed adapter surfaces
- optional enabled(env) gate
If the provider is disabled by enabled(env), Legion hides its persisted configuration and resources from app snapshots without deleting them.
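The responsibilities listed above can be sketched as a single object. This is a trimmed-down, hypothetical contract: ProviderModuleSketch, the acme provider, its credential fields, and the LEGION_ENABLE_ACME variable are all assumptions for illustration. The real contract in src/main/cloud/providers/provider-module.ts is the authority.

```typescript
// Hypothetical, trimmed-down module contract; the real ProviderModule has more members.
interface ProviderModuleSketch<Credentials> {
  kind: string;
  descriptor: { kind: string; label: string };
  capabilities: readonly string[];
  defaultCredentials(): Credentials;
  normalizeCredentials(input: Partial<Credentials>): Credentials;
  isConfigured(credentials: Credentials): boolean;
  enabled?(env: Record<string, string | undefined>): boolean;
}

interface AcmeCredentials {
  apiToken: string;
  projectId: string;
}

const acmeModule: ProviderModuleSketch<AcmeCredentials> = {
  kind: "acme",
  descriptor: { kind: "acme", label: "Acme Cloud" },
  capabilities: ["computeCatalog"],
  defaultCredentials: () => ({ apiToken: "", projectId: "" }),
  // Normalization trims whitespace so persisted values compare predictably.
  normalizeCredentials: (input) => ({
    apiToken: (input.apiToken ?? "").trim(),
    projectId: (input.projectId ?? "").trim(),
  }),
  // "Configured" means every required field is non-empty after normalization.
  isConfigured: (credentials) =>
    credentials.apiToken.length > 0 && credentials.projectId.length > 0,
  // Hypothetical gate: the module stays hidden unless LEGION_ENABLE_ACME=1.
  enabled: (env) => env.LEGION_ENABLE_ACME === "1",
};
```

Keeping defaults, normalization, and the configured check next to each other means persisted state has exactly one place that defines what a valid credential set looks like.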
Step 3. Implement provider API logic in src/main/cloud/providers
For API-backed providers, create a provider implementation file following the current pattern:
- src/main/cloud/providers/hetzner.ts
- src/main/cloud/providers/ovh.ts
- src/main/cloud/providers/scaleway.ts
At minimum, the provider implementation should export functions covering the operations used by its module adapter:
- descriptor
- credential validation
- compute catalog loading
- DNS catalog loading if supported
- compute inventory
- compute provisioning
- compute start/stop/reboot/destroy if supported
- reinstall from image if supported
- DNS zone and record operations if supported
The registry currently expects functions in the same style as:
- getHetznerProviderDescriptor
- validateHetznerCredentials
- getHetznerCatalog
- provisionHetznerServer
- destroyHetznerServer
- getOvhProviderDescriptor
- validateOvhCredentials
- getOvhCatalog
- provisionOvhServer
- destroyOvhServer
Important rule:
- Normalize the provider's API output into Legion's shared catalog and observed-resource contracts in the provider module.
- Do not leak provider SDK shapes into the rest of the app.
In practice that means your provider module is responsible for mapping upstream data into:
- ProductCatalog
- DnsProductCatalog
- ProviderObservedServer
- ProviderObservedZone
Provider API writes and ambiguous failures
Each provider module should keep a small provider-specific write wrapper, following the current pattern:
- hetznerWriteRequest
- ovhWriteRequest
- scalewayWriteRequest
Every provider mutation should pass through that wrapper:
- POST
- PUT
- PATCH
- DELETE
The wrapper is responsible for:
- logging the write through OperationLogger.logApiWrite
- normalizing provider errors into ApiErrorDetails
- using recoverAmbiguousProviderMutation from src/main/cloud/providers/api-resilience.ts
- returning the recovered value when a read-back proves that the provider already reached the desired state
Do not blindly retry non-idempotent writes after network errors, timeouts, fetch failed, or 5xx responses. Those errors are ambiguous: the provider may have accepted the request and failed while returning the response. Prefer read-after-failure verification.
The verifier passed as verifyAfterAmbiguousError should prove the desired final state, not just prove that a similar resource exists:
- create SSH key: find a key with the expected name and public key
- create server: find a server with the expected local Legion identity, name, and tags
- update firewall: read the firewall and compare the expected metadata/rules
- delete resource: treat already-gone/not-found as success
If a write cannot be verified with provider state, leave verifyAfterAmbiguousError out and let the original error bubble. Add a short code comment only when the lack of verification is non-obvious.
Cleanup and delete paths should be absence-driven and tolerant of resources that are already gone. They should not wait for cluster decommission when local provisioning state shows that the node never successfully joined the cluster.
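The read-after-failure rule can be illustrated with a small sketch. writeWithRecovery, statusOf, and isAmbiguous are hypothetical helpers written for this example. They are not the signature of recoverAmbiguousProviderMutation, which should be read in src/main/cloud/providers/api-resilience.ts. The sketch also assumes that provider errors expose a numeric status property.

```typescript
// Hypothetical helpers; assumes provider errors carry an optional numeric "status".
function statusOf(error: unknown): number | undefined {
  if (typeof error === "object" && error !== null && "status" in error) {
    const status = (error as { status?: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Errors without a status (network failure, timeout) and 5xx responses are
// ambiguous: the provider may have applied the write before the response was lost.
function isAmbiguous(error: unknown): boolean {
  const status = statusOf(error);
  return status === undefined || status >= 500;
}

async function writeWithRecovery<T>(
  write: () => Promise<T>,
  verifyAfterAmbiguousError?: () => Promise<T | undefined>,
): Promise<T> {
  try {
    return await write();
  } catch (error) {
    // Without a verifier, or for a definite rejection such as 4xx, the original error bubbles.
    if (!verifyAfterAmbiguousError || !isAmbiguous(error)) throw error;
    // Read provider state instead of retrying the non-idempotent write.
    const recovered = await verifyAfterAmbiguousError();
    if (recovered === undefined) throw error;
    return recovered;
  }
}
```

For an SSH key create, the verifier would look up a key with the expected name and public key and return it, or return undefined so that the original error surfaces. The write itself is attempted exactly once.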
Step 4. Register the provider module
Update src/main/cloud/providers/registry.ts.
The registry should stay small. Do not add provider-specific branches there.
You need to:
- Import the new provider module.
- Add it to PROVIDER_MODULES.
Provider-specific capabilities, credentials, catalog policy, deployment profile, and adapter methods belong in the provider module, not in the registry.
If the provider talks to an API, its adapter must implement CloudProviderAdapter from src/main/cloud/providers/contracts.ts.
That means providing:
- verifyCredentials
- compute
- dns
Read the contract in src/main/cloud/providers/contracts.ts carefully. It is the actual integration boundary.
Development/test-only providers
If the provider is intentionally fake, keep that explicit instead of making it impersonate a real provider.
The current example is mock, displayed as MockKing24. Its provider module is enabled only when the GUI mock network is active:
LEGION_GUI_MOCK_NETWORK=1
For this kind of provider:
- still add shared provider types, a provider module, catalog policy, deployment profile, and renderer support if the UI should show it
- do not add live catalog or billed E2E coverage
- keep credentials local and deterministic if they configure failure modes such as error rate, latency, or seed
- make the provider module disabled by default
- keep stored mock resources hidden, not deleted, when the mock gate is inactive
- keep real-provider tests for real-provider semantics instead of treating the mock as a substitute for Hetzner/OVH/Scaleway coverage
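Registration and the mock gate together can be sketched as follows. This is a simplified stand-in for src/main/cloud/providers/registry.ts. ModuleSketch, listEnabledModules, getProviderModule, and the acme entry are hypothetical names. Only PROVIDER_MODULES and LEGION_GUI_MOCK_NETWORK come from the text above.

```typescript
// Hypothetical, simplified registry working on minimal module objects.
type Env = Record<string, string | undefined>;

interface ModuleSketch {
  kind: string;
  enabled?: (env: Env) => boolean; // absent means always enabled
}

const hetznerModule: ModuleSketch = { kind: "hetzner" };
const acmeModule: ModuleSketch = { kind: "acme" }; // hypothetical new provider
const mockModule: ModuleSketch = {
  kind: "mock",
  // Disabled by default; active only while the GUI mock network is on.
  enabled: (env) => env.LEGION_GUI_MOCK_NETWORK === "1",
};

// Adding a provider is one import plus one entry here; no per-provider branches.
const PROVIDER_MODULES: readonly ModuleSketch[] = [hetznerModule, acmeModule, mockModule];

function listEnabledModules(env: Env): ModuleSketch[] {
  return PROVIDER_MODULES.filter((module) => module.enabled?.(env) ?? true);
}

function getProviderModule(kind: string, env: Env): ModuleSketch {
  const found = listEnabledModules(env).find((module) => module.kind === kind);
  if (!found) throw new Error(`Provider "${kind}" is not registered or not enabled`);
  return found;
}
```

Because filtering happens at lookup time, a disabled module's stored configuration is hidden rather than deleted, and it reappears when the gate is switched back on.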
Step 5. Add provider catalog policy
Add the provider's ProviderCatalogPolicy and attach it to the provider module.
Every integrated provider needs a ProviderCatalogPolicy entry:
- defaultImage
- optional defaultRegion
- defaultProductSelector
- any catalog metadata enrichment rules
This drives:
- default offer selection
- default image choice
- region defaults
- quote generation expectations
Do not leave the new provider out here. selectProvisioningOffer() relies on the normalized catalog shape, but the app also expects a sane default offer for provider-driven flows and tests.
If your provider needs a preferred starter offer, encode it here.
Legion's compute model now treats all instance lines as offers:
- fixed Hetzner-style instances are offers with zero variables
- configurable providers expose the same offer shape, but may add variables such as diskGb
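A rough sketch of how a catalog policy and the offer model fit together is shown below. The shapes are hypothetical simplifications: Offer, OfferVariable, CatalogPolicySketch, pickDefaultOffer, the starter-2 offer id, the eu-1 region, and the image name are all assumptions. The real types live in src/shared/provider-catalog-policy.ts and src/shared/app.ts.

```typescript
// Hypothetical, simplified offer and policy shapes for illustration only.
interface OfferVariable {
  id: string; // for example "diskGb"
  min: number;
  max: number;
  defaultValue: number;
}

interface Offer {
  id: string;
  region: string;
  variables: OfferVariable[]; // empty for fixed, Hetzner-style instances
}

interface CatalogPolicySketch {
  defaultImage: string;
  defaultRegion?: string;
  defaultProductSelector: (offers: Offer[]) => Offer | undefined;
}

const acmePolicy: CatalogPolicySketch = {
  defaultImage: "ubuntu-24.04",
  defaultRegion: "eu-1",
  // Prefer the hypothetical starter offer, else fall back to the first offer.
  defaultProductSelector: (offers) =>
    offers.find((offer) => offer.id === "starter-2") ?? offers[0],
};

// Narrow to the default region when it has offers, then apply the selector.
function pickDefaultOffer(policy: CatalogPolicySketch, offers: Offer[]): Offer | undefined {
  const inRegion = policy.defaultRegion
    ? offers.filter((offer) => offer.region === policy.defaultRegion)
    : offers;
  return policy.defaultProductSelector(inRegion.length > 0 ? inRegion : offers);
}
```

Fixed and configurable instance lines share one Offer shape here, so the selector never needs to know which kind of provider it is looking at.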
Step 6. Add provider deployment profile
Add the provider's ProviderDeploymentProfile and attach it to the provider module.
This is intentionally small right now. Today it carries:
- defaultBootMode
NodeDeploymentRuntime uses this policy when nodes add does not explicitly pass --boot-mode.
That means a new compute provider must define its deployment profile before the headless node path is usable.
Current examples:
- Hetzner: bios
- OVH: bios
- Scaleway: efi
Keep this provider-specific and minimal. The actual deploy pipeline stays shared.
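The fallback described above amounts to a small lookup. The sketch below is not the NodeDeploymentRuntime implementation. resolveBootMode and the acme entry are hypothetical, while the Hetzner, OVH, and Scaleway defaults are taken from the examples in this section.

```typescript
type BootMode = "bios" | "efi";

// Defaults copied from the examples above; "acme" is a hypothetical new provider.
const DEFAULT_BOOT_MODE: Record<string, BootMode | undefined> = {
  hetzner: "bios",
  ovh: "bios",
  scaleway: "efi",
  acme: "efi",
};

// An explicit --boot-mode wins; otherwise the provider's deployment profile decides.
function resolveBootMode(providerKind: string, explicit?: BootMode): BootMode {
  if (explicit) return explicit;
  const fallback = DEFAULT_BOOT_MODE[providerKind];
  if (!fallback) {
    // A compute provider without a profile cannot use the headless node path.
    throw new Error(`Provider "${providerKind}" has no deployment profile`);
  }
  return fallback;
}
```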
Step 7. Make the headless engine accept the provider
Most of the engine is already generic once the registry knows about the provider.
Still review these files for hardcoded provider assumptions:
- src/engine/runtime.ts
- src/main/product-catalog.ts
- src/main/ops/engine-ops.ts
- src/main/cloud/provider-broker.ts
- src/main/cloud/provider-persistence.ts
- src/main/deployment/runtime.ts
Things to verify:
- getProviderAdapter(providerKind) works with the new kind
- catalog fetches work through ProviderCatalogStore
- provisioning can resolve the new provider configuration and offer
- the node runtime can choose a deployment profile for the provider
The main rule here is:
- if you find a provider-specific branch in orchestration code, remove it if possible
- if it must stay, make the new provider explicit there
Step 8. Add CLI support for provider configure
The headless CLI still contains provider-specific credential parsing.
Update:
- src/main/cli/node-cli-service.ts
- src/main/cli/cli.ts
Specifically:
- extend ProviderConfigureRequest handling
- add CLI flags and env var resolution for the new provider's credentials
- extend resolveProviderUpsert
Keep the CLI contract explicit and simple. Follow the current style:
- direct flag support
- --...-env support
- stable default env var names
If the provider is intended to support headless node deployment, this step is mandatory.
For development/test-only UI providers, CLI support is optional. If you skip it, make sure CLI parsing rejects the provider clearly instead of accepting a half-configured provider.
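The three-tier credential contract (direct flag, an env-name flag, then a stable default env var) can be sketched as below. resolveCredential and CredentialSource are hypothetical, as are the flag and variable names in the comments. The actual parsing lives in src/main/cli/node-cli-service.ts and may differ.

```typescript
// Hypothetical helper; flag and env var names in the comments are illustrative only.
interface CredentialSource {
  flagValue?: string; // value of a direct flag, e.g. --acme-token
  envNameFlag?: string; // value of the matching -env flag, e.g. --acme-token-env
  defaultEnvName: string; // stable default, e.g. ACME_API_TOKEN
}

function resolveCredential(
  label: string,
  source: CredentialSource,
  env: Record<string, string | undefined>,
): string {
  // 1. A value passed directly on the command line wins.
  if (source.flagValue) return source.flagValue;
  // 2. Otherwise read the env var named by the -env flag, or the stable default.
  const envName = source.envNameFlag ?? source.defaultEnvName;
  const value = env[envName];
  if (!value) {
    // Reject clearly instead of accepting a half-configured provider.
    throw new Error(`Missing ${label}: pass the flag or set ${envName}`);
  }
  return value;
}
```

Passing the environment in as a parameter, rather than reading process.env inside the helper, keeps the resolution order easy to unit test.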
Step 9. Update renderer support if the provider should appear in the app UI
If the provider is headless-only for now, you can skip this section.
If it should be visible and editable in the Electron UI, the provider descriptor should do most of the work.
The renderer currently uses:
- ProviderDescriptor.credentialFields for the settings provider form
- ProviderDescriptor.capabilities for provider lists such as server creation
- ProviderDescriptor.defaultProviderId for the provider's stable persisted id
- renderer translations keyed by provider kind and credential field id
For a provider with ordinary credential fields, you should not add a provider-specific settings card.
Descriptor-driven provider forms
Review:
- src/renderer/src/forms/provider-forms.ts
- src/renderer/src/state/settings-actions.ts
- src/renderer/src/components/SettingsDialog.svelte
- src/renderer/src/state/model.ts
- src/renderer/src/state/view-flows.ts
These should remain generic:
- settings drafts are stored as ProviderConfigDrafts
- credential values are keyed by descriptor field id
- the renderer sends { kind, ...fieldValues } as credentials
- the main-process provider module normalizes credentials and decides whether they are configured
Only add custom renderer code if the provider needs unusual credential UX that cannot be represented by descriptor fields.
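To make the generic path concrete, here is a minimal sketch of building the credential payload from descriptor fields. DescriptorSketch and buildCredentialPayload are hypothetical names, and the real renderer code in src/renderer/src/forms/provider-forms.ts may be structured differently.

```typescript
// Hypothetical, simplified descriptor; the real fields live on ProviderDescriptor.
interface CredentialField {
  id: string;
}

interface DescriptorSketch {
  kind: string;
  credentialFields: CredentialField[];
}

// Draft values are keyed by descriptor field id, so one generic builder
// covers every provider with ordinary credential fields.
function buildCredentialPayload(
  descriptor: DescriptorSketch,
  draft: Record<string, string>,
): Record<string, string> {
  const fieldValues: Record<string, string> = {};
  for (const field of descriptor.credentialFields) {
    // Only declared fields are sent; stray draft keys are ignored.
    fieldValues[field.id] = draft[field.id] ?? "";
  }
  // No validation here: the main-process module normalizes and checks the values.
  return { kind: descriptor.kind, ...fieldValues };
}
```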
Provider selection UI
Review provider selection surfaces if the provider has new capabilities:
- src/renderer/src/components/workspace/ServerDialog.svelte
- src/renderer/src/components/workspace/DomainDialog.svelte
- src/renderer/src/components/workspace/ZoneDialog.svelte
Server creation is descriptor-driven for providers with computeCatalog.
Domain and DNS flows still use narrower domain/DNS form unions where provider semantics are more specific. Extend those deliberately if the new provider supports standard DNS or domain registration.
Manual provider UX
The manual provider does not have credentials. It exposes a static compute catalog and captures server connection details as offer variables:
- public IP or hostname
- SSH username
- password for the initial managed-key bootstrap
Manual server plans copy those offer variables into manualConfig at the main-process planning boundary and assume an Ubuntu-compatible host. The deployment path auto-detects privilege mode from the SSH user: root runs directly, other users are expected to have passwordless sudo.
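The privilege rule can be written down in a few lines. This is an illustrative sketch, not the deployment code: detectPrivilegeMode and wrapCommand are hypothetical names, and using sudo -n (which fails instead of prompting for a password) is an assumption about how passwordless sudo might be enforced.

```typescript
type PrivilegeMode = "root" | "sudo";

// root runs commands directly; any other user is expected to have passwordless sudo.
function detectPrivilegeMode(sshUser: string): PrivilegeMode {
  return sshUser === "root" ? "root" : "sudo";
}

// Assumption: prefix with "sudo -n" so a missing passwordless-sudo setup
// fails fast rather than hanging on a password prompt.
function wrapCommand(sshUser: string, command: string): string {
  return detectPrivilegeMode(sshUser) === "root" ? command : `sudo -n ${command}`;
}
```

This naive prefixing only suits single commands. Pipelines or compound shell commands would need to be wrapped in a shell invocation instead.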
Manual firewall handling is also user-mediated. The firewall planner keeps manual firewall reconciliation out of provider mutation paths; it does not silently pretend provider firewall reconciliation happened.
Translations
Update renderer i18n files for:
- provider labels in src/renderer/src/i18n/locales/
- any provider-specific credential field labels that are not already covered by generic field ids
- any UI copy you add
All user-facing strings must be added for:
- en-GB
- de-DE
- es-ES
Step 10. Add live catalog smoke coverage
Every integrated provider should have a read-only live test before destructive node deployment coverage.
Current examples:
- tests/integration/hcloud.catalog.test.ts
- tests/integration/ovh.smoke.test.ts
- tests/integration/scaleway.smoke.test.ts
Add a provider-specific integration smoke test that validates at least:
- authentication works
- the compute catalog loads
- the DNS catalog loads, if supported
- the normalized catalog has sensible defaults
- selectProvisioningOffer() works with the provider's IDs
If the provider depends on a generated client or provider SDK schema surface, check the API endpoints you rely on directly in this test, similar to OVH's schema assertions.
Then wire it into package.json:
- add a dedicated script
- include it in test:integration:live if appropriate
Also update TESTING.md.
Step 11. Add destructive headless node deployment coverage
Do not create a new test harness per provider.
Use the shared live harness:
tests/billed/e2e/cli-live-test-harness.ts
Create only a thin provider wrapper, following:
- tests/billed/e2e/hetzner-cli.test.ts
- tests/billed/e2e/ovh-cli.test.ts
- tests/billed/e2e/scaleway-cli.test.ts
Your wrapper should provide:
- required credential env vars
- optional provider-specific env vars such as instance/offer selection or boot mode
- providerConfigureArgs
- providerEnv
- provider label/kind
The harness already handles:
- isolated HOME
- config init
- provider configure
- nodes add
- nodes list --json
- nodes destroy
- live stdout/stderr streaming
- LEGION_CC_E2E_KEEP_INSTANCE=1
- exporting the managed SSH key and printing an SSH command
Also update:
- package.json
- TESTING.md
Step 12. Validate the provider step by step
Do not jump straight to the destructive E2E.
Use this order.
1. Static verification
Run:
npm run lint
npm run typecheck:node
If the renderer was touched:
npm run typecheck
2. Mocked/local tests
Run the most relevant existing suites:
npm run test:unit
npm run test:integration:mock
At minimum, make sure any provider-related unit tests still pass:
- tests/unit/provider-broker.test.ts
- tests/unit/cli.test.ts
- tests/unit/node-deployment-runtime.test.ts
Add targeted unit tests if the new provider introduces special selection or credential logic.
For provider mutations, also add tests around ambiguous errors for the high-risk paths:
- create server
- delete server
- create managed SSH key
- delete managed SSH key
- firewall/security-group create, update, and delete when supported
The expected behaviour is not "retry until it works"; it is "read provider state and accept success only when the desired final state is visible".
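For the delete paths, the behaviour worth testing is that an already-gone resource counts as success. A minimal sketch follows, assuming that not-found errors carry a numeric status of 404. deleteIfPresent and isNotFound are hypothetical helpers, not part of the codebase.

```typescript
// Assumption: provider errors expose a numeric "status"; 404 means the resource is gone.
function isNotFound(error: unknown): boolean {
  return (
    typeof error === "object" &&
    error !== null &&
    (error as { status?: unknown }).status === 404
  );
}

// Absence-driven delete: "already gone" is the desired final state, so it is success.
async function deleteIfPresent(
  deleteResource: () => Promise<void>,
): Promise<"deleted" | "already-gone"> {
  try {
    await deleteResource();
    return "deleted";
  } catch (error) {
    if (isNotFound(error)) return "already-gone";
    // Anything else is a real failure and must reach the caller.
    throw error;
  }
}
```

A unit test can drive this with a fake delete function that throws once with status 404 and once with status 500, asserting success for the first and a propagated error for the second.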
3. Live catalog smoke test
Run the new provider's read-only live integration test first.
This should confirm:
- credentials are valid
- catalog normalization works
- a default offer is selectable
Only after this passes should you attempt real deployment.
4. Destructive headless E2E
Run the provider's live CLI test.
For a new provider, prefer the first debug run with:
export LEGION_CC_E2E_KEEP_INSTANCE=1
That keeps the VM and prints the exported managed SSH key path and a ready-to-use ssh command.
This is the final validation target for the provider addition.
Full checklist
Use this as the compact review list before calling the provider done.
- src/shared/app.ts
- src/main/state/store.ts
- src/main/state/schema.ts
- src/main/cloud/providers/<provider>.ts
- src/main/cloud/providers/<provider>-module.ts
- src/main/cloud/providers/registry.ts
- src/shared/provider-catalog-policy.ts
- src/main/cli/cli.ts
- src/main/cli/node-cli-service.ts
- renderer form/model/i18n files if the provider should appear in the UI
- tests/integration/<provider>.smoke.test.ts
- tests/billed/e2e/<provider>-cli.test.ts
- package.json
- TESTING.md
Current gotchas
These are easy to miss in the current codebase.
- ProviderKind and related unions are duplicated by design across several helper types. Update all affected unions, not just the top-level one.
- Provider defaults and credential normalization live in the provider module. If persisted state behaves strangely, check the module's credential contract first.
- node-cli-service.ts still hardcodes credential/env handling for supported providers.
- The renderer still contains explicit provider branches, especially where DNS/domain support differs by provider. A provider can be fully usable headlessly before the UI knows about it.
- The shared deployment path is generic, but boot defaults are provider-specific via the provider module's deployment profile.
- New provider tests should reuse cli-live-test-harness.ts, not fork it.
Suggested development order
If you want the shortest path to a working integration:
- shared types
- provider module
- registry
- state normalization/defaults
- catalog policy
- deployment profile
- CLI provider configure
- read-only live smoke test
- destructive live CLI E2E
- renderer support
That gets the infrastructure path working first and keeps UI work separate.