

Adding a Provider

This document explains how to add a new provider to Legion's current architecture.

It is written against the current codebase, not an idealized future state. That means it includes the boring synchronization work too: shared types, provider-module wiring, catalog policy, headless CLI, optional renderer support, tests, and docs.

The fastest way to get this right is:

Treat OVH as the current reference for a "full integrated provider".
Treat Hetzner as the simpler reference for a "well-exercised provider".
Reuse the shared deployment/runtime path. Do not create a separate deployment implementation for the new provider.

Scope

Legion has provider modules with different surfaces:

API-backed providers expose cloud surfaces such as compute, DNS, domain, or firewall adapters.
manual is also a provider module. Its implementation is user-mediated instead of API-mediated, so it currently has metadata, manual server import, and user-task firewall capabilities but no cloud adapter.

This guide is mostly for adding a new API-backed cloud provider, meaning:

real credentials
compute catalog support
compute provisioning
optional DNS service integration
shared node deployment via the existing NBDE path

Current architecture

The important boundary is:

provider-specific API logic lives in src/main/cloud/providers/*.ts
each provider registers through a ProviderModule from src/main/cloud/providers/*-module.ts
shared provider mutation recovery lives in src/main/cloud/providers/api-resilience.ts
provider lookup, ordering, and enable/disable filtering live in src/main/cloud/providers/registry.ts
shared orchestration stays generic in:
- src/main/ops/engine-ops.ts
- src/main/resources/*/reconcile.ts
- src/main/cloud/provider-broker.ts
- src/main/cloud/provider-persistence.ts
- src/main/deployment/runtime.ts
- src/main/deployment/service.ts
shared types and provider unions live in src/shared/app.ts
catalog policy and deployment defaults are attached to the provider module
catalog policy types live in src/shared/provider-catalog-policy.ts
deployment profile types live in src/shared/provider-deployment-policy.ts
the headless engine surface lives in src/engine/runtime.ts
CLI wiring currently goes through:
- src/main/cli/cli.ts
- src/main/cli/node-cli-service.ts

That means adding a provider should mostly be:

one new provider API implementation file, if it talks to an external API
one new provider module file
adding that module to the registry's module list
provider credential surface wiring
tests

The deployment flow after the VM exists should remain shared.

Step 1. Extend the shared provider model

Start in src/shared/app.ts.

Update the provider unions first:

ProviderKind
IntegratedProviderKind
any derived unions that should include the new provider:
- ServerProviderKind
- StandardDnsProviderKind

Then add the new provider credential interface and include it in ProviderCredentials.

Use OVH and Hetzner as the shape references:

HetznerProviderCredentials
OvhProviderCredentials

You will also want to review any request types that implicitly assume the existing kinds, especially:

ProductCatalogRequest
ProviderConfig
ProviderDescriptor

If the provider needs additional project-level data like projectId, keep it explicit and stable in the shared type.
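
As a rough sketch of what this step produces, the snippet below adds a hypothetical acme provider to simplified stand-ins for the shared unions and credential types. The kind strings, the credential field names, and the describeCredentials helper are assumptions made for illustration; the real definitions in src/shared/app.ts are authoritative.

```typescript
// Simplified stand-ins for the unions in src/shared/app.ts.
// Only a few members are shown; "acme" is the hypothetical new provider.
type ProviderKind = "hetzner" | "ovh" | "acme";

// Field names below are assumed, not copied from the codebase.
interface HetznerProviderCredentials {
  kind: "hetzner";
  apiToken: string;
}

interface OvhProviderCredentials {
  kind: "ovh";
  applicationKey: string;
  applicationSecret: string;
  consumerKey: string;
}

interface AcmeProviderCredentials {
  kind: "acme";
  apiToken: string;
  // Project-level data stays explicit in the shared type.
  projectId: string;
}

type ProviderCredentials =
  | HetznerProviderCredentials
  | OvhProviderCredentials
  | AcmeProviderCredentials;

// Exhaustive switch: the compiler rejects this function as soon as a kind is
// added to ProviderCredentials without a matching case.
function describeCredentials(credentials: ProviderCredentials): string {
  switch (credentials.kind) {
    case "hetzner":
      return "hetzner token";
    case "ovh":
      return "ovh application key set";
    case "acme":
      return `acme project ${credentials.projectId}`;
    default: {
      const unreachable: never = credentials;
      return unreachable;
    }
  }
}

// Compile-time check that the credential kinds and ProviderKind stay in sync.
const knownKind: ProviderKind = "acme";
```

An exhaustive switch like this is one way to make the compiler point at every place that still assumes the old set of kinds.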

Step 2. Add the provider module contract implementation

Create a provider-module file in src/main/cloud/providers.

Current examples:

src/main/cloud/providers/hetzner-module.ts
src/main/cloud/providers/ovh-module.ts
src/main/cloud/providers/scaleway-module.ts
src/main/cloud/providers/mock-module.ts
src/main/cloud/providers/manual-module.ts

The module implements ProviderModule from src/main/cloud/providers/provider-module.ts.

It owns:

kind
descriptor
capabilities
credential defaults, normalization, resolution, and configured checks
optional catalogPolicy
optional deploymentProfile
optional API-backed adapter surfaces
optional enabled(env) gate

If the provider is disabled by enabled(env), Legion hides its persisted configuration and resources from app snapshots without deleting them.
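
The list of responsibilities above can be sketched as a single object. This is a trimmed-down, hypothetical contract, not the real ProviderModule interface: the method names (defaultCredentials, normalizeCredentials, isConfigured), the acme provider, and the LEGION_ENABLE_ACME variable are all assumptions. Check src/main/cloud/providers/provider-module.ts for the actual shape.

```typescript
type Env = Record<string, string | undefined>;

interface AcmeCredentials {
  kind: "acme";
  apiToken: string;
  projectId: string;
}

// Hypothetical, reduced version of the module contract.
interface ProviderModuleSketch<C> {
  kind: string;
  descriptor: { label: string; credentialFields: string[] };
  capabilities: string[];
  defaultCredentials(): C;
  normalizeCredentials(input: Partial<C>): C;
  isConfigured(credentials: C): boolean;
  enabled?(env: Env): boolean;
}

const acmeModule: ProviderModuleSketch<AcmeCredentials> = {
  kind: "acme",
  descriptor: { label: "Acme Cloud", credentialFields: ["apiToken", "projectId"] },
  capabilities: ["computeCatalog"],
  defaultCredentials: () => ({ kind: "acme", apiToken: "", projectId: "" }),
  // Normalization owns trimming and missing-field handling, so persisted
  // state always has the same shape.
  normalizeCredentials: (input) => ({
    kind: "acme",
    apiToken: (input.apiToken ?? "").trim(),
    projectId: (input.projectId ?? "").trim(),
  }),
  isConfigured: (credentials) =>
    credentials.apiToken.length > 0 && credentials.projectId.length > 0,
  // Optional gate. The variable name is made up for this sketch.
  enabled: (env) => env["LEGION_ENABLE_ACME"] === "1",
};
```

Keeping normalization and the configured check next to each other in the module means the registry and the renderer never need to know which fields a provider requires.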

Step 3. Implement provider API logic in `src/main/cloud/providers`

For API-backed providers, create a provider implementation file following the current pattern:

src/main/cloud/providers/hetzner.ts
src/main/cloud/providers/ovh.ts
src/main/cloud/providers/scaleway.ts

At minimum, the provider implementation should export functions covering the operations used by its module adapter:

descriptor
credential validation
compute catalog loading
DNS catalog loading if supported
compute inventory
compute provisioning
compute start/stop/reboot/destroy if supported
reinstall from image if supported
DNS zone and record operations if supported

The existing provider modules expect functions named in the same style as:

getHetznerProviderDescriptor
validateHetznerCredentials
getHetznerCatalog
provisionHetznerServer
destroyHetznerServer
getOvhProviderDescriptor
validateOvhCredentials
getOvhCatalog
provisionOvhServer
destroyOvhServer

Important rule:

Normalize the provider's API output into Legion's shared catalog and observed-resource contracts in the provider module.
Do not leak provider SDK shapes into the rest of the app.

In practice that means your provider module is responsible for mapping upstream data into:

ProductCatalog
DnsProductCatalog
ProviderObservedServer
ProviderObservedZone
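
A minimal sketch of that mapping, assuming a hypothetical upstream plan shape and a simplified stand-in for the catalog contract. AcmeApiPlan, CatalogOfferSketch, and ProductCatalogSketch are illustrative names; the real ProductCatalog type has its own fields.

```typescript
// Raw shape returned by the hypothetical upstream API.
interface AcmeApiPlan {
  plan_code: string;
  vcpus: number;
  ram_mb: number;
  price_cents_hourly: number;
  regions: string[];
}

// Simplified stand-in for the normalized catalog contract.
interface CatalogOfferSketch {
  id: string;
  label: string;
  cpuCores: number;
  memoryGb: number;
  hourlyPrice: number;
  regions: string[];
}

interface ProductCatalogSketch {
  provider: string;
  offers: CatalogOfferSketch[];
}

function toProductCatalog(plans: AcmeApiPlan[]): ProductCatalogSketch {
  const offers = plans
    // Drop plans that cannot be ordered in any region.
    .filter((plan) => plan.regions.length > 0)
    .map((plan) => ({
      id: plan.plan_code,
      label: `${plan.vcpus} vCPU / ${plan.ram_mb / 1024} GB`,
      cpuCores: plan.vcpus,
      memoryGb: plan.ram_mb / 1024,
      hourlyPrice: plan.price_cents_hourly / 100,
      // Copy before sorting so the upstream response is not mutated.
      regions: [...plan.regions].sort(),
    }))
    // Cheapest first gives the rest of the app a stable order.
    .sort((a, b) => a.hourlyPrice - b.hourlyPrice);
  return { provider: "acme", offers };
}
```

Nothing outside this function sees snake_case fields or cent-based prices, which is the point of the rule above.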

Provider API writes and ambiguous failures

Each provider module should keep a small provider-specific write wrapper, following the current pattern:

hetznerWriteRequest
ovhWriteRequest
scalewayWriteRequest

Every provider mutation should pass through that wrapper:

POST
PUT
PATCH
DELETE

The wrapper is responsible for:

logging the write through OperationLogger.logApiWrite
normalizing provider errors into ApiErrorDetails
using recoverAmbiguousProviderMutation from src/main/cloud/providers/api-resilience.ts
returning the recovered value when a read-back proves that the provider already reached the desired state

Do not blindly retry non-idempotent writes after network errors, timeouts, "fetch failed" errors, or 5xx responses. Those failures are ambiguous: the provider may have accepted the request and then failed while returning the response. Prefer read-after-failure verification.

The verifier passed as verifyAfterAmbiguousError should prove the desired final state, not just prove that a similar resource exists:

create SSH key: find a key with the expected name and public key
create server: find a server with the expected local Legion identity, name, and tags
update firewall: read the firewall and compare the expected metadata/rules
delete resource: treat already-gone/not-found as success

If a write cannot be verified with provider state, leave verifyAfterAmbiguousError out and let the original error bubble. Add a short code comment only when the lack of verification is non-obvious.

Cleanup and delete paths should be absence-driven and tolerant of resources that are already gone. They should not wait for cluster decommission when local provisioning state shows that the node never successfully joined the cluster.
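
The read-after-failure idea can be sketched as below. This is a local stand-in written for illustration, not the real recoverAmbiguousProviderMutation: the ApiFailure shape, the isAmbiguous rule, and the writeWithRecovery name are assumptions. Only the verifyAfterAmbiguousError option name comes from this guide.

```typescript
interface ApiFailure {
  message: string;
  status?: number;
}

function isAmbiguous(error: ApiFailure): boolean {
  // No status means the request died in transit (network error, timeout).
  // A 5xx means the provider may have applied the write before failing.
  return error.status === undefined || error.status >= 500;
}

interface WriteOptions<T> {
  write: () => Promise<T>;
  // Resolves to the final resource when provider state proves the write
  // landed, or to undefined when it does not.
  verifyAfterAmbiguousError?: () => Promise<T | undefined>;
}

async function writeWithRecovery<T>(options: WriteOptions<T>): Promise<T> {
  try {
    return await options.write();
  } catch (caught) {
    const error = caught as ApiFailure;
    if (!isAmbiguous(error) || options.verifyAfterAmbiguousError === undefined) {
      // Definite failure, or nothing to verify against: let it bubble.
      throw caught;
    }
    const recovered = await options.verifyAfterAmbiguousError();
    if (recovered === undefined) {
      // The read-back did not show the desired state. The write is never
      // re-sent here; the caller decides what happens next.
      throw caught;
    }
    return recovered;
  }
}
```

Note that the write function runs exactly once. Recovery comes from reading provider state, never from sending the mutation again.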

Step 4. Register the provider module

Update src/main/cloud/providers/registry.ts.

The registry should stay small. Do not add provider-specific branches there.

You need to:

Import the new provider module.
Add it to PROVIDER_MODULES.

Provider-specific capabilities, credentials, catalog policy, deployment profile, and adapter methods belong in the provider module, not in the registry.
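
A registry that stays this small might look roughly like the sketch below. The ModuleSketch type and the listEnabledModules and findModule helpers are hypothetical; only PROVIDER_MODULES, enabled(env), and LEGION_GUI_MOCK_NETWORK are names taken from this guide.

```typescript
type Env = Record<string, string | undefined>;

interface ModuleSketch {
  kind: string;
  enabled?: (env: Env) => boolean;
}

const hetznerModule: ModuleSketch = { kind: "hetzner" };
const acmeModule: ModuleSketch = { kind: "acme" }; // hypothetical new provider
const mockModule: ModuleSketch = {
  kind: "mock",
  enabled: (env) => env["LEGION_GUI_MOCK_NETWORK"] === "1",
};

// Adding a provider is one import plus one array entry.
// In this sketch the array order doubles as the display order.
const PROVIDER_MODULES: readonly ModuleSketch[] = [hetznerModule, acmeModule, mockModule];

function listEnabledModules(env: Env): ModuleSketch[] {
  // Modules without a gate are always enabled.
  return PROVIDER_MODULES.filter(
    (module) => module.enabled === undefined || module.enabled(env),
  );
}

function findModule(kind: string, env: Env): ModuleSketch | undefined {
  return listEnabledModules(env).find((module) => module.kind === kind);
}
```

There is no switch on provider kind anywhere in the lookup, so a new provider cannot require a registry change beyond the list entry.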

If the provider talks to an API, its adapter must implement CloudProviderAdapter from src/main/cloud/providers/contracts.ts.

That means providing:

verifyCredentials
compute
dns

Read the contract in src/main/cloud/providers/contracts.ts carefully. It is the actual integration boundary.

Development/test-only providers

If the provider is intentionally fake, keep that explicit instead of making it impersonate a real provider.

The current example is mock, displayed as MockKing24. Its provider module is enabled only when the GUI mock network is active:

LEGION_GUI_MOCK_NETWORK=1

For this kind of provider:

still add shared provider types, a provider module, catalog policy, deployment profile, and renderer support if the UI should show it
do not add live catalog or billed E2E coverage
keep credentials local and deterministic if they configure failure modes such as error rate, latency, or seed
make the provider module disabled by default
keep stored mock resources hidden, not deleted, when the mock gate is inactive
keep real-provider tests for real-provider semantics instead of treating the mock as a substitute for Hetzner/OVH/Scaleway coverage
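
For the "local and deterministic" point, one possible approach is to drive failure injection from a seeded generator so the same seed always produces the same failure sequence. The sketch below uses a mulberry32-style PRNG. MockFailureConfig and createFailurePlan are hypothetical names, and nothing here is taken from the actual mock provider.

```typescript
interface MockFailureConfig {
  seed: number;
  // Fraction of calls that should fail, between 0 and 1.
  errorRate: number;
}

// Small seeded PRNG (mulberry32 style). Returns floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Returns a function that answers "should this call fail?".
// Two plans built from the same config give the same answers in order.
function createFailurePlan(config: MockFailureConfig): () => boolean {
  const next = mulberry32(config.seed);
  return () => next() < config.errorRate;
}
```

Because the sequence depends only on the seed, a flaky-looking UI run can be replayed exactly by reusing the stored mock credentials.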

Step 5. Add provider catalog policy

Add the provider's ProviderCatalogPolicy and attach it to the provider module.

Every integrated provider needs a ProviderCatalogPolicy entry:

defaultImage
optional defaultRegion
defaultProductSelector
any catalog metadata enrichment rules

This drives:

default offer selection
default image choice
region defaults
quote generation expectations

Do not leave the new provider out here. selectProvisioningOffer() relies on the normalized catalog shape, but the app also expects a sane default offer for provider-driven flows and tests.

If your provider needs a preferred starter offer, encode it here.

Legion's compute model now treats all instance lines as offers:

fixed Hetzner-style instances are offers with zero variables
configurable providers expose the same offer shape, but may add variables such as diskGb
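
To make the offer model concrete, here is a sketch of a policy plus the defaults it could produce. It assumes defaultProductSelector is a function over offers, which may not match the real ProviderCatalogPolicy type. The image name, region name, offer ids, and resolveDefaults helper are likewise made up.

```typescript
interface OfferVariable {
  id: string;
  min: number;
  max: number;
  defaultValue: number;
}

interface Offer {
  id: string;
  regions: string[];
  // Fixed instances have no variables; configurable ones add entries such as diskGb.
  variables: OfferVariable[];
}

interface CatalogPolicySketch {
  defaultImage: string;
  defaultRegion?: string;
  defaultProductSelector: (offers: Offer[]) => Offer | undefined;
}

const acmeCatalogPolicy: CatalogPolicySketch = {
  defaultImage: "ubuntu-24.04",
  defaultRegion: "eu-west",
  // Prefer the named starter offer, else fall back to the first catalog entry.
  defaultProductSelector: (offers) =>
    offers.find((offer) => offer.id === "starter-2") ?? offers[0],
};

function resolveDefaults(policy: CatalogPolicySketch, offers: Offer[]) {
  const offer = policy.defaultProductSelector(offers);
  if (offer === undefined) return undefined;
  // Use the policy region only when the chosen offer is sold there.
  const region =
    policy.defaultRegion !== undefined && offer.regions.indexOf(policy.defaultRegion) >= 0
      ? policy.defaultRegion
      : offer.regions[0];
  const variables: Record<string, number> = {};
  for (const variable of offer.variables) {
    variables[variable.id] = variable.defaultValue;
  }
  return { offerId: offer.id, region, image: policy.defaultImage, variables };
}
```

A fixed instance and a configurable one go through the same code path. The only difference is whether the variables record ends up empty.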

Step 6. Add provider deployment profile

Add the provider's ProviderDeploymentProfile and attach it to the provider module.

This is intentionally small. Today it carries:

defaultBootMode

NodeDeploymentRuntime uses this profile when the nodes add command does not explicitly pass --boot-mode.

That means a new compute provider must define its deployment profile before the headless node path is usable.

Current examples:

Hetzner: bios
OVH: bios
Scaleway: efi

Keep this provider-specific and minimal. The actual deploy pipeline stays shared.
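
The fallback from an explicit --boot-mode to the profile default can be sketched as follows. The lookup table is a simplification for the example (in Legion the profile is attached to the provider module), and resolveBootMode is a hypothetical helper. The bios and efi defaults are the ones listed above.

```typescript
type BootMode = "bios" | "efi";

interface DeploymentProfileSketch {
  defaultBootMode: BootMode;
}

// Stand-in for "the profile attached to each provider module".
const DEPLOYMENT_PROFILES: Record<string, DeploymentProfileSketch | undefined> = {
  hetzner: { defaultBootMode: "bios" },
  ovh: { defaultBootMode: "bios" },
  scaleway: { defaultBootMode: "efi" },
};

function resolveBootMode(providerKind: string, explicit?: BootMode): BootMode {
  // An explicit --boot-mode always wins.
  if (explicit !== undefined) return explicit;
  const profile = DEPLOYMENT_PROFILES[providerKind];
  if (profile === undefined) {
    // A provider without a profile is not usable on the headless node path.
    throw new Error(`No deployment profile for provider "${providerKind}"`);
  }
  return profile.defaultBootMode;
}
```

Failing loudly for a missing profile is a design choice in this sketch. It surfaces the gap at nodes add time instead of deep inside the deployment.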

Step 7. Make the headless engine accept the provider

Most of the engine is already generic once the registry knows about the provider.

Still review these files for hardcoded provider assumptions:

src/engine/runtime.ts
src/main/product-catalog.ts
src/main/ops/engine-ops.ts
src/main/cloud/provider-broker.ts
src/main/cloud/provider-persistence.ts
src/main/deployment/runtime.ts

Things to verify:

getProviderAdapter(providerKind) works with the new kind
catalog fetches work through ProviderCatalogStore
provisioning can resolve the new provider configuration and offer
the node runtime can choose a deployment profile for the provider

The main rule here is:

if you find a provider-specific branch in orchestration code, remove it if possible
if it must stay, make the new provider explicit there

Step 8. Add CLI support for `provider configure`

The headless CLI still contains provider-specific credential parsing.

Update:

src/main/cli/node-cli-service.ts
src/main/cli/cli.ts

Specifically:

extend ProviderConfigureRequest handling
add CLI flags and env var resolution for the new provider's credentials
extend resolveProviderUpsert

Keep the CLI contract explicit and simple. Follow the current style:

direct flag support
--...-env support
stable default env var names

If the provider is intended to support headless node deployment, this step is mandatory.

For development/test-only UI providers, CLI support is optional. If you skip it, make sure CLI parsing rejects the provider clearly instead of accepting a half-configured provider.
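
The three input styles can be resolved with one small helper. This sketch assumes a precedence of direct flag, then the variable named by the -env flag, then the default variable. The flag name acme-token, the variable ACME_API_TOKEN, and the resolveCredential helper are hypothetical and are not part of the current CLI.

```typescript
type Env = Record<string, string | undefined>;
type ParsedFlags = Record<string, string | undefined>;

interface CredentialFlagSpec {
  // Flag name without leading dashes, for example "acme-token".
  flag: string;
  // Stable default variable used when no flag is passed.
  defaultEnvVar: string;
}

function resolveCredential(spec: CredentialFlagSpec, flags: ParsedFlags, env: Env): string {
  // 1. Direct flag value.
  const direct = flags[spec.flag];
  if (direct !== undefined && direct !== "") return direct;

  // 2. Variable named via --<flag>-env, else 3. the default variable.
  const envName = flags[`${spec.flag}-env`] ?? spec.defaultEnvVar;
  const fromEnv = env[envName];
  if (fromEnv === undefined || fromEnv === "") {
    // Reject clearly instead of accepting a half-configured provider.
    throw new Error(`Missing --${spec.flag}: pass it directly or set ${envName}`);
  }
  return fromEnv;
}
```

Passing env as a parameter rather than reading the process environment inside the helper keeps it easy to unit test.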

Step 9. Update renderer support if the provider should appear in the app UI

If the provider is headless-only for now, you can skip this section.

If it should be visible and editable in the Electron UI, the provider descriptor should do most of the work.

The renderer currently uses:

ProviderDescriptor.credentialFields for the settings provider form
ProviderDescriptor.capabilities for provider lists such as server creation
ProviderDescriptor.defaultProviderId for the provider's stable persisted id
renderer translations keyed by provider kind and credential field id

For a provider with ordinary credential fields, you should not add a provider-specific settings card.

Descriptor-driven provider forms

Review:

src/renderer/src/forms/provider-forms.ts
src/renderer/src/state/settings-actions.ts
src/renderer/src/components/SettingsDialog.svelte
src/renderer/src/state/model.ts
src/renderer/src/state/view-flows.ts

These should remain generic:

settings drafts are stored as ProviderConfigDrafts
credential values are keyed by descriptor field id
the renderer sends { kind, ...fieldValues } as credentials
the main-process provider module normalizes credentials and decides whether they are configured

Only add custom renderer code if the provider needs unusual credential UX that cannot be represented by descriptor fields.
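
A descriptor-driven form needs very little code to turn a draft into a credentials payload. The sketch below is an illustration under assumed shapes: DescriptorSketch, DraftSketch, and buildCredentialsPayload are hypothetical names, and the real ProviderDescriptor has more fields.

```typescript
interface CredentialField {
  id: string;
  secret: boolean;
}

interface DescriptorSketch {
  kind: string;
  credentialFields: CredentialField[];
}

// Draft values keyed by descriptor field id.
type DraftSketch = Record<string, string | undefined>;

function buildCredentialsPayload(
  descriptor: DescriptorSketch,
  draft: DraftSketch,
): Record<string, string> {
  const fieldValues: Record<string, string> = {};
  // Only fields declared by the descriptor are sent; stray draft keys are dropped.
  for (const field of descriptor.credentialFields) {
    fieldValues[field.id] = draft[field.id] ?? "";
  }
  // kind is written last so a field id can never overwrite it.
  return { ...fieldValues, kind: descriptor.kind };
}
```

The renderer does no validation here. Deciding whether the payload counts as configured stays with the main-process provider module.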

Provider selection UI

Review provider selection surfaces if the provider has new capabilities:

src/renderer/src/components/workspace/ServerDialog.svelte
src/renderer/src/components/workspace/DomainDialog.svelte
src/renderer/src/components/workspace/ZoneDialog.svelte

Server creation is descriptor-driven for providers with computeCatalog.

Domain and DNS flows still use narrower domain/DNS form unions where provider semantics are more specific. Extend those deliberately if the new provider supports standard DNS or domain registration.

Manual provider UX

The manual provider does not have credentials. It exposes a static compute catalog and captures server connection details as offer variables:

public IP or hostname
SSH username
password for the initial managed-key bootstrap

Manual server plans copy those offer variables into manualConfig at the main-process planning boundary and assume an Ubuntu-compatible host. The deployment path auto-detects privilege mode from the SSH user: root runs directly, other users are expected to have passwordless sudo.
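
The planning-boundary copy and the privilege rule can be sketched like this. The variable ids (host, sshUser, password), the ManualConfigSketch shape, and the use of sudo -n are assumptions for illustration; the real manualConfig fields and command wrapping may differ.

```typescript
type PrivilegeMode = "direct" | "sudo";

interface ManualConfigSketch {
  host: string;
  sshUser: string;
  password: string;
  privilegeMode: PrivilegeMode;
}

function detectPrivilegeMode(sshUser: string): PrivilegeMode {
  // root runs commands directly; any other user goes through sudo.
  return sshUser === "root" ? "direct" : "sudo";
}

function toManualConfig(variables: Record<string, string | undefined>): ManualConfigSketch {
  const host = variables["host"];
  const sshUser = variables["sshUser"];
  if (!host || !sshUser) {
    throw new Error("manual offers need host and sshUser variables");
  }
  return {
    host,
    sshUser,
    password: variables["password"] ?? "",
    privilegeMode: detectPrivilegeMode(sshUser),
  };
}

function wrapCommand(sshUser: string, command: string): string {
  // sudo -n fails instead of prompting, which fits the passwordless-sudo expectation.
  return detectPrivilegeMode(sshUser) === "direct" ? command : `sudo -n ${command}`;
}
```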

Manual firewall handling is also user-mediated. The firewall planner keeps manual firewall reconciliation out of provider mutation paths; it does not silently pretend provider firewall reconciliation happened.

Translations

Update renderer i18n files for:

provider labels in src/renderer/src/i18n/locales/
any provider-specific credential field labels that are not already covered by generic field ids
any UI copy you add

All user-facing strings must be added for:

en-GB
de-DE
es-ES

Step 10. Add live catalog smoke coverage

Every integrated provider should have a read-only live test before destructive node deployment coverage.

Current examples:

tests/integration/hcloud.catalog.test.ts
tests/integration/ovh.smoke.test.ts
tests/integration/scaleway.smoke.test.ts

Add a provider-specific integration smoke test that validates at least:

authentication works
the compute catalog loads
the DNS catalog loads, if supported
the normalized catalog has sensible defaults
selectProvisioningOffer() works with the provider's IDs

If the provider depends on a generated client or provider SDK schema surface, check the API endpoints you rely on directly in this test, similar to OVH's schema assertions.

Then wire it into package.json:

add a dedicated script
include it in test:integration:live if appropriate

Also update TESTING.md.

Step 11. Add destructive headless node deployment coverage

Do not create a new test harness per provider.

Use the shared live harness:

tests/billed/e2e/cli-live-test-harness.ts

Create only a thin provider wrapper, following:

tests/billed/e2e/hetzner-cli.test.ts
tests/billed/e2e/ovh-cli.test.ts
tests/billed/e2e/scaleway-cli.test.ts

Your wrapper should provide:

required credential env vars
optional provider-specific env vars such as instance/offer selection or boot mode
providerConfigureArgs
providerEnv
provider label/kind
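
As an illustration of how thin such a wrapper can be, the sketch below collects those inputs from the environment and reports which variables are missing so the test can skip cleanly. The LiveProviderWrapper shape, the ACME_* variable names, and the flag names are hypothetical. Only providerConfigureArgs and providerEnv are names used by this guide, and the real harness API should be taken from cli-live-test-harness.ts.

```typescript
type Env = Record<string, string | undefined>;

interface LiveProviderWrapper {
  label: string;
  kind: string;
  providerConfigureArgs: string[];
  providerEnv: Record<string, string>;
}

type WrapperResult =
  | { ok: true; wrapper: LiveProviderWrapper }
  | { ok: false; missing: string[] };

// Hypothetical credential variables for the example provider.
const REQUIRED_ENV = ["ACME_API_TOKEN", "ACME_PROJECT_ID"];

function buildAcmeWrapper(env: Env): WrapperResult {
  const missing = REQUIRED_ENV.filter((name) => !env[name]);
  if (missing.length > 0) return { ok: false, missing };

  const providerEnv: Record<string, string> = {};
  for (const name of REQUIRED_ENV) {
    providerEnv[name] = env[name] as string;
  }
  return {
    ok: true,
    wrapper: {
      label: "Acme Cloud",
      kind: "acme",
      // Secrets are referenced by variable name so they never appear in argv.
      providerConfigureArgs: [
        "--acme-token-env", "ACME_API_TOKEN",
        "--acme-project-id-env", "ACME_PROJECT_ID",
      ],
      providerEnv,
    },
  };
}
```

Everything else (isolated HOME, node lifecycle, cleanup) stays in the shared harness, which is why the wrapper has no logic beyond input collection.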

The harness already handles:

isolated HOME
config init
provider configure
nodes add
nodes list --json
nodes destroy
live stdout/stderr streaming
LEGION_CC_E2E_KEEP_INSTANCE=1
exporting the managed SSH key and printing an SSH command

Also update:

package.json
TESTING.md

Step 12. Validate the provider step by step

Do not jump straight to the destructive E2E.

Use this order.

1. Static verification

Run:

npm run lint
npm run typecheck:node

If the renderer was touched:

npm run typecheck

2. Mocked/local tests

Run the most relevant existing suites:

npm run test:unit
npm run test:integration:mock

At minimum, make sure any provider-related unit tests still pass:

tests/unit/provider-broker.test.ts
tests/unit/cli.test.ts
tests/unit/node-deployment-runtime.test.ts

Add targeted unit tests if the new provider introduces special selection or credential logic.

For provider mutations, also add tests around ambiguous errors for the high-risk paths:

create server
delete server
create managed SSH key
delete managed SSH key
firewall/security-group create, update, and delete when supported

The expected behaviour is not "retry until it works"; it is "read provider state and accept success only when the desired final state is visible".

3. Live catalog smoke test

Run the new provider's read-only live integration test first.

This should confirm:

credentials are valid
catalog normalization works
a default offer is selectable

Only after this passes should you attempt real deployment.

4. Destructive headless E2E

Run the provider's live CLI test.

For a new provider, do the first debug run with:

export LEGION_CC_E2E_KEEP_INSTANCE=1

That keeps the VM and prints the exported managed SSH key path and a ready-to-use ssh command.

This is the final validation target for the provider addition.

Full checklist

Use this as the compact review list before calling the provider done.

src/shared/app.ts
src/main/state/store.ts
src/main/state/schema.ts
src/main/cloud/providers/<provider>.ts
src/main/cloud/providers/<provider>-module.ts
src/main/cloud/providers/registry.ts
src/shared/provider-catalog-policy.ts
src/main/cli/cli.ts
src/main/cli/node-cli-service.ts
renderer form/model/i18n files if the provider should appear in the UI
tests/integration/<provider>.smoke.test.ts
tests/billed/e2e/<provider>-cli.test.ts
package.json
TESTING.md

Current gotchas

These are easy to miss in the current codebase.

ProviderKind and related unions are duplicated by design across several helper types. Update all affected unions, not just the top-level one.
Provider defaults and credential normalization live in the provider module. If persisted state behaves strangely, check the module's credential contract first.
node-cli-service.ts still hardcodes credential/env handling for supported providers.
The renderer still contains explicit provider branches, especially where DNS/domain support differs by provider. A provider can be fully usable headlessly before the UI knows about it.
The shared deployment path is generic, but boot defaults are provider-specific via the provider module's deployment profile.
New provider tests should reuse cli-live-test-harness.ts, not fork it.

Suggested development order

If you want the shortest path to a working integration:

shared types
provider module
registry
state normalization/defaults
catalog policy
deployment profile
CLI provider configure
read-only live smoke test
destructive live CLI E2E
renderer support

That gets the infrastructure path working first and keeps UI work separate.
