# Adding a Provider

This document explains how to add a new provider to Legion's current architecture.

It is written against the current codebase, not an idealized future state. That means it includes the boring synchronization work too: shared types, provider-module wiring, catalog policy, headless CLI, optional renderer support, tests, and docs.

The fastest way to get this right is:

1. Treat OVH as the current reference for a "full integrated provider".
2. Treat Hetzner as the simpler reference for a "well-exercised provider".
3. Reuse the shared deployment/runtime path. Do not create a separate deployment implementation for the new provider.

## Scope

Legion has provider modules with different surfaces:

- API-backed providers expose cloud surfaces such as compute, DNS, domain, or firewall adapters.
- `manual` is also a provider module. Its implementation is user-mediated instead of API-mediated, so it currently has metadata, manual server import, and user-task firewall capabilities but no cloud adapter.

This guide is mostly for adding a new API-backed cloud provider, meaning:

- real credentials
- compute catalog support
- compute provisioning
- optional DNS service integration
- shared node deployment via the existing NBDE path

## Current architecture

The important boundary is:

- provider-specific API logic lives in `src/main/cloud/providers/*.ts`
- each provider registers through a `ProviderModule` from `src/main/cloud/providers/*-module.ts`
- shared provider mutation recovery lives in `src/main/cloud/providers/api-resilience.ts`
- provider lookup, ordering, and enable/disable filtering live in `src/main/cloud/providers/registry.ts`
- shared orchestration stays generic in:
  - `src/main/ops/engine-ops.ts`
  - `src/main/resources/*/reconcile.ts`
  - `src/main/cloud/provider-broker.ts`
  - `src/main/cloud/provider-persistence.ts`
  - `src/main/deployment/runtime.ts`
  - `src/main/deployment/service.ts`
- shared types and provider unions live in `src/shared/app.ts`
- catalog policy and deployment defaults are attached to the provider module
- catalog policy types live in `src/shared/provider-catalog-policy.ts`
- deployment profile types live in `src/shared/provider-deployment-policy.ts`
- the headless engine surface lives in `src/engine/runtime.ts`
- CLI wiring currently goes through:
  - `src/main/cli/cli.ts`
  - `src/main/cli/node-cli-service.ts`

That means adding a provider should mostly involve:

- one new provider API implementation file, if it talks to an external API
- one new provider module file
- adding that module to the registry's module list
- provider credential surface wiring
- tests

The deployment flow after the VM exists should remain shared.

## Step 1. Extend the shared provider model

Start in `src/shared/app.ts`.
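
To make the union work concrete, here is a minimal sketch that assumes a hypothetical provider kind `examplecloud`. The credential shapes are simplified stand-ins. The Hetzner and OVH fields shown are illustrative, not copied from `src/shared/app.ts`. `assertNever` and `describeCredentials` are hypothetical helpers. The useful part is the exhaustive `switch`. Once a kind joins the union, any `switch` that ends in a `never` check stops type-checking until the matching `case` is added.

```typescript
// Hypothetical sketch: "examplecloud" is a placeholder provider kind, and the
// shapes below are simplified stand-ins for the types in src/shared/app.ts.
interface HetznerProviderCredentials {
  kind: "hetzner";
  apiToken: string; // illustrative field
}

interface OvhProviderCredentials {
  kind: "ovh";
  endpoint: string; // illustrative field
}

interface ExampleCloudProviderCredentials {
  kind: "examplecloud";
  apiToken: string;
  // Project-level data stays explicit and stable in the shared type.
  projectId: string;
}

type ProviderCredentials =
  | HetznerProviderCredentials
  | OvhProviderCredentials
  | ExampleCloudProviderCredentials;

// Reached only from the default branch. If a kind is added to the union
// without a matching case, `value` is no longer `never` and type-checking fails.
function assertNever(value: never): never {
  throw new Error(`Unhandled provider credentials: ${JSON.stringify(value)}`);
}

function describeCredentials(credentials: ProviderCredentials): string {
  switch (credentials.kind) {
    case "hetzner":
      return "hetzner";
    case "ovh":
      return `ovh via ${credentials.endpoint}`;
    case "examplecloud":
      return `examplecloud project ${credentials.projectId}`;
    default:
      return assertNever(credentials);
  }
}
```

The unions are duplicated across several helper types. This pattern is one cheap way to let `npm run typecheck:node` find the branches you would otherwise miss.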

Update the provider unions first:

- `ProviderKind`
- `IntegratedProviderKind`
- any derived unions that should include the new provider:
  - `ServerProviderKind`
  - `StandardDnsProviderKind`

Then add the new provider credential interface and include it in `ProviderCredentials`.

Use OVH and Hetzner as the shape references:

- `HetznerProviderCredentials`
- `OvhProviderCredentials`

You will also want to review any request types that implicitly assume the existing kinds, especially:

- `ProductCatalogRequest`
- `ProviderConfig`
- `ProviderDescriptor`

If the provider needs additional project-level data like `projectId`, keep it explicit and stable in the shared type.

## Step 2. Add the provider module contract implementation

Create a provider-module file in `src/main/cloud/providers`.

Current examples:

- `src/main/cloud/providers/hetzner-module.ts`
- `src/main/cloud/providers/ovh-module.ts`
- `src/main/cloud/providers/scaleway-module.ts`
- `src/main/cloud/providers/mock-module.ts`
- `src/main/cloud/providers/manual-module.ts`

The module implements `ProviderModule` from `src/main/cloud/providers/provider-module.ts`. It owns:

- `kind`
- `descriptor`
- `capabilities`
- credential defaults, normalization, resolution, and configured checks
- optional `catalogPolicy`
- optional `deploymentProfile`
- optional API-backed adapter surfaces
- optional `enabled(env)` gate

If the provider is disabled by `enabled(env)`, Legion hides its persisted configuration and resources from app snapshots without deleting them.

## Step 3. Implement provider API logic in `src/main/cloud/providers`

For API-backed providers, create a provider implementation file following the current pattern:

- `src/main/cloud/providers/hetzner.ts`
- `src/main/cloud/providers/ovh.ts`
- `src/main/cloud/providers/scaleway.ts`

At minimum, the provider implementation should export functions covering the operations used by its module adapter:

- descriptor
- credential validation
- compute catalog loading
- DNS catalog loading, if supported
- compute inventory
- compute provisioning
- compute start/stop/reboot/destroy, if supported
- reinstall from image, if supported
- DNS zone and record operations, if supported

The existing provider implementations export functions in the same style as:

- `getHetznerProviderDescriptor`
- `validateHetznerCredentials`
- `getHetznerCatalog`
- `provisionHetznerServer`
- `destroyHetznerServer`
- `getOvhProviderDescriptor`
- `validateOvhCredentials`
- `getOvhCatalog`
- `provisionOvhServer`
- `destroyOvhServer`

Important rule:

- Normalize the provider's API output into Legion's shared catalog and observed-resource contracts in the provider module.
- Do not leak provider SDK shapes into the rest of the app.
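
Here is a minimal sketch of the normalization rule for compute inventory. It assumes a hypothetical `examplecloud` API. `ExampleCloudApiServer` is an invented upstream payload. `ObservedServerSketch` is a simplified stand-in for `ProviderObservedServer`, so take the real field names from the shared types.

```typescript
// Hypothetical upstream payload from an "examplecloud" API. Only the provider
// module should ever see this shape.
interface ExampleCloudApiServer {
  uuid: string;
  display_name: string;
  power_state: "RUNNING" | "STOPPED" | "PROVISIONING" | "ERROR";
  networks: { type: "public" | "private"; ipv4: string }[];
  tags: Record<string, string>;
}

// Simplified stand-in for ProviderObservedServer; the real contract differs.
interface ObservedServerSketch {
  providerKind: "examplecloud";
  providerResourceId: string;
  name: string;
  status: "running" | "stopped" | "pending" | "failed";
  publicIpv4: string | null;
  tags: Record<string, string>;
}

// Exhaustive over every upstream state, so an unmapped state fails type-checking.
const STATUS_MAP: Record<ExampleCloudApiServer["power_state"], ObservedServerSketch["status"]> = {
  RUNNING: "running",
  STOPPED: "stopped",
  PROVISIONING: "pending",
  ERROR: "failed",
};

function normalizeExampleCloudServer(server: ExampleCloudApiServer): ObservedServerSketch {
  const publicNetwork = server.networks.find((network) => network.type === "public");
  return {
    providerKind: "examplecloud",
    providerResourceId: server.uuid,
    name: server.display_name,
    status: STATUS_MAP[server.power_state],
    publicIpv4: publicNetwork?.ipv4 ?? null,
    // Copy so callers cannot mutate the upstream object through the result.
    tags: { ...server.tags },
  };
}
```

The status mapping is an exhaustive `Record`. A new upstream state therefore fails type-checking instead of leaking an unmapped string into shared code.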

In practice that means your provider module is responsible for mapping upstream data into:

- `ProductCatalog`
- `DnsProductCatalog`
- `ProviderObservedServer`
- `ProviderObservedZone`

### Provider API writes and ambiguous failures

Each provider module should keep a small provider-specific write wrapper, following the current pattern:

- `hetznerWriteRequest`
- `ovhWriteRequest`
- `scalewayWriteRequest`

Every provider mutation should pass through that wrapper:

- `POST`
- `PUT`
- `PATCH`
- `DELETE`

The wrapper is responsible for:

- logging the write through `OperationLogger.logApiWrite`
- normalizing provider errors into `ApiErrorDetails`
- using `recoverAmbiguousProviderMutation` from `src/main/cloud/providers/api-resilience.ts`
- returning the recovered value when a read-back proves that the provider already reached the desired state

Do not blindly retry non-idempotent writes after network errors, timeouts, `fetch failed`, or 5xx responses. Those errors are ambiguous: the provider may have accepted the request and then failed, or lost the connection, while returning the response.

Prefer read-after-failure verification. The verifier passed as `verifyAfterAmbiguousError` should prove the desired final state, not just prove that a similar resource exists:

- create SSH key: find a key with the expected name and public key
- create server: find a server with the expected local Legion identity, name, and tags
- update firewall: read the firewall and compare the expected metadata/rules
- delete resource: treat already-gone/not-found as success

If a write cannot be verified with provider state, leave `verifyAfterAmbiguousError` out and let the original error bubble up. Add a short code comment only when the lack of verification is non-obvious.

Cleanup and delete paths should be absence-driven and tolerant of resources that are already gone. They should not wait for cluster decommission when local provisioning state shows that the node never successfully joined the cluster.

## Step 4. Register the provider module

Update `src/main/cloud/providers/registry.ts`.

The registry should stay small. Do not add provider-specific branches there.

You need to:

1. Import the new provider module.
2. Add it to `PROVIDER_MODULES`.

Provider-specific capabilities, credentials, catalog policy, deployment profile, and adapter methods belong in the provider module, not in the registry.

If the provider talks to an API, its `adapter` must implement `CloudProviderAdapter` from `src/main/cloud/providers/contracts.ts`. That means providing:

- `verifyCredentials`
- `compute`
- `dns`

Read the contract in `src/main/cloud/providers/contracts.ts` carefully. It is the actual integration boundary.

### Development/test-only providers

If the provider is intentionally fake, keep that explicit instead of making it impersonate a real provider.

The current example is `mock`, displayed as `MockKing24`. Its provider module is enabled only when the GUI mock network is active:

- `LEGION_GUI_MOCK_NETWORK=1`

For this kind of provider:

- still add shared provider types, a provider module, catalog policy, deployment profile, and renderer support if the UI should show it
- do not add live catalog or billed E2E coverage
- keep credentials local and deterministic if they configure failure modes such as error rate, latency, or seed
- make the provider module disabled by default
- keep stored mock resources hidden, not deleted, when the mock gate is inactive
- keep real-provider tests for real-provider semantics instead of treating the mock as a substitute for Hetzner/OVH/Scaleway coverage

## Step 5. Add provider catalog policy

Add the provider's `ProviderCatalogPolicy` and attach it to the provider module.
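
Here is a sketch of what a policy for a hypothetical `examplecloud` provider could look like, using simplified stand-in types. The real `ProviderCatalogPolicy` in `src/shared/provider-catalog-policy.ts` may represent `defaultProductSelector` differently. Modelling it as a function over offers is an assumption here, as are the image and region values.

```typescript
// Hypothetical, simplified stand-ins for catalog offers and ProviderCatalogPolicy;
// the real types may differ.
interface OfferSketch {
  id: string;
  region: string;
  vcpus: number;
  memoryGb: number;
  monthlyPrice: number;
  // Fixed instances have no variables; configurable offers may list e.g. "diskGb".
  variables: string[];
}

interface CatalogPolicySketch {
  defaultImage: string;
  defaultRegion?: string;
  defaultProductSelector: (offers: OfferSketch[]) => OfferSketch | undefined;
}

// Placeholder values for a hypothetical "examplecloud" provider.
const EXAMPLECLOUD_DEFAULT_REGION = "eu-west-1";

const exampleCloudCatalogPolicy: CatalogPolicySketch = {
  defaultImage: "ubuntu-24.04",
  defaultRegion: EXAMPLECLOUD_DEFAULT_REGION,
  // Starter offer: cheapest offer in the default region with >= 2 vCPUs and >= 4 GB RAM.
  // Returns undefined when nothing matches.
  defaultProductSelector: (offers) =>
    offers
      .filter((offer) => offer.region === EXAMPLECLOUD_DEFAULT_REGION)
      .filter((offer) => offer.vcpus >= 2 && offer.memoryGb >= 4)
      .sort((a, b) => a.monthlyPrice - b.monthlyPrice)[0],
};

// The policy hangs off the (hypothetical) provider module, not the registry.
const exampleCloudModuleSketch = {
  kind: "examplecloud" as const,
  catalogPolicy: exampleCloudCatalogPolicy,
};
```

Fixed and configurable instances share one offer shape. The selector therefore needs no separate path for offers that carry variables such as `diskGb`.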

Every integrated provider needs a `ProviderCatalogPolicy` entry:

- `defaultImage`
- optional `defaultRegion`
- `defaultProductSelector`
- any catalog metadata enrichment rules

This drives:

- default offer selection
- default image choice
- region defaults
- quote generation expectations

Do not leave the new provider out here. `selectProvisioningOffer()` relies on the normalized catalog shape, but the app also expects a sane default offer for provider-driven flows and tests.

If your provider needs a preferred starter offer, encode it here.

Legion's compute model now treats all instance lines as offers:

- fixed Hetzner-style instances are offers with zero variables
- configurable providers expose the same offer shape, but may add variables such as `diskGb`

## Step 6. Add provider deployment profile

Add the provider's `ProviderDeploymentProfile` and attach it to the provider module.

This is intentionally small right now. Today it carries:

- `defaultBootMode`

`NodeDeploymentRuntime` uses this profile when `nodes add` does not explicitly pass `--boot-mode`.

That means a new compute provider must define its deployment profile before the headless node path is usable.

Current examples:

- Hetzner: `bios`
- OVH: `bios`
- Scaleway: `efi`

Keep this provider-specific and minimal. The actual deploy pipeline stays shared.

## Step 7. Make the headless engine accept the provider

Most of the engine is already generic once the registry knows about the provider.
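
The generic path this step relies on can be sketched as two operations, under simplifying assumptions. First, look up an enabled module by kind. Second, let an explicit `--boot-mode` win over the module's deployment profile. `ModuleSketch`, `findEnabledModule`, and `resolveBootMode` are hypothetical names, not the real registry or `NodeDeploymentRuntime` API. The Hetzner, OVH, and Scaleway boot modes come from Step 6. The `examplecloud` entry is a placeholder.

```typescript
// Hypothetical sketch of the generic lookup path. ModuleSketch is a simplified
// stand-in for ProviderModule; "examplecloud" is a placeholder kind.
type BootMode = "bios" | "efi";
type EnvSketch = Record<string, string | undefined>;

interface ModuleSketch {
  kind: string;
  enabled?: (env: EnvSketch) => boolean;
  deploymentProfile?: { defaultBootMode: BootMode };
}

const MODULES_SKETCH: ModuleSketch[] = [
  { kind: "hetzner", deploymentProfile: { defaultBootMode: "bios" } },
  { kind: "ovh", deploymentProfile: { defaultBootMode: "bios" } },
  { kind: "scaleway", deploymentProfile: { defaultBootMode: "efi" } },
  { kind: "examplecloud", deploymentProfile: { defaultBootMode: "efi" } },
  // Dev/test-only module gated on the GUI mock network; its profile is omitted here.
  { kind: "mock", enabled: (env) => env["LEGION_GUI_MOCK_NETWORK"] === "1" },
];

// In this sketch, modules without an enabled() gate count as enabled, and
// gated-off modules are simply skipped by the lookup.
function findEnabledModule(kind: string, env: EnvSketch): ModuleSketch | undefined {
  return MODULES_SKETCH.find(
    (providerModule) => providerModule.kind === kind && (providerModule.enabled?.(env) ?? true),
  );
}

// An explicit --boot-mode wins; otherwise fall back to the deployment profile.
function resolveBootMode(kind: string, env: EnvSketch, explicitBootMode?: BootMode): BootMode {
  const providerModule = findEnabledModule(kind, env);
  if (!providerModule) {
    throw new Error(`Unknown or disabled provider: ${kind}`);
  }
  const bootMode = explicitBootMode ?? providerModule.deploymentProfile?.defaultBootMode;
  if (!bootMode) {
    throw new Error(`Provider ${kind} has no deployment profile; pass --boot-mode explicitly`);
  }
  return bootMode;
}
```

Neither function branches on a specific provider. That is the property to look for when reviewing orchestration code.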

Still, review these files for hardcoded provider assumptions:

- `src/engine/runtime.ts`
- `src/main/product-catalog.ts`
- `src/main/ops/engine-ops.ts`
- `src/main/cloud/provider-broker.ts`
- `src/main/cloud/provider-persistence.ts`
- `src/main/deployment/runtime.ts`

Things to verify:

- `getProviderAdapter(providerKind)` works with the new kind
- catalog fetches work through `ProviderCatalogStore`
- provisioning can resolve the new provider configuration and offer
- the node runtime can choose a deployment profile for the provider

The main rule here is:

- if you find a provider-specific branch in orchestration code, remove it if possible
- if it must stay, make the new provider explicit there

## Step 8. Add CLI support for `provider configure`

The headless CLI still contains provider-specific credential parsing.

Update:

- `src/main/cli/node-cli-service.ts`
- `src/main/cli/cli.ts`

Specifically:

- extend `ProviderConfigureRequest` handling
- add CLI flags and env var resolution for the new provider's credentials
- extend `resolveProviderUpsert`

Keep the CLI contract explicit and simple. Follow the current style:

- direct flag support
- `--...-env` support
- stable default env var names

If the provider is intended to support headless node deployment, this step is mandatory.

For development/test-only UI providers, CLI support is optional. If you skip it, make sure CLI parsing rejects the provider clearly instead of accepting a half-configured provider.

## Step 9. Update renderer support if the provider should appear in the app UI

If the provider is headless-only for now, you can skip this section.

If it should be visible and editable in the Electron UI, the provider descriptor should do most of the work.
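
Here is a minimal sketch of what "descriptor-driven" means in practice. It uses simplified stand-ins for `ProviderDescriptor` and its credential fields. `DescriptorSketch`, the `secret` hint, and `buildCredentialsPayload` are hypothetical. The sketch builds the `{ kind, ...fieldValues }` payload from field ids alone, so a new provider needs no renderer branch.

```typescript
// Hypothetical, simplified stand-ins for ProviderDescriptor and its credential
// fields; the real types live in src/shared/app.ts.
interface CredentialFieldSketch {
  id: string;
  secret?: boolean; // illustrative UI hint
}

interface DescriptorSketch {
  kind: string;
  defaultProviderId: string;
  credentialFields: CredentialFieldSketch[];
}

const exampleCloudDescriptor: DescriptorSketch = {
  kind: "examplecloud",
  defaultProviderId: "examplecloud",
  credentialFields: [{ id: "apiToken", secret: true }, { id: "projectId" }],
};

// Generic renderer step: values are keyed by descriptor field id and sent as
// { kind, ...fieldValues }. No trimming or "is configured" decision happens
// here; the main-process provider module owns normalization.
function buildCredentialsPayload(
  descriptor: DescriptorSketch,
  draft: Record<string, string | undefined>,
): Record<string, string> {
  const fieldValues: Record<string, string> = {};
  for (const field of descriptor.credentialFields) {
    const value = draft[field.id];
    // Only fields the descriptor declares are sent; stale draft keys are dropped.
    if (value !== undefined) {
      fieldValues[field.id] = value;
    }
  }
  return { kind: descriptor.kind, ...fieldValues };
}
```

The sketch deliberately leaves values untrimmed and skips any "configured" check. Those decisions belong to the main-process provider module.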

The renderer currently uses:

- `ProviderDescriptor.credentialFields` for the settings provider form
- `ProviderDescriptor.capabilities` for provider lists such as server creation
- `ProviderDescriptor.defaultProviderId` for the provider's stable persisted id
- renderer translations keyed by provider kind and credential field id

For a provider with ordinary credential fields, you should not add a provider-specific settings card.

### Descriptor-driven provider forms

Review:

- `src/renderer/src/forms/provider-forms.ts`
- `src/renderer/src/state/settings-actions.ts`
- `src/renderer/src/components/SettingsDialog.svelte`
- `src/renderer/src/state/model.ts`
- `src/renderer/src/state/view-flows.ts`

These should remain generic:

- settings drafts are stored as `ProviderConfigDrafts`
- credential values are keyed by descriptor field id
- the renderer sends `{ kind, ...fieldValues }` as credentials
- the main-process provider module normalizes credentials and decides whether they are configured

Only add custom renderer code if the provider needs unusual credential UX that cannot be represented by descriptor fields.

### Provider selection UI

Review provider selection surfaces if the provider has new capabilities:

- `src/renderer/src/components/workspace/ServerDialog.svelte`
- `src/renderer/src/components/workspace/DomainDialog.svelte`
- `src/renderer/src/components/workspace/ZoneDialog.svelte`

Server creation is descriptor-driven for providers with `computeCatalog`. Domain and DNS flows still use narrower domain/DNS form unions where provider semantics are more specific. Extend those deliberately if the new provider supports standard DNS or domain registration.

### Manual provider UX

The manual provider does not have credentials.

It exposes a static compute catalog and captures server connection details as offer variables:

- public IP or hostname
- SSH username
- password for the initial managed-key bootstrap

Manual server plans copy those offer variables into `manualConfig` at the main-process planning boundary and assume an Ubuntu-compatible host. The deployment path auto-detects privilege mode from the SSH user: `root` runs directly, while other users are expected to have passwordless sudo.

Manual firewall handling is also user-mediated. The firewall planner keeps manual firewall reconciliation out of provider mutation paths; it does not silently pretend that provider firewall reconciliation happened.

### Translations

Update renderer i18n files for:

- provider labels in `src/renderer/src/i18n/locales/`
- any provider-specific credential field labels that are not already covered by generic field ids
- any UI copy you add

All user-facing strings must be added for:

- `en-GB`
- `de-DE`
- `es-ES`

## Step 10. Add live catalog smoke coverage

Every integrated provider should have a read-only live test before destructive node deployment coverage.

Current examples:

- `tests/integration/hcloud.catalog.test.ts`
- `tests/integration/ovh.smoke.test.ts`
- `tests/integration/scaleway.smoke.test.ts`

Add a provider-specific integration smoke test that validates at least:

1. authentication works
2. the compute catalog loads
3. the DNS catalog loads, if supported
4. the normalized catalog has sensible defaults
5. `selectProvisioningOffer()` works with the provider's IDs

If the provider depends on a generated client or provider SDK schema surface, check the API endpoints you rely on directly in this test, similar to OVH's schema assertions.

Then wire it into `package.json`:

- add a dedicated script
- include it in `test:integration:live` if appropriate

Also update `TESTING.md`.

## Step 11. Add destructive headless node deployment coverage

Do not create a new test harness per provider.

Use the shared live harness:

- `tests/billed/e2e/cli-live-test-harness.ts`

Create only a thin provider wrapper, following:

- `tests/billed/e2e/hetzner-cli.test.ts`
- `tests/billed/e2e/ovh-cli.test.ts`
- `tests/billed/e2e/scaleway-cli.test.ts`

Your wrapper should provide:

- required credential env vars
- optional provider-specific env vars such as instance/offer selection or boot mode
- `providerConfigureArgs`
- `providerEnv`
- provider label/kind

The harness already handles:

- isolated `HOME`
- `config init`
- `provider configure`
- `nodes add`
- `nodes list --json`
- `nodes destroy`
- live stdout/stderr streaming
- `LEGION_CC_E2E_KEEP_INSTANCE=1`
- exporting the managed SSH key and printing an SSH command

Also update:

- `package.json`
- `TESTING.md`

## Step 12. Validate the provider step by step

Do not jump straight to the destructive E2E. Use this order.

### 1. Static verification

Run:

```bash
npm run lint
npm run typecheck:node
```

If the renderer was touched:

```bash
npm run typecheck
```

### 2. Mocked/local tests

Run the most relevant existing suites:

```bash
npm run test:unit
npm run test:integration:mock
```

At minimum, make sure any provider-related unit tests still pass:

- `tests/unit/provider-broker.test.ts`
- `tests/unit/cli.test.ts`
- `tests/unit/node-deployment-runtime.test.ts`

Add targeted unit tests if the new provider introduces special selection or credential logic.

For provider mutations, also add tests around ambiguous errors for the high-risk paths:

- create server
- delete server
- create managed SSH key
- delete managed SSH key
- firewall/security-group create, update, and delete when supported

The expected behaviour is not "retry until it works"; it is "read provider state and accept success only when the desired final state is visible".

### 3. Live catalog smoke test

Run the new provider's read-only live integration test first.

This should confirm:

- credentials are valid
- catalog normalization works
- a default offer is selectable

Only after this passes should you attempt real deployment.

### 4. Destructive headless E2E

Run the provider's live CLI test.

For a new provider, make the first run a debug run with:

```bash
export LEGION_CC_E2E_KEEP_INSTANCE=1
```

That keeps the VM and prints the exported managed SSH key path and a ready-to-use `ssh` command.

This is the final validation target for the provider addition.

## Full checklist

Use this as the compact review list before calling the provider done.

- `src/shared/app.ts`
- `src/main/state/store.ts`
- `src/main/state/schema.ts`
- `src/main/cloud/providers/.ts`
- `src/main/cloud/providers/-module.ts`
- `src/main/cloud/providers/registry.ts`
- `src/shared/provider-catalog-policy.ts`
- `src/main/cli/cli.ts`
- `src/main/cli/node-cli-service.ts`
- renderer form/model/i18n files if the provider should appear in the UI
- `tests/integration/.smoke.test.ts`
- `tests/billed/e2e/-cli.test.ts`
- `package.json`
- `TESTING.md`

## Current gotchas

These are easy to miss in the current codebase.

- `ProviderKind` and related unions are duplicated by design across several helper types. Update all affected unions, not just the top-level one.
- Provider defaults and credential normalization live in the provider module. If persisted state behaves strangely, check the module's credential contract first.
- `node-cli-service.ts` still hardcodes credential/env handling for supported providers.
- The renderer still contains explicit provider branches, especially where DNS/domain support differs by provider. A provider can be fully usable headlessly before the UI knows about it.
- The shared deployment path is generic, but boot defaults are provider-specific via the provider module's deployment profile.
- New provider tests should reuse `cli-live-test-harness.ts`, not fork it.

## Suggested development order

If you want the shortest path to a working integration:

1. shared types
2. provider module
3. registry
4. state normalization/defaults
5. catalog policy
6. deployment profile
7. CLI provider configure
8. read-only live smoke test
9. destructive live CLI E2E
10. renderer support

That gets the infrastructure path working first and keeps UI work separate.