# Kobold Architecture

This document describes the intended Kobold architecture: dynamic datasets with schema-as-data, commit-oriented history, fast local projections, Trust-gated tribe-to-tribe exchange, and generic GraphQL/repo-style APIs.

Kobold is not a SQL table generator and is not a second cluster-sync engine. It is a dynamic data/repo layer built on top of the host primitives that already exist in Tribes:

- **Ash resources** define Kobold's stable internal model.
- **AshPostgres** stores Kobold metadata, commits, refs, and projections.
- **AshNostrSync** syncs Kobold resources across nodes in the same tribe.
- **Parrhesia/Nostr** carries signed sync events and fast backfill.
- **Trust + Tribes.Access** decide which other tribes may discover, read, write, or propose changes.

The core invariant is:

> Kobold visibility never limits same-tribe cluster sync. Visibility only limits
> external tribe access.

A private dataset, private commit, or private draft is still replicated inside the local tribe for durability and multi-node consistency. It is simply not advertised or exported to other tribes unless an access rule explicitly allows that.

## Goals

Kobold should support this user flow:

1. A user discovers a remote dataset.
2. The user clones it into their tribe.
3. The user explores it with the generic Kobold UI/API.
4. The user adds fields/columns dynamically.
5. The user makes private local drafts.
6. The user installs a supportive plugin later for richer domain-specific UI.
7. The same logical dataset remains in place.
8. The user prepares a proposal/PR back to the upstream tribe.

This requires dynamic storage. User-defined columns cannot require new Elixir modules, migrations, or host recompilation.
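To make the "no new modules, migrations, or recompilation" requirement concrete, here is a minimal sketch of what treating a column as data could look like. It assumes field definitions are plain maps keyed by field name; the module `KoboldSketch.Schema` and its functions are hypothetical illustrations, not Kobold's actual API.

```elixir
defmodule KoboldSketch.Schema do
  @moduledoc """
  Hypothetical sketch: field definitions are plain maps, so adding a
  column is a data update rather than a migration or a new module.
  """

  # Adds a field definition; refuses to silently overwrite an existing one.
  def add_field(fields, name, definition) when is_map(fields) and is_binary(name) do
    if Map.has_key?(fields, name) do
      {:error, {:field_exists, name}}
    else
      {:ok, Map.put(fields, name, definition)}
    end
  end

  # Returns the names of required fields that a record does not provide.
  def missing_required(fields, record) do
    for {name, %{"required" => true}} <- fields,
        not Map.has_key?(record, name),
        do: name
  end
end
```

Under these assumptions, a user-added column only changes the map passed to `add_field/3`; nothing is compiled and no table is altered.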
## High-level Shape

```text
Dynamic dataset schema
  -> ResourceDefinition fields as data

Local edits
  -> Commit chunks
  -> Bookmark/ref movement
  -> Ref-scoped RecordProjection updates
  -> AshNostrSync syncs within the tribe

External sharing
  -> Kobold catalog/clone/proposal gateway
  -> Trust + access checks
  -> Parrhesia fast event paging/copy
```

Kobold's canonical data is the commit/ref history. Projections are materialized read models.

```text
Dataset
ResourceDefinition
Commit
Bookmark / Ref
Proposal
RecordProjection
```

## Stable Resources

Kobold itself uses a small fixed set of Ash resources. These are normal static Elixir modules and normal AshPostgres-backed tables.

### Dataset

A dataset is the logical repo boundary. It contains stable metadata such as:

- `id`
- `name`
- `description`
- `owner_pubkey`
- `origin_tribe_pubkey`
- `origin_dataset_id`
- `schema_name`
- `schema_version`
- `external_visibility`
- `metadata`

The dataset is the primary unit of:

- discovery
- clone/fork
- access policy
- supportive plugin compatibility
- proposal targeting

Dataset rows are low-churn. Record edits do not rewrite the dataset row.

### ResourceDefinition

A resource definition describes one record type inside a dataset.

Example resources:

- `SeedVariety`
- `SeedLot`
- `Supplier`

Fields are schema-as-data:

```json
{
  "name": { "type": "string", "required": true },
  "germination_days": { "type": "integer" },
  "source": { "type": "string" }
}
```

Adding a column means updating the resource definition and recording that schema change as a commit. It does not require DDL or code generation.

### Commit

A commit is an immutable chunk of dataset operations.

A commit has:

- `id` / commit id
- `change_id`
- `dataset_id`
- `parent_commit_ids`
- `author_user_id`
- `author_pubkey`
- `message`
- `visibility`
- `draft?`
- `operations`
- `metadata`

The `change_id` is the stable identity of an evolving change.
A new version of the same logical change may have a new commit id while preserving the same change id, matching the Jujutsu mental model.

Operations are stored as JSON data. Examples:

```json
{
  "op": "upsert",
  "resource_name": "SeedVariety",
  "record_id": "...",
  "fields": {
    "name": "Black Krim",
    "germination_days": 8
  }
}
```

```json
{
  "op": "delete",
  "resource_name": "SeedVariety",
  "record_id": "..."
}
```

```json
{
  "op": "schema.add_field",
  "resource_name": "SeedVariety",
  "field_name": "source",
  "definition": { "type": "string" }
}
```

Commits are the canonical edit history. There is no separate per-record event table layered on top of AshNostrSync.

### Bookmark / Ref

A bookmark is a named pointer to a commit.

Examples:

```text
main
origin//main
draft/user//
proposal/
```

A bookmark has:

- `dataset_id`
- `name`
- `commit_id`
- `scope`: `main | draft | proposal | remote`
- `visibility`: `private | shared | public`
- `owner_user_id`
- `base_ref_name`
- `metadata`

Bookmarks are the sharing boundary. The existence of a commit does not mean another tribe can fetch it. A remote tribe may fetch only commits reachable from bookmarks/proposals that are externally visible and allowed by policy.

### Proposal

A proposal is a PR-like object that asks another tribe or local maintainer to review and accept a set of commits.

A proposal has:

- `dataset_id`
- `source_ref_name`
- `source_head_commit_id`
- `target_ref_name`
- `author_user_id`
- `author_pubkey`
- `target_tribe_pubkey`
- `status`: `draft | submitted | accepted | rejected | closed`
- `message`
- `metadata`

A proposal normally points at a proposal ref:

```text
proposal/
```

Remote tribes see proposal refs, not private local draft refs.

### RecordProjection

A record projection is a materialized view for reading and UI rendering. It is not the canonical history.

A projection row has:

- `dataset_id`
- `ref_name`
- `base_ref_name`
- `resource_name`
- `record_id`
- `fields`
- `deleted?`
- `head_commit_id`

The uniqueness key is:

```text
dataset_id + ref_name + resource_name + record_id
```

This lets multiple users have independent drafts without overwriting each other's working view.

## Projection Strategy

Kobold uses ref-scoped projections.

### Main projection

`main` is projected as a complete current-state read model:

```text
ref_name = "main"
base_ref_name = null
```

Generic reads default to `main`.

### Draft/proposal projections

Draft and proposal refs are delta projections over a base ref:

```text
ref_name = "draft/user/alice/change-123"
base_ref_name = "main"
```

The draft projection stores only changed rows. Reads overlay draft rows on top of the base projection:

```text
base rows from main
+ delta rows from draft/user/alice/change-123
- rows where delta.deleted? = true
```

This avoids copying an entire dataset per user draft.

### Applying commits

A commit is applied to the projection for the ref it belongs to. Moving a bookmark changes which head commit defines that ref. Projection rebuilds replay commits reachable from the ref and materialize the resulting rows.

Projection rebuilds are safe because the canonical state is in commits and refs.

## Edit Workflow

Local users may create private draft commits at any time, subject to local user permissions.

A normal local edit flow is:

```text
user edits record
  -> find or create private draft ref
  -> create or amend a draft commit chunk
  -> apply commit to the draft projection
  -> move draft bookmark
  -> AshNostrSync publishes the commit/bookmark inside the tribe
```

The draft is private externally but cluster-synced internally.
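The edit flow above, together with the overlay read from the projection strategy, can be sketched with in-memory maps. This is a rough illustration under stated assumptions, not Kobold's implementation: the module `KoboldSketch.Draft`, the shape of the `state` map, and the atom-keyed operations are hypothetical, and an `upsert` is assumed to carry the record's full field map. Cluster publication via AshNostrSync is left out.

```elixir
defmodule KoboldSketch.Draft do
  # Hypothetical in-memory stand-in. Projections are maps keyed by
  # {resource_name, record_id}; real storage would be Ash resources.

  # One local edit: create a draft commit, apply it to the draft's delta
  # projection, then move the draft bookmark to the new commit.
  def edit(state, draft_ref, commit) do
    # A new draft ref starts from the head of main (assumed base ref).
    parent = Map.get(state.bookmarks, draft_ref) || Map.get(state.bookmarks, "main")
    commit = Map.put(commit, :parent_commit_ids, List.wrap(parent))
    delta = state.projections |> Map.get(draft_ref, %{}) |> apply_commit(commit)

    state
    |> put_in([:commits, commit.id], commit)
    |> put_in([:projections, draft_ref], delta)
    |> put_in([:bookmarks, draft_ref], commit.id)
  end

  # Applies a commit's operations to a projection. Deletes become
  # tombstone rows so the overlay can hide the base row.
  def apply_commit(projection, commit) do
    Enum.reduce(commit.operations, projection, fn
      %{op: :upsert} = op, acc ->
        Map.put(acc, {op.resource_name, op.record_id}, build_row(op.fields, false, commit.id))

      %{op: :delete} = op, acc ->
        Map.put(acc, {op.resource_name, op.record_id}, build_row(%{}, true, commit.id))

      # Schema operations do not touch record rows in this sketch.
      _other_op, acc ->
        acc
    end)
  end

  # Overlay read: base rows + delta rows - rows tombstoned in the delta.
  def read(state, draft_ref, base_ref \\ "main") do
    base = Map.fetch!(state.projections, base_ref)
    delta = Map.get(state.projections, draft_ref, %{})

    base
    |> Map.merge(delta)
    |> Enum.reject(fn {_key, projected} -> projected.deleted? end)
    |> Map.new()
  end

  defp build_row(fields, tombstone, commit_id) do
    %{fields: fields, deleted?: tombstone, head_commit_id: commit_id}
  end
end
```

In this sketch `edit/3` never writes to the `main` projection, which is the property that lets several users hold independent drafts over the same base.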
```text
private draft != local-only storage
private draft == not externally visible
```

When the user is ready to share:

```text
draft ref
  -> proposal ref
  -> proposal object
  -> optional remote submission
```

Accepting a proposal moves or merges into the target ref, usually `main`.

## Jujutsu-like Semantics

Kobold borrows these concepts from Jujutsu/Git-like systems:

- **Commit**: immutable operation chunk.
- **Change**: stable logical edit identity represented by `change_id`.
- **Bookmark/ref**: named pointer to a commit.
- **Working copy**: a user's draft ref and its projection.
- **Proposal**: a reviewable request to accept commits into another ref.
- **Clone/fetch/pull**: transfer dataset metadata, refs, and commits.

Unlike Git, the user-facing data model remains structured records and dynamic fields. Users do not manipulate files or raw patches.

## Internal Cluster Sync

Kobold resources that define canonical state are AshNostrSync resources:

- `Dataset`
- `ResourceDefinition`
- `Commit`
- `Bookmark`
- `Proposal`

Record projections may be synced for speed or rebuilt locally from commits. The canonical source is always commits plus refs.

Cluster sync is independent of external visibility:

| Object | Cluster sync | External access |
| --- | --- | --- |
| Public dataset | yes | policy-gated |
| Private dataset | yes | deny by default |
| Private draft | yes | not externally reachable |
| Proposal | yes | visible if submitted/shared and allowed |

Kobold must not suppress AshNostrSync publication merely because a dataset is private.

## External Tribe Access

External access is controlled by Trust and `Tribes.Access`.

Kobold policy actions include:

- `advertise`: may this tribe see the dataset/ref in catalog?
- `read`: may this tribe fetch metadata, refs, and commits?
- `write`: may this tribe submit commits/proposals?
- `admin`: may this tribe administer sharing rules?
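
A deny-by-default check over these four actions could look roughly like the sketch below. The rule shape (`subject`, `actions`, `min_trust_score`) and the module `KoboldSketch.Access` are assumptions made for illustration; they are not the actual `Tribes.Access` or Trust API. The check applies only to external tribes, since same-tribe cluster sync is never gated by visibility.

```elixir
defmodule KoboldSketch.Access do
  # Hypothetical sketch of an external-tribe access check.
  @actions [:advertise, :read, :write, :admin]

  # Deny by default: the request is allowed only if at least one rule
  # matches the remote tribe's pubkey, the action, and the trust score.
  def allowed?(rules, tribe_pubkey, trust_score, action) when action in @actions do
    Enum.any?(rules, fn rule ->
      subject_matches?(rule.subject, tribe_pubkey) and
        action in rule.actions and
        trust_score >= rule.min_trust_score
    end)
  end

  # :any stands in for a wildcard subject; otherwise match on pubkey.
  defp subject_matches?(:any, _pubkey), do: true
  defp subject_matches?(subject, pubkey), do: subject == pubkey
end
```

With an empty rule list every call returns `false`, which corresponds to a dataset with no external sharing rules.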

Default public dataset policy may allow:

```text
subject_type = tribe
subject_id = *
action = read / advertise
condition = min_trust_score >= 0
```

Private datasets deny external access by default.

Remote access is evaluated against the remote tribe pubkey, not tribe name. Tribe names are display metadata and are not unique identities.

## Clone and Fetch Over the Wire

A clone transfers canonical history, not projections.

The logical clone sequence is:

```text
remote catalog/discovery
  -> user chooses dataset
  -> Kobold verifies advertise/read access
  -> fetch Dataset metadata
  -> fetch ResourceDefinitions
  -> fetch allowed Bookmarks/refs
  -> fetch reachable Commits
  -> rebuild local projections
```

The initial clone envelope can be represented as:

```json
{
  "dataset": {
    "id": "...",
    "name": "Seed Catalog"
  },
  "resources": [
    {
      "name": "SeedVariety",
      "fields": {
        "name": { "type": "string" }
      }
    }
  ],
  "bookmarks": [
    {
      "name": "origin//main",
      "commit_id": "..."
    }
  ],
  "commits": [
    {
      "id": "...",
      "change_id": "...",
      "parent_commit_ids": [],
      "operations": []
    }
  ]
}
```

For real datasets, clone/fetch is paged rather than one large response.

## Parrhesia Fast Copy and `SYNC-PAGE`

Kobold should use Parrhesia's optimized sync/backfill machinery for initial clone and incremental fetch.

The ideal host API is a set of bulk sync/copy primitives, for example:

```elixir
Parrhesia.Sync.page_events(source, filter, opts)
Parrhesia.Sync.stream_events(source, filter, opts)
Parrhesia.Sync.copy_events(source, target, opts)
Parrhesia.Sync.import_events(events, opts)
```

These APIs should support:

- keyset cursors
- page size
- ascending/descending order
- batched event frames
- signature verification
- deduplication
- progress callbacks
- authorization callbacks

Parrhesia's `SYNC-PAGE` is a good low-level primitive for this because it pages stored events by `(created_at, event_id)` and can return batched `EVENTS` frames.
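
To illustrate what keyset paging by `(created_at, event_id)` means, here is a small in-memory sketch. It is not Parrhesia's implementation or wire format; the module `KoboldSketch.Paging`, its function names, and the event map shape are hypothetical, and a real store would push the cursor comparison into an indexed query rather than filter a list.

```elixir
defmodule KoboldSketch.Paging do
  # Hypothetical keyset pagination over events with :created_at and :id.

  # Returns up to `limit` events strictly after `cursor` in ascending
  # {created_at, id} order, plus the cursor to pass for the next page.
  def page(events, cursor, limit) do
    keyed =
      events
      |> Enum.map(fn event -> {{event.created_at, event.id}, event} end)
      |> Enum.filter(fn {key, _event} -> cursor == nil or key > cursor end)
      |> Enum.sort_by(fn {key, _event} -> key end)
      |> Enum.take(limit)

    next_cursor =
      case List.last(keyed) do
        nil -> cursor
        {key, _event} -> key
      end

    {Enum.map(keyed, fn {_key, event} -> event end), next_cursor}
  end

  # Lazily yields batches until a page comes back empty.
  def stream(events, limit) do
    Stream.unfold(nil, fn cursor ->
      case page(events, cursor, limit) do
        {[], _cursor} -> nil
        {batch, next_cursor} -> {batch, next_cursor}
      end
    end)
  end
end
```

Because the cursor is the last key seen rather than an offset, a page boundary stays stable when new events are appended, and the event id breaks ties between events that share a `created_at` value.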

Kobold should not expose an unrestricted relay backfill to arbitrary remote tribes. The flow should be:

```text
remote clone request
  -> Kobold authenticates remote tribe
  -> Kobold evaluates Trust/access
  -> Kobold builds allowed event filters
  -> Parrhesia fast pagination/copy runs under that authorization
  -> receiver imports events and rebuilds projections
```

For efficient dataset-level clone, Kobold/AshNostrSync events need filterable tags such as:

```text
["r", "plugins.kobold.commit"]
["dataset", dataset_id]
["ref", ref_name]
```

Without dataset/ref tags, clone would need to scan all Kobold commit events and filter by payload content, which does not scale.

## GraphQL

Kobold exposes a generic repo-style GraphQL API over its stable resources and actions.

The generic GraphQL schema is static:

```graphql
type KoboldRecordProjection {
  datasetId: ID!
  refName: String!
  resourceName: String!
  recordId: ID!
  fields: JSON!
  headCommitId: ID
}
```

Dynamic user columns remain inside `fields: JSON`. Clients build dynamic forms and tables by reading `ResourceDefinition.fields`.

GraphQL is a good fit for repo actions:

- `koboldDatasets`
- `koboldDataset`
- `koboldResources`
- `koboldRecords`
- `koboldCommits`
- `koboldBookmarks`
- `koboldCreateDataset`
- `koboldUpsertRecord`
- `koboldCreateDraft`
- `koboldCommitDraft`
- `koboldCloneDataset`
- `koboldFetchDataset`
- `koboldCreateProposal`
- `koboldSubmitProposal`
- `koboldAcceptProposal`

Kobold can use `AshGraphql` for these stable resources and actions.

### Dynamic plugin schemas

The host should not need recompilation when a plugin is installed. A single merged Absinthe schema containing dynamically installed plugin domains is not the natural fit for AshGraphql.

Instead, the host exposes one GraphQL endpoint with routing:

```text
POST /graphql
```

A request selects a schema using a header, query parameter, or GraphQL extensions payload:

```json
{
  "extensions": {
    "tribes": {
      "plugin": "kobold"
    }
  },
  "query": "..."
}
```

The gateway dispatches to:

```text
core schema          -> TribesWeb.GraphqlSchema
plugin kobold schema -> KoboldWeb.GraphqlSchema
plugin trust schema  -> TrustWeb.GraphqlSchema
```

This gives one endpoint without requiring host recompilation. Introspection is per routed schema rather than one globally merged graph.

## Supportive Plugins

Supportive plugins do not own Kobold storage. They declare compatibility with a schema or dataset type and provide richer behavior on top of the same logical Kobold dataset.

A supportive plugin may provide:

- custom UI
- richer validation
- import/export flows
- computed views
- proposal review UI
- domain-specific actions
- optional typed GraphQL for known schema families

The generic Kobold dataset remains dynamic and schema-as-data. Installing a supportive plugin improves the experience but does not migrate the dataset into plugin-owned tables.

This preserves the desired flow:

```text
clone with generic Kobold
  -> explore/edit dynamically
  -> install supportive plugin later
  -> continue using the same dataset identity/history
```

## Query and Performance Model

The basic storage model is JSONB/map-based projections. This is deliberate so users can add fields dynamically.

Important indexes include:

```text
RecordProjection(dataset_id, ref_name, resource_name, record_id) unique
RecordProjection(dataset_id, ref_name, resource_name)
RecordProjection(head_commit_id)
Commit(dataset_id, inserted_at)
Commit(dataset_id, change_id)
Bookmark(dataset_id, name) unique
```

For common dynamic-field queries, Kobold can add incremental optimizations:

1. JSONB GIN indexes for broad containment queries.
2. Promoted expression indexes for hot fields.
3. A secondary field index table for heavily queried fields.
4. Supportive-plugin views for known schema families.

The first implementation should favor simple, correct dynamic behavior over runtime DDL.

## Commit Sizing

Commits are chunks. They should not be unbounded.

Guidelines:

- Small UI edit: one commit/change.
- Bulk import: chunked commits, e.g. 100-1000 operations each.
- Large files/blobs: content-addressed attachment references, not inline JSON.

Chunking keeps sync retry behavior and projection rebuilds manageable.

## Security Boundaries

Kobold has three distinct access scopes:

### Same-tribe cluster sync

Nodes in the same tribe receive Kobold sync events according to cluster sync rules. Kobold dataset visibility does not block this.

### Local users

Local users may see/edit datasets and draft refs according to local user/admin permissions. A user's private draft ref is not visible to other users unless shared locally.

### Remote tribes

Remote tribes are identified by pubkey. They may only see externally visible refs and commits if Trust/access policies allow the requested action.

Remote write access should normally create a proposal or signed write request. The owning tribe validates and accepts it into its own history. Remote writes do not blindly mutate the owner's main ref.

## API Compatibility

Kobold may expose compatibility shapes for older tests or clients, such as an `events` array derived from commits. These are API views, not separate canonical storage.

The canonical edit model is:

```text
Commit + Bookmark + Projection
```

not:

```text
DatasetEvent + Projection
```

## Implementation Priorities

1. Keep dynamic datasets schema-as-data.
2. Keep commits/bookmarks as canonical history.
3. Scope projections by ref/user.
4. Use AshNostrSync for all cluster-relevant Kobold canonical resources.
5. Ensure private means externally private, not cluster-local.
6. Add routed plugin GraphQL schemas for generic repo actions.
7. Add Parrhesia bulk sync/copy APIs and dataset/ref tags for fast clone.
8. Add proposals and remote write review.
9. Optimize dynamic-field queries only after real usage demonstrates need.