self c312e22750
docs: describe Kobold architecture
Document the intended dynamic dataset architecture with commit chunks, ref-scoped projections, Trust-gated external access, routed GraphQL schemas, and Parrhesia-backed clone/fetch flows.
2026-05-29 13:44:01 +02:00

Kobold Architecture

This document describes the intended Kobold architecture: dynamic datasets with schema-as-data, commit-oriented history, fast local projections, Trust-gated tribe-to-tribe exchange, and generic GraphQL/repo-style APIs.

Kobold is not a SQL table generator and is not a second cluster-sync engine. It is a dynamic data/repo layer built on top of the host primitives that already exist in Tribes:

  • Ash resources define Kobold's stable internal model.
  • AshPostgres stores Kobold metadata, commits, refs, and projections.
  • AshNostrSync syncs Kobold resources across nodes in the same tribe.
  • Parrhesia/Nostr carries signed sync events and fast backfill.
  • Trust + Tribes.Access decide which other tribes may discover, read, write, or propose changes.

The core invariant is:

Kobold visibility never limits same-tribe cluster sync. Visibility only limits external tribe access.

A private dataset, private commit, or private draft is still replicated inside the local tribe for durability and multi-node consistency. It is simply not advertised or exported to other tribes unless an access rule explicitly allows that.

Goals

Kobold should support this user flow:

  1. A user discovers a remote dataset.
  2. The user clones it into their tribe.
  3. The user explores it with the generic Kobold UI/API.
  4. The user adds fields/columns dynamically.
  5. The user makes private local drafts.
  6. The user installs a supportive plugin later for richer domain-specific UI.
  7. The same logical dataset remains in place.
  8. The user prepares a proposal/PR back to the upstream tribe.

This requires dynamic storage. User-defined columns cannot require new Elixir modules, migrations, or host recompilation.

High-level Shape

Dynamic dataset schema
  -> ResourceDefinition fields as data

Local edits
  -> Commit chunks
  -> Bookmark/ref movement
  -> Ref-scoped RecordProjection updates
  -> AshNostrSync syncs within the tribe

External sharing
  -> Kobold catalog/clone/proposal gateway
  -> Trust + access checks
  -> Parrhesia fast event paging/copy

Kobold's canonical data is the commit/ref history. Projections are materialized read models.

[Diagram: Dataset, ResourceDefinition, Commit, Bookmark / Ref, Proposal, RecordProjection]

Stable Resources

Kobold itself uses a small fixed set of Ash resources. These are normal static Elixir modules and normal AshPostgres-backed tables.

Dataset

A dataset is the logical repo boundary.

It contains stable metadata such as:

  • id
  • name
  • description
  • owner_pubkey
  • origin_tribe_pubkey
  • origin_dataset_id
  • schema_name
  • schema_version
  • external_visibility
  • metadata

The dataset is the primary unit of:

  • discovery
  • clone/fork
  • access policy
  • supportive plugin compatibility
  • proposal targeting

Dataset rows are low-churn. Record edits do not rewrite the dataset row.

ResourceDefinition

A resource definition describes one record type inside a dataset.

Example resources:

  • SeedVariety
  • SeedLot
  • Supplier

Fields are schema-as-data:

{
  "name": { "type": "string", "required": true },
  "germination_days": { "type": "integer" },
  "source": { "type": "string" }
}

Adding a column means updating the resource definition and recording that schema change as a commit. It does not require DDL or code generation.
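As a minimal sketch of what schema-as-data implies, the snippet below validates a record against a field definition and then adds a column by updating the definition. It is written in Python purely for illustration; Kobold itself is built on Ash. The `validate` and `add_field` helpers and the `TYPE_CHECKS` mapping are hypothetical and not part of any Kobold API.

```python
# Hypothetical sketch: field definitions are plain data, so adding a column
# is a dictionary update rather than a migration or a new module.
TYPE_CHECKS = {"string": str, "integer": int}  # assumed type names -> Python types

def validate(fields_def, record):
    """Return a list of error strings for a record checked against a definition."""
    errors = []
    for name, spec in fields_def.items():
        if name not in record:
            if spec.get("required"):
                errors.append(f"{name}: required")
            continue
        if not isinstance(record[name], TYPE_CHECKS[spec["type"]]):
            errors.append(f"{name}: expected {spec['type']}")
    for name in record:
        if name not in fields_def:
            errors.append(f"{name}: unknown field")
    return errors

def add_field(fields_def, name, spec):
    """Return a new definition with one extra field; the input is not mutated."""
    return {**fields_def, name: spec}

seed_variety = {
    "name": {"type": "string", "required": True},
    "germination_days": {"type": "integer"},
}
record = {"name": "Black Krim", "source": "seed swap"}

before = validate(seed_variety, record)
after = validate(add_field(seed_variety, "source", {"type": "string"}), record)
```

Here `before` reports `source` as an unknown field, while `after` is empty once the definition has been extended. No table or module changed in between.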

Commit

A commit is an immutable chunk of dataset operations.

A commit has:

  • id / commit id
  • change_id
  • dataset_id
  • parent_commit_ids
  • author_user_id
  • author_pubkey
  • message
  • visibility
  • draft?
  • operations
  • metadata

The change_id is the stable identity of an evolving change. A new version of the same logical change may have a new commit id while preserving the same change id, matching the Jujutsu mental model.

Operations are stored as JSON data. Examples:

{
  "op": "upsert",
  "resource_name": "SeedVariety",
  "record_id": "...",
  "fields": {
    "name": "Black Krim",
    "germination_days": 8
  }
}
{
  "op": "delete",
  "resource_name": "SeedVariety",
  "record_id": "..."
}
{
  "op": "schema.add_field",
  "resource_name": "SeedVariety",
  "field_name": "source",
  "definition": { "type": "string" }
}

Commits are the canonical edit history. There is no separate per-record event table layered on top of AshNostrSync.
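The three operation shapes above could be interpreted roughly as follows. This is a Python sketch for illustration only, not Kobold's implementation; `apply_operations` and the in-memory state layout are hypothetical, and treating `upsert` as a field-level merge is an assumption.

```python
# Hypothetical sketch: apply one commit's operations to an in-memory state.
# state = {"schema": {resource: {field: definition}},
#          "records": {(resource, record_id): fields}}
def apply_operations(state, operations):
    """Return a new state; the input state is left untouched."""
    schema = {res: dict(fields) for res, fields in state["schema"].items()}
    records = dict(state["records"])
    for op in operations:
        resource = op["resource_name"]
        if op["op"] == "upsert":
            key = (resource, op["record_id"])
            # Assumption: upsert merges the given fields over existing values.
            records[key] = {**records.get(key, {}), **op["fields"]}
        elif op["op"] == "delete":
            records.pop((resource, op["record_id"]), None)
        elif op["op"] == "schema.add_field":
            schema.setdefault(resource, {})[op["field_name"]] = op["definition"]
        else:
            raise ValueError(f"unknown op: {op['op']!r}")
    return {"schema": schema, "records": records}

empty = {
    "schema": {"SeedVariety": {"name": {"type": "string", "required": True}}},
    "records": {},
}
ops = [
    {"op": "upsert", "resource_name": "SeedVariety", "record_id": "r1",
     "fields": {"name": "Black Krim", "germination_days": 8}},
    {"op": "schema.add_field", "resource_name": "SeedVariety",
     "field_name": "source", "definition": {"type": "string"}},
    {"op": "upsert", "resource_name": "SeedVariety", "record_id": "r2",
     "fields": {"name": "Brandywine"}},
    {"op": "delete", "resource_name": "SeedVariety", "record_id": "r2"},
]
result = apply_operations(empty, ops)
```

Because record edits and schema changes travel in the same operation list, a single commit can carry both.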

Bookmark / Ref

A bookmark is a named pointer to a commit.

Examples:

main
origin/<tribe_pubkey>/main
draft/user/<user_id>/<change_id>
proposal/<proposal_id>

A bookmark has:

  • dataset_id
  • name
  • commit_id
  • scope: main | draft | proposal | remote
  • visibility: private | shared | public
  • owner_user_id
  • base_ref_name
  • metadata

Bookmarks are the sharing boundary. The existence of a commit does not mean another tribe can fetch it. A remote tribe may fetch only commits reachable from bookmarks/proposals that are externally visible and allowed by policy.
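The reachability rule can be sketched as a graph walk that starts only from externally visible bookmarks. The Python below is illustrative; `fetchable_commits` is a hypothetical helper, treating `public` as the externally visible value is an assumption, and the Trust/access policy check that would also apply is omitted.

```python
# Hypothetical sketch: a commit is fetchable only if it is reachable from
# a bookmark whose visibility is in the allowed set.
def fetchable_commits(parents, bookmarks, visible=("public",)):
    """parents maps commit_id -> list of parent commit ids."""
    stack = [b["commit_id"] for b in bookmarks if b["visibility"] in visible]
    reachable = set()
    while stack:
        commit_id = stack.pop()
        if commit_id in reachable:
            continue
        reachable.add(commit_id)
        stack.extend(parents.get(commit_id, []))
    return reachable

parents = {"c1": [], "c2": ["c1"], "c3": ["c2"]}
bookmarks = [
    {"name": "main", "commit_id": "c2", "visibility": "public"},
    {"name": "draft/user/alice/change-123", "commit_id": "c3", "visibility": "private"},
]
fetchable = fetchable_commits(parents, bookmarks)
```

Commit `c3` exists locally but is reachable only from a private draft bookmark, so it is not in `fetchable`.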

Proposal

A proposal is a PR-like object that asks another tribe or local maintainer to review and accept a set of commits.

A proposal has:

  • dataset_id
  • source_ref_name
  • source_head_commit_id
  • target_ref_name
  • author_user_id
  • author_pubkey
  • target_tribe_pubkey
  • status: draft | submitted | accepted | rejected | closed
  • message
  • metadata

A proposal normally points at a proposal ref:

proposal/<proposal_id>

Remote tribes see proposal refs, not private local draft refs.

RecordProjection

A record projection is a materialized view for reading and UI rendering. It is not the canonical history.

A projection row has:

  • dataset_id
  • ref_name
  • base_ref_name
  • resource_name
  • record_id
  • fields
  • deleted?
  • head_commit_id

The uniqueness key is:

dataset_id + ref_name + resource_name + record_id

This lets multiple users have independent drafts without overwriting each other's working view.

Projection Strategy

Kobold uses ref-scoped projections.

Main projection

main is projected as a complete current-state read model:

ref_name = "main"
base_ref_name = null

Generic reads default to main.

Draft/proposal projections

Draft and proposal refs are delta projections over a base ref:

ref_name = "draft/user/alice/change-123"
base_ref_name = "main"

The draft projection stores only changed rows. Reads overlay draft rows on top of the base projection:

base rows from main
+ delta rows from draft/user/alice/change-123
- rows where delta.deleted? = true

This avoids copying an entire dataset per user draft.
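A minimal sketch of the overlay read, in Python for illustration. `read_ref` and the row layout are hypothetical, and the sketch assumes a delta row replaces the base row as a whole rather than merging field by field.

```python
# Hypothetical sketch: rows are keyed by (resource_name, record_id).
def read_ref(base_rows, delta_rows):
    """Overlay delta rows on base rows, then drop rows marked deleted."""
    merged = {**base_rows, **delta_rows}
    return {key: row["fields"] for key, row in merged.items() if not row["deleted"]}

main = {
    ("SeedVariety", "r1"): {"fields": {"name": "Black Krim"}, "deleted": False},
    ("SeedVariety", "r2"): {"fields": {"name": "Brandywine"}, "deleted": False},
}
draft = {
    # changed row
    ("SeedVariety", "r1"): {"fields": {"name": "Black Krim", "source": "seed swap"},
                            "deleted": False},
    # deleted in the draft only
    ("SeedVariety", "r2"): {"fields": {}, "deleted": True},
    # new row in the draft only
    ("SeedVariety", "r3"): {"fields": {"name": "Green Zebra"}, "deleted": False},
}
view = read_ref(main, draft)
```

The draft holds three rows regardless of how large `main` is, and reading `main` on its own still returns both original records.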

Applying commits

A commit is applied to the projection for the ref it belongs to. Moving a bookmark changes which head commit defines that ref. Projection rebuilds replay commits reachable from the ref and materialize the resulting rows.

Projection rebuilds are safe because the canonical state is in commits and refs.
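A rebuild can be sketched as replaying commits in parent-before-child order, starting from the ref's head. The Python below is illustrative only; `replay_order` and `rebuild_projection` are hypothetical names, the order in which multiple parents are visited is an assumption, and a recursive walk would need replacing for very deep histories.

```python
# Hypothetical sketch: rebuild a complete projection for a ref from commits.
def replay_order(commits, head_id):
    """Return commit ids reachable from head_id, parents before children."""
    order, seen = [], set()

    def visit(commit_id):
        if commit_id in seen:
            return
        seen.add(commit_id)
        for parent_id in commits[commit_id]["parent_commit_ids"]:
            visit(parent_id)
        order.append(commit_id)

    visit(head_id)
    return order

def rebuild_projection(commits, head_id):
    rows = {}
    for commit_id in replay_order(commits, head_id):
        for op in commits[commit_id]["operations"]:
            if op["op"] not in ("upsert", "delete"):
                continue  # schema operations do not touch record rows here
            key = (op["resource_name"], op["record_id"])
            if op["op"] == "upsert":
                rows[key] = {**rows.get(key, {}), **op["fields"]}
            else:
                rows.pop(key, None)
    return rows

commits = {
    "c1": {"parent_commit_ids": [], "operations": [
        {"op": "upsert", "resource_name": "SeedVariety", "record_id": "r1",
         "fields": {"name": "Black Krim"}}]},
    "c2": {"parent_commit_ids": ["c1"], "operations": [
        {"op": "upsert", "resource_name": "SeedVariety", "record_id": "r1",
         "fields": {"germination_days": 8}},
        {"op": "upsert", "resource_name": "SeedVariety", "record_id": "r2",
         "fields": {"name": "Brandywine"}}]},
    "c3": {"parent_commit_ids": ["c2"], "operations": [
        {"op": "delete", "resource_name": "SeedVariety", "record_id": "r2"}]},
}
at_c2 = rebuild_projection(commits, "c2")
at_c3 = rebuild_projection(commits, "c3")
```

Pointing the ref at `c3` instead of `c2` yields a different projection from the same commit store, which is why projections can be discarded and rebuilt at any time.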

Edit Workflow

Local users may create private draft commits at any time, subject to local user permissions.

A normal local edit flow is:

user edits record
  -> find or create private draft ref
  -> create or amend a draft commit chunk
  -> apply commit to the draft projection
  -> move draft bookmark
  -> AshNostrSync publishes the commit/bookmark inside the tribe

The draft is private externally but cluster-synced internally.

private draft != local-only storage
private draft == not externally visible

When the user is ready to share:

draft ref
  -> proposal ref
  -> proposal object
  -> optional remote submission

Accepting a proposal moves or merges into the target ref, usually main.

Jujutsu-like Semantics

Kobold borrows these concepts from Jujutsu/Git-like systems:

  • Commit: immutable operation chunk.
  • Change: stable logical edit identity represented by change_id.
  • Bookmark/ref: named pointer to a commit.
  • Working copy: a user's draft ref and its projection.
  • Proposal: a reviewable request to accept commits into another ref.
  • Clone/fetch/pull: transfer dataset metadata, refs, and commits.

Unlike Git, the user-facing data model remains structured records and dynamic fields. Users do not manipulate files or raw patches.

Internal Cluster Sync

Kobold resources that define canonical state are AshNostrSync resources:

  • Dataset
  • ResourceDefinition
  • Commit
  • Bookmark
  • Proposal

Record projections may be synced for speed or rebuilt locally from commits. The canonical source is always commits plus refs.

Cluster sync is independent of external visibility:

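| Object          | Cluster sync | External access                         |
|-----------------|--------------|-----------------------------------------|
| Public dataset  | yes          | policy-gated                            |
| Private dataset | yes          | deny by default                         |
| Private draft   | yes          | not externally reachable                |
| Proposal        | yes          | visible if submitted/shared and allowed |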

Kobold must not suppress AshNostrSync publication merely because a dataset is private.

External Tribe Access

External access is controlled by Trust and Tribes.Access.

Kobold policy actions include:

  • advertise: may this tribe see the dataset/ref in catalog?
  • read: may this tribe fetch metadata, refs, and commits?
  • write: may this tribe submit commits/proposals?
  • admin: may this tribe administer sharing rules?

Default public dataset policy may allow:

subject_type = tribe
subject_id = *
action = read / advertise
condition = min_trust_score >= 0

Private datasets deny external access by default.

Remote access is evaluated against the remote tribe pubkey, not tribe name. Tribe names are display metadata and are not unique identities.
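A deny-by-default rule check keyed on the remote tribe pubkey might look like the sketch below. It is written in Python for illustration; `Rule` and `allowed` are hypothetical and do not describe the real Trust or Tribes.Access interfaces, and the trust score is assumed to be a plain number supplied by the caller.

```python
# Hypothetical sketch: evaluate access rules against a remote tribe pubkey.
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    subject_id: str        # a tribe pubkey, or "*" for any tribe
    actions: frozenset     # subset of {"advertise", "read", "write", "admin"}
    min_trust_score: float

def allowed(rules, tribe_pubkey, action, trust_score):
    """Deny by default; allow only if a rule matches subject, action, and trust."""
    return any(
        rule.subject_id in ("*", tribe_pubkey)
        and action in rule.actions
        and trust_score >= rule.min_trust_score
        for rule in rules
    )

# The default public policy described above: any tribe may read and advertise.
public_rules = [Rule("*", frozenset({"read", "advertise"}), 0)]
private_rules = []  # a private dataset has no external rules

can_read = allowed(public_rules, "pubkey-of-remote-tribe", "read", 0.3)
can_write = allowed(public_rules, "pubkey-of-remote-tribe", "write", 0.3)
```

Nothing in the check refers to a tribe name, so two tribes that share a display name are still evaluated separately.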

Clone and Fetch Over the Wire

A clone transfers canonical history, not projections.

The logical clone sequence is:

remote catalog/discovery
  -> user chooses dataset
  -> Kobold verifies advertise/read access
  -> fetch Dataset metadata
  -> fetch ResourceDefinitions
  -> fetch allowed Bookmarks/refs
  -> fetch reachable Commits
  -> rebuild local projections

The initial clone envelope can be represented as:

{
  "dataset": { "id": "...", "name": "Seed Catalog" },
  "resources": [
    { "name": "SeedVariety", "fields": { "name": { "type": "string" } } }
  ],
  "bookmarks": [
    { "name": "origin/<pubkey>/main", "commit_id": "..." }
  ],
  "commits": [
    {
      "id": "...",
      "change_id": "...",
      "parent_commit_ids": [],
      "operations": []
    }
  ]
}

For real datasets, clone/fetch is paged rather than one large response.

Parrhesia Fast Copy and SYNC-PAGE

Kobold should use Parrhesia's optimized sync/backfill machinery for initial clone and incremental fetch.

The ideal host API is a set of bulk sync/copy primitives, for example:

Parrhesia.Sync.page_events(source, filter, opts)
Parrhesia.Sync.stream_events(source, filter, opts)
Parrhesia.Sync.copy_events(source, target, opts)
Parrhesia.Sync.import_events(events, opts)

These APIs should support:

  • keyset cursors
  • page size
  • ascending/descending order
  • batched event frames
  • signature verification
  • deduplication
  • progress callbacks
  • authorization callbacks

Parrhesia's SYNC-PAGE is a good low-level primitive for this because it pages stored events by (created_at, event_id) and can return batched EVENTS frames.
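Keyset paging over `(created_at, event_id)` can be sketched as follows. This Python snippet only illustrates the cursor technique and is not Parrhesia's API; `page_after` is a hypothetical helper, and a real store would answer the same question with an indexed query rather than an in-memory sort.

```python
# Hypothetical sketch: keyset pagination ordered by (created_at, event_id).
def page_after(events, cursor=None, limit=2):
    """Return (page, next_cursor); next_cursor is None when the page is short."""
    ordered = sorted(events, key=lambda e: (e["created_at"], e["id"]))
    if cursor is not None:
        # Tuple comparison breaks created_at ties using the event id.
        ordered = [e for e in ordered if (e["created_at"], e["id"]) > cursor]
    page = ordered[:limit]
    full = bool(page) and len(page) == limit
    next_cursor = (page[-1]["created_at"], page[-1]["id"]) if full else None
    return page, next_cursor

events = [
    {"created_at": 10, "id": "b"},
    {"created_at": 10, "id": "a"},
    {"created_at": 11, "id": "c"},
]

seen, cursor = [], None
while True:
    page, cursor = page_after(events, cursor)
    seen.extend(e["id"] for e in page)
    if cursor is None:
        break
```

Including the event id in the cursor matters because several events can share one `created_at` value; a timestamp-only cursor would skip or repeat them at page boundaries.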

Kobold should not expose an unrestricted relay backfill to arbitrary remote tribes. The flow should be:

remote clone request
  -> Kobold authenticates remote tribe
  -> Kobold evaluates Trust/access
  -> Kobold builds allowed event filters
  -> Parrhesia fast pagination/copy runs under that authorization
  -> receiver imports events and rebuilds projections

For efficient dataset-level clone, Kobold/AshNostrSync events need filterable tags such as:

["r", "plugins.kobold.commit"]
["dataset", dataset_id]
["ref", ref_name]

Without dataset/ref tags, clone would need to scan all Kobold commit events and filter by payload content, which does not scale.
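With such tags in place, selecting the events for one dataset and ref becomes a tag comparison instead of a payload scan. The Python sketch below is illustrative; `matches` is a hypothetical helper, the event dictionaries are simplified, and whether a given relay indexes these tag names is not something this sketch establishes.

```python
# Hypothetical sketch: select events by tags without reading their payloads.
def matches(event, required_tags):
    """True if every required [name, value] pair appears in the event's tags."""
    present = {(tag[0], tag[1]) for tag in event["tags"] if len(tag) >= 2}
    return all((name, value) in present for name, value in required_tags)

events = [
    {"id": "e1", "tags": [["r", "plugins.kobold.commit"],
                          ["dataset", "ds-1"], ["ref", "main"]]},
    {"id": "e2", "tags": [["r", "plugins.kobold.commit"],
                          ["dataset", "ds-2"], ["ref", "main"]]},
    {"id": "e3", "tags": [["r", "plugins.kobold.commit"],
                          ["dataset", "ds-1"],
                          ["ref", "draft/user/alice/change-123"]]},
]
wanted = [["r", "plugins.kobold.commit"], ["dataset", "ds-1"], ["ref", "main"]]
selected = [e["id"] for e in events if matches(e, wanted)]
```

The same tag list doubles as the allowed filter that the clone gateway builds after the access check, so a draft ref such as the one on `e3` is never requested.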

GraphQL

Kobold exposes a generic repo-style GraphQL API over its stable resources and actions.

The generic GraphQL schema is static:

type KoboldRecordProjection {
  datasetId: ID!
  refName: String!
  resourceName: String!
  recordId: ID!
  fields: JSON!
  headCommitId: ID
}

Dynamic user columns remain inside fields: JSON. Clients build dynamic forms and tables by reading ResourceDefinition.fields.

GraphQL is a good fit for repo actions:

  • koboldDatasets
  • koboldDataset
  • koboldResources
  • koboldRecords
  • koboldCommits
  • koboldBookmarks
  • koboldCreateDataset
  • koboldUpsertRecord
  • koboldCreateDraft
  • koboldCommitDraft
  • koboldCloneDataset
  • koboldFetchDataset
  • koboldCreateProposal
  • koboldSubmitProposal
  • koboldAcceptProposal

Kobold can use AshGraphql for these stable resources and actions.

Dynamic plugin schemas

The host should not need recompilation when a plugin is installed. A single merged Absinthe schema containing dynamically installed plugin domains is not a natural fit for AshGraphql.

Instead, the host exposes one GraphQL endpoint with routing:

POST /graphql

A request selects a schema using a header, query parameter, or GraphQL extensions payload:

{
  "extensions": {
    "tribes": {
      "plugin": "kobold"
    }
  },
  "query": "..."
}

The gateway dispatches to:

core schema                  -> TribesWeb.GraphqlSchema
plugin kobold schema         -> KoboldWeb.GraphqlSchema
plugin trust schema          -> TrustWeb.GraphqlSchema

This gives one endpoint without requiring host recompilation. Introspection is per routed schema rather than one globally merged graph.
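The dispatch step can be sketched as a lookup in a registry that is filled at runtime. The Python below is illustrative only; `select_schema` and the registry are hypothetical, schema modules are represented by their names as strings, and only the `extensions` form of schema selection is handled.

```python
# Hypothetical sketch: route a GraphQL request body to a registered schema.
def select_schema(body, registry, default="core"):
    """Pick a schema from extensions.tribes.plugin, falling back to the core schema."""
    extensions = body.get("extensions") or {}
    tribes = extensions.get("tribes") or {}
    plugin = tribes.get("plugin") or default
    try:
        return registry[plugin]
    except KeyError:
        raise LookupError(f"no GraphQL schema registered for {plugin!r}") from None

registry = {
    "core": "TribesWeb.GraphqlSchema",
    "kobold": "KoboldWeb.GraphqlSchema",
}

kobold_request = {"extensions": {"tribes": {"plugin": "kobold"}}, "query": "..."}
routed = select_schema(kobold_request, registry)
fallback = select_schema({"query": "..."}, registry)

# Installing a plugin later only adds a registry entry.
registry["trust"] = "TrustWeb.GraphqlSchema"
routed_later = select_schema({"extensions": {"tribes": {"plugin": "trust"}}}, registry)
```

Since each registry entry is a complete schema, an introspection query is answered by whichever schema the request was routed to.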

Supportive Plugins

Supportive plugins do not own Kobold storage. They declare compatibility with a schema or dataset type and provide richer behavior on top of the same logical Kobold dataset.

A supportive plugin may provide:

  • custom UI
  • richer validation
  • import/export flows
  • computed views
  • proposal review UI
  • domain-specific actions
  • optional typed GraphQL for known schema families

The generic Kobold dataset remains dynamic and schema-as-data. Installing a supportive plugin improves the experience but does not migrate the dataset into plugin-owned tables.

This preserves the desired flow:

clone with generic Kobold
  -> explore/edit dynamically
  -> install supportive plugin later
  -> continue using the same dataset identity/history

Query and Performance Model

The basic storage model is JSONB/map-based projections. This is deliberate so users can add fields dynamically.

Important indexes include:

RecordProjection(dataset_id, ref_name, resource_name, record_id) unique
RecordProjection(dataset_id, ref_name, resource_name)
RecordProjection(head_commit_id)
Commit(dataset_id, inserted_at)
Commit(dataset_id, change_id)
Bookmark(dataset_id, name) unique

For common dynamic-field queries, Kobold can add incremental optimizations:

  1. JSONB GIN indexes for broad containment queries.
  2. Promoted expression indexes for hot fields.
  3. A secondary field index table for heavily queried fields.
  4. Supportive-plugin views for known schema families.

The first implementation should favor simple, correct dynamic behavior over runtime DDL.

Commit Sizing

Commits are chunks. They should not be unbounded.

Guidelines:

  • Small UI edit: one commit/change.
  • Bulk import: chunked commits, e.g. 100-1000 operations each.
  • Large files/blobs: content-addressed attachment references, not inline JSON.

Chunking keeps sync retry behavior and projection rebuilds manageable.
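Splitting a bulk import into bounded commits can be sketched as below, in Python for illustration. `chunk_operations` is a hypothetical helper, and the default of 500 is just one value inside the 100-1000 range suggested above.

```python
# Hypothetical sketch: split a bulk import into bounded operation chunks,
# each of which would become one commit.
def chunk_operations(operations, max_ops=500):
    """Return consecutive slices of at most max_ops operations each."""
    if max_ops < 1:
        raise ValueError("max_ops must be at least 1")
    return [operations[i:i + max_ops] for i in range(0, len(operations), max_ops)]

bulk = [
    {"op": "upsert", "resource_name": "SeedVariety",
     "record_id": f"r{n}", "fields": {"name": f"Variety {n}"}}
    for n in range(1200)
]
chunks = chunk_operations(bulk)
sizes = [len(chunk) for chunk in chunks]
```

If syncing one chunk fails, only that commit needs to be retried rather than the whole import.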

Security Boundaries

Kobold has three distinct access scopes:

Same-tribe cluster sync

Nodes in the same tribe receive Kobold sync events according to cluster sync rules. Kobold dataset visibility does not block this.

Local users

Local users may see/edit datasets and draft refs according to local user/admin permissions. A user's private draft ref is not visible to other users unless shared locally.

Remote tribes

Remote tribes are identified by pubkey. They may only see externally visible refs and commits if Trust/access policies allow the requested action.

Remote write access should normally create a proposal or signed write request. The owning tribe validates and accepts it into its own history. Remote writes do not blindly mutate the owner's main ref.

API Compatibility

Kobold may expose compatibility shapes for older tests or clients, such as an events array derived from commits. These are API views, not separate canonical storage.

The canonical edit model is:

Commit + Bookmark + Projection

not:

DatasetEvent + Projection

Implementation Priorities

  1. Keep dynamic datasets schema-as-data.
  2. Keep commits/bookmarks as canonical history.
  3. Scope projections by ref/user.
  4. Use AshNostrSync for all cluster-relevant Kobold canonical resources.
  5. Ensure private means externally private, not cluster-local.
  6. Add routed plugin GraphQL schemas for generic repo actions.
  7. Add Parrhesia bulk sync/copy APIs and dataset/ref tags for fast clone.
  8. Add proposals and remote write review.
  9. Optimize dynamic-field queries only after real usage demonstrates need.