Kobold Architecture
This document describes the intended Kobold architecture: dynamic datasets with schema-as-data, commit-oriented history, fast local projections, Trust-gated tribe-to-tribe exchange, and generic GraphQL/repo-style APIs.
Kobold is not a SQL table generator and is not a second cluster-sync engine. It is a dynamic data/repo layer built on top of the host primitives that already exist in Tribes:
- Ash resources define Kobold's stable internal model.
- AshPostgres stores Kobold metadata, commits, refs, and projections.
- AshNostrSync syncs Kobold resources across nodes in the same tribe.
- Parrhesia/Nostr carries signed sync events and fast backfill.
- Trust + Tribes.Access decide which other tribes may discover, read, write, or propose changes.
The core invariant is:
Kobold visibility never limits same-tribe cluster sync. Visibility only limits external tribe access.
A private dataset, private commit, or private draft is still replicated inside the local tribe for durability and multi-node consistency. It is simply not advertised or exported to other tribes unless an access rule explicitly allows that.
Goals
Kobold should support this user flow:
- A user discovers a remote dataset.
- The user clones it into their tribe.
- The user explores it with the generic Kobold UI/API.
- The user adds fields/columns dynamically.
- The user makes private local drafts.
- The user installs a supportive plugin later for richer domain-specific UI.
- The same logical dataset remains in place.
- The user prepares a proposal/PR back to the upstream tribe.
This requires dynamic storage. User-defined columns cannot require new Elixir modules, migrations, or host recompilation.
High-level Shape
Dynamic dataset schema
-> ResourceDefinition fields as data
Local edits
-> Commit chunks
-> Bookmark/ref movement
-> Ref-scoped RecordProjection updates
-> AshNostrSync syncs within the tribe
External sharing
-> Kobold catalog/clone/proposal gateway
-> Trust + access checks
-> Parrhesia fast event paging/copy
Kobold's canonical data is the commit/ref history. Projections are materialized read models.
Dataset
ResourceDefinition
Commit
Bookmark / Ref
Proposal
RecordProjection
Stable Resources
Kobold itself uses a small fixed set of Ash resources. These are normal static Elixir modules and normal AshPostgres-backed tables.
Dataset
A dataset is the logical repo boundary.
It contains stable metadata such as:
- id
- name
- description
- owner_pubkey
- origin_tribe_pubkey
- origin_dataset_id
- schema_name
- schema_version
- external_visibility
- metadata
The dataset is the primary unit of:
- discovery
- clone/fork
- access policy
- supportive plugin compatibility
- proposal targeting
Dataset rows are low-churn. Record edits do not rewrite the dataset row.
ResourceDefinition
A resource definition describes one record type inside a dataset.
Example resources:
- SeedVariety
- SeedLot
- Supplier
Fields are schema-as-data:
{
"name": { "type": "string", "required": true },
"germination_days": { "type": "integer" },
"source": { "type": "string" }
}
Adding a column means updating the resource definition and recording that schema change as a commit. It does not require DDL or code generation.
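To make the schema-as-data idea concrete, here is a minimal sketch in Python (chosen only for illustration; Kobold itself is built on Ash). It treats a resource definition's fields as a plain map, adds a field without any DDL, and checks a record against the map. The helper names `add_field` and `validate_record` and the `TYPE_CHECKS` table are hypothetical, not part of Kobold.

```python
from copy import deepcopy

# Hypothetical mapping from field type names to Python types (an assumption).
TYPE_CHECKS = {"string": str, "integer": int}


def add_field(fields, field_name, definition):
    """Return a new field map with one extra field; the input is not mutated."""
    if field_name in fields:
        raise ValueError(f"field already exists: {field_name}")
    updated = deepcopy(fields)
    updated[field_name] = definition
    return updated


def validate_record(fields, record):
    """Return a list of problems; an empty list means the record is valid."""
    errors = []
    for name, definition in fields.items():
        if name not in record:
            if definition.get("required"):
                errors.append(f"missing required field: {name}")
            continue
        expected = TYPE_CHECKS.get(definition["type"])
        if expected is not None and not isinstance(record[name], expected):
            errors.append(f"wrong type for field: {name}")
    return errors


seed_variety = {
    "name": {"type": "string", "required": True},
    "germination_days": {"type": "integer"},
}
# Adding a column is a data update, not a migration.
with_source = add_field(seed_variety, "source", {"type": "string"})
```

In a real flow the result of `add_field` would be recorded as a `schema.add_field` operation inside a commit rather than applied directly.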
Commit
A commit is an immutable chunk of dataset operations.
A commit has:
- id / commit id
- change_id
- dataset_id
- parent_commit_ids
- author_user_id
- author_pubkey
- message
- visibility
- draft?
- operations
- metadata
The change_id is the stable identity of an evolving change. A new version of
the same logical change may have a new commit id while preserving the same
change id, matching the Jujutsu mental model.
Operations are stored as JSON data. Examples:
{
"op": "upsert",
"resource_name": "SeedVariety",
"record_id": "...",
"fields": {
"name": "Black Krim",
"germination_days": 8
}
}
{
"op": "delete",
"resource_name": "SeedVariety",
"record_id": "..."
}
{
"op": "schema.add_field",
"resource_name": "SeedVariety",
"field_name": "source",
"definition": { "type": "string" }
}
Commits are the canonical edit history. There is no separate per-record event table layered on top of AshNostrSync.
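The three operation shapes above can be interpreted by a small dispatcher. The following Python sketch is illustrative only: `apply_operations` is a hypothetical name, and it assumes that an upsert merges into existing field values, which the operation format does not specify.

```python
def apply_operations(schema, records, operations):
    """Apply one commit's operations in order, mutating schema and records.

    schema maps resource_name -> field definitions.
    records maps (resource_name, record_id) -> field values.
    """
    for op in operations:
        kind = op["op"]
        resource = op["resource_name"]
        if kind == "upsert":
            key = (resource, op["record_id"])
            # Assumption: an upsert merges new values over existing ones.
            records[key] = {**records.get(key, {}), **op["fields"]}
        elif kind == "delete":
            records.pop((resource, op["record_id"]), None)
        elif kind == "schema.add_field":
            schema.setdefault(resource, {})[op["field_name"]] = op["definition"]
        else:
            raise ValueError(f"unknown operation: {kind}")


schema = {"SeedVariety": {"name": {"type": "string"}}}
records = {}
apply_operations(schema, records, [
    {"op": "upsert", "resource_name": "SeedVariety", "record_id": "v1",
     "fields": {"name": "Black Krim", "germination_days": 8}},
    {"op": "schema.add_field", "resource_name": "SeedVariety",
     "field_name": "source", "definition": {"type": "string"}},
    {"op": "upsert", "resource_name": "SeedVariety", "record_id": "v1",
     "fields": {"source": "seed swap"}},
    {"op": "upsert", "resource_name": "SeedVariety", "record_id": "v2",
     "fields": {"name": "Brandywine"}},
    {"op": "delete", "resource_name": "SeedVariety", "record_id": "v2"},
])
```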
Bookmark / Ref
A bookmark is a named pointer to a commit.
Examples:
main
origin/<tribe_pubkey>/main
draft/user/<user_id>/<change_id>
proposal/<proposal_id>
A bookmark has:
- dataset_id
- name
- commit_id
- scope: main | draft | proposal | remote
- visibility: private | shared | public
- owner_user_id
- base_ref_name
- metadata
Bookmarks are the sharing boundary. The existence of a commit does not mean another tribe can fetch it. A remote tribe may fetch only commits reachable from bookmarks/proposals that are externally visible and allowed by policy.
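The reachability rule can be sketched as a graph walk from the externally visible bookmark heads. This Python sketch is a simplification under stated assumptions: `reachable_commits` and `exportable_commits` are hypothetical names, treating only `public` bookmarks as exportable is an assumption, and the real decision also involves Trust and Tribes.Access policy.

```python
def reachable_commits(parents, heads):
    """Collect every commit id reachable from the given heads via parent links."""
    seen = set()
    stack = list(heads)
    while stack:
        commit_id = stack.pop()
        if commit_id in seen:
            continue
        seen.add(commit_id)
        stack.extend(parents.get(commit_id, []))
    return seen


def exportable_commits(parents, bookmarks, allowed_visibilities=frozenset({"public"})):
    """Commits a remote tribe could fetch, before any policy check is applied."""
    heads = [b["commit_id"] for b in bookmarks if b["visibility"] in allowed_visibilities]
    return reachable_commits(parents, heads)


# commit id -> parent commit ids
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "d1": ["c2"]}
bookmarks = [
    {"name": "main", "commit_id": "c3", "visibility": "public"},
    {"name": "draft/user/alice/change-123", "commit_id": "d1", "visibility": "private"},
]
visible = exportable_commits(parents, bookmarks)
```

Here the draft commit `d1` exists in the same store as `c1` to `c3` but is never offered, because no exportable bookmark reaches it.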
Proposal
A proposal is a PR-like object that asks another tribe or local maintainer to review and accept a set of commits.
A proposal has:
- dataset_id
- source_ref_name
- source_head_commit_id
- target_ref_name
- author_user_id
- author_pubkey
- target_tribe_pubkey
- status: draft | submitted | accepted | rejected | closed
- message
- metadata
A proposal normally points at a proposal ref:
proposal/<proposal_id>
Remote tribes see proposal refs, not private local draft refs.
RecordProjection
A record projection is a materialized view for reading and UI rendering. It is not the canonical history.
A projection row has:
- dataset_id
- ref_name
- base_ref_name
- resource_name
- record_id
- fields
- deleted?
- head_commit_id
The uniqueness key is:
dataset_id + ref_name + resource_name + record_id
This lets multiple users have independent drafts without overwriting each other's working view.
Projection Strategy
Kobold uses ref-scoped projections.
Main projection
main is projected as a complete current-state read model:
ref_name = "main"
base_ref_name = null
Generic reads default to main.
Draft/proposal projections
Draft and proposal refs are delta projections over a base ref:
ref_name = "draft/user/alice/change-123"
base_ref_name = "main"
The draft projection stores only changed rows. Reads overlay draft rows on top of the base projection:
base rows from main
+ delta rows from draft/user/alice/change-123
- rows where delta.deleted? = true
This avoids copying an entire dataset per user draft.
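The overlay read can be sketched in a few lines. This Python example is illustrative: `overlay_read` is a hypothetical name, and it assumes each delta row carries the full field map for the record it changes, so a delta row simply replaces the base row with the same key.

```python
def overlay_read(base_rows, delta_rows):
    """Merge delta rows over base rows, dropping rows the delta marks deleted.

    Both maps are keyed by (resource_name, record_id); each value is a dict
    with "fields" and "deleted?" entries.
    """
    merged = dict(base_rows)
    merged.update(delta_rows)  # a delta row wins over the base row with the same key
    return {key: row for key, row in merged.items() if not row.get("deleted?")}


main_rows = {
    ("SeedVariety", "v1"): {"fields": {"name": "Black Krim"}, "deleted?": False},
    ("SeedVariety", "v2"): {"fields": {"name": "Brandywine"}, "deleted?": False},
}
draft_rows = {
    ("SeedVariety", "v2"): {"fields": {}, "deleted?": True},
    ("SeedVariety", "v3"): {"fields": {"name": "Green Zebra"}, "deleted?": False},
}
draft_view = overlay_read(main_rows, draft_rows)
```

The base projection is left untouched, so other users reading `main` or their own drafts are unaffected.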
Applying commits
A commit is applied to the projection for the ref it belongs to. Moving a bookmark changes which head commit defines that ref. Projection rebuilds replay commits reachable from the ref and materialize the resulting rows.
Projection rebuilds are safe because the canonical state is in commits and refs.
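A rebuild can be sketched as a parents-first replay of the commits reachable from a ref's head. The Python sketch below uses the standard library's `graphlib.TopologicalSorter` for ordering. `rebuild_projection` is a hypothetical name, and several behaviors are assumptions: upserts merge fields, deletes drop the row instead of setting `deleted?`, and the relative order of sibling branches after a merge is left unspecified.

```python
from graphlib import TopologicalSorter


def rebuild_projection(commits, head_commit_id):
    """Replay every commit reachable from head, parents first, into fresh rows."""
    # Walk parent links to find the commits that define this ref.
    graph, stack = {}, [head_commit_id]
    while stack:
        commit_id = stack.pop()
        if commit_id not in graph:
            graph[commit_id] = list(commits[commit_id]["parent_commit_ids"])
            stack.extend(graph[commit_id])

    rows = {}
    # static_order() yields each commit only after all of its parents.
    for commit_id in TopologicalSorter(graph).static_order():
        for op in commits[commit_id]["operations"]:
            if op["op"] == "upsert":
                key = (op["resource_name"], op["record_id"])
                previous = rows.get(key, {}).get("fields", {})
                rows[key] = {"fields": {**previous, **op["fields"]},
                             "head_commit_id": commit_id}
            elif op["op"] == "delete":
                rows.pop((op["resource_name"], op["record_id"]), None)
            # Schema operations do not change record rows in this sketch.
    return rows


history = {
    "c1": {"parent_commit_ids": [], "operations": [
        {"op": "upsert", "resource_name": "SeedVariety", "record_id": "v1",
         "fields": {"name": "Black Krim"}}]},
    "c2": {"parent_commit_ids": ["c1"], "operations": [
        {"op": "upsert", "resource_name": "SeedVariety", "record_id": "v1",
         "fields": {"germination_days": 8}},
        {"op": "upsert", "resource_name": "SeedVariety", "record_id": "v2",
         "fields": {"name": "Brandywine"}}]},
    "c3": {"parent_commit_ids": ["c2"], "operations": [
        {"op": "delete", "resource_name": "SeedVariety", "record_id": "v2"}]},
}
main_rows = rebuild_projection(history, "c3")
```

Because the function starts from empty rows every time, running it twice for the same head gives the same result, which is what makes discarding and rebuilding a projection safe.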
Edit Workflow
Local users may create private draft commits at any time, subject to local user permissions.
A normal local edit flow is:
user edits record
-> find or create private draft ref
-> create or amend a draft commit chunk
-> apply commit to the draft projection
-> move draft bookmark
-> AshNostrSync publishes the commit/bookmark inside the tribe
The draft is private externally but cluster-synced internally.
private draft != local-only storage
private draft == not externally visible
When the user is ready to share:
draft ref
-> proposal ref
-> proposal object
-> optional remote submission
Accepting a proposal moves or merges into the target ref, usually main.
Jujutsu-like Semantics
Kobold borrows these concepts from Jujutsu/Git-like systems:
- Commit: immutable operation chunk.
- Change: stable logical edit identity represented by change_id.
- Bookmark/ref: named pointer to a commit.
- Working copy: a user's draft ref and its projection.
- Proposal: a reviewable request to accept commits into another ref.
- Clone/fetch/pull: transfer dataset metadata, refs, and commits.
Unlike Git, the user-facing data model remains structured records and dynamic fields. Users do not manipulate files or raw patches.
Internal Cluster Sync
Kobold resources that define canonical state are AshNostrSync resources:
- Dataset
- ResourceDefinition
- Commit
- Bookmark
- Proposal
Record projections may be synced for speed or rebuilt locally from commits. The canonical source is always commits plus refs.
Cluster sync is independent of external visibility:
| Object | Cluster sync | External access |
|---|---|---|
| Public dataset | yes | policy-gated |
| Private dataset | yes | deny by default |
| Private draft | yes | not externally reachable |
| Proposal | yes | visible if submitted/shared and allowed |
Kobold must not suppress AshNostrSync publication merely because a dataset is private.
External Tribe Access
External access is controlled by Trust and Tribes.Access.
Kobold policy actions include:
- advertise: may this tribe see the dataset/ref in catalog?
- read: may this tribe fetch metadata, refs, and commits?
- write: may this tribe submit commits/proposals?
- admin: may this tribe administer sharing rules?
Default public dataset policy may allow:
subject_type = tribe
subject_id = *
action = read / advertise
condition = min_trust_score >= 0
Private datasets deny external access by default.
Remote access is evaluated against the remote tribe pubkey, not tribe name. Tribe names are display metadata and are not unique identities.
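The deny-by-default evaluation can be sketched as follows. This Python example is not the Tribes.Access API: `AccessRule` and `is_allowed` are hypothetical names, and the rule fields are modeled loosely on the example policy above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRule:
    subject_id: str           # a tribe pubkey, or "*" for any tribe
    actions: frozenset        # e.g. frozenset({"read", "advertise"})
    min_trust_score: int = 0


def is_allowed(rules, tribe_pubkey, action, trust_score):
    """Deny by default; allow only when a rule matches subject, action, and trust."""
    return any(
        rule.subject_id in ("*", tribe_pubkey)
        and action in rule.actions
        and trust_score >= rule.min_trust_score
        for rule in rules
    )


public_rules = [AccessRule("*", frozenset({"read", "advertise"}), min_trust_score=0)]
private_rules = []  # a private dataset starts with no external rules
```

The subject is always matched against the pubkey. A display name never appears in a rule, so two tribes with the same name cannot be confused.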
Clone and Fetch Over the Wire
A clone transfers canonical history, not projections.
The logical clone sequence is:
remote catalog/discovery
-> user chooses dataset
-> Kobold verifies advertise/read access
-> fetch Dataset metadata
-> fetch ResourceDefinitions
-> fetch allowed Bookmarks/refs
-> fetch reachable Commits
-> rebuild local projections
The initial clone envelope can be represented as:
{
"dataset": { "id": "...", "name": "Seed Catalog" },
"resources": [
{ "name": "SeedVariety", "fields": { "name": { "type": "string" } } }
],
"bookmarks": [
{ "name": "origin/<pubkey>/main", "commit_id": "..." }
],
"commits": [
{
"id": "...",
"change_id": "...",
"parent_commit_ids": [],
"operations": []
}
]
}
For real datasets, clone/fetch is paged rather than one large response.
Parrhesia Fast Copy and SYNC-PAGE
Kobold should use Parrhesia's optimized sync/backfill machinery for initial clone and incremental fetch.
The ideal host API is a set of bulk sync/copy primitives, for example:
Parrhesia.Sync.page_events(source, filter, opts)
Parrhesia.Sync.stream_events(source, filter, opts)
Parrhesia.Sync.copy_events(source, target, opts)
Parrhesia.Sync.import_events(events, opts)
These APIs should support:
- keyset cursors
- page size
- ascending/descending order
- batched event frames
- signature verification
- deduplication
- progress callbacks
- authorization callbacks
Parrhesia's SYNC-PAGE is a good low-level primitive for this because it pages
stored events by (created_at, event_id) and can return batched EVENTS frames.
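Keyset paging over (created_at, event_id) can be sketched in memory as below. This Python example only illustrates the cursor logic; `page_events` and `fetch_all` here are hypothetical stand-ins, not Parrhesia functions, and a real store would use an index on the key instead of sorting on every call.

```python
from typing import Optional


def event_key(event):
    """The ordering key: timestamp first, event id as the tie-breaker."""
    return (event["created_at"], event["event_id"])


def page_events(events, cursor: Optional[tuple], limit: int):
    """Return (page, next_cursor) in ascending key order.

    cursor is the key of the last event already delivered, or None to start.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    ordered = sorted(events, key=event_key)
    if cursor is not None:
        ordered = [e for e in ordered if event_key(e) > cursor]
    page = ordered[:limit]
    # A short page means the source is exhausted.
    next_cursor = event_key(page[-1]) if len(page) == limit else None
    return page, next_cursor


def fetch_all(events, limit):
    """Drain the source page by page, as a clone or incremental fetch would."""
    collected, cursor = [], None
    while True:
        page, cursor = page_events(events, cursor, limit)
        collected.extend(page)
        if cursor is None:
            return collected
```

Using the event id as a tie-breaker matters because many events can share one `created_at` value; a cursor on the timestamp alone would skip or repeat events at page boundaries.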
Kobold should not expose an unrestricted relay backfill to arbitrary remote tribes. The flow should be:
remote clone request
-> Kobold authenticates remote tribe
-> Kobold evaluates Trust/access
-> Kobold builds allowed event filters
-> Parrhesia fast pagination/copy runs under that authorization
-> receiver imports events and rebuilds projections
For efficient dataset-level clone, Kobold/AshNostrSync events need filterable tags such as:
["r", "plugins.kobold.commit"]
["dataset", dataset_id]
["ref", ref_name]
Without dataset/ref tags, clone would need to scan all Kobold commit events and filter by payload content, which does not scale.
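Tag-based selection can be sketched as below. This Python example is illustrative: `clone_filter` and `matches_tags` are hypothetical names, and the filter is a plain map of tag name to allowed values, not a relay filter format.

```python
def clone_filter(dataset_id, allowed_refs):
    """Tag constraints for one dataset and the refs the remote tribe may read."""
    return {
        "r": {"plugins.kobold.commit"},
        "dataset": {dataset_id},
        "ref": set(allowed_refs),
    }


def matches_tags(event, required):
    """True when, for every tag name in required, the event carries an allowed value."""
    for name, allowed in required.items():
        values = {tag[1] for tag in event["tags"] if len(tag) >= 2 and tag[0] == name}
        if not values & allowed:
            return False
    return True


main_event = {"tags": [["r", "plugins.kobold.commit"], ["dataset", "ds1"], ["ref", "main"]]}
draft_event = {"tags": [["r", "plugins.kobold.commit"], ["dataset", "ds1"],
                        ["ref", "draft/user/alice/change-123"]]}
allowed = clone_filter("ds1", {"main"})
```

Because the decision uses only tags, the sender never has to decode a commit payload to decide whether an event belongs in the clone.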
GraphQL
Kobold exposes a generic repo-style GraphQL API over its stable resources and actions.
The generic GraphQL schema is static:
type KoboldRecordProjection {
datasetId: ID!
refName: String!
resourceName: String!
recordId: ID!
fields: JSON!
headCommitId: ID
}
Dynamic user columns remain inside fields: JSON. Clients build dynamic forms
and tables by reading ResourceDefinition.fields.
GraphQL is a good fit for repo actions:
- koboldDatasets
- koboldDataset
- koboldResources
- koboldRecords
- koboldCommits
- koboldBookmarks
- koboldCreateDataset
- koboldUpsertRecord
- koboldCreateDraft
- koboldCommitDraft
- koboldCloneDataset
- koboldFetchDataset
- koboldCreateProposal
- koboldSubmitProposal
- koboldAcceptProposal
Kobold can use AshGraphql for these stable resources and actions.
Dynamic plugin schemas
The host should not need recompilation when a plugin is installed. A single merged Absinthe schema containing dynamically installed plugin domains is not the natural fit for AshGraphql.
Instead, the host exposes one GraphQL endpoint with routing:
POST /graphql
A request selects a schema using a header, query parameter, or GraphQL extensions payload:
{
"extensions": {
"tribes": {
"plugin": "kobold"
}
},
"query": "..."
}
The gateway dispatches to:
core schema -> TribesWeb.GraphqlSchema
plugin kobold schema -> KoboldWeb.GraphqlSchema
plugin trust schema -> TrustWeb.GraphqlSchema
This gives one endpoint without requiring host recompilation. Introspection is per routed schema rather than one globally merged graph.
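The dispatch step can be sketched as a lookup on the request's extensions payload. This Python example is a simplified illustration, not the host gateway: `select_schema` is a hypothetical name, schemas are represented by their module names as strings, and falling back to the core schema when no plugin is named is an assumption.

```python
# Registry of routed schemas; a plugin install would add an entry at runtime.
SCHEMAS = {
    "core": "TribesWeb.GraphqlSchema",
    "kobold": "KoboldWeb.GraphqlSchema",
    "trust": "TrustWeb.GraphqlSchema",
}


def select_schema(request_body, schemas, default="core"):
    """Pick the schema named in extensions.tribes.plugin, else the default."""
    extensions = request_body.get("extensions") or {}
    plugin = (extensions.get("tribes") or {}).get("plugin", default)
    if plugin not in schemas:
        raise LookupError(f"no GraphQL schema registered for plugin: {plugin}")
    return schemas[plugin]


kobold_request = {"extensions": {"tribes": {"plugin": "kobold"}}, "query": "..."}
core_request = {"query": "..."}
```

Since the registry is ordinary data, adding a plugin schema changes the routing table without touching the compiled core schema.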
Supportive Plugins
Supportive plugins do not own Kobold storage. They declare compatibility with a schema or dataset type and provide richer behavior on top of the same logical Kobold dataset.
A supportive plugin may provide:
- custom UI
- richer validation
- import/export flows
- computed views
- proposal review UI
- domain-specific actions
- optional typed GraphQL for known schema families
The generic Kobold dataset remains dynamic and schema-as-data. Installing a supportive plugin improves the experience but does not migrate the dataset into plugin-owned tables.
This preserves the desired flow:
clone with generic Kobold
-> explore/edit dynamically
-> install supportive plugin later
-> continue using the same dataset identity/history
Query and Performance Model
The basic storage model is JSONB/map-based projections. This is deliberate so users can add fields dynamically.
Important indexes include:
RecordProjection(dataset_id, ref_name, resource_name, record_id) unique
RecordProjection(dataset_id, ref_name, resource_name)
RecordProjection(head_commit_id)
Commit(dataset_id, inserted_at)
Commit(dataset_id, change_id)
Bookmark(dataset_id, name) unique
For common dynamic-field queries, Kobold can add incremental optimizations:
- JSONB GIN indexes for broad containment queries.
- Promoted expression indexes for hot fields.
- A secondary field index table for heavily queried fields.
- Supportive-plugin views for known schema families.
The first implementation should favor simple, correct dynamic behavior over runtime DDL.
Commit Sizing
Commits are chunks. They should not be unbounded.
Guidelines:
- Small UI edit: one commit/change.
- Bulk import: chunked commits, e.g. 100-1000 operations each.
- Large files/blobs: content-addressed attachment references, not inline JSON.
Chunking keeps sync retry behavior and projection rebuilds manageable.
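A bulk import can be split into a chain of bounded commits roughly as follows. This Python sketch is illustrative: `chunked_commits` is a hypothetical name, the default of 500 operations is an arbitrary value inside the suggested range, and real commits carry more fields than shown.

```python
import uuid


def chunked_commits(operations, parent_commit_id=None, max_ops=500):
    """Split operations into commits of at most max_ops, each parented on the previous."""
    if max_ops < 1:
        raise ValueError("max_ops must be at least 1")
    commits = []
    for start in range(0, len(operations), max_ops):
        commit = {
            "id": str(uuid.uuid4()),
            "parent_commit_ids": [parent_commit_id] if parent_commit_id else [],
            "operations": operations[start:start + max_ops],
        }
        commits.append(commit)
        parent_commit_id = commit["id"]  # the next chunk builds on this one
    return commits


import_ops = [
    {"op": "upsert", "resource_name": "SeedVariety", "record_id": str(n),
     "fields": {"name": f"Variety {n}"}}
    for n in range(1200)
]
chain = chunked_commits(import_ops, parent_commit_id="base", max_ops=500)
```

If sync fails partway through, only the chunks that did not arrive need to be retried, and each one is small enough to apply to a projection quickly.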
Security Boundaries
Kobold has three distinct access scopes:
Same-tribe cluster sync
Nodes in the same tribe receive Kobold sync events according to cluster sync rules. Kobold dataset visibility does not block this.
Local users
Local users may see/edit datasets and draft refs according to local user/admin permissions. A user's private draft ref is not visible to other users unless shared locally.
Remote tribes
Remote tribes are identified by pubkey. They may see externally visible refs and commits only when Trust/access policies allow the requested action.
Remote write access should normally create a proposal or signed write request. The owning tribe validates and accepts it into its own history. Remote writes do not blindly mutate the owner's main ref.
API Compatibility
Kobold may expose compatibility shapes for older tests or clients, such as an
events array derived from commits. These are API views, not separate canonical
storage.
The canonical edit model is:
Commit + Bookmark + Projection
not:
DatasetEvent + Projection
Implementation Priorities
- Keep dynamic datasets schema-as-data.
- Keep commits/bookmarks as canonical history.
- Scope projections by ref/user.
- Use AshNostrSync for all cluster-relevant Kobold canonical resources.
- Ensure private means externally private, not cluster-local.
- Add routed plugin GraphQL schemas for generic repo actions.
- Add Parrhesia bulk sync/copy APIs and dataset/ref tags for fast clone.
- Add proposals and remote write review.
- Optimize dynamic-field queries only after real usage demonstrates need.