docs: describe Kobold architecture
Document the intended dynamic dataset architecture with commit chunks, ref-scoped projections, Trust-gated external access, routed GraphQL schemas, and Parrhesia-backed clone/fetch flows.
2026-05-29 13:44:01 +02:00


# Kobold Architecture
This document describes the intended Kobold architecture: dynamic datasets with
schema-as-data, commit-oriented history, fast local projections, Trust-gated
tribe-to-tribe exchange, and generic GraphQL/repo-style APIs.
Kobold is not a SQL table generator and is not a second cluster-sync engine. It
is a dynamic data/repo layer built on top of the host primitives that already
exist in Tribes:
- **Ash resources** define Kobold's stable internal model.
- **AshPostgres** stores Kobold metadata, commits, refs, and projections.
- **AshNostrSync** syncs Kobold resources across nodes in the same tribe.
- **Parrhesia/Nostr** carries signed sync events and fast backfill.
- **Trust + Tribes.Access** decide which other tribes may discover, read, write,
  or propose changes.

The core invariant is:

> Kobold visibility never limits same-tribe cluster sync. Visibility only limits
> external tribe access.

A private dataset, private commit, or private draft is still replicated inside
the local tribe for durability and multi-node consistency. It is simply not
advertised or exported to other tribes unless an access rule explicitly allows
that.
## Goals
Kobold should support this user flow:
1. A user discovers a remote dataset.
2. The user clones it into their tribe.
3. The user explores it with the generic Kobold UI/API.
4. The user adds fields/columns dynamically.
5. The user makes private local drafts.
6. The user installs a supportive plugin later for richer domain-specific UI.
7. The same logical dataset remains in place.
8. The user prepares a proposal/PR back to the upstream tribe.

This requires dynamic storage: user-defined columns must not require new Elixir
modules, migrations, or host recompilation.
## High-level Shape
```text
Dynamic dataset schema
  -> ResourceDefinition fields as data

Local edits
  -> Commit chunks
  -> Bookmark/ref movement
  -> Ref-scoped RecordProjection updates
  -> AshNostrSync syncs within the tribe

External sharing
  -> Kobold catalog/clone/proposal gateway
  -> Trust + access checks
  -> Parrhesia fast event paging/copy
```
Kobold's canonical data is the commit/ref history. Projections are materialized
read models.
```text
Dataset
  ResourceDefinition
  Commit
  Bookmark / Ref
  Proposal
  RecordProjection
```
## Stable Resources
Kobold itself uses a small fixed set of Ash resources. These are normal static
Elixir modules and normal AshPostgres-backed tables.
### Dataset
A dataset is the logical repo boundary.
It contains stable metadata such as:
- `id`
- `name`
- `description`
- `owner_pubkey`
- `origin_tribe_pubkey`
- `origin_dataset_id`
- `schema_name`
- `schema_version`
- `external_visibility`
- `metadata`

The dataset is the primary unit of:
- discovery
- clone/fork
- access policy
- supportive plugin compatibility
- proposal targeting

Dataset rows are low-churn. Record edits do not rewrite the dataset row.
### ResourceDefinition
A resource definition describes one record type inside a dataset.
Example resources:
- `SeedVariety`
- `SeedLot`
- `Supplier`

Fields are schema-as-data:
```json
{
  "name": { "type": "string", "required": true },
  "germination_days": { "type": "integer" },
  "source": { "type": "string" }
}
```
Adding a column means updating the resource definition and recording that schema
change as a commit. It does not require DDL or code generation.
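As a rough illustration of schema-as-data, the sketch below validates a record against a field-definition map and adds a column by returning an updated map. It is written in Python only for brevity; Kobold itself is Elixir/Ash. The helper names `validate_record` and `add_field`, the `TYPE_CHECKS` table, and the error strings are all hypothetical and are not part of Kobold.

```python
from copy import deepcopy

# Hypothetical mapping from field type names to Python types (an assumption).
TYPE_CHECKS = {"string": str, "integer": int}


def validate_record(definition, fields):
    """Return a list of error strings for `fields` checked against `definition`."""
    errors = []
    # Every field marked as required must be present.
    for name, spec in definition.items():
        if spec.get("required") and name not in fields:
            errors.append(f"missing required field: {name}")
    # Every supplied field must be declared and have the declared type.
    for name, value in fields.items():
        spec = definition.get(name)
        if spec is None:
            errors.append(f"unknown field: {name}")
        elif not isinstance(value, TYPE_CHECKS[spec["type"]]):
            errors.append(f"wrong type for {name}: expected {spec['type']}")
    return errors


def add_field(definition, field_name, spec):
    """Return a new definition with one extra field; the input is not mutated."""
    if field_name in definition:
        raise ValueError(f"field already exists: {field_name}")
    updated = deepcopy(definition)
    updated[field_name] = spec
    return updated


seed_variety = {
    "name": {"type": "string", "required": True},
    "germination_days": {"type": "integer"},
}
with_source = add_field(seed_variety, "source", {"type": "string"})
```

Adding `source` here changes only data: no module, migration, or table definition is touched, which is the property the dynamic storage model depends on.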
### Commit
A commit is an immutable chunk of dataset operations.
A commit has:
- `id` / commit id
- `change_id`
- `dataset_id`
- `parent_commit_ids`
- `author_user_id`
- `author_pubkey`
- `message`
- `visibility`
- `draft?`
- `operations`
- `metadata`

The `change_id` is the stable identity of an evolving change. A new version of
the same logical change may have a new commit id while preserving the same
change id, matching the Jujutsu mental model.
Operations are stored as JSON data. Examples:
```json
{
  "op": "upsert",
  "resource_name": "SeedVariety",
  "record_id": "...",
  "fields": {
    "name": "Black Krim",
    "germination_days": 8
  }
}
```
```json
{
  "op": "delete",
  "resource_name": "SeedVariety",
  "record_id": "..."
}
```
```json
{
  "op": "schema.add_field",
  "resource_name": "SeedVariety",
  "field_name": "source",
  "definition": { "type": "string" }
}
```
Commits are the canonical edit history. There is no separate per-record event
table layered on top of AshNostrSync.
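To make the three operation shapes above concrete, here is a minimal Python sketch of an interpreter that applies a commit's operations to an in-memory state. The state layout and the function names are hypothetical. The choice that `upsert` merges fields over an existing record, rather than replacing it, is an assumption that the source does not settle.

```python
def apply_operation(state, op):
    """Apply one commit operation to `state` in place.

    `state` is a hypothetical in-memory shape:
      {"schema": {resource_name: {field_name: definition}},
       "records": {(resource_name, record_id): fields}}
    """
    kind = op["op"]
    resource = op["resource_name"]
    if kind == "upsert":
        key = (resource, op["record_id"])
        # Assumption: an upsert merges the given fields over existing ones.
        merged = dict(state["records"].get(key, {}))
        merged.update(op["fields"])
        state["records"][key] = merged
    elif kind == "delete":
        # Deleting a record that does not exist is treated as a no-op.
        state["records"].pop((resource, op["record_id"]), None)
    elif kind == "schema.add_field":
        fields = state["schema"].setdefault(resource, {})
        fields[op["field_name"]] = op["definition"]
    else:
        raise ValueError(f"unknown operation: {kind}")


def apply_commit(state, commit):
    """Apply every operation of a commit in order and return the state."""
    for op in commit["operations"]:
        apply_operation(state, op)
    return state
```

Because a commit is just an ordered list of such operations, the same interpreter can serve record edits and schema changes without a separate per-record event log.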
### Bookmark / Ref
A bookmark is a named pointer to a commit.
Examples:
```text
main
origin/<tribe_pubkey>/main
draft/user/<user_id>/<change_id>
proposal/<proposal_id>
```
A bookmark has:
- `dataset_id`
- `name`
- `commit_id`
- `scope`: `main | draft | proposal | remote`
- `visibility`: `private | shared | public`
- `owner_user_id`
- `base_ref_name`
- `metadata`

Bookmarks are the sharing boundary. The existence of a commit does not mean
another tribe can fetch it. A remote tribe may fetch only commits reachable from
bookmarks/proposals that are externally visible and allowed by policy.
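The reachability rule can be sketched as a graph walk that starts only from bookmarks the caller is allowed to see. This Python sketch is illustrative: `exportable_commit_ids` and the `is_allowed` callback are hypothetical names, and the callback stands in for the real visibility and policy check.

```python
def exportable_commit_ids(commits, bookmarks, is_allowed):
    """Return ids of commits reachable from bookmarks that pass `is_allowed`.

    commits:    {commit_id: {"parent_commit_ids": [...]}}
    bookmarks:  list of dicts with at least "commit_id"
    is_allowed: callable(bookmark) -> bool, a stand-in for the policy check
    """
    pending = [b["commit_id"] for b in bookmarks if is_allowed(b)]
    reachable = set()
    while pending:
        commit_id = pending.pop()
        # Skip commits already visited and parents that are not stored locally.
        if commit_id in reachable or commit_id not in commits:
            continue
        reachable.add(commit_id)
        pending.extend(commits[commit_id]["parent_commit_ids"])
    return reachable


commits = {
    "c1": {"parent_commit_ids": []},
    "c2": {"parent_commit_ids": ["c1"]},
    "c3": {"parent_commit_ids": ["c2"]},
}
bookmarks = [
    {"name": "main", "commit_id": "c2", "visibility": "public"},
    {"name": "draft/user/alice/change-123", "commit_id": "c3", "visibility": "private"},
]
visible = exportable_commit_ids(commits, bookmarks, lambda b: b["visibility"] == "public")
```

In this example the private draft commit `c3` is stored locally but is never part of the exportable set, because no allowed bookmark reaches it.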
### Proposal
A proposal is a PR-like object that asks another tribe or local maintainer to
review and accept a set of commits.
A proposal has:
- `dataset_id`
- `source_ref_name`
- `source_head_commit_id`
- `target_ref_name`
- `author_user_id`
- `author_pubkey`
- `target_tribe_pubkey`
- `status`: `draft | submitted | accepted | rejected | closed`
- `message`
- `metadata`

A proposal normally points at a proposal ref:
```text
proposal/<proposal_id>
```
Remote tribes see proposal refs, not private local draft refs.
### RecordProjection
A record projection is a materialized view for reading and UI rendering. It is
not the canonical history.
A projection row has:
- `dataset_id`
- `ref_name`
- `base_ref_name`
- `resource_name`
- `record_id`
- `fields`
- `deleted?`
- `head_commit_id`

The uniqueness key is:
```text
dataset_id + ref_name + resource_name + record_id
```
This lets multiple users have independent drafts without overwriting each
other's working view.
## Projection Strategy
Kobold uses ref-scoped projections.
### Main projection
`main` is projected as a complete current-state read model:
```text
ref_name = "main"
base_ref_name = null
```
Generic reads default to `main`.
### Draft/proposal projections
Draft and proposal refs are delta projections over a base ref:
```text
ref_name = "draft/user/alice/change-123"
base_ref_name = "main"
```
The draft projection stores only changed rows. Reads overlay draft rows on top
of the base projection:
```text
base rows from main
+ delta rows from draft/user/alice/change-123
- rows where delta.deleted? = true
```
This avoids copying an entire dataset per user draft.
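The overlay read can be sketched in a few lines of Python. The function name `read_overlay` and the row dictionaries are hypothetical, and the sketch assumes each delta row carries the full field map for its record rather than a per-field patch.

```python
def read_overlay(base_rows, delta_rows):
    """Overlay draft delta rows on base rows and drop deleted records.

    Each row is a dict with "resource_name", "record_id", "fields", "deleted?".
    """
    merged = {(r["resource_name"], r["record_id"]): r for r in base_rows}
    for row in delta_rows:
        # A delta row replaces the base row with the same key, or adds a new one.
        merged[(row["resource_name"], row["record_id"])] = row
    return [row for row in merged.values() if not row["deleted?"]]


def row(record_id, name, deleted=False):
    """Build a hypothetical SeedVariety projection row for the example."""
    return {"resource_name": "SeedVariety", "record_id": record_id,
            "fields": {"name": name}, "deleted?": deleted}


main_rows = [row("r1", "Black Krim"), row("r2", "Brandywine")]
draft_rows = [row("r1", "Black Krim (renamed)"), row("r2", "Brandywine", deleted=True),
              row("r3", "Green Zebra")]
view = read_overlay(main_rows, draft_rows)
```

Here the draft stores three rows regardless of how large `main` is, and the merged view contains the renamed `r1` and the new `r3` while `r2` is hidden.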
### Applying commits
A commit is applied to the projection for the ref it belongs to. Moving a
bookmark changes which head commit defines that ref. Projection rebuilds replay
commits reachable from the ref and materialize the resulting rows.
Projection rebuilds are safe because the canonical state is in commits and refs.
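A rebuild can be sketched as a parents-first replay of the commits reachable from a ref's head. The Python below is a sketch under several assumptions: the function names are hypothetical, `head_commit_id` is taken to mean the head the projection was built at, an upsert merges over live fields, and conflict handling for diverged parents is left out.

```python
def replay_order(commits, head_id):
    """Return commit ids reachable from `head_id`, parents before children."""
    ordered, seen = [], set()
    stack = [(head_id, False)]
    while stack:
        commit_id, parents_done = stack.pop()
        if parents_done:
            ordered.append(commit_id)
            continue
        if commit_id in seen:
            continue
        seen.add(commit_id)
        # Revisit this commit after all of its parents have been emitted.
        stack.append((commit_id, True))
        for parent_id in reversed(commits[commit_id]["parent_commit_ids"]):
            stack.append((parent_id, False))
    return ordered


def rebuild_projection(commits, head_id, ref_name):
    """Materialize projection rows for `ref_name` by replaying its history."""
    rows = {}
    for commit_id in replay_order(commits, head_id):
        for op in commits[commit_id]["operations"]:
            if op["op"] not in ("upsert", "delete"):
                continue  # schema operations produce no projection rows
            key = (op["resource_name"], op["record_id"])
            previous = rows.get(key)
            if op["op"] == "delete":
                fields = previous["fields"] if previous else {}
            else:
                # Assumption: start fresh if the record is absent or deleted.
                live = previous is not None and not previous["deleted?"]
                fields = {**(previous["fields"] if live else {}), **op["fields"]}
            rows[key] = {
                "ref_name": ref_name,
                "resource_name": op["resource_name"],
                "record_id": op["record_id"],
                "fields": fields,
                "deleted?": op["op"] == "delete",
                "head_commit_id": head_id,
            }
    return rows
```

Running the rebuild twice over the same commits and head yields the same rows, which is why a projection can be dropped and recreated without losing canonical state.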
## Edit Workflow
Local users may create private draft commits at any time, subject to local user
permissions.
A normal local edit flow is:
```text
user edits record
-> find or create private draft ref
-> create or amend a draft commit chunk
-> apply commit to the draft projection
-> move draft bookmark
-> AshNostrSync publishes the commit/bookmark inside the tribe
```
The draft is private externally but cluster-synced internally.
```text
private draft != local-only storage
private draft == not externally visible
```
When the user is ready to share:
```text
draft ref
-> proposal ref
-> proposal object
-> optional remote submission
```
Accepting a proposal moves or merges into the target ref, usually `main`.
## Jujutsu-like Semantics
Kobold borrows these concepts from Jujutsu/Git-like systems:
- **Commit**: immutable operation chunk.
- **Change**: stable logical edit identity represented by `change_id`.
- **Bookmark/ref**: named pointer to a commit.
- **Working copy**: a user's draft ref and its projection.
- **Proposal**: a reviewable request to accept commits into another ref.
- **Clone/fetch/pull**: transfer dataset metadata, refs, and commits.

Unlike Git, the user-facing data model remains structured records and dynamic
fields. Users do not manipulate files or raw patches.
## Internal Cluster Sync
Kobold resources that define canonical state are AshNostrSync resources:
- `Dataset`
- `ResourceDefinition`
- `Commit`
- `Bookmark`
- `Proposal`

Record projections may be synced for speed or rebuilt locally from commits. The
canonical source is always commits plus refs.

Cluster sync is independent of external visibility:

| Object | Cluster sync | External access |
| --- | --- | --- |
| Public dataset | yes | policy-gated |
| Private dataset | yes | deny by default |
| Private draft | yes | not externally reachable |
| Proposal | yes | visible if submitted/shared and allowed |

Kobold must not suppress AshNostrSync publication merely because a dataset is
private.
## External Tribe Access
External access is controlled by Trust and `Tribes.Access`.
Kobold policy actions include:
- `advertise`: may this tribe see the dataset/ref in the catalog?
- `read`: may this tribe fetch metadata, refs, and commits?
- `write`: may this tribe submit commits/proposals?
- `admin`: may this tribe administer sharing rules?

The default policy for a public dataset may allow:
```text
subject_type = tribe
subject_id = *
action = read / advertise
condition = min_trust_score >= 0
```
Private datasets deny external access by default.
Remote access is evaluated against the remote tribe's pubkey, not its name.
Tribe names are display metadata and are not unique identities.
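A deny-by-default evaluation of rules shaped like the example above might look like the following Python sketch. It is not the `Tribes.Access` API: the `Rule` class, the `is_allowed` function, and the numeric trust score are hypothetical stand-ins for whatever Trust actually provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    subject_type: str       # only "tribe" is handled in this sketch
    subject_id: str         # a tribe pubkey, or "*" for any tribe
    actions: frozenset      # e.g. frozenset({"read", "advertise"})
    min_trust_score: float  # hypothetical numeric trust threshold


def is_allowed(rules, tribe_pubkey, action, trust_score):
    """Deny by default; allow when any rule matches pubkey, action, and trust."""
    for rule in rules:
        if rule.subject_type != "tribe":
            continue
        if rule.subject_id not in ("*", tribe_pubkey):
            continue
        if action in rule.actions and trust_score >= rule.min_trust_score:
            return True
    return False


# A public dataset: any tribe with a non-negative score may read and discover it.
public_rules = [Rule("tribe", "*", frozenset({"read", "advertise"}), 0)]
# A private dataset has no rules, so every external request is denied.
private_rules = []
```

Matching is done on the pubkey string only, so two tribes that share a display name can never be confused by the check.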
## Clone and Fetch Over the Wire
A clone transfers canonical history, not projections.
The logical clone sequence is:
```text
remote catalog/discovery
-> user chooses dataset
-> Kobold verifies advertise/read access
-> fetch Dataset metadata
-> fetch ResourceDefinitions
-> fetch allowed Bookmarks/refs
-> fetch reachable Commits
-> rebuild local projections
```
The initial clone envelope can be represented as:
```json
{
  "dataset": { "id": "...", "name": "Seed Catalog" },
  "resources": [
    { "name": "SeedVariety", "fields": { "name": { "type": "string" } } }
  ],
  "bookmarks": [
    { "name": "origin/<pubkey>/main", "commit_id": "..." }
  ],
  "commits": [
    {
      "id": "...",
      "change_id": "...",
      "parent_commit_ids": [],
      "operations": []
    }
  ]
}
```
For real datasets, clone/fetch is paged rather than one large response.
## Parrhesia Fast Copy and `SYNC-PAGE`
Kobold should use Parrhesia's optimized sync/backfill machinery for initial
clone and incremental fetch.
The ideal host API is a set of bulk sync/copy primitives, for example:
```elixir
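# Proposed primitives; names and arguments are illustrative.
Parrhesia.Sync.page_events(source, filter, opts)    # fetch one page of matching events
Parrhesia.Sync.stream_events(source, filter, opts)  # stream matching events lazily
Parrhesia.Sync.copy_events(source, target, opts)    # bulk copy from source to target
Parrhesia.Sync.import_events(events, opts)          # import an already fetched batch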
```
These APIs should support:
- keyset cursors
- page size
- ascending/descending order
- batched event frames
- signature verification
- deduplication
- progress callbacks
- authorization callbacks

Parrhesia's `SYNC-PAGE` is a good low-level primitive for this because it pages
stored events by `(created_at, event_id)` and can return batched `EVENTS` frames.
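Paging by the pair `(created_at, event_id)` is a keyset cursor. The Python sketch below shows the idea over an in-memory list; it does not reproduce the `SYNC-PAGE` wire format, and the function names and the dictionary shape of an event are hypothetical.

```python
def page_events(events, after=None, limit=2):
    """Return (page, next_cursor) ordered by (created_at, id).

    `after` is the cursor of the last event already received, or None.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    ordered = sorted(events, key=lambda e: (e["created_at"], e["id"]))
    if after is not None:
        # Strictly greater than the cursor, so ties on created_at are kept.
        ordered = [e for e in ordered if (e["created_at"], e["id"]) > after]
    page = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = (page[-1]["created_at"], page[-1]["id"]) if has_more else None
    return page, next_cursor


def fetch_all(events, limit=2):
    """Drain all pages, as a receiver would during clone or fetch."""
    cursor, collected = None, []
    while True:
        page, cursor = page_events(events, cursor, limit)
        collected.extend(page)
        if cursor is None:
            return collected
```

Using the event id as a tie-breaker matters because many events can share one `created_at` value; a cursor on the timestamp alone would either skip or repeat events at a page boundary.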
Kobold should not expose an unrestricted relay backfill to arbitrary remote
tribes. The flow should be:
```text
remote clone request
-> Kobold authenticates remote tribe
-> Kobold evaluates Trust/access
-> Kobold builds allowed event filters
-> Parrhesia fast pagination/copy runs under that authorization
-> receiver imports events and rebuilds projections
```
For efficient dataset-level clone, Kobold/AshNostrSync events need filterable
tags such as:
```text
["r", "plugins.kobold.commit"]
["dataset", dataset_id]
["ref", ref_name]
```
Without dataset/ref tags, clone would need to scan all Kobold commit events and
filter by payload content, which does not scale.
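The matching semantics of such tags can be sketched as follows. This Python example only illustrates which events a dataset-level or ref-level filter would select; a real relay would answer the filter from an index rather than a linear scan. The event dictionaries and function names are hypothetical.

```python
def has_tag(event, name, value):
    """True when the event carries a tag whose first two items are name, value."""
    return any(tag[:2] == [name, value] for tag in event["tags"])


def select_commit_events(events, dataset_id, ref_name=None):
    """Select Kobold commit events for one dataset, optionally for one ref."""
    wanted = [("r", "plugins.kobold.commit"), ("dataset", dataset_id)]
    if ref_name is not None:
        wanted.append(("ref", ref_name))
    return [e for e in events if all(has_tag(e, n, v) for n, v in wanted)]


events = [
    {"id": "e1", "tags": [["r", "plugins.kobold.commit"], ["dataset", "ds-1"], ["ref", "main"]]},
    {"id": "e2", "tags": [["r", "plugins.kobold.commit"], ["dataset", "ds-2"], ["ref", "main"]]},
    {"id": "e3", "tags": [["r", "plugins.kobold.commit"], ["dataset", "ds-1"], ["ref", "proposal/42"]]},
    {"id": "e4", "tags": [["r", "plugins.kobold.bookmark"], ["dataset", "ds-1"]]},
]
main_only = select_commit_events(events, "ds-1", "main")
```

Because the decision uses tags alone, the event payload never has to be decoded to decide whether an event belongs to the clone.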
## GraphQL
Kobold exposes a generic repo-style GraphQL API over its stable resources and
actions.
The generic GraphQL schema is static:
```graphql
type KoboldRecordProjection {
  datasetId: ID!
  refName: String!
  resourceName: String!
  recordId: ID!
  fields: JSON!
  headCommitId: ID
}
```
Dynamic user columns remain inside `fields: JSON`. Clients build dynamic forms
and tables by reading `ResourceDefinition.fields`.
GraphQL is a good fit for repo actions:
- `koboldDatasets`
- `koboldDataset`
- `koboldResources`
- `koboldRecords`
- `koboldCommits`
- `koboldBookmarks`
- `koboldCreateDataset`
- `koboldUpsertRecord`
- `koboldCreateDraft`
- `koboldCommitDraft`
- `koboldCloneDataset`
- `koboldFetchDataset`
- `koboldCreateProposal`
- `koboldSubmitProposal`
- `koboldAcceptProposal`

Kobold can use `AshGraphql` for these stable resources and actions.
### Dynamic plugin schemas
The host should not need recompilation when a plugin is installed. Absinthe
schemas, including those generated by AshGraphql, are built at compile time, so
a single merged schema that absorbs dynamically installed plugin domains is not
a natural fit.
Instead, the host exposes one GraphQL endpoint with routing:
```text
POST /graphql
```
A request selects a schema using a header, query parameter, or GraphQL
extensions payload:
```json
{
  "extensions": {
    "tribes": {
      "plugin": "kobold"
    }
  },
  "query": "..."
}
```
The gateway dispatches to:
```text
core schema -> TribesWeb.GraphqlSchema
plugin kobold schema -> KoboldWeb.GraphqlSchema
plugin trust schema -> TrustWeb.GraphqlSchema
```
This gives one endpoint without requiring host recompilation. Introspection is
per routed schema rather than one globally merged graph.
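The dispatch step can be sketched as a lookup keyed by the requested plugin. In this Python sketch the schema names are the ones listed above, while the `x-tribes-plugin` header, the `plugin` query parameter, and the precedence between the three selectors are assumptions made for illustration.

```python
# Registry of routed schemas; None selects the core schema.
SCHEMAS = {
    None: "TribesWeb.GraphqlSchema",
    "kobold": "KoboldWeb.GraphqlSchema",
    "trust": "TrustWeb.GraphqlSchema",
}


def select_schema(body, headers=None, query_params=None):
    """Pick the schema name for one GraphQL request.

    Assumed precedence: extensions payload, then header, then query parameter.
    """
    headers = headers or {}
    query_params = query_params or {}
    tribes_ext = (body.get("extensions") or {}).get("tribes") or {}
    plugin = (
        tribes_ext.get("plugin")
        or headers.get("x-tribes-plugin")   # hypothetical header name
        or query_params.get("plugin")       # hypothetical parameter name
    )
    if plugin not in SCHEMAS:
        raise LookupError(f"no GraphQL schema registered for plugin: {plugin}")
    return SCHEMAS[plugin]


request_body = {"extensions": {"tribes": {"plugin": "kobold"}}, "query": "..."}
selected = select_schema(request_body)
```

Registering a plugin then amounts to adding one entry to the registry at runtime, and each routed schema is introspected on its own.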
## Supportive Plugins
Supportive plugins do not own Kobold storage. They declare compatibility with a
schema or dataset type and provide richer behavior on top of the same logical
Kobold dataset.
A supportive plugin may provide:
- custom UI
- richer validation
- import/export flows
- computed views
- proposal review UI
- domain-specific actions
- optional typed GraphQL for known schema families

The generic Kobold dataset remains dynamic and schema-as-data. Installing a
supportive plugin improves the experience but does not migrate the dataset into
plugin-owned tables.
This preserves the desired flow:
```text
clone with generic Kobold
-> explore/edit dynamically
-> install supportive plugin later
-> continue using the same dataset identity/history
```
## Query and Performance Model
The basic storage model is JSONB/map-based projections. This is deliberate so
users can add fields dynamically.
Important indexes include:
```text
RecordProjection(dataset_id, ref_name, resource_name, record_id) unique
RecordProjection(dataset_id, ref_name, resource_name)
RecordProjection(head_commit_id)
Commit(dataset_id, inserted_at)
Commit(dataset_id, change_id)
Bookmark(dataset_id, name) unique
```
For common dynamic-field queries, Kobold can add incremental optimizations:
1. JSONB GIN indexes for broad containment queries.
2. Promoted expression indexes for hot fields.
3. A secondary field index table for heavily queried fields.
4. Supportive-plugin views for known schema families.

The first implementation should favor simple, correct dynamic behavior over
runtime DDL.
## Commit Sizing
Commits are chunks. They should not be unbounded.
Guidelines:
- Small UI edit: one commit/change.
- Bulk import: chunked commits, e.g. 100-1000 operations each.
- Large files/blobs: content-addressed attachment references, not inline JSON.

Chunking keeps sync retries and projection rebuilds manageable, because a failed
or replayed chunk stays small.
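A bulk import could be split into a linear chain of bounded commits roughly as in the Python sketch below. The function names, the `max_ops` default of 500 (picked from within the 100-1000 guideline), and the generated commit ids are all hypothetical.

```python
def chunk_operations(operations, max_ops=500):
    """Split a list of operations into consecutive chunks of at most max_ops."""
    if max_ops < 1:
        raise ValueError("max_ops must be at least 1")
    return [operations[i:i + max_ops] for i in range(0, len(operations), max_ops)]


def build_import_commits(operations, id_prefix, max_ops=500):
    """Build a linear chain of commits, one per chunk of operations."""
    commits = []
    parent_ids = []
    for index, chunk in enumerate(chunk_operations(operations, max_ops)):
        commit_id = f"{id_prefix}-{index}"  # placeholder id scheme for the sketch
        commits.append({
            "id": commit_id,
            "parent_commit_ids": parent_ids,
            "operations": chunk,
        })
        # The next chunk builds on top of this one.
        parent_ids = [commit_id]
    return commits


rows = [
    {"op": "upsert", "resource_name": "SeedVariety",
     "record_id": f"r{n}", "fields": {"name": f"Variety {n}"}}
    for n in range(1200)
]
import_commits = build_import_commits(rows, "import")
```

With 1200 operations this produces three commits of 500, 500, and 200 operations, so a failed transfer has to resend at most one chunk.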
## Security Boundaries
Kobold has three distinct access scopes:
### Same-tribe cluster sync
Nodes in the same tribe receive Kobold sync events according to cluster sync
rules. Kobold dataset visibility does not block this.
### Local users
Local users may see/edit datasets and draft refs according to local user/admin
permissions. A user's private draft ref is not visible to other users unless
shared locally.
### Remote tribes
Remote tribes are identified by pubkey. They may only see externally visible
refs and commits if Trust/access policies allow the requested action.
Remote write access should normally create a proposal or signed write request.
The owning tribe validates and accepts it into its own history. Remote writes do
not blindly mutate the owner's main ref.
## API Compatibility
Kobold may expose compatibility shapes for older tests or clients, such as an
`events` array derived from commits. These are API views, not separate canonical
storage.
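Such a compatibility view can be computed on read. The Python sketch below flattens commits into an `events` list; the event shape, including the `commit_id` and `index` keys, is a hypothetical example and not a documented Kobold format.

```python
def events_view(commits):
    """Flatten commits into a legacy-style list of events; nothing is stored."""
    events = []
    for commit in commits:
        for index, op in enumerate(commit["operations"]):
            # Each event is the operation plus a pointer back to its commit.
            events.append({"commit_id": commit["id"], "index": index, **op})
    return events


history = [
    {"id": "c1", "operations": [
        {"op": "upsert", "resource_name": "SeedVariety", "record_id": "r1",
         "fields": {"name": "Black Krim"}},
    ]},
    {"id": "c2", "operations": [
        {"op": "delete", "resource_name": "SeedVariety", "record_id": "r1"},
    ]},
]
legacy_events = events_view(history)
```

The view is a pure function of the commit list, so it cannot drift from the canonical history.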
The canonical edit model is:
```text
Commit + Bookmark + Projection
```
not:
```text
DatasetEvent + Projection
```
## Implementation Priorities
1. Keep dynamic datasets schema-as-data.
2. Keep commits/bookmarks as canonical history.
3. Scope projections by ref/user.
4. Use AshNostrSync for all cluster-relevant Kobold canonical resources.
5. Ensure private means externally private, not cluster-local.
6. Add routed plugin GraphQL schemas for generic repo actions.
7. Add Parrhesia bulk sync/copy APIs and dataset/ref tags for fast clone.
8. Add proposals and remote write review.
9. Optimize dynamic-field queries only after real usage demonstrates need.