# Kobold Architecture

This document describes the intended Kobold architecture: dynamic datasets with
schema-as-data, commit-oriented history, fast local projections, Trust-gated
tribe-to-tribe exchange, and generic GraphQL/repo-style APIs.

Kobold is not a SQL table generator and is not a second cluster-sync engine. It
is a dynamic data/repo layer built on top of the host primitives that already
exist in Tribes:

- **Ash resources** define Kobold's stable internal model.
- **AshPostgres** stores Kobold metadata, commits, refs, and projections.
- **AshNostrSync** syncs Kobold resources across nodes in the same tribe.
- **Parrhesia/Nostr** carries signed sync events and fast backfill.
- **Trust + Tribes.Access** decide which other tribes may discover, read, write,
  or propose changes.

The core invariant is:

> Kobold visibility never limits same-tribe cluster sync. Visibility only limits
> external tribe access.

A private dataset, private commit, or private draft is still replicated inside
the local tribe for durability and multi-node consistency. It is simply not
advertised or exported to other tribes unless an access rule explicitly allows
that.

## Goals

Kobold should support this user flow:

1. A user discovers a remote dataset.
2. The user clones it into their tribe.
3. The user explores it with the generic Kobold UI/API.
4. The user adds fields/columns dynamically.
5. The user makes private local drafts.
6. The user installs a supportive plugin later for richer domain-specific UI.
7. The same logical dataset remains in place.
8. The user prepares a proposal/PR back to the upstream tribe.

This requires dynamic storage. User-defined columns cannot require new Elixir
modules, migrations, or host recompilation.

## High-level Shape

```text
Dynamic dataset schema
  -> ResourceDefinition fields as data

Local edits
  -> Commit chunks
  -> Bookmark/ref movement
  -> Ref-scoped RecordProjection updates
  -> AshNostrSync syncs within the tribe

External sharing
  -> Kobold catalog/clone/proposal gateway
  -> Trust + access checks
  -> Parrhesia fast event paging/copy
```

Kobold's canonical data is the commit/ref history. Projections are materialized
read models.

```text
Dataset
  ResourceDefinition
  Commit
  Bookmark / Ref
  Proposal
  RecordProjection
```

## Stable Resources

Kobold itself uses a small fixed set of Ash resources. These are normal static
Elixir modules and normal AshPostgres-backed tables.

### Dataset

A dataset is the logical repo boundary.

It contains stable metadata such as:

- `id`
- `name`
- `description`
- `owner_pubkey`
- `origin_tribe_pubkey`
- `origin_dataset_id`
- `schema_name`
- `schema_version`
- `external_visibility`
- `metadata`

The dataset is the primary unit of:

- discovery
- clone/fork
- access policy
- supportive plugin compatibility
- proposal targeting

Dataset rows are low-churn. Record edits do not rewrite the dataset row.

### ResourceDefinition

A resource definition describes one record type inside a dataset.

Example resources:

- `SeedVariety`
- `SeedLot`
- `Supplier`

Fields are schema-as-data:

```json
{
  "name": { "type": "string", "required": true },
  "germination_days": { "type": "integer" },
  "source": { "type": "string" }
}
```

Adding a column means updating the resource definition and recording that schema
change as a commit. It does not require DDL or code generation.
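To make "schema-as-data" concrete, here is a minimal sketch in Python (used only for illustration; Kobold itself is Elixir). It validates a record against a field map and applies a `schema.add_field` operation. The `TYPE_CHECKS` table, the function names, and the choice to reject undefined fields are assumptions. Only the `"string"` and `"integer"` types from the example are covered.

```python
from typing import Any, Callable

# Hypothetical checks for the field types used in the example definition.
TYPE_CHECKS: dict[str, Callable[[Any], bool]] = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so reject it explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


def validate_fields(definition: dict[str, dict], fields: dict[str, Any]) -> list[str]:
    """Check a record against schema-as-data; an empty list means valid."""
    errors = []
    for name, spec in definition.items():
        if spec.get("required") and fields.get(name) is None:
            errors.append(f"{name}: required")
    for name, value in fields.items():
        spec = definition.get(name)
        if spec is None:
            errors.append(f"{name}: unknown field")
        elif value is not None and not TYPE_CHECKS[spec["type"]](value):
            errors.append(f"{name}: expected {spec['type']}")
    return errors


def add_field(definition: dict[str, dict], field_name: str, spec: dict) -> dict[str, dict]:
    """Apply a schema.add_field operation, returning a new definition map."""
    if field_name in definition:
        raise ValueError(f"field {field_name!r} already exists")
    return {**definition, field_name: spec}


seed_variety = {
    "name": {"type": "string", "required": True},
    "germination_days": {"type": "integer"},
}
record = {"name": "Black Krim", "source": "seed swap"}
before = validate_fields(seed_variety, record)  # "source" is not defined yet
seed_variety = add_field(seed_variety, "source", {"type": "string"})
after = validate_fields(seed_variety, record)   # same record, new schema
```

`before` reports `source` as an unknown field. After `add_field`, the same record validates. The definition is only a map, so no migration or code generation is involved.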

### Commit

A commit is an immutable chunk of dataset operations.

A commit has:

- `id` / commit id
- `change_id`
- `dataset_id`
- `parent_commit_ids`
- `author_user_id`
- `author_pubkey`
- `message`
- `visibility`
- `draft?`
- `operations`
- `metadata`

The `change_id` is the stable identity of an evolving change. A new version of
the same logical change may have a new commit id while preserving the same
change id, matching the Jujutsu mental model.

Operations are stored as JSON data. Examples:

```json
{
  "op": "upsert",
  "resource_name": "SeedVariety",
  "record_id": "...",
  "fields": {
    "name": "Black Krim",
    "germination_days": 8
  }
}
```

```json
{
  "op": "delete",
  "resource_name": "SeedVariety",
  "record_id": "..."
}
```

```json
{
  "op": "schema.add_field",
  "resource_name": "SeedVariety",
  "field_name": "source",
  "definition": { "type": "string" }
}
```

Commits are the canonical edit history. There is no separate per-record event
table layered on top of AshNostrSync.
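The commit/change distinction can be sketched with an immutable value type: amending creates a new commit id while keeping the `change_id`. This Python sketch makes several assumptions:

- The field list is trimmed for brevity.
- Ids are random UUIDs; the document does not specify an id format.
- An amended commit keeps the parents of the version it replaces, as a Jujutsu rewrite does.

```python
import uuid
from dataclasses import dataclass, field, replace


def _new_id() -> str:
    # Assumption: commit and change ids are random UUIDs; the format is unspecified.
    return str(uuid.uuid4())


@dataclass(frozen=True)
class Commit:
    dataset_id: str
    change_id: str
    parent_commit_ids: tuple[str, ...]
    operations: tuple[dict, ...]
    message: str = ""
    draft: bool = True
    id: str = field(default_factory=_new_id)


def new_change(dataset_id: str, parents: list[str], operations: list[dict], message: str = "") -> Commit:
    """Start a new logical change: fresh change_id and fresh commit id."""
    return Commit(dataset_id, _new_id(), tuple(parents), tuple(operations), message)


def amend(commit: Commit, extra_operations: list[dict]) -> Commit:
    """New version of the same change; the original commit is left untouched."""
    return replace(
        commit,
        operations=commit.operations + tuple(extra_operations),
        id=_new_id(),  # new commit id; change_id and parents are copied as-is
    )


v1 = new_change("ds-1", [], [
    {"op": "upsert", "resource_name": "SeedVariety", "record_id": "r1",
     "fields": {"name": "Black Krim"}},
])
v2 = amend(v1, [
    {"op": "upsert", "resource_name": "SeedVariety", "record_id": "r1",
     "fields": {"germination_days": 8}},
])
```

Moving the draft bookmark from `v1.id` to `v2.id` is what makes the new version current. `v1` itself is never modified.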

### Bookmark / Ref

A bookmark is a named pointer to a commit.

Examples:

```text
main
origin/<tribe_pubkey>/main
draft/user/<user_id>/<change_id>
proposal/<proposal_id>
```

A bookmark has:

- `dataset_id`
- `name`
- `commit_id`
- `scope`: `main | draft | proposal | remote`
- `visibility`: `private | shared | public`
- `owner_user_id`
- `base_ref_name`
- `metadata`

Bookmarks are the sharing boundary. The existence of a commit does not mean
another tribe can fetch it. A remote tribe may fetch only commits reachable from
bookmarks/proposals that are externally visible and allowed by policy.
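Because bookmarks are the sharing boundary, the commits a remote tribe may fetch can be computed as a walk over `parent_commit_ids`. The walk starts only from bookmarks that pass both the visibility check and the policy check. In this Python sketch, `policy_allows` is a hypothetical stand-in for the Trust/`Tribes.Access` decision. Treating every non-`private` bookmark as a candidate is an assumption.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Bookmark:
    name: str
    commit_id: str
    visibility: str  # "private" | "shared" | "public"


def fetchable_commits(
    bookmarks: list[Bookmark],
    parents: dict[str, list[str]],
    policy_allows: Callable[[Bookmark], bool],
) -> set[str]:
    """Commit ids reachable from bookmarks the remote tribe may see."""
    queue = deque(
        b.commit_id
        for b in bookmarks
        # Assumption: non-private bookmarks are candidates; policy decides the rest.
        if b.visibility != "private" and policy_allows(b)
    )
    seen: set[str] = set()
    while queue:
        commit_id = queue.popleft()
        if commit_id not in seen:
            seen.add(commit_id)
            queue.extend(parents.get(commit_id, []))
    return seen


# c1 <- c2 <- c3 is main; d1 is Alice's private draft on top of c2.
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "d1": ["c2"]}
bookmarks = [
    Bookmark("main", "c3", "public"),
    Bookmark("draft/user/alice/change-123", "d1", "private"),
]
```

`d1` is cluster-synced like any other commit. It never enters the fetchable set because no visible bookmark reaches it.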

### Proposal

A proposal is a PR-like object that asks another tribe or a local maintainer to
review and accept a set of commits.

A proposal has:

- `dataset_id`
- `source_ref_name`
- `source_head_commit_id`
- `target_ref_name`
- `author_user_id`
- `author_pubkey`
- `target_tribe_pubkey`
- `status`: `draft | submitted | accepted | rejected | closed`
- `message`
- `metadata`

A proposal normally points at a proposal ref:

```text
proposal/<proposal_id>
```

Remote tribes see proposal refs, not private local draft refs.

### RecordProjection

A record projection is a materialized view for reading and UI rendering. It is
not the canonical history.

A projection row has:

- `dataset_id`
- `ref_name`
- `base_ref_name`
- `resource_name`
- `record_id`
- `fields`
- `deleted?`
- `head_commit_id`

The uniqueness key is:

```text
dataset_id + ref_name + resource_name + record_id
```

This lets multiple users have independent drafts without overwriting each
other's working view.

## Projection Strategy

Kobold uses ref-scoped projections.

### Main projection

`main` is projected as a complete current-state read model:

```text
ref_name = "main"
base_ref_name = null
```

Generic reads default to `main`.

### Draft/proposal projections

Draft and proposal refs are delta projections over a base ref:

```text
ref_name = "draft/user/alice/change-123"
base_ref_name = "main"
```

The draft projection stores only changed rows. Reads overlay draft rows on top
of the base projection:

```text
  base rows from main
+ delta rows from draft/user/alice/change-123
- rows where delta.deleted? = true
```

This avoids copying an entire dataset per user draft.
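The overlay rule can be sketched as a keyed merge in which delta rows replace base rows and tombstones hide them. This Python sketch assumes two things: rows within one dataset are keyed by `(resource_name, record_id)`, and a delta row carries the complete record rather than a field-level patch.

```python
from typing import Any

Key = tuple[str, str]  # (resource_name, record_id) within one dataset
Row = dict[str, Any]   # {"fields": {...}, "deleted?": bool}


def overlay(base_rows: dict[Key, Row], delta_rows: dict[Key, Row]) -> dict[Key, dict]:
    """Read view of a draft ref: base rows, replaced by delta rows, minus tombstones."""
    merged = {**base_rows, **delta_rows}  # a delta row wins over its base row
    return {key: row["fields"] for key, row in merged.items() if not row.get("deleted?")}


main_rows = {
    ("SeedVariety", "r1"): {"fields": {"name": "Black Krim"}},
    ("SeedVariety", "r2"): {"fields": {"name": "Brandywine"}},
}
alice_draft = {  # only the rows Alice changed
    ("SeedVariety", "r1"): {"fields": {"name": "Black Krim", "germination_days": 8}},
    ("SeedVariety", "r2"): {"fields": {}, "deleted?": True},
    ("SeedVariety", "r3"): {"fields": {"name": "Cherokee Purple"}},
}
alice_view = overlay(main_rows, alice_draft)
```

In SQL this corresponds to a full outer join of the base and draft rows on `resource_name` and `record_id`. The draft side wins, and tombstones are dropped.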

### Applying commits

A commit is applied to the projection for the ref it belongs to. Moving a
bookmark changes which head commit defines that ref. Projection rebuilds replay
commits reachable from the ref and materialize the resulting rows.

Projection rebuilds are safe because the canonical state is in commits and refs.
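A rebuild can be sketched in three steps: collect the commits reachable from the ref head, order them parents-first, and fold their operations into rows. This Python sketch rests on assumptions:

- It uses the standard-library `graphlib.TopologicalSorter`.
- Ties between sibling commits are broken by id.
- `upsert` merges fields into the existing record.

The document does not define how conflicting writes on two sides of a merge are resolved, so the sketch does not attempt it.

```python
from graphlib import TopologicalSorter


def replay_order(head: str, commits: dict[str, dict]) -> list[str]:
    """Commits reachable from head, parents before children, ties broken by id."""
    reachable, stack = set(), [head]
    while stack:
        cid = stack.pop()
        if cid not in reachable:
            reachable.add(cid)
            stack.extend(commits[cid]["parent_commit_ids"])
    # graphlib expects {node: predecessors}; a commit's predecessors are its parents.
    sorter = TopologicalSorter({cid: commits[cid]["parent_commit_ids"] for cid in reachable})
    sorter.prepare()
    order = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # deterministic order among siblings
        order.extend(ready)
        sorter.done(*ready)
    return order


def rebuild(head: str, commits: dict[str, dict]) -> dict[tuple[str, str], dict]:
    """Materialize rows for the ref whose bookmark points at head."""
    rows: dict[tuple[str, str], dict] = {}
    for cid in replay_order(head, commits):
        for op in commits[cid]["operations"]:
            if op["op"] == "upsert":
                key = (op["resource_name"], op["record_id"])
                # Assumption: upsert merges the given fields into the existing record.
                rows[key] = {**rows.get(key, {}), **op["fields"]}
            elif op["op"] == "delete":
                rows.pop((op["resource_name"], op["record_id"]), None)
            # schema.* operations update ResourceDefinitions, not record rows.
    return rows


commits = {
    "c1": {"parent_commit_ids": [], "operations": [
        {"op": "upsert", "resource_name": "SeedVariety", "record_id": "r1",
         "fields": {"name": "Black Krim"}}]},
    "c2": {"parent_commit_ids": ["c1"], "operations": [
        {"op": "schema.add_field", "resource_name": "SeedVariety",
         "field_name": "source", "definition": {"type": "string"}},
        {"op": "upsert", "resource_name": "SeedVariety", "record_id": "r1",
         "fields": {"source": "seed swap"}}]},
    "c3": {"parent_commit_ids": ["c2"], "operations": [
        {"op": "delete", "resource_name": "SeedVariety", "record_id": "r1"}]},
}
```

Rebuilding from `c2` and from `c3` gives different rows from the same history. Only the bookmark position changes, which is why a rebuild is always safe to rerun.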

## Edit Workflow

Local users may create private draft commits at any time, subject to local user
permissions.

A normal local edit flow is:

```text
user edits record
  -> find or create private draft ref
  -> create or amend a draft commit chunk
  -> apply commit to the draft projection
  -> move draft bookmark
  -> AshNostrSync publishes the commit/bookmark inside the tribe
```

The draft is private externally but cluster-synced internally.

```text
private draft != local-only storage
private draft == not externally visible
```

When the user is ready to share:

```text
draft ref
  -> proposal ref
  -> proposal object
  -> optional remote submission
```

Accepting a proposal either moves the target ref (usually `main`) forward to the
proposal head or merges the proposal into it.
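The two acceptance outcomes mirror Git. If the target head is already an ancestor of the proposal head, the bookmark can simply move forward. Otherwise a merge commit with both heads as parents is created. Below is a minimal Python sketch over a parent map. `new_commit_id` is a hypothetical id generator, and conflict detection between the two sides is deliberately left out.

```python
from typing import Callable


def ancestors(commit_id: str, parents: dict[str, list[str]]) -> set[str]:
    """All commits reachable from commit_id, including itself."""
    seen, stack = set(), [commit_id]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(parents.get(cid, []))
    return seen


def accept_proposal(
    refs: dict[str, str],
    parents: dict[str, list[str]],
    target_ref: str,
    source_head: str,
    new_commit_id: Callable[[], str],
) -> str:
    """Make target_ref include source_head; return the new target head."""
    target_head = refs[target_ref]
    if target_head in ancestors(source_head, parents):
        refs[target_ref] = source_head  # fast-forward: no new commit needed
    elif source_head in ancestors(target_head, parents):
        pass  # already contained in the target; nothing to do
    else:
        merge_id = new_commit_id()
        parents[merge_id] = [target_head, source_head]  # two-parent merge commit
        refs[target_ref] = merge_id
    return refs[target_ref]
```

A real merge would also need to detect conflicting writes to the same record on both sides before creating the merge commit.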

## Jujutsu-like Semantics

Kobold borrows these concepts from Jujutsu/Git-like systems:

- **Commit**: immutable operation chunk.
- **Change**: stable logical edit identity represented by `change_id`.
- **Bookmark/ref**: named pointer to a commit.
- **Working copy**: a user's draft ref and its projection.
- **Proposal**: a reviewable request to accept commits into another ref.
- **Clone/fetch/pull**: transfer dataset metadata, refs, and commits.

Unlike Git, the user-facing data model remains structured records and dynamic
fields. Users do not manipulate files or raw patches.

## Internal Cluster Sync

Kobold resources that define canonical state are AshNostrSync resources:

- `Dataset`
- `ResourceDefinition`
- `Commit`
- `Bookmark`
- `Proposal`

Record projections may be synced for speed or rebuilt locally from commits. The
canonical source is always commits plus refs.

Cluster sync is independent of external visibility:

| Object | Cluster sync | External access |
| --- | --- | --- |
| Public dataset | yes | policy-gated |
| Private dataset | yes | deny by default |
| Private draft | yes | not externally reachable |
| Proposal | yes | visible if submitted/shared and allowed |

Kobold must not suppress AshNostrSync publication merely because a dataset is
private.

## External Tribe Access

External access is controlled by Trust and `Tribes.Access`.

Kobold policy actions include:

- `advertise`: may this tribe see the dataset/ref in the catalog?
- `read`: may this tribe fetch metadata, refs, and commits?
- `write`: may this tribe submit commits/proposals?
- `admin`: may this tribe administer sharing rules?

Default public dataset policy may allow:

```text
subject_type = tribe
subject_id = *
action = read / advertise
condition = min_trust_score >= 0
```

Private datasets deny external access by default.

Remote access is evaluated against the remote tribe pubkey, not tribe name.
Tribe names are display metadata and are not unique identities.
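Below is a minimal Python sketch of evaluating rules like the one above against the remote tribe's pubkey, with deny as the default. The `AccessRule` shape mirrors the example rule. The `trust_score` lookup and the "any matching rule allows" semantics are assumptions, not the `Tribes.Access` API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AccessRule:
    subject_type: str                        # "tribe"
    subject_id: str                          # remote tribe pubkey, or "*"
    actions: frozenset                       # e.g. {"read", "advertise"}
    min_trust_score: Optional[float] = None  # None means no trust condition


def is_allowed(
    rules: list[AccessRule],
    remote_pubkey: str,
    action: str,
    trust_score: Callable[[str], float],
) -> bool:
    """Deny by default; allow if any tribe rule matches pubkey, action, and trust."""
    for rule in rules:
        if rule.subject_type != "tribe" or action not in rule.actions:
            continue
        if rule.subject_id not in ("*", remote_pubkey):
            continue  # matched by pubkey only; tribe names are never consulted
        if rule.min_trust_score is not None and trust_score(remote_pubkey) < rule.min_trust_score:
            continue
        return True
    return False


public_dataset_rules = [AccessRule("tribe", "*", frozenset({"read", "advertise"}), min_trust_score=0)]
private_dataset_rules: list[AccessRule] = []  # no rules: external access denied
```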

## Clone and Fetch Over the Wire

A clone transfers canonical history, not projections.

The logical clone sequence is:

```text
remote catalog/discovery
  -> user chooses dataset
  -> Kobold verifies advertise/read access
  -> fetch Dataset metadata
  -> fetch ResourceDefinitions
  -> fetch allowed Bookmarks/refs
  -> fetch reachable Commits
  -> rebuild local projections
```

The initial clone envelope can be represented as:

```json
{
  "dataset": { "id": "...", "name": "Seed Catalog" },
  "resources": [
    { "name": "SeedVariety", "fields": { "name": { "type": "string" } } }
  ],
  "bookmarks": [
    { "name": "origin/<pubkey>/main", "commit_id": "..." }
  ],
  "commits": [
    {
      "id": "...",
      "change_id": "...",
      "parent_commit_ids": [],
      "operations": []
    }
  ]
}
```

For real datasets, clone/fetch is paged rather than one large response.
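Paged clone/fetch works best with a keyset cursor rather than an offset. Each request resumes strictly after the last `(created_at, event_id)` pair seen, the same ordering Parrhesia's `SYNC-PAGE` uses. In this Python sketch, `in_memory_source` is a hypothetical stand-in for the remote call so the example runs locally.

```python
from itertools import islice
from typing import Callable, Iterator, Optional

Cursor = Optional[tuple[int, str]]  # (created_at, event_id) of the last event seen
FetchPage = Callable[[Cursor, int], list[dict]]


def in_memory_source(events: list[dict]) -> FetchPage:
    """Hypothetical stand-in for a remote page call, ascending by (created_at, id)."""
    ordered = sorted(events, key=lambda e: (e["created_at"], e["id"]))

    def fetch_page(after: Cursor, limit: int) -> list[dict]:
        newer = (e for e in ordered if after is None or (e["created_at"], e["id"]) > after)
        return list(islice(newer, limit))

    return fetch_page


def fetch_all(fetch_page: FetchPage, page_size: int) -> Iterator[dict]:
    """Yield every event exactly once, resuming each page after the last cursor."""
    cursor: Cursor = None
    while True:
        page = fetch_page(cursor, page_size)
        yield from page
        if len(page) < page_size:
            return  # a short page means the source is exhausted
        last = page[-1]
        cursor = (last["created_at"], last["id"])
```

The cursor is the full `(created_at, event_id)` pair. Events that share a timestamp are therefore neither skipped nor repeated at a page boundary, which a timestamp-only cursor could not guarantee.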

## Parrhesia Fast Copy and `SYNC-PAGE`

Kobold should use Parrhesia's optimized sync/backfill machinery for initial
clone and incremental fetch.

The ideal host API is a set of bulk sync/copy primitives, for example:

```elixir
# Proposed host API (design target).
Parrhesia.Sync.page_events(source, filter, opts)   # one page of matching events
Parrhesia.Sync.stream_events(source, filter, opts) # lazily stream all matches
Parrhesia.Sync.copy_events(source, target, opts)   # copy matches to a target
Parrhesia.Sync.import_events(events, opts)         # verify, dedupe, and store
```

These APIs should support:

- keyset cursors
- page size
- ascending/descending order
- batched event frames
- signature verification
- deduplication
- progress callbacks
- authorization callbacks

Parrhesia's `SYNC-PAGE` is a good low-level primitive for this because it pages
stored events by `(created_at, event_id)` and can return batched `EVENTS` frames.

Kobold should not expose an unrestricted relay backfill to arbitrary remote
tribes. The flow should be:

```text
remote clone request
  -> Kobold authenticates remote tribe
  -> Kobold evaluates Trust/access
  -> Kobold builds allowed event filters
  -> Parrhesia fast pagination/copy runs under that authorization
  -> receiver imports events and rebuilds projections
```

For efficient dataset-level clone, Kobold/AshNostrSync events need filterable
tags such as:

```text
["r", "plugins.kobold.commit"]
["dataset", dataset_id]
["ref", ref_name]
```
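
Without dataset/ref tags, clone would need to scan all Kobold commit events and
filter by payload content, which does not scale. Standard NIP-01 filters only
match single-letter tag names (such as `#e`, `#p`, or `#r`). Parrhesia must
therefore either index the multi-letter `dataset` and `ref` tags itself, or
Kobold must map them to single-letter tags.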
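The "Kobold builds allowed event filters" step can be sketched with Nostr-style filter objects, where each `#<tag>` key lists accepted values. As in NIP-01, values under one key are OR'd and different keys are AND'd. Using multi-letter `#dataset` and `#ref` keys assumes Parrhesia indexes those tags. `build_clone_filter` and `matches` are hypothetical names.

```python
from typing import Optional


def build_clone_filter(dataset_id: str, allowed_refs: list[str], limit: int = 500) -> Optional[dict]:
    """Filter for one dataset and only the refs the access check allowed."""
    if not allowed_refs:
        return None  # nothing authorized: run no backfill rather than an open one
    return {
        "#r": ["plugins.kobold.commit"],
        "#dataset": [dataset_id],  # assumption: Parrhesia indexes this tag
        "#ref": sorted(set(allowed_refs)),
        "limit": limit,
    }


def matches(event: dict, flt: dict) -> bool:
    """Each '#tag' key needs at least one matching tag value (keys are AND'd)."""
    for key, accepted in flt.items():
        if not key.startswith("#"):
            continue  # e.g. 'limit' controls paging, not matching
        name = key[1:]
        values = {tag[1] for tag in event["tags"] if len(tag) > 1 and tag[0] == name}
        if values.isdisjoint(accepted):
            return False
    return True
```

When the authorization result is empty, the sketch returns no filter at all. Nothing is paged, rather than falling back to an unfiltered backfill.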

## GraphQL

Kobold exposes a generic repo-style GraphQL API over its stable resources and
actions.

The generic GraphQL schema is static:

```graphql
scalar JSON

type KoboldRecordProjection {
  datasetId: ID!
  refName: String!
  resourceName: String!
  recordId: ID!
  fields: JSON!
  headCommitId: ID
}
```

Dynamic user columns remain inside `fields: JSON`. Clients build dynamic forms
and tables by reading `ResourceDefinition.fields`.

GraphQL is a good fit for repo actions:

- `koboldDatasets`
- `koboldDataset`
- `koboldResources`
- `koboldRecords`
- `koboldCommits`
- `koboldBookmarks`
- `koboldCreateDataset`
- `koboldUpsertRecord`
- `koboldCreateDraft`
- `koboldCommitDraft`
- `koboldCloneDataset`
- `koboldFetchDataset`
- `koboldCreateProposal`
- `koboldSubmitProposal`
- `koboldAcceptProposal`

Kobold can use `AshGraphql` for these stable resources and actions.

### Dynamic plugin schemas

The host should not need recompilation when a plugin is installed. A single
merged Absinthe schema containing dynamically installed plugin domains is not a
natural fit for AshGraphql, which builds its schema at compile time from the
domains listed in the schema module.

Instead, the host exposes one GraphQL endpoint with routing:

```text
POST /graphql
```

A request selects a schema using a header, query parameter, or GraphQL
extensions payload:

```json
{
  "extensions": {
    "tribes": {
      "plugin": "kobold"
    }
  },
  "query": "..."
}
```

The gateway dispatches to:

```text
core schema -> TribesWeb.GraphqlSchema
plugin kobold schema -> KoboldWeb.GraphqlSchema
plugin trust schema -> TrustWeb.GraphqlSchema
```

This gives one endpoint without requiring host recompilation. Introspection is
per routed schema rather than one globally merged graph.
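The gateway's routing step can be sketched as a registry lookup that falls back to the core schema. Several details are illustrative assumptions: the header name `x-tribes-plugin`, the query parameter `plugin`, the precedence order, and the registry itself. Only the `extensions.tribes.plugin` shape comes from the example above. Schema module names are plain strings because the sketch is Python.

```python
from typing import Mapping, Optional

# Hypothetical registry; installing a plugin adds an entry at runtime.
SCHEMAS: dict[Optional[str], str] = {
    None: "TribesWeb.GraphqlSchema",  # core schema when no plugin is selected
    "kobold": "KoboldWeb.GraphqlSchema",
    "trust": "TrustWeb.GraphqlSchema",
}


class UnknownSchema(LookupError):
    """Raised when a request names a plugin schema that is not installed."""


def select_schema(body: Mapping, headers: Mapping[str, str], params: Mapping[str, str]) -> str:
    """Pick the schema for one POST /graphql request (header, then param, then body)."""
    extensions = body.get("extensions") or {}
    plugin = (
        headers.get("x-tribes-plugin")
        or params.get("plugin")
        or (extensions.get("tribes") or {}).get("plugin")
    )
    if plugin not in SCHEMAS:
        raise UnknownSchema(plugin)
    return SCHEMAS[plugin]
```

Because the registry is data, installing a plugin only adds an entry. Neither the endpoint nor the other routed schemas change.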

## Supportive Plugins

Supportive plugins do not own Kobold storage. They declare compatibility with a
schema or dataset type and provide richer behavior on top of the same logical
Kobold dataset.

A supportive plugin may provide:

- custom UI
- richer validation
- import/export flows
- computed views
- proposal review UI
- domain-specific actions
- optional typed GraphQL for known schema families

The generic Kobold dataset remains dynamic and schema-as-data. Installing a
supportive plugin improves the experience but does not migrate the dataset into
plugin-owned tables.

This preserves the desired flow:

```text
clone with generic Kobold
  -> explore/edit dynamically
  -> install supportive plugin later
  -> continue using the same dataset identity/history
```

## Query and Performance Model

The basic storage model is JSONB/map-based projections. This is deliberate so
users can add fields dynamically.

Important indexes include:

```text
RecordProjection(dataset_id, ref_name, resource_name, record_id) unique
RecordProjection(dataset_id, ref_name, resource_name)
RecordProjection(head_commit_id)
Commit(dataset_id, inserted_at)
Commit(dataset_id, change_id)
Bookmark(dataset_id, name) unique
```

For common dynamic-field queries, Kobold can add incremental optimizations:

1. JSONB GIN indexes for broad containment queries.
2. Promoted expression indexes for hot fields.
3. A secondary field index table for heavily queried fields.
4. Supportive-plugin views for known schema families.

The first implementation should favor simple, correct dynamic behavior over
runtime DDL.

## Commit Sizing

Commits are chunks. They should not be unbounded.

Guidelines:

- Small UI edit: one commit/change.
- Bulk import: chunked commits, e.g. 100-1000 operations each.
- Large files/blobs: content-addressed attachment references, not inline JSON.

Chunking keeps sync retry behavior and projection rebuilds manageable.
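Chunking a bulk import is a plain batching step. This Python sketch assumes each chunk becomes its own commit, with its own `change_id`, whose parent is the previous chunk. The result is a linear history. The default of 500 operations is an arbitrary value inside the suggested range.

```python
import uuid
from itertools import islice
from typing import Iterable, Iterator, Optional


def chunked(operations: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield lists of at most `size` operations, preserving order."""
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(operations)
    while chunk := list(islice(it, size)):
        yield chunk


def import_commits(
    dataset_id: str,
    parent_id: Optional[str],
    operations: Iterable[dict],
    size: int = 500,
) -> list[dict]:
    """Turn one bulk import into a linear chain of bounded commits."""
    commits = []
    for number, chunk in enumerate(chunked(operations, size), start=1):
        commit = {
            "id": str(uuid.uuid4()),         # assumption: UUID commit ids
            "change_id": str(uuid.uuid4()),  # assumption: one change per chunk
            "dataset_id": dataset_id,
            "parent_commit_ids": [parent_id] if parent_id else [],
            "message": f"import chunk {number}",
            "operations": chunk,
        }
        commits.append(commit)
        parent_id = commit["id"]  # the next chunk builds on this one
    return commits
```

A failed sync then retries one bounded commit rather than the whole import.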

## Security Boundaries

Kobold has three distinct access scopes:

### Same-tribe cluster sync

Nodes in the same tribe receive Kobold sync events according to cluster sync
rules. Kobold dataset visibility does not block this.

### Local users

Local users may see/edit datasets and draft refs according to local user/admin
permissions. A user's private draft ref is not visible to other users unless
shared locally.

### Remote tribes

Remote tribes are identified by pubkey. They may only see externally visible
refs and commits if Trust/access policies allow the requested action.

Remote write access should normally create a proposal or signed write request.
The owning tribe validates and accepts it into its own history. Remote writes do
not blindly mutate the owner's main ref.

## API Compatibility

Kobold may expose compatibility shapes for older tests or clients, such as an
`events` array derived from commits. These are API views, not separate canonical
storage.

The canonical edit model is:

```text
Commit + Bookmark + Projection
```

not:

```text
DatasetEvent + Projection
```
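A compatibility `events` array can be computed on read by flattening commit operations, so nothing extra is stored. In this Python sketch the event field names (`seq`, `commit_id`, `op_index`) are hypothetical. The input is assumed to be already in replay order.

```python
def events_view(commits_in_order: list[dict]) -> list[dict]:
    """Flatten commits into a legacy-style event list; derived, never stored."""
    events = []
    for commit in commits_in_order:
        for op_index, op in enumerate(commit["operations"]):
            events.append({
                **op,                    # op, resource_name, record_id, fields
                "seq": len(events) + 1,  # 1-based position in the view
                "commit_id": commit["id"],
                "change_id": commit["change_id"],
                "op_index": op_index,    # position inside the commit
            })
    return events


history = [
    {"id": "c1", "change_id": "k1", "operations": [
        {"op": "upsert", "resource_name": "SeedVariety", "record_id": "r1",
         "fields": {"name": "Black Krim"}}]},
    {"id": "c2", "change_id": "k2", "operations": [
        {"op": "delete", "resource_name": "SeedVariety", "record_id": "r1"},
        {"op": "upsert", "resource_name": "SeedVariety", "record_id": "r2",
         "fields": {"name": "Brandywine"}}]},
]
legacy_events = events_view(history)
```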

## Implementation Priorities

1. Keep dynamic datasets schema-as-data.
2. Keep commits/bookmarks as canonical history.
3. Scope projections by ref/user.
4. Use AshNostrSync for all cluster-relevant Kobold canonical resources.
5. Ensure private means externally private, not cluster-local.
6. Add routed plugin GraphQL schemas for generic repo actions.
7. Add Parrhesia bulk sync/copy APIs and dataset/ref tags for fast clone.
8. Add proposals and remote write review.
9. Optimize dynamic-field queries only after real usage demonstrates need.