docs: Sketch NIP-77 sync and ACLs

2026-03-16 14:57:08 +01:00
parent b628770517
commit 4c2c93deb3
2 changed files with 697 additions and 297 deletions

docs/SYNC.md
# Parrhesia Relay Sync
## 1. Purpose
This document defines the Parrhesia proposal for **relay-to-relay event synchronization**.
It is intentionally transport-focused:
- manage remote relay peers,
- catch up on matching events,
- keep a live stream open,
- expose health and basic stats.

It does **not** define application data semantics.
Parrhesia syncs Nostr events. Callers decide which events matter and how to apply them.

---
## 2. Boundary
### Parrhesia is responsible for
- storing and validating events,
- querying and streaming events,
- running outbound sync workers against remote relays,
- tracking peer configuration, worker health, and sync counters,
- exposing peer management through `Parrhesia.API.Sync`.
### Parrhesia is not responsible for
- resource mapping,
- trusted node allowlists for an app profile,
- mutation payload validation beyond normal event validation,
- conflict resolution,
- replay winner selection,
- database upsert/delete semantics.

For Tribes, those remain in `TRIBES-NOSTRSYNC` and `AshNostrSync`.

---
## 3. Security Foundation
### Default posture
The baseline posture for sync traffic is:
- no access to sync events by default,
- no implicit trust from ordinary relay usage,
- no reliance on plaintext confidentiality from public relays.

For the first implementation, Parrhesia should protect sync data primarily with:
- authenticated server identities,
- ACL-gated read and write access,
- TLS with certificate pinning for outbound peers.
### Server identity
Parrhesia owns a low-level server identity used for relay-to-relay authentication.
This identity is separate from:
- TLS endpoint identity,
- application event author pubkeys.

Recommended model:

- Parrhesia has one local server-auth pubkey,
- sync peers authenticate as server-auth pubkeys,
- ACL grants are bound to those authenticated server-auth pubkeys,
- application-level writer trust remains outside Parrhesia.

Identity lifecycle:

1. use configured/imported key if provided,
2. otherwise use persisted local identity,
3. otherwise generate once during initial startup and persist it.

Private key export should not be supported.
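
The lifecycle order can be sketched as a small resolver. This is a hypothetical illustration. The `SyncSketch.IdentityResolver` module and the `store` map of `load`/`persist`/`generate` callbacks are assumptions that stand in for whatever key storage and generation Parrhesia actually uses. The resolver only hands the key to its in-process caller and has no export path.

```elixir
defmodule SyncSketch.IdentityResolver do
  @moduledoc """
  Hypothetical sketch of the server identity lifecycle. The `store` map of
  callbacks (:load, :persist, :generate) is an assumption, not Parrhesia's API.
  """

  # 1. A configured or imported key always wins.
  def resolve(configured_key, _store) when is_binary(configured_key),
    do: {:ok, {:configured, configured_key}}

  def resolve(nil, store) do
    case store.load.() do
      # 2. Reuse the identity persisted by an earlier start.
      key when is_binary(key) ->
        {:ok, {:persisted, key}}

      # 3. First start: generate once and persist before using it.
      nil ->
        key = store.generate.()
        :ok = store.persist.(key)
        {:ok, {:generated, key}}
    end
  end
end
```

Because the key is persisted on the first start, later starts resolve to `{:persisted, key}` and keep the same server-auth pubkey.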
### ACLs
Sync traffic should use a real ACL layer, not moderation allowlists.
Initial ACL model:
- principal: authenticated pubkey,
- capabilities: `sync_read`, `sync_write`,
- match: event/filter shape such as `kinds: [5000]` and namespace tags.

This is enough for now. We do **not** need a separate user ACL model and server ACL model yet.
A sync peer is simply an authenticated principal with sync capabilities.
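
Here is a minimal, default-deny sketch of that model, assuming grants are held in memory as structs. `SyncSketch.ACL`, its fields, and the tag-matching rule are hypothetical. A real implementation would also need to check `REQ` filters against `sync_read` grants, not just individual events.

```elixir
defmodule SyncSketch.ACL do
  @moduledoc "Hypothetical in-memory ACL: principal + capability + match."

  # One grant: a pubkey may use one capability for events whose kind is listed
  # and whose tags satisfy every required tag (e.g. %{"r" => [namespaces]}).
  defstruct [:principal, :capability, kinds: [], tags: %{}]

  # Default deny: with no matching grant, the answer is false.
  def allowed?(grants, pubkey, capability, event) do
    Enum.any?(grants, fn grant ->
      grant.principal == pubkey and
        grant.capability == capability and
        event["kind"] in grant.kinds and
        tags_match?(grant.tags, event["tags"] || [])
    end)
  end

  # Every required tag name must appear on the event with an allowed value.
  defp tags_match?(required, event_tags) do
    Enum.all?(required, fn {name, allowed_values} ->
      Enum.any?(event_tags, fn
        [^name, value | _] -> value in allowed_values
        _ -> false
      end)
    end)
  end
end
```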
### TLS pinning
Each outbound sync peer must include pinned TLS material.
Recommended pin type:
- SPKI SHA-256 pins

Multiple pins should be allowed to support certificate rotation.

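
The following sketch shows pin matching. It assumes each pin value is the base64-encoded SHA-256 digest of the DER-encoded SubjectPublicKeyInfo, which is the encoding HPKP used. `SyncSketch.Pins` is hypothetical. Extracting the SPKI bytes from the peer certificate (for example after `:public_key.pkix_decode_cert/2`) is not shown. Accepting any configured pin is what lets an old key and a new key overlap during rotation.

```elixir
defmodule SyncSketch.Pins do
  @moduledoc "Hypothetical SPKI SHA-256 pin check (base64 pin encoding assumed)."

  # Pin = base64(SHA-256(DER SubjectPublicKeyInfo)).
  def spki_pin(spki_der) when is_binary(spki_der) do
    :crypto.hash(:sha256, spki_der) |> Base.encode64()
  end

  # Accept the peer if its key matches ANY configured pin; an empty list fails.
  def pinned?(spki_der, pins) do
    actual = spki_pin(spki_der)

    Enum.any?(pins, fn
      %{type: :spki_sha256, value: ^actual} -> true
      _ -> false
    end)
  end
end
```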
---
## 4. Sync Model
Each configured sync server represents one outbound worker managed by Parrhesia.
Minimum behavior:
1. connect to the remote relay,
2. run an initial catch-up query for the configured filters,
3. ingest received events into the local relay through the normal API path,
4. switch to a live subscription for the same filters,
5. reconnect with backoff when disconnected.

The worker treats filters as opaque Nostr filters. It does not interpret app payloads.
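
The five steps map onto a small state machine. This sketch is hypothetical: the phase names, the event atoms, and the 1 s to 60 s exponential backoff are illustrative choices, not Parrhesia behavior. A production worker would usually add jitter so that many workers do not reconnect in lockstep.

```elixir
defmodule SyncSketch.Worker do
  @moduledoc """
  Hypothetical pure state machine for one outbound sync worker:
  :connecting -> :catching_up -> :live, and :backoff after any disconnect.
  """

  @base_ms 1_000
  @max_ms 60_000

  def transition(%{phase: :connecting} = s, :connected),
    do: %{s | phase: :catching_up, reconnect_attempts: 0}

  # EOSE for the catch-up query: switch to live streaming.
  def transition(%{phase: :catching_up} = s, :eose), do: %{s | phase: :live}

  # A disconnect from any phase schedules a reconnect with backoff.
  def transition(s, :disconnected) do
    attempts = s.reconnect_attempts + 1
    %{s | phase: :backoff, reconnect_attempts: attempts, retry_in_ms: backoff_ms(attempts)}
  end

  def transition(%{phase: :backoff} = s, :retry), do: %{s | phase: :connecting}

  # Events that do not apply to the current phase are ignored.
  def transition(s, _event), do: s

  # Capped exponential backoff: 1s, 2s, 4s, ... up to 60s.
  def backoff_ms(attempts) when attempts >= 1 do
    min(@max_ms, @base_ms * Integer.pow(2, attempts - 1))
  end
end
```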
### Initial implementation mode
Initial implementation should use ordinary NIP-01 behavior:
- catch-up via `REQ`-style query,
- live updates via `REQ` subscription.

This is enough for Tribes and keeps the first version simple.
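
Under NIP-01, a relay answers a `REQ` with its matching stored events, then sends `EOSE`. The subscription then stays open and delivers new events live. The sketch below builds the catch-up frame. It assumes a persisted per-server watermark (unix seconds) and reuses `overlap_window_seconds` to re-fetch a window before that watermark. `SyncSketch.Req` is hypothetical, and JSON encoding is left to whatever library the worker uses.

```elixir
defmodule SyncSketch.Req do
  @moduledoc "Hypothetical builder for NIP-01 REQ frames (Elixir terms, pre-JSON)."

  def catch_up_frame(sub_id, filters, watermark, overlap_seconds) do
    ["REQ", sub_id | Enum.map(filters, &with_since(&1, watermark, overlap_seconds))]
  end

  # No watermark yet: fetch everything the caller-owned filters match.
  defp with_since(filter, nil, _overlap), do: filter

  # Re-request a window before the watermark so late or clock-skewed events
  # are not missed; the duplicates this produces are dropped on ingest.
  defp with_since(filter, watermark, overlap) do
    Map.put(filter, "since", max(watermark - overlap, 0))
  end
end
```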
### NIP-77
NIP-77 is **not required** for the first sync implementation.
Reason:
- Parrhesia currently only has `NEG-*` session tracking, not real negentropy reconciliation.
- The current Tribes sync profile already assumes catch-up plus live replay, not negentropy.

NIP-77 should be treated as a later optimization for bandwidth-efficient reconciliation once Parrhesia has a real reusable implementation.

---
## 5. API Surface
Primary control plane:
- `Parrhesia.API.Identity.get/1`
- `Parrhesia.API.Identity.ensure/1`
- `Parrhesia.API.Identity.import/2`
- `Parrhesia.API.Identity.rotate/1`
- `Parrhesia.API.ACL.grant/2`
- `Parrhesia.API.ACL.revoke/2`
- `Parrhesia.API.ACL.list/1`
- `Parrhesia.API.Sync.put_server/2`
- `Parrhesia.API.Sync.remove_server/2`
- `Parrhesia.API.Sync.get_server/2`
- `Parrhesia.API.Sync.list_servers/1`
- `Parrhesia.API.Sync.start_server/2`
- `Parrhesia.API.Sync.stop_server/2`
- `Parrhesia.API.Sync.sync_now/2`
- `Parrhesia.API.Sync.server_stats/2`
- `Parrhesia.API.Sync.sync_stats/1`
- `Parrhesia.API.Sync.sync_health/1`

These APIs are in-process. HTTP management may expose them through `Parrhesia.API.Admin` or direct routing to `Parrhesia.API.Sync`.

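
One way to pin down the `Parrhesia.API.Sync` arities is an Elixir behaviour. This is only a sketch. The module name is hypothetical. The argument meanings (a server spec or id first, then a trailing options or request-context term) and every return shape below are assumptions, not the documented contract.

```elixir
defmodule SyncSketch.SyncControlPlane do
  @moduledoc "Hypothetical behaviour mirroring the listed Parrhesia.API.Sync arities."

  # Assumed: the trailing argument is an options/request-context term.
  @type ctx :: term()
  @type server_id :: String.t()
  @type server :: map()

  @callback put_server(server, ctx) :: {:ok, server} | {:error, term()}
  @callback remove_server(server_id, ctx) :: :ok | {:error, term()}
  @callback get_server(server_id, ctx) :: {:ok, server} | {:error, :not_found}
  @callback list_servers(ctx) :: {:ok, [server]}
  @callback start_server(server_id, ctx) :: :ok | {:error, term()}
  @callback stop_server(server_id, ctx) :: :ok | {:error, term()}
  @callback sync_now(server_id, ctx) :: :ok | {:error, term()}
  @callback server_stats(server_id, ctx) :: {:ok, map()} | {:error, term()}
  @callback sync_stats(ctx) :: {:ok, map()}
  @callback sync_health(ctx) :: {:ok, map()}
end
```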
---
## 6. Server Specification
`put_server/2` is an upsert.
Suggested server shape:
```elixir
%{
id: "tribes-primary",
url: "wss://relay-a.example/relay",
enabled?: true,
auth_pubkey: "<remote-server-auth-pubkey>",
mode: :req_stream,
filters: [
%{
"kinds" => [5000],
"authors" => ["<trusted-node-pubkey-a>", "<trusted-node-pubkey-b>"],
"#r" => ["tribes.accounts.user", "tribes.chat.tribe"]
}
],
overlap_window_seconds: 300,
auth: %{
type: :nip42
},
tls: %{
mode: :required,
hostname: "relay-a.example",
pins: [
%{type: :spki_sha256, value: "<pin-a>"},
%{type: :spki_sha256, value: "<pin-b>"}
]
},
metadata: %{}
}
```
Required fields:
- `id`
- `url`
- `auth_pubkey`
- `filters`
- `tls`

Recommended fields:

- `enabled?`
- `mode`
- `overlap_window_seconds`
- `auth`
- `metadata`

Rules:
- `id` must be stable and unique locally.
- `url` is the remote relay websocket URL.
- `auth_pubkey` is the expected remote server-auth pubkey.
- `filters` must be valid NIP-01 filters.
- filters are owned by the caller; Parrhesia only validates filter shape.
- `mode` defaults to `:req_stream`.
- `tls.mode` defaults to `:required`.
- `tls.pins` must be non-empty for sync peers.
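
The rules above can be enforced in a single normalization step. The following is a hypothetical sketch. `SyncSketch.ServerSpec.normalize/1` and its error atoms are illustrative. It requires a `wss://` URL, on the assumption that TLS is mandatory for sync peers. For filters, it only checks for a non-empty list of maps and leaves out full NIP-01 filter validation.

```elixir
defmodule SyncSketch.ServerSpec do
  @moduledoc "Hypothetical validator for the put_server/2 server map."

  @required [:id, :url, :auth_pubkey, :filters, :tls]

  def normalize(spec) when is_map(spec) do
    with [] <- Enum.reject(@required, &Map.has_key?(spec, &1)),
         :ok <- check_url(spec.url),
         :ok <- check_filters(spec.filters),
         :ok <- check_pins(spec.tls) do
      # Apply the documented defaults without overriding caller values.
      tls = Map.put_new(spec.tls, :mode, :required)
      {:ok, spec |> Map.put_new(:mode, :req_stream) |> Map.put(:tls, tls)}
    else
      missing when is_list(missing) -> {:error, {:missing_fields, missing}}
      {:error, _} = error -> error
    end
  end

  # Assumption: TLS is mandatory, so only wss:// URLs are accepted.
  defp check_url("wss://" <> _), do: :ok
  defp check_url(_), do: {:error, :invalid_url}

  # Shape check only: filters are caller-owned and otherwise opaque.
  defp check_filters([_ | _] = filters) do
    if Enum.all?(filters, &is_map/1), do: :ok, else: {:error, :invalid_filters}
  end

  defp check_filters(_), do: {:error, :invalid_filters}

  defp check_pins(%{pins: [_ | _]}), do: :ok
  defp check_pins(_), do: {:error, :missing_tls_pins}
end
```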
---
## 7. Runtime State
Each server should have both configuration and runtime status.
Suggested runtime fields:
```elixir
%{
server_id: "tribes-primary",
state: :running,
connected?: true,
last_connected_at: ~U[2026-03-16 10:00:00Z],
last_disconnected_at: nil,
last_sync_started_at: ~U[2026-03-16 10:00:00Z],
last_sync_completed_at: ~U[2026-03-16 10:00:02Z],
last_event_received_at: ~U[2026-03-16 10:12:45Z],
last_eose_at: ~U[2026-03-16 10:00:02Z],
reconnect_attempts: 0,
last_error: nil
}
```
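
The following sketch keeps these fields current from worker events, using a pure reducer over hypothetical event tuples. `SyncSketch.RuntimeState` covers only a subset of the fields shown above. The initial `:stopped` state is an assumption.

```elixir
defmodule SyncSketch.RuntimeState do
  @moduledoc "Hypothetical reducer over a subset of the runtime fields."

  def new(server_id) do
    %{
      server_id: server_id,
      state: :stopped,
      connected?: false,
      last_connected_at: nil,
      last_disconnected_at: nil,
      last_event_received_at: nil,
      last_eose_at: nil,
      reconnect_attempts: 0,
      last_error: nil
    }
  end

  # A successful connect resets the retry counter and clears the last error.
  def apply_event(s, {:connected, at}) do
    %{s | state: :running, connected?: true, last_connected_at: at,
          reconnect_attempts: 0, last_error: nil}
  end

  # A disconnect leaves `state` unchanged (the worker retries) but records why.
  def apply_event(s, {:disconnected, at, reason}) do
    %{s | connected?: false, last_disconnected_at: at,
          reconnect_attempts: s.reconnect_attempts + 1, last_error: reason}
  end

  def apply_event(s, {:event_received, at}), do: %{s | last_event_received_at: at}
  def apply_event(s, {:eose, at}), do: %{s | last_eose_at: at}
end
```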
Parrhesia should keep this state generic. It is about relay sync health, not app state convergence.

---
## 8. Stats and Health
### Per-server stats
`server_stats/2` should return basic counters and markers such as:
- `events_received`
- `events_accepted`
- `events_duplicate`
- `events_rejected`
- `query_runs`
- `subscription_restarts`
- `reconnects`
- `last_remote_eose_at`
- `last_error`
### Aggregate sync stats
`sync_stats/1` should summarize:
- total configured servers,
- enabled servers,
- running servers,
- connected servers,
- aggregate event counters,
- aggregate reconnect count.
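
Here is a hypothetical fold from per-server entries to the aggregate summary. `SyncSketch.Stats` is illustrative, and so is the assumed shape of each entry: a config flag, a runtime state, and a `stats` map of counters. Missing counters count as zero.

```elixir
defmodule SyncSketch.Stats do
  @moduledoc "Hypothetical aggregation for sync_stats/1."

  @counters [:events_received, :events_accepted, :events_duplicate, :events_rejected]

  # Each entry is assumed to look like
  # %{enabled?: bool, state: atom, connected?: bool, stats: %{counter => integer}}
  def summarize(servers) do
    %{
      servers_total: length(servers),
      servers_enabled: Enum.count(servers, & &1.enabled?),
      servers_running: Enum.count(servers, &(&1.state == :running)),
      servers_connected: Enum.count(servers, & &1.connected?),
      events: Map.new(@counters, fn c -> {c, sum(servers, c)} end),
      reconnects: sum(servers, :reconnects)
    }
  end

  defp sum(servers, key) do
    servers |> Enum.map(&Map.get(&1.stats, key, 0)) |> Enum.sum()
  end
end
```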
### Health
`sync_health/1` should be operator-oriented, for example:
```elixir
%{
"status" => "degraded",
"servers_total" => 3,
"servers_connected" => 2,
"servers_failing" => [
%{"id" => "tribes-secondary", "reason" => "connection_refused"}
]
}
```
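
The following sketch derives that map from runtime states. `SyncSketch.Health` is hypothetical, and its thresholds are assumptions. The status is `"ok"` when every enabled server is connected, `"degraded"` when only some are, and `"down"` when none are.

```elixir
defmodule SyncSketch.Health do
  @moduledoc "Hypothetical derivation of the sync_health/1 map."

  def summarize(states) do
    enabled = Enum.filter(states, & &1.enabled?)
    failing = Enum.reject(enabled, & &1.connected?)

    status =
      cond do
        failing == [] -> "ok"
        length(failing) < length(enabled) -> "degraded"
        true -> "down"
      end

    %{
      "status" => status,
      "servers_total" => length(states),
      "servers_connected" => Enum.count(states, & &1.connected?),
      "servers_failing" =>
        Enum.map(failing, fn s ->
          %{"id" => s.server_id, "reason" => to_string(s.last_error || "disconnected")}
        end)
    }
  end
end
```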
This is intentionally simple. It should answer “is sync working?” without pretending to prove application convergence.

---
## 9. Event Ingest Path
Events received from a remote sync worker should enter Parrhesia through the same ingest path as any other accepted event.
That means:
1. validate the event,
2. run normal write policy,
3. persist or reject,
4. fan out locally,
5. rely on duplicate-event behavior for idempotency.

This avoids a second ingest path with divergent behavior.

Before normal event acceptance, the sync worker should enforce:
1. pinned TLS validation for the remote endpoint,
2. remote server-auth identity match,
3. local ACL grant permitting the peer to perform sync reads and/or writes.
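
The gate and the normal ingest call can be sketched together. `SyncSketch.IngestGate` is hypothetical. `deps.ingest` stands in for Parrhesia's regular write path and `deps.acl_allows` for the ACL lookup. Using the `:sync_write` capability for remote events is an assumption, and the peer map's fields are illustrative. The sketch maps `{:error, :duplicate}` to its own outcome rather than a failure, which is what keeps replays and overlap windows idempotent.

```elixir
defmodule SyncSketch.IngestGate do
  @moduledoc "Hypothetical pre-acceptance gate followed by the normal ingest call."

  # Checks run in the documented order; the first failure stops the event.
  def handle(event, peer, deps) do
    cond do
      not peer.tls_pinned? -> {:rejected, :tls_pin_mismatch}
      peer.authenticated_pubkey != peer.expected_pubkey -> {:rejected, :auth_pubkey_mismatch}
      not deps.acl_allows.(peer.authenticated_pubkey, :sync_write, event) -> {:rejected, :acl_denied}
      true -> classify(deps.ingest.(event))
    end
  end

  # A duplicate is already stored: count it separately, but do not treat it
  # as an error.
  defp classify({:ok, _}), do: :accepted
  defp classify({:error, :duplicate}), do: :duplicate
  defp classify({:error, reason}), do: {:rejected, reason}
end
```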

The sync worker may attach request-context metadata such as:
```elixir
%Parrhesia.API.RequestContext{
caller: :sync,
metadata: %{sync_server_id: "tribes-primary"}
}
```
That metadata is for telemetry and audit only. It must not become app sync semantics.

---
## 10. Persistence
Parrhesia should persist enough sync control-plane state to survive restart:
- local server identity reference,
- configured ACL rules for sync principals,
- configured servers,
- whether a server is enabled,
- optional catch-up cursor or watermark per server,
- basic last-error and last-success markers.
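
The optional catch-up watermark can be as small as the highest `created_at` seen from a peer. The following is a hypothetical sketch. One caveat is worth encoding: relays may return stored events newest-first. The advanced value should therefore be persisted only after `EOSE`. Otherwise, a disconnect in the middle of catch-up could cause older events to be skipped.

```elixir
defmodule SyncSketch.Watermark do
  @moduledoc """
  Hypothetical per-server catch-up watermark (highest created_at seen).
  Persist the result only after EOSE for the catch-up query.
  """

  # Never move the watermark backwards, even if an older event arrives late.
  def advance(watermark, events) do
    events
    |> Enum.map(& &1["created_at"])
    |> Enum.reduce(watermark, fn ts, acc ->
      if acc == nil or ts > acc, do: ts, else: acc
    end)
  end
end
```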

Parrhesia does not need to persist application replay heads or winner state. That remains in the embedding application.

---
## 11. Relationship to Current Features
### BEAM cluster fanout
`Parrhesia.Fanout.MultiNode` is a separate feature.
It provides best-effort live fanout between connected BEAM nodes. It is not remote relay sync and is not a substitute for `Parrhesia.API.Sync`.
### Management stats
Current admin `stats` is relay-global and minimal.
Sync adds a new dimension:
- peer config,
- worker state,
- per-peer counters,
- sync health summary.

That should be exposed without coupling it to app-specific sync semantics.

---
## 12. Tribes Usage
For Tribes, `AshNostrSync` should be able to:
1. rely on Parrhesia's local server identity,
2. register one or more remote relays with `Parrhesia.API.Sync.put_server/2`,
3. grant sync ACLs for trusted server-auth pubkeys,
4. provide narrow Nostr filters for `kind: 5000`,
5. observe sync health and counters,
6. consume events via the normal local Parrhesia ingest/query/stream surface.

Tribes should not need Parrhesia to know:

- what a resource namespace means,
- which node pubkeys are trusted for Tribes,
- how to resolve conflicts,
- how to apply an upsert or delete.

That is the key boundary.