
# Parrhesia Relay Sync

## 1. Purpose

This document defines the Parrhesia proposal for **relay-to-relay event synchronization**.

It is intentionally transport-focused:

- manage remote relay peers,
- catch up on matching events,
- keep a live stream open,
- expose health and basic stats.

It does **not** define application data semantics.

Parrhesia syncs Nostr events. Callers decide which events matter and how to apply them.

---

## 2. Boundary

### Parrhesia is responsible for

- storing and validating events,
- querying and streaming events,
- running outbound sync workers against remote relays,
- tracking peer configuration, worker health, and sync counters,
- exposing peer management through `Parrhesia.API.Sync`.

### Parrhesia is not responsible for

- resource mapping,
- trusted node allowlists for an app profile,
- mutation payload validation beyond normal event validation,
- conflict resolution,
- replay winner selection,
- database upsert/delete semantics.

For Tribes, those remain in `TRIBES-NOSTRSYNC` and `AshNostrSync`.

---

## 3. Security Foundation

### Default posture

The baseline posture for sync traffic is:

- no access to sync events by default,
- no implicit trust from ordinary relay usage,
- no reliance on public relays to keep plaintext confidential.

For the first implementation, Parrhesia should protect sync data primarily with:

- authenticated server identities,
- ACL-gated read and write access,
- TLS with certificate pinning for outbound peers.

### Server identity

Parrhesia owns a low-level server identity used for relay-to-relay authentication.

This identity is separate from:

- TLS endpoint identity,
- application event author pubkeys.

Recommended model:

- Parrhesia has one local server-auth pubkey,
- sync peers authenticate as server-auth pubkeys,
- ACL grants are bound to those authenticated server-auth pubkeys,
- application-level writer trust remains outside Parrhesia.

Identity lifecycle:

1. use configured/imported key if provided,
2. otherwise use persisted local identity,
3. otherwise generate once during initial startup and persist it.

Private key export should not be supported.
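
The lifecycle above can be sketched as a single resolution function. This is a minimal sketch, not Parrhesia's actual implementation: `IdentitySketch`, the `load` and `persist` callbacks, and the tagged return values are all hypothetical, and a real implementation would generate a valid secp256k1 key rather than raw random bytes.

```elixir
defmodule IdentitySketch do
  @moduledoc "Hypothetical sketch of the identity lifecycle."

  # `configured` is a configured/imported secret key, or nil.
  # `load` returns the persisted key or nil; `persist` stores a new key.
  # Both callbacks are assumptions standing in for real storage.
  def resolve(configured, load, persist)
      when is_function(load, 0) and is_function(persist, 1) do
    cond do
      # 1. A configured or imported key always wins.
      is_binary(configured) ->
        {:configured, configured}

      # 2. Otherwise reuse the identity persisted on an earlier startup.
      persisted = load.() ->
        {:persisted, persisted}

      # 3. Otherwise generate once and persist.
      true ->
        # Placeholder key material; not validated against the curve order.
        key = :crypto.strong_rand_bytes(32)
        :ok = persist.(key)
        {:generated, key}
    end
  end
end
```

Note that the sketch has no read-back path for the secret beyond the caller that resolves it, which is consistent with not supporting private key export.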

### ACLs

Sync traffic should use a dedicated ACL layer, not the existing moderation allowlists.

Current implementation note:

- Parrhesia already has storage-backed moderation state such as `allowed_pubkeys` and `blocked_ips`,
- that is not the sync ACL model,
- sync protection must be enforced in the active websocket/query/count/negentropy/write path, not inferred from management tables alone.

Initial ACL model:

- principal: authenticated pubkey,
- capabilities: `sync_read`, `sync_write`,
- match: event/filter shape such as `kinds: [5000]` and namespace tags.

This is enough for now. We do **not** need separate user and server ACL models yet.

A sync peer is simply an authenticated principal with sync capabilities.
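
As an illustration of the principal/capability/match model, the check could look roughly like this. The grant shape, the module name `AclSketch`, and the use of `"r"` tags as the namespace tag are assumptions made for the example; the proposal does not fix them.

```elixir
defmodule AclSketch do
  # A grant binds an authenticated pubkey to one capability and a match
  # on event shape. Field names are illustrative assumptions.
  def allowed?(grants, pubkey, capability, %{"kind" => kind} = event) do
    Enum.any?(grants, fn grant ->
      grant.principal == pubkey and
        grant.capability == capability and
        kind in grant.match.kinds and
        namespace_match?(grant.match.namespaces, event)
    end)
  end

  # An event matches when any of its "r" tags names a granted namespace.
  defp namespace_match?(namespaces, event) do
    event
    |> Map.get("tags", [])
    |> Enum.any?(fn
      ["r", value | _] -> value in namespaces
      _ -> false
    end)
  end
end
```

With an empty grant list the function returns `false`, which matches the default posture of no access to sync events.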

### TLS pinning

Each outbound sync peer must include pinned TLS material.

Recommended pin type:

- SPKI SHA-256 pins

Multiple pins should be allowed to support certificate rotation.
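
A minimal sketch of the pin comparison, assuming pins are stored as Base64-encoded SHA-256 digests of the DER-encoded SubjectPublicKeyInfo. The encoding choice and the module name `PinSketch` are assumptions; extracting the SPKI bytes from the presented certificate is out of scope here.

```elixir
defmodule PinSketch do
  # `spki_der` is the DER-encoded SubjectPublicKeyInfo of the certificate
  # the remote endpoint presented.
  def pin_for(spki_der) when is_binary(spki_der) do
    :sha256 |> :crypto.hash(spki_der) |> Base.encode64()
  end

  # Accept the peer if its key matches any configured pin, so that an old
  # and a new pin can coexist during certificate rotation.
  def pinned?(spki_der, pins) when is_list(pins) do
    actual = pin_for(spki_der)

    Enum.any?(pins, fn
      %{type: :spki_sha256, value: value} -> value == actual
      _unknown_pin_type -> false
    end)
  end
end
```

An empty pin list never matches, so a peer without pins fails closed.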

---

## 4. Sync Model

Each configured sync server represents one outbound worker managed by Parrhesia.

Implementation note:

- Khatru-style relay designs benefit from explicit runtime stages,
- Parrhesia sync should therefore plug into clear internal phases for connection admission, auth, query/count, subscription, negentropy, publish, and fanout,
- this should stay a runtime refactor, not become extra sync semantics.

Minimum behavior:

1. connect to the remote relay,
2. run an initial catch-up query for the configured filters,
3. ingest received events into the local relay through the normal API path,
4. switch to a live subscription for the same filters,
5. reconnect with backoff when disconnected.

The worker treats filters as opaque Nostr filters. It does not interpret app payloads.
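
The proposal does not specify a backoff policy for step 5. One common choice is capped exponential backoff with jitter, sketched below; the base delay, cap, jitter range, and module name `BackoffSketch` are assumptions.

```elixir
defmodule BackoffSketch do
  # Assumed tuning values; the proposal does not define them.
  @base_ms 1_000
  @max_ms 60_000

  # Delay before reconnect attempt number `attempt` (0-based), capped.
  def delay_ms(attempt) when is_integer(attempt) and attempt >= 0 do
    # Clamp the exponent so the intermediate value stays small.
    min(@base_ms * Integer.pow(2, min(attempt, 16)), @max_ms)
  end

  # Adds up to 20% random jitter so peers do not reconnect in lockstep.
  def with_jitter(delay) when is_integer(delay) and delay >= 0 do
    delay + :rand.uniform(div(delay, 5) + 1) - 1
  end
end
```

A worker would typically pass its `reconnect_attempts` counter as `attempt` and reset it to zero after a successful connection.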

### Initial implementation mode

Initial implementation should use ordinary NIP-01 behavior:

- catch-up via `REQ`-style query,
- live updates via `REQ` subscription.

This is enough for Tribes and keeps the first version simple.
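
For illustration, the NIP-01 `REQ` frame for both phases can be built from the configured filters. The sketch assumes that `overlap_window_seconds` is applied by moving `since` back from the last seen timestamp; that interpretation, and the names `ReqSketch`, `req_frame/3`, and `catch_up_since/2`, are not defined by this document.

```elixir
defmodule ReqSketch do
  # Builds a NIP-01 REQ frame as an Elixir term. JSON encoding is left to
  # whichever encoder the relay already uses.
  def req_frame(sub_id, filters, since \\ nil) when is_binary(sub_id) do
    filters =
      case since do
        nil -> filters
        ts -> Enum.map(filters, &Map.put(&1, "since", ts))
      end

    ["REQ", sub_id | filters]
  end

  # Assumed use of the overlap window: restart catch-up slightly before the
  # last seen timestamp and rely on duplicate handling for the overlap.
  def catch_up_since(nil, _overlap_seconds), do: nil
  def catch_up_since(last_seen, overlap_seconds), do: max(last_seen - overlap_seconds, 0)
end
```

The filters themselves are passed through untouched apart from `since`, in line with treating them as opaque.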

### NIP-77

Parrhesia now has a reusable relay-side NIP-77 engine:

- proper `NEG-OPEN` / `NEG-MSG` / `NEG-CLOSE` / `NEG-ERR` framing,
- a reusable negentropy codec and reconciliation engine,
- bounded local `(created_at, id)` snapshot enumeration for matching filters,
- connection/session integration with policy checks and resource limits.

That means NIP-77 can be used for bandwidth-efficient catch-up between trusted nodes.

The first sync worker implementation may still default to ordinary NIP-01 catch-up plus a live subscription, because that path is operationally simpler and already matches the current Tribes sync profile. A `:negentropy` mode can now be introduced as an optimization rather than a future prerequisite.

---

## 5. API Surface

Primary control plane:

- `Parrhesia.API.Identity.get/1`
- `Parrhesia.API.Identity.ensure/1`
- `Parrhesia.API.Identity.import/2`
- `Parrhesia.API.Identity.rotate/1`
- `Parrhesia.API.ACL.grant/2`
- `Parrhesia.API.ACL.revoke/2`
- `Parrhesia.API.ACL.list/1`
- `Parrhesia.API.Sync.put_server/2`
- `Parrhesia.API.Sync.remove_server/2`
- `Parrhesia.API.Sync.get_server/2`
- `Parrhesia.API.Sync.list_servers/1`
- `Parrhesia.API.Sync.start_server/2`
- `Parrhesia.API.Sync.stop_server/2`
- `Parrhesia.API.Sync.sync_now/2`
- `Parrhesia.API.Sync.server_stats/2`
- `Parrhesia.API.Sync.sync_stats/1`
- `Parrhesia.API.Sync.sync_health/1`

These APIs are in-process. HTTP management may expose them through `Parrhesia.API.Admin` or direct routing to `Parrhesia.API.Sync`.

---

## 6. Server Specification

`put_server/2` is an upsert.

Suggested server shape:

```elixir
%{
  id: "tribes-primary",
  url: "wss://relay-a.example/relay",
  enabled?: true,
  auth_pubkey: "<remote-server-auth-pubkey>",
  mode: :req_stream,
  filters: [
    %{
      "kinds" => [5000],
      "authors" => ["<trusted-node-pubkey-a>", "<trusted-node-pubkey-b>"],
      "#r" => ["tribes.accounts.user", "tribes.chat.tribe"]
    }
  ],
  overlap_window_seconds: 300,
  auth: %{
    type: :nip42
  },
  tls: %{
    mode: :required,
    hostname: "relay-a.example",
    pins: [
      %{type: :spki_sha256, value: "<pin-a>"},
      %{type: :spki_sha256, value: "<pin-b>"}
    ]
  },
  metadata: %{}
}
```

Required fields:

- `id`
- `url`
- `auth_pubkey`
- `filters`
- `tls`

Recommended fields:

- `enabled?`
- `mode`
- `overlap_window_seconds`
- `auth`
- `metadata`

Rules:

- `id` must be stable and unique locally.
- `url` is the remote relay websocket URL.
- `auth_pubkey` is the expected remote server-auth pubkey.
- `filters` must be valid NIP-01 filters.
- filters are owned by the caller; Parrhesia only validates filter shape.
- `mode` defaults to `:req_stream`.
- `tls.mode` defaults to `:required`.
- `tls.pins` must be non-empty for sync peers.
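
The rules above could be enforced by a normalization step inside `put_server/2`. The following is a sketch under assumptions: the module name `ServerSpecSketch`, the function `normalize/1`, and the error atoms are hypothetical, and filter validation is reduced to a shape check.

```elixir
defmodule ServerSpecSketch do
  @required [:id, :url, :auth_pubkey, :filters, :tls]

  # Returns {:ok, spec_with_defaults} or {:error, reason}.
  def normalize(spec) when is_map(spec) do
    missing = Enum.reject(@required, &Map.has_key?(spec, &1))

    cond do
      missing != [] ->
        {:error, {:missing_fields, missing}}

      not websocket_url?(spec.url) ->
        {:error, :invalid_url}

      # Shape check only; the caller owns the filter contents.
      not (is_list(spec.filters) and Enum.all?(spec.filters, &is_map/1)) ->
        {:error, :invalid_filters}

      not is_map(spec.tls) or Map.get(spec.tls, :pins, []) == [] ->
        {:error, :missing_tls_pins}

      true ->
        # Apply the documented defaults for `mode` and `tls.mode`.
        tls = Map.put_new(spec.tls, :mode, :required)
        {:ok, spec |> Map.put_new(:mode, :req_stream) |> Map.put(:tls, tls)}
    end
  end

  defp websocket_url?(url) when is_binary(url) do
    URI.parse(url).scheme in ["ws", "wss"]
  end

  defp websocket_url?(_), do: false
end
```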

---

## 7. Runtime State

Each server should have both configuration and runtime status.

Suggested runtime fields:

```elixir
%{
  server_id: "tribes-primary",
  state: :running,
  connected?: true,
  last_connected_at: ~U[2026-03-16 10:00:00Z],
  last_disconnected_at: nil,
  last_sync_started_at: ~U[2026-03-16 10:00:00Z],
  last_sync_completed_at: ~U[2026-03-16 10:00:02Z],
  last_event_received_at: ~U[2026-03-16 10:12:45Z],
  last_eose_at: ~U[2026-03-16 10:00:02Z],
  reconnect_attempts: 0,
  last_error: nil
}
```

Parrhesia should keep this state generic. It is about relay sync health, not app state convergence.

---

## 8. Stats and Health

### Per-server stats

`server_stats/2` should return basic counters such as:

- `events_received`
- `events_accepted`
- `events_duplicate`
- `events_rejected`
- `query_runs`
- `subscription_restarts`
- `reconnects`
- `last_remote_eose_at`
- `last_error`
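
As a sketch of how the event counters might be maintained, each ingest outcome increments `events_received` plus exactly one of the other three. The outcome atoms and the module name `CountersSketch` are assumptions.

```elixir
defmodule CountersSketch do
  def new do
    %{events_received: 0, events_accepted: 0, events_duplicate: 0, events_rejected: 0}
  end

  # `outcome` is whatever the ingest path reported for one event:
  # :accepted, :duplicate, or :rejected (assumed atoms).
  def record(counters, outcome) do
    counters
    |> Map.update!(:events_received, &(&1 + 1))
    |> Map.update!(key_for(outcome), &(&1 + 1))
  end

  defp key_for(:accepted), do: :events_accepted
  defp key_for(:duplicate), do: :events_duplicate
  defp key_for(:rejected), do: :events_rejected
end
```

Keeping the invariant `received == accepted + duplicate + rejected` makes the counters easy to sanity-check from the outside.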

### Aggregate sync stats

`sync_stats/1` should summarize:

- total configured servers,
- enabled servers,
- running servers,
- connected servers,
- aggregate event counters,
- aggregate reconnect count.

### Health

`sync_health/1` should be operator-oriented, for example:

```elixir
%{
  "status" => "degraded",
  "servers_total" => 3,
  "servers_connected" => 2,
  "servers_failing" => [
    %{"id" => "tribes-secondary", "reason" => "connection_refused"}
  ]
}
```

This is intentionally simple. It should answer “is sync working?” without pretending to prove application convergence.
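
One way to derive that summary from per-server runtime state is sketched below. Only `"degraded"` appears in this document; the `"ok"` and `"down"` statuses, the fallback reason `"disconnected"`, and the module name `HealthSketch` are assumptions.

```elixir
defmodule HealthSketch do
  # `servers` is a list of runtime-state maps for enabled servers, each
  # with :server_id, :connected?, and :last_error.
  def summarize(servers) when is_list(servers) do
    failing =
      for %{connected?: false} = s <- servers do
        %{"id" => s.server_id, "reason" => to_string(s.last_error || "disconnected")}
      end

    total = length(servers)
    connected = total - length(failing)

    %{
      "status" => status(connected, total),
      "servers_total" => total,
      "servers_connected" => connected,
      "servers_failing" => failing
    }
  end

  # All connected (including the empty case) is "ok"; none is "down".
  defp status(n, n), do: "ok"
  defp status(0, _total), do: "down"
  defp status(_connected, _total), do: "degraded"
end
```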

---

## 9. Event Ingest Path

Events that a sync worker receives from a remote relay should enter Parrhesia through the same ingest path as any other event.

That means:

1. validate the event,
2. run normal write policy,
3. persist or reject,
4. fan out locally,
5. rely on duplicate-event behavior for idempotency.

This avoids a second ingest path with divergent behavior.

Before normal event acceptance, the sync worker should enforce:

1. pinned TLS validation for the remote endpoint,
2. remote server-auth identity match,
3. local ACL grant permitting the peer to perform sync reads and/or writes.
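
These three checks compose naturally with `with`, failing on the first mismatch. In this sketch the `peer` and `server` map shapes, the injected `granted?` ACL lookup, the error atoms, and the module name `SyncGateSketch` are all hypothetical.

```elixir
defmodule SyncGateSketch do
  # `peer` describes what was observed on the connection, `server` is the
  # configured spec, and `granted?` is an injected ACL lookup taking
  # (pubkey, capability). All shapes are assumptions for illustration.
  def check(peer, server, capability, granted?) when is_function(granted?, 2) do
    with :ok <- ensure(peer.tls_pin in server.pins, :tls_pin_mismatch),
         :ok <- ensure(peer.auth_pubkey == server.auth_pubkey, :auth_pubkey_mismatch),
         :ok <- ensure(granted?.(peer.auth_pubkey, capability), :acl_denied) do
      :ok
    end
  end

  defp ensure(true, _reason), do: :ok
  defp ensure(false, reason), do: {:error, reason}
end
```

Only after `check/4` returns `:ok` would the event continue into the normal validation and write-policy steps.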

The sync worker may attach request-context metadata such as:

```elixir
%Parrhesia.API.RequestContext{
  caller: :sync,
  peer_id: "tribes-primary",
  metadata: %{sync_server_id: "tribes-primary"}
}
```

Recommended additional context when available:

- `remote_ip`
- `subscription_id`

This context is for telemetry, policy, and audit only. It must not become app sync semantics.

---

## 10. Persistence

Parrhesia should persist enough sync control-plane state to survive restart:

- local server identity reference,
- configured ACL rules for sync principals,
- configured servers,
- whether a server is enabled,
- optional catch-up cursor or watermark per server,
- basic last-error and last-success markers.

Parrhesia does not need to persist application replay heads or winner state. That remains in the embedding application.

---

## 11. Relationship to Current Features

### BEAM cluster fanout

`Parrhesia.Fanout.MultiNode` is a separate feature.

It provides best-effort live fanout between connected BEAM nodes. It is not remote relay sync and is not a substitute for `Parrhesia.API.Sync`.

### Management stats

The current admin `stats` output is relay-global and minimal.

Sync adds a new dimension:

- peer config,
- worker state,
- per-peer counters,
- sync health summary.

That should be exposed without coupling it to app-specific sync semantics.

---

## 12. Tribes Usage

For Tribes, `AshNostrSync` should be able to:

1. rely on Parrhesia’s local server identity,
2. register one or more remote relays with `Parrhesia.API.Sync.put_server/2`,
3. grant sync ACLs for trusted server-auth pubkeys,
4. provide narrow Nostr filters for `kind: 5000`,
5. observe sync health and counters,
6. consume events via the normal local Parrhesia ingest/query/stream surface.

Tribes should not need Parrhesia to know:

- what a resource namespace means,
- which node pubkeys are trusted for Tribes,
- how to resolve conflicts,
- how to apply an upsert or delete.

That is the key boundary.