Add monthly partition maintenance and retention pruning

2026-03-14 18:09:53 +01:00
parent 19664ac56c
commit 889d630c12
12 changed files with 1359 additions and 76 deletions
--- a/docs/CLUSTER.md
+++ b/docs/CLUSTER.md
@@ -0,0 +1,234 @@
+# Parrhesia clustering and distributed fanout
+
+This document describes:
+
+1. the **current** distributed fanout behavior implemented today, and
+2. a practical evolution path to a more production-grade clustered relay.
+
+---
+
+## 1) Current state (implemented today)
+
+### 1.1 What exists right now
+
+Parrhesia currently includes a lightweight multi-node live fanout path (untested!):
+
+- `Parrhesia.Fanout.MultiNode` (`lib/parrhesia/fanout/multi_node.ex`)
+  - GenServer that joins a `:pg` process group.
+  - Receives locally-published events and forwards them to other group members.
+  - Receives remote events and performs local fanout lookup.
+- `Parrhesia.Web.Connection` (`lib/parrhesia/web/connection.ex`)
+  - On successful ingest, after ACK scheduling, it does:
+    1. local fanout (`fanout_event/1`), then
+    2. cross-node publish (`maybe_publish_multi_node/1`).
+- `Parrhesia.Subscriptions.Supervisor` (`lib/parrhesia/subscriptions/supervisor.ex`)
+  - Starts `Parrhesia.Fanout.MultiNode` unconditionally.
+
+In other words: **if BEAM nodes are connected, live events are fanned out cross-node**.
+
+### 1.2 What is not included yet
+
+- No automatic cluster formation/discovery (no `libcluster`, DNS polling, gossip, etc.).
+- No durable inter-node event transport.
+- No replay/recovery of missed cross-node live events.
+- No explicit per-node delivery ACK between relay nodes.
+
+---
+
+## 2) Current runtime behavior in detail
+
+### 2.1 Local ingest flow and publish ordering
+
+For an accepted event in `Parrhesia.Web.Connection`:
+
+1. The validate/policy/persist path runs.
+2. Client receives `OK` reply.
+3. A post-ACK message triggers:
+   - local fanout (`Index.candidate_subscription_keys/1` + send `{:fanout_event, ...}`),
+   - multi-node publish (`MultiNode.publish/1`).
+
+Important semantics:
+
+- Regular persisted events: ACK implies DB persistence succeeded.
+- Ephemeral events: ACK implies accepted by policy, but no DB durability.
+- Cross-node fanout happens **after** the ACK path is scheduled.
+
+### 2.2 Multi-node transport mechanics
+
+`Parrhesia.Fanout.MultiNode` uses `:pg` membership:
+
+- On init:
+  - ensures `:pg` is started,
+  - joins group `Parrhesia.Fanout.MultiNode`.
+- On publish:
+  - gets all group members,
+  - excludes itself,
+  - sends `{:remote_fanout_event, event}` to each member pid.
+- On remote receive:
+  - runs local subscription candidate narrowing via `Parrhesia.Subscriptions.Index`,
+  - forwards matching candidates to local connection owners as `{:fanout_event, sub_id, event}`.
+
+A remote receive never triggers a republish, so this path cannot create fanout loops.
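+
+A minimal sketch of the mechanics above, assuming the default `:pg` scope. The module name `FanoutSketch.MultiNode` and the `:local_fanout` callback option are hypothetical stand-ins for illustration; the real module does its lookup through `Parrhesia.Subscriptions.Index` and may be structured differently:
+
+```elixir
+defmodule FanoutSketch.MultiNode do
+  use GenServer
+
+  # Group name mirrors the module name, as described above.
+  @group __MODULE__
+
+  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)
+
+  # Called on the ingesting node after local fanout.
+  def publish(event), do: GenServer.cast(__MODULE__, {:publish, event})
+
+  @impl true
+  def init(opts) do
+    # Ensure the default :pg scope is running, then join the group.
+    case :pg.start_link() do
+      {:ok, _pid} -> :ok
+      {:error, {:already_started, _pid}} -> :ok
+    end
+
+    :ok = :pg.join(@group, self())
+
+    # Hypothetical hook standing in for the local subscription lookup.
+    {:ok, %{local_fanout: Keyword.get(opts, :local_fanout, fn _event -> :ok end)}}
+  end
+
+  @impl true
+  def handle_cast({:publish, event}, state) do
+    # Send to every group member except this process.
+    @group
+    |> :pg.get_members()
+    |> Enum.reject(&(&1 == self()))
+    |> Enum.each(&send(&1, {:remote_fanout_event, event}))
+
+    {:noreply, state}
+  end
+
+  @impl true
+  def handle_info({:remote_fanout_event, event}, state) do
+    # Local fanout only; nothing is republished, so no loops.
+    state.local_fanout.(event)
+    {:noreply, state}
+  end
+end
+```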
+
+### 2.3 Subscription index locality
+
+The subscription index is local ETS state per node (`Parrhesia.Subscriptions.Index`).
+
+- Each node only tracks subscriptions of its local websocket processes.
+- Each node independently decides which local subscribers match a remote event.
+- There is no global cross-node subscription registry.
+
+### 2.4 Delivery model and guarantees (current)
+
+The current model is **best-effort live propagation** among connected nodes.
+
+- If nodes are connected and healthy, remote live subscribers should receive events quickly.
+- If there is a netsplit or temporary disconnection:
+  - remote live subscribers may miss events,
+  - persisted events can still be recovered by normal `REQ`/history query,
+  - ephemeral events are not recoverable.
+
+### 2.5 Cluster preconditions
+
+For cross-node fanout to work, operators must provide distributed BEAM connectivity:
+
+- consistent Erlang cookie,
+- named nodes (`--name`/`--sname`),
+- network reachability for Erlang distribution ports,
+- explicit node connections (or external discovery tooling).
+
+Parrhesia currently does not automate these steps.
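+
+As an illustration of the last precondition, nodes started with matching `--name`/`--cookie` settings can be connected by hand with `Node.connect/1`. The helper module and the peer name below are hypothetical; this is a sketch of the manual step, not something Parrhesia ships:
+
+```elixir
+defmodule ClusterSketch do
+  @doc """
+  Attempts to connect to each peer and reports the outcome per node.
+
+  `Node.connect/1` returns `true` on success, `false` on failure, and
+  `:ignored` when the local node is not running in distributed mode.
+  """
+  def connect_all(peers) do
+    Enum.map(peers, fn peer -> {peer, Node.connect(peer)} end)
+  end
+end
+
+# Hypothetical peer name; replace with the real node names of the cluster.
+ClusterSketch.connect_all([:"relay_b@10.0.0.2"])
+```
+
+Once connected, `Node.list/0` shows the peers, and `:pg` group membership propagates between them without further setup.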
+
+---
+
+## 3) Operational characteristics of current design
+
+### 3.1 Performance shape
+
+For each accepted event on one node:
+
+- one local fanout lookup + local sends,
+- one cluster publish that sends to `N - 1` remote bus members,
+- on each remote node: one local fanout lookup + local sends.
+
+So inter-node traffic scales roughly linearly with node count per event (full-cluster broadcast).
+
+This is simple and low-latency for small-to-medium clusters, but can become expensive as node count grows.
+
+### 3.2 Failure behavior
+
+- Remote node down: sends to that member stop once group membership updates; missed events are not replayed.
+- Netsplit: live propagation gap during split.
+- Recovery: local clients can catch up via DB-backed queries (except ephemeral kinds).
+
+### 3.3 Consistency expectations
+
+- No global total-ordering guarantee for live delivery across nodes.
+- Per-connection ordering is preserved by each connection process's queue/drain behavior.
+- Duplicate suppression for ingestion uses storage semantics (`duplicate_event`), but transport itself is not exactly-once.
+
+### 3.4 Observability today
+
+Relevant metrics exist for fanout/queue pressure (see `Parrhesia.Telemetry`), e.g.:
+
+- `parrhesia.fanout.duration.ms`
+- `parrhesia.connection.outbound_queue.depth`
+- `parrhesia.connection.outbound_queue.pressure`
+- `parrhesia.connection.outbound_queue.overflow.count`
+
+These are useful but do not yet fully separate local-vs-remote fanout pipeline stages.
+
+---
+
+## 4) Practical extension path to a fully-fledged clustered system
+
+A realistic path is incremental. Suggested phases:
+
+### Phase A — hardened BEAM cluster control plane
+
+1. Add cluster discovery/formation (e.g. `libcluster`) with environment-specific topology:
+   - Kubernetes DNS,
+   - static nodes,
+   - cloud VM discovery.
+2. Add clear node liveness/partition telemetry and alerts.
+3. Provide operator docs for cookie, node naming, and network requirements.
+
+Outcome: simpler and safer cluster operations, same data plane semantics.
+
+### Phase B — resilient distributed fanout data plane
+
+Introduce a durable fanout stream for persisted events.
+
+Recommended pattern:
+
+1. On successful DB commit of event, append to a monotonic fanout log (or use DB sequence-based stream view).
+2. Each relay node runs a consumer with a stored cursor.
+3. On restart/partition recovery, node resumes from cursor and replays missed events.
+4. Local fanout remains the same (subscription index + per-connection queues).
+
+Semantics target:
+
+- **at-least-once** node-to-node propagation,
+- replay after downtime,
+- idempotent handling keyed by event id.
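+
+The cursor-plus-idempotency idea can be sketched with an in-memory log. Everything here is an assumption for illustration: the `FanoutLogSketch` name, the `{seq, event}` log shape, and the `MapSet` of seen ids. A real implementation would read from the DB-backed log and persist the cursor:
+
+```elixir
+defmodule FanoutLogSketch do
+  @doc """
+  Replays log entries with a sequence number above `cursor`.
+
+  Returns `{new_cursor, seen_ids, delivered_events}`. Events whose id is
+  already in `seen` are skipped, which makes at-least-once replay safe.
+  """
+  def replay(log, cursor, seen \\ MapSet.new()) do
+    log
+    |> Enum.filter(fn {seq, _event} -> seq > cursor end)
+    |> Enum.sort_by(fn {seq, _event} -> seq end)
+    |> Enum.reduce({cursor, seen, []}, fn {seq, event}, {_cursor, seen, delivered} ->
+      id = event["id"]
+
+      if MapSet.member?(seen, id) do
+        # Duplicate delivery: advance the cursor but do not fan out again.
+        {seq, seen, delivered}
+      else
+        {seq, MapSet.put(seen, id), [event | delivered]}
+      end
+    end)
+    |> then(fn {cursor, seen, delivered} -> {cursor, seen, Enum.reverse(delivered)} end)
+  end
+end
+```
+
+A node that stored cursor `1` before going down would call `replay(log, 1, seen)` on restart and hand only the returned events to local fanout.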
+
+Notes:
+
+- Ephemeral events can remain best-effort (or have a separate short-lived transport), since no storage source exists for replay.
+
+### Phase C — scale and efficiency improvements
+
+As cluster size grows, avoid naive full broadcast where possible:
+
+1. Optional node-level subscription summaries (coarse bloom/bitset or keyed summaries) to reduce unnecessary remote sends.
+2. Shard fanout workers for CPU locality and mailbox control.
+3. Batch remote delivery payloads.
+4. Separate traffic classes (e.g. Marmot-heavy streams vs generic) with independent queues.
+
+Outcome: higher throughput per node and lower inter-node amplification.
+
+### Phase D — stronger observability and SLOs
+
+Add explicit distributed pipeline metrics:
+
+- publish enqueue/dequeue latency,
+- cross-node delivery lag (commit -> remote fanout enqueue),
+- replay backlog depth,
+- per-node dropped/expired transport messages,
+- partition detection counters.
+
+Define cluster SLO examples:
+
+- p95 commit->remote-live enqueue under nominal load,
+- max replay catch-up time after node restart,
+- bounded message loss for best-effort channels.
+
+---
+
+## 5) How a fully-fledged system would behave in practice
+
+With Phases A-D implemented, expected behavior:
+
+- **Normal operation:**
+  - low-latency local fanout,
+  - remote nodes receive events via stream consumers quickly,
+  - consistent operational visibility of end-to-end lag.
+- **Node restart:**
+  - node reconnects and replays from stored cursor,
+  - local subscribers begin receiving new + missed persisted events.
+- **Transient partition:**
+  - live best-effort path may degrade,
+  - persisted events converge after partition heals via replay.
+- **High fanout bursts:**
+  - batching + sharding keeps queue pressure bounded,
+  - overflow policies remain connection-local and measurable.
+
+This approach gives a good trade-off between Nostr relay latency and distributed robustness without requiring strict exactly-once semantics.
+
+---
+
+## 6) Current status summary
+
+Today, Parrhesia already supports **lightweight distributed live fanout** when BEAM nodes are connected.
+
+It is intentionally simple and fast for smaller clusters, and provides a solid base for a more durable, observable cluster architecture as relay scale and availability requirements grow.
--- a/docs/MARMOT_OPERATIONS.md
+++ b/docs/MARMOT_OPERATIONS.md
@@ -1,69 +0,0 @@
-# Marmot operations guide (relay operator tuning)
-
-This document captures practical limits and operational defaults for Marmot-heavy traffic (`443`, `445`, `10051`, wrapped `1059`, optional media/push flows).
-
-## 1) Recommended baseline limits
-
-Use these as a starting point and tune from production telemetry.
-
-```elixir
-config :parrhesia,
-  limits: [
-    max_filter_limit: 500,
-    max_filters_per_req: 16,
-    max_outbound_queue: 256,
-    outbound_drain_batch_size: 64
-  ],
-  policies: [
-    # Marmot group routing/query guards
-    marmot_require_h_for_group_queries: true,
-    marmot_group_max_h_values_per_filter: 32,
-    marmot_group_max_query_window_seconds: 2_592_000,
-
-    # Kind 445 retention
-    mls_group_event_ttl_seconds: 300,
-
-    # MIP-04 metadata controls
-    marmot_media_max_imeta_tags_per_event: 8,
-    marmot_media_max_field_value_bytes: 1024,
-    marmot_media_max_url_bytes: 2048,
-    marmot_media_allowed_mime_prefixes: [],
-    marmot_media_reject_mip04_v1: true,
-
-    # MIP-05 push controls (optional)
-    marmot_push_server_pubkeys: [],
-    marmot_push_max_relay_tags: 16,
-    marmot_push_max_payload_bytes: 65_536,
-    marmot_push_max_trigger_age_seconds: 120,
-    marmot_push_require_expiration: true,
-    marmot_push_max_expiration_window_seconds: 120,
-    marmot_push_max_server_recipients: 1
-  ]
-```
-
-## 2) Index expectations for Marmot workloads
-
-The Postgres adapter relies on dedicated partial tag indexes for hot Marmot selectors:
-
-- `event_tags_h_value_created_at_idx` for `#h` group routing
-- `event_tags_i_value_created_at_idx` for `#i` keypackage reference lookups
-
-Query-plan regression tests assert these paths remain usable for heavy workloads.
-
-## 3) Telemetry to watch
-
-Key metrics for Marmot traffic and pressure:
-
-- `parrhesia.ingest.duration.ms{traffic_class="marmot|generic"}`
-- `parrhesia.query.duration.ms{traffic_class="marmot|generic"}`
-- `parrhesia.fanout.duration.ms{traffic_class="marmot|generic"}`
-- `parrhesia.connection.outbound_queue.depth{traffic_class=...}`
-- `parrhesia.connection.outbound_queue.pressure{traffic_class=...}`
-- `parrhesia.connection.outbound_queue.pressure_events.count{traffic_class=...}`
-- `parrhesia.connection.outbound_queue.overflow.count{traffic_class=...}`
-
-Operational target: keep queue pressure below sustained 0.75 and avoid overflow spikes during `445` bursts.
-
-## 4) Fault and recovery expectations
-
-During storage outages, Marmot group-flow writes must fail with explicit `OK false` errors. After recovery, reordered group events should still query deterministically by `created_at DESC, id ASC`.
--- a/docs/slop/LOCAL_API.md
+++ b/docs/slop/LOCAL_API.md
@@ -0,0 +1,398 @@
+# Parrhesia Shared API + Local API Design (Option 1)
+
+## 1) Goal
+
+Expose a stable in-process API for embedding apps **and** refactor server transports to consume the same API.
+
+Desired end state:
+
+- WebSocket server, HTTP management, and embedding app all call one shared core API.
+- Transport layers (WS/HTTP/local) only do framing, auth header extraction, and response encoding.
+- Policy/storage/fanout/business semantics live in one place.
+
+This keeps everything in the same dependency (`:parrhesia`) and avoids a second package.
+
+---
+
+## 2) Key architectural decision
+
+Previous direction: `Parrhesia.Local.*` as primary public API.
+
+Updated direction (this doc):
+
+- Introduce **shared core API modules** under `Parrhesia.API.*`.
+- Make server code (`Parrhesia.Web.Connection`, management handlers) delegate to `Parrhesia.API.*`.
+- Keep `Parrhesia.Local.*` as optional convenience wrappers over `Parrhesia.API.*`.
+
+This ensures no divergence between local embedding behavior and websocket behavior.
+
+---
+
+## 3) Layered design
+
+```text
+Transport layer
+  - Parrhesia.Web.Connection (WS)
+  - Parrhesia.Web.Management (HTTP)
+  - Parrhesia.Local.* wrappers (in-process)
+
+Shared API layer
+  - Parrhesia.API.Auth
+  - Parrhesia.API.Events
+  - Parrhesia.API.Stream (optional)
+  - Parrhesia.API.Admin (optional, for management methods)
+
+Domain/runtime dependencies
+  - Parrhesia.Policy.EventPolicy
+  - Parrhesia.Storage.* adapters
+  - Parrhesia.Groups.Flow
+  - Parrhesia.Subscriptions.Index
+  - Parrhesia.Fanout.MultiNode
+  - Parrhesia.Telemetry
+```
+
+Rule: all ingest/query/count decisions happen in `Parrhesia.API.Events`.
+
+---
+
+## 4) Public module plan
+
+## 4.1 `Parrhesia.API.Auth`
+
+Purpose:
+- event validation helpers
+- NIP-98 verification
+- optional embedding account resolution hook
+
+Proposed functions:
+
+```elixir
+@spec validate_event(map()) :: :ok | {:error, term()}
+@spec compute_event_id(map()) :: String.t()
+
+@spec validate_nip98(String.t() | nil, String.t(), String.t()) ::
+  {:ok, Parrhesia.API.Auth.Context.t()} | {:error, term()}
+
+@spec validate_nip98(String.t() | nil, String.t(), String.t(), keyword()) ::
+  {:ok, Parrhesia.API.Auth.Context.t()} | {:error, term()}
+```
+
+`validate_nip98/4` options:
+
+```elixir
+account_resolver: (pubkey_hex :: String.t(), auth_event :: map() ->
+  {:ok, account :: term()} | {:error, term()})
+```
+
+Context struct:
+
+```elixir
+defmodule Parrhesia.API.Auth.Context do
+  @enforce_keys [:pubkey, :auth_event]
+  defstruct [:pubkey, :auth_event, :account, claims: %{}]
+end
+```
+
+---
+
+## 4.2 `Parrhesia.API.Events`
+
+Purpose:
+- canonical ingress/query/count API used by WS + local + HTTP integrations.
+
+Proposed functions:
+
+```elixir
+@spec publish(map(), keyword()) :: {:ok, Parrhesia.API.Events.PublishResult.t()} | {:error, term()}
+@spec query([map()], keyword()) :: {:ok, [map()]} | {:error, term()}
+@spec count([map()], keyword()) :: {:ok, non_neg_integer() | map()} | {:error, term()}
+```
+
+Request context:
+
+```elixir
+defmodule Parrhesia.API.RequestContext do
+  defstruct authenticated_pubkeys: MapSet.new(),
+            actor: nil,
+            metadata: %{}
+end
+```
+
+Publish result:
+
+```elixir
+defmodule Parrhesia.API.Events.PublishResult do
+  @enforce_keys [:event_id, :accepted, :message]
+  defstruct [:event_id, :accepted, :message]
+end
+```
+
+### Publish semantics (must match websocket EVENT)
+
+Pipeline in `publish/2`:
+
+1. frame/event size limits
+2. `Parrhesia.Protocol.validate_event/1`
+3. `Parrhesia.Policy.EventPolicy.authorize_write/2`
+4. group handling (`Parrhesia.Groups.Flow.handle_event/1`)
+5. persistence path (`put_event`, deletion, vanish, ephemeral rules)
+6. fanout (local + multi-node)
+7. telemetry emit
+
+Return shape mirrors Nostr `OK` semantics:
+
+```elixir
+{:ok, %PublishResult{event_id: id, accepted: true, message: "ok: event stored"}}
+{:ok, %PublishResult{event_id: id, accepted: false, message: "blocked: ..."}}
+```
+
+### Query/count semantics (must match websocket REQ/COUNT)
+
+`query/2` and `count/2`:
+
+1. validate filters
+2. run read policy (`EventPolicy.authorize_read/2`)
+3. call storage with `requester_pubkeys` from context
+4. return ordered events/count payload
+
+Giftwrap restrictions (`kind 1059`) must remain identical to websocket behavior.
+
+---
+
+## 4.3 `Parrhesia.API.Stream` (optional but recommended)
+
+Purpose:
+- local in-process subscriptions using same subscription index/fanout model.
+
+Proposed functions:
+
+```elixir
+@spec subscribe(pid(), String.t(), [map()], keyword()) :: {:ok, reference()} | {:error, term()}
+@spec unsubscribe(reference()) :: :ok
+```
+
+Subscriber contract:
+
+```elixir
+{:parrhesia, :event, ref, subscription_id, event}
+{:parrhesia, :eose, ref, subscription_id}
+{:parrhesia, :closed, ref, subscription_id, reason}
+```
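+
+A subscriber can consume these messages with a plain `receive` loop. The sketch below assumes only the three tuple shapes listed above; the `StreamConsumerSketch` module and its timeout handling are hypothetical:
+
+```elixir
+defmodule StreamConsumerSketch do
+  @doc """
+  Collects events for `ref` until EOSE, close, or timeout.
+
+  Returns `{:eose, events}`, `{:closed, reason, events}`, or `{:timeout, events}`.
+  """
+  def collect(ref, acc \\ [], timeout \\ 1_000) do
+    receive do
+      {:parrhesia, :event, ^ref, _subscription_id, event} ->
+        collect(ref, [event | acc], timeout)
+
+      {:parrhesia, :eose, ^ref, _subscription_id} ->
+        {:eose, Enum.reverse(acc)}
+
+      {:parrhesia, :closed, ^ref, _subscription_id, reason} ->
+        {:closed, reason, Enum.reverse(acc)}
+    after
+      timeout -> {:timeout, Enum.reverse(acc)}
+    end
+  end
+end
+```
+
+Pinning `^ref` keeps messages from other subscriptions in the mailbox, so one process can hold several subscriptions at once.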
+
+---
+
+## 4.4 `Parrhesia.Local.*` wrappers
+
+`Parrhesia.Local.*` remains a convenience API for embedding apps, implemented as thin wrappers:
+
+- `Parrhesia.Local.Auth` -> delegates to `Parrhesia.API.Auth`
+- `Parrhesia.Local.Events` -> delegates to `Parrhesia.API.Events`
+- `Parrhesia.Local.Stream` -> delegates to `Parrhesia.API.Stream`
+- `Parrhesia.Local.Client` -> use-case helpers (posts + private messages)
+
+No business logic in wrappers.
+
+---
+
+## 5) Server integration plan (critical)
+
+## 5.1 WebSocket (`Parrhesia.Web.Connection`)
+
+After decode:
+- `EVENT` -> `Parrhesia.API.Events.publish/2`
+- `REQ` -> `Parrhesia.API.Events.query/2`
+- `COUNT` -> `Parrhesia.API.Events.count/2`
+- `AUTH` -> keeps the transport-specific challenge/session flow, but can use `API.Auth.validate_event/1` internally
+
+WebSocket keeps responsibility for:
+- websocket framing
+- subscription lifecycle per connection
+- AUTH challenge rotation protocol frames
+
+## 5.2 HTTP management (`Parrhesia.Web.Management`)
+
+- NIP-98 header validation via `Parrhesia.API.Auth.validate_nip98/3`
+- command execution via `Parrhesia.API.Admin` (or existing storage admin adapter via API facade)
+
+---
+
+## 6) High-level client helpers for embedding app use case
+
+These helpers are optional and live in `Parrhesia.Local.Client`.
+
+## 6.1 Public posts
+
+```elixir
+@spec publish_post(Parrhesia.API.Auth.Context.t(), String.t(), keyword()) ::
+  {:ok, Parrhesia.API.Events.PublishResult.t()} | {:error, term()}
+
+@spec list_posts(keyword()) :: {:ok, [map()]} | {:error, term()}
+@spec stream_posts(pid(), keyword()) :: {:ok, reference()} | {:error, term()}
+```
+
+`publish_post/3` options:
+- `:tags`
+- `:created_at`
+- `:signer` callback (required unless fully signed event provided)
+
+Signer contract:
+
+```elixir
+(unsigned_event_map -> {:ok, signed_event_map} | {:error, term()})
+```
+
+Parrhesia does not store or manage private keys.
+
+## 6.2 Private messages (giftwrap kind 1059)
+
+```elixir
+@spec send_private_message(
+  Parrhesia.API.Auth.Context.t(),
+  recipient_pubkey :: String.t(),
+  encrypted_payload :: String.t(),
+  keyword()
+) :: {:ok, Parrhesia.API.Events.PublishResult.t()} | {:error, term()}
+
+@spec inbox(Parrhesia.API.Auth.Context.t(), keyword()) :: {:ok, [map()]} | {:error, term()}
+@spec stream_inbox(pid(), Parrhesia.API.Auth.Context.t(), keyword()) :: {:ok, reference()} | {:error, term()}
+```
+
+Behavior:
+- `send_private_message/4` builds event template with kind `1059` and `p` tag.
+- host signer signs template.
+- publish through `API.Events.publish/2`.
+- `inbox/2` queries `%{"kinds" => [1059], "#p" => [auth.pubkey]}` with authenticated context.
+
+---
+
+## 7) Error model
+
+Shared API should normalize output regardless of transport.
+
+Guideline:
+- protocol/policy rejection -> `{:ok, %{accepted: false, message: "..."}}`
+- runtime/system failure -> `{:error, term()}`
+
+Common reason mapping:
+
+| Reason | Message prefix |
+|---|---|
+| `:auth_required` | `auth-required:` |
+| `:restricted_giftwrap` | `restricted:` |
+| `:invalid_event` | `invalid:` |
+| `:duplicate_event` | `duplicate:` |
+| `:event_rate_limited` | `rate-limited:` |
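+
+The mapping in the table could be centralized in one helper so every transport produces the same message text. The `ErrorPrefixSketch` module and its return shape are assumptions for illustration; reasons outside the table are deliberately left unmapped:
+
+```elixir
+defmodule ErrorPrefixSketch do
+  # Reason atoms and prefixes copied from the table above.
+  @prefixes %{
+    auth_required: "auth-required:",
+    restricted_giftwrap: "restricted:",
+    invalid_event: "invalid:",
+    duplicate_event: "duplicate:",
+    event_rate_limited: "rate-limited:"
+  }
+
+  @doc "Builds the message for a known reason, or returns `:error` for an unmapped one."
+  def message(reason, detail) when is_binary(detail) do
+    case Map.fetch(@prefixes, reason) do
+      {:ok, prefix} -> {:ok, prefix <> " " <> detail}
+      :error -> :error
+    end
+  end
+end
+```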
+
+---
+
+## 8) Telemetry
+
+Emit shared events in API layer (not transport-specific):
+
+- `[:parrhesia, :api, :publish, :stop]`
+- `[:parrhesia, :api, :query, :stop]`
+- `[:parrhesia, :api, :count, :stop]`
+- `[:parrhesia, :api, :auth, :stop]`
+
+Metadata:
+- `traffic_class`
+- `caller` (`:websocket | :http | :local`)
+- optional `account_present?`
+
+Transport-level telemetry can remain separate where needed.
+
+---
+
+## 9) Refactor sequence
+
+### Phase 1: Extract shared API
+1. Create `Parrhesia.API.Events` with publish/query/count from current `Web.Connection` paths.
+2. Create `Parrhesia.API.Auth` wrappers for NIP-98/event validation.
+3. Add API-level tests.
+
+### Phase 2: Migrate transports
+1. Update `Parrhesia.Web.Connection` to delegate publish/query/count to `API.Events`.
+2. Update `Parrhesia.Web.Management` to use `API.Auth`.
+3. Keep behavior unchanged.
+
+### Phase 3: Add local wrappers/helpers
+1. Implement `Parrhesia.Local.Auth/Events/Stream` as thin delegates.
+2. Add `Parrhesia.Local.Client` post/inbox/send helpers.
+3. Add embedding documentation.
+
+### Phase 4: Lock parity
+1. Add parity tests: WS vs Local API for same inputs and policy outcomes.
+2. Add property tests for query/count equivalence where feasible.
+
+---
+
+## 10) Testing requirements
+
+1. **Transport parity tests**
+   - Same signed event via WS and API => same accepted/message semantics.
+2. **Policy parity tests**
+   - Giftwrap visibility and auth-required behavior identical across WS/API/local.
+3. **Auth tests**
+   - NIP-98 success/failure + account resolver success/failure.
+4. **Fanout tests**
+   - publish via API reaches local stream subscribers and WS subscribers.
+5. **Failure tests**
+   - storage failures surface deterministic errors in all transports.
+
+---
+
+## 11) Backwards compatibility
+
+- No breaking change to websocket protocol.
+- No breaking change to management endpoint contract.
+- New API modules are additive.
+- Existing apps can ignore local API entirely.
+
+---
+
+## 12) Embedding example flow
+
+### 12.1 Login/auth
+
+```elixir
+with {:ok, auth} <- Parrhesia.API.Auth.validate_nip98(header, method, url,
+       account_resolver: &MyApp.Accounts.resolve_nostr_pubkey/2
+     ) do
+  # use auth.pubkey/auth.account in host session
+end
+```
+
+### 12.2 Post publish
+
+```elixir
+Parrhesia.Local.Client.publish_post(auth, "hello", signer: &MyApp.NostrSigner.sign/1)
+```
+
+### 12.3 Private message
+
+```elixir
+Parrhesia.Local.Client.send_private_message(
+  auth,
+  recipient_pubkey,
+  encrypted_payload,
+  signer: &MyApp.NostrSigner.sign/1
+)
+```
+
+### 12.4 Inbox
+
+```elixir
+Parrhesia.Local.Client.inbox(auth, limit: 100)
+```
+
+---
+
+## 13) Summary
+
+The ingest/query/count logic can and should be extracted into a shared API layer, and the server should consume that layer too.
+
+That gives:
+- one canonical behavior path,
+- cleaner embedding,
+- easier testing,
+- lower long-term maintenance cost.