docs: Study Khatru

2026-03-16 16:53:55 +01:00
parent 186d0f98ee
commit 14fb0f7ffb
3 changed files with 182 additions and 1 deletions
--- a/docs/KHATRU.md
+++ b/docs/KHATRU.md
@@ -0,0 +1,140 @@
 # Khatru-Inspired Runtime Improvements
 This document collects refactoring and extension ideas learned from studying Khatru-style relay design.
 It is intentionally **not** about the new public API surface or the sync ACL model. Those live in `docs/slop/LOCAL_API.md` and `docs/SYNC.md`.
 The focus here is runtime shape, protocol behavior, and operator-visible relay features.
 ---
 ## 1. Why This Matters
 Khatru reads as a mature design mainly because it exposes explicit relay pipeline stages rather than one monolithic connection handler.
 That gives three practical benefits:
 - less policy drift between storage, websocket, and management code,
 - easier feature addition without hard-coding more branches into one connection module,
 - better composability for relay profiles with different trust and traffic models.
 Parrhesia should borrow that clarity without copying Khatru's code-first hook model wholesale.
 ---
 ## 2. Proposed Runtime Refactors
 ### 2.1 Staged policy pipeline
 Parrhesia should stop treating policy as one coarse `EventPolicy` module plus scattered special cases.
 Recommended internal stages:
 1. connection admission
 2. authentication challenge and validation
 3. publish/write authorization
 4. query/count authorization
 5. stream subscription authorization
 6. negentropy authorization
 7. response shaping
 8. broadcast/fanout suppression
 This is an internal runtime refactor. It does not imply a new public API.
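As a rough illustration of the staged shape described above, the following sketch runs an ordered list of stage checks and stops at the first rejection. The module name `PipelineSketch`, the context keys, and the two example stages are hypothetical and do not exist in Parrhesia; only the stage names come from the list above.

```elixir
defmodule PipelineSketch do
  @moduledoc "Hypothetical sketch: run policy stages in order, halt on the first rejection."

  @type context :: map()
  @type stage :: {atom(), (context() -> :ok | {:error, term()})}

  # Returns :ok if every stage passes, otherwise the name of the
  # rejecting stage together with its reason.
  @spec run([stage()], context()) :: :ok | {:error, {atom(), term()}}
  def run(stages, context) do
    Enum.reduce_while(stages, :ok, fn {name, check}, :ok ->
      case check.(context) do
        :ok -> {:cont, :ok}
        {:error, reason} -> {:halt, {:error, {name, reason}}}
      end
    end)
  end

  # Two illustrative stages; a real runtime would cover all eight.
  @spec example_stages() :: [stage()]
  def example_stages do
    [
      {:connection_admission,
       fn ctx ->
         if MapSet.member?(ctx.blocked_ips, ctx.remote_ip),
           do: {:error, :blocked_ip},
           else: :ok
       end},
      {:write_authorization,
       fn ctx ->
         if MapSet.size(ctx.authenticated_pubkeys) > 0,
           do: :ok,
           else: {:error, :auth_required}
       end}
    ]
  end
end
```

Because each stage is a named function over the same context, a rejection carries the stage name with it, which is what makes per-stage tests and telemetry straightforward.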
 ### 2.2 Richer internal request context
 The runtime should carry a structured request context through all stages.
 Useful fields:
 - authenticated pubkeys
 - caller kind
 - remote IP
 - subscription id
 - peer id
 - negentropy session flag
 - internal-call flag
 This reduces ad-hoc branching and makes audit/telemetry more coherent.
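One possible shape for such a context, sketched under assumptions: the struct name `RuntimeContextSketch` and the field names `negentropy_session` and `internal_call` are invented here to mirror the list above, and `telemetry_metadata/1` is a hypothetical helper showing how one struct can feed telemetry without per-call-site branching.

```elixir
defmodule RuntimeContextSketch do
  @moduledoc "Hypothetical internal request context; field names are assumptions."

  defstruct authenticated_pubkeys: MapSet.new(),
            caller: :websocket,
            remote_ip: nil,
            subscription_id: nil,
            peer_id: nil,
            negentropy_session: false,
            internal_call: false

  # Flattens the context into a plain map suitable for telemetry or audit logs.
  def telemetry_metadata(%__MODULE__{} = ctx) do
    %{
      caller: ctx.caller,
      remote_ip: format_ip(ctx.remote_ip),
      subscription_id: ctx.subscription_id,
      peer_id: ctx.peer_id,
      negentropy: ctx.negentropy_session,
      internal: ctx.internal_call,
      authenticated: MapSet.size(ctx.authenticated_pubkeys) > 0
    }
  end

  defp format_ip(nil), do: nil
  # :inet.ntoa/1 returns a charlist, so convert it to a binary string.
  defp format_ip(ip), do: ip |> :inet.ntoa() |> to_string()
end
```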
 ### 2.3 Separate policy from storage presence tables
 Moderation state should remain data.
 Runtime enforcement should be a first-class layer that consumes that data, not a side effect of whether a table exists.
 This is especially important for:
 - blocked IP enforcement,
 - pubkey allowlists,
 - future kind- or tag-scoped restrictions.
 ---
 ## 3. Protocol and Relay Features
 ### 3.1 Real COUNT sketches
 Parrhesia currently returns a synthetic `hll` payload for NIP-45-style count responses.
 If approximate count exchange matters, implement a real reusable HLL sketch path instead of hashing `filters + count`.
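For orientation, a generic HyperLogLog sketch could look roughly like the following. This is a textbook construction, not Parrhesia's code: the module name `HllSketch`, the 256-register size, and the use of SHA-256 over the item are assumptions made for the example, and the register layout and hash derivation are not taken from NIP-45, which should be consulted for the wire format.

```elixir
defmodule HllSketch do
  @moduledoc "Minimal HyperLogLog with 256 one-byte registers (illustrative only)."

  @m 256
  # Standard HLL bias-correction constant for m >= 128.
  @alpha 0.7213 / (1 + 1.079 / @m)

  # An empty sketch is 256 zero bytes.
  def new, do: :binary.copy(<<0>>, @m)

  # The first hash byte selects the register; the rank is the position
  # of the first 1 bit in the remaining hash bits.
  def add(registers, item) when byte_size(registers) == @m and is_binary(item) do
    <<index, rest::bitstring>> = :crypto.hash(:sha256, item)
    rank = leading_zeros(rest, 0) + 1

    if rank > :binary.at(registers, index) do
      <<head::binary-size(index), _old, tail::binary>> = registers
      <<head::binary, rank, tail::binary>>
    else
      registers
    end
  end

  # Merging two sketches takes the register-wise maximum, which is what
  # makes sketches reusable across relays or queries.
  def merge(a, b) when byte_size(a) == @m and byte_size(b) == @m do
    Enum.zip(:binary.bin_to_list(a), :binary.bin_to_list(b))
    |> Enum.map(fn {x, y} -> max(x, y) end)
    |> :binary.list_to_bin()
  end

  def estimate(registers) when byte_size(registers) == @m do
    regs = :binary.bin_to_list(registers)
    sum = Enum.reduce(regs, 0.0, fn r, acc -> acc + :math.pow(2, -r) end)
    raw = @alpha * @m * @m / sum
    zeros = Enum.count(regs, &(&1 == 0))

    # Small-range correction (linear counting) while empty registers remain.
    if raw <= 2.5 * @m and zeros > 0 do
      @m * :math.log(@m / zeros)
    else
      raw
    end
  end

  defp leading_zeros(<<0::1, rest::bitstring>>, acc), do: leading_zeros(rest, acc + 1)
  defp leading_zeros(_, acc), do: acc
end
```

Unlike a payload derived from hashing the filters and the count, registers built this way can be merged by a client that queries several relays, which is the property that makes approximate count exchange useful.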
 ### 3.2 Relay identity in NIP-11
 Once Parrhesia owns a stable server identity, NIP-11 should expose the relay pubkey instead of returning `nil`.
 This is useful beyond sync:
 - operator visibility,
 - relay fingerprinting,
 - future trust tooling.
 ### 3.3 Connection-level IP enforcement
 Blocked IP support should be enforced on actual connection admission, not only stored in management tables.
 This should happen early, before expensive protocol handling.
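A minimal sketch of such an admission check, assuming the blocked addresses have already been loaded from storage into memory. The module `AdmissionSketch` and both function names are hypothetical; how Parrhesia actually stores or caches `blocked_ips` is not specified here.

```elixir
defmodule AdmissionSketch do
  @moduledoc "Hypothetical connection-admission check run before any protocol handling."

  # Parses stored address strings into IP tuples, skipping anything invalid.
  @spec load_blocked([String.t()]) :: MapSet.t()
  def load_blocked(strings) do
    for s <- strings,
        {:ok, ip} <- [:inet.parse_address(String.to_charlist(s))],
        into: MapSet.new(),
        do: ip
  end

  # Cheap set lookup, intended to run before the websocket upgrade.
  @spec admit(:inet.ip_address(), MapSet.t()) :: :ok | {:error, :blocked_ip}
  def admit(remote_ip, blocked_ips) do
    if MapSet.member?(blocked_ips, remote_ip), do: {:error, :blocked_ip}, else: :ok
  end
end
```

Keeping the check a pure function over an in-memory set means the connection path does not need a storage round trip per connection attempt.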
 ### 3.4 Better response shaping
 Introduce a narrow internal response shaping layer for cases where returned events or counts need controlled rewriting or suppression.
 Examples:
 - hide fields for specific relay profiles,
 - suppress rebroadcast of locally-ingested remote sync traffic,
 - shape relay notices consistently.
 This should stay narrow and deterministic. It should not become arbitrary app semantics.
 ---
 ## 4. Suggested Extension Points
 These should be internal runtime seams, not necessarily public interfaces:
 - `ConnectionPolicy`
 - `AuthPolicy`
 - `ReadPolicy`
 - `WritePolicy`
 - `NegentropyPolicy`
 - `ResponsePolicy`
 - `BroadcastPolicy`
 They may initially be plain modules with well-defined callbacks or functions.
 The point is not pluggability for its own sake. The point is to make policy stages explicit and testable.
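One way these seams might start out is as a shared behaviour with one callback per decision. The sketch below is an assumption about shape only: `PolicySketch.Stage`, the `authorize/2` callback, and the example write rule are hypothetical and are not Parrhesia modules.

```elixir
defmodule PolicySketch.Stage do
  @moduledoc "Hypothetical behaviour shared by all internal policy stages."

  @callback authorize(context :: map(), request :: term()) :: :ok | {:error, term()}
end

defmodule PolicySketch.WritePolicy do
  @moduledoc "Example stage: only accept events whose author has authenticated."
  @behaviour PolicySketch.Stage

  @impl true
  def authorize(%{authenticated_pubkeys: pubkeys}, %{pubkey: author}) do
    if MapSet.member?(pubkeys, author),
      do: :ok,
      else: {:error, :pubkey_not_authenticated}
  end
end
```

A stage written this way can be unit-tested with a plain map and no websocket process, which is the testability the section is asking for.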
 ---
 ## 5. Near-Term Priority
 Recommended order:
 1. enforce blocked IPs and any future connection-gating on the real connection path
 2. split the current websocket flow into explicit read/write/negentropy policy stages
 3. enrich runtime request context and telemetry metadata
 4. expose relay pubkey in NIP-11 once identity lands
 5. replace fake HLL payloads with a real approximate-count implementation if NIP-45 support matters operationally
 This keeps the runtime improvements incremental and independent from the ongoing API and ACL implementation.
--- a/docs/SYNC.md
+++ b/docs/SYNC.md
@@ -84,6 +84,12 @@ Private key export should not be supported.
 Sync traffic should use a real ACL layer, not moderation allowlists.
 Current implementation note:
 - Parrhesia already has storage-backed moderation state such as `allowed_pubkeys` and `blocked_ips`,
 - that is not the sync ACL model,
 - sync protection must be enforced in the active websocket/query/count/negentropy/write path, not inferred from management tables alone.
 Initial ACL model:
 - principal: authenticated pubkey,
@@ -110,6 +116,12 @@ Multiple pins should be allowed to support certificate rotation.
 Each configured sync server represents one outbound worker managed by Parrhesia.
 Implementation note:
 - Khatru-style relay designs benefit from explicit runtime stages,
 - Parrhesia sync should therefore plug into clear internal phases for connection admission, auth, query/count, subscription, negentropy, publish, and fanout,
 - this should stay a runtime refactor, not become extra sync semantics.
 Minimum behavior:
 1. connect to the remote relay,
@@ -332,11 +344,17 @@ The sync worker may attach request-context metadata such as:
 ```elixir
 %Parrhesia.API.RequestContext{
  caller: :sync,
  peer_id: "tribes-primary",
  metadata: %{sync_server_id: "tribes-primary"}
 }
 ```
-That metadata is for telemetry and audit only. It must not become app sync semantics.
+Recommended additional context when available:
 - `remote_ip`
 - `subscription_id`
 This context is for telemetry, policy, and audit only. It must not become app sync semantics.
 ---
--- a/docs/slop/LOCAL_API.md
+++ b/docs/slop/LOCAL_API.md
@@ -64,6 +64,12 @@ Runtime internals
 Rule: transport framing stays at the edge. Business decisions happen in `Parrhesia.API.*`.
 Implementation note:
 - the runtime beneath `Parrhesia.API.*` should expose clearer internal policy stages than it does today,
 - at minimum: connection/auth, publish, query/count, stream subscription, negentropy, response shaping, and broadcast/fanout,
 - these are internal runtime seams, not additional public APIs.
 ---
 ## 4. Core Context
@@ -73,12 +79,22 @@ defmodule Parrhesia.API.RequestContext do
  defstruct authenticated_pubkeys: MapSet.new(),
            actor: nil,
            caller: :local,
            remote_ip: nil,
            subscription_id: nil,
            peer_id: nil,
            metadata: %{}
 end
 ```
 `caller` is for telemetry and policy parity, for example `:websocket`, `:http`, `:local`, or `:sync`.
 Recommended usage:
 - `remote_ip` for connection-level policy and audit,
 - `subscription_id` for query/stream/negentropy context,
 - `peer_id` for trusted sync peer identity when applicable,
 - `metadata` for transport-specific details that should not become API fields.
 ---
 ## 5. Public Modules
@@ -245,6 +261,12 @@ Purpose:
 This is a real authorization layer, not a reuse of moderation allowlists.
 Current implementation note:
 - Parrhesia already has storage-backed moderation presence tables such as `allowed_pubkeys` and `blocked_ips`,
 - those are not sufficient for sync ACLs,
 - the new ACL layer must be enforced directly in the active read/write/query/negentropy path, not only through management tables.
 ```elixir
@spec grant(map(), keyword()) :: :ok | {:error, term()}
@spec revoke(map(), keyword()) :: :ok | {:error, term()}
@@ -343,6 +365,7 @@ Important constraints:
 - Parrhesia must expose worker health and basic counters,
 - remote relay TLS pinning is required,
 - sync peer auth is bound to a server-auth pubkey, not inferred from event author pubkeys,
 - sync enforcement should reuse the same runtime policy stages as ordinary websocket traffic rather than inventing a parallel trust path.
 Server identity model: