docs: Study Khatru

2026-03-16 16:53:55 +01:00
parent 186d0f98ee
commit 14fb0f7ffb
3 changed files with 182 additions and 1 deletions
--- a/docs/KHATRU.md
+++ b/docs/KHATRU.md
@@ -0,0 +1,140 @@
+# Khatru-Inspired Runtime Improvements
+
+This document collects refactoring and extension ideas learned from studying Khatru-style relay design.
+
+It is intentionally **not** about the new public API surface or the sync ACL model. Those live in `docs/slop/LOCAL_API.md` and `docs/SYNC.md`.
+
+The focus here is runtime shape, protocol behavior, and operator-visible relay features.
+
+---
+
+## 1. Why This Matters
+
+Khatru appears mature mainly because it exposes relay pipeline stages more clearly than Parrhesia currently does.
+
+That gives three practical benefits:
+
+- less policy drift between storage, websocket, and management code,
+- easier feature addition without hard-coding more branches into one connection module,
+- better composability for relay profiles with different trust and traffic models.
+
+Parrhesia should borrow that clarity without copying Khatru's code-first hook model wholesale.
+
+---
+
+## 2. Proposed Runtime Refactors
+
+### 2.1 Staged policy pipeline
+
+Parrhesia should stop treating policy as one coarse `EventPolicy` module plus scattered special cases.
+
+Recommended internal stages:
+
+1. connection admission
+2. authentication challenge and validation
+3. publish/write authorization
+4. query/count authorization
+5. stream subscription authorization
+6. negentropy authorization
+7. response shaping
+8. broadcast/fanout suppression
+
+This is an internal runtime refactor. It does not imply a new public API.
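+
+A minimal sketch of how the stages above could be sequenced, assuming each stage is a function from request context to `:ok` or `{:error, reason}`. The `PolicyPipelineSketch` module, the stage names, and the context shape are hypothetical illustrations, not existing Parrhesia code.
+
+```elixir
+defmodule PolicyPipelineSketch do
+  @moduledoc "Hypothetical sketch: run policy stages in order and halt on the first denial."
+
+  @type decision :: :ok | {:error, term()}
+  @type stage :: {atom(), (map() -> decision())}
+
+  @spec run([stage()], map()) :: :ok | {:error, {atom(), term()}}
+  def run(stages, context) do
+    Enum.reduce_while(stages, :ok, fn {name, check}, :ok ->
+      case check.(context) do
+        :ok -> {:cont, :ok}
+        # Tag the denial with the stage that produced it.
+        {:error, reason} -> {:halt, {:error, {name, reason}}}
+      end
+    end)
+  end
+end
+
+# Placeholder stages; real checks would consult moderation and ACL data.
+stages = [
+  {:authentication, fn ctx -> if ctx.authenticated?, do: :ok, else: {:error, :auth_required} end},
+  {:publish_authorization, fn _ctx -> :ok end}
+]
+
+PolicyPipelineSketch.run(stages, %{authenticated?: false})
+# returns {:error, {:authentication, :auth_required}}
+```
+
+Tagging each denial with its stage name keeps audit and telemetry output attributable to one explicit stage.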
+
+### 2.2 Richer internal request context
+
+The runtime should carry a structured request context through all stages.
+
+Useful fields:
+
+- authenticated pubkeys
+- caller kind
+- remote IP
+- subscription id
+- peer id
+- negentropy session flag
+- internal-call flag
+
+This reduces ad-hoc branching and makes audit/telemetry more coherent.
+
+### 2.3 Separate policy from storage presence tables
+
+Moderation state should remain data.
+
+Runtime enforcement should be a first-class layer that consumes that data, not a side effect of whether a table exists.
+
+This is especially important for:
+
+- blocked IP enforcement,
+- pubkey allowlists,
+- future kind- or tag-scoped restrictions.
+
+---
+
+## 3. Protocol and Relay Features
+
+### 3.1 Real COUNT sketches
+
+Parrhesia currently returns a synthetic `hll` payload for NIP-45-style count responses.
+
+If approximate count exchange matters, implement a real, reusable HyperLogLog (HLL) sketch path instead of hashing `filters + count`.
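+
+For orientation, a generic HyperLogLog register sketch might look like the following. It assumes 256 one-byte registers and SHA-256 as the hash; the `HllSketch` module is hypothetical, and the code does not implement the NIP-45 wire format or a cardinality estimator. It only shows the part that makes a sketch reusable: registers that can be updated and merged.
+
+```elixir
+defmodule HllSketch do
+  @moduledoc "Hypothetical generic HLL registers; not the NIP-45 encoding."
+
+  @registers 256
+
+  # A fresh sketch is 256 zeroed one-byte registers.
+  def new, do: :binary.copy(<<0>>, @registers)
+
+  # First hash byte selects the register; the rank is the position of the
+  # first 1-bit in the remaining hash bits.
+  def add(sketch, item) when is_binary(sketch) and is_binary(item) do
+    <<index::8, rest::bitstring>> = :crypto.hash(:sha256, item)
+    rank = leading_zeros(rest, 0) + 1
+    <<before::binary-size(index), current::8, tail::binary>> = sketch
+    <<before::binary, max(current, rank)::8, tail::binary>>
+  end
+
+  # Merging takes the per-register maximum, so sketches from different
+  # relays or filters can be combined without recounting events.
+  def merge(a, b) when byte_size(a) == byte_size(b) do
+    [:binary.bin_to_list(a), :binary.bin_to_list(b)]
+    |> Enum.zip_with(fn [x, y] -> max(x, y) end)
+    |> :binary.list_to_bin()
+  end
+
+  defp leading_zeros(<<0::1, rest::bitstring>>, acc), do: leading_zeros(rest, acc + 1)
+  defp leading_zeros(_, acc), do: acc
+end
+```
+
+Because each register only ever keeps a maximum, adding the same item twice is a no-op and merge order does not matter, which is what a synthetic payload cannot offer.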
+
+### 3.2 Relay identity in NIP-11
+
+Once Parrhesia owns a stable server identity, NIP-11 should expose the relay pubkey instead of returning `nil`.
+
+This is useful beyond sync:
+
+- operator visibility,
+- relay fingerprinting,
+- future trust tooling.
+
+### 3.3 Connection-level IP enforcement
+
+Blocked IPs should be enforced at actual connection admission, not merely stored in management tables.
+
+This should happen early, before expensive protocol handling.
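+
+A sketch of such an early check, assuming the blocked set has already been loaded into a `MapSet` of string addresses. `ConnectionAdmissionSketch` and its `admit/2` function are hypothetical names, and how the set is loaded from storage is left out.
+
+```elixir
+defmodule ConnectionAdmissionSketch do
+  @moduledoc "Hypothetical admission check run before any protocol handling."
+
+  @spec admit(:inet.ip_address(), MapSet.t(String.t())) :: :ok | {:error, :blocked_ip}
+  def admit(remote_ip, blocked_ips) do
+    # :inet.ntoa/1 renders the address tuple as a charlist, e.g. ~c"203.0.113.7".
+    ip = remote_ip |> :inet.ntoa() |> to_string()
+
+    if MapSet.member?(blocked_ips, ip), do: {:error, :blocked_ip}, else: :ok
+  end
+end
+
+blocked = MapSet.new(["203.0.113.7"])
+ConnectionAdmissionSketch.admit({203, 0, 113, 7}, blocked)
+# returns {:error, :blocked_ip}
+```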
+
+### 3.4 Better response shaping
+
+Introduce a narrow internal response shaping layer for cases where returned events or counts need controlled rewriting or suppression.
+
+Examples:
+
+- hide fields for specific relay profiles,
+- suppress rebroadcast of locally-ingested remote sync traffic,
+- shape relay notices consistently.
+
+This should stay narrow and deterministic. It should not become arbitrary app semantics.
+
+---
+
+## 4. Suggested Extension Points
+
+These should be internal runtime seams, not necessarily public interfaces:
+
+- `ConnectionPolicy`
+- `AuthPolicy`
+- `ReadPolicy`
+- `WritePolicy`
+- `NegentropyPolicy`
+- `ResponsePolicy`
+- `BroadcastPolicy`
+
+They may initially be plain modules with well-defined callbacks or functions.
+
+The point is not pluggability for its own sake. The point is to make policy stages explicit and testable.
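+
+One way to make a seam explicit and testable is a small behaviour shared by the policy modules. This is a sketch under assumptions: `PolicyStageSketch`, the `authorize/2` callback, and the placeholder rule in `WritePolicySketch` are hypothetical and do not describe existing Parrhesia modules.
+
+```elixir
+defmodule PolicyStageSketch do
+  @moduledoc "Hypothetical shared contract for internal policy stages."
+
+  @callback authorize(context :: map(), request :: term()) :: :ok | {:error, term()}
+end
+
+defmodule WritePolicySketch do
+  @moduledoc "Placeholder write policy: the event author must be authenticated."
+
+  @behaviour PolicyStageSketch
+
+  @impl true
+  def authorize(%{authenticated_pubkeys: pubkeys}, %{pubkey: author}) do
+    if MapSet.member?(pubkeys, author), do: :ok, else: {:error, :auth_required}
+  end
+end
+```
+
+A stage shaped like this can be unit-tested with a plain context map and no websocket process.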
+
+---
+
+## 5. Near-Term Priority
+
+Recommended order:
+
+1. enforce blocked IPs and any future connection-gating on the real connection path
+2. split the current websocket flow into explicit read/write/negentropy policy stages
+3. enrich runtime request context and telemetry metadata
+4. expose relay pubkey in NIP-11 once identity lands
+5. replace fake HLL payloads with a real approximate-count implementation if NIP-45 support matters operationally
+
+This keeps the runtime improvements incremental and independent from the ongoing API and ACL implementation.
--- a/docs/SYNC.md
+++ b/docs/SYNC.md
@@ -84,6 +84,12 @@ Private key export should not be supported.

 Sync traffic should use a real ACL layer, not moderation allowlists.

+Current implementation note:
+
+- Parrhesia already has storage-backed moderation state such as `allowed_pubkeys` and `blocked_ips`,
+- that is not the sync ACL model,
+- sync protection must be enforced in the active websocket/query/count/negentropy/write path, not inferred from management tables alone.
+
 Initial ACL model:

 - principal: authenticated pubkey,
@@ -110,6 +116,12 @@ Multiple pins should be allowed to support certificate rotation.

 Each configured sync server represents one outbound worker managed by Parrhesia.

+Implementation note:
+
+- Khatru-style relay designs benefit from explicit runtime stages,
+- Parrhesia sync should therefore plug into clear internal phases for connection admission, auth, query/count, subscription, negentropy, publish, and fanout,
+- this should stay a runtime refactor, not become extra sync semantics.
+
 Minimum behavior:

 1. connect to the remote relay,
@@ -332,11 +344,17 @@ The sync worker may attach request-context metadata such as:
 ```elixir
 %Parrhesia.API.RequestContext{
  caller: :sync,
+  peer_id: "tribes-primary",
  metadata: %{sync_server_id: "tribes-primary"}
 }
 ```

-That metadata is for telemetry and audit only. It must not become app sync semantics.
+Recommended additional context when available:
+
+- `remote_ip`
+- `subscription_id`
+
+This context is for telemetry, policy, and audit only. It must not become app sync semantics.

 ---

--- a/docs/slop/LOCAL_API.md
+++ b/docs/slop/LOCAL_API.md
@@ -64,6 +64,12 @@ Runtime internals

 Rule: transport framing stays at the edge. Business decisions happen in `Parrhesia.API.*`.

+Implementation note:
+
+- the runtime beneath `Parrhesia.API.*` should expose clearer internal policy stages than it does today,
+- at minimum: connection/auth, publish, query/count, stream subscription, negentropy, response shaping, and broadcast/fanout,
+- these are internal runtime seams, not additional public APIs.
+
 ---

 ## 4. Core Context
@@ -73,12 +79,22 @@ defmodule Parrhesia.API.RequestContext do
  defstruct authenticated_pubkeys: MapSet.new(),
            actor: nil,
            caller: :local,
+            remote_ip: nil,
+            subscription_id: nil,
+            peer_id: nil,
            metadata: %{}
 end
 ```

 `caller` is for telemetry and policy parity, for example `:websocket`, `:http`, `:local`, or `:sync`.

+Recommended usage:
+
+- `remote_ip` for connection-level policy and audit,
+- `subscription_id` for query/stream/negentropy context,
+- `peer_id` for trusted sync peer identity when applicable,
+- `metadata` for transport-specific details that should not become API fields.
+
 ---

 ## 5. Public Modules
@@ -245,6 +261,12 @@ Purpose:

 This is a real authorization layer, not a reuse of moderation allowlists.

+Current implementation note:
+
+- Parrhesia already has storage-backed moderation presence tables such as `allowed_pubkeys` and `blocked_ips`,
+- those are not sufficient for sync ACLs,
+- the new ACL layer must be enforced directly in the active read/write/query/negentropy path, not only through management tables.
+
 ```elixir
@spec grant(map(), keyword()) :: :ok | {:error, term()}
@spec revoke(map(), keyword()) :: :ok | {:error, term()}
@@ -343,6 +365,7 @@ Important constraints:
 - Parrhesia must expose worker health and basic counters,
 - remote relay TLS pinning is required,
 - sync peer auth is bound to a server-auth pubkey, not inferred from event author pubkeys.
+- sync enforcement should reuse the same runtime policy stages as ordinary websocket traffic rather than inventing a parallel trust path.

 Server identity model: