supabase-realtime-skill
v0.3.0
Published
Agent Skill + MCP server for Supabase Realtime/CDC. Bounded subscription pattern fits Edge Function isolate budgets.
Maintainers
Readme
supabase-realtime-skill
Agent Skill + MCP server that gives an LLM agent a bounded primitive for reacting to Postgres row-changes (and coordinating over Realtime broadcast channels) on Supabase. Deploys as an Edge Function. Ships pre-registered eval thresholds and 16 ADRs documenting the load-bearing tradeoffs (substrate-correctness fixes for multi-tenant RLS landed in ADRs 0011-0014; tool-routing + E2E smoke surface in ADRs 0015-0016; current release is 0.3.0. v1.0.0 stays unclaimed until manifest n=300 ships per ADR-0016 recon Decision 4).
The headline pattern is agent-watches-database: the agent calls a tool that blocks until either max_events arrive or timeout_ms elapses, then returns the batch. No streaming protocol, no persistent connection across tool-calls — fits MCP's request/response shape and Edge Function isolate budgets (Pro caps wall-clock at 150s; this caps tool timeout at 120s).
sequenceDiagram
autonumber
participant Agent
participant MCP as MCP Tool<br/>(watch_table)
participant Realtime as Supabase Realtime
participant PG as Postgres
Agent->>MCP: call watch_table<br/>{table, predicate, timeout_ms, max_events}
MCP->>Realtime: subscribe(topic)
Realtime-->>MCP: SUBSCRIBED ack
Note over MCP,Realtime: bounded loop:<br/>collect events that match predicate<br/>until max_events OR timeout
PG->>Realtime: row INSERT/UPDATE/DELETE
Realtime-->>MCP: postgres_changes event
PG->>Realtime: ...more events...
Realtime-->>MCP: postgres_changes event
alt max_events reached
MCP-->>MCP: closed_reason = "max_events"
else timeout_ms elapsed
MCP-->>MCP: closed_reason = "timeout"
end
MCP->>Realtime: unsubscribe (always — finally)
MCP-->>Agent: { events[], closed_reason }
Agent->>Agent: process batch, decide next call
Note over Agent: loop: call again, or stopThe "boundary" is the tool-call return — that's the natural checkpoint for an agent loop. Persistent WebSocket fights this model; bounded subscription embraces it.
The full narrative — what was tried, what failed, what landed, with the eval numbers — is in docs/writeup.md.
Why this exists
Supabase's product direction in 2026 is "agents running on Supabase, not just against." Edge Functions, Automatic Embeddings, Agent Skills, MCP-on-Edge — all 2026 moves. The substrate to make agents react to Postgres CDC, with a shape that fits the Edge Function isolate model and respects RLS, didn't exist. This is that substrate.
This isn't a kitchen-sink wrapper. It's five tools, two primitives, two worked examples (support-ticket triage + multi-tenant audit log), four eval metrics — opinionated and tight, with the surface area documented in 11 references.
Quick start
Install:
npm install supabase-realtime-skill # or: bun add supabase-realtime-skillUse the bounded primitive directly (Node):
import { boundedWatch, makeSupabaseAdapter } from "supabase-realtime-skill/server";
const adapter = makeSupabaseAdapter("support_tickets", {
supabaseUrl: process.env.SUPABASE_URL!,
supabaseKey: process.env.SUPABASE_ANON_KEY!,
});
const { events, closed_reason } = await boundedWatch({
adapter,
table: "support_tickets",
predicate: { event: "INSERT" },
timeout_ms: 60_000,
max_events: 10,
});
// events: Array<{ event, table, schema, new, old, commit_timestamp }>
// closed_reason: "max_events" | "timeout" | "error"Deploy the MCP server as an Edge Function (live-verified):
supabase functions deploy mcp --project-ref <your-project>
# Verify with a JSON-RPC tools/list:
curl -X POST "https://<project-ref>.supabase.co/functions/v1/mcp" \
-H "Authorization: Bearer <anon_or_service_role_key>" \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}'
# → returns all 5 tools with input schemasSee references/edge-deployment.md for full operator setup.
What's in the box
| Tool | Shape |
|---|---|
| watch_table | bounded subscription to Postgres row-changes (INSERT / UPDATE / DELETE / *) |
| broadcast_to_channel | fire-and-forget broadcast on a Realtime channel; idempotent retry on 5xx |
| subscribe_to_channel | bounded subscription to a Broadcast channel (mirrors watch_table's shape) |
| list_channels | best-effort registry listing |
| describe_table_changes | introspect columns, PK, RLS state, REPLICA IDENTITY |
All five over WebStandardStreamableHTTPServerTransport (MCP SDK 1.29+), per-request stateless. Five tools, intentionally tight — see SKILL.md for what not to use them for.
Two worked examples ship in supabase/migrations/: the support-ticket triage schema (the eval's SUT) and the multi-tenant audit log demo (instantiates ADR-0014's worked example end-to-end — both Postgres-Changes RLS and Broadcast Authorization RLS exercised under forwarded JWT). Apply via supabase db push; the multi-tenant demo's design doc is references/multi-tenant-rls.md.
Eval results (latest ci-full, n=100, post-ADR-0009 with Sonnet 4.6)
| Metric | Result | Threshold (manifest v1.0.0) | Status |
|---|---|---|---|
| latency_to_first_event_ms p95 | 1281 ms | ≤ 2000 ms | ✅ PASS |
| missed_events_rate | 0/100 (Wilson upper 0.0370) | rate ≤ 0.005, CI upper ≤ 0.01 | rate PASS, CI mechanically unreachable at n=100 — see ADR-0001 |
| spurious_trigger_rate | 0/100 (Wilson upper 0.0370) | rate ≤ 0.01, CI upper ≤ 0.03 | rate PASS, CI same as above |
| agent_action_correctness | 99/100, CI low 0.946 | rate ≥ 0.90, CI low ≥ 0.85 | ✅ PASS rate AND CI low |
Calibration sequence (each step in its own ADR; honest, attributable, isolable):
| Run | Model | Intervention | Rate | CI low | |---|---|---|---|---| | v0.1.0 baseline | Haiku 4.5 | recency proxy "retrieval" | 87/100 | 0.79 | | v0.1.1 | Haiku 4.5 | real pgvector wiring | 90/100 | 0.83 | | v0.1.2 (post-ADR-0002) | Haiku 4.5 | f019 ground-truth relabel | 94/100 | 0.875 | | v0.1.3 (post-ADR-0006) | Haiku 4.5 | resolved-corpus enrichment | 96/100 | 0.902 | | v0.1.4 (post-ADR-0009) | Sonnet 4.6 | model swap | 99/100 | 0.946 |
Pre-registered in manifest.json at v1.0.0; gated via eval/runner.ts; v2.0.0 amendment to bump n→300 deferred per ADR-0001.
Run on demand (locally):
EVAL_SUPABASE_PAT=... EVAL_HOST_PROJECT_REF=... ANTHROPIC_API_KEY=... \
bun run eval/runner.ts ci-full
# Cost ~$2-3, ~30 min wallclock against a transient Pro branch.
# Override the routing model with EVAL_TRIAGE_MODEL=claude-sonnet-4-6
# (default haiku-4-5).Or via GitHub Actions (workflow_dispatch, requires the three secrets set on the repo):
gh workflow run ci-full.yml -R 0xquinto/supabase-realtime-skillThe cron schedule was dropped (b49a1fc) — daily runs against a substrate that doesn't change daily are cost burn for no signal. Methodology evidence is the workflow file + the run-on-demand path, not a calendar trigger.
Methodology: 4 metrics, binary scoring, Wilson 95% CIs, McNemar paired-test comparisons. No LLM-as-judge as a gate. See references/eval-methodology.md.
Decisions + findings
The judgment trail. Each ADR carries a falsifiable predicted effect or a documented constraint. The calibration sequence above is one ADR per step — every gain attributable to one named intervention with its own paper trail. See docs/decisions/ for the live index.
- ADR-0001 — manifest v1 stays uncalibrated. Wilson CI bounds at n=100 are mechanically unreachable; resist the urge to retroactively soften the gate. v2.0.0 amendment deferred to a versioned bump.
- ADR-0002 — f019 seed relabel. The eval caught a mislabeled fixture (service-bug bucketed as
general); relabeled with audit trail before re-running ci-full. - ADR-0003 — dual-path embedding provider. Canonical schema stays spec-compliant
halfvec(1536)with OpenAI; eval falls through tohalfvec(384)Transformers.js whenOPENAI_API_KEYunset. Closes the spec deviation. - ADR-0004 — reshape T31 as user-feedback (proposed). Pre-T31 recon found the upstream maintainer's policy is monolith + references, not federation. Reshape pending operator decision.
- ADR-0005 — Mousavi data-quality audit. Per-fixture audit on the 20 ci-fast seeds; 5% flaw rate (only f017 boundary-ambiguous), below the 10% repair threshold. Concordant with the eval's systematic-miss findings.
- ADR-0006 — f017 cluster remediation. Pre-registered prediction → ran → magnitude predictions missed by 1pp; hypothesis directionally supported. Honest "Accepted with caveats."
- ADR-0007 — pre-staged v2.0.0 manifest design. n=300 with tightened CI gates. Locks the calibration loop's next move without yet shipping the file.
- ADR-0008 — comparative embedding eval (REJECTED). OpenAI 1536 vs MiniLM 384 paired-McNemar — byte-for-byte identical metrics at the Sonnet 4.6 model level. Hypothesis cleanly falsified; ADR-0003's dual-path is now empirically validated as a free choice.
- ADR-0009 — multi-model probe result. Sonnet 4.6 closes 2 of 3 residual f017 misses + reverses ADR-0006's f010 drift. 99/100, CI low 0.946. Validates ADR-0007's pre-staged v2.0.0 thresholds as conservative.
Operational findings from the spike — 5s Realtime warm-up window, Deno bundler .ts extension policy, vitest workspace gotcha — live in docs/spike-findings.md.
The pre-T31 recon (whether to file the upstream issue) lives in docs/upstream/recon/2026-05-01-pre-t31-engagement-recon.md. The Mousavi audit data lives in playbook/research/data-quality-audit.md.
Layout
SKILL.md— Open Skills Standard entry; three triggers + tools at a glancereferences/— 11 opinionated patterns (predicates, RLS implications, multi-tenant RLS deep dive, replication identity, pgvector composition, eval methodology, edge deployment, presence-deferred, worked example, outbox forwarder, queue drain)CHANGELOG.md— release log (0.3.0headlines the E2E smoke surface ship;0.2.0headlines the worked-example ship)src/server/— MCP server (5 tools) + bounded primitives + production adapterssrc/client/— npm consumer barrel (boundedWatch + schemas + types)supabase/functions/mcp/— Edge Function entry (WebStandardStreamableHTTPServerTransport)eval/— regression harness with pre-registered thresholds + synthesizer + triage agentfixtures/— 20 hand-curated ci-fast seeds + 100 ci-full (20 seeds × 5 LLM-augmented variations)docs/writeup.md— the headline narrativedocs/decisions/— ADRs (see directory for the live index)docs/upstream/— recon + spec + plan that produced this repoplaybook/— eval methodology backbone
Engineering polish
- Dual ESM + CJS publish via
tsupwith.d.tsand.d.ctsdeclarations - npm publish via OIDC Trusted Publisher (no
NPM_TOKENsecret in CI; sigstore-signed provenance attestation on every release) - ci-fast (PR-blocking) + ci-full (manual
workflow_dispatch) split — full CI in.github/workflows/ - Strict TypeScript (
exactOptionalPropertyTypes,noUncheckedIndexedAccess,noExplicitAny)
License
Apache-2.0. See LICENSE.
