bot-relay-mcp
v2.7.2
Published
MCP server for AI coding agent terminal-to-terminal communication. Local-first, SQLite-backed, webhook-integrated.
Maintainers
Readme
bot-relay-mcp
A local-first coordination bus for AI coding agents. Durable inboxes, task queues, and wakeups for Claude Code, Cursor, Cline, Codex-style CLIs, scripts, and webhooks. Two interfaces, one shared SQLite database, zero infrastructure.
v2.7 — Tether-ready. 30 MCP tools. Cross-process inbox notifications via a durable outbox + SSE keepalive (any agent's inbox change wakes IDE subscribers across stdio + HTTP processes). Pairs with the Tether VSCode extension for live in-editor status. See CHANGELOG for the v2.7 phase-by-phase arc and the prior v2.1 architectural sweep (explicit auth_state machine, managed-agent rotation grace, keyring-based encryption, unified relay CLI).
What is this?
bot-relay-mcp gives AI coding agents and external systems a way to coordinate.
Two audiences, two transports:
- AI coding agents (Claude Code, Cursor, Cline, Zed) connect via stdio MCP. Drop one entry into
~/.claude.jsonand the relay's tools appear inside Claude. No daemon required. - External systems (n8n, Slack, Telegram, custom scripts) connect via HTTP+SSE with optional Bearer auth. Trigger agent actions or receive webhook events.
Everything reads and writes the same SQLite file at ~/.bot-relay/relay.db. There is no cloud, no daemon you have to install, no service mesh.
Why not just use Claude Agent Teams?
Fair question. Claude Code v2.1.32 shipped Agent Teams — native multi-agent coordination inside Claude. If your setup is Claude-only and stays that way, use it. It's the right tool.
bot-relay-mcp solves a different problem: coordination across a heterogeneous LLM stack. The tools your agents run on probably aren't all Claude. You might have Claude Code terminals for architecture work, Codex or Cursor on a different repo, an n8n workflow firing webhooks, a background Python process polling results, a Telegram bot forwarding alerts. Agent Teams coordinates Claude with Claude. bot-relay-mcp coordinates whatever speaks MCP (or plain HTTP) with whatever else does.
Three things it does that a vendor-native solution can't:
- LLM-agnostic. Any MCP client — Claude Code, Cursor, Cline, Zed — registers and talks to any other. Non-MCP systems join over the HTTP+SSE transport with optional Bearer auth. No client is privileged.
- Self-hosted. Everything lives in a single SQLite file on your machine. No cloud account, no control plane, no vendor lock-in. The HTTP daemon is optional; stdio-only works offline.
- Federation-bound (v2.3 roadmap). Hub/edge federation is on the locked plan so multiple self-hosted relays can mesh without collapsing into one shared namespace. Teams of teams, across orgs, without routing through someone else's service.
It's also a durable bus, not an in-process coordinator. Messages, tasks, and channels survive individual agent sessions — restart a terminal, spawn a fresh one, come back tomorrow; the state is still there.
If your stack is Claude-only and stays that way, use Agent Teams. If it isn't, or if you want a self-hosted bus that survives individual sessions, that's what bot-relay-mcp is for.
Quick Start (30 seconds)
Once published to npm, setup is a single config entry — no cloning, no compiling, no absolute paths.
Add to your ~/.claude.json:
{
"mcpServers": {
"bot-relay": {
"command": "npx",
"args": ["-y", "bot-relay-mcp"],
"type": "stdio"
}
}
}The first invocation fetches the package and starts the server. Subsequent launches are instant.
Quick Start (from source)
git clone https://github.com/Maxlumiere/bot-relay-mcp.git
cd bot-relay-mcp
npm install
npm run buildAdd to ~/.claude.json:
{
"mcpServers": {
"bot-relay": {
"command": "node",
"args": ["/absolute/path/to/bot-relay-mcp/dist/index.js"],
"type": "stdio"
}
}
}Open two Claude Code terminals and try it:
Terminal A:
> Register on the relay as "planner" with role "orchestrator"
> Discover other agents
> Send a message to "builder": "Can you handle the API layer?"Terminal B:
> Register on the relay as "builder" with role "builder"
> Check my relay messages
> Reply to planner: "On it."The database is created automatically at ~/.bot-relay/relay.db on first use. That is the full setup.
File permissions (v2.1). The relay creates ~/.bot-relay/ at 0700 and relay.db + backup tarballs at 0600 — owner-only. config.json is operator-managed; the relay never chmods it but logs a warning at startup if it's more permissive than 0600. POSIX only — native Windows NTFS uses ACLs, not POSIX modes, so the chmod calls are no-ops there (documented).
Tether (v2.5 + v2.7 cross-process delivery)
relay://inbox/<agent_name> is a subscribable MCP resource. Any MCP-aware client subscribes via the standard subscribe request and receives notifications/resources/updated pushes when the agent's inbox changes (new mail, broadcast, drain). No polling, standard MCP semantics.
v2.7 added cross-process delivery. A message sent from a stdio MCP terminal (one OS process) now wakes IDE subscribers connected to the HTTP daemon (a different OS process) via a durable inbox_events outbox table polled by the daemon. v2.5 only delivered events same-process — v2.7 is the load-bearing release for any Tether-style IDE integration.
The bundled VSCode extension ships on the marketplace:
code --install-extension lumiere-ventures.bot-relay-tetherOr browse marketplace.visualstudio.com/items?itemName=lumiere-ventures.bot-relay-tether. Source for the extension lives at extensions/vscode/ in this repo for users who want to build from source. The extension surfaces pending count + last-message recency in the status bar, opens a webview with the last message preview on click, and optionally auto-types inbox into the integrated terminal so Claude Code wakes up. See extensions/vscode/README.md for install + config and docs/tether-roadmap.md for the free-vs-paid scope line.
Tools
Identity
| Tool | Inputs | Description |
|------|--------|-------------|
| register_agent | name, role, capabilities[] | Register this terminal as a named agent. Uses upsert — safe to call multiple times. |
| unregister_agent | name | Remove an agent from the relay. Idempotent. Fires agent.unregistered webhook on success. |
| discover_agents | role (optional) | List all registered agents with status (online/stale/offline). |
| spawn_agent | name, role, capabilities, cwd?, initial_message? | Spawn a new Claude Code terminal pre-configured as a relay agent. Cross-platform (v1.9): macOS (iTerm2/Terminal.app), Linux (gnome-terminal/konsole/xterm/tmux fallback chain — tmux covers headless servers), Windows (wt.exe/powershell.exe/cmd.exe). See docs/cross-platform-spawn.md. |
Messaging
| Tool | Inputs | Description |
|------|--------|-------------|
| send_message | from, to, content, priority | Send a direct message to another agent by name. |
| get_messages | agent_name, status, limit | Check your mailbox. Pending messages are auto-marked as read. |
| broadcast | from, content, role (optional) | Send a message to all registered agents (or filter by role). |
Tasks
| Tool | Inputs | Description |
|------|--------|-------------|
| post_task | from, to, title, description, priority | Assign a task to another agent. |
| post_task_auto (v2.0) | from, title, description, required_capabilities[], priority | Auto-route to the least-loaded agent whose capabilities match ALL required. Queues if no match; assigns on the next capable registration. |
| update_task | task_id, agent_name, action, result? | Actions: accept / complete / reject / cancel (v2.0, requester-only) / heartbeat (v2.0, renews lease). State machine + CAS enforced. |
| get_tasks | agent_name, role, status, limit | Query your task queue (assigned to you or posted by you). |
| get_task | task_id | Get a single task by ID with full details. |
Channels (v2.0)
| Tool | Inputs | Description |
|------|--------|-------------|
| create_channel | name, description?, creator | Create a named channel for multi-agent coordination. Requires channels capability. |
| join_channel | channel_name, agent_name | Join any public channel. |
| leave_channel | channel_name, agent_name | Leave a channel. |
| post_to_channel | channel_name, from, content, priority | Post to a channel you are a member of. |
| get_channel_messages | channel_name, agent_name, limit, since? | Read messages posted to a channel since your join time. |
Status + Health (v2.0)
| Tool | Inputs | Description |
|------|--------|-------------|
| set_status | agent_name, status | Signal online / busy / away / offline. busy/away exempt you from health-monitor task reassignment. |
| health_check | (none) | Report relay version, uptime, and live counts (agents, messages, tasks, channels). No auth required. |
Webhooks (v1.2+)
| Tool | Inputs | Description |
|------|--------|-------------|
| register_webhook | url, event, filter, secret | Subscribe to relay events via HTTP POST. |
| list_webhooks | (none) | List all registered webhook subscriptions. |
| delete_webhook | webhook_id | Remove a webhook subscription. |
Supported events: message.sent, message.broadcast, task.posted, task.accepted, task.completed, task.rejected, task.cancelled (v2.0), task.auto_routed (v2.0), task.health_reassigned (v2.0), channel.message_posted (v2.0), agent.unregistered, agent.spawned, webhook.delivery_failed, * (all).
v2.0 — retry with backoff. Failed webhook deliveries retry at 60s / 300s / 900s (3 attempts). CAS-claimed per row — no double delivery. Piggybacks on webhook-firing tool calls, no background thread.
When secret is provided, each delivery includes an X-Relay-Signature: sha256=... HMAC header. Filter optionally restricts firing to events where from_agent or to_agent matches.
Example: Task Delegation
Terminal A — Orchestrator:
1. register_agent("orchestrator", "planner", ["delegation", "review"])
2. discover_agents() → sees "worker" is online
3. post_task(from: "orchestrator", to: "worker",
title: "Write auth tests",
description: "Cover login, logout, token refresh. Use vitest.",
priority: "high")
4. send_message(from: "orchestrator", to: "worker",
content: "Task posted — check your queue.")Terminal B — Worker:
1. register_agent("worker", "builder", ["testing", "backend"])
2. get_messages("worker") → message from orchestrator
3. get_tasks("worker", role: "assigned", status: "posted") → auth test task
4. update_task(task_id, "worker", "accept")
5. ... does the work ...
6. update_task(task_id, "worker", "complete", result: "12 tests passing")
7. send_message(from: "worker", to: "orchestrator",
content: "Auth tests done. All passing.")Terminal A checks results:
1. get_messages("orchestrator") → "Auth tests done."
2. get_task(task_id) → status: completed, result: "12 tests passing"How It Works
Every Claude Code terminal spawns its own MCP server process via stdio. All processes read and write the same SQLite file at ~/.bot-relay/relay.db. SQLite WAL mode handles concurrent access safely. Messages older than 7 days and completed tasks older than 30 days are purged automatically on startup.
Unified relay CLI (v2.1)
One entry, six subcommands: doctor / init / test / generate-hooks / backup / restore. First-run setup:
relay init # interactive
relay init --yes # defaults + random HTTP secretrelay doctor runs a diagnostic sweep; relay test runs a minimal self-check against a throwaway relay; relay generate-hooks emits Claude Code hook JSON for ~/.claude/settings.json. Full reference in docs/cli.md. The standalone bin/relay-backup + bin/relay-restore from Phase 2c have been absorbed into relay backup + relay restore.
Token lifecycle (v2.1)
Two new tools for credential hygiene: rotate_token lets an agent swap its own token with history preserved; revoke_token lets an admin-capable agent nullify another agent's token_hash (target re-bootstraps via the Phase 2b migration path). New admin capability is never auto-granted — register admin agents explicitly. Full operator runbook in docs/token-lifecycle.md.
Error codes (v2.1)
Every tool error response carries a stable error_code token alongside the free-form error string. Branch on the code; never string-match the message. Full catalog + stability guarantee in docs/error-codes.md. Source of truth: src/error-codes.ts.
Protocol version (v2.1)
Beyond the package version string, the relay surfaces a protocol_version via register_agent + health_check responses. Clients should key compatibility on protocol_version (bumps only on tool-surface changes) rather than the package version (bumps on every ship). See docs/protocol-version.md for SemVer rules + a client-side compatibility snippet.
HTTP Mode — for n8n, Slack, Telegram, custom scripts
Run the relay as an HTTP daemon and any HTTP client can drive it:
RELAY_TRANSPORT=http RELAY_HTTP_SECRET=your-shared-secret node dist/index.js
# Listens on http://127.0.0.1:3777Production deployment — set
RELAY_HTTP_SECRET. v2.1 refuses to start on a non-loopback host (0.0.0.0, a public IP, Docker-p 3777:3777without loopback pinning) unlessRELAY_HTTP_SECRETis set. Loopback binds (127.0.0.1,::1,localhost) stay zero-config for local development.Dev-only escape hatch:
RELAY_ALLOW_OPEN_PUBLIC=1lets the relay start anyway on a public host without a secret — useful for throwaway local Docker nets, but logs a loud warning every startup. Never use in production.
Endpoints:
POST /mcp— JSON-RPC (the MCP protocol over HTTP+SSE). RequiresAuthorization: Bearer <secret>ifhttp_secretis configured.GET /health— server status (always open, no auth)GET /— built-in dashboard (live view of agents, messages, tasks, webhooks). v2.1 Phase 4d: protected by Host-header allowlist (DNS-rebinding defense) + auth gate (RELAY_DASHBOARD_SECRETorRELAY_HTTP_SECRETfallback). Loopback binds allow no-secret access for dev; non-loopback binds require a secret. Full policy indocs/dashboard-security.md.GET /api/snapshot— JSON snapshot of relay state (same gates as/)
Three transport modes:
stdio(default) — per-terminal, for AI coding agentshttp— daemon, for external systemsboth— HTTP daemon plus a stdio connection (useful for bridge scripts)
All transports share the same SQLite database. Stdio agents and HTTP clients see the same world.
Process-boundary reminder (v2.1.3): stdio MCP clients and the HTTP daemon are separate processes. Each Claude Code terminal with "type":"stdio" in ~/.claude.json spawns its own node dist/index.js child. Restarting the :3777 HTTP daemon never affects stdio clients — their own child processes are untouched. Operator /mcp reconnect is only needed after restart for "type":"http" MCP clients pointed at the daemon URL. See docs/transport-architecture.md for the full topology + post-restart operator checklist.
n8n integration example
Trigger a Claude Code agent from an n8n workflow:
POST http://127.0.0.1:3777/mcp
Authorization: Bearer your-shared-secret
Content-Type: application/json
{
"jsonrpc": "2.0",
"id": 1,
"method": "tools/call",
"params": {
"name": "post_task",
"arguments": {
"from": "n8n-workflow-42",
"to": "builder",
"title": "Process new lead",
"description": "Lead data: ...",
"priority": "high"
}
}
}Then register a webhook so n8n hears about completion:
{
"jsonrpc": "2.0", "id": 2, "method": "tools/call",
"params": {
"name": "register_webhook",
"arguments": {
"url": "https://your-n8n.example.com/webhook/abc",
"event": "task.completed",
"secret": "shared-with-n8n"
}
}
}When a task completes, n8n receives a POST with the result and an HMAC signature in X-Relay-Signature.
Config File (v1.2+)
Optional ~/.bot-relay/config.json:
{
"transport": "http",
"http_port": 3777,
"http_host": "127.0.0.1",
"webhook_timeout_ms": 5000,
"http_secret": null,
"trusted_proxies": []
}Env vars override file config: RELAY_TRANSPORT, RELAY_HTTP_PORT, RELAY_HTTP_HOST, RELAY_HTTP_SECRET, RELAY_TRUSTED_PROXIES (comma-separated CIDRs).
Trusted Proxies and X-Forwarded-For (v1.6.2)
By default, the relay IGNORES the X-Forwarded-For header completely. Rate limits are keyed on the direct socket peer IP only. This prevents a caller from sending a spoofed header to get their own rate-limit bucket.
If you front the relay with Cloudflare, nginx, or any other reverse proxy, configure trusted_proxies with CIDRs of those proxies:
{
"trusted_proxies": ["127.0.0.0/8", "::1/128", "10.0.0.0/8"]
}Or via env var:
RELAY_TRUSTED_PROXIES="127.0.0.0/8,::1/128,10.0.0.0/8"When the direct peer IP falls in the trusted list, the relay walks the X-Forwarded-For chain right-to-left, skipping trusted hops, and uses the leftmost-untrusted hop as the real client IP. This matches RFC 7239 §7.4 and how nginx/Express normally handle this.
Per-Agent Tokens (v1.7)
Every tool call (other than first-time register_agent and /health) requires an agent token. The token identifies WHO is calling — separate from and stronger than the shared HTTP secret, which only identifies a trusted network.
Issuing a token — first registration:
# The response returns `agent_token` ONCE. Save it.
{
"jsonrpc": "2.0", "id": 1, "method": "tools/call",
"params": {
"name": "register_agent",
"arguments": { "name": "builder", "role": "builder", "capabilities": ["tasks"] }
}
}The server stores only a bcrypt hash. The raw token is surfaced once in the response agent_token field and echoed to stderr as [auth] New agent_token issued for "builder". Save it: RELAY_AGENT_TOKEN=.... If you lose it, unregister_agent (auth'd) then re-register.
Presenting the token on every subsequent call — three ways:
- Arg field (works for stdio + HTTP):
{ "name": "send_message", "arguments": { "from": "builder", "to": "ops", "content": "hi", "agent_token": "..." } } - HTTP header:
X-Agent-Token: <token> - Env var (stdio flow, also picked up by HTTP client wrappers):
export RELAY_AGENT_TOKEN=<token>
Capabilities are set at first registration and are immutable (v1.7.1). To change an agent's capability set, call unregister_agent (with its token) then register_agent with the new capability list. Re-register attempts that change caps are ignored with a capabilities_note in the response.
Capability catalog:
spawn— required forspawn_agenttasks— required forpost_task,update_taskwebhooks— required forregister_webhook,list_webhooks,delete_webhookbroadcast— required forbroadcast- All other tools are always allowed for any authenticated agent (no capability check).
Migration for pre-v1.7 agents (v2.1+): agents registered before v1.7 have no token hash. A register_agent call against such a row self-migrates — the relay detects the null hash, issues a fresh token, and the agent is first-class from that point on. No RELAY_ALLOW_LEGACY=1 required for the migration call itself. RELAY_ALLOW_LEGACY is still available as a coarser escape hatch for non-register tool calls against unmigrated legacy rows (e.g., if you want send_message to work before an agent has migrated); turn it OFF once all your agents have migrated.
Encryption at Rest (v1.7 opt-in; keyring + rotation in v2.1 Phase 4b.3)
Set the keyring to encrypt message/task/audit/webhook content fields in the SQLite database with AES-256-GCM. Three configuration sources (pick exactly one — multi-set is rejected at startup):
# 1. Inline JSON (for CI / secrets managers)
export RELAY_ENCRYPTION_KEYRING='{"current":"k1","keys":{"k1":"<base64-32>"}}'
# 2. File path (operator-friendly; chmod 600)
export RELAY_ENCRYPTION_KEYRING_PATH=~/.bot-relay/keyring.json
# 3. Legacy single-key (auto-wraps to { current: "k1", keys: { k1: <value> } }; deprecation warning at startup)
export RELAY_ENCRYPTION_KEY="<base64-32>"
# Generate a key:
openssl rand -base64 32
# or:
node -e 'console.log(require("crypto").randomBytes(32).toString("base64"))'When the keyring is set, the relay transparently encrypts on write (with current key) and decrypts on read (with any key in the keyring). Every ciphertext carries an enc:<key_id>:... prefix so rows are self-describing. Legacy enc1:... rows (pre-Phase-4b.3 deployments) decrypt via RELAY_ENCRYPTION_LEGACY_KEY_ID (default k1).
Rotating keys (online)
Full runbook at docs/key-rotation.md. In summary:
- Add the new key to the keyring while keeping the old one (both decrypt;
currentstill points to old). - Flip
currentto the new key; restart. New writes use the new key. relay re-encrypt --from old_key_id --to new_key_id --yes— scans + migrates all existing rows across 5 encrypted columns. Resumable.relay re-encrypt --verify-clean old_key_id— exit 0 = safe to retire.- Remove the old key from the keyring; restart.
Without the keyring set, content is stored plaintext (default, convenient for local dev).
Rotation Guide — HTTP Shared Secret (v1.7)
The RELAY_HTTP_SECRET shared secret can be rotated without downtime using a grace window:
Step 1 — promote the new secret as primary, keep the old as previous:
RELAY_HTTP_SECRET="new-secret-v2" \
RELAY_HTTP_SECRET_PREVIOUS="old-secret-v1" \
RELAY_TRANSPORT=http node dist/index.jsDuring this window, BOTH secrets are accepted. Requests using the old secret receive an X-Relay-Secret-Deprecated: true response header as a signal to upgrade.
Step 2 — update every client to present new-secret-v2 in their Authorization: Bearer … or X-Relay-Secret header.
Step 3 — watch for the deprecation header on your dashboard/logs until no more requests use the old secret.
Step 4 — drop the old secret:
RELAY_HTTP_SECRET="new-secret-v2" \
RELAY_TRANSPORT=http node dist/index.js # RELAY_HTTP_SECRET_PREVIOUS unsetMultiple previous secrets are supported as a comma-separated list:
RELAY_HTTP_SECRET_PREVIOUS="v1-secret,v0-secret"Secret comparisons are timing-safe (v1.7.1 — crypto.timingSafeEqual), so an attacker cannot recover the secret via byte-by-byte response-timing measurement.
Multi-machine: centralized deployment (v2.1)
bot-relay-mcp is LLM-agnostic, CLI-agnostic, and deployment-flexible. Pick the path that fits your setup:
Path A — Single-machine (default). Stdio transport, per-terminal process, zero infrastructure. Best for solo development on one laptop. No secrets, no reverse proxies, no ops — run npm install + add the stdio entry to your MCP client config and you're done. Covered throughout this README.
Path B — Multi-machine (centralized, v2.1 Phase 7r). One bot-relay-mcp hub on a VPS, multiple thin MCP clients connecting via HTTP. Agents on different machines can send_message, post tasks, subscribe to webhooks, and join channels through shared state. No new architecture — just the HTTP transport we've had since v1.2, packaged with a convenience CLI in v2.1.
When to pick centralized
- Two or more machines in play (dev laptop + CI, work + personal, family devices)
- AI agents running on different hosts that need to coordinate
- Team environments where multiple people connect their MCP clients to a shared relay
Quick pair flow
On the hub (VPS, reachable at e.g. https://relay.example.com): install bot-relay-mcp, run under systemd with RELAY_TRANSPORT=http + RELAY_HTTP_SECRET, terminate TLS with Caddy/nginx. See docs/multi-machine-deployment.md for the worked VPS runbook.
On each client machine:
relay pair https://relay.example.com \
--name "$(whoami)-$(hostname -s)" \
--role operator \
--capabilities spawn,tasks,webhooks,broadcast,channels \
--secret "$RELAY_HTTP_SECRET"relay pair probes the hub, registers this machine as an agent, captures the returned one-time agent_token, and emits an MCP client config snippet ready to paste into ~/.claude.json / ~/.cursor/mcp.json / etc. Persist the token (export RELAY_AGENT_TOKEN=… in your shell rc) so hooks can authenticate on every terminal open.
Verify after pairing:
relay doctor --remote https://relay.example.comExpected: PASS on reachability + protocol compatibility + token auth + hub auth config.
Trust-model tradeoffs
- Hub operator can read plaintext messages in RAM (even with
RELAY_ENCRYPTION_KEYset, decryption happens server-side for routing) - Hub is a single point of failure for cross-machine coordination
- Recommended for trusted groups (families, small teams, personal multi-machine setups)
- NOT recommended for mutually distrustful parties sharing a single hub, or compliance-bound workloads where in-RAM access by the operator is a policy violation
See SECURITY.md §Centralized deployment trust model for the full posture + incident response playbook.
Bridge to other tools via MCP
bot-relay-mcp is MCP-compatible, so your MCP client can connect to both bot-relay-mcp AND other MCP servers (Slack, Discord, Matrix, email, etc.) simultaneously. That's an operator deployment choice — we don't integrate those into bot-relay-mcp. See docs/multi-machine-deployment.md §3 for the pattern.
Zero-Friction Setup
To skip approval prompts for relay tools, add this to your project's .claude/settings.json:
{
"permissions": {
"allow": [
"mcp__bot-relay__register_agent",
"mcp__bot-relay__discover_agents",
"mcp__bot-relay__send_message",
"mcp__bot-relay__get_messages",
"mcp__bot-relay__broadcast",
"mcp__bot-relay__post_task",
"mcp__bot-relay__update_task",
"mcp__bot-relay__get_tasks",
"mcp__bot-relay__get_task",
"mcp__bot-relay__register_webhook",
"mcp__bot-relay__list_webhooks",
"mcp__bot-relay__delete_webhook"
]
}
}Auto-Check on Session Start
Add a SessionStart hook so every terminal automatically checks the relay for pending messages when it opens. See docs/hooks.md for the full configuration.
Near-Real-Time Mail Delivery (v1.8)
The SessionStart hook only fires when a terminal opens. If an agent is actively working and mail arrives mid-session, it does not see the message until next startup (or a human pastes it in).
v1.8 adds a PostToolUse hook — hooks/post-tool-use-check.sh — that fires after every tool call, checks the mailbox, and injects pending messages as additionalContext so the running session picks them up immediately.
Install per-project (NOT global), in <project>/.claude/settings.json:
{
"hooks": {
"PostToolUse": [
{
"matcher": "*",
"hooks": [
{
"type": "command",
"command": "/path/to/bot-relay-mcp/hooks/post-tool-use-check.sh",
"timeout": 5
}
]
}
]
}
}Important — if your path contains spaces, single-quote it inside the JSON string. Claude Code passes the
commandto the shell, which splits on whitespace. Without the single quotes the hook silently fails with errors like/bin/sh: ... is a directory. Example for a real installation at/Users/maxime/Documents/Ai stuff/Claude AI/bot-relay-mcp/:"command": "'/Users/maxime/Documents/Ai stuff/Claude AI/bot-relay-mcp/hooks/post-tool-use-check.sh'"The outer double-quotes are JSON; the inner single-quotes are shell. Paths with no spaces do not need this treatment.
The hook prefers the HTTP path when RELAY_AGENT_TOKEN is set and the daemon is running (full auth + audit), falling back to direct sqlite on RELAY_DB_PATH otherwise. It does NOT re-register (SessionStart handles that), does NOT check tasks (simpler focus, less context pressure), and silent-exits when there is no mail. Full docs + troubleshooting in docs/post-tool-use-hook.md.
Honest limitation: idle terminals get no delivery. The hook only fires when the agent is actively running tool calls. For long-idle windows, still rely on SessionStart + human attention.
Turn-End Mail Delivery (v2.1)
PostToolUse only fires on turns that include at least one tool call. A text-only turn (Claude responds with no tool invocation) does not trigger it. The Stop hook — hooks/stop-check.sh — closes that gap by firing on every turn-end, whether or not the turn invoked tools. Install both together in your project's .claude/settings.json:
{
"hooks": {
"Stop": [
{
"matcher": "*",
"hooks": [
{
"type": "command",
"command": "/path/to/bot-relay-mcp/hooks/stop-check.sh",
"timeout": 5
}
]
}
]
}
}Same single-quote-the-path-if-it-contains-spaces rule, same env vars, same HTTP/sqlite fallback, same silent-fail contract as PostToolUse. Full docs + troubleshooting in docs/stop-hook.md.
Honest limitation: Stop does NOT wake truly idle terminals. If no turn is in progress, neither hook fires. For long-idle windows, use the Layer 2 Managed Agent reference (examples/managed-agent-reference/).
Backup & Restore (v2.1)
Two CLIs for disaster recovery:
relay-backup # snapshot to ~/.bot-relay/backups/
relay-backup --output /srv/backup.tgz # custom path
relay-restore ~/.bot-relay/backups/relay-backup-<iso>.tar.gzrelay-backup produces a tar.gz of the live DB (via a consistent VACUUM INTO snapshot — safe while the daemon is running), the optional config.json, and a manifest.json with schema version and row counts. Works identically on the native better-sqlite3 driver and the optional sql.js wasm driver.
relay-restore always safety-backs-up the current DB first (to ~/.bot-relay/backups/pre-restore-<iso>.tar.gz). If that safety backup fails, the restore aborts untouched. It then refuses if the daemon appears to be running (/health probe, best-effort), refuses schema-version mismatches (higher = hard refuse, lower = --force overrides), runs PRAGMA integrity_check on the extracted DB, and finally atomic-swaps the new DB into place.
Full docs + troubleshooting in docs/backup-restore.md.
Lost-Token Recovery (v2.1)
Close a terminal, lose RELAY_AGENT_TOKEN, and the relay rejects your register_agent with AUTH_FAILED because the row is intact. Clear the registration so the agent can re-bootstrap:
relay recover <agent-name> # interactive confirm
relay recover <agent-name> --yes # skip confirm (for scripts)
relay recover <agent-name> --dry-run # show what would change, commit nothing
relay recover <agent-name> --db-path PATH # non-default DB locationMessages and tasks addressed to the agent are preserved — only the agents + agent_capabilities rows are cleared. After recovery, the operator calls register_agent with the same name/role/capabilities and captures a fresh agent_token.
Trust model: filesystem access to ~/.bot-relay/relay.db IS the authority (same boundary the daemon relies on). Not an MCP tool — the caller by definition cannot authenticate. The CLI emits an audit_log entry with tool='recovery.cli' + the operator's OS username for incident traceability.
External-CLI Token Mint (v2.6)
Some LLM CLI clients (Codex 5.5 was the canonical case) have safety monitors that pattern-match register_agent followed immediately by use returned token as a credential handoff and cancel the second tool call. The result: the agent row exists on the relay, but the agent process never captured its token, and every subsequent re-register attempt fails with NAME_COLLISION_ACTIVE because the orphan session is still considered live.
relay mint-token solves this by minting the token outside the agent's process. The operator hands the plaintext token to the agent's environment via RELAY_AGENT_TOKEN, and the agent authenticates on its first MCP call without ever invoking register_agent:
relay mint-token codex-5-5 --role builder --capabilities build,test,auditOutput (token shown ONCE):
✓ Minted token for new agent "codex-5-5"
Token (shown ONCE — store it now):
<43-char-base64url-token>
Set in your CLI's environment before launching:
export RELAY_AGENT_NAME=codex-5-5
export RELAY_AGENT_TOKEN=<43-char-base64url-token>Other forms:
relay mint-token NAME --json # structured output for scripts
relay mint-token EXISTING --force # rotate token; caps + role preserved
relay mint-token NAME --description "human-readable" # discoverable in dashboardCaps are locked at first mint per the feedback_relay_caps_immutable.md discipline; --force rotates the token but preserves both caps and role. The CLI emits a daemon-running advisory to stderr and writes an audit_log entry (tool='agent.token_minted') with the operator's OS username, the --force flag state, and whether the row was created or rotated. Full setup walkthrough including platform-specific token-storage best practices: docs/agents/external-cli-setup.md.
Cross-Platform Spawn (v1.9)
spawn_agent opens a new Claude Code terminal on macOS, Linux, and Windows via a driver abstraction:
- macOS —
bin/spawn-agent.sh(iTerm2 → Terminal.app). Unchanged from v1.6.4, preserves the 3-layer hardening + 19-payload adversarial test suite. - Linux —
gnome-terminal→konsole→xterm→tmuxfallback chain. The tmux fallback creates a detached session (attach later withtmux attach -t <agent-name>) — covers headless servers with no GUI. - Windows —
wt.exe(Windows Terminal) →powershell.exe→cmd.exe.
Driver selection: RELAY_TERMINAL_APP override (allowlist-gated) > process.platform auto-detect > in-driver fallback chain.
Full install requirements per platform + manual smoke-test checklists + troubleshooting: docs/cross-platform-spawn.md.
Env-var propagation is minimal by default (principle of least authority): system essentials + anything prefixed RELAY_*. Secrets like AWS_SECRET_ACCESS_KEY are NOT passed to spawned agents unless explicitly prefixed.
Plug-and-play defaults (v2.1.2): spawned terminals are configured for autonomous work out of the box — they auto-pull mail from their inbox on first turn instead of idling, run with --permission-mode bypassPermissions so they don't ask the operator to approve every tool call, get an iTerm2 / session-picker title set to the agent name, and run at --effort high to cap token spend (parent terminals doing strategic work may use xhigh; spawned children doing mechanical work shouldn't inherit it). Every default has an env override for the rare case the legacy behavior is wanted:
| Default | Env override | Notes |
|---|---|---|
| Kickstart prompt sent as positional arg | RELAY_SPAWN_KICKSTART="custom" / RELAY_SPAWN_NO_KICKSTART=1 | The default tells the spawned agent to pull get_messages and act on inbox. |
| --permission-mode bypassPermissions | RELAY_SPAWN_PERMISSION_MODE=<mode> | Allowlist: acceptEdits, auto, bypassPermissions, default, dontAsk, plan. |
| --name <agent> | RELAY_SPAWN_DISPLAY_NAME="custom" | Shows up as the iTerm2 tab + Claude Code session title. |
| --effort high | RELAY_SPAWN_EFFORT=<level> | Allowlist: low, medium, high, xhigh, max. |
Layer 2: Managed Agents (v1.10)
Agents that are NOT Claude Code terminals — Python daemons, Node workers, Hermes/Ollama integrations, custom scripts. They connect to the relay via HTTP (recommended) or direct SQLite, use the same 30 MCP tools (v2.7), and authenticate with per-agent tokens. If registered with managed:true, they also receive token-rotation push-messages over the normal get_messages channel — see docs/managed-agent-protocol.md.
Full integration guide with mental model, auth flow, lifecycle, error patterns, and security notes: docs/managed-agent-integration.md.
Runnable reference implementations (stdlib-only, ~200 LOC each):
- Python:
examples/managed-agent-reference/python/agent.py - Node:
examples/managed-agent-reference/node/agent.js
Both demonstrate: register, send/receive messages, accept + complete tasks, discover peers, SIGINT cleanup. Each has a SMOKE.md with a 5-step manual verification checklist.
SQLite Driver Options (v1.11)
The relay uses SQLite for persistent state. Two drivers are available:
native(default) —better-sqlite3, a compiled C addon. Fast, supports WAL mode, multi-process safe. Requires a C++ compiler atnpm installtime.wasm—sql.js, SQLite compiled to WebAssembly. Zero native compilation. Slightly slower writes (in-memory + write-back-to-file). Single-process only (not safe for multi-terminal stdio).
Switch with one env var:
npm install sql.js # one-time install of the optional dep
RELAY_SQLITE_DRIVER=wasm node dist/index.jsBoth drivers read the same relay.db file format. Full details, performance notes, and limitations: docs/sqlite-wasm-driver.md.
Roadmap
- v1.1: Local relay, 9 tools, SQLite, auto-purge
- v1.2: HTTP transport, webhook system, config file — 12 tools
- v1.3: Presence integrity,
unregister_agent, hook delivers mail — 13 tools - v1.4:
spawn_agent+ role templates + dashboard — 14 tools - v1.5: Built-in security — Bearer auth, audit log, rate limiting
- v1.6: Hardening pass — SSRF, input validation, path traversal, stdout discipline
- v1.7: Per-agent tokens, secret rotation, at-rest encryption, capability scoping
- v1.8: Near-real-time mail via
PostToolUsehook - v1.9: Cross-platform spawn (macOS / Linux / Windows / tmux) — Node/TS driver abstraction
- v1.10: Layer 2 Managed Agents — reference Python + Node workers
- v1.11: SQLite WASM driver (
sql.jsopt-in) — zero native compilation on Windows/Alpine/Docker/CI - v2.0: Plug-and-play — channels, smart routing (
post_task_auto), task leases + heartbeat, lazy health monitor, session-aware reads, busy/DND,health_check, webhook retry with CAS, payload size limits, config validation, auto-unregister, dead-agent purge, debug mode. 22 tools. - v2.1: Architectural completion — explicit
auth_statemachine, managed-agent rotation grace, versioned ciphertext + keyring with online rotation (relay re-encrypt), lost-token recovery CLI (relay recover), admin-initiated cross-agent rotation (rotate_token_admin), structured error_code catalog, protocol_version surface, Phase 4p webhook-secret encryption, Phase 4b.1 v2 revoke/recovery redesign. 25 tools. 14 of 14 Codex architectural findings closed. - v2.7 (current): Tether-ready cross-process inbox notifications. Durable
inbox_eventsoutbox table (schema v11→v12, idempotent migration); HTTP daemon polls + dispatches torelay://inbox/<agent>subscribers regardless of which OS process wrote the message. Reaper now skips sessions with active SSE GET streams (openGetStreamscount). Daemon emits:keepalive\n\nSSE comment frames everyRELAY_SSE_KEEPALIVE_MS(default 20s) so Electron-based clients (VS Code Tether extension) don't idle-disconnect at ~2.5min. Hermes-flaggedget_messagesfilter-after-mark P1 fix —sincefilter now applies in SQL BEFORE the read-mark mutation. 30 tools. Pairs with Tether VSCode extension v0.1.2+ on the marketplace. - v2.2: Polish — batch operations, fan-out/fan-in, scheduled messages, metrics endpoint, message ACK, private channels, token scoping, idle-terminal wake.
- v2.5: Federation — cross-machine peering, E2E encryption.
Dashboard
When running in http or both mode, open http://127.0.0.1:3777/ in a browser. You'll see:
- Live agent presence (online / stale / offline)
- Active tasks with priority and assignment
- Recent messages
- Registered webhooks
- Recently completed tasks
Auto-refreshes every 3 seconds. Useful for "what's happening across all my terminals right now?" at a glance.
Role Templates
See roles/ for drop-in role specs. Examples:
- planner.md — orchestrator that delegates and synthesizes
- builder.md — worker that accepts and completes tasks
- reviewer.md — skeptical reviewer with structured output
- researcher.md — investigates questions, returns findings
Three ways to apply a role: paste into project CLAUDE.md, pass as initial_message when spawning, or wire via shell alias.
Requirements
- Node.js 18+
- Claude Code (or any MCP-compatible client)
License
MIT
