bot-relay-mcp

v2.16.2

Published

2 days ago

MCP server for AI coding agent terminal-to-terminal communication. Local-first, SQLite-backed, webhook-integrated.

0High
0Medium
0Low

lumiereventures

mcp claude-code agent-communication multi-agent ai model-context-protocol agent-relay cursor cline

bot-relay-mcp

A local-first coordination bus for AI coding agents. Durable inboxes, task queues, and wakeups for Claude Code, Cursor, Cline, Codex-style CLIs, scripts, and webhooks. Two interfaces, one shared SQLite database, zero infrastructure.

v2.12. 34 MCP tools. The headline feature is hands-free, LLM-agnostic autowake: agents running on different models (Claude Code, Codex) wake on relay mail and coordinate as a team without manual polling, via the Tether VSCode extension plus a durable cross-process outbox, so an inbox change in one process wakes subscribers in another. See the CHANGELOG for the phase-by-phase arc, including the v2.1 architectural sweep (auth-state machine, keyring encryption, unified relay CLI), the v2.7 cross-process delivery layer, and the v2.12 pending-vs-history resolved plane.

What is this?

bot-relay-mcp gives AI coding agents and external systems a way to coordinate.

Two audiences, two transports:

AI coding agents (Claude Code, Cursor, Cline, Zed) connect via stdio MCP. Drop one entry into ~/.claude.json and the relay's tools appear inside Claude. No daemon required.
External systems (n8n, Slack, Telegram, custom scripts) connect via HTTP+SSE with optional Bearer auth. Trigger agent actions or receive webhook events.

Everything reads and writes the same SQLite file at ~/.bot-relay/relay.db. There is no cloud, no daemon you have to install, no service mesh.

Why not just use Claude Agent Teams?

Fair question. Claude Code v2.1.32 shipped Agent Teams — native multi-agent coordination inside Claude. If your setup is Claude-only and stays that way, use it. It's the right tool.

bot-relay-mcp solves a different problem: coordination across a heterogeneous LLM stack. The tools your agents run on probably aren't all Claude. You might have Claude Code terminals for architecture work, Codex or Cursor on a different repo, an n8n workflow firing webhooks, a background Python process polling results, a Telegram bot forwarding alerts. Agent Teams coordinates Claude with Claude. bot-relay-mcp coordinates whatever speaks MCP (or plain HTTP) with whatever else does.

Three things it does that a vendor-native solution can't:

LLM-agnostic. Any MCP client — Claude Code, Cursor, Cline, Zed — registers and talks to any other. Non-MCP systems join over the HTTP+SSE transport with optional Bearer auth. No client is privileged.
Self-hosted. Everything lives in a single SQLite file on your machine. No cloud account, no control plane, no vendor lock-in. The HTTP daemon is optional; stdio-only works offline.
Federation-bound (v2.3 roadmap). Hub/edge federation is on the locked plan so multiple self-hosted relays can mesh without collapsing into one shared namespace. Teams of teams, across orgs, without routing through someone else's service.

It's also a durable bus, not an in-process coordinator. Messages, tasks, and channels survive individual agent sessions — restart a terminal, spawn a fresh one, come back tomorrow; the state is still there.

If your stack is Claude-only and stays that way, use Agent Teams. If it isn't, or if you want a self-hosted bus that survives individual sessions, that's what bot-relay-mcp is for.

Quick Start (30 seconds)

Once published to npm, setup is a single config entry — no cloning, no compiling, no absolute paths.

Add to your ~/.claude.json:

{
  "mcpServers": {
    "bot-relay": {
      "command": "npx",
      "args": ["-y", "bot-relay-mcp"],
      "type": "stdio"
    }
  }
}

The first invocation fetches the package and starts the server. Subsequent launches are instant.

Quick Start (from source)

git clone https://github.com/Maxlumiere/bot-relay-mcp.git
cd bot-relay-mcp
npm install
npm run build

One-command setup (`relay init`)

After building, run the installer:

./bin/relay init

It reconciles your ~/.claude.json (mcpServers entry), ~/.claude/settings.json (SessionStart hook), and — on macOS — installs a keep-alive launchd daemon. It never touches your agent tokens, and re-running is always safe (idempotent). Use --config-only / --skip-hooks / --skip-daemon to opt out of any step.

⚠️ Security — the default is local-trust. A default relay init install has no HTTP secret: the relay binds to 127.0.0.1 and trusts every process on the same machine. Any local process can then reach the loopback HTTP surface — including the dashboard operator endpoints (/api/snapshot, set-status, wake-agent, kill-agent, operator-identity). Per-agent tokens protect who you can act as (you can't impersonate another agent), but they do not gate the whole surface. This is a reasonable boundary on a personal, single-user machine (a local process already has OS-level access to the same data), and non-loopback binds are still refused without a secret.
On a shared machine, or any non-loopback / team / remote setup, gate it: run relay init --secret <strong-random> (or export RELAY_DASHBOARD_SECRET). The daemon requires a secret to bind to any non-loopback host.

Alternatively, wire it up by hand — add to ~/.claude.json:

{
  "mcpServers": {
    "bot-relay": {
      "command": "node",
      "args": ["/absolute/path/to/bot-relay-mcp/dist/index.js"],
      "type": "stdio"
    }
  }
}

Open two Claude Code terminals and try it:

Terminal A:

> Register on the relay as "planner" with role "orchestrator"
> Discover other agents
> Send a message to "builder": "Can you handle the API layer?"

Terminal B:

> Register on the relay as "builder" with role "builder"
> Check my relay messages
> Reply to planner: "On it."

The database is created automatically at ~/.bot-relay/relay.db on first use. That is the full setup.

File permissions (v2.1). The relay creates ~/.bot-relay/ at 0700 and relay.db + backup tarballs at 0600 — owner-only. config.json is operator-managed; the relay never chmods it but logs a warning at startup if it's more permissive than 0600. POSIX only — native Windows NTFS uses ACLs, not POSIX modes, so the chmod calls are no-ops there (documented).

Tether (v2.5 + v2.7 cross-process delivery)

relay://inbox/<agent_name> is a subscribable MCP resource. Any MCP-aware client subscribes via the standard subscribe request and receives notifications/resources/updated pushes when the agent's inbox changes (new mail, broadcast, drain). No polling, standard MCP semantics.

v2.7 added cross-process delivery. A message sent from a stdio MCP terminal (one OS process) now wakes IDE subscribers connected to the HTTP daemon (a different OS process) via a durable inbox_events outbox table polled by the daemon. v2.5 only delivered events same-process — v2.7 is the load-bearing release for any Tether-style IDE integration.

The bundled VSCode extension ships on the marketplace:

code --install-extension lumiere-ventures.bot-relay-tether

Or browse marketplace.visualstudio.com/items?itemName=lumiere-ventures.bot-relay-tether. Source for the extension lives at extensions/vscode/ in this repo for users who want to build from source. The extension surfaces pending count + last-message recency in the status bar, opens a webview with the last message preview on click, and optionally auto-types inbox into the integrated terminal so Claude Code wakes up. See extensions/vscode/README.md for install + config and docs/tether-roadmap.md for the free-vs-paid scope line.

Tools

Identity

| Tool | Inputs | Description | |------|--------|-------------| | register_agent | name, role, capabilities[] | Register this terminal as a named agent. Uses upsert — safe to call multiple times. | | unregister_agent | name | Remove an agent from the relay. Idempotent. Fires agent.unregistered webhook on success. | | discover_agents | role (optional) | List all registered agents with status (online/stale/offline). | | spawn_agent | name, role, capabilities, cwd?, initial_message? | Spawn a new Claude Code terminal pre-configured as a relay agent. Cross-platform (v1.9): macOS (iTerm2/Terminal.app), Linux (gnome-terminal/konsole/xterm/tmux fallback chain — tmux covers headless servers), Windows (wt.exe/powershell.exe/cmd.exe). See docs/cross-platform-spawn.md. |

Messaging

| Tool | Inputs | Description | |------|--------|-------------| | send_message | from, to, content, priority | Send a direct message to another agent by name. | | get_messages | agent_name, status, limit | Check your mailbox. Pending messages are auto-marked as read. | | broadcast | from, content, role (optional) | Send a message to all registered agents (or filter by role). |

Tasks

| Tool | Inputs | Description | |------|--------|-------------| | post_task | from, to, title, description, priority | Assign a task to another agent. | | post_task_auto (v2.0) | from, title, description, required_capabilities[], priority | Auto-route to the least-loaded agent whose capabilities match ALL required. Queues if no match; assigns on the next capable registration. | | update_task | task_id, agent_name, action, result? | Actions: accept / complete / reject / cancel (v2.0, requester-only) / heartbeat (v2.0, renews lease). State machine + CAS enforced. | | get_tasks | agent_name, role, status, limit | Query your task queue (assigned to you or posted by you). | | get_task | task_id | Get a single task by ID with full details. |

Channels (v2.0)

| Tool | Inputs | Description | |------|--------|-------------| | create_channel | name, description?, creator | Create a named channel for multi-agent coordination. Requires channels capability. | | join_channel | channel_name, agent_name | Join any public channel. | | leave_channel | channel_name, agent_name | Leave a channel. | | post_to_channel | channel_name, from, content, priority | Post to a channel you are a member of. | | get_channel_messages | channel_name, agent_name, limit, since? | Read messages posted to a channel since your join time. |

Status + Health (v2.0)

| Tool | Inputs | Description | |------|--------|-------------| | set_status | agent_name, status | Signal online / busy / away / offline. busy/away exempt you from health-monitor task reassignment. | | health_check | (none) | Report relay version, uptime, and live counts (agents, messages, tasks, channels). No auth required. |

Webhooks (v1.2+)

| Tool | Inputs | Description | |------|--------|-------------| | register_webhook | url, event, filter, secret | Subscribe to relay events via HTTP POST. | | list_webhooks | (none) | List all registered webhook subscriptions. | | delete_webhook | webhook_id | Remove a webhook subscription. |

Supported events: message.sent, message.broadcast, task.posted, task.accepted, task.completed, task.rejected, task.cancelled (v2.0), task.auto_routed (v2.0), task.health_reassigned (v2.0), channel.message_posted (v2.0), agent.unregistered, agent.spawned, webhook.delivery_failed, * (all).

v2.0 — retry with backoff. Failed webhook deliveries retry at 60s / 300s / 900s (3 attempts). CAS-claimed per row — no double delivery. Piggybacks on webhook-firing tool calls, no background thread.

When secret is provided, each delivery includes an X-Relay-Signature: sha256=... HMAC header. Filter optionally restricts firing to events where from_agent or to_agent matches.

Example: Task Delegation

Terminal A — Orchestrator:

1. register_agent("orchestrator", "planner", ["delegation", "review"])
2. discover_agents() → sees "worker" is online
3. post_task(from: "orchestrator", to: "worker",
     title: "Write auth tests",
     description: "Cover login, logout, token refresh. Use vitest.",
     priority: "high")
4. send_message(from: "orchestrator", to: "worker",
     content: "Task posted — check your queue.")

Terminal B — Worker:

1. register_agent("worker", "builder", ["testing", "backend"])
2. get_messages("worker") → message from orchestrator
3. get_tasks("worker", role: "assigned", status: "posted") → auth test task
4. update_task(task_id, "worker", "accept")
5. ... does the work ...
6. update_task(task_id, "worker", "complete", result: "12 tests passing")
7. send_message(from: "worker", to: "orchestrator",
     content: "Auth tests done. All passing.")

Terminal A checks results:

1. get_messages("orchestrator") → "Auth tests done."
2. get_task(task_id) → status: completed, result: "12 tests passing"

How It Works

Every Claude Code terminal spawns its own MCP server process via stdio. All processes read and write the same SQLite file at ~/.bot-relay/relay.db. SQLite WAL mode handles concurrent access safely. Messages older than 7 days and completed tasks older than 30 days are purged automatically on startup.

Unified `relay` CLI (v2.1)

One entry, six subcommands: doctor / init / test / generate-hooks / backup / restore. First-run setup:

relay init          # interactive
relay init --yes    # defaults + random HTTP secret

relay doctor runs a diagnostic sweep; relay test runs a minimal self-check against a throwaway relay; relay generate-hooks emits Claude Code hook JSON for ~/.claude/settings.json. Full reference in docs/cli.md. The standalone bin/relay-backup + bin/relay-restore from Phase 2c have been absorbed into relay backup + relay restore.

Token lifecycle (v2.1)

Two new tools for credential hygiene: rotate_token lets an agent swap its own token with history preserved; revoke_token lets an admin-capable agent nullify another agent's token_hash (target re-bootstraps via the Phase 2b migration path). New admin capability is never auto-granted — register admin agents explicitly. Full operator runbook in docs/token-lifecycle.md.

Error codes (v2.1)

Every tool error response carries a stable error_code token alongside the free-form error string. Branch on the code; never string-match the message. Full catalog + stability guarantee in docs/error-codes.md. Source of truth: src/error-codes.ts.

Protocol version (v2.1)

Beyond the package version string, the relay surfaces a protocol_version via register_agent + health_check responses. Clients should key compatibility on protocol_version (bumps only on tool-surface changes) rather than the package version (bumps on every ship). See docs/protocol-version.md for SemVer rules + a client-side compatibility snippet.

HTTP Mode — for n8n, Slack, Telegram, custom scripts

Run the relay as an HTTP daemon and any HTTP client can drive it:

RELAY_TRANSPORT=http RELAY_HTTP_SECRET=your-shared-secret node dist/index.js
# Listens on http://127.0.0.1:3777

Production deployment — set RELAY_HTTP_SECRET. v2.1 refuses to start on a non-loopback host (0.0.0.0, a public IP, Docker -p 3777:3777 without loopback pinning) unless RELAY_HTTP_SECRET is set. Loopback binds (127.0.0.1, ::1, localhost) stay zero-config for local development.
Dev-only escape hatch: RELAY_ALLOW_OPEN_PUBLIC=1 lets the relay start anyway on a public host without a secret — useful for throwaway local Docker nets, but logs a loud warning every startup. Never use in production.

Endpoints:

POST /mcp — JSON-RPC (the MCP protocol over HTTP+SSE). Requires Authorization: Bearer <secret> if http_secret is configured.
GET /health — server status (always open, no auth)
GET / — built-in dashboard (live view of agents, messages, tasks, webhooks). v2.1 Phase 4d: protected by Host-header allowlist (DNS-rebinding defense) + auth gate (RELAY_DASHBOARD_SECRET or RELAY_HTTP_SECRET fallback). Loopback binds allow no-secret access for dev; non-loopback binds require a secret. Full policy in docs/dashboard-security.md.
GET /api/snapshot — JSON snapshot of relay state (same gates as /)

Three transport modes:

stdio (default) — per-terminal, for AI coding agents
http — daemon, for external systems
both — HTTP daemon plus a stdio connection (useful for bridge scripts)

All transports share the same SQLite database. Stdio agents and HTTP clients see the same world.

Process-boundary reminder (v2.1.3): stdio MCP clients and the HTTP daemon are separate processes. Each Claude Code terminal with "type":"stdio" in ~/.claude.json spawns its own node dist/index.js child. Restarting the :3777 HTTP daemon never affects stdio clients — their own child processes are untouched. Operator /mcp reconnect is only needed after restart for "type":"http" MCP clients pointed at the daemon URL. See docs/transport-architecture.md for the full topology + post-restart operator checklist.

n8n integration example

Trigger a Claude Code agent from an n8n workflow:

POST http://127.0.0.1:3777/mcp
Authorization: Bearer your-shared-secret
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "post_task",
    "arguments": {
      "from": "n8n-workflow-42",
      "to": "builder",
      "title": "Process new lead",
      "description": "Lead data: ...",
      "priority": "high"
    }
  }
}

Then register a webhook so n8n hears about completion:

{
  "jsonrpc": "2.0", "id": 2, "method": "tools/call",
  "params": {
    "name": "register_webhook",
    "arguments": {
      "url": "https://your-n8n.example.com/webhook/abc",
      "event": "task.completed",
      "secret": "shared-with-n8n"
    }
  }
}

When a task completes, n8n receives a POST with the result and an HMAC signature in X-Relay-Signature.

Config File (v1.2+)

Optional ~/.bot-relay/config.json:

{
  "transport": "http",
  "http_port": 3777,
  "http_host": "127.0.0.1",
  "webhook_timeout_ms": 5000,
  "http_secret": null,
  "trusted_proxies": []
}

Env vars override file config: RELAY_TRANSPORT, RELAY_HTTP_PORT, RELAY_HTTP_HOST, RELAY_HTTP_SECRET, RELAY_TRUSTED_PROXIES (comma-separated CIDRs).

Trusted Proxies and X-Forwarded-For (v1.6.2)

By default, the relay IGNORES the X-Forwarded-For header completely. Rate limits are keyed on the direct socket peer IP only. This prevents a caller from sending a spoofed header to get their own rate-limit bucket.

If you front the relay with Cloudflare, nginx, or any other reverse proxy, configure trusted_proxies with CIDRs of those proxies:

{
  "trusted_proxies": ["127.0.0.0/8", "::1/128", "10.0.0.0/8"]
}

Or via env var:

RELAY_TRUSTED_PROXIES="127.0.0.0/8,::1/128,10.0.0.0/8"

When the direct peer IP falls in the trusted list, the relay walks the X-Forwarded-For chain right-to-left, skipping trusted hops, and uses the leftmost-untrusted hop as the real client IP. This matches RFC 7239 §7.4 and how nginx/Express normally handle this.

Per-Agent Tokens (v1.7)

Every tool call (other than first-time register_agent and /health) requires an agent token. The token identifies WHO is calling — separate from and stronger than the shared HTTP secret, which only identifies a trusted network.

Issuing a token — first registration:

# The response returns `agent_token` ONCE. Save it.
{
  "jsonrpc": "2.0", "id": 1, "method": "tools/call",
  "params": {
    "name": "register_agent",
    "arguments": { "name": "builder", "role": "builder", "capabilities": ["tasks"] }
  }
}

The server stores only a bcrypt hash. The raw token is surfaced once in the response agent_token field and echoed to stderr as [auth] New agent_token issued for "builder". Save it: RELAY_AGENT_TOKEN=.... If you lose it, unregister_agent (auth'd) then re-register.

Presenting the token on every subsequent call — three ways:

Arg field (works for stdio + HTTP):

{ "name": "send_message", "arguments": { "from": "builder", "to": "ops", "content": "hi", "agent_token": "..." } }

HTTP header:
```
X-Agent-Token: <token>
```
Env var (stdio flow, also picked up by HTTP client wrappers):
```
export RELAY_AGENT_TOKEN=<token>
```

Capabilities are set at first registration and are immutable (v1.7.1). To change an agent's capability set, call unregister_agent (with its token) then register_agent with the new capability list. Re-register attempts that change caps are ignored with a capabilities_note in the response.

Capability catalog:

spawn — required for spawn_agent
tasks — required for post_task, update_task
webhooks — required for register_webhook, list_webhooks, delete_webhook
broadcast — required for broadcast
All other tools are always allowed for any authenticated agent (no capability check).

Migration for pre-v1.7 agents (v2.1+): agents registered before v1.7 have no token hash. A register_agent call against such a row self-migrates — the relay detects the null hash, issues a fresh token, and the agent is first-class from that point on. No RELAY_ALLOW_LEGACY=1 required for the migration call itself. RELAY_ALLOW_LEGACY is still available as a coarser escape hatch for non-register tool calls against unmigrated legacy rows (e.g., if you want send_message to work before an agent has migrated); turn it OFF once all your agents have migrated.

Encryption at Rest (v1.7 opt-in; keyring + rotation in v2.1 Phase 4b.3)

Set the keyring to encrypt message/task/audit/webhook content fields in the SQLite database with AES-256-GCM. Three configuration sources (pick exactly one — multi-set is rejected at startup):

# 1. Inline JSON (for CI / secrets managers)
export RELAY_ENCRYPTION_KEYRING='{"current":"k1","keys":{"k1":"<base64-32>"}}'

# 2. File path (operator-friendly; chmod 600)
export RELAY_ENCRYPTION_KEYRING_PATH=~/.bot-relay/keyring.json

# 3. Legacy single-key (auto-wraps to { current: "k1", keys: { k1: <value> } }; deprecation warning at startup)
export RELAY_ENCRYPTION_KEY="<base64-32>"

# Generate a key:
openssl rand -base64 32
# or:
node -e 'console.log(require("crypto").randomBytes(32).toString("base64"))'

When the keyring is set, the relay transparently encrypts on write (with current key) and decrypts on read (with any key in the keyring). Every ciphertext carries an enc:<key_id>:... prefix so rows are self-describing. Legacy enc1:... rows (pre-Phase-4b.3 deployments) decrypt via RELAY_ENCRYPTION_LEGACY_KEY_ID (default k1).

Rotating keys (online)

Full runbook at docs/key-rotation.md. In summary:

Add the new key to the keyring while keeping the old one (both decrypt; current still points to old).
Flip current to the new key; restart. New writes use the new key.
relay re-encrypt --from old_key_id --to new_key_id --yes — scans + migrates all existing rows across 5 encrypted columns. Resumable.
relay re-encrypt --verify-clean old_key_id — exit 0 = safe to retire.
Remove the old key from the keyring; restart.

Without the keyring set, content is stored plaintext (default, convenient for local dev).

Rotation Guide — HTTP Shared Secret (v1.7)

The RELAY_HTTP_SECRET shared secret can be rotated without downtime using a grace window:

Step 1 — promote the new secret as primary, keep the old as previous:

RELAY_HTTP_SECRET="new-secret-v2" \
RELAY_HTTP_SECRET_PREVIOUS="old-secret-v1" \
RELAY_TRANSPORT=http node dist/index.js

During this window, BOTH secrets are accepted. Requests using the old secret receive an X-Relay-Secret-Deprecated: true response header as a signal to upgrade.

Step 2 — update every client to present new-secret-v2 in their Authorization: Bearer … or X-Relay-Secret header.

Step 3 — watch for the deprecation header on your dashboard/logs until no more requests use the old secret.

Step 4 — drop the old secret:

RELAY_HTTP_SECRET="new-secret-v2" \
RELAY_TRANSPORT=http node dist/index.js    # RELAY_HTTP_SECRET_PREVIOUS unset

Multiple previous secrets are supported as a comma-separated list:

RELAY_HTTP_SECRET_PREVIOUS="v1-secret,v0-secret"

Secret comparisons are timing-safe (v1.7.1 — crypto.timingSafeEqual), so an attacker cannot recover the secret via byte-by-byte response-timing measurement.

Multi-machine: centralized deployment (v2.1)

bot-relay-mcp is LLM-agnostic, CLI-agnostic, and deployment-flexible. Pick the path that fits your setup:

Path A — Single-machine (default). Stdio transport, per-terminal process, zero infrastructure. Best for solo development on one laptop. No secrets, no reverse proxies, no ops — run npm install + add the stdio entry to your MCP client config and you're done. Covered throughout this README.

Path B — Multi-machine (centralized, v2.1 Phase 7r). One bot-relay-mcp hub on a VPS, multiple thin MCP clients connecting via HTTP. Agents on different machines can send_message, post tasks, subscribe to webhooks, and join channels through shared state. No new architecture — just the HTTP transport we've had since v1.2, packaged with a convenience CLI in v2.1.

When to pick centralized

Two or more machines in play (dev laptop + CI, work + personal, family devices)
AI agents running on different hosts that need to coordinate
Team environments where multiple people connect their MCP clients to a shared relay

Quick pair flow

On the hub (VPS, reachable at e.g. https://relay.example.com): install bot-relay-mcp, run under systemd with RELAY_TRANSPORT=http + RELAY_HTTP_SECRET, terminate TLS with Caddy/nginx. See docs/multi-machine-deployment.md for the worked VPS runbook.

On each client machine:

relay pair https://relay.example.com \
  --name "$(whoami)-$(hostname -s)" \
  --role operator \
  --capabilities spawn,tasks,webhooks,broadcast,channels \
  --secret "$RELAY_HTTP_SECRET"

relay pair probes the hub, registers this machine as an agent, captures the returned one-time agent_token, and emits an MCP client config snippet ready to paste into ~/.claude.json / ~/.cursor/mcp.json / etc. Persist the token (export RELAY_AGENT_TOKEN=… in your shell rc) so hooks can authenticate on every terminal open.

Verify after pairing:

relay doctor --remote https://relay.example.com

Expected: PASS on reachability + protocol compatibility + token auth + hub auth config.

Trust-model tradeoffs

Hub operator can read plaintext messages in RAM (even with RELAY_ENCRYPTION_KEY set, decryption happens server-side for routing)
Hub is a single point of failure for cross-machine coordination
Recommended for trusted groups (families, small teams, personal multi-machine setups)
NOT recommended for mutually distrustful parties sharing a single hub, or compliance-bound workloads where in-RAM access by the operator is a policy violation

See SECURITY.md §Centralized deployment trust model for the full posture + incident response playbook.

Bridge to other tools via MCP

bot-relay-mcp is MCP-compatible, so your MCP client can connect to both bot-relay-mcp AND other MCP servers (Slack, Discord, Matrix, email, etc.) simultaneously. That's an operator deployment choice — we don't integrate those into bot-relay-mcp. See docs/multi-machine-deployment.md §3 for the pattern.

Zero-Friction Setup

To skip approval prompts for relay tools, add this to your project's .claude/settings.json:

{
  "permissions": {
    "allow": [
      "mcp__bot-relay__register_agent",
      "mcp__bot-relay__discover_agents",
      "mcp__bot-relay__send_message",
      "mcp__bot-relay__get_messages",
      "mcp__bot-relay__broadcast",
      "mcp__bot-relay__post_task",
      "mcp__bot-relay__update_task",
      "mcp__bot-relay__get_tasks",
      "mcp__bot-relay__get_task",
      "mcp__bot-relay__register_webhook",
      "mcp__bot-relay__list_webhooks",
      "mcp__bot-relay__delete_webhook"
    ]
  }
}

Auto-Check on Session Start

Add a SessionStart hook so every terminal automatically checks the relay for pending messages when it opens. See docs/hooks.md for the full configuration.

Near-Real-Time Mail Delivery (v1.8)

The SessionStart hook only fires when a terminal opens. If an agent is actively working and mail arrives mid-session, it does not see the message until next startup (or a human pastes it in).

v1.8 adds a PostToolUse hook — hooks/post-tool-use-check.sh — that fires after every tool call, checks the mailbox, and injects pending messages as additionalContext so the running session picks them up immediately.

Install per-project (NOT global), in <project>/.claude/settings.json:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "*",
        "hooks": [
          {
            "type": "command",
            "command": "/path/to/bot-relay-mcp/hooks/post-tool-use-check.sh",
            "timeout": 5
          }
        ]
      }
    ]
  }
}

Important — if your path contains spaces, single-quote it inside the JSON string. Claude Code passes the command to the shell, which splits on whitespace. Without the single quotes the hook silently fails with errors like /bin/sh: ... is a directory. Example for an installation at /path/to/My Projects/bot-relay-mcp/:
"command": "'/path/to/My Projects/bot-relay-mcp/hooks/post-tool-use-check.sh'"
The outer double-quotes are JSON; the inner single-quotes are shell. Paths with no spaces do not need this treatment.

The hook prefers the HTTP path when RELAY_AGENT_TOKEN is set and the daemon is running (full auth + audit), falling back to direct sqlite on RELAY_DB_PATH otherwise. It does NOT re-register (SessionStart handles that), does NOT check tasks (simpler focus, less context pressure), and silent-exits when there is no mail. Full docs + troubleshooting in docs/post-tool-use-hook.md.

Honest limitation: idle terminals get no delivery. The hook only fires when the agent is actively running tool calls. For long-idle windows, still rely on SessionStart + human attention.

Turn-End Mail Delivery (v2.1)

PostToolUse only fires on turns that include at least one tool call. A text-only turn (Claude responds with no tool invocation) does not trigger it. The Stop hook — hooks/stop-check.sh — closes that gap by firing on every turn-end, whether or not the turn invoked tools. Install both together in your project's .claude/settings.json:

{
  "hooks": {
    "Stop": [
      {
        "matcher": "*",
        "hooks": [
          {
            "type": "command",
            "command": "/path/to/bot-relay-mcp/hooks/stop-check.sh",
            "timeout": 5
          }
        ]
      }
    ]
  }
}

Same single-quote-the-path-if-it-contains-spaces rule, same env vars, same HTTP/sqlite fallback, same silent-fail contract as PostToolUse. Full docs + troubleshooting in docs/stop-hook.md.

Honest limitation: Stop does NOT wake truly idle terminals. If no turn is in progress, neither hook fires. For long-idle windows, use the Layer 2 Managed Agent reference (examples/managed-agent-reference/).

Backup & Restore (v2.1)

Two CLIs for disaster recovery:

relay-backup                              # snapshot to ~/.bot-relay/backups/
relay-backup --output /srv/backup.tgz     # custom path
relay-restore ~/.bot-relay/backups/relay-backup-<iso>.tar.gz

relay-backup produces a tar.gz of the live DB (via a consistent VACUUM INTO snapshot — safe while the daemon is running), the optional config.json, and a manifest.json with schema version and row counts. Works identically on the native better-sqlite3 driver and the optional sql.js wasm driver.

relay-restore always safety-backs-up the current DB first (to ~/.bot-relay/backups/pre-restore-<iso>.tar.gz). If that safety backup fails, the restore aborts untouched. It then refuses if the daemon appears to be running (/health probe, best-effort), refuses schema-version mismatches (higher = hard refuse, lower = --force overrides), runs PRAGMA integrity_check on the extracted DB, and finally atomic-swaps the new DB into place.

Full docs + troubleshooting in docs/backup-restore.md.

Lost-Token Recovery (v2.1)

Close a terminal, lose RELAY_AGENT_TOKEN, and the relay rejects your register_agent with AUTH_FAILED because the row is intact. Clear the registration so the agent can re-bootstrap:

relay recover <agent-name>                 # interactive confirm
relay recover <agent-name> --yes           # skip confirm (for scripts)
relay recover <agent-name> --dry-run       # show what would change, commit nothing
relay recover <agent-name> --db-path PATH  # non-default DB location

Messages and tasks addressed to the agent are preserved — only the agents + agent_capabilities rows are cleared. After recovery, the operator calls register_agent with the same name/role/capabilities and captures a fresh agent_token.

Trust model: filesystem access to ~/.bot-relay/relay.db IS the authority (same boundary the daemon relies on). Not an MCP tool — the caller by definition cannot authenticate. The CLI emits an audit_log entry with tool='recovery.cli' + the operator's OS username for incident traceability.

External-CLI Token Mint (v2.6)

Some LLM CLI clients (Codex was the canonical case) have safety monitors that pattern-match register_agent followed immediately by use returned token as a credential handoff and cancel the second tool call. The result: the agent row exists on the relay, but the agent process never captured its token, and every subsequent re-register attempt fails with NAME_COLLISION_ACTIVE because the orphan session is still considered live.

relay mint-token solves this by minting the token outside the agent's process. The operator hands the plaintext token to the agent's environment via RELAY_AGENT_TOKEN, and the agent authenticates on its first MCP call without ever invoking register_agent:

relay mint-token codex --role builder --capabilities build,test,audit

Output (token shown ONCE):

✓ Minted token for new agent "codex"

Token (shown ONCE — store it now):

  <43-char-base64url-token>

Set in your CLI's environment before launching:

  export RELAY_AGENT_NAME=codex
  export RELAY_AGENT_TOKEN=<43-char-base64url-token>

Other forms:

relay mint-token NAME --json                          # structured output for scripts
relay mint-token EXISTING --force                     # rotate token; caps + role preserved
relay mint-token NAME --description "human-readable"  # discoverable in dashboard

Caps are locked at first mint (caps are immutable after first registration); --force rotates the token but preserves both caps and role. The CLI emits a daemon-running advisory to stderr and writes an audit_log entry (tool='agent.token_minted') with the operator's OS username, the --force flag state, and whether the row was created or rotated. Full setup walkthrough including platform-specific token-storage best practices: docs/agents/external-cli-setup.md.

Cross-Platform Spawn (v1.9)

spawn_agent opens a new Claude Code terminal on macOS, Linux, and Windows via a driver abstraction:

macOS — bin/spawn-agent.sh (iTerm2 → Terminal.app). Unchanged from v1.6.4, preserves the 3-layer hardening + 19-payload adversarial test suite.
Linux — gnome-terminal → konsole → xterm → tmux fallback chain. The tmux fallback creates a detached session (attach later with tmux attach -t <agent-name>) — covers headless servers with no GUI.
Windows — wt.exe (Windows Terminal) → powershell.exe → cmd.exe.

Driver selection: RELAY_TERMINAL_APP override (allowlist-gated) > process.platform auto-detect > in-driver fallback chain.

Full install requirements per platform + manual smoke-test checklists + troubleshooting: docs/cross-platform-spawn.md.

Env-var propagation is minimal by default (principle of least authority): system essentials + anything prefixed RELAY_*. Secrets like AWS_SECRET_ACCESS_KEY are NOT passed to spawned agents unless explicitly prefixed.

Plug-and-play defaults (v2.1.2): spawned terminals are configured for autonomous work out of the box — they auto-pull mail from their inbox on first turn instead of idling, run with --permission-mode bypassPermissions so they don't ask the operator to approve every tool call, get an iTerm2 / session-picker title set to the agent name, and run at --effort high to cap token spend (parent terminals doing strategic work may use xhigh; spawned children doing mechanical work shouldn't inherit it). Every default has an env override for the rare case the legacy behavior is wanted:

| Default | Env override | Notes | |---|---|---| | Kickstart prompt sent as positional arg | RELAY_SPAWN_KICKSTART="custom" / RELAY_SPAWN_NO_KICKSTART=1 | The default tells the spawned agent to pull get_messages and act on inbox. | | --permission-mode bypassPermissions | RELAY_SPAWN_PERMISSION_MODE=<mode> | Allowlist: acceptEdits, auto, bypassPermissions, default, dontAsk, plan. | | --name <agent> | RELAY_SPAWN_DISPLAY_NAME="custom" | Shows up as the iTerm2 tab + Claude Code session title. | | --effort high | RELAY_SPAWN_EFFORT=<level> | Allowlist: low, medium, high, xhigh, max. |

Layer 2: Managed Agents (v1.10)

Agents that are NOT Claude Code terminals — Python daemons, Node workers, Ollama/vLLM integrations, custom scripts. They connect to the relay via HTTP (recommended) or direct SQLite, use the same 34 MCP tools (v2.12), and authenticate with per-agent tokens. If registered with managed:true, they also receive token-rotation push-messages over the normal get_messages channel — see docs/managed-agent-protocol.md.

Full integration guide with mental model, auth flow, lifecycle, error patterns, and security notes: docs/managed-agent-integration.md.

Runnable reference implementations (stdlib-only, ~200 LOC each):

Python: examples/managed-agent-reference/python/agent.py
Node: examples/managed-agent-reference/node/agent.js

Both demonstrate: register, send/receive messages, accept + complete tasks, discover peers, SIGINT cleanup. Each has a SMOKE.md with a 5-step manual verification checklist.

SQLite Driver Options (v1.11)

The relay uses SQLite for persistent state. Two drivers are available:

native (default) — better-sqlite3, a compiled C addon. Fast, supports WAL mode, multi-process safe. Requires a C++ compiler at npm install time.
wasm — sql.js, SQLite compiled to WebAssembly. Zero native compilation. Slightly slower writes (in-memory + write-back-to-file). Single-process only (not safe for multi-terminal stdio).

Switch with one env var:

npm install sql.js                    # one-time install of the optional dep
RELAY_SQLITE_DRIVER=wasm node dist/index.js

Both drivers read the same relay.db file format. Full details, performance notes, and limitations: docs/sqlite-wasm-driver.md.

Roadmap

v1.1: Local relay, 9 tools, SQLite, auto-purge
v1.2: HTTP transport, webhook system, config file — 12 tools
v1.3: Presence integrity, unregister_agent, hook delivers mail — 13 tools
v1.4: spawn_agent + role templates + dashboard — 14 tools
v1.5: Built-in security — Bearer auth, audit log, rate limiting
v1.6: Hardening pass — SSRF, input validation, path traversal, stdout discipline
v1.7: Per-agent tokens, secret rotation, at-rest encryption, capability scoping
v1.8: Near-real-time mail via PostToolUse hook
v1.9: Cross-platform spawn (macOS / Linux / Windows / tmux) — Node/TS driver abstraction
v1.10: Layer 2 Managed Agents — reference Python + Node workers
v1.11: SQLite WASM driver (sql.js opt-in) — zero native compilation on Windows/Alpine/Docker/CI
v2.0: Plug-and-play — channels, smart routing (post_task_auto), task leases + heartbeat, lazy health monitor, session-aware reads, busy/DND, health_check, webhook retry with CAS, payload size limits, config validation, auto-unregister, dead-agent purge, debug mode. 22 tools.
v2.1: Architectural completion — explicit auth_state machine, managed-agent rotation grace, versioned ciphertext + keyring with online rotation (relay re-encrypt), lost-token recovery CLI (relay recover), admin-initiated cross-agent rotation (rotate_token_admin), structured error_code catalog, protocol_version surface, Phase 4p webhook-secret encryption, Phase 4b.1 v2 revoke/recovery redesign. 25 tools. 14 of 14 Codex architectural findings closed.
v2.7: Tether-ready cross-process inbox notifications. Durable inbox_events outbox table (schema v11→v12, idempotent migration); HTTP daemon polls + dispatches to relay://inbox/<agent> subscribers regardless of which OS process wrote the message. Reaper now skips sessions with active SSE GET streams (openGetStreams count). Daemon emits :keepalive\n\n SSE comment frames every RELAY_SSE_KEEPALIVE_MS (default 20s) so Electron-based clients (VS Code Tether extension) don't idle-disconnect at ~2.5min. External-review-flagged get_messages filter-after-mark P1 fix — since filter now applies in SQL BEFORE the read-mark mutation. 30 tools. Pairs with Tether VSCode extension v0.1.2+ on the marketplace.
v2.12 (current): Hands-free, LLM-agnostic autowake. The multi-agent Tether watch-all wakes Claude and Codex agents on relay mail and auto-submits, no manual polling (dogfood-proven: the relay now coordinates its own development). Plus the pending-vs-history resolved plane (resolve_messages + an ack flag, so handled mail stops re-flooding fresh sessions; tool #34) and a central schema-scoped auth fix closing a stray-field impersonation hole. 34 tools.
Beyond: cross-machine federation (hub/edge) so multiple self-hosted relays mesh without a shared namespace, plus an App Server push path for low-latency remote wake.

Dashboard

When running in http or both mode, open http://127.0.0.1:3777/ in a browser. You'll see:

Live agent presence (online / stale / offline)
Active tasks with priority and assignment
Recent messages
Registered webhooks
Recently completed tasks

Auto-refreshes every 3 seconds. Useful for "what's happening across all my terminals right now?" at a glance.

Role Templates

See roles/ for drop-in role specs. Examples:

planner.md — orchestrator that delegates and synthesizes
builder.md — worker that accepts and completes tasks
reviewer.md — skeptical reviewer with structured output
researcher.md — investigates questions, returns findings

Three ways to apply a role: paste into project CLAUDE.md, pass as initial_message when spawning, or wire via shell alias.

Requirements

Node.js 18+
Claude Code (or any MCP-compatible client)

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

bot-relay-mcp

What is this?

Why not just use Claude Agent Teams?

Quick Start (30 seconds)

Quick Start (from source)

One-command setup (relay init)

Tether (v2.5 + v2.7 cross-process delivery)

Tools

Identity

Messaging

Tasks

Channels (v2.0)

Status + Health (v2.0)

Webhooks (v1.2+)

Example: Task Delegation

How It Works

Unified relay CLI (v2.1)

Token lifecycle (v2.1)

Error codes (v2.1)

Protocol version (v2.1)

HTTP Mode — for n8n, Slack, Telegram, custom scripts

n8n integration example

Config File (v1.2+)

Trusted Proxies and X-Forwarded-For (v1.6.2)

Per-Agent Tokens (v1.7)

Encryption at Rest (v1.7 opt-in; keyring + rotation in v2.1 Phase 4b.3)

Rotating keys (online)

Rotation Guide — HTTP Shared Secret (v1.7)

Multi-machine: centralized deployment (v2.1)

When to pick centralized

Quick pair flow

Trust-model tradeoffs

Bridge to other tools via MCP

Zero-Friction Setup

Auto-Check on Session Start

Near-Real-Time Mail Delivery (v1.8)

Turn-End Mail Delivery (v2.1)

Backup & Restore (v2.1)

Lost-Token Recovery (v2.1)

External-CLI Token Mint (v2.6)

Cross-Platform Spawn (v1.9)

Layer 2: Managed Agents (v1.10)

SQLite Driver Options (v1.11)

Roadmap

Dashboard

Role Templates

Requirements

License

One-command setup (`relay init`)

Unified `relay` CLI (v2.1)