mcp-coordinator
v0.10.5
Embedded MQTT broker + MCP server for multi-agent coordination
Embedded MQTT broker + MCP server for multi-agent coordination. Zero conflicts, everyone aligned.
Latest: v0.10.4. Optional OAuth 2.1 + RFC 8628 device flow + 4 IdP providers (GitHub OAuth App, GitHub App, Google, generic OIDC) + 4 allowlist strategies + SHA-256 audit log hash chain (SOC 2 tamper-evidence) + rotate-jwt-secret CLI. Single-user / dev local mode requires zero configuration — run mcp-coordinator server start and you're done.
Getting started · Problem · How It Works · MQTT Layer · Scoring · MCP Tools · CLI · Standalone use · Quota · Dashboard · Config · Auth · SDK
The Problem
When multiple developers each use an AI coding agent in parallel on the same repo, things break:
- Regressions — Agent A rewrites a module that Agent B was depending on
- Duplicated work — Two agents implement the same feature from different directions
- Architectural drift — Agents make local decisions that conflict with each other's designs
- Wasted reconciliation time — Developers spend hours untangling what the agents did
Each agent works in isolation. None of them know what the others are doing.
mcp-coordinator fixes this by giving agents a shared nervous system over MQTT — they announce intentions before coding, conflicts are detected before a single line is written, and agents see each other's actions in real-time to agree on an approach.
It works with or without an orchestrator on top. Use it standalone with any MCP client (Claude Code, Cursor, Cline, Aider) — see Standalone use. Or pair it with essaim when you want pre-composed agent profiles, work-stealing templates, and a behavior catalog.
Getting started
# 1. Install
npm install -g mcp-coordinator
# 2. First-time setup — creates ~/.mcp-coordinator/, writes a default config,
# and prints a .mcp.json snippet for your MCP client.
mcp-coordinator init
# 3. Start the server (foreground or --daemon for background)
mcp-coordinator server start --daemon
# 4. Verify
mcp-coordinator server status
mcp-coordinator dashboard # opens http://localhost:3100/dashboard
Step 2 is idempotent: re-running init won't overwrite an existing config. The snippet it prints goes into your MCP client's config (e.g., ~/.claude/.mcp.json for Claude Code). If you'd rather not copy-paste, run mcp-coordinator init --write-mcp-config <project-path> and the snippet is written to <project-path>/.mcp.json (merging if the file already exists).
After step 4, every Claude Code (or other MCP-compatible) session connected to this coordinator can call all 26 tools (register_agent, announce_work, post_to_thread, coordinator_status, ...). For the full multi-Claude or team setup, see Standalone use.
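The "merging if the file already exists" behavior of --write-mcp-config can be pictured as a shallow merge that only touches the coordinator entry. The following is a minimal sketch under that assumption; `mergeCoordinatorEntry` and the two types are hypothetical names for illustration, not part of the mcp-coordinator API.

```typescript
// Hypothetical helper: illustrates the merge, not the CLI's actual code.
type McpServer = { type: string; url: string; headers?: Record<string, string> };
type McpConfig = { mcpServers?: Record<string, McpServer>; [key: string]: unknown };

// Add or overwrite the "coordinator" entry and leave every other server as-is.
function mergeCoordinatorEntry(existing: McpConfig, url: string): McpConfig {
  return {
    ...existing,
    mcpServers: {
      ...(existing.mcpServers ?? {}),
      coordinator: { type: "http", url },
    },
  };
}

const existingConfig: McpConfig = {
  mcpServers: { other: { type: "http", url: "http://localhost:9000/mcp" } },
};
const mergedConfig = mergeCoordinatorEntry(existingConfig, "http://localhost:3100/mcp");
console.log(JSON.stringify(mergedConfig, null, 2));
```

The input object is not mutated, so a caller could compare the two configs before writing the file.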
How It Works
Agent A Agent B
│ │
│ announce_work │ announce_work
▼ ▼
┌──────────────┐ ┌──────────────┐
│ MCP client │ ◄── MQTT ────► │ MCP client │
│ (any vendor) │ push-based │ (any vendor) │
└──────┬───────┘ └──────┬───────┘
│ MCP HTTP / SSE │
└──────────────┬────────────────┘
│
┌─────────▼──────────┐
│ mcp-coordinator │
│ 26 MCP tools + DB │
│ Aedes MQTT broker │
└─────────┬──────────┘
│ SSE
┌─────────▼──────────┐
│ Dashboard │
│ live events/quota │
└────────────────────┘
The consultation cycle has four steps:
- Announce: A client calls announce_work with target files, depends_on_files, and target modules before coding.
- Detect: The coordinator scores impact against all online agents and opens a thread if any score is ≥ 90.
- Consult: MQTT pushes the new thread to every affected agent. Each agent posts context, constraints, or proposes a resolution.
- Resolve: Agents approve, contest, or propose again. The thread closes when consensus is reached, or auto-resolves after timeout / in gray zones.
The server is client-agnostic: any MCP-compatible agent (Claude Code, Cursor, Cline, Aider, custom scripts) can connect over HTTP/SSE or stdio.
MQTT Communication Layer
The coordinator ships with an embedded Aedes MQTT broker. Agents subscribe once and receive every coordination event in real-time — no polling, no extra infrastructure.
Broker
| Transport | Port | Use case |
|-----------|------|----------|
| TCP | 1883 (bind 127.0.0.1 by default) | Local / LAN agents, best latency |
| WebSocket | /mqtt on the coordinator HTTP port (default 3100) | Bun binary, remote agents, firewall-friendly |
One coordinator = one broker. Nothing external to install.
Topic map
Every coordinator event is published on a well-known topic. Clients subscribe to the full set on connect.
| Topic | Emitted when | Payload highlights |
|-------|--------------|--------------------|
| coordinator/consultations/new | A thread is opened | thread_id, subject, initiator_id, target_modules, target_files |
| coordinator/consultations/{id}/messages | Anyone posts to a thread | agent_id, name, content, type (warning/context/proposal) |
| coordinator/consultations/{id}/status | Thread transitions state | status ∈ open / resolving / resolved / timeout |
| coordinator/consultations/{id}/claimed | An agent atomically claims a task (work-stealing) | claimed_by, thread_id |
| coordinator/consultations/{id}/completed | Claimed task finishes | agent_id, thread_id, resolution |
| coordinator/agents/{id}/status | Agent goes online / offline | status, name, modules |
| coordinator/broadcast | System-wide announcements | arbitrary JSON |
| coordinator/quota/update | Anthropic quota refresh | usage, limit, utilization_pct |
Push delivery flow
COORDINATOR BROKER (Aedes) CLIENT
─────────── ────────────── ──────
announce_work() ──────────► publish subscribe
coordinator/ ─► event
consultations/new ─────────► classify topic
self-msg filter
─► handler
Key guarantees:
- Self-filter: clients drop messages where payload.agent_id equals the local agent's id, so agents never wake on their own actions.
- Bun compatibility: when consumed from a Bun-compiled client, a Duplex stream bridges the mqtt client to the native WebSocket API (the ws package receiver doesn't work under Bun).
- Backpressure-free: messages are small JSON envelopes.
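The "classify topic" and "self-msg filter" steps in the flow above can be sketched as two small pure functions. This is only an illustration of the idea: the `EventKind` names and both function names are assumptions made for this example, and the coordinator's client code may be organized differently.

```typescript
// Hypothetical event kinds; the names are illustrative, not the coordinator's own.
type EventKind =
  | "thread_new" | "thread_message" | "thread_status" | "thread_claimed"
  | "thread_completed" | "agent_status" | "broadcast" | "quota" | "unknown";

// Map a topic from the topic map to an event kind by inspecting its segments.
function classifyTopic(topic: string): EventKind {
  const parts = topic.split("/");
  if (parts[0] !== "coordinator") return "unknown";
  if (parts.length === 2 && parts[1] === "broadcast") return "broadcast";
  if (parts.length === 3 && parts[1] === "quota" && parts[2] === "update") return "quota";
  if (parts.length === 4 && parts[1] === "agents" && parts[3] === "status") return "agent_status";
  if (parts[1] !== "consultations") return "unknown";
  if (parts.length === 3 && parts[2] === "new") return "thread_new";
  if (parts.length !== 4) return "unknown";
  switch (parts[3]) {
    case "messages": return "thread_message";
    case "status": return "thread_status";
    case "claimed": return "thread_claimed";
    case "completed": return "thread_completed";
    default: return "unknown";
  }
}

// Self-filter: drop events whose payload.agent_id is the local agent's id.
function shouldDeliver(payload: { agent_id?: string }, localAgentId: string): boolean {
  return payload.agent_id !== localAgentId;
}

console.log(classifyTopic("coordinator/consultations/T-1/messages"));
console.log(shouldDeliver({ agent_id: "alice" }, "alice"));
```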
Impact Scoring
Every announce_work call scores all online agents across multiple detection layers. The highest matching layer wins.
| Layer | Signal | Score | Trigger |
|-------|--------|------:|---------|
| 0a | Same file announced in active thread | 100 | target_files ∩ their target_files |
| 0b | They modify a file you depend on | 80 | depends_on_files ∩ their target_files |
| 0c | You modify a file they depend on | 80 | target_files ∩ their depends_on_files |
| 1 | Same file recently edited | 100 | File tracker conflict (last 60s) |
| 2 | Dependency file recently edited | 80 | depends_on_files recently touched |
| 3 | Same module prefix | 30 | target_modules overlap |
Scores are categorized into three outcomes:
| Score | Category | Action |
|-------|----------|--------|
| ≥ 90 | concerned | Thread opened, consultation required |
| 30–89 | gray_zone | Thread auto-resolved, introspection recommended |
| < 30 | pass | No conflict, proceed immediately |
Layer 0 is critical. Without announced intentions, a two-agent scenario where both work in src/auth/ would score only 30 (gray zone, auto-resolved). With announce_work, the same scenario scores 100 and triggers a full consultation.
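The two tables above can be read as a small scoring function followed by a threshold check. The sketch below covers only the announcement layers (0a, 0b, 0c) and the module layer (3); the file-tracker layers 1 and 2 are left out. It assumes "highest matching layer wins" means the highest score is kept, and it treats module overlap as exact string equality, which is a simplification. All type and function names are hypothetical.

```typescript
type Announcement = { targetFiles: string[]; dependsOnFiles: string[]; targetModules: string[] };
type Category = "concerned" | "gray_zone" | "pass";

const intersects = (a: string[], b: string[]): boolean => a.some((x) => b.includes(x));

// Score my announcement against another agent's; the highest matching signal is kept.
function scoreAgainst(mine: Announcement, theirs: Announcement): number {
  const scores = [0];
  if (intersects(mine.targetFiles, theirs.targetFiles)) scores.push(100);   // layer 0a
  if (intersects(mine.dependsOnFiles, theirs.targetFiles)) scores.push(80); // layer 0b
  if (intersects(mine.targetFiles, theirs.dependsOnFiles)) scores.push(80); // layer 0c
  if (intersects(mine.targetModules, theirs.targetModules)) scores.push(30); // layer 3
  return Math.max(...scores);
}

// Thresholds from the outcome table: >= 90 concerned, 30-89 gray_zone, < 30 pass.
function categorize(score: number): Category {
  if (score >= 90) return "concerned";
  if (score >= 30) return "gray_zone";
  return "pass";
}

const modulesOnly: Announcement = { targetFiles: [], dependsOnFiles: [], targetModules: ["src/auth"] };
const withFile: Announcement = { ...modulesOnly, targetFiles: ["src/auth/login.ts"] };
console.log(categorize(scoreAgainst(modulesOnly, modulesOnly))); // module overlap only
console.log(categorize(scoreAgainst(withFile, withFile)));       // same file announced
```

The two sample calls mirror the scenario described above: module overlap alone lands in the gray zone, while an announced shared file crosses the consultation threshold.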
Capabilities at a glance (v0.10.5)
| Concern | Out of the box | Opt-in |
|---------|----------------|--------|
| Run the coordinator | mcp-coordinator server start -- zero config | — |
| Conflict detection | 4-layer impact scoring (announce / file / module / co-change), MQTT push | — |
| Authentication | Open mode (synthetic legacy claims) | Phase 1 JWT (COORDINATOR_AUTH_ENABLED) OR Phase 2 OAuth (COORDINATOR_OAUTH_ENABLED) |
| Identity providers | — | GitHub OAuth App + GitHub App + Google + generic OIDC; up to 4 in parallel via picker UI |
| Allowlist strategies | — | 4 strategies (memberships / idp_org_id / id_token_groups / none); auto-selected per provider |
| Session model | — | Cookie sessions (__Host-coordinator_session) + Bearer JWT for MCP transport + service tokens for CI/CD |
| Refresh-token rotation | — | Reuse detection with 10s grace + family revoke; GitHub App auto-refreshes IdP access tokens on 401 |
| IdP token encryption at rest | — | ✅ Shipped v0.10.5 — column-level AES-256-GCM on users.idp_access_token + users.idp_refresh_token, AAD-bound to user_id/column/org_id, key fingerprint guard at boot (COORDINATOR_ENCRYPTION_KEY) |
| Audit log | — | Tier-1 (never-drop) + Tier-2 (batched) + SHA-256 hash chain (prev_hash + row_hash) for tamper-evidence |
| Operational tooling | — | init phase2 wizard, doctor --phase2, service-token {issue,list,revoke}, rotate-jwt-secret, verify-audit-chain.ts |
| Compliance posture | — | SOC 2 Type II audit-chain runbook, GDPR Art. 17 procedures, threat model with IdP-token-leak path closed (v0.10.5) and remaining residual risks documented |
| Observability | Pino logs, MQTT broker stats | 29 Prometheus metrics on /metrics/auth, Grafana dashboard JSON, alert rules YAML |
| Multi-instance | — | (Planned v1.0) Redis-backed cache invalidation + leader election |
| Database backend | SQLite (better-sqlite3 / Bun:sqlite) | Postgres (planned v0.11, see design spec) |
1740 tests passing. Phase 2 features ship feature-flagged so Phase 1 deployments stay byte-identical (proven by the tests/backcompat/ suite).
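The "IdP token encryption at rest" row mentions AES-256-GCM with the ciphertext AAD-bound to user_id, column, and org_id. The sketch below shows what AAD binding means in general, using Node's built-in crypto module. The AAD layout (`user|column|org` joined with `|`) and the function names are assumptions for illustration; the coordinator's actual encoding is not documented here.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

type Sealed = { iv: Buffer; tag: Buffer; ciphertext: Buffer };

// Assumed AAD layout; the real coordinator may encode these fields differently.
const aadFor = (userId: string, column: string, orgId: string): Buffer =>
  Buffer.from(`${userId}|${column}|${orgId}`, "utf8");

// Encrypt a token with AES-256-GCM, authenticating (but not encrypting) the AAD.
function seal(key: Buffer, plaintext: string, aad: Buffer): Sealed {
  const iv = randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  cipher.setAAD(aad);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

// Decrypt; final() throws if the key, ciphertext, tag, or AAD does not match.
function unseal(key: Buffer, sealed: Sealed, aad: Buffer): string {
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAAD(aad);
  decipher.setAuthTag(sealed.tag);
  return Buffer.concat([decipher.update(sealed.ciphertext), decipher.final()]).toString("utf8");
}

const demoKey = randomBytes(32); // stand-in for a key derived from COORDINATOR_ENCRYPTION_KEY
const demoSealed = seal(demoKey, "example-token", aadFor("u-1", "idp_access_token", "org-1"));
console.log(unseal(demoKey, demoSealed, aadFor("u-1", "idp_access_token", "org-1")));
```

Binding the AAD to the row's identity means a ciphertext copied into another user's row or another column fails authentication on decrypt.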
MCP Tools
26 tools organized by function. All registered under one HTTP/SSE transport at /mcp (and stdio for stdio-mode clients).
Agent registry
| Tool | Description |
|------|-------------|
| register_agent | Register as online with name and module list |
| list_agents | List all registered online agents |
| heartbeat | Update last-seen and derive activity status |
| agent_activity | Get activity status for all online agents |
| wait_for_peers | Block until N peers online, or timeout (prevents race before first announce) |
Consultation
| Tool | Description |
|------|-------------|
| announce_work | Open a consultation thread — the main entry point before coding |
| post_to_thread | Post a message (warning, context, question) to an open thread |
| propose_resolution | Submit a resolution proposal for participants to approve |
| approve_resolution | Approve the current resolution proposal |
| contest_resolution | Reject the proposal with a reason — resets to open |
| close_thread | Close a thread after work is complete |
| cancel_thread | Cancel a thread (work abandoned or no longer relevant) |
| get_thread | Get a thread with all messages and current status |
| get_thread_updates | Poll for new messages since a timestamp |
| list_threads | List threads, filterable by status or agent |
| log_action_summary | Log a one-liner action summary for the dashboard timeline |
File tracking
| Tool | Description |
|------|-------------|
| hot_files | List files being edited by multiple agents |
| get_session_files | Get all files edited by an agent in the current session |
| check_file_conflict | Check whether another agent edited a given file recently |
Dependency map
| Tool | Description |
|------|-------------|
| set_dependency_map | Load a module dependency graph (JSON) |
| get_blast_radius | Calculate which other modules are affected by changes |
| get_module_info | Get dependency and dependent info for a module |
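A blast-radius calculation over a dependency graph is typically a reverse-edge traversal: start from the changed module and collect everything that depends on it, directly or transitively. The sketch below assumes the map has the shape "module → modules it depends on"; the real JSON accepted by set_dependency_map may differ, and `blastRadius` is a hypothetical name.

```typescript
// Assumed shape: each key is a module, each value lists the modules it depends on.
type DependencyMap = Record<string, string[]>;

// Return every module affected by a change to `changed`, sorted alphabetically.
function blastRadius(map: DependencyMap, changed: string): string[] {
  // Invert the graph: dependency -> modules that depend on it.
  const dependents = new Map<string, string[]>();
  for (const mod of Object.keys(map)) {
    for (const dep of map[mod] ?? []) {
      dependents.set(dep, [...(dependents.get(dep) ?? []), mod]);
    }
  }
  // Breadth-first walk over the inverted edges.
  const seen = new Set<string>();
  const queue: string[] = [changed];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const next of dependents.get(current) ?? []) {
      if (next !== changed && !seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return Array.from(seen).sort();
}

const sampleMap: DependencyMap = { api: ["auth", "db"], auth: ["db"], db: [], ui: ["api"] };
console.log(blastRadius(sampleMap, "db")); // [ 'api', 'auth', 'ui' ]
```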
MQTT
| Tool | Description |
|------|-------------|
| wait_for_message | Block until a coordination message arrives on the agent's topic |
| get_queued_messages | Drain all queued messages without blocking |
| mqtt_publish | Publish a raw message to any MQTT topic |
Status
| Tool | Description |
|------|-------------|
| coordinator_status | Full system status: agents, threads, file activity, MQTT, quota |
The in-server introspection tool returns the full schema for every tool — point any MCP client at it for live discovery.
CLI
Two distribution channels:
- npm: npm install -g mcp-coordinator. Requires Node.js 20+.
- Single-file binary: Bun-compiled, no Node required. Download the matching tarball from a GitHub Release.
Commands
| Command | Description |
|---------|-------------|
| mcp-coordinator init [--url <url>] [--write-mcp-config <path>] [--write-claude-md <path>] | First-time setup — create config dir, default config.json, print/write the .mcp.json snippet, optionally scaffold a sample CLAUDE.md |
| mcp-coordinator uninstall [--mcp-config <path>] [--claude-md <path>] [--purge] [--force] | Remove integrations: drop coordinator entry from a .mcp.json, strip the coordination section from a CLAUDE.md, or --purge the ~/.mcp-coordinator/ directory entirely |
| mcp-coordinator server start [--port N] [--data-dir PATH] [--daemon] | Start the coordinator (foreground or daemon) |
| mcp-coordinator server stop | Stop the coordinator |
| mcp-coordinator server status | PID, port, online agents, open threads |
| mcp-coordinator server logs [-n N] [-f] | Tail the daemon log at ~/.mcp-coordinator/logs/server.log |
| mcp-coordinator dashboard | Open http://localhost:3100/dashboard |
| mcp-coordinator doctor [--host H] [--port P] [--mqtt-port P] | Health check: config, server liveness, /health, /mcp initialize, dashboard, MQTT broker |
| mcp-coordinator --version | Print the installed version |
Quick start
# Start the coordinator (embedded MQTT + dashboard)
mcp-coordinator server start --daemon
# Open the dashboard
mcp-coordinator dashboard
# Stop when done
mcp-coordinator server stop
In-process from your own Node app
import { startServer } from "mcp-coordinator";
await startServer({
port: 3100,
dataDir: "./coordinator-data",
});
Standalone use: without an orchestrator
You don't need an orchestrator. mcp-coordinator works on its own with any MCP-compatible client — Claude Code, Cursor, Cline, Aider, custom scripts. The two most common setups:
Solo developer, multiple Claude Code sessions
You're running 2-3 Claude Code sessions in parallel on the same repo and want them to see each other's work. One coordinator instance handles all of them.
# In one terminal: start the coordinator
mcp-coordinator server start --daemon
Then add the coordinator to each Claude Code session's .mcp.json (located at ~/.claude/.mcp.json for the global config, or <your-project>/.mcp.json for per-project):
{
"mcpServers": {
"coordinator": {
"type": "http",
"url": "http://localhost:3100/mcp"
}
}
}
Each Claude session now has access to all 26 coordination tools (register_agent, announce_work, post_to_thread, etc.). Run mcp-coordinator dashboard to open the dashboard in a browser and watch real-time activity across your sessions.
Team setup — shared coordinator on LAN
One person hosts the coordinator on a shared machine; teammates point their Claude at it.
Host:
# Bind to all interfaces; default is 127.0.0.1
COORDINATOR_BIND=0.0.0.0 mcp-coordinator server start --daemon
Each teammate's .mcp.json points to the host's IP:
{
"mcpServers": {
"coordinator": {
"type": "http",
"url": "http://192.168.1.42:3100/mcp"
}
}
}
For internet-facing or multi-tenant deployments, enable JWT auth (see Authentication). Each teammate registers via POST /api/auth/register with the team's COORDINATOR_REGISTRATION_SECRET, gets a Bearer token, and adds it to their .mcp.json:
{
"mcpServers": {
"coordinator": {
"type": "http",
"url": "https://coordinator.example.com/mcp",
"headers": { "Authorization": "Bearer <your-token>" }
}
}
}
Telling Claude to use the coordinator tools
Without a behavior catalog (which is what essaim ships), you instruct Claude manually. Easiest path:
# In your project root — scaffolds CLAUDE.md with coordinator instructions
mcp-coordinator init --write-claude-md ~/my-repo --write-mcp-config ~/my-repo
This appends a clearly-marked mcp-coordinator:coordination-section block to ~/my-repo/CLAUDE.md (creating it if absent, replacing the section if it already exists). Combined with --write-mcp-config, your project is fully wired in one command.
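The "create if absent, replace if present" behavior for the coordination section is a common sentinel-block pattern. Here is a minimal sketch of it. The exact marker strings are an assumption (the README only states that the sentinels are HTML comments named mcp-coordinator:coordination-section), and `upsertSection` is a hypothetical helper, not the CLI's code.

```typescript
// Assumed marker format; only the sentinel name comes from the README.
const BEGIN = "<!-- mcp-coordinator:coordination-section:begin -->";
const END = "<!-- mcp-coordinator:coordination-section:end -->";

// Replace the delimited block if it exists, otherwise append it to the document.
function upsertSection(doc: string, body: string): string {
  const block = `${BEGIN}\n${body.trim()}\n${END}`;
  const start = doc.indexOf(BEGIN);
  const end = doc.indexOf(END);
  if (start !== -1 && end > start) {
    // Text outside the sentinels is preserved byte for byte.
    return doc.slice(0, start) + block + doc.slice(end + END.length);
  }
  const base = doc.trimEnd();
  return base.length === 0 ? `${block}\n` : `${base}\n\n${block}\n`;
}

const firstPass = upsertSection("# Project notes\n", "Call announce_work before editing.");
const secondPass = upsertSection(firstPass, "Call register_agent, then announce_work.");
console.log(secondPass);
```

Running the function twice with the same body returns the same text, which is what makes re-running the scaffold safe.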
If you'd rather embed the instructions yourself (or you're not using Claude Code), the section reads roughly:
Before modifying any source file, register with the coordinator MCP server:
- Call register_agent with your name and the modules you'll touch
- Call announce_work describing what you'll do, listing target files (and depends_on_files if applicable)
- If a thread is created (consultation triggered), wait for the resolution before writing code
- After a meaningful change, call log_action_summary to update the dashboard timeline
- If another agent is already working on a file you need to touch, post a question to the thread via post_to_thread and wait for their response before proceeding

Use the coordinator_status tool to see current activity at any time.
That's all you need to start coordinating. The dashboard shows live who's doing what; the SQLite database persists threads across sessions; conflicts are detected before code is written.
Push vs polling — important architectural note
Vanilla Claude Code talks to mcp-coordinator over MCP (HTTP/stdio request-response). It does not subscribe to MQTT. That means events the coordinator publishes on MQTT (coordinator/consultations/new, etc.) are not auto-delivered to a Claude Code session — Claude has to poll the coordinator to discover new activity. The polling pattern is:
- announce_work returns the thread ID immediately if a conflict is detected; that's the most important checkpoint
- After that, periodic calls to coordinator_status / list_threads / get_thread_updates surface new posts on threads you're a participant in
- The CLAUDE.md scaffolded by mcp-coordinator init --write-claude-md instructs Claude to do exactly this polling
If you want real-time push (every coordination event interrupting Claude between turns instead of waiting for a poll), use essaim. essaim ships an agent-loop wrapper that subscribes to the MQTT broker and injects events into the turn flow automatically. mcp-coordinator alone supports the polling model — which is sufficient for most use cases (2-3 Claude sessions on a small team) and zero-config to set up.
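For a custom script (rather than Claude following CLAUDE.md instructions), the polling model boils down to a loop that re-reads a thread until it settles. The sketch below abstracts the MCP call behind a `FetchThread` function so it stays transport-agnostic; that type, `pollUntilSettled`, and the snapshot shape are hypothetical and only the status values come from the topic map above.

```typescript
type ThreadStatus = "open" | "resolving" | "resolved" | "timeout";
type ThreadSnapshot = { status: ThreadStatus; messages: string[] };
// Stand-in for an MCP tool call such as get_thread; the real transport is not shown.
type FetchThread = (threadId: string) => Promise<ThreadSnapshot>;

// Fetch the thread up to maxPolls times, stopping early once it is resolved or timed out.
async function pollUntilSettled(
  fetchThread: FetchThread,
  threadId: string,
  intervalMs: number,
  maxPolls: number,
): Promise<ThreadSnapshot> {
  let snapshot = await fetchThread(threadId);
  for (let i = 1; i < maxPolls && snapshot.status !== "resolved" && snapshot.status !== "timeout"; i++) {
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    snapshot = await fetchThread(threadId);
  }
  return snapshot;
}

// Fake fetcher that walks through open -> resolving -> resolved.
const demoStatuses: ThreadStatus[] = ["open", "resolving", "resolved"];
let demoCalls = 0;
const demoFetch: FetchThread = async () => ({
  status: demoStatuses[Math.min(demoCalls++, demoStatuses.length - 1)] ?? "open",
  messages: [],
});
pollUntilSettled(demoFetch, "T-1", 10, 20).then((s) => console.log(s.status, "after", demoCalls, "polls"));
```

The poll interval trades latency against request volume; the essaim push path mentioned above avoids that trade-off.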
End-to-end example: two Claudes coordinating (polling model)
Two terminals, same repo, both Claude Code sessions wired to the same local coordinator. Both sessions have a CLAUDE.md scaffolded by mcp-coordinator init --write-claude-md, which instructs Claude to register, announce, and poll. The conversation below is what each Claude does — the human user just asks each Claude to make a change.
TERMINAL 1 (Alice): $ claude
> "Add updated_at to User type in src/models/user.ts"
TERMINAL 2 (Bob): $ claude
> "Migrate User schema" (touches src/models/user.ts)

[Alice's Claude]
register_agent(name="Alice", ...)
announce_work(target_files: ["src/models/user.ts"])
→ response: { thread_id: null, concerned_agents: [] }

[Bob's Claude]
register_agent(name="Bob", ...)
announce_work(target_files: ["src/models/user.ts", "migrations/004.sql"])
→ response: { thread_id: "T-1", concerned_agents: ["alice"], score: 100, layer: "0a" }
[Bob sees the conflict in the response]
get_thread("T-1")
post_to_thread("T-1", type: "context", content: "full schema migration; can wait for your field to land first")

[Alice writes the field, then before next major action the CLAUDE.md says "poll coordinator_status"]
coordinator_status()
→ response: shows T-1 with Bob's post
get_thread("T-1")
post_to_thread("T-1", type: "context", content: "adding 1 field at line 42, no rename. Done in 5 min.")
propose_resolution("T-1", content: "Alice's field first, Bob runs migration after")

[Bob's CLAUDE.md polling step]
coordinator_status()
→ shows T-1 in 'resolving' state
get_thread("T-1")
approve_resolution("T-1")

[Alice's next poll]
coordinator_status()
→ T-1 status = 'resolved'

[Alice writes the field]
log_action_summary(...)
[Bob writes the migration]
log_action_summary(...)

The dashboard at http://localhost:3100/dashboard/ plays the entire timeline live. mcp-coordinator server logs -f (in a third terminal) tails the daemon log if you want to see the protocol-level events. If polling cadence is too coarse and you find Claude missing posts, switch to essaim's agent-loop, which delivers MQTT events automatically.
Team setup walkthrough — shared coordinator with JWT
Full step-by-step for a team running a coordinator on a shared host with internet-facing or multi-tenant access. Adjust to your network/TLS reality.
Step 1 (host) — generate secrets
# 32+ char shared secret; put in your secrets manager and inject as env vars
JWT_SECRET=$(openssl rand -hex 32)
REGISTRATION_SECRET=$(openssl rand -hex 32)
ADMIN_SECRET=$(openssl rand -hex 32)
Step 2 (host): start the coordinator with auth enabled
COORDINATOR_AUTH_ENABLED=true \
COORDINATOR_JWT_SECRET="$JWT_SECRET" \
COORDINATOR_REGISTRATION_SECRET="$REGISTRATION_SECRET" \
COORDINATOR_ADMIN_SECRET="$ADMIN_SECRET" \
COORDINATOR_BIND=0.0.0.0 \
mcp-coordinator server start --daemon --port 3100
(Front the server with TLS via nginx/Caddy/etc. for internet exposure. Local LAN can use plain HTTP.)
Step 3 (each teammate) — request a token
curl -X POST https://coordinator.example.com/api/auth/register \
-H "Content-Type: application/json" \
-d '{"agent_name":"alice","registration_secret":"<REGISTRATION_SECRET shared via team channel>"}'
# Response: { "agent_id": "alice-abc123", "token": "eyJ...", "expires_at": "...", "role": "agent" }
Step 4 (each teammate): wire .mcp.json
{
"mcpServers": {
"coordinator": {
"type": "http",
"url": "https://coordinator.example.com/mcp",
"headers": { "Authorization": "Bearer <paste-token-here>" }
}
}
}
Step 5 (each teammate): run init --write-claude-md to scaffold project instructions, OR add the coordination section to their existing CLAUDE.md.
Step 6 (each teammate) — verify: mcp-coordinator doctor --host coordinator.example.com --port 443 should show all checks green from any laptop.
Token rotation: tokens expire per COORDINATOR_JWT_EXPIRY (default 24h). Refresh via POST /api/auth/refresh with the current Bearer token. The admin can revoke a specific agent via POST /api/auth/revoke (admin token required).
Logs and debugging
The daemon writes to ~/.mcp-coordinator/logs/server.log. Tail it:
mcp-coordinator server logs # last 50 lines
mcp-coordinator server logs -n 200 # last 200 lines
mcp-coordinator server logs -f # follow (Ctrl+C to stop)
For a one-shot check that everything is wired up correctly (config valid, server up, MCP responds, dashboard reachable, MQTT accepting connections), use the doctor:
mcp-coordinator doctor
doctor exits non-zero if any check fails and prints actionable hints next to each failure. Probe a remote coordinator with --host and --port:
mcp-coordinator doctor --host coordinator.example.com --port 443 --mqtt-port 1883
Logging level is controlled by LOG_LEVEL (debug, info, warn, error; default info). Set NODE_ENV=development for human-readable pretty logs:
NODE_ENV=development LOG_LEVEL=debug mcp-coordinator server start
Removing the integration (per-project or globally)
Symmetric to init, the uninstall command undoes what was added without touching anything you wrote yourself.
# Remove coordinator from a project's .mcp.json AND strip its section from CLAUDE.md
mcp-coordinator uninstall --mcp-config ~/my-repo --claude-md ~/my-repo
# Wipe the global config dir (~/.mcp-coordinator/) entirely — config + data + logs + pid file
mcp-coordinator uninstall --purge # asks for confirmation
mcp-coordinator uninstall --purge --force # skip the prompt, useful in scripts
--mcp-config <path> reads <path>/.mcp.json, removes only the coordinator server entry (other servers untouched), and deletes the file if it ends up empty. --claude-md <path> removes only the block delimited by the mcp-coordinator:coordination-section sentinels (rendered as HTML comments around the section); it never touches text you authored. Combine flags as needed; if the resulting CLAUDE.md is empty, it's deleted.
To remove the npm package itself: npm uninstall -g mcp-coordinator.
Running multiple coordinators on the same machine
Useful for per-project isolation — every project gets its own ephemeral coordinator with no cross-contamination. Pick distinct ports + data dirs:
# Project A
PORT=3110 \
COORDINATOR_MQTT_TCP_PORT=11883 \
mcp-coordinator server start --daemon --data-dir ./.mcp-coordinator-A
# Project B (different terminal)
PORT=3120 \
COORDINATOR_MQTT_TCP_PORT=12883 \
mcp-coordinator server start --daemon --data-dir ./.mcp-coordinator-B
The default ~/.mcp-coordinator/server.pid only tracks ONE daemon at a time. For multi-instance runs, pass --data-dir explicitly to each instance: the PID file lives next to the data dir, so multiple instances don't fight over the same file. To stop a specific instance, cd to its data dir's parent and run mcp-coordinator server stop from there, OR kill $(cat ./.mcp-coordinator-A/../server.pid).
In each project's .mcp.json, point at the project's coordinator:
{
"mcpServers": {
"coordinator": {
"type": "http",
"url": "http://localhost:3110/mcp"
}
}
}
This pattern works well alongside essaim, which uses Strategy A (in-process) and starts its own ephemeral coordinator per essaim run; there's no port conflict because essaim picks an isolated dir by default.
Authentication
The coordinator runs in one of three modes, selected by env-var configuration. Single-user / dev local stays zero-config; multi-user deployments opt in to Phase 2 OAuth via a single feature flag.
Mode 1 — Open (default, no auth)
mcp-coordinator server start
No env vars, no setup, no /auth/login. All requests get synthetic legacy claims (sub="legacy", user_id="legacy", org="default", role="admin"). MCP tools work unchanged. This is byte-identical to v0.7.x; a 31-case tests/backcompat/ suite proves the non-regression on every release.
Mode 2 — JWT (Phase 1, v0.7.x)
Opt-in symmetric JWTs (HS256 via jose) without OAuth. Agents self-register against a shared secret; admins revoke via a separate secret.
export COORDINATOR_AUTH_ENABLED=true
export COORDINATOR_JWT_SECRET="$(openssl rand -base64 32)"
export COORDINATOR_REGISTRATION_SECRET="team-shared-secret"
export COORDINATOR_ADMIN_SECRET="admin-only-secret"
# Agent self-register
curl -X POST http://localhost:3100/api/auth/register \
-H "Content-Type: application/json" \
-d '{"agent_name":"my-agent","registration_secret":"team-shared-secret"}'
# → { agent_id, token, expires_at, role }
# Refresh
curl -X POST http://localhost:3100/api/auth/refresh \
-H "Authorization: Bearer <current-token>"
# Admin revoke
curl -X POST http://localhost:3100/api/auth/revoke \
-H "Authorization: Bearer <admin-token>" \
-H "Content-Type: application/json" \
-d '{"agent_id":"agent-to-revoke"}'
Exempt routes: GET /health, POST /api/auth/register, POST /api/auth/refresh, GET /api/events (SSE).
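To make "symmetric JWTs (HS256)" concrete, here is a minimal sketch of HS256 signing and verification using only Node's built-in crypto module. It is not the coordinator's implementation (which, per the text above, uses jose); `signHs256` and `verifyHs256` are hypothetical names, and the claim set is illustrative.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const encodeJson = (value: unknown): string =>
  Buffer.from(JSON.stringify(value), "utf8").toString("base64url");

// Build header.payload.signature, where the signature is HMAC-SHA256 over "header.payload".
function signHs256(claims: Record<string, unknown>, secret: string): string {
  const header = encodeJson({ alg: "HS256", typ: "JWT" });
  const payload = encodeJson(claims);
  const signature = createHmac("sha256", secret).update(`${header}.${payload}`).digest("base64url");
  return `${header}.${payload}.${signature}`;
}

// Return the claims if the signature, algorithm, and expiry check out; otherwise null.
function verifyHs256(token: string, secret: string, nowSec: number): Record<string, unknown> | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, payload, signature] = parts as [string, string, string];
  const expected = createHmac("sha256", secret).update(`${header}.${payload}`).digest();
  const given = Buffer.from(signature, "base64url");
  // Constant-time comparison; lengths must match before timingSafeEqual is called.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  const parsedHeader = JSON.parse(Buffer.from(header, "base64url").toString("utf8")) as { alg?: string };
  if (parsedHeader.alg !== "HS256") return null;
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as Record<string, unknown>;
  if (typeof claims.exp === "number" && claims.exp <= nowSec) return null;
  return claims;
}

const demoNow = Math.floor(Date.now() / 1000);
const demoToken = signHs256({ sub: "my-agent", role: "agent", exp: demoNow + 3600 }, "demo-secret");
console.log(verifyHs256(demoToken, "demo-secret", demoNow));
```

Because the same secret signs and verifies, anyone holding COORDINATOR_JWT_SECRET can mint tokens, which is why the secret belongs in a secrets manager.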
Mode 3 — Phase 2 OAuth (v0.8.0+, the modern path)
Full OAuth 2.1 + RFC 8628 device flow + cookie sessions + service tokens + audit-chain tamper-evidence + multi-IdP picker. Activate by setting COORDINATOR_OAUTH_ENABLED=true; Phase 1 deployments are unaffected when the flag is unset.
Quick start (single IdP — GitHub OAuth App)
export COORDINATOR_OAUTH_ENABLED=true
export COORDINATOR_JWT_SECRET=$(openssl rand -base64 32)
export COORDINATOR_PUBLIC_URL=https://coordinator.example.com
# GitHub OAuth App
export COORDINATOR_GITHUB_CLIENT_ID=<from-github-oauth-app>
export COORDINATOR_GITHUB_CLIENT_SECRET=<from-github-oauth-app>
export COORDINATOR_GITHUB_ORG=<your-org-slug>
mcp-coordinator init phase2 # interactive wizard
mcp-coordinator server start
The first user to sign in via ${PUBLIC_URL}/auth/login becomes the bootstrap admin atomically. Full walkthrough: docs/onboarding-self-host.md.
Multi-IdP matrix (v0.9.0 → v0.10.4)
Configure any subset of the 4 providers simultaneously — /auth/login automatically renders a picker page when >=2 are registered. Each provider can be opted in / out independently; COORDINATOR_GITHUB_CLIENT_ID is the only one always required.
| Provider | Required env vars | Allowlist strategy | Allowlist column |
|----------|-------------------|--------------------|------------------|
| GitHub OAuth App | COORDINATOR_GITHUB_CLIENT_ID + _SECRET + COORDINATOR_GITHUB_ORG | memberships (default) | orgs.allowlist_github_org |
| GitHub App (v0.10.0+) | COORDINATOR_GITHUB_APP_CLIENT_ID + _SECRET | memberships (default) OR memberships via installation-footprint when COORDINATOR_GITHUB_APP_ALLOWLIST_SOURCE=user_installations (v0.10.3) | orgs.allowlist_github_org |
| Google (v0.9.0+) | COORDINATOR_GOOGLE_CLIENT_ID + _SECRET | idp_org_id (v0.10.2; Workspace hd claim) | orgs.allowlist_idp_org_id |
| Generic OIDC (v0.9.0+) | COORDINATOR_OIDC_ISSUER_URL + _CLIENT_ID + _CLIENT_SECRET | none by default; switches to id_token_groups when COORDINATOR_OIDC_GROUPS_CLAIM is set (v0.10.4) | orgs.allowlist_github_org |
Setting partial credentials for any provider fails closed at boot (BootValidationError). Detailed setup per provider: docs/idp-providers.md.
Allowlist strategy semantics
All four strategies are case-insensitive string matches against the relevant orgs column. Tie-break is alphabetical (deterministic per V4 FIX 22).
- memberships: Provider's listMemberships(accessToken) returns an array of strings, each matched against orgs.allowlist_github_org. Used by GitHub OAuth App (/user/orgs → GitHub orgs) and GitHub App (/user/orgs OR /user/installations per _ALLOWLIST_SOURCE).
- idp_org_id: IdpUserInfo.idp_org_id matched directly against orgs.allowlist_idp_org_id. Used by Google (the hd Workspace claim).
- id_token_groups: Groups extracted from the id_token at a configurable dot-notation path (COORDINATOR_OIDC_GROUPS_CLAIM, typical values: groups / realm_access.roles / roles). Used by OIDC.
- none: Deny by default. Used by OIDC when no groups claim is configured. Operators wanting a custom OIDC allowlist vendor a subclass.
Refresh-rotation re-checks the IdP-side allowlist only for memberships providers. idp_org_id and id_token_groups strategies are sign-in-only — operator-side revocation requires token_epoch bump.
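The id_token_groups strategy combines two mechanics described above: walking a dot-notation path through the id_token claims, then matching the result case-insensitively against an allowlist column. The sketch below illustrates both. The `Org` shape and function names are hypothetical, and the tie-break is assumed to sort by the allowlist value; the source only says it is alphabetical.

```typescript
type Org = { id: string; allowlist: string };

// Walk a dot-notation path such as "realm_access.roles" through the id_token claims.
function extractGroups(claims: Record<string, unknown>, path: string): string[] {
  let node: unknown = claims;
  for (const key of path.split(".")) {
    if (typeof node !== "object" || node === null) return [];
    node = (node as Record<string, unknown>)[key];
  }
  return Array.isArray(node) ? node.filter((g): g is string => typeof g === "string") : [];
}

// Case-insensitive match; if several orgs match, the alphabetically first allowlist value wins.
function matchOrg(values: string[], orgs: Org[]): Org | null {
  const wanted = new Set(values.map((v) => v.toLowerCase()));
  const matches = orgs
    .filter((o) => wanted.has(o.allowlist.toLowerCase()))
    .sort((a, b) => {
      const x = a.allowlist.toLowerCase();
      const y = b.allowlist.toLowerCase();
      return x < y ? -1 : x > y ? 1 : 0;
    });
  return matches[0] ?? null;
}

const demoClaims = { realm_access: { roles: ["Engineers", "ops"] } };
const demoOrgs: Org[] = [{ id: "org-acme-okta", allowlist: "engineers" }];
console.log(matchOrg(extractGroups(demoClaims, "realm_access.roles"), demoOrgs));
```

An empty group list matches nothing, which lines up with the deny-by-default behavior of the none strategy.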
Sample orgs provisioning
-- GitHub OAuth App + GitHub App (memberships): use allowlist_github_org
INSERT INTO orgs (id, name, allowlist_github_org)
VALUES ('org-acme-gh', 'Acme GitHub', 'acme');
-- Google Workspace (idp_org_id): use allowlist_idp_org_id
INSERT INTO orgs (id, name, allowlist_idp_org_id)
VALUES ('org-acme-google', 'Acme Workspace', 'acme.com');
-- OIDC with groups claim (id_token_groups): reuses allowlist_github_org
INSERT INTO orgs (id, name, allowlist_github_org)
VALUES ('org-acme-okta', 'Acme via Okta', 'engineers');
A user signs in once per provider; their users.idp_provider stays sticky. Identities are not auto-merged across providers; a one-shot SQL reconciliation is needed if you want to consolidate users who signed in via multiple paths.
Refresh-token recovery (GitHub App, v0.10.0+)
GitHub App user-to-server tokens expire after 8h. On a 401 at refresh-rotation time, the coordinator automatically calls refreshIdpToken(refresh_token) to mint a fresh access+refresh pair, persists both to users.idp_access_token / users.idp_refresh_token, emits a Tier 2 auth.idp.token_refreshed audit, and retries the membership check. Only triggers for providers that implement refreshIdpToken (currently GitHubAppProvider).
Service tokens for CI/CD
mcp-coordinator service-token issue \
--user u-admin-123 --org org-acme-001 \
--scope read --ttl 30d --reason "CI deploy pipeline"
# → Returns access_token (show once)
mcp-coordinator service-token list
mcp-coordinator service-token revoke --jti <jti>
90-day max TTL. Admin-only issuance. Reason ≥ 10 chars required.
OIDC defense-in-depth
- nonce verification (v0.10.1): Generated per authorize request, stored in oauth_state, verified against id_token.nonce at exchange time. Guards against id_token replay across authorize requests. Automatic; no configuration.
- id_token signature verification (v0.9.0): RS256 only, JWKS-by-kid lookup from the OIDC discovery doc, iss + aud checks.
- Discovery doc issuer cross-check: The discovery doc's own issuer field MUST match COORDINATOR_OIDC_ISSUER_URL; mismatch fails at first /auth/login. Guards against discovery-URL redirect attacks.
Operational tooling
| CLI / script | Purpose | Doc |
|--------------|---------|-----|
| mcp-coordinator init phase2 | Interactive Phase 2 wizard | onboarding |
| mcp-coordinator doctor --phase2 | 8 Phase 2 health probes | — |
| mcp-coordinator service-token {issue,list,revoke} | CI/CD tokens | — |
| mcp-coordinator rotate-jwt-secret [--format env\|json\|secret-only] | JWT secret rotation helper (v0.9.2) | auto-rotation |
| tsx scripts/verify-audit-chain.ts [--db <path>] [--json] | SHA-256 audit chain integrity (v0.9.1) | audit-integrity |
Audit log tamper-evidence (v0.9.1)
Every audit_log row carries prev_hash + row_hash. row_hash = SHA-256(prev_hash || canonical(row_fields)). The chain proves no in-place tampering of committed rows. Pair with the tip-attestation workflow (cron + external signed store, e.g. S3 Object Lock) for full SOC 2 Type II deletion-detection.
Limitations (documented in src/security/audit-chain.ts + docs/ops/audit-integrity.md):
- created_at is NOT in the hash — timestamp rewrites are not detected by the chain alone.
- Deletion of recent rows is indistinguishable from legitimate sweeper retention without external tip-attestation.
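The chaining rule row_hash = SHA-256(prev_hash || canonical(row_fields)) can be illustrated with a small in-memory sketch. This is not the code in src/security/audit-chain.ts: the AuditRow shape, the all-zero genesis hash, and the sorted-key JSON canonical form are assumptions made for the example.

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape: only prev_hash and row_hash are named in this README.
interface AuditRow {
  fields: Record<string, string | number | null>; // hashed columns (created_at excluded)
  prev_hash: string;
  row_hash: string;
}

const GENESIS = "0".repeat(64); // assumed prev_hash of the very first row

// Assumed canonical form: key/value pairs sorted by key, so key order never changes the hash.
function canonical(fields: AuditRow["fields"]): string {
  const sorted = Object.keys(fields).sort().map((k) => [k, fields[k]]);
  return JSON.stringify(sorted);
}

function rowHash(prevHash: string, fields: AuditRow["fields"]): string {
  return createHash("sha256").update(prevHash + canonical(fields)).digest("hex");
}

// Appends a row whose prev_hash is the row_hash of the current tip.
function appendRow(chain: AuditRow[], fields: AuditRow["fields"]): AuditRow {
  const prev_hash = chain.length ? chain[chain.length - 1].row_hash : GENESIS;
  const row: AuditRow = { fields, prev_hash, row_hash: rowHash(prev_hash, fields) };
  chain.push(row);
  return row;
}

// Returns the index of the first broken row, or -1 when the chain is intact.
function verifyChain(chain: AuditRow[]): number {
  let expectedPrev = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const row = chain[i];
    if (row.prev_hash !== expectedPrev) return i;
    if (row.row_hash !== rowHash(row.prev_hash, row.fields)) return i;
    expectedPrev = row.row_hash;
  }
  return -1;
}
```

Editing any hashed field of a committed row makes verifyChain return that row's index. Removing the newest row with chain.pop() still verifies, which is the deletion gap that external tip-attestation is meant to close.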
Documentation
| Doc | Topic |
|-----|-------|
| docs/onboarding-self-host.md | Zero-to-first-signin walkthrough |
| docs/idp-providers.md | Per-provider setup (GitHub OAuth App, GitHub App, Google, OIDC, Azure AD) |
| docs/openapi.yaml | OpenAPI 3.1, 17 endpoints |
| docs/security/threat-model.md | STRIDE per asset, 10 residual risks |
| docs/ops/upgrade-phase1-to-phase2.md | Phase 1 → Phase 2 migration |
| docs/ops/key-rotation.md | JWT_SECRET rotation procedure |
| docs/ops/auto-rotation.md | Automation around rotate-jwt-secret (systemd / k8s CronJob) |
| docs/ops/audit-integrity.md | Audit chain runbook + tip-attestation workflow |
| docs/ops/backup-restore.md | Litestream + NR12 reconciliation |
| docs/gdpr.md | GDPR Art. 17 procedures |
| sdk/README.md | TypeScript SDK reference |
Historical: v0.6.x → v0.7.0 migration
v0.7.0 reworked the auth foundation: the schema gained org_id everywhere, JWTs gained user_id/org claims, and MQTT topics gained an org prefix. The migration runs on first boot of v0.7.0 (PRAGMA user_version guard).
Manual JWT-secret rotation (v0.7.0 era, still works in v0.8+):
export COORDINATOR_JWT_SECRET=new-secret-here
export COORDINATOR_JWT_PREV_SECRET=old-secret-here
# Restart coordinator. Wait one JWT TTL (24h default).
# Then remove COORDINATOR_JWT_PREV_SECRET and restart again.
The v0.9.2 rotate-jwt-secret CLI automates the new-secret generation step of this procedure.
Anthropic Quota Pre-flight
The coordinator tracks Anthropic workspace quota live and exposes it on MQTT, the dashboard, and the coordinator_status MCP tool — so MCP clients can decide whether to abort, throttle, or proceed before launching expensive turns.
- Reads usage from the Anthropic API using the key in the environment.
- Threshold via MAX_QUOTA_PCT env var (default 95).
- Back-off when the usage endpoint itself returns 429.
- Live widget in the dashboard with manual refresh + historical buckets.
- coordinator/quota/update MQTT events stream into the timeline by default.
Orchestrators that spawn N agents at once can read coordinator_status.quota and abort their run if utilization is over a configured threshold — the essaim reference orchestrator does exactly this.
Token Observability
Every MCP tool call and agent turn is logged with token breakdown.
- Logs — component logger tokens emits input_tokens, output_tokens, cache_read, cache_creation, thinking, model id, turn index.
- Dashboard — live per-agent token gauge, cumulative session total, quota widget.
Aggregating across runs (e.g., reports/YYYY-MM-DD-<run-id>.md) is an orchestrator responsibility — the coordinator emits the events, the orchestrator consumes them.
Dashboard
http://localhost:3100/dashboard (or /dashboard on whichever port the coordinator is bound to).
- Timeline — all threads + quota_update events with scores and resolution types
- Agent panel — online/offline, working/idle/waiting, current file, thread being waited on. Resizable drag handle.
- Scoring breakdown — which detection layer triggered each conflict
- Quota widget — live utilization %, stacked buckets, manual refresh button
- Version banner — server version shown in the header (dynamic, not hardcoded)
- Consensus metrics — per session: consensus / timeout / auto-resolved split, token totals
All events arrive via SSE on /api/events. No polling.
Agent Activity States
| Status | Indicator | Meaning |
|--------|-----------|---------|
| working | pulsing blue | Actively editing files |
| idle | solid green | Online, no recent activity |
| waiting | pulsing yellow | Blocked on a consultation thread |
| offline | solid red | Disconnected or session ended |
Activity is derived from heartbeats enriched with the current file/thread context from the file tracker.
Configuration
Local data
~/.mcp-coordinator/
├── config.json          # persistent configuration
├── data/
│   └── coordinator.db   # SQLite database
├── server.pid           # PID file (when daemonized)
└── logs/
    └── server.log       # daemon logs
config.json
{
"server": { "port": 3100, "data_dir": "~/.mcp-coordinator/data" },
"defaults": { "coordinator_url": "http://localhost:3100" }
}
Resolution priority (highest to lowest): CLI flag → env var → config.json → default.
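The resolution order can be pictured as "first defined value wins". The sketch below is illustrative only: resolveSetting and resolvePort are hypothetical helpers, and the CLI port argument stands in for whatever flag the CLI actually exposes. Only the PORT env var, the server.port key, and the 3100 default come from this README.

```typescript
// undefined means "this source does not set the value".
type Source<T> = T | undefined;

// Returns the first defined value in priority order: CLI flag, env var, config.json, default.
function resolveSetting<T>(cli: Source<T>, env: Source<T>, file: Source<T>, fallback: T): T {
  for (const candidate of [cli, env, file]) {
    if (candidate !== undefined) return candidate;
  }
  return fallback;
}

// Hypothetical wiring for the HTTP port: CLI value, PORT env var, server.port in config.json.
function resolvePort(cliPort?: number, envPort?: string, filePort?: number): number {
  // An empty env var is treated as unset (an assumption of this sketch).
  const parsedEnv = envPort !== undefined && envPort !== "" ? Number(envPort) : undefined;
  return resolveSetting(cliPort, parsedEnv, filePort, 3100);
}
```

With PORT=4000 in the environment and "port": 3200 in config.json, resolvePort(undefined, "4000", 3200) yields 4000; passing a CLI value overrides both.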
Server env vars
| Variable | Default | Description |
|----------|---------|-------------|
| PORT | 3100 | HTTP port (also serves MQTT-over-WebSocket on /mqtt) |
| COORDINATOR_DATA_DIR | ~/.mcp-coordinator/data | Directory for the SQLite database |
| COORDINATOR_MQTT_TCP_PORT | 1883 | TCP port for the embedded broker |
| COORDINATOR_MQTT_WS_PATH | /mqtt | WebSocket path on the same HTTP port |
| LOG_LEVEL | info | debug / info / warn / error |
| NODE_ENV | — | development for pretty logs |
| COORDINATOR_AUTH_ENABLED | false | Enable JWT authentication |
| COORDINATOR_JWT_SECRET | — | HMAC signing key (min 32 chars) |
| COORDINATOR_JWT_EXPIRY | 24h | Token lifetime (e.g., 1h, 7d) |
| COORDINATOR_REGISTRATION_SECRET | — | Shared secret for agent auto-register |
| COORDINATOR_ADMIN_SECRET | — | Separate secret for admin token creation |
| MAX_QUOTA_PCT | 95 | Pre-flight abort threshold for Anthropic quota |
Environment variables (v0.5+)
| Variable | Default | Effect |
|---|---|---|
| COORDINATOR_REPO_ROOT | (unset → team-mode) | Repo root for path-guard, FS fallback, Layer 4 |
| COORDINATOR_MAX_BODY_BYTES | 1048576 | parseBody hard cap |
| COORDINATOR_LAYER4_DENYLIST | (uses defaults) | Comma-separated globs appended to denylist |
| COORDINATOR_LAYER4_SINCE_DAYS | 7 | git log --since window |
| COORDINATOR_LAYER4_MAX_COMMITS | 2000 | git log --max-count |
| COORDINATOR_LAYER4_REFRESH_INTERVAL_MS | 1800000 | Refresh on success |
| COORDINATOR_LAYER4_RETRY_MS | 300000 | Retry on timeout |
| COORDINATOR_WORKING_FILES_TTL_MIN | 30 | working_files claim TTL |
| COORDINATOR_WORKING_FILES_SWEEP_INTERVAL_MS | 60000 | TTL sweeper tick |
Phase 2 OAuth env vars (v0.8.0+)
All Phase 2 vars are inert when COORDINATOR_OAUTH_ENABLED is unset. Enabling OAuth requires every variable in the Required table below; everything else is opt-in. The canonical source of truth is .env.example (annotated, copy-pasteable).
Required when COORDINATOR_OAUTH_ENABLED=true (the flag itself plus 5 vars):
| Variable | Effect |
|----------|--------|
| COORDINATOR_OAUTH_ENABLED | Master feature flag. Set to true to enable Phase 2. |
| COORDINATOR_JWT_SECRET | HS256 signing key for issued tokens. Min 128 bits of entropy (rejected at boot otherwise). Generate via mcp-coordinator rotate-jwt-secret --format secret-only. |
| COORDINATOR_PUBLIC_URL | Externally-visible URL of the coordinator (https://...). Used to build OAuth redirect_uri. Must be an http(s) URL; non-localhost hosts must use https unless COORDINATOR_INSECURE_COOKIES=true. |
| COORDINATOR_GITHUB_CLIENT_ID | GitHub OAuth App client ID. |
| COORDINATOR_GITHUB_CLIENT_SECRET | GitHub OAuth App client secret. |
| COORDINATOR_GITHUB_ORG | Allowlisted GitHub org slug; bootstrapped into orgs.allowlist_github_org. |
Optional v0.8 hardening / GHES:
| Variable | Effect |
|----------|--------|
| COORDINATOR_GITHUB_AUTH_BASE_URL | GHES OAuth host override (default https://github.com). |
| COORDINATOR_GITHUB_API_BASE_URL | GHES REST API host override (default https://api.github.com). |
| COORDINATOR_INSECURE_COOKIES | true to drop Secure cookie flag for HTTP non-localhost (NOT recommended for prod). |
| COORDINATOR_ALLOW_RESTORE | true after restoring from backup (NR12) to authorize token_epoch global bump. Unset after first boot. |
| COORDINATOR_JWT_SECRET_PREV | Previous JWT signing secret for rotation overlap (v0.8.1). Verify-only under kid hs256-v0. |
| COORDINATOR_JWT_SECRET_PREV_ROTATED_AT | ISO timestamp recorded in the config.key_rotation Tier 1 audit. |
| COORDINATOR_JWT_ACCESS_TTL | Access token TTL. Default 15m, max 60m. |
| COORDINATOR_JWT_REFRESH_TTL | Refresh token TTL. Default 30d, max 90d. |
| COORDINATOR_AUTO_PROVISION | false to require admin pre-creation of users before first sign-in. Default true. |
| COORDINATOR_METRICS_BEARER | Bearer token gating /metrics/auth for remote Prometheus. Unset = localhost-only. |
Multi-IdP — Google (v0.9.0+, v0.10.2 hd allowlist):
| Variable | Effect |
|----------|--------|
| COORDINATOR_GOOGLE_CLIENT_ID | Google OAuth client ID. Enables Google sign-in. Match against Workspace hd claim in orgs.allowlist_idp_org_id. |
| COORDINATOR_GOOGLE_CLIENT_SECRET | Google OAuth client secret. Both required together; fail-closed at boot. |
Multi-IdP — generic OIDC (v0.9.0+, v0.10.1 nonce, v0.10.4 groups):
| Variable | Effect |
|----------|--------|
| COORDINATOR_OIDC_ISSUER_URL | Issuer URL; drives discovery via /.well-known/openid-configuration. |
| COORDINATOR_OIDC_CLIENT_ID | OIDC client ID. |
| COORDINATOR_OIDC_CLIENT_SECRET | OIDC client secret. All three required together. |
| COORDINATOR_OIDC_GROUPS_CLAIM | (v0.10.4, optional) Dot-notation path to groups in id_token. Common: groups, realm_access.roles, roles. When set, switches allowlist strategy to id_token_groups. Without it, OIDC stays deny-by-default. |
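As an illustration of how a dot-notation value such as realm_access.roles could be resolved against decoded id_token claims, here is a minimal sketch. extractGroups and isAllowed are hypothetical names, not the coordinator's API, and the exact-match comparison against a single allowlisted group is an assumption.

```typescript
// Walks a dot-notation path (e.g. "realm_access.roles") through decoded id_token claims.
// Returns the string entries found there, or an empty array when the path is missing or malformed.
function extractGroups(claims: Record<string, unknown>, path: string): string[] {
  let node: unknown = claims;
  for (const key of path.split(".")) {
    if (typeof node !== "object" || node === null || Array.isArray(node)) return [];
    node = (node as Record<string, unknown>)[key];
  }
  if (!Array.isArray(node)) return [];
  return node.filter((g): g is string => typeof g === "string");
}

// Deny by default: access requires an exact match against the allowlisted group.
function isAllowed(claims: Record<string, unknown>, path: string, allowedGroup: string): boolean {
  return extractGroups(claims, path).includes(allowedGroup);
}
```

For a Keycloak-style token with { realm_access: { roles: ["engineers", "ops"] } }, the path realm_access.roles and the allowlisted group engineers would pass; a token without that path yields an empty list and is denied.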
Multi-IdP — GitHub App (v0.10.0+, v0.10.3 installation allowlist):
| Variable | Effect |
|----------|--------|
| COORDINATOR_GITHUB_APP_CLIENT_ID | GitHub App client ID (usually Iv1. prefix). Sibling to OAuth App; both can run simultaneously. |
| COORDINATOR_GITHUB_APP_CLIENT_SECRET | GitHub App client secret. |
| COORDINATOR_GITHUB_APP_NAME | (optional) Registry key for picker UI; default github-app. |
| COORDINATOR_GITHUB_APP_ALLOWLIST_SOURCE | (v0.10.3, optional) user_orgs (default) or user_installations. The latter drives allowlist off App installation footprint instead of user's GitHub-org memberships. |
Structured Logging
Pino emits JSON per subsystem. Component loggers: http, mcp, mqtt, consultation, conflict, auth, tokens, quota.
Production (default):
{"level":"info","time":1712345678901,"component":"http","msg":"Server started","port":3100}
Dev (NODE_ENV=development):
[14:21:03.456] INFO (http): Server started
    port: 3100
Levels controlled via LOG_LEVEL.
SDK
A TypeScript reference client lives in sdk/ (not yet published to npm). Install via npm install file:./sdk from a consumer project.
import { McpCoordinatorClient, FileTokenStore, ProactiveRefresh } from "@mcp-coordinator/sdk-js";
const client = new McpCoordinatorClient({
  baseUrl: "https://coordinator.example.com",
  store: new FileTokenStore(),
  refreshStrategy: new ProactiveRefresh(),
  refreshLockPath: process.env.HOME + "/.mcp-coordinator/refresh.lock",
});
await client.loadFromStore();
const me = await client.whoami();
See sdk/README.md for the full API.
Test Results
All four coordination scenarios are validated end-to-end by the test suite:
| Scenario | Layer | Score | Category | Outcome |
|----------|-------|------:|----------|---------|
| S1 — Same file | 0a | 100 | concerned | Thread opened → consensus |
| S2 — Same module | 3 | 30 | gray_zone | Auto-resolved, introspection |
| S3 — Dependency | 0b | 80 | gray_zone | Auto-resolved, introspection |
| S4 — No overlap | — | 0 | pass | Auto-resolved immediately |
Performance:
| Component | Time |
|-----------|------|
| Conflict detection (no LLM) | < 5 ms |
| MQTT push delivery | < 50 ms end-to-end |
| Full consultation cycle (S1) | 30–45 s |
Integration patterns
Any MCP client
Connect to http://localhost:3100/mcp (HTTP/SSE) or stdio. The server speaks MCP 2024-11-05.
Custom orchestrator
Spawn agents that connect to the MQTT broker and register via the MCP register_agent tool. The orchestrator decides spawn count, lifecycle, and quota gating; the coordinator handles the protocol. See essaim for a reference implementation, or write your own.
Reference catalog of coordinator-aware behaviors
The behaviors that make agents announce-before-write, resolve conflicts, and participate in work-stealing are YAML configs assembled by @swoofer/promptweave. See essaim's behaviors for a curated catalog.
Release history & Roadmap
Per-version detail for v0.5.0 → v0.10.5 lives below. The Capabilities at a glance matrix at the top of this README is the current-state summary.
What's New in v0.8.0 (Phase 2 OAuth)
Released 2026-05-14. Feature-flagged behind COORDINATOR_OAUTH_ENABLED=true. Phase 1 deployments are byte-identical when the flag is unset.
A. OAuth 2.1 + device flow (RFC 6749 + RFC 8628)
- GET /auth/login initiates the browser flow with PKCE S256 + HMAC-bound state cookie
- GET /api/auth/oauth/callback performs state CAS + provider mix-up defense + IdP code exchange + user provisioning inside a transaction
- POST /api/auth/oauth/token unified grant endpoint (authorization_code, refresh_token, device_code per RFC 8628)
- POST /api/auth/oauth/device_authorization for CLI / TV / IoT device flow
- 5 HTML pages: /auth/login, /auth/device, /auth/device/confirm, /auth/device/approve POST, /auth/success
- See docs/openapi.yaml for the full API spec
B. Refresh-token rotation with stolen-token detection (V3 §B-NEW-2)
- Each refresh issues a new family member (family_id) with parent_jti lineage
- 10-second grace window allows legitimate retries with matching fingerprint
- Mismatched fingerprint within grace → atomic replay_count++; family revoked at threshold 3
- Hard reuse (rotation > 10s old) → immediate family revoke + auth.refresh.chain_revoked Tier 1 audit
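The reuse rules above can be read as a small decision function for the case where an already-rotated refresh token is presented again. The sketch below is an interpretation, not the coordinator's implementation: the RotatedToken and Family shapes are hypothetical, the threshold is assumed to trigger when replay_count reaches 3, and the database atomicity is left out.

```typescript
type ReuseDecision = "allow_retry" | "count_replay" | "revoke_family";

const GRACE_MS = 10_000;    // 10-second grace window
const REPLAY_THRESHOLD = 3; // assumed: revoke once replay_count reaches 3

// Hypothetical shapes for illustration.
interface RotatedToken {
  rotatedAtMs: number; // when this token was exchanged for its successor
  fingerprint: string; // client fingerprint recorded at rotation
}
interface Family {
  replayCount: number;
  revoked: boolean;
}

// Decides what happens when an already-rotated refresh token is presented again.
function onReuse(token: RotatedToken, family: Family, fingerprint: string, nowMs: number): ReuseDecision {
  if (family.revoked) return "revoke_family";
  if (nowMs - token.rotatedAtMs > GRACE_MS) {
    family.revoked = true; // hard reuse: outside the grace window
    return "revoke_family";
  }
  if (fingerprint === token.fingerprint) return "allow_retry"; // legitimate retry
  family.replayCount += 1; // mismatched fingerprint inside the grace window
  if (family.replayCount >= REPLAY_THRESHOLD) {
    family.revoked = true;
    return "revoke_family";
  }
  return "count_replay";
}
```

A retry 5 s after rotation with the same fingerprint is allowed; the same request with a different fingerprint is counted, and the third such mismatch revokes the whole family.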
C. Cookie sessions (Scenario 5)
- __Host-coordinator_session cookie + __Host-coordinator_csrf (double-submit pattern)
- authenticateRequest now handles 5 scenarios: legacy fallback, no-auth, v0.6 legacy JWT reject, Bearer JWT, cookie session
- POST /api/auth/logout (revoke current refresh), /logout-all (bump token_epoch → all sessions invalidated instantly), /revoke (RFC 7009)
- GET /api/auth/me userinfo helper
D. Service tokens (V4 §5.5)
- Admin-issued long-lived JWTs for CI/CD: mcp-coordinator service-token issue --user X --org Y --scope read --ttl 30d --reason "..."
- 90d hardcoded TTL ceiling; reason ≥10 chars required; admin-only issuance
- family_id format service:<uuid> distinguishes from user refresh families
- DB-lookup verification on every request (admin force-revoke is immediate)
- list + revoke CLI verbs; auth.service_token.{issued,revoked,used} audit events
E. Audit pipeline (two-tier durability per V3 NR13)
- 35 audit event types catalogued in src/security/audit-events.ts
- Tier 1 (sync direct INSERT, never drop): security-critical events (refresh.chain_revoked, login.locked, token.revoked, admin.bootstrapped, ...)
- Tier 2 (async batched queue, may drop under pressure): high-volume operational (login.success, refresh.rotated, device.code_issued, ...)
- AsyncLocalStorage-based actor + request_id propagation — no explicit threading
- Audit queue capacity 10K with backpressure → auth.shutdown.audit_loss row on drop
F. Operational
- bootPhase2 composes ServerContext at boot: env validation, HKDF key derivation, restore detection (NR12), feature-flag gate
- Sweeper prunes 6 tables on 60s cadence (oauth_state, device_auth_requests, refresh_tokens × 2 retention buckets, audit_log × 2 tiers) with adaptive chained passes + 5-failure circuit breaker
- Rate limiter + login lockout (5 failures / 15min → 15min lockout per V3 §B-NEW-8) in-memory token bucket
- IdP membership cache with 60s positive TTL + 10min stale-on-error window (V3 §B-NEW-5)
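To make the lockout rule concrete (5 failures within 15 minutes lead to a 15-minute lockout), here is a simplified in-memory sketch. It uses a sliding window of failure timestamps rather than the token bucket the coordinator is described as using, and the LoginLockout class and its key (for example an account name or IP) are hypothetical.

```typescript
const MAX_FAILURES = 5;
const WINDOW_MS = 15 * 60 * 1000;  // failures are counted over 15 minutes
const LOCKOUT_MS = 15 * 60 * 1000; // a lockout lasts 15 minutes

// Hypothetical helper: sliding-window failure counter, not the coordinator's token bucket.
class LoginLockout {
  private failures = new Map<string, number[]>();  // key -> recent failure timestamps
  private lockedUntil = new Map<string, number>(); // key -> lockout expiry

  isLocked(key: string, nowMs: number): boolean {
    const until = this.lockedUntil.get(key);
    return until !== undefined && nowMs < until;
  }

  recordFailure(key: string, nowMs: number): void {
    // Keep only failures that are still inside the counting window.
    const recent = (this.failures.get(key) ?? []).filter((t) => nowMs - t < WINDOW_MS);
    recent.push(nowMs);
    if (recent.length >= MAX_FAILURES) {
      this.lockedUntil.set(key, nowMs + LOCKOUT_MS);
      this.failures.delete(key);
    } else {
      this.failures.set(key, recent);
    }
  }

  recordSuccess(key: string): void {
    this.failures.delete(key); // a successful login clears the failure history
  }
}
```

A caller would check isLocked before attempting authentication and call recordFailure or recordSuccess afterwards. Timestamps are passed in explicitly so the behaviour is easy to test.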
G. Observability
- 29 new Prometheus metrics in src/observability/metrics.ts (auth activity, refresh chain, device flow, IdP, audit queue, sweeper, rate limit, request duration)
- /metrics/auth Prometheus scrape endpoint (localhost-only + optional Bearer auth)
- /healthz (liveness) + /health/ready (readiness — DB + sweeper circuit + audit queue depth + draining flag)
- Pino logger with 16 redact paths per V4 §11.3
- Grafana dashboard JSON (docs/ops/dashboards/coordinator.json) + Prometheus alert rules YAML (docs/ops/alerts/coordinator-alerts.yaml)
H. CLI
- mcp-coordinator init phase2 interactive wizard for first-time Phase 2 setup
- mcp-coordinator doctor --phase2 runs 8 Phase 2 health probes (DB schema version, JWT_SECRET entropy, discovery doc reachability, etc.)
- mcp-coordinator service-token {issue,list,revoke} for admin token management
I. Testing
- 1555 tests passing across 116 test files (up from 392 at v0.5.0)
- 100% branch coverage enforced via vitest per-file thresholds on every security-critical module (csrf, token-epoch, oauth-state, jwt-mint, membership-cache, refresh-rotation, service-tokens, github provider, etc.)
- Playwright E2E suite (tests/e2e/) — 5 scenarios in ~12s, zero flakes over 5 runs
- D1-D10 cross-cutting test matrix (tests/integration/d1-d10-matrix.test.ts) — 20 cases proving component-interaction seams
- Phase 1 backcompat suite (tests/backcompat/) — 31 cases proving COORDINATOR_OAUTH_ENABLED unset = byte-identical Phase 1
- Cross-tenant isolation suite — 22 cases proving multi-tenant data boundary
J. SDK (sdk/)
- TypeScript client @mcp-coordinator/sdk-js in repo workspace (not yet published)
- McpCoordinatorClient with verbs: whoami, logout, logoutAll, revoke, refresh, deviceCodeStart, deviceCodePoll
- 14 typed error subclasses mapping to the OpenAPI error envelope
- FileTokenStore persists tokens to ~/.mcp-coordinator/tokens.json with chmod 0600 (POSIX) + atomic write-rename
- ProactiveRefresh schedules refresh at accessExpiresAt - 120s ± 30s jitter
- Single-flight refresh lock via atomic O_EXCL file lock (multi-process CLI safety)
- See sdk/README.md for usage
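The ProactiveRefresh timing (accessExpiresAt - 120s ± 30s jitter) can be sketched as a pure delay computation. refreshDelayMs is a hypothetical helper, not the SDK's API, and the uniform jitter distribution and the clamp to zero are assumptions.

```typescript
const LEAD_MS = 120_000;  // refresh 120 s before the access token expires
const JITTER_MS = 30_000; // spread refreshes by up to ±30 s

// Returns how long to wait before refreshing.
// `random` is injectable so the jitter can be tested deterministically.
function refreshDelayMs(
  accessExpiresAtMs: number,
  nowMs: number,
  random: () => number = Math.random,
): number {
  const jitter = (random() * 2 - 1) * JITTER_MS; // assumed uniform in [-30 s, +30 s)
  const target = accessExpiresAtMs - LEAD_MS + jitter;
  return Math.max(0, Math.round(target - nowMs)); // never schedule in the past
}
```

For a 15-minute access token issued now, the refresh lands somewhere between 12.5 and 13.5 minutes from now. The jitter keeps several CLI processes that share a token from refreshing at the same instant, which complements the single-flight file lock.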
Documentation
23 new doc files under docs/ and examples/:
- Operator: docs/onboarding-self-host.md, docs/ops/upgrade-phase1-to-phase2.md, 8 ops runbooks
- Security: docs/security/threat-model.md, 3 incident runbooks, SECURITY.md, .well-known/security.txt
- Compliance: docs/gdpr.md, docs/idp-providers.md
- API: docs/openapi.yaml (OpenAPI 3.1, 17 endpoints, 13 schemas)
- Examples: examples/{docker-compose,nginx-reverse-proxy,ghes-config,custom-idp-provider}/
v0.8.1 follow-up (2026-05-15)
- JWT key rotation overlap (prev-secret support per docs/ops/key-rotation.md)
- GHES env vars wiring (COORDINATOR_GITHUB_AUTH_BASE_URL + _API_BASE_URL)
What's New in v0.9.0 (Multi-IdP)
Released 2026-05-15. Single-provider GitHub-only deployments stay behaviour-compatible — every change is opt-in via new env vars or a no-op when only one IdP is registered.
A. Provider registry
- ProviderRegistry class attached to AuthHandlerContext.providers (T45). First registration becomes the implicit default.
- Every OAuth handler resolves the IdP through ctx.providers.get(...) rather than the removed ctx.githubProvider alias (T46). Refresh-rotation reads users.idp_provider so multi-provider users get re-validated against the IdP they actually signed in with.
B. Google OAuth / OIDC
- First-class GoogleProvider (T47) with mandatory id_token signature verification: jose createRemoteJWKSet + RS256 + iss=https://accounts.google.com + aud=client_id.
- Identity claims read straight from the verified id_token (no extra /userinfo round-trip).
- Workspace hd claim surfaces as idp_org_id for hd-based allowlist deployments.
- Opt-in via COORDINATOR_GOOGLE_CLIENT_ID + COORDINATOR_GOOGLE_CLIENT_SECRET (both required or neither — fail-closed at boot).
C. Generic OpenID Connect
- OIDCProvider (T48) for Okta / Auth0 / Azure AD / Keycloak / Authentik / any conformant OIDC issuer.
- Auto-discovers authorization_endpoint, token_endpoint, and jwks_uri from <issuer>/.well-known/openid-configuration.
- Discovery doc's own issuer field is cross-checked against config — catches redirect attacks on the discovery URL.
- Email-claim fallback chain: email → preferred_username → sub (OIDC core makes email optional).
- Opt-in via COORDINATOR_OIDC_ISSUER_URL + COORDINATOR_OIDC_CLIENT_ID + COORDINATOR_OIDC_CLIENT_SECRET (all three required together).
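The email-claim fallback chain amounts to picking the first usable claim in a fixed order. A minimal sketch follows; resolveEmailClaim is a hypothetical name, and treating blank strings as missing is an assumption of the example rather than documented behaviour.

```typescript
interface IdTokenClaims {
  sub: string; // required by OIDC core, so it is the last resort
  email?: string;
  preferred_username?: string;
}

// Picks the first non-empty claim in the order email -> preferred_username -> sub.
function resolveEmailClaim(claims: IdTokenClaims): string {
  for (const value of [claims.email, claims.preferred_username]) {
    if (typeof value === "string" && value.trim() !== "") return value;
  }
  return claims.sub;
}
```

An issuer that omits email but sends preferred_username resolves to the username; a token carrying only sub resolves to the subject identifier.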
D. Login picker UI
- GET /auth/login renders an HTML picker when ctx.providers.size() > 1 (T49). Each button is a top-level GET to /auth/login?provider=<name>; the underlying PKCE + state-cookie + 302 flow is unchanged.
- Friendly built-in labels for github / google / oidc; title-cased fallback for custom provider names.
- Unknown ?provider=X → 400 UNKNOWN_PROVIDER (no silent fallback to the default).
- Single-provider deployments skip the picker entirely.
Breaking changes (internal embedding APIs)
| Surface | Change | Migration |
|---------|--------|-----------|
| AuthHandlerContext.githubProvider | Removed | Use ctx.providers.get("github") or ctx.providers.getDefault() |
| IdPProvider.buildAuthUrl return type | string → string \| Promise<string> | await the result; built-in providers stay synchronous |
| provisionUser(...) | Required 6th arg providerName: string | Pass "github" for existing call sites; the resolved provider.name for new ones |
| auth.state.mixup audit metadata | { expected_provider: "github" } → { registered_providers: string[] } | Log-pipeline consumers parsing expected_provider need to update |
Testing
- 1623 tests passing (+61 vs v0.8.1). 100% branch coverage on auth/providers/{registry,github,google,oidc}.ts.
- 16 GoogleProvider tests covering happy path, id_token verification (wrong issuer / audience / expired / unknown kid), JWKS unreachable transient errors, token-endpoint 401 / 502 / 4xx mapping.
- 21 OIDCProvider tests covering discovery-URL validation, issuer cross-check, the same id_token verification matrix, and email-claim fallback chain.
- 8 login-picker rendering tests + 6 picker integration tests (1 vs N provider behaviour, unknown-name 400, rate-limit, state row provider field).
v0.9.1 follow-up (2026-05-15)
Audit log tamper-evidence. New prev_hash + row_hash columns on audit_log build a SHA-256 chain over every row written via audit(). scripts/verify-audit-chain.ts walks the chain and reports tampering; docs/ops/audit-integrity.md is the SOC 2 Type II operator runbook covering the external tip-attestation workflow that closes the deletion-detection gap.
v0.9.2 follow-up (2026-05-15)
mcp-coordinator rotate-jwt-secret CLI helper generates a fresh signing secret with entropy validation + prints the operator workflow. docs/ops/auto-rotation.md covers systemd-timer + Vault and Kubernetes CronJob automation patterns around the helper.
What's New in v0.10.0 (GitHub App)
Released 2026-05-16. Adds a GitHubAppProvider sibling to the existing OAuth App GitHubProvider, with built-in user-to-server token refresh handling. Existing OAuth App and Google / OIDC deployments stay behaviour-compatible; the new provider is opt-in via env vars.
Why GitHub App on top of OAuth App?
- Fine-grained permissions -- GitHub Apps declare per-resource permissions, OAuth App scopes are coarser
- Installation isolation -- the App's footprint IS the allowlist; uninstalling the App from an org is an immediate hard revoke
- Short-lived user-to-server tokens -- 8h TTL with auto-rotating refresh tokens vs OAuth App's effectively permanent tokens
Configure
export COORDINATOR_GITHUB_APP_CLIENT_ID=Iv1.0123456789abcdef
export COORDINATOR_GITHUB_APP_CLIENT_SECRET=<from-app-settings>
# Optional: registry key (default "github-app")
export COORDINATOR_GITHUB_APP_NAME=acme-app
Both _ID + _SECRET are required together; partial config fails closed at boot. Co-exists with COORDINATOR_GITHUB_CLIENT_ID + _SECRET (OAuth App) -- the picker UI on /auth/login shows both entry points when both providers are registered. See docs/idp-providers.md for the full setup walkthrough.
Refresh-token recovery
On IdPTokenRevoked from /user/orgs at refresh-rotation time, the coordinator calls GitHubAppProvider.refreshIdpToken(refresh_token) to mint a fresh access token + rotated refresh token, persists both, and retries the membership check. A Tier 2 auth.idp.token_refreshed audit row captures the recovery. If the refresh also fails, the existing Tier 1 auth.idp.token_revoked + 401 path applies.
Out of scope for v0.10.0
- App-as-itself installation token flow for membership queries (v0.10.x exploration; requires PEM private key provisioning)
- Webhook-driven membership cache invalidation (v1.0)
- IdP refresh-token replay detection (the coordinator's reuse logic covers ITS OWN refresh family only)
Testing
- 1700 tests passing (+45 vs v0.9.2)
- 19 GitHubAppProvider unit tests, 5 refresh-rotation recovery tests, 7 boot wiring tests, plus the shared HTTP transport refactor exercised by the 35 existing OAuth App tests
