@permissionbrick/auto-review-mcp
v0.1.4
Local MCP server that orchestrates an auto-review loop between a developer agent and a reviewer agent.
auto-review-mcp
A local MCP server that orchestrates a continuous review loop between two coding agents: one developer and one reviewer. They both connect to this one long-running server at the same time and hand work back and forth through it — no human in the middle of the loop.
The main use case is to split development and review between two different agents (preferably on two different harnesses, models, and subscriptions), so that the developer and the reviewer do not share the same biases. For example: Claude Code (Opus 4.8) as developer and Codex (GPT 5.5) as reviewer, each running on its own subscription.
developer agent                  auto-review server                  reviewer agent
───────────────                  ──────────────────                  ──────────────
(works on a batch)
request_review(summary,  ──────▶ stages all changes, diffs vs HEAD
  commit_message)                registers the batch, BLOCKS dev
                                 wakes reviewer ──────────────────▶  get_next_review()
                                                                     (returns summary + full diff)
                                                                     reviews it…
                                 ◀─────────────────────────────────  submit_review(approved
                                 approved → git commit (dev's msg)     | changes_requested,
◀──────  {approved, commit_sha}  changes → forward the issue           issue, category)
         or {changes_requested, issue}
(continue / fix & resubmit)                                          get_next_review() …

The protocol is taught entirely through the MCP tool descriptions, so the two agents self-orchestrate from the tools alone.
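To make the developer side of the diagram concrete, here is a minimal TypeScript sketch of how a harness might branch on the reply to request_review. The reply names (approved, changes_requested, keep_waiting, commit_sha, issue, batch_id) come from this README; the status field, the exact JSON shape, and the DeveloperAction type are assumptions made for illustration, not the server's documented schema.

```typescript
// Hypothetical reply shapes. Only the names approved, changes_requested,
// keep_waiting, commit_sha, issue and batch_id appear in the README;
// the "status" discriminator is an assumption.
type ReviewReply =
  | { status: "approved"; commit_sha: string }
  | { status: "changes_requested"; issue: string }
  | { status: "keep_waiting"; batch_id: string };

// Hypothetical description of what the developer agent should do next.
type DeveloperAction =
  | { kind: "next_batch" }
  | { kind: "fix_and_resubmit"; issue: string }
  | { kind: "await_review"; batch_id: string };

// Map a reply to the next step that the developer prompt prescribes.
function nextDeveloperAction(reply: ReviewReply): DeveloperAction {
  switch (reply.status) {
    case "approved":
      // The server has already committed; the developer never commits.
      return { kind: "next_batch" };
    case "changes_requested":
      return { kind: "fix_and_resubmit", issue: reply.issue };
    case "keep_waiting":
      // Resume waiting with just the batch_id, no need to resend the summary.
      return { kind: "await_review", batch_id: reply.batch_id };
  }
}
```

In practice the agents follow the tool descriptions directly; the sketch only spells out the three branches a developer has to handle.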
Install
Nothing to install. Add this one block to each agent's MCP config — it's fetched and run on
demand via npx:
{
  "mcpServers": {
    "auto-review": {
      "command": "npx",
      "args": ["-y", "@permissionbrick/auto-review-mcp"],
      "env": { "AUTO_REVIEW_POLL_SECONDS": "240", "AUTO_REVIEW_WAIT_SECONDS": "600" },
      "timeout": 1800000
    }
  }
}

Run two agents (one developer and one reviewer) pointed at the same git repo, and
you're done. Per-client setup (Claude Code, Codex, HTTP) is in
Connect the two agents. Hacking on the server itself? Run git clone, then
npm install (which builds via the prepare script).
How it works
- One shared coordinator; the role is set by how you attach:
  - No role (the default) → …/both/mcp, which exposes all tools at once; each tool's description then tells the agent it must play one user-assigned role and use only that role's tools. Handy when you'd rather assign roles by prompt than maintain two configs. Pin a single role with --role / AUTO_REVIEW_ROLE to get just that role's tools, as before.
  - Alternative: define each agent with a strict role via the MCP server's --role parameter:
    - Developer agent → --role developer (tools: initialize_review_session, request_review, await_review, signal_complete, workflow_status)
    - Reviewer agent → --role reviewer (tools: get_next_review, submit_review, workflow_status)
- The developer names the repo at runtime. As its first step the developer agent calls initialize_review_session with the absolute path of the repo it's editing. The coordinator remembers it for its lifetime (or until called again), so there's nothing repo-specific to bake into config. (You can still pre-set it with --repo / AUTO_REVIEW_REPO if you prefer.)
- Two ways to attach (see Connect the two agents):
  - stdio (recommended): each agent's .mcp.json runs npx -y @permissionbrick/auto-review-mcp --role …, a thin proxy. The first proxy auto-starts a single shared background coordinator; both proxies forward to it. No server to start by hand. The shared state lives in the coordinator, not the agents.
  - HTTP: you start the coordinator yourself (auto-review-server) and point each agent at its URL (good for remote agents or sharing one coordinator across machines).
- Blocking handoffs. request_review blocks until the reviewer rules; get_next_review blocks until a batch arrives. Waiting is event-driven, so the actual handoff is instant. Internally the coordinator holds each HTTP long-poll only up to a bounded poll window (AUTO_REVIEW_POLL_SECONDS, ≤ ~270 s); the stdio proxy quietly loops over those polls and only surfaces keep_waiting to the agent after the wait window (AUTO_REVIEW_WAIT_SECONDS, default 600 s). A keep_waiting reply quotes the shell poll command as the preferred way to resume (a foreground shell process can wait far longer than any MCP call); the MCP alternatives are the lightweight await_review tool for the developer (just the batch_id, no re-sending the summary) and calling get_next_review again for the reviewer. This makes the wait effectively unbounded while surviving connection drops and client timeouts. See Tuning how long a call waits.
- The server owns the commits. On approval the server runs git add -A + git commit with the developer's commit message (plus a Reviewed-by: auto-review trailer). HEAD advances, so each review's diff is naturally just the new batch. The developer never commits.
- One batch at a time, identified by a batch_id. The diff shown to the reviewer is the full unified diff of the working tree vs HEAD (new/deleted files included).
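As an illustration of the commit step described above, the following TypeScript sketch builds a commit message that ends with the Reviewed-by: auto-review trailer. The function name withReviewTrailer is hypothetical, and separating the trailer from the body with a blank line is an assumption based on the usual git trailer layout; the README does not state how the server formats the message.

```typescript
// Hypothetical helper: append the review trailer to the developer's message.
// Assumption: the trailer forms its own final paragraph, which is where git
// expects trailers to live.
function withReviewTrailer(commitMessage: string): string {
  const trailer = "Reviewed-by: auto-review";
  // Drop trailing whitespace/newlines so exactly one blank line precedes the trailer.
  const body = commitMessage.replace(/\s+$/, "");
  return body + "\n\n" + trailer;
}
```

A message such as "Add parser" would then be committed as the original subject, a blank line, and the trailer line.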
Connect the two agents
Launch two Claude Code instances — one developer, one reviewer. Both must point at the same git repository the developer edits (it must be a git repo, not this server's directory).
Option A — node/stdio (recommended)
Each agent's .mcp.json runs the stdio proxy via npx; the first one auto-starts the shared
coordinator. Nothing to install or run by hand, and no repo path in config — the developer agent
declares it at runtime via initialize_review_session. Sample configs are in
configs/:
// configs/developer.mcp.json (reviewer.mcp.json is identical with --role reviewer)
{
  "mcpServers": {
    "auto-review": {
      "command": "npx",
      "args": ["-y", "@permissionbrick/auto-review-mcp", "--role", "developer"],
      "env": { "AUTO_REVIEW_POLL_SECONDS": "240", "AUTO_REVIEW_WAIT_SECONDS": "600" },
      "timeout": 1800000
    }
  }
}

Tip: drop the "--role", "developer" argument to give one agent every tool and assign its role in the prompt instead (it attaches to the combined /both endpoint). Keep --role to pin developer vs reviewer as above.
The shipped configs use a 240 s poll window (AUTO_REVIEW_POLL_SECONDS, the practical max per
single HTTP hold) and a 600 s agent-facing wait window (AUTO_REVIEW_WAIT_SECONDS): the proxy
loops over coordinator polls internally, so an idle agent is only re-prompted with keep_waiting
every ~10 min — and resuming costs just an await_review(batch_id) / get_next_review() call.
Handoffs are still instant. (See Tuning how long a call waits.)
# Developer instance (point at the config above, e.g. configs/developer.mcp.json)
claude --mcp-config configs/developer.mcp.json --strict-mcp-config
# Reviewer instance (separate terminal)
claude --mcp-config configs/reviewer.mcp.json --strict-mcp-config

--strict-mcp-config makes each instance load only that file. No launch-time env is needed: the
config's timeout field raises Claude Code's per-call cap by itself.
Proxy env / args (also accepted as --flags in args): AUTO_REVIEW_ROLE, AUTO_REVIEW_PORT
(default 8765), AUTO_REVIEW_HOST (default 127.0.0.1), AUTO_REVIEW_POLL_SECONDS (default
240, clamped to ≤ 270), AUTO_REVIEW_WAIT_SECONDS (default 600 — how long the proxy waits,
looping over coordinator polls, before returning keep_waiting to the agent),
AUTO_REVIEW_MAX_DIFF_BYTES (default 200000), and
optionally AUTO_REVIEW_REPO to pre-set the repo instead of using initialize_review_session. The
coordinator logs to ${TMPDIR}/auto-review-coordinator-<port>.log and keeps running after the
agents exit (kill it via that port if you want a clean reset — also needed to apply a changed poll
window, see below).
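The defaults and the clamp listed above can be summarized in a short TypeScript sketch. The variable names and default values are taken from this README; the function names, the ProxySettings shape, and the fallback behavior for unset or non-numeric values are assumptions, not the proxy's actual parsing code.

```typescript
// Hypothetical settings object; field names are illustrative.
interface ProxySettings {
  port: number;
  host: string;
  pollSeconds: number;
  waitSeconds: number;
  maxDiffBytes: number;
}

// Parse a numeric variable, falling back when it is unset, empty, or not a number.
function numberOr(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return isFinite(parsed) ? parsed : fallback;
}

// Resolve settings from an environment-like map using the documented defaults.
function resolveProxySettings(env: Record<string, string | undefined>): ProxySettings {
  return {
    port: numberOr(env["AUTO_REVIEW_PORT"], 8765),
    host: env["AUTO_REVIEW_HOST"] || "127.0.0.1",
    // Documented behavior: default 240, clamped to at most 270.
    pollSeconds: Math.min(numberOr(env["AUTO_REVIEW_POLL_SECONDS"], 240), 270),
    waitSeconds: numberOr(env["AUTO_REVIEW_WAIT_SECONDS"], 600),
    maxDiffBytes: numberOr(env["AUTO_REVIEW_MAX_DIFF_BYTES"], 200000),
  };
}
```

For example, an environment that sets AUTO_REVIEW_POLL_SECONDS to "1500" would resolve to a 270 s poll window under this sketch, while every other setting keeps its default.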
Option B — HTTP (manual start / remote agents)
Start the coordinator yourself and point each agent at its URL (configs:
configs/*.http.mcp.json):
# start the shared HTTP coordinator (no install needed via npx; or use the global bin)
npx -y -p @permissionbrick/auto-review-mcp auto-review-server --port 8765 --poll-seconds 240 # keep ≤ ~270; optional: --repo /path
claude --mcp-config configs/developer.http.mcp.json --strict-mcp-config
claude --mcp-config configs/reviewer.http.mcp.json --strict-mcp-config

The HTTP configs carry "timeout": 3600000, so no launch env is needed here either. Server flags
(env var in parens): --repo (AUTO_REVIEW_REPO, optional; otherwise the developer sets it via
initialize_review_session), --port (AUTO_REVIEW_PORT, 8765), --host (AUTO_REVIEW_HOST,
0.0.0.0, reachable as agent-vm.mshome.net), --poll-seconds (AUTO_REVIEW_POLL_SECONDS,
1500), --max-diff-bytes (AUTO_REVIEW_MAX_DIFF_BYTES, 200000). GET /healthz returns a JSON
snapshot of the workflow.
Tuning how long a call waits
get_next_review, request_review, and await_review block in one agent-visible call for up
to the wait window, then return keep_waiting; the agent resumes via the shell poll command
quoted in that reply, or by re-calling over MCP (await_review with the batch_id on the
developer side; get_next_review again on the reviewer side). Three knobs bound that call:
| Knob | Where | Effect |
|------|-------|--------|
| AUTO_REVIEW_WAIT_SECONDS | config env (stdio proxy only) | how long the proxy waits — looping over coordinator polls — before returning keep_waiting to the agent. Default 600. |
| AUTO_REVIEW_POLL_SECONDS | config env (stdio) / --poll-seconds (HTTP) | how long the coordinator holds one internal HTTP long-poll. Must stay under ~270 s (see below); the stdio proxy clamps it. Invisible to the agent in stdio mode. |
| timeout (ms) | per-server field in .mcp.json | Claude Code's per-call cap (default 60000). Must be ≥ wait window + poll window + margin (the shipped 1800000 covers the defaults comfortably). Overrides MCP_TOOL_TIMEOUT; not extended by progress. |
Handoffs themselves are event-driven (instant) regardless — the windows only set how long an idle waiter holds before re-polling.
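The rule in the table (timeout ≥ wait window + poll window + margin) is simple arithmetic, sketched here in TypeScript. The function names and the 60 s margin are assumptions chosen for illustration; the README does not specify a margin.

```typescript
// Smallest per-call timeout in milliseconds that satisfies
// timeout >= wait window + poll window + margin (all given in seconds).
function minimumTimeoutMs(waitSeconds: number, pollSeconds: number, marginSeconds: number): number {
  return (waitSeconds + pollSeconds + marginSeconds) * 1000;
}

// True when the configured timeout (ms) covers one full agent-visible wait.
function timeoutCoversWait(
  timeoutMs: number,
  waitSeconds: number,
  pollSeconds: number,
  marginSeconds: number
): boolean {
  return timeoutMs >= minimumTimeoutMs(waitSeconds, pollSeconds, marginSeconds);
}
```

With the shipped defaults (600 s wait, 240 s poll) and an assumed 60 s margin, the minimum is 900000 ms, so the shipped 1800000 passes while Claude Code's default of 60000 would not.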
⚠️ Hard ~5-minute ceiling on a single HTTP hold. Every MCP client here is Node-based (the stdio proxy and the poller CLI both use Node's fetch/undici, whose default headersTimeout is 300 s), so a long-poll held longer than ~5 min dies as fetch failed. Worse, the abandoned request leaves a server-side waiter that could swallow a batch. So keep AUTO_REVIEW_POLL_SECONDS ≤ ~270 s (the shipped configs use 240). You cannot get a longer single HTTP hold by raising the poll window; longer agent-facing waits come from looping over polls instead: the stdio proxy does this up to AUTO_REVIEW_WAIT_SECONDS, and the poller CLI does the same for shell waits (see the Codex section); neither fails fatally on a hiccup.
Two more caveats:
- The coordinator is a singleton. Changing the poll window only takes effect on a fresh coordinator: kill the running one (fuser -k <port>/tcp, or kill the pid on that port) so the next agent restarts it with the new value. Both agents should use the same window.
- Across machines (HTTP transport), a dropped connection isn't retried until the next poll; prefer a shorter window there too.
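To illustrate how looping over bounded polls yields a longer agent-facing wait, here is a small TypeScript simulation. It is a model, not the proxy's code: simulateProxyWait and WaitOutcome are hypothetical names, time is simulated rather than measured, and the sketch assumes every poll runs for its full window unless a verdict arrives, so a call may overrun the wait window by up to one poll window.

```typescript
// Hypothetical result of one agent-visible blocking call.
interface WaitOutcome {
  outcome: "result" | "keep_waiting";
  elapsedSeconds: number;
  polls: number;
}

// Simulate one blocking call. verdictAtSeconds is when the other agent
// responds (seconds after the call starts), or null if it never does.
function simulateProxyWait(
  pollSeconds: number,
  waitSeconds: number,
  verdictAtSeconds: number | null
): WaitOutcome {
  if (pollSeconds <= 0) throw new Error("pollSeconds must be positive");
  let elapsed = 0;
  let polls = 0;
  // Start a new bounded long-poll while the wait window has not yet elapsed.
  while (elapsed < waitSeconds) {
    polls += 1;
    const pollEnds = elapsed + pollSeconds;
    if (verdictAtSeconds !== null && verdictAtSeconds <= pollEnds) {
      // Event-driven handoff: the call returns as soon as the verdict exists.
      return { outcome: "result", elapsedSeconds: Math.max(verdictAtSeconds, elapsed), polls: polls };
    }
    elapsed = pollEnds; // this poll expired quietly; the agent sees nothing
  }
  // Wait window used up: only now does the agent see keep_waiting.
  return { outcome: "keep_waiting", elapsedSeconds: elapsed, polls: polls };
}
```

Under these assumptions, a 240 s poll window and a 600 s wait window deliver a verdict that arrives at 300 s during the second poll, while an idle wait surfaces keep_waiting after three polls. No single HTTP hold exceeds 240 s in either case.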
Using Codex instead of Claude Code
Codex (OpenAI) works as the client, but it's the opposite of Claude Code on timeouts:
- It reads ~/.codex/config.toml, not .mcp.json, and ignores the timeout field.
- All Codex harnesses (CLI, VS Code extension, Windows app) enforce a hard ~120 s awaiting tools/call deadline that tool_timeout_sec does not reliably raise (openai/codex#13831). So unlike Claude Code, a single blocking call cannot exceed ~120 s on Codex.
Therefore, do the reverse of the Claude tuning: keep both windows safely under 120 s so
the proxy returns keep_waiting before Codex gives up, and the agent re-polls. 90 is a good
value. See configs/codex.config.toml:
[mcp_servers.auto-review]
command = "npx"
args = ["-y", "@permissionbrick/auto-review-mcp", "--role", "developer"] # reviewer in its own config
env = { AUTO_REVIEW_POLL_SECONDS = "90", AUTO_REVIEW_WAIT_SECONDS = "90" } # MUST be < Codex's ~120 s ceiling
startup_timeout_sec = 30
tool_timeout_sec = 110

Handoffs are still instant; the windows only set how often an idle waiter re-polls. The
singleton-coordinator rule still applies: both agents need the same AUTO_REVIEW_POLL_SECONDS, and
you must kill any running coordinator for a changed poll window to take effect
(AUTO_REVIEW_WAIT_SECONDS lives in each proxy, so restarting the agent is enough for that one).
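A quick way to sanity-check a Codex setup against the rule above is to confirm that both windows sit below the ceiling. The TypeScript sketch below does only that; codexWindowProblems is a hypothetical helper, and the 120 s default is the approximate ceiling quoted in this README, not a value read from Codex.

```typescript
// Return a list of human-readable problems; an empty list means both
// windows are below the (approximate) Codex tools/call ceiling.
function codexWindowProblems(
  pollSeconds: number,
  waitSeconds: number,
  ceilingSeconds: number = 120
): string[] {
  const problems: string[] = [];
  if (pollSeconds >= ceilingSeconds) {
    problems.push("AUTO_REVIEW_POLL_SECONDS=" + pollSeconds + " is not below " + ceilingSeconds + " s");
  }
  if (waitSeconds >= ceilingSeconds) {
    problems.push("AUTO_REVIEW_WAIT_SECONDS=" + waitSeconds + " is not below " + ceilingSeconds + " s");
  }
  return problems;
}
```

The suggested 90/90 configuration yields no problems, whereas the Claude Code defaults (240/600) fail on both counts.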
Avoiding re-polls on Codex: the shell poll command
Re-polling under 120 s works but is chatty. To get long, quiet waits anyway, there's a built-in
escape hatch: an MCP tool call is capped at ~120 s, but a shell command isn't. When
get_next_review / request_review is killed by Codex's timed out awaiting tools/call error, the
agent never receives a result — so the tool descriptions (not the responses) tell it to fall
back to a blocking shell command that waits without the limit and prints the same JSON:
# the tool description already prints the exact command (absolute paths, ready to paste); by name:
npx -y -p @permissionbrick/auto-review-mcp auto-review-cli next-review --port 8765 --timeout 1500 # reviewer
npx -y -p @permissionbrick/auto-review-mcp auto-review-cli await-verdict --port 8765 --timeout 1500 # developer

These hit a plain-HTTP long-poll endpoint on the coordinator, hide keep_waiting internally, and
block until there's a real result (give the command a long command timeout, ~25 min). They're not
just for timeout errors: a normal keep_waiting reply quotes the matching command (with absolute
paths) as the preferred way to keep waiting, since one foreground shell wait replaces many MCP
re-polls.
With this fallback you can even set a long poll window (so the MCP call carries the fast case
within 120 s and the shell command carries longer waits). Verify it with npm run demo:cli.
Suggested prompts
The tools are self-describing, so the prompts can be short.
Developer:
You are the developer. First call initialize_review_session (the auto-review MCP) with the absolute path of this repo. Then implement in small, self-contained batches. After each batch, call request_review with a clear summary and commit message, and follow whatever it returns: on changes_requested, fix the issue and resubmit; on approved, continue with the next batch; on keep_waiting, call await_review with the returned batch_id to keep waiting. Do not run git commit yourself. When the whole task is done and the last batch is approved, call signal_complete.
Reviewer:
You are the reviewer. Repeatedly call get_next_review (the auto-review MCP) to receive each batch (summary + full diff vs HEAD), review it against <the task/spec> and for code quality, then call submit_review: approved, or changes_requested with a clear issue and category (spec/code). Keep looping until you get workflow_complete.
On Codex, also allow shell commands and add: "If a get_next_review/request_review call ever
fails with timed out awaiting tools/call, follow that tool's CLIENT-TIMEOUT FALLBACK instructions
— run the shell command it names to wait, then continue."
Verify
End-to-end tests that spin up a throwaway git repo and drive the full loop (keep_waiting → submit → review → changes_requested → fix → approve → commit → complete):
npm run build
npm run demo # HTTP path: two SDK clients against a running server
npm run demo:stdio # node/stdio path: two spawned proxies + auto-started coordinator
npm run demo:cli # shell poll-command path: MCP for instant ops + cli.js for the blocking wait

Each prints a checklist and exits non-zero if anything fails.
Notes & limitations (v1)
- State is in-memory. Restarting the server resets the loop (no batch is mid-flight unless an agent is actively blocked). There is no persistence yet.
- One developer + one reviewer. A single batch flows at a time; extra connections share the same state.
- Commit hooks are respected. If a pre-commit hook rejects an approved batch, the reviewer gets
an error and the batch stays open to retry or send back as
changes_requested.
