@tvgaming.net/mcpverify
v0.1.0
MCP server + stop hook for Claude Code that enforces independent review before an agent can claim work is done.
mcpverify
An MCP server that prevents AI agents from claiming work is done when it isn't. It creates a verification checkpoint: agents register tasks before starting, then must pass an independent review before they can stop.
The Mandatory Workflow
For every task of medium or higher complexity, the agent must follow this exact sequence:
- Explain your approach: describe what you plan to do and get user confirmation before writing any code
- Register the task: call todoWrite with a detailed markdown description (acceptance criteria, expected files, constraints)
- Implement: do the work
- Verify: call todoCompleteAndVerify with the task_id
- If NOK: read the feedback, fix the issues, and call todoCompleteAndVerify again until it passes
- Only report done after OK: never tell the user work is complete until the reviewer confirms it
This is enforced by a Stop hook: the agent cannot end the conversation while any task registered in the session is still unverified.
The Problem
AI coding agents confidently report "done!" without actually finishing the work. They skip steps, miss requirements, or introduce bugs — and you don't find out until you review the diff yourself. The longer the task, the worse this gets.
How It Works
mcpverify adds two MCP tools to your agent's toolbox, todoWrite and todoCompleteAndVerify, and a Stop hook that blocks the conversation from ending until all registered tasks pass review.
Happy path (task passes review)
sequenceDiagram
participant A as Agent (Claude Code)
participant S as mcpverify Server
participant D as tasks-state.json
participant R as Reviewer (claude --print)
participant H as Stop Hook
A->>S: todoWrite({ description: "## Add OAuth..." })
S->>D: persist { mcpv-a3f8b2: verified: false }
S-->>A: { task_id: "mcpv-a3f8b2" }
Note over A: Agent implements the task<br/>writes code, runs builds & tests
A->>S: todoCompleteAndVerify("mcpv-a3f8b2")
activate S
Note over S: git diff HEAD<br/>git diff --name-only HEAD
S->>R: stdin: task description + git diff
activate R
R-->>S: { "status": "OK" }
deactivate R
S->>D: update { mcpv-a3f8b2: verified: true }
S-->>A: { status: "OK" }
deactivate S
A->>H: Agent tries to stop
H->>D: read tasks-state.json
D-->>H: mcpv-a3f8b2: verified ✓
H-->>A: exit 0 (allowed)
Note over A: Conversation ends
Failure path (task fails review, agent must fix)
sequenceDiagram
participant A as Agent (Claude Code)
participant S as mcpverify Server
participant D as tasks-state.json
participant R as Reviewer (claude --print)
participant H as Stop Hook
A->>S: todoWrite({ description: "## Add OAuth..." })
S->>D: persist { mcpv-a3f8b2: verified: false }
S-->>A: { task_id: "mcpv-a3f8b2" }
Note over A: Agent implements (but misses a requirement)
A->>S: todoCompleteAndVerify("mcpv-a3f8b2")
activate S
S->>R: stdin: task description + git diff
activate R
R-->>S: { "status": "NOK", "feedback": "No /auth/callback route" }
deactivate R
S-->>A: { status: "NOK", feedback: "No /auth/callback route" }
deactivate S
Note over A: Agent reads feedback, adds the missing route
A->>S: todoCompleteAndVerify("mcpv-a3f8b2")
activate S
S->>R: stdin: task description + updated git diff
activate R
R-->>S: { "status": "OK" }
deactivate R
S->>D: update { mcpv-a3f8b2: verified: true }
S-->>A: { status: "OK" }
deactivate S
A->>H: Agent tries to stop
H->>D: read tasks-state.json
D-->>H: mcpv-a3f8b2: verified ✓
H-->>A: exit 0 (allowed)
What if the agent skips verification entirely?
sequenceDiagram
participant A as Agent (Claude Code)
participant S as mcpverify Server
participant D as tasks-state.json
participant H as Stop Hook
A->>S: todoWrite({ description: "## Add OAuth..." })
S->>D: persist { mcpv-a3f8b2: verified: false }
S-->>A: { task_id: "mcpv-a3f8b2" }
Note over A: Agent implements but never calls verify
A->>H: Agent tries to stop
H->>D: read tasks-state.json
D-->>H: mcpv-a3f8b2: verified ✗
H-->>A: exit 2 — BLOCKED
Note over A: "BLOCKED: 1 unverified task(s).<br/>Call todoCompleteAndVerify for: mcpv-a3f8b2"
Note over A: Agent is forced back into the loop
The reviewer is a separate, independent agent with no memory of the working agent's conversation. It sees only the task description and the git diff, which makes it hard to game: the working agent can't talk its way past the reviewer.
Setup
1. Add the MCP server to your Claude Code settings
In your project's .mcp.json (or .claude/settings.local.json), add mcpverify as an MCP server:
{
"mcpServers": {
"mcpverify": {
"command": "npx",
"args": ["-y", "@tvgaming.net/mcpverify", "mcpverify-server"]
}
}
}
npx -y will install the package on first use and cache it for subsequent runs, so no manual install is needed.
2. Configure the reviewer (optional)
The defaults in the shipped config.json work out of the box:
{
"reviewer": {
"command": "claude",
"args": ["--print", "--model", "sonnet"],
"timeout_ms": 120000
}
}
At runtime, the project directories are resolved in this order:
- "project_dirs": ["/abs/path/a", "/abs/path/b"] in config (authoritative; use it to pin a multi-repo workspace).
- MCP roots/list response from the client (Claude Code sends its workspace folders, including "additional working directories", so multi-repo review needs no extra configuration).
- "project_dir": "/abs/path" in config (legacy, single repo).
- $CLAUDE_PROJECT_DIR → process.cwd() (single repo, auto-detected).
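The resolution order above can be sketched as a single function. This is a minimal illustration, not the package's actual code: the function name and parameters are hypothetical, the client roots are assumed to have already been converted to filesystem paths, and the arrow in the last entry is read as "fall back to".

```javascript
// Hypothetical helper: only the config keys and $CLAUDE_PROJECT_DIR come from
// the README; everything else is an assumption made for illustration.
function resolveProjectDirs(config, clientRoots, env, cwd) {
  // 1. An explicit project_dirs list in config is authoritative.
  if (Array.isArray(config.project_dirs) && config.project_dirs.length > 0) {
    return config.project_dirs;
  }
  // 2. Workspace folders reported by the client through MCP roots/list.
  if (Array.isArray(clientRoots) && clientRoots.length > 0) {
    return clientRoots;
  }
  // 3. Legacy single-repo setting.
  if (typeof config.project_dir === "string" && config.project_dir !== "") {
    return [config.project_dir];
  }
  // 4. Auto-detect: $CLAUDE_PROJECT_DIR, then the current working directory.
  return [env.CLAUDE_PROJECT_DIR || cwd];
}

// Example: no config overrides, two workspace folders reported by the client.
console.log(resolveProjectDirs({}, ["/work/api", "/work/web"], {}, "/tmp"));
```

In a real server the last two arguments would presumably be process.env and process.cwd().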
When multiple repos are resolved, the reviewer receives one <REPO path="..."> block per repo, each with its own <CHANGED_FILES> and <GIT_DIFF>. A change satisfying the task in any listed repo counts.
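As a rough picture of the multi-repo prompt layout, the sketch below assembles one block per repo. The tag names are taken from the paragraph above; the function name, the input shape, and the exact nesting and whitespace are assumptions.

```javascript
// Hypothetical prompt assembly. Each entry in `repos` is assumed to look like
// { path, changedFiles: string[], diff: string }.
function buildRepoBlocks(repos) {
  return repos
    .map(({ path, changedFiles, diff }) =>
      [
        `<REPO path="${path}">`,
        "<CHANGED_FILES>",
        changedFiles.join("\n"),
        "</CHANGED_FILES>",
        "<GIT_DIFF>",
        diff,
        "</GIT_DIFF>",
        "</REPO>",
      ].join("\n")
    )
    .join("\n\n");
}

console.log(
  buildRepoBlocks([
    { path: "/work/api", changedFiles: ["src/routes/auth.ts"], diff: "diff --git a/src/routes/auth.ts ..." },
    { path: "/work/web", changedFiles: [], diff: "" },
  ])
);
```

A repo with no changes still gets a block here, with empty sections; whether the real server emits or omits such blocks is not stated in this README.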
The reviewer can be any CLI that reads a prompt from stdin and writes a JSON response to stdout. The default uses claude --print (Claude Code in non-interactive mode) with Sonnet for fast, cheap reviews. You could swap it for any LLM CLI, a custom script, or even a human-in-the-loop checker. On first use the command is resolved to an absolute path via which/where to reduce PATH-hijack risk.
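To make the reviewer contract concrete, here is a sketch of the decision logic a trivial custom reviewer might use: it rejects a task when no repo shows any change. The function name and the feedback text are hypothetical, and the stdin/stdout wiring is left as a comment so the snippet stays a pure function.

```javascript
// Hypothetical custom reviewer logic. It relies only on the <GIT_DIFF> tags
// mentioned in this README and ignores the task description entirely.
function review(prompt) {
  const diffs = [...prompt.matchAll(/<GIT_DIFF>([\s\S]*?)<\/GIT_DIFF>/g)].map(
    (m) => m[1].trim()
  );
  // A change in any listed repo counts, so one non-empty diff is enough.
  const hasChanges = diffs.some((d) => d !== "");
  if (!hasChanges) {
    return { status: "NOK", feedback: "No changes found in any git diff." };
  }
  return { status: "OK" };
}

// To use this as a reviewer CLI, read all of stdin into `prompt`, then:
//   process.stdout.write(JSON.stringify(review(prompt)));

console.log(review("<GIT_DIFF>\n+ added line\n</GIT_DIFF>"));
```

A real reviewer would compare the diff against the task's acceptance criteria; this one only shows the input and output shapes.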
3. Add the Stop hook (the enforcement mechanism)
This is what prevents the agent from simply ignoring verification. Add to your project's .claude/settings.local.json:
{
"hooks": {
"Stop": [
{
"hooks": [
{
"type": "command",
"command": "npx -y @tvgaming.net/mcpverify mcpverify-check",
"timeout": 300
}
]
}
]
}
}
The Stop hook fires every time the agent tries to end the conversation. It reads the session transcript, finds any task IDs that were created during this session, and checks if they're all verified. If any are unverified, it exits with code 2, which blocks the agent from stopping and forces it to address the outstanding tasks.
The Workflow in Practice
Here's how this integrates into a real development workflow:
Step 1: Agent explains approach, user confirms
The agent describes what it plans to do. No code yet — just alignment.
Step 2: Agent registers the task
After confirmation, the agent calls todoWrite with a detailed markdown description, not a one-liner. The reviewer has no other statement of intent to compare the diff against, so the description needs to be thorough:
## Add OAuth Login
### Requirements
- Login page with Google OAuth button
- /auth/callback route handling token exchange
- Session stored in httpOnly cookie
### Acceptance Criteria
- User can click "Sign in with Google" and complete the flow
- Token persisted across page refreshes
- Redirect to /dashboard after successful login
### Files Expected
- src/routes/auth.ts (new)
- src/middleware/session.ts (modified)
- src/pages/login.tsx (new)
The server returns a task_id like mcpv-a3f8b2.
Step 3: Agent implements
Normal coding — writing files, running builds, fixing tests.
Step 4: Agent verifies
The agent calls todoCompleteAndVerify("mcpv-a3f8b2"). Behind the scenes:
- Server runs git diff HEAD and git diff --name-only HEAD against the project directory
- Server builds a prompt containing the task description + diff + changed files
- Server spawns the reviewer CLI (claude --print --model sonnet by default)
- Reviewer reads the prompt, compares requirements against actual changes
- Reviewer returns {"status": "OK"} or {"status": "NOK", "feedback": "..."}
If NOK, the feedback is specific: "OAuth callback route missing. No /auth/callback in router." The agent reads it, fixes the issue, and calls verify again.
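The last step, turning the reviewer's stdout into a verdict, might look roughly like this. The function name is hypothetical, and treating unparseable output as NOK is an assumption chosen to match the tool's fail-closed stance; the real reviewer.js may handle it differently.

```javascript
// Hypothetical parser for the reviewer's JSON reply.
function parseReviewerOutput(stdout) {
  let parsed;
  try {
    parsed = JSON.parse(stdout.trim());
  } catch {
    // Assumption: anything that is not valid JSON counts as a failed review.
    return { status: "NOK", feedback: "Reviewer output was not valid JSON." };
  }
  if (parsed && parsed.status === "OK") {
    return { status: "OK" };
  }
  const feedback =
    parsed && typeof parsed.feedback === "string" && parsed.feedback !== ""
      ? parsed.feedback
      : "Reviewer returned no feedback.";
  return { status: "NOK", feedback };
}

console.log(parseReviewerOutput('{"status": "NOK", "feedback": "No /auth/callback route"}'));
```

Only an explicit "OK" passes; every other shape, including a missing status field, is reported back to the agent as NOK.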
Step 5: Agent tries to stop
When the agent finishes and attempts to end the conversation, the Stop hook fires. It:
- Reads the session transcript (Claude Code provides transcript_path via stdin)
- Searches for any mcpv-* task IDs mentioned in the transcript
- Cross-references them against tasks-state.json
- If any are unverified → exit code 2 → agent is blocked from stopping
The agent sees: BLOCKED: This session has 1 unverified task(s). Call todoCompleteAndVerify for: mcpv-a3f8b2
It has no choice but to complete the verification loop.
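The hook's decision can be sketched as a pure function over the transcript text and the state object. The ID pattern (mcpv- plus six hex characters) is inferred from the example IDs in this README, and the state shape and function name are likewise assumptions; reading transcript_path and tasks-state.json from disk is left out.

```javascript
// Assumption: task IDs look like "mcpv-a3f8b2"; the real format may differ.
const TASK_ID_PATTERN = /mcpv-[0-9a-f]{6}/g;

// Assumed state shape: { "mcpv-a3f8b2": { verified: false }, ... }
function checkSession(transcriptText, state) {
  // Collect each task ID mentioned in this session once.
  const ids = [...new Set(transcriptText.match(TASK_ID_PATTERN) || [])];
  // IDs that are no longer in the state (for example, cleared ones) are ignored.
  const unverified = ids.filter((id) => state[id] && state[id].verified !== true);
  if (unverified.length === 0) {
    return { exitCode: 0 };
  }
  return {
    exitCode: 2,
    message:
      `BLOCKED: This session has ${unverified.length} unverified task(s). ` +
      `Call todoCompleteAndVerify for: ${unverified.join(", ")}`,
  };
}

console.log(checkSession("registered mcpv-a3f8b2", { "mcpv-a3f8b2": { verified: false } }));
```

Scoping the check to IDs found in the transcript means tasks left unverified by other sessions do not block this one.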
Architecture
mcpverify/
├── server.js # MCP server — handles todoWrite + todoCompleteAndVerify
├── reviewer.js # Builds the reviewer prompt, spawns the reviewer CLI, parses output
├── transcript.js # Git diff/changed-files helpers (array-form spawn, no shell)
├── check-verified.cjs # Stop hook script (CommonJS — runs outside the MCP server)
├── config.json # Reviewer CLI config (project_dir is auto-detected)
└── tasks-state.json # Persisted task state (survives server restarts; gitignored)
State persistence
Tasks are stored in memory and persisted to tasks-state.json on every write. If the MCP server restarts mid-session (which happens), the state is rehydrated from disk. The Stop hook reads this same file directly — it doesn't go through the MCP server.
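A minimal sketch of that persist-and-rehydrate cycle follows. The function names and the state shape are hypothetical, and the write-then-rename step is an addition for illustration (it keeps the hook from reading a half-written file); this README does not say whether the real server does this.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Persist the full task map on every write.
function saveState(file, tasks) {
  const tmp = `${file}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(tasks, null, 2));
  fs.renameSync(tmp, file); // swap the new file into place in one step
}

// Rehydrate after a restart; a missing file just means no tasks yet.
function loadState(file) {
  try {
    return JSON.parse(fs.readFileSync(file, "utf8"));
  } catch (err) {
    if (err.code === "ENOENT") return {};
    throw err;
  }
}

// Demo in a throwaway directory rather than a real project.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "mcpverify-demo-"));
const demoFile = path.join(demoDir, "tasks-state.json");
saveState(demoFile, { "mcpv-a3f8b2": { verified: false } });
console.log(loadState(demoFile));
```

Because the Stop hook reads the same file directly, any format the server writes has to stay readable by a standalone script.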
Why the Stop hook is CommonJS
The Stop hook (check-verified.cjs) runs as a standalone Node.js script invoked by Claude Code's hook system — not as part of the MCP server process. It's CommonJS because it needs to be a simple, fast, dependency-free script that reads a JSON file and greps a transcript. No module resolution complexity.
Key Design Decisions
The reviewer is blind. It only sees the task description and the git diff. It doesn't see the conversation, the agent's reasoning, or any "I already tested this" claims. This is intentional — the verification must be based on evidence (the diff), not persuasion.
Task descriptions are the contract. A vague description like "fix the login bug" will get rubber-stamped because the reviewer can't verify what it can't see. Detailed descriptions with acceptance criteria produce meaningful reviews. The quality of verification is directly proportional to the quality of the task description.
The hook is the enforcement. Without the Stop hook, the agent could simply skip calling todoCompleteAndVerify. The hook makes verification mandatory: it exits with code 2, which blocks the stop, for as long as the session has outstanding unverified tasks.
The reviewer is configurable. The config.json pattern means you can swap reviewers without changing server code. Use a fast model for quick checks, a capable model for complex reviews, or a custom script that runs your test suite.
Emergency Bail-Out: todoClear
Sometimes a verification loop genuinely breaks — the reviewer CLI is offline, the git diff is too large to review, or the task was registered against a workspace that no longer exists. When that happens, the Stop hook keeps blocking, and the agent cannot end the conversation on its own. todoClear is the escape hatch.
How to arm it
Set a password in config.json:
{
"reviewer": { "...": "..." },
"clear_password": "pick-something-memorable"
}
Until clear_password is set, todoClear is disabled and returns an error; the feature is opt-in.
How to use it
todoClear takes a list of task_ids plus the password. Each ID that exists in the state is deleted; IDs not found are returned in not_found. The agent always has the task_ids it needs: the Stop hook lists them in its BLOCKED message, and the agent received them from its own todoWrite calls.
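The semantics just described can be sketched as follows. The function name, the argument order, and the shape of the error results are assumptions; only the success shape (status, cleared, not_found) comes from this README.

```javascript
// Hypothetical core of todoClear. `state` maps task IDs to task records and is
// mutated in place; `configuredPassword` is clear_password from config.json.
function clearTasks(state, taskIds, password, configuredPassword) {
  if (!configuredPassword) {
    // Opt-in feature: with no password configured, the tool refuses to run.
    return { status: "ERROR", error: "todoClear is disabled: clear_password is not set" };
  }
  if (password !== configuredPassword) {
    return { status: "ERROR", error: "Incorrect password" };
  }
  const cleared = [];
  const notFound = [];
  for (const id of taskIds) {
    if (Object.prototype.hasOwnProperty.call(state, id)) {
      delete state[id];
      cleared.push(id);
    } else {
      notFound.push(id);
    }
  }
  return { status: "OK", cleared, not_found: notFound };
}
```

Unknown IDs are reported rather than treated as an error, so a partially stale list still clears whatever does exist.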
todoClear({
task_ids: ["mcpv-abc123", "mcpv-def456"],
password: "pick-something-memorable"
})
// → { status: "OK", cleared: ["mcpv-abc123", "mcpv-def456"], not_found: [] }
Security model
The password is stored in plaintext on disk, and this is intentional. The protection is not cryptographic; it's that an agent cannot clear tasks unless a human tells it the password. An agent that hasn't been told the password would have to go looking for config.json inside the npx install cache, which it is unlikely to do under normal operation. If you need stronger protection (for example, against a determined adversarial agent that will read arbitrary files), this is not the right tool.
Security Considerations
Before adopting mcpverify in a project, understand the trust model:
⚠️ The reviewer sees untracked files — keep secrets out of your working tree
To let the reviewer evaluate brand-new files the agent just created, transcript.js runs git ls-files --others --exclude-standard and feeds the full content of each untracked file into the reviewer's prompt. Anything piped to the reviewer is sent to an external LLM (by default, Claude via claude --print).
Consequences:
- If you have an untracked .env, credentials.json, private key, or cloud-config file in the project directory, its contents will be shipped to the reviewer LLM every time todoCompleteAndVerify runs.
- mcpverify heuristically skips files matching common secret patterns (.env*, *.pem, *.key, id_rsa, secrets/..., etc.), and size-caps each untracked file at 256 KB, but these are best-effort filters, not a substitute for .gitignore.
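To show why these filters are only best-effort, here is a sketch of a filter built from the patterns listed above. It is not the package's actual matcher: the function names are hypothetical, the pattern list covers only the examples named here, and whether an over-cap file is truncated or omitted is not stated, so the sketch just flags it.

```javascript
const MAX_UNTRACKED_BYTES = 256 * 1024; // the 256 KB cap mentioned above

// `relPath` is a repo-relative path with forward slashes, as git prints it.
function looksLikeSecret(relPath) {
  const parts = relPath.split("/");
  const base = parts[parts.length - 1];
  return (
    base.startsWith(".env") || // .env*
    base.endsWith(".pem") || // *.pem
    base.endsWith(".key") || // *.key
    base === "id_rsa" ||
    parts.slice(0, -1).includes("secrets") // anything under a secrets/ directory
  );
}

function untrackedFileDecision(relPath, sizeBytes) {
  if (looksLikeSecret(relPath)) return "skip";
  return sizeBytes > MAX_UNTRACKED_BYTES ? "over_cap" : "include";
}

console.log(untrackedFileDecision(".env.local", 120));
console.log(untrackedFileDecision("credentials.json", 120));
```

Under this sketch credentials.json is included because no listed pattern matches it, which is exactly the kind of gap .gitignore has to cover.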
Mitigations:
- Add any file containing secrets to .gitignore before you touch the repo with an mcpverify-enabled agent.
- Review the "CHANGED FILES" output of a few review calls (server.log logs it) to confirm nothing sensitive is being transmitted.
- If you must keep an uncommitted secret on disk, place it outside the project directory or name it to match one of the skipped patterns.
The reviewer is advisory, not tamper-proof
The working agent controls what goes into the task description and the code that produces the git diff. A sufficiently motivated agent could write prompt-injection payloads into either — the reviewer prompt wraps those inputs in <TASK_DESCRIPTION> / <GIT_DIFF> tags and tells the reviewer to treat them as data, but this is a soft boundary. Treat the reviewer's OK as "one layer of evidence the work is done", not "the work is provably complete".
Hook fail-closed policy
If the Stop hook crashes, cannot read the state file, or receives malformed stdin, it exits 2 (block) — not 0 (allow). An mcpverify that fails open would violate its own guarantee. If the hook becomes flaky on your machine, fix the root cause rather than working around it; silently allowing stops defeats the tool.
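The fail-closed rule amounts to mapping every unexpected outcome to exit code 2. A minimal sketch, with a hypothetical wrapper name and a check callback standing in for the hook's real logic:

```javascript
// Returns the exit code for the hook: 0 only when the check ran to completion
// and explicitly reported that stopping is allowed; 2 in every other case.
function failClosedExitCode(check) {
  try {
    return check() === true ? 0 : 2;
  } catch {
    // Crash, unreadable state file, malformed stdin: block rather than allow.
    return 2;
  }
}

// In a hook script this would end with something like:
//   process.exitCode = failClosedExitCode(() => allTasksVerified());
// where allTasksVerified is a placeholder for the actual check.

console.log(failClosedExitCode(() => JSON.parse("{not valid json")));
```

Requiring an explicit true means a check that returns undefined by mistake also blocks instead of silently allowing the stop.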
What mcpverify does NOT protect against
- Supply-chain risk from whatever reviewer CLI you configure (config.reviewer.command). Anyone who can write to the install directory's config.json can execute arbitrary binaries whenever verification runs.
- Malicious MCP servers, malicious agents running with full filesystem access, or any threat that Claude Code itself does not defend against.
- Data egress from the reviewer LLM. Your task descriptions and diffs are sent to whichever provider your reviewer CLI talks to.
