@tvgaming.net/mcpverify

v0.1.0

Published

19 days ago

MCP server + stop hook for Claude Code that enforces independent review before an agent can claim work is done.

Downloads

143

0High
0Medium
0Low

tvgaming.net

mcp claude claude-code verification hook agent

mcpverify

An MCP server that prevents AI agents from claiming work is done when it isn't. It creates a verification checkpoint: agents register tasks before starting, then must pass an independent review before they can stop.

The Mandatory Workflow

For every task of medium or higher complexity, the agent must follow this exact sequence:

Explain your approach — describe what you plan to do and get user confirmation before writing any code
Register the task — call todoWrite with a detailed markdown description (acceptance criteria, expected files, constraints)
Implement — do the work
Verify — call todoCompleteAndVerify with the task_id
If NOK — read the feedback, fix the issues, call todoCompleteAndVerify again until it passes
Only report done after OK — never tell the user work is complete until the reviewer confirms it

This is enforced by a Stop hook: the agent literally cannot end the conversation with unverified tasks.

The Problem

AI coding agents confidently report "done!" without actually finishing the work. They skip steps, miss requirements, or introduce bugs — and you don't find out until you review the diff yourself. The longer the task, the worse this gets.

How It Works

mcpverify adds two MCP tools to your agent's toolbox and a Stop hook that blocks the conversation from ending until all registered tasks pass review.

Happy path (task passes review)

sequenceDiagram
    participant A as Agent (Claude Code)
    participant S as mcpverify Server
    participant D as tasks-state.json
    participant R as Reviewer (claude --print)
    participant H as Stop Hook

    A->>S: todoWrite({ description: "## Add OAuth..." })
    S->>D: persist { mcpv-a3f8b2: verified: false }
    S-->>A: { task_id: "mcpv-a3f8b2" }

    Note over A: Agent implements the task<br/>writes code, runs builds & tests

    A->>S: todoCompleteAndVerify("mcpv-a3f8b2")
    activate S
    Note over S: git diff HEAD<br/>git diff --name-only HEAD
    S->>R: stdin: task description + git diff
    activate R
    R-->>S: { "status": "OK" }
    deactivate R
    S->>D: update { mcpv-a3f8b2: verified: true }
    S-->>A: { status: "OK" }
    deactivate S

    A->>H: Agent tries to stop
    H->>D: read tasks-state.json
    D-->>H: mcpv-a3f8b2: verified ✓
    H-->>A: exit 0 (allowed)
    Note over A: Conversation ends

Failure path (task fails review, agent must fix)

sequenceDiagram
    participant A as Agent (Claude Code)
    participant S as mcpverify Server
    participant D as tasks-state.json
    participant R as Reviewer (claude --print)
    participant H as Stop Hook

    A->>S: todoWrite({ description: "## Add OAuth..." })
    S->>D: persist { mcpv-a3f8b2: verified: false }
    S-->>A: { task_id: "mcpv-a3f8b2" }

    Note over A: Agent implements (but misses a requirement)

    A->>S: todoCompleteAndVerify("mcpv-a3f8b2")
    activate S
    S->>R: stdin: task description + git diff
    activate R
    R-->>S: { "status": "NOK", "feedback": "No /auth/callback route" }
    deactivate R
    S-->>A: { status: "NOK", feedback: "No /auth/callback route" }
    deactivate S

    Note over A: Agent reads feedback, adds the missing route

    A->>S: todoCompleteAndVerify("mcpv-a3f8b2")
    activate S
    S->>R: stdin: task description + updated git diff
    activate R
    R-->>S: { "status": "OK" }
    deactivate R
    S->>D: update { mcpv-a3f8b2: verified: true }
    S-->>A: { status: "OK" }
    deactivate S

    A->>H: Agent tries to stop
    H->>D: read tasks-state.json
    D-->>H: mcpv-a3f8b2: verified ✓
    H-->>A: exit 0 (allowed)

What if the agent skips verification entirely?

sequenceDiagram
    participant A as Agent (Claude Code)
    participant S as mcpverify Server
    participant D as tasks-state.json
    participant H as Stop Hook

    A->>S: todoWrite({ description: "## Add OAuth..." })
    S->>D: persist { mcpv-a3f8b2: verified: false }
    S-->>A: { task_id: "mcpv-a3f8b2" }

    Note over A: Agent implements but never calls verify

    A->>H: Agent tries to stop
    H->>D: read tasks-state.json
    D-->>H: mcpv-a3f8b2: verified ✗
    H-->>A: exit 2 — BLOCKED
    Note over A: "BLOCKED: 1 unverified task(s).<br/>Call todoCompleteAndVerify for: mcpv-a3f8b2"
    Note over A: Agent is forced back into the loop

The reviewer is a separate, independent agent — it has no memory of the working agent's conversation. It only sees the task description and the git diff. This makes it hard to game: the working agent can't talk its way past the reviewer.

Setup

1. Add the MCP server to your Claude Code settings

In your project's .mcp.json (or .claude/settings.local.json), add mcpverify as an MCP server:

{
  "mcpServers": {
    "mcpverify": {
      "command": "npx",
      "args": ["-y", "@tvgaming.net/mcpverify", "mcpverify-server"]
    }
  }
}

npx -y will install the package on first use and cache it for subsequent runs — no manual install needed.

2. Configure the reviewer (optional)

The defaults in the shipped config.json work out of the box:

{
  "reviewer": {
    "command": "claude",
    "args": ["--print", "--model", "sonnet"],
    "timeout_ms": 120000
  }
}

At runtime, the project directories are resolved in this order:

"project_dirs": ["/abs/path/a", "/abs/path/b"] in config (authoritative, for multi-repo workspaces you want to pin).
MCP roots/list response from the client (Claude Code sends its workspace folders, including "additional working directories" — you get multi-repo review for free).
"project_dir": "/abs/path" in config (legacy, single repo).
$CLAUDE_PROJECT_DIR → process.cwd() (single repo, auto-detected).

When multiple repos are resolved, the reviewer receives one <REPO path="..."> block per repo, each with its own <CHANGED_FILES> and <GIT_DIFF>. A change satisfying the task in any listed repo counts.

The reviewer can be any CLI that reads a prompt from stdin and writes a JSON response to stdout. The default uses claude --print (Claude Code in non-interactive mode) with Sonnet for fast, cheap reviews. You could swap it for any LLM CLI, a custom script, or even a human-in-the-loop checker. On first use the command is resolved to an absolute path via which/where to reduce PATH-hijack risk.

3. Add the Stop hook (the enforcement mechanism)

This is what prevents the agent from simply ignoring verification. Add to your project's .claude/settings.local.json:

{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "npx -y @tvgaming.net/mcpverify mcpverify-check",
            "timeout": 300
          }
        ]
      }
    ]
  }
}

The Stop hook fires every time the agent tries to end the conversation. It reads the session transcript, finds any task IDs that were created during this session, and checks if they're all verified. If any are unverified, it exits with code 2, which blocks the agent from stopping and forces it to address the outstanding tasks.

The Workflow in Practice

Here's how this integrates into a real development workflow:

Step 1: Agent explains approach, user confirms

The agent describes what it plans to do. No code yet — just alignment.

Step 2: Agent registers the task

After confirmation, the agent calls todoWrite with a detailed markdown description — not a one-liner. This description is the only thing the reviewer sees, so it needs to be thorough:

## Add OAuth Login

### Requirements
- Login page with Google OAuth button
- /auth/callback route handling token exchange
- Session stored in httpOnly cookie

### Acceptance Criteria
- User can click "Sign in with Google" and complete the flow
- Token persisted across page refreshes
- Redirect to /dashboard after successful login

### Files Expected
- src/routes/auth.ts (new)
- src/middleware/session.ts (modified)
- src/pages/login.tsx (new)

The server returns a task_id like mcpv-a3f8b2.

Step 3: Agent implements

Normal coding — writing files, running builds, fixing tests.

Step 4: Agent verifies

The agent calls todoCompleteAndVerify("mcpv-a3f8b2"). Behind the scenes:

Server runs git diff HEAD and git diff --name-only HEAD against the project directory
Server builds a prompt containing the task description + diff + changed files
Server spawns the reviewer CLI (claude --print --model sonnet by default)
Reviewer reads the prompt, compares requirements against actual changes
Returns {"status": "OK"} or {"status": "NOK", "feedback": "..."}

If NOK, the feedback is specific: "OAuth callback route missing. No /auth/callback in router." The agent reads it, fixes the issue, and calls verify again.

Step 5: Agent tries to stop

When the agent finishes and attempts to end the conversation, the Stop hook fires. It:

Reads the session transcript (Claude Code provides transcript_path via stdin)
Searches for any mcpv-* task IDs mentioned in the transcript
Cross-references them against tasks-state.json
If any are unverified → exit code 2 → agent is blocked from stopping

The agent sees: BLOCKED: This session has 1 unverified task(s). Call todoCompleteAndVerify for: mcpv-a3f8b2

It has no choice but to complete the verification loop.

Architecture

mcpverify/
├── server.js           # MCP server — handles todoWrite + todoCompleteAndVerify
├── reviewer.js         # Builds the reviewer prompt, spawns the reviewer CLI, parses output
├── transcript.js       # Git diff/changed-files helpers (array-form spawn, no shell)
├── check-verified.cjs  # Stop hook script (CommonJS — runs outside the MCP server)
├── config.json         # Reviewer CLI config (project_dir is auto-detected)
└── tasks-state.json    # Persisted task state (survives server restarts; gitignored)

State persistence

Tasks are stored in memory and persisted to tasks-state.json on every write. If the MCP server restarts mid-session (which happens), the state is rehydrated from disk. The Stop hook reads this same file directly — it doesn't go through the MCP server.

Why the Stop hook is CommonJS

The Stop hook (check-verified.cjs) runs as a standalone Node.js script invoked by Claude Code's hook system — not as part of the MCP server process. It's CommonJS because it needs to be a simple, fast, dependency-free script that reads a JSON file and greps a transcript. No module resolution complexity.

Key Design Decisions

The reviewer is blind. It only sees the task description and the git diff. It doesn't see the conversation, the agent's reasoning, or any "I already tested this" claims. This is intentional — the verification must be based on evidence (the diff), not persuasion.

Task descriptions are the contract. A vague description like "fix the login bug" will get rubber-stamped because the reviewer can't verify what it can't see. Detailed descriptions with acceptance criteria produce meaningful reviews. The quality of verification is directly proportional to the quality of the task description.

The hook is the enforcement. Without the Stop hook, the agent could simply skip calling todoCompleteAndVerify. The hook makes verification mandatory — the agent physically cannot end the conversation with outstanding unverified tasks.

The reviewer is configurable. The config.json pattern means you can swap reviewers without changing server code. Use a fast model for quick checks, a capable model for complex reviews, or a custom script that runs your test suite.

Emergency Bail-Out: `todoClear`

Sometimes a verification loop genuinely breaks — the reviewer CLI is offline, the git diff is too large to review, or the task was registered against a workspace that no longer exists. When that happens, the Stop hook keeps blocking, and the agent cannot end the conversation on its own. todoClear is the escape hatch.

How to arm it

Set a password in config.json:

{
  "reviewer": { "...": "..." },
  "clear_password": "pick-something-memorable"
}

Until clear_password is set, todoClear is disabled and returns an error — the feature is opt-in.

How to use it

Takes a list of task_ids plus the password. Each ID that exists in the state is deleted; IDs not found are returned in not_found. The agent always has the task_ids it needs — the Stop hook lists them in its BLOCKED message, and the agent received them from its own todoWrite calls.

todoClear({
  task_ids: ["mcpv-abc123", "mcpv-def456"],
  password: "pick-something-memorable"
})
// → { status: "OK", cleared: ["mcpv-abc123", "mcpv-def456"], not_found: [] }

Security model

The password is stored in plaintext on disk — this is intentional. The protection is not cryptographic; it's that an agent cannot clear tasks unless a human tells it the password. An agent that hasn't been told the password would have to go looking for config.json inside the npx install cache, which it will not do under normal operation. If you need stronger protection (a determined adversarial agent that will read arbitrary files), this is not the right tool.

Security Considerations

Before adopting mcpverify in a project, understand the trust model:

⚠️ The reviewer sees untracked files — keep secrets out of your working tree

To let the reviewer evaluate brand-new files the agent just created, transcript.js runs git ls-files --others --exclude-standard and feeds the full content of each untracked file into the reviewer's prompt. Anything piped to the reviewer is sent to an external LLM (by default, Claude via claude --print).

Consequences:

If you have an untracked .env, credentials.json, private key, or cloud-config file in the project directory, its contents will be shipped to the reviewer LLM every time todoCompleteAndVerify runs.
mcpverify heuristically skips files matching common secret patterns (.env*, *.pem, *.key, id_rsa, secrets/..., etc.), and size-caps each untracked file at 256 KB, but these are best-effort filters — they are not a substitute for .gitignore.

Mitigations:

Add any file containing secrets to .gitignore before you touch the repo with an mcpverify-enabled agent.
Review the "CHANGED FILES" output of a few review calls (server.log logs it) to confirm nothing sensitive is being transmitted.
If you must keep a secret in the working tree without committing it, place it outside the project directory or name it to match one of the skipped patterns.

The reviewer is advisory, not tamper-proof

The working agent controls what goes into the task description and the code that produces the git diff. A sufficiently motivated agent could write prompt-injection payloads into either — the reviewer prompt wraps those inputs in <TASK_DESCRIPTION> / <GIT_DIFF> tags and tells the reviewer to treat them as data, but this is a soft boundary. Treat the reviewer's OK as "one layer of evidence the work is done", not "the work is provably complete".

Hook fail-closed policy

If the Stop hook crashes, cannot read the state file, or receives malformed stdin, it exits 2 (block) — not 0 (allow). An mcpverify that fails open would violate its own guarantee. If the hook becomes flaky on your machine, fix the root cause rather than working around it; silently allowing stops defeats the tool.

What mcpverify does NOT protect against

Supply-chain risk from whatever reviewer CLI you configure (config.reviewer.command). Anyone who can write to the install directory's config.json can execute arbitrary binaries whenever verification runs.
Malicious MCP servers, malicious agents running with full filesystem access, or any threat that Claude Code itself does not defend against.
Data egress from the reviewer LLM. Your task descriptions and diffs are sent to whichever provider your reviewer CLI talks to.