planpong

v0.6.2

Published 15 days ago

Multi-model adversarial plan review — orchestrates AI agents to critique and refine implementation plans

andrewhml

mcp model-context-protocol plan-review adversarial ai-agents claude codex gemini

Planpong

Adversarial plan review for AI-assisted development. Two AI models play ping-pong with your plan — one critiques, the other revises — until the plan converges or you stop them.

Plans go through three review phases, each with a different lens:

| Round | Phase     | What the reviewer looks for                                                                      |
| ----- | --------- | ------------------------------------------------------------------------------------------------ |
| 1     | Direction | Is this the right problem? Is the approach sound? Is the scope appropriate?                      |
| 2     | Risk      | Pre-mortem: assume the plan fails. Surface hidden assumptions, dependencies, and failure modes.  |
| 3+    | Detail    | Implementation completeness: missing steps, edge cases, gaps, verification criteria.             |

The planner model evaluates each piece of feedback independently — accepting, rejecting, or deferring with rationale — then rewrites the plan. This continues until the reviewer approves or the round limit is reached.
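The round-to-phase mapping and the review loop described above can be sketched in TypeScript. This is a minimal illustration, not planpong's implementation: the names `phaseForRound`, `runReview`, `Issue`, and `Decision` are hypothetical, and the reviewer and planner CLIs are stubbed as plain callbacks.

```typescript
// Hypothetical sketch: Phase, Issue, Decision, and runReview are illustrative names,
// not planpong's internals. The reviewer and planner are passed in as callbacks.
type Phase = "direction" | "risk" | "detail";

// Round 1 -> Direction, round 2 -> Risk, round 3 and later -> Detail.
function phaseForRound(round: number): Phase {
  if (!Number.isInteger(round) || round < 1) {
    throw new RangeError(`round must be a positive integer, got ${round}`);
  }
  if (round === 1) return "direction";
  if (round === 2) return "risk";
  return "detail";
}

interface Issue { id: string; text: string; }

// The planner judges each issue on its own and records why.
interface Decision {
  issueId: string;
  verdict: "accept" | "reject" | "defer";
  rationale: string;
}

type Reviewer = (plan: string, phase: Phase) => { approved: boolean; issues: Issue[] };
type Planner = (plan: string, issues: Issue[]) => { plan: string; decisions: Decision[] };

function runReview(plan: string, maxRounds: number, review: Reviewer, revise: Planner) {
  const decisions: Decision[] = [];
  for (let round = 1; round <= maxRounds; round++) {
    const result = review(plan, phaseForRound(round));
    if (result.approved) return { plan, rounds: round, approved: true, decisions };
    const revision = revise(plan, result.issues);
    plan = revision.plan;
    decisions.push(...revision.decisions);
  }
  return { plan, rounds: maxRounds, approved: false, decisions };
}
```

The loop stops at the first approval or after `maxRounds` rounds, whichever comes first; in this sketch every accept/reject/defer decision is kept so it can be reported afterwards.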

Prerequisites

You need at least one AI CLI installed and authenticated:

Claude Code — npm install -g @anthropic-ai/claude-code (Anthropic API key or Max subscription)
Codex CLI — npm install -g @openai/codex (ChatGPT account or OpenAI API key)
Gemini CLI — npm install -g @google/gemini-cli (Google account auth — run gemini once to authenticate)

If multiple are installed, planpong uses one for planning and a different one for reviewing (configurable). If only one is available, it falls back to using that CLI for both roles.
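A minimal sketch of that selection rule, taking the documented defaults (claude plans, codex reviews) as preferences. `pickRoles` is a hypothetical helper for illustration, not a planpong API, and detecting which CLIs are actually installed is left out.

```typescript
// Hypothetical sketch of the role-selection rule; pickRoles is not a planpong API.
// The default preference (claude plans, codex reviews) matches the documented defaults.
type Provider = "claude" | "codex" | "gemini";

interface Roles { planner: Provider; reviewer: Provider; }

function pickRoles(
  installed: Provider[],
  preferred: Roles = { planner: "claude", reviewer: "codex" },
): Roles {
  const available = Array.from(new Set(installed));
  if (available.length === 0) {
    throw new Error("no supported AI CLI found (claude, codex, or gemini)");
  }
  if (available.length === 1) {
    // Only one CLI: it plays both roles.
    return { planner: available[0], reviewer: available[0] };
  }
  const planner = available.includes(preferred.planner) ? preferred.planner : available[0];
  // Keep the reviewer different from the planner whenever a second CLI exists.
  const reviewer =
    available.includes(preferred.reviewer) && preferred.reviewer !== planner
      ? preferred.reviewer
      : available.find((p) => p !== planner)!;
  return { planner, reviewer };
}
```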

Note on gemini as reviewer: the gemini CLI does not expose a stable session-resume mechanism, so reviewer rounds cannot carry persistent context from earlier rounds. Expect noticeably slower per-round wall time with gemini as the reviewer than with claude or codex. The first time you load a config that selects gemini as reviewer, planpong prints a warning to stderr.

Verify your CLI works:

claude --version   # or
codex --version    # or
gemini --version

Planpong shells out to these CLIs — no API keys are configured in planpong itself.

Install

npm install -g planpong

Then run the interactive setup wizard:

planpong init

The wizard auto-detects which AI CLIs you have installed, lets you pick a planner + reviewer, and writes a working planpong.yaml for the current project. You can re-run it any time to tweak settings — only changed keys are written.
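One plausible way to honor "only changed keys are written" is to diff the wizard's answers against the existing file. The helper below is a hypothetical sketch that assumes the config has already been flattened to dotted keys; it is not the wizard's actual code.

```typescript
// Hypothetical sketch: diff the wizard's answers against the existing file so only
// changed keys get written. Keys are flattened dotted names such as "reviewer.provider".
type FlatConfig = Record<string, string | number | boolean>;

function changedKeys(existing: FlatConfig, answers: FlatConfig): FlatConfig {
  const changes: FlatConfig = {};
  for (const [key, value] of Object.entries(answers)) {
    // Keys that are new or whose value differs are the only ones worth writing.
    if (existing[key] !== value) changes[key] = value;
  }
  return changes;
}
```

A re-run that only switches the reviewer would then yield a single key to write, leaving hand-edited values elsewhere in the file untouched.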

Setup (Claude Code MCP)

Add planpong as an MCP server so Claude Code can use it as a native tool:

claude mcp add planpong -- planpong-mcp

Allow the tools in your Claude Code settings (.claude/settings.json):

{
  "permissions": {
    "allow": ["mcp__planpong"]
  }
}

Restart Claude Code. The planpong tools should appear in your tool list.

Usage

Via Claude Code (recommended)

Ask Claude to review a plan:

Review my plan at docs/plans/my-feature.md using planpong

Or use the slash commands (auto-installed with the MCP server):

/planpong:review docs/plans/my-feature.md             # autonomous — runs to completion
/planpong:review_interactive docs/plans/my-feature.md # pauses between rounds for your input
/planpong:status <session_id>                         # current state and round history
/planpong:sessions                                    # list all review sessions in this project
/planpong:report <session_id>                         # detailed phase-specific report (direction confidence, risk register, round history)

Via CLI

planpong review docs/plans/my-feature.md

Configuration

Optional. Run planpong init to generate this interactively, or create planpong.yaml in your project root by hand:

planner:
  provider: claude # claude, codex, or gemini
  model: opus # provider-specific; aliases or full IDs both work
reviewer:
  provider: codex
  model: gpt-5.3-codex
  effort: xhigh # codex-only knob: low | medium | high | xhigh
max_rounds: 10
plans_dir: docs/plans
revision_mode: full # full or edits
planner_mode: inline # inline or external (see below)

Valid model and effort values are provider-specific and change as providers ship new versions. Run planpong config providers to see the current per-provider lists, or planpong init for an interactive picker — don't copy the values above verbatim.

All fields are optional. Defaults: claude (planner) + codex (reviewer), 10 rounds, docs/plans/ directory, planner_mode: inline, revision_mode: full, human_in_loop: true.
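A sketch of how those defaults could combine with a partial `planpong.yaml`, assuming the file has already been parsed into an object. `PlanpongConfig`, `DEFAULTS`, and `resolveConfig` are hypothetical names, and merging `planner` and `reviewer` field by field is an assumption about how nested keys resolve.

```typescript
// Hypothetical config shape mirroring the documented keys and defaults.
type Provider = "claude" | "codex" | "gemini";

interface RoleConfig { provider: Provider; model?: string; effort?: string; }

interface PlanpongConfig {
  planner: RoleConfig;
  reviewer: RoleConfig;
  max_rounds: number;
  plans_dir: string;
  human_in_loop: boolean;
  revision_mode: "full" | "edits";
  planner_mode: "inline" | "external";
}

const DEFAULTS: PlanpongConfig = {
  planner: { provider: "claude" },
  reviewer: { provider: "codex" },
  max_rounds: 10,
  plans_dir: "docs/plans",
  human_in_loop: true,
  revision_mode: "full",
  planner_mode: "inline",
};

type PartialConfig = Partial<Omit<PlanpongConfig, "planner" | "reviewer">> & {
  planner?: Partial<RoleConfig>;
  reviewer?: Partial<RoleConfig>;
};

// Fields missing from the file fall back to DEFAULTS; role objects merge per field.
function resolveConfig(file: PartialConfig = {}): PlanpongConfig {
  return {
    ...DEFAULTS,
    ...file,
    planner: { ...DEFAULTS.planner, ...file.planner },
    reviewer: { ...DEFAULTS.reviewer, ...file.reviewer },
  };
}
```

With this merge, a file containing only `reviewer: { provider: gemini }` keeps claude as the planner and every other default.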

Revision mode: full vs edits

revision_mode controls how the planner emits a revised plan after each round of feedback:

full (default) — the planner re-emits the entire plan markdown each round. Simple and robust; works for any plan size and any kind of change.
edits — the planner emits a list of targeted text replacements ({ section, before, after }) which planpong applies server-side. ~10× less output token volume on detail rounds that touch only a section or two, so revisions are noticeably faster on mature plans. Direction (round 1) still uses full rewrites since that's the round where sweeping changes are expected.

Use full for new plans where most rounds will rewrite large sections. Switch to edits once a plan has converged enough that rounds are touching one or two paragraphs at a time.
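To make `edits` mode concrete, here is a minimal sketch of applying `{ section, before, after }` replacements to a markdown plan. The matching rules are assumptions for illustration (a heading's text identifies the section, and `before` must match exactly once inside it), and `applyEdits` is not planpong's server-side function.

```typescript
// Hypothetical sketch of applying { section, before, after } edits to a markdown plan.
// Assumptions (not from planpong): `section` is a heading's text, a section runs until the
// next heading of any level, and `before` must occur exactly once inside that section.
interface PlanEdit { section: string; before: string; after: string; }

// Returns [start, end) character offsets of the named section, heading line included.
function sectionBounds(plan: string, section: string): [number, number] {
  let offset = 0;
  let start = -1;
  for (const line of plan.split("\n")) {
    const heading = /^#{1,6}\s+(.*?)\s*$/.exec(line);
    if (heading) {
      if (start !== -1) return [start, offset]; // the next heading closes the section
      if (heading[1] === section) start = offset;
    }
    offset += line.length + 1; // +1 for the "\n" removed by split
  }
  if (start === -1) throw new Error(`section not found: ${section}`);
  return [start, plan.length];
}

function applyEdits(plan: string, edits: PlanEdit[]): string {
  for (const edit of edits) {
    const [start, end] = sectionBounds(plan, edit.section);
    const body = plan.slice(start, end);
    const at = body.indexOf(edit.before);
    if (at === -1) throw new Error(`"before" not found in section: ${edit.section}`);
    if (body.indexOf(edit.before, at + 1) !== -1) {
      throw new Error(`"before" is ambiguous in section: ${edit.section}`);
    }
    // Splice into the full plan so later edits see the updated text.
    plan = plan.slice(0, start + at) + edit.after + plan.slice(start + at + edit.before.length);
  }
  return plan;
}
```

In this sketch, a missing or ambiguous `before` string raises an error instead of guessing, so a bad edit cannot silently land in the wrong place.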

Planner mode: inline vs external

planner_mode is the most consequential operational choice. It decides who actually rewrites the plan after each round of feedback:

inline (default) — when you drive planpong from Claude Code, that Claude Code session is the planner. Planpong returns the reviewer's issues; Claude reads the plan, edits it directly, and reports its accept/reject/defer decisions back via planpong_record_revision. No second model is invoked, so revisions are fast and draw on the conversational context Claude already has.
external — planpong shells out to the configured planner provider (e.g. another claude -p or codex exec invocation) to produce the revision. Use this when running planpong outside Claude Code (CLI flow), or when you want a different model to plan than the one orchestrating.

Inline is the right default for the Claude-Code-as-orchestrator workflow; external is the right default for planpong review from a plain shell.

Viewing and changing config

planpong config              # show resolved config with source annotations
planpong config path         # print path to active config file
planpong config keys         # list all keys with valid values, types, and defaults
planpong config providers    # list per-provider model and effort values
planpong config get <key>    # print a single resolved value
planpong config set <key> <value>   # set a config value

Examples:

planpong config set reviewer.provider gemini
planpong config set reviewer.model gemini-2.5-pro
planpong config set max_rounds 5
planpong config set planner_mode inline

Valid keys: planner.provider, planner.model, planner.effort, reviewer.provider, reviewer.model, reviewer.effort, plans_dir, max_rounds, human_in_loop, revision_mode, planner_mode. Run planpong config keys for the canonical list with descriptions.
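The dotted keys map onto nested YAML sections (`reviewer.provider` is `provider` under `reviewer:`). Below is a hypothetical sketch of get/set over that structure, restricted to the documented key list; `getKey`, `setKey`, and the value coercion in `parseValue` are assumptions, not planpong's parsing logic.

```typescript
// Hypothetical sketch: get/set by dotted key over a nested config object, limited to the
// documented keys. getKey, setKey, and parseValue are illustrative names.
const VALID_KEYS = [
  "planner.provider", "planner.model", "planner.effort",
  "reviewer.provider", "reviewer.model", "reviewer.effort",
  "plans_dir", "max_rounds", "human_in_loop", "revision_mode", "planner_mode",
];

type ConfigTree = { [key: string]: unknown };

function assertKnownKey(key: string): void {
  if (!VALID_KEYS.includes(key)) throw new Error(`unknown config key: ${key}`);
}

function getKey(config: ConfigTree, key: string): unknown {
  assertKnownKey(key);
  let node: unknown = config;
  for (const part of key.split(".")) {
    if (typeof node !== "object" || node === null) return undefined;
    node = (node as ConfigTree)[part];
  }
  return node;
}

// CLI values arrive as strings; this sketch coerces booleans and integers only.
function parseValue(raw: string): string | number | boolean {
  if (raw === "true") return true;
  if (raw === "false") return false;
  if (/^\d+$/.test(raw)) return Number(raw);
  return raw;
}

function setKey(config: ConfigTree, key: string, raw: string): void {
  assertKnownKey(key);
  const parts = key.split(".");
  let node = config;
  // Walk (and create, if needed) the parent objects, e.g. `reviewer` for reviewer.model.
  for (const part of parts.slice(0, -1)) {
    const child = node[part];
    if (typeof child !== "object" || child === null) node[part] = {};
    node = node[part] as ConfigTree;
  }
  node[parts[parts.length - 1]] = parseValue(raw);
}
```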

Config via MCP

Two MCP tools are available for programmatic config access:

planpong_get_config — returns resolved config, file path, version, and per-key source provenance
planpong_set_config — dry-run by default (confirm: false); pass confirm: true to write
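The dry-run default means a call without `confirm: true` only previews the change. A hypothetical sketch of that contract follows; apart from `confirm`, the argument and result field names are illustrative and may not match planpong's actual tool schema.

```typescript
// Hypothetical sketch of dry-run-by-default semantics for a config-writing tool.
// Only the `confirm` flag comes from the planpong docs; other names are illustrative.
interface SetConfigArgs { key: string; value: unknown; confirm?: boolean; }
interface SetConfigResult { key: string; before: unknown; after: unknown; written: boolean; }

function setConfig(store: Map<string, unknown>, args: SetConfigArgs): SetConfigResult {
  const before = store.get(args.key);
  const confirm = args.confirm ?? false; // omitting confirm behaves like confirm: false
  if (confirm) store.set(args.key, args.value);
  return { key: args.key, before, after: args.value, written: confirm };
}
```

A client can show the `before`/`after` preview to the user first and repeat the same call with `confirm: true` only after approval.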

MCP API notes

Planpong's MCP tools are designed to be safe under retries, duplicated calls, and orchestrator restarts:

planpong_revise and planpong_record_revision require expected_round. Pass the round number returned by the most recent planpong_get_feedback. A stale call (expected_round lower than the session's current round) or an out-of-order call (expected_round higher) returns a specific error instead of triggering a duplicate planner revision.
Tool calls are replay-safe. Calling planpong_get_feedback twice before the round's revision returns the existing feedback with idempotent_replay: true instead of re-invoking the reviewer. The same applies to planpong_revise and planpong_record_revision when the round's response artifact already exists.
Per-session lock. Mutating MCP tools acquire an exclusive lock at .planpong/sessions/<id>/lock so two overlapping clients cannot both advance the same session.
Reviewer findings carry an evidence flag. Each issue in planpong_get_feedback may include a quoted_text field (a short verbatim quote from the plan) and a verified: true | false flag set by planpong post-parse. verified: false means the quote could not be located in the plan — usually a hallucinated or paraphrased finding. Planners should deprioritize unverified issues. The response also includes unverified_count for quick triage.
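For anyone writing an MCP client, the `expected_round` rules above reduce to a three-way comparison plus a replay check. This sketch is hypothetical (`checkRevisionCall` and `RoundCheck` are illustrative names) and assumes the session's current round is the one returned by the latest `planpong_get_feedback`.

```typescript
// Hypothetical sketch of the expected_round guard for planpong_revise and
// planpong_record_revision. checkRevisionCall and RoundCheck are illustrative names.
type RoundCheck =
  | { kind: "ok" }     // first call for this round: do the work
  | { kind: "replay" } // this round's response already exists: return it, do not re-run
  | { kind: "stale"; expected: number; current: number }
  | { kind: "out_of_order"; expected: number; current: number };

function checkRevisionCall(
  currentRound: number,    // round returned by the latest planpong_get_feedback
  expectedRound: number,   // expected_round sent by the client
  revisionExists: boolean, // whether this round's response artifact already exists
): RoundCheck {
  if (expectedRound < currentRound) {
    return { kind: "stale", expected: expectedRound, current: currentRound };
  }
  if (expectedRound > currentRound) {
    return { kind: "out_of_order", expected: expectedRound, current: currentRound };
  }
  return revisionExists ? { kind: "replay" } : { kind: "ok" };
}
```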

Most users driving planpong through Claude Code never see these primitives — the slash commands and the orchestrator's instructions handle them. They matter if you're building an external MCP client.
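For client authors, the evidence flag described in the list above can be approximated with a normalized substring search. This is a hypothetical sketch: the whitespace normalization, the choice to leave quote-less issues unflagged, and the `verifyIssues` name are assumptions, and planpong's actual matching may be stricter or looser.

```typescript
// Hypothetical sketch of post-parse evidence checking; matching rules are assumptions.
interface ReviewerIssue { id: string; description: string; quoted_text?: string; verified?: boolean; }

// Collapse runs of whitespace so line wrapping in the plan does not defeat the match.
const normalize = (s: string): string => s.replace(/\s+/g, " ").trim();

function verifyIssues(plan: string, issues: ReviewerIssue[]) {
  const haystack = normalize(plan);
  const checked = issues.map((issue) => {
    if (issue.quoted_text === undefined) return issue; // no quote, nothing to verify
    const quote = normalize(issue.quoted_text);
    return { ...issue, verified: quote.length > 0 && haystack.includes(quote) };
  });
  const unverified_count = checked.filter((i) => i.verified === false).length;
  return { issues: checked, unverified_count };
}
```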

What it produces

Planpong updates your plan file in place and adds a status line that tracks the review:

**planpong:** R3/10 | claude → codex | 2P2 1P3 → 1P3 → 0 | Accepted: 4 | +32/-8 lines | 5m 23s | Approved after 3 rounds

Reading left to right: round 3 of 10, claude planned / codex reviewed, issue trajectory across rounds, total accepted issues, line delta from original, elapsed time, and outcome.
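The status line can be rebuilt from the fields listed above. A hypothetical formatter follows; it assumes "2P2 1P3" means two P2-severity issues and one P3-severity issue in that round, an interpretation the format suggests but this README does not define.

```typescript
// Hypothetical reconstruction of the status-line format; field names are illustrative.
// Assumes "2P2 1P3" means two P2-severity and one P3-severity issue in that round.
interface StatusInput {
  round: number;
  maxRounds: number;
  planner: string;
  reviewer: string;
  issuesPerRound: Record<string, number>[]; // e.g. [{ P2: 2, P3: 1 }, { P3: 1 }, {}]
  accepted: number;
  linesAdded: number;
  linesRemoved: number;
  elapsedSeconds: number;
  outcome: string;
}

// One round's counts, severities in natural order; a round with no issues prints "0".
function formatIssues(counts: Record<string, number>): string {
  const parts = Object.keys(counts)
    .filter((sev) => counts[sev] > 0)
    .sort((a, b) => a.localeCompare(b, undefined, { numeric: true }))
    .map((sev) => `${counts[sev]}${sev}`);
  return parts.length > 0 ? parts.join(" ") : "0";
}

function formatStatusLine(s: StatusInput): string {
  const minutes = Math.floor(s.elapsedSeconds / 60);
  const seconds = s.elapsedSeconds % 60;
  return [
    `**planpong:** R${s.round}/${s.maxRounds}`,
    `${s.planner} → ${s.reviewer}`,
    s.issuesPerRound.map(formatIssues).join(" → "),
    `Accepted: ${s.accepted}`,
    `+${s.linesAdded}/-${s.linesRemoved} lines`,
    `${minutes}m ${seconds}s`,
    s.outcome,
  ].join(" | ");
}
```

Fed round 3 of 10, issue counts `[{ P2: 2, P3: 1 }, { P3: 1 }, {}]`, 4 accepted issues, +32/-8 lines, 323 seconds, and the outcome text, this sketch reproduces the example status line exactly.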

Session data is stored in .planpong/sessions/ (add this directory to your .gitignore).

Development

git clone https://github.com/andrewhml/planpong.git
cd planpong
npm install        # installs deps + configures git hooks
npm run build      # compile TypeScript
npm run typecheck  # type-check without emitting

A pre-commit hook automatically rebuilds dist/ when TypeScript files are staged.

Publishing

Automated via GitHub Actions with npm trusted publishing (OIDC). No tokens needed.

npm version patch   # bumps version + creates git tag
git push && git push --tags   # triggers publish to npm

License

MIT