debate-mcp

v1.0.0

Published

2 months ago

MCP server that stress-tests your decisions with adversarial AI debate. GPT vs Gemini, Skeptic vs Steelman, grounded in web search.


lumiwealth


Debate MCP

Stress-test your decisions before you commit. An MCP server that runs adversarial AI debates between frontier models, grounded in live web search.

Most AI tools optimize for consensus. Debate MCP optimizes for finding where your plan breaks.

How It Works

You describe your plan
        |
        v
  [Web Search] -- gathers current facts, laws, regulations
        |
        v
  +-----------+          +-----------+
  |  SKEPTIC  |          | STEELMAN  |
  |  (GPT)    |          | (Gemini)  |
  |           |          |           |
  | Attacks   |          | Finds the |
  | your plan |          | strongest |
  | ruthlessly|          | version,  |
  |           |          | then      |
  |           |          | stress-   |
  |           |          | tests it  |
  +-----------+          +-----------+
        |    Round 2: they     |
        |    read each other   |
        |    (anonymized) and  |
        +--- argue back -------+
                  |
                  v
        [Structured synthesis]
        Recommendation + Crux +
        What Would Falsify +
        Unresolved disagreements
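
The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the server's actual code: `Analyst`, `crossExamPrompt`, `runDebate`, and the prompt wording are all hypothetical, and the model and search calls are synchronous stand-ins so the sketch runs without network access (real calls would be asynchronous).

```typescript
// Hypothetical stand-in for a model call; the role (Skeptic or Steelman)
// is assumed to be baked into whichever function is passed in.
type Analyst = (prompt: string) => string;

interface DebateTranscript {
  evidence: string;
  round1: { skeptic: string; steelman: string };
  round2: { skeptic: string; steelman: string };
}

// Round 2 prompt: restate the plan, then show the opposing analysis
// without naming the model that wrote it.
function crossExamPrompt(plan: string, evidence: string, opposing: string): string {
  return [
    `Plan under debate: ${plan}`,
    `Evidence from web search: ${evidence}`,
    `Another analyst wrote: ${opposing}`,
    "Concede what is right and argue back where you disagree.",
  ].join("\n");
}

function runDebate(
  plan: string,
  search: (query: string) => string,
  skeptic: Analyst,
  steelman: Analyst,
): DebateTranscript {
  // Step 1: gather current facts before either analyst speaks.
  const evidence = search(plan);
  const opening = `Plan under debate: ${plan}\nEvidence from web search: ${evidence}`;

  // Step 2: both analysts answer independently.
  const round1 = { skeptic: skeptic(opening), steelman: steelman(opening) };

  // Step 3: each analyst reads the other's anonymized round 1 answer.
  const round2 = {
    skeptic: skeptic(crossExamPrompt(plan, evidence, round1.steelman)),
    steelman: steelman(crossExamPrompt(plan, evidence, round1.skeptic)),
  };

  // The transcript is what a synthesis step would then summarize.
  return { evidence, round1, round2 };
}
```

Passing the analysts in as functions keeps the round structure separate from any particular provider, which mirrors how the roles can be pointed at different models.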

Quick Start

1. Install

npx debate-mcp

2. Add to Claude Code

claude mcp add debate npx debate-mcp \
  -e OPENAI_API_KEY=sk-... \
  -e GEMINI_API_KEY=AI...

3. Use it

Just tell Claude: "debate this", "what am I missing", "stress-test this plan", or "is this the right call".

> [!TIP]
> You can also trigger it with `domain` and `current_leaning` for targeted debates: "Debate this as a tax attorney. I'm leaning toward electing S-Corp."

What Makes This Different

| Feature | Why it matters |
|---------|----------------|
| Asymmetric roles | One model attacks (Skeptic), one defends then stress-tests (Steelman). Research shows this outperforms giving both models the same prompt. |
| Anonymized cross-examination | In Round 2, models see each other's work labeled "another analyst" to prevent identity bias. Based on NeurIPS 2025 research. |
| Web search grounding | Before the debate, the server searches for current facts, laws, and regulations. Both models receive this as VERIFIED evidence and must flag ungrounded claims as UNVERIFIED. |
| Confirmation bias attack | Tell it what you're leaning toward. The Skeptic will specifically attack that leaning. |
| Domain expertise | Pass domain: "tax attorney" or "systems architect" to make both analysts domain-specific. |
| Constrained synthesis | The output forces a structured format: Recommendation, Crux of Disagreement, What Would Falsify, Risk of Acting vs Waiting. Prevents AI from smoothing real disagreements into false consensus. |
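
To make the role asymmetry, evidence labeling, and leaning attack concrete, here is a rough sketch of how a per-role prompt might be assembled. The function `buildRolePrompt`, the `RoleInput` shape, and all prompt wording are assumptions made for illustration; the server's real prompts are not shown in this README.

```typescript
type Role = "skeptic" | "steelman";

// Hypothetical input shape for this sketch only.
interface RoleInput {
  context: string;
  evidence: string[];
  domain?: string;
  currentLeaning?: string;
}

function buildRolePrompt(role: Role, input: RoleInput): string {
  // Domain expertise applies to both analysts.
  const persona = input.domain
    ? `Act as a domain expert: ${input.domain}.`
    : "Act as a careful analyst.";

  // Asymmetric roles: each analyst gets a different task.
  const task = role === "skeptic"
    ? "Attack this plan and find where it breaks."
    : "Build the strongest version of this plan, then stress-test it.";

  const lines = [persona, task, `Plan: ${input.context}`];

  // Confirmation bias attack: only the Skeptic is pointed at the leaning.
  if (role === "skeptic" && input.currentLeaning) {
    lines.push(`The author is leaning toward: ${input.currentLeaning}. Attack that leaning specifically.`);
  }

  // Web search grounding: facts arrive pre-labeled, everything else must be flagged.
  for (const fact of input.evidence) {
    lines.push(`VERIFIED: ${fact}`);
  }
  lines.push("Label any claim not supported by the evidence above as UNVERIFIED.");

  return lines.join("\n");
}
```

Both roles receive the same evidence block, so any disagreement comes from the assigned task rather than from different facts.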

Example

Input: "Should we elect S-Corp status? Net profit $40K, based in NYC."

Domain: tax attorney

Current leaning: "I think S-Corp will save on self-employment tax"

What happens:

1. Web search pulls current NYC tax rates, QBI rules, IRS thresholds
2. Skeptic leads with: "At $40K net profit in NYC, S-Corp election is mathematically guaranteed to lose you money" and explains exactly why
3. Steelman finds the strongest case for S-Corp, then stress-tests it against NYC-specific tax penalties
4. Cross-examination: Skeptic concedes the QBI interaction point, Steelman concedes the compliance cost erasure
5. Synthesis: Don't elect. Here's the specific profit threshold where it flips.
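
A client that consumes the synthesis could verify that none of the structured sections was dropped. The sketch below assumes the section names listed under What Makes This Different appear verbatim in the output text; the exact headings the server emits are not documented here, and `missingSections` is a hypothetical helper.

```typescript
// Section names taken from the constrained synthesis description.
// Whether the server emits them verbatim is an assumption.
const REQUIRED_SECTIONS: string[] = [
  "Recommendation",
  "Crux of Disagreement",
  "What Would Falsify",
  "Risk of Acting vs Waiting",
];

// Returns the required sections that do not appear in the synthesis text.
function missingSections(synthesis: string): string[] {
  const lower = synthesis.toLowerCase();
  return REQUIRED_SECTIONS.filter(
    (name) => lower.indexOf(name.toLowerCase()) === -1,
  );
}
```

An empty result means every section is present; a non-empty one suggests the answer was smoothed into a summary and may be worth re-running.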

Configuration

Environment Variables

Required (at minimum):

| Variable | Description |
|----------|-------------|
| OPENAI_API_KEY | API key for the Skeptic model (OpenAI by default) |
| GEMINI_API_KEY | API key for the Steelman model (Gemini by default) |

Model configuration:

| Variable | Default | Description |
|----------|---------|-------------|
| SKEPTIC_MODEL | gpt-5.4 | Model for the Skeptic role |
| SKEPTIC_BASE_URL | OpenAI default | Base URL for the Skeptic API (change to use Grok, Groq, Mistral, etc.) |
| STEELMAN_MODEL | gemini-3.1-pro-preview | Model for the Steelman role |
| STEELMAN_PROVIDER | gemini | Set to openai to use any OpenAI-compatible API for Steelman |
| STEELMAN_BASE_URL | - | Base URL when using STEELMAN_PROVIDER=openai |
| STEELMAN_API_KEY | Falls back to GEMINI_API_KEY | API key when using STEELMAN_PROVIDER=openai |
| CALL_TIMEOUT_MS | 90000 | Timeout per API call (ms) |
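
The defaults and fallbacks in the table can be read as the following resolution logic. This is a sketch of one plausible interpretation, not the server's implementation: `resolveConfig` and `ResolvedConfig` are hypothetical names, and the handling of an invalid `CALL_TIMEOUT_MS` (falling back to the default) is an assumption.

```typescript
type Env = Record<string, string | undefined>;

interface ResolvedConfig {
  skeptic: { model: string; baseUrl: string | undefined; apiKey: string | undefined };
  steelman: {
    provider: string;
    model: string;
    baseUrl: string | undefined;
    apiKey: string | undefined;
  };
  callTimeoutMs: number;
}

function resolveConfig(env: Env): ResolvedConfig {
  const provider = env["STEELMAN_PROVIDER"] ?? "gemini";
  const usesOpenAiApi = provider === "openai";
  const timeout = Number(env["CALL_TIMEOUT_MS"]);

  return {
    skeptic: {
      model: env["SKEPTIC_MODEL"] ?? "gpt-5.4",
      // undefined here stands for "use the OpenAI default endpoint".
      baseUrl: env["SKEPTIC_BASE_URL"],
      apiKey: env["OPENAI_API_KEY"],
    },
    steelman: {
      provider,
      model: env["STEELMAN_MODEL"] ?? "gemini-3.1-pro-preview",
      // The base URL only applies when an OpenAI-compatible API is selected.
      baseUrl: usesOpenAiApi ? env["STEELMAN_BASE_URL"] : undefined,
      // STEELMAN_API_KEY falls back to GEMINI_API_KEY, as the table states.
      apiKey: usesOpenAiApi
        ? (env["STEELMAN_API_KEY"] ?? env["GEMINI_API_KEY"])
        : env["GEMINI_API_KEY"],
    },
    // Assumption: a missing or non-numeric timeout falls back to 90000 ms.
    callTimeoutMs: isFinite(timeout) && timeout > 0 ? timeout : 90000,
  };
}
```

In a Node process this would be called as `resolveConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.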

Use Any Model Provider

The Skeptic role works with any OpenAI-compatible API out of the box. Just change the base URL:

# Grok (xAI)
SKEPTIC_BASE_URL=https://api.x.ai/v1 SKEPTIC_MODEL=grok-3 OPENAI_API_KEY=xai-...

# Groq
SKEPTIC_BASE_URL=https://api.groq.com/openai/v1 SKEPTIC_MODEL=llama-4-scout OPENAI_API_KEY=gsk_...

# Ollama (local, free)
SKEPTIC_BASE_URL=http://localhost:11434/v1 SKEPTIC_MODEL=llama3 OPENAI_API_KEY=ollama

# Mistral
SKEPTIC_BASE_URL=https://api.mistral.ai/v1 SKEPTIC_MODEL=mistral-large OPENAI_API_KEY=...

The Steelman role uses Gemini by default (for Google Search grounding). To use a different provider, set STEELMAN_PROVIDER=openai and configure the base URL.

MCP Configuration (`.mcp.json`)

{
  "mcpServers": {
    "debate": {
      "command": "npx",
      "args": ["-y", "debate-mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-...",
        "GEMINI_API_KEY": "AI..."
      }
    }
  }
}

> [!NOTE]
> Bring your own API keys. Debate MCP calls OpenAI and Google APIs directly. You are responsible for your own API usage and costs. A typical debate uses ~20,000-30,000 tokens across both providers.

Tool Parameters

| Parameter | Required | Description |
|-----------|----------|-------------|
| context | Yes | The plan, decision, or situation to debate. Include all relevant details. |
| question | No | Specific question to focus the debate on. |
| domain | No | Domain expertise: "tax attorney", "systems architect", "financial advisor", etc. |
| current_leaning | No | What you're leaning toward. The Skeptic attacks this to counter confirmation bias. |
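
For callers that build the arguments themselves, the table maps onto a small typed shape. The sketch below is illustrative: the `DebateArgs` interface and `parseDebateArgs` helper are hypothetical names, the assumption that every parameter is a string is inferred from the descriptions, and the example values come from the S-Corp example above.

```typescript
// Parameter names follow the table; all values are assumed to be strings.
interface DebateArgs {
  context: string;
  question?: string;
  domain?: string;
  current_leaning?: string;
}

// Validates loosely typed input before it is sent as tool arguments.
function parseDebateArgs(raw: Record<string, unknown>): DebateArgs {
  const context = raw["context"];
  if (typeof context !== "string" || context.trim() === "") {
    throw new Error("context is required and must be a non-empty string");
  }

  const args: DebateArgs = { context };
  for (const key of ["question", "domain", "current_leaning"] as const) {
    const value = raw[key];
    if (value === undefined) continue; // optional parameter left out
    if (typeof value !== "string") {
      throw new Error(`${key} must be a string`);
    }
    args[key] = value;
  }
  return args;
}

// Arguments matching the S-Corp example.
const exampleArgs = parseDebateArgs({
  context: "Should we elect S-Corp status? Net profit $40K, based in NYC.",
  domain: "tax attorney",
  current_leaning: "I think S-Corp will save on self-employment tax",
});
```

Only `context` is mandatory, so a call with nothing else is still valid; adding `current_leaning` is what directs the Skeptic at your existing bias.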

The Research Behind It

Debate MCP's design draws on published research on multi-agent debate and on established decision-making frameworks:

- Asymmetric roles outperform identical prompts ("Peacemaker or Troublemaker: How Sycophancy Shapes Multi-Agent Debate", 2025)
- Anonymized cross-examination prevents identity bias ("When Identity Skews Debate", NeurIPS 2025)
- Steelmanning before disagreeing forces genuine engagement (Kahneman's Adversarial Collaboration framework)
- Re-stating the original question each round prevents context drift ("Talk Isn't Always Cheap", ICML 2025)
- Caller-model synthesis avoids positional commitment bias from debaters ("Auditing Multi-Agent LLM Reasoning Trees", 2025)
- Ray Dalio's triangulation method: get independent expert opinions, map convergence and divergence, then decide

When To Use It

Good for: Taxes, legal decisions, financial planning, business strategy, architecture choices, investment analysis, contract terms, hiring decisions, production deployments.

Not for: Simple coding tasks, quick lookups, routine bug fixes, or questions with obvious answers.

License

MIT