claude-skill-debate
v2.0.0
Claude vs GPT multi-turn AI debate skill for Claude Code with cross fact-checking and HTML report generation
The best way to stress-test an idea is to have two smart opponents try to destroy each other's arguments, then let a judge decide.
Most AI analysis gives you one perspective. You nod along, never knowing what it missed. Debate forces both sides to attack, defend, and cite evidence, while a neutral reporter picks the winner.
No false balance. No "both sides have a point." Someone wins.
Why
Single AI analysis:
"Here's my take" → You accept it → Confirmation bias
Debate:
Claude: "Here's why I'm right"
GPT: "Here's why you're wrong"
Claude: "Your data is cherry-picked, here's the full picture"
GPT: "Your model assumes 20% CAGR, Microsoft only did 12%"
Judge: "GPT wins 29.5 to 27.5. Here's why."

Two adversarial AIs will find flaws that a single AI never surfaces. The cross fact-check phase catches bad numbers, misleading citations, and logical gaps.
Install
```shell
npx claude-skill-debate
```
That's it. One command. The skill is copied to ~/.claude/skills/debate/ and ready to use.
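To sanity-check that the skill file landed where Claude Code expects it, you can test for the file. A minimal sketch; it uses a scratch directory in place of `$HOME` so it runs anywhere, and the file it creates is a stand-in for what the installer copies:

```shell
# Sanity-check sketch: is the skill file where Claude Code looks for it?
# SKILL_HOME is a scratch dir standing in for "$HOME" so the example is self-contained.
SKILL_HOME="$(mktemp -d)"
mkdir -p "$SKILL_HOME/.claude/skills/debate"
echo "# debate skill" > "$SKILL_HOME/.claude/skills/debate/SKILL.md"  # stand-in for the installed file

if [ -f "$SKILL_HOME/.claude/skills/debate/SKILL.md" ]; then
  echo "debate skill installed"
else
  echo "skill missing: re-run the installer"
fi
```

On a real machine, drop the scratch setup and test `"$HOME/.claude/skills/debate/SKILL.md"` directly.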
Or install manually:
```shell
git clone https://github.com/YunseobShin/claude-skill-debate.git
mkdir -p ~/.claude/skills/debate
cp claude-skill-debate/SKILL.md ~/.claude/skills/debate/SKILL.md
```
Usage
Inside a Claude Code session:
```
/debate Will AI replace software engineers?
/debate React vs Vue for new projects in 2026
/debate Tesla stock price outlook
/debate Should central banks adopt CBDCs?
```
Natural language also triggers it:
"Have two AIs debate this"
"Make Claude and GPT argue about this"The Arena
Five phases. Two combatants. One judge.
```
Phase 0  PREPARATION
         Topic → 3-5 key arguments → debate rules
──────────────────────────────────────────
Phase 1  INDEPENDENT ANALYSIS (parallel)
         Claude ──┐
                  ├──→ 500-800 word analysis each
         GPT ────┘
──────────────────────────────────────────
Phase 2  FREE DEBATE (up to 5 rounds)
         Round 1: Claude attacks → GPT counters
         Round 2: GPT attacks → Claude counters
         ...
         Convergence check: no new arguments → early exit
──────────────────────────────────────────
Phase 3  CROSS FACT-CHECK
         Claude → verifies GPT's top 3 claims
         GPT → verifies Claude's top 3 claims
         Verdict: [Confirmed / Refuted / Unverified]
──────────────────────────────────────────
Phase 4  VERDICT REPORT
         Neutral Reporter agent reads full transcript
         → HTML report (Tailwind CSS, dark mode)
         → Clear winner, not "both have merits"
```
Why Cross Fact-Check Matters
In our first real debate (Palantir stock outlook), fact-checking caught:
- Claude citing Q4 revenue growth as +70% (actual: +36%, it mixed up adjusted vs GAAP)
- GPT citing commercial revenue growth as +109% (actual: +60% global, +109% was US-only)
Both sides had compelling arguments. But one side had more wrong numbers. The fact-check changed the verdict.
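Because the Phase 3 verdict tags are plain text in the factcheck files, they are easy to tally with standard tools. A hedged sketch; the file contents below are invented for illustration, while real files live under the run directory described in the Output section:

```shell
# Tally fact-check verdicts from a run's factcheck files.
# RUN and the file body are fabricated for the demo; the [Confirmed/Refuted/Unverified]
# tags mirror the Phase 3 verdict format.
RUN="$(mktemp -d)"
cat > "$RUN/factcheck_by_claude.md" <<'EOF'
Claim 1: Q4 revenue grew +70%. Verdict: [Refuted]
Claim 2: US commercial revenue grew +109%. Verdict: [Confirmed]
Claim 3: Margin guidance was raised. Verdict: [Unverified]
EOF

for v in Confirmed Refuted Unverified; do
  n="$(grep -c "\[$v\]" "$RUN"/factcheck_by_*.md)"
  echo "$v: $n"
done
```

With the sample file above, each verdict tallies to 1; pointing `RUN` at a real debate directory tallies both sides' checks at once.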
Output
All artifacts saved to /tmp/debate/YYYYMMDD_HHMMSS/:
| File | Contents |
|:-----|:---------|
| topic.md | Topic & key arguments |
| rules.md | Debate rules |
| analysis_claude.md | Claude's independent analysis |
| analysis_codex.md | GPT's independent analysis |
| round_N_claude.md | Claude's round N statement |
| round_N_codex.md | GPT's round N statement |
| factcheck_by_claude.md | Claude fact-checks GPT |
| factcheck_by_codex.md | GPT fact-checks Claude |
| report.html | Final HTML verdict report |
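Since /tmp is cleared on reboot, it is worth archiving a run you care about. A sketch that copies the most recent run; `SRC_ROOT`, `DEST`, and the timestamped directory are scratch stand-ins for the real /tmp/debate layout:

```shell
# Archive the newest debate run before /tmp is cleared.
# SRC_ROOT and DEST are scratch dirs standing in for /tmp/debate and a permanent
# archive location; the run directory is fabricated for the demo.
SRC_ROOT="$(mktemp -d)"
DEST="$(mktemp -d)"
mkdir -p "$SRC_ROOT/20260101_120000"
touch "$SRC_ROOT/20260101_120000/report.html"

latest="$(ls -1d "$SRC_ROOT"/*/ | sort | tail -n 1)"  # newest by timestamped name
latest="${latest%/}"                                  # strip trailing slash for portable cp
cp -r "$latest" "$DEST/"
echo "archived $(basename "$latest")"
```

The YYYYMMDD_HHMMSS naming means a plain lexical `sort` is also chronological, so `tail -n 1` picks the latest run.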
The Report
The HTML report has 8 sections:
- Header with topic, date, participants
- Background (200 words max)
- Key issues in card layout
- Debate highlights with per-round quotes
- Fact-check results table (claim / verdict / evidence)
- Final verdict with clear winner and scoring
- Conclusion & takeaways
- Disclaimer
The report auto-opens in your browser on completion.
Prerequisites
| Requirement | Required | Notes |
|:---|:---:|:---|
| Claude Code | Yes | CLI installed and authenticated |
| OpenAI Codex CLI | Yes | npm i -g @openai/codex |
| Codex MCP Server | No | Enables multi-turn threading (recommended) |
Codex Setup
```shell
# Install
npm install -g @openai/codex

# Authenticate (ChatGPT Plus or Pro Plan)
codex auth login
```
Add to ~/.claude/settings.json:
```json
{
  "mcpServers": {
    "codex": {
      "command": "codex",
      "args": ["--full-auto", "mcp"]
    }
  }
}
```
Without MCP, the skill falls back to CLI mode automatically.
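If you prefer to script the config step, the JSON above can be written out and validated before Claude Code reads it. A sketch using a scratch file; the real path is ~/.claude/settings.json, and note that writing a whole file would clobber existing settings, so merge by hand if you already have some:

```shell
# Write the Codex MCP entry and confirm it parses as JSON.
# SETTINGS is a scratch file standing in for ~/.claude/settings.json.
# Caution: this writes a complete file; merge manually into existing settings.
SETTINGS="$(mktemp)"
cat > "$SETTINGS" <<'EOF'
{
  "mcpServers": {
    "codex": {
      "command": "codex",
      "args": ["--full-auto", "mcp"]
    }
  }
}
EOF

python3 -m json.tool "$SETTINGS" > /dev/null && echo "settings JSON is valid"
```

A trailing comma or missing brace here fails silently at runtime, so a parse check before launching Claude Code is cheap insurance.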
Design Principles
| Principle | Implementation |
|:----------|:---------------|
| No false balance | The report must pick a winner based on evidence |
| Cross-verification | Both sides fact-check the opponent's top 3 claims |
| Early termination | Debate stops when arguments start repeating |
| Parallel execution | Independent phases run concurrently |
| Adversarial by design | Each side is prompted to attack, not agree |
Timing
Typically 3-5 minutes depending on round count. MCP mode is slightly faster than CLI fallback.
Notes
- Debate logs live in /tmp and are lost on reboot. Copy them if you need to keep them.
- Investment/stock debates are for reference only, not financial advice.
- Codex CLI works with ChatGPT Plus or Pro Plan auth. API key usage is billed separately.
