arena-mcp
v0.1.3
Multi-agent AI arena for debates, code reviews, and red-team challenges via MCP
Arena MCP
```
 █████╗ ██████╗ ███████╗███╗   ██╗ █████╗
██╔══██╗██╔══██╗██╔════╝████╗  ██║██╔══██╗
███████║██████╔╝█████╗  ██╔██╗ ██║███████║
██╔══██║██╔══██╗██╔══╝  ██║╚██╗██║██╔══██║
██║  ██║██║  ██║███████╗██║ ╚████║██║  ██║
╚═╝  ╚═╝╚═╝  ╚═╝╚══════╝╚═╝  ╚═══╝╚═╝  ╚═╝
```

A Model Context Protocol (MCP) server that enables multi-agent AI competitions and collaborations. Run debates, code reviews, red-team challenges, and evaluations across different AI models (Claude, OpenAI, Gemini, Codex).
Features
🎭 arena_debate
Multi-agent debates where AI agents argue different positions across multiple rounds.
- Assign specific positions to each agent
- Sequential or parallel execution modes
- Full conversation history tracking
🔍 arena_review
Parallel code reviews from multiple AI perspectives.
- Focus areas: bugs, security, performance, or comprehensive review
- Support for git refs, file lists, patches, and raw code
- JSON or prose output formats
⚔️ arena_challenge
Red-team style challenges where multiple agents attack an assertion.
- Optional defender agent to protect the assertion
- Multi-round adversarial testing
- Find edge cases and counterexamples
⚖️ arena_judge
Impartial evaluation of completed arena sessions.
- Score each agent's performance
- Identify strengths, weaknesses, and consensus
- Custom evaluation criteria
🏥 arena_health
Health check for all registered AI agent CLIs.
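The Usage Examples section below does not cover arena_health, so here is a minimal call in the same style as the other tool examples. No parameters are documented for this tool; passing an empty object is an assumption.

```js
// Check which registered agent CLIs are installed and responding
arena_health({})
```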
Installation
Prerequisites
Install the AI CLI tools you want to use:
```shell
# Claude CLI (required for the claude agent)
npm install -g @anthropic-ai/claude-cli

# Codex CLI (required for the codex agent)
npm install -g @codex-ai/cli

# Note: the OpenAI and Gemini agents use the codex CLI as an adapter,
# so no separate openai-cli or gemini-cli installation is needed.
```

Install Arena MCP
From npm (Recommended)
```shell
# Run directly (no install needed)
npx arena-mcp

# Or install globally
npm install -g arena-mcp
```

From Source
```shell
git clone https://github.com/tim101010101/arena.git
cd arena
bun install
bun run build
bun install -g .
```

Configure MCP Client
Arena MCP works with any MCP-compatible client. Configuration examples:
Claude Desktop
Edit the configuration file:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
- Linux: `~/.config/Claude/claude_desktop_config.json`
Or use: Settings > Developer > Edit Config
```json
{
  "mcpServers": {
    "arena": {
      "command": "npx",
      "args": ["-y", "arena-mcp"],
      "env": {
        "ARENA_TIMEOUT_MS": "120000",
        "ARENA_DEFAULT_ROUNDS": "3",
        "ARENA_DEFAULT_MODE": "parallel"
      }
    }
  }
}
```

Claude Code CLI
Use the CLI command to add the MCP server:
```shell
# If installed globally
claude mcp add arena arena-mcp

# Or use npx
claude mcp add arena npx arena-mcp
```

To configure environment variables, edit your Claude Code config file manually:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
- Linux: `~/.config/Claude/claude_desktop_config.json`
Configuration format is the same as Claude Desktop.
Other MCP Clients
For other MCP clients, refer to their documentation for MCP server configuration. The server command is `arena`, and configuration is done via environment variables (see the Configuration section below).
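As a concrete starting point, a generic stdio-based server entry might look like the following. Exact field names vary by client; the env values here are the documented defaults.

```json
{
  "command": "arena",
  "env": {
    "ARENA_TIMEOUT_MS": "120000",
    "ARENA_DEFAULT_ROUNDS": "3",
    "ARENA_DEFAULT_MODE": "parallel"
  }
}
```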
Restart your MCP client to load the server.
Configuration
All configuration is done through environment variables in the MCP client configuration. Available options:
| Variable | Description | Default | Valid Range |
|----------|-------------|---------|-------------|
| ARENA_TIMEOUT_MS | Agent execution timeout (milliseconds) | 120000 | 1000-600000 |
| ARENA_DEFAULT_ROUNDS | Default rounds for debates/challenges | 3 | 1-10 |
| ARENA_DEFAULT_MODE | Execution mode | parallel | sequential, parallel |
| ARENA_MAX_CONTEXT_SIZE | Maximum context size | 1000000 | 100000-10000000 |
| ARENA_CLAUDE_MODEL | Claude model override | (CLI default) | See Claude CLI docs for current models |
| ARENA_CODEX_MODEL | Codex model override | (CLI default) | See Codex CLI docs for current models |
| ARENA_GEMINI_MODEL | Gemini model override | (CLI default) | See Gemini docs for current models |
| ARENA_OPENAI_MODEL | OpenAI model override | (CLI default) | See OpenAI docs for current models |
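The Valid Range column implies the server validates numeric variables. The following is a hedged sketch of that kind of check, not the actual implementation; `readIntEnv` and its behavior on out-of-range values are assumptions for illustration.

```typescript
// Hypothetical validation mirroring the Valid Range column above.
function readIntEnv(name: string, fallback: number, min: number, max: number): number {
  const raw = process.env[name];
  if (raw === undefined) return fallback; // unset: use the documented default
  const value = Number.parseInt(raw, 10);
  if (Number.isNaN(value) || value < min || value > max) {
    throw new Error(`${name} must be an integer between ${min} and ${max}`);
  }
  return value;
}

// e.g. ARENA_TIMEOUT_MS: default 120000, valid range 1000-600000
const timeoutMs = readIntEnv("ARENA_TIMEOUT_MS", 120000, 1000, 600000);
```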
Example Configuration
```json
{
  "mcpServers": {
    "arena": {
      "command": "arena",
      "env": {
        "ARENA_TIMEOUT_MS": "180000",
        "ARENA_DEFAULT_ROUNDS": "5",
        "ARENA_DEFAULT_MODE": "sequential"
      }
    }
  }
}
```

Troubleshooting
- Build fails: Ensure `bun` is installed (`curl -fsSL https://bun.sh/install | bash`)
- Tools not appearing: Restart your MCP client after config changes
- Agent CLI not found: Install the required CLI tools (see Prerequisites)
- Invalid config: Check the error message for validation details
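For the "Agent CLI not found" case, a quick way to see which agent CLIs are actually on your PATH (CLI names taken from the Prerequisites section):

```shell
#!/bin/sh
# Report whether a given agent CLI is installed and on PATH.
check_cli() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# CLI names from the Prerequisites section.
check_cli claude
check_cli codex
```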
Usage Examples
Debate: Architecture Decision
```js
// Ask multiple AI agents to debate a technical decision
arena_debate({
  topic: "Should we use microservices or monolith for our new project?",
  agents: ["claude", "openai", "gemini"],
  positions: {
    "claude": "Advocate for microservices architecture",
    "openai": "Advocate for monolithic architecture",
    "gemini": "Neutral evaluator focusing on trade-offs"
  },
  rounds: 3,
  context: "Team size: 5 developers, Expected scale: 10k users in year 1",
  mode: "sequential"
})
```

Code Review: Multiple Perspectives
```js
// Get code reviews from multiple AI agents
arena_review({
  sources: [{
    type: "git_ref",
    ref: "feature/new-auth",
    root: "/path/to/repo"
  }],
  agents: ["claude", "codex", "openai"],
  focus: "security",
  output_format: "json"
})
```

Challenge: Security Assertion
```js
// Red-team test a security claim
arena_challenge({
  assertion: "Our authentication system is immune to timing attacks",
  evidence: "We use constant-time comparison for all password checks",
  challengers: ["claude", "codex"],
  defender: "openai",
  rounds: 2,
  context: "Node.js backend with bcrypt password hashing"
})
```

Judge: Evaluate Debate
```js
// Have a neutral agent evaluate the debate
arena_judge({
  session_id: "debate_abc123",
  judge: "gemini",
  criteria: ["evidence quality", "logical coherence", "practical feasibility"]
})
```

Architecture
```
src/
├── index.ts          # MCP server entry point
├── types.ts          # Zod schemas and TypeScript types
├── orchestrator.ts   # Multi-agent execution orchestration
├── session.ts        # Session management and history
├── context.ts        # Code context acquisition (git, files)
├── prompts.ts        # System and user prompts for each mode
├── output.ts         # Response formatting
├── utils.ts          # Utilities (timeout, env, binary checks)
├── constants.ts      # Configuration constants
└── adapters/
    ├── base.ts       # AgentAdapter interface
    ├── registry.ts   # Adapter registry
    ├── claude.ts     # Claude CLI adapter
    ├── codex.ts      # Codex CLI adapter
    ├── openai.ts     # OpenAI CLI adapter
    └── gemini.ts     # Gemini CLI adapter
```

Development
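Contributors adding a new provider would start from the AgentAdapter interface in src/adapters/base.ts. Its exact shape is not shown in this README; the sketch below is a guess at what such a contract could look like, with all names and signatures being assumptions.

```typescript
// Hypothetical sketch of the AgentAdapter contract from src/adapters/base.ts.
// Field and method names are guesses based on the file layout above.
interface AgentAdapter {
  name: string;                                              // e.g. "claude", "codex"
  isAvailable(): Promise<boolean>;                           // used by arena_health
  run(prompt: string, opts: { timeoutMs: number }): Promise<string>;
}

// A stub adapter showing how a new provider might plug in.
const echoAdapter: AgentAdapter = {
  name: "echo",
  async isAvailable() { return true; },
  async run(prompt) { return `echo: ${prompt}`; },
};
```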
```shell
# Install dependencies
bun install

# Run tests
bun test

# Build
bun run build

# Start server (for testing)
bun run start
```

Use Cases
1. Code Review Enhancement
Get multiple AI perspectives on code changes to catch more issues and improve code quality.
2. Technical Decision Making
Use structured debates to explore trade-offs and reach better architectural decisions.
3. Security Testing
Red-team your security assumptions with adversarial AI agents.
4. AI Model Comparison
Compare capabilities of different AI models on the same task.
5. Collective Intelligence
Leverage multiple AI agents to solve complex problems that benefit from diverse perspectives.
Limitations
- Requires CLI tools for each AI provider
- API costs scale with number of agents and rounds
- Parallel mode can be expensive for large-scale usage
- Response quality depends on underlying AI models
Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request
License
MIT
Roadmap
- [ ] Web UI for visualizing debates and reviews
- [ ] Result persistence and analytics
- [ ] Support for more AI providers
- [ ] Streaming responses
- [ ] Cost tracking and optimization
- [ ] Custom agent personas and expertise areas
