@mzkoch/agent-arena
v0.2.0
Cross-platform CLI for running competing autonomous coding agents in isolated git worktrees.
Agent Arena
arena is a cross-platform CLI for running multiple autonomous coding agents in parallel, each inside its own git worktree, then monitoring, comparing, and evaluating the results from one place.
It is built with TypeScript, Commander, Ink, node-pty, and Zod, and is designed to work on macOS, Linux, and Windows.
Why Agent Arena?
When you want several AI agents to compete on the same project brief, you usually end up juggling terminals, worktrees, prompts, and ad-hoc scripts. Agent Arena wraps that workflow into one tool:
- isolated git worktrees per variant, created as branches on your own repository
- all arena state contained in a single .arena/ directory
- provider/model aware launch commands
- a live Ink TUI for dashboard and detail views
- headless mode with TCP/NDJSON IPC for monitoring from another terminal
- comparison report generation across variants
Quick Start
Install from npm:
npm install -g @mzkoch/agent-arena
1. Initialize the project
Run once per repository to create the .arena/ directory and add it to .gitignore:
arena init
2. Create an arena
Scaffold a new arena with default config (edit .arena/default/arena.json and .arena/default/requirements.md afterwards):
arena create
Or create a named arena from existing files:
arena create my-experiment --config arena.json --requirements requirements.md
3. Launch agents
Create worktrees and start agents with the TUI:
arena launch
Launch headless, then monitor from another terminal:
arena launch --headless
arena monitor
Diagnostics Logs
Each arena launch writes diagnostics under .arena/<name>/logs/:
- session.jsonl stores structured lifecycle events such as arena.start, agent.spawn, agent.state, agent.exit, agent.complete, agent.fail, and arena.summary
- <variant>.log stores raw PTY output for each agent with an ISO-8601 timestamp prefix on every captured chunk
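The session log can also be read programmatically. The following is a minimal sketch, assuming each line of session.jsonl is one JSON object carrying an `event` field and, for agent events, a `variant` field (the two field names used by the jq and grep commands in this README). The `SessionEvent` type and both function names are hypothetical and not part of the arena API.

```typescript
// Hypothetical event shape: only `event` and `variant` are taken from this README.
interface SessionEvent {
  event: string;
  variant?: string;
  [key: string]: unknown;
}

// Parse NDJSON text (one JSON object per line), skipping blank lines.
function parseSessionLog(text: string): SessionEvent[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as SessionEvent);
}

// Keep events of one type, optionally restricted to a single variant.
function filterEvents(
  events: SessionEvent[],
  event: string,
  variant?: string,
): SessionEvent[] {
  return events.filter(
    (e) => e.event === event && (variant === undefined || e.variant === variant),
  );
}

// Illustrative sample data, not real arena output.
const sampleLog = [
  '{"event":"agent.spawn","variant":"alpha"}',
  '{"event":"agent.complete","variant":"alpha"}',
  '{"event":"agent.complete","variant":"beta"}',
].join("\n");

const completed = filterEvents(parseSessionLog(sampleLog), "agent.complete");
console.log(completed.map((e) => e.variant)); // [ 'alpha', 'beta' ]
```

In a real script the text would come from reading .arena/<name>/logs/session.jsonl from disk.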
Useful commands:
# Watch a variant's live output
tail -f .arena/my-arena/logs/alpha.log
# Browse all session events
jq '.' .arena/my-arena/logs/session.jsonl
# Filter events for a specific variant
grep '"variant":"alpha"' .arena/my-arena/logs/session.jsonl | jq .
# Show only completion events
jq 'select(.event == "agent.complete")' .arena/my-arena/logs/session.jsonl
4. Evaluate and accept
Check structured status:
arena status
Generate a comparison report:
arena evaluate
Accept a winning variant:
arena accept my-experiment copilot-node
5. Clean up
Clean worktrees and branches (with safety checks for unmerged work):
arena clean
arena clean --force # skip unmerged work checks and force remote deletion
arena clean --keep-config # keep arena.json and requirements.md
arena clean --keep-remote # skip remote branch deletion
By default, arena clean also deletes remote branches for non-accepted variants. Branches with open pull requests are preserved unless --force is used. Accepted variant arena/* branches are deleted when a corresponding accept/* branch exists on the remote (the accept branch is the canonical ref). If the accept branch has not been pushed yet, the arena branch is preserved. If the remote is unreachable, remote cleanup is skipped gracefully.
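The remote-cleanup rules above can be read as a small decision table. The sketch below encodes them as a pure function; it is an illustration of the documented behavior, not the tool's actual implementation. All type and function names are hypothetical, and it assumes that --force only lifts the open-pull-request protection, since the README does not describe its effect on accepted branches.

```typescript
type RemoteAction = "delete" | "preserve" | "skip";

interface RemoteBranchContext {
  accepted: boolean;             // the variant was accepted with `arena accept`
  hasOpenPullRequest: boolean;   // the arena/* branch has an open PR
  acceptBranchOnRemote: boolean; // a matching accept/* branch exists on the remote
}

interface CleanFlags {
  force: boolean;
  keepRemote: boolean;
  remoteReachable: boolean;
}

// Decide what happens to one remote arena/* branch during `arena clean`.
function decideRemoteAction(ctx: RemoteBranchContext, flags: CleanFlags): RemoteAction {
  // --keep-remote or an unreachable remote: remote cleanup is skipped entirely.
  if (flags.keepRemote || !flags.remoteReachable) return "skip";
  // Accepted variants: delete only once the canonical accept/* branch is pushed.
  if (ctx.accepted) return ctx.acceptBranchOnRemote ? "delete" : "preserve";
  // Non-accepted variants: open PRs protect the branch unless --force is given.
  if (ctx.hasOpenPullRequest && !flags.force) return "preserve";
  return "delete";
}

const defaults: CleanFlags = { force: false, keepRemote: false, remoteReachable: true };
const withOpenPr: RemoteBranchContext = {
  accepted: false,
  hasOpenPullRequest: true,
  acceptBranchOnRemote: false,
};
console.log(decideRemoteAction(withOpenPr, defaults)); // preserve
console.log(decideRemoteAction(withOpenPr, { ...defaults, force: true })); // delete
```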
Multiple Arenas
Run multiple concurrent arenas by providing a name:
arena create alpha --config arena.json --requirements requirements.md
arena create beta --config arena2.json --requirements requirements2.md
arena launch alpha
arena launch beta --headless
arena list
arena status alpha
arena evaluate beta
arena accept alpha copilot-node
arena clean beta
When only one arena exists, the name is optional. When multiple arenas exist, you must specify which one to use.
Project Layout
After arena init and arena create, your project looks like:
my-project/
├── .arena/
│   └── default/  # arena name (default when not specified)
│       ├── arena.json
│       ├── requirements.md
│       ├── session.json  # created during launch
│       ├── comparison-report.md  # created by evaluate
│       ├── logs/
│       └── worktrees/
│           ├── copilot-node/  # branch: arena/default/copilot-node
│           │   ├── .arena/  # REQUIREMENTS.md & ARENA-INSTRUCTIONS.md (gitignored)
│           │   └── ...  # agent's implementation files
│           └── claude-fastify/  # branch: arena/default/claude-fastify
│               ├── .arena/
│               └── ...
├── src/
├── package.json
└── .gitignore  # .arena/ added automatically
Each variant worktree is a branch on your own repo (arena/<name>), so you can:
- git diff main..arena/default/copilot-node to compare
- git merge arena/default/copilot-node to adopt the winner
- Open a GitHub PR from arena/default/copilot-node to main
Requirements and instructions are placed in .arena/ inside each worktree (not at the worktree root) to prevent agents from accidentally committing them.
CLI Reference
| Command | Description |
| --- | --- |
| arena init | One-time project setup: create .arena/ and add to .gitignore |
| arena create [name] | Create a new arena with config and requirements templates |
| arena launch [name] [--headless] | Create worktrees, write variant files, and start agents |
| arena list | List all arenas and their status |
| arena accept <name> <variant> | Create a clean branch from a winning variant |
| arena monitor [name] | Attach the TUI to a running headless session |
| arena status [name] | Print JSON state for the arena |
| arena evaluate [name] | Scan worktrees and write comparison report |
| arena clean [name] [--keep-config] [--keep-remote] [--force] | Remove worktrees and remote branches safely |
| arena version | Print the installed version |
create options:
- --config <path> copies an existing arena.json into .arena/<name>/
- --requirements <path> copies an existing requirements file into .arena/<name>/
- Both must be provided together, or omit both to scaffold default files
clean options:
- --keep-config keeps arena.json and requirements.md
- --keep-remote skips remote branch deletion (preserves current local-only behavior)
- --force skips safety checks (unmerged commits, unpushed commits, uncommitted changes) and deletes remote branches with open PRs
All commands auto-discover the arena from .arena/. When only one arena exists, the [name] argument is optional. When multiple arenas exist, you must specify which one.
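The name-resolution rule can be sketched as follows. This is an illustrative reading of the rule, not the CLI's actual code: the function name and error messages are hypothetical, and the behavior when no arena exists at all is an assumption.

```typescript
// Resolve which arena a command should operate on.
// `existing` stands for the arena names discovered under .arena/.
function resolveArenaName(existing: string[], requested?: string): string {
  if (requested !== undefined) {
    if (!existing.includes(requested)) {
      throw new Error(`Unknown arena "${requested}"`);
    }
    return requested;
  }
  // With exactly one arena, the [name] argument may be omitted.
  if (existing.length === 1) return existing[0];
  if (existing.length === 0) {
    throw new Error("No arenas found"); // assumed behavior
  }
  throw new Error(`Multiple arenas exist (${existing.join(", ")}); specify one`);
}

console.log(resolveArenaName(["default"])); // default
console.log(resolveArenaName(["alpha", "beta"], "beta")); // beta
```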
Global flags:
- -v, --verbose enables structured debug logs on stderr
- -h, --help shows contextual help
Configuration Reference
{
  "maxContinues": 50,
  "agentTimeoutMs": 3600000,
  "providers": {},
  "variants": [
    {
      "name": "node-copilot",
      "provider": "copilot-cli",
      "model": "claude-sonnet-4.5",
      "techStack": "Node.js with Express, TypeScript",
      "designPhilosophy": "Focus on simplicity and DX",
      "branch": "arena/node-copilot"
    }
  ]
}
Top-level fields
| Field | Required | Default | Notes |
| --- | --- | --- | --- |
| repoName | No | — | Optional, kept for backward compatibility |
| maxContinues | No | 50 | Passed to providers that expose a max-steps flag |
| agentTimeoutMs | No | 3600000 | Hard timeout per agent |
| providers | No | {} | Custom or overriding provider definitions |
| variants | Yes | — | One or more variant configs |
Variant fields
| Field | Required | Default | Notes |
| --- | --- | --- | --- |
| name | Yes | — | Must match ^[a-z0-9-]+$ |
| provider | No | copilot-cli | Provider key to use |
| model | Yes | — | Provider-specific model name |
| techStack | Yes | — | Written into per-worktree instructions |
| designPhilosophy | Yes | — | Written into per-worktree instructions |
| branch | No | arena/<name> | Branch name for the worktree |
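The two tables above translate into types plus a defaults step. The project itself uses Zod for validation; the sketch below deliberately avoids any dependency and only illustrates the documented required fields and defaults. The type and function names are hypothetical, and `providers` is left loosely typed.

```typescript
interface VariantInput {
  name: string;            // must match ^[a-z0-9-]+$
  provider?: string;       // default: copilot-cli
  model: string;
  techStack: string;
  designPhilosophy: string;
  branch?: string;         // default: arena/<name>
}

interface ArenaConfigInput {
  repoName?: string;       // optional, kept for backward compatibility
  maxContinues?: number;   // default: 50
  agentTimeoutMs?: number; // default: 3600000
  providers?: Record<string, unknown>;
  variants: VariantInput[];
}

interface ResolvedConfig {
  maxContinues: number;
  agentTimeoutMs: number;
  providers: Record<string, unknown>;
  variants: Required<VariantInput>[];
}

const VARIANT_NAME = /^[a-z0-9-]+$/;

// Validate variant names and fill in the documented defaults.
function resolveConfig(input: ArenaConfigInput): ResolvedConfig {
  if (input.variants.length === 0) {
    throw new Error("variants must contain at least one entry");
  }
  const variants = input.variants.map((v) => {
    if (!VARIANT_NAME.test(v.name)) {
      throw new Error(`Invalid variant name "${v.name}"`);
    }
    return {
      ...v,
      provider: v.provider ?? "copilot-cli",
      branch: v.branch ?? `arena/${v.name}`,
    };
  });
  return {
    maxContinues: input.maxContinues ?? 50,
    agentTimeoutMs: input.agentTimeoutMs ?? 3_600_000,
    providers: input.providers ?? {},
    variants,
  };
}

const resolved = resolveConfig({
  variants: [
    {
      name: "node-copilot",
      model: "claude-sonnet-4.5",
      techStack: "Node.js with Express, TypeScript",
      designPhilosophy: "Focus on simplicity and DX",
    },
  ],
});
console.log(resolved.variants[0].provider, resolved.variants[0].branch);
// copilot-cli arena/node-copilot
```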
Arena name validation
Arena names must be:
- Lowercase alphanumeric with hyphens only
- Start with a letter or digit
- Maximum 64 characters
- No path traversal characters (., /, \)
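As an illustration, the four rules above fit in a single regular expression. This is a sketch of one way to check them, not necessarily how arena does it; the function name is hypothetical.

```typescript
// First character: lowercase letter or digit.
// Remaining characters (up to 63 more): lowercase letters, digits, or hyphens.
// The character class already excludes ".", "/", and "\".
const ARENA_NAME = /^[a-z0-9][a-z0-9-]{0,63}$/;

function isValidArenaName(name: string): boolean {
  return ARENA_NAME.test(name);
}

console.log(isValidArenaName("my-experiment")); // true
console.log(isValidArenaName("-leading-hyphen")); // false
console.log(isValidArenaName("../escape")); // false
console.log(isValidArenaName("a".repeat(65))); // false
```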
Provider System
Built-in providers:
- copilot-cli
- claude-code
Custom providers can override built-ins or define new ones:
{
  "providers": {
    "my-agent": {
      "command": "my-agent-cli",
      "baseArgs": ["--autonomous"],
      "modelFlag": "--model",
      "promptDelivery": "flag",
      "promptFlag": "--prompt",
      "maxContinuesFlag": "--max-steps",
      "exitCommand": "/exit",
      "completionProtocol": {
        "idleTimeoutMs": 30000,
        "maxChecks": 3,
        "responseTimeoutMs": 60000
      },
      "trustedFolders": {
        "strategy": "flat-array",
        "configFile": "~/.my-agent/config.json",
        "jsonKey": "trusted_folders"
      }
    }
  }
}
The trustedFolders field is optional. When set, the arena pre-registers each worktree directory in the provider's config file before launching the agent, preventing interactive trust dialogs. Two strategies are supported:
- flat-array: folder path is appended to a JSON array (e.g. copilot-cli)
- nested-object: folder path becomes a key in a nested object with a boolean flag (e.g. claude-code). Requires an additional nestedKey field.
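To make the two strategies concrete, here is a rough sketch that applies them to an already-parsed provider config object. The exact JSON shape produced by nested-object is an assumption (`config[jsonKey][folder][nestedKey] = true`), as are the function name, the sample paths, and the `projects` and `trusted` key names used in the example.

```typescript
type TrustedFolders =
  | { strategy: "flat-array"; configFile: string; jsonKey: string }
  | { strategy: "nested-object"; configFile: string; jsonKey: string; nestedKey: string };

type JsonObject = Record<string, unknown>;

function isObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Return a copy of `config` with `folder` registered as trusted.
function registerFolder(config: JsonObject, tf: TrustedFolders, folder: string): JsonObject {
  const current = config[tf.jsonKey];
  if (tf.strategy === "flat-array") {
    const list: unknown[] = Array.isArray(current) ? current : [];
    // Append only if the folder is not already listed.
    const next = list.includes(folder) ? list : [...list, folder];
    return { ...config, [tf.jsonKey]: next };
  }
  // nested-object (assumed shape): the folder path is a key whose value
  // holds a boolean flag named by nestedKey.
  const container = isObject(current) ? current : {};
  const previousEntry = container[folder];
  const entry = isObject(previousEntry) ? previousEntry : {};
  return {
    ...config,
    [tf.jsonKey]: { ...container, [folder]: { ...entry, [tf.nestedKey]: true } },
  };
}

const flat = registerFolder(
  { trusted_folders: ["/existing"] },
  { strategy: "flat-array", configFile: "~/.my-agent/config.json", jsonKey: "trusted_folders" },
  "/repo/.arena/default/worktrees/alpha",
);
console.log(JSON.stringify(flat));
// {"trusted_folders":["/existing","/repo/.arena/default/worktrees/alpha"]}
```

Writing the result back to configFile (including expanding ~) is left out of the sketch.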
Completion Protocol
Agents signal completion using the structured envelope format:
<<<ARENA_SIGNAL:{"status":"done"}>>> # Agent is done
<<<ARENA_SIGNAL:{"status":"continue"}>>> # Agent is still working
The orchestrator detects these envelopes in agent terminal output. When an agent goes idle, the orchestrator sends a status check prompt. Per-provider completionProtocol settings control idle detection timing and retry limits.
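Detecting the envelope in a stream of terminal text might look like the sketch below. It assumes the envelope appears intact on one line and that any ANSI escape sequences have already been stripped; the function name is hypothetical and the orchestrator's real parser may differ.

```typescript
type ArenaStatus = "done" | "continue";

// Extract every well-formed ARENA_SIGNAL status from a chunk of output.
function extractSignals(output: string): ArenaStatus[] {
  const pattern = /<<<ARENA_SIGNAL:(\{.*?\})>>>/g;
  const statuses: ArenaStatus[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(output)) !== null) {
    try {
      const payload = JSON.parse(match[1]) as { status?: unknown };
      if (payload.status === "done" || payload.status === "continue") {
        statuses.push(payload.status);
      }
    } catch (_err) {
      // Malformed JSON inside an envelope is ignored in this sketch.
    }
  }
  return statuses;
}

const terminalOutput = [
  "Running tests...",
  '<<<ARENA_SIGNAL:{"status":"continue"}>>>',
  "All tests passed.",
  '<<<ARENA_SIGNAL:{"status":"done"}>>>',
].join("\n");

console.log(extractSignals(terminalOutput)); // [ 'continue', 'done' ]
```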
Completion Verification
Arena verifies agent work before accepting completion. Configured at the top level of arena.json:
{
  "completionVerification": {
    "enabled": true,
    "requireCommit": true,
    "requireCleanWorktree": true,
    "command": {
      "command": "npm",
      "args": ["run", "validate"],
      "timeoutMs": 300000
    }
  }
}
When an agent signals done, the orchestrator checks that commits exist, the worktree is clean, and an optional validation command passes. If verification fails, specific feedback is sent back to the agent, which resumes work.
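The verification step can be pictured as a function that turns a few facts about the worktree into a list of feedback messages, where an empty list means the completion is accepted. This is a sketch under assumptions: the `WorktreeFacts` shape, the function name, and the message wording are hypothetical, and gathering the facts (git queries, running the command) is not shown.

```typescript
interface VerificationSettings {
  enabled: boolean;
  requireCommit: boolean;
  requireCleanWorktree: boolean;
}

// Hypothetical summary of what the orchestrator observed in the worktree.
interface WorktreeFacts {
  commitCount: number;              // commits made by the agent
  uncommittedFiles: string[];       // paths with uncommitted changes
  validationExitCode: number | null; // null when no command is configured
}

// Return feedback for the agent; an empty array means verification passed.
function verifyCompletion(settings: VerificationSettings, facts: WorktreeFacts): string[] {
  if (!settings.enabled) return [];
  const feedback: string[] = [];
  if (settings.requireCommit && facts.commitCount === 0) {
    feedback.push("No commits found; commit your work before signaling done.");
  }
  if (settings.requireCleanWorktree && facts.uncommittedFiles.length > 0) {
    feedback.push(`Uncommitted changes: ${facts.uncommittedFiles.join(", ")}`);
  }
  if (facts.validationExitCode !== null && facts.validationExitCode !== 0) {
    feedback.push(`Validation command exited with code ${facts.validationExitCode}.`);
  }
  return feedback;
}

const strict: VerificationSettings = {
  enabled: true,
  requireCommit: true,
  requireCleanWorktree: true,
};
console.log(
  verifyCompletion(strict, { commitCount: 0, uncommittedFiles: ["src/app.ts"], validationExitCode: 1 }).length,
); // 3
```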
Model Validation
Arena validates model names at config load time by discovering available models from each provider. The copilot-cli provider runs copilot --help and parses the choices list. When an invalid model is detected, the error includes a suggestion:
Invalid model "gemini-3-pro" for provider "copilot-cli". Did you mean "gemini-3-pro-preview"?
Discovered models are cached to .arena/.model-cache.json (1-hour TTL) to avoid repeated CLI calls.
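The README does not say how the suggestion is chosen. One common approach, shown here purely as an assumption, is to pick the valid model with the smallest Levenshtein edit distance to the name the user typed. Both function names are hypothetical.

```typescript
// Classic dynamic-programming Levenshtein distance using two rolling rows.
function editDistance(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost, // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return the closest valid model, or undefined when the list is empty.
function suggestModel(requested: string, validModels: string[]): string | undefined {
  let best: string | undefined;
  let bestDistance = Infinity;
  for (const candidate of validModels) {
    const distance = editDistance(requested, candidate);
    if (distance < bestDistance) {
      best = candidate;
      bestDistance = distance;
    }
  }
  return best;
}

console.log(suggestModel("gemini-3-pro", ["claude-sonnet-4.5", "gemini-3-pro-preview"]));
// gemini-3-pro-preview
```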
Custom providers can opt into model validation:
{
  "providers": {
    "my-agent": {
      "command": "my-agent-cli",
      "modelDiscovery": {
        "command": "my-agent-cli",
        "args": ["--help"],
        "parseStrategy": "choices-flag"
      },
      "supportedModels": ["model-a", "model-b"]
    }
  }
}
- modelDiscovery: Defines a command to run for runtime model discovery. The choices-flag parse strategy extracts models from (choices: "model1", "model2", ...) output.
- supportedModels: A static list of valid models. When present, takes precedence over runtime discovery.
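A sketch of the choices-flag strategy and the precedence rule follows. It assumes the first `(choices: ...)` group in the help text is the model list, which may not hold for CLIs that print several such groups; the function names and the sample help text are hypothetical.

```typescript
// Extract quoted values from the first `(choices: "a", "b", ...)` group.
function parseChoices(helpText: string): string[] {
  const group = /\(choices:\s*([^)]*)\)/.exec(helpText);
  if (group === null) return [];
  const quoted = group[1].match(/"[^"]+"/g) ?? [];
  return quoted.map((value) => value.slice(1, -1)); // drop surrounding quotes
}

// A static supportedModels list wins over models discovered at runtime.
function resolveValidModels(supportedModels: string[] | undefined, helpText: string): string[] {
  if (supportedModels !== undefined && supportedModels.length > 0) {
    return supportedModels;
  }
  return parseChoices(helpText);
}

// Illustrative help output, not copied from a real CLI.
const helpOutput = '  --model <model>  Model to use (choices: "model-a", "model-b")';
console.log(parseChoices(helpOutput)); // [ 'model-a', 'model-b' ]
console.log(resolveValidModels(["model-x"], helpOutput)); // [ 'model-x' ]
```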
If a variant agent fails within the first 15 seconds of launch (likely a bad model), the orchestrator automatically retries once with the closest valid model name.
Prompt delivery modes:
- positional: append the prompt as the final CLI argument
- flag: pass the prompt through promptFlag
- stdin: launch first, then write the prompt to the PTY stdin
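The three modes differ only in where the prompt ends up. The sketch below assembles an argument list from a provider definition using the field names from the Provider System example; the ordering of the other arguments and the function name are assumptions.

```typescript
interface ProviderLaunch {
  command: string;
  baseArgs?: string[];
  modelFlag?: string;
  maxContinuesFlag?: string;
  promptDelivery: "positional" | "flag" | "stdin";
  promptFlag?: string;
}

interface LaunchPlan {
  command: string;
  args: string[];
  stdinPrompt?: string; // set only for stdin delivery, written after spawn
}

function buildLaunchPlan(
  provider: ProviderLaunch,
  model: string,
  prompt: string,
  maxContinues: number,
): LaunchPlan {
  const args = [...(provider.baseArgs ?? [])];
  if (provider.modelFlag !== undefined) args.push(provider.modelFlag, model);
  if (provider.maxContinuesFlag !== undefined) {
    args.push(provider.maxContinuesFlag, String(maxContinues));
  }
  switch (provider.promptDelivery) {
    case "positional":
      args.push(prompt); // prompt is the final CLI argument
      return { command: provider.command, args };
    case "flag":
      if (provider.promptFlag === undefined) {
        throw new Error("promptFlag is required for flag delivery");
      }
      args.push(provider.promptFlag, prompt);
      return { command: provider.command, args };
    case "stdin":
      // The prompt is not on the command line; it is written to the PTY later.
      return { command: provider.command, args, stdinPrompt: prompt };
  }
}

const plan = buildLaunchPlan(
  {
    command: "my-agent-cli",
    baseArgs: ["--autonomous"],
    modelFlag: "--model",
    maxContinuesFlag: "--max-steps",
    promptDelivery: "flag",
    promptFlag: "--prompt",
  },
  "model-a",
  "Build it",
  50,
);
console.log(plan.args.join(" "));
// --autonomous --model model-a --max-steps 50 --prompt Build it
```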
TUI Keybindings
| Key | Context | Action |
| --- | --- | --- |
| Tab | Any | Select next agent |
| 1-9 | Any | Jump to agent N |
| d | Non-interactive | Toggle dashboard/detail |
| Up/Down | Dashboard | Change selected row |
| Enter | Dashboard | Open detail view |
| i | Detail | Enter interactive PTY mode |
| Esc | Interactive | Leave interactive mode |
| k | Detail | Kill the selected agent |
| r | Detail | Restart the selected agent |
| q | Non-interactive | Quit, with confirmation if agents are still active |
Architecture Overview
          +-------------------+
          |   commander CLI   |
          +---------+---------+
                    |
         +----------v-----------+
         |  ArenaOrchestrator   |
         +----+-------------+---+
              |             |
    +---------v----+   +----v----------------+
    | node-pty PTY |   | Git worktree layer  |
    +--------------+   +---------------------+
              |
      +-------v--------+
      |  event stream  |
      +---+---------+--+
          |         |
+---------v--+   +--v----------------+
| Ink TUI    |   | NDJSON IPC server |
+------------+   +-------------------+
For the deeper design rationale, see DESIGN.md.
Installation
npm
npm install -g @mzkoch/agent-arena
Homebrew
The repository includes Formula/arena.rb. If you publish a tap:
brew tap mzkoch/tools
brew install arena
Install scripts
Unix:
curl -fsSL https://raw.githubusercontent.com/mzkoch/agent-arena/main/scripts/install.sh | bash
Windows PowerShell:
iwr https://raw.githubusercontent.com/mzkoch/agent-arena/main/scripts/install.ps1 -useb | iex
Both scripts detect OS and architecture, then download the matching release artifact.
Build from source
npm install
npm run build
node dist/cli.js --help
Development
npm install
npm run lint
npm run build
npm run test:coverage
The test suite enforces a minimum 80% coverage threshold for the business-logic surface.
Contributing
See CONTRIBUTING.md for development setup, release procedures, and submission guidelines.
License
MIT
