runoff
v3.0.0
Multi-step code-change pipelines for coding agents — race mode, git worktree isolation, local traces
Run two coding agents on the same task. Pick the winner.
runoff is a multi-step code-change pipeline for coding agents — declarative DAG, git worktree isolation, provider races, and local traces. Works as an MCP server (Cursor, Claude Desktop, Claude Code) or a standalone CLI.
IDE / MCP host          runoff                          coding-agent CLI
──────────────►  implement → review → retry  ──►  Claude Code / Codex / Gemini / …
                       ↑ race mode ↑
              two providers, one task, you pick
Install
npx runoff init --work-dir /path/to/your/repo
Or clone to develop / self-host:
git clone https://github.com/alexangelzhang/runoff.git && cd runoff
npm install
npm run demo # zero API keys: mock run with trace + experiment
Race mode
Put two providers in an array — they run in parallel, each in its own git worktree, and the pipeline pauses for you to pick:
{
  "pipeline": {
    "implement": [["claude-code", "opencode"]],
    "review": ["claude-code", "implement"]
  }
}
candidate 0 (claude-code) src/utils/format.ts +27 lines
formatRelativeTime(isoString: string) — string input only
candidate 1 (opencode/DeepSeek) src/utils/format.ts +60 lines
formatRelativeTime(dateInput: string | Date) — accepts Date too
+ future dates ("2 hours from now"), week unit, edge-case guards
npx runoff race apply --session abc123 --winner 1
Same spec. Two models, different API decisions. With raceFinalize: defer, you see both diffs before any code lands.
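To make the shape of the race config above concrete, here is a minimal TypeScript sketch that tells race steps apart from single-provider steps. It rests on an assumption read off the example, not on runoff's documented schema: the first element of a step is either one provider name or an array of providers to race, and any remaining elements name the steps it depends on. The names `StepSpec`, `StepPlan`, and `describeSteps` are hypothetical.

```typescript
// Assumed step shape (not confirmed by runoff's schema):
//   [provider | [providerA, providerB, ...], ...dependsOn]
type StepSpec = [string | string[], ...string[]];

interface StepPlan {
  step: string;
  providers: string[];
  isRace: boolean; // true when more than one provider runs the same step
  dependsOn: string[];
}

// Hypothetical helper: summarise each step of a pipeline config.
function describeSteps(pipeline: Record<string, StepSpec>): StepPlan[] {
  return Object.entries(pipeline).map(([step, spec]) => {
    const [first, ...dependsOn] = spec;
    // A nested array means several providers race on this step.
    const providers = Array.isArray(first) ? first : [first];
    return { step, providers, isRace: providers.length > 1, dependsOn };
  });
}

const plans = describeSteps({
  implement: [["claude-code", "opencode"]],
  review: ["claude-code", "implement"],
});

for (const plan of plans) {
  const mode = plan.isRace ? "race" : "single";
  console.log(`${plan.step}: ${mode} (${plan.providers.join(", ")})`);
}
```

Under that assumption, `implement` is reported as a race between two providers and `review` as a single-provider step that depends on `implement`.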
→ Full mechanics: docs/features/race-mode.md
→ Real races with diffs: docs/reference/race-showcase.md (6 real runs, real providers, real design decisions)
→ Token cost data: docs/reference/benchmarks-data.md
Run on your repo
# 1. Generate pipeline.config.json for your repo
npx runoff init --work-dir /path/to/repo --profile feature
# 2. Verify config + backend connectivity
npx runoff doctor --config /path/to/repo/pipeline.config.json
# 3. Run a task
npx runoff run \
--prompt "Add hello() with unit tests" \
--work-dir /path/to/repo \
--config /path/to/repo/pipeline.config.json
Edit config in a browser (providers, DAG, retry; saves via local HTTP):
npx runoff config edit --config /path/to/pipeline.config.json
Example configs: examples/configs/ (feature, bugfix, refactor, cli)
Real CLI backends: docs/guides/coding-agent-backends.md (Codex, Gemini, Claude Code, OpenCode)
MCP server
{
  "mcpServers": {
    "runoff": {
      "command": "npx",
      "args": ["runoff", "mcp"],
      "cwd": "/absolute/path/to/your/project"
    }
  }
}
Auto-configure for Cursor / Claude Desktop / Claude Code:
npm run setup:mcp
| Tool | Purpose |
|------|---------|
| runoff_run_pipeline | Full DAG + retries + checkpoints + race pause |
| runoff_run_step | Single step |
| runoff_query_traces / runoff_query_experiments | Local observability |
| runoff_race_apply / runoff_race_abort | Race finalization |
Full list + governance/memory tools: docs/README.md
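If the MCP entry has to be generated per project rather than written by hand, a small TypeScript sketch like the following could build it. The field values are copied from the JSON example in this section. The helper name `runoffMcpConfig` and the absolute-path check are illustrative assumptions, not part of runoff.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
  cwd: string;
}

// Hypothetical helper: build the mcpServers entry shown in this section.
function runoffMcpConfig(projectDir: string): { mcpServers: Record<string, McpServerEntry> } {
  // Assumption: the host expects an absolute cwd, as the example path suggests.
  const isAbsolute = projectDir.startsWith("/") || /^[A-Za-z]:[\\/]/.test(projectDir);
  if (!isAbsolute) {
    throw new Error(`cwd must be an absolute path, got "${projectDir}"`);
  }
  return {
    mcpServers: {
      runoff: { command: "npx", args: ["runoff", "mcp"], cwd: projectDir },
    },
  };
}

const mcpConfig = runoffMcpConfig("/absolute/path/to/your/project");
console.log(JSON.stringify(mcpConfig, null, 2));
```

Where the resulting JSON belongs depends on the MCP host; `npm run setup:mcp` remains the route this README documents for Cursor, Claude Desktop, and Claude Code.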
Why runoff?
| | runoff | LangGraph | CrewAI | AutoGen | OpenHands |
|-|:------:|:---------:|:------:|:-------:|:---------:|
| Declarative config DAG (JSON) | ✅ | code-first | Crew/Task | code-first | UI + agent |
| Git worktree + lock contract | ✅ | — | — | — | partial |
| Provider race + judge pause | ✅ | — | — | — | — |
| MCP tool surface for IDE hosts | ✅ | optional | recent | — | different |
| Local trace + experiment eval | ✅ | +LangSmith | DIY | DIY | partial |
Full comparison: docs/reference/differentiation.md
Prerequisites
Node 20+, Python 3, Git
bash scripts/shell/check-prereqs.sh
Development & CI
| Command | Purpose |
|---------|---------|
| npm test | Full suite (~800 tests) |
| npm run ci:gates | IPC sync + gate e2e + unit tests |
| npm run ci:gates:smoke | PR smoke (allow-skip without secrets) |
| npm run check-ipc-sync | After src/core/ipc.ts changes |
| npm run typecheck | tsc --noEmit (required in CI) |
Documentation
Full index: docs/README.md
| Doc | Topic |
|-----|-------|
| getting-started-30min.md | First run → real repo |
| coding-agent-backends.md | Codex, Gemini, Claude Code, OpenCode |
| race-mode.md | Running multiple LLMs on the same step |
| observability.md | Trace + experiment (no LangSmith required) |
| differentiation.md | vs LangGraph, CrewAI, AutoGen, OpenHands |
| security-model.md | Threat model (self-hosted) |
| structure.md | src/ + scripts/ layout |
| advanced/ | A2A, Dream, Dreamify (optional) |
Features
- Declarative DAG pipeline: implement → review → retry
- Provider race mode with judge pause and worktree isolation
- Governance: policy, guardrails, plan approval gate
- Checkpoint / resume; durable run store
- Local trace + experiment logs at ~/.runoff/ (no SaaS required)
- Optional: external memory, Dream offline worker, A2A federation (experimental)
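The "declarative DAG" idea in the feature list can be illustrated with a short TypeScript sketch that derives a run order from step dependencies using a depth-first topological sort. This is not runoff's scheduler. It assumes the same step shape inferred from the race-mode example (provider or provider array first, dependency step names after), and `DagStep` and `executionOrder` are hypothetical names.

```typescript
// Assumed step shape: [provider | providers[], ...dependsOn]
type DagStep = [string | string[], ...string[]];

// Hypothetical helper: order steps so each runs after its dependencies.
function executionOrder(pipeline: Record<string, DagStep>): string[] {
  const sorted: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (step: string): void => {
    if (state.get(step) === "done") return;
    if (state.get(step) === "visiting") {
      throw new Error(`cycle detected at step "${step}"`);
    }
    const spec = pipeline[step];
    if (spec === undefined) {
      throw new Error(`unknown step "${step}"`);
    }
    state.set(step, "visiting");
    const [, ...dependsOn] = spec; // skip the provider slot
    for (const dep of dependsOn) visit(dep);
    state.set(step, "done");
    sorted.push(step); // emitted only after all dependencies
  };

  for (const step of Object.keys(pipeline)) visit(step);
  return sorted;
}

const runOrder = executionOrder({
  review: ["claude-code", "implement"],
  implement: [["claude-code", "opencode"]],
});
console.log(runOrder.join(" -> ")); // implement -> review
```

Even with `review` declared first, the sort places `implement` ahead of it, and a dependency on a missing step or a cycle fails early instead of stalling a run.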
License
MIT — LICENSE
