mastermind-ai v1.1.0
# Mastermind

A persistent AI engineering team that debates your code decisions.

A team of AI reviewers lives in your repo. Each one has a charter file you own, a rolling memory log that accumulates across runs, and (when invoked from Claude Code) its own isolated context window. They review diffs, debate RFCs, run postmortems, plan sprints, and write code. Every decision is a file on disk next to your source, tracked in git.
The default roster is seven active specialists plus two optional ones, and you can disable, enable, or add roles with one command. Every workflow adapts its spawn count to whoever is active right now.
If you have used ChatDev, MetaGPT, CrewAI, or any "give a prompt, get a project" tool, the shape here is different. Mastermind does not generate greenfield projects. It joins the one you already have.
## Why this exists
Code review is a context problem. One human reviewer (or one AI pass) can only hold a single perspective at a time. The pragmatist, the security reviewer, and the architect get flattened into one voice, and the disagreements (which are usually the useful part) vanish.
Mastermind keeps the voices distinct. Each role runs independently with its own charter, its own memory of past decisions, and its own view of the code. The CTO synthesizes a verdict after all perspectives are in. The result reads like a real review thread, not a checklist.
What makes this different from wrapping one model in multiple system prompts:
- Persistent memory. Each role remembers what they reviewed before. The architect who blocked your auth design two weeks ago will remember why when you bring the revised version.
- Real isolation. When running inside Claude Code, each role gets its own cold subagent. Andrej (architect) literally cannot see Dario's (security) prompt or memory. The opinions are independent.
- Files on disk. Open `.mastermind/team/architect/charter.md` in any editor and read what the architect cares about. Edit it. The kernel picks up your changes on the next run. No code change required.
## Quick start (Claude Code)
If you are using Claude Code, Mastermind works out of the box. No API key needed. Claude Code dispatches each role as a real subagent under its own OAuth.
Just say:

```
review this diff
```

Or any trigger phrase: "propose an RFC for X", "think about X", "there's a bug in X", "plan the sprint". The `SKILL.md` file teaches Claude Code how to invoke every workflow automatically.
If `.mastermind/` does not exist yet, run `mastermind onboard --init` once to scaffold the team.
## Standalone CLI setup
For running Mastermind outside of Claude Code (plain terminal, CI, pre-commit hooks):
```shell
npm install -g mastermind-ai
mastermind init                       # interactive wizard: pick provider, format, generate config
export ANTHROPIC_API_KEY=sk-ant-...   # or OPENAI_API_KEY, or point at Ollama
mastermind review --diff              # review whatever is staged
```

The `init` command asks which LLM provider you use (Anthropic, OpenAI, Ollama, or a custom endpoint), generates a `mastermind.config.json`, and scaffolds the `.mastermind/` team directory with the default roster: seven active roles plus two optional ones (Grace and Fei-Fei) that are scaffolded but disabled until you opt in.
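As a rough illustration, a generated `mastermind.config.json` pointed at a local Ollama endpoint might look like the sketch below. The field names are illustrative assumptions, not the guaranteed schema; `mastermind init` writes the real thing (see docs/config.md):

```json
{
  "provider": {
    "baseUrl": "http://localhost:11434/v1",
    "tiers": {
      "cheap": "llama3.1:8b",
      "medium": "llama3.1:70b",
      "best": "llama3.1:70b"
    }
  }
}
```

The three tiers map to the cheap/medium/best model slots described in the Configuration section.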
## The team

Mastermind ships with nine built-in roles. Seven are active by default; two more (Grace and Fei-Fei) are scaffolded but disabled until you opt in. You can also add unlimited custom roles. Every role has a charter file at `.mastermind/team/<role>/charter.md` that you own and can edit freely.
| Handle | Role | Tier | Core? | Default state | What they focus on |
|--------|------|------|-------|---------------|--------------------|
| Ada | CTO | Opus | yes | active | Final call on RFCs, decision log, architectural tie-breakers |
| Andrej | Architect | Opus | yes | active | Module boundaries, abstractions, long-term maintainability |
| Sam | Pragmatist | Haiku | yes | active | User impact, shipping velocity, scope discipline |
| Lex | IC | Haiku | yes | active | Implementation drafts, code generation, first passes |
| Dario | Security | Opus | no | active | Attack surface, trust boundaries, defense in depth |
| Jensen | Performance | Haiku | no | active | Cost at scale, latency budgets, resource efficiency |
| Linus | SRE | Opus | no | active | Rollback strategy, runbooks, 3am operability |
| Grace | QA Engineer | Sonnet | no | disabled | Test coverage, edge cases, failure modes, regression risk |
| Fei-Fei | DX | Sonnet | no | disabled | API clarity, naming, documentation, onboarding friction |
Want Andrej to care about Rust module boundaries instead of generic abstractions? Open `team/architect/charter.md` and rewrite the prose. The kernel reads it fresh on every run.
## Toggle and customize
Enable the optional roles, disable any non-core role, or add your own with one command:
```shell
mastermind roles --list                  # show who's active / disabled / optional
mastermind roles --enable testing        # turn on Grace (QA engineer)
mastermind roles --enable dx             # turn on Fei-Fei (DX advocate)
mastermind roles --disable performance   # skip Jensen for this project
mastermind roles --add data --handle Jordan \
    --title "Data Engineer" \
    --tagline "Pipeline correctness and schema discipline" \
    --tier medium                        # scaffold a custom role with a starter charter
mastermind roles --remove data --delete-charter
```

Core roles (`cto`, `architect`, `pragmatist`, `ic`) are load-bearing and cannot be disabled. The four of them always run.
**Dynamic subagent count.** The kernel reads the active roster at the start of every run. If you have 7 roles enabled, `review --diff` emits 7 analysis spawns. Enable Grace and Fei-Fei and it emits 9. Add a 10th custom role and it emits 10. The Claude Code skill dispatches exactly as many Task subagents as there are spawn requests — no config knob, no hardcoded limit. Same math for the API backend: parallel chat-completion calls scale with the roster.
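The roster the kernel reads lives in `company.json` (tier, core/optional, enabled flag per role). One plausible shape, purely illustrative rather than the actual schema:

```json
{
  "roles": {
    "architect": { "handle": "Andrej", "tier": "best",   "core": true,  "enabled": true },
    "testing":   { "handle": "Grace",  "tier": "medium", "core": false, "enabled": false }
  }
}
```

`mastermind roles --enable` and `--disable` amount to flipping the `enabled` flag here; the next run picks up the change.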
## What you can do

### Review code
```shell
mastermind review --diff                     # review staged changes
mastermind review --file src/auth.ts         # review a specific file
mastermind review --question "Split auth?"   # free-form architectural question
mastermind review --diff --fast              # skip cross-exam for a faster take
mastermind review --diff --format quick      # 2 personas, 1 min
mastermind review --diff --format deep       # 4-round deep review, 5+ min
mastermind review --diff --stream            # stream tokens as they arrive
mastermind review --diff --html              # render a self-contained HTML replay
mastermind review --diff --no-memory         # skip persistent memory for this run
```

Output is structured markdown. Each role's analysis appears separately, followed by cross-examination, then the CTO's synthesis (run as a Layer 2 kernel spawn so Claude Code gives the moderator a real cold subagent too). Verdicts: APPROVED, APPROVED_WITH_CONDITIONS, or BLOCKED.
### Propose and approve designs
```shell
mastermind rfc --title "Add rate limiting to API"
mastermind rfc --title "Auth v2" --attach docs/auth-spec.md docs/threat-model.md
```

The IC drafts a proposal, the architect and security reviewer analyze it in parallel, the pragmatist weighs in, and the CTO makes the call. Architectural RFCs auto-spawn a Mermaid diagram. Approved RFCs get an Implementation Brief that the parent session can act on and an entry appended to `decisions/log.md`.
### Debate and simulate
```shell
mastermind simulate --topic "Monolith vs microservices for payments" --rounds 5
mastermind simulate --topic "..." --participants architect,security,sre --scenario "We hit 10k RPS"
```

N-round temporal debate where positions evolve. Each round, roles read the shared scratchpad and each other's prior positions. Terminates on convergence or when the round budget is exhausted.
### Reason through decisions
```shell
mastermind think --question "Queue vs polling for background jobs?"
mastermind think --question "Is this auth flow secure?" --roles security,architect --context "stateless JWT"
```

Lightweight multi-perspective reasoning. Two analysts produce considerations and tradeoffs, and the CTO synthesizes a structured ReasoningBrief. Fast (under 15s) and cheap.
### Query past decisions
```shell
mastermind ask --question "Why did we pick PostgreSQL over MySQL?"
```

Read-only Q&A over the company's own state: past decisions, open RFCs, role memories, and the Code Graph. One best-tier spawn. It never writes to memory and never touches the decision log.
### Write and improve code
```shell
mastermind iterate --path src/rate-limiter.ts --goal "Add per-IP rate limiting with Redis backend"
mastermind iterate --path src/auth.ts --goal "Tighten error messages" --dry-run   # plan without writing
mastermind iterate --path src/foo.ts --goal "..." --linked-ticket tkt-abc123      # audit-trail link
mastermind iterate-abort <runId>                                                  # stop a running loop
```

Autonomous coding loop: the IC drafts a patch, the architect reviews it, tests run, and if they pass, the CTO commits to a dedicated `mastermind/iterate-<runId>` branch. If tests fail, the IC repairs and retries. It never touches main and never force-pushes. Hard caps on budget (`iterateMaxSpawns: 20`), diff size (`iterateMaxDiffLines: 500`), and scope (`iterateScopeWhitelist`).

Requires `autonomy.allowIterate: true` in config (off by default, by design).
### Investigate and plan
```shell
mastermind triage --issue "Users can't log in after password reset"
mastermind sprint-plan --goal "Ship the auth v2 migration"
mastermind postmortem --incident "API outage on 2026-04-01" --tag auth
mastermind backlog --next              # what's highest priority
mastermind backlog --all               # full board
mastermind backlog --role security     # only Dario's tickets
```

Each workflow produces structured output with an Implementation Brief when there is actionable follow-up. `backlog`, `ownership`, `audit`, `arbitrate`, and `roles` are pure file I/O (no LLM, zero cost).
### Diagrams
```shell
mastermind diagram "Request flow through auth middleware" --kind sequence
mastermind diagram "User state machine" --kind stateDiagram --out docs/states.mmd
```

One architect spawn produces a Mermaid diagram. Kinds: `flowchart`, `sequence`, `classDiagram`, `stateDiagram`, `erDiagram`. Also auto-invoked by `rfc` when the proposal is architectural.
### Audit trail and ownership
```shell
mastermind audit --ticket tkt-f3a91d                       # show decision history
mastermind audit --ticket tkt-f3a91d --export ./audit/     # dump to markdown + JSON
mastermind ownership --list                                # who owns what
mastermind ownership --set security --glob "src/auth/**"   # Dario owns auth
mastermind ownership --remove security --glob "src/auth/**"
mastermind override --debate-id <id> --blocker "sync I/O" --reason "Acceptable for MVP"
```

`override` records an intentional override of a blocker with a paper trail. `ownership` feeds the ownership veto (see Safety rails below): when enabled, an owner's BLOCK on their own files becomes a hard stop regardless of what the moderator thinks.
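`ownership.json` is a role-to-glob map (see The file structure below). After the commands above, its contents might look roughly like this sketch, which illustrates the mapping rather than the exact on-disk schema:

```json
{
  "security": ["src/auth/**"],
  "sre": ["deploy/**", "runbook/**"]
}
```

Any file matching a role's globs gives that role veto power over changes to it when `features.codeOwnership` is enabled.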
### Deadlock arbitration
```shell
mastermind arbitrate --list                                        # any pending impasses?
mastermind arbitrate --run-id <id> --decision "Go with option 2"
mastermind arbitrate --run-id <id> --decision "..." --ticket-id tkt-f3a91d
```

When agents can't agree after the configured number of rounds (`autonomy.impasseRounds`, default 3), the kernel writes an arbitration request and exits 73. A human resolves via `arbitrate --decision`, then the workflow resumes with `mastermind resume <runId>`. Pending arbitrations expire after 72h.
### Setup and scaffolding
```shell
mastermind init                                # interactive wizard (provider + config + team)
mastermind init --skip-onboard                 # config only, skip team scaffolding
mastermind onboard --init                      # scaffold just .mastermind/ (no LLM)
mastermind company rebuild-graph               # build the local Code Graph (SQLite)
mastermind company rebuild-graph --limit 500   # attribute more git history
```

### Export personas to other tools
```shell
mastermind export --target cursor              # .cursor/rules/*.mdc
mastermind export --target copilot             # .github/copilot-instructions.md
mastermind export --target aider               # CONVENTIONS.md + .aider/prompts/
mastermind export --target windsurf            # .windsurfrules
mastermind export --target cursor --dry-run    # preview without writing
```

Same team charters, formatted for your editor of choice.
### Replay a past run

```shell
mastermind replay <runId>                      # render HTML from the sidecar
mastermind replay <runId> --out docs/demo.html
```

The HTML is a single-page document with no CDN, no bundler, no network dependency. It ships in every run automatically when you pass `--html`; this command rebuilds it from an existing sidecar.
### Resume a suspended workflow

```shell
cat results.json | mastermind resume <runId>   # pipe spawn results back in
```

Used by the Claude Code skill after dispatching Task subagents. You rarely invoke it by hand unless you're debugging the suspend/resume protocol.
## How memory works

Each role's memory lives at `.mastermind/team/<role>/memory.md`. Here is what the architect's might look like after a few runs:
```markdown
## Standing Principles
- Repository layer must stay free of HTTP types.

## Active Threads
- [2026-04-02] Reviewing rfc-012, blocked on Jensen's latency numbers.

## Past Decisions (most recent first)
- 2026-03-28, Approved rfc-011, unanimous
```

Two important properties:
**Every spawn sees only a slice.** When the kernel fires a review of `src/auth.ts`, each role gets only the part of their memory that mentions auth. Nobody gets the whole file. This keeps input tokens under 2k per spawn and makes the pre-commit hook affordable.

**Roles do not read each other's memories.** Andrej cannot see Dario's notes. The only cross-pollination happens through the structured debate rounds, which is deliberate.
Memory compacts automatically over time. Charters are checked into git. Memory files are gitignored by default. That split lets a team share reviewer identities while each developer accumulates private memory.
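The charter/memory split implies ignore rules along these lines. This is a sketch; the scaffolder writes the actual entries:

```gitignore
# Shared reviewer identities stay in git; private accumulations do not.
.mastermind/team/*/memory.md
.mastermind/team/*/inbox.md
.mastermind/transcripts/
.mastermind/tickets/
.mastermind/graph.db
```

Committing the charters and ignoring the rest is what lets a team share a roster while each developer's reviewers learn from that developer's own history.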
## The file structure

Every artifact Mastermind writes lives under `.mastermind/`. No hidden state, no cloud service, no external database.
```
.mastermind/
  company.json                  # the roster: tier, core/optional, enabled flag
  team/
    <role>/
      charter.md                # who they are (you edit this; checked into git)
      memory.md                 # what they've learned (auto-updated; gitignored)
      inbox.md                  # messages from other roles (auto-updated)
      decisions.md              # their own decision log
  decisions/
    log.md                      # human-readable append-only decision trail
    audit.json                  # per-ticket machine-readable audit index
  rfcs/
    000-index.md
    <NNN>-<slug>.md             # each approved RFC
    <NNN>-<slug>.diagram.mmd    # auto-generated Mermaid for architectural RFCs
  sprints/
    current.md                  # active sprint board
    archive/<date>-<slug>.md    # archived sprints
  runbook/
    README.md
    <tag>.md                    # postmortem runbook entries, tagged by incident class
  postmortems/
    <date>-<slug>.md            # blameless incident writeups
  tickets/
    registry.json               # backlog with priorities, owners, status
  transcripts/
    <workflow>-<runId>.md       # rendered markdown output
    <workflow>-<runId>.json     # sidecar for replay
    <workflow>-<runId>.html     # standalone HTML replay (when --html)
    <runId>.pending.json        # suspend/resume state (Claude Code)
    <runId>.arbitration.json    # pending impasses awaiting human resolution
  iterate/
    <runId>.abort               # sentinel dropped by iterate-abort
  cache/workflows/              # dedup cache for workflow runs
  ownership.json                # role → glob map for the ownership veto
  graph.db                      # SQLite Code Graph (built by company rebuild-graph)
```

Everything is human-readable. Open any file in your editor and you can see exactly what the team knows. Charters are tracked in git; `memory.md`, `inbox.md`, transcripts, tickets, and the graph are gitignored by default so each developer accumulates private memory while sharing reviewer identities.
## Dual backend

Mastermind runs on two interchangeable execution substrates behind a single `AgentRunner` interface.
**API backend (default):** Each role's spawn becomes a `chat.completions` call to your configured provider. Works with Anthropic, OpenAI, Ollama, or any OpenAI-compatible endpoint. Cheap, fast, runs from any terminal.

**Claude Code backend:** When invoked from Claude Code (via the mastermind skill), each role gets a real cold subagent with its own context window. Full cognitive isolation. The kernel suspends, the parent dispatches parallel Agent tool calls, and results flow back via `mastermind resume`. Selected automatically when `MASTERMIND_CLAUDE_CODE_PARENT=1` is set.
```jsonc
// mastermind.config.json
{
  "runner": { "backend": "auto" }   // "auto" | "api" | "claude-code"
}
```

## Configuration
All settings live in `mastermind.config.json`. Every field is optional; if you omit a section, built-in defaults are used. Run `mastermind init` to generate a starter config interactively, or create the file manually.
| Section | What it controls |
|---------|-----------------|
| provider | LLM endpoints, API keys, and per-tier models (cheap/medium/best → Haiku/Sonnet/Opus) |
| personas | Overrides for built-in persona system prompts (add NEW roles with mastermind roles --add instead) |
| personaDir | Directory of *.persona.json files loaded on startup |
| formats | Named round pipelines (default, quick, deep, plus any custom formats you define) |
| defaultFormat | Which format runs when --format isn't specified |
| memory | Token budget and compaction behavior for the rolling memory |
| agentMemory | Per-role memory slice sizing (maxCharsPerAgent, sharedContextMaxChars, topDecisionsCount) |
| prompts | Excerpt character limits for each round type (analysis, crossexam, moderator) |
| prefilter | Persona routing heuristics (tinyInputThreshold, mandatoryPersonas) |
| input | Hard limits on reviewable input (maxBytes, maxChars) |
| display | Banner, spinner, emojis, markdown templates, color |
| ai | Retry policy, moderator tuning, per-round validation limits |
| storage | SQLite data dir, past-debate context size, memory templates |
| autonomy | Effect confirmation (ask-everything / ask-on-write / ask-on-major / autonomous), reportThreshold, allowIterate, iterateMaxSpawns, iterateMaxDiffLines, iterateScopeWhitelist, iterateTestCommand, allowArbitration, impasseRounds |
| sandbox | Docker isolation for iterate test execution (enabled, image, timeout, mountReadOnly) |
| runner | Backend selection (auto / api / claude-code), transcript pending filename |
| workflows | Per-workflow roster and escalation overrides |
| features | Feature flags: ticketSystem, codeOwnership, auditTrail |
| output | Verbosity levels (silent / executive / normal / verbose), per-workflow overrides |
| integration | Exit codes for CI/CD and git hooks (see Exit codes below) |
| company | .mastermind/ directory location (dir, useProjectDir) |
Full reference: docs/config.md
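As an illustration of the `formats` section, a custom named round pipeline might look roughly like the sketch below. The key names are assumptions based on the round types listed above, not the real schema; consult docs/config.md before copying:

```json
{
  "formats": {
    "security-sweep": {
      "rounds": ["analysis", "crossexam", "moderator"],
      "roster": ["security", "sre", "cto"]
    }
  },
  "defaultFormat": "default"
}
```

A format defined this way would then be selectable with `--format security-sweep`.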
## Safety rails
Mastermind is designed to be safe to run unattended, including inside autonomous loops. The rails:
- **Path validation.** Every `--file` and `--export` argument is resolved to an absolute path and checked against allowed roots (cwd, tmpdir, plus anything in `MASTERMIND_ALLOW_PATHS`). Path traversal (`../../../etc/passwd`) is rejected at the CLI boundary.
- **Ticket ID validation.** `--ticket` and `--linked-ticket` must match `tkt-[a-z0-9-]{1,64}`. No shell interpolation reaches downstream code paths.
- **Prompt injection defense.** Guardrail retry feedback is wrapped in `<retry-feedback trust="untrusted">` blocks with sanitized content (markdown headers neutralized, `[INST]` and role tags stripped) and explicit instructions telling the model to treat the feedback as data, not instructions.
- **Ownership veto** (`features.codeOwnership`). When enabled, if a role owns the files under review (per `ownership.json`) and votes BLOCK, that veto becomes a hard stop regardless of the moderator verdict. Audited in `decisions/audit.json`.
- **Autonomy gates.** The `autonomy.level` setting controls which effects prompt for confirmation. The default is `ask-on-major`, meaning writes to `decisions/log.md` and ticket mutations confirm first.
- **Iterate rails.** The autonomous coding loop is off by default. When enabled, it runs on a dedicated `mastermind/iterate-<runId>` branch with hard caps on spawn count, diff size, and scope; a `--dry-run` flag shows what it would do without writing; and `mastermind iterate-abort <runId>` drops a sentinel that stops the loop at the next layer boundary.
- **Sandbox isolation** (`sandbox.enabled`). `iterate` can run its test command inside a Docker container with `--network none`, read-only mounts, and a hard timeout.
- **Arbitration for deadlocks.** When agents can't agree after `autonomy.impasseRounds` rounds, the kernel pauses and waits for human resolution via `mastermind arbitrate`. No silent forced convergence.
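Pulling the rails together, a config that opts into `iterate` with sandboxing might look roughly like this. The key names come from the Configuration table above, but the exact nesting is an assumption; check docs/config.md:

```json
{
  "autonomy": {
    "level": "ask-on-major",
    "allowIterate": true,
    "iterateMaxSpawns": 20,
    "iterateMaxDiffLines": 500,
    "iterateScopeWhitelist": ["src/**"],
    "impasseRounds": 3
  },
  "sandbox": {
    "enabled": true,
    "mountReadOnly": true
  }
}
```

Everything here except `allowIterate: true` matches the defaults described above, so a minimal opt-in config could be much shorter.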
## Exit codes
| Code | Meaning |
|------|---------|
| 0 | APPROVED (or APPROVED_WITH_CONDITIONS); safe to proceed |
| 1 | BLOCKED; the team refused to approve, and stdout contains the full verdict |
| 2 | Error: network, validation, or unexpected failure |
| 73 | Suspended: handoff to Claude Code (spawn requests on stdout) or arbitration pending |

Configure any of them under `integration.exitCodes` if your CI expects different numbers.
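Because BLOCKED maps to a nonzero exit, wiring the review into CI needs no glue code. A hypothetical GitHub Actions step (step name and secret name are illustrative):

```yaml
- name: Mastermind design review
  run: mastermind review --diff   # exit 1 (BLOCKED) fails the job
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

The full verdict lands in the job log via stdout, so a blocked run explains itself.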
## Pre-commit hook

```shell
./scripts/install-hook.sh
```

Installs a git hook that runs `mastermind review --diff` on every commit. A BLOCKED verdict aborts the commit; an APPROVED verdict lets it through.
## Benchmarks

Mastermind ships with an Exercism-based benchmark suite (10 TypeScript exercises) to measure the `iterate` workflow's code-generation quality:

```shell
npm run benchmark                          # run all exercises
npm run benchmark -- --suite typescript    # filter by language
npm run benchmark -- --exercise leap       # run one exercise
```

Results are written to `benchmarks/results.json`.
## HTML replays

Add `--html` to any command to get a self-contained interactive replay:

```shell
mastermind review --diff --html
mastermind rfc --title "..." --html
```

The HTML files are single-page documents with no CDN, no bundler, and no network dependency. Open them directly in your browser.
## Links
- Getting Started
- All Workflows
- Architecture and Internals
- Configuration Reference
- Multi-Tool Export
- Enterprise Integrations
- FAQ
- Roadmap
- Contributing
## License
Apache 2.0
