mastermind-ai

v1.4.0

Published

6 days ago

A persistent AI engineering team you run as CEO: specialists who disagree catch what one pass misses, memory compounds across runs, and you keep the final say

Mastermind

The AI engineering team you run as CEO. It lives in your repo, its memory compounds across runs, and you keep the final say on every verdict.

Tools vs Team. Most "agentic" tools ship a tool catalog; Mastermind ships a team. See docs/positioning.md for the one-pager. Every number in that doc is generated by mastermind doctor honesty -- never hand-edited.

Mastermind is a persistent engineering team that attaches to your codebase, not a tool catalog you operate. A single agent reviewing or fixing a change is one optimistic draft with no adversary. Mastermind reads the same code through a panel of named specialists hand-authored to disagree, so the security, performance, or operability concern a single pass omits gets raised by someone whose whole job is to raise it. The specialists genuinely veto each other, every framing and tie-break is recorded on disk and replayable, and a human can pause the run and keep the final say on every verdict. Two wedges hold it together: coverage (catch what one pass misses) and trust (nothing is hidden, nothing ships without you). And the team gets more grounded over time -- each run's outcome is labeled kept, reverted, merged, or abandoned and written to the role that made the call, so memory compounds and corrects instead of starting cold.

Where these claims come from. The wedge is measured, not asserted, by an offline scoreboard (scripts/vision-eval.mjs, zero API, run over real engagements) and the honesty audit:

Coverage. 54 distinct review catches are logged across real runs; 8 of them surfaced only through cross-examination between specialists, so a single pass would have shipped them (benchmarks/coverage-catches.md).
Compounding memory. On a real engagement, 100% of role-memory writes carry a kept / reverted / merged / abandoned outcome label rather than bare run-status, so the team learns from what actually held, not just what ran.
Recall. 63% of spawns are hydrated with the exact prior-run memory slice that matters, measured per spawn, not estimated.
Legibility. 97.6% of recorded decisions are attributable to a named role on disk.

Autonomous multi-step delivery is the direction the engagement lifecycle is built toward, not a closed claim: its end-to-end convergence rate is still maturing, and this README says so rather than pretending the company ships unattended today. What is real now is coverage, compounding memory, and a control plane that keeps you in the loop.

Under the hood it is a kernel that runs twelve persona charters against your code through a provider-neutral agent graph. Each charter is a markdown file living in <your-repo>/.mastermind/team/<role>/: a system prompt, a memory log, an inbox, a decision log -- files you own, version, and edit. Spawns can use Anthropic, OpenAI, OpenAI-compatible endpoints, or local OSS models through explicit provider capabilities and optional alias routing. Their priorities are hand-authored to disagree with at least one other role on purpose. Their accumulated memory persists on disk and is sliced per spawn so cross-run context survives without inflating prompts.

The daily surfaces are review and iterate:

mastermind review -- a multi-pass panel on a diff, file, or question. The specialists read the same change under conflicting framings, cross-examine each other's findings, and the verdict is synthesized with named dissent. This is the wedge: coverage a single reviewer does not give you.
mastermind iterate -- a single autonomous code-improvement loop on a target file or set of files. The same panel coaches an implementer toward a patch that passes the tests. 5-stage pipeline (Plan, Specialize, IC, Critique, Test+Synth); tests are the final gate.

It scales up to mastermind engage -- a multi-day engagement that wraps iterate inside a seven-phase lifecycle (Discovery, Design, Plan, Build, Review, Ship, Handoff) for work bigger than one fix, where the brief itself needs interpretation. Persisted across sessions and resumable after /compact, after a crash, after a multi-day pause. The same team, more jobs.

It joins your coding environment -- it does not replace it. The same kernel ships as a Claude Code plugin (slash commands + persona subagents) and as a standalone mastermind CLI. OpenAI/Codex-style and local-model workflows use the API backend today; the parent-agent protocol is named generically so more parent runtimes can attach without removing the existing Claude Code path. Same personas, same memory, same disk state.

The local Mission Control HUD has been rewritten in SolidJS + Vite + Tailwind for v1.3. Five surfaces (firm landing, workbench config editor, role profile, decisions timeline, run inspector) sit on a transparency data plane (streaming reasoning, per-spawn metadata, live token meters, plan trees, architecture snapshots) plus a persistent <InterveneBar> for Pause / Resume / Inject / Redirect / Override / Approve. WCAG 2.2 AA on every page; the v1 string-rendered HUD remains at ?legacy=1 for one release.

What this is not. It is not twelve specialists with different capabilities by default. The personas share one model, one tool surface, and one context unless you explicitly route them otherwise. What differs in the default setup is the prompt framing each one gets, what each prioritizes, and what each remembers from prior runs.

Genuine role differentiation is opt-in. features.skillPacks appends per-role rule citations to a persona's system prompt (Kent, Dario, Linus, and Andrej ship packs today). workflows.<id>.providerRouting routes specific roles to named provider aliases. Per-spawn modelTier overrides escalate or downgrade individual roles. Each run's replay records which provider, model, and rules every spawn actually used, so any differentiation you configure is auditable.

The value here is multi-pass deliberation, persistent on-disk state, and the test-gated iterate loop. It is not "each persona is a specialist out of the box"; you assemble that yourself when you configure it.

How `iterate` actually works

iterate is the headline. It is not a single subagent fixing a file. It is a five-stage pipeline running on a dedicated branch (mastermind/iterate-<runId>), with hard caps on diff size, spawn count, scope, and dollar spend.

Stage 1 -- Plan (Sam). The pragmatist reads the goal and the target files, decomposes into ordered subtasks, and returns a structured plan. Skip with --no-plan when you want the IC to draft against the goal directly. The plan is recorded to .mastermind/runs/<runId>/plan.md and replayed in the end-of-run summary.

Stage 2 -- Specialize. A pre-flight pass searches the codebase for relevant patterns (call sites, similar functions, prior changes via git history) and runs the linter against the target files. The output is seeded into Lex's context as real evidence, not just the prompt. This is what stops the IC from inventing API surface that does not exist.

Stage 3 -- IC (Lex). Lex drafts the patch. Opus-tier by charter default; no v1.3 surface exposes an override path, so in practice the tier is fixed for this role. He reads the spec plus the specialize output, opens the files, writes the diff, and surfaces every assumption inline as // NOTE: comments so reviewers see the decision without asking. Output is one of: unified diff (--- a/file\n+++ b/file\n@@ ... @@), full-file replacement, or structured edit blocks. The applier (src/agents/iterate/apply-diff.ts) handles all three formats, then falls back to git apply then patch -F3 for fuzzy context matching when whitespace drifts.

Stage 4 -- Critique (Kent + rotating panel). Kent flags idiomatic-hygiene concerns: errors swallowed without rethrow, names that lie, the unhappy path missing, claimed test coverage that does not exist. With features.skillPacks ON (default in v1.3), Kent's prompt is augmented with his 15-rule pack. The rotating panel adds security, performance, sre, and dx critics depending on file shape. After panel.maxRepairCycles consecutive rejections the run escalates to a CTO tiebreak that issues a binding approve / reject / replan verdict (Phase A.19). With iterate.panel.instantEscalationOnHardVeto ON (default in v1.3), a single block from any persona in iterate.panel.hardVetoRoles (default ['security', 'sre']) escalates on the first rejection. The CTO still has the final say. Per-round verdicts at .mastermind/runs/<runId>/round-report.md. Skip critique with --no-critique to fall back to Architect-only review.

Stage 5 -- Test + Synth. The patch applies on the dedicated branch. Tests run via the configured iterateTestCommand (or auto-discovered: package.json → npm test, Cargo.toml → cargo test, pyproject.toml / setup.py / setup.cfg → pytest). The end-of-session panel (Architect + Security + SRE) reads the diff, the test output, and the original goal, then emits a verdict: approve, concern, or block. Ada the CTO synthesizes the panel into a single ruling with named tradeoffs. Tests are the authoritative gate -- if they pass, the patch ships even if a persona voted block; the verdict is recorded in the run report so you can see which voices dissented.

Block path. If a persona votes block in any round, the block reason is injected into Lex's next round prompt as untrusted content (no prompt injection escape). Up to three rounds (MAX_CONSECUTIVE_FAILS=3). On the third consecutive failure the run halts and the human rules. mastermind iterate-resume <runId> --feedback "..." re-enters with new guidance after addressing the block.

Hard rails. autonomy.allowIterate: true is required before the patch ever lands on disk -- without it, every run is dry-run regardless of the flag. Diff cap is 2000 lines added+removed. Scope whitelist defaults to src/**. Cost cap defaults to none; --max-iterate-cost <USD> adds one, warning at 70% and aborting at 100%. The branch never auto-merges to main unless autonomy.allowMerge: true is set explicitly.

Why a multi-pass panel beats a single subagent

A single subagent fixing a file is one optimistic draft with no adversary. Whatever pattern it produces, no one challenges it.

Mastermind reads the same code multiple times under different prompt framings. The framings are hand-authored to conflict with each other on purpose:

Sam frames for shipping speed. He raises concerns about gold-plating.
Andrej frames for structure. He raises concerns about leaky abstractions and god objects.
Dario frames for security. He raises concerns about unsanitized input, missing auth, and crypto misuse.
Kent frames for code-craft. He raises concerns about swallowed exceptions and names that lie.
Linus frames for operability. He raises concerns about irreversible migrations.
Toby frames for drift. He flags when the patch is quietly building something different from the brief.

Each charter conflicts with at least one other on purpose. Sam vs Andrej on velocity-vs-structure. Dario vs Sam on threat probability. Linus vs Sam on rollback. In iterate today, panel concerns are advisory: they become repair context fed back to Lex's next round, and tests are the final gate. A panel-as-hard-gate path (iterate.panelGates) is on the Phase L roadmap; until that ships, treat the panel as a coaching layer and the test runner as the verdict.

Concrete example. A subagent told to "add caching to getUser" returns a patch that wraps the function in a 5-minute Redis cache. Ships. Two weeks later auth bypasses appear: when the cache returns, the role check inside getUser is skipped. With Mastermind, Dario flags it on day one ("the role check is part of the trusted boundary; you cannot cache the trusted result without re-validating") and Lex's next round scopes the cache to the user-record fetch only.

Persistent memory makes it cumulative. Each persona has a memory file that accumulates calls they have made. Across runs Andrej remembers the design decision he made last sprint; Dario remembers the auth concern he flagged previously. The kernel slices each role's memory per spawn so input tokens stay tight, but the institutional knowledge survives. The persistent on-disk state is the moat, not the "twelve specialists" framing.

The engagement lifecycle

mastermind engage "<brief>" --run opens an engagement and walks all seven phases. Each phase persists artifacts to .mastermind/engagements/<id>/. The kernel resumes from disk after a crash, after /compact, or after a multi-day pause.

Discovery -> Design -> Plan -> Build -> Review -> Ship -> Handoff
   Reid     Andrej +   Sam      Lex      Reid     Fei-Fei +   Reid
            Dario             (iterate)            Linus +
                                                   Andrej + Reid

1. Discovery (Reid). Reads the brief, decides if it is actionable as written. Async by default: takes a best-guess assumption, records it in clarifying.md, proceeds. --sync instead pauses the engagement and asks 1-3 clarifying questions.

2. Design (Andrej + Dario). Two parallel spawns at Opus tier (per-spawn override; their charters default to medium for review). Andrej produces module boundaries and risk areas. Dario produces a security annex. Skipped for trivial bug fixes when the heuristic decides design is overkill, or when the security charter is absent.

3. Plan (Sam). Decomposes the design into ordered subtasks sized for a single iterate run each. Output is a SubtaskPlan with files, rationale, and a dependency DAG. Trivially scoped briefs may produce a single subtask.

4. Build (Lex, wrapping iterate). Each subtask becomes one iterate run. The 5-stage pipeline above runs unchanged inside Build. The resulting IterateRecord is captured into phases.building.nestedRunId.

5. Review (Reid alignment). Iterate's end-of-session panel covered engineering quality. Review covers brief alignment: did what got built match what was asked? Reid emits green-light, red-flag-revise (pause, human can re-enter with gaps as input), or red-flag-block (engagement fails). Malformed parser output defaults to revise -- never silent green-light, never permanent block.

6. Ship (Fei-Fei + optional Linus + optional Andrej + Reid). Fei-Fei always drafts doc updates. Linus drafts a runbook update only when iterate touched runtime/ops paths (deploy/, ops/, server.*, migrations/). Andrej drafts an ADR only when design ran upstream or iterate touched architectural surface (types/, schema, interface). Reid composes the final PR description.

7. Handoff (Reid). Reads the brief plus iterate's CTO summary plus the PR draft plus engagement metadata, and produces a one-page delivery/handoff.md: brief recap, what was built, key decisions, cost summary, what's left, suggested next engagement. This is the screenshot-able artifact -- the document the human reads when they come back.

Vision guard at every phase boundary. Toby (cheap-tier dedicated watcher) reads the just-ended phase output and the original brief, emits a drift score 0-100. Bands: 0-20 proceed silently, 21-50 auto-correct (one prompt round to realign), 51-100 halt for human. Every other role can attach a vision_concerns field to its output. Two or more high-severity concerns from independent roles forces a halt regardless of Toby's score.

Autonomy budget. Every engagement starts with a budget of N actions (bug fix: 20, small feature: 35, medium: 50, refactor: 60, cross-cutting: 100, full-app: 200). Override with --budget N. Read-only spawns (Toby checks, Reid status reports, panel reviewers) do not consume budget. Exhaustion pauses; resume with engage:resume <id> --refresh-budget.

The team

Twelve roles, all active by default. Every charter is a markdown file you own at .mastermind/team/<id>/charter.md. Edit them. They are the source of truth.

| id | handle | tier | core | default | focus | |---------------|---------|---------------------------|------|-----------|-----------------------------------------------------------| | cto | Ada | medium (best on dispute) | yes | active | Final verdict, decision log, tie-breaker | | architect | Andrej | medium (best for design) | yes | active | Module boundaries, abstraction, maintainability | | pragmatist | Sam | cheap | yes | active | Scope, shipping velocity, user impact | | ic | Lex | best (charter default) | yes | active | Implementation drafts, code generation | | kent | Kent | medium | yes | active | Idiomatic patterns, failure-mode hygiene | | reid | Reid | medium | yes | active | Engagement lead: brief, alignment, PR composition, handoff| | security | Dario | medium (best for design) | no | active | Trust model, attack surface, defense in depth | | performance | Jensen | cheap | no | active | Latency, throughput, cost at scale | | sre | Linus | medium | no | active | Observability, rollback, 3am runbook | | toby | Toby | cheap | no | active | Vision watcher: drift detection at phase boundaries | | testing | Grace | medium | no | active | Coverage, edge cases, regression risk; iterate Test Analyst| | dx | Fei-Fei | medium | no | active | API clarity, naming, onboarding friction; engagement Ship docs |

Core roles (cto, architect, pragmatist, ic, kent, reid) cannot be disabled. The other six can be toggled with mastermind roles --disable <id> / --enable <id>. Add custom roles with mastermind roles --add. Grace runs as iterate's Test Analyst on test failures; Fei-Fei drafts engagement documentation in the Ship phase.

Tier strategy. IC stays at Opus -- the 2026 ablation that drove this project's reframe locked it. Routine review work runs at Sonnet. Design phases (Andrej + Dario) and dispute paths (Ada on iterate tiebreak) escalate back to Opus per-spawn. Pragmatist + Performance + Toby stay on Haiku because their work is rubric-pattern, not synthesis-heavy.

The override mechanism is spawn.modelTier ?? charter.tier. The charter declares the default; specific spawn sites hard-override when reasoning quality earns the premium.

Use it from Claude Code

/plugin add Haydn-opti/mastermind

Once installed, every workflow has a slash command and every persona has a subagent definition. The CLI keeps working -- the plugin is purely additive.

/mastermind:engage "Add per-IP rate limiting to src/auth using Redis" --run
/mastermind:review --diff
/mastermind:rfc --title "Move to row-level security?"
/mastermind:think --question "Queue or polling for this background job?"
/mastermind:iterate --path src/rate-limiter.ts --goal "fix off-by-one"
/mastermind:mission-control
/mastermind:insights

Single-persona consults bypass the kernel. Claude Code dispatches one of the 12 persona subagents for a one-shot review:

"Have Andrej review the design in docs/auth-v2.md"
   -> Claude Code dispatches the `andrej` agent

"What would Dario flag in src/auth/middleware.ts?"
   -> Claude Code dispatches the `dario` agent

"Let Reid draft a PR description from the diff"
   -> Claude Code dispatches the `reid` agent

No API key needed inside Claude Code. Spawns dispatch as Agent tool subagents under Claude Code's own OAuth. The kernel itself never calls a model -- it suspends with exit 73 and lets the parent session dispatch. Full protocol contract in SKILL.md.

Use it standalone

A note on names. The npm package is mastermind-ai (the -ai suffix avoided a name collision on npm). The binary it installs is mastermind. The Claude Code plugin and marketplace also use mastermind as the namespace, with mastermind-core plus per-workflow plugins (mastermind-review, mastermind-iterate, …) layered on top. So: install from npm with mastermind-ai, invoke from the shell as mastermind, install from Claude Code as mastermind (or any mastermind-<workflow> from the marketplace).

npm install -g mastermind-ai

# See it in action without an API key (copies a real iterate run's
# replay HTML into ./.mastermind-demo/iterate-example.html).
mastermind demo

# Then set up your team and start running workflows.
mastermind onboard --init
export ANTHROPIC_API_KEY=sk-ant-...

# Small fix: single iterate run on a target file (default workflow).
mastermind iterate --path src/rate-limiter.ts \
  --goal "fix off-by-one in the window edge case"

# Quick review on the current diff.
mastermind review --diff

# Larger work that needs interpretation: open a multi-day engagement.
mastermind engage "Refactor the auth middleware" --run

The default for daily work is iterate (one fix, one branch, tests as the gate) or review (multi-pass panel on a diff). Reach for engage when the brief is bigger than one fix, the scope itself is unclear, or the work spans multiple sessions and you want the seven-phase lifecycle (Discovery, Design, Plan, Build, Review, Ship, Handoff) as a structured trail. Engage is more capable but also more expensive -- a single typo fix should not run through seven phases.

mastermind onboard --init is an interactive wizard. It writes mastermind.config.json, scaffolds .mastermind/team/<role>/charter.md for every default role, and seeds the team's memory.

Provider configuration lives in mastermind.config.json:

Anthropic: ANTHROPIC_API_KEY set in env.
OpenAI: OPENAI_API_KEY set in env, plus a custom baseURL if you are routing through a compatible endpoint.
Ollama / custom: set baseURL to your local endpoint and an optional key.
Per-tier model override: provider.tiers.{cheap,medium,best} maps each tier to a specific model ID.

mastermind --help lists every command. The full reference lives in docs/commands.md.

Defaults and safety rails

The defaults are what they are because real autonomous coding agents fail in predictable ways. These rails encode the lessons.

Iterate is opt-in. autonomy.allowIterate: true is required before iterate ever patches a file. Without it, every iterate run is dry-run regardless of CLI flags. Default: false.

The firm runs by default in v1.3. Three differentiating flags now ship ON: features.skillPacks (named rule packs augment Kent, Dario, Linus, and Andrej on every review + iterate critique), features.codeOwnership (when an ownership map is present, an owner role's BLOCK on its own files flips the verdict to BLOCKED), and iterate.panel.instantEscalationOnHardVeto (a single block from security or sre escalates straight to the CTO tiebreak instead of becoming IC repair text). features.taskMasks stays OFF until you author a mask. mastermind engage redirects bugfix briefs with a single-file target to mastermind iterate unless --force is passed, and prompts for cost-shape confirmation before the first spawn (skipped on --yes or non-TTY).

Never auto-merge to main. Even with all sign-offs, an engagement only opens a PR draft on the dedicated branch. autonomy.allowMerge: true is the per-repo override. Default: false. Hard diff cap: 2000 lines added+removed.

Sandbox fail-closed. sandbox.enabled: true runs iterate's test command inside a Docker container with --network none and a hard timeout. Docker unavailable aborts unless sandbox.fallbackToLocal: true is set explicitly. Default: sandbox off, fall-back local.

No inline API keys. mastermind.config.json rejects an inline apiKey field unless --unsafe-inline-key is passed. Use environment variables or a secret manager. The check exists because the file is easy to commit by accident.

Encryption at rest (opt-in). features.encryptAtRest: true plus the MASTERMIND_ENCRYPTION_KEY env var (32-byte base64) wraps every line of events.ndjson in an AES-256-GCM envelope (MMV1 magic + nonce + GCM tag + ciphertext). Per-line framing keeps appends O(1); the reader handles mixed plaintext/encrypted lines so existing logs stay readable after the flag flips on. Generate a key with: node -e "console.log(require('crypto').randomBytes(32).toString('base64'))". Default OFF -- opt-in protects operators who haven't generated a key.

Path validation. --file and --export arguments resolve to absolute paths and are checked against cwd, tmpdir, and an optional MASTERMIND_ALLOW_PATHS env var. Path traversal is rejected at the CLI boundary.

Prompt injection defense. Every untrusted input (user briefs, file contents, test output, role memory, panel deliberation, IC output, phase artifacts re-injected into next-phase prompts) flows through renderUntrustedTag from src/prompt-safety.ts. Agent-produced text cannot inject instructions into the next round's system prompt.

Mission Control loopback only. The local web UI binds to 127.0.0.1 only and rejects any other host at runtime. URLs are token-gated. Mutating routes return 400 unless confirmation: "CONFIRM" is in the request body.

Autonomy levels. autonomy.level controls which effects prompt the user: ask-everything, ask-on-write, ask-on-major (default), autonomous.

Ownership veto. When features.codeOwnership is on and a role owns the files under review, that role's BLOCK vote is a hard stop regardless of the moderator verdict.

Self-evolution is gated and off. features.selfEvolution lets the team PROPOSE tier changes from its own labeled outcome history, and only propose: a persona is put forward for a higher tier only when it is consistently right including on hard, multi-file work, so being right on easy files and wrong on hard ones never earns a promotion. Applying a proposal needs an offline eval-gate (which never demotes the senior IC or drops a security or sre role to the cheapest tier) plus a typed CEO confirmation of the exact change, and every applied change is journaled with its before-tier so mastermind evolve revert restores it exactly. Default OFF. It is the last, most conservative capability, not the headline.

The cost shape

There is no single number we can give you. Cost depends on provider choice, model tier mix, prompt cache hit rate, and how complicated the task is.

What we can give you is the gates that bound spend.

--budget <USD> on every multi-agent workflow. Forecasts the run before any spawn fires; aborts with a forecast message if the estimate exceeds the budget. Pass --confirm to override.

--max-iterate-cost <USD> on iterate specifically. Hard cap that warns at 70% spent and aborts at 100% mid-run.

--cheap runs the review workflow with three Haiku personas and no cross-exam. Useful for first-pass triage on a large diff. The full panel costs more but blocks more.

Two-breakpoint Anthropic prompt caching is on by default for Anthropic-direct callers. The kernel writes one cache breakpoint after the system prompt and one after the conversation history block, so most multi-round runs see input tokens drop sharply on round 2+.

Per-spawn modelTier override. The single most expensive thing in any run is over-tiering. Charters declare the default tier; specific spawn sites hard-override when reasoning earns it. Routine synthesis runs at medium. Iterate's IC and dispute-path tiebreaks run at best.

Run snapshots. After every run, .mastermind/runs/<runId>/run.json contains exact token counts and dollar spend per layer per spawn. This is the source of truth for what a run actually cost.

The pricing table itself lives at src/pricing/models.ts and is updated when Anthropic publishes new rates.

Workflow surfaces

Mastermind ships eleven direct workflows, plus engage as the seven-phase lifecycle wrapper that composes them. The eleven cover three job families (understand the codebase, run review & ops, plan & deliberate) plus onboard as the one-time setup workflow. Every workflow is a first-class delegation surface.

Understand the codebase.

| Workflow | What it does | |----------------|---------------------------------------------------------------| | ask | Read-only Q&A over the team's accumulated memory | | diagram | Generate a Mermaid architectural sketch from a one-liner | | think | Quick multi-perspective reasoning on a question |

Run review & ops.

| Workflow | What it does | |----------------|---------------------------------------------------------------| | review | Multi-pass panel on a diff, file, or question | | iterate | Autonomous code-improvement loop on a target file or set | | triage | Investigate a bug, identify suspect files, propose remediation| | postmortem | Root cause + remediation for an incident |

Plan & deliberate.

| Workflow | What it does | |----------------|---------------------------------------------------------------| | rfc | Open an RFC, panel deliberation, CTO sign-off | | sprint-plan | Decompose a sprint goal into ordered tasks | | simulate | Multi-round debate to converge on an approach |

Lifecycle wrapper. engage walks the seven-phase lifecycle (Discovery, Design, Plan, Build, Review, Ship, Handoff) for multi-day briefs, calling iterate inside its Build phase. The CLI gates trivial bugfix briefs and redirects them to iterate unless --force is set.

Setup. onboard --init is the eleventh workflow -- interactive wizard that writes mastermind.config.json, scaffolds .mastermind/team/<role>/charter.md for every default role, and seeds the team's memory. Idempotent: rerunning after an upgrade picks up new built-in roles without overwriting your edits.

Each workflow emits an ## Implementation Brief HTML comment on completion, structured for the parent session to act on (fix-bug from triage, implement-rfc from rfc, execute-task from sprint-plan, etc). The Brief is the seam between deliberation and execution.

Mission Control

mastermind mission-control opens the v2 HUD (SolidJS + Vite + Tailwind, WCAG 2.2 AA). Append ?legacy=1 to the URL to fall back to the v1 string-rendered HUD for one release.

Five surfaces:

Firm landing -- engagement-board kanban (Discovery -> Handoff), active runs, live token + cost totals, role org chart.
Workbench (?layout=workbench) -- editable config + persona toggle grid via PATCH /api/config and POST /api/personas/:id/toggle. Atomic write with .bak, JSONC comments preserved through round-trip.
Role profile (?layout=role&id=<id>) -- charter, memory tail, agenda, recent decisions, disagreement profile, skill pack rules, activity sparkline, lifetime tokens.
Decisions timeline (?layout=decisions) -- cross-engagement paper trail backed by FTS5; filter by role / verdict / workflow / date / free text.
Run inspector (?layout=run&id=<runId>) -- animated spawn graph, streaming reasoning panel, live token meter, plan tree, state-machine diagram, architecture graph.

Transparency data plane: six SSE event types (spawn.start, spawn.reasoning_delta, spawn.tokens_delta, state.transition, plan.tree, architecture.snapshot) plus GET /api/customization/registry enumerating ~30 user-tunable settings. Persistent <InterveneBar> (Pause / Resume / Inject / Redirect / Override / Approve) on every long-running surface, backed by six POST /api/runs/:id/{...} endpoints; pause halts the workflow at the next layer or stage boundary via RunPausedError.

Engagement self-contained replay HTML export at POST /api/actions/replay-html (requires confirmation: "CONFIRM" like every mutation; writes a self-contained file that opens from file://).

All mutating routes require confirmation: "CONFIRM"; all routes are token-gated; the server binds loopback only.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme