claude-devkit-cli
v1.13.1
CLI toolkit for spec-first development with Claude Code — hooks, commands, guards, and test runners
A lightweight, spec-first development toolkit for Claude Code. It enforces the cycle spec (with acceptance scenarios) → code + tests → build pass, using custom commands, automatic hooks, and a universal test runner.
Works with: Swift, TypeScript/JavaScript, Python, Rust, Go, Java/Kotlin, C#, Ruby.
Dependencies: None (requires only Claude Code CLI, Node.js, Git, and Bash).
Optional: GraphAtlas MCP server for graph-based code intelligence — six skills use it automatically when present and fall back to grep when it isn't. See §3 Setup.
Table of Contents
- Philosophy
- Quick Start
- Setup
- Daily Workflows
- Commands Reference
- Automatic Guards (Hooks)
- Spec Format
- Customization
- Token Cost Guide
- Troubleshooting
- FAQ
1. Philosophy
The Core Cycle
SPEC (with acceptance scenarios) → CODE + TESTS → BUILD PASS
Every code change — feature, fix, or removal — follows this cycle. The spec is the source of truth. Acceptance scenarios (Given/When/Then) are embedded directly in the spec — no separate test plan file. If code contradicts the spec, the code is wrong.
Why Spec-First?
- Prevents drift. Acceptance scenarios live inside the spec — no separate test plan to fall out of sync.
- Tests have purpose. Scenarios derived from specs test behavior, not implementation details. This means tests survive refactoring.
- AI writes better code. When Claude Code has a spec with concrete Given/When/Then scenarios, it generates more accurate implementations and more meaningful tests.
- Reviews are grounded. Reviewers can check code against the spec rather than guessing at intent.
Principles
- Specs are source of truth — Code changes require spec updates first.
- Incremental, not big-bang — Test after each code chunk, not after everything is done.
- Tests travel with code — Every PR includes production code + tests + spec updates.
- Build pass is the gate — Nothing merges with failing tests.
- Everything in the repo — Specs, plans, tests, and code are version-controlled and reviewable.
2. Quick Start
Time needed: 5 minutes. Below is a realistic transcript — user input, what each skill actually asks, what it actually outputs. Nothing embellished.
npx claude-devkit-cli init . # one-time install
claude # open Claude Code
Step 1 — Spec the feature (/mf-plan)
You: /mf-plan "add tag filter to the todo list screen"
Claude → running /mf-plan
Phase 0: Codebase scan — found docs/specs/todo-list/todo-list.md (exists)
Phase 1: Scope Challenge — TodoList already supports text search; tag
filter can reuse the same query pipeline. Flagging: reuse vs rebuild.
AskUserQuestion: "Extend existing search, or add a separate tag filter UI?"
A) Extend search — one input, parsed (text + "tag:work") [lowest cost]
B) Separate tag chips — chips below the search box [clearer UX]
C) Both — chips drive the same query [most work]
You: B
Claude: Drafting spec…
docs/specs/todo-list/todo-list.md (Mode C update, snapshot created)
## Stories (added)
### S-007: Filter todos by tag (P0)
AS-014: Given todos tagged [work, home], when user taps "work" chip,
then only work-tagged todos show.
AS-015: Given no todos match the selected tags, when filter applied,
then empty state shows "No todos match these tags".
What Already Exists: TodoRepository.query() handles predicates — reuse.
Not in Scope: saving favorite tag combinations (deferred).
Complex spec? Run /mf-challenge docs/specs/<feature>/<feature>.md before coding. It spawns parallel hostile reviewers (Security Adversary, Failure Mode Analyst, Assumption Destroyer, Scope & YAGNI Critic) that quote the spec and propose surgical edits — e.g. "AS-014 doesn't define tag case-sensitivity → add constraint or AS". Skip for simple CRUD like this tag filter.
Step 2 — Implement + build (/mf-build)
You: (edit TodoListView.swift, TagFilterChip.swift) — now /mf-build
Claude → running /mf-build
Phase 1.5: Coverage Map
[+] TodoListView.swift
└── applyTagFilter(selected:)
├── [★★ TESTED] empty selection → returns all (new test)
├── [★★ TESTED] single tag match (new test)
├── [GAP] no-match empty state (← AS-015)
└── [★★ TESTED] multi-tag AND semantics (new test)
Writing test for [GAP]… building… running.
✓ 14 tests passed (3 new, 0 regressions)
Step 3 — Bug found? (/mf-fix)
You: /mf-fix "selecting two tags shows OR results, should be AND"
Claude → running /mf-fix
Phase 0: Bug Path Diagram
applyTagFilter(selected: ["work","urgent"])
└── [BUG] predicate uses .contains(anyOf:) — should be .contains(allOf:)
Phase 1: writing failing test (AS-014 variant: 2 tags, AND)… FAILS ✓
Phase 2: changing .anyOf → .allOf in TodoRepository.swift:88 (1 line)
Phase 3: test now PASSES, full suite green.
DEBUG REPORT
Root cause: wrong set operator, regression from S-005 search refactor
Prevention: add AS for multi-tag AND to spec (auto-signaled)
Step 4 — Review + commit (/mf-review → /mf-commit)
You: /mf-review
Claude → Smart Focus: UI state + query predicate
Failure Mode Grid: all paths have tests ✓ error states covered ✓
Finding (confidence 8/10): selected-tags state not persisted across
navigation. Not in current spec — logged under "Not in Scope".
Verdict: APPROVE
You: /mf-commit
Claude → secret scan: clean. debug-code scan: clean.
Staging 4 files. Conventional message:
feat(todo): filter list by selected tags with AND semantics
✓ commit a1b2c3d (not pushed — run `git push` when ready)
Complex bug? Insert /mf-investigate "<bug>" before /mf-fix. It's read-only, writes docs/investigate/<slug>-<date>.md with hypotheses + blast radius, then /mf-fix auto-picks it up. Skip for trivial bugs.
That's the 5 minutes. The CLI auto-detected your project (Swift + XCTest here) — no config touched.
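The Step 3 bug is a set-semantics mix-up: OR ("any selected tag") where the spec wants AND ("every selected tag"). The walkthrough's project is Swift, so the JavaScript below is only a rough analogue, and the names `matchesAnyTag`, `matchesAllTags`, and `applyTagFilter(todos, selected)` are hypothetical. It shows the buggy predicate next to the fixed one, with an empty selection returning every todo as the Coverage Map describes.

```javascript
// Hypothetical JavaScript analogue of the Swift applyTagFilter(selected:) from the walkthrough.
// Each todo is assumed to look like { title: string, tags: string[] }.

// Buggy behavior: a todo matches if it has ANY selected tag (OR semantics).
function matchesAnyTag(todo, selected) {
  return selected.some((tag) => todo.tags.includes(tag));
}

// Fixed behavior: a todo matches only if it has EVERY selected tag (AND semantics).
function matchesAllTags(todo, selected) {
  return selected.every((tag) => todo.tags.includes(tag));
}

function applyTagFilter(todos, selected) {
  // Empty selection returns all todos ("empty selection → returns all").
  if (selected.length === 0) return todos;
  return todos.filter((todo) => matchesAllTags(todo, selected));
}

const todos = [
  { title: "Ship release", tags: ["work", "urgent"] },
  { title: "Write report", tags: ["work"] },
  { title: "Buy milk", tags: ["home"] },
];

// OR (the bug) keeps both work-tagged todos; AND (the fix) keeps only the one with both tags.
console.log(todos.filter((t) => matchesAnyTag(t, ["work", "urgent"])).map((t) => t.title));
console.log(applyTagFilter(todos, ["work", "urgent"]).map((t) => t.title));
```

A failing test for the bug is simply the two-tag case asserting a single result; it fails against `matchesAnyTag` and passes against `matchesAllTags`.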
3. Setup
Prerequisites
| Tool | Required | Why |
|------|----------|-----|
| Claude Code CLI | Yes | Runs the commands and hooks |
| Git | Yes | Change detection, commit workflow |
| Node.js (18+) | Yes | File guard hook, JSON parsing |
| Bash (4+) | Yes | Path guard hook, shell-based hooks |
| Language toolchain | Yes | Whatever your project uses (Swift, npm, pytest, etc.) |
| GraphAtlas | Optional | Graph-based code intelligence — skills prefer it over grep when connected (see below) |
Installation
Option A: One-command install (recommended)
npx claude-devkit-cli init .
Option B: Global install
npm install -g claude-devkit-cli
# Then, in any project:
cd my-project
claude-devkit init .
Option C: Global skills install (available in all projects without running init again)
claude-devkit init --global
# or after per-project init, answer "yes" to the global prompt
Skills installed globally at ~/.claude/skills/ are available in every project. Per-project .claude/skills/ always takes precedence over global — so projects can still override individual skills.
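The precedence rule can be pictured as a two-step lookup. This is a minimal sketch, not how Claude Code itself loads skills: it assumes each skill lives at `<root>/<name>/SKILL.md`, and `resolveSkill` is a hypothetical helper.

```javascript
// Hypothetical sketch of "per-project overrides global" skill lookup.
// Assumes each skill lives at <root>/<name>/SKILL.md; this is not Claude Code's loader.
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

function resolveSkill(name, projectDir, homeDir = os.homedir()) {
  const candidates = [
    path.join(projectDir, ".claude", "skills", name, "SKILL.md"), // per-project wins
    path.join(homeDir, ".claude", "skills", name, "SKILL.md"), // global fallback
  ];
  // First existing candidate wins; null means the skill is not installed anywhere.
  return candidates.find((p) => fs.existsSync(p)) ?? null;
}
```

Dropping a customized `mf-fix/SKILL.md` into one project's `.claude/skills/` therefore shadows the global copy for that project only.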
Option D: Force re-install (overwrites existing files)
npx claude-devkit-cli init --force .
Option E: Selective install (only specific components)
npx claude-devkit-cli init --only hooks,skills .
What Gets Installed
your-project/
├── .claude/
│ ├── CLAUDE.md ← Project rules hub
│ ├── settings.json ← Hook wiring
│ ├── hooks/
│ │ ├── file-guard.js ← Warns on large files
│ │ ├── path-guard.sh ← Blocks wasteful Bash paths
│ │ ├── glob-guard.js ← Blocks broad glob patterns
│ │ ├── comment-guard.js ← Blocks placeholder comments
│ │ ├── sensitive-guard.sh ← Blocks access to secrets
│ │ └── self-review.sh ← Quality checklist on stop
│ └── skills/
│ ├── mf-explore/SKILL.md ← /mf-explore skill
│ ├── mf-scaffold/ ← /mf-scaffold skill (greenfield bootstrap)
│ │ ├── SKILL.md
│ │ └── references/ ← ARCHITECTURE/DESIGN templates, ADR template,
│ │ │ stack-profiles/ seeds (copy to ~/.claude or
│ │ │ ./.claude to customize — bundled copy is overwritten on upgrade)
│ │ ├── ARCHITECTURE.md.tmpl
│ │ ├── DESIGN.md.tmpl
│ │ ├── adr/NNNN-template.md
│ │ └── stack-profiles/react.md
│ ├── mf-plan/SKILL.md ← /mf-plan skill
│ ├── mf-challenge/SKILL.md ← /mf-challenge skill
│ ├── mf-build/SKILL.md ← /mf-build skill
│ ├── mf-investigate/SKILL.md ← /mf-investigate skill (optional, read-only)
│ ├── mf-fix/SKILL.md ← /mf-fix skill
│ ├── mf-review/SKILL.md ← /mf-review skill
│ ├── mf-commit/SKILL.md ← /mf-commit skill
│ ├── mf-spec-render/ ← /mf-spec-render skill (spec HTML view, user-invoked)
│ │ ├── SKILL.md
│ │ ├── template.html
│ │ ├── components.md
│ │ └── examples/
│ ├── mf-md-render/ ← /mf-md-render skill (generic markdown HTML view)
│ │ ├── SKILL.md
│ │ ├── template.html
│ │ └── components.md
│ ├── mf-voices/SKILL.md ← /mf-voices skill (multi-LLM review)
│ └── mf-humanize/SKILL.md ← /mf-humanize skill (rephrase to human voice)
└── docs/
├── specs/ ← Your specs (folder-per-feature)
│ └── <feature>/
│ ├── <feature>.md ← Spec with acceptance scenarios
│ └── snapshots/ ← Version history (managed by /mf-plan)
└── WORKFLOW.md ← Process reference
Optional: GraphAtlas Code Intelligence
The mf-* skills work out of the box with grep. But when GraphAtlas (GA) is connected as an MCP server, six skills — /mf-explore, /mf-plan, /mf-build, /mf-fix, /mf-review, /mf-investigate — prefer it over grep for code discovery, call-graph tracing, and blast-radius analysis.
Why it helps: grep can't tell a call site from a string literal, doesn't see polymorphic dispatch, and won't follow re-exports. An agent that edits one function but misses its callers, test files, and overrides in other modules ships a bug. GA indexes the repo once into a local graph with typed CALL / IMPORT / OVERRIDE edges, then answers structural questions deterministically in milliseconds with a small token footprint. It runs 100% locally — no LLM, no embeddings, no telemetry.
How the skills use it: each skill runs a one-time probe (ga_architecture) at the start. If GA responds, it leans on tools like ga_impact (blast radius + affected tests), ga_callers / ga_callees (call graph), ga_symbols (definition lookup), and ga_rename_safety. If GA is absent — or the index is stale — the skill falls back to grep/glob automatically. Nothing breaks; you only lose the precision.
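The probe-then-fallback behavior can be sketched as follows. The tool names `ga_architecture` and `ga_callers` come from this README, but `callGa`, `grepFallback`, `makeCodeSearch`, and the `{ symbol }` argument shape are hypothetical placeholders, not a real MCP client API. Any probe error (absent server, stale index) is treated as "GA unavailable".

```javascript
// Hypothetical sketch of "probe once, then prefer GraphAtlas, else grep".
// callGa and grepFallback are placeholders you would supply; not a real MCP client API.
function makeCodeSearch(callGa, grepFallback) {
  let gaAvailable;
  try {
    callGa("ga_architecture", {}); // one-time probe at skill start
    gaAvailable = true;
  } catch {
    gaAvailable = false; // GA absent or index unusable: degrade, never fail
  }
  return {
    usesGa: gaAvailable,
    findCallers(symbol) {
      if (gaAvailable) {
        try {
          return callGa("ga_callers", { symbol }); // argument shape is an assumption
        } catch {
          // A later GA error also falls through to grep, so the skill keeps working.
        }
      }
      return grepFallback(symbol);
    },
  };
}
```

The point of the design is that callers never branch on GA themselves; they get the same answer shape either way and only lose precision without it.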
Setup: GA is a separate tool, not bundled with this kit. Install and register it as an MCP server following the instructions at github.com/microvn/graphatlas. Once registered, the skills detect it on their own — no changes to this kit's config needed.
Post-Install Configuration
The CLI auto-detects your project type and fills in CLAUDE.md. Verify it's correct:
cat .claude/CLAUDE.md
Look for the Project Info section. Ensure language, test framework, and directories are correct. Edit manually if needed.
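The CLI's actual detection rules are not documented here. As a hedged illustration of the general idea, the sketch below guesses a language from well-known marker files; `detectLanguage` and its ordering are assumptions. `package.json` is checked last because many non-JavaScript repos also carry one.

```javascript
// Hypothetical marker-file heuristic for guessing a project's language.
// Not the CLI's real logic; it only illustrates auto-detection.
const fs = require("node:fs");
const path = require("node:path");

const MARKERS = [
  ["Package.swift", "Swift"],
  ["Cargo.toml", "Rust"],
  ["go.mod", "Go"],
  ["pyproject.toml", "Python"],
  ["requirements.txt", "Python"],
  ["Gemfile", "Ruby"],
  ["build.gradle.kts", "Java/Kotlin"],
  ["build.gradle", "Java/Kotlin"],
  ["pom.xml", "Java/Kotlin"],
  ["package.json", "TypeScript/JavaScript"], // last: often present alongside other stacks
];

function detectLanguage(projectDir) {
  const entries = fs.readdirSync(projectDir);
  for (const [file, language] of MARKERS) {
    if (entries.includes(file)) return language;
  }
  // C# projects are recognized by any *.csproj or *.sln file rather than a fixed name.
  if (entries.some((e) => e.endsWith(".csproj") || e.endsWith(".sln"))) return "C#";
  return null; // unknown: fill in the Project Info section by hand
}
```

If a guess like this is wrong for your repo, the fix is the same as with the real CLI: correct the Project Info section in CLAUDE.md.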
Upgrade
npx claude-devkit-cli upgrade
Smart upgrade — updates kit files but preserves any you've customized. Use --force to overwrite everything.
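How the kit decides a file is "customized" is not described here. One common approach, shown purely as an assumption, is to record a SHA-256 of each kit file at install time and overwrite on upgrade only when the file on disk still matches that hash. `planUpgrade` and its return values are hypothetical.

```javascript
// Hypothetical sketch of a "smart upgrade" decision; the kit's real mechanism may differ.
// Assumption: the SHA-256 of every kit file was recorded when it was installed.
const crypto = require("node:crypto");

const sha256 = (text) => crypto.createHash("sha256").update(text).digest("hex");

// installedHash: hash recorded at install; currentText: file on disk (null if missing);
// newText: the file shipped in the new kit version.
function planUpgrade(installedHash, currentText, newText, { force = false } = {}) {
  if (currentText === null) return "restore"; // file missing: install the new version
  if (force) return "overwrite"; // --force ignores local edits
  if (sha256(currentText) !== installedHash) return "keep"; // user customized it
  if (sha256(newText) === installedHash) return "unchanged"; // nothing new upstream
  return "overwrite"; // untouched kit file: safe to update
}
```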
# Check if update is available
npx claude-devkit-cli check
# See what changed
npx claude-devkit-cli diff
# View installed files and status
npx claude-devkit-cli list
Uninstall
npx claude-devkit-cli remove
This removes hooks, skills, and settings. It preserves CLAUDE.md (which you may have customized) and docs/ (which contains your specs).
4. Daily Workflows
New Project (Greenfield)
When: Brand-new project — no codebase yet (empty repo, no package manager / src/).
1. /mf-explore "what you're building"
→ Detects greenfield, also decides app-type + stack (researched, current),
emits a Bootstrap Brief in docs/explore/<feature>.md.
2. /mf-scaffold
→ Generator-first runnable skeleton (core/ + one pattern-demonstrating module +
tests), smoke-gated (install→build→start GREEN), + ARCHITECTURE.md / ADRs.
Hands off only when it RUNS.
3. /mf-plan → /mf-build → normal New Feature flow, now on a runnable base.
Explore Before Planning
When: Requirements are unclear, you're debating between approaches, or it's a brownfield feature with existing code to understand first.
1. /mf-explore "feature description"
→ Asks questions as a Client Technical Lead — one topic at a time.
→ Clarifies: why, behavior, boundaries, business rules, edge cases, permissions, UI.
→ Output: docs/explore/<feature>.md
2. /mf-plan "feature description"
→ Auto-detects docs/explore/<feature>.md, skips redundant discovery.
→ Continue with the normal New Feature flow.
Example:
/mf-explore "cancel order request"
New Feature
When: Building something new — no existing code or spec.
1. /mf-plan "description of the feature"
→ Generates spec with acceptance scenarios at docs/specs/<feature>/<feature>.md.
2. Implement code in chunks.
After each chunk: /mf-build
Repeat until green.
3. /mf-review (before merge)
4. /mf-commit
Example:
/mf-plan "User authentication with email/password login, password reset via email, and session management with 24h expiry"
Update Existing Feature
When: Changing behavior of something that already exists.
1. /mf-plan docs/specs/<feature>/<feature>.md "description of changes"
→ Mode C handles everything: snapshot → classification → change report → apply.
Do NOT manually edit the spec before running /mf-plan.
2. Implement the code change.
/mf-build
Fix until green.
3. /mf-review → /mf-commit
Bug Fix
When: Something is broken.
0. (OPTIONAL) /mf-investigate "description of the bug"
→ Use for complex bugs, outages, data corruption, or when the cause is unclear.
→ Read-only: hypothesis + blast radius + evidence, no code changes.
→ Writes docs/investigate/<slug>-<date>.md for /mf-fix to consume.
→ Skip for trivial/obvious bugs — go straight to /mf-fix.
1. /mf-fix "description of the bug" (or /mf-fix docs/investigate/<slug>-<date>.md)
→ Writes failing test → fixes code → runs full suite.
2. /mf-commit
Example:
/mf-fix "Search returns no results when query contains apostrophes like O'Brien"
Remove Feature
When: Deleting code, removing deprecated functionality.
1. /mf-plan docs/specs/<feature>/<feature>.md "remove stories S-XXX"
→ Mode C creates a snapshot (removing stories = Major), then marks as removed.
2. Delete production code + related tests.
3. Run the full test suite (your project's native test command).
Fix cascading breaks.
4. /mf-commit
5. Commands Reference
/mf-explore — Feature Discovery as Client Technical Lead
Usage:
/mf-explore "cancel order request"
/mf-explore "user notification preferences"
When to use: Requirements are unclear, you're debating between approaches, or you want to clarify a feature deeply before committing to a spec. Runs before /mf-plan.
How it works:
- Phase 0: Codebase scan — Silently checks for existing code, related specs, and existing explore docs before asking anything.
- Phase 1: Why, not what — Asks what problem requires this feature, who faces it, and how they handle it today. Prevents building the wrong thing.
- Phase 2: Desired behavior — Walks through the flow step by step, identifies trigger and final result, checks for multi-role approval chains.
- Phase 2.5: UI/UX expectation — Clarifies interface type (table, form, wizard, dashboard). Offers sensible defaults when the client is unsure. Suggests simpler approaches when expectations are complex.
- Phase 3: Boundaries — Impact on existing screens, data changes, migration needs, out of scope, permissions.
- Phase 3.5: Scope optimization — Identifies what can ship fast vs what can defer to phase 2.
- Phase 4: Business rules & validation — Conditions, formulas (with real numbers), input validation, notifications, time constraints, concurrency.
- Phase 5: Edge cases — Empty states, error messages, double submit, network loss, limits, sensitive data, domain-specific cases (payment double-charge, booking overbooking, etc.).
- Phase 6: Scenario confirmation — Presents concrete happy path + unhappy paths with fake data. Confirms with user before proceeding.
- Phase 7: Handoff summary — Compiles everything into a structured doc, confirms with user, writes to docs/explore/<feature>.md.
Output: docs/explore/<feature>.md — auto-detected by /mf-plan, which skips redundant discovery and maps explore findings directly to spec sections.
Token cost: 10–20k
/mf-scaffold — Greenfield Project Bootstrap
Usage:
/mf-scaffold # bootstrap from the Bootstrap Brief in docs/explore/
/mf-scaffold "Next.js + Nest pnpm monorepo" # standalone: gather app-type/stack itself
When to use: A brand-new project with no runnable codebase yet. Runs between /mf-explore (greenfield branch) and /mf-plan: mf-explore → mf-scaffold → mf-plan → mf-build. Skip if a runnable project already exists — go straight to /mf-plan. /mf-build's Foundation Gate refuses to start the TDD loop until this has produced a runnable harness.
How it works:
- Precondition — confirms greenfield; resumes a partial repo without clobbering user files.
- App-type + stack — taken from the Bootstrap Brief (or asked); never silently defaulted; current versions researched, not recalled from training memory. Optional layered stack profiles (./.claude/ > ~/.claude/ > kit seed) supply opinionated defaults; the Brief always wins.
- Skeleton (generator-first) — official create-* CLIs give real pinned deps (defends against hallucinated/typosquatted packages); monorepos orchestrated root-first; imposes core/ + modules/ + co-located tests; seeds ONE module that demonstrates the architecture pattern (the template every feature copies).
- Smoke gate (non-negotiable) — install → build → start/smoke must be GREEN, with ≥1 real passing test (this resolves TEST_CMD for /mf-build). Not green → BLOCKED; never a half-scaffold.
- Docs — fills ARCHITECTURE.md (codemap + invariants), one ADR per major stack choice, optional DESIGN.md.
- Hygiene & handoff — secret scan, .gitignore, .env.example; reports the resolved TEST_CMD.
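The smoke gate's rule (all steps green, at least one real passing test, otherwise BLOCKED) fits in a few lines. This is a sketch under assumptions: `smokeGate` is hypothetical, `runStep` is an injected runner (for example a wrapper you write around `child_process.spawnSync`) returning an exit code and an optional passing-test count, and the pnpm commands in the usage are placeholders for your stack.

```javascript
// Hypothetical sketch of a smoke gate: run install → build → start/smoke in order,
// stop at the first failure, and report BLOCKED rather than a half-finished scaffold.
// runStep(cmd) is injected and returns { code, passedTests? }.
function smokeGate(steps, runStep) {
  const results = [];
  for (const step of steps) {
    const { code, passedTests = 0 } = runStep(step.cmd);
    results.push({ name: step.name, code, passedTests });
    if (code !== 0) return { status: "BLOCKED", failedAt: step.name, results };
  }
  // A green build with zero real passing tests still does not count as a runnable harness.
  const totalPassed = results.reduce((n, r) => n + r.passedTests, 0);
  if (totalPassed < 1) return { status: "BLOCKED", failedAt: "no passing test", results };
  return { status: "GREEN", results };
}

// Placeholder commands; substitute your stack's install/build/test commands.
const steps = [
  { name: "install", cmd: "pnpm install" },
  { name: "build", cmd: "pnpm build" },
  { name: "test", cmd: "pnpm test" },
];
```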
Output: a runnable walking skeleton + canonical docs. Thin by design — features come later via /mf-plan → /mf-build.
Token cost: 15–40k + real install/build time (heavier than other skills — it runs generators and builds).
/mf-plan — Generate Spec with Acceptance Scenarios
Usage:
/mf-plan "user authentication with OAuth2" # Mode A: new spec from description
/mf-plan docs/specs/auth/auth.md # Mode B: add scenarios to existing spec
/mf-plan docs/specs/auth/auth.md "add password reset flow" # Mode C: update existing spec
Modes:
- Mode A — Creates a new spec with stories and acceptance scenarios from your description.
- Mode B — Reads an existing spec that has no acceptance scenarios yet, adds them.
- Mode C — Updates an existing spec: creates a snapshot before Major changes, shows a change report, waits for confirmation, then applies.
How it works:
- Phase 0: Codebase Awareness — Scans existing code, docs/specs/, and project patterns before planning. Prevents specs that conflict with existing implementations.
- Phase 1: Scope & Split + Scope Challenge — Evaluates feature size (>7 stories or >20 AS → must split). When a feature is large, applies Sizing & Phasing: Phase 1 (minimum viable — smallest slice with value), Phase 2 (core experience — happy path), Phase 3 (edge cases, polish), Phase 4 (optimization, monitoring) — each phase mergeable independently. Also runs a Scope Challenge before drafting: checks for existing code that already solves sub-problems (reuse vs rebuild), flags complexity smells (8+ files or 2+ new classes/services), searches for framework built-ins, checks for distribution needs (new artifact → CI/CD in scope?), and applies the Completeness Principle (complete version costs only CC: ≤15m more → recommend it directly).
- Phase 2: Draft Spec — Generates a structured spec with stories and acceptance scenarios (Given/When/Then). Depth scales by priority: P0 gets full GWT + test data, P1 gets GWT, P2 gets 1-2 line descriptions. Runs consistency checks (CC1-CC6) before showing draft.
- Phase 3: Clarify Ambiguities — Systematically finds gaps across behavioral, data, auth, non-functional, integration, and concurrency dimensions. Questions include (human: ~X / CC: ~Y) effort scales and Completeness: X/10 scores for each option.
- Phase 4: Summary — Shows story counts, AS counts, implementation order, next steps. Every spec also gets a "What Already Exists" section (existing code that partially solves the problem) and a "Not in Scope" section (deferred work with rationale — prevents work from silently dropping).
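The numeric thresholds in Phases 1 and 2 can be restated as plain predicates. The numbers come from the list above; the function and field names are hypothetical, and "CC minutes" means the extra Claude Code effort estimate.

```javascript
// Hypothetical encoding of the /mf-plan sizing thresholds; names are illustrative only.
function mustSplit({ stories, acceptanceScenarios }) {
  return stories > 7 || acceptanceScenarios > 20;
}

function complexitySmell({ filesTouched, newClassesOrServices }) {
  return filesTouched >= 8 || newClassesOrServices >= 2;
}

// Completeness Principle: recommend the complete version if it costs ≤15 extra CC minutes.
const recommendComplete = (extraCcMinutes) => extraCcMinutes <= 15;

// Scenario depth scales with story priority.
const DEPTH_BY_PRIORITY = {
  P0: "full Given/When/Then + test data",
  P1: "Given/When/Then",
  P2: "1-2 line description",
};
```

Note the boundaries: exactly 7 stories and 20 AS still fit in one spec; one more of either forces a split.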
Mode C (Update) adds:
- Classification — Walks through M1-M6 checklist to determine Major vs Minor change.
- Snapshot — Major changes trigger an automatic snapshot (cp, bit-perfect) before editing.
- Change report — Shows what will change, waits for user confirmation.
- Consistency check — Runs CC1-CC6 after every update.
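A bit-perfect snapshot is just a byte copy made before any edit. The sketch below is hypothetical (`snapshotSpec` is not the skill's code). It follows the `snapshots/YYYY-MM-DD.md` and `YYYY-MM-DD-<REF>.md` naming shown below under Directory structure, uses the UTC date, and refuses to overwrite an existing snapshot, so a second same-day snapshot needs a REF suffix.

```javascript
// Hypothetical pre-edit snapshot mirroring the "cp, bit-perfect" rule; not the skill's code.
const fs = require("node:fs");
const path = require("node:path");

function snapshotSpec(specPath, date = new Date(), ref) {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD (UTC)
  const name = ref ? `${day}-${ref}.md` : `${day}.md`;
  const dir = path.join(path.dirname(specPath), "snapshots");
  fs.mkdirSync(dir, { recursive: true });
  const target = path.join(dir, name);
  // Byte-for-byte copy; COPYFILE_EXCL throws instead of overwriting an existing snapshot.
  fs.copyFileSync(specPath, target, fs.constants.COPYFILE_EXCL);
  return target;
}
```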
Traceability IDs:
- S-NNN — Stories (with priority P0/P1/P2)
- AS-NNN — Acceptance Scenarios (Given/When/Then, embedded in stories)
- FR-NNN — Functional Requirements (if needed)
- SC-NNN — Success Criteria (if needed)
- IDs are immutable — deleted IDs are never reused.
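"Never reused" implies the next ID is one past the highest number ever issued, not one past the highest currently in the spec. A minimal sketch, assuming you can collect every ID ever used (for example from the spec plus its snapshots); `nextId` is hypothetical and pads to three digits to match the NNN format.

```javascript
// Hypothetical helper for immutable traceability IDs: the next ID is one past the highest
// number ever used (including deleted IDs), so a deleted AS-003 is never handed out again.
function nextId(prefix, everUsedIds) {
  const pattern = new RegExp(`^${prefix}-(\\d+)$`); // anchored, so "S" does not match "AS-001"
  let max = 0;
  for (const id of everUsedIds) {
    const m = pattern.exec(id);
    if (m) max = Math.max(max, Number(m[1]));
  }
  return `${prefix}-${String(max + 1).padStart(3, "0")}`;
}
```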
Directory structure:
docs/specs/<feature>/
<feature>.md # single source of truth — always read this file
snapshots/ # version history (managed by mf-plan, not developers)
YYYY-MM-DD.md
YYYY-MM-DD-<REF>.md
Output:
- Spec with acceptance scenarios: docs/specs/<feature>/<feature>.md
- (Optional) Scannable HTML view: docs/specs/<feature>/<feature>.html — generated by running /mf-spec-render <feature> after /mf-plan. /mf-plan suggests the command at the end of Phase 4 and Mode C but does not invoke it. Source .md remains canonical; HTML is regenerable.
/mf-spec-render — Render Spec as HTML View
Usage:
/mf-spec-render <feature> # render by feature slug
/mf-spec-render docs/specs/auth/auth.md # render specific spec
/mf-spec-render docs/specs/billing/ # render spec dir
/mf-spec-render --all # bulk re-render all specs
/mf-spec-render # list + prompt
When to use: Decoupled from /mf-plan — you invoke it explicitly when you want the HTML view. /mf-plan writes the spec markdown and ends; it suggests /mf-spec-render at the end of Phase 4 and Mode C but never calls it automatically. Run it:
- After /mf-plan to generate the initial HTML view (sidebar TOC, story cards, collapsible AS)
- After a Mode C update to refresh a now-stale .html
- After fixing a typo directly in <feature>.md (no spec semantics changed, but HTML is stale)
- For specs written before this skill existed
- Bulk (--all) after changing template.html or components.md
How it works:
- Reads docs/specs/<feature>/<feature>.md (+ sub-specs if multi-spec).
- Reads template.html + components.md (cached, not regenerated each call).
- Parses spec: frontmatter, stories with priority badges, acceptance scenarios (Given/When/Then), constraints, change log, snapshots.
- Builds the HTML buffer in-memory using component snippets — copy verbatim, fill content. AI never writes CSS or component markup from scratch.
- Writes <feature>.html next to <feature>.md in one Write call.
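The story-parsing step can be illustrated on the shape used in the Quick Start (`### S-007: Filter todos by tag (P0)` followed by `AS-014: Given …` lines). The real skill's parser is not shown in this README; `parseStories` below is a hypothetical sketch that collects story ID, title, optional priority, and the AS IDs under each story, which is enough to drive the priority and AS-count badges.

```javascript
// Hypothetical parser for the story/AS shape shown in the Quick Start; not the skill's code.
function parseStories(markdown) {
  const stories = [];
  for (const line of markdown.split(/\r?\n/)) {
    // "### S-007: Filter todos by tag (P0)" -> id, title, optional priority
    const story = /^###\s+(S-\d+):\s*(.*?)\s*(?:\((P[0-2])\))?\s*$/.exec(line);
    if (story) {
      stories.push({ id: story[1], title: story[2], priority: story[3] ?? null, scenarios: [] });
      continue;
    }
    // "AS-014: Given ..." lines belong to the most recent story heading.
    const as = /^\s*(AS-\d+):/.exec(line);
    if (as && stories.length > 0) stories[stories.length - 1].scenarios.push(as[1]);
  }
  return stories;
}
```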
Output features (the rendered HTML):
- Sticky top bar: doc type + feature name + version + last-updated + counts (specs / stories / AS) + status pill (Active/Draft/Deprecated)
- Mandatory TL;DR card immediately after the title
- Sidebar TOC with scroll-spy + search filter, grouped by sub-spec (multi-spec) or by section (single)
- Story cards with priority badge (P0/P1/P2) + AS count badge
- AS as collapsible details (first AS of each story open by default), with Given/When/Then grid
- Constraint callouts (warning style), grouped per sub-spec for large specs
- Change Log and Snapshots collapsed by default
- Dark/light/auto theme toggle (system preference honored)
- Print stylesheet (sidebar hidden, all details expanded, page-break-aware)
- Self-contained: zero external dependencies, no CDN, opens offline
Source remains truth:
- .md is canonical. Edit .md via /mf-plan; regenerate .html via this skill.
- Never hand-edit the .html. Re-rendering is idempotent — run /mf-spec-render any time you want the HTML to catch up with the .md.
Token cost: 3–8k (template + components cached; output ≈ source markdown × 1.2 — no CSS/JS in output token stream).
/mf-md-render — Render Any Markdown as HTML View
Generic counterpart to /mf-spec-render. Same template/component architecture, but for arbitrary long-form markdown with no fixed schema — investigation reports, explore docs, RFCs, retros, design notes, READMEs.
Usage:
/mf-md-render docs/investigate/payment-bug-2026-05-16.md # render next to source
/mf-md-render <file.md> --out report.html # custom output path
/mf-md-render docs/notes/ # list + prompt
/mf-md-render # prompt for path
When to use: Any non-spec markdown you want as a scannable, shareable single HTML file. It refuses spec files (heading ### S-NNN:) and points you to /mf-spec-render instead.
How it works: Reads source + template.html + components.md, then uses an analyzer pattern (not fixed parsing) — each markdown chunk is mapped to the best component: numbered actions → step cards, GFM admonitions → callouts, ```mermaid → diagrams, pros/cons → compare cards, long appendices → collapsible. Builds the buffer in-memory, writes once.
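To make the analyzer pattern concrete, here is a hedged sketch of the refusal check and a chunk classifier. The component names mirror the text above and GFM alert syntax (`> [!NOTE]`, `> [!WARNING]`, and so on) is real, but `isSpecFile`, `classifyChunk`, and the exact rules are illustrative assumptions; the skill's own mapping is model-driven rather than a fixed rule table.

```javascript
// Hypothetical chunk classifier in the spirit of /mf-md-render's analyzer pattern.
const FENCE = "`".repeat(3); // built at runtime so the snippet can sit inside a Markdown fence

function isSpecFile(markdown) {
  // Spec files (### S-NNN: headings) are refused and routed to /mf-spec-render.
  return /^###\s+S-\d+:/m.test(markdown);
}

function classifyChunk(chunk) {
  const lines = chunk.trim().split(/\r?\n/);
  if (lines[0].startsWith(`${FENCE}mermaid`)) return "diagram";
  if (/^>\s*\[!(NOTE|TIP|IMPORTANT|WARNING|CAUTION)\]/i.test(lines[0])) return "callout";
  if (lines.length > 1 && lines.every((l) => /^\d+\.\s/.test(l))) return "step-cards";
  return "prose"; // anything unrecognized renders as ordinary text
}
```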
Output features: sidebar TOC + scroll-spy + search, anchored headings with copy-link, code blocks with copy button + language label, Mermaid diagrams (CDN), 4-variant callouts (note/tip/warn/danger), step cards, compare cards, task lists, footnotes, figure+caption, dark/light/auto theme, scroll progress bar, mobile drawer, print stylesheet. Self-contained (only Mermaid loads from CDN).
Token cost: 3–8k (template + components cached; output ≈ source markdown × 1.2 — no CSS/JS in output token stream).
/mf-challenge — Adversarial Plan Review
Usage:
/mf-challenge docs/specs/auth/auth.md # challenge a spec
/mf-challenge "user authentication" # challenge by feature name
How it works (7 phases):
Read & Map — Reads the spec (including acceptance scenarios) and maps: decisions made, assumptions (stated AND implied), dependencies, scope boundaries, risk acknowledgments, story-AS consistency.
Scale Reviewers — Assesses complexity and selects reviewers:
| Complexity | Signals | Reviewers |
|------------|---------|-----------|
| Simple | 1 spec section, <20 acceptance scenarios, no auth/data | 2 |
| Standard | Multiple sections, auth or data involved | 3 |
| Complex | Multiple integrations, concurrency, migrations, 6+ phases | 4 |
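The reviewer-scaling table can be read as a cascade: check for Complex signals first, then Standard, else Simple. The 2/3/4 tiers come from the table; the signal field names and `reviewerCount` are hypothetical. A spec with one section but 20 or more scenarios no longer qualifies as Simple, so this sketch treats it as Standard.

```javascript
// Hypothetical mapping from the complexity table to a reviewer count.
function reviewerCount(s) {
  const complex = s.integrations > 1 || s.concurrency || s.migrations || s.phases >= 6;
  if (complex) return 4;
  const standard = s.sections > 1 || s.touchesAuthOrData || s.acceptanceScenarios >= 20;
  if (standard) return 3;
  return 2; // Simple: one section, <20 acceptance scenarios, no auth/data
}
```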
Spawn Reviewers — Launches parallel subagents, each with an adversarial lens:
Security Adversary
- OWASP Top 10
- Injection vectors
- Auth/authz bypass
- Crypto issues
- Data exposure
- Supply chain risks
Failure Mode Analyst — "Everything that can go wrong, will — simultaneously, at 3 AM, during peak traffic"
- Partial failures
- Concurrency & race conditions
- Cascading failures
- Recovery paths
- Idempotency
- Observability gaps
Assumption Destroyer — "'It should work' is not evidence"
- Unverified claims
- Scale assumptions
- Environment differences
- Integration contracts
- Data shape assumptions
- Timing dependencies
- Hidden dependencies
Scope & YAGNI Critic — "The best code is no code. The best feature is the one you didn't build"
- Over-engineering
- Premature abstraction
- Missing MVP cuts
- Gold plating
- Simpler alternatives
Deduplicate & Rate — Collects all findings, removes duplicates, rates severity using a Likelihood x Impact matrix. Caps at 15 findings: keeps all Critical, top High by specificity, notes how many Medium were dropped. Each reviewer is limited to top 7 findings.
Adjudicate — Evaluates each finding: Accept (valid flaw, plan should change) or Reject (false positive, acceptable risk, already handled). 1-sentence rationale for each.
User Choice — Two modes: "Apply all accepted" (fast) or "Review each" (walk through one by one).
Apply — Surgical edits only to accepted findings. Doesn't rewrite surrounding sections.
Finding format: Each finding includes Title, Severity, Confidence score (9-10 = verified; 7-8 = strong match; 5-6 = note caveat; ≤4 = omit unless Critical), Location, Flaw description, Evidence (direct quote from the plan), step-by-step Failure scenario, and Suggested fix.
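The Deduplicate & Rate step plus the confidence rule can be sketched as below. The caps (7 per reviewer, 15 total, every Critical kept, High ranked by specificity, dropped Medium counted) and "confidence ≤4 omitted unless Critical" come from this section. The 1-3 likelihood/impact scale, the score cut-offs, the title-based dedupe, and the names `severity`/`capFindings` are assumptions.

```javascript
// Hypothetical dedupe/rate/cap step for /mf-challenge findings.
function severity(likelihood, impact) {
  const score = likelihood * impact; // each assumed on a 1-3 scale, so 1..9
  if (score >= 9) return "Critical";
  if (score >= 6) return "High";
  if (score >= 3) return "Medium";
  return "Low";
}

function capFindings(findingsByReviewer, limit = 15, perReviewer = 7) {
  const seen = new Set();
  const all = [];
  for (const findings of Object.values(findingsByReviewer)) {
    for (const f of findings.slice(0, perReviewer)) {
      if (seen.has(f.title)) continue; // naive dedupe by title
      seen.add(f.title);
      const sev = severity(f.likelihood, f.impact);
      if (f.confidence <= 4 && sev !== "Critical") continue; // low confidence: omit
      all.push({ ...f, severity: sev });
    }
  }
  const bySpecificity = (a, b) => b.specificity - a.specificity;
  const pick = (sev) => all.filter((f) => f.severity === sev).sort(bySpecificity);
  const kept = pick("Critical"); // every Critical survives, even past the cap
  for (const f of [...pick("High"), ...pick("Medium")]) {
    if (kept.length >= limit) break;
    kept.push(f);
  }
  const droppedMedium = pick("Medium").filter((f) => !kept.includes(f)).length;
  return { kept, droppedMedium }; // Low findings are dropped silently in this sketch
}
```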
6 non-negotiable rules:
- Spawn reviewers in parallel (not sequential)
- Reviewers read files directly, not summarized content
- Be hostile — no praise, no softening
- Every finding must quote the plan directly as evidence
- Quality over quantity — 3 honest findings > 15 padded ones
- Skip style/formatting — substance only
When to use:
- After /mf-plan, before coding — for complex features
- Features involving auth, payments, data pipelines, multi-service integration
- NOT needed for simple CRUD, small bug fixes, or trivial features
Token cost: 15-30k (uses parallel subagents, doesn't bloat main context)
/mf-build — TDD Delivery Loop
Usage:
/mf-build # build all changes vs base branch
/mf-build src/api/users.ts # build specific file
/mf-build "user authentication" # build specific feature
How it works:
- Phase 0: Build Context — Finds changed files vs base branch, reads the spec (acceptance scenarios in the ## Stories section are the roadmap), checks docs/specs/<feature>/.build-progress to resume from a previous interrupted session, and reads existing tests for patterns, fixtures, and naming conventions. Doesn't duplicate what already exists.
- Phase 1: Decide What to Test — Determines test scope from acceptance scenarios. Applies the Completeness Principle: AI writes tests ~50x faster than humans, so if full coverage costs CC: ≤15m, it writes complete tests without asking. Always checks 8 mandatory edge case categories: null/undefined, empty arrays/strings, invalid types, boundary values (min/max), error paths (network failures, DB errors), race conditions, large data (10k+ items), and special characters (Unicode, SQL chars).
- Phase 1.5: Coverage Map — Before writing a single test, traces every code path (if/else, switch, guard, try/catch) AND user flows (double-click, stale session, navigate away mid-op). Draws an ASCII diagram marking each path as [★★★ TESTED], [★★ TESTED], [★ TESTED], or [GAP]. Gaps marked [GAP] [→E2E] need E2E tests; [GAP] [→EVAL] need evals — when flagged, defines capability + regression evals before implementing and reports pass@1/pass@3. Regression rule: if the diff changes existing behavior with no covering test, a regression test is a CRITICAL requirement — no asking, no skipping.
- Phase 2: Write Tests — Writes tests for every [GAP] identified in the Coverage Map. Before moving to Phase 3, verifies: all public functions have unit tests, all API endpoints have integration tests, edge cases covered, error paths tested, tests independent, assertions specific.
- Phase 3: Build and Run — Compiles/typechecks first, then runs tests.
- Phase 4: Fix Loop — If tests fail, fixes test code only (max 3 attempts, then hard stop and report). If tests expect X but code does Y, asks whether to fix production code or adjust the test — with effort scales (human: ~X / CC: ~Y).
- Phase 5: Report — Summary with test counts, results, coverage, files touched, and any E2E/eval gaps to follow up on.
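A Coverage Map is a small tree whose leaves carry a status marker, and its [GAP] leaves are Phase 2's to-do list. The hypothetical renderer below reproduces the style from the Quick Start; the input shape (`label`, `stars` from 0 to 3, optional `needs` of "E2E" or "EVAL") is an assumption, since the README does not define what each star level means.

```javascript
// Hypothetical renderer for a Coverage Map in the style shown in the Quick Start.
const MARKERS = { 3: "[★★★ TESTED]", 2: "[★★ TESTED]", 1: "[★ TESTED]" };

function renderCoverageMap(file, fn, paths) {
  const lines = [`[+] ${file}`, `  └── ${fn}`];
  paths.forEach((p, i) => {
    const branch = i === paths.length - 1 ? "└──" : "├──";
    // stars === 0 means untested; an optional needs tag routes the gap to E2E or evals.
    const tag = p.stars > 0 ? MARKERS[p.stars] : `[GAP]${p.needs ? ` [→${p.needs}]` : ""}`;
    lines.push(`      ${branch} ${tag} ${p.label}`);
  });
  return lines.join("\n");
}

// Every [GAP] path becomes a test to write in Phase 2.
const gaps = (paths) => paths.filter((p) => p.stars === 0);
```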
Rules:
- Never changes production code without asking first
- Never deletes or weakens existing tests
- Never adds skip/xit/@disabled to hide failures
- Max 3 fix attempts — then stops and reports the issue
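The "max 3 fix attempts" rule is a bounded retry that ends in a report rather than another loop. In this sketch, `runTests` and `fixTestCode` are hypothetical placeholders, and `fixLoop` touches test code only, matching the rule that production changes need the user's approval first.

```javascript
// Hypothetical sketch of the Phase 4 fix loop's hard stop.
function fixLoop(runTests, fixTestCode, maxAttempts = 3) {
  let result = runTests();
  let attempts = 0;
  while (!result.passed && attempts < maxAttempts) {
    attempts += 1;
    fixTestCode(result.failures); // test code only; production code needs the user's OK
    result = runTests();
  }
  return result.passed
    ? { status: "GREEN", attempts }
    : { status: "STOPPED", attempts, failures: result.failures };
}
```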
What NOT to test: Private/internal methods, framework behavior, trivial getters/setters, implementation details.
/mf-investigate — Read-Only Root Cause Investigation (Optional)
Usage:
/mf-investigate "production 500s after deploy on /api/orders"
/mf-investigate "intermittent data corruption in nightly sync"
When to use: OPTIONAL branch before /mf-fix. Use for complex bugs, production outages, data corruption, unclear regressions, or when the user wants a diagnosis report without any code change. Skip for trivial/obvious bugs — go straight to /mf-fix.
What it does NOT do: Never edits source code, tests, or config. The only write it performs is the investigation report at docs/investigate/<slug>-<date>.md.
How it works (adaptive depth, auto-scales):
- Phase 1: Understand the Report — Extracts symptom, expected, and actual behavior from $ARGUMENTS. Asks ONE clarifying question via AskUserQuestion if required fields are missing.
- Phase 2: Locate — Entry-point search (error/stack/function/feature), recurring-bug check (3+ fix commits on same pattern → architectural smell), data-flow trace, git history (regression signal).
- Phase 3: Pattern Match — 12 known bug patterns (nil propagation, race, state corruption, off-by-one, type coercion, stale cache, config drift, silent error swallow, ordering/timing, resource leak, merge conflict, API contract). Skipped if Phase 2 already produced a HIGH-confidence hypothesis.
- Phase 4: Form Hypothesis — Specific, testable, falsifiable. Location + mechanism + causal chain + disproof condition + confidence (HIGH/MEDIUM/LOW). 3-strike rule: if 3 hypotheses all stay below MEDIUM → escalate via AskUserQuestion.
- Phase 5: Map Blast Radius — Investigation scope, bug path diagram (skipped if ISOLATED), impact scope (direct/indirect/data/user-facing), similar-risk scan (5-min timebox).
- Phase 6: Recommend Next Steps — CRITICAL/HIGH/MEDIUM actions, test strategy, fix approach (minimal / targeted refactor / architectural).
- Output — Writes a structured Investigation Report to docs/investigate/<slug>-<date>.md. Signals /mf-fix <file> for handoff.
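The 3-strike escalation rule from Phase 4 reduces to a small check. This is a minimal sketch assuming each hypothesis carries a `confidence` of `HIGH`, `MEDIUM`, or `LOW`; the function name and the `PROCEED`/`ESCALATE`/`NEW_HYPOTHESIS` return values are illustrative, not part of the kit.

```javascript
// Hypothetical helper for the 3-strike rule; names and return values are illustrative.
const LEVELS = ["LOW", "MEDIUM", "HIGH"];

function nextStep(hypotheses) {
  const rank = (c) => LEVELS.indexOf(c);
  // At least one hypothesis reached MEDIUM or HIGH: keep working it.
  if (hypotheses.some((h) => rank(h.confidence) >= rank("MEDIUM"))) return "PROCEED";
  // Three strikes, all below MEDIUM: stop guessing and escalate via AskUserQuestion.
  if (hypotheses.length >= 3) return "ESCALATE";
  // Otherwise form another specific, falsifiable hypothesis.
  return "NEW_HYPOTHESIS";
}

console.log(nextStep([{ confidence: "LOW" }, { confidence: "LOW" }, { confidence: "LOW" }])); // ESCALATE
```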
Status values: ROOT_CAUSE_FOUND | PROBABLE_CAUSE | INSUFFICIENT_EVIDENCE | BLOCKED
Iron Law: Follow evidence, never start with a theory. Every claim references file:line or git commit. INSUFFICIENT_EVIDENCE is a valid outcome — don't inflate confidence to ship a report.
Token cost: 8–15k
/mf-fix — Test-First Bug Fix
Usage:
/mf-fix "description of the bug"
How it works:
- Phase 0: Investigate — Parses the bug report, locates relevant code, checks git history, and forms a root cause hypothesis. Then draws a Bug Path Diagram (same [GAP]/[★★ TESTED] format as /mf-build) for the buggy function. If no specific [GAP] path can be identified, the hypothesis isn't specific enough yet.
- Phase 1: Write Failing Test — Regression rule first: if the bug exists because the diff changed existing behavior with no test covering that path, a regression test is a CRITICAL requirement. Creates a test that reproduces the bug and MUST fail with current code.
- Phase 2: Fix — Minimal change only. Blast radius check: if fix touches >5 files, stops and asks before editing.
- Phase 3: Verify — Bug test must pass; full suite must show no new regressions.
- Phase 4: Root Cause Analysis — Documents: Symptom, Root cause, Gap (why wasn't this caught earlier?), Prevention (one of: type constraint, validation, lint rule, spec update). Non-optional for serious bugs.
- Phase 5: Report — Structured debug report with hypothesis, fix, evidence, and regression test reference.
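Two of the /mf-fix guardrails are mechanical: the blast-radius limit in Phase 2 and the red-then-green check across Phases 1 and 3. The sketch below is a hypothetical illustration of those rules; `needsApproval`, `fixVerified`, and the shape of their inputs are assumptions, not the kit's API.

```javascript
// Hypothetical sketch of two /mf-fix guardrails.
const BLAST_RADIUS_LIMIT = 5;

// Phase 2: stop and ask before a fix that touches more than 5 files.
function needsApproval(plannedFiles) {
  return new Set(plannedFiles).size > BLAST_RADIUS_LIMIT;
}

// Phases 1 and 3: the bug test must fail before the fix and pass after it,
// and the full suite may not gain any new failures.
function fixVerified({ bugTestBefore, bugTestAfter, suiteFailuresBefore, suiteFailuresAfter }) {
  const newFailures = suiteFailuresAfter.filter((t) => !suiteFailuresBefore.includes(t));
  return bugTestBefore === "fail" && bugTestAfter === "pass" && newFailures.length === 0;
}
```

A test that already passes before the fix does not reproduce the bug, so `fixVerified` rejects it even if everything is green afterwards.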
Multiple bugs: Triages by severity, fixes one at a time, commits each separately.
/mf-review — Pre-Merge Quality Gate
Usage:
/mf-review # review all changes vs base branch
/mf-review src/auth/ # review specific directory
How it works:
- Phase 0: Understand Intent — Reads commit messages, checks for related spec, expands blast radius. Also flags a diff that rebuilds something already present in the codebase.
- Phase 1: Smart Focus — Auto-detects what to focus on based on the diff (auth → security, SQL → injection, payments → idempotency, etc.). Spends 60% of analysis on the primary focus.
- Phase 2: Review — Security, correctness, API/Backend patterns (unvalidated input, missing rate limiting, missing timeouts, missing CORS, error message leakage), spec-test alignment, code quality (including diagram maintenance: stale ASCII diagrams in comments are flagged), performance, a Failure Mode Grid for each new codepath (3 dimensions: test covers it? error handling exists? user sees a clear error or silent failure? — all 3 missing = Critical gap), and an AI-generated code addendum when reviewing AI-written changes (behavioral regressions, trust boundaries, architecture drift, model cost escalation).
- Phase 3: Report — Structured report. Every finding includes a confidence score (confidence: N/10): 9-10 = verified in code; 7-8 = strong pattern match; 5-6 = possible false positive; <5 = appendix only. Includes a "Not in scope" section listing deferred work with rationale.
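The confidence scale and the Failure Mode Grid are both simple lookups. This sketch is a hypothetical illustration of those two rules; the helper names and the `PARTIAL`/`COVERED` labels are assumptions (the README only names the all-missing case as Critical).

```javascript
// Hypothetical helpers mirroring the /mf-review scoring rules.

// Map a finding's confidence (0-10) to how the report treats it.
function confidenceBucket(score) {
  if (score >= 9) return "verified in code";
  if (score >= 7) return "strong pattern match";
  if (score >= 5) return "possible false positive";
  return "appendix only";
}

// Failure Mode Grid for one new codepath: three safeguards, all missing = Critical gap.
function failureModeGap({ tested, errorHandled, userSeesClearError }) {
  const missing = [tested, errorHandled, userSeesClearError].filter((ok) => !ok).length;
  if (missing === 3) return "CRITICAL";
  return missing > 0 ? "PARTIAL" : "COVERED";
}
```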
Proportional review: A 5-line doc change gets a light review. A 500-line auth rewrite gets file-by-file deep analysis.
Verdicts: APPROVE / REQUEST CHANGES / NEEDS DISCUSSION.
Rules:
- At least 1 positive note — reinforces good patterns, not just problems
- Never auto-fixes code — report only
- Checks spec-test alignment: code changed → spec/acceptance scenarios/tests also changed?
/mf-commit — Smart Git Commit
Usage:
/mf-commit
How it works:
- Analyze — Scans git status, diff stats, and file contents in one pass.
- Scan for secrets — Matches patterns: api_key, token, password, secret, private_key, credential, auth_token. Hard block — stops immediately if found, non-negotiable.
- Scan for debug code — Matches: console.log, debugger, print(), TODO:remove, HACK:, FIXME:temp, binding.pry, var_dump. Soft warn — proceeds if you confirm.
- Stage files — Stages specific files by name. Never uses git add -A.
- Generate message — Conventional format: type(scope): description. Imperative mood ("add" not "added"), no period, WHAT+WHY not HOW.
- Commit — Does NOT push (safe default). Ask Claude explicitly to push.
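The two scans differ only in what happens on a match: secrets block, debug code asks. The following is a simplified, hypothetical sketch that assumes the diff is in unified format and checks only added lines with a case-insensitive substring test. It uses print( where the README lists print() so that calls with arguments also match. A real scanner would be stricter about context.

```javascript
// Hypothetical sketch of the /mf-commit pre-commit scan over a unified diff.
const SECRET_PATTERNS = ["api_key", "token", "password", "secret", "private_key", "credential", "auth_token"];
const DEBUG_PATTERNS = ["console.log", "debugger", "print(", "TODO:remove", "HACK:", "FIXME:temp", "binding.pry", "var_dump"];

function scanDiff(diffText) {
  // Only added lines count; "+++" lines are file headers, not content.
  const added = diffText
    .split("\n")
    .filter((line) => line.startsWith("+") && !line.startsWith("+++"))
    .map((line) => line.slice(1));
  const hits = (patterns) =>
    added.filter((line) => patterns.some((p) => line.toLowerCase().includes(p.toLowerCase())));
  const secrets = hits(SECRET_PATTERNS);
  if (secrets.length > 0) return { action: "BLOCK", lines: secrets };  // hard block
  const debug = hits(DEBUG_PATTERNS);
  if (debug.length > 0) return { action: "CONFIRM", lines: debug };    // soft warn
  return { action: "COMMIT", lines: [] };
}
```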
Large diff warning: If >10 files OR >300 lines changed, suggests splitting into smaller commits for easier review.
Never stages: .env, credentials, build artifacts, generated files, binaries >1MB.
Breaking changes: If the diff removes/renames a public function, export, or API endpoint, uses feat! or fix! type, or adds a BREAKING CHANGE: footer.
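The message rules above can be partly checked by machine. This hypothetical validator checks only structure (header shape, trailing period, breaking-change marker). Imperative mood and WHAT+WHY still need judgment. The `breaking` flag is an assumed input, standing in for the diff analysis that detects removed or renamed public APIs.

```javascript
// Hypothetical validator for the commit message shape /mf-commit aims for.
const HEADER = /^([a-z]+)(\([\w.-]+\))?(!)?: (\S.*)$/;

function checkCommitMessage(message, { breaking = false } = {}) {
  const [header, ...rest] = message.split("\n");
  const m = HEADER.exec(header);
  if (!m) return ["header must look like type(scope): description"];
  const problems = [];
  if (m[4].endsWith(".")) problems.push("no trailing period");
  const hasBreakingFooter = rest.some((line) => line.startsWith("BREAKING CHANGE:"));
  if (breaking && !m[3] && !hasBreakingFooter) {
    problems.push("breaking change needs type! or a BREAKING CHANGE: footer");
  }
  return problems;
}
```

An empty array means the message has the expected shape.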
/mf-voices — Multi-LLM Review (Optional)
Usage:
/mf-voices # review current diff with multi-LLM panel
/mf-voices docs/specs/auth/auth.md # review a spec
/mf-voices src/payment/ # review specific files
When to use: Optional second opinion after /mf-review for high-stakes changes (auth, payment, data pipelines), when /mf-review returns mixed-confidence findings (most at 5–7), or any time you want cross-model verification before merge. Skip for routine refactors and small CRUD.
How it works:
- Detect available LLMs — Checks for OpenAI / Codex CLI / Gemini / Perplexity / Anthropic API / Ollama in priority order. Falls back to a self-spawned Claude sub-agent if no external LLM is available, with the limitation flagged in the report.
- Construct open-ended review prompts — Same material to every voice with a light bias nudge (correctness / security / design). No structured templates, no severity scale forced on reviewers — they think freely; we structure the synthesis.
- Call voices in parallel — 2–3 voices typically; temperature 0.3; graceful degradation if any voice fails.
- Synthesize — Parses free-form responses into findings, classifies severity/category ourselves, identifies CONSENSUS (2+ voices agree → REINFORCED), UNIQUE findings (single voice → flag for verification), and DISAGREEMENTS (voices contradict → present both sides; tiebreaker for HIGH+).
- Output report — Critical/High findings, disagreements, voice breakdown table, agreement rate (100% may indicate shared blind spot), blind spots (categories with 0 findings).
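The consensus step in Synthesize can be sketched as grouping findings by location and category. This is a hypothetical illustration. It assumes the free-form responses have already been parsed into `{ voice, location, category }` records, which is the hard part the skill does itself. It does not attempt to detect disagreements. The agreement-rate formula is an assumed definition: the share of distinct findings raised by 2+ voices.

```javascript
// Hypothetical sketch of the /mf-voices consensus step.
function synthesize(findings) {
  const groups = new Map();
  for (const f of findings) {
    const key = `${f.location}|${f.category}`;
    if (!groups.has(key)) groups.set(key, { location: f.location, category: f.category, voices: new Set() });
    groups.get(key).voices.add(f.voice); // a voice repeating itself still counts once
  }
  const results = [...groups.values()].map((g) => ({
    location: g.location,
    category: g.category,
    voices: [...g.voices],
    status: g.voices.size >= 2 ? "REINFORCED" : "UNIQUE",
  }));
  const reinforced = results.filter((r) => r.status === "REINFORCED").length;
  const agreementRate = results.length === 0 ? 0 : reinforced / results.length;
  return { results, agreementRate };
}
```

An agreement rate of 1.0 is worth a second look, since it may mean the voices share a blind spot.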
Decision points (all use AskUserQuestion): review type ambiguous, voice panel size for large reviews, voice unavailable, critical consensus finding, disagreement resolution, follow-up cost > $0.10, report destination.
Rules: Same material different lenses. Don't resolve disagreements — present both sides, human decides. Consensus ≠ correct (flag if agreement rate is 100%). Findings must be specific (auth.ts:47 not "code could be improved").
Token cost: 10–30k host + external API cost (Budget: ~$0.01–0.05; Standard: ~$0.05–0.20; Premium: ~$0.20–0.50 per review).
/mf-humanize — Rephrase to Human Voice
Usage:
/mf-humanize <paste plan/notes/draft> # infer format + audience from context
/mf-humanize reply jira <notes> # target a specific format
/mf-humanize draft a customer email <notes> # switch audience, hide implementation
When to use: You have a plan, bullet notes, or AI-generated draft and want it rewritten into natural, send-ready text: a PR description, release note, Slack announcement, postmortem, customer reply, LinkedIn post, or plain email. Not part of the spec-first dev cycle. Skip for pure translation, summarization, or generating content from zero.
How it works:
- Infer target format — From explicit instruction → session context → input shape → fallback to tight plain text. No fixed whitelist; uncommon or hybrid formats follow their own conventions.
- Infer audience — Engineering, customer, executive, public, or mixed. Same content, phrasing shifts by reader (technical terms for engineers, outcome-focused for customers).
- Preserve facts — Numbers, names, error codes, file paths, commands, URLs, commitments, and decisions are never paraphrased. Certainty is never softened ("will ship Monday" ≠ "hope to ship Monday").
- Strip AI tone — Removes em-dash overuse, banned buzzwords (EN + VI), hollow openings/closings, fake enthusiasm, and "rule of three" pile-ups. Varies sentence rhythm.
- Return send-ready text — The final version directly, no preamble, no explanation of edits.
Language: Follows the session's dominant language. Mixed Vietnamese-English is fine — technical terms stay untranslated.
Token cost: 2–6k, no external API.
6. Automatic Guards (Hooks)
Hooks run automatically — you don't invoke them. They provide passive protection.
File Guard (file-guard.js)
Trigger: After every Write or Edit operation.
Action: If a modified source code file exceeds 350 lines, injects a warning suggesting modularization. Docs, configs, and templates are intentionally excluded — they are naturally long.
Blocking: No — warns only, does not prevent the edit.
Checked extensions: .ts, .tsx, .js, .jsx, .py, .php, .rb, .rs, .go, .swift, .kt, .java, .cs, .cpp, .c, .dart, .vue, .svelte, .astro, and more.
Not checked: .md, .json, .yaml, .toml, .html, .css, .sh, and other non-source files.
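The file guard's decision combines three inputs: the extension, the line count, and the two environment variables. This hypothetical sketch is not the shipped file-guard.js. Its extension list is abbreviated, its glob support covers only `*`, and matching exclusions against the file's basename is an assumption.

```javascript
// Hypothetical sketch of the file-guard decision, not the shipped file-guard.js.
const path = require("path");

// Abbreviated list; the real hook checks more extensions.
const SOURCE_EXTS = new Set([".ts", ".tsx", ".js", ".jsx", ".py", ".php", ".rb", ".rs", ".go",
  ".swift", ".kt", ".java", ".cs", ".cpp", ".c", ".dart", ".vue", ".svelte", ".astro"]);

// Turn a simple glob such as "*.pb.go" into an anchored regex (only * is supported).
function globToRegex(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

function shouldWarn(filePath, lineCount, env = process.env) {
  const threshold = Number(env.FILE_GUARD_THRESHOLD) || 350;
  const excludes = (env.FILE_GUARD_EXCLUDE || "").split(",").map((s) => s.trim()).filter(Boolean);
  if (!SOURCE_EXTS.has(path.extname(filePath))) return false;           // docs/configs skipped
  if (excludes.some((g) => globToRegex(g).test(path.basename(filePath)))) return false;
  return lineCount > threshold;                                          // warn only, never block
}
```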
Configuration:
# Change the line threshold (default: 350)
export FILE_GUARD_THRESHOLD=500
# Exclude files from checking (comma-separated globs)
export FILE_GUARD_EXCLUDE="*.generated.swift,*.pb.go,*.min.js"
Path Guard (path-guard.sh)
Trigger: Before every Bash command.
Action: Blocks commands that reference large directories (node_modules, build artifacts, etc.).
Blocking: Yes — prevents the command from running.
Default blocked paths:
node_modules, __pycache__, .git/objects, dist/, build/, .next/, vendor/, Pods/, .build/, DerivedData/, .gradle/, target/debug, target/release, .nuget, .cache
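The matching idea behind the path guard is a single alternation regex tested against the command text. The shipped hook is a Bash script. This JavaScript version is only a hypothetical illustration of the same idea. It is a plain substring regex, so it is cruder than a real guard: for example, "rebuild/" would also match "build/".

```javascript
// Hypothetical sketch of the path-guard match (the shipped hook is a Bash script).
const DEFAULT_BLOCKED = ["node_modules", "__pycache__", "\\.git/objects", "dist/", "build/", "\\.next/",
  "vendor/", "Pods/", "\\.build/", "DerivedData/", "\\.gradle/", "target/debug", "target/release",
  "\\.nuget", "\\.cache"];

function isBlocked(command, env = process.env) {
  // PATH_GUARD_EXTRA is already pipe-separated, so it joins the alternation as-is.
  const extra = env.PATH_GUARD_EXTRA ? [env.PATH_GUARD_EXTRA] : [];
  const pattern = new RegExp([...DEFAULT_BLOCKED, ...extra].join("|"));
  return pattern.test(command);
}
```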
Configuration:
# Add project-specific blocked paths (pipe-separated)
export PATH_GUARD_EXTRA="\.terraform|\.vagrant|\.docker"
Glob Guard (glob-guard.js)
Trigger: Before every Glob (file search) operation.
Action: Blocks overly broad glob patterns at project root that would return thousands of files and fill the context window.
Blocking: Yes — prevents the glob and suggests scoped alternatives.
What it blocks:
- **/*.ts at project root (use src/**/*.ts instead)
- **/* at project root (use src/**/* instead)
- * or ** at project root
- Any recursive glob without a specific directory prefix
What it allows:
- src/**/*.ts — scoped to a specific directory
- tests/**/*.test.js — scoped to tests
- **/*.ts when run from inside a scoped directory (e.g., path: "src")
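The block/allow lists above reduce to one rule: a recursive pattern whose first segment is a wildcard is too broad unless the search is already scoped below the root. This hypothetical sketch is not the shipped glob-guard.js. It simplifies "scoped" to "any path other than empty or ." and does not compare against the real project directory.

```javascript
// Hypothetical sketch of the glob-guard rule, not the shipped glob-guard.js.
function isBroadGlob(pattern, searchPath = "") {
  const atRoot = searchPath === "" || searchPath === "." || searchPath === "./";
  if (!atRoot) return false;                          // e.g. path: "src" is already scoped
  if (pattern === "*" || pattern === "**") return true;
  const recursive = pattern.includes("**");
  const firstSegment = pattern.split("/")[0];
  return recursive && firstSegment.includes("*");     // no directory prefix before the **
}
```

Non-recursive root patterns such as `*.md` pass, because they only list files at the top level.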
Comment Guard (comment-guard.js)
Trigger: After every Edit operation.
Action: Detects when real code is replaced with placeholder comments like // ... existing code ... or // rest of implementation. This is a common LLM laziness pattern.
Blocking: Yes — rejects the edit and tells Claude to preserve the original code.
What it catches:
- // ... existing code ..., // ... rest of implementation
- // [previous code remains], // unchanged
- /* ... */ replacing real code
- # ... existing ... (Python placeholders)
- // TODO: implement replacing real code
- Any edit where real code is replaced with a much shorter comment-only block
What it allows:
- Editing comments (old content was already comments)
- Adding comments alongside code (new content has both)
- Normal code replacements
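The catch/allow rules can be approximated by comparing the code lines in `old_string` and `new_string`. This is a hypothetical heuristic, not the shipped comment-guard.js. It treats lines starting with //, # or /* as comments, which misreads C preprocessor lines and the interiors of multi-line block comments. It also assumes "much shorter" means less than half the original length and lets plain deletions through.

```javascript
// Hypothetical sketch of the comment-guard heuristic, not the shipped comment-guard.js.
const PLACEHOLDER = /\.\.\.|\[previous code remains\]|\bunchanged\b|TODO: implement|rest of implementation/i;
const isComment = (line) => /^(\/\/|#|\/\*)/.test(line.trim());
const codeLines = (text) => text.split("\n").filter((l) => l.trim() !== "" && !isComment(l));

function isLazyReplacement(oldString, newString) {
  const newText = newString.trim();
  if (newText === "") return false;                     // plain deletion, not a placeholder
  if (codeLines(oldString).length === 0) return false;  // old content was only comments
  if (codeLines(newString).length > 0) return false;    // new content still contains code
  // Real code replaced by comments only: block placeholders and much shorter blocks.
  return PLACEHOLDER.test(newText) || newText.length < oldString.trim().length / 2;
}
```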
Sensitive Guard (sensitive-guard.sh)
Trigger: Before every Read, Write, Edit, and Bash command.
Action: Protects files containing secrets: .env, private keys, credentials, tokens.
Blocking: Read/Write/Edit → blocks (exit 2). Bash commands → warns only (allows access).
The Bash warn-only behavior enables an approval flow: Claude asks the user for permission and, if approved, can read the file through Bash (for example, cat .env).
Protected files:
- Environment files: .env, .env.local, .env.production, etc. (but NOT .env.example)
- Private keys: *.pem, *.key, *.p12, *.pfx, *.jks
- SSH keys: id_rsa, id_ecdsa, id_ed25519
- Cloud credentials: serviceAccountKey.json, firebase-adminsdk*
- Token files: .npmrc, .pypirc, .netrc
- Any file matching *credential*, *secret*, *private_key*
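The protected-file list maps naturally onto a handful of filename regexes, plus the tool-dependent outcome (block for Read/Write/Edit, warn for Bash). This is a hypothetical sketch, not the shipped sensitive-guard.sh. It tests the basename only and treats SENSITIVE_GUARD_EXTRA as a regex over the full path. It also leaves public .pub keys alone. All three behaviors are assumptions, and .agentignore handling is omitted.

```javascript
// Hypothetical sketch of the sensitive-file check, not the shipped sensitive-guard.sh.
const path = require("path");

const SENSITIVE = [
  /^\.env(\..+)?$/,                  // .env, .env.local, .env.production, ...
  /\.(pem|key|p12|pfx|jks)$/,       // private keys and keystores
  /^id_(rsa|ecdsa|ed25519)$/,       // SSH private keys (this sketch skips .pub files)
  /^serviceAccountKey\.json$/,
  /^firebase-adminsdk/,
  /^\.(npmrc|pypirc|netrc)$/,       // token files
  /credential|secret|private_key/,  // *credential*, *secret*, *private_key*
];
const ALLOWED = [/^\.env\.example$/];

function isSensitive(filePath, extra = process.env.SENSITIVE_GUARD_EXTRA) {
  const name = path.basename(filePath);
  if (ALLOWED.some((re) => re.test(name))) return false;
  if (extra && new RegExp(extra).test(filePath)) return true;
  return SENSITIVE.some((re) => re.test(name));
}

// Read/Write/Edit are blocked (exit 2); Bash access only warns (exit 0).
function decide(tool, filePath, extra = process.env.SENSITIVE_GUARD_EXTRA) {
  if (!isSensitive(filePath, extra)) return "allow";
  return tool === "Bash" ? "warn" : "block";
}
```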
Supports .agentignore: Create a .agentignore file (or .aiignore, .cursorignore) in the project root with gitignore-style patterns to add project-specific protections.
Configuration:
# Add extra patterns (pipe-separated regex)
export SENSITIVE_GUARD_EXTRA="\.vault|.*_token\.json"
Self-Review (self-review.sh)
Trigger: When Claude is about to stop (Stop event).
Action: Injects a self-review checklist reminding Claude to verify quality before finishing.
Blocking: No — just a reminder.
Questions asked:
- Did you leave any TODO/FIXME that should be resolved now?
- Did you create mock/fake implementations just to pass tests?
- Did you replace real code with placeholder comments?
- Do all changed files compile and typecheck cleanly?
- Did you run the full test suite, not just the new tests?
- Are there any files you modified but forgot to include in the summary?
Configuration:
# Disable self-review
export SELF_REVIEW_ENABLED=false
Testing Hooks Manually
You can test hooks by piping mock JSON payloads:
# ── Path Guard ──
# Should exit 2 (blocked)
echo '{"tool_input":{"command":"ls node_modules"}}' | bash .claude/hooks/path-guard.sh
echo $? # expect: 2
# Should exit 0 (allowed)
echo '{"tool_input":{"command":"ls src"}}' | bash .claude/hooks/path-guard.sh
echo $? # expect: 0
# ── File Guard ──
# Use a checked source extension (.txt is ignored) and more lines than the default threshold of 350
seq 1 400 > /tmp/test-large.ts
echo '{"tool_input":{"file_path":"/tmp/test-large.ts"}}' | node .claude/hooks/file-guard.js
# Should output JSON with additionalContext warning
# ── Comment Guard ──
# Should exit 2 (blocked — replacing code with placeholder)
echo '{"tool_input":{"old_string":"function hello() {\n return world;\n}","new_string":"// ... existing code ..."}}' | node .claude/hooks/comment-guard.js
echo $? # expect: 2
# Should exit 0 (allowed — replacing code with code)
echo '{"tool_input":{"old_string":"return a;","new_string":"return b;"}}' | node .claude/hooks/comment-guard.js
echo $? # expect: 0
# ── Sensitive Guard ──
# Should exit 2 (blocked)
echo '{"tool_input":{"file_path":".env"}}' | bash .claude/hooks/sensitive-guard.sh
echo $? # expect: 2
# Should exit 0 (allowed)
echo '{"tool_input":{"file_path":".env.example"}}' | bash .claude/hooks/sensitive-guard.sh
echo $? # expect: 0
# Should exit 0 (warn only — bash commands are allowed for approved access)
echo '{"tool_input":{"command":"cat .env.local"}}' | bash .claude/hooks/sensitive-guard.sh
echo $? # expect: 0 (with warning on stderr)
# ── Glob Guard ──
# Should exit 2 (blocked — broad pattern at root)
echo '{"tool_input":{"pattern":"**/*.ts"}}' | node .claude/hooks/glob-guard.js
echo $? # expect: 2
# Should exit 0 (allowed — scoped pattern)
echo '{"tool_input":{"pattern":"src/**/*.ts"}}' | node .claude/hooks/glob-guard.js
echo $? # expect: 0
7. Spec Format
Spec Template
Create specs at docs/specs/<feature>/<feature>.md:
# Spec: <Feature Name>
**Created:** 2026-04-02
**Last updated:** 2026-04-02
**Status:** Draft | Active | Deprecated
## Overview
What this feature does, why it exists, who uses it. 2-3 sentences.
## Data Model
Entities, attributes, relationships (if applicable).
## Stories
### S-001: <Story name> (P0)
**Description:** [user story]
**Source:** [optional: ticket/issue ref]
**Acceptance Scenarios:**
AS-001: <short description>
- **Given:** [state]
- **When:** [action]
- **Then:** [expected]
- **Data:** [test data]
AS-002: <short description>
- **Given:** [error state]
- **When:** [action]
- **Then:** [error handling]
### S-002: <Story name> (P1)
AS-003: <short description>
- **Given:** [state]
- **When:** [action]
- **Then:** [expected]
### S-003: <Story name> (P2)
AS-004: <short description>
- [flow description + expected behavior]
## Constraints & Invariants
Rules that must always hold.
## Change Log
| Date | Change | Ref |
|------|--------|-----|
| 2026-04-02 | Initial creation | -- |
Skip sections that don't apply. Match depth to feature complexity.
Acceptance Scenario depth by priority:
- P0: Full Given + When + Then + Data + Setup. At least 1 happy path + 1 error path.
- P1: Given + When + Then. At least 1 happy path.
- P2: 1-2 line flow description. At least 1 scenario.
Snapshots (Version History)
When /mf-plan Mode C detects a Major change (new story, removed story, priority change, flow change, behavior change for P0, or constraint change), it automatically creates a snapshot before updating:
docs/specs/<feature>/snapshots/
2026-04-02.md ← full copy at that point in time
2026-04-05-BILL-101.md ← with ticket reference
Snapshots are immutable, managed by mf-plan (not developers), and capped at the 5 most recent.
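Keeping only the 5 most recent snapshots is a sort-and-slice over the file names. This hypothetical sketch only reports which files would be pruned. It assumes every snapshot follows the YYYY-MM-DD.md or YYYY-MM-DD-<REF>.md naming, so the date prefix sorts lexically, and it ignores anything else in the folder. How mf-plan actually prunes is not documented here.

```javascript
// Hypothetical sketch of the snapshot cap for docs/specs/<feature>/snapshots/.
const MAX_SNAPSHOTS = 5;

function snapshotsToPrune(fileNames) {
  const dated = fileNames.filter((n) => /^\d{4}-\d{2}-\d{2}(-.+)?\.md$/.test(n));
  // Newest first by the date prefix; ties on the same date fall back to the full name.
  const newestFirst = [...dated].sort(
    (a, b) => b.slice(0, 10).localeCompare(a.slice(0, 10)) || b.localeCompare(a)
  );
  return newestFirst.slice(MAX_SNAPSHOTS);
}
```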
Naming Conventions
| Item | Convention | Example |
|------|-----------|---------|
| Spec directory | docs/specs/<feature>/ | docs/specs/user-auth/ |
| Spec file | <feature>.md in feature directory | user-auth.md |
| Story ID | S-NNN sequential per spec | S-001, S-005 |
| Scenario ID | AS-NNN sequential across all stories | AS-001, AS-042 |
| Priority | P0 (critical), P1 (important), P2 (nice-to-have) — per story | — |
| Snapshot | YYYY-MM-DD.md or YYYY-MM-DD-<REF>.md in snapshots/ | 2026-04-02.md |
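The ID conventions are easy to lint. This hypothetical check assumes story headings look like "### S-001: ..." and scenario lines start with "AS-001:", as in the template. It treats any gap or repeat as a problem. That may be stricter than the kit intends, for example after a story is removed.

```javascript
// Hypothetical lint for the S-NNN / AS-NNN conventions (AS numbering runs across stories).
function checkSpecIds(markdown) {
  const stories = [...markdown.matchAll(/^###\s+S-(\d{3}):/gm)].map((m) => Number(m[1]));
  const scenarios = [...markdown.matchAll(/^AS-(\d{3}):/gm)].map((m) => Number(m[1]));
  const pad = (n) => String(n).padStart(3, "0");
  const problems = [];
  const check = (ids, prefix) =>
    ids.forEach((id, i) => {
      if (id !== i + 1) problems.push(`${prefix}-${pad(i + 1)} expected, found ${prefix}-${pad(id)}`);
    });
  check(stories, "S");
  check(scenarios, "AS");
  return problems;
}
```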
8. Customization
Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| FILE_GUARD_THRESHOLD | 350 | Max lines before file guard warns |
| FILE_GUARD_EXCLUDE | (empty) | Comma-separated globs to skip (e.g. *.generated.swift) |
| PATH_GUARD_EXTRA | (empty) | Additional pipe-separated patterns to block (e.g. \.terraform) |
| SENSITIVE_GUARD_EXTRA | (empty) | Additional pipe-separated patterns for sensitive files (e.g. \.vault) |
| SELF_REVIEW_ENABLED | true | Set to false to disable the self-review checklist on Stop |
Set these in your shell profile or project .envrc (if using direnv).
Extending CLAUDE.md
Add project-specific rules to .claude/CLAUDE.md:
## Project-Specific Rules
- All API endpoints must have OpenAPI annotations
- Database migrations must be reversible
- UI components must support dark mode
- All strings must be localized via i18n keys
Adding Custom Skills
Create new skills in .claude/skills/<name>/SKILL.md:
# .claude/skills/deploy/SKILL.md
Run the deployment pipeline:
1. /mf-review
2. /mf-commit
3. Run: bash scripts/deploy.sh $ARGUMENTS
4. Verify deployment health: curl -f https://api.example.com/health
Then use: /deploy staging
9. Token Cost Guide
| Activity | Tokens | Frequency |
|----------|--------|-----------|
| /mf-scaffold (greenfield bootstrap) | 15–40k + install/build time | Once per new project, before the first spec |
| /mf-build (incremental, 1-3 files) | 5–10k | Every code chunk |
| /mf-investigate (complex bug) | 8–15k | OPTIONAL before /mf-fix — complex/outage only |
| /mf-fix (single bug) | 3–5k | As needed |
| /mf-commit | 2–4k | Every commit |
| /mf-review (diff-based) | 10–20k | Before merge |
| /mf-plan (new feature) | 20–40k | Start of feature |
| /mf-challenge (adversarial review) | 15–30k | After /mf-plan, complex features |
| /mf-spec-render (HTML view) | 3–8k | User-invoked after /mf-plan when HTML view wanted, or to refresh stale .html |
| /mf-md-render (HTML view, any md) | 3–8k | User-invoked for non-spec markdown — investigation, explore, RFC, retro, README |
| /mf-voices (multi-LLM review) | 10–30k + external API cost (~$0.01–0.50) | Optional — after /mf-review for high-stakes changes |
| Full audit (manual prompt) | 100k+ | Before release |
Minimizing Token Usage
- Test incrementally. /mf-build after each small chunk uses 5-10k. Waiting until everything is done and then running /mf-build on a large diff uses 50k+.
- Use filters. /mf-build src/auth/login.ts is cheaper than /mf-build on the whole project.
- Skip /mf-plan for tiny changes. Under 5 lines with no behavior change? Just /mf-build and /mf-commit.
- Use /mf-review only before merge, not after every commit.
10. Troubleshooting
Hook not firing
Symptom: File guard or path guard doesn't trigger.
Check:
- Is settings.json valid? node -e "JSON.parse(require('fs').readFileSync('.claude/settings.json','utf-8'))"
- Are hooks executable? ls -la .claude/hooks/
- Is Node.js available? node --version
- Is $CLAUDE_PROJECT_DIR set? Check in Claude Code with: echo $CLAUDE_PROJECT_DIR
Tests not detected
Symptom: /mf-build or /mf-fix can't figure out how to run the tests.
Check:
- Are you in the project root? pwd
- Does the project marker file exist? (e.g., package.json, Cargo.toml, pyproject.toml)
- If your test command is non-standard, set it explicitly in .claude/CLAUDE.md under Testing so the skills use it.
Wrong base branch
Symptom: /mf-build or /mf-review compares against wrong branch.
Check:
git symbolic-ref refs/remotes/origin/HEAD
If this is wrong or missing:
git remote set-head origin <your-main-branch>
Path guard blocking a legitimate command
Symptom: Claude can't run a command you need.
Fix: The path guard blocks any command that references a heavy directory such as build/. If you need to access build/ for a specific reason, run the command directly in your terminal (not through Claude Code).
File guard warning on generated files
Fix: Set the exclude pattern:
export FILE_GUARD_EXCLUDE="*.generated.swift,*.pb.go,*.min.js,*.snap"
11. FAQ
Q: Do I need specs for every tiny change?
A: No. Changes under 5 lines with no behavior change can skip the spec. Just /mf-build and /mf-commit. The spec-first rule is for meaningful behavior changes.
Q: Can I use mocks in tests?
A: Only for external services you can't run locally (third-party APIs, email services). Never mock your own code or database just to make tests pass faster.
Q: What if Claude writes a test that tests the wrong thing?
A: This usually means the spec is ambiguous. Clarify the spec first, then re-run /mf-build. Good specs produce good tests.
Q: Can I use this with other AI coding tools?
A: The commands and hooks are Claude Code-specific. The specs and workflow work with any tool or manual workflow.
Q: When should I use /mf-challenge?
A: After /mf-plan, for complex features involving authentication, payments, data pipelines, or multi-service integration. It spawns parallel hostile reviewers that find security holes, failure modes, and false assumptions BEFORE you write code. Skip it for simple CRUD or small features — the overhead isn't worth it.
Q: How do I do a full coverage audit?
A: This is intentionally not a command (it's expensive and rare). When needed, prompt Claude directly: "Audit test coverage for feature X against docs/specs/X/X.md acceptance scenarios. Identify gaps and write missing tests."
Q: What if my project uses multiple languages?
A: The skills auto-detect the test command from the first project marker they find. For monorepos, run /mf-build from each sub-project directory, or pin the test command per project in .claude/CLAUDE.md under Testing.
Q: Can I add more skills?
A: Yes. Create .claude/skills/<name>/SKILL.md and it becomes available as a slash command. See Customization.
Q: How do I update the kit in existing projects?
A: Run npx claude-devkit-cli upgrade. It automatically detects which files you've customized and only updates unchanged files. Use --force to overwrite everything.
Q: What's the HTML view next to my spec, and how do I generate it?
A: It's a scannable view of the spec, with a sidebar TOC, story cards, collapsible acceptance scenarios, and a dark/light theme. Reading a 1000-line spec markdown in an editor is painful; the HTML is what a tired human can actually skim. Generate or refresh it by running /mf-spec-render <feature>. /mf-plan does not create it automatically; it only suggests the command at the end. The .md remains the source of truth (AI and /mf-build read it, and git diffs work normally). The .html is a regenerable artifact: never edit it by hand, and let /mf-spec-render rebuild it. You can email or Slack the HTML to PMs and stakeholders who don't want to clone the repo.
Q: I installed with the old setup.sh — how do I migrate?
A: Run npx claude-devkit-cli init --adopt . to generate a manifest from your existing files without overwriting anything. Future upgrades will then work normally.
