claude-devkit-cli
v1.5.8
CLI toolkit for spec-first development with Claude Code — hooks, commands, guards, and test runners
A lightweight, spec-first development toolkit for Claude Code. It enforces the cycle spec (with acceptance scenarios) → code + tests → build pass through custom commands, automatic hooks, and a universal test runner.
Works with: Swift, TypeScript/JavaScript, Python, Rust, Go, Java/Kotlin, C#, Ruby. Dependencies: none beyond Claude Code CLI, Node.js, Git, and Bash.
Table of Contents
- Philosophy
- Quick Start
- Setup
- Daily Workflows
- Commands Reference
- Automatic Guards (Hooks)
- Build Test Script
- Spec Format
- Customization
- Token Cost Guide
- Troubleshooting
- FAQ
1. Philosophy
The Core Cycle
SPEC (with acceptance scenarios) → CODE + TESTS → BUILD PASS
Every code change — feature, fix, or removal — follows this cycle. The spec is the source of truth. Acceptance scenarios (Given/When/Then) are embedded directly in the spec — no separate test plan file. If code contradicts the spec, the code is wrong.
Why Spec-First?
- Prevents drift. Acceptance scenarios live inside the spec — no separate test plan to fall out of sync.
- Tests have purpose. Scenarios derived from specs test behavior, not implementation details. This means tests survive refactoring.
- AI writes better code. When Claude Code has a spec with concrete Given/When/Then scenarios, it generates more accurate implementations and more meaningful tests.
- Reviews are grounded. Reviewers can check code against the spec rather than guessing at intent.
Principles
- Specs are source of truth — Code changes require spec updates first.
- Incremental, not big-bang — Test after each code chunk, not after everything is done.
- Tests travel with code — Every PR includes production code + tests + spec updates.
- Build pass is the gate — Nothing merges with failing tests.
- Everything in the repo — Specs, plans, tests, and code are version-controlled and reviewable.
2. Quick Start
Time needed: 5 minutes.
# 1. Install dev-kit into your project
npx claude-devkit-cli init .
# 2. Open your project in Claude Code
claude
# 3. Create your first spec
/mf-plan "describe your feature here"
# 4. Write code, then test
/mf-build
# 5. Review before merging
/mf-review
# 6. Commit
/mf-commit
That's it. The CLI auto-detects your project type and configures everything.
3. Setup
Prerequisites
| Tool | Required | Why |
|------|----------|-----|
| Claude Code CLI | Yes | Runs the commands and hooks |
| Git | Yes | Change detection, commit workflow |
| Node.js (18+) | Yes | File guard hook, JSON parsing |
| Bash (4+) | Yes | Path guard hook, build-test script |
| Language toolchain | Yes | Whatever your project uses (Swift, npm, pytest, etc.) |
Installation
Option A: One-command install (recommended)
npx claude-devkit-cli init .
Option B: Global install
npm install -g claude-devkit-cli
# Then, in any project:
cd my-project
claude-devkit init .
Option C: Global skills install (available in all projects without running init again)
claude-devkit init --global
# or after per-project init, answer "yes" to the global prompt
Skills installed globally at ~/.claude/skills/ are available in every project. Per-project .claude/skills/ always takes precedence over global — so projects can still override individual skills.
Option D: Force re-install (overwrites existing files)
npx claude-devkit-cli init --force .
Option E: Selective install (only specific components)
npx claude-devkit-cli init --only hooks,skills .
What Gets Installed
your-project/
├── .claude/
│ ├── CLAUDE.md ← Project rules hub
│ ├── settings.json ← Hook wiring
│ ├── hooks/
│ │ ├── file-guard.js ← Warns on large files
│ │ ├── path-guard.sh ← Blocks wasteful Bash paths
│ │ ├── glob-guard.js ← Blocks broad glob patterns
│ │ ├── comment-guard.js ← Blocks placeholder comments
│ │ ├── sensitive-guard.sh ← Blocks access to secrets
│ │ └── self-review.sh ← Quality checklist on stop
│ └── skills/
│ ├── mf-explore/SKILL.md ← /mf-explore skill
│ ├── mf-plan/SKILL.md ← /mf-plan skill
│ ├── mf-challenge/SKILL.md ← /mf-challenge skill
│ ├── mf-build/SKILL.md ← /mf-build skill
│ ├── mf-fix/SKILL.md ← /mf-fix skill
│ ├── mf-review/SKILL.md ← /mf-review skill
│ └── mf-commit/SKILL.md ← /mf-commit skill
├── scripts/
│ └── build-test.sh ← Universal test runner
└── docs/
├── specs/ ← Your specs (folder-per-feature)
│ └── <feature>/
│ ├── <feature>.md ← Spec with acceptance scenarios
│ └── snapshots/ ← Version history (managed by /mf-plan)
└── WORKFLOW.md ← Process reference
Post-Install Configuration
The CLI auto-detects your project type and fills in CLAUDE.md. Verify it's correct:
cat .claude/CLAUDE.md
Look for the Project Info section. Ensure language, test framework, and directories are correct. Edit manually if needed.
Upgrade
npx claude-devkit-cli upgrade
Smart upgrade — updates kit files but preserves any you've customized. Use --force to overwrite everything.
# Check if update is available
npx claude-devkit-cli check
# See what changed
npx claude-devkit-cli diff
# View installed files and status
npx claude-devkit-cli list
Uninstall
npx claude-devkit-cli remove
This removes hooks, skills, settings, and build-test.sh. It preserves CLAUDE.md (which you may have customized) and docs/ (which contains your specs).
4. Daily Workflows
Explore Before Planning
When: Requirements are unclear, you're debating between approaches, or it's a brownfield feature with existing code to understand first.
1. /mf-explore "feature description"
→ Asks questions as a Client Technical Lead — one topic at a time.
→ Clarifies: why, behavior, boundaries, business rules, edge cases, permissions, UI.
→ Output: docs/explore/<feature>.md
2. /mf-plan "feature description"
→ Auto-detects docs/explore/<feature>.md, skips redundant discovery.
→ Continue with the normal New Feature flow.
Example:
/mf-explore "cancel order request"
New Feature
When: Building something new — no existing code or spec.
1. /mf-plan "description of the feature"
→ Generates spec with acceptance scenarios at docs/specs/<feature>/<feature>.md.
2. Implement code in chunks.
After each chunk: /mf-build
Repeat until green.
3. /mf-review (before merge)
4. /mf-commit
Example:
/mf-plan "User authentication with email/password login, password reset via email, and session management with 24h expiry"
Update Existing Feature
When: Changing behavior of something that already exists.
1. /mf-plan docs/specs/<feature>/<feature>.md "description of changes"
→ Mode C handles everything: snapshot → classification → change report → apply.
Do NOT manually edit the spec before running /mf-plan.
2. Implement the code change.
/mf-build
Fix until green.
3. /mf-review → /mf-commit
Bug Fix
When: Something is broken.
1. /mf-fix "description of the bug"
→ Writes failing test → fixes code → runs full suite.
2. /mf-commit
Example:
/mf-fix "Search returns no results when query contains apostrophes like O'Brien"
Remove Feature
When: Deleting code, removing deprecated functionality.
1. /mf-plan docs/specs/<feature>/<feature>.md "remove stories S-XXX"
→ Mode C creates a snapshot (removing stories = Major), then marks as removed.
2. Delete production code + related tests.
3. bash scripts/build-test.sh (run full suite)
Fix cascading breaks.
4. /mf-commit
5. Commands Reference
/mf-explore — Feature Discovery as Client Technical Lead
Usage:
/mf-explore "cancel order request"
/mf-explore "user notification preferences"
When to use: Requirements are unclear, you're debating between approaches, or you want to clarify a feature deeply before committing to a spec. Runs before /mf-plan.
How it works:
- Phase 0: Codebase scan — Silently checks for existing code, related specs, and existing explore docs before asking anything.
- Phase 1: Why, not what — Asks what problem requires this feature, who faces it, and how they handle it today. Prevents building the wrong thing.
- Phase 2: Desired behavior — Walks through the flow step by step, identifies trigger and final result, checks for multi-role approval chains.
- Phase 2.5: UI/UX expectation — Clarifies interface type (table, form, wizard, dashboard). Offers sensible defaults when the client is unsure. Suggests simpler approaches when expectations are complex.
- Phase 3: Boundaries — Impact on existing screens, data changes, migration needs, out of scope, permissions.
- Phase 3.5: Scope optimization — Identifies what can ship fast vs what can defer to phase 2.
- Phase 4: Business rules & validation — Conditions, formulas (with real numbers), input validation, notifications, time constraints, concurrency.
- Phase 5: Edge cases — Empty states, error messages, double submit, network loss, limits, sensitive data, domain-specific cases (payment double-charge, booking overbooking, etc.).
- Phase 6: Scenario confirmation — Presents concrete happy path + unhappy paths with fake data. Confirms with user before proceeding.
- Phase 7: Handoff summary — Compiles everything into a structured doc, confirms with user, writes to docs/explore/<feature>.md.
Output: docs/explore/<feature>.md — auto-detected by /mf-plan, which skips redundant discovery and maps explore findings directly to spec sections.
Token cost: 10–20k
/mf-plan — Generate Spec with Acceptance Scenarios
Usage:
/mf-plan "user authentication with OAuth2" # Mode A: new spec from description
/mf-plan docs/specs/auth/auth.md # Mode B: add scenarios to existing spec
/mf-plan docs/specs/auth/auth.md "add password reset flow" # Mode C: update existing spec
Modes:
- Mode A — Creates a new spec with stories and acceptance scenarios from your description.
- Mode B — Reads an existing spec that has no acceptance scenarios yet, adds them.
- Mode C — Updates an existing spec: creates a snapshot before Major changes, shows a change report, waits for confirmation, then applies.
How it works:
- Phase 0: Codebase Awareness — Scans existing code, docs/specs/, and project patterns before planning. Prevents specs that conflict with existing implementations.
- Phase 1: Scope & Split + Scope Challenge — Evaluates feature size (>7 stories or >20 AS → must split). When a feature is large, applies Sizing & Phasing: Phase 1 (minimum viable — smallest slice with value), Phase 2 (core experience — happy path), Phase 3 (edge cases, polish), Phase 4 (optimization, monitoring) — each phase mergeable independently. Also runs a Scope Challenge before drafting: checks for existing code that already solves sub-problems (reuse vs rebuild), flags complexity smells (8+ files or 2+ new classes/services), searches for framework built-ins, checks for distribution needs (new artifact → CI/CD in scope?), and applies the Completeness Principle (complete version costs only CC: ≤15m more → recommend it directly).
- Phase 2: Draft Spec — Generates a structured spec with stories and acceptance scenarios (Given/When/Then). Depth scales by priority: P0 gets full GWT + test data, P1 gets GWT, P2 gets 1-2 line descriptions. Runs consistency checks (CC1-CC6) before showing the draft.
- Phase 3: Clarify Ambiguities — Systematically finds gaps across behavioral, data, auth, non-functional, integration, and concurrency dimensions. Questions include (human: ~X / CC: ~Y) effort scales and Completeness: X/10 scores for each option.
- Phase 4: Summary — Shows story counts, AS counts, implementation order, next steps. Every spec also gets a "What Already Exists" section (existing code that partially solves the problem) and a "Not in Scope" section (deferred work with rationale — prevents work from silently dropping).
Mode C (Update) adds:
- Classification — Walks through M1-M6 checklist to determine Major vs Minor change.
- Snapshot — Major changes trigger an automatic snapshot (cp, bit-perfect) before editing.
- Change report — Shows what will change, waits for user confirmation.
- Consistency check — Runs CC1-CC6 after every update.
Traceability IDs:
- S-NNN — Stories (with priority P0/P1/P2)
- AS-NNN — Acceptance Scenarios (Given/When/Then, embedded in stories)
- FR-NNN — Functional Requirements (if needed)
- SC-NNN — Success Criteria (if needed)
- IDs are immutable — deleted IDs are never reused.
Directory structure:
docs/specs/<feature>/
<feature>.md # single source of truth — always read this file
snapshots/ # version history (managed by mf-plan, not developers)
YYYY-MM-DD.md
YYYY-MM-DD-<REF>.md
Output:
- Spec with acceptance scenarios: docs/specs/<feature>/<feature>.md
/mf-challenge — Adversarial Plan Review
Usage:
/mf-challenge docs/specs/auth/auth.md # challenge a spec
/mf-challenge "user authentication" # challenge by feature name
How it works (7 phases):
Read & Map — Reads the spec (including acceptance scenarios) and maps: decisions made, assumptions (stated AND implied), dependencies, scope boundaries, risk acknowledgments, story-AS consistency.
Scale Reviewers — Assesses complexity and selects reviewers:
| Complexity | Signals | Reviewers |
|------------|---------|-----------|
| Simple | 1 spec section, <20 acceptance scenarios, no auth/data | 2 |
| Standard | Multiple sections, auth or data involved | 3 |
| Complex | Multiple integrations, concurrency, migrations, 6+ phases | 4 |
Spawn Reviewers — Launches parallel subagents, each with an adversarial lens:
Security Adversary
- OWASP Top 10
- Injection vectors
- Auth/authz bypass
- Crypto issues
- Data exposure
- Supply chain risks
Failure Mode Analyst — "Everything that can go wrong, will — simultaneously, at 3 AM, during peak traffic"
- Partial failures
- Concurrency & race conditions
- Cascading failures
- Recovery paths
- Idempotency
- Observability gaps
Assumption Destroyer — "'It should work' is not evidence"
- Unverified claims
- Scale assumptions
- Environment differences
- Integration contracts
- Data shape assumptions
- Timing dependencies
- Hidden dependencies
Scope & YAGNI Critic — "The best code is no code. The best feature is the one you didn't build"
- Over-engineering
- Premature abstraction
- Missing MVP cuts
- Gold plating
- Simpler alternatives
Deduplicate & Rate — Collects all findings, removes duplicates, rates severity using a Likelihood x Impact matrix. Caps at 15 findings: keeps all Critical, top High by specificity, notes how many Medium were dropped. Each reviewer is limited to top 7 findings.
Adjudicate — Evaluates each finding: Accept (valid flaw, plan should change) or Reject (false positive, acceptable risk, already handled). 1-sentence rationale for each.
User Choice — Two modes: "Apply all accepted" (fast) or "Review each" (walk through one by one).
Apply — Surgical edits only to accepted findings. Doesn't rewrite surrounding sections.
Finding format: Each finding includes Title, Severity, Confidence score (9-10 = verified; 7-8 = strong match; 5-6 = note caveat; ≤4 = omit unless Critical), Location, Flaw description, Evidence (direct quote from the plan), step-by-step Failure scenario, and Suggested fix.
6 non-negotiable rules:
- Spawn reviewers in parallel (not sequential)
- Reviewers read files directly, not summarized content
- Be hostile — no praise, no softening
- Every finding must quote the plan directly as evidence
- Quality over quantity — 3 honest findings > 15 padded ones
- Skip style/formatting — substance only
When to use:
- After /mf-plan, before coding — for complex features
- Features involving auth, payments, data pipelines, multi-service integration
- NOT needed for simple CRUD, small bug fixes, or trivial features
Token cost: 15-30k (uses parallel subagents, doesn't bloat main context)
/mf-build — TDD Delivery Loop
Usage:
/mf-build # build all changes vs base branch
/mf-build src/api/users.ts # build specific file
/mf-build "user authentication" # build specific featureHow it works:
- Phase 0: Build Context — Finds changed files vs base branch, reads the spec (acceptance scenarios in the ## Stories section are the roadmap), checks docs/specs/<feature>/.build-progress to resume from a previous interrupted session, and reads existing tests for patterns, fixtures, and naming conventions. Doesn't duplicate what already exists.
- Phase 1: Decide What to Test — Determines test scope from acceptance scenarios. Applies the Completeness Principle: AI writes tests ~50x faster than humans, so if full coverage costs CC: ≤15m, it writes complete tests without asking. Always checks 8 mandatory edge case categories: null/undefined, empty arrays/strings, invalid types, boundary values (min/max), error paths (network failures, DB errors), race conditions, large data (10k+ items), and special characters (Unicode, SQL chars).
- Phase 1.5: Coverage Map — Before writing a single test, traces every code path (if/else, switch, guard, try/catch) AND user flows (double-click, stale session, navigate away mid-op). Draws an ASCII diagram marking each path as [★★★ TESTED], [★★ TESTED], [★ TESTED], or [GAP]. Gaps marked [GAP] [→E2E] need E2E tests; [GAP] [→EVAL] need evals — when flagged, defines capability + regression evals before implementing and reports pass@1/pass@3. Regression rule: if the diff changes existing behavior with no covering test, a regression test is a CRITICAL requirement — no asking, no skipping.
- Phase 2: Write Tests — Writes tests for every [GAP] identified in the Coverage Map. Before moving to Phase 3, verifies: all public functions have unit tests, all API endpoints have integration tests, edge cases covered, error paths tested, tests independent, assertions specific.
- Phase 3: Build and Run — Compiles/typechecks first, then runs tests.
- Phase 4: Fix Loop — If tests fail, fixes test code only (max 3 attempts, then hard stop and report). If tests expect X but code does Y, asks whether to fix production code or adjust the test — with effort scales (human: ~X / CC: ~Y).
- Phase 5: Report — Summary with test counts, results, coverage, files touched, and any E2E/eval gaps to follow up on.
Rules:
- Never changes production code without asking first
- Never deletes or weakens existing tests
- Never adds skip/xit/@disabled to hide failures
- Max 3 fix attempts — then stops and reports the issue
What NOT to test: Private/internal methods, framework behavior, trivial getters/setters, implementation details.
/mf-fix — Test-First Bug Fix
Usage:
/mf-fix "description of the bug"
How it works:
- Phase 0: Investigate — Parses the bug report, locates relevant code, checks git history, and forms a root cause hypothesis. Then draws a Bug Path Diagram (same [GAP]/[★★ TESTED] format as /mf-build) for the buggy function — if no specific [GAP] path can be identified, the hypothesis isn't specific enough yet.
- Phase 1: Write Failing Test — Regression rule first: if the bug exists because the diff changed existing behavior with no test covering that path, a regression test is a CRITICAL requirement. Creates a test that reproduces the bug and MUST fail with current code.
- Phase 2: Fix — Minimal change only. Blast radius check: if fix touches >5 files, stops and asks before editing.
- Phase 3: Verify — Bug test must pass; full suite must show no new regressions.
- Phase 4: Root Cause Analysis — Documents: Symptom, Root cause, Gap (why wasn't this caught earlier?), Prevention (one of: type constraint, validation, lint rule, spec update). Non-optional for serious bugs.
- Phase 5: Report — Structured debug report with hypothesis, fix, evidence, and regression test reference.
Multiple bugs: Triages by severity, fixes one at a time, commits each separately.
/mf-review — Pre-Merge Quality Gate
Usage:
/mf-review # review all changes vs base branch
/mf-review src/auth/ # review specific directory
How it works:
- Phase 0: Understand Intent — Reads commit messages, checks for related spec, expands blast radius. Also notes what already exists: flags if the diff rebuilds something that already exists in the codebase.
- Phase 1: Smart Focus — Auto-detects what to focus on based on the diff (auth → security, SQL → injection, payments → idempotency, etc.). Spends 60% of analysis on the primary focus.
- Phase 2: Review — Security, correctness, API/Backend patterns (unvalidated input, missing rate limiting, missing timeouts, missing CORS, error message leakage), spec-test alignment, code quality (including diagram maintenance: stale ASCII diagrams in comments are flagged), performance, a Failure Mode Grid for each new codepath (3 dimensions: test covers it? error handling exists? user sees a clear error or silent failure? — all 3 missing = Critical gap), and an AI-generated code addendum when reviewing AI-written changes (behavioral regressions, trust boundaries, architecture drift, model cost escalation).
- Phase 3: Report — Structured report. Every finding includes a confidence score (confidence: N/10): 9-10 = verified in code; 7-8 = strong pattern match; 5-6 = possible false positive; <5 = appendix only. Includes a "Not in scope" section listing deferred work with rationale.
Proportional review: A 5-line doc change gets a light review. A 500-line auth rewrite gets file-by-file deep analysis.
Verdicts: APPROVE / REQUEST CHANGES / NEEDS DISCUSSION.
Rules:
- At least 1 positive note — reinforces good patterns, not just problems
- Never auto-fixes code — report only
- Checks spec-test alignment: code changed → spec/acceptance scenarios/tests also changed?
/mf-commit — Smart Git Commit
Usage:
/mf-commit
How it works:
- Analyze — Scans git status, diff stats, and file contents in one pass.
- Scan for secrets — Matches patterns: api_key, token, password, secret, private_key, credential, auth_token. Hard block — stops immediately if found, non-negotiable.
- Scan for debug code — Matches: console.log, debugger, print(), TODO:remove, HACK:, FIXME:temp, binding.pry, var_dump. Soft warn — proceeds if you confirm.
- Stage files — Stages specific files by name. Never uses git add -A.
- Generate message — Conventional format: type(scope): description. Imperative tense ("add" not "added"), no period, WHAT+WHY not HOW.
- Commit — Does NOT push (safe default). Ask Claude explicitly to push.
Large diff warning: If >10 files OR >300 lines changed, suggests splitting into smaller commits for easier review.
Never stages: .env, credentials, build artifacts, generated files, binaries >1MB.
Breaking changes: If the diff removes/renames a public function, export, or API endpoint, uses feat! or fix! type, or adds a BREAKING CHANGE: footer.
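The message format above can be illustrated with a small shell sketch. This is not part of the kit — `check_msg` and its regex are a simplified, hypothetical approximation of the type(scope): description convention:

```shell
# Illustrative only (not part of the kit): a rough check that a commit
# message matches the type(scope): description shape.
check_msg() {
  if echo "$1" | grep -Eq '^(feat|fix|docs|refactor|test|chore)(\([a-z0-9-]+\))?!?: [a-z]'; then
    echo "conventional"
  else
    echo "non-conventional"
  fi
}

check_msg "feat(auth): add password reset flow"   # conventional
check_msg "fix!: drop legacy endpoint"            # conventional (breaking)
check_msg "Added some stuff."                     # non-conventional
```

Note the optional `!` before the colon, which marks a breaking change as described above.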
6. Automatic Guards (Hooks)
Hooks run automatically — you don't invoke them. They provide passive protection.
File Guard (file-guard.js)
Trigger: After every Write or Edit operation. Action: If a modified source code file exceeds 350 lines, injects a warning suggesting modularization. Docs, configs, and templates are intentionally excluded — they are naturally long. Blocking: No — warns only, does not prevent the edit.
Checked extensions: .ts, .tsx, .js, .jsx, .py, .php, .rb, .rs, .go, .swift, .kt, .java, .cs, .cpp, .c, .dart, .vue, .svelte, .astro, and more.
Not checked: .md, .json, .yaml, .toml, .html, .css, .sh, and other non-source files.
Configuration:
# Change the line threshold (default: 350)
export FILE_GUARD_THRESHOLD=500
# Exclude files from checking (comma-separated globs)
export FILE_GUARD_EXCLUDE="*.generated.swift,*.pb.go,*.min.js"
Path Guard (path-guard.sh)
Trigger: Before every Bash command. Action: Blocks commands that reference large directories (node_modules, build artifacts, etc.). Blocking: Yes — prevents the command from running.
Default blocked paths:
node_modules, __pycache__, .git/objects, dist/, build/, .next/, vendor/, Pods/, .build/, DerivedData/, .gradle/, target/debug, target/release, .nuget, .cache
Configuration:
# Add project-specific blocked paths (pipe-separated)
export PATH_GUARD_EXTRA="\.terraform|\.vagrant|\.docker"
Glob Guard (glob-guard.js)
Trigger: Before every Glob (file search) operation. Action: Blocks overly broad glob patterns at project root that would return thousands of files and fill the context window. Blocking: Yes — prevents the glob and suggests scoped alternatives.
What it blocks:
- `**/*.ts` at project root (use `src/**/*.ts` instead)
- `**/*` at project root (use `src/**/*` instead)
- `*` or `**` at project root
- Any recursive glob without a specific directory prefix
What it allows:
- `src/**/*.ts` — scoped to a specific directory
- `tests/**/*.test.js` — scoped to tests
- `**/*.ts` when run from inside a scoped directory (e.g., path: "src")
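The root-pattern rule can be sketched in a few lines of shell. This is a simplified stand-in, not the actual hook — the real check lives in .claude/hooks/glob-guard.js and covers more cases:

```shell
# Simplified stand-in for the glob-guard root check; the real logic lives
# in .claude/hooks/glob-guard.js.
check_glob() {
  case "$1" in
    \*\*/*|\*\*|\*) echo "blocked" ;;   # recursive or bare wildcard at project root
    *)              echo "allowed" ;;   # pattern has a directory prefix
  esac
}

check_glob '**/*.ts'      # blocked: too broad at root
check_glob 'src/**/*.ts'  # allowed: scoped to src/
```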
Comment Guard (comment-guard.js)
Trigger: After every Edit operation.
Action: Detects when real code is replaced with placeholder comments like // ... existing code ... or // rest of implementation. This is a common LLM laziness pattern.
Blocking: Yes — rejects the edit and tells Claude to preserve the original code.
What it catches:
- `// ... existing code ...`, `// ... rest of implementation`
- `// [previous code remains]`, `// unchanged`
- `/* ... */` replacing real code
- `# ... existing ...` (Python placeholders)
- `// TODO: implement` replacing real code
- Any edit where real code is replaced with a much shorter comment-only block
What it allows:
- Editing comments (old content was already comments)
- Adding comments alongside code (new content has both)
- Normal code replacements
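The core of the detection can be approximated with a one-line regex. Again a simplified stand-in — the real hook is comment-guard.js and also compares the old and new content:

```shell
# Simplified stand-in for the placeholder check; the real logic lives in
# .claude/hooks/comment-guard.js.
is_placeholder() {
  if echo "$1" | grep -Eq '^[[:space:]]*(//|#|/\*)[[:space:]]*\.\.\.'; then
    echo "placeholder"
  else
    echo "ok"
  fi
}

is_placeholder '// ... existing code ...'   # placeholder
is_placeholder 'return total + tax;'        # ok
```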
Sensitive Guard (sensitive-guard.sh)
Trigger: Before every Read, Write, Edit, and Bash command.
Action: Protects files containing secrets: .env, private keys, credentials, tokens.
Blocking: Read/Write/Edit → blocks (exit 2). Bash commands → warns only (allows access).
The Bash warn-only behavior enables an approval flow: Claude asks the user for permission, and if approved, can use bash cat .env to read the file.
Protected files:
- .env, .env.local, .env.production, etc. (but NOT .env.example)
- Private keys: `*.pem`, `*.key`, `*.p12`, `*.pfx`, `*.jks`
- SSH keys: id_rsa, id_ecdsa, id_ed25519
- Cloud credentials: serviceAccountKey.json, `firebase-adminsdk*`
- Token files: .npmrc, .pypirc, .netrc
- Any file matching `*credential*`, `*secret*`, `*private_key*`
Supports .agentignore: Create a .agentignore file (or .aiignore, .cursorignore) in the project root with gitignore-style patterns to add project-specific protections.
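For example, a project-specific .agentignore might look like this (the patterns below are illustrative, not prescribed by the kit):

```shell
# Example .agentignore -- patterns are illustrative, not prescribed by the kit.
cat > .agentignore <<'EOF'
# gitignore-style patterns; matching files get the same protection as .env
secrets/
*.vault
deploy/prod-credentials.yml
EOF
```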
Configuration:
# Add extra patterns (pipe-separated regex)
export SENSITIVE_GUARD_EXTRA="\.vault|.*_token\.json"
Self-Review (self-review.sh)
Trigger: When Claude is about to stop (Stop event). Action: Injects a self-review checklist reminding Claude to verify quality before finishing. Blocking: No — just a reminder.
Questions asked:
- Did you leave any TODO/FIXME that should be resolved now?
- Did you create mock/fake implementations just to pass tests?
- Did you replace real code with placeholder comments?
- Do all changed files compile and typecheck cleanly?
- Did you run the full test suite, not just the new tests?
- Are there any files you modified but forgot to include in the summary?
Configuration:
# Disable self-review
export SELF_REVIEW_ENABLED=false
Testing Hooks Manually
You can test hooks by piping mock JSON payloads:
# ── Path Guard ──
# Should exit 2 (blocked)
echo '{"tool_input":{"command":"ls node_modules"}}' | bash .claude/hooks/path-guard.sh
echo $? # expect: 2
# Should exit 0 (allowed)
echo '{"tool_input":{"command":"ls src"}}' | bash .claude/hooks/path-guard.sh
echo $? # expect: 0
# ── File Guard ──
seq 1 400 > /tmp/test-large.txt  # exceeds the 350-line default threshold
echo '{"tool_input":{"file_path":"/tmp/test-large.txt"}}' | node .claude/hooks/file-guard.js
# Should output JSON with additionalContext warning
# ── Comment Guard ──
# Should exit 2 (blocked — replacing code with placeholder)
echo '{"tool_input":{"old_string":"function hello() {\n return world;\n}","new_string":"// ... existing code ..."}}' | node .claude/hooks/comment-guard.js
echo $? # expect: 2
# Should exit 0 (allowed — replacing code with code)
echo '{"tool_input":{"old_string":"return a;","new_string":"return b;"}}' | node .claude/hooks/comment-guard.js
echo $? # expect: 0
# ── Sensitive Guard ──
# Should exit 2 (blocked)
echo '{"tool_input":{"file_path":".env"}}' | bash .claude/hooks/sensitive-guard.sh
echo $? # expect: 2
# Should exit 0 (allowed)
echo '{"tool_input":{"file_path":".env.example"}}' | bash .claude/hooks/sensitive-guard.sh
echo $? # expect: 0
# Should exit 0 (warn only — bash commands are allowed for approved access)
echo '{"tool_input":{"command":"cat .env.local"}}' | bash .claude/hooks/sensitive-guard.sh
echo $? # expect: 0 (with warning on stderr)
# ── Glob Guard ──
# Should exit 2 (blocked — broad pattern at root)
echo '{"tool_input":{"pattern":"**/*.ts"}}' | node .claude/hooks/glob-guard.js
echo $? # expect: 2
# Should exit 0 (allowed — scoped pattern)
echo '{"tool_input":{"pattern":"src/**/*.ts"}}' | node .claude/hooks/glob-guard.js
echo $? # expect: 0
7. Build Test Script
Usage
bash scripts/build-test.sh # run all tests
bash scripts/build-test.sh --filter "Auth" # filter by pattern
bash scripts/build-test.sh --list # show detected project type
bash scripts/build-test.sh --ci # machine-readable output
bash scripts/build-test.sh --help # show usage
Supported Languages
| Language | Detected By | Test Command |
|----------|-------------|-------------|
| Swift (SPM) | Package.swift | swift test |
| Swift (Xcode) | *.xcworkspace / *.xcodeproj | xcodebuild test |
| Node (Vitest) | vitest.config.* or vitest in package.json | npx vitest run |
| Node (Jest) | jest.config.* or jest in package.json | npx jest |
| Python (pytest) | pyproject.toml, setup.py, pytest.ini | python3 -m pytest |
| Rust | Cargo.toml | cargo test |
| Go | go.mod | go test -race ./... |
| Java (Gradle) | build.gradle / build.gradle.kts | ./gradlew test |
| Java (Maven) | pom.xml | mvn test |
| C# (.NET) | *.sln / *.csproj | dotnet test |
| Ruby (RSpec) | Gemfile with rspec | bundle exec rspec |
| Ruby (Minitest) | Gemfile without rspec | bundle exec rake test |
Detection order: first match wins. The script also detects package managers (pnpm, bun) for Node projects.
Exit Codes
| Code | Meaning |
|------|---------|
| 0 | All tests passed |
| 1 | Tests failed |
| 2 | No project detected or missing tooling |
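A wrapper script can branch on these exit codes. In the sketch below, `run_tests` is a stub standing in for `bash scripts/build-test.sh` so the example is self-contained:

```shell
# Sketch: branching on the runner's exit codes. run_tests is a stub
# standing in for `bash scripts/build-test.sh`.
run_tests() { return 1; }   # pretend the suite failed

status=0
run_tests || status=$?
case "$status" in
  0) echo "all tests passed" ;;
  1) echo "tests failed" ;;
  2) echo "no project detected or tooling missing" ;;
esac
```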
CI Integration
# GitHub Actions example
- name: Run tests
  run: bash scripts/build-test.sh --ci
Adding a New Language
Edit scripts/build-test.sh:
- Add a detect_<language>() function
- Add it to the DETECTORS array
- The function should set LANG_NAME and TEST_CMD
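Following those three steps, a hypothetical detector for Elixir might look like this (the function name and commands are assumptions — adapt them to your toolchain and to the conventions already in build-test.sh):

```shell
# Hypothetical Elixir detector, following the detect_<language>() pattern.
detect_elixir() {
  if [ -f mix.exs ]; then
    LANG_NAME="Elixir"
    TEST_CMD="mix test"
    return 0
  fi
  return 1
}
# Remember to also append detect_elixir to the DETECTORS array.

# Quick sanity check in a scratch directory:
scratch=$(mktemp -d)
touch "$scratch/mix.exs"
cd "$scratch"
detect_elixir && echo "$LANG_NAME: $TEST_CMD"   # Elixir: mix test
```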
8. Spec Format
Spec Template
Create specs at docs/specs/<feature>/<feature>.md:
# Spec: <Feature Name>
**Created:** 2026-04-02
**Last updated:** 2026-04-02
**Status:** Draft | Active | Deprecated
## Overview
What this feature does, why it exists, who uses it. 2-3 sentences.
## Data Model
Entities, attributes, relationships (if applicable).
## Stories
### S-001: <Story name> (P0)
**Description:** [user story]
**Source:** [optional: ticket/issue ref]
**Acceptance Scenarios:**
AS-001: <short description>
- **Given:** [state]
- **When:** [action]
- **Then:** [expected]
- **Data:** [test data]
AS-002: <short description>
- **Given:** [error state]
- **When:** [action]
- **Then:** [error handling]
### S-002: <Story name> (P1)
AS-003: <short description>
- **Given:** [state]
- **When:** [action]
- **Then:** [expected]
### S-003: <Story name> (P2)
AS-004: <short description>
- [flow description + expected behavior]
## Constraints & Invariants
Rules that must always hold.
## Change Log
| Date | Change | Ref |
|------|--------|-----|
| 2026-04-02 | Initial creation | -- |
Skip sections that don't apply. Match depth to feature complexity.
Acceptance Scenario depth by priority:
- P0: Full Given + When + Then + Data + Setup. At least 1 happy path + 1 error path.
- P1: Given + When + Then. At least 1 happy path.
- P2: 1-2 line flow description. At least 1 scenario.
Snapshots (Version History)
When /mf-plan Mode C detects a Major change (new story, removed story, priority change, flow change, behavior change for P0, or constraint change), it automatically creates a snapshot before updating:
docs/specs/<feature>/snapshots/
2026-04-02.md ← full copy at that point in time
2026-04-05-BILL-101.md ← with ticket reference
Snapshots are immutable, managed by mf-plan (not developers), and capped at 5 most recent.
Naming Conventions
| Item | Convention | Example |
|------|-----------|---------|
| Spec directory | `docs/specs/<feature>/` | `docs/specs/user-auth/` |
| Spec file | `<feature>.md` in feature directory | `user-auth.md` |
| Story ID | `S-NNN`, sequential per spec | S-001, S-005 |
| Scenario ID | `AS-NNN`, sequential across all stories | AS-001, AS-042 |
| Priority | P0 (critical), P1 (important), P2 (nice-to-have) — per story | — |
| Snapshot | `YYYY-MM-DD.md` or `YYYY-MM-DD-<REF>.md` in `snapshots/` | `2026-04-02.md` |
9. Customization
Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `FILE_GUARD_THRESHOLD` | 200 | Max lines before file guard warns |
| `FILE_GUARD_EXCLUDE` | (empty) | Comma-separated globs to skip (e.g. `*.generated.swift`) |
| `PATH_GUARD_EXTRA` | (empty) | Additional pipe-separated patterns to block (e.g. `\.terraform`) |
| `SENSITIVE_GUARD_EXTRA` | (empty) | Additional pipe-separated patterns for sensitive files (e.g. `\.vault`) |
| `SELF_REVIEW_ENABLED` | true | Set to `false` to disable the self-review checklist on Stop |
Set these in your shell profile or project `.envrc` (if using direnv).
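For example, a project `.envrc` might tune the guards like this (the values shown are illustrative, not recommendations):

```shell
# Example .envrc tuning the guards; values are illustrative.
export FILE_GUARD_THRESHOLD=300                        # warn above 300 lines instead of 200
export FILE_GUARD_EXCLUDE="*.generated.swift,*.pb.go"  # skip generated files
export PATH_GUARD_EXTRA='\.terraform'                  # also block .terraform paths
export SELF_REVIEW_ENABLED=true                        # keep the Stop checklist on
```

With direnv, run `direnv allow` after editing so the values take effect.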
Extending CLAUDE.md
Add project-specific rules to `.claude/CLAUDE.md`:

```markdown
## Project-Specific Rules
- All API endpoints must have OpenAPI annotations
- Database migrations must be reversible
- UI components must support dark mode
- All strings must be localized via i18n keys
```

Adding Custom Skills
Create new skills in `.claude/skills/<name>/SKILL.md`:

```markdown
# .claude/skills/deploy/SKILL.md

Run the deployment pipeline:
1. /mf-review
2. /mf-commit
3. Run: bash scripts/deploy.sh $ARGUMENTS
4. Verify deployment health: curl -f https://api.example.com/health
```

Then use: `/deploy staging`
10. Token Cost Guide
| Activity | Tokens | Frequency |
|----------|--------|-----------|
| `/mf-build` (incremental, 1-3 files) | 5–10k | Every code chunk |
| `/mf-fix` (single bug) | 3–5k | As needed |
| `/mf-commit` | 2–4k | Every commit |
| `/mf-review` (diff-based) | 10–20k | Before merge |
| `/mf-plan` (new feature) | 20–40k | Start of feature |
| `/mf-challenge` (adversarial review) | 15–30k | After `/mf-plan`, complex features |
| Full audit (manual prompt) | 100k+ | Before release |
Minimizing Token Usage
- Test incrementally. `/mf-build` after each small chunk uses 5-10k. Waiting until everything is done and then running `/mf-build` on a large diff uses 50k+.
- Use filters. `/mf-build src/auth/login.ts` is cheaper than `/mf-build` on the whole project.
- Skip `/mf-plan` for tiny changes. Under 5 lines with no behavior change? Just `/mf-build` and `/mf-commit`.
- Use `/mf-review` only before merge. Not after every commit.
11. Troubleshooting
Hook not firing
Symptom: File guard or path guard doesn't trigger.
Check:
- Is `settings.json` valid? `node -e "JSON.parse(require('fs').readFileSync('.claude/settings.json','utf-8'))"`
- Are hooks executable? `ls -la .claude/hooks/`
- Is Node.js available? `node --version`
- Is `$CLAUDE_PROJECT_DIR` set? Check in Claude Code with: `echo $CLAUDE_PROJECT_DIR`
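The checks above can be bundled into a quick diagnostic you run once from the project root. This is an illustrative sketch, not part of the kit; adjust paths if your layout differs:

```shell
# One-shot hook diagnostic; prints a status line per check
# instead of stopping at the first failure.
check_claude_setup() {
  # 1. settings.json must be valid JSON (also fails if node is unavailable)
  if node -e "JSON.parse(require('fs').readFileSync('.claude/settings.json','utf-8'))" 2>/dev/null; then
    echo "settings.json: OK"
  else
    echo "settings.json: INVALID or missing (or node unavailable)"
  fi

  # 2. hooks directory must exist, and each hook must be executable
  if [ -d .claude/hooks ]; then
    echo "hooks dir: OK"
    for f in .claude/hooks/*; do
      if [ -f "$f" ] && [ ! -x "$f" ]; then
        echo "  not executable: $f"
      fi
    done
  else
    echo "hooks dir: MISSING"
  fi

  # 3. CLAUDE_PROJECT_DIR is set by Claude Code itself
  if [ -n "${CLAUDE_PROJECT_DIR:-}" ]; then
    echo "CLAUDE_PROJECT_DIR: $CLAUDE_PROJECT_DIR"
  else
    echo "CLAUDE_PROJECT_DIR: not set (normal outside Claude Code)"
  fi
}
check_claude_setup
```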
Tests not detected
Symptom: `build-test.sh` says "No supported project detected."
Check:
- Are you in the project root? `pwd`
- Does the project marker file exist? (e.g., `package.json`, `Cargo.toml`)
- Run `bash scripts/build-test.sh --list` for diagnostic output.
Wrong base branch
Symptom: `/mf-build` or `/mf-review` compares against the wrong branch.
Check: `git symbolic-ref refs/remotes/origin/HEAD`
If this is wrong or missing: `git remote set-head origin <your-main-branch>`
Path guard blocking a legitimate command
Symptom: Claude can't run a command you need.
Fix: The path guard blocks broad patterns. If you need to access `build/` for a specific reason, run the command directly in your terminal (not through Claude Code).
File guard warning on generated files
Fix: Set the exclude pattern:
`export FILE_GUARD_EXCLUDE="*.generated.swift,*.pb.go,*.min.js,*.snap"`

12. FAQ
Q: Do I need specs for every tiny change?
A: No. Changes under 5 lines with no behavior change can skip the spec. Just `/mf-build` and `/mf-commit`. The spec-first rule is for meaningful behavior changes.
Q: Can I use mocks in tests?
A: Only for external services you can't run locally (third-party APIs, email services). Never mock your own code or database just to make tests pass faster.
Q: What if Claude writes a test that tests the wrong thing?
A: This usually means the spec is ambiguous. Clarify the spec first, then re-run `/mf-build`. Good specs produce good tests.
Q: Can I use this with other AI coding tools?
A: The commands and hooks are Claude Code-specific. The specs, workflow, and `build-test.sh` work with any tool or manual workflow.
Q: When should I use `/mf-challenge`?
A: After `/mf-plan`, for complex features involving authentication, payments, data pipelines, or multi-service integration. It spawns parallel hostile reviewers that find security holes, failure modes, and false assumptions BEFORE you write code. Skip it for simple CRUD or small features — the overhead isn't worth it.
Q: How do I do a full coverage audit?
A: This is intentionally not a command (it's expensive and rare). When needed, prompt Claude directly: "Audit test coverage for feature X against `docs/specs/X/X.md` acceptance scenarios. Identify gaps and write missing tests."
Q: What if my project uses multiple languages?
A: `build-test.sh` detects the first match. For monorepos, you may need to run it from each sub-project directory or customize the script.
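For the monorepo case, a small wrapper can drive the runner across sub-projects. This is a sketch, and the example paths are hypothetical:

```shell
# Run the universal test runner in each sub-project of a monorepo.
# Pass sub-project directories as arguments; stops on the first failure.
run_all() {
  for dir in "$@"; do
    echo "==> $dir"
    ( cd "$dir" && bash scripts/build-test.sh ) || return 1
  done
}

# Example (hypothetical paths):
# run_all services/api web/frontend
```

Each sub-project runs in a subshell, so the working directory of your main shell is unchanged afterwards.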
Q: Can I add more skills?
A: Yes. Create `.claude/skills/<name>/SKILL.md` and it becomes available as a slash command. See Customization.
Q: How do I update the kit in existing projects?
A: Run `npx claude-devkit-cli upgrade`. It automatically detects which files you've customized and only updates unchanged files. Use `--force` to overwrite everything.
Q: I installed with the old `setup.sh` — how do I migrate?
A: Run `npx claude-devkit-cli init --adopt .` to generate a manifest from your existing files without overwriting anything. Future upgrades will then work normally.
