claude-devkit-cli
v1.5.8
CLI toolkit for spec-first development with Claude Code — hooks, commands, guards, and test runners
A lightweight, spec-first development toolkit for Claude Code. It enforces the cycle spec (with acceptance scenarios) → code + tests → build pass through custom commands, automatic hooks, and a universal test runner.
Works with: Swift, TypeScript/JavaScript, Python, Rust, Go, Java/Kotlin, C#, Ruby. Dependencies: none beyond Claude Code CLI, Node.js, Git, and Bash.
Table of Contents
- Philosophy
- Quick Start
- Setup
- Daily Workflows
- Commands Reference
- Automatic Guards (Hooks)
- Build Test Script
- Spec Format
- Customization
- Token Cost Guide
- Troubleshooting
- FAQ
1. Philosophy
The Core Cycle
SPEC (with acceptance scenarios) → CODE + TESTS → BUILD PASS
Every code change — feature, fix, or removal — follows this cycle. The spec is the source of truth. Acceptance scenarios (Given/When/Then) are embedded directly in the spec — no separate test plan file. If code contradicts the spec, the code is wrong.
Why Spec-First?
- Prevents drift. Acceptance scenarios live inside the spec — no separate test plan to fall out of sync.
- Tests have purpose. Scenarios derived from specs test behavior, not implementation details. This means tests survive refactoring.
- AI writes better code. When Claude Code has a spec with concrete Given/When/Then scenarios, it generates more accurate implementations and more meaningful tests.
- Reviews are grounded. Reviewers can check code against the spec rather than guessing at intent.
Principles
- Specs are source of truth — Code changes require spec updates first.
- Incremental, not big-bang — Test after each code chunk, not after everything is done.
- Tests travel with code — Every PR includes production code + tests + spec updates.
- Build pass is the gate — Nothing merges with failing tests.
- Everything in the repo — Specs, plans, tests, and code are version-controlled and reviewable.
2. Quick Start
Time needed: 5 minutes.
# 1. Install dev-kit into your project
npx claude-devkit-cli init .
# 2. Open your project in Claude Code
claude
# 3. Create your first spec
/mf-plan "describe your feature here"
# 4. Write code, then test
/mf-build
# 5. Review before merging
/mf-review
# 6. Commit
/mf-commit
That's it. The CLI auto-detects your project type and configures everything.
3. Setup
Prerequisites
| Tool | Required | Why |
|------|----------|-----|
| Claude Code CLI | Yes | Runs the commands and hooks |
| Git | Yes | Change detection, commit workflow |
| Node.js (18+) | Yes | File guard hook, JSON parsing |
| Bash (4+) | Yes | Path guard hook, build-test script |
| Language toolchain | Yes | Whatever your project uses (Swift, npm, pytest, etc.) |
Installation
Option A: One-command install (recommended)
npx claude-devkit-cli init .
Option B: Global install
npm install -g claude-devkit-cli
# Then, in any project:
cd my-project
claude-devkit init .
Option C: Global skills install (available in all projects without running init again)
claude-devkit init --global
# or after per-project init, answer "yes" to the global prompt
Skills installed globally at ~/.claude/skills/ are available in every project. Per-project .claude/skills/ always takes precedence over global — so projects can still override individual skills.
Option D: Force re-install (overwrites existing files)
npx claude-devkit-cli init --force .
Option E: Selective install (only specific components)
npx claude-devkit-cli init --only hooks,skills .
What Gets Installed
your-project/
├── .claude/
│ ├── CLAUDE.md ← Project rules hub
│ ├── settings.json ← Hook wiring
│ ├── hooks/
│ │ ├── file-guard.js ← Warns on large files
│ │ ├── path-guard.sh ← Blocks wasteful Bash paths
│ │ ├── glob-guard.js ← Blocks broad glob patterns
│ │ ├── comment-guard.js ← Blocks placeholder comments
│ │ ├── sensitive-guard.sh ← Blocks access to secrets
│ │ └── self-review.sh ← Quality checklist on stop
│ └── skills/
│ ├── mf-explore/SKILL.md ← /mf-explore skill
│ ├── mf-plan/SKILL.md ← /mf-plan skill
│ ├── mf-challenge/SKILL.md ← /mf-challenge skill
│ ├── mf-build/SKILL.md ← /mf-build skill
│ ├── mf-fix/SKILL.md ← /mf-fix skill
│ ├── mf-review/SKILL.md ← /mf-review skill
│ └── mf-commit/SKILL.md ← /mf-commit skill
├── scripts/
│ └── build-test.sh ← Universal test runner
└── docs/
├── specs/ ← Your specs (folder-per-feature)
│ └── <feature>/
│ ├── <feature>.md ← Spec with acceptance scenarios
│ └── snapshots/ ← Version history (managed by /mf-plan)
└── WORKFLOW.md ← Process reference
Post-Install Configuration
The CLI auto-detects your project type and fills in CLAUDE.md. Verify it's correct:
cat .claude/CLAUDE.md
Look for the Project Info section. Ensure language, test framework, and directories are correct. Edit manually if needed.
Upgrade
npx claude-devkit-cli upgrade
Smart upgrade — updates kit files but preserves any you've customized. Use --force to overwrite everything.
# Check if update is available
npx claude-devkit-cli check
# See what changed
npx claude-devkit-cli diff
# View installed files and status
npx claude-devkit-cli list
Uninstall
npx claude-devkit-cli remove
This removes hooks, skills, settings, and build-test.sh. It preserves CLAUDE.md (which you may have customized) and docs/ (which contains your specs).
4. Daily Workflows
Explore Before Planning
When: Requirements are unclear, you're debating between approaches, or it's a brownfield feature with existing code to understand first.
1. /mf-explore "feature description"
→ Asks questions as a Client Technical Lead — one topic at a time.
→ Clarifies: why, behavior, boundaries, business rules, edge cases, permissions, UI.
→ Output: docs/explore/<feature>.md
2. /mf-plan "feature description"
→ Auto-detects docs/explore/<feature>.md, skips redundant discovery.
→ Continue with the normal New Feature flow.
Example:
/mf-explore "cancel order request"
New Feature
When: Building something new — no existing code or spec.
1. /mf-plan "description of the feature"
→ Generates spec with acceptance scenarios at docs/specs/<feature>/<feature>.md.
2. Implement code in chunks.
After each chunk: /mf-build
Repeat until green.
3. /mf-review (before merge)
4. /mf-commit
Example:
/mf-plan "User authentication with email/password login, password reset via email, and session management with 24h expiry"
Update Existing Feature
When: Changing behavior of something that already exists.
1. /mf-plan docs/specs/<feature>/<feature>.md "description of changes"
→ Mode C handles everything: snapshot → classification → change report → apply.
Do NOT manually edit the spec before running /mf-plan.
2. Implement the code change.
/mf-build
Fix until green.
3. /mf-review → /mf-commit
Bug Fix
When: Something is broken.
1. /mf-fix "description of the bug"
→ Writes failing test → fixes code → runs full suite.
2. /mf-commit
Example:
/mf-fix "Search returns no results when query contains apostrophes like O'Brien"
Remove Feature
When: Deleting code, removing deprecated functionality.
1. /mf-plan docs/specs/<feature>/<feature>.md "remove stories S-XXX"
→ Mode C creates a snapshot (removing stories = Major), then marks as removed.
2. Delete production code + related tests.
3. bash scripts/build-test.sh (run full suite)
Fix cascading breaks.
4. /mf-commit
5. Commands Reference
/mf-explore — Feature Discovery as Client Technical Lead
Usage:
/mf-explore "cancel order request"
/mf-explore "user notification preferences"
When to use: Requirements are unclear, you're debating between approaches, or you want to clarify a feature deeply before committing to a spec. Runs before /mf-plan.
How it works:
- Phase 0: Codebase scan — Silently checks for existing code, related specs, and existing explore docs before asking anything.
- Phase 1: Why, not what — Asks what problem requires this feature, who faces it, and how they handle it today. Prevents building the wrong thing.
- Phase 2: Desired behavior — Walks through the flow step by step, identifies trigger and final result, checks for multi-role approval chains.
- Phase 2.5: UI/UX expectation — Clarifies interface type (table, form, wizard, dashboard). Offers sensible defaults when the client is unsure. Suggests simpler approaches when expectations are complex.
- Phase 3: Boundaries — Impact on existing screens, data changes, migration needs, out of scope, permissions.
- Phase 3.5: Scope optimization — Identifies what can ship fast vs what can defer to phase 2.
- Phase 4: Business rules & validation — Conditions, formulas (with real numbers), input validation, notifications, time constraints, concurrency.
- Phase 5: Edge cases — Empty states, error messages, double submit, network loss, limits, sensitive data, domain-specific cases (payment double-charge, booking overbooking, etc.).
- Phase 6: Scenario confirmation — Presents concrete happy path + unhappy paths with fake data. Confirms with user before proceeding.
- Phase 7: Handoff summary — Compiles everything into a structured doc, confirms with user, writes to docs/explore/<feature>.md.
Output: docs/explore/<feature>.md — auto-detected by /mf-plan, which skips redundant discovery and maps explore findings directly to spec sections.
Token cost: 10–20k
/mf-plan — Generate Spec with Acceptance Scenarios
Usage:
/mf-plan "user authentication with OAuth2" # Mode A: new spec from description
/mf-plan docs/specs/auth/auth.md # Mode B: add scenarios to existing spec
/mf-plan docs/specs/auth/auth.md "add password reset flow" # Mode C: update existing spec
Modes:
- Mode A — Creates a new spec with stories and acceptance scenarios from your description.
- Mode B — Reads an existing spec that has no acceptance scenarios yet, adds them.
- Mode C — Updates an existing spec: creates a snapshot before Major changes, shows a change report, waits for confirmation, then applies.
How it works:
- Phase 0: Codebase Awareness — Scans existing code, docs/specs/, and project patterns before planning. Prevents specs that conflict with existing implementations.
- Phase 1: Scope & Split + Scope Challenge — Evaluates feature size (>7 stories or >20 AS → must split). When a feature is large, applies Sizing & Phasing: Phase 1 (minimum viable — smallest slice with value), Phase 2 (core experience — happy path), Phase 3 (edge cases, polish), Phase 4 (optimization, monitoring) — each phase mergeable independently. Also runs a Scope Challenge before drafting: checks for existing code that already solves sub-problems (reuse vs rebuild), flags complexity smells (8+ files or 2+ new classes/services), searches for framework built-ins, checks for distribution needs (new artifact → CI/CD in scope?), and applies the Completeness Principle (complete version costs only CC: ≤15m more → recommend it directly).
- Phase 2: Draft Spec — Generates a structured spec with stories and acceptance scenarios (Given/When/Then). Depth scales by priority: P0 gets full GWT + test data, P1 gets GWT, P2 gets 1-2 line descriptions. Runs consistency checks (CC1-CC6) before showing the draft.
- Phase 3: Clarify Ambiguities — Systematically finds gaps across behavioral, data, auth, non-functional, integration, and concurrency dimensions. Questions include (human: ~X / CC: ~Y) effort scales and Completeness: X/10 scores for each option.
- Phase 4: Summary — Shows story counts, AS counts, implementation order, next steps. Every spec also gets a "What Already Exists" section (existing code that partially solves the problem) and a "Not in Scope" section (deferred work with rationale — prevents work from silently dropping).
Mode C (Update) adds:
- Classification — Walks through M1-M6 checklist to determine Major vs Minor change.
- Snapshot — Major changes trigger an automatic snapshot (cp, bit-perfect) before editing.
- Change report — Shows what will change, waits for user confirmation.
- Consistency check — Runs CC1-CC6 after every update.
Traceability IDs:
- S-NNN — Stories (with priority P0/P1/P2)
- AS-NNN — Acceptance Scenarios (Given/When/Then, embedded in stories)
- FR-NNN — Functional Requirements (if needed)
- SC-NNN — Success Criteria (if needed)
- IDs are immutable — deleted IDs are never reused.
Directory structure:
docs/specs/<feature>/
<feature>.md # single source of truth — always read this file
snapshots/ # version history (managed by mf-plan, not developers)
YYYY-MM-DD.md
YYYY-MM-DD-<REF>.md
Output:
- Spec with acceptance scenarios: docs/specs/<feature>/<feature>.md
/mf-challenge — Adversarial Plan Review
Usage:
/mf-challenge docs/specs/auth/auth.md # challenge a spec
/mf-challenge "user authentication" # challenge by feature name
How it works (7 phases):
Read & Map — Reads the spec (including acceptance scenarios) and maps: decisions made, assumptions (stated AND implied), dependencies, scope boundaries, risk acknowledgments, story-AS consistency.
Scale Reviewers — Assesses complexity and selects reviewers:
| Complexity | Signals | Reviewers |
|------------|---------|-----------|
| Simple | 1 spec section, <20 acceptance scenarios, no auth/data | 2 |
| Standard | Multiple sections, auth or data involved | 3 |
| Complex | Multiple integrations, concurrency, migrations, 6+ phases | 4 |
Spawn Reviewers — Launches parallel subagents, each with an adversarial lens:
Security Adversary
- OWASP Top 10
- Injection vectors
- Auth/authz bypass
- Crypto issues
- Data exposure
- Supply chain risks
Failure Mode Analyst — "Everything that can go wrong, will — simultaneously, at 3 AM, during peak traffic"
- Partial failures
- Concurrency & race conditions
- Cascading failures
- Recovery paths
- Idempotency
- Observability gaps
Assumption Destroyer — "'It should work' is not evidence"
- Unverified claims
- Scale assumptions
- Environment differences
- Integration contracts
- Data shape assumptions
- Timing dependencies
- Hidden dependencies
Scope & YAGNI Critic — "The best code is no code. The best feature is the one you didn't build"
- Over-engineering
- Premature abstraction
- Missing MVP cuts
- Gold plating
- Simpler alternatives
Deduplicate & Rate — Collects all findings, removes duplicates, rates severity using a Likelihood x Impact matrix. Caps at 15 findings: keeps all Critical, top High by specificity, notes how many Medium were dropped. Each reviewer is limited to top 7 findings.
Adjudicate — Evaluates each finding: Accept (valid flaw, plan should change) or Reject (false positive, acceptable risk, already handled). 1-sentence rationale for each.
User Choice — Two modes: "Apply all accepted" (fast) or "Review each" (walk through one by one).
Apply — Surgical edits only to accepted findings. Doesn't rewrite surrounding sections.
Finding format: Each finding includes Title, Severity, Confidence score (9-10 = verified; 7-8 = strong match; 5-6 = note caveat; ≤4 = omit unless Critical), Location, Flaw description, Evidence (direct quote from the plan), step-by-step Failure scenario, and Suggested fix.
6 non-negotiable rules:
- Spawn reviewers in parallel (not sequential)
- Reviewers read files directly, not summarized content
- Be hostile — no praise, no softening
- Every finding must quote the plan directly as evidence
- Quality over quantity — 3 honest findings > 15 padded ones
- Skip style/formatting — substance only
When to use:
- After /mf-plan, before coding — for complex features
- Features involving auth, payments, data pipelines, multi-service integration
- NOT needed for simple CRUD, small bug fixes, or trivial features
Token cost: 15-30k (uses parallel subagents, doesn't bloat main context)
/mf-build — TDD Delivery Loop
Usage:
/mf-build # build all changes vs base branch
/mf-build src/api/users.ts # build specific file
/mf-build "user authentication" # build specific featureHow it works:
- Phase 0: Build Context — Finds changed files vs base branch, reads the spec (acceptance scenarios in the ## Stories section are the roadmap), checks docs/specs/<feature>/.build-progress to resume from a previous interrupted session, and reads existing tests for patterns, fixtures, and naming conventions. Doesn't duplicate what already exists.
- Phase 1: Decide What to Test — Determines test scope from acceptance scenarios. Applies the Completeness Principle: AI writes tests ~50x faster than humans, so if full coverage costs CC: ≤15m, it writes complete tests without asking. Always checks 8 mandatory edge case categories: null/undefined, empty arrays/strings, invalid types, boundary values (min/max), error paths (network failures, DB errors), race conditions, large data (10k+ items), and special characters (Unicode, SQL chars).
- Phase 1.5: Coverage Map — Before writing a single test, traces every code path (if/else, switch, guard, try/catch) AND user flows (double-click, stale session, navigate away mid-op). Draws an ASCII diagram marking each path as [★★★ TESTED], [★★ TESTED], [★ TESTED], or [GAP]. Gaps marked [GAP] [→E2E] need E2E tests; [GAP] [→EVAL] need evals — when flagged, defines capability + regression evals before implementing and reports pass@1/pass@3. Regression rule: if the diff changes existing behavior with no covering test, a regression test is a CRITICAL requirement — no asking, no skipping.
- Phase 2: Write Tests — Writes tests for every [GAP] identified in the Coverage Map. Before moving to Phase 3, verifies: all public functions have unit tests, all API endpoints have integration tests, edge cases covered, error paths tested, tests independent, assertions specific.
- Phase 3: Build and Run — Compiles/typechecks first, then runs tests.
- Phase 4: Fix Loop — If tests fail, fixes test code only (max 3 attempts, then hard stop and report). If tests expect X but code does Y, asks whether to fix production code or adjust the test — with effort scales (human: ~X / CC: ~Y).
- Phase 5: Report — Summary with test counts, results, coverage, files touched, and any E2E/eval gaps to follow up on.
Rules:
- Never changes production code without asking first
- Never deletes or weakens existing tests
- Never adds skip/xit/@disabled to hide failures
- Max 3 fix attempts — then stops and reports the issue
What NOT to test: Private/internal methods, framework behavior, trivial getters/setters, implementation details.
/mf-fix — Test-First Bug Fix
Usage:
/mf-fix "description of the bug"
How it works:
- Phase 0: Investigate — Parses the bug report, locates relevant code, checks git history, and forms a root cause hypothesis. Then draws a Bug Path Diagram (same [GAP]/[★★ TESTED] format as /mf-build) for the buggy function — if no specific [GAP] path can be identified, the hypothesis isn't specific enough yet.
- Phase 1: Write Failing Test — Regression rule first: if the bug exists because the diff changed existing behavior with no test covering that path, a regression test is a CRITICAL requirement. Creates a test that reproduces the bug and MUST fail with current code.
- Phase 2: Fix — Minimal change only. Blast radius check: if fix touches >5 files, stops and asks before editing.
- Phase 3: Verify — Bug test must pass; full suite must show no new regressions.
- Phase 4: Root Cause Analysis — Documents: Symptom, Root cause, Gap (why wasn't this caught earlier?), Prevention (one of: type constraint, validation, lint rule, spec update). Non-optional for serious bugs.
- Phase 5: Report — Structured debug report with hypothesis, fix, evidence, and regression test reference.
Multiple bugs: Triages by severity, fixes one at a time, commits each separately.
/mf-review — Pre-Merge Quality Gate
Usage:
/mf-review # review all changes vs base branch
/mf-review src/auth/ # review specific directory
How it works:
- Phase 0: Understand Intent — Reads commit messages, checks for related spec, expands blast radius. Also notes what already exists: flags if the diff rebuilds something that already exists in the codebase.
- Phase 1: Smart Focus — Auto-detects what to focus on based on the diff (auth → security, SQL → injection, payments → idempotency, etc.). Spends 60% of analysis on the primary focus.
- Phase 2: Review — Security, correctness, API/Backend patterns (unvalidated input, missing rate limiting, missing timeouts, missing CORS, error message leakage), spec-test alignment, code quality (including diagram maintenance: stale ASCII diagrams in comments are flagged), performance, a Failure Mode Grid for each new codepath (3 dimensions: test covers it? error handling exists? user sees a clear error or silent failure? — all 3 missing = Critical gap), and an AI-generated code addendum when reviewing AI-written changes (behavioral regressions, trust boundaries, architecture drift, model cost escalation).
- Phase 3: Report — Structured report. Every finding includes a confidence score (confidence: N/10): 9-10 = verified in code; 7-8 = strong pattern match; 5-6 = possible false positive; <5 = appendix only. Includes a "Not in scope" section listing deferred work with rationale.
Proportional review: A 5-line doc change gets a light review. A 500-line auth rewrite gets file-by-file deep analysis.
Verdicts: APPROVE / REQUEST CHANGES / NEEDS DISCUSSION.
Rules:
- At least 1 positive note — reinforces good patterns, not just problems
- Never auto-fixes code — report only
- Checks spec-test alignment: code changed → spec/acceptance scenarios/tests also changed?
/mf-commit — Smart Git Commit
Usage:
/mf-commit
How it works:
- Analyze — Scans git status, diff stats, and file contents in one pass.
- Scan for secrets — Matches patterns: api_key, token, password, secret, private_key, credential, auth_token. Hard block — stops immediately if found, non-negotiable.
- Scan for debug code — Matches: console.log, debugger, print(), TODO:remove, HACK:, FIXME:temp, binding.pry, var_dump. Soft warn — proceeds if you confirm.
- Stage files — Stages specific files by name. Never uses git add -A.
- Generate message — Conventional format: type(scope): description. Imperative tense ("add" not "added"), no period, WHAT+WHY not HOW.
- Commit — Does NOT push (safe default). Ask Claude explicitly to push.
Large diff warning: If >10 files OR >300 lines changed, suggests splitting into smaller commits for easier review.
Never stages: .env, credentials, build artifacts, generated files, binaries >1MB.
Breaking changes: If the diff removes/renames a public function, export, or API endpoint, uses feat! or fix! type, or adds a BREAKING CHANGE: footer.
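The message format above can be illustrated with a small shell sketch. This is not part of the kit — `check_msg` and its regex are a simplified, hypothetical approximation of the type(scope): description convention:

```shell
# Illustrative only (not part of the kit): a rough check that a commit
# message matches the type(scope): description shape.
check_msg() {
  if echo "$1" | grep -Eq '^(feat|fix|docs|refactor|test|chore)(\([a-z0-9-]+\))?!?: [a-z]'; then
    echo "conventional"
  else
    echo "non-conventional"
  fi
}

check_msg "feat(auth): add password reset flow"   # conventional
check_msg "fix!: drop legacy endpoint"            # conventional (breaking)
check_msg "Added some stuff."                     # non-conventional
```

Note the optional `!` before the colon, which marks a breaking change as described above.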
6. Automatic Guards (Hooks)
Hooks run automatically — you don't invoke them. They provide passive protection.
File Guard (file-guard.js)
Trigger: After every Write or Edit operation. Action: If a modified source code file exceeds 350 lines, injects a warning suggesting modularization. Docs, configs, and templates are intentionally excluded — they are naturally long. Blocking: No — warns only, does not prevent the edit.
Checked extensions: .ts, .tsx, .js, .jsx, .py, .php, .rb, .rs, .go, .swift, .kt, .java, .cs, .cpp, .c, .dart, .vue, .svelte, .astro, and more.
Not checked: .md, .json, .yaml, .toml, .html, .css, .sh, and other non-source files.
Configuration:
# Change the line threshold (default: 350)
export FILE_GUARD_THRESHOLD=500
# Exclude files from checking (comma-separated globs)
export FILE_GUARD_EXCLUDE="*.generated.swift,*.pb.go,*.min.js"
Path Guard (path-guard.sh)
Trigger: Before every Bash command. Action: Blocks commands that reference large directories (node_modules, build artifacts, etc.). Blocking: Yes — prevents the command from running.
Default blocked paths:
node_modules, __pycache__, .git/objects, dist/, build/, .next/, vendor/, Pods/, .build/, DerivedData/, .gradle/, target/debug, target/release, .nuget, .cache
Configuration:
# Add project-specific blocked paths (pipe-separated)
export PATH_GUARD_EXTRA="\.terraform|\.vagrant|\.docker"
Glob Guard (glob-guard.js)
Trigger: Before every Glob (file search) operation. Action: Blocks overly broad glob patterns at project root that would return thousands of files and fill the context window. Blocking: Yes — prevents the glob and suggests scoped alternatives.
What it blocks:
- `**/*.ts` at project root (use `src/**/*.ts` instead)
- `**/*` at project root (use `src/**/*` instead)
- `*` or `**` at project root
- Any recursive glob without a specific directory prefix
What it allows:
- `src/**/*.ts` — scoped to a specific directory
- `tests/**/*.test.js` — scoped to tests
- `**/*.ts` when run from inside a scoped directory (e.g., path: "src")
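The root-pattern rule can be sketched in a few lines of shell. This is a simplified stand-in, not the actual hook — the real check lives in .claude/hooks/glob-guard.js and covers more cases:

```shell
# Simplified stand-in for the glob-guard root check; the real logic lives
# in .claude/hooks/glob-guard.js.
check_glob() {
  case "$1" in
    \*\*/*|\*\*|\*) echo "blocked" ;;   # recursive or bare wildcard at project root
    *)              echo "allowed" ;;   # pattern has a directory prefix
  esac
}

check_glob '**/*.ts'      # blocked: too broad at root
check_glob 'src/**/*.ts'  # allowed: scoped to src/
```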
Comment Guard (comment-guard.js)
Trigger: After every Edit operation.
Action: Detects when real code is replaced with placeholder comments like // ... existing code ... or // rest of implementation. This is a common LLM laziness pattern.
Blocking: Yes — rejects the edit and tells Claude to preserve the original code.
What it catches:
- `// ... existing code ...`, `// ... rest of implementation`
- `// [previous code remains]`, `// unchanged`
- `/* ... */` replacing real code
- `# ... existing ...` (Python placeholders)
- `// TODO: implement` replacing real code
- Any edit where real code is replaced with a much shorter comment-only block
What it allows:
- Editing comments (old content was already comments)
- Adding comments alongside code (new content has both)
- Normal code replacements
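The core of the detection can be approximated with a one-line regex. Again a simplified stand-in — the real hook is comment-guard.js and also compares the old and new content:

```shell
# Simplified stand-in for the placeholder check; the real logic lives in
# .claude/hooks/comment-guard.js.
is_placeholder() {
  if echo "$1" | grep -Eq '^[[:space:]]*(//|#|/\*)[[:space:]]*\.\.\.'; then
    echo "placeholder"
  else
    echo "ok"
  fi
}

is_placeholder '// ... existing code ...'   # placeholder
is_placeholder 'return total + tax;'        # ok
```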
Sensitive Guard (sensitive-guard.sh)
Trigger: Before every Read, Write, Edit, and Bash command.
Action: Protects files containing secrets: .env, private keys, credentials, tokens.
Blocking: Read/Write/Edit → blocks (exit 2). Bash commands → warns only (allows access).
The Bash warn-only behavior enables an approval flow: Claude asks the user for permission, and if approved, can use bash cat .env to read the file.
Protected files:
- .env, .env.local, .env.production, etc. (but NOT .env.example)
- Private keys: `*.pem`, `*.key`, `*.p12`, `*.pfx`, `*.jks`
- SSH keys: id_rsa, id_ecdsa, id_ed25519
- Cloud credentials: serviceAccountKey.json, `firebase-adminsdk*`
- Token files: .npmrc, .pypirc, .netrc
- Any file matching `*credential*`, `*secret*`, `*private_key*`
Supports .agentignore: Create a .agentignore file (or .aiignore, .cursorignore) in the project root with gitignore-style patterns to add project-specific protections.
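For example, a project-specific .agentignore might look like this (the patterns below are illustrative, not prescribed by the kit):

```shell
# Example .agentignore -- patterns are illustrative, not prescribed by the kit.
cat > .agentignore <<'EOF'
# gitignore-style patterns; matching files get the same protection as .env
secrets/
*.vault
deploy/prod-credentials.yml
EOF
```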
Configuration:
# Add extra patterns (pipe-separated regex)
export SENSITIVE_GUARD_EXTRA="\.vault|.*_token\.json"
Self-Review (self-review.sh)
Trigger: When Claude is about to stop (Stop event). Action: Injects a self-review checklist reminding Claude to verify quality before finishing. Blocking: No — just a reminder.
Questions asked:
- Did you leave any TODO/FIXME that should be resolved now?
- Did you create mock/fake implementations just to pass tests?
- Did you replace real code with placeholder comments?
- Do all changed files compile and typecheck cleanly?
- Did you run the full test suite, not just the new tests?
- Are there any files you modified but forgot to include in the summary?
Configuration:
# Disable self-review
export SELF_REVIEW_ENABLED=false
Testing Hooks Manually
You can test hooks by piping mock JSON payloads:
# ── Path Guard ──
# Should exit 2 (blocked)
echo '{"tool_input":{"command":"ls node_modules"}}' | bash .claude/hooks/path-guard.sh
echo $? # expect: 2
# Should exit 0 (allowed)
echo '{"tool_input":{"command":"ls src"}}' | bash .claude/hooks/path-guard.sh
echo $? # expect: 0
# ── File Guard ──
seq 1 400 > /tmp/test-large.txt  # exceeds the 350-line default threshold
echo '{"tool_input":{"file_path":"/tmp/test-large.txt"}}' | node .claude/hooks/file-guard.js
# Should output JSON with additionalContext warning
# ── Comment Guard ──
# Should exit 2 (blocked — replacing code with placeholder)
echo '{"tool_input":{"old_string":"function hello() {\n return world;\n}","new_string":"// ... existing code ..."}}' | node .claude/hooks/comment-guard.js
echo $? # expect: 2
# Should exit 0 (allowed — replacing code with code)
echo '{"tool_input":{"old_string":"return a;","new_string":"return b;"}}' | node .claude/hooks/comment-guard.js
echo $? # expect: 0
# ── Sensitive Guard ──
# Should exit 2 (blocked)
echo '{"tool_input":{"file_path":".env"}}' | bash .claude/hooks/sensitive-guard.sh
echo $? # expect: 2
# Should exit 0 (allowed)
echo '{"tool_input":{"file_path":".env.example"}}' | bash .claude/hooks/sensitive-guard.sh
echo $? # expect: 0
# Should exit 0 (warn only — bash commands are allowed for approved access)
echo '{"tool_input":{"command":"cat .env.local"}}' | bash .claude/hooks/sensitive-guard.sh
echo $? # expect: 0 (with warning on stderr)
# ── Glob Guard ──
# Should exit 2 (blocked — broad pattern at root)
echo '{"tool_input":{"pattern":"**/*.ts"}}' | node .claude/hooks/glob-guard.js
echo $? # expect: 2
# Should exit 0 (allowed — scoped pattern)
echo '{"tool_input":{"pattern":"src/**/*.ts"}}' | node .claude/hooks/glob-guard.js
echo $? # expect: 0
7. Build Test Script
Usage
bash scripts/build-test.sh # run all tests
bash scripts/build-test.sh --filter "Auth" # filter by pattern
bash scripts/build-test.sh --list # show detected project type
bash scripts/build-test.sh --ci # machine-readable output
bash scripts/build-test.sh --help # show usage
Supported Languages
| Language | Detected By | Test Command |
|----------|-------------|-------------|
| Swift (SPM) | Package.swift | swift test |
| Swift (Xcode) | *.xcworkspace / *.xcodeproj | xcodebuild test |
| Node (Vitest) | vitest.config.* or vitest in package.json | npx vitest run |
| Node (Jest) | jest.config.* or jest in package.json | npx jest |
| Python (pytest) | pyproject.toml, setup.py, pytest.ini | python3 -m pytest |
| Rust | Cargo.toml | cargo test |
| Go | go.mod | go test -race ./... |
| Java (Gradle) | build.gradle / build.gradle.kts | ./gradlew test |
| Java (Maven) | pom.xml | mvn test |
| C# (.NET) | *.sln / *.csproj | dotnet test |
| Ruby (RSpec) | Gemfile with rspec | bundle exec rspec |
| Ruby (Minitest) | Gemfile without rspec | bundle exec rake test |
Detection order: first match wins. The script also detects package managers (pnpm, bun) for Node projects.
Exit Codes
| Code | Meaning |
|------|---------|
| 0 | All tests passed |
| 1 | Tests failed |
| 2 | No project detected or missing tooling |
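A wrapper script can branch on these exit codes. In the sketch below, `run_tests` is a stub standing in for `bash scripts/build-test.sh` so the example is self-contained:

```shell
# Sketch: branching on the runner's exit codes. run_tests is a stub
# standing in for `bash scripts/build-test.sh`.
run_tests() { return 1; }   # pretend the suite failed

status=0
run_tests || status=$?
case "$status" in
  0) echo "all tests passed" ;;
  1) echo "tests failed" ;;
  2) echo "no project detected or tooling missing" ;;
esac
```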
CI Integration
# GitHub Actions example
- name: Run tests
  run: bash scripts/build-test.sh --ci
Adding a New Language
Edit scripts/build-test.sh:
- Add a detect_<language>() function
- Add it to the DETECTORS array
- The function should set LANG_NAME and TEST_CMD
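Following those three steps, a hypothetical detector for Elixir might look like this (the function name and commands are assumptions — adapt them to your toolchain and to the conventions already in build-test.sh):

```shell
# Hypothetical Elixir detector, following the detect_<language>() pattern.
detect_elixir() {
  if [ -f mix.exs ]; then
    LANG_NAME="Elixir"
    TEST_CMD="mix test"
    return 0
  fi
  return 1
}
# Remember to also append detect_elixir to the DETECTORS array.

# Quick sanity check in a scratch directory:
scratch=$(mktemp -d)
touch "$scratch/mix.exs"
cd "$scratch"
detect_elixir && echo "$LANG_NAME: $TEST_CMD"   # Elixir: mix test
```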
8. Spec Format
Spec Template
Create specs at docs/specs/<feature>/<feature>.md:
# Spec: <Feature Name>
**Created:** 2026-04-02
**Last updated:** 2026-04-02
**Status:** Draft | Active | Deprecated
## Overview
What this feature does, why it exists, who uses it. 2-3 sentences.
## Data Model
Entities, attributes, relationships (if applicable).
## Stories
### S-001: <Story name> (P0)
**Description:** [user story]
**Source:** [optional: ticket/issue ref]
**Acceptance Scenarios:**
AS-001: <short description>
- **Given:** [state]
- **When:** [action]
- **Then:** [expected]
- **Data:** [test data]
AS-002: <short description>
- **Given:** [error state]
- **When:** [action]
- **Then:** [error handling]
### S-002: <Story name> (P1)
AS-003: <short description>
- **Given:** [state]
- **When:** [action]
- **Then:** [expected]
### S-003: <Story name> (P2)
AS-004: <short description>
- [flow description + expected behavior]
## Constraints & Invariants
Rules that must always hold.
## Change Log
| Date | Change | Ref |
|------|--------|-----|
| 2026-04-02 | Initial creation | -- |
Skip sections that don't apply. Match depth to feature complexity.
Acceptance Scenario depth by priority:
- P0: Full Given + When + Then + Data + Setup. At least 1 happy path + 1 error path.
- P1: Given + When + Then. At least 1 happy path.
- P2: 1-2 line flow description. At least 1 scenario.
Snapshots (Version History)
When /mf-plan Mode C detects a Major change (new story, removed story, priority change, flow change, behavior change for P0, or constraint change), it automatically creates a snapshot before updating:
docs/specs/<feature>/snapshots/
2026-04-02.md ← full copy at that point in time
2026-04-05-BILL-101.md ← with ticket reference
Snapshots are immutable, managed by mf-plan (not developers), and capped at 5 most recent.
Naming Conventions
| Item | Convention | Example |
|------|-----------|---------|
| Spec directory | `docs/specs/<feature>/` | `docs/specs/user-auth/` |
| Spec file | `<feature>.md` in feature directory | `user-auth.md` |
| Story ID | `S-NNN`, sequential per spec | S-001, S-005 |
| Scenario ID | `AS-NNN`, sequential across all stories | AS-001, AS-042 |
| Priority | P0 (critical), P1 (important), P2 (nice-to-have) — per story | — |
| Snapshot | `YYYY-MM-DD.md` or `YYYY-MM-DD-<REF>.md` in `snapshots/` | `2026-04-02.md` |
9. Customization
Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `FILE_GUARD_THRESHOLD` | 200 | Max lines before file guard warns |
| `FILE_GUARD_EXCLUDE` | (empty) | Comma-separated globs to skip (e.g. `*.generated.swift`) |
| `PATH_GUARD_EXTRA` | (empty) | Additional pipe-separated patterns to block (e.g. `\.terraform`) |
| `SENSITIVE_GUARD_EXTRA` | (empty) | Additional pipe-separated patterns for sensitive files (e.g. `\.vault`) |
| `SELF_REVIEW_ENABLED` | true | Set to `false` to disable the self-review checklist on Stop |
Set these in your shell profile or project `.envrc` (if using direnv).
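For example, a project `.envrc` might tune the guards like this (the values shown are illustrative, not recommendations):

```shell
# Example .envrc tuning the guards; values are illustrative.
export FILE_GUARD_THRESHOLD=300                        # warn above 300 lines instead of 200
export FILE_GUARD_EXCLUDE="*.generated.swift,*.pb.go"  # skip generated files
export PATH_GUARD_EXTRA='\.terraform'                  # also block .terraform paths
export SELF_REVIEW_ENABLED=true                        # keep the Stop checklist on
```

With direnv, run `direnv allow` after editing so the values take effect.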
Extending CLAUDE.md
Add project-specific rules to `.claude/CLAUDE.md`:

```markdown
## Project-Specific Rules
- All API endpoints must have OpenAPI annotations
- Database migrations must be reversible
- UI components must support dark mode
- All strings must be localized via i18n keys
```

Adding Custom Skills
Create new skills in `.claude/skills/<name>/SKILL.md`:

```markdown
# .claude/skills/deploy/SKILL.md

Run the deployment pipeline:
1. /mf-review
2. /mf-commit
3. Run: bash scripts/deploy.sh $ARGUMENTS
4. Verify deployment health: curl -f https://api.example.com/health
```

Then use: `/deploy staging`
10. Token Cost Guide
| Activity | Tokens | Frequency |
|----------|--------|-----------|
| `/mf-build` (incremental, 1-3 files) | 5–10k | Every code chunk |
| `/mf-fix` (single bug) | 3–5k | As needed |
| `/mf-commit` | 2–4k | Every commit |
| `/mf-review` (diff-based) | 10–20k | Before merge |
| `/mf-plan` (new feature) | 20–40k | Start of feature |
| `/mf-challenge` (adversarial review) | 15–30k | After `/mf-plan`, complex features |
| Full audit (manual prompt) | 100k+ | Before release |
Minimizing Token Usage
- Test incrementally. `/mf-build` after each small chunk uses 5-10k. Waiting until everything is done and then running `/mf-build` on a large diff uses 50k+.
- Use filters. `/mf-build src/auth/login.ts` is cheaper than `/mf-build` on the whole project.
- Skip `/mf-plan` for tiny changes. Under 5 lines with no behavior change? Just `/mf-build` and `/mf-commit`.
- Use `/mf-review` only before merge. Not after every commit.
11. Troubleshooting
Hook not firing
Symptom: File guard or path guard doesn't trigger.
Check:
- Is `settings.json` valid? `node -e "JSON.parse(require('fs').readFileSync('.claude/settings.json','utf-8'))"`
- Are hooks executable? `ls -la .claude/hooks/`
- Is Node.js available? `node --version`
- Is `$CLAUDE_PROJECT_DIR` set? Check in Claude Code with: `echo $CLAUDE_PROJECT_DIR`
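The checks above can be bundled into a quick diagnostic you run once from the project root. This is an illustrative sketch, not part of the kit; adjust paths if your layout differs:

```shell
# One-shot hook diagnostic; prints a status line per check
# instead of stopping at the first failure.
check_claude_setup() {
  # 1. settings.json must be valid JSON (also fails if node is unavailable)
  if node -e "JSON.parse(require('fs').readFileSync('.claude/settings.json','utf-8'))" 2>/dev/null; then
    echo "settings.json: OK"
  else
    echo "settings.json: INVALID or missing (or node unavailable)"
  fi

  # 2. hooks directory must exist, and each hook must be executable
  if [ -d .claude/hooks ]; then
    echo "hooks dir: OK"
    for f in .claude/hooks/*; do
      if [ -f "$f" ] && [ ! -x "$f" ]; then
        echo "  not executable: $f"
      fi
    done
  else
    echo "hooks dir: MISSING"
  fi

  # 3. CLAUDE_PROJECT_DIR is set by Claude Code itself
  if [ -n "${CLAUDE_PROJECT_DIR:-}" ]; then
    echo "CLAUDE_PROJECT_DIR: $CLAUDE_PROJECT_DIR"
  else
    echo "CLAUDE_PROJECT_DIR: not set (normal outside Claude Code)"
  fi
}
check_claude_setup
```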
Tests not detected
Symptom: `build-test.sh` says "No supported project detected."
Check:
- Are you in the project root? `pwd`
- Does the project marker file exist? (e.g., `package.json`, `Cargo.toml`)
- Run `bash scripts/build-test.sh --list` for diagnostic output.
Wrong base branch
Symptom: `/mf-build` or `/mf-review` compares against the wrong branch.
Check: `git symbolic-ref refs/remotes/origin/HEAD`
If this is wrong or missing: `git remote set-head origin <your-main-branch>`
Path guard blocking a legitimate command
Symptom: Claude can't run a command you need.
Fix: The path guard blocks broad patterns. If you need to access `build/` for a specific reason, run the command directly in your terminal (not through Claude Code).
File guard warning on generated files
Fix: Set the exclude pattern:
`export FILE_GUARD_EXCLUDE="*.generated.swift,*.pb.go,*.min.js,*.snap"`

12. FAQ
Q: Do I need specs for every tiny change?
A: No. Changes under 5 lines with no behavior change can skip the spec. Just `/mf-build` and `/mf-commit`. The spec-first rule is for meaningful behavior changes.
Q: Can I use mocks in tests?
A: Only for external services you can't run locally (third-party APIs, email services). Never mock your own code or database just to make tests pass faster.
Q: What if Claude writes a test that tests the wrong thing?
A: This usually means the spec is ambiguous. Clarify the spec first, then re-run `/mf-build`. Good specs produce good tests.
Q: Can I use this with other AI coding tools?
A: The commands and hooks are Claude Code-specific. The specs, workflow, and `build-test.sh` work with any tool or manual workflow.
Q: When should I use `/mf-challenge`?
A: After `/mf-plan`, for complex features involving authentication, payments, data pipelines, or multi-service integration. It spawns parallel hostile reviewers that find security holes, failure modes, and false assumptions BEFORE you write code. Skip it for simple CRUD or small features — the overhead isn't worth it.
Q: How do I do a full coverage audit?
A: This is intentionally not a command (it's expensive and rare). When needed, prompt Claude directly: "Audit test coverage for feature X against `docs/specs/X/X.md` acceptance scenarios. Identify gaps and write missing tests."
Q: What if my project uses multiple languages?
A: `build-test.sh` detects the first match. For monorepos, you may need to run it from each sub-project directory or customize the script.
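For the monorepo case, a small wrapper can drive the runner across sub-projects. This is a sketch, and the example paths are hypothetical:

```shell
# Run the universal test runner in each sub-project of a monorepo.
# Pass sub-project directories as arguments; stops on the first failure.
run_all() {
  for dir in "$@"; do
    echo "==> $dir"
    ( cd "$dir" && bash scripts/build-test.sh ) || return 1
  done
}

# Example (hypothetical paths):
# run_all services/api web/frontend
```

Each sub-project runs in a subshell, so the working directory of your main shell is unchanged afterwards.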
Q: Can I add more skills?
A: Yes. Create `.claude/skills/<name>/SKILL.md` and it becomes available as a slash command. See Customization.
Q: How do I update the kit in existing projects?
A: Run `npx claude-devkit-cli upgrade`. It automatically detects which files you've customized and only updates unchanged files. Use `--force` to overwrite everything.
Q: I installed with the old `setup.sh` — how do I migrate?
A: Run `npx claude-devkit-cli init --adopt .` to generate a manifest from your existing files without overwriting anything. Future upgrades will then work normally.
