@kiwidata/grimoire
v0.1.3
Gherkin + MADR spec-driven development for AI coding assistants
Grimoire
Spec-driven AI development framework. Encodes decades of software engineering discipline — requirements, design review, TDD, change management, traceability — into AI coding workflows so they can't be skipped.
Your request → Gherkin spec → Implementation plan → Red-green BDD → Verified, auditable code
Why Grimoire
The software industry spent decades learning hard lessons about building reliable systems. AI coding agents have abandoned most of these practices, hoping LLMs will magically produce correct code without discipline. They don't — AI-generated code has 1.7x more bugs, 76% of LLM refactoring suggestions are hallucinations, and developers using AI are 19% slower while believing they're faster.
Grimoire adds the missing discipline:
- Specs before code — every behavior is a Gherkin .feature file that doubles as an executable acceptance test
- Plans before implementation — concrete task lists with exact file paths, not "implement the feature"
- Tests that actually test — mandatory red-green BDD cycle with assertion quality checks
- Codebase knowledge without exploration — area docs, data schemas, and symbol maps so the AI doesn't waste context reading files
- Full audit trail — every commit traces back to a requirement via git trailers
- Architecture decisions on record — MADR decision records so the AI doesn't re-litigate choices
Works with any AI coding agent that reads AGENTS.md: Claude Code, Cursor, Codex, Windsurf, Cline, Aider, and more.
Install
npm install -g @kiwidata/grimoire
Requires Node.js 20+ and git.
To install from source instead:
git clone https://github.com/kiwi-data/grimoire.git
cd grimoire
npm install
npm run build
npm link # makes `grimoire` available globally
grimoire --version # should print the installed version, e.g. 0.1.3
To update after pulling new changes:
cd /path/to/grimoire
git pull
npm run build
cd /path/to/your-project
grimoire update # refreshes AGENTS.md + skills to latest
To unlink: npm unlink -g @kiwidata/grimoire
Quick Start
cd my-project
grimoire init # Auto-detect tools, configure checks, install skills
grimoire map # Snapshot codebase structure into .grimoire/docs/
Then talk to your AI assistant:
You: "Users should be able to log in with 2FA"
→ /grimoire:draft Creates login.feature with Given/When/Then scenarios
→ /grimoire:plan Generates tasks: write step defs, then production code
→ /grimoire:review (optional) Product, security, and engineering review
→ /grimoire:apply Implements with strict red-green BDD
→ /grimoire:verify Confirms all scenarios pass, no regressions
→ grimoire archive Syncs to baseline, archives manifest
→ grimoire pr Generates PR description from artifacts
grimoire init runs an interactive setup that auto-detects your project's tools and asks for your preferences on commit style, doc generator, AI agents, security tools, and compliance frameworks (OWASP, PCI-DSS, HIPAA, SOC2, GDPR, ISO 27001). It creates:
- AGENTS.md — workflow instructions read by AI coding assistants
- .grimoire/config.yaml — tool configuration and check pipeline
- .grimoire/ — decisions, docs, change tracking, archive directories
- features/ — where Gherkin specs live
- .claude/skills/ — Claude Code skill definitions (ignored by other agents)
- .git/hooks/pre-commit — runs grimoire check before commits
Use grimoire init --no-detect to skip interactive tool detection. Most unconfigured steps are skipped, but security, dep_audit, secrets, and best_practices have built-in LLM fallbacks that run automatically — every project gets baseline security scanning out of the box.
Workflow
1. Draft — Define what you're building
Grimoire routes your request to the right format:
- "Users should be able to log in with 2FA" → Gherkin feature
- "We should use PostgreSQL instead of MySQL" → MADR decision record
- "The login page is broken" → /grimoire:bug (reproduce first, then fix)
- "A tester found a problem" → /grimoire:bug-report → /grimoire:bug-triage → routed fix
Produces .feature files (with security tags like @security, @auth, @pii, @pci-dss when applicable), decision records, data.yml for schema changes, and a manifest tracking the change.
2. Plan — Generate concrete tasks
Every scenario becomes a pair: write the step definition (test), then write the production code. Tasks reference exact file paths, exact assertions, and real patterns from area docs. Data changes (models, migrations) are ordered before feature code.
The plan skill reads .grimoire/docs/ to find reusable utilities, coding patterns, and where new code should go — so the AI plans with real codebase knowledge, not guesses.
3. Review — Multi-perspective design review (optional)
Five personas validate the change before any code is written:
- Product manager — completeness, missing edge cases, unclear requirements
- Senior engineer — simplicity, code reuse, architecture fit, task quality
- Security engineer — STRIDE threat analysis, OWASP Top 10 / CWE classification, compliance verification (PCI-DSS, HIPAA, GDPR, SOC2 when configured), input validation, auth boundaries, vulnerable dependencies, secrets
- QA engineer — testability, negative scenarios, edge cases, observability, regression risk
- Data engineer — schema design, migration safety, index coverage (when data.yml present)
Issues flagged as blocker or suggestion. Security findings tagged with OWASP category and CWE ID. Skip for small/low-risk changes.
4. Apply — Build with strict red-green BDD
For each task:
- Write the step definition (test)
- Run it — must fail (red). A test that passes immediately is broken.
- Write production code
- Run it — must pass (green)
- Test quality check — verify strong assertions, not assert True
- Mark done, move to next task
Session management: Each task (or group of 2-3) runs in a fresh subagent to avoid context bloat. tasks.md is the coordination mechanism — if the session is interrupted, the next agent picks up where you left off.
Stuck detection: After 3 failed attempts with different approaches on a single task, the agent stops and asks for help instead of looping.
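The per-task loop and the three-attempt limit described above can be sketched in Python. This is only an illustration under stated assumptions, not Grimoire's implementation: `red_green`, `RedPhaseError`, `StuckError`, and the two callables are hypothetical names, and "different approaches" is simplified to passing the attempt number to `implement`.

```python
from typing import Callable


class RedPhaseError(RuntimeError):
    """The new test passed before any production code was written."""


class StuckError(RuntimeError):
    """The attempt budget ran out without reaching green."""


def red_green(
    run_test: Callable[[], bool],
    implement: Callable[[int], None],
    max_attempts: int = 3,
) -> int:
    """Run one red-green cycle and return the attempt number that went green.

    run_test  -- hypothetical hook: returns True when the step definition passes
    implement -- hypothetical hook: writes production code for the given attempt
    """
    # Red: the test must fail first, otherwise it proves nothing.
    if run_test():
        raise RedPhaseError("test passed before implementation")

    # Green: try a bounded number of implementations.
    for attempt in range(1, max_attempts + 1):
        implement(attempt)
        if run_test():
            return attempt

    # Stuck: stop and ask for help instead of looping.
    raise StuckError(f"still red after {max_attempts} attempts")
```

A caller would catch `StuckError` and hand the task back to a human, which mirrors the "stops and asks for help" behavior in the text.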
5. Verify — Confirm everything works
- Completeness — all tasks done
- Correctness — every scenario has a step definition with real assertions
- Coherence — architecture decisions are followed
- Test quality — flags weak assertions (assert True, toBeDefined()), empty bodies, tautological tests
- Security compliance — verifies plan-stage security patterns were followed (parameterized queries, bcrypt, no hardcoded secrets), checks review blockers were addressed, runs OWASP Top 10 surface scan on the diff, validates security-tagged scenarios (@security, @auth, @pii, @pci-dss, etc.)
- Dead features — specs that exist but code no longer implements
6. PR & Archive
grimoire pr generates a PR description from manifests, features, decisions, and task progress. The optional --review flag runs an LLM review of the actual diff, and --create opens the PR via gh or glab.
grimoire archive syncs features to baseline, accepts decisions, updates data schema, and archives the manifest.
Walkthrough
Full grimoire cycle end-to-end — adding two-factor authentication to an existing login feature.
Draft
You: "Users should verify their identity with a TOTP code after entering their password"
The AI runs /grimoire:draft and produces:
.grimoire/changes/add-2fa-login/
├── manifest.md # Why, what's changing, scope
├── features/
│   └── auth/
│       └── login.feature # Updated with 2FA scenarios
└── decisions/
    └── 0003-totp-library.md # Chose pyotp over django-otp
login.feature:
Feature: Login with two-factor authentication
  As a user
  I want to verify my identity with a second factor
  So that my account is protected from unauthorized access

  Background:
    Given I am a registered user with 2FA enabled

  Scenario: Successful login with valid TOTP code
    Given I have entered valid credentials
    When I enter a valid TOTP code
    Then I should be redirected to the dashboard
    And my session should be marked as fully authenticated

  Scenario: Login rejected with expired TOTP code
    Given I have entered valid credentials
    When I enter an expired TOTP code
    Then I should see an error message "Code expired"
    And I should remain on the verification page

  Scenario: Login rejected with invalid TOTP code
    Given I have entered valid credentials
    When I enter an invalid TOTP code
    Then I should see an error message "Invalid code"
    And I should remain on the verification page
You review and approve. Manifest status: draft → approved.
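To make the three scenarios concrete, here is a standard-library sketch of how a submitted code could be sorted into valid, expired, or invalid. The walkthrough itself chooses pyotp; this version only follows RFC 6238 with HMAC-SHA1 so that it runs without dependencies. The function names, the 30-second step, the lookback window, and the rule that "expired" means "matches a recent past time step" are all assumptions made for illustration.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for the given Unix time."""
    counter = unix_time // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation, as in RFC 4226
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def classify_code(
    secret: bytes, code: str, now: int, step: int = 30, lookback_steps: int = 10
) -> str:
    """Return 'valid', 'expired', or 'invalid' for a submitted code."""
    if hmac.compare_digest(code, totp(secret, now, step)):
        return "valid"
    # Assumption: a code from one of the last few time steps counts as expired.
    for age in range(1, lookback_steps + 1):
        past = now - age * step
        if past < 0:
            break
        if hmac.compare_digest(code, totp(secret, past, step)):
            return "expired"
    return "invalid"
```

The step definitions for the "Code expired" and "Invalid code" scenarios could assert on these three outcomes, whichever TOTP library the production code ends up using.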
Plan
The AI runs /grimoire:plan, reads the approved features + area docs + data schema, and generates tasks.md:
# Tasks: add-2fa-login
> **Change**: Add TOTP-based 2FA to login
> **Features**: auth/login.feature
> **Decisions**: 0003-totp-library.md
> **Test command**: `pytest tests/ -k "auth"`
> **Status**: 0/8 tasks complete
## 1. Data Layer
- [ ] 1.1 Add `totp_secret` field to User model in `src/models/user.py`
  - CharField, nullable, 32 chars
  - Migration: add nullable column (safe for live DB)
- [ ] 1.2 Run `python manage.py makemigrations && python manage.py migrate`
## 2. TOTP Verification
- [ ] 2.1 Write step defs in `tests/step_defs/test_auth.py` for scenario: "Successful login with valid TOTP code"
  - Given: create user with 2FA enabled, generate valid TOTP secret
  - When: POST to `/verify-totp/` with valid code from `pyotp.TOTP(secret).now()`
  - Then: assert response redirects to `/dashboard/` (status 302)
  - Then: assert session `is_2fa_verified` is True
- [ ] 2.2 Implement `VerifyTOTPView` in `src/views/auth.py`
## 3. Error Cases
- [ ] 3.1 Write step defs for scenario: "Login rejected with expired TOTP code"
- [ ] 3.2 Write step defs for scenario: "Login rejected with invalid TOTP code"
## 4. Verification
- [ ] 4.1 Run `pytest tests/ -k "auth"` — all new scenarios green
- [ ] 4.2 Run `pytest tests/` — no regressions in existing tests
Apply
The AI runs /grimoire:apply, creates branch feat/add-2fa-login, and works through tasks:
Task 1.1: Add totp_secret field to User model
→ Editing src/models/user.py
✓ Done
Task 2.1: Write step defs for "Successful login with valid TOTP code"
→ Writing tests/step_defs/test_auth.py
→ Running: pytest tests/ -k "successful_login_with_valid_totp"
✗ FAILED (red) — good, test correctly fails without production code
Task 2.2: Implement VerifyTOTPView
→ Writing src/views/auth.py
→ Running: pytest tests/ -k "successful_login_with_valid_totp"
✓ PASSED (green)
→ Test quality check: assert response.status_code == 302 ✓ strong assertion
...continues through all tasks...
Task 4.2: Full regression check
→ Running: pytest tests/
✓ 47 passed, 0 failed
Every commit includes Change: add-2fa-login trailer. tasks.md is updated in real time.
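Because tasks.md is the coordination point between sessions, a resuming agent mainly needs two things from it: the progress count and the first unchecked task. The sketch below shows one way to read them, assuming the checkbox layout used in the walkthrough (`- [ ] 1.1 ...`). The function names are hypothetical and this is not Grimoire's parser.

```python
import re
from typing import Optional

# Matches top-level task lines such as "- [ ] 2.1 Write step defs ...".
TASK_LINE = re.compile(r"^- \[(?P<mark>[ xX])\] (?P<id>\d+\.\d+) (?P<title>.+)$")


def parse_tasks(markdown: str) -> list[tuple[str, str, bool]]:
    """Return (task_id, title, done) for every checkbox task line."""
    tasks = []
    for line in markdown.splitlines():
        match = TASK_LINE.match(line)
        if match:
            tasks.append((match["id"], match["title"], match["mark"] != " "))
    return tasks


def next_task(tasks: list[tuple[str, str, bool]]) -> Optional[str]:
    """Return the id of the first unfinished task, or None when all are done."""
    for task_id, _title, done in tasks:
        if not done:
            return task_id
    return None


def status_line(tasks: list[tuple[str, str, bool]]) -> str:
    """Format progress in the same shape as the Status header in tasks.md."""
    done = sum(1 for _id, _title, is_done in tasks if is_done)
    return f"{done}/{len(tasks)} tasks complete"
```

Indented detail bullets under a task do not match the pattern, so only the numbered checkbox lines are counted.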
Verify
The AI runs /grimoire:verify:
# Verification Report: add-2fa-login
## Summary
- Scenarios verified: 3
- Decisions verified: 1
- Issues found: 0 critical, 1 suggestion
## Verified Scenarios
- [x] "Successful login with valid TOTP code" — step def in test_auth.py:42
- [x] "Login rejected with expired TOTP code" — step def in test_auth.py:67
- [x] "Login rejected with invalid TOTP code" — step def in test_auth.py:85
## Suggestions
- Consider adding a rate-limiting scenario for repeated failed TOTP attempts
Recommendation: Ready to archive.
PR & Archive
grimoire pr --create # Creates PR via gh with full description
grimoire archive add-2fa-login # Syncs features, accepts decision, archives manifest
The feature files move to features/auth/login.feature (baseline). The decision moves to .grimoire/decisions/0003-totp-library.md with status accepted. The manifest is archived to .grimoire/archive/.
grimoire trace src/views/auth.py:42 now shows: commit abc123 → Change: add-2fa-login → features: auth/login.feature → decision: 0003-totp-library.md.
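The trace chain starts from the trailer on the commit. As a rough illustration of that first hop, the sketch below pulls `Key: value` trailers out of a commit message. It is a simplification and does not show how grimoire trace works: the helper name is hypothetical and the parsing rule (trailers are the last paragraph, keys contain no spaces) is an assumption. Git's own `git interpret-trailers --parse` handles the full trailer rules.

```python
def parse_trailers(message: str) -> dict[str, str]:
    """Extract 'Key: value' trailers from the last paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        # A subject-only message has no trailer block.
        return {}
    trailers = {}
    for line in paragraphs[-1].splitlines():
        key, sep, value = line.partition(": ")
        if sep and key and " " not in key:
            trailers[key] = value.strip()
    return trailers
```

With the walkthrough's commit, the returned mapping would contain `Change` mapped to `add-2fa-login`, which is the key needed to look up the archived manifest.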
Tester hits a failure during exploratory checkout. Developer reproduces, classifies, fixes, hands back for verification.
Report
Tester runs Playwright against staging and a checkout step fails. They run /grimoire:bug-report and paste the Playwright output (or hand it via the Playwright MCP):
You: /grimoire:bug-report
[pastes Playwright failure: timeout on #place-order, screenshot, trace.zip]
The skill scans features/checkout/*.feature for matching scenarios, references the affected spec, and writes a structured report:
.grimoire/bugs/0042-place-order-timeout/
├── report.md # Reproduction steps, env, severity, spec refs
└── artifacts/
    ├── screenshot.png
    └── trace.zip
report.md lists: failing scenario (features/checkout/place-order.feature:23), exact steps, expected vs actual, env (browser, build SHA), and a confidence note (high — Playwright trace shows network 504 from /api/orders).
Triage
Developer picks it up, runs /grimoire:bug-triage 0042. The skill classifies into one of 8 categories (code, infra, config, data, third-party, security, docs, not-a-bug) and routes:
Bug 0042: place-order timeout
Category: CODE (small)
Root cause hypothesis: missing timeout on outbound payment-provider call
Spec coverage: place-order.feature covers happy path; no timeout scenario
Route: /grimoire:bug (reproduce-first fix in current repo)
Suggested feature gap: add "payment provider unavailable" scenario
For INFRA/CONFIG it would emit a ticket stub for the platform team. For SECURITY it routes to the restricted workflow with confidential handling.
Fix
Developer runs /grimoire:bug 0042. Reproduce-first discipline:
1. Write failing test reproducing the bug
→ tests/checkout/test_place_order.py::test_payment_timeout
→ pytest -k test_payment_timeout
✗ FAILED — reproduces the timeout
2. Add timeout + retry to PaymentClient.charge()
→ src/checkout/payment.py
→ pytest -k test_payment_timeout
✓ PASSED
3. Full regression
→ pytest tests/checkout/
✓ 31 passed
The skill also drafts the missing scenario into features/checkout/place-order.feature (under a # pending tester sign-off comment) and appends a tester verification checklist to report.md:
.grimoire/bugs/0042-place-order-timeout/report.md (verification section)
- [ ] Original Playwright scenario passes against the fix branch
- [ ] New "payment provider unavailable" scenario passes
- [ ] No regression in existing checkout suite
Commit trailer: Bug: 0042-place-order-timeout. Tester runs through the checklist, marks complete, and the bug archives alongside the change.
Reviewing PR #312 from a teammate. Run /grimoire:pr-review 312 (or paste the PR URL).
The skill fetches the diff via gh pr view 312 --json + gh pr diff 312, loads relevant area docs and feature files, and runs the multi-persona lens (PM, engineer, security, QA, data — same set as /grimoire:review on outgoing changes):
PR #312: Add bulk export endpoint
Spec coverage: features/exports/bulk-export.feature ✓ (3 scenarios)
Decisions referenced: 0021-export-pagination.md ✓
PM lens ⚠ scope drift — diff also touches user-search; not in PR description
Engineering lens ✗ N+1 in src/exports/serializer.py:48 (loop calls user.profile)
Security lens ✗ no rate limit on /api/exports/bulk — DoS risk (CWE-770)
QA lens ⚠ no scenario for partial-failure path (some rows succeed, some fail)
Data lens ✓ schema unchanged
Output is structured Markdown ready to paste as a PR comment, or wired through gh pr comment 312 --body-file review.md. Each finding includes file:line, severity, and a suggested change — same format the post-implementation review uses on your own diffs (grimoire pr --review), so reviewers and authors share one mental model.
Scope & Boundaries
Grimoire owns the inner loop — the Dev and Sec portions of DevSecOps. Ops is explicitly out of scope.
What Grimoire covers
| Area | What it does | How |
|---|---|---|
| Requirements engineering | Gherkin specs as executable acceptance tests | Draft skill |
| Architecture decisions | MADR records with cost-of-ownership | Draft skill |
| Design review | Multi-persona review before code is written | Review skill |
| Test-driven development | Strict red-green BDD enforcement | Apply skill |
| Test quality | Static analysis for weak/empty/tautological tests | grimoire test-quality, verify skill |
| Regression prevention | All existing tests must pass; regressions block completion | Apply + verify skills |
| Change management | Manifests, task tracking, session resumption, archive | Full lifecycle |
| Traceability | Every commit → change → feature → decision | grimoire trace |
| Security review | STRIDE threat modeling, OWASP/CWE tagging at design time | Review + plan + verify skills |
| Security tooling | SAST, SCA, secrets scanning in pre-commit pipeline | grimoire check |
| Bug discipline | Reproduce-first fixes, structured triage, confidential security handling | Bug workflow skills |
| Exploratory testing | Gap analysis, coverage mapping, charter-based sessions | Bug-explore + bug-session skills |
| Tech debt tracking | Structured debt register with severity and formal exceptions | Refactor skill |
| CI integration | Spec validation + checks + test quality with GHA annotations | grimoire ci |
What Grimoire does not cover
Ops is out of scope. The outer loop — deploy, run, monitor, scale — requires infrastructure and environment management that a repo-local framework cannot own:
- Deployment automation — CD pipelines, environment promotion, rollback, blue-green/canary deploys
- Integration and e2e testing — need running services, realistic data, and production-like infrastructure
- Performance and load testing — requires dedicated infrastructure and load generators
- Monitoring and observability — APM, alerting, SLOs, incident response tooling
- Infrastructure as code — Terraform, Pulumi, Kubernetes manifests
- Feature flags and progressive rollout
Grimoire captures environment context (.grimoire/docs/context.yml) so the AI understands deployment topology, and the review skill flags when changes need integration or performance testing. But orchestrating those tests is platform work, not framework work.
Security model
Grimoire's security capabilities are AI-mediated at design time, not static analysis enforcement at build time. The review skill runs STRIDE threat modeling, the plan skill mandates proven security patterns (OAuth2, bcrypt, parameterized queries), and the verify skill checks that guidance was followed. The check pipeline runs SAST/SCA/secrets tools when configured.
This means security coverage depends on: (1) configuring the right tools in your check pipeline, and (2) the AI following its own instructions. Projects that run grimoire init with detection get solid defaults. Projects that skip detection should configure tools.security, tools.dep_audit, and tools.secrets in .grimoire/config.yaml.
Grimoire does not provide compliance framework enforcement (OWASP ASVS checklists, CWE mapping), SBOM generation, artifact signing, or DAST. These require dedicated security tooling.
Features
Codebase Intelligence
grimoire map # Structural snapshot (.grimoire/docs/.snapshot.json)
grimoire map --refresh # Diff against existing docs, show gaps
grimoire map --duplicates # Run jscpd duplicate detection
grimoire map --depth <n> # Max directory depth to scan (default 4)
Snapshots the directory layout, language mix, and per-area metrics so area docs and plans don't have to re-explore the tree. No native dependencies.
For richer intelligence (call graphs, data flow tracing, dependency analysis), grimoire integrates with codebase-memory-mcp. grimoire init offers to install it.
Area Docs & Data Schema
grimoire map + /grimoire:discover generate docs in .grimoire/docs/:
- Purpose and boundaries of each module
- Key files with responsibilities
- Reusable code inventory — exact function names, file paths, line numbers
- Naming conventions, structural patterns, where new code goes
.grimoire/docs/data/schema.yml captures your data layer — SQL tables, document collections, external API contracts — so the AI reads this instead of model files.
grimoire docs generates a browsable .grimoire/docs/OVERVIEW.md project summary.
Pre-Commit Pipeline
grimoire check
lint ✓ passed (0.8s)
format ✓ passed (0.3s)
duplicates ✓ passed (1.2s)
complexity ✓ passed (0.5s)
unit_test ✓ passed (3.4s)
bdd_test ✓ passed (2.1s)
security ✓ passed (12.1s)
dep_audit ✓ passed (1.0s)
secrets ✓ passed (0.4s)
best_practices ✓ passed (8.2s)
10 passed, 0 failed, 0 skipped
Tools are auto-detected during grimoire init. Any tool can use name: llm with a prompt: for AI-powered review. grimoire init also sets up enforcement hooks for Claude Code (.claude/hooks.json) and git (.git/hooks/pre-commit).
Test Quality
grimoire test-quality # Analyze all test files
grimoire test-quality tests/** # Specific files
Static analysis catches weak tests: empty bodies, missing assertions, weak assertions (assert True, toBeDefined()), tautological tests. Supports Python and JS/TS. Integrated into apply (per-task gate) and verify (test intelligence).
Bug Workflow
Tester finds issue → /grimoire:bug-report → structured report with spec references
                                         ↓
Developer picks it up → /grimoire:bug-triage → classify root cause
                                         ↓
       ┌────────────────┬────────────────┼──────────────────┐
       ↓                ↓                ↓                  ↓
 CODE (small)      CODE (big)      INFRA/CONFIG         SECURITY
 /grimoire:bug     → draft         route to team     confidential fix
 (repro → fix      manifest        (create ticket)   (restricted workflow)
 → tester          stub)
 checklist)
Bug reports accept output from testing tools (Playwright, Cypress, Postman, k6) via MCP or pasted directly — auto-extracting failed assertions, screenshots, and reproduction steps.
Triage classifies into 8 categories (code, infrastructure, configuration, data, third-party, security, documentation, not-a-bug) and routes to the right team. Security issues follow a restricted workflow with confidential handling.
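The routing step can be pictured as a small lookup over the eight categories. The sketch below encodes only the routes this README describes (small and large code fixes, infrastructure or configuration tickets, and the restricted security workflow); the function name, the `size` parameter, and the returned labels are hypothetical, and the remaining categories return a placeholder because their routes are not spelled out here.

```python
from typing import Optional

CATEGORIES = (
    "code", "infrastructure", "configuration", "data",
    "third-party", "security", "documentation", "not-a-bug",
)


def route(category: str, size: Optional[str] = None) -> str:
    """Return a routing label for a triaged bug (illustrative only)."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    if category == "code":
        # Small fixes go straight to the reproduce-first bug skill.
        return "/grimoire:bug" if size == "small" else "draft manifest"
    if category in ("infrastructure", "configuration"):
        return "ticket stub for owning team"
    if category == "security":
        return "restricted workflow"
    return "route not described in this README"
```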
Bug fixes (/grimoire:bug) follow reproduce-first discipline and generate a tester verification checklist.
Exploratory testing (/grimoire:bug-explore) operates in tester mode (spec-only gap analysis), developer mode (code-level analysis), and onboard mode (tester's guide).
Testing sessions (/grimoire:bug-session) provide charter-based exploratory testing with progress tracking, inline bug filing, and structured debrief.
Audit Trail
Every commit includes a Change: git trailer linking code → commit → change → feature → decision.
grimoire trace src/auth.py:42 # What requirement introduced this line?
grimoire log --from v1.0 # Release notes from archived changes
Project Health
grimoire health
features 100% ██████████ 12 scenarios in 5 files
decisions 89% █████████░ 8/9 current
area docs 75% ████████░░ 6/8 areas documented
data schema 100% ██████████ 4 models documented
test coverage 60% ██████░░░░ 3/5 features have step definitions
unit coverage 82% █████████░ 82% line coverage
duplicates — 2 clones detected
complexity — no high-complexity functions
Overall 84% █████████░
Contract Testing
The plan, apply, and verify skills enforce a contract-first approach for external APIs:
- Mock at the HTTP boundary only — never mock internal code or client wrappers
- Fixtures must match schema.yml — test data mirrors the documented API contract
- Contract drift detection — verify flags when external API changes don't have matching test updates
- Client code reads only documented fields — prevents coupling to undocumented API behavior
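The "fixtures must match the schema" rule boils down to a set comparison between the documented fields and the fields a test fixture actually carries. The sketch below shows that comparison in isolation. The layout of schema.yml is not described in this README, so the documented fields are passed in as a plain collection, and the function name and result keys are hypothetical.

```python
from typing import Iterable, Mapping


def fixture_drift(documented_fields: Iterable[str], fixture: Mapping[str, object]) -> dict[str, list[str]]:
    """Compare a fixture's top-level keys against the documented contract."""
    documented = set(documented_fields)
    present = set(fixture)
    return {
        # Documented fields the fixture forgot to include.
        "missing": sorted(documented - present),
        # Fields the fixture invents that the contract never promised.
        "undocumented": sorted(present - documented),
    }
```

An empty result on both keys means the fixture mirrors the contract; anything under `undocumented` is a hint that client code may be coupling to behavior the API never promised.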
Caveman Mode
Token optimization for context-constrained agents. Set project.caveman in .grimoire/config.yaml:
| Level | Effect |
|-------|--------|
| none | Full AGENTS.md instructions (default) |
| lite | Trimmed explanations, same workflow |
| full | Minimal instructions, experienced users |
| ultra | Bare-minimum workflow skeleton |
Conflict Detection
grimoire list detects when multiple active changes modify the same feature file and flags the conflict.
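The underlying check is an inverted index from feature file to the changes that touch it. The sketch below assumes the caller has already collected, for each active change, the feature paths it modifies; how grimoire list gathers that information is not shown here, and the function name is hypothetical.

```python
from collections import defaultdict


def find_conflicts(changes: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return feature files touched by more than one active change."""
    touched_by = defaultdict(set)
    for change_id, feature_files in changes.items():
        for path in feature_files:
            touched_by[path].add(change_id)
    return {
        path: sorted(ids)
        for path, ids in touched_by.items()
        if len(ids) > 1
    }
```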
Debt Register
The refactor skill maintains .grimoire/debt-register.yml — a persistent record of tech debt items with severity, Fowler quadrant classification (deliberate/inadvertent × prudent/reckless), fingerprint-based dedup, and aging signals. Formal exceptions live in .grimoire/debt-exceptions.yml with optional expiry dates.
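Fingerprint-based dedup generally means hashing the stable parts of a finding so that the same issue, rediscovered in a later run, maps to the same register entry. The sketch below illustrates the idea. The fields that go into the fingerprint (`kind`, `path`, `symbol`), the `seen` counter, and both function names are assumptions, not the format of debt-register.yml.

```python
import hashlib


def fingerprint(kind: str, path: str, symbol: str) -> str:
    """Derive a short, stable id from the parts of a finding that rarely change."""
    normalized = "|".join(part.strip() for part in (kind, path, symbol))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


def add_item(register: dict[str, dict], item: dict) -> bool:
    """Insert a debt item, or bump its counter if it is already registered.

    Returns True when a new entry was created.
    """
    key = fingerprint(item["kind"], item["path"], item["symbol"])
    if key in register:
        # Same issue seen again: record the repeat instead of duplicating it.
        register[key]["seen"] += 1
        return False
    register[key] = {**item, "seen": 1}
    return True
```

Leaving line numbers out of the fingerprint is deliberate in this sketch, since they shift whenever unrelated code above the finding changes.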
Multi-LLM Support
Grimoire works with any AI coding assistant that reads AGENTS.md (open standard, 60K+ repos):
- Claude Code — skills in .claude/skills/, hooks via .claude/hooks.json
- OpenCode — skills in .opencode/skills/ (also reads .claude/skills/ natively)
- Codex (OpenAI) — skills in .agents/skills/
- Cursor — .cursor/rules/grimoire.mdc (AGENTS.md derivative)
- GitHub Copilot — .github/copilot-instructions.md (AGENTS.md derivative)
- Windsurf, Cline, Aider, etc. — read AGENTS.md for workflow instructions
grimoire init prompts for which agents you use and installs skills to the correct path(s) for each. You can also pass --agent to select non-interactively:
grimoire init --agent claude --agent opencode # skills to both dirs
grimoire init --agent cursor # .cursor/rules/grimoire.mdc
grimoire init --agent copilot # .github/copilot-instructions.md
Reference
| Skill | Purpose |
|-------|---------|
| /grimoire:draft | Draft features and/or decisions collaboratively |
| /grimoire:plan | Generate detailed implementation tasks from specs |
| /grimoire:review | Multi-perspective design review (PM, engineer, security, QA, data) |
| /grimoire:apply | Execute tasks with strict red-green BDD |
| /grimoire:verify | Post-implementation verification + test quality |
| /grimoire:audit | Discover undocumented features and decisions |
| /grimoire:remove | Tracked feature removal with impact assessment |
| /grimoire:discover | Generate area docs and data schema from codebase |
| /grimoire:refactor | Find, prioritize, and track tech debt |
| /grimoire:bug | Disciplined bug fix with reproduction test first |
| /grimoire:bug-report | Structured bug reporting (accepts test tool output) |
| /grimoire:bug-triage | Classify and route bug reports |
| /grimoire:bug-explore | AI-guided exploratory testing and gap analysis |
| /grimoire:bug-session | Charter-based exploratory testing sessions |
| /grimoire:branch-guard | Enforce branch hygiene before starting new feature work (also wired as a hook) |
| /grimoire:commit | Contextual commit messages with change trailers |
| /grimoire:pr | Generate PR description + optional diff review |
| /grimoire:pr-review | Review a teammate's PR with the multi-persona lens |

| Command | Description |
|---------|-------------|
| grimoire init [path] | Initialize grimoire (auto-detects tools, installs skills, sets up hooks) |
| grimoire init --agent <type> | Add agent (claude/opencode/codex/cursor/copilot, repeatable) |
| grimoire init --skip-agents | Skip generating AGENTS.md instructions |
| grimoire init --skip-skills | Skip installing skills for selected agents |
| grimoire init --no-detect | Skip auto-detection of project tools |
| grimoire init --install-codebase-memory-mcp | Mark codebase-memory-mcp as a recommended integration |
| grimoire init --install-caveman-plugin | Mark caveman skill plugin as a recommended integration |
| grimoire update [path] | Update AGENTS.md, skills, and hooks to latest version |
| grimoire update --skip-agents\|--skip-skills\|--skip-hooks\|--skip-templates\|--skip-config | Skip parts of the update |
| grimoire update --force-templates | Overwrite existing template files |
| grimoire list | List active changes (with conflict detection) |
| grimoire list --features | List feature files |
| grimoire list --decisions | List decision records |
| grimoire status <id> | Show change status, branch, and task progress |
| grimoire validate [id] | Validate features, decisions, and manifests |
| grimoire validate --strict | Enable strict validation |
| grimoire archive <id> [-y] | Archive a completed change (-y skips confirmation) |
| grimoire map | Structural codebase scan |
| grimoire map --duplicates | Run jscpd duplicate detection |
| grimoire map --refresh | Diff against existing docs, show gaps |
| grimoire map --depth <n> | Max directory depth to scan (default 4) |
| grimoire check [steps...] | Run pre-commit pipeline |
| grimoire ci | Run CI pipeline |
| grimoire ci --setup | Generate .github/workflows/grimoire.yml template |
| grimoire ci --annotations | Output GitHub Actions annotations |
| grimoire ci --skip <steps...> | Skip specific check steps |
| grimoire pr [id] | Generate PR description from change artifacts |
| grimoire pr --create | Create PR via gh/glab |
| grimoire pr --review | Run post-implementation LLM review of diff |
| grimoire test-quality [files] | Analyze test files for quality issues |
| grimoire log [--from <ref>] [--to <ref>] | Generate change log / release notes |
| grimoire trace <file[:line]> | Trace file to originating grimoire change |
| grimoire diff <id> | Compare proposed change specs against the baseline |
| grimoire docs [-o <path>] | Generate human-readable project overview |
| grimoire health | Project health score |
| grimoire health --badges <file> | Write shields.io badges into a file (e.g., README.md) |
| grimoire branch-check | Branch-guard check (used by hook; --hook, --prompt <text>) |
Most commands support --json for machine-readable output. grimoire check runs all steps by default and also supports --changed (only changed files), --fail-fast (stop at first failure), and --skip <steps...>.
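The interaction between the ordered check list, skipped steps, and --fail-fast can be sketched as a short loop. This is a simplified model with hypothetical names, not the grimoire check implementation: each configured tool is represented by a callable that returns True on success, and the built-in LLM fallbacks for unconfigured security steps are not modeled.

```python
from typing import Callable, Collection, Mapping, Sequence


def run_pipeline(
    checks: Sequence[str],
    tools: Mapping[str, Callable[[], bool]],
    skip: Collection[str] = (),
    fail_fast: bool = False,
) -> dict[str, str]:
    """Run checks in order and report passed, failed, or skipped per step."""
    results = {}
    for step in checks:
        # Explicitly skipped or unconfigured steps are recorded, not run.
        if step in skip or step not in tools:
            results[step] = "skipped"
            continue
        passed = tools[step]()
        results[step] = "passed" if passed else "failed"
        if fail_fast and not passed:
            # Stop at the first failure; later steps are not reported.
            break
    return results
```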
| Check step | What it does | Example tools |
|---|---|---|
| lint | Static analysis / linter | eslint, biome, ruff, flake8 |
| format | Code formatting | prettier, biome, black, ruff format |
| unit_test | Unit test runner | vitest, jest, pytest, go test |
| bdd_test | BDD / feature test runner | cucumber-js, behave, pytest-bdd |
| duplicates | Copy-paste detection | jscpd |
| complexity | Cyclomatic complexity | radon, eslint-complexity |
| dead_code | Unused code detection | knip, ts-prune, vulture |
| doc_style | Docstring/comment style compliance | Built-in (Google, NumPy, Sphinx, JSDoc, TSDoc) |
| security | Security scanner | bandit, semgrep, npm audit, or name: llm |
| dep_audit | Dependency vulnerability audit | npm audit, pip-audit, safety |
| secrets | Hardcoded secret detection | gitleaks, detect-secrets, trufflehog, or name: llm |
| best_practices | General code review | name: llm (LLM-powered) |
# .grimoire/config.yaml
project:
  language: typescript # Auto-detected: python, typescript, javascript, go, rust
  package_manager: npm # Auto-detected: npm, yarn, pnpm, uv, poetry, pip, cargo
  commit_style: conventional # conventional, angular, or custom
  doc_tool: typedoc # sphinx, mkdocs, typedoc, jsdoc, rustdoc, godoc
  comment_style: tsdoc # google, numpy, sphinx, jsdoc, tsdoc, pep257
  caveman: none # Token optimization: none, lite, full, ultra
  compliance: # Compliance frameworks (affects review, plan, verify, check)
    - owasp # Options: owasp, pci-dss, hipaa, soc2, gdpr, iso27001
    - gdpr
features_dir: features # Gherkin feature files
decisions_dir: .grimoire/decisions # MADR decision records
# Separate thinking (planning, review) and coding (implementation) agents
llm:
  thinking:
    command: claude
    model: opus
  coding:
    command: claude
    model: sonnet
# Tool configuration — each key matches a check step name
tools:
  lint:
    name: eslint
    command: npx eslint .
  format:
    name: prettier
    check_command: npx prettier --check .
  unit_test:
    name: vitest
    command: npx vitest run
  bdd_test:
    name: cucumber-js
    command: npx cucumber-js
  security:
    name: llm
    prompt: "Review these changed files for security vulnerabilities"
# Check pipeline — ordered list of steps (must match keys in tools)
checks:
  - lint
  - format
  - duplicates
  - complexity
  - unit_test
  - bdd_test
  - security
  - dep_audit
  - secrets
  - best_practices
# Bug tracking and testing tools
bug_trackers:
  - name: jira
    mcp:
      name: atlassian
      url: https://mcp.atlassian.com/v1/sse
      transport: sse
testing_tools:
  - name: playwright
    purpose: e2e
    mcp:
      name: playwright
      command: npx
      args: ["-y", "@playwright/mcp@latest"]
Contributing
Development Setup
git clone https://github.com/kiwi-data/grimoire.git
cd grimoire
npm install
npm run build # Compile TypeScript
npm run dev # Watch mode
npm test # vitest
npm run lint # eslint
Project Structure
grimoire/
├── src/
│   ├── cli/index.ts # CLI entry point
│   ├── commands/ # Command definitions (thin — delegate to core/)
│   ├── core/ # Business logic
│   └── utils/ # Config, path resolution, helpers
├── skills/ # Claude Code skill definitions (SKILL.md per skill)
├── templates/ # Files copied during grimoire init
├── AGENTS.md # Universal LLM instructions (installed into projects)
└── bin/grimoire.js # CLI entry script
Adding a New Skill
- Create skills/grimoire-<name>/SKILL.md with trigger, prerequisites, workflow, and important notes
- Add "grimoire-<name>" to the skillNames array in both src/core/init.ts and src/core/update.ts
- Build and test: npm run build && node bin/grimoire.js update .
Skills are pure markdown — instructions for the AI, not executable code.
Adding a New CLI Command
- Create src/commands/<name>.ts — thin wrapper that parses args and calls core
- Create src/core/<name>.ts — business logic
- Register in src/cli/index.ts
Adding a New Tool Detection
- Add a detect<Tool> function in src/core/detect.ts
- Add it to the checks array in detectTools
- Add the category to CATEGORY_LABELS and CATEGORY_ORDER in src/core/init.ts
Philosophy
- Features are tests. A .feature file is both the requirement and the acceptance test.
- Red-green is mandatory. A test must fail before it passes. If it doesn't fail, it's not a real test.
- Decisions are documented. Architecture choices that aren't written down get relitigated.
- Reproduce before you fix. Every bug gets a failing test before any code changes.
- Simple over clever. Less code, fewer abstractions, smallest surface area.
- Verify before using. Confirm imports, functions, and packages exist before writing code that depends on them.
- Removal is deliberate. Removing a feature gets the same rigor as adding one.
- The fix is upstream. You don't fix codebase entropy by reviewing harder — you fix it by requiring specs before code.
License
MIT
