dev-playbooks
v4.0.6
Published
AI-driven spec-based development workflow
Maintainers
Readme
DevBooks
Transform AI coding from "says it's done" to "proves it's done."
DevBooks is an engineering protocol for AI that uses upstream Single Source of Truth (SSOT), executable gates, and evidence loops to upgrade AI programming from "conversational guessing" to "auditable engineering delivery."
Core Philosophy
The essence of software engineering is building reliable systems on unreliable components. Traditional engineering uses RAID against unreliable disks, TCP retransmission against unreliable networks, and Code Review against unreliable human programmers. AI engineering likewise needs gates and evidence loops to constrain unreliable LLM outputs.
DevBooks is not prompt optimization. It's engineering constraints.
Upstream SSOT: Single Authority Across the Development Lifecycle
At the core of DevBooks is the Single Source of Truth (SSOT)—all critical knowledge persisted and versioned, stable across conversations and changes.
Your requirement docs (if any)
↓ Extract constraints, build index
specs/ (terms, boundaries, decisions, scenarios) ← "Project memory" stable across changes
↓ Derive change packages
changes/<id>/ (proposal, design, tasks, evidence)
↓ Archive writeback
specs/ (update truth)Problems SSOT Solves:
| Problem | Root Cause | How SSOT Solves It | |---------|------------|-------------------| | Re-teach every time | Conversations are temporary, knowledge isn't persisted | Terms, boundaries, constraints written in files, not dependent on conversation memory | | Forgets what you said earlier | Context window is limited, early info gets pushed out | Truth artifacts persisted, critical constraints force-injected | | Don't know what changed | No auditable change record | Every change has complete record—proposal, design, tasks, evidence |
Ledger & Index: Continuous Completion Tracking
DevBooks continuously tracks delivery status through Completion Contracts and Requirements Index.
Completion Contract: Compile "what I want" into machine-checkable lists
obligations:
- id: O-001
describes: "User can login via email"
severity: must
checks:
- id: C-001
type: test
covers: [O-001]
artifacts: ["evidence/gates/login-test.log"]Not "roughly done," but "all 5 obligations have evidence."
Requirements Index: Turn upstream docs into traceable obligation lists
set_id: ARCH-P3
source_ref: "truth://specs/architecture/design.md"
requirements:
- id: R-001
severity: must
statement: "All APIs must support versioning"
- id: R-002
severity: should
statement: "Response time < 200ms"When a change package claims "upstream task completed," the system can judge that claim—not verbal confirmation, but machine verification.
Knife Slicing Protocol: Turn Large Requirements into Executable Queues
What happens when you hand large requirements directly to AI? Fixes A, breaks B. Fixes B, breaks C.
The Knife protocol uses complexity budgets and topological sorting to slice Epics into independently verifiable atomic change package queues.
Slicing Algorithm
Score = w₁·Files + w₂·Modules + w₃·RiskFlags + w₄·HotspotWeight| Signal | Weight | Description | |--------|--------|-------------| | files_touched | 1.0 | 1 point per file | | modules_touched | 5.0 | Cross-module risk is high | | risk_flags | 10.0 | 10 points per risk flag | | hotspot_weight | 2.0 | High churn areas weighted |
Over budget means slice again—no "forcing through."
Slicing Invariants
- MECE Coverage: Union of all slice acceptance criteria equals Epic's complete set, no overlap
- Independently Green: Each slice has at least one deterministic verification anchor, no "intermediate state won't compile"
- Topologically Sortable: Dependency graph must be acyclic, execution order must be topological
- Budget Circuit Breaker: Over budget must recursively re-slice, or flow back for more info
Parallel Execution Scheduling
When a Knife Plan contains multiple Slices, you can generate a parallel execution schedule:
knife-parallel-schedule.sh <epic-id> --format md --out parallel-schedule.mdOutput contents:
- Maximum Parallelism: Max number of Agents that can start simultaneously
- Layered Execution Schedule: Layer 0 (no deps) → Layer 1 → Layer N
- Critical Path: Serial dependency depth
- Launch Command Templates: Agent launch command for each Slice
Autoflow (Beta): Control Plane + tmux/worktree Script Generation
To reduce manual overhead ("open windows / copy-paste / resume"), use the CLI to generate a control plane and helper scripts (safe-by-default; does not execute AI):
# Generate schedule + dashboard + runbook (derived cache; safe to delete/rebuild)
dev-playbooks autoflow --epic <epic-id> --out .devbooks/_derived/autoflow/<epic-id>
# Explicitly enable Beta: additionally generate tmux/worktree helper scripts (still does not execute AI)
dev-playbooks autoflow --beta --epic <epic-id> --out .devbooks/_derived/autoflow/<epic-id>
# Run explicitly when needed (example: start tmux session)
bash .devbooks/_derived/autoflow/<epic-id>/tmux-start.shNotes:
- Autoflow only generates guidance and scripts. It does not auto-run external AI CLIs, bypass permissions, or auto-merge.
- Test Owner and Coder must still run in isolated sessions/instances. For parallel work, use worktrees (a helper script is generated in Beta mode).
7 Gates: Full-Chain Judgeable Checkpoints
| Gate | What It Checks | Failure Consequence | |------|----------------|---------------------| | G0 | Is input ready? Are baseline artifacts complete? | Flow back to Bootstrap | | G1 | Are all required files present? Is structure correct? | Block | | G2 | Are all tasks complete? Does green evidence exist? | Block | | G3 | Is slicing correct? Are anchors complete? (large requests) | Flow back to Knife | | G4 | Are docs in sync? Are extension packs complete? | Block | | G5 | Is risk covered? Is rollback strategy present? (high-risk) | Block | | G6 | Is evidence complete? Is contract satisfied? Ready to archive? | Block |
Any failure blocks the flow. Not a warning. A block.
Role Isolation: Prevent AI from Validating Itself
| Role | Responsibility | Hard Constraint | |------|---------------|-----------------| | Test Owner | Derive acceptance tests from design | Cannot see implementation code | | Coder | Implement features per tasks | Cannot modify tests/ | | Reviewer | Review readability and consistency | Cannot change tests or design |
Test Owner and Coder must execute in different contexts—not "different people," but "different conversations/instances." They can only exchange information through persisted artifacts.
Quick Start
npm install -g dev-playbooks
dev-playbooks initThen, in your AI tool chat, type (the single entry):
/devbooks:deliveryNote:
/devbooks:deliverymaps to the Router Skilldevbooks-start. It runs “demand alignment → SSOT staged advancement”, then routesrequest_kindand orchestrates the minimal sufficient closed loop.Optional: view entry guidance in the terminal (does not run AI):
dev-playbooks delivery
Your request
↓
Start (demand alignment → SSOT staging → route request_kind → generate RUNBOOK)
↓
┌─────────────────────────────────┐
│ Small change → Execute directly │
│ Large request → Slice first │
│ Uncertain → Research first │
└─────────────────────────────────┘
↓
Gate checks (7 checkpoints, any failure blocks)
↓
Evidence archive (test logs, build outputs, approvals)Directory Structure
project/
├── .devbooks/config.yaml # Config entry point
└── dev-playbooks/
├── constitution.md # Hard constraints (non-bypassable rules)
├── specs/ # Truth source (SSOT)
│ ├── ssot/ # Project SSOT pack (requirements layer)
│ │ ├── SSOT.md
│ │ ├── requirements.index.yaml
│ │ └── requirements.ledger.yaml # Derived cache (discardable/rebuildable)
│ ├── _meta/
│ │ ├── glossary.md # Unified language
│ │ ├── boundaries.md # Module boundaries
│ │ ├── capabilities.yaml # Capability registry
│ │ └── epics/ # Knife slice plans
│ └── ...
└── changes/ # Change packages
└── <change-id>/
├── proposal.md # Why and what
├── design.md # How, acceptance criteria
├── tasks.md # Executable steps
├── completion.contract.yaml # Completion contract
├── verification.md # How to prove it's correct
└── evidence/ # Test logs, build outputsUse Cases
- Brownfield Onboarding: Auto-index existing docs, extract judgeable constraints, establish minimal SSOT package
- Greenfield Projects: Guide completion of terms, boundaries, scenarios, decisions to establish baseline
- Daily Changes: Minimal sufficient loop with reproducible verification anchors + evidence archive
- Large Refactors: Knife slicing + migration patterns (Expand-Contract / Strangler Fig / Branch by Abstraction)
Next Steps
License
MIT
