anvil-dev-framework
v0.1.9
AI-powered development framework for Claude Code - multi-agent coordination, specifications, and quality gates
```
    ___    _   ___     _____ _
   /   \  | \ | \ \   / /_ _| |
  / /_\ \ |  \| |\ \ / / | || |    v0.1.9.0 (alpha)
 / _____ \| |\  | \ V /  | || |___
/_/     \_\_| \_|  \_/  |___|_____|
══════════════════════════════════════════════════════════
Where raw specs are forged into production code.
══════════════════════════════════════════════════════════
```
Anvil Development Framework v0.1.9.0
A structured AI development system for solo builders who demand production-quality output.
Anvil is a comprehensive framework for AI-assisted software development that combines phase-gated workflows, persistent memory systems, and automated quality gates to transform how you build software with AI coding assistants.
📦 Latest Changes in v0.1.9.0
Released: 2026-01-17
- Ralph Visibility & Notification System (ANV-298): real-time monitoring for autonomous execution
  - Live terminal watcher (`ralph-watch`) with progress bars and event stream
  - macOS Notification Center and TTS announcements for milestones
  - Slack/Discord webhook integrations for team visibility
  - REST API with SSE for future GUI integration
  - Toggle notifications: `--enable/--disable {all,macos,tts,slack,discord}`
- Token Efficiency Audit Framework: complete token consumption tracking and optimization
  - `/efficiency` command for historical analysis with weekly/monthly reports
  - `/token-budget` command for session budget management with alerts
- CodeRabbit Deep Integration: automated code review workflow
  - Enhanced `.coderabbit.yaml` with pre-merge checks and custom Anvil validations
See CHANGELOG.md for complete history.
Note: Version numbers were reset in January 2026 from `1.x` to `0.1.x` to accurately reflect alpha status. See Versioning Strategy for details.
🔥 The Problem
AI coding assistants are powerful but chaotic:
❌ Context lost between sessions
❌ No memory of what was decided or why
❌ Agents run off in wrong directions
❌ Quality varies wildly
❌ No structured workflow
❌ Duplicate work, missed patterns
❌ "It works on my machine" PRs
⚒️ The Anvil Solution
```
┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│   📋 SPECS         🧠 MEMORY        🚦 GATES         🔄 FLOW            │
│   ────────         ────────         ───────          ──────             │
│   What to build    What happened    Quality checks   How to work        │
│                                                                         │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                                                                   │  │
│  │     EXPLORE → SPECIFY → PLAN → TASKS → IMPLEMENT → VERIFY → ✓     │  │
│  │        │         │       │       │         │         │            │  │
│  │        └─────────┴───────┴───────┴─────────┴─────────┘            │  │
│  │                   Human gates at each phase                       │  │
│  │                                                                   │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│   ✅ Context preserved across sessions                                  │
│   ✅ Structured specs prevent scope creep                               │
│   ✅ Phase gates catch problems early                                   │
│   ✅ Evidence-based PR completion                                       │
│   ✅ Memory that actually persists                                      │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```
🎯 When to Use Anvil
Anvil is for you if:
- Solo developer or small team (1-3 people)
- Building production software with AI assistance
- Want structured workflows, not just chat
- Value quality gates and evidence-based PRs
- Need context to persist across sessions
Anvil is NOT for:
- Large teams with existing robust processes
- Quick prototypes / throwaway code
- Those who prefer unstructured AI interaction
Architecture Philosophy:
- Single generalist agent with on-demand skills (not multi-agent)
- Skills = Domain knowledge loaded when needed
- Sub-agents = Focused multi-step workflows
- Coordination = For parallel Claude terminals, not agent roles
📊 Architecture Overview
```
┌─────────────────────────────────────────────────────────────────────────┐
│                           ANVIL ARCHITECTURE                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                      SINGLE GENERALIST AGENT                      │  │
│  │                                                                   │  │
│  │    ┌──────────────┐     ┌──────────────┐     ┌──────────────┐     │  │
│  │    │    Skills    │     │  Sub-Agents  │     │   Context    │     │  │
│  │    │ (on-demand)  │     │ (read-only)  │     │   (tiers)    │     │  │
│  │    └──────────────┘     └──────────────┘     └──────────────┘     │  │
│  └───────────────────────────┬───────────────────────────────────────┘  │
│                              │                                          │
│  ┌───────────────────────────┴───────────────────────────────────────┐  │
│  │                           MEMORY SYSTEM                           │  │
│  │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐  │  │
│  │ │   Spec   │ │   Task   │ │ Session  │ │ Handoff  │ │Convention│  │  │
│  │ │  Memory  │ │  Memory  │ │  Memory  │ │  Memory  │ │  Memory  │  │  │
│  │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘  │  │
│  └───────────────────────────┬───────────────────────────────────────┘  │
│                              │                                          │
│  ┌───────────────────────────┴───────────────────────────────────────┐  │
│  │                           QUALITY GATES                           │  │
│  │     ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐     │  │
│  │     │ Pre-Work │   │  During  │   │  Pre-PR  │   │ Post-PR  │     │  │
│  │     │   Gate   │   │   Gate   │   │   Gate   │   │   Gate   │     │  │
│  │     └──────────┘   └──────────┘   └──────────┘   └──────────┘     │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```
🚀 Quick Start
Installation
Option 1: Bun (Recommended for Claude Code users)
```bash
bun install -g anvil-dev-framework
anvil init
```
Option 2: npm
```bash
npm install -g anvil-dev-framework
anvil init
```
Option 3: From Source
```bash
git clone https://github.com/AMPMIO/anvil-dev-framework.git
cd anvil-dev-framework
./scripts/install.sh   # Auto-configures PATH
anvil init
```
See docs/INSTALLATION.md for the complete installation guide.
Templates available:
```bash
anvil init                          # Generic project
anvil init --template saas          # SaaS project
anvil init --template api-python    # Python API project
anvil init --with-linear            # With Linear integration
anvil init --dry-run                # Preview changes
```
See docs/anvil-init.md for the complete `anvil init` documentation.
Keeping Updated
After making changes to the framework, sync to your projects:
```bash
# Sync global config
./scripts/sync.sh --global

# Sync a specific project
./scripts/sync.sh --project /path/to/your/project

# Preview changes first
./scripts/sync.sh --project . --dry-run

# Or use the Claude Code command
/anvil-sync target:both
```
See docs/sync.md for the complete sync documentation.
First Skill
After installation, create your first skill by following docs/FIRST-SKILL-TUTORIAL.md.
Daily Workflow
```
/orient → /sprint → /validate → [work] → /evidence → /handoff
```
For new features, add:
```
/explore → /spec → /plan → /tasks → [implement]
```
See Session Workflow Guide for the complete step-by-step walkthrough.
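The new-feature sequence is strictly ordered, and each step waits for a human sign-off. The sketch below models only that gating rule, assuming a simple in-memory tracker; `Phase`, `FeatureFlow`, `approve()`, and `advance()` are hypothetical names for illustration, not part of Anvil.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Feature phases in the order /explore -> /spec -> /plan -> /tasks -> [implement]."""

    EXPLORE = 0
    SPECIFY = 1
    PLAN = 2
    TASKS = 3
    IMPLEMENT = 4


class FeatureFlow:
    """Hypothetical tracker: the next phase may start only after the current one is approved."""

    def __init__(self) -> None:
        self.current = Phase.EXPLORE
        self.approved: set[Phase] = set()

    def approve(self) -> None:
        # Human gate: record sign-off on the phase currently in progress.
        self.approved.add(self.current)

    def advance(self) -> Phase:
        # Refuse to move on without approval, mirroring the human checkpoints.
        if self.current is Phase.IMPLEMENT:
            raise ValueError("already in the final phase")
        if self.current not in self.approved:
            raise PermissionError(f"{self.current.name} has not been approved yet")
        self.current = Phase(self.current + 1)
        return self.current
```

Calling `advance()` on a fresh `FeatureFlow` raises `PermissionError`; after `approve()`, it moves to `Phase.SPECIFY`.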
📁 Framework Structure
```
anvil-dev-framework/
│
├── global/                           # → Installs to ~/.claude/
│   ├── CLAUDE.md                     # Personal defaults & preferences
│   ├── standards/                    # Universal coding standards
│   │   ├── typescript.md
│   │   ├── react.md
│   │   ├── testing.md
│   │   └── security.md
│   ├── templates/                    # Reusable spec templates
│   │   ├── feature-spec.md
│   │   ├── bug-fix-spec.md
│   │   └── refactor-spec.md
│   ├── commands/                     # Global slash commands
│   │   ├── orient.md
│   │   ├── ready.md
│   │   ├── validate.md
│   │   ├── evidence.md
│   │   ├── shard.md
│   │   └── decay-review.md
│   ├── skills/                       # Claude Code skills
│   └── analytics/                    # Metrics & reports
│
├── project/                          # → Installs to .claude/
│   ├── CLAUDE.md.template            # Project context (customize)
│   ├── constitution.md.template      # Non-negotiable principles
│   ├── product.md.template           # Mission & roadmap
│   ├── commands/                     # Project-specific commands
│   │   ├── explore.md
│   │   ├── spec.md
│   │   ├── plan.md
│   │   ├── tasks.md
│   │   ├── discover.md
│   │   ├── change.md
│   │   └── handoff.md
│   ├── specs/                        # Specifications
│   │   ├── current/
│   │   └── archive/
│   ├── changes/                      # Brownfield change proposals
│   ├── handoffs/                     # Session continuity docs
│   └── examples/                     # Convention examples
│
├── quality-gates/                    # → Installs to project root
│   ├── .coderabbit.yaml              # AI code review config
│   ├── .semgrep/                     # SAST rules
│   ├── .pre-commit-config.yaml       # Pre-commit hooks
│   └── .github/workflows/ci.yaml     # CI pipeline
│
├── docs/                             # Documentation
│   ├── research/                     # Research reports
│   │   └── v5-research-report.md
│   ├── patterns/                     # Pattern explanations
│   ├── implementation-guide.md       # Step-by-step setup
│   └── command-reference.md          # All commands documented
│
├── scripts/                          # Automation
│   ├── install.sh                    # Fresh install to ~/.claude/
│   ├── init-project.sh               # Initialize in project
│   ├── sync.sh                       # Sync framework updates
│   └── rollback.sh                   # Rollback changes
│
└── examples/                         # Reference implementations
    └── baby-gift-garden/
```
🧠 Core Concepts
1. Phase-Gated Workflow
Every feature flows through structured phases with human checkpoints:
```
┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│ EXPLORE  │───▶│ SPECIFY  │───▶│   PLAN   │───▶│  TASKS   │
│          │    │          │    │          │    │          │
│ /explore │    │  /spec   │    │  /plan   │    │  /tasks  │
└──────────┘    └────┬─────┘    └────┬─────┘    └────┬─────┘
                     │               │               │
                     ▼               ▼               ▼
                 [APPROVE]       [APPROVE]      [IMPLEMENT]
                     │               │               │
                     │               │               ▼
                     │               │          ┌──────────┐
                     │               │          │  VERIFY  │
                     │               │          │          │
                     │               │          │/evidence │
                     │               │          └────┬─────┘
                     │               │               │
                     │               │               ▼
                     │               │          ┌──────────┐
                     │               │          │ ARCHIVE  │
                     │               │          │          │
                     │               │          │  Done!   │
                     │               │          └──────────┘
```
2. Five-Layer Memory System
```
┌─────────────────────────────────────────────────────────────────────┐
│                         MEMORY ARCHITECTURE                         │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Layer 1: SPECIFICATION MEMORY                                      │
│  ─────────────────────────────                                      │
│  Location: .claude/specs/                                           │
│  Purpose:  What SHOULD be built                                     │
│  Decay:    Archive when feature complete                            │
│                                                                     │
│  Layer 2: TASK MEMORY                                               │
│  ────────────────────                                               │
│  Location: Linear (via MCP or CLI)                                  │
│  Purpose:  What needs to be DONE                                    │
│  Decay:    Archive closed >30 days                                  │
│                                                                     │
│  Layer 3: SESSION MEMORY                                            │
│  ───────────────────────                                            │
│  Location: Claude-Mem observations                                  │
│  Purpose:  What HAPPENED in past sessions                           │
│  Decay:    Auto-compress to ~500 tokens/observation                 │
│                                                                     │
│  Layer 4: HANDOFF MEMORY                                            │
│  ───────────────────────                                            │
│  Location: .claude/handoffs/                                        │
│  Purpose:  Where we LEFT OFF                                        │
│  Decay:    Keep last 5-7, archive older                             │
│                                                                     │
│  Layer 5: CONVENTION MEMORY                                         │
│  ──────────────────────────                                         │
│  Location: CLAUDE.md + Skills                                       │
│  Purpose:  HOW to build                                             │
│  Decay:    Rarely (evolve instead)                                  │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```
3. Three-Tier Context Hierarchy
```
┌─────────────────────────────────────────────────────────────────────┐
│                          CONTEXT HIERARCHY                          │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  TIER 1: GLOBAL (~/.claude/)                                        │
│  ───────────────────────────                                        │
│  • Personal preferences                                             │
│  • Universal standards                                              │
│  • Reusable templates                                               │
│  Lifetime: Permanent                                                │
│                                                                     │
│  TIER 2: PROJECT (.claude/)                                         │
│  ──────────────────────────                                         │
│  • Project-specific CLAUDE.md                                       │
│  • Constitution (non-negotiables)                                   │
│  • Product definition                                               │
│  Lifetime: Project duration                                         │
│                                                                     │
│  TIER 3: FEATURE (.claude/specs/)                                   │
│  ────────────────────────────────                                   │
│  • Feature specifications                                           │
│  • Implementation plans                                             │
│  • Change proposals                                                 │
│  Lifetime: Feature duration → archive                               │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```
4. Quality Gate System
```
┌─────────────────────────────────────────────────────────────────────┐
│                            QUALITY GATES                            │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  PRE-WORK GATE (/validate)                                          │
│  ─────────────────────────                                          │
│  ✓ Git status clean                                                 │
│  ✓ On feature branch (not main)                                     │
│  ✓ Dependencies installed (npm ci)                                  │
│  ✓ Tests passing (baseline)                                         │
│  ✓ Types passing (no errors)                                        │
│                                                                     │
│  DURING-WORK GATE                                                   │
│  ────────────────                                                   │
│  ✓ Read before write (cite files)                                   │
│  ✓ Follow conventions (check examples)                              │
│  ✓ File discovered work immediately                                 │
│  ✓ Update Linear status                                             │
│                                                                     │
│  PRE-PR GATE (/evidence)                                            │
│  ───────────────────────                                            │
│  ✓ Lint passes (full output)                                        │
│  ✓ Types pass (full output)                                         │
│  ✓ Tests pass (full output)                                         │
│  ✓ Only expected files changed                                      │
│  ✓ Evidence captured in PR                                          │
│                                                                     │
│  POST-PR GATE                                                       │
│  ────────────                                                       │
│  ✓ CodeRabbit review                                                │
│  ✓ Semgrep security scan                                            │
│  ✓ Human review                                                     │
│  ✓ CI pipeline passes                                               │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```
📋 Command Reference
Session Commands
| Command | Purpose | When to Use |
|---------|---------|-------------|
| /orient | Session startup orientation | Start of every session |
| /ready | Calculate ready work (no blockers) | Before selecting a task |
| /handoff | Generate session continuity doc | End of every session |
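`/ready` surfaces work that has no open blockers, the "ready work calculation" pattern listed under Beads in the Research Foundation section. A minimal sketch of that filter, assuming each issue carries a status and a list of blocking issue IDs; the `Issue` record and its status strings are hypothetical, and the real command reads from Linear or the local tracker.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    id: str
    status: str  # hypothetical values: "open", "in_progress", "done"
    blocked_by: list[str] = field(default_factory=list)


def ready_work(issues: list[Issue]) -> list[str]:
    """Return IDs of open issues whose blockers are all done."""
    status_by_id = {issue.id: issue.status for issue in issues}
    ready = []
    for issue in issues:
        if issue.status != "open":
            continue
        # An unknown blocker ID counts as unresolved, which keeps the result conservative.
        if all(status_by_id.get(blocker) == "done" for blocker in issue.blocked_by):
            ready.append(issue.id)
    return ready
```

With `T-1` done, `T-2` (blocked only by `T-1`) and the unblocked `T-4` are ready, while `T-3` still waits on `T-2`.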
Workflow Commands
| Command | Purpose | When to Use |
|---------|---------|-------------|
| /explore | Discovery phase | Before any new feature |
| /spec | Generate specification | After exploration approved |
| /plan | Create implementation plan | After spec approved |
| /tasks | Break plan into Linear issues | After plan approved |
| /change | Create brownfield change proposal | Modifying existing features |
Quality Commands
| Command | Purpose | When to Use |
|---------|---------|-------------|
| /validate | Environment validation | Before any code changes |
| /evidence | Capture quality gate proof | Before creating PR |
| /discover | File discovered work | During implementation |
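The first two pre-work checks, a clean git status and a branch other than `main`, correspond to standard git commands. A sketch of just those two checks, assuming `git` is on `PATH`; the function names are hypothetical, and the dependency, test, and type checks that `/validate` also performs are left out.

```python
import subprocess


def git(*args: str) -> str:
    """Run a git command in the current directory and return its stdout."""
    result = subprocess.run(["git", *args], capture_output=True, text=True, check=True)
    return result.stdout


def pre_work_failures(porcelain_status: str, branch: str) -> list[str]:
    """Evaluate the git-related pre-work checks from raw command output."""
    failures = []
    if porcelain_status.strip():
        failures.append("working tree is not clean")
    # Treating 'master' as protected alongside 'main' is an assumption.
    if branch in ("main", "master"):
        failures.append(f"on '{branch}', not a feature branch")
    return failures


def run_git_checks() -> list[str]:
    """Collect live git state; call this from inside a repository."""
    status = git("status", "--porcelain")
    branch = git("rev-parse", "--abbrev-ref", "HEAD").strip()
    return pre_work_failures(status, branch)
```

Keeping the evaluation in a pure function (`pre_work_failures`) makes it testable without a repository; `run_git_checks()` only gathers the live inputs.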
Maintenance Commands
| Command | Purpose | When to Use |
|---------|---------|-------------|
| /anvil-sync | Sync framework updates | After pulling framework changes |
| /shard | Break large specs into pieces | When specs exceed 2000 tokens |
| /decay-review | Archive old issues/handoffs | Weekly maintenance |
| /weekly-review | Generate analytics report | Weekly review |
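The retention rules in the memory model (archive tasks closed more than 30 days ago, keep the last 5-7 handoffs) reduce to two small filters. A sketch under stated assumptions: a closure date is available per issue, and handoff file names sort chronologically, for example with a date prefix. It is not `/decay-review`'s implementation, and the function names are hypothetical.

```python
from datetime import date, timedelta


def issues_to_archive(closed_on: dict[str, date], today: date, max_age_days: int = 30) -> list[str]:
    """IDs of issues that were closed more than `max_age_days` days before `today`."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(issue_id for issue_id, closed in closed_on.items() if closed < cutoff)


def handoffs_to_archive(handoff_names: list[str], keep: int = 7) -> list[str]:
    """Every handoff except the `keep` newest ones.

    Assumes file names sort chronologically (e.g. a YYYY-MM-DD prefix).
    """
    ordered = sorted(handoff_names)
    return ordered[: max(len(ordered) - keep, 0)]
```

In a project, `handoff_names` could come from `[p.name for p in Path(".claude/handoffs").glob("*.md")]` using `pathlib.Path`.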
Power Mode Commands
| Command | Purpose | When to Use |
|---------|---------|-------------|
| /ralph start | Initialize autonomous execution | Large refactoring, overnight runs |
| /ralph status | Check iteration progress | Monitor unattended execution |
| /ralph stop | Gracefully terminate loop | End autonomous session early |
Note: Ralph Wiggum mode is a specialized power tool for specific scenarios (large refactoring, framework migrations, TDD with clear specs). It is NOT part of the standard daily workflow. See When to Use Ralph below.
🤖 When to Use Ralph Wiggum Mode
Ralph Wiggum is a specialized power tool for autonomous, long-running AI execution — NOT a replacement for the standard workflow.
Standard Workflow (Daily Use)
```
/orient → /sprint → /validate → [work] → /evidence → /handoff
```
This remains your default approach for all normal development work.
Ralph Mode (Special Scenarios Only)
```
# Manual task description
/ralph start "Migrate all tests from Jest to Vitest" --max-iterations 50

# From Linear issue (recommended) - fetches subtasks automatically
/ralph start --issue ANV-209

# From Linear project - process all issues in a project
/ralph start --project "HUD Development"
```
Linear Integration Flags:
| Flag | Description |
|------|-------------|
| --issue | Linear issue ID to fetch subtasks from (e.g., ANV-209) |
| --project | Linear project name to process all issues |
| --subtasks | Filter subtasks (e.g., ANV-1..ANV-5 or ANV-1,ANV-3) |
| --include-done | Include already-completed issues in project mode |
| --no-sync | Disable syncing status back to Linear |
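The `--subtasks` examples show two forms, an inclusive range (`ANV-1..ANV-5`) and a comma-separated list (`ANV-1,ANV-3`). A sketch of a parser for both, assuming ranges are inclusive, both forms may be mixed, and IDs follow Linear's `TEAM-NUMBER` shape; the flag's actual parsing rules are not documented here, and `expand_subtasks` is a hypothetical name.

```python
import re

# Linear-style identifiers: a team key, a dash, and a number (e.g. ANV-209).
_ISSUE_ID = re.compile(r"^([A-Z][A-Z0-9]*)-(\d+)$")


def expand_subtasks(spec: str) -> list[str]:
    """Expand a filter such as 'ANV-1..ANV-5' or 'ANV-1,ANV-3' into issue IDs."""
    ids: list[str] = []
    for part in (p.strip() for p in spec.split(",")):
        if ".." in part:
            start, end = (s.strip() for s in part.split("..", 1))
            m_start, m_end = _ISSUE_ID.match(start), _ISSUE_ID.match(end)
            if not (m_start and m_end) or m_start.group(1) != m_end.group(1):
                raise ValueError(f"invalid range: {part!r}")
            low, high = int(m_start.group(2)), int(m_end.group(2))
            if low > high:
                raise ValueError(f"range runs backwards: {part!r}")
            # Ranges are treated as inclusive on both ends.
            ids.extend(f"{m_start.group(1)}-{n}" for n in range(low, high + 1))
        elif _ISSUE_ID.match(part):
            ids.append(part)
        else:
            raise ValueError(f"invalid issue ID: {part!r}")
    return ids
```

For example, `expand_subtasks("ANV-1..ANV-3,ANV-7")` yields `["ANV-1", "ANV-2", "ANV-3", "ANV-7"]`.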
| Good For | Not Good For |
|----------|--------------|
| ✅ Large-scale refactoring with clear completion criteria | ❌ Exploratory work (figuring things out) |
| ✅ Framework migrations (Jest→Vitest, CJS→ESM) | ❌ Ambiguous requirements |
| ✅ TDD with clear failing tests to pass | ❌ Security-sensitive code |
| ✅ Greenfield projects with detailed specs | ❌ Architecture decisions needing judgment |
| ✅ Test coverage expansion across many files | ❌ Quick fixes (overkill) |
| ✅ Overnight/unattended execution (8+ hours) | ❌ Interactive debugging |
Cost Awareness
| Scenario | Estimated Cost |
|----------|----------------|
| 10 iterations, small codebase | $5-15 |
| 50 iterations, medium codebase | $50-100+ |
| 100+ iterations, large codebase | $200+ |
Recommendation: Start with --max-iterations 10 to understand costs before running overnight.
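Starting small and extrapolating is simple arithmetic. A sketch, assuming cost grows linearly with iterations on the same codebase; the cost table suggests per-iteration spend rises with codebase size, so treat any projection as rough. The function names are hypothetical.

```python
import math


def projected_cost(trial_cost: float, trial_iterations: int, planned_iterations: int) -> float:
    """Scale a trial run's spend linearly to a longer run."""
    if trial_iterations <= 0:
        raise ValueError("trial_iterations must be positive")
    return trial_cost * planned_iterations / trial_iterations


def iterations_within_budget(budget: float, trial_cost: float, trial_iterations: int) -> int:
    """Largest iteration count whose linear projection stays within `budget`."""
    if trial_cost <= 0:
        raise ValueError("trial_cost must be positive")
    return math.floor(budget * trial_iterations / trial_cost)
```

A 10-iteration trial costing $12 projects to $60 for 50 iterations, and a $100 cap allows about 83 iterations.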
Monitoring Ralph Sessions
Watch Ralph progress in real-time with the visibility tools:
```
# Terminal 1: Run Ralph
/ralph start --issue ANV-209

# Terminal 2: Watch progress (full display)
python3 global/tools/ralph-watch

# Or compact single-line mode
python3 global/tools/ralph-watch --compact
```
Notification Options:
```bash
# Toggle notifications
python3 global/lib/ralph_notifier.py --disable tts   # Mute TTS
python3 global/lib/ralph_notifier.py --enable macos  # Enable desktop

# Start API server for external monitoring
python3 global/api/ralph_api.py --port 8765
```
Event Types:
- `session_started`: Ralph begins work
- `subtask_complete`: Linear subtask finished
- `session_complete`: All work done
- `error_occurred`: Something went wrong
- `circuit_breaker`: No file changes detected (stuck)
See global/tools/README.md for full documentation.
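The `circuit_breaker` event fires when no file changes are detected. One way to implement such a check is to fingerprint the working tree after every iteration and trip after several unchanged fingerprints in a row. The sketch below assumes that approach; the class name, the default limit of 3, and the choice of snapshot are illustrative, not Ralph's actual settings.

```python
import hashlib
from typing import Optional


class NoProgressBreaker:
    """Trip after `limit` consecutive iterations that leave the working tree unchanged."""

    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        self.last_digest: Optional[str] = None
        self.unchanged = 0  # consecutive iterations with an identical snapshot

    def record(self, snapshot: str) -> bool:
        """Record one iteration's snapshot and return True if the breaker should trip.

        `snapshot` can be any text that changes whenever files change, for
        example the combined output of `git status --porcelain` and `git diff`.
        """
        digest = hashlib.sha256(snapshot.encode("utf-8")).hexdigest()
        if digest == self.last_digest:
            self.unchanged += 1
        else:
            self.unchanged = 0
            self.last_digest = digest
        return self.unchanged >= self.limit
```

With `limit=2`, the breaker trips on the third identical snapshot in a row, and any change resets the count.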
Decision Flowchart
```
Is this task...
├── Quick fix or bug?           → Standard workflow
├── Exploratory / unclear?      → Standard workflow
├── Needs human decisions?      → Standard workflow
├── Large with clear spec?      → Consider Ralph
├── Migration / refactoring?    → Consider Ralph
└── Overnight / unattended?     → Ralph is ideal
```
🔬 Research Foundation
Anvil is built on extensive research across 15+ systems with 200k+ combined GitHub stars:
| Source | Key Pattern Extracted |
|--------|----------------------|
| Factory Droid | 6 structural guardrails (58.8% Terminal-Bench vs 43.2% baseline) |
| Beads (4.2k ⭐) | Task memory patterns, ready work calculation |
| BMAD (25.2k ⭐) | Document sharding, phase gates |
| SpecKit (55.6k ⭐) | Constitution pattern, structured specs |
| OpenSpec (12k ⭐) | Brownfield change tracking, Gherkin scenarios |
| Agent OS (2.7k ⭐) | Three-tier context hierarchy |
| Claude-Mem | Session compression, semantic search |
| Prompt Coach | Prompt analytics, time-lost calculations |
Key Research Findings
- Agent design matters more than model choice — Droid achieved 58.8% vs Claude Code's 43.2% on Terminal-Bench using the same model
- Single agent with skills beats multi-agent — Coordination overhead typically 40-60% of cycles
- Phase gates prevent disasters — Universal across all production systems studied
- Identity claims hurt performance — "Idiot" persona outperformed "Genius" by 2.2% on MMLU
See docs/research/v5-research-report.md for the complete analysis.
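For scale, the Terminal-Bench figures cited above amount to a 15.6-point absolute gap, or roughly 36% relative to the 43.2% baseline:

$$
\frac{58.8 - 43.2}{43.2} = \frac{15.6}{43.2} \approx 0.361
$$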
💰 Cost Structure
| Component | Cost | Purpose |
|-----------|------|---------|
| Claude Code Max | $200/mo | AI coding assistant |
| CodeRabbit | $25/mo | AI code review |
| Semgrep OSS | Free | Security scanning |
| Trivy | Free | Vulnerability scanning |
| Gitleaks | Free | Secrets detection |
| Pre-commit | Free | Git hooks |
| Total | $225/mo | |
📈 Success Metrics
Leading Indicators (Weekly)
| Metric | Target |
|--------|--------|
| Orientation time | < 2 minutes |
| Ready work accuracy | > 95% |
| Phase gate compliance | 100% |
| Handoff coverage | 100% |
Lagging Indicators (Monthly)
| Metric | Target |
|--------|--------|
| First-pass PR approval | > 70% |
| Clarification rate | < 15% |
| Security findings per PR | < 2 high |
| Discovery completion | > 80% within 2 weeks |
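Two of the lagging indicators are ratios over a month of pull requests. A sketch, assuming "first-pass approval" means approval after a single review round and "clarification rate" means the share of PRs where the agent had to ask for clarification; both definitions and the `PullRequest` record are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    # Hypothetical per-PR record; the real data source is not specified.
    review_rounds: int          # 1 means approved on the first review
    needed_clarification: bool  # the agent had to ask for clarification


def lagging_indicators(prs: list[PullRequest]) -> dict[str, tuple[float, bool]]:
    """Return (rate, meets_target) for two of the monthly targets."""
    if not prs:
        raise ValueError("no pull requests to evaluate")
    total = len(prs)
    first_pass = sum(pr.review_rounds == 1 for pr in prs) / total
    clarification = sum(pr.needed_clarification for pr in prs) / total
    return {
        "first_pass_pr_approval": (first_pass, first_pass > 0.70),
        "clarification_rate": (clarification, clarification < 0.15),
    }
```

Eight first-pass approvals out of ten meets the > 70% target, while two clarifications out of ten (20%) misses the < 15% target.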
🗺️ Roadmap
Versioning Strategy
Anvil uses four-part versioning: MILESTONE.MAJOR.MINOR.PATCH
| Component | Meaning |
|-----------|---------|
| MILESTONE | 0 = Alpha (pre-1.0), 1 = Production-ready |
| MAJOR | Significant feature sets or breaking changes |
| MINOR | New features |
| PATCH | Bug fixes |
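Because every component is an integer, four-part versions compare correctly as tuples, whereas plain string comparison would put `0.1.10.0` before `0.1.9.0`. A sketch, with `AnvilVersion` as a hypothetical name:

```python
from typing import NamedTuple


class AnvilVersion(NamedTuple):
    """A MILESTONE.MAJOR.MINOR.PATCH version; tuple order gives numeric comparison."""

    milestone: int
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "AnvilVersion":
        cleaned = text.strip()
        if cleaned.startswith("v"):
            cleaned = cleaned[1:]
        parts = cleaned.split(".")
        if len(parts) != 4 or not all(part.isdigit() for part in parts):
            raise ValueError(f"expected MILESTONE.MAJOR.MINOR.PATCH, got {text!r}")
        return cls(*(int(part) for part in parts))

    @property
    def is_alpha(self) -> bool:
        # Milestone 0 marks alpha (pre-1.0) releases.
        return self.milestone == 0
```

`AnvilVersion.parse("v0.1.9.0").is_alpha` is `True`, matching the alpha label in the header.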
Road to 1.0.0.0 (Production-Ready)
| Requirement | Status | Notes |
|-------------|--------|-------|
| HUD/TUI fully implemented | ✅ Complete | Multi-agent terminal dashboard with 6 panels |
| External user testing | ⏳ Not Started | Beta testers outside core team |
| One-command installation | ✅ Complete | anvil init with templates |
| All core commands documented | ✅ Complete | 13 commands with examples |
| No known critical bugs | 🔄 Ongoing | Continuous improvement |
| Cross-platform testing | ⏳ Not Started | macOS, Linux, WSL |
| Monetization/licensing defined | ⏳ Not Started | Pricing and distribution model |
In Progress (v0.1.5.0)
- [x] HUD v2 Multi-Agent Command Center (ANV-78)
- [x] HUD Kanban Panel (ANV-76)
- [x] Provider Pattern for issue tracking (Linear + Local)
- [x] Local JSON issue tracker for non-Linear users
- [x] Cost Tracker, Context Health, Task Status panels
- [x] Quality Gates, Coordination panels
- [x] GitHub/CI and CodeRabbit integration
- [x] HUD Configuration system
- [x] Documentation for Local Issue Tracking System (ANV-109)
- [ ] Unified `/orient`, `/sprint`, `/ready` commands
Completed Milestones
v0.1.4.0 (Current)
- [x] Statusline configuration (full/minimal/off variants)
- [x] `/release` command for version coordination
- [x] Enhanced `/evidence` and `/handoff` commands
- [x] Code review config integration
v0.1.3.0
- [x] Framework Healthcheck System (ANV-17)
v0.1.2.0
- [x] Use-case based templates (`saas`, `api-python`, `generic`)
- [x] Next.js + Supabase + Vercel template
- [x] FastAPI + PostgreSQL + pytest template
v0.1.1.0
- [x] CLI tool (`anvil init` command)
- [x] Project initialization with templates
- [x] Granular hook control (`--no-tts`, `--no-memory`)
- [x] Linear sub-issue support
v0.1.0.0
- [x] Core framework structure
- [x] Phase-gated workflow commands
- [x] Memory system architecture
- [x] Quality gate configurations
- [x] Documentation
Future (Post-1.0)
- [ ] Homebrew CLI distribution (macOS/Linux)
- [ ] Additional templates (mobile, Rails, Go)
- [ ] VS Code extension
- [ ] Dashboard for metrics
- [ ] Team collaboration features
📚 Documentation
| Document | Description |
|----------|-------------|
| System Architecture | OVERVIEW: How Linear + CodeRabbit + Claude Code + Memory integrate |
| Session Workflow | START HERE: Daily coding workflow |
| Local Issue Tracking | File-based issues without Linear |
| Sync Guide | Keep projects updated with framework changes |
| Installation Guide | Initial framework setup |
| Implementation Guide | How to set up Anvil |
| Command Reference | All commands detailed |
| Planning Responsibilities | Who decides what |
| Simplification Principles | Framework simplicity guidance |
| Simplification Plan Template | Audit and simplify checklist |
| Research Report | Full research analysis |
| Pattern Catalog | Pattern explanations |
⚖️ License
Proprietary — All rights reserved.
This framework is not open source. Contact for licensing inquiries.
🤝 Contact
For licensing, questions, or collaboration:
- Author: Alex Cahiz
- Project: Anvil Development Framework
══════════════════════════════════════════════════════════
Built with 🔥 for developers who refuse to compromise.
══════════════════════════════════════════════════════════