anvil-ai
v0.7.0
Lightweight AI Code Factory — Build anything from a single command. Pure TypeScript, zero setup.
Anvil
Lightweight AI Code Factory — Build anything from a single command. Zero setup.
Anvil orchestrates a team of AI agents to build entire projects from a natural-language spec. Spiritual successor to Forge — same structured agent roles, same review rigor, radically simplified.
npx anvil-ai run "Build a REST API for a todo app with Express and TypeScript"
One command. You get a complete project with clean git history, passing tests, and full audit trail.
Quick Start
# Create a new project directory
mkdir my-project && cd my-project && git init
# Build it
npx anvil-ai run "Build a CLI calculator with add, subtract, multiply, divide"
Requirements: Node.js 22+, Git, and an AI CLI that provides auth, such as Claude Code or Gemini CLI. Anvil inherits authentication from the parent environment, so no API key is needed.
What Happens
You ──► "Build a todo API"
         │
    ┌────▼─────┐
    │ Planner  │  Spec → JSON plan with tasks, dependencies, interface contracts
    └────┬─────┘
         │
    Plan Critic ──► Deterministic + LLM validation (loop till clean)
         │
    Plan Review ──► Y / n / edit
         │
    ┌────▼────┐
    │ Wave 1  │  Independent tasks in parallel (git worktrees)
    │ Workers │  Each reads context from earlier waves
    └────┬────┘
         │
    Sub-Judges ──► tsc / vitest / touch-map / security / interface
         │         (5 judges, all code: $0, no AI)
         │
         │  ✗ Failed? → Retry with error context (up to 2x)
         │
    ┌────▼────┐
    │ Wave 2  │  Dependent tasks execute next
    └────┬────┘
         │
    Sub-Judges ──► (same 5 checks)
         │
    Final Integration ──► tsc + vitest on fully merged codebase
         │
    High Court ──► AI architectural review
         │         merge ✓ / human_required ⚠ / abort ✗
         │
    Librarian ──► README.md + ARCHITECTURE.md
         │
    Done ──► Clean git history, full audit trail
Commands
anvil run "spec" # Build from natural language
anvil run "spec" --skip-review # Skip interactive plan review
anvil run "spec" --stack python # Use Python stack preset
anvil run "spec" --spec todo.md # Read detailed spec from file
anvil run "spec" --sequential # Force sequential execution
anvil stacks # List available stack presets
anvil status # View last build state
anvil cost # Token/cost breakdown
anvil logs # View build logs
anvil logs --wave 2 # Logs for a specific wave
Stack Presets
anvil run "Build X" # Default: TypeScript
anvil run "Build X" --stack python # Python + FastAPI + pytest
anvil run "Build X" --stack go # Go + Chi + stdlib testing
anvil run "Build X" --stack react # React 19 + Vite + Vitest
Agent Roles
| Agent | Type | What it does |
|-------|------|-------------|
| Planner | AI (JSON) | Spec → plan with tasks, deps, interface contracts (exports[]) |
| Plan Critic | Code + AI | Validates plan structure, loops until clean |
| Worker | AI (tool use) | Executes one task in isolated git worktree. Reads context, self-verifies with tsc/vitest |
| Sub-Judges | Code only ($0) | tsc, vitest, touch-map, security (5 regex rules), interface contract enforcement |
| High Court | AI (JSON) | Architectural review. Abort → git reset --hard (nothing leaks) |
| Librarian | AI (markdown) | Generates README.md + ARCHITECTURE.md |
| Cost Auditor | Code only | Tracks tokens per call, calculates cost per wave |
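The wave model implied by the Planner's dependency list can be sketched as follows. This is a minimal illustration, not Anvil's actual scheduler: the `Task` shape and the `groupIntoWaves` function are hypothetical names, and the real plan schema in roadmap.json may differ. The idea is only that a task runs in the first wave where all of its dependencies have already finished.

```typescript
// Hypothetical task shape; Anvil's real plan schema may differ.
interface Task {
  id: string;
  deps: string[]; // ids of tasks that must finish first
}

// Group tasks into waves: each task lands in the first wave
// in which all of its dependencies have already completed.
function groupIntoWaves(tasks: Task[]): string[][] {
  const done = new Set<string>();
  const waves: string[][] = [];
  let remaining = [...tasks];

  while (remaining.length > 0) {
    const ready = remaining.filter((t) => t.deps.every((d) => done.has(d)));
    if (ready.length === 0) {
      // Nothing can run: the plan has a cycle or a missing dependency.
      throw new Error("Cyclic or unresolved dependency in plan");
    }
    waves.push(ready.map((t) => t.id));
    for (const t of ready) done.add(t.id);
    remaining = remaining.filter((t) => !done.has(t.id));
  }
  return waves;
}

const exampleWaves = groupIntoWaves([
  { id: "scaffold", deps: [] },
  { id: "logic", deps: ["scaffold"] },
  { id: "cli", deps: ["logic"] },
  { id: "tests", deps: ["logic"] },
]);
console.log(JSON.stringify(exampleWaves));
// → [["scaffold"],["logic"],["cli","tests"]]
```

Tasks inside one wave have no dependencies on each other, which is what makes it safe to run them in parallel worktrees.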
What You Get
your-project/
├── src/ # Generated source code
├── tests/ # Generated tests (if requested)
├── README.md # Auto-generated by Librarian
├── ARCHITECTURE.md # Auto-generated by Librarian
├── package.json # Project config
└── .anvil/ # Audit trail
    ├── roadmap.json # The execution plan
    ├── cost-report.json # Token usage + cost breakdown
    ├── high-court-report.json
    └── reports/ # Per-wave Sub-Judge reports
Plus a clean git history:
feat(anvil): Create project scaffold
feat(anvil): Implement calculator logic
feat(anvil): Add CLI entry point and tests
docs(anvil): auto-generated README and ARCHITECTURE
v0.2.0: What's New
First fully successful end-to-end build. Benchmark: CLI calculator, 3 waves, $0.26, all judges pass.
| Feature | Description |
|---------|-------------|
| Interface Contracts | Planner declares exact exports per task. InterfaceJudge enforces them. |
| Wave Retry Loop | Failed waves retry 2x with error context injected into worker prompts |
| Plan Critic | Deterministic structural validation + LLM review before execution |
| Final Integration Check | tsc + vitest on fully merged codebase before High Court |
| Security Judge | Catches eval(), hardcoded secrets, SQL injection, innerHTML, insecure HTTP |
| Worker Self-Verification | Workers run tsc + vitest before declaring complete |
| Context Injection | Workers read actual file contents from earlier waves (no more guessing imports) |
| Stack Presets | --stack typescript/python/go/react |
| Brownfield Support | Detects existing projects, injects file tree + export signatures |
| Worker Timeout | 5-minute AbortController per worker |
| Lockfile De-confliction | Parallel workers don't conflict on package-lock.json |
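To make the Wave Retry Loop and Worker Timeout rows concrete, here is a rough sketch of how a retry with injected error context and an AbortController timeout could fit together. Everything named here (`AttemptResult`, `WorkerFn`, `runWithRetry`, the prompt wording) is a hypothetical stand-in, not Anvil's internal API; only the retry count (2) and the 5-minute timeout come from the table above.

```typescript
// Hypothetical result of one worker attempt.
interface AttemptResult {
  ok: boolean;
  errors: string[];
}

// Hypothetical worker signature: a prompt plus an abort signal.
type WorkerFn = (prompt: string, signal: AbortSignal) => Promise<AttemptResult>;

const MAX_RETRIES = 2; // "retry 2x" from the feature table
const TIMEOUT_MS = 5 * 60 * 1000; // 5-minute worker timeout

async function runWithRetry(
  basePrompt: string,
  worker: WorkerFn,
  timeoutMs: number = TIMEOUT_MS,
): Promise<{ ok: boolean; attempts: number }> {
  let prompt = basePrompt;
  for (let attempt = 1; attempt <= MAX_RETRIES + 1; attempt++) {
    const controller = new AbortController();
    // Signal an abort if the worker runs past the timeout.
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    const result = await worker(prompt, controller.signal)
      .catch((err): AttemptResult => ({ ok: false, errors: [String(err)] }))
      .finally(() => clearTimeout(timer));

    if (result.ok) return { ok: true, attempts: attempt };

    // Inject the failure output into the prompt for the next attempt.
    prompt = `${basePrompt}\n\nPrevious attempt failed with:\n${result.errors.join("\n")}`;
  }
  return { ok: false, attempts: MAX_RETRIES + 1 };
}
```

Note that the timeout here only takes effect if the worker honors the abort signal; a worker that ignores it would keep running until it settles on its own.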
Cost
Typical builds cost $0.25–$30 depending on complexity. Workers are 93–98% of spend.
| Project Size | Tasks | Cost |
|-------------|-------|------|
| Simple (calculator, CLI tool) | 4-5 | $0.25–$3 |
| Medium (REST API with tests) | 10-15 | $5–$13 |
| Complex (full-stack app) | 20-30 | $15–$30 |
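As a rough illustration of how a per-wave cost breakdown like the one behind `anvil cost` might be computed, the sketch below sums token usage per call into a cost per wave. The `CallUsage` fields, the `costPerWave` function, and the prices are all assumptions made for this example; the prices are placeholders, not real model rates, and Anvil's cost-report.json format may look different.

```typescript
// Hypothetical per-call usage record; field names are illustrative.
interface CallUsage {
  wave: number;
  inputTokens: number;
  outputTokens: number;
}

// Placeholder prices in USD per million tokens (not real rates).
const PRICE_PER_MTOK = { input: 3, output: 15 };

// Sum the cost of every call, grouped by wave number.
function costPerWave(calls: CallUsage[]): Map<number, number> {
  const totals = new Map<number, number>();
  for (const c of calls) {
    const cost =
      (c.inputTokens / 1_000_000) * PRICE_PER_MTOK.input +
      (c.outputTokens / 1_000_000) * PRICE_PER_MTOK.output;
    totals.set(c.wave, (totals.get(c.wave) ?? 0) + cost);
  }
  return totals;
}

const report = costPerWave([
  { wave: 1, inputTokens: 40_000, outputTokens: 8_000 },
  { wave: 1, inputTokens: 25_000, outputTokens: 5_000 },
  { wave: 2, inputTokens: 60_000, outputTokens: 12_000 },
]);
for (const [wave, usd] of report) {
  console.log(`wave ${wave}: $${usd.toFixed(2)}`);
}
```

Grouping by wave rather than by call keeps the report short while still showing where the spend concentrates.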
Development
git clone https://github.com/fepvenancio/anvil.git
cd anvil
npm install
npm test # 174 tests
npm run typecheck # strict mode, zero errors
npm run dev -- run "Build a hello world Express app"
Tech Stack
| Dependency | Purpose |
|-----------|---------|
| @anthropic-ai/claude-agent-sdk | Claude Code Agent SDK (workers, planner, high court) |
| commander | CLI framework |
| simple-git | Git worktree management |
| zod | Schema validation (plans, reports, config) |
| p-limit | Parallel wave execution |
| chalk + ora | Terminal UI |
| pino | Structured JSON logging |
License
MIT
