@redgreen-labs/forge-cli
v0.6.0
Published
Autonomous multi-agent development orchestrator with TDD, security, and craftsmanship
Downloads
1,172
Maintainers
Readme
Forge CLI
Autonomous multi-agent development orchestrator that drives Claude Code through TDD loops with security scanning and conventional commits.
You describe what to build. Forge builds it — test-first, secure, and documented.
Features
- TDD Enforcement — Red-Green-Refactor cycle tracked and enforced. Tests are written before code, every time.
- Multi-Agent Teams — 6 specialized roles (Architect, Implementer, Tester, Reviewer, Security, Documenter) with automatic task-to-agent matching
- Quality Gates with Auto-Fix — 5-gate pipeline (tests, coverage, security, lint, commit) with blocking/warning severity. When gates fail, Claude automatically attempts to fix the issues before giving up.
- Security Scanning — Secret detection, SAST, dependency audit on every iteration
- Conventional Commits — Automatic commit per TDD phase (
test:,feat:,refactor:) - Live TUI Dashboard — Real-time Claude output, cost tracking, TDD pipeline visualization, quality gate status. Press
dfor detailed dashboard overlay,qfor graceful quit. - Cost Tracking — Real-time cost per task, per execution, and per phase with averages
- Smart Task Ordering — Tasks sorted by priority (critical → low) and dependency graph depth (foundational tasks first)
- Auto Task Decomposition — Large complex tasks are automatically split into TDD-friendly subtasks
- Session Continuity — Unified TDD prompt preserves Claude's context across Red/Green/Refactor phases. Resume loops across restarts with persistent state.
- Spec-Kit Integration — Use GitHub's spec-kit for planning, Forge for execution
- Human-in-the-Loop — When a task fails repeatedly, Forge pauses and prompts you: retry with guidance, defer to later, skip permanently, or abort. Your guidance is injected into the next attempt.
- Deferred Tasks — Skip a task for now and come back to it later. Deferred tasks run after all other pending work completes.
- Circuit Breaker — Stagnation detection with auto-recovery (Nygard pattern)
- Stream Output — Real-time structured JSON streaming from Claude CLI for live TUI feedback
Quick Start
# Install
npm install -g @redgreen-labs/forge-cli
# Run inside your existing project directory
cd my-project
forge init
# Import requirements
forge import requirements.md
# Start the development loop
forge run --iterations 20
# Check progress
forge statusTask Sources
Forge supports three task formats, auto-detected in priority order:
1. Spec-Kit (recommended for new projects)
Use spec-kit for the planning phase, then Forge for execution:
# Generate specs with spec-kit
npx spec-kit specify
npx spec-kit plan
npx spec-kit tasks
# Forge auto-detects specs/tasks.md
forge runForge reads from specs/:
| File | Purpose |
|------|---------|
| specs/tasks.md | Task list with T-IDs, phases, dependencies |
| specs/constitution.md | Project principles — injected into agent prompts |
| specs/spec.md | Detailed requirements — injected into agent prompts |
| specs/plan.md | Architecture decisions — injected into agent prompts |
Spec-kit task format:
## Phase 1: Setup (Shared Infrastructure)
- [ ] T001 [P] Initialize project structure
- [ ] T002 Configure CI pipeline (depends on T001)
## Phase 2: User Story 1 - Authentication (Priority: P1)
- [ ] T003 [US1] Implement login endpoint
- Returns JWT token on success
- Returns 401 on invalid credentialsMarkers: [P] = parallelizable, [US1] = user story ref, (depends on T001) = dependency.
2. Forge PRD (JSON)
forge import requirements.md # Parses to .forge/prd.json
forge run3. Markdown Task List
# Place a tasks.md in .forge/
forge runCommands
| Command | Description |
|---------|-------------|
| forge init | Initialize project, auto-detect workspaces and language |
| forge import <file> | Import PRD, scan for existing implementations, auto-decompose |
| forge run | Start the autonomous development loop |
| forge status | Show session progress and quality metrics |
| forge report | Generate a project health report |
| forge decompose | Decompose large tasks into smaller TDD-friendly subtasks |
| forge agents | List available agent roles and their tools |
forge init Options
-n, --name <name> Project name
-i, --interactive Guided PRD creation
-f, --force Overwrite existing .forge directory
--no-scan Skip workspace auto-detection
-v, --verbose Show detailed scan outputforge import Options
--no-scan Skip codebase scan for existing implementations
--no-decompose Skip automatic decomposition of large tasks
-v, --verbose Show detailed scan outputforge run Options
-n, --iterations <n> Maximum iterations (default: 50)
--resume Resume from previous run, skipping completed tasks
--no-tui Disable live TUI (plain text output)
-v, --verbose Show detailed executor output
--solo Single agent mode (no team rotation)
--dry-run Simulate execution without running Claudeforge status Options
--json Output as JSON
-w, --watch Refresh status every few seconds
--interval <seconds> Watch interval in seconds (default: 3)forge report Options
-f, --format <type> Output format: terminal, html, or json (default: terminal)forge decompose Options
--threshold <n> Complexity threshold 1-10 (tasks above this are decomposed)
--max-subtasks <n> Max subtasks per parent task
--dry-run Show which tasks would be decomposed without calling Claude
-v, --verbose Show detailed outputHow It Works
Each iteration follows this pipeline:
Select Task → TDD Red (write failing test)
→ TDD Green (implement to pass)
→ TDD Refactor (clean up)
→ Security Scan
→ Quality Gates ──→ Pass → Commit → Next Task
└→ Fail → Auto-Fix → Re-run Gates (up to 3 retries)Tasks are selected by priority (critical first) and dependency graph depth (foundational tasks that unblock others run first). The loop stops when all tasks are complete, max iterations reached, or the circuit breaker trips (repeated failures).
TUI Dashboard
The live dashboard shows real-time progress:
╭──────────────── FORGE Development Loop ─────────────────╮
│ │
├──────────────────────────────────────────────────────────┤
│ Phase: IMPLEMENTING Tasks: 3/10 Iter: 5 │
│ Elapsed: 04:32 Cost: $0.45 Commits: 8 Files: 3 │
│ │
│ [████████████░░] 85% │
│ Task: Implement user authentication │
│ │
│ ✓Red → ✓Green → ●Refactor → ○Gates (2 cycles) │
│ Gates: ✓tests ✓security ✓lint ✗coverage │
├──────────────────────────────────────────────────────────┤
│ Claude Output │
│ ⚡ Writing src/auth/login.ts... │
│ Running npm test -- --reporter verbose │
│ Tests 42 passed (42) │
├──────────────────────────────────────────────────────────┤
│ [d] Dashboard [q] Quit ✓tests ✓sec ✓lint ✗cov │
╰──────────────────────────────────────────────────────────╯Press d to toggle the dashboard overlay with cost breakdown, coverage, security findings, and code quality metrics.
Human-in-the-Loop
When a task fails maxTaskFailures times (default 3), Forge pauses and shows an interactive prompt:
╭─ Task Failed (3x) ──────────────────────────────────────╮
│ │
│ Automated UI tests (integration_test package) │
│ Green phase failed: Process timed out (exit code 143) │
│ │
│ What would you like to do? │
│ │
│ ▸ Retry with guidance — provide a hint to help │
│ Skip for now — defer to later │
│ Skip permanently — won't retry │
│ Abort session — stop forge │
│ │
│ Use ↑↓ arrows and Enter to select │
╰──────────────────────────────────────────────────────────╯- Retry with guidance: Type a hint (e.g., "Use widget tests instead of integration tests"). Your guidance is injected into the next attempt's prompt.
- Skip for now (defer): The task moves to the back of the queue. Other tasks run first, then it retries with a fresh failure count.
- Skip permanently: The task won't be retried (current default behavior).
- Abort session: Stops Forge immediately.
In non-interactive mode (--no-tui), tasks are auto-skipped after maxTaskFailures (no prompt shown).
Configuration
.forge/forge.config.json:
{
"maxIterations": 50,
"maxCallsPerHour": 100,
"timeoutMinutes": 15,
"tdd": {
"enabled": true,
"requireFailingTestFirst": true,
"commitPerPhase": true
},
"coverage": {
"lineThreshold": 80,
"branchThreshold": 70
},
"security": {
"enabled": true,
"sast": true,
"dependencyAudit": true,
"secretScanning": true,
"blockOnSeverity": "high"
},
"agents": {
"team": ["architect", "implementer", "tester", "reviewer"],
"soloMode": false
}
}Environment Variable Overrides
| Variable | Effect |
|----------|--------|
| FORGE_MAX_ITERATIONS | Override max loop iterations |
| FORGE_MAX_CALLS_PER_HOUR | Override API rate limit |
| FORGE_TDD_ENABLED | Enable/disable TDD enforcement |
| FORGE_SECURITY_ENABLED | Enable/disable security scanning |
Architecture
src/
├── agents/ # Multi-agent role system (roles, handoff, teams)
├── commands/ # CLI commands (init, import, run, status, report)
├── commits/ # Conventional commit classification and planning
├── config/ # Zod-validated config with env var overrides
├── docs/ # Changelog and ADR generation
├── gates/ # Quality gate pipeline with auto-fix retry
├── logging/ # Pino structured logger with file transport
├── loop/ # Core engine, orchestrator, executor, circuit breaker
├── metrics/ # Code complexity and test ratio analysis
├── prd/ # PRD parsing, task graph (DAG), auto-decomposition
├── security/ # Secret detection, SAST, dependency audit
├── speckit/ # Spec-kit format parser and context injection
├── tdd/ # Red-Green-Refactor enforcement
├── tui/ # Ink live dashboard with cost tracking and overlays
├── cli.ts # CLI entry point
└── index.ts # Public APIDevelopment
npm install
npm test # Run tests
npm run test:coverage # Coverage report
npm run typecheck # TypeScript strict mode
npm run build # Build with tsupRequirements
- Node.js >= 20
- Claude Code CLI installed and authenticated
License
MIT
