salazar-cli
v1.0.0
Autonomous coding orchestrator. The tool that builds itself.
Salazar
The tool that builds itself.
An autonomous coding orchestrator that builds software end-to-end from a markdown spec — no human code required. Planner/generator/evaluator agent loop using Claude via @anthropic-ai/claude-agent-sdk, with a terminal UI and contract-gated agent handoffs.
Named after the serpent — an ouroboros that eats its own tail. We pointed it at a spec for its own CLI and it built a 1,141-test terminal app in 4 hours. Then we pointed it at its own codebase in brownfield mode to add features to itself.
Proven output: mini-jwt — 38/38 features, 76 tests, 96% coverage, built in 70 minutes for $9.27.
The meta part: The CLI itself was built by Salazar. We wrote a spec for an Ink TUI, pointed Salazar at it, and walked away. 4 hours later: 63/63 features, 1,141 tests, fully functional CLI. The tool built its own interface.
Architecture
┌─────────────────────────────────────────────────────────┐
│                     CLI / TUI (Ink)                     │
│   Onboarding, live progress, session history            │
│   Direct engine integration, no subprocess IPC          │
└──────────────────────┬──────────────────────────────────┘
                       │ imports
┌──────────────────────▼──────────────────────────────────┐
│                   ENGINE (TypeScript)                   │
│                                                         │
│  ┌──────────┐    ┌───────────┐    ┌───────────────────┐ │
│  │ Planner  │───▶│ Generator │───▶│ Hard Validators   │ │
│  │          │    │           │    │ (tsc, eslint,     │ │
│  │ Reads    │    │ Builds 1  │    │  build, test)     │ │
│  │ spec,    │    │ feature   │    │                   │ │
│  │ creates  │    │ per       │    │ Must all pass     │ │
│  │ feature  │    │ session   │    │ before proceeding │ │
│  │ list     │    │           │    └─────────┬─────────┘ │
│  └──┬───────┘    └─────▲─────┘              │           │
│     │                  │                    ▼           │
│     │ Zod              │           ┌───────────────┐    │
│     │ contract         │           │ Evaluator     │    │
│     │ gate             └───────────│ (adversarial  │    │
│     │                    feedback  │  reviewer)    │    │
│     ▼                    loop      │               │    │
│  ┌──────────┐            if < 7.0  │ Zod contract  │    │
│  │ Schema   │                      │ gate on       │    │
│  │ validate │                      │ output        │    │
│  │ + retry  │                      └───────────────┘    │
│  └──────────┘                                           │
│                                                         │
│  EventEmitter ──▶ TUI subscribes directly               │
│  SQLite ──▶ session history persisted locally           │
└─────────────────────────────────────────────────────────┘
Quick Start
# Install
npm i -g salazar-cli
# Or run directly
npx salazar-cli
# Build from a spec
salazar run my-app-spec.md
# With model overrides
salazar run spec.md --model claude-sonnet-4-6 --model-evaluator claude-opus-4-6
# With custom output directory
salazar run spec.md --output-dir ./my-project
How It Works
The Loop
Planner reads a product spec and decomposes it into features with BDD scenarios. Output validated against Zod schema — retries if the feature list doesn't match the contract.
Generator picks the next incomplete feature, gets a fresh Claude Code session, implements it with TDD, writes tests, and updates the feature list. One feature per session — clean context every time.
Hard Validators run automatically: TypeScript type checking, ESLint, build, test suite. The generator cannot skip these — if they fail, it gets the error output and retries (max 3 attempts).
Evaluator (moderate/complex features only) is a separate Claude Code session with an adversarial system prompt. Scores on spec compliance (35%), code quality (25%), security (25%), usability (15%). Minimum 7.0/10 to pass. Output validated against Zod schema — retries internally up to 3 times if the evaluation can't be parsed.
Setup and simple features skip the evaluator — validators are sufficient. This cuts ~50% of total runtime.
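The loop for a single feature can be sketched as follows. This is a minimal illustration, not the orchestrator's actual code: the type names, function names, and the injected `generate`/`validate`/`evaluate` callbacks are all hypothetical. Only the numbers (3 attempts, the 7.0 threshold, the rubric weights) and the skip rule come from the description above. The diagram routes an evaluator rejection back to the generator; the retry budget for that path is not stated, so the sketch simply reports it to the caller.

```typescript
// Hypothetical names throughout; the constants mirror the README text.
type Complexity = "setup" | "simple" | "moderate" | "complex";

interface RubricScores {
  specCompliance: number; // each criterion scored 0-10
  codeQuality: number;
  security: number;
  usability: number;
}

interface ValidatorResult {
  ok: boolean;
  output: string; // tsc / eslint / build / test output, fed back on failure
}

type Outcome = "passed" | "validators-failed" | "rejected-by-evaluator";

const MAX_ATTEMPTS = 3;
const PASS_THRESHOLD = 7.0;

// Weights: 35% spec compliance, 25% code quality, 25% security, 15% usability.
// Integer weights avoid floating-point drift right at the 7.0 boundary.
function weightedScore(s: RubricScores): number {
  return (
    (35 * s.specCompliance +
      25 * s.codeQuality +
      25 * s.security +
      15 * s.usability) / 100
  );
}

function buildFeature(
  complexity: Complexity,
  generate: (feedback: string | null) => void,
  validate: () => ValidatorResult,
  evaluate: () => RubricScores,
): Outcome {
  let feedback: string | null = null;
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    generate(feedback); // sees the previous validator failure, if any
    const result = validate();
    if (!result.ok) {
      feedback = result.output;
      continue; // hard gate failed: retry the generator
    }
    // Setup and simple features skip the evaluator entirely.
    if (complexity === "setup" || complexity === "simple") return "passed";
    return weightedScore(evaluate()) >= PASS_THRESHOLD
      ? "passed"
      : "rejected-by-evaluator";
  }
  return "validators-failed";
}
```

With scores of 8, 7, 6, and 9 for the four criteria, `weightedScore` returns 7.4, which clears the 7.0 bar even though security alone would not.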
Contract-Gated Handoffs
Every agent-to-agent transition is validated by a Zod schema:
| Handoff | Contract | On Failure |
|---|---|---|
| Planner → Orchestrator | FeatureListSchema | Retry planner with schema error |
| Generator → Validators | Exit code + test output | Retry generator with failure output |
| Evaluator → Orchestrator | EvalOutputSchema | Retry evaluator session (up to 3x) |
Agents write what they want. Contracts enforce what we need. No prescriptive prompts — mechanical validation gates.
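A contract gate of this kind can be sketched without any dependencies. Salazar uses Zod schemas for this; the hand-written `parseFeatureList` below only stands in for a schema so the example runs on its own. The `Feature` fields, `ParseResult`, and `runWithContract` are hypothetical names chosen for illustration, not the contents of `contracts.ts`.

```typescript
// Stand-in for a Zod schema: returns typed data or a list of readable issues.
type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; issues: string[] };

// Hypothetical feature shape; the real FeatureListSchema is not shown in the README.
interface Feature {
  id: string;
  description: string;
  passes: boolean;
}

function parseFeatureList(raw: string): ParseResult<Feature[]> {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return { success: false, issues: ["output is not valid JSON"] };
  }
  if (!Array.isArray(value)) {
    return { success: false, issues: ["expected a JSON array of features"] };
  }
  const issues: string[] = [];
  value.forEach((item, i) => {
    const f = item as Partial<Feature> | null;
    if (typeof f?.id !== "string") issues.push(`[${i}].id must be a string`);
    if (typeof f?.description !== "string") issues.push(`[${i}].description must be a string`);
    if (typeof f?.passes !== "boolean") issues.push(`[${i}].passes must be a boolean`);
  });
  return issues.length > 0
    ? { success: false, issues }
    : { success: true, data: value as Feature[] };
}

// Run an agent until its output satisfies the contract, feeding the
// validation issues from the failed attempt into the next one.
function runWithContract<T>(
  agent: (previousIssues: string[]) => string,
  parse: (raw: string) => ParseResult<T>,
  maxAttempts = 3,
): T {
  let issues: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = parse(agent(issues));
    if (result.success) return result.data;
    issues = result.issues;
  }
  throw new Error(
    `contract not satisfied after ${maxAttempts} attempts: ${issues.join("; ")}`,
  );
}
```

The agent is free to produce any text it likes; only output that parses cleanly crosses the handoff, and the error list doubles as the retry prompt.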
Complexity Routing
| Complexity | Validators | Evaluator | Typical Time |
|---|---|---|---|
| setup | All gates | Skipped | ~2-3 min |
| simple | All gates | Skipped | ~3-4 min |
| moderate | All gates | Full review | ~6-8 min |
| complex | All gates | Full review | ~8-12 min |
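The routing table maps naturally onto a typed lookup. The sketch below is one way to encode it, using the table's own values; `Route`, `ROUTES`, and `estimateMinutes` are hypothetical names, and summing the "typical time" ranges is only a rough estimate, not a figure the tool reports.

```typescript
type Complexity = "setup" | "simple" | "moderate" | "complex";

interface Route {
  runValidators: boolean;
  runEvaluator: boolean;
  typicalMinutes: [number, number]; // [low, high] from the table
}

// One entry per row of the routing table.
const ROUTES: Record<Complexity, Route> = {
  setup:    { runValidators: true, runEvaluator: false, typicalMinutes: [2, 3] },
  simple:   { runValidators: true, runEvaluator: false, typicalMinutes: [3, 4] },
  moderate: { runValidators: true, runEvaluator: true,  typicalMinutes: [6, 8] },
  complex:  { runValidators: true, runEvaluator: true,  typicalMinutes: [8, 12] },
};

// Rough runtime range for a planned feature list, summing the typical times.
function estimateMinutes(features: Complexity[]): [number, number] {
  let low = 0;
  let high = 0;
  for (const c of features) {
    low += ROUTES[c].typicalMinutes[0];
    high += ROUTES[c].typicalMinutes[1];
  }
  return [low, high];
}
```

For one feature of each complexity, `estimateMinutes(["setup", "simple", "moderate", "complex"])` gives `[19, 27]`.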
Model Tiers
# --model: fast, good at coding
# --model-evaluator: deep, good at critique
salazar run spec.md \
  --model claude-sonnet-4-6 \
  --model-evaluator claude-opus-4-6
Project Structure
salazar/
├── src/
│   ├── index.ts                # CLI entry point (meow)
│   ├── engine/
│   │   ├── orchestrator.ts     # Core loop: planner → generator → evaluator
│   │   ├── contracts.ts        # Zod schemas for agent handoff validation
│   │   ├── agents/
│   │   │   ├── planner.ts      # Spec → feature_list.json
│   │   │   ├── generator.ts    # TDD feature implementation
│   │   │   └── evaluator.ts    # Adversarial scoring rubric
│   │   ├── client.ts           # Agent SDK options factory
│   │   ├── validators.ts       # Hard gates: tsc, eslint, build, test
│   │   ├── progress.ts         # feature_list.json tracking
│   │   ├── storage.ts          # SQLite via better-sqlite3
│   │   └── security.ts         # Bash command allowlist
│   ├── tui/
│   │   ├── app.tsx             # Ink TUI
│   │   └── hooks/
│   │       └── use-engine.ts   # Direct engine event subscription
│   └── lib/
│       ├── types.ts            # All shared interfaces
│       ├── events.ts           # Typed EventEmitter
│       ├── config.ts           # ~/.salazar/config.json
│       └── paths.ts            # Runtime directories
├── prompts/                    # Agent system prompts
│   ├── planner.md
│   ├── generator.md
│   └── evaluator.md
├── package.json
└── tsconfig.json
The Meta Story
The harness built its own CLI. Here's what happened:
- We wrote a spec for an Ink terminal UI
- Pointed Salazar at it: salazar run tui_spec.md
- Walked away
- 4 hours later: 63/63 features, 1,141 tests, fully functional CLI
Build Stats
| Metric | mini-jwt (proof) | CLI (meta) | Counter (smoke test) |
|---|---|---|---|
| Features | 38/38 | 63/63 | 15/15 |
| Tests | 76 | 1,141 | 66 |
| Coverage | 96% | - | - |
| Time | 70 min | ~4 hours | 33 min |
| Cost | $9.27 | ~$30 | $9.79 |
| Human code | 0 lines | 0 lines | 0 lines |
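For a rough comparison across the three builds, per-feature figures can be derived from the table. The sketch below does only that arithmetic; `BuildStats` and `summarize` are hypothetical names. The CLI row uses the approximate values as given (about 4 hours taken as 240 minutes, about $30), so its ratios are estimates.

```typescript
interface BuildStats {
  name: string;
  features: number;
  tests: number;
  minutes: number;
  costUsd: number;
}

// Values copied from the table; the CLI row is approximate ("~4 hours", "~$30").
const builds: BuildStats[] = [
  { name: "mini-jwt", features: 38, tests: 76, minutes: 70, costUsd: 9.27 },
  { name: "CLI", features: 63, tests: 1141, minutes: 240, costUsd: 30 },
  { name: "Counter", features: 15, tests: 66, minutes: 33, costUsd: 9.79 },
];

// Divide each total by the number of completed features.
function perFeature(b: BuildStats) {
  return {
    costUsd: b.costUsd / b.features,
    minutes: b.minutes / b.features,
    tests: b.tests / b.features,
  };
}

function summarize(b: BuildStats): string {
  const r = perFeature(b);
  return (
    `${b.name}: $${r.costUsd.toFixed(2)}/feature, ` +
    `${r.minutes.toFixed(1)} min/feature, ${r.tests.toFixed(1)} tests/feature`
  );
}

const report: string[] = builds.map(summarize);
```

For mini-jwt this works out to roughly $0.24 and 1.8 minutes per feature, with 2 tests per feature.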
CLI Commands
salazar # Launch TUI
salazar run <spec.md> # Build from spec (headless)
salazar run <spec.md> --output-dir ./out # Custom output directory
salazar config # Configure models
salazar --help # Full help text
How It's Built
Salazar is a single TypeScript npm package. The engine spawns Claude Code sessions programmatically via @anthropic-ai/claude-agent-sdk. Each agent (planner, generator, evaluator) runs in its own Claude Code session with a focused system prompt, sandboxed tools, and a cost cap.
No raw API calls. No API keys needed. Uses your Claude Code authentication.
References
- Effective Harnesses for Long-Running Agents — Anthropic Engineering
- @anthropic-ai/claude-agent-sdk — TypeScript SDK for programmatic Claude Code sessions
- Ink — React for CLIs
- mini-jwt — First proof-of-concept output
