# stw
Spec-to-Workers (pronounced "stew") — a CLI orchestrator that turns markdown specs into reviewed, merge-ready code changes using AI agents.
You write a spec (or describe an idea), STW decomposes it into isolated tasks, routes each to an AI worker, validates the output, and produces a merge request with all changes.
## How it works
```
spec.md ──> planner ──> task graph ──> workers ──> review ──> MR

T1 ──────> worker ──> validate ──┐
T2 ──────> worker ──> validate ──┤
T3 (dep: T1) ─> worker ─> validate ┘
```

- Ingest — validates your spec and creates a run
- Plan — an AI planner decomposes the spec into a task graph with scopes, dependencies, and acceptance checks
- Run — workers execute tasks in dependency order, each in an isolated git worktree
- Review — mechanical checks (tests, typecheck, scope) + optional LLM review
- Retry — failed tasks get diagnostic context and retry with tier escalation
- Merge — completed work is pushed and synced as a GitLab/GitHub MR
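The plan step's output can be pictured as plain data plus a dependency-order walk. A minimal TypeScript sketch of the idea (the `PlannedTask` shape and `executionOrder` helper are illustrative, not STW's actual schema):

```typescript
// Illustrative shape of a planned task (not STW's real schema).
interface PlannedTask {
  id: string;
  deps: string[];                          // task ids that must complete first
  tier: "cheap" | "medium" | "strong";     // model tier assigned by the planner
  scope: string[];                         // files the worker may touch
}

// The three-task example from the diagram above.
const tasks: PlannedTask[] = [
  { id: "T1", deps: [],     tier: "medium", scope: ["src/middleware/rate-limit.ts"] },
  { id: "T2", deps: [],     tier: "cheap",  scope: ["src/routes/api.ts"] },
  { id: "T3", deps: ["T1"], tier: "medium", scope: ["src/index.ts"] },
];

// Kahn-style topological order: a task becomes ready only once all
// of its dependencies have completed.
function executionOrder(tasks: PlannedTask[]): string[] {
  const done = new Set<string>();
  const order: string[] = [];
  while (order.length < tasks.length) {
    const ready = tasks.filter(
      (t) => !done.has(t.id) && t.deps.every((d) => done.has(d)),
    );
    if (ready.length === 0) throw new Error("dependency cycle in task graph");
    for (const t of ready) {
      done.add(t.id);
      order.push(t.id);
    }
  }
  return order;
}
```

Tasks with no unmet dependencies (T1 and T2 here) are eligible to run in parallel; T3 only becomes ready after T1 completes.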
## Quick start
### 1. Install

```bash
npm install -g stw
```

### 2. Prerequisites
STW needs an API provider for AI models and (optionally) an agentic backend for complex tasks.
API provider (required): An OpenAI-compatible API endpoint. OpenRouter works out of the box and gives access to multiple model families.
Agentic backend (optional, recommended): Either OpenCode or Claude Code for tasks that need iterative exploration (read-edit-test loops).
### 3. Initialize

```bash
cd your-repo
stw init
```

This creates `.stw/config.yml` with model routing, budget limits, and backend settings. Default preset is `minimax` (balanced cost/quality). Other presets: `glm5` (budget), `anthropic` (premium), `mixed` (multi-provider).
Set your API key:
```bash
export OPENROUTER_API_KEY="sk-or-..."
```

### 4. Check readiness

```bash
stw doctor
```

Verifies: API keys are set, agentic backends are installed, config is valid.
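A readiness check like this boils down to env-var and binary lookups. A hypothetical sketch of those two checks (`checkEnv` and `checkBinary` are invented names for illustration, not stw's code; the binary probe assumes a POSIX shell):

```typescript
import { execSync } from "node:child_process";

// Is the environment variable set to a non-empty value?
function checkEnv(name: string): boolean {
  return Boolean(process.env[name]);
}

// Is the command resolvable on PATH? (POSIX `command -v`.)
function checkBinary(cmd: string): boolean {
  try {
    execSync(`command -v ${cmd}`, { stdio: "ignore" });
    return true;
  } catch {
    return false;
  }
}
```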
### 5. Create a spec
Write a markdown spec describing what you want built:
```markdown
# Spec: Add rate limiting to API endpoints

## Objective
Add configurable rate limiting to all public API endpoints.

## Requirements

### T1: Rate limiter middleware
Create a rate limiting middleware in src/middleware/rate-limit.ts.
- Accept requests-per-minute and burst-size as parameters
- Use a sliding window algorithm
- Return 429 with Retry-After header when exceeded

### T2: Apply middleware to routes
Wire the rate limiter into src/routes/api.ts for all public endpoints.
- Default: 60 requests/minute, burst of 10
- Make limits configurable via environment variables

## Constraints
- No external dependencies (use in-memory store)
- Must not break existing tests
```

Save it as `specs/rate-limiting.md`.
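For reference, T1's sliding-window limiter could be sketched like this. This is a minimal in-memory illustration of what the spec asks for, not code STW produced; the class name, the treatment of burst size as a per-second cap, and the API are all assumptions:

```typescript
// Minimal in-memory sliding-window rate limiter (illustrative sketch).
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>(); // key -> request timestamps (ms)

  constructor(
    private requestsPerMinute = 60,
    private burstSize = 10, // assumed here to mean max requests per second
  ) {}

  /** Returns true if the request is allowed, false if it should get a 429. */
  allow(key: string, now = Date.now()): boolean {
    // Keep only timestamps inside the 60-second sliding window.
    const windowStart = now - 60_000;
    const recent = (this.hits.get(key) ?? []).filter((t) => t > windowStart);
    const lastSecond = recent.filter((t) => t > now - 1_000).length;
    if (recent.length >= this.requestsPerMinute || lastSecond >= this.burstSize) {
      this.hits.set(key, recent);
      return false; // over the window limit or the burst cap
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```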
### 6. Run

```bash
# Ingest the spec and create a run
stw ingest specs/rate-limiting.md

# Auto-plan tasks (or provide your own task graph)
stw plan-auto <run-id>

# Execute all tasks
stw run <run-id>

# Check progress
stw progress <run-id>

# Push to GitLab/GitHub
stw merge <run-id>
```

Or use the shortcut that does ingest + plan + run in one command:

```bash
stw go "add rate limiting to API endpoints" --spec specs/rate-limiting.md
```

Or skip the spec entirely — just describe what you want:

```bash
stw go "add dark mode toggle to the settings page"
```

STW generates a structured spec from your rough idea, plans tasks, and executes them.
## Configuration

STW uses `.stw/config.yml` in your repo root. Key sections:
```yaml
# API providers — where to send model requests
providers:
  openrouter:
    api_key_env: OPENROUTER_API_KEY   # env var name (not the key itself)
    base_url: https://openrouter.ai/api/v1
    models:
      strong: anthropic/claude-sonnet-4   # for planning, review, complex tasks
      medium: openai/gpt-4.1-mini         # for moderate tasks
      cheap: google/gemini-2.5-flash      # for mechanical/simple tasks

# How tasks are routed to providers
routing_policy:
  require_plan_approval: true   # require human approval before execution
  provider_preferences:
    strong: [openrouter]
    medium: [openrouter]
    cheap: [openrouter]

# Execution defaults
defaults:
  max_retries: 2                # retries per task before escalation
  max_concurrent_workers: 2     # parallel task execution
  task_timeout_minutes: 30
  api_timeout_seconds: 120
  execution_mode: agentic       # 'agentic' or 'conservative'

# Agentic backend config (for iterative task execution)
agentic:
  default_backend: opencode     # 'opencode' or 'claude-code'
  backends:
    opencode:
      command: opencode         # path to opencode binary
    claude-code:
      command: claude           # path to claude binary

# Optional: per-run budget limits
budget:
  max_run_usd: 5.00
  warn_at_usd: 3.00
```

### Model tiers
STW routes tasks to models based on complexity:
| Tier | Use case | Example models |
|------|----------|----------------|
| cheap | Mechanical refactors, simple edits | gemini-2.5-flash, minimax-m2.5 |
| medium | Moderate reasoning, test writing | gpt-4.1-mini, minimax-m2.7 |
| strong | Planning, review, complex logic | claude-sonnet-4, kimi-k2.5 |
The planner assigns tiers automatically. If a task fails, STW escalates to the next tier and retries.
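The retry-with-escalation loop can be sketched as follows (illustrative only, not STW's internals; `runWithEscalation` and its signature are invented for this example):

```typescript
// Tier ladder: each escalation moves one step to the right.
const TIERS = ["cheap", "medium", "strong"] as const;
type Tier = (typeof TIERS)[number];

// Run a task, retrying up to `maxRetries` extra times per tier
// (cf. defaults.max_retries), then escalating to the next tier.
async function runWithEscalation(
  taskId: string,
  startTier: Tier,
  maxRetries: number,
  attempt: (taskId: string, tier: Tier) => Promise<boolean>,
): Promise<Tier | "escalated"> {
  for (let i = TIERS.indexOf(startTier); i < TIERS.length; i++) {
    for (let r = 0; r <= maxRetries; r++) {
      if (await attempt(taskId, TIERS[i])) return TIERS[i]; // succeeded here
    }
  }
  return "escalated"; // all tiers exhausted: flag for human review
}
```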
## Architecture

```
src/
  cli.ts               # Command-line interface (commander.js)
  planner.ts           # AI planner — decomposes specs into task graphs
  planner-prompt.ts    # Planner prompt construction + import graph
  task-executor.ts     # Task execution pipeline (worker → review → retry)
  worker-runner.ts     # Worker dispatch (single-call API or agentic subprocess)
  task-review.ts       # Mechanical review + LLM review phase
  validation.ts        # Acceptance check execution (tests, typecheck)
  router.ts            # Model tier routing with escalation
  provider.ts          # OpenAI-compatible API client
  agentic-runtime.ts   # Agentic backend subprocess management
  parallel.ts          # Parallel task execution + merge-back
  scope.ts             # Scope enforcement (which files a task can touch)
  cost.ts              # Cost tracking and budget enforcement
  config.ts            # Config loading and validation
  git-utils.ts         # Git operations (worktrees, diffs, merges)
tests/                 # 1100+ tests (vitest)
specs/                 # Spec files (input)
.stw/                  # Runtime state (runs, tasks, artifacts)
  config.yml           # Project configuration
  runs/                # Per-run state and artifacts
    <run-id>/
      manifest.json    # Run metadata
      tasks/
        T1/            # Per-task artifacts
          status.json
          diff.patch
          worker_prompt.md
          response.json
```

## CLI commands
| Command | Description |
|---------|-------------|
| stw init | Initialize repo with .stw/config.yml |
| stw idea <description> | Generate a spec from a rough idea |
| stw ingest <spec> | Validate spec and create a run |
| stw plan-auto <run-id> | AI-generate task graph from spec |
| stw run <run-id> | Execute all tasks in dependency order |
| stw go <spec> | Ingest + plan + run in one command |
| stw progress <run-id> | Show run progress and task status |
| stw status <run-id> | Detailed run/task status |
| stw retry <run-id> <task-id> | Retry a failed/escalated task |
| stw resume <run-id> | Resume from last checkpoint |
| stw stop <run-id> | Pause a running run |
| stw continue <run-id> | Resume a paused run |
| stw merge <run-id> | Push and create MR |
| stw doctor | Check local readiness |
| stw cleanup [run-id] | Clean up old runs/worktrees |
| stw logs <run-id> | View structured logs |
## How tasks execute
Each task runs in an isolated git worktree branched from the run branch:
- Worktree created — fresh copy of the codebase
- Dependencies installed — `npm ci` if `package-lock.json` exists
- Worker executes — either a single API call (fast, cheap) or an agentic session (iterative, higher quality)
- Scope enforced — worker output is checked against declared scope files
- Acceptance checks run — tests, typecheck, linter (configured per task)
- LLM review (optional) — a reviewer model checks the diff
- On failure — diagnostic context is captured, task retries with the previous attempt's errors
- On exhausted retries — tier escalates (cheap → medium → strong), or task is marked escalated for human review
- On success — changes merge back to the run branch
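The review decision at the end of that loop can be condensed into a tiny function (an illustrative sketch, not STW's implementation; `reviewOutcome` and its fields are invented names):

```typescript
type Outcome = "merged" | "retry" | "escalate";

// Combine the scope check and acceptance checks into a single verdict.
function reviewOutcome(opts: {
  changedFiles: string[]; // files the worker actually modified
  scope: string[];        // files the task is allowed to touch
  checksPassed: boolean;  // tests / typecheck / lint result
  retriesLeft: number;
}): Outcome {
  const allowed = new Set(opts.scope);
  const outOfScope = opts.changedFiles.filter((f) => !allowed.has(f));
  if (outOfScope.length === 0 && opts.checksPassed) return "merged";
  // Failure: retry with diagnostic context if budget remains,
  // otherwise escalate (next tier, or human review).
  return opts.retriesLeft > 0 ? "retry" : "escalate";
}
```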
## Troubleshooting
**"No models configured for requested tier"** — Your `.stw/config.yml` is missing model entries for the tier the planner assigned. Add models to your provider's `models` section or adjust the preset.

**"Planner received null content"** — The planner model returned empty output. This happens with some reasoning models when `max_tokens` is too low. Try a different model or increase `api_timeout_seconds`.

**Task keeps escalating** — Check `.stw/runs/<run-id>/tasks/<task-id>/diagnostic.json` for the failure reason. Common causes: acceptance checks reference non-existent files, scope is too narrow (missing caller files), or the spec is ambiguous.

**"Agentic backend failed"** — The agentic backend (opencode/claude) crashed or timed out. Run `stw doctor` to verify it's installed. Check `agentic_transcript.txt` in the task directory for details.

**Budget exceeded** — Run `stw progress <run-id>` to see the cost breakdown. Adjust `budget.max_run_usd` in config or use `stw continue <run-id>` to resume with the remaining budget.
## Development
```bash
git clone https://gitlab.com/mygelknightz/mamarracho.git stw
cd stw
npm install
npm test        # run all tests
npm run lint    # eslint + tsc
npm run build   # compile TypeScript
```

## License
MIT
