# stw
Spec-to-Workers (pronounced "stew") — a CLI orchestrator that turns markdown specs into reviewed, merge-ready code changes using AI agents.
You write a spec (or describe an idea), STW decomposes it into isolated tasks, routes each to an AI worker, validates the output, and produces a merge request with all changes.
## How it works
```
spec.md ──> planner ──> task graph ──> workers ──> review ──> MR

T1 ──────> worker ──> validate ──┐
T2 ──────> worker ──> validate ──┤
T3 (dep: T1) ─> worker ─> validate ┘
```

- Ingest — validates your spec and creates a run
- Plan — an AI planner decomposes the spec into a task graph with scopes, dependencies, and acceptance checks
- Run — workers execute tasks in dependency order, each in an isolated git worktree
- Review — mechanical checks (tests, typecheck, scope) + optional LLM review
- Retry — failed tasks get diagnostic context and retry with tier escalation
- Merge — completed work is pushed and synced as a GitLab/GitHub MR
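The plan step's output can be pictured as plain data plus a dependency-order walk. A minimal TypeScript sketch of the idea (the `PlannedTask` shape and `executionOrder` helper are illustrative, not STW's actual schema):

```typescript
// Illustrative shape of a planned task (not STW's real schema).
interface PlannedTask {
  id: string;
  deps: string[];                          // task ids that must complete first
  tier: "cheap" | "medium" | "strong";     // model tier assigned by the planner
  scope: string[];                         // files the worker may touch
}

// The three-task example from the diagram above.
const tasks: PlannedTask[] = [
  { id: "T1", deps: [],     tier: "medium", scope: ["src/middleware/rate-limit.ts"] },
  { id: "T2", deps: [],     tier: "cheap",  scope: ["src/routes/api.ts"] },
  { id: "T3", deps: ["T1"], tier: "medium", scope: ["src/index.ts"] },
];

// Kahn-style topological order: a task becomes ready only once all
// of its dependencies have completed.
function executionOrder(tasks: PlannedTask[]): string[] {
  const done = new Set<string>();
  const order: string[] = [];
  while (order.length < tasks.length) {
    const ready = tasks.filter(
      (t) => !done.has(t.id) && t.deps.every((d) => done.has(d)),
    );
    if (ready.length === 0) throw new Error("dependency cycle in task graph");
    for (const t of ready) {
      done.add(t.id);
      order.push(t.id);
    }
  }
  return order;
}
```

Tasks with no unmet dependencies (T1 and T2 here) are eligible to run in parallel; T3 only becomes ready after T1 completes.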
## Quick start
### 1. Install

```bash
npm install -g stw
```

### 2. Prerequisites
STW needs an API provider for AI models and (optionally) an agentic backend for complex tasks.
API provider (required): An OpenAI-compatible API endpoint. OpenRouter works out of the box and gives access to multiple model families.
Agentic backend (optional, recommended): Either OpenCode or Claude Code for tasks that need iterative exploration (read-edit-test loops).
### 3. Initialize

```bash
cd your-repo
stw init
```

This creates `.stw/config.yml` with model routing, budget limits, and backend settings. Default preset is `minimax` (balanced cost/quality). Other presets: `glm5` (budget), `anthropic` (premium), `mixed` (multi-provider).
Set your API key:
```bash
export OPENROUTER_API_KEY="sk-or-..."
```

### 4. Check readiness

```bash
stw doctor
```

Verifies: API keys are set, agentic backends are installed, config is valid.
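A readiness check like this boils down to env-var and binary lookups. A hypothetical sketch of those two checks (`checkEnv` and `checkBinary` are invented names for illustration, not stw's code; the binary probe assumes a POSIX shell):

```typescript
import { execSync } from "node:child_process";

// Is the environment variable set to a non-empty value?
function checkEnv(name: string): boolean {
  return Boolean(process.env[name]);
}

// Is the command resolvable on PATH? (POSIX `command -v`.)
function checkBinary(cmd: string): boolean {
  try {
    execSync(`command -v ${cmd}`, { stdio: "ignore" });
    return true;
  } catch {
    return false;
  }
}
```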
### 5. Create a spec
Write a markdown spec describing what you want built:
```markdown
# Spec: Add rate limiting to API endpoints

## Objective
Add configurable rate limiting to all public API endpoints.

## Requirements

### T1: Rate limiter middleware
Create a rate limiting middleware in src/middleware/rate-limit.ts.
- Accept requests-per-minute and burst-size as parameters
- Use a sliding window algorithm
- Return 429 with Retry-After header when exceeded

### T2: Apply middleware to routes
Wire the rate limiter into src/routes/api.ts for all public endpoints.
- Default: 60 requests/minute, burst of 10
- Make limits configurable via environment variables

## Constraints
- No external dependencies (use in-memory store)
- Must not break existing tests
```

Save it as `specs/rate-limiting.md`.
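For reference, T1's sliding-window limiter could be sketched like this. This is a minimal in-memory illustration of what the spec asks for, not code STW produced; the class name, the treatment of burst size as a per-second cap, and the API are all assumptions:

```typescript
// Minimal in-memory sliding-window rate limiter (illustrative sketch).
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>(); // key -> request timestamps (ms)

  constructor(
    private requestsPerMinute = 60,
    private burstSize = 10, // assumed here to mean max requests per second
  ) {}

  /** Returns true if the request is allowed, false if it should get a 429. */
  allow(key: string, now = Date.now()): boolean {
    // Keep only timestamps inside the 60-second sliding window.
    const windowStart = now - 60_000;
    const recent = (this.hits.get(key) ?? []).filter((t) => t > windowStart);
    const lastSecond = recent.filter((t) => t > now - 1_000).length;
    if (recent.length >= this.requestsPerMinute || lastSecond >= this.burstSize) {
      this.hits.set(key, recent);
      return false; // over the window limit or the burst cap
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```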
### 6. Run

```bash
# Ingest the spec and create a run
stw ingest specs/rate-limiting.md

# Auto-plan tasks (or provide your own task graph)
stw plan-auto <run-id>

# Execute all tasks
stw run <run-id>

# Check progress
stw progress <run-id>

# Push to GitLab/GitHub
stw merge <run-id>
```

Or use the shortcut that does ingest + plan + run in one command:

```bash
stw go "add rate limiting to API endpoints" --spec specs/rate-limiting.md
```

Or skip the spec entirely — just describe what you want:

```bash
stw go "add dark mode toggle to the settings page"
```

STW generates a structured spec from your rough idea, plans tasks, and executes them.
## Configuration

STW uses `.stw/config.yml` in your repo root. Key sections:
```yaml
# API providers — where to send model requests
providers:
  openrouter:
    api_key_env: OPENROUTER_API_KEY   # env var name (not the key itself)
    base_url: https://openrouter.ai/api/v1
    models:
      strong: anthropic/claude-sonnet-4   # for planning, review, complex tasks
      medium: openai/gpt-4.1-mini         # for moderate tasks
      cheap: google/gemini-2.5-flash      # for mechanical/simple tasks

# How tasks are routed to providers
routing_policy:
  require_plan_approval: true   # require human approval before execution
  provider_preferences:
    strong: [openrouter]
    medium: [openrouter]
    cheap: [openrouter]

# Execution defaults
defaults:
  max_retries: 2                # retries per task before escalation
  max_concurrent_workers: 2     # parallel task execution
  task_timeout_minutes: 30
  api_timeout_seconds: 120
  execution_mode: agentic       # 'agentic' or 'conservative'

# Agentic backend config (for iterative task execution)
agentic:
  default_backend: opencode     # 'opencode' or 'claude-code'
  backends:
    opencode:
      command: opencode         # path to opencode binary
    claude-code:
      command: claude           # path to claude binary

# Optional: per-run budget limits
budget:
  max_run_usd: 5.00
  warn_at_usd: 3.00
```

### Model tiers
STW routes tasks to models based on complexity:
| Tier | Use case | Example models |
|------|----------|----------------|
| cheap | Mechanical refactors, simple edits | gemini-2.5-flash, minimax-m2.5 |
| medium | Moderate reasoning, test writing | gpt-4.1-mini, minimax-m2.7 |
| strong | Planning, review, complex logic | claude-sonnet-4, kimi-k2.5 |
The planner assigns tiers automatically. If a task fails, STW escalates to the next tier and retries.
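The retry-with-escalation loop can be sketched as follows (illustrative only, not STW's internals; `runWithEscalation` and its signature are invented for this example):

```typescript
// Tier ladder: each escalation moves one step to the right.
const TIERS = ["cheap", "medium", "strong"] as const;
type Tier = (typeof TIERS)[number];

// Run a task, retrying up to `maxRetries` extra times per tier
// (cf. defaults.max_retries), then escalating to the next tier.
async function runWithEscalation(
  taskId: string,
  startTier: Tier,
  maxRetries: number,
  attempt: (taskId: string, tier: Tier) => Promise<boolean>,
): Promise<Tier | "escalated"> {
  for (let i = TIERS.indexOf(startTier); i < TIERS.length; i++) {
    for (let r = 0; r <= maxRetries; r++) {
      if (await attempt(taskId, TIERS[i])) return TIERS[i]; // succeeded here
    }
  }
  return "escalated"; // all tiers exhausted: flag for human review
}
```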
## Architecture

```
src/
  cli.ts               # Command-line interface (commander.js)
  planner.ts           # AI planner — decomposes specs into task graphs
  planner-prompt.ts    # Planner prompt construction + import graph
  task-executor.ts     # Task execution pipeline (worker → review → retry)
  worker-runner.ts     # Worker dispatch (single-call API or agentic subprocess)
  task-review.ts       # Mechanical review + LLM review phase
  validation.ts        # Acceptance check execution (tests, typecheck)
  router.ts            # Model tier routing with escalation
  provider.ts          # OpenAI-compatible API client
  agentic-runtime.ts   # Agentic backend subprocess management
  parallel.ts          # Parallel task execution + merge-back
  scope.ts             # Scope enforcement (which files a task can touch)
  cost.ts              # Cost tracking and budget enforcement
  config.ts            # Config loading and validation
  git-utils.ts         # Git operations (worktrees, diffs, merges)
tests/                 # 1100+ tests (vitest)
specs/                 # Spec files (input)
.stw/                  # Runtime state (runs, tasks, artifacts)
  config.yml           # Project configuration
  runs/                # Per-run state and artifacts
    <run-id>/
      manifest.json    # Run metadata
      tasks/
        T1/            # Per-task artifacts
          status.json
          diff.patch
          worker_prompt.md
          response.json
```

## CLI commands
| Command | Description |
|---------|-------------|
| stw init | Initialize repo with .stw/config.yml |
| stw idea <description> | Generate a spec from a rough idea |
| stw ingest <spec> | Validate spec and create a run |
| stw plan-auto <run-id> | AI-generate task graph from spec |
| stw run <run-id> | Execute all tasks in dependency order |
| stw go <spec> | Ingest + plan + run in one command |
| stw progress <run-id> | Show run progress and task status |
| stw status <run-id> | Detailed run/task status |
| stw retry <run-id> <task-id> | Retry a failed/escalated task |
| stw resume <run-id> | Resume from last checkpoint |
| stw stop <run-id> | Pause a running run |
| stw continue <run-id> | Resume a paused run |
| stw merge <run-id> | Push and create MR |
| stw doctor | Check local readiness |
| stw cleanup [run-id] | Clean up old runs/worktrees |
| stw logs <run-id> | View structured logs |
## How tasks execute
Each task runs in an isolated git worktree branched from the run branch:
- Worktree created — fresh copy of the codebase
- Dependencies installed — `npm ci` if `package-lock.json` exists
- Worker executes — either a single API call (fast, cheap) or an agentic session (iterative, higher quality)
- Scope enforced — worker output is checked against declared scope files
- Acceptance checks run — tests, typecheck, linter (configured per task)
- LLM review (optional) — a reviewer model checks the diff
- On failure — diagnostic context is captured, task retries with the previous attempt's errors
- On exhausted retries — tier escalates (cheap → medium → strong), or task is marked escalated for human review
- On success — changes merge back to the run branch
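The review decision at the end of that loop can be condensed into a tiny function (an illustrative sketch, not STW's implementation; `reviewOutcome` and its fields are invented names):

```typescript
type Outcome = "merged" | "retry" | "escalate";

// Combine the scope check and acceptance checks into a single verdict.
function reviewOutcome(opts: {
  changedFiles: string[]; // files the worker actually modified
  scope: string[];        // files the task is allowed to touch
  checksPassed: boolean;  // tests / typecheck / lint result
  retriesLeft: number;
}): Outcome {
  const allowed = new Set(opts.scope);
  const outOfScope = opts.changedFiles.filter((f) => !allowed.has(f));
  if (outOfScope.length === 0 && opts.checksPassed) return "merged";
  // Failure: retry with diagnostic context if budget remains,
  // otherwise escalate (next tier, or human review).
  return opts.retriesLeft > 0 ? "retry" : "escalate";
}
```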
## Troubleshooting
**"No models configured for requested tier"** — Your `.stw/config.yml` is missing model entries for the tier the planner assigned. Add models to your provider's `models` section or adjust the preset.

**"Planner received null content"** — The planner model returned empty output. This happens with some reasoning models when `max_tokens` is too low. Try a different model or increase `api_timeout_seconds`.

**Task keeps escalating** — Check `.stw/runs/<run-id>/tasks/<task-id>/diagnostic.json` for the failure reason. Common causes: acceptance checks reference non-existent files, scope is too narrow (missing caller files), or the spec is ambiguous.

**"Agentic backend failed"** — The agentic backend (opencode/claude) crashed or timed out. Run `stw doctor` to verify it's installed. Check `agentic_transcript.txt` in the task directory for details.

**Budget exceeded** — Run `stw progress <run-id>` to see the cost breakdown. Adjust `budget.max_run_usd` in config or use `stw continue <run-id>` to resume with the remaining budget.
## Development
```bash
git clone https://gitlab.com/mygelknightz/mamarracho.git stw
cd stw
npm install
npm test        # run all tests
npm run lint    # eslint + tsc
npm run build   # compile TypeScript
```

## License
MIT
