specwork v0.2.5
Specwork
Stop babysitting your AI agent.
A spec-driven workflow engine that keeps AI agents focused, verified, and honest — from first test to final commit.
You've been here before
You ask your AI agent to add authentication to your API. It starts strong — writes a few files, sets up a middleware. Then somewhere around step 4, it quietly modifies your database schema. By step 7, it's forgotten why it started. You scroll through 200 lines of changes and realize half of them are wrong.
You re-explain the goal. It apologizes. It drifts again.
The bigger the task, the worse this gets. Context fades. Tests get skipped "to save time." The agent leaves behind // TODO: implement this and marks the task complete. You end up doing more work managing the agent than you would have writing the code yourself.
This is the problem Specwork was built to solve.
The core idea: a workflow engine for AI agents
Specwork doesn't give the agent a plan and hope for the best. It runs a state machine — each unit of work is a node that transitions through a strict lifecycle. The agent never sees the full workflow. It receives one instruction at a time, embedded in the output of each CLI command.
stateDiagram-v2
[*] --> pending
pending --> in_progress : start
pending --> skipped : upstream failed
in_progress --> complete : verify passes
in_progress --> failed : verify fails
failed --> in_progress : retry (auto)
failed --> escalated : retries exhausted
escalated --> in_progress : manual retry
complete --> [*]
skipped --> [*]
escalated --> [*]

Every transition produces a next_action — a concrete instruction telling the agent exactly what to do next. The agent doesn't plan. It doesn't improvise. It follows next_action.
How next_action drives everything
When the agent runs any specwork command, the JSON response includes a next_action field. This is the engine's steering wheel. The agent reads it, executes it, and the cycle repeats.
┌──────────────────────────────────────────────────────────────────┐
│ │
│ Agent runs command ──► Engine returns next_action │
│ ▲ │ │
│ │ ▼ │
│ └──────── Agent executes next_action │
│ │
└──────────────────────────────────────────────────────────────────┘

Here's what that looks like in practice. The agent runs specwork go:
{
"status": "ready",
"ready": ["write-tests", "impl-types"],
"wave": 1,
"progress": { "complete": 1, "total": 6, "failed": 0 },
"next_action": {
"command": "team:spawn",
"description": "Spawn one teammate per ready node: write-tests, impl-types",
"context": "Add JWT authentication to the API"
}
}

The agent doesn't need memory of the overall plan. It reads command, sees "team:spawn", spawns the teammates. Done. When a teammate finishes, it runs verify:
{
"verdict": "PASS",
"next_action": {
"command": "subagent:spawn",
"description": "Spawn summarizer to write L0/L1/L2 context, then complete the node.",
"on_pass": "specwork node complete add-jwt-auth impl-types",
"on_fail": "specwork node fail add-jwt-auth impl-types --reason '<error>'"
}
}And when verification fails:
{
"verdict": "FAIL",
"checks": [
{ "type": "tests-pass", "status": "FAIL", "detail": "3 of 12 tests failing" },
{ "type": "no-deferred-work", "status": "FAIL", "detail": "Found TODO in src/auth.ts:42" }
],
"next_action": {
"command": "subagent:respawn",
"description": "1 retry remaining. Re-spawn with failure feedback.",
"context": "Add JWT authentication to the API"
}
}

Notice: every response carries context — the original goal, pulled from your description. At every state transition, the agent is reminded why it's doing what it's doing. The goal never fades.
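The resulting control loop is small enough to sketch. Here is a minimal illustration in TypeScript, assuming hypothetical NextAction/EngineResponse shapes that mirror the JSON above, with a run callback standing in for shelling out to the CLI with --json:

```typescript
// Hypothetical shapes mirroring the JSON responses shown above.
interface NextAction {
  command: string;
  description?: string;
  context?: string;
}

interface EngineResponse {
  status?: string;
  verdict?: "PASS" | "FAIL";
  next_action?: NextAction;
}

// The agent's whole job: run a command, read next_action, repeat.
// `run` stands in for invoking the specwork CLI and parsing its JSON.
function driveWorkflow(
  run: (command: string) => EngineResponse,
  firstCommand: string
): string[] {
  const executed: string[] = [];
  let response = run(firstCommand);
  while (response.next_action) {
    executed.push(response.next_action.command);
    response = run(response.next_action.command);
  }
  return executed;
}
```

The point of the sketch: the agent holds no plan. All workflow state lives in the engine, and the loop terminates the moment a response arrives without a next_action.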
Wave-based execution
Specwork models your change as a DAG (directed acyclic graph). The engine walks it in waves — batches of nodes whose dependencies are all satisfied, capped by max_concurrent (default: 5).
graph TD
S["write-tests<br/><small>opus · wave 1</small>"]:::done --> I1["impl-types<br/><small>sonnet · wave 2</small>"]:::active
S --> I2["impl-service<br/><small>sonnet · wave 2</small>"]:::active
I1 --> I3["impl-middleware<br/><small>sonnet · wave 3</small>"]:::pending
I2 --> I3
I3 --> V["verify-all<br/><small>haiku · wave 4</small>"]:::pending
classDef done fill:#166534,stroke:#4ADE80,color:#BBF7D0
classDef active fill:#1E40AF,stroke:#60A5FA,color:#BFDBFE
classDef pending fill:#374151,stroke:#9CA3AF,color:#D1D5DB

Each wave completes and commits before the next starts. This means:
- No agent conflicts — agents in the same wave work on distinct files, and the next wave sees a clean git state
- Natural review points — you can inspect the results after each wave before the engine continues
- Bounded cost — no unbounded parallelism; max_concurrent controls how many agents run at once
If a node fails and exhausts its retries, the engine cascades skip — all downstream nodes are marked skipped, so the agent doesn't waste time on work that can't succeed.
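Both behaviors fall out of simple rules over the DAG: a node is ready when all its parents are complete, and a skip propagates to every descendant. A sketch of the two (illustrative, not specwork's actual scheduler), where deps maps each node to its parents:

```typescript
// Compute execution waves over a dependency graph.
// deps maps node id -> ids of its parent nodes.
function computeWaves(
  deps: Record<string, string[]>,
  maxConcurrent = 5
): string[][] {
  const done = new Set<string>();
  const waves: string[][] = [];
  const all = Object.keys(deps);
  while (done.size < all.length) {
    // A node is ready when every parent has completed.
    const ready = all.filter(
      (n) => !done.has(n) && deps[n].every((p) => done.has(p))
    );
    if (ready.length === 0) throw new Error("cycle detected");
    const wave = ready.slice(0, maxConcurrent); // bounded cost
    waves.push(wave);
    wave.forEach((n) => done.add(n));
  }
  return waves;
}

// If a node fails permanently, everything downstream is skipped.
function cascadeSkip(deps: Record<string, string[]>, failed: string): string[] {
  const skipped = new Set<string>([failed]);
  let grew = true;
  while (grew) {
    grew = false;
    for (const [node, parents] of Object.entries(deps)) {
      if (!skipped.has(node) && parents.some((p) => skipped.has(p))) {
        skipped.add(node);
        grew = true;
      }
    }
  }
  skipped.delete(failed); // the failed node itself stays failed, not skipped
  return [...skipped];
}
```

Running computeWaves on the example DAG above yields exactly the waves shown in the diagram: write-tests alone, then impl-types and impl-service in parallel, then impl-middleware, then verify-all.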
The node lifecycle
Every node — whether it's writing tests, implementing code, or running a shell command — follows the same lifecycle:
sequenceDiagram
participant E as Engine
participant A as Agent
participant V as Verifier
participant Q as QA
participant S as Summarizer
E->>A: next_action: start node<br/>(with micro-spec context)
A->>A: Execute work
A->>E: Done (or failed)
E->>V: next_action: verify<br/>(agent never grades itself)
V->>E: PASS / FAIL
alt PASS
E->>Q: next_action: QA review<br/>(adversarial — tries to break it)
Q->>E: Approved / Issues found
E->>S: next_action: summarize<br/>(write L0/L1/L2 context)
S->>E: Context artifacts written
E->>E: Mark complete, commit
E->>A: next_action: run specwork go<br/>(find next wave)
else FAIL (retries left)
E->>A: next_action: respawn<br/>(with failure feedback injected)
else FAIL (exhausted)
E->>E: Escalate to user<br/>(with actionable suggestions)
end

Three critical rules:
- The implementer never grades its own homework. After every node, a separate verifier agent checks the work — type errors, test results, file existence, and deferred work scanning.
- Tests before implementation. The write-tests node always runs first. Tests must fail (red state) before any implementation begins.
- No deferred work. TODO, FIXME, STUB, and // not implemented are automatically detected and blocked. If it's in the diff, the node fails verification. There is no later — every node must be complete before it's marked complete.
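Conceptually, the no-deferred-work gate is just a pattern scan over the lines a node added. A rough sketch of the idea (the real verifier's patterns and scope may differ):

```typescript
// Markers that signal deferred work. Illustrative pattern only; the
// actual verifier may use a different rule set.
const DEFERRED = /\b(TODO|FIXME|STUB)\b|\/\/\s*not implemented/i;

// Given unified-diff text, return the added lines that contain a
// deferred-work marker. "+++" file headers are not added lines.
function findDeferredWork(diff: string): string[] {
  return diff
    .split("\n")
    .filter((line) => line.startsWith("+") && !line.startsWith("+++"))
    .filter((line) => DEFERRED.test(line));
}
```

Scanning only added lines means pre-existing TODOs elsewhere in the repo don't block the node; only what this node introduces is judged.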
Micro-spec context: how nodes share knowledge
When a subagent starts working on a node, it doesn't receive the full conversation history or a raw context dump. It gets a micro-spec — a curated, node-specific document assembled from six structured sections:
┌─────────────────────────────────────────────────┐
│ MICRO-SPEC for impl-service │
├─────────────────────────────────────────────────┤
│ 1. Objective │
│ "Implement the auth service" │
│ │
│ 2. Spec Scenarios │
│ ### Requirement: Token Validation │
│ #### Scenario: Expired token submitted │
│ ... │
│ │
│ 3. Parent Decisions (structured L1) │
│ write-tests: 23 tests, 0 passing │
│ impl-types: exported JwtPayload, AuthConfig │
│ Decision: discriminated union for tokens │
│ │
│ 4. Scope │
│ src/services/auth.ts │
│ src/middleware/jwt.ts │
│ │
│ 5. Completed Nodes (L0) │
│ write-tests: complete, 23 tests (all red) │
│ impl-types: complete, 2 interfaces │
│ │
│ 6. Validation Checks │
│ ✓ tests-pass ✓ no-deferred-work │
│ ✓ scope-check ✓ type-check │
└─────────────────────────────────────────────────┘

Why this matters: A 10-node workflow could easily consume 50K+ tokens of context if you dump everything. With micro-specs, the same workflow uses ~2K tokens per node — and each agent knows exactly which spec scenarios it's responsible for, what its parents decided, and what sibling nodes own (so it doesn't step on their work).
The L0 / L1 / L2 tiers behind micro-specs
graph TB
subgraph "Context tiers"
L0["<b>L0 — All completed nodes</b><br/>~10 tokens each<br/><i>One-line status + key stat</i>"]
L1["<b>L1 — Direct parent nodes only</b><br/>~100 tokens each<br/><i>Structured JSON: decisions, contracts, changed files</i>"]
L2["<b>L2 — On demand (EXPAND)</b><br/>~1000+ tokens<br/><i>Full git diff + verification output</i>"]
end
L0 -->|always included| Bundle((Micro-Spec<br/>Bundle))
L1 -->|parent deps only| Bundle
L2 -.->|"agent outputs EXPAND(node-id)"| Bundle
style L0 fill:#374151,stroke:#9CA3AF,color:#F9FAFB
style L1 fill:#1E3A5F,stroke:#60A5FA,color:#BFDBFE
style L2 fill:#3B1F6E,stroke:#A78BFA,color:#DDD6FE
style Bundle fill:#92400E,stroke:#FBBF24,color:#FEF3C7

Every completed node gets an L0 headline, an L1 structured summary (decisions, contracts, changed files), and an L2 full artifact. The micro-spec composer pulls from these tiers to build exactly the right context for each node.
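The tier-selection rule ("L0 for everyone, L1 for direct parents only") can be sketched in a few lines, under assumed artifact shapes; the real composer reads per-node files under .specwork/nodes/:

```typescript
// Hypothetical per-node context artifacts.
interface NodeContext {
  l0: string;   // one-line headline, always included
  l1?: object;  // structured summary, included for direct parents only
}

// Select tiers for one target node: every completed node contributes
// its L0 headline; only direct parents contribute their L1 summary.
function assembleMicroSpec(
  completed: Record<string, NodeContext>,
  parents: string[]
): { l0: Record<string, string>; l1: Record<string, object> } {
  const l0: Record<string, string> = {};
  const l1: Record<string, object> = {};
  for (const [id, ctx] of Object.entries(completed)) {
    l0[id] = ctx.l0;
    if (parents.includes(id) && ctx.l1) l1[id] = ctx.l1;
  }
  return { l0, l1 };
}
```

L2 is deliberately absent here: it is only pulled in on demand, when an agent outputs EXPAND(node-id).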
Sandbox: environment setup before agents run
Agents shouldn't fail because the dev server isn't running or dependencies aren't installed. Specwork's sandbox system auto-detects your project infrastructure and ensures everything is ready before subagents execute.
# Auto-detect what your project needs
specwork sandbox detect
# Start everything (deps, dev servers, databases)
specwork sandbox init
# Check what's running
specwork sandbox status
# Clean up (only kills sandbox-started processes)
specwork sandbox teardown

The sandbox detects: package managers (npm/yarn/pnpm), test runners (vitest/jest/mocha), e2e frameworks (playwright/cypress), Docker services (reads docker-compose.yaml), dev scripts, and .env files. Services start in dependency order with ready checks. PID tracking ensures only sandbox-started processes are killed on teardown — your other terminals stay untouched.
The engine triggers sandbox init automatically before subagent spawn and teardown after verification completes.
Plan visualization
Before running specwork go, review the full plan in your browser:
specwork viz add-jwt-auth

This generates an interactive HTML page at .specwork/changes/<change>/overview.html with:
- A Mermaid DAG showing all nodes, dependencies, and types
- The proposal (why this change exists)
- Spec requirements mapped to each node
- Node detail panels with scope, agent, and validation rules
Review the plan visually, then run specwork go when you're confident.
Quick start
Prerequisites: Claude Code with Agent Teams support + Node.js >= 18
# Install
npm install -g specwork
# Initialize (one-time, in your project root)
specwork init
# Plan a change
specwork plan "Add JWT authentication to the API"
# Review the plan visually
specwork viz add-jwt-authentication
# Run the workflow
specwork go add-jwt-authentication
# Check progress anytime
specwork status

Or use Claude Code slash commands:
/specwork-plan "Add JWT authentication"
/specwork-go add-jwt-authentication
/specwork-status

Workflow commands
| Command | Description |
| --- | --- |
| specwork init | Initialize project (creates .specwork/ + all Claude Code integration files) |
| specwork plan "<description>" | Create a new change from plain English |
| specwork go <change> | Run the workflow autonomously (wave-based execution) |
| specwork status [change] | Show progress for all or a specific change |
| specwork update | Update project files to the current specwork version (with backups) |
| specwork archive <change> | Archive a completed change (promotes specs, generates summary) |
| specwork viz <change> | Generate and open interactive HTML plan visualization |
| specwork doctor [change] | Health-check project or change artifacts |
Sandbox commands
| Command | Description |
| --- | --- |
| specwork sandbox detect | Auto-detect project type, services, and infrastructure |
| specwork sandbox init [change] | Start sandbox environment (deps, servers, databases) |
| specwork sandbox teardown [change] | Stop only sandbox-started processes |
| specwork sandbox status | Show running sandbox services and ports |
Node and graph commands (used by the engine)
| Command | Description |
| --- | --- |
| specwork node start <change> <node> | Start a specific node (injects micro-spec context) |
| specwork node complete <change> <node> | Mark a node complete |
| specwork node fail <change> <node> | Mark a node failed |
| specwork node verify <change> <node> | Run verification checks |
| specwork graph generate <change> | Generate DAG from tasks |
| specwork graph show <change> | Display the node graph |
| specwork run <change> | Find ready nodes and output execution plan |
| specwork retry <change/node> | Reset a failed/escalated node to pending |
| specwork report <change> | Full markdown report with L0/L1 summaries and metrics |
| specwork log <change> [node] | Show node L2 detail or all L0 headlines |
| specwork context assemble <change> <node> | Inspect the assembled micro-spec for a node |
Utility commands
| Command | Description |
| --- | --- |
| specwork new <change> | Create a new change from templates (without planning agent) |
| specwork config | Read and update Specwork configuration |
All commands support --json for machine-readable output with next_action guidance.
.specwork/
├── config.yaml # Engine + spec configuration
├── manifest.yaml # SHA256 checksums of managed files (for specwork update)
├── sandbox.yaml # Sandbox environment configuration
├── schema.yaml # Artifact dependency graph
├── specs/ # Source-of-truth behavior specs
├── changes/ # In-flight changes (proposal + specs + design + tasks + overview.html)
│ └── archive/ # Completed changes (auto-archived)
├── graph/<change>/
│ ├── graph.yaml # Node DAG (dependencies, scope, validation rules)
│ └── state.yaml # Runtime state (status, wave, retries per node)
├── nodes/<change>/ # Per-node artifacts (L0/L1/L2, L1-structured.json, verify.md)
├── sandbox/ # Sandbox runtime state (PIDs, ports)
├── backups/ # Pre-update backups by version
├── templates/ # Starter templates for proposals, specs, design, tasks
└── examples/ # Example graphs for reference
.claude/
├── agents/ # Subagent definitions (6 roles)
├── skills/ # Engine logic (specwork-engine, specwork-context, specwork-conventions)
├── commands/ # Slash commands (specwork-plan, specwork-go, specwork-status)
└── hooks/ # Lifecycle hooks (type-check, session-init, node-complete)

Subagents
| Agent | Model | Role |
| --- | --- | --- |
| specwork-planner | sonnet | Explores codebase, asks clarifying questions, generates proposal/specs/design/tasks |
| specwork-test-writer | opus | Writes tests from specs — must all fail (RED state). No stubs allowed. |
| specwork-implementer | sonnet | Makes tests pass with minimum code. No TODOs, no deferred work. |
| specwork-qa | sonnet | Adversarial QA — tries to break the output. Checks edge cases, regressions, spec compliance. Read-only. |
| specwork-verifier | haiku | Read-only validation: type-check, tests pass, scope check, no-deferred-work scan |
| specwork-summarizer | haiku | Generates L0/L1/L2 context and structured L1 JSON after each node |
Node types
- deterministic — Runs a shell command. Captures stdout/stderr, validates exit code.
- llm — Spawns a subagent with micro-spec context and validation rules.
- human — Pauses execution for manual approval.
State machine
Every node tracks: status, retries, verified, l0 (headline), start_sha (git baseline for scope-check), wave (execution batch), and a full verify_history with regression detection.
Terminal states: complete, skipped, rejected. Retryable: failed → in_progress. Escalatable: escalated → in_progress (manual via specwork retry).
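The lifecycle can be written as a pure transition function. A sketch following the state diagram earlier in this README (event names are illustrative, not the engine's actual code):

```typescript
type Status =
  | "pending" | "in_progress" | "complete"
  | "failed" | "escalated" | "skipped";

type Event = "start" | "pass" | "fail" | "retry" | "upstream_failed";

// Transitions from the state diagram; retriesLeft gates auto-retry.
function transition(status: Status, event: Event, retriesLeft: number): Status {
  switch (status) {
    case "pending":
      if (event === "start") return "in_progress";
      if (event === "upstream_failed") return "skipped"; // cascade skip
      return status;
    case "in_progress":
      if (event === "pass") return "complete"; // verify passes
      if (event === "fail") return "failed";   // verify fails
      return status;
    case "failed":
      // Auto-retry while budget remains; escalate when exhausted.
      if (event === "retry") return retriesLeft > 0 ? "in_progress" : "escalated";
      return status;
    case "escalated":
      return event === "retry" ? "in_progress" : status; // manual `specwork retry`
    default:
      return status; // complete and skipped are terminal
  }
}
```

Because the function is pure, the engine can replay verify_history against it to detect regressions in a node's state over time.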
.specwork/config.yaml:
models:
default: sonnet
test_writer: opus
verifier: haiku
summarizer: haiku
execution:
max_retries: 2 # Retry failed nodes up to N times
expand_limit: 1 # Max EXPAND requests per node
parallel_mode: parallel
snapshot_refresh: after_each_node
verify: gates # Verification mode (gates = block on fail)
context:
ancestors: L0 # All completed nodes get L0
parents: L1 # Direct deps get L1 (structured JSON)
spec:
specs_dir: .specwork/specs
changes_dir: .specwork/changes
archive_dir: .specwork/changes/archive
templates_dir: .specwork/templates
graph:
graphs_dir: .specwork/graph
nodes_dir: .specwork/nodes
environments:
env_dir: .specwork/env
active: development

Version migration
When you upgrade specwork, run specwork update to bring your project files forward:
specwork update # Apply updates (with automatic backups)
specwork update --dry-run # Preview what would change

The update system uses a SHA256 manifest to detect which files you've customized vs. which are stock — it won't overwrite your modifications without telling you. Version-specific migrations run automatically in semver order.
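The customization check amounts to comparing a file's current hash against the hash recorded when it was installed. A minimal sketch, assuming the manifest stores one SHA256 hex digest per managed file:

```typescript
import { createHash } from "node:crypto";

// Hex SHA256 of a file's contents.
function sha256(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// A managed file counts as customized when its current hash no longer
// matches the hash recorded in the manifest at install time.
function isCustomized(manifestHash: string, currentContent: string): boolean {
  return sha256(currentContent) !== manifestHash;
}
```

Stock files (hashes match) can be updated in place; customized files get flagged so the update never silently clobbers your edits.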
Specs describe behavior, not implementation. No class names, no library choices — just what the system should do.
### Requirement: Token Validation
The system SHALL reject expired JWT tokens with a 401 status code.
#### Scenario: Expired token submitted
- **GIVEN** a JWT token with `exp` in the past
- **WHEN** the token is submitted to any authenticated endpoint
- **THEN** the system responds with HTTP 401 and error body `{"error": "token_expired"}`

Keywords: SHALL/MUST (absolute requirement), SHOULD (recommended).
Specs live in .specwork/specs/ (source of truth) and .specwork/changes/ (proposed deltas). When a change is archived, its specs are promoted to the source of truth.
Credits
Specwork's spec convention system is based on OpenSpec by Fission AI.
Contributing
See CONTRIBUTING.md for dev setup, PR process, and code style.
License
MIT — see LICENSE.
