speckit-pipeline
v1.8.0
Build/design pipeline extension for spec-kit projects. Adds dev/eval and designer/critic loops with Playwright MCP browser verification.
Build and design pipeline extension for spec-kit projects. Automates multi-agent workflows that implement and verify user stories through iterative loops — design/critique and dev/eval — with real browser verification via Playwright MCP.
What it does
speckit-pipeline installs specialized Claude Code agents and skills into your spec-kit project:
- Design loop (/design): A designer agent creates HTML/CSS prototypes for each view in your spec, then a design-critique agent evaluates them in a real browser, scoring against a rubric. The loop iterates until the designs pass.
- Build loop (/build): A developer agent implements sprint tasks from your spec-kit plan, then an evaluator agent verifies the implementation in a real browser against acceptance criteria. The loop iterates until each sprint passes.
Both loops produce structured feedback in pipeline/feedback/ and log progress to pipeline/build-log.md.
Agents
| Agent | Role |
|-------|------|
| designer | Creates production-grade HTML/CSS prototypes, reusing existing components and patterns for consistent UX across similar pages |
| design-critique | Evaluates prototypes in-browser across breakpoints, scores on design quality, originality, craft, functionality, and cross-page consistency |
| developer | Implements user stories with test-first development, enforcing code quality and design fidelity |
| evaluator | Verifies implementations end-to-end in a real browser, writes actionable feedback with severity levels |
Prerequisites
- A spec-kit project (with .specify/integration.json)
- Claude Code
- Node.js >= 18
Installation
npx speckit-pipeline init
This will:
- Install agents to .claude/agents/
- Install skills to .claude/skills/
- Install helper scripts to .claude/scripts/: trace hook, repeat-command guard, trace summariser, env-facts verifier, migration-drift checker, dev-server starter, run-artifact cleaner
- Merge launch configs into .claude/launch.json (dev server on port 3000, design server on port 4444)
- Add required permissions and trace hooks to .claude/settings.json (including mcp__playwright__*)
- Add Playwright MCP server at project scope (for browser-based verification)
- Append pipeline documentation to CLAUDE.md
- Install an opinionated constitution to .specify/memory/constitution.md
- Create pipeline/feedback/ for evaluator reports
- Create pipeline/traces/ for JSONL run traces
- Seed pipeline/procedures.md with the overlay-dismissal meta-procedure (existing files are preserved)
Note on the constitution: The installer ships an opinionated constitution with 7 principles (Test-First, Security-First, Code Quality, Component Separation, Library-First, Database Migrations, Design Fidelity). If your project has the default blank spec-kit template ([PROJECT_NAME] Constitution), it will be replaced automatically. If you've already written your own constitution, it will be preserved; use --force to overwrite. Review .specify/memory/constitution.md after install and adjust the principles to fit your project.
Options:
- --dry-run: Preview what would be installed without writing files
- --force: Overwrite existing files (full reinstall, including constitution)
- --update: Upgrade pipeline files to the current package version, but only for files you haven't locally modified. Skips user-customized files and never touches the constitution, pipeline/procedures.md, environment-facts.md, or run-state.md. Use this to pull in agent-file improvements without losing local customizations. Mutually exclusive with --force.
Usage
1. Plan your project with spec-kit
Run the spec-kit commands in order:
/speckit-constitution → /speckit-specify → /speckit-plan → /speckit-tasks
2. Design
/design
Starts the designer/critic loop. Prototypes are written to designs/ and served on port 4444 for browser verification.
3. Build
/build
Starts the developer/evaluator loop. Implements sprints from specs/<latest-branch>/tasks.md in order, verifying each against acceptance criteria in a real browser on port 3000.
How it works
Each loop runs up to 5 cycles per sprint/design. On each cycle:
- A creator agent (designer or developer) does the work, reading any prior feedback
- A verifier agent (design-critique or evaluator) checks the result in a real browser using Playwright MCP
- The verifier writes scored feedback to pipeline/feedback/
- If the work passes the quality gate, the loop advances. Otherwise it cycles back with the feedback.
Issues marked [High] always block — they must be resolved before a sprint can pass.
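The cycle described above can be sketched as a small control loop. This is a hypothetical illustration, not the package's orchestrator: `runLoop`, `passesGate`, the `{ passed, issues }` report shape, and the `"exhausted"` status are all assumed names. Only the 5-cycle cap and the rule that [High] issues always block come from this README.

```javascript
// Hypothetical sketch of the creator/verifier loop (names and shapes are assumptions).
const MAX_CYCLES = 5; // "up to 5 cycles per sprint/design"

// A report passes the gate only if the verifier passed it AND no issue is marked High.
function passesGate(report) {
  const hasHigh = report.issues.some((issue) => issue.severity === "High");
  return report.passed && !hasHigh;
}

// create(feedback) does the work; verify(work) returns { passed, issues }.
function runLoop(create, verify, maxCycles = MAX_CYCLES) {
  let feedback = null;
  for (let cycle = 1; cycle <= maxCycles; cycle++) {
    const work = create(feedback); // creator reads any prior feedback
    const report = verify(work);   // verifier checks the result
    if (passesGate(report)) return { status: "passed", cycles: cycle };
    feedback = report;             // cycle back with the feedback
  }
  // What the real pipeline does when cycles run out is not specified here.
  return { status: "exhausted", cycles: maxCycles };
}

// Usage: the first two attempts carry a High issue, the third is clean.
let attempts = 0;
const outcome = runLoop(
  () => ++attempts,
  (work) => ({ passed: true, issues: work < 3 ? [{ severity: "High" }] : [] })
);
console.log(outcome); // { status: 'passed', cycles: 3 }
```

Note that a High issue blocks even when the verifier's overall verdict is a pass, which is why the gate checks both conditions.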
Per-run cleanup
pipeline/feedback/ and pipeline/traces/ are per-run scratch space. Both
/build and /design wipe them at the start of every invocation via
.claude/scripts/clean-run-artifacts.mjs. The persistent caches
(build-log.md, environment-facts.md, procedures.md, run-state.md)
are left untouched. If you want to keep a previous run's feedback or
trace, copy pipeline/feedback/ somewhere safe before re-running.
Authenticated apps
The evaluator handles apps that require login. On the first cycle:
- Dismiss any overlay blocking the form (cookie consent, GDPR notice, etc.), guided by the ## Overlays blocking forms meta-procedure pre-seeded in pipeline/procedures.md
- Look for test credentials in prisma/seed.ts, .env.local, or README.md
- Fill and submit the login form
- Confirm authentication, then write a ## Login procedure to pipeline/procedures.md so future cycles skip discovery
No extra configuration needed — make sure your seed data includes at least one test user. If only an admin user is seeded, the evaluator uses admin-as-customer for verification and notes the caveat in feedback (rather than guessing customer passwords and hitting rate limits).
Per-project caches
Three files in pipeline/ accumulate project knowledge between cycles:
- pipeline/environment-facts.md: Stable shell facts (typecheck command, dev server command, env file path, DB resolution). Written by the developer on cycle 1, read first on later cycles. The verify-environment-facts.mjs script (run automatically before evaluator hand-off) cross-checks recorded DB paths against Prisma's resolution rules, runs prisma migrate status to catch drift from manual SQL or forged _prisma_migrations rows, and flags orphan dev servers.
- pipeline/procedures.md: Multi-step UI flows (login, logout, navigation patterns) discovered by the evaluator and design-critique agents. Subagents grep by procedure name before doing browser work. Discovered flows survive upgrades (init --force never overwrites existing procedures).
- pipeline/run-state.md: Per-run state written by the orchestrator (active spec branch, sprint plan, in-scope task IDs). Subagents read it first to avoid re-resolving what the orchestrator already determined.
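Looking up a cached procedure by name can be pictured as extracting one heading's section from the file. The sketch below assumes each procedure lives under a `## <name>` heading (as the `## Login` and `## Overlays blocking forms` headings suggest). `findProcedure` and the sample step text are hypothetical, and the real agents may simply grep the file.

```javascript
// Hypothetical helper: return the body under a "## <name>" heading, or null if absent.
function findProcedure(markdown, name) {
  const lines = markdown.split("\n");
  const start = lines.findIndex((line) => line.trim() === `## ${name}`);
  if (start === -1) return null;
  // The section ends at the next "## " heading or at end of file.
  let end = lines.findIndex((line, i) => i > start && line.startsWith("## "));
  if (end === -1) end = lines.length;
  return lines.slice(start + 1, end).join("\n").trim();
}

// Usage with made-up file contents (the step wording is illustrative only).
const sampleProcedures = [
  "## Overlays blocking forms",
  "1. Dismiss the cookie banner.",
  "",
  "## Login",
  "1. Open the login page.",
  "2. Submit the seeded test credentials.",
].join("\n");

console.log(findProcedure(sampleProcedures, "Login"));
console.log(findProcedure(sampleProcedures, "Logout")); // null, so discovery is still needed
```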
Tracing
Every tool call by every subagent is logged as JSONL to pipeline/traces/<run>.jsonl via Claude Code hooks. To digest a trace:
node .claude/scripts/trace-summarise.mjs pipeline/traces/<file>.jsonl
Output includes per-tool call frequency, suspected flailing (≥5 consecutive identical-fingerprint calls), per-subagent stop markers, and total token usage by run.
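The flailing heuristic can be illustrated with a short sketch. The event shape `{ tool, input }` and the fingerprint (tool name plus serialized input) are assumptions made for this example; the real trace schema and the fingerprint used by trace-summarise.mjs may differ. Only the threshold of 5 consecutive identical calls comes from this README.

```javascript
// Hypothetical trace event shape: { tool: string, input: object }.
const FLAIL_THRESHOLD = 5; // ">=5 consecutive identical-fingerprint calls"

// Assumed fingerprint: tool name plus its serialized input.
function fingerprint(event) {
  return `${event.tool}:${JSON.stringify(event.input)}`;
}

// Scan JSONL text and report every run of identical calls at or above the threshold.
function findFlails(jsonl, threshold = FLAIL_THRESHOLD) {
  const events = jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
  const flails = [];
  let runStart = 0;
  for (let i = 1; i <= events.length; i++) {
    const sameAsPrev =
      i < events.length && fingerprint(events[i]) === fingerprint(events[i - 1]);
    if (sameAsPrev) continue;
    const runLength = i - runStart; // the run covers events[runStart .. i-1]
    if (runLength >= threshold) {
      flails.push({ fingerprint: fingerprint(events[runStart]), count: runLength });
    }
    runStart = i;
  }
  return flails;
}

// Usage: five identical Bash calls followed by one Read call.
const repeatedCall = JSON.stringify({ tool: "Bash", input: { command: "npm test" } });
const sampleTrace = [
  ...Array(5).fill(repeatedCall),
  JSON.stringify({ tool: "Read", input: { path: "a.ts" } }),
].join("\n");
console.log(findFlails(sampleTrace)); // reports the Bash run with count 5; the Read call is not reported
```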
Runtime guard
guard-repeat-commands.mjs runs as a PreToolUse hook on every Bash call and refuses two specific anti-patterns mid-run rather than logging them after the fact:
- Re-running an expensive command to re-filter output: if npm run test/lint/typecheck/build, playwright test, prisma migrate, etc. already ran in this session and no Edit or Write tool call happened since, the second invocation is denied with a message that says to tee the output to /tmp/<name>-out.txt once, then grep the file. The "base" command is matched after stripping trailing | grep/tail/head/awk/sed/wc/jq pipes, so re-running with a different filter still counts as a repeat.
- State-wipe loops: three or more rm -rf invocations on the same path (matching pglite, .next, node_modules, data/) within 30 minutes are denied. If the same failure persists after wiping, the bug isn't stale state.
Set SPECKIT_GUARD=0 in the environment to bypass both rules for a session.
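The first rule (repeat detection on the "base" command) can be sketched as follows. This is an illustration under assumptions, not the code in guard-repeat-commands.mjs: `baseCommand`, `isExpensive`, and `createGuard` are hypothetical names, the expensive-command pattern covers only the commands named above (the README's "etc." is left out), and pipes inside quoted strings are not handled. The state-wipe rule and the SPECKIT_GUARD bypass are omitted.

```javascript
// Matches one trailing "| filter ..." segment; lookarounds avoid treating "||" as a pipe.
const TRAILING_FILTER = /\s*(?<!\|)\|(?!\|)\s*(?:grep|tail|head|awk|sed|wc|jq)\b[^|]*$/;

// Strip trailing filter pipes so "cmd | grep x" and "cmd | tail -5" share one base.
function baseCommand(command) {
  let base = command.trim();
  while (TRAILING_FILTER.test(base)) base = base.replace(TRAILING_FILTER, "");
  return base;
}

// Assumed, partial list of "expensive" commands taken from the README's examples.
function isExpensive(base) {
  return /^(?:npm run (?:test|lint|typecheck|build)|(?:npx )?playwright test|(?:npx )?prisma migrate)\b/.test(base);
}

// Tracks expensive base commands seen since the last Edit or Write call.
function createGuard() {
  const seen = new Set();
  return {
    check(toolName, command) {
      if (toolName === "Edit" || toolName === "Write") {
        seen.clear(); // code changed, so re-running is legitimate again
        return "allow";
      }
      if (toolName !== "Bash") return "allow";
      const base = baseCommand(command);
      if (!isExpensive(base)) return "allow";
      if (seen.has(base)) return "deny";
      seen.add(base);
      return "allow";
    },
  };
}

// Usage
const guard = createGuard();
console.log(guard.check("Bash", "npm run test | grep FAIL")); // allow (first run)
console.log(guard.check("Bash", "npm run test | tail -20"));  // deny (same base, new filter)
console.log(guard.check("Edit", ""));                         // allow (resets history)
console.log(guard.check("Bash", "npm run test"));             // allow
```

Keying on the base command rather than the full command line is what makes a changed filter still count as a repeat.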
Constitution principles
The installed constitution enforces these principles during development and evaluation:
- Test-First — All functionality must have tests. Red-green-refactor.
- Security-First — Input validation, OWASP guidelines, no committed secrets.
- Code Quality — Max 500 lines per file, low cognitive complexity, linting passes.
- Component Separation — UI components in own files, single responsibility.
- Library-First — Use maintained open-source packages over custom implementations.
- Database Migrations — All schema changes via migrations. Never edit existing migrations.
- Design Fidelity — Follow designs to the pixel. Architecture specs are binding.
Project structure after install
.claude/
  agents/
    designer.md
    design-critique.md
    developer.md
    evaluator.md
  skills/
    build/SKILL.md
    design/SKILL.md
  scripts/
    trace-hook.mjs                 # Claude Code hook: writes JSONL events to pipeline/traces/
    trace-path.mjs                 # Shared helper: locate the per-session trace file
    guard-repeat-commands.mjs      # PreToolUse hook: blocks redundant expensive commands & state-wipe loops
    trace-summarise.mjs            # CLI digest: tool frequency, flails, token totals
    verify-environment-facts.mjs   # Pre-handoff sanity check (orphan servers, DB paths, migration drift, file size)
    check-migration-drift.mjs      # Helper for verify-environment-facts: runs `prisma migrate status`
    start-dev-server.mjs           # Start dev server, parse bound URL from output, record to pipeline/
    clean-run-artifacts.mjs        # Wipe pipeline/feedback/ + pipeline/traces/ at start of /build and /design
  launch.json
  settings.json
pipeline/
  feedback/                        # Evaluator reports per sprint/cycle
  traces/                          # JSONL traces of every /build and /design run
  build-log.md                     # Full progress log (human-readable)
  environment-facts.md             # Cached project facts (commands, paths), written by developer
  procedures.md                    # Cached UI flows (login, etc.), written by evaluator/critic
  run-state.md                     # Per-run state, written by orchestrator
designs/                           # HTML/CSS prototypes (created by designer)
License
MIT
