specsmith

v0.6.0

Published

a day ago

Claude Code skills, agents, and scripts that drive an end-to-end development workflow: grill the idea, write a PRD, plan it, decompose into tasks, then design + build with Playwright-MCP-driven evaluation.

Downloads

716

0High
0Medium
0Low

iandeherdt

claude-code skills agents prd planning playwright mcp

specsmith

A bundle of Claude Code skills, agents, and scripts that drive a feature from a vague idea to verified, browser-tested code.

You install it once into a project; from then on, building a feature is a sequence of slash commands. Each command produces a checked-in artifact (prd.md, plan.md, data-model.md, tasks.md) on its own git branch — so the work is reviewable, resumable, and version-controlled like any other code change.

The pipeline

       /grill-me                  /write-prd                /plan                   /grill-plan                /tasks                  /design                /build
   (interrogate idea)        (synthesise PRD,        (Technical Context,        (pressure-test the         (decompose into          (HTML/CSS              (developer ↔
                              create branch +          Constitution Check,        plan: rows, files,         phased markdown          prototypes via         evaluator loop;
                              numbered folder)         Project Structure,         hidden complexity,         checkbox tasks)          designer ↔             evaluator
                                                       Files-to-touch)            failure modes)                                      design-critique)       checks tasks off)
                                                       + data-model.md

Five planning skills produce inspectable artifacts. Two execution skills (/build, /design) hand work to subagents and iterate until verification passes.

The artifacts for each feature live at specs/NNN-<feature-slug>/ in the host project — first-class team documentation, not AI scratch.

Requirements

Node.js ≥ 18 (uses ESM and fs.cpSync)
Claude Code installed and on PATH (the installer uses the claude CLI to wire up Playwright MCP; if it's missing, the installer prints a manual snippet to add to .mcp.json)
A git repository (each /write-prd invocation creates a new branch)

Install

In the root of any project:

npx specsmith init

This:

Copies skills into .claude/skills/ (one per skill, with bundled templates/)
Copies subagent definitions into .claude/agents/
Copies trace and helper scripts into .claude/scripts/
Drops a starter .claude/constitution.md (skipped if you already have one)
Merges baseline .claude/settings.json (permissions, additional directories, trace hooks) and .claude/launch.json (debug configurations) — your existing entries are preserved
Appends a "Real Dev Loop" section to your CLAUDE.md (created if missing)
Runs claude mcp add --scope project playwright -- npx @playwright/mcp@latest --isolated so the evaluator and design-critique agents can drive a real browser
Creates pipeline/feedback/ and pipeline/traces/ (runtime scratch space) and seeds pipeline/procedures.md with a starter overlay-handling procedure
Writes a checksummed manifest to .claude/specsmith/manifest.json so subsequent update runs know which files have been edited locally

Useful flags:

--dry-run — preview without writing
--force — overwrite locally-modified files (including constitution.md)

About the permissions init grants

Heads up: init adds Bash(*) to your .claude/settings.json allowlist (alongside Read, Write, Edit, Glob, Grep, Agent, and mcp__playwright__*) so the build and design loops aren't interrupted by per-command prompts during dev-server start/stop, pkill, sed/awk runs, test commands, migrations, etc. If your project requires tighter permissions, hand-edit .claude/settings.json after install — replace Bash(*) with specific patterns like Bash(npm run *), Bash(npx playwright *), Bash(node .claude/scripts/*) and so on. The trade-off is you'll start seeing prompts mid-loop when something the agents need wasn't allowlisted.
Existing entries in settings.json are preserved — init only adds missing entries, it never removes yours.

Per-project conventions (machine-checked)

Prose rules in agent files ("use Tailwind, no inline styles", "extract SVGs to icon components") get ignored when the developer agent has the prototype's HTML right in front of it and the prototype uses inline styles. specsmith's answer is the same one used by the runtime guard: a programmatic gate the developer must pass before handoff, not another paragraph in the agent prompt.

Opt in with:

npx specsmith init --conventions

This drops a starter .claude/conventions.json with four sensible default rules:

| Rule | What it catches | | --- | --- | | no-inline-styles | style={...} in .tsx/.jsx files when tailwind.config.* exists | | svg-extract | inline <svg> blocks > 200 chars outside src/components/icons/** | | data-access-pattern | direct DB/ORM calls (db.select/.insert/.transaction/..., prisma.<model>.findUnique/.create/.update/...) anywhere outside src/lib/**/repository.ts or src/db/** — explicitly catches the "called the database from a component / route handler / job" anti-pattern | | i18n-strings | multi-word user-facing strings inlined in JSX or in a11y attributes (placeholder, aria-label, title, alt, label) | | component-size-cap | component files (in src/components/**, components/**, app/**) exceeding 300 lines — proxy for "this component mixes concerns; extract a hook, selector, or sub-component" |

Each rule is a JSON object with name, filesGlob, optional excludeGlob, optional forbiddenPattern (JS regex), optional maxLines (file size cap; one of forbiddenPattern or maxLines must be set, both is fine too), optional patternFlags, optional skipIfMissing (rule no-ops if the named glob has no matches), and message. Edit, add, or remove rules to fit this project's standards. The schema is documented inline in the file.

Three integration points enforce the rules:

agents/developer.md Step 4 gate 0: node .claude/scripts/check-conventions.mjs runs first in the quality-gate sequence, before typecheck/test/lint. Failure blocks handoff.
agents/evaluator.md Step 3 gate 0: same script runs as a sanity check after the developer claims done. Any violation surfaces as automatic [High] in the feedback (machine-checkable, no judgment).
/plan detects convention ambiguity at planning time. If the codebase has competing patterns (e.g. some entities use queries.ts, others repository.ts) and no rule covers it, /plan raises an OQ-### so you pick one and add a rule — rather than letting the next plan pick the other and the codebase drift further.

Bypass per-run with SPECSMITH_CONVENTIONS=0 in the environment. The script also no-ops gracefully when no conventions.json exists, so you can install specsmith without the flag and add conventions later by hand.

Closing the loop between /design and /build

The pipeline is …/plan → /tasks → /design → /build, and /design runs after /tasks. That order means the designer can introduce regions the plan didn't enumerate (e.g. a summary card the planner left out) and they'd silently disappear from /build's scope unless /tasks knows about them.

specsmith closes this loop in three places:

agents/designer.md writes designs/coverage.md listing every prototype's top-level regions with stable component names. This is the machine-readable bridge between the designer and /tasks.
/design ends with a hand-off message telling you to re-run /tasks before /build. The recommended pipeline becomes: /plan → /tasks → /design → /tasks → /build (the second /tasks runs in merge mode, appending tasks for newly-introduced regions and preserving any [x] checkmarks from the first run).
agents/developer.md Step 4 gate 0b requires the developer to enumerate every region from designs/coverage.md (or the matching designs/<slug>.html) and check each one off in the handoff before declaring the phase ready. Catches the missing-region failure mode at source — one targeted edit instead of a full evaluator → developer round-trip.
agents/evaluator.md Step 2b does a mechanical region diff between the prototype's accessibility-tree snapshot and the implementation's. Any top-level landmark in the prototype that's missing from the implementation is automatic [High] severity — no editorial judgment about "section vs detail". This is the safety net if you skipped step 2 and went straight to /build.
/build explicitly forbids deferring design-fidelity issues as "scope creep". Plan-vs-design tension is resolved by updating the plan/tasks (re-run /tasks), never by ignoring the design.

Together these turn "the designer added something the plan didn't list" from a silent miss into either a merged task (best case) or a phase-blocking [High] (worst case).

Phase-to-phase signal

Three rules in the evaluator make sure issues don't quietly carry forward across cycles:

Carryover list. Every feedback file ends with a ## Carryovers (must fix next cycle) checkbox section. /build copies it verbatim into the next cycle's developer prompt — no re-narration, no summarisation. The developer knows exactly which boxes have to be [x] before handoff.
Unmet FR-### hard-fails the phase. A numbered functional requirement is the spec author's named promise to the user. If the implementation doesn't satisfy an FR-### the phase block was supposed to cover, the phase fails regardless of overall score — fixes the failure mode where rubric averaging hides a real spec violation.
Auto-promote prior [Med] when scope grows. If a prior cycle flagged a cross-cutting concern (provider, layout, i18n setup, error boundary) as [Med] and the current phase added new consumers of it, the evaluator escalates it to [High] for this cycle — fixes the failure mode where a "small login-only" issue becomes load-bearing the moment 5 new client components depend on it.

Runtime guard against agent flailing

init installs scripts/guard-repeat-commands.mjs as a PreToolUse hook on every Bash call. It refuses two patterns mid-run rather than logging them after the fact (real /build traces showed agents falling into both, ignoring prose anti-patterns):

Re-running an expensive command to re-filter output. If npm run test/lint/typecheck/build, playwright test, prisma migrate, tsc, jest, vitest, next build, cargo, go test, mvn, or gradle already ran in this session and no Edit or Write tool call happened since, the second invocation is denied with a message pointing to tee /tmp/last-out.txt once, then grep the file. The "base" command is matched after stripping trailing | grep/tail/head/awk/sed/wc/jq/... pipes, so re-running with a different filter still counts as a repeat.
State-wipe loops. Three or more rm -rf of the same path (matching pglite, .next, node_modules, data/) within 30 minutes is denied. If the same failure persists after wiping, the bug isn't stale state.

Set SPECSMITH_GUARD=0 in the environment to bypass both rules.

Run the loops in an isolated environment

Bash(*) means a Claude Code session running in this project can execute any shell command the host user can run — including git push, reading .env* files, hitting your cloud provider CLIs, calling gh against your repos, and so on. For anything beyond a personal sandbox, run the loops in an environment with a smaller blast radius:

Dedicated SSH key for the agent: generate a separate key that has push rights only to this repo, and remove the rest of your keys from ssh-agent while the loop runs. Limits how far an unintended git push can reach.
Docker / devcontainer: run Claude Code inside a container with the project mounted read-write, your secrets not mounted, and outbound network restricted to the registries the project needs. The Playwright MCP container model handles the browser side; the agent process itself can be containerised the same way.
Service account on a server: stand up a CI-style Linux user (no sudo, scoped credentials, dedicated ssh key, dedicated cloud-provider service account) and run the loops there over SSH. The agent's worst-case behaviour is bounded by what that user can do.

Whichever pattern, the rule of thumb is: assume the agent will at some point execute a command you didn't expect, and make sure that command can't reach anything you can't afford to lose.

Edit the constitution

After install, edit .claude/constitution.md so it reflects this project's principles. The starter ships with seven broadly applicable principles (Test-First, Security-First, Code Quality & Complexity Control, Component Separation, Library-First, Migrations-Only, Design Fidelity); add, remove, or rewrite to fit.

/plan reads this file every time it generates a plan and renders one row per principle into a Constitution Check table — so vague principles produce vague checks. Be specific.

Building a feature

You: /grill-me
     The customer keeps saying our checkout flow is "confusing" but won't say what.

Claude: <asks ~10 questions interactively — users, pain points, success criteria, edge cases, non-goals>

You: /write-prd
     <slug: checkout-revamp>

Claude: Allocates 003-checkout-revamp/, switches branch, writes specs/003-checkout-revamp/prd.md.

You: /plan

Claude: Inspects package.json + repo structure, reads prd.md and .claude/constitution.md,
        writes plan.md (Technical Context, Constitution Check, Project Structure,
        Files to touch) and data-model.md.

You: /grill-plan

Claude: <interrogates the plan: which constitution row is handwavy? which files-to-touch
        entry is missing? which migration looks safe but isn't? — produces punch list>

You: <hand-edits plan.md based on punch list>

You: /tasks

Claude: Writes tasks.md — phased markdown checkboxes, each implementation task paired
        with a sibling check task the evaluator runs.

You: /design     (optional, only if the feature needs visual prototypes)

Claude: Designer subagent generates HTML/CSS prototypes under designs/.
        Critique subagent scores them in a real browser. Loops until they pass.

You: /build

Claude: For each phase with unchecked tasks:
          - Developer subagent implements the phase
          - Evaluator subagent verifies via Playwright MCP, ticks off completed tasks
          - Loops until evaluator passes the phase
        Stops when every task in tasks.md is [x].

Skills

| Skill | Purpose | Output | | --- | --- | --- | | /grill-me | Interrogate a fresh idea, customer feedback, bug report, or gripe | conversation only | | /write-prd | Synthesise the conversation into a Product Requirements Document | new branch + prd.md | | /plan | Translate the PRD into a technical plan and data model | plan.md + data-model.md | | /grill-plan | Pressure-test an existing plan adversarially | conversation + punch list | | /tasks | Decompose plan + data model into phased checkbox tasks | tasks.md | | /design | Run designer ↔ critique loop until prototypes pass | HTML/CSS files in designs/ | | /build | Run developer ↔ evaluator loop until every task is checked off | implemented code, ticked tasks.md |

Subagents

/build and /design are orchestrators — they delegate work to single-responsibility subagents that run in isolated context (the verifier never sees the implementer's reasoning):

| subagent_type | File | Role | | --- | --- | --- | | developing-features | agents/developer.md | Implements one phase; cares about clean code, tests, design fidelity | | evaluating-phases | agents/evaluator.md | Verifies each task in a real browser via Playwright MCP; flips checkboxes only after passing | | designing-interfaces | agents/designer.md | Generates distinctive HTML/CSS prototypes per user story | | critiquing-designs | agents/design-critique.md | Scores prototypes against a rubric in a real browser |

The subagent_type values (left column) are what you pass to the Agent tool. The file basenames (developer.md, etc.) are not the invocation names.

What gets installed where

In the host project after npx specsmith init:

.claude/
├── agents/                       # subagent definitions (4 files)
├── skills/                       # one folder per skill, some with templates/
│   ├── grill-me/SKILL.md
│   ├── write-prd/{SKILL.md, templates/prd.md}
│   ├── plan/{SKILL.md, templates/{plan.md,data-model.md}}
│   ├── grill-plan/SKILL.md
│   ├── tasks/{SKILL.md, templates/tasks.md}
│   ├── design/SKILL.md
│   └── build/SKILL.md
├── scripts/                      # trace hook, env-facts verifier, dev-server helper, etc.
├── constitution.md               # principles /plan checks against — EDIT THIS
├── settings.json                 # permissions + additional directories + trace hooks
├── launch.json                   # debug configurations
└── specsmith/manifest.json   # SHA-256 of every installed file (for `update`)

specs/                            # one folder per feature, created by /write-prd
└── NNN-<feature-slug>/
    ├── prd.md
    ├── plan.md
    ├── data-model.md
    └── tasks.md

pipeline/                         # runtime scratch — created at install, written during /build and /design
├── feedback/                     # evaluator/critic reports per cycle
├── traces/                       # JSONL traces of build/design runs
├── procedures.md                 # cached UI flows (login, cookie consent dismissal)
├── environment-facts.md          # cached project facts (test command, dev server URL)
├── run-state.md                  # current orchestrator run context
└── build-log.md                  # progress log

.mcp.json                         # Playwright MCP server entry

pipeline/procedures.md and pipeline/environment-facts.md are persistent caches — they survive across runs and are never overwritten. The agents discover things once (login flow, dev-server port, test command) and write them here so future cycles skip the discovery step.

Customising

Templates for the file-writing skills live at .claude/skills/<skill>/templates/<artifact>.md. Edit them to change the structure of the PRD, plan, data-model, or tasks files. Your edits are preserved across update runs.
Subagent prompts live at .claude/agents/*.md. Edit them to change how the developer or evaluator behaves. Edits are preserved across update.
Skill prompts live at .claude/skills/*/SKILL.md. Edit similarly.
Constitution: .claude/constitution.md. Never overwritten by update.

The manifest at .claude/specsmith/manifest.json records the SHA-256 of every file at install time. update compares package version vs on-disk vs manifest to decide whether each file is safe to update or has been customised — see "Updating" below.

Updating

npx specsmith update

For each tracked file:

already current (on-disk matches package) → skipped
unmodified since install (on-disk matches manifest) → updated to package version
user-modified (on-disk differs from both) → skipped, logged for manual reconciliation
missing on disk → installed fresh

The constitution and your pipeline/*.md files are never touched.

--dry-run previews the diff without applying. --force (via init --force) overwrites everything including the constitution — use with care.

Troubleshooting

claude: command not found during init — Claude Code isn't installed or isn't on PATH. The installer prints a .mcp.json snippet you can paste manually; everything else still installs.

Update reports every file as user-modified on Windows — line-ending normalisation. The package ships LF; if your repo has core.autocrlf=true you'll see CRLF on disk and the hashes diverge. The package ships a .gitattributes with * text=auto eol=lf to prevent this on its own tree, but for your host project run git config core.autocrlf input (or false) before installing.

/plan complains the constitution is missing — .claude/constitution.md was deleted or init was skipped. Run npx specsmith init again (it will only re-add what's missing).

Evaluator can't drive the browser — Playwright MCP wasn't wired. Check .mcp.json has the playwright entry; if not, run claude mcp add --scope project playwright -- npx @playwright/mcp@latest --isolated.

/grill-plan says it can't find the plan — you're not on a feature branch. The ^\d{3}- branch pattern is how the post-PRD skills locate the spec folder. Either git checkout the right branch, or tell the skill which spec folder to use when it asks.

Design decisions worth knowing

No transcript file from /grill-me. The conversation context is the handoff, so /write-prd must run in the same session as /grill-me.
Plan is detailed but not SpecKit-detailed. /plan produces only plan.md + data-model.md — no research.md, quickstart.md, or contracts/. The plan template has Technical Context, Constitution Check, Project Structure, and Files-to-touch sections.
Tasks are not agent-bound. Tasks have no (dev) / (design) tags. The orchestrator routes; the evaluator decides "done".
Numbering is stable. FR-###, NFR-###, SC-###, OQ-### identifiers in prd.md are append-only — never renumber when adding items.
Each feature gets its own branch. /write-prd runs git checkout -b NNN-<slug> off whatever branch you're on, inheriting any uncommitted changes. Multiple features in flight = multiple branches.

License

MIT. See LICENSE.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

specsmith

The pipeline

Requirements

Install

About the permissions init grants

Per-project conventions (machine-checked)

Closing the loop between /design and /build

Phase-to-phase signal

Runtime guard against agent flailing

Run the loops in an isolated environment

Edit the constitution

Building a feature

Skills

Subagents

What gets installed where

Customising

Updating

Troubleshooting

Design decisions worth knowing

License