specsmith
v0.6.0
Published
Claude Code skills, agents, and scripts that drive an end-to-end development workflow: grill the idea, write a PRD, plan it, decompose into tasks, then design + build with Playwright-MCP-driven evaluation.
Downloads
716
Maintainers
Readme
specsmith
A bundle of Claude Code skills, agents, and scripts that drive a feature from a vague idea to verified, browser-tested code.
You install it once into a project; from then on, building a feature is a sequence of slash commands. Each command produces a checked-in artifact (prd.md, plan.md, data-model.md, tasks.md) on its own git branch — so the work is reviewable, resumable, and version-controlled like any other code change.
The pipeline
/grill-me /write-prd /plan /grill-plan /tasks /design /build
(interrogate idea) (synthesise PRD, (Technical Context, (pressure-test the (decompose into (HTML/CSS (developer ↔
create branch + Constitution Check, plan: rows, files, phased markdown prototypes via evaluator loop;
numbered folder) Project Structure, hidden complexity, checkbox tasks) designer ↔ evaluator
Files-to-touch) failure modes) design-critique) checks tasks off)
+ data-model.mdFive planning skills produce inspectable artifacts. Two execution skills (/build, /design) hand work to subagents and iterate until verification passes.
The artifacts for each feature live at specs/NNN-<feature-slug>/ in the host project — first-class team documentation, not AI scratch.
Requirements
- Node.js ≥ 18 (uses ESM and
fs.cpSync) - Claude Code installed and on
PATH(the installer uses theclaudeCLI to wire up Playwright MCP; if it's missing, the installer prints a manual snippet to add to.mcp.json) - A git repository (each
/write-prdinvocation creates a new branch)
Install
In the root of any project:
npx specsmith initThis:
- Copies skills into
.claude/skills/(one per skill, with bundledtemplates/) - Copies subagent definitions into
.claude/agents/ - Copies trace and helper scripts into
.claude/scripts/ - Drops a starter
.claude/constitution.md(skipped if you already have one) - Merges baseline
.claude/settings.json(permissions, additional directories, trace hooks) and.claude/launch.json(debug configurations) — your existing entries are preserved - Appends a "Real Dev Loop" section to your
CLAUDE.md(created if missing) - Runs
claude mcp add --scope project playwright -- npx @playwright/mcp@latest --isolatedso the evaluator and design-critique agents can drive a real browser - Creates
pipeline/feedback/andpipeline/traces/(runtime scratch space) and seedspipeline/procedures.mdwith a starter overlay-handling procedure - Writes a checksummed manifest to
.claude/specsmith/manifest.jsonso subsequentupdateruns know which files have been edited locally
Useful flags:
--dry-run— preview without writing--force— overwrite locally-modified files (includingconstitution.md)
About the permissions init grants
Heads up:
initaddsBash(*)to your.claude/settings.jsonallowlist (alongsideRead,Write,Edit,Glob,Grep,Agent, andmcp__playwright__*) so the build and design loops aren't interrupted by per-command prompts during dev-server start/stop,pkill,sed/awkruns, test commands, migrations, etc. If your project requires tighter permissions, hand-edit.claude/settings.jsonafter install — replaceBash(*)with specific patterns likeBash(npm run *),Bash(npx playwright *),Bash(node .claude/scripts/*)and so on. The trade-off is you'll start seeing prompts mid-loop when something the agents need wasn't allowlisted.Existing entries in
settings.jsonare preserved —initonly adds missing entries, it never removes yours.
Per-project conventions (machine-checked)
Prose rules in agent files ("use Tailwind, no inline styles", "extract SVGs to icon components") get ignored when the developer agent has the prototype's HTML right in front of it and the prototype uses inline styles. specsmith's answer is the same one used by the runtime guard: a programmatic gate the developer must pass before handoff, not another paragraph in the agent prompt.
Opt in with:
npx specsmith init --conventionsThis drops a starter .claude/conventions.json with four sensible default rules:
| Rule | What it catches |
| --- | --- |
| no-inline-styles | style={...} in .tsx/.jsx files when tailwind.config.* exists |
| svg-extract | inline <svg> blocks > 200 chars outside src/components/icons/** |
| data-access-pattern | direct DB/ORM calls (db.select/.insert/.transaction/..., prisma.<model>.findUnique/.create/.update/...) anywhere outside src/lib/**/repository.ts or src/db/** — explicitly catches the "called the database from a component / route handler / job" anti-pattern |
| i18n-strings | multi-word user-facing strings inlined in JSX or in a11y attributes (placeholder, aria-label, title, alt, label) |
| component-size-cap | component files (in src/components/**, components/**, app/**) exceeding 300 lines — proxy for "this component mixes concerns; extract a hook, selector, or sub-component" |
Each rule is a JSON object with name, filesGlob, optional excludeGlob, optional forbiddenPattern (JS regex), optional maxLines (file size cap; one of forbiddenPattern or maxLines must be set, both is fine too), optional patternFlags, optional skipIfMissing (rule no-ops if the named glob has no matches), and message. Edit, add, or remove rules to fit this project's standards. The schema is documented inline in the file.
Three integration points enforce the rules:
agents/developer.mdStep 4 gate 0:node .claude/scripts/check-conventions.mjsruns first in the quality-gate sequence, before typecheck/test/lint. Failure blocks handoff.agents/evaluator.mdStep 3 gate 0: same script runs as a sanity check after the developer claims done. Any violation surfaces as automatic[High]in the feedback (machine-checkable, no judgment)./plandetects convention ambiguity at planning time. If the codebase has competing patterns (e.g. some entities usequeries.ts, othersrepository.ts) and no rule covers it,/planraises anOQ-###so you pick one and add a rule — rather than letting the next plan pick the other and the codebase drift further.
Bypass per-run with SPECSMITH_CONVENTIONS=0 in the environment. The script also no-ops gracefully when no conventions.json exists, so you can install specsmith without the flag and add conventions later by hand.
Closing the loop between /design and /build
The pipeline is …/plan → /tasks → /design → /build, and /design runs after /tasks. That order means the designer can introduce regions the plan didn't enumerate (e.g. a summary card the planner left out) and they'd silently disappear from /build's scope unless /tasks knows about them.
specsmith closes this loop in three places:
agents/designer.mdwritesdesigns/coverage.mdlisting every prototype's top-level regions with stable component names. This is the machine-readable bridge between the designer and/tasks./designends with a hand-off message telling you to re-run/tasksbefore/build. The recommended pipeline becomes:/plan → /tasks → /design → /tasks → /build(the second/tasksruns in merge mode, appending tasks for newly-introduced regions and preserving any[x]checkmarks from the first run).agents/developer.mdStep 4 gate 0b requires the developer to enumerate every region fromdesigns/coverage.md(or the matchingdesigns/<slug>.html) and check each one off in the handoff before declaring the phase ready. Catches the missing-region failure mode at source — one targeted edit instead of a full evaluator → developer round-trip.agents/evaluator.mdStep 2b does a mechanical region diff between the prototype's accessibility-tree snapshot and the implementation's. Any top-level landmark in the prototype that's missing from the implementation is automatic[High]severity — no editorial judgment about "section vs detail". This is the safety net if you skipped step 2 and went straight to/build./buildexplicitly forbids deferring design-fidelity issues as "scope creep". Plan-vs-design tension is resolved by updating the plan/tasks (re-run/tasks), never by ignoring the design.
Together these turn "the designer added something the plan didn't list" from a silent miss into either a merged task (best case) or a phase-blocking [High] (worst case).
Phase-to-phase signal
Three rules in the evaluator make sure issues don't quietly carry forward across cycles:
- Carryover list. Every feedback file ends with a
## Carryovers (must fix next cycle)checkbox section./buildcopies it verbatim into the next cycle's developer prompt — no re-narration, no summarisation. The developer knows exactly which boxes have to be[x]before handoff. - Unmet
FR-###hard-fails the phase. A numbered functional requirement is the spec author's named promise to the user. If the implementation doesn't satisfy an FR-### the phase block was supposed to cover, the phase fails regardless of overall score — fixes the failure mode where rubric averaging hides a real spec violation. - Auto-promote prior
[Med]when scope grows. If a prior cycle flagged a cross-cutting concern (provider, layout, i18n setup, error boundary) as[Med]and the current phase added new consumers of it, the evaluator escalates it to[High]for this cycle — fixes the failure mode where a "small login-only" issue becomes load-bearing the moment 5 new client components depend on it.
Runtime guard against agent flailing
init installs scripts/guard-repeat-commands.mjs as a PreToolUse hook on every Bash call. It refuses two patterns mid-run rather than logging them after the fact (real /build traces showed agents falling into both, ignoring prose anti-patterns):
- Re-running an expensive command to re-filter output. If
npm run test/lint/typecheck/build,playwright test,prisma migrate,tsc,jest,vitest,next build,cargo,go test,mvn, orgradlealready ran in this session and noEditorWritetool call happened since, the second invocation is denied with a message pointing totee /tmp/last-out.txtonce, then grep the file. The "base" command is matched after stripping trailing| grep/tail/head/awk/sed/wc/jq/...pipes, so re-running with a different filter still counts as a repeat. - State-wipe loops. Three or more
rm -rfof the same path (matchingpglite,.next,node_modules,data/) within 30 minutes is denied. If the same failure persists after wiping, the bug isn't stale state.
Set SPECSMITH_GUARD=0 in the environment to bypass both rules.
Run the loops in an isolated environment
Bash(*) means a Claude Code session running in this project can execute any shell command the host user can run — including git push, reading .env* files, hitting your cloud provider CLIs, calling gh against your repos, and so on. For anything beyond a personal sandbox, run the loops in an environment with a smaller blast radius:
- Dedicated SSH key for the agent: generate a separate key that has push rights only to this repo, and remove the rest of your keys from
ssh-agentwhile the loop runs. Limits how far an unintendedgit pushcan reach. - Docker / devcontainer: run Claude Code inside a container with the project mounted read-write, your secrets not mounted, and outbound network restricted to the registries the project needs. The Playwright MCP container model handles the browser side; the agent process itself can be containerised the same way.
- Service account on a server: stand up a CI-style Linux user (no sudo, scoped credentials, dedicated ssh key, dedicated cloud-provider service account) and run the loops there over SSH. The agent's worst-case behaviour is bounded by what that user can do.
Whichever pattern, the rule of thumb is: assume the agent will at some point execute a command you didn't expect, and make sure that command can't reach anything you can't afford to lose.
Edit the constitution
After install, edit .claude/constitution.md so it reflects this project's principles. The starter ships with seven broadly applicable principles (Test-First, Security-First, Code Quality & Complexity Control, Component Separation, Library-First, Migrations-Only, Design Fidelity); add, remove, or rewrite to fit.
/plan reads this file every time it generates a plan and renders one row per principle into a Constitution Check table — so vague principles produce vague checks. Be specific.
Building a feature
You: /grill-me
The customer keeps saying our checkout flow is "confusing" but won't say what.
Claude: <asks ~10 questions interactively — users, pain points, success criteria, edge cases, non-goals>
You: /write-prd
<slug: checkout-revamp>
Claude: Allocates 003-checkout-revamp/, switches branch, writes specs/003-checkout-revamp/prd.md.
You: /plan
Claude: Inspects package.json + repo structure, reads prd.md and .claude/constitution.md,
writes plan.md (Technical Context, Constitution Check, Project Structure,
Files to touch) and data-model.md.
You: /grill-plan
Claude: <interrogates the plan: which constitution row is handwavy? which files-to-touch
entry is missing? which migration looks safe but isn't? — produces punch list>
You: <hand-edits plan.md based on punch list>
You: /tasks
Claude: Writes tasks.md — phased markdown checkboxes, each implementation task paired
with a sibling check task the evaluator runs.
You: /design (optional, only if the feature needs visual prototypes)
Claude: Designer subagent generates HTML/CSS prototypes under designs/.
Critique subagent scores them in a real browser. Loops until they pass.
You: /build
Claude: For each phase with unchecked tasks:
- Developer subagent implements the phase
- Evaluator subagent verifies via Playwright MCP, ticks off completed tasks
- Loops until evaluator passes the phase
Stops when every task in tasks.md is [x].Skills
| Skill | Purpose | Output |
| --- | --- | --- |
| /grill-me | Interrogate a fresh idea, customer feedback, bug report, or gripe | conversation only |
| /write-prd | Synthesise the conversation into a Product Requirements Document | new branch + prd.md |
| /plan | Translate the PRD into a technical plan and data model | plan.md + data-model.md |
| /grill-plan | Pressure-test an existing plan adversarially | conversation + punch list |
| /tasks | Decompose plan + data model into phased checkbox tasks | tasks.md |
| /design | Run designer ↔ critique loop until prototypes pass | HTML/CSS files in designs/ |
| /build | Run developer ↔ evaluator loop until every task is checked off | implemented code, ticked tasks.md |
Subagents
/build and /design are orchestrators — they delegate work to single-responsibility subagents that run in isolated context (the verifier never sees the implementer's reasoning):
| subagent_type | File | Role |
| --- | --- | --- |
| developing-features | agents/developer.md | Implements one phase; cares about clean code, tests, design fidelity |
| evaluating-phases | agents/evaluator.md | Verifies each task in a real browser via Playwright MCP; flips checkboxes only after passing |
| designing-interfaces | agents/designer.md | Generates distinctive HTML/CSS prototypes per user story |
| critiquing-designs | agents/design-critique.md | Scores prototypes against a rubric in a real browser |
The subagent_type values (left column) are what you pass to the Agent tool. The file basenames (developer.md, etc.) are not the invocation names.
What gets installed where
In the host project after npx specsmith init:
.claude/
├── agents/ # subagent definitions (4 files)
├── skills/ # one folder per skill, some with templates/
│ ├── grill-me/SKILL.md
│ ├── write-prd/{SKILL.md, templates/prd.md}
│ ├── plan/{SKILL.md, templates/{plan.md,data-model.md}}
│ ├── grill-plan/SKILL.md
│ ├── tasks/{SKILL.md, templates/tasks.md}
│ ├── design/SKILL.md
│ └── build/SKILL.md
├── scripts/ # trace hook, env-facts verifier, dev-server helper, etc.
├── constitution.md # principles /plan checks against — EDIT THIS
├── settings.json # permissions + additional directories + trace hooks
├── launch.json # debug configurations
└── specsmith/manifest.json # SHA-256 of every installed file (for `update`)
specs/ # one folder per feature, created by /write-prd
└── NNN-<feature-slug>/
├── prd.md
├── plan.md
├── data-model.md
└── tasks.md
pipeline/ # runtime scratch — created at install, written during /build and /design
├── feedback/ # evaluator/critic reports per cycle
├── traces/ # JSONL traces of build/design runs
├── procedures.md # cached UI flows (login, cookie consent dismissal)
├── environment-facts.md # cached project facts (test command, dev server URL)
├── run-state.md # current orchestrator run context
└── build-log.md # progress log
.mcp.json # Playwright MCP server entrypipeline/procedures.md and pipeline/environment-facts.md are persistent caches — they survive across runs and are never overwritten. The agents discover things once (login flow, dev-server port, test command) and write them here so future cycles skip the discovery step.
Customising
- Templates for the file-writing skills live at
.claude/skills/<skill>/templates/<artifact>.md. Edit them to change the structure of the PRD, plan, data-model, or tasks files. Your edits are preserved acrossupdateruns. - Subagent prompts live at
.claude/agents/*.md. Edit them to change how the developer or evaluator behaves. Edits are preserved acrossupdate. - Skill prompts live at
.claude/skills/*/SKILL.md. Edit similarly. - Constitution:
.claude/constitution.md. Never overwritten byupdate.
The manifest at .claude/specsmith/manifest.json records the SHA-256 of every file at install time. update compares package version vs on-disk vs manifest to decide whether each file is safe to update or has been customised — see "Updating" below.
Updating
npx specsmith updateFor each tracked file:
- already current (on-disk matches package) → skipped
- unmodified since install (on-disk matches manifest) → updated to package version
- user-modified (on-disk differs from both) → skipped, logged for manual reconciliation
- missing on disk → installed fresh
The constitution and your pipeline/*.md files are never touched.
--dry-run previews the diff without applying. --force (via init --force) overwrites everything including the constitution — use with care.
Troubleshooting
claude: command not found during init — Claude Code isn't installed or isn't on PATH. The installer prints a .mcp.json snippet you can paste manually; everything else still installs.
Update reports every file as user-modified on Windows — line-ending normalisation. The package ships LF; if your repo has core.autocrlf=true you'll see CRLF on disk and the hashes diverge. The package ships a .gitattributes with * text=auto eol=lf to prevent this on its own tree, but for your host project run git config core.autocrlf input (or false) before installing.
/plan complains the constitution is missing — .claude/constitution.md was deleted or init was skipped. Run npx specsmith init again (it will only re-add what's missing).
Evaluator can't drive the browser — Playwright MCP wasn't wired. Check .mcp.json has the playwright entry; if not, run claude mcp add --scope project playwright -- npx @playwright/mcp@latest --isolated.
/grill-plan says it can't find the plan — you're not on a feature branch. The ^\d{3}- branch pattern is how the post-PRD skills locate the spec folder. Either git checkout the right branch, or tell the skill which spec folder to use when it asks.
Design decisions worth knowing
- No transcript file from
/grill-me. The conversation context is the handoff, so/write-prdmust run in the same session as/grill-me. - Plan is detailed but not SpecKit-detailed.
/planproduces onlyplan.md+data-model.md— noresearch.md,quickstart.md, orcontracts/. The plan template has Technical Context, Constitution Check, Project Structure, and Files-to-touch sections. - Tasks are not agent-bound. Tasks have no
(dev)/(design)tags. The orchestrator routes; the evaluator decides "done". - Numbering is stable.
FR-###,NFR-###,SC-###,OQ-###identifiers inprd.mdare append-only — never renumber when adding items. - Each feature gets its own branch.
/write-prdrunsgit checkout -b NNN-<slug>off whatever branch you're on, inheriting any uncommitted changes. Multiple features in flight = multiple branches.
License
MIT. See LICENSE.
