# @mirror-factory/ai-dev-kit v0.2.22
Enforcement scaffold for AI applications on Vercel AI SDK v6. Git hooks, AGENTS.md generation, IDE configs, tool validation, and quality gates. Pairs with Langfuse for observability and Promptfoo for evals. IDE-agnostic (Claude Code, Cursor, Windsurf, Gemini CLI).
Hustle Together is the layer we bolt on top of Claude Code to ship production AI applications. Claude Code gives the agent intelligence. This kit gives the agent discipline. The kit won't let the agent say "done" while anything is red.
## Read this first
- docs/why-a-harness.md — the thesis. Why a harness on top of a harness, and what this layer takes opinions about. Start here if you've never used the kit.
- docs/getting-started/quickstart.md — 3-5 commands to install and verify.
- docs/roadmap.md — what's next (AI Brand Studio editor, more dashboard editability).
## Install (quick)

```shell
# Greenfield project:
npx @mirror-factory/ai-dev-kit init

# Existing project (non-destructive retrofit):
npx @mirror-factory/ai-dev-kit adopt

# After either:
npx @mirror-factory/ai-dev-kit onboard   # refreshes AGENTS.md Kit Catalog
npx @mirror-factory/ai-dev-kit connect   # env-key wizard
ai-dev-kit doctor --strict               # verify the plumbing
```

Full install walkthrough: docs/getting-started/full-install.md.
## What you get
- 30+ gates across 4 enforcement layers (Claude Code hooks, pre-commit, pre-push, weekly CI).
- 11 primary registries in `.ai-dev-kit/registries/` — components, pages, tools, skills, api-routes, mcp-servers, hooks, docs, tests (auto-synced) + design-tokens, design-system (hand-curated). Plus dependencies (weekly audit), test-contracts + index (rollups), and per-vendor JSON.
- 21 dashboard pages under `/dev-kit/*`. `/dev-kit/config` is a YAML editor; `/dev-kit/runs/[run_id]` is the live run view; `/dev-kit/registries` is a 12-tab inventory of every kind of registry with per-tab filters.
- 7 subagents — `@planner`, `@evaluator`, `@spec-enricher`, `@design-agent`, `@code-reviewer` (security / perf / clarity judge), `@interview-researcher` (doc-aware Phase B interview), `@impl-doc-diff` (opt-in post-impl docs verification). Each declares an explicit `model:` tier (haiku / sonnet / opus) so cost stays predictable.
- 9 slash commands under `.claude/commands/` — TDD micro-ops (`/red`, `/green`, `/refactor`, `/cycle`) + kit drivers (`/kit-create`, `/kit-design`, `/kit-run`, `/kit-combine`, `/kit-status`).
- Kit audit log at `.ai-dev-kit/state/kit-audit.jsonl` — every CLI command, Husky hook step, Claude hook fire, dashboard API call, doctor check, git_state change, and bootstrap step appends a `run_id`-tagged event. Inspect via `ai-dev-kit audit show [--last N] [--kind K]`; dump for bug reports with `ai-dev-kit audit export --format jsonl`.
- Test-run telemetry — vitest + Playwright reporters append to `.ai-dev-kit/state/test-runs.jsonl` after every run; `scanTests` rolls up `last_run` / `last_duration_ms` / `last_outcome` into `tests.yaml`. Doctor flags tests not run in >14 days.
- Inventory doctor checks — `checkHookInventory` fails on unwired Claude hooks + ghost hooks; `checkDocsIntegrity` fails on broken outbound doc links + warns on orphan docs; `checkTestInventory` warns on orphan tests under `tests/{expect,e2e,integration}/`.
- 5 skills under `.claude/skills/` — `context7-first`, `compliance-fix`, `observability-debug`, `visual-qa`, `wire-telemetry`. Invocations are tracked and surfaced in the dashboard.
- `run_id` correlation — every AI call, vendor call, test, doc lookup, and skill invocation threads through a single ID. `/dev-kit/runs/[run_id]` aggregates the full feature build.
- `@design-agent` subagent + `ai-dev-kit design <feature>` CLI — pre-build gate that populates design tokens + system spec + wireframes + IA before any `.tsx` lands.
- `ai-dev-kit run <feature>` — OODA verification loop. 11 checks × 5 iterations max. Auto-notifies on done or stuck.
- YAML → CSS codegen — `design-tokens.yaml` is the source of truth; `app/styles/tokens.css` is regenerated on every commit.
- Cost attribution — LLM tokens + non-LLM vendor APIs (AssemblyAI per-minute, Firecrawl per-page, etc.) both tagged with `costMode` and attributed to the same run. Weekly cost-drift GH Action Firecrawls registry `source_url`s and blocks on >2% drift.
- Reinstall-safe adopt — `npx ai-dev-kit adopt` on an existing project now auto-runs `sync-registries` + `generate-theme-css` + `sync-project-index` + `doctor` so the dashboard isn't empty on first load. Opt out with `--skip-bootstrap`.
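The `run_id` correlation is easy to picture as one-line JSONL events. Here is a minimal sketch; every field name except `run_id` is an assumption, not the kit's actual event shape:

```typescript
// Sketch of a run_id-tagged JSONL event, as appended to a state file
// like kit-audit.jsonl. Field names other than run_id are assumptions.
type KitEvent = {
  run_id: string;   // single ID threading every record of a feature build
  kind: string;     // e.g. "cli", "hook", "doctor" -- hypothetical values
  ts: string;       // ISO timestamp
  data?: unknown;   // event-specific payload
};

function toJsonlLine(runId: string, kind: string, data?: unknown): string {
  const event: KitEvent = { run_id: runId, kind, ts: new Date().toISOString(), data };
  return JSON.stringify(event); // one event per line; append with a trailing newline
}
```

Because every producer (CLI, hooks, dashboard, doctor) writes the same shape, a run view can aggregate a full feature build with a single filter on `run_id`.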
## Documentation index

### Getting started
- Quickstart — install + verify in 5 minutes.
- Full install — new project + existing project paths, flags, after-install verification steps.
- Custom install — module subsets, advanced scaffolding.
### Concepts (understand the kit)
- Why a harness — the belief system.
- Run correlation — the `run_id` backbone threading through every record.
- Registries — the eleven primary registries + rollups, auto vs manual, what they hold.
- Slash commands — the 9 slash commands: TDD micro-ops (`/red`, `/green`, `/refactor`, `/cycle`) + kit drivers (`/kit-create`, `/kit-design`, `/kit-run`, `/kit-combine`, `/kit-status`).
- Code review — `@code-reviewer` subagent + `check-code-review.mts` pre-push gate (security / perf / clarity / test-gaps).
@code-reviewersubagent +check-code-review.mtspre-push gate (security / perf / clarity / test-gaps). - Design-first — pre-build design gate +
@design-agentworkflow. - Design pipeline — YAML → CSS codegen, why intent and render are separate layers.
- Brand enforcement — static token check + LLM judge + visual regression, all fail-closed.
- Cost tracking — LLM + vendor attribution + budget enforcement + drift detection.
- Ralph loop (OODA) — the `ai-dev-kit run` verification driver.
- Onboarding flow — how the kit loads into a Claude Code session end-to-end.
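The design-pipeline idea (YAML as intent, CSS as render) reduces to a few lines. This is an illustrative sketch, not the kit's actual codegen, and the token names are made up:

```typescript
// Turn a flat design-token map (as parsed from a YAML file) into CSS
// custom properties -- the shape of a design-tokens.yaml -> tokens.css step.
function tokensToCss(tokens: Record<string, string>): string {
  const lines = Object.entries(tokens).map(([name, value]) => `  --${name}: ${value};`);
  return `:root {\n${lines.join("\n")}\n}\n`;
}
```

Regenerating the CSS on every commit keeps the YAML authoritative: hand edits to the generated file are overwritten, so design intent can only change in one place.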
### Reference
- CLI commands — full command + flag reference.
- API routes — `/api/dev-kit/*` endpoints.
- Config files — every YAML under `.ai-dev-kit/`.
- Hooks — Claude Code hooks + husky hooks shipped by the kit.
- Database schema — Supabase tables for observability.
- Telemetry events — event names and payload shapes.
- Test patterns — the 17 test patterns + when to apply each.
### Guides
- Layer One rebuild prompt — real-world example: rebuild an audio app on the kit.
- Creating tools — Zod-typed tool patterns.
- Cost management — budget.yaml + cost-drift + per-call alerts.
- Dashboard — `/dev-kit/*` pages explained.
- Deployment gates — pre-deploy verification.
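Cost attribution across LLM and non-LLM vendors comes down to normalizing different billing units under one `costMode`. A hedged sketch; the mode names and numbers below are assumptions, not the kit's real values:

```typescript
// Attribute one vendor call's cost to a run, whatever the billing unit.
// costMode picks the unit: tokens for LLMs, minutes for transcription
// (e.g. AssemblyAI), pages for crawling (e.g. Firecrawl).
type CostMode = "per_token" | "per_minute" | "per_page";

function attributeCost(mode: CostMode, units: number, unitPriceUsd: number) {
  // toFixed(6) keeps the attributed figure stable across float rounding
  return { costMode: mode, usd: +(units * unitPriceUsd).toFixed(6) };
}
```

Tagging every call this way is what lets LLM tokens and per-minute or per-page vendor spend roll up into a single per-run cost figure.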
### Project docs
- CHANGELOG — per-release notes.
- Roadmap — shipped → 0.3.x → 0.4.x+.
- Branding guide — Mirror Factory voice for kit docs.
- Concepts: what is a harness — the broader framing.
## Compatibility
- Primary: Claude Code (`.claude/settings.json` + native hooks + skills).
- Supported via rules files + git hooks: Cursor, Windsurf, Gemini CLI. Every enforcement layer that runs in git runs for every agent.
- Runtime: Node 18+, pnpm, Next.js 15 (for the dashboard + API routes).
- Optional integrations: Langfuse (OTel export), Supabase (observability persistence), Promptfoo (evals), Firecrawl MCP (docs), Context7 MCP (live library docs).
## Pricing
Free. MIT-licensed. There's no paid tier. We ship Mirror Factory products on top of the kit every day — the value is in what you build on top, not in the kit itself.
If you extend it, fork it, monetize it: that's the point.
## Contributing
Open a discussion or issue at github.com/mirror-factory/vercel-ai-starter-kit/issues. Roadmap items live at docs/roadmap.md — propose new ones there before opening a PR.
Before pushing: the kit's own pre-push hooks apply to the kit's repo, so `pnpm test` and `pnpm typecheck` must pass. tests/kit/ covers the enforcement scripts with 137 assertions across 10 files.
## License
MIT. See LICENSE.
Built from production experience. Set in JetBrains Mono. Green on black, as it should be.
