# @mirror-factory/ai-dev-kit v0.2.22
Enforcement scaffold for AI applications on Vercel AI SDK v6. Git hooks, AGENTS.md generation, IDE configs, tool validation, and quality gates. Pairs with Langfuse for observability and Promptfoo for evals. IDE-agnostic (Claude Code, Cursor, Windsurf, Gemini CLI).
Hustle Together is the layer we bolt on top of Claude Code to ship production AI applications. Claude Code gives the agent intelligence. This kit gives the agent discipline. The kit won't let the agent say "done" while anything is red.
## Read this first
- docs/why-a-harness.md — the thesis. Why a harness on top of a harness, and what this layer takes opinions about. Start here if you've never used the kit.
- docs/getting-started/quickstart.md — 3-5 commands to install and verify.
- docs/roadmap.md — what's next (AI Brand Studio editor, more dashboard editability).
## Install (quick)

```shell
# Greenfield project:
npx @mirror-factory/ai-dev-kit init

# Existing project (non-destructive retrofit):
npx @mirror-factory/ai-dev-kit adopt

# After either:
npx @mirror-factory/ai-dev-kit onboard   # refreshes AGENTS.md Kit Catalog
npx @mirror-factory/ai-dev-kit connect   # env-key wizard
ai-dev-kit doctor --strict               # verify the plumbing
```

Full install walkthrough: docs/getting-started/full-install.md.
## What you get
- 30+ gates across 4 enforcement layers (Claude Code hooks, pre-commit, pre-push, weekly CI).
- 11 primary registries in `.ai-dev-kit/registries/` — components, pages, tools, skills, api-routes, mcp-servers, hooks, docs, tests (auto-synced) + design-tokens, design-system (hand-curated). Plus dependencies (weekly audit), test-contracts + index (rollups), and per-vendor JSON.
- 21 dashboard pages under `/dev-kit/*`. `/dev-kit/config` is a YAML editor; `/dev-kit/runs/[run_id]` is the live run view; `/dev-kit/registries` is a 12-tab inventory of every kind of registry with per-tab filters.
- 7 subagents — `@planner`, `@evaluator`, `@spec-enricher`, `@design-agent`, `@code-reviewer` (security / perf / clarity judge), `@interview-researcher` (doc-aware Phase B interview), `@impl-doc-diff` (opt-in post-impl docs verification). Each declares an explicit `model:` tier (haiku / sonnet / opus) so cost stays predictable.
- 9 slash commands under `.claude/commands/` — TDD micro-ops (`/red`, `/green`, `/refactor`, `/cycle`) + kit drivers (`/kit-create`, `/kit-design`, `/kit-run`, `/kit-combine`, `/kit-status`).
- Kit audit log at `.ai-dev-kit/state/kit-audit.jsonl` — every CLI command, Husky hook step, Claude hook fire, dashboard API call, doctor check, git_state change, and bootstrap step appends a `run_id`-tagged event. Inspect via `ai-dev-kit audit show [--last N] [--kind K]`; dump for bug reports with `ai-dev-kit audit export --format jsonl`.
- Test-run telemetry — vitest + Playwright reporters append to `.ai-dev-kit/state/test-runs.jsonl` after every run; `scanTests` rolls up `last_run` / `last_duration_ms` / `last_outcome` into `tests.yaml`. Doctor flags tests not run in >14 days.
- Inventory doctor checks — `checkHookInventory` fails on unwired Claude hooks + ghost hooks; `checkDocsIntegrity` fails on broken outbound doc links + warns on orphan docs; `checkTestInventory` warns on orphan tests under `tests/{expect,e2e,integration}/`.
- 5 skills under `.claude/skills/` — `context7-first`, `compliance-fix`, `observability-debug`, `visual-qa`, `wire-telemetry`. Invocations are tracked and surfaced in the dashboard.
- `run_id` correlation — every AI call, vendor call, test, doc lookup, and skill invocation threads through a single ID. `/dev-kit/runs/[run_id]` aggregates the full feature build.
- `@design-agent` subagent + `ai-dev-kit design <feature>` CLI — pre-build gate that populates design tokens + system spec + wireframes + IA before any `.tsx` lands.
- `ai-dev-kit run <feature>` — OODA verification loop. 11 checks × 5 iterations max. Auto-notifies on done or stuck.
- YAML → CSS codegen — `design-tokens.yaml` is the source of truth; `app/styles/tokens.css` is regenerated on every commit.
- Cost attribution — LLM tokens + non-LLM vendor APIs (AssemblyAI per-minute, Firecrawl per-page, etc.) both tagged with `costMode` and attributed to the same run. Weekly cost-drift GH Action Firecrawls registry `source_url`s and blocks on >2% drift.
- Reinstall-safe adopt — `npx ai-dev-kit adopt` on an existing project now auto-runs `sync-registries` + `generate-theme-css` + `sync-project-index` + `doctor` so the dashboard isn't empty on first load. Opt out with `--skip-bootstrap`.
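The `run_id` correlation is easy to picture as one-line JSONL events. Here is a minimal sketch; every field name except `run_id` is an assumption, not the kit's actual event shape:

```typescript
// Sketch of a run_id-tagged JSONL event, as appended to a state file
// like kit-audit.jsonl. Field names other than run_id are assumptions.
type KitEvent = {
  run_id: string;   // single ID threading every record of a feature build
  kind: string;     // e.g. "cli", "hook", "doctor" -- hypothetical values
  ts: string;       // ISO timestamp
  data?: unknown;   // event-specific payload
};

function toJsonlLine(runId: string, kind: string, data?: unknown): string {
  const event: KitEvent = { run_id: runId, kind, ts: new Date().toISOString(), data };
  return JSON.stringify(event); // one event per line; append with a trailing newline
}
```

Because every producer (CLI, hooks, dashboard, doctor) writes the same shape, a run view can aggregate a full feature build with a single filter on `run_id`.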
## Documentation index

### Getting started
- Quickstart — install + verify in 5 minutes.
- Full install — new project + existing project paths, flags, after-install verification steps.
- Custom install — module subsets, advanced scaffolding.
### Concepts (understand the kit)
- Why a harness — the belief system.
- Run correlation — the `run_id` backbone threading through every record.
- Registries — the eleven primary registries + rollups, auto vs manual, what they hold.
- Slash commands — the 9 slash commands: TDD micro-ops (`/red`, `/green`, `/refactor`, `/cycle`) + kit drivers (`/kit-create`, `/kit-design`, `/kit-run`, `/kit-combine`, `/kit-status`).
- Code review — `@code-reviewer` subagent + `check-code-review.mts` pre-push gate (security / perf / clarity / test-gaps).
@code-reviewersubagent +check-code-review.mtspre-push gate (security / perf / clarity / test-gaps). - Design-first — pre-build design gate +
@design-agentworkflow. - Design pipeline — YAML → CSS codegen, why intent and render are separate layers.
- Brand enforcement — static token check + LLM judge + visual regression, all fail-closed.
- Cost tracking — LLM + vendor attribution + budget enforcement + drift detection.
- Ralph loop (OODA) — the `ai-dev-kit run` verification driver.
- Onboarding flow — how the kit loads into a Claude Code session end-to-end.
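The design-pipeline idea (YAML as intent, CSS as render) reduces to a few lines. This is an illustrative sketch, not the kit's actual codegen, and the token names are made up:

```typescript
// Turn a flat design-token map (as parsed from a YAML file) into CSS
// custom properties -- the shape of a design-tokens.yaml -> tokens.css step.
function tokensToCss(tokens: Record<string, string>): string {
  const lines = Object.entries(tokens).map(([name, value]) => `  --${name}: ${value};`);
  return `:root {\n${lines.join("\n")}\n}\n`;
}
```

Regenerating the CSS on every commit keeps the YAML authoritative: hand edits to the generated file are overwritten, so design intent can only change in one place.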
### Reference
- CLI commands — full command + flag reference.
- API routes — `/api/dev-kit/*` endpoints.
- Config files — every YAML under `.ai-dev-kit/`.
- Hooks — Claude Code hooks + husky hooks shipped by the kit.
- Database schema — Supabase tables for observability.
- Telemetry events — event names and payload shapes.
- Test patterns — the 17 test patterns + when to apply each.
### Guides
- Layer One rebuild prompt — real-world example: rebuild an audio app on the kit.
- Creating tools — Zod-typed tool patterns.
- Cost management — budget.yaml + cost-drift + per-call alerts.
- Dashboard — `/dev-kit/*` pages explained.
- Deployment gates — pre-deploy verification.
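Cost attribution across LLM and non-LLM vendors comes down to normalizing different billing units under one `costMode`. A hedged sketch; the mode names and numbers below are assumptions, not the kit's real values:

```typescript
// Attribute one vendor call's cost to a run, whatever the billing unit.
// costMode picks the unit: tokens for LLMs, minutes for transcription
// (e.g. AssemblyAI), pages for crawling (e.g. Firecrawl).
type CostMode = "per_token" | "per_minute" | "per_page";

function attributeCost(mode: CostMode, units: number, unitPriceUsd: number) {
  // toFixed(6) keeps the attributed figure stable across float rounding
  return { costMode: mode, usd: +(units * unitPriceUsd).toFixed(6) };
}
```

Tagging every call this way is what lets LLM tokens and per-minute or per-page vendor spend roll up into a single per-run cost figure.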
### Project docs
- CHANGELOG — per-release notes.
- Roadmap — shipped → 0.3.x → 0.4.x+.
- Branding guide — Mirror Factory voice for kit docs.
- Concepts: what is a harness — the broader framing.
## Compatibility
- Primary: Claude Code (`.claude/settings.json` + native hooks + skills).
- Supported via rules files + git hooks: Cursor, Windsurf, Gemini CLI. Every enforcement layer that runs in git runs for every agent.
- Runtime: Node 18+, pnpm, Next.js 15 (for the dashboard + API routes).
- Optional integrations: Langfuse (OTel export), Supabase (observability persistence), Promptfoo (evals), Firecrawl MCP (docs), Context7 MCP (live library docs).
## Pricing
Free. MIT-licensed. There's no paid tier. We ship Mirror Factory products on top of the kit every day — the value is in what you build on top, not in the kit itself.
If you extend it, fork it, monetize it: that's the point.
## Contributing
Open a discussion or issue at github.com/mirror-factory/vercel-ai-starter-kit/issues. Roadmap items live at docs/roadmap.md — propose new ones there before opening a PR.
Before pushing: the kit's own pre-push hooks apply to the kit's repo, so `pnpm test` and `pnpm typecheck` must pass. tests/kit/ covers the enforcement scripts with 137 assertions across 10 files.
## License
MIT. See LICENSE.
Built from production experience. Set in JetBrains Mono. Green on black, as it should be.
