@ruchit07/ai-spec
v1.0.0
Published
Scaffold a production-grade AI feature spec before you write a line of code — spec, eval criteria, golden test cases, and an ADR starter
Maintainers
Readme
ai-spec
Scaffold a production-grade AI feature spec before you write a line of code.
Most AI features are vibe-coded: call the LLM, the output looks reasonable, ship. Then three weeks later a prompt change silently breaks 20% of queries — and nobody knows, because "looks reasonable" was never a measurable criterion.
ai-spec fixes the root cause. One command generates the spec, the eval criteria, a seed set of golden test cases, and an ADR starter — so "done" has a precise definition before you start.
By Ruchit Suthar — Software Architect & Technical Leader. 📖 Method: AI-Driven Development: The Spec-First Workflow
Quick start
npx @ruchit07/ai-spec init "semantic search for support tickets"That's it. You'll be prompted for the feature kind, problem statement, latency/cost budgets, and owner — then it writes:
specs/
└── semantic-search-for-support-tickets/
├── spec.md # Problem, I/O contract, acceptance criteria, failure modes
├── eval-criteria.md # The metrics that gate CI, with threshold rationale
├── eval-criteria.json # Machine-readable thresholds for your CI
├── test-cases.json # Seed golden test cases tailored to the feature kind
└── adr.md # Architecture Decision Record starterNon-interactive (for scripts / CI)
ai-spec init "ticket classifier" --kind classification --yes
ai-spec init "support agent" -k agent -d ./ai-features --forceWhy this matters
| Without a spec | With ai-spec | |----------------|--------------| | "Looks good" is the bar | Measurable thresholds (accuracy ≥ 0.8, latency p95 ≤ 2000ms) | | Regressions found by users | Regressions caught in CI | | Prompt is the only documentation | Spec + ADR explain intent and decisions | | No baseline when you switch models | Golden test set is the baseline |
The discipline is the value. ai-spec makes the disciplined path the easy path.
Feature kinds
The seed test cases and spec guidance adapt to what you're building:
| Kind | Tailored guidance |
|------|-------------------|
| rag | Retrieval quality + groundedness as the dominant metric |
| chat | Conversation scope, tools, context-window budget |
| classification | Exact label set, out-of-distribution handling |
| extraction | Typed output schema, field-level accuracy |
| agent | Action space, stopping conditions, safety guardrails |
Programmatic API
import { generateFiles, slugify } from '@ruchit07/ai-spec';
const files = generateFiles({
featureName: 'My Feature',
slug: slugify('My Feature'),
kind: 'rag',
problem: '...',
primaryProvider: 'openai',
latencyP95Ms: 2000,
costPerQueryUsd: 0.005,
accuracyThreshold: 0.8,
groundednessThreshold: 0.85,
owner: '@you',
});
// files: { path, content }[]Pairs with
- ai-native-app-blueprint — the production reference these specs target. Its
packages/evalsruns thetest-cases.jsonthis tool generates. - The Spec-First Workflow — the full method.
License
MIT
