@ruchit07/ai-spec

v1.0.0

Published

18 days ago

Scaffold a production-grade AI feature spec before you write a line of code — spec, eval criteria, golden test cases, and an ADR starter

ruchit07

ai llm spec rag evals cli scaffold spec-driven-development ai-engineering

ai-spec

Scaffold a production-grade AI feature spec before you write a line of code.

Most AI features are vibe-coded: call the LLM, the output looks reasonable, ship. Then three weeks later a prompt change silently breaks 20% of queries — and nobody knows, because "looks reasonable" was never a measurable criterion.

ai-spec fixes the root cause. One command generates the spec, the eval criteria, a seed set of golden test cases, and an ADR starter — so "done" has a precise definition before you start.

By Ruchit Suthar — Software Architect & Technical Leader. 📖 Method: AI-Driven Development: The Spec-First Workflow

Quick start

npx @ruchit07/ai-spec init "semantic search for support tickets"

That's it. You'll be prompted for the feature kind, problem statement, latency/cost budgets, and owner — then it writes:

specs/
└── semantic-search-for-support-tickets/
    ├── spec.md              # Problem, I/O contract, acceptance criteria, failure modes
    ├── eval-criteria.md     # The metrics that gate CI, with threshold rationale
    ├── eval-criteria.json   # Machine-readable thresholds for your CI
    ├── test-cases.json      # Seed golden test cases tailored to the feature kind
    └── adr.md               # Architecture Decision Record starter
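The directory name in the tree above is derived from the feature name you pass to `init`. The sketch below shows one plausible way to produce it. `toSlug` is a hypothetical stand-in written for illustration, not the package's exported `slugify`, whose handling of punctuation and non-ASCII characters may differ.

```typescript
// Hypothetical stand-in for the package's slugify export (an assumption,
// not the real implementation): lowercase, hyphen-separated, ASCII only.
function toSlug(featureName: string): string {
  return featureName
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, '-') // collapse each run of non-alphanumerics into one hyphen
    .replace(/^-+|-+$/g, '');    // drop leading and trailing hyphens
}

const specDir = `specs/${toSlug('semantic search for support tickets')}`;
console.log(specDir); // specs/semantic-search-for-support-tickets
```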

Non-interactive (for scripts / CI)

ai-spec init "ticket classifier" --kind classification --yes
ai-spec init "support agent" -k agent -d ./ai-features --force

Why this matters

| Without a spec | With ai-spec |
|----------------|--------------|
| "Looks good" is the bar | Measurable thresholds (accuracy ≥ 0.8, latency p95 ≤ 2000ms) |
| Regressions found by users | Regressions caught in CI |
| Prompt is the only documentation | Spec + ADR explain intent and decisions |
| No baseline when you switch models | Golden test set is the baseline |
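To make "measurable thresholds" concrete, here is a minimal sketch of a CI gate that checks accuracy and p95 latency over a set of golden test results. The `CaseResult` and `Thresholds` shapes and the `gate` function are assumptions made for illustration; they are not the schema of the generated `eval-criteria.json` or `test-cases.json`, so adapt the field names to whatever your generated files contain.

```typescript
// Hypothetical shapes; the files generated by ai-spec may use different fields.
interface CaseResult { passed: boolean; latencyMs: number; }
interface Thresholds { minAccuracy: number; maxLatencyP95Ms: number; }

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('percentile of empty sample');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

function gate(results: CaseResult[], t: Thresholds) {
  const accuracy = results.filter((r) => r.passed).length / results.length;
  const latencyP95Ms = percentile(results.map((r) => r.latencyMs), 95);
  const ok = accuracy >= t.minAccuracy && latencyP95Ms <= t.maxLatencyP95Ms;
  return { accuracy, latencyP95Ms, ok };
}

// Illustrative data only: ten golden cases, one failure.
const latencies = [300, 450, 500, 620, 700, 800, 900, 1100, 1500, 1900];
const sampleResults: CaseResult[] = latencies.map((latencyMs, i) => ({
  passed: i !== 3,
  latencyMs,
}));

const report = gate(sampleResults, { minAccuracy: 0.8, maxLatencyP95Ms: 2000 });
console.log(report); // { accuracy: 0.9, latencyP95Ms: 1900, ok: true }
```

In a real pipeline you would fail the build when `ok` is false, which is what turns a regression into a red CI run instead of a user report.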

The discipline is the value. ai-spec makes the disciplined path the easy path.

Feature kinds

The seed test cases and spec guidance adapt to what you're building:

| Kind | Tailored guidance |
|------|-------------------|
| rag | Retrieval quality + groundedness as the dominant metric |
| chat | Conversation scope, tools, context-window budget |
| classification | Exact label set, out-of-distribution handling |
| extraction | Typed output schema, field-level accuracy |
| agent | Action space, stopping conditions, safety guardrails |
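If you drive the tool from your own scripts, it can help to validate the kind before passing it along. The sketch below encodes the five kinds from the table as a TypeScript union. `FEATURE_KINDS`, `isFeatureKind`, and `parseKind` are hypothetical helpers, not exports of the package.

```typescript
// The five kinds listed in the table above; the helper names are illustrative.
const FEATURE_KINDS = ['rag', 'chat', 'classification', 'extraction', 'agent'] as const;
type FeatureKind = (typeof FEATURE_KINDS)[number];

// Type guard: narrows an arbitrary string to FeatureKind.
function isFeatureKind(value: string): value is FeatureKind {
  return (FEATURE_KINDS as readonly string[]).includes(value);
}

// Throws with a readable message instead of passing an unknown kind downstream.
function parseKind(value: string): FeatureKind {
  if (!isFeatureKind(value)) {
    throw new Error(`Unknown kind "${value}". Expected one of: ${FEATURE_KINDS.join(', ')}`);
  }
  return value;
}

console.log(parseKind('classification')); // classification
```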

Programmatic API

import { generateFiles, slugify } from '@ruchit07/ai-spec';

const files = generateFiles({
  featureName: 'My Feature',
  slug: slugify('My Feature'),
  kind: 'rag',
  problem: '...',
  primaryProvider: 'openai',
  latencyP95Ms: 2000,
  costPerQueryUsd: 0.005,
  accuracyThreshold: 0.8,
  groundednessThreshold: 0.85,
  owner: '@you',
});
// files: { path, content }[]
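Since `generateFiles` returns `{ path, content }[]`, writing the result to disk is left to the caller. The following is a minimal sketch of that step. It assumes each `path` is relative to an output root, which the source does not state. `writeSpecFiles` is a hypothetical helper, and the sample file stands in for real `generateFiles` output.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { dirname, join } from 'node:path';

// Mirrors the documented return shape: { path, content }[]
interface GeneratedFile { path: string; content: string; }

// Hypothetical helper: writes each file under rootDir, creating parent directories.
// Assumes file.path is relative (an assumption, not confirmed by the package docs).
function writeSpecFiles(files: GeneratedFile[], rootDir: string): string[] {
  return files.map((file) => {
    const target = join(rootDir, file.path);
    mkdirSync(dirname(target), { recursive: true });
    writeFileSync(target, file.content, 'utf8');
    return target;
  });
}

// Placeholder input standing in for the array returned by generateFiles.
const sampleFiles: GeneratedFile[] = [
  { path: 'specs/my-feature/spec.md', content: '# My Feature\n' },
];

const outDir = mkdtempSync(join(tmpdir(), 'ai-spec-'));
const written = writeSpecFiles(sampleFiles, outDir);
console.log(readFileSync(written[0], 'utf8'));
```

Writing into a temporary directory first lets you inspect or diff the generated files before committing them to your repository.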

Pairs with

ai-native-app-blueprint — the production reference these specs target. Its packages/evals runs the test-cases.json this tool generates.
The Spec-First Workflow — the full method.

License

MIT
