@swarmclawai/prompt-snapshot
v0.1.0
Vitest-for-prompts. File-based prompt eval runner with snapshots and assertions — no SaaS, no dashboard, CI-native. Built for agents.
prompt-snapshot
Vitest-for-prompts. A file-based prompt eval runner with snapshots and assertions. No dashboard, no SaaS, no sign-up: just .prompt.ts files in your repo and a CLI that runs them in CI.
Why this exists
If you ship any LLM feature, you should be testing your prompts the way you test your code — in CI, on every change. The existing options (Langfuse, DeepEval, Phoenix, Promptfoo) are heavy observability platforms shaped for enterprise teams: SDK init, hosted dashboards, sign-ups, YAML configs, whole new abstractions to learn.
A solo dev or small team wants this instead:
- Colocated .prompt.ts files next to the code that uses the prompt.
- A TypeScript-native API with types, autocomplete, and Zod schemas.
- Snapshot testing: run a prompt, capture the output, fail the build if it drifts.
- Deterministic assertions (contains, matchesSchema, …) for the parts that matter.
- One CLI, JSON output, stable exit codes: CI-friendly on day one.
That's what prompt-snapshot is.
30-second demo
Create summarizer.prompt.ts:

    import { defineSuite, contains, shorterThan } from "@swarmclawai/prompt-snapshot";

    export default defineSuite({
      name: "summarizer",
      // temperature: 0 keeps the output as stable as possible between runs
      model: { provider: "anthropic", model: "claude-haiku-4-5", temperature: 0 },
      prompt: ({ doc }: { doc: string }) =>
        `Summarize the following in one sentence.\n\n${doc}`,
      cases: [
        {
          name: "short-doc",
          input: { doc: "The cat sat on the mat. The dog watched from across the room." },
          assertions: [contains("cat", { caseInsensitive: true }), shorterThan(300)],
          // first run records the output; later runs compare against it
          snapshot: true,
        },
      ],
    });

Run it:

    export ANTHROPIC_API_KEY=sk-...
    npx @swarmclawai/prompt-snapshot@latest run

The first run writes a snapshot to __snapshots__/summarizer.snap.json. Every run after that fails if the output drifts. Running prompt-snapshot update refreshes snapshots on purpose.
Install

    pnpm add -D @swarmclawai/prompt-snapshot
    # or
    npm i -D @swarmclawai/prompt-snapshot

Commands
| Command | Purpose |
|---|---|
| prompt-snapshot run [path] | Run every suite under a path |
| prompt-snapshot update [path] | Run + rewrite snapshots |
| prompt-snapshot list [path] | Discover suites without running |
| prompt-snapshot help-agents | Print the machine-readable catalog |
Every command accepts --json and, with it, prints a one-line JSON envelope. Exit codes: 0 when everything passed, 1 when some cases failed or a suite wouldn't load, 2 on an internal error.
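A CI wrapper might interpret those results along the lines of the sketch below. Only the three exit codes and the "one line of JSON" shape come from this README; the helper names (describeExit, shouldFailBuild, parseEnvelope) are hypothetical, and because the envelope's fields are not documented here, it is returned as unknown rather than given an invented type.

```typescript
// Hypothetical helpers for a CI script; not part of the package.

// Map the documented exit codes to a short description.
function describeExit(code: number): string {
  switch (code) {
    case 0:
      return "all passed";
    case 1:
      return "some cases failed or a suite would not load";
    case 2:
      return "internal error";
    default:
      return `unexpected exit code ${code}`;
  }
}

// Anything other than 0 should stop the pipeline.
function shouldFailBuild(code: number): boolean {
  return code !== 0;
}

// Parse the last line of stdout as the JSON envelope.
// The envelope's fields are not specified here, so the result stays `unknown`.
function parseEnvelope(stdout: string): unknown {
  const line = stdout.trim().split("\n").pop() ?? "";
  return JSON.parse(line);
}
```

Keeping the envelope as unknown forces the caller to narrow it explicitly, which seems safer than guessing at field names before checking AGENTS.md.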
Assertions
| Assertion | Checks |
|---|---|
| contains(s, { caseInsensitive? }) | output contains the substring |
| notContains(s, { caseInsensitive? }) | output does not contain the substring |
| matches(/regex/) | output matches a regex |
| matchesSchema(zodSchema) | output is JSON (possibly fenced) and matches the schema |
| startsWith(s) | trimmed output starts with the prefix |
| oneOf([...]) | trimmed output equals one of the options |
| longerThan(n) / shorterThan(n) | char length bound |
Combine them freely per case. Failures are reported with rule name + message.
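To make the table more concrete, here is a minimal sketch of how deterministic assertions like these can be modelled as plain functions over the output string. It is an illustrative re-implementation, not the library's source: the AssertionResult and Assertion types, the check helper, and the exact failure messages are assumptions, as is the choice of a strict "less than" bound for shorterThan.

```typescript
// Illustrative only: a self-contained model of rule-name + message reporting.
type AssertionResult = { rule: string; pass: boolean; message: string };
type Assertion = (output: string) => AssertionResult;

function contains(s: string, opts: { caseInsensitive?: boolean } = {}): Assertion {
  return (output) => {
    const haystack = opts.caseInsensitive ? output.toLowerCase() : output;
    const needle = opts.caseInsensitive ? s.toLowerCase() : s;
    const pass = haystack.includes(needle);
    return {
      rule: "contains",
      pass,
      message: pass ? "ok" : `expected output to contain "${s}"`,
    };
  };
}

// Assumption: the bound is strict (length < n).
function shorterThan(n: number): Assertion {
  return (output) => {
    const pass = output.length < n;
    return {
      rule: "shorterThan",
      pass,
      message: pass ? "ok" : `expected fewer than ${n} chars, got ${output.length}`,
    };
  };
}

// Run every assertion against the output and keep only the failures.
function check(output: string, assertions: Assertion[]): AssertionResult[] {
  return assertions.map((a) => a(output)).filter((r) => !r.pass);
}

const failures = check("A Cat sat on a mat.", [
  contains("cat", { caseInsensitive: true }), // passes
  shorterThan(10), // fails: the string is longer than 10 characters
]);
console.log(failures.map((f) => `${f.rule}: ${f.message}`));
```

Because each assertion is just a function from output to result, combining them per case amounts to putting them in an array.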
Snapshots
Set snapshot: true on a case (or on the suite) and the first run writes a normalized copy of the output to <dir>/__snapshots__/<suite>.snap.json. Subsequent runs diff the new output against the saved one and fail on drift. Run with update to rewrite.
Normalization trims trailing whitespace and collapses trailing newlines — so whitespace noise doesn't cause false failures, but substantive drift does.
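The normalization rule can be sketched as follows. This is one plausible reading of "trims trailing whitespace and collapses trailing newlines" (trailing spaces and tabs removed from each line, trailing newlines dropped), not the library's actual code; normalize and hasDrifted are hypothetical names.

```typescript
// Hypothetical normalizer; the real implementation may differ in detail.
function normalize(output: string): string {
  return output
    .split("\n")
    .map((line) => line.replace(/[ \t]+$/, "")) // strip trailing spaces/tabs per line
    .join("\n")
    .replace(/\n+$/, ""); // drop trailing newlines
}

// A snapshot has drifted when the normalized texts differ.
function hasDrifted(saved: string, fresh: string): boolean {
  return normalize(saved) !== normalize(fresh);
}
```

Under this sketch, "A cat story." and "A cat story.  \n\n" compare as equal, while "A cat story." and "A dog story." count as drift.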
Providers
| Provider | Env var | Notes |
|---|---|---|
| anthropic | ANTHROPIC_API_KEY | Uses @anthropic-ai/sdk. Defaults: temperature: 0, max_tokens: 1024 |
| openai | OPENAI_API_KEY | Uses openai SDK |
| openai-compatible | varies | Point baseURL at Ollama, LM Studio, vLLM, or any OpenAI-compatible API |
| mock | — | Only via programmatic injection (see below) |
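For a local model, the model block of a suite might look like the sketch below. The placement of baseURL inside the model object and the model name are assumptions, so check the package's types before relying on them; http://localhost:11434/v1 is Ollama's default OpenAI-compatible endpoint. The ModelConfig type is declared locally for illustration and is not imported from the package.

```typescript
// Hypothetical shape, declared locally so the snippet type-checks on its own.
type ModelConfig = {
  provider: "anthropic" | "openai" | "openai-compatible";
  model: string;
  temperature?: number;
  baseURL?: string; // assumed location of the field named in the table
};

const localModel: ModelConfig = {
  provider: "openai-compatible",
  model: "llama3.2", // placeholder: any model available in your local Ollama
  baseURL: "http://localhost:11434/v1",
  temperature: 0,
};
```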
Using from Vitest
prompt-snapshot is a library first, a CLI second. Drive it directly from Vitest if you prefer:

    import { test, expect } from "vitest";
    import { runSuite, createMockProvider } from "@swarmclawai/prompt-snapshot";
    import suite from "./summarizer.prompt.js";

    test("summarizer suite", async () => {
      // Inject a mock provider so the test never calls a real API.
      const res = await runSuite(suite, {
        provider: () => createMockProvider({ respond: () => "A cat story." }),
      });
      expect(res.failed).toBe(0);
    });

The provider factory makes it trivial to inject a mock for unit tests and a real client for integration tests.
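The factory pattern described above can be sketched without the package at all. Everything in this block is hypothetical: the Provider shape, mockFactory, and pickFactory are stand-ins to show the idea of choosing a factory per test tier, and the package's real provider interface may look different.

```typescript
// Hypothetical provider shape; the real interface is defined by the package.
type Provider = { respond: (prompt: string) => Promise<string> };
type ProviderFactory = () => Provider;

// A factory that always answers with the same canned text (for unit tests).
function mockFactory(canned: string): ProviderFactory {
  return () => ({ respond: async () => canned });
}

// Use the real factory only when an API key is available (integration tests);
// otherwise fall back to the mock.
function pickFactory(
  apiKey: string | undefined,
  real: ProviderFactory,
  mock: ProviderFactory,
): ProviderFactory {
  return apiKey ? real : mock;
}
```

Passing the key in as an argument, rather than reading the environment inside pickFactory, keeps the selection logic easy to test on its own.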
Built for coding agents
Every swarmclawai CLI follows the same agent conventions so Claude Code, Cursor, Cline, Aider, Codex, and similar agents can drive them without guessing:
- --json everywhere, one-line envelope on stdout
- Stderr for logs, stdout for data
- Stable exit codes: 0/1/2
- Non-interactive by default; missing inputs error cleanly
- prompt-snapshot help-agents returns the entire command catalog as JSON
See AGENTS.md for the full machine-readable reference.
How it compares
| Feature | prompt-snapshot | Langfuse | DeepEval | Promptfoo | Phoenix |
|---|---|---|---|---|---|
| Zero setup, no dashboard | ✅ | ❌ | ❌ | ✅ | ❌ |
| TypeScript-native API | ✅ | partial | ❌ | partial | ❌ |
| Snapshot testing | ✅ | ❌ | ❌ | ❌ | ❌ |
| Zod-schema assertions | ✅ | ❌ | ❌ | ❌ | ❌ |
| Pluggable providers (incl. local) | ✅ | ✅ | partial | ✅ | ✅ |
| CI-first | ✅ | partial | ✅ | ✅ | partial |
Roadmap
- similarTo(reference, { threshold }): embedding-based similarity
- judge(criteria, { model }): LLM-as-judge (opt-in, cost-aware)
- Parallel case execution with configurable concurrency
- A thin Vitest reporter plugin
- GitHub Action template for running on every PR
Contributing
See CONTRIBUTING.md.
