schemalock
v0.3.0
LLM output contract testing CLI - catch prompt regressions before they reach production
When you update a prompt or switch models, your downstream code can break silently. schemalock gives you a test suite for LLM outputs - define what your pipeline must return, run it against any model, and get a clear pass/fail with cost tracking.
Quick Start
# 1. Install globally
npm install -g schemalock
# 2. Set your API key (stored in ~/.schemalock/.env)
schemalock config set ANTHROPIC_API_KEY sk-ant-...
# 3. Define a contract
schemalock define invoice-extractor \
--prompt prompts/invoice-extractor.txt \
--must-contain "total_amount,currency,date" \
--cases cases/invoice-cases.json
# 4. Run tests
schemalock test invoice-extractor --model claude-sonnet-4-6
# 5. Compare models before switching
schemalock diff claude-sonnet-4-6 gpt-4o --contract invoice-extractor
Commands
schemalock define <name>
Create or update a contract.
schemalock define sentiment-classifier \
--prompt prompts/sentiment.txt \
--format json \
--must-contain "sentiment,confidence,reasoning" \
--cases cases/sentiment-cases.json \
--description "Classifies customer review sentiment"
| Flag | Description |
|------|-------------|
| --prompt <file> | System prompt file |
| --format | json (default), text, or markdown |
| --must-contain <fields> | Comma-separated required JSON fields |
| --must-not-contain <phrases> | Comma-separated banned phrases (for text output) |
| --schema <file> | JSON Schema file for strict validation |
| --cases <file> | JSON file with test cases |
| --description <text> | Human-readable description |
| --overwrite | Replace an existing contract |
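To make the `--must-contain` and `--must-not-contain` checks concrete, here is a minimal sketch of what such checks could look like. It is not schemalock's actual validator: the function names `checkRequiredFields` and `findBannedPhrases` are hypothetical, and the case-insensitive phrase matching is an assumption.

```javascript
// Hypothetical helpers illustrating the two flag checks; not the real validator.

// Returns which required top-level fields are missing from a JSON output.
function checkRequiredFields(rawOutput, requiredFields) {
  let parsed;
  try {
    parsed = JSON.parse(rawOutput);
  } catch (err) {
    return { pass: false, missing: requiredFields, error: "output is not valid JSON" };
  }
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return { pass: false, missing: requiredFields, error: "output is not a JSON object" };
  }
  const missing = requiredFields.filter(
    (field) => !Object.prototype.hasOwnProperty.call(parsed, field)
  );
  return { pass: missing.length === 0, missing, error: null };
}

// Returns the banned phrases that appear in a text output.
// Assumption: matching ignores case.
function findBannedPhrases(text, bannedPhrases) {
  const haystack = text.toLowerCase();
  return bannedPhrases.filter((phrase) => haystack.includes(phrase.toLowerCase()));
}

const fields = "total_amount,currency,date".split(",");
const result = checkRequiredFields('{"total_amount": 100, "currency": "USD"}', fields);
console.log(result.pass, result.missing);
```

A nested-field or type check would need `--schema` with a JSON Schema file; this sketch only looks at top-level keys.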
schemalock test <name>
Run the contract against a model. Exits 0 when the pass rate meets --threshold (default 0.8) and 1 otherwise, so it can gate a CI job.
schemalock test invoice-extractor --model claude-sonnet-4-6
schemalock test invoice-extractor --model gpt-4o --threshold 0.9
schemalock test invoice-extractor --model llama-3.3-70b-versatile # Groq
schemalock test invoice-extractor --model ollama/llama3.2 # local
schemalock test invoice-extractor --output json # machine-readable
| Flag | Default | Description |
|------|---------|-------------|
| --model | claude-sonnet-4-6 | Model to test |
| --threshold | 0.8 | Min pass rate (0.0-1.0) for exit code 0 |
| --max-tokens | 1024 | Max output tokens per call |
| --output | console | console or json |
| --cases <file> | - | Override test cases (JSON file) |
| --prompt <file> | - | Override system prompt file |
| --base-url <url> | - | Custom OpenAI-compatible endpoint (Ollama, Groq, LM Studio) |
| --api-key <key> | - | API key (defaults to env var for the chosen model) |
| --delay <ms> | 0 | Delay between API calls in ms (avoids rate limits) |
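The relationship between `--threshold` and the exit code can be sketched as follows. This is an illustration under assumptions, not the tool's source: `exitCodeFor` and the `{ pass }` case shape are hypothetical, and treating an empty run as a failure is a guess.

```javascript
// Hypothetical sketch: derive a CI exit code from per-case results.
function exitCodeFor(caseResults, threshold = 0.8) {
  if (caseResults.length === 0) {
    return 1; // Assumption: a run with no cases counts as a failure.
  }
  const passedCount = caseResults.filter((c) => c.pass).length;
  const passRate = passedCount / caseResults.length;
  // "Min pass rate" is read here as inclusive: meeting the threshold passes.
  return passRate >= threshold ? 0 : 1;
}

// 4 of 5 cases pass: a pass rate of 0.8.
const results = [true, true, true, true, false].map((pass) => ({ pass }));
console.log(exitCodeFor(results));      // default threshold 0.8
console.log(exitCodeFor(results, 0.9)); // stricter threshold
```

With the default threshold the 80% run exits 0; raising the threshold to 0.9 turns the same run into exit code 1.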
schemalock diff <model1> <model2>
Find regressions before switching models. Runs the full test suite on both models and shows where they disagree.
schemalock diff claude-sonnet-4-6 gpt-4o --contract invoice-extractor
schemalock diff claude-sonnet-4-6 llama-3.3-70b-versatile --contract invoice-extractor
Output:
Comparison
                    claude-sonnet-4-6    gpt-4o
──────────────────────────────────────────────────────
Pass Rate           100%                 80%
Avg Latency         1230ms               890ms
Total Cost          $0.0041              $0.0028
Disagreements (1/5 cases differ):
  european-invoice    claude-sonnet-4-6=PASS    gpt-4o=FAIL
claude-sonnet-4-6 leads by 20 percentage points. Consider regression risk before switching to gpt-4o.
| Flag | Default | Description |
|------|---------|-------------|
| --contract <name> | required | Contract to test against |
| --cases <file> | - | Override test cases |
| --prompt <file> | - | Override system prompt |
| --max-tokens | 1024 | Max output tokens per call |
| --base-url <url> | - | Custom OpenAI-compatible endpoint |
| --api-key <key> | - | API key override |
| --delay <ms> | 0 | Delay between API calls in ms |
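The comparison that `diff` reports can be approximated with a few lines of code. The sketch below assumes a hypothetical run shape (`{ model, cases: [{ id, pass }] }`) and hypothetical helper names; it only mirrors the sample output above and says nothing about how schemalock computes it internally.

```javascript
// Hypothetical sketch: compare two runs of the same cases.
function passRate(run) {
  return run.cases.filter((c) => c.pass).length / run.cases.length;
}

// Lists cases where the two models reached different verdicts.
function findDisagreements(runA, runB) {
  const verdictB = new Map(runB.cases.map((c) => [c.id, c.pass]));
  const label = (pass) => (pass ? "PASS" : "FAIL");
  return runA.cases
    .filter((c) => verdictB.has(c.id) && verdictB.get(c.id) !== c.pass)
    .map((c) => `${c.id} ${runA.model}=${label(c.pass)} ${runB.model}=${label(verdictB.get(c.id))}`);
}

// Pass-rate gap in whole percentage points (positive when runA leads).
function leadInPoints(runA, runB) {
  return Math.round((passRate(runA) - passRate(runB)) * 100);
}

// Illustrative data: case IDs other than european-invoice are made up.
const ids = ["simple-invoice", "european-invoice", "case-3", "case-4", "case-5"];
const claude = { model: "claude-sonnet-4-6", cases: ids.map((id) => ({ id, pass: true })) };
const gpt = { model: "gpt-4o", cases: ids.map((id) => ({ id, pass: id !== "european-invoice" })) };

console.log(findDisagreements(claude, gpt));
console.log(leadInPoints(claude, gpt));
```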
schemalock list
List all contracts with their last run status.
schemalock list
schemalock list --models # show available models + pricing
schemalock report <name>
View test run history for a contract.
schemalock report invoice-extractor # last 5 runs
schemalock report invoice-extractor --last 20 # last 20 runs
schemalock report invoice-extractor --run 7 # full case detail for run #7
schemalock delete <name>
Delete a contract and optionally its test history.
schemalock delete invoice-extractor # prompts for confirmation
schemalock delete invoice-extractor --yes # skip prompt (CI/scripts)
schemalock delete invoice-extractor --keep-history # remove contract, keep run data
| Flag | Description |
|------|-------------|
| --yes | Skip confirmation prompt (safe for CI/scripts) |
| --keep-history | Keep test run history in the database |
schemalock config
Manage API keys and settings.
schemalock config set ANTHROPIC_API_KEY sk-ant-...
schemalock config set OPENAI_API_KEY sk-...
schemalock config set GROQ_API_KEY gsk_...
schemalock config get ANTHROPIC_API_KEY
schemalock config delete ANTHROPIC_API_KEY
schemalock config list-keys # show all stored key names (values masked)
schemalock config update-pricing # write ~/.schemalock/models.json pricing template
schemalock config env # show active paths and env var overrides
Keys are stored in ~/.schemalock/.env - they persist across projects and terminals.
Test Cases Format
[
{
"id": "simple-invoice",
"input": "Invoice from Acme Corp. Date: Jan 15 2024. Total: $100 USD",
"expected": {
"total_amount": 100,
"currency": "USD",
"vendor_name": "Acme Corp"
}
}
]
- id - unique identifier (shown in test output)
- input - the user message sent to the LLM
- expected - optional key/value pairs that must match the parsed output
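Before running a large suite, it can help to sanity-check a cases file against the three fields described above. The following is a rough sketch, not schemalock's own validation: `validateCases` and `findMismatches` are hypothetical names, and comparing `expected` values with strict equality is an assumption about how matching works.

```javascript
// Hypothetical sketch: structural checks for a test cases array.
function validateCases(cases) {
  if (!Array.isArray(cases)) {
    throw new TypeError("cases file must contain a JSON array");
  }
  const seenIds = new Set();
  for (const testCase of cases) {
    if (testCase === null || typeof testCase !== "object") {
      throw new TypeError("each case must be an object");
    }
    if (typeof testCase.id !== "string" || testCase.id === "") {
      throw new TypeError("each case needs a non-empty string id");
    }
    if (seenIds.has(testCase.id)) {
      throw new Error(`duplicate case id: ${testCase.id}`);
    }
    seenIds.add(testCase.id);
    if (typeof testCase.input !== "string") {
      throw new TypeError(`case ${testCase.id}: input must be a string`);
    }
    const { expected } = testCase;
    if (expected !== undefined && (expected === null || typeof expected !== "object")) {
      throw new TypeError(`case ${testCase.id}: expected must be an object when present`);
    }
  }
  return cases.length;
}

// Returns the expected keys whose values differ from the parsed output.
// Assumption: primitive values are compared with strict equality.
function findMismatches(expected = {}, parsedOutput = {}) {
  return Object.keys(expected).filter((key) => parsedOutput[key] !== expected[key]);
}

const cases = [
  {
    id: "simple-invoice",
    input: "Invoice from Acme Corp. Date: Jan 15 2024. Total: $100 USD",
    expected: { total_amount: 100, currency: "USD", vendor_name: "Acme Corp" },
  },
];
console.log(validateCases(cases));
console.log(findMismatches(cases[0].expected, { total_amount: 100, currency: "EUR", vendor_name: "Acme Corp" }));
```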
Supported Models
Anthropic
| Model | Input $/1M | Output $/1M |
|-------|-----------|------------|
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-opus-4-6 | $5.00 | $25.00 |
| claude-haiku-4-5 | $1.00 | $5.00 |
OpenAI
GPT-5 (latest)
| Model | Input $/1M | Output $/1M |
|-------|-----------|------------|
| gpt-5 | $1.25 | $10.00 |
| gpt-5-mini | $0.25 | $2.00 |
| gpt-5-nano | $0.05 | $0.40 |
GPT-4.1
| Model | Input $/1M | Output $/1M |
|-------|-----------|------------|
| gpt-4.1 | $2.00 | $8.00 |
| gpt-4.1-mini | $0.40 | $1.60 |
| gpt-4.1-nano | $0.10 | $0.40 |
GPT-4o (previous generation)
| Model | Input $/1M | Output $/1M |
|-------|-----------|------------|
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
o-series reasoning
| Model | Input $/1M | Output $/1M |
|-------|-----------|------------|
| o3 | $2.00 | $8.00 |
| o4-mini | $1.10 | $4.40 |
| o3-mini | $1.10 | $4.40 |
| o1 | $15.00 | $60.00 |
Groq (fast inference)
| Model | Input $/1M | Output $/1M |
|-------|-----------|------------|
| meta-llama/llama-4-scout-17b-16e-instruct | $0.11 | $0.34 |
| llama-3.3-70b-versatile | $0.59 | $0.79 |
| llama-3.1-8b-instant | $0.05 | $0.08 |
| mixtral-8x7b-32768 | $0.24 | $0.24 |
| gemma2-9b-it | $0.20 | $0.20 |
Mistral
| Model | Input $/1M | Output $/1M |
|-------|-----------|------------|
| mistral-large-latest | $0.50 | $1.50 |
| mistral-medium-latest | $0.40 | $2.00 |
| codestral-latest | $0.30 | $0.90 |
| mistral-small-latest | $0.10 | $0.30 |
Google Gemini
| Model | Input $/1M | Output $/1M |
|-------|-----------|------------|
| gemini-2.5-pro | $1.25 | $10.00 |
| gemini-2.5-flash | $0.30 | $2.50 |
| gemini-2.0-flash | $0.10 | $0.40 |
| gemini-2.0-flash-lite | $0.075 | $0.30 |
Requires GOOGLE_API_KEY from Google AI Studio.
Ollama (local, free)
| Model | Notes |
|-------|-------|
| ollama/llama4 | Requires ollama serve running locally |
| ollama/llama3.3 | Requires ollama serve running locally |
| ollama/llama3.2 | Requires ollama serve running locally |
| ollama/mistral | Requires ollama serve running locally |
| ollama/phi4 | Requires ollama serve running locally |
| ollama/qwen2.5 | Requires ollama serve running locally |
Any model served by Ollama works with --model ollama/<model-name>.
Custom / Self-Hosted
Any OpenAI-compatible endpoint (LM Studio, Together AI, Fireworks AI, vLLM, etc.):
schemalock test my-contract \
--model meta-llama/Meta-Llama-3-70B-Instruct \
--base-url https://api.together.xyz/v1 \
--api-key $TOGETHER_API_KEY
Run schemalock list --models to see all built-in models with current pricing.
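Since prices are quoted per million tokens, the cost of a single call follows from simple arithmetic. The sketch below shows that calculation using two rows from the tables above; `PRICING` and `estimateCostUsd` are hypothetical names, and the token counts are made-up inputs rather than measured values.

```javascript
// Prices in USD per 1M tokens, copied from the tables above.
const PRICING = {
  "claude-sonnet-4-6": { input: 3.0, output: 15.0 },
  "gpt-4o": { input: 2.5, output: 10.0 },
};

// Hypothetical helper: cost of one call given its token usage.
function estimateCostUsd(model, inputTokens, outputTokens) {
  const price = PRICING[model];
  if (!price) {
    throw new Error(`no pricing for model: ${model}`);
  }
  return (inputTokens * price.input + outputTokens * price.output) / 1e6;
}

// Example: 1,200 input tokens and 300 output tokens per call.
console.log(estimateCostUsd("claude-sonnet-4-6", 1200, 300).toFixed(4));
console.log(estimateCostUsd("gpt-4o", 1200, 300).toFixed(4));
```

For the example usage this works out to (1200 × 3.00 + 300 × 15.00) / 1,000,000 = $0.0081 on claude-sonnet-4-6 and (1200 × 2.50 + 300 × 10.00) / 1,000,000 = $0.0060 on gpt-4o.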
CI/CD Integration
# .github/workflows/test-prompts.yml
- name: Run schemalock contract tests
  run: |
    npx schemalock test invoice-extractor \
      --model claude-sonnet-4-6 \
      --threshold 0.9 \
      --output json > schemalock-results.json
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
The --output json flag produces machine-readable output:
{
"runId": 12,
"contract": "invoice-extractor",
"model": "claude-sonnet-4-6",
"passRate": 0.95,
"passedCount": 19,
"total": 20,
"passed": true,
"totalCostUsd": 0.0041,
"avgLatencyMs": 1230,
"cases": [...]
}
Environment Variables
| Variable | Purpose |
|----------|---------|
| ANTHROPIC_API_KEY | Anthropic API key |
| OPENAI_API_KEY | OpenAI API key |
| GROQ_API_KEY | Groq API key |
| MISTRAL_API_KEY | Mistral API key |
| TOGETHER_API_KEY | Together AI API key |
| FIREWORKS_API_KEY | Fireworks AI API key |
| GOOGLE_API_KEY | Google Gemini API key (from Google AI Studio) |
| SCHEMALOCK_DB | Override SQLite DB path (useful for per-project isolation in CI) |
Keys can also be stored persistently with schemalock config set <KEY> <value>.
Data Storage
All data is stored locally in ~/.schemalock/:
- contracts/ - YAML contract definitions
- results.db - SQLite database of all test runs and case results
- .env - API keys set via schemalock config set
- models.json - optional pricing overrides (created by schemalock config update-pricing)
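The interaction between the default storage location and the `SCHEMALOCK_DB` override can be pictured roughly like this. It is a sketch under assumptions: `resolveDbPath` is a hypothetical name, and the real lookup in `utils/config.js` may normalize or validate the path differently.

```javascript
const os = require("os");
const path = require("path");

// Hypothetical sketch: choose the SQLite file for test run history.
function resolveDbPath(env = process.env) {
  // A non-empty SCHEMALOCK_DB takes precedence over the default location.
  if (env.SCHEMALOCK_DB) {
    return env.SCHEMALOCK_DB;
  }
  return path.join(os.homedir(), ".schemalock", "results.db");
}

console.log(resolveDbPath({}));
console.log(resolveDbPath({ SCHEMALOCK_DB: "./ci/schemalock.db" }));
```

Pointing `SCHEMALOCK_DB` at a file inside the workspace keeps each CI job's run history separate from the shared `~/.schemalock/results.db`.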
Architecture
src/
  cli.js            # Commander entry point
  commands/
    define.js       # schemalock define
    test.js         # schemalock test
    diff.js         # schemalock diff
    list.js         # schemalock list
    report.js       # schemalock report
    delete.js       # schemalock delete
    config.js       # schemalock config
  core/
    runner.js       # Anthropic + OpenAI API calls, timeout, client cache
    validator.js    # JSON Schema + field validation + expected value checks
    store.js        # SQLite persistence (WAL mode, busy_timeout 30s)
    contracts.js    # YAML contract load/save/delete
  utils/
    config.js       # ~/.schemalock/ directory + DB path management
    models.js       # Model registry, pricing, pricing overrides
    cases.js        # Case ID sanitization, count guards, structure validation
License
MIT
