schemalock

v0.3.0

LLM output contract testing CLI - catch prompt regressions before they reach production

When you update a prompt or switch models, your downstream code can break silently. schemalock gives you a test suite for LLM outputs: define what your pipeline must return, run it against any model, and get a clear pass/fail result with cost tracking.


Quick Start

```bash
# 1. Install globally
npm install -g schemalock

# 2. Set your API key (stored in ~/.schemalock/.env)
schemalock config set ANTHROPIC_API_KEY sk-ant-...

# 3. Define a contract
schemalock define invoice-extractor \
  --prompt prompts/invoice-extractor.txt \
  --must-contain "total_amount,currency,date" \
  --cases cases/invoice-cases.json

# 4. Run tests
schemalock test invoice-extractor --model claude-sonnet-4-6

# 5. Compare models before switching
schemalock diff claude-sonnet-4-6 gpt-4o --contract invoice-extractor
```

Commands

schemalock define <name>

Create or update a contract.

```bash
schemalock define sentiment-classifier \
  --prompt prompts/sentiment.txt \
  --format json \
  --must-contain "sentiment,confidence,reasoning" \
  --cases cases/sentiment-cases.json \
  --description "Classifies customer review sentiment"
```

| Flag | Description |
|------|-------------|
| `--prompt <file>` | System prompt file |
| `--format` | json (default), text, or markdown |
| `--must-contain <fields>` | Comma-separated required JSON fields |
| `--must-not-contain <phrases>` | Comma-separated banned phrases (for text output) |
| `--schema <file>` | JSON Schema file for strict validation |
| `--cases <file>` | JSON file with test cases |
| `--description <text>` | Human-readable description |
| `--overwrite` | Replace an existing contract |
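To make the difference between --must-contain and --must-not-contain concrete, here is a minimal JavaScript sketch of that kind of check. It assumes required fields are top-level keys of the parsed JSON output and that banned phrases match case-insensitively as substrings; neither detail is confirmed by this README, and checkOutput is a hypothetical helper, not the code in validator.js.

```javascript
// Hypothetical sketch of --must-contain / --must-not-contain checks.
// Assumptions (not confirmed by schemalock): required fields are top-level
// JSON keys; banned phrases match case-insensitively as substrings.

function parseList(flagValue) {
  // "total_amount,currency,date" -> ["total_amount", "currency", "date"]
  return flagValue
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

function checkOutput(rawOutput, { format = "json", mustContain = "", mustNotContain = "" } = {}) {
  const errors = [];

  if (format === "json") {
    let parsed;
    try {
      parsed = JSON.parse(rawOutput);
    } catch (err) {
      return { pass: false, errors: [`output is not valid JSON: ${err.message}`] };
    }
    const isObject = parsed !== null && typeof parsed === "object" && !Array.isArray(parsed);
    for (const field of parseList(mustContain)) {
      if (!isObject || !Object.prototype.hasOwnProperty.call(parsed, field)) {
        errors.push(`missing required field: ${field}`);
      }
    }
  }

  // Banned phrases are checked against the raw text in every format.
  const lower = rawOutput.toLowerCase();
  for (const phrase of parseList(mustNotContain)) {
    if (lower.includes(phrase.toLowerCase())) {
      errors.push(`contains banned phrase: ${phrase}`);
    }
  }

  return { pass: errors.length === 0, errors };
}

// Example: the model omitted "date", so the case fails with one error.
const result = checkOutput('{"total_amount": 100, "currency": "USD"}', {
  mustContain: "total_amount,currency,date",
});
console.log(result.pass, result.errors);
```

A JSON output that cannot be parsed fails immediately in this sketch, since no field check is meaningful without a parsed object.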


schemalock test <name>

Run the contract against a model. Exits 0 on pass, 1 on fail (CI-friendly).

```bash
schemalock test invoice-extractor --model claude-sonnet-4-6
schemalock test invoice-extractor --model gpt-4o --threshold 0.9
schemalock test invoice-extractor --model llama-3.3-70b-versatile  # Groq
schemalock test invoice-extractor --model ollama/llama3.2           # local
schemalock test invoice-extractor --output json                      # machine-readable
```

| Flag | Default | Description |
|------|---------|-------------|
| `--model` | claude-sonnet-4-6 | Model to test |
| `--threshold` | 0.8 | Minimum pass rate (0.0-1.0) for exit code 0 |
| `--max-tokens` | 1024 | Max output tokens per call |
| `--output` | console | console or json |
| `--cases <file>` | - | Override test cases (JSON file) |
| `--prompt <file>` | - | Override system prompt file |
| `--base-url <url>` | - | Custom OpenAI-compatible endpoint (Ollama, Groq, LM Studio) |
| `--api-key <key>` | - | API key (defaults to the env var for the chosen model) |
| `--delay <ms>` | 0 | Delay between API calls in ms (avoids rate limits) |
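The interaction between --delay, --threshold, and the exit code can be sketched as a sequential loop. This is a minimal illustration, assuming cases run one at a time and that a run passes when the pass rate is greater than or equal to the threshold (whether schemalock's comparison is inclusive is not stated). runCases and the callModel callback are hypothetical stand-ins for the real runner.

```javascript
// Hypothetical sketch of a sequential test run with --delay and --threshold.
// `callModel` stands in for the real API call plus validation and resolves
// to true (case passed) or false. The inclusive >= comparison is assumed.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runCases(cases, callModel, { threshold = 0.8, delayMs = 0 } = {}) {
  const results = [];
  for (let i = 0; i < cases.length; i++) {
    // Wait between calls (not before the first one) to stay under rate limits.
    if (i > 0 && delayMs > 0) await sleep(delayMs);
    const passed = await callModel(cases[i]);
    results.push({ id: cases[i].id, passed });
  }
  const passedCount = results.filter((r) => r.passed).length;
  const passRate = cases.length === 0 ? 0 : passedCount / cases.length;
  return { results, passedCount, total: cases.length, passRate, passed: passRate >= threshold };
}

// Usage with a fake model that fails one case out of five.
const fakeCases = ["a", "b", "c", "d", "e"].map((id) => ({ id }));
const fakeModel = async (c) => c.id !== "e";

runCases(fakeCases, fakeModel, { threshold: 0.9, delayMs: 10 }).then((summary) => {
  console.log(`pass rate ${summary.passRate}, passed=${summary.passed}`);
  // prints: pass rate 0.8, passed=false
  // A CLI would then map the result to its exit code, e.g.:
  // process.exitCode = summary.passed ? 0 : 1;
});
```

With the default threshold of 0.8 the same 4/5 run would pass, which is why raising --threshold in CI is a simple way to tighten the gate.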


schemalock diff <model1> <model2>

Find regressions before switching models. Runs the full test suite on both models and shows where they disagree.

```bash
schemalock diff claude-sonnet-4-6 gpt-4o --contract invoice-extractor
schemalock diff claude-sonnet-4-6 llama-3.3-70b-versatile --contract invoice-extractor
```

Output:

```
  Comparison
                 claude-sonnet-4-6        gpt-4o
  ──────────────────────────────────────────────────────
  Pass Rate      100%                     80%
  Avg Latency    1230ms                   890ms
  Total Cost     $0.0041                  $0.0028

  Disagreements (1/5 cases differ):
    european-invoice          claude-sonnet-4-6=PASS  gpt-4o=FAIL

  claude-sonnet-4-6 leads by 20 percentage points. Consider regression risk before switching to gpt-4o.
```

| Flag | Default | Description |
|------|---------|-------------|
| `--contract <name>` | required | Contract to test against |
| `--cases <file>` | - | Override test cases |
| `--prompt <file>` | - | Override system prompt |
| `--max-tokens` | 1024 | Max output tokens per call |
| `--base-url <url>` | - | Custom OpenAI-compatible endpoint |
| `--api-key <key>` | - | API key override |
| `--delay <ms>` | 0 | Delay between API calls in ms |
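The numbers in the diff output above follow from simple per-case bookkeeping. The sketch below reproduces them under the assumption that each model's run yields a list of { id, passed } results; diffRuns is hypothetical, and the case IDs other than european-invoice are illustrative placeholders.

```javascript
// Hypothetical sketch of the comparison `schemalock diff` reports:
// per-model pass rates, cases where the models disagree, and the gap in
// percentage points. The { id, passed } input shape is an assumption.

function diffRuns(resultsA, resultsB) {
  const bById = new Map(resultsB.map((r) => [r.id, r.passed]));
  const disagreements = resultsA
    .filter((r) => bById.has(r.id) && bById.get(r.id) !== r.passed)
    .map((r) => ({ id: r.id, passedA: r.passed, passedB: bById.get(r.id) }));

  const rate = (rs) => (rs.length === 0 ? 0 : rs.filter((r) => r.passed).length / rs.length);
  const rateA = rate(resultsA);
  const rateB = rate(resultsB);
  // Round so floating-point noise (1 - 0.8 = 0.19999999999999996) shows as 20.
  const gapPoints = Math.round((rateA - rateB) * 100);
  return { rateA, rateB, gapPoints, disagreements };
}

// Usage mirroring the example output: gpt-4o fails only european-invoice.
const ids = ["case-1", "european-invoice", "case-3", "case-4", "case-5"];
const claudeRun = ids.map((id) => ({ id, passed: true }));
const gptRun = ids.map((id) => ({ id, passed: id !== "european-invoice" }));

const d = diffRuns(claudeRun, gptRun);
console.log(`${d.disagreements.length}/${ids.length} cases differ, gap ${d.gapPoints} points`);
// prints: 1/5 cases differ, gap 20 points
```

Matching by case ID rather than by position keeps the comparison correct even if the two runs return results in different orders.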


schemalock list

List all your contracts with last run status.

```bash
schemalock list
schemalock list --models    # show available models + pricing
```

schemalock report <name>

View test run history for a contract.

```bash
schemalock report invoice-extractor             # last 5 runs
schemalock report invoice-extractor --last 20   # last 20 runs
schemalock report invoice-extractor --run 7     # full case detail for run #7
```

schemalock delete <name>

Delete a contract and, unless --keep-history is passed, its test history.

```bash
schemalock delete invoice-extractor                 # prompts for confirmation
schemalock delete invoice-extractor --yes           # skip prompt (CI/scripts)
schemalock delete invoice-extractor --keep-history  # remove contract, keep run data
```

| Flag | Description |
|------|-------------|
| `--yes` | Skip confirmation prompt (safe for CI/scripts) |
| `--keep-history` | Keep test run history in the database |


schemalock config

Manage API keys and settings.

```bash
schemalock config set ANTHROPIC_API_KEY sk-ant-...
schemalock config set OPENAI_API_KEY sk-...
schemalock config set GROQ_API_KEY gsk_...
schemalock config get ANTHROPIC_API_KEY
schemalock config delete ANTHROPIC_API_KEY
schemalock config list-keys      # show all stored key names (values masked)
schemalock config update-pricing # write ~/.schemalock/models.json pricing template
schemalock config env            # show active paths and env var overrides
```

Keys are stored in ~/.schemalock/.env - they persist across projects and terminals.
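For readers curious what a masked key listing involves, here is a small sketch that reads a dotenv-style file and masks values. It assumes ~/.schemalock/.env uses plain KEY=value lines and that masking keeps the last four characters; both the file format details and the masking style are assumptions, and parseEnv, mask, and listKeys are hypothetical helpers rather than schemalock's config.js.

```javascript
// Hypothetical sketch: parse a dotenv-style file and mask values, in the
// spirit of `schemalock config list-keys`. KEY=value lines and the
// "****" + last 4 characters masking style are assumptions.

const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

function parseEnv(text) {
  const out = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue; // skip blanks and comments
    const eq = trimmed.indexOf("=");
    if (eq <= 0) continue; // skip malformed lines
    const key = trimmed.slice(0, eq).trim();
    let value = trimmed.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes.
    if (value.length >= 2 && (value[0] === '"' || value[0] === "'") && value.endsWith(value[0])) {
      value = value.slice(1, -1);
    }
    out[key] = value;
  }
  return out;
}

function mask(value) {
  // Keep the last 4 characters so different keys can be told apart.
  return value.length <= 4 ? "****" : "****" + value.slice(-4);
}

function listKeys(envPath = path.join(os.homedir(), ".schemalock", ".env")) {
  if (!fs.existsSync(envPath)) return [];
  const vars = parseEnv(fs.readFileSync(envPath, "utf8"));
  return Object.entries(vars).map(([k, v]) => `${k}=${mask(v)}`);
}

// Usage with an in-memory sample instead of the real file.
const sampleEnv = "ANTHROPIC_API_KEY=sk-ant-abc123\nGROQ_API_KEY=\"gsk_xyz789\"\n";
for (const [k, v] of Object.entries(parseEnv(sampleEnv))) console.log(`${k}=${mask(v)}`);
// ANTHROPIC_API_KEY=****c123
// GROQ_API_KEY=****z789
```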


Test Cases Format

```json
[
  {
    "id": "simple-invoice",
    "input": "Invoice from Acme Corp. Date: Jan 15 2024. Total: $100 USD",
    "expected": {
      "total_amount": 100,
      "currency": "USD",
      "vendor_name": "Acme Corp"
    }
  }
]
```

  • id - unique identifier (shown in test output)
  • input - the user message sent to the LLM
  • expected - optional key/value pairs that must match the parsed output
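The sketch below shows one way a cases file in this format could be validated and how expected values might be checked against parsed output. It assumes expected values are compared with strict deep equality against top-level keys; the ID sanitization rule and the MAX_CASES limit are illustrative stand-ins for the "case ID sanitization, count guards, structure validation" that cases.js performs, not its actual rules.

```javascript
// Hypothetical sketch: validate a cases array and compare `expected`
// values against parsed model output. Strict deep equality, the ID
// character set, and MAX_CASES are illustrative assumptions.

const { isDeepStrictEqual } = require("node:util");

const MAX_CASES = 500; // illustrative count guard

function validateCases(cases) {
  if (!Array.isArray(cases)) throw new Error("cases file must contain a JSON array");
  if (cases.length > MAX_CASES) throw new Error(`too many cases (${cases.length} > ${MAX_CASES})`);
  const seen = new Set();
  return cases.map((c, i) => {
    if (c === null || typeof c !== "object") throw new Error(`case ${i} is not an object`);
    if (typeof c.input !== "string") throw new Error(`case ${i} is missing a string "input"`);
    // Replace characters outside a safe set so IDs display and store cleanly.
    const id = String(c.id ?? `case-${i + 1}`).replace(/[^A-Za-z0-9_.-]/g, "-");
    if (seen.has(id)) throw new Error(`duplicate case id: ${id}`);
    seen.add(id);
    return { id, input: c.input, expected: c.expected ?? null };
  });
}

function matchExpected(parsedOutput, expected) {
  if (!expected) return []; // `expected` is optional
  const mismatches = [];
  for (const [key, want] of Object.entries(expected)) {
    const got = parsedOutput?.[key];
    if (!isDeepStrictEqual(got, want)) mismatches.push({ key, want, got });
  }
  return mismatches;
}

// Usage with the example case above and an output that differs in casing.
const [invoiceCase] = validateCases([
  {
    id: "simple-invoice",
    input: "Invoice from Acme Corp. Date: Jan 15 2024. Total: $100 USD",
    expected: { total_amount: 100, currency: "USD", vendor_name: "Acme Corp" },
  },
]);
const modelOutput = { total_amount: 100, currency: "USD", vendor_name: "ACME Corp" };
console.log(matchExpected(modelOutput, invoiceCase.expected)); // one mismatch: vendor_name
```

Under strict equality, "100" and 100 would also count as a mismatch, so a real validator may choose to normalize types first.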

Supported Models

Anthropic

| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|-------|---------------------|----------------------|
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-opus-4-6 | $5.00 | $25.00 |
| claude-haiku-4-5 | $1.00 | $5.00 |
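Per-call cost presumably follows the usual per-token arithmetic: cost = input tokens × input price / 1,000,000 + output tokens × output price / 1,000,000. The sketch applies it to the Anthropic prices listed above. That schemalock computes cost exactly this way is an assumption (for example, this ignores any caching or batch discounts), and callCostUsd is a hypothetical helper.

```javascript
// Per-call cost from per-million-token prices (copied from the table above).
// Assumes plain input/output token pricing with no discounts.

const PRICING = {
  "claude-sonnet-4-6": { input: 3.0, output: 15.0 },
  "claude-opus-4-6": { input: 5.0, output: 25.0 },
  "claude-haiku-4-5": { input: 1.0, output: 5.0 },
};

function callCostUsd(model, inputTokens, outputTokens) {
  const price = PRICING[model];
  if (!price) throw new Error(`no pricing for model: ${model}`);
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

// 1,000 input tokens and 300 output tokens on claude-sonnet-4-6:
// (1000 * 3 + 300 * 15) / 1e6 = (3000 + 4500) / 1e6 = 0.0075
console.log(callCostUsd("claude-sonnet-4-6", 1000, 300)); // 0.0075
```

Since --max-tokens defaults to 1024, the output side of a single claude-sonnet-4-6 call is capped at 1024 × 15 / 1,000,000 = $0.01536 under this formula.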

OpenAI

GPT-5 (latest)

| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|-------|---------------------|----------------------|
| gpt-5 | $1.25 | $10.00 |
| gpt-5-mini | $0.25 | $2.00 |
| gpt-5-nano | $0.05 | $0.40 |

GPT-4.1

| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|-------|---------------------|----------------------|
| gpt-4.1 | $2.00 | $8.00 |
| gpt-4.1-mini | $0.40 | $1.60 |
| gpt-4.1-nano | $0.10 | $0.40 |

GPT-4o (previous generation)

| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|-------|---------------------|----------------------|
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |

o-series reasoning

| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|-------|---------------------|----------------------|
| o3 | $2.00 | $8.00 |
| o4-mini | $1.10 | $4.40 |
| o3-mini | $1.10 | $4.40 |
| o1 | $15.00 | $60.00 |

Groq (fast inference)

| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|-------|---------------------|----------------------|
| meta-llama/llama-4-scout-17b-16e-instruct | $0.11 | $0.34 |
| llama-3.3-70b-versatile | $0.59 | $0.79 |
| llama-3.1-8b-instant | $0.05 | $0.08 |
| mixtral-8x7b-32768 | $0.24 | $0.24 |
| gemma2-9b-it | $0.20 | $0.20 |

Mistral

| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|-------|---------------------|----------------------|
| mistral-large-latest | $0.50 | $1.50 |
| mistral-medium-latest | $0.40 | $2.00 |
| codestral-latest | $0.30 | $0.90 |
| mistral-small-latest | $0.10 | $0.30 |

Google Gemini

| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|-------|---------------------|----------------------|
| gemini-2.5-pro | $1.25 | $10.00 |
| gemini-2.5-flash | $0.30 | $2.50 |
| gemini-2.0-flash | $0.10 | $0.40 |
| gemini-2.0-flash-lite | $0.075 | $0.30 |

Requires GOOGLE_API_KEY from Google AI Studio.

Ollama (local, free)

| Model | Notes |
|-------|-------|
| ollama/llama4 | Requires ollama serve running locally |
| ollama/llama3.3 | Requires ollama serve running locally |
| ollama/llama3.2 | Requires ollama serve running locally |
| ollama/mistral | Requires ollama serve running locally |
| ollama/phi4 | Requires ollama serve running locally |
| ollama/qwen2.5 | Requires ollama serve running locally |

Any model served by Ollama works with --model ollama/<model-name>.

Custom / Self-Hosted

Any OpenAI-compatible endpoint (LM Studio, Together AI, Fireworks AI, vLLM, etc.):

```bash
schemalock test my-contract \
  --model meta-llama/Meta-Llama-3-70B-Instruct \
  --base-url https://api.together.xyz/v1 \
  --api-key $TOGETHER_API_KEY
```

Run schemalock list --models to see all built-in models with current pricing.
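How a model name might be routed to a provider, endpoint, and API key can be sketched as a few prefix rules. Everything here is an assumption about how a tool like schemalock could do it: the prefix patterns, the env var chosen per provider, and the precedence (an explicit --api-key wins over the environment). The Ollama URL is Ollama's standard OpenAI-compatible endpoint on its default port. resolveProvider is hypothetical.

```javascript
// Hypothetical model-to-provider routing. Prefix rules, env var names per
// provider, and flag-over-env precedence are assumptions for illustration.

const OLLAMA_BASE_URL = "http://localhost:11434/v1"; // Ollama's default OpenAI-compatible endpoint

function resolveProvider(model, { baseUrl, apiKey } = {}, env = process.env) {
  if (baseUrl) {
    // Custom OpenAI-compatible endpoint: the caller supplies URL and key.
    return { provider: "openai-compatible", model, baseUrl, apiKey: apiKey ?? null };
  }
  if (model.startsWith("ollama/")) {
    // Ollama ignores the key, but OpenAI-style clients usually require a non-empty one.
    return { provider: "ollama", model: model.slice("ollama/".length), baseUrl: OLLAMA_BASE_URL, apiKey: "ollama" };
  }
  const rules = [
    [/^claude-/, "anthropic", "ANTHROPIC_API_KEY"],
    [/^(gpt-|o\d)/, "openai", "OPENAI_API_KEY"],
    [/^gemini-/, "google", "GOOGLE_API_KEY"],
    [/^(mistral-|codestral-)/, "mistral", "MISTRAL_API_KEY"],
    [/^(llama-|meta-llama\/|mixtral-|gemma2-)/, "groq", "GROQ_API_KEY"],
  ];
  for (const [pattern, provider, envVar] of rules) {
    if (pattern.test(model)) {
      // An explicit --api-key takes precedence over the environment.
      return { provider, model, baseUrl: null, apiKey: apiKey ?? env[envVar] ?? null, envVar };
    }
  }
  throw new Error(`unknown model: ${model} (pass --base-url for custom endpoints)`);
}

console.log(resolveProvider("ollama/llama3.2").baseUrl); // http://localhost:11434/v1
```

Checking --base-url first means a name like meta-llama/Meta-Llama-3-70B-Instruct goes to the custom endpoint rather than being mistaken for a built-in Groq model.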


CI/CD Integration

```yaml
# .github/workflows/test-prompts.yml
- name: Run schemalock contract tests
  run: |
    npx schemalock test invoice-extractor \
      --model claude-sonnet-4-6 \
      --threshold 0.9 \
      --output json > schemalock-results.json
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

The --output json flag produces machine-readable output:

```json
{
  "runId": 12,
  "contract": "invoice-extractor",
  "model": "claude-sonnet-4-6",
  "passRate": 0.95,
  "passedCount": 19,
  "total": 20,
  "passed": true,
  "totalCostUsd": 0.0041,
  "avgLatencyMs": 1230,
  "cases": [...]
}
```

Environment Variables

| Variable | Purpose |
|----------|---------|
| ANTHROPIC_API_KEY | Anthropic API key |
| OPENAI_API_KEY | OpenAI API key |
| GROQ_API_KEY | Groq API key |
| MISTRAL_API_KEY | Mistral API key |
| TOGETHER_API_KEY | Together AI API key |
| FIREWORKS_API_KEY | Fireworks AI API key |
| SCHEMALOCK_DB | Override SQLite DB path (useful for per-project isolation in CI) |

Keys can also be stored persistently with schemalock config set <KEY> <value>.


Data Storage

All data is stored locally in ~/.schemalock/:

  • contracts/ - YAML contract definitions
  • results.db - SQLite database of all test runs and case results
  • .env - API keys set via schemalock config set
  • models.json - optional pricing overrides (created by schemalock config update-pricing)

Architecture

```
src/
  cli.js                  # Commander entry point
  commands/
    define.js             # schemalock define
    test.js               # schemalock test
    diff.js               # schemalock diff
    list.js               # schemalock list
    report.js             # schemalock report
    delete.js             # schemalock delete
    config.js             # schemalock config
  core/
    runner.js             # Anthropic + OpenAI API calls, timeout, client cache
    validator.js          # JSON Schema + field validation + expected value checks
    store.js              # SQLite persistence (WAL mode, busy_timeout 30s)
    contracts.js          # YAML contract load/save/delete
  utils/
    config.js             # ~/.schemalock/ directory + DB path management
    models.js             # Model registry, pricing, pricing overrides
    cases.js              # Case ID sanitization, count guards, structure validation
```
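Two of the runner.js responsibilities listed above, a per-call timeout and a client cache, can be sketched in a few lines. This is a generic illustration, not runner.js itself: createClient is a hypothetical factory standing in for constructing an SDK client, the cache key format is an assumption, and the 60-second default timeout is illustrative.

```javascript
// Hypothetical sketch of a client cache and a per-call timeout.
// `createClient` stands in for building an SDK client; the 60 s default
// and the cache key format are illustrative assumptions.

const clientCache = new Map();

function getClient(provider, baseUrl, apiKey, createClient) {
  // Reuse one client per provider/endpoint/key instead of rebuilding it per case.
  const cacheKey = `${provider}|${baseUrl ?? ""}|${apiKey ?? ""}`;
  if (!clientCache.has(cacheKey)) {
    clientCache.set(cacheKey, createClient({ provider, baseUrl, apiKey }));
  }
  return clientCache.get(cacheKey);
}

async function withTimeout(fn, ms = 60_000) {
  // The AbortSignal lets fetch-based SDK calls cancel the in-flight request.
  const controller = new AbortController();
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new Error(`API call timed out after ${ms}ms`));
    }, ms);
  });
  try {
    return await Promise.race([fn(controller.signal), timeout]);
  } finally {
    clearTimeout(timer); // don't keep the process alive after a fast call
  }
}

// Usage: a fake 50 ms call with a 10 ms timeout is rejected.
const slowCall = () => new Promise((resolve) => setTimeout(() => resolve("done"), 50));
withTimeout(slowCall, 10).catch((err) => console.log(err.message)); // API call timed out after 10ms
```

Caching clients matters mostly for runs with many cases, where re-creating a client per call would repeat setup work for every request.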

License

MIT