stepproof


Regression testing for multi-step AI workflows. Not observability.


You upgraded to gpt-4o-mini. Your LangSmith traces look fine. Three days later a customer reports your extraction step stopped working. You find out from a Slack message, not a test.

stepproof is what you run before you deploy.

npm install -g stepproof

30-second quickstart

Write a scenario:

# classify.yaml
name: "Intent classification"
iterations: 10

steps:
  - id: classify
    provider: anthropic
    model: claude-sonnet-4-6
    prompt: "Classify the intent of this message: {{input}}"
    variables:
      input: "I need to cancel my subscription"
    min_pass_rate: 0.90
    assertions:
      - type: contains
        value: "cancel"
      - type: json_schema
        schema: ./schemas/intent.json

  - id: respond
    provider: openai
    model: gpt-4o
    prompt: "Given intent '{{classify.output}}', write a helpful reply to: {{input}}"
    min_pass_rate: 0.80
    assertions:
      - type: llm_judge
        prompt: "Is this response helpful and on-topic? Answer yes/no."
        pass_on: "yes"

Run it:

stepproof run classify.yaml

Output:

stepproof v0.2.0 — running "Intent classification" (10 iterations)

  step: classify
    ✓ 9/10 passed (90.0%) — threshold: 90% ✓

  step: respond
    ✓ 8/10 passed (80.0%) — threshold: 80% ✓

All steps passed. Exit 0.

Now break it — swap in a cheaper model and the observed pass rate drops below the threshold. It fails:

  step: classify
    ✗ 5/10 passed (50.0%) — threshold: 90% ✗

1 step failed. Exit 1.

Commands

stepproof run <scenario>

Run a scenario file or directory of scenarios.

stepproof run classify.yaml
stepproof run scenarios/
stepproof run scenarios/ --format sarif --output results.sarif
stepproof run scenarios/ --format junit --output results.xml

Flags:

  • --format <format> — output format: terminal (default), sarif, junit
  • --output <file> — write output to file instead of stdout

stepproof init [dir]

Scaffold a starter scenario in the target directory. Defaults to ./scenarios/.

stepproof init
# Creates: ./scenarios/first-test.yaml

stepproof init my-tests
# Creates: ./my-tests/first-test.yaml

The generated first-test.yaml is a working example you can edit and run immediately.


Environment Variables

| Variable | Required | Purpose |
|----------|----------|---------|
| ANTHROPIC_API_KEY | For Anthropic steps | Authenticates calls to Claude models |
| OPENAI_API_KEY | For OpenAI steps | Authenticates calls to GPT models |

Only the keys for the providers you use in your scenarios are required.


CI integration

# .github/workflows/ai-regression.yml
name: AI regression tests
on: [push, pull_request]

jobs:
  stepproof:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm install -g stepproof
      - run: stepproof run scenarios/classify.yaml
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

Exit code 1 on regression. PR blocked. Done.


Assertions

| Type | What it checks |
|------|----------------|
| contains | Output includes this string |
| not_contains | Output does not include this string |
| regex | Output matches this pattern |
| json_schema | Output is valid JSON matching this schema |
| llm_judge | A second LLM call evaluates the output (boolean verdict) |
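As a sketch of how these combine: assertions attach as a list under a step, and an iteration passes only when every assertion passes. The step id, prompt, field values, and schema path below are illustrative (they follow the field names from the quickstart scenario, not a shipped example):

```yaml
# Hypothetical step mixing several assertion types
- id: extract
  provider: openai
  model: gpt-4o
  prompt: "Extract the order ID from: {{input}}"
  min_pass_rate: 0.90
  assertions:
    - type: regex
      value: "ORD-[0-9]{6}"       # output must match this pattern
    - type: not_contains
      value: "I'm sorry"          # refusals count as failures
    - type: json_schema
      schema: ./schemas/order.json
```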


Structured reports (v0.2.0)

stepproof outputs machine-readable SARIF 2.1.0 and JUnit XML for CI pipeline integration.

SARIF — GitHub Advanced Security / GitLab / Azure DevOps

# Write SARIF to stdout
stepproof run classify.yaml --format sarif

# Write SARIF to file
stepproof run classify.yaml --format sarif --output results.sarif

Integrate with GitHub Advanced Security:

# .github/workflows/ai-regression.yml
- name: Run stepproof
  run: stepproof run scenarios/ --format sarif --output results.sarif

- name: Upload to GitHub Security tab
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif
  if: always()

JUnit XML — Jenkins / CircleCI / TeamCity

stepproof run classify.yaml --format junit
stepproof run classify.yaml --format junit --output results.xml
# .github/workflows/ai-regression.yml
- name: Run stepproof
  run: stepproof run scenarios/ --format junit --output test-results.xml

- name: Publish test results
  uses: actions/upload-artifact@v4
  with:
    name: test-results
    path: test-results.xml
  if: always()

Default output (no --format flag) is unchanged — human-readable terminal output.

Migration note (v0.2.x → v0.3.0): --report still works but is deprecated and prints a warning. Switch to --format at your convenience; --report will be removed in v1.0.0.


How this is different from LangSmith / Braintrust / Langfuse

| | stepproof | LangSmith / Braintrust |
|--|-----------|------------------------|
| When it runs | Before deploy (CI) | After deploy (production) |
| What it answers | "Is my pipeline still correct?" | "What did my pipeline do?" |
| Output | Pass/fail with exit code | Traces and dashboards |
| Use case | Regression testing | Observability |

They tell you what happened. We tell you whether to deploy.

These are different jobs. Use both.


Troubleshooting

Error: "scenarios/" is a directory

stepproof run ./scenarios/first-test.yaml   # ← run a specific file
stepproof run ./scenarios/                  # ← or run the whole dir (note trailing slash)

Error parsing scenario: ...

Your YAML has a syntax error. Common culprits: inconsistent indentation, unquoted {{vars}}, or a missing steps: key. Run npx js-yaml your.yaml to surface parse errors before pointing stepproof at the file.

API errors (401 Unauthorized, 403 Forbidden)

Set the API key for whichever provider your scenario uses:

export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...

Only the keys for providers you use in your scenarios are required.

Steps failing when they should pass

Check min_pass_rate in your scenario. A step fails when its observed pass rate drops below the threshold: with min_pass_rate: 0.90 over 10 iterations, one failing iteration is tolerated, but two fails the step. Lower the threshold, or improve your prompt.
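The gating arithmetic is simple enough to sketch. This mirrors the behavior described above as an illustration; it is not stepproof's actual source:

```python
def step_passes(passed: int, iterations: int, min_pass_rate: float) -> bool:
    """A step succeeds when its observed pass rate meets the threshold."""
    return passed / iterations >= min_pass_rate

# With min_pass_rate: 0.90 over 10 iterations,
# one failing iteration is tolerated:
print(step_passes(9, 10, 0.90))   # True  (90.0% >= 90%)
print(step_passes(8, 10, 0.90))   # False (80.0% < 90%)
```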

--format must be "sarif" or "junit"

Only sarif and junit are valid format values. For terminal output, omit the --format flag entirely.

Pro features blocked (SARIF / JUnit output)

SARIF and JUnit formats require a Team license. Set your key:

export PREFLIGHT_LICENSE_KEY=preflight_...
stepproof run scenarios/ --format sarif --output results.sarif

Get a license at the Preflight pricing page.


Scenarios

See /examples for copy-paste-ready scenarios.


Roadmap

  • v0.2.0 (current): YAML scenarios, N iterations, 5 assertion types, exit code 1 on failure, OpenAI + Anthropic, SARIF 2.1.0 + JUnit XML reporters, stepproof init scaffolding
  • v0.3.0 (next): Baseline comparison (fail on regression from last run), GitHub Actions native action, provider comparison mode — run the same scenario against two models and diff the results
  • Cloud dashboard (month 3–6): Persistent history, trend charts, team workspaces — never in the CLI

Contributing

Issues and PRs welcome. See CONTRIBUTING.md for dev setup and guidelines. The tool is and will remain free. Cloud features are the business model, not the CLI.


Part of the Preflight suite

stepproof is one tool in the Preflight AI Agent DevOps suite — local-first CLIs covering the full lifecycle from pre-deploy validation to production observability:

| Tool | Purpose | Install |
|------|---------|---------|
| stepproof | Behavioral regression testing | npm install -g stepproof |
| agent-comply | EU AI Act compliance scanning | npm install -g agent-comply |
| agent-gate | Unified pre-deploy CI gate | npm install -g agent-gate |
| agent-shift | Config versioning + environment promotion | npm install -g agent-shift |
| agent-trace | Local observability — OTel traces in SQLite | npm install -g agent-trace |

Install the full suite:

npm install -g agent-gate stepproof agent-comply agent-shift agent-trace

stepproof — because "I checked manually before the deploy" is not a test.