
agent-regression-lab

v0.7.1


Regression testing for AI agents — catch prompt and behavior changes before they ship.


Agent Regression Lab

Agent Regression Lab is the local-first regression spine for agent engineering teams.

It gives teams a repeatable way to define expected agent behavior in YAML, replay it against deterministic tool surfaces or live HTTP agents, store traces and scores locally, and compare candidate behavior against known baselines over time.
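Here is a rough TypeScript illustration of comparing a candidate run against a known baseline. The RunSummary shape, the 0 to 1 score scale, and the 0.05 tolerance are assumptions made for this sketch. They are not the schema stored in artifacts/agentlab.db or the tool's actual comparison logic.

```typescript
// Hypothetical run summary; the real stored run records may differ.
interface RunSummary {
  scenarioId: string;
  passed: boolean;
  score: number; // assumed 0..1 scale
}

interface RunDiff {
  scenarioId: string;
  scoreDelta: number;
  regressed: boolean;
}

// Flags a regression when the candidate flips from pass to fail, or when its
// score drops by more than `tolerance` relative to the baseline.
function diffRuns(baseline: RunSummary, candidate: RunSummary, tolerance = 0.05): RunDiff {
  if (baseline.scenarioId !== candidate.scenarioId) {
    throw new Error(`scenario mismatch: ${baseline.scenarioId} vs ${candidate.scenarioId}`);
  }
  const scoreDelta = candidate.score - baseline.score;
  const flippedToFail = baseline.passed && !candidate.passed;
  return {
    scenarioId: candidate.scenarioId,
    scoreDelta,
    regressed: flippedToFail || scoreDelta < -tolerance,
  };
}

const baselineRun: RunSummary = { scenarioId: "support.refund-correct-order", passed: true, score: 1.0 };
const candidateRun: RunSummary = { scenarioId: "support.refund-correct-order", passed: false, score: 0.5 };
console.log(diffRuns(baselineRun, candidateRun));
```

Treating a pass-to-fail flip as a regression regardless of score keeps a hard failure from hiding behind a small numeric change.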

This is a local-first alpha for early technical teams. It is strongest when used across one workflow spine:

  • debug a single scenario while building
  • validate a branch with a suite before merge
  • run curated golden suites before release
  • keep incident-derived scenarios as engineering memory

Who It Is For

  • teams shipping prompt, model, tool, workflow, and memory changes
  • engineers who need repeatable before/after evidence instead of vibes
  • teams validating live HTTP agents as well as deterministic local scenarios
  • researchers and technical operators who want local control before adopting heavier hosted infrastructure

Why Teams Use It

  • catch regressions before merge or release
  • debug subtle behavioral changes with full traces
  • compare model, prompt, tool, and workflow changes against a known baseline
  • build a portfolio of golden workflows, historical regressions, and ugly edge cases
  • preserve engineering memory so old failures do not quietly return

What It Supports Today

  • YAML scenarios under scenarios/
  • deterministic built-in tools plus custom tools from agentlab.config.yaml
  • named agents from agentlab.config.yaml
  • built-in mock, openai, external_process, and http agent modes
  • type: conversation multi-turn dialog scenarios for HTTP agents
  • SQLite-backed local run history under artifacts/agentlab.db
  • CLI commands to list, run, show, compare, and launch the UI
  • local web UI for run inspection, run comparison, and suite batch comparison
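The sketch below shows what a multi-turn conversation scenario involves. It replays a scripted dialog one turn at a time, passes the full history into each agent call, and checks each reply. This is a hypothetical model. The agent is an injected function, not a real HTTP call, and the ScriptedTurn/expectContains shape is invented here rather than taken from the scenario YAML schema.

```typescript
// Hypothetical: an agent is modeled as a function from conversation history to a reply.
// A real http agent would sit behind a network call; here it is injected so the
// sketch runs without a server.
type Role = "user" | "assistant";
interface Turn { role: Role; content: string; }
type AgentFn = (history: readonly Turn[]) => string;

// Expected behavior for one turn: the reply must contain every listed phrase.
interface ScriptedTurn { user: string; expectContains: string[]; }
interface TurnResult { turn: number; reply: string; passed: boolean; }

function runConversation(agent: AgentFn, script: ScriptedTurn[]): TurnResult[] {
  const history: Turn[] = [];
  const results: TurnResult[] = [];
  script.forEach((step, i) => {
    history.push({ role: "user", content: step.user });
    const reply = agent(history); // agent sees every prior turn
    history.push({ role: "assistant", content: reply });
    const passed = step.expectContains.every((phrase) =>
      reply.toLowerCase().includes(phrase.toLowerCase()),
    );
    results.push({ turn: i + 1, reply, passed });
  });
  return results;
}

// Toy agent: asks for an order number, then reports a status.
const toyAgent: AgentFn = (history) => {
  const last = history[history.length - 1].content;
  return /\d+/.test(last)
    ? `Order ${last.match(/\d+/)![0]} is in transit.`
    : "Could you share your order number?";
};

const conversationResults = runConversation(toyAgent, [
  { user: "Where is my package?", expectContains: ["order number"] },
  { user: "It's 1042", expectContains: ["in transit"] },
]);
console.log(conversationResults);
```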

Workflow Spine

Use this as the default product story:

  1. debug locally with one scenario
  2. validate a branch with a suite
  3. run curated golden suites before release
  4. keep incident-derived scenarios as permanent regression assets

Start Here

If your agent runs as an HTTP service:

If you are validating coding-agent changes:

  • start with the coding scenarios under scenarios/coding/
  • read docs/coding-agents.md
  • use deterministic tool-loop runs first, then compare before/after behavior

If you want pre-merge regression checks in CI:

  • use suite_definitions
  • start with .github/workflows/agentlab-pre-merge.yml
  • run agentlab run --suite-def pre_merge --agent mock-default

First 10 Minutes

The fastest path for new users is the installed CLI.

Path A: npm install

npm install -g agent-regression-lab
agentlab run --demo
agentlab init
agentlab list scenarios
agentlab run support.generated-happy-path --agent mock-default
agentlab approve @last

agentlab init writes agentlab.config.yaml, starter scenarios under scenarios/, fixture stubs under fixtures/, and .gitignore coverage for artifacts/.

Add more starter coverage any time:

agentlab generate --domain support --count 5 --agent mock-default

Use shorthands instead of copying UUIDs:

agentlab show @last
agentlab compare @prev @last
agentlab compare --baseline support.generated-happy-path @last
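One way to think about the @last and @prev shorthands is as lookups into run history sorted by start time. The sketch below works under that assumption. The RunRecord type and the resolution rules are hypothetical, and the CLI's own logic may differ, including how --baseline picks an approved run.

```typescript
// Hypothetical run record; the real CLI keeps run history in artifacts/agentlab.db.
interface RunRecord { id: string; startedAt: number; } // epoch millis

// Resolves @last / @prev to the newest and second-newest run ids;
// anything else is treated as a literal run id that must exist.
function resolveRunRef(ref: string, runs: RunRecord[]): string {
  const newestFirst = [...runs].sort((a, b) => b.startedAt - a.startedAt);
  const index = ref === "@last" ? 0 : ref === "@prev" ? 1 : -1;
  if (index === -1) {
    if (!runs.some((r) => r.id === ref)) throw new Error(`unknown run id: ${ref}`);
    return ref;
  }
  const run = newestFirst[index];
  if (!run) throw new Error(`not enough runs to resolve ${ref}`);
  return run.id;
}

const runHistory: RunRecord[] = [
  { id: "run-a", startedAt: 1000 },
  { id: "run-c", startedAt: 3000 },
  { id: "run-b", startedAt: 2000 },
];
console.log(resolveRunRef("@prev", runHistory), resolveRunRef("@last", runHistory)); // run-b run-c
```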

Path B: local development

npm install
npm run check
npm test
npm run build
npm link
agentlab --help

Try the zero-config demo from either path:

agentlab run --demo

This runs a 2-phase narrative demo: baseline run → simulated prompt change → regression caught.

Launch the local UI:

agentlab ui

The UI starts on http://127.0.0.1:4173.

Run a suite and compare two suite batches:
agentlab run --suite support --agent mock-default
agentlab run --suite support --agent mock-default
agentlab compare --suite <baseline-batch-id> <candidate-batch-id>

At the end of its output, run --suite prints a line of the form Suite batch: <id>. That is the id compare --suite expects.
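At its core, comparing two suite batches is a per-scenario diff of outcomes. The sketch below assumes each batch can be reduced to a map from scenario id to pass/fail. The real compare --suite output carries more detail, and the scenario ids support.old and support.new are made up for this example.

```typescript
// Hypothetical per-scenario outcome within a suite batch.
type Outcome = "pass" | "fail";
type SuiteBatch = Record<string, Outcome>; // scenario id -> outcome

interface BatchDiff {
  regressions: string[]; // pass -> fail
  fixes: string[];       // fail -> pass
  added: string[];       // only in candidate
  removed: string[];     // only in baseline
}

function compareBatches(baseline: SuiteBatch, candidate: SuiteBatch): BatchDiff {
  const diff: BatchDiff = { regressions: [], fixes: [], added: [], removed: [] };
  for (const id of Object.keys(candidate).sort()) {
    if (!(id in baseline)) {
      diff.added.push(id);
      continue;
    }
    const before = baseline[id];
    const after = candidate[id];
    if (before === "pass" && after === "fail") diff.regressions.push(id);
    else if (before === "fail" && after === "pass") diff.fixes.push(id);
  }
  for (const id of Object.keys(baseline).sort()) {
    if (!(id in candidate)) diff.removed.push(id);
  }
  return diff;
}

const batchDiff = compareBatches(
  { "support.refund-correct-order": "pass", "support.generated-happy-path": "fail", "support.old": "pass" },
  { "support.refund-correct-order": "fail", "support.generated-happy-path": "pass", "support.new": "pass" },
);
console.log(batchDiff);
```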

Install

Installed CLI

From the npm registry:

npm install -g agent-regression-lab
agentlab --help

You can also use:

npx agent-regression-lab --help

Local Development Install

From this repo:

npm install
npm run build
npm link
agentlab --help

Repo-Local Dev Mode

If you do not want to link the package yet:

npm run start -- --help
npm run start -- run support.refund-correct-order --agent mock-default

CLI

Supported command surface:

agentlab init [project-name]
agentlab generate [--agent <name>] [--domain support|coding|research|ops|general] [--count <n>]
agentlab run --demo
agentlab run <scenario-id> [--agent <name>]
agentlab run --suite <suite-id> [--agent <name>]
agentlab run --suite-def <name> [--agent <name>]
agentlab run <scenario-id> [--variant-set <name>]
agentlab show <run-id|@last|@prev>
agentlab approve <run-id|@last|@prev>
agentlab compare <baseline-run-id|@prev> <candidate-run-id|@last>
agentlab compare --baseline <scenario-id> <candidate-run-id|@last>
agentlab compare --suite <baseline-batch-id> <candidate-batch-id>
agentlab ui
agentlab version
agentlab help

The CLI operates on the current working directory. Run it from the root of a project that contains scenarios/, fixtures/, and, optionally, agentlab.config.yaml.

Canonical Workflow

Use this as the default mental model:

  1. list scenarios
  2. run one scenario or one suite
  3. note the run id or suite batch id
  4. inspect the run in CLI or UI
  5. compare two runs or two suite batches
  6. extend the setup with a named agent or custom tools from repo-local files or installed packages when needed

Canonical Live HTTP Fixture

arl-test/ is the canonical live HTTP regression fixture in this repo.

Use it to verify the production-like HTTP path end to end:

cd arl-test
npm start
node ../dist/index.js list scenarios
node ../dist/index.js run order-tracking-in-transit --agent support-agent

The arl-test scenarios are intended to behave like a real internal-team regression fixture, not just a toy demo.

Config And Extension Points

agentlab.config.yaml is the public extension point for:

  • named agents
  • custom tools from repo-local files or installed npm packages
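Custom tools work best when they are deterministic and driven by fixture data. The sketch below shows the kind of logic such a tool might wrap. It is not the contents of user_tools/findDuplicateCharge.ts. The Charge fields, the function name, and the 24-hour window are all assumptions. For the export shape that agentlab.config.yaml actually expects, check that file and the sample config.

```typescript
// Hypothetical fixture row; real fixtures under fixtures/ may use different fields.
interface Charge {
  id: string;
  customerId: string;
  amountCents: number;
  chargedAt: string; // ISO 8601 timestamp
}

// Returns pairs of charges for the same customer and amount that occur within
// `windowMs` of each other. Pure and fixture-driven, so repeated runs agree.
function findDuplicateChargesSketch(
  charges: Charge[],
  customerId: string,
  windowMs = 24 * 60 * 60 * 1000,
): Array<[string, string]> {
  const mine = charges
    .filter((c) => c.customerId === customerId)
    .sort((a, b) => Date.parse(a.chargedAt) - Date.parse(b.chargedAt));
  const pairs: Array<[string, string]> = [];
  for (let i = 0; i < mine.length; i++) {
    for (let j = i + 1; j < mine.length; j++) {
      const gap = Date.parse(mine[j].chargedAt) - Date.parse(mine[i].chargedAt);
      if (gap > windowMs) break; // sorted by time, so later charges are even further away
      if (mine[j].amountCents === mine[i].amountCents) pairs.push([mine[i].id, mine[j].id]);
    }
  }
  return pairs;
}

const fixtureCharges: Charge[] = [
  { id: "ch_1", customerId: "cus_42", amountCents: 4999, chargedAt: "2024-05-01T10:00:00Z" },
  { id: "ch_2", customerId: "cus_42", amountCents: 4999, chargedAt: "2024-05-01T10:03:00Z" },
  { id: "ch_3", customerId: "cus_42", amountCents: 1299, chargedAt: "2024-05-03T09:00:00Z" },
  { id: "ch_4", customerId: "cus_7", amountCents: 4999, chargedAt: "2024-05-01T10:01:00Z" },
];
console.log(findDuplicateChargesSketch(fixtureCharges, "cus_42")); // one pair: ch_1 and ch_2
```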

Supported agent providers:

  • mock
  • openai
  • external_process
  • http — point at a running HTTP service for multi-turn conversation testing

Working sample assets already live in this repo:

  • external agents: custom_agents/node_agent.mjs, custom_agents/python_agent.py
  • custom tool: user_tools/findDuplicateCharge.ts
  • package-style tool examples: examples/support-tools, examples/coding-tools
  • sample config: agentlab.config.yaml

See:

Local Data And Artifacts

By default the product writes local state under artifacts/.

Important paths:

  • SQLite DB: artifacts/agentlab.db
  • per-run trace output: artifacts/<run-id>/trace.json
  • local UI assets at runtime: served from packaged dist/ui-assets or built into artifacts/ui/ in repo mode

If you delete artifacts/, you remove stored run history and generated local outputs.

Determinism

The benchmark is designed to be deterministic enough for repeated local evaluation:

  • built-in tools read from local fixtures
  • scenarios declare fixed tool allowlists and evaluator rules
  • scoring is rule-based
  • suite comparison is based on stored local runs and suite batch ids
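Rule-based scoring with a fixed tool allowlist can be sketched as a set of yes/no checks over a recorded trace. Several pieces here are illustrative assumptions: the Trace and Rules shapes, the tool names, and the "fraction of checks passed" score. None of them reflect the evaluator rule vocabulary the scenarios actually use.

```typescript
// Hypothetical trace and rule shapes; real scenarios declare rules in YAML.
interface ToolCall { tool: string; args: Record<string, unknown>; }
interface Trace { toolCalls: ToolCall[]; finalAnswer: string; }

interface Rules {
  allowedTools: string[];      // fixed allowlist
  requiredTools: string[];     // must be called at least once
  answerMustContain: string[]; // substrings expected in the final answer
  maxToolCalls: number;
}

interface Evaluation { passed: boolean; score: number; failures: string[]; }

// Each rule becomes a labeled yes/no check; the score is the fraction that passed.
function evaluateTrace(trace: Trace, rules: Rules): Evaluation {
  const checks: Array<[string, boolean]> = [];
  for (const call of trace.toolCalls) {
    checks.push([`tool allowed: ${call.tool}`, rules.allowedTools.includes(call.tool)]);
  }
  for (const tool of rules.requiredTools) {
    checks.push([`tool called: ${tool}`, trace.toolCalls.some((c) => c.tool === tool)]);
  }
  for (const text of rules.answerMustContain) {
    checks.push([`answer contains: ${text}`, trace.finalAnswer.includes(text)]);
  }
  checks.push([`at most ${rules.maxToolCalls} tool calls`, trace.toolCalls.length <= rules.maxToolCalls]);

  const failures = checks.filter(([, ok]) => !ok).map(([label]) => label);
  return { passed: failures.length === 0, score: (checks.length - failures.length) / checks.length, failures };
}

const evaluation = evaluateTrace(
  {
    toolCalls: [
      { tool: "orders.lookup", args: { orderId: "1042" } },
      { tool: "payments.refund", args: { orderId: "1042" } },
    ],
    finalAnswer: "Refund issued for order 1042.",
  },
  { allowedTools: ["orders.lookup"], requiredTools: ["orders.lookup"], answerMustContain: ["1042"], maxToolCalls: 3 },
);
console.log(evaluation); // fails: payments.refund is not on the allowlist
```

Because every check is a pure function of the stored trace, re-scoring an old run gives the same result.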

Agent behavior can still vary depending on the provider path. The built-in mock path is the most deterministic path for smoke tests and baseline examples.
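Before you treat a provider path as a baseline, you can check whether it is deterministic enough by replaying the same input several times and comparing the outputs. A minimal sketch follows. mockAgent and flakyAgent are hypothetical stand-ins, not the built-in mock provider.

```typescript
type Agent = (input: string) => string;

// Deterministic toy agent: output depends only on the input.
const mockAgent: Agent = (input) => `ack:${input.trim().toLowerCase()}`;

// Runs the same input `n` times and reports whether every output matches the first.
function isRepeatable(agent: Agent, input: string, n = 5): boolean {
  const first = agent(input);
  for (let i = 1; i < n; i++) {
    if (agent(input) !== first) return false;
  }
  return true;
}

// A deliberately nondeterministic agent for contrast: output changes on every call.
let callCounter = 0;
const flakyAgent: Agent = (input) => `${input}#${callCounter++}`;

console.log(isRepeatable(mockAgent, " Refund order 1042 "), isRepeatable(flakyAgent, "x")); // true false
```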

Limitations

  • this is a local-first alpha, not a hosted platform
  • the published package/example ecosystem is still small
  • external agents integrate through the local stdin/stdout protocol only
  • the UI is intentionally minimal and optimized for debugging
  • because local storage is SQLite-backed, running live verifications sequentially remains the safest approach when they share the same local artifacts DB
  • the benchmark is broader than before, but still small compared to a mature benchmark product
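Given the SQLite limitation above, it is safer to await each live run before starting the next than to launch them all at once. A minimal sketch follows, assuming each run is wrapped as an async task. RunTask and makeTask are hypothetical helpers, and timers stand in for real runs.

```typescript
// Hypothetical task type: each task performs one live run that writes to the
// shared SQLite DB under artifacts/.
type RunTask = () => Promise<string>;

// Awaiting each task before starting the next means runs launched from here never overlap.
async function runSequentially(tasks: RunTask[]): Promise<string[]> {
  const results: string[] = [];
  for (const task of tasks) {
    results.push(await task()); // next task starts only after this one settles
  }
  return results;
}

// Toy tasks that record when they start and finish.
const eventLog: string[] = [];
const makeTask = (id: string, delayMs: number): RunTask => async () => {
  eventLog.push(`start ${id}`);
  await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  eventLog.push(`end ${id}`);
  return id;
};

runSequentially([makeTask("a", 20), makeTask("b", 5)]).then((ids) => console.log(ids, eventLog));
```

With Promise.all the same tasks would interleave (start a, start b, end b, end a), and that overlap is what the limitation warns against.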

Next Docs