npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

agent-vcr

v0.2.1

Published

Snapshot tests for LLM tool calls — record once, replay deterministically in CI. Catch regressions in your agent wiring before they hit production.

Readme

Agent VCR


The Problem

You shipped a support agent. It worked perfectly in testing.

Three weeks later, someone tweaks a prompt — and now it's issuing refunds without verifying the order first. The LLM still sounds right. But the tool call sequence broke.

No test caught it. No eval flagged it. Only a production incident revealed it.

Agent VCR fixes this. It's VCR for your tool calls — the same idea as HTTP cassette recording, applied to tool_calls. Record the exact sequence your agent should call, check it into git, and let CI fail the build if anything drifts.

Without Agent VCR:     prompt change → "it still works" → production incident
With Agent VCR:        prompt change → CI fails → you fix it before merging

Why Agent VCR?

| | Without Agent VCR | With Agent VCR | |---|---|---| | CI signal | Manual click-through or flaky live LLM | ✅ Deterministic — same trace every run | | CI cost | API keys + $$$ per PR | ✅ Zero live-model cost in CI once golden traces exist (recording still calls your API) | | Regression detection | Hope for the best | ✅ Build fails if tool sequence changes | | Tool order | Untested | ✅ Guaranteed by golden trace | | Arg validation | Ad-hoc | ✅ Schema v1 + validate CLI in CI | | Framework | Re-invent test harness | ✅ Drop-in adapters for OpenAI, Anthropic, Vercel AI, LangChain |

What it tests: Tool dispatch logic, routing graphs, arg shapes, guard rails. What it doesn't test: Model quality or reasoning — keep those in evals and staging.


Install

npm install agent-vcr
# pnpm add agent-vcr  /  yarn add agent-vcr

Scaffold a new project in one command:

npx agent-vcr init

Creates traces/example.expected.json, a sample Vitest test, and a record.config.json — ready to run.

API keys and environment variables

Live recording and SDK adapters need provider credentials in your environment (or in record.config.json as apiKey for OpenAI-compatible recording only). Never commit real keys — copy .env.example to .env locally and fill in values, or use export / your CI secret store.

| What you use | Typical env var | Where to get a key | |--------------|-----------------|-------------------| | agent-vcr record, recordTrace, OpenAI SDK adapter | OPENAI_API_KEY | OpenAI API keys | | Anthropic SDK adapter (fromAnthropic) | ANTHROPIC_API_KEY | Anthropic console | | Vercel AI + @ai-sdk/openai | OPENAI_API_KEY | Same as OpenAI | | LangChain ChatOpenAI | OPENAI_API_KEY | Same as OpenAI | | OpenAI-compatible base URL (Groq, Ollama, Azure proxy, …) | OPENAI_API_KEY or apiKey in config | Provider docs; Ollama often needs no real secret |

For non-OpenAI Vercel AI providers (for example @ai-sdk/google), use the env variable names from Vercel AI provider docs — Agent VCR does not read those directly; your SDK does.

# Node 20+: load .env for the CLI process (from a project with agent-vcr installed)
node --env-file=.env ./node_modules/.bin/agent-vcr record --config record.config.json --out traces/out.json

Quick Start (3 minutes)

Step 1 — Record your agent with a real LLM (once)

Create record.config.json:

{
  "system": "You are a support agent. Always look up the order before refunding.",
  "user": "Refund $10 on order 123",
  "scenario": "refund_happy_path",
  "model": "gpt-4o-mini",
  "tools": [
    {
      "name": "lookup_order",
      "description": "Look up an order by ID",
      "parameters": {
        "type": "object",
        "properties": { "orderId": { "type": "string" } },
        "required": ["orderId"]
      }
    },
    {
      "name": "refund_order",
      "description": "Refund an order",
      "parameters": {
        "type": "object",
        "properties": {
          "orderId": { "type": "string" },
          "amount": { "type": "number" }
        },
        "required": ["orderId", "amount"]
      }
    }
  ],
  "stubs": {
    "lookup_order": { "found": true, "orderId": "123", "balanceCents": 1000 },
    "refund_order": { "ok": true }
  }
}
export OPENAI_API_KEY="…"   # never commit keys; or run with Node 20+: node --env-file=.env …
npx agent-vcr record --config record.config.json --out traces/refund.expected.json
*  Recording — scenario: refund_happy_path, model: gpt-4o-mini
   user: "Refund $10 on order 123"

   1. lookup_order { orderId: "123" }
   2. refund_order { orderId: "123", amount: 10 }

✔ Saved → traces/refund.expected.json (2 calls)

  Commit this file to git. Run your tests with ScriptedLlm to replay deterministically.

That JSON file is now your golden trace. Commit it.


Step 2 — Replay deterministically in CI (no API key needed)

// tests/refund-agent.vcr.test.ts
import { describe, it, expect } from 'vitest'
import {
  ScriptedLlm,
  assistantText,
  assistantWithTools,
  collectToolCalls,
  compareTraces,
  loadTraceFile,
  toolCall,
} from 'agent-vcr'

describe('refund agent — tool trace', () => {
  it('always looks up before refunding (happy path)', async () => {
    const llm = new ScriptedLlm([
      assistantWithTools([toolCall('lookup_order', { orderId: '123' }, 'c1')]),
      assistantWithTools([toolCall('refund_order', { orderId: '123', amount: 10 }, 'c2')]),
      assistantText('Done.'),
    ])

    const recorded = await collectToolCalls({
      system: 'You are a support agent.',
      user: 'Refund $10 on order 123',
      llm,
      async executeTool(name) {
        if (name === 'lookup_order') return { found: true }
        if (name === 'refund_order') return { ok: true }
        throw new Error(`Unknown tool: ${name}`)
      },
    })

    const expected = await loadTraceFile('traces/refund.expected.json')
    const result = compareTraces(expected.calls, recorded)
    expect(result).toEqual({ ok: true })
  })

  it('catches regression when lookup is skipped', async () => {
    const badLlm = new ScriptedLlm([
      // Simulates a prompt change that causes the model to skip lookup
      assistantWithTools([toolCall('refund_order', { orderId: '123', amount: 10 }, 'c1')]),
      assistantText('Done.'),
    ])

    const recorded = await collectToolCalls({
      system: 'You are a support agent.',
      user: 'Refund $10 on order 123',
      llm: badLlm,
      async executeTool(name) {
        if (name === 'refund_order') return { ok: true }
        throw new Error(`Unknown tool: ${name}`)
      },
    })

    const expected = await loadTraceFile('traces/refund.expected.json')
    expect(compareTraces(expected.calls, recorded).ok).toBe(false) // ← build fails!
  })
})
npx vitest run   # ✔ passes in CI — no API key, no cost, same result every time

Record Mode

agent-vcr record reads record.config.json (prompts, tool schemas, stubs), calls a real LLM over the OpenAI-compatible HTTP API, and writes the resulting tool calls to a golden trace. It does not import or run your app’s own agent entrypoint — for that, use collectToolCalls in tests with your SDK adapter and executeTool implementation.

# Basic usage (OPENAI_API_KEY in env — export, shell profile, or node --env-file=.env)
agent-vcr record --config record.config.json --out traces/my-flow.expected.json

# Override model
agent-vcr record --config record.config.json --out traces/my-flow.expected.json --model gpt-4o

# Point at any OpenAI-compatible API (Anthropic proxy, Groq, Ollama, Azure, ...)
agent-vcr record --config record.config.json --out traces/my-flow.expected.json \
  --base-url https://api.groq.com/openai/v1 \
  --model llama-3.3-70b-versatile

Record config fields:

| Field | Type | Required | Description | |---|---|---|---| | system | string | No | System prompt | | user | string | Yes | User message | | tools | ToolDefinition[] | Yes | Tools with JSON Schema parameters | | stubs | Record<name, result> | No | Tool responses during recording (default: { ok: true }) | | scenario | string | No | Label written to the trace file | | model | string | No | Model name (default: gpt-4o-mini) | | baseURL | string | No | API base URL (default: OpenAI) | | apiKey | string | No | Falls back to OPENAI_API_KEY env var | | maxSteps | number | No | Max tool-calling rounds (default: 16) | | httpFetch | typeof fetch | No | API only — passed to recordTrace() to override global fetch (tests, proxies). Not supported in record.config.json. |


Examples

All examples run with npm run build first (uses dist/):

npm run build && npm run example:all

| Example | Scenario | Key concept | |---|---|---| | minimal | Refund agent | Basic record → replay | | email-router | Route emails by category + priority | Multi-step routing | | financial-transfer | Balance → limit → transfer | High-stakes sequencing + regression demo | | rag-validation | Search → rerank → cite | subsequence mode for flexible pipelines | | ticket-escalation | VIP customer 4-step escalation | Long tool chains |

Run individually:

npm run example:email-router
npm run example:financial-transfer   # includes intentional regression demo
npm run example:rag-validation
npm run example:ticket-escalation

Framework Adapters

Agent VCR works with your existing LLM SDK. Import the adapter for your framework — no peer dependency added to Agent VCR itself.

OpenAI SDK

import OpenAI from 'openai'
import { collectToolCalls, compareTraces, loadTraceFile } from 'agent-vcr'
import { fromOpenAI } from 'agent-vcr/adapters/openai'

const client = new OpenAI()
const llm = fromOpenAI(client, 'gpt-4o-mini', {
  tools: [/* your OpenAI tool definitions */],
})

const recorded = await collectToolCalls({
  user: 'Refund order 123',
  llm,
  executeTool: async (name, args) => myDispatch(name, args),
})

Anthropic SDK

import Anthropic from '@anthropic-ai/sdk'
import { collectToolCalls } from 'agent-vcr'
import { fromAnthropic } from 'agent-vcr/adapters/anthropic'

const client = new Anthropic()
const llm = fromAnthropic(client, 'claude-3-5-sonnet-20241022', {
  tools: [
    {
      name: 'lookup_order',
      description: 'Look up an order by ID',
      input_schema: { type: 'object', properties: { orderId: { type: 'string' } }, required: ['orderId'] },
    },
  ],
})

Vercel AI SDK

import { openai } from '@ai-sdk/openai'
import { collectToolCalls } from 'agent-vcr'
import { fromVercelAI } from 'agent-vcr/adapters/vercel-ai'
import { z } from 'zod'

const llm = fromVercelAI(openai('gpt-4o-mini'), {
  tools: {
    lookup_order: {
      description: 'Look up an order by ID',
      parameters: z.object({ orderId: z.string() }),
    },
  },
})

LangChain.js

import { ChatOpenAI } from '@langchain/openai'
import { collectToolCalls } from 'agent-vcr'
import { fromLangChain } from 'agent-vcr/adapters/langchain'

const chat = new ChatOpenAI({ model: 'gpt-4o-mini' })
const llm = fromLangChain(chat.bindTools([lookupOrderTool, refundOrderTool]))

Any OpenAI-compatible API

# Groq
agent-vcr record --config cfg.json --base-url https://api.groq.com/openai/v1 --model llama-3.3-70b-versatile

# Ollama (local)
agent-vcr record --config cfg.json --base-url http://localhost:11434/v1 --model llama3.2

How It Works

  ┌─────────────────────────────────────────────────────────────────┐
  │  RECORD (once, with real LLM)          REPLAY (CI, no live LLM)  │
  │                                                                 │
  │  User ──► Real LLM ──► tool_calls      User ──► ScriptedLlm   │
  │              │                                      │           │
  │        Real APIs                            executeTool stubs  │
  │              │                                      │           │
  │         save to                          compare to            │
  │   traces/*.expected.json  ◄────────────  traces/*.expected.json│
  │         (git commit)                      fail build on drift  │
  └─────────────────────────────────────────────────────────────────┘

What ScriptedLlm validates: Given these model decisions (scripted), your code still dispatches the right tools with the right arguments.

What it doesn't validate: What the model would actually decide with a live API — that lives in evals, staging, and occasional integration runs.

Two comparison modes:

  • exact — Same length, each call matches name + args. Strictest. Use for deterministic flows like financial operations.
  • subsequence — Expected calls appear in order inside actual; extra calls allowed. Use for flexible pipelines like RAG where intermediate steps may change.

Trace File Format

Golden traces are plain JSON — human-readable, diffable, language-agnostic:

{
  "version": 1,
  "scenario": "refund_happy_path",
  "calls": [
    { "name": "lookup_order",  "args": { "orderId": "123" } },
    { "name": "refund_order",  "args": { "orderId": "123", "amount": 10 } }
  ]
}

Any language can emit this format and use the CLI to diff it. The schema is validated with Zod on both read and write.


CLI

# Scaffold Agent VCR in a new project
agent-vcr init [--dir <path>] [--skip-existing]

# Record a golden trace with a real LLM
agent-vcr record --config <record.config.json> --out <trace.json> [--model <name>] [--base-url <url>]

# Validate a trace file matches schema v1
agent-vcr validate <trace.json>

# Diff expected vs actual (exits 1 on mismatch)
agent-vcr diff <expected.json> <actual.json> [--mode exact|subsequence]

Exit codes:

| Code | Meaning | |---|---| | 0 | Success | | 1 | Trace mismatch (diff) | | 2 | Usage error, invalid JSON, or schema error |

Example diff output (mismatch):

✖ Trace mismatch (exact)
────────────────────────────────────────────────────────────
  reason: call 0 differs
  at index: 0
  expected: lookup_order({ orderId: "123" })
  actual:   refund_order({ orderId: "123", amount: 10 })

Full trace comparison:
  #   Expected                        Actual                          Match
  ────────────────────────────────────────────────────────────────────────────
  0   lookup_order(orderId)           refund_order(orderId, amount)   ✖
  1   refund_order(orderId, amount)   (none)                          ✖

Library API

| Export | Description | |---|---| | collectToolCalls(options) | Run assistant ↔ tool loop; returns ToolCallRecord[] | | ScriptedLlm(turns) | Deterministic complete() — returns scripted turns in order | | toolCall(name, args, id?) | Build a ToolCallSpec for scripted turns | | assistantWithTools(specs) | Build an assistant turn that calls tools | | assistantText(content) | Build a final text-only assistant turn | | compareTraces(expected, actual, options?) | Compare two ToolCallRecord[] lists | | loadTraceFile(path) | Read + validate a trace JSON file | | saveTraceFile(path, trace) | Validate + write a trace JSON file | | recordTrace(config) | Call a real LLM, capture tool calls, return trace (see httpFetch above). Never commit API keys — use OPENAI_API_KEY locally only | | initProject(options?) | Scaffold traces/ dir + sample test | | parseTraceFileV1(raw) | Parse raw JSON (throws on invalid) | | safeParseTraceFileV1(raw) | Parse raw JSON (returns { success, error }) |


GitHub Actions

# This repo’s workflow (see .github/workflows/ci.yml): build, validate every
# examples/*/traces/*.expected.json, then run all package examples.

- uses: actions/checkout@v4
- uses: actions/setup-node@v4
  with:
    node-version: '20'
    cache: npm
    cache-dependency-path: package-lock.json

- run: npm ci
- run: npm run lint
- run: npm test
- run: npm run build

- name: Validate golden traces
  run: |
    for f in examples/*/traces/*.expected.json; do
      node dist/cli.js validate "$f"
    done

- run: npm run example:all

# In your own app, point the glob at your traces (e.g. traces/*.expected.json)
# and use `npx agent-vcr validate` if you depend on the package instead of dist/.

Roadmap

Current release: trace schema v1, compareTraces (exact / subsequence), ScriptedLlm, collectToolCalls, CLI (validate, diff, record, init), adapters (OpenAI, Anthropic, Vercel AI, LangChain), and five runnable examples under examples/.

Planned: arg redaction for secrets in traces; CLI flags --forbidden-tools / --budget; Python client; LangGraph adapter; HTML diff report.


Use Cases

When should I use Agent VCR?

  • ✅ Support agents that must verify before acting (lookup → refund, check → transfer)
  • ✅ Email/ticket routing agents where wrong routing = real business impact
  • ✅ RAG pipelines where "skipping retrieval" = silent hallucination
  • ✅ Financial operations with mandatory compliance steps
  • ✅ Any agent where tool order matters as much as the final answer

When is it a poor fit?

  • ❌ Agents with no structured tools (pure text generation)
  • ❌ Teams only caring about prose quality (use evals instead)
  • ❌ Highly exploratory agents where tool sequence legitimately varies every run (use subsequence mode or evals)

Further reading

Your agent didn’t break. Your tool trace did. — background on why tool-call order belongs in CI.


Contributing

Contributions welcome — open an issue or PR on GitHub.

git clone https://github.com/ajayvnkt/agent-vcr.git
cd agent-vcr
npm install
npm run build && npm test && npm run example:all

Please run npm run lint && npm test before submitting. See CONTRIBUTING.md for details.


Star History

Star History Chart


License

MIT — see LICENSE for details.