agent-test-recorder

v0.1.0

Published

9 days ago

Record and replay LLM API calls for deterministic testing


jsleekr

llm testing record replay cassette openai anthropic ai-agents deterministic mock vcr

📼 agent-test-recorder

VCR for LLM API calls

Record real LLM interactions once, replay them deterministically forever

Why This Exists

Every team testing AI agents hits the same wall -- LLM responses are non-deterministic, so assertions fail randomly; every CI run burns real API tokens; and multi-turn agent traces disappear the moment the process exits. You end up with flaky tests you can't trust and a growing API bill for code that hasn't changed.

agent-test-recorder is the VCR pattern for LLM calls. Record real interactions once with mode: 'auto', commit the cassette files to git, and replay them forever -- no API key, no cost, no flakiness.

Zero API calls in CI -- cassettes replay locally, so your pipeline never touches OpenAI or Anthropic after the first record
Deterministic assertions -- the same input always returns the same output, making expect() calls reliable
Full trace capture -- tool calls, streaming chunks, multi-turn messages, and error responses are all stored in plain JSON

How It Works

First run (API key required):

  Your Test ──► OpenAI API ──► Response
                    │
              ┌─────▼──────┐
              │  Cassette  │  ← saved as JSON
              │  (.json)   │
              └────────────┘

Every run after (no API key, no cost):

  Your Test ──► Cassette ──► Same Response
                (replay)     (deterministic)

Think of it like a VCR for API calls. Record once, replay forever.
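The two diagrams above boil down to a lookup-before-call pattern. The following is a minimal in-memory sketch of that idea, not the library's implementation: `replayOrRecord`, the `Map`-based cassette, and `fakeApi` are hypothetical names, and the call is synchronous only to keep the example short.

```typescript
// Hypothetical in-memory cassette; the real library persists JSON files.
type MiniCassette = Map<string, string>;

function replayOrRecord(
  cassette: MiniCassette,
  requestKey: string,
  callLive: () => string,
): { response: string; source: 'live' | 'cassette' } {
  const recorded = cassette.get(requestKey);
  if (recorded !== undefined) {
    // Replay: the live function is never invoked.
    return { response: recorded, source: 'cassette' };
  }
  // Record: call the real thing once and remember the answer.
  const response = callLive();
  cassette.set(requestKey, response);
  return { response, source: 'live' };
}

// Stand-in for a non-deterministic API: every live call returns something new.
let liveCalls = 0;
const fakeApi = (): string => {
  liveCalls += 1;
  return `answer #${liveCalls}`;
};

const miniCassette: MiniCassette = new Map();
const first = replayOrRecord(miniCassette, 'What is 2+2?', fakeApi);
const second = replayOrRecord(miniCassette, 'What is 2+2?', fakeApi);
```

Here `first` comes from the live call and `second` is served from the cassette with the same response, so `fakeApi` runs exactly once.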

When to Use This

| Scenario | How |
|---|---|
| CI/CD without API keys | Record cassettes locally, commit to git, CI replays them |
| Deterministic assertions | Same input always returns same output -- no more flaky tests |
| Cost control | Record once, run tests 1000 times for free |
| Regression detection | Change your prompt? Cassette mismatch tells you immediately |
| Debugging agent chains | Cassettes capture every step of multi-turn conversations |
| Model migration | Switch gpt-4o → gpt-4o-mini, re-record with mode: 'update' |

Features

| Feature | Description |
|---|---|
| 5 recording modes | record, playback, auto, passthrough, update |
| OpenAI adapter | chat.completions.create with streaming and tool_calls support |
| OpenAI embeddings | embeddings.create with token usage tracking |
| Anthropic adapter | messages.create with streaming events and tool_use blocks |
| Fetch adapter | Generic HTTP interception with URL pattern matching |
| Streaming support | Records SSE streams as single objects, replays as async iterables |
| Incomplete stream handling | Detects stream errors, marks responses with incomplete: true |
| SHA-256 matching | Content-based request hashing with deterministic key sorting |
| ignoreFields | Strip dynamic fields (default: ['stream']) before hashing |
| Custom matcher | Full control via user-provided (request, recorded) => boolean function |
| Similarity scoring | On mismatch, reports closest cassette entry with Jaccard similarity |
| Multi-turn ordering | Nth call with same hash returns Nth recording |
| Atomic writes | Writes to temp file then renames -- no partial cassette corruption |
| Cassette versioning | meta.version field with CassetteVersionError on mismatch |
| Cassette migration | migrateCassette() upgrades old cassettes; readCassette supports autoMigrate option |
| Header masking | Auto-strips Authorization and x-api-key headers before saving |
| Vitest + Jest plugins | withCassette() helper with setup/teardown hooks |
| CLI tools | list, show, diff, prune, stats, update commands |
| Event hooks | onRecord, onReplay, onMismatch callbacks in RecorderConfig |
| Recorder introspection | getRecordedCount(), getInteractionCount(), getSummary() methods |
| Error recording | Records 4xx/5xx error responses and replays them faithfully |
| Input validation | validateCassettePath() prevents path traversal, null bytes, and non-.json paths |
| File size guard | CassetteSizeError rejects cassettes > 50 MB (configurable maxSize) |
| Cassette schema validation | validateCassette() checks structure for CI pipelines |
| Process exit safety | Emergency sync-save on unexpected exit if stop() not called |
| Cross-provider cassettes | FetchAdapter records multiple providers in a single cassette |

Requirements

Node.js >= 18
ESM ("type": "module" in package.json, or .mjs files). The Quick Start wraps its calls in an async function; later examples use top-level await, which only works in an ES module.
TypeScript 5.4+ (if using TypeScript)

Quick Start

npm install agent-test-recorder openai

First run only: set OPENAI_API_KEY in your environment so the real API can be called and the cassette recorded. After the first run, no API key is needed.

import { Recorder, OpenAIAdapter } from 'agent-test-recorder';
import OpenAI from 'openai';

async function main() {
  const client = new OpenAI();
  const recorder = new Recorder({
    cassettePath: './__cassettes__/my-test.json',
    mode: 'auto', // Record if cassette missing, replay if it exists
  });
  const adapter = new OpenAIAdapter(recorder);

  await recorder.start();
  adapter.instrument(client);

  // First run hits the real API, subsequent runs replay from cassette
  const response = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'What is 2+2?' }],
  });

  adapter.restore(client);
  await recorder.stop();
}

main();

instrument() monkey-patches the SDK client's API methods to intercept calls. restore() undoes the patch.

That's it. On the first run, the real API is called and the response is saved to __cassettes__/my-test.json (the __cassettes__/ directory is created automatically). On every subsequent run, the cassette is replayed instantly -- no API call, no cost, same result every time.
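For readers unfamiliar with monkey-patching, here is a rough sketch of the instrument/restore idea on a plain object. It is illustrative only: `patchMethod` and `fakeClient` are hypothetical and not part of agent-test-recorder, and the wrapper calls the original without a `this` binding, which a real SDK method would usually need.

```typescript
// Hypothetical helper: swaps one method for a wrapper and returns a function
// that puts the original back.
function patchMethod<T extends object, K extends keyof T>(
  target: T,
  name: K,
  wrap: (original: T[K]) => T[K],
): () => void {
  const original = target[name];
  target[name] = wrap(original);
  return () => {
    target[name] = original;
  };
}

// Stand-in client with a single method that does not rely on `this`.
const fakeClient = {
  create: (prompt: string): string => `echo: ${prompt}`,
};

let intercepted = 0;
const restore = patchMethod(fakeClient, 'create', (original) => (prompt) => {
  intercepted += 1; // a recorder would look up or store the call here
  return original(prompt);
});

const patchedResult = fakeClient.create('hi'); // goes through the wrapper
restore();
const restoredResult = fakeClient.create('hi'); // wrapper is gone
```

The caller's code does not change between the two calls; only the method behind `fakeClient.create` does, which is why instrumented tests can use the SDK client as usual.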

Recommended Project Structure

my-agent/
├── src/
│   └── agent.ts
├── tests/
│   ├── __cassettes__/          ← commit these to git!
│   │   ├── math-test.json
│   │   ├── tool-use-test.json
│   │   └── multi-turn.json
│   ├── agent.test.ts
│   └── test-helpers.ts         ← shared adapter setup
├── .gitignore
└── package.json

Key rule: commit your cassettes to git. This is what makes CI/CD work without API keys. Cassettes are plain JSON, so you can review LLM response changes in pull request diffs.

Practical Tips

| Tip | Why |
|---|---|
| Use auto mode as default | Records on first run, replays after -- no manual switching |
| Commit cassettes to git | CI runs without API keys, PR diffs show response changes |
| Set ignoreFields: ['timestamp', 'request_id'] | Dynamic fields break matching -- strip them |
| Use setup/teardown in withCassette | Avoid boilerplate adapter setup in every test |
| Run mode: 'update' after model changes | Re-records all cassettes with the new model |
| Check getSummary() in test output | Shows how many calls were recorded vs replayed |
| Use onRecord hook for cost tracking | Log token usage during recording sessions |

Recording Modes

| Mode | Cassette exists | No cassette |
|---|---|---|
| record | Overwrite with new recording | Create new cassette |
| playback | Replay recorded responses | Throw CassetteNotFoundError |
| auto | Replay known, record new interactions | Record live calls, create cassette |
| passthrough | Ignore cassette (real API calls) | Ignore (real API calls) |
| update | Real call, overwrite cassette | Real call, create cassette |

auto is the recommended default: new interactions are recorded on first run, replayed thereafter.
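The mode table can be read as a small decision function. The sketch below encodes it under stated assumptions: `decideAction` and the `Action` labels are hypothetical names, and record and update are treated identically because the table does not distinguish them at the level of a single call.

```typescript
type RecordMode = 'record' | 'playback' | 'auto' | 'passthrough' | 'update';
type Action = 'replay' | 'call-and-save' | 'call-only' | 'throw-not-found' | 'throw-mismatch';

// Maps (mode, cassette state) to what happens for one incoming request.
function decideAction(mode: RecordMode, cassetteExists: boolean, hasMatch: boolean): Action {
  if (mode === 'passthrough') return 'call-only'; // cassette is ignored entirely
  if (mode === 'record' || mode === 'update') return 'call-and-save';
  if (mode === 'playback') {
    if (!cassetteExists) return 'throw-not-found'; // CassetteNotFoundError
    return hasMatch ? 'replay' : 'throw-mismatch'; // CassetteMismatchError
  }
  // auto: replay what is known, record anything new
  return cassetteExists && hasMatch ? 'replay' : 'call-and-save';
}
```

For example, `decideAction('auto', true, false)` yields `'call-and-save'`, which corresponds to appending a new interaction to an existing cassette.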

SDK Adapters

OpenAI

import { Recorder, OpenAIAdapter } from 'agent-test-recorder';
import OpenAI from 'openai';

const recorder = new Recorder({ cassettePath: './cassettes/test.json', mode: 'auto' });
const adapter = new OpenAIAdapter(recorder);
const client = new OpenAI();

await recorder.start();
adapter.instrument(client); // Patches chat.completions.create and embeddings.create

Chat completions (regular and streaming):

// Regular call
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});

// Streaming call -- works transparently
const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}

Embeddings:

const embedding = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'The quick brown fox',
});
// Recorded with token usage (prompt_tokens / total_tokens)

adapter.restore(client); // Restores both chat.completions.create and embeddings.create
await recorder.stop();

Captures: model, messages, token usage, tool_calls in responses, embeddings vectors.
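Streaming replay means turning stored chunks back into something `for await` can consume. A minimal sketch of that idea follows; `replayChunks`, `collectText`, and the `{ delta }` chunk shape are hypothetical simplifications, not the library's internals or the OpenAI chunk format.

```typescript
// Hypothetical replay helper: yields stored chunks one by one as an async iterable.
async function* replayChunks<T>(chunks: readonly T[]): AsyncGenerator<T> {
  for (const chunk of chunks) {
    yield chunk;
  }
}

// Consumer code looks the same whether the stream is live or replayed.
async function collectText(stream: AsyncIterable<{ delta: string }>): Promise<string> {
  let text = '';
  for await (const chunk of stream) {
    text += chunk.delta;
  }
  return text;
}

const storedChunks = [{ delta: 'Hel' }, { delta: 'lo' }];
const replayed = replayChunks(storedChunks);
```

Calling `collectText(replayed)` resolves to the concatenated deltas, so test code written against a live stream needs no changes to run against a recording.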

Anthropic

import { Recorder, AnthropicAdapter } from 'agent-test-recorder';
import Anthropic from '@anthropic-ai/sdk';

const recorder = new Recorder({ cassettePath: './cassettes/test.json', mode: 'auto' });
const adapter = new AnthropicAdapter(recorder);
const client = new Anthropic();

await recorder.start();
adapter.instrument(client); // Patches client.messages.create

// Regular call
const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello' }],
});

// Streaming call
const stream = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Count to 3' }],
  stream: true,
});
for await (const event of stream) {
  // Replays: message_start, content_block_start, content_block_delta,
  //          content_block_stop, message_delta, message_stop
}

adapter.restore(client);
await recorder.stop();

Captures: tool_use content blocks, input/output token counts, streaming events.

Fetch (Generic HTTP)

import { Recorder, FetchAdapter } from 'agent-test-recorder';

const recorder = new Recorder({ cassettePath: './cassettes/test.json', mode: 'auto' });
const adapter = new FetchAdapter(recorder, {
  urlPattern: /api\.openai\.com|api\.anthropic\.com/, // Only intercept matching URLs
});

await recorder.start();
adapter.instrument(); // Patches globalThis.fetch (no client argument)

const response = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer sk-...', // Auto-stripped before saving
  },
  body: JSON.stringify({ model: 'gpt-4o', messages: [] }),
});

adapter.restore(); // Restores original globalThis.fetch
await recorder.stop();

URLs not matching urlPattern pass through to the real fetch
Authorization and x-api-key headers are automatically stripped before saving
Records HTTP status code and status text alongside response body
Works with any HTTP-based API -- Gemini, Cohere, Mistral, or your own internal endpoints
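Header stripping of the kind described above can be sketched as a simple filter. This is an illustration, not the FetchAdapter's code: `maskHeaders` is a hypothetical name, and the case-insensitive comparison is an assumption based on HTTP header names being case-insensitive.

```typescript
// The two header names the README says are stripped, in lowercase.
const SENSITIVE_HEADERS = new Set(['authorization', 'x-api-key']);

// Returns a copy of the headers without the sensitive entries.
function maskHeaders(headers: Record<string, string>): Record<string, string> {
  const safe: Record<string, string> = {};
  for (const [headerName, value] of Object.entries(headers)) {
    if (!SENSITIVE_HEADERS.has(headerName.toLowerCase())) {
      safe[headerName] = value;
    }
  }
  return safe;
}

const masked = maskHeaders({
  'Content-Type': 'application/json',
  'Authorization': 'Bearer sk-test',
  'X-API-Key': 'abc',
});
```

Only `Content-Type` survives in `masked`. It is still worth opening a freshly recorded cassette once to confirm no credentials ended up in it.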

Using with Google Gemini

const adapter = new FetchAdapter(recorder, {
  urlPattern: /generativelanguage\.googleapis\.com/,
});
adapter.instrument();

const resp = await fetch('https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent', {
  method: 'POST',
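  // Gemini's REST API reads the key from x-goog-api-key. Masking is documented only for
  // Authorization and x-api-key, so check the saved cassette for this header before committing.
  headers: { 'Content-Type': 'application/json', 'x-goog-api-key': 'AIza...' },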
  body: JSON.stringify({ contents: [{ parts: [{ text: 'Hello' }] }] }),
});
// Recorded and replayed just like OpenAI/Anthropic calls

Writing a Custom Adapter

The adapter interface is simple. Extend BaseAdapter and implement instrument() and restore():

import { BaseAdapter, type Recorder, type InteractionRequest, type InteractionResponse } from 'agent-test-recorder';

class MyCustomAdapter extends BaseAdapter {
  readonly name = 'my-provider';
  private originalMethod: Function | null = null;

  constructor(recorder: Recorder) {
    super(recorder);
  }

  instrument(client: any): void {
    this.originalMethod = client.generate;
    const self = this;

    client.generate = async (params: any) => {
      const request: InteractionRequest = {
        provider: 'my-provider',
        method: 'generate',
        params,
      };

      const handler = async (_req: InteractionRequest): Promise<InteractionResponse> => {
        const result = await self.originalMethod!.call(client, params);
        return { body: result };
      };

      return (await self.recorder.handle(request, handler)).body;
    };

    this.instrumented = true;
  }

  restore(client: any): void {
    if (this.originalMethod) {
      client.generate = this.originalMethod;
      this.originalMethod = null;
      this.instrumented = false;
    }
  }
}

Key points:

instrument() patches the client's API methods to route calls through this.recorder.handle()
restore() undoes the patch, restoring the original methods
The handler function is only called when recording; during playback it is skipped
Set this.instrumented = true/false to track state

Test Framework Integration

Vitest

import { describe, it, expect } from 'vitest';
import { withCassette } from 'agent-test-recorder/vitest';
import { OpenAIAdapter } from 'agent-test-recorder';
import OpenAI from 'openai';

describe('my agent', () => {
  // Basic usage -- auto mode, cassette at __cassettes__/answer-test.json
  it('answers correctly', withCassette('answer-test', async (recorder) => {
    const client = new OpenAI();
    const adapter = new OpenAIAdapter(recorder);
    adapter.instrument(client);
    const result = await client.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: 'What is 2+2?' }],
    });
    adapter.restore(client);
    expect(result.choices[0].message.content).toContain('4');
  }));

  // With options
  it('re-records when needed', withCassette('update-test', { mode: 'update' }, async (recorder) => {
    // ...
  }));

  // With setup/teardown hooks. The client lives outside setup so the test body can use it.
  const client = new OpenAI();
  it('uses setup hook', withCassette('hook-test', {
    setup: (recorder) => {
      const adapter = new OpenAIAdapter(recorder);
      adapter.instrument(client);
      return {
        teardown: () => adapter.restore(client),
      };
    },
  }, async (recorder) => {
    // client is already instrumented here
  }));
});

CassetteOptions

| Option | Type | Default | Description |
|---|---|---|---|
| mode | RecordMode | 'auto' | Recording mode |
| basePath | string | process.cwd() | Base directory for cassette storage |
| matcherOptions | MatcherOptions | {} | Matcher configuration |
| setup | (recorder: Recorder) => { teardown: () => void } | -- | Setup/teardown hooks |

withCassette stores cassettes at <basePath>/__cassettes__/<name>.json.

Jest

import { withCassette } from 'agent-test-recorder/jest';

// Same API as Vitest
test('answers correctly', withCassette('answer-test', async (recorder) => {
  // ...
}));

Matcher Configuration

The Matcher class controls how incoming requests are matched against recorded interactions.

import { Recorder } from 'agent-test-recorder';

const recorder = new Recorder({
  cassettePath: './cassettes/test.json',
  mode: 'auto',
  matcherOptions: {
    // Strip these fields before computing the match hash (default: ['stream'])
    ignoreFields: ['stream', 'request_id', 'timestamp'],

    // Or provide a fully custom matcher
    custom: (request, recorded) => {
      return request.params.model === recorded.params.model
        && JSON.stringify(request.params.messages) === JSON.stringify(recorded.params.messages);
    },
  },
});

| Option | Type | Default | Description |
|---|---|---|---|
| ignoreFields | string[] | ['stream'] | Fields to strip from params before SHA-256 hashing |
| custom | (request: InteractionRequest, recorded: InteractionRequest) => boolean | -- | Override matching logic entirely; when set, hash-based matching is skipped |
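To make the hashing step concrete, here is one way a content-based match key could be computed. It is a sketch under assumptions: `canonicalize` and `matchKey` are hypothetical names, ignoreFields is applied to top-level params only, and the library's actual canonical form may differ. The `sha256:` prefix follows the matchKey shown in the cassette format.

```typescript
import { createHash } from 'node:crypto';

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Recursively sort object keys so that key order does not affect the hash.
function canonicalize(value: Json): Json {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === 'object') {
    const sorted: { [key: string]: Json } = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = canonicalize(value[key]);
    }
    return sorted;
  }
  return value;
}

// Drop ignored top-level fields, then hash the canonical JSON.
function matchKey(params: { [key: string]: Json }, ignoreFields: string[] = ['stream']): string {
  const kept: { [key: string]: Json } = {};
  for (const [key, value] of Object.entries(params)) {
    if (!ignoreFields.includes(key)) kept[key] = value;
  }
  const digest = createHash('sha256').update(JSON.stringify(canonicalize(kept))).digest('hex');
  return `sha256:${digest}`;
}

// Same request, different key order, one with stream: true.
const keyA = matchKey({ model: 'gpt-4o', stream: true, messages: [{ role: 'user', content: 'Hello' }] });
const keyB = matchKey({ messages: [{ content: 'Hello', role: 'user' }], model: 'gpt-4o' });
```

`keyA` and `keyB` are equal, which illustrates why a streaming recording can satisfy a non-streaming request when stream is ignored.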

Similarity scoring: When no match is found in playback mode, the Matcher computes Jaccard similarity between the request and all recorded interactions, reporting the closest match in the error message.
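Jaccard similarity is the size of the intersection of two sets divided by the size of their union. The README does not say which tokens the Matcher compares, so the sketch below assumes lowercase alphanumeric word tokens from a text summary of the request; `tokenSet` and `jaccard` are hypothetical names.

```typescript
// Split text into a set of lowercase alphanumeric tokens (an assumed tokenization).
function tokenSet(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0));
}

// |A ∩ B| / |A ∪ B|, with two empty sets treated as identical.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((token) => {
    if (b.has(token)) shared += 1;
  });
  return shared / (a.size + b.size - shared);
}

const incoming = tokenSet('gpt-4o What is 3+3?');
const recordedTokens = tokenSet('gpt-4o What is 2+2?');
const score = jaccard(incoming, recordedTokens); // 4 shared tokens out of 6 distinct
```

A score close to 1 suggests the request differs from a recording only slightly, for instance by one edited word in the prompt.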

Cassette Format

Cassettes are plain JSON files, human-readable and version-controlled.

{
  "meta": {
    "version": 1,
    "name": "my-test",
    "createdAt": "2026-03-25T00:00:00.000Z",
    "updatedAt": "2026-03-25T00:00:00.000Z"
  },
  "interactions": [
    {
      "request": {
        "provider": "openai",
        "method": "chat.completions.create",
        "params": { "model": "gpt-4o", "messages": [{ "role": "user", "content": "Hello" }] }
      },
      "response": {
        "body": { "choices": [{ "message": { "role": "assistant", "content": "Hi!" } }] },
        "tokenUsage": { "prompt": 10, "completion": 3 },
        "latency": 450,
        "streaming": false
      },
      "timestamp": "2026-03-25T00:00:00.000Z",
      "matchKey": "sha256:abc123..."
    }
  ]
}

Atomic writes: Cassettes are written to a temp file (<path>.<uuid>.tmp) then renamed, preventing corruption from crashes or concurrent writes.
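The write-then-rename pattern can be sketched in a few lines with Node's built-in modules. This is an illustration rather than the library's writeCassette(): `writeJsonAtomically` is a hypothetical name, and synchronous fs calls are used only to keep the example short.

```typescript
import { randomUUID } from 'node:crypto';
import { mkdtempSync, readFileSync, readdirSync, renameSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Write to <path>.<uuid>.tmp first, then rename over the final path.
function writeJsonAtomically(filePath: string, data: unknown): void {
  const tempPath = `${filePath}.${randomUUID()}.tmp`;
  writeFileSync(tempPath, JSON.stringify(data, null, 2), 'utf8');
  // On POSIX filesystems a rename within one volume replaces the target in a single step,
  // so readers see either the old file or the complete new one.
  renameSync(tempPath, filePath);
}

const demoDir = mkdtempSync(join(tmpdir(), 'cassette-demo-'));
const demoPath = join(demoDir, 'my-test.json');
writeJsonAtomically(demoPath, { meta: { version: 1, name: 'my-test' }, interactions: [] });

const roundTrip = JSON.parse(readFileSync(demoPath, 'utf8'));
const leftovers = readdirSync(demoDir).filter((f) => f.endsWith('.tmp'));
```

The unique temp name is what keeps two concurrent writers from clobbering each other's partial output; whichever rename lands last wins with a complete file.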

CLI

# List all cassettes in a directory (recursive)
agent-test-recorder list
agent-test-recorder list -d ./tests

# Display cassette contents (name, version, dates, interactions)
agent-test-recorder show ./cassettes/my-test.json

# Compare two cassette files (added/removed interactions by matchKey)
agent-test-recorder diff cassette-a.json cassette-b.json

# Delete cassettes older than N days
agent-test-recorder prune --older-than 30d
agent-test-recorder prune --older-than 7d -d ./tests

# Show aggregate statistics (count, total interactions, size, by provider)
agent-test-recorder stats
agent-test-recorder stats -d ./tests

# Migrate cassettes to latest format version
agent-test-recorder update ./cassettes/old-test.json
agent-test-recorder update --all
agent-test-recorder update --all -d ./tests

| Command | Description |
|---|---|
| list | Recursively find and list all valid cassette files |
| show <cassette> | Display cassette metadata and interaction summary |
| diff <a> <b> | Compare two cassettes by matchKey (added/removed counts) |
| prune --older-than <Nd> | Delete cassettes with createdAt older than N days |
| stats | Aggregate stats: cassette count, interaction count, size, breakdown by provider |
| update <cassette> | Migrate a cassette file to the latest format version |
| update --all | Migrate all cassettes in a directory |
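As an illustration of the prune rule (createdAt older than N days), the selection logic might look like the following. `parseOlderThan`, `selectStale`, and `CassetteSummary` are hypothetical names; the CLI's real parsing and deletion code is not shown in this README.

```typescript
interface CassetteSummary {
  file: string;
  createdAt: string; // ISO timestamp, as in meta.createdAt
}

// Parse a value like "30d" into a number of days; anything else is rejected.
function parseOlderThan(flag: string): number {
  const match = /^(\d+)d$/.exec(flag);
  if (match === null) throw new Error(`Expected a value like "30d", got "${flag}"`);
  return Number(match[1]);
}

// Return the files whose createdAt falls before (now - N days).
function selectStale(cassettes: CassetteSummary[], olderThan: string, now: Date): string[] {
  const cutoff = now.getTime() - parseOlderThan(olderThan) * 24 * 60 * 60 * 1000;
  return cassettes
    .filter((c) => new Date(c.createdAt).getTime() < cutoff)
    .map((c) => c.file);
}

const stale = selectStale(
  [
    { file: 'old.json', createdAt: '2026-01-01T00:00:00.000Z' },
    { file: 'fresh.json', createdAt: '2026-03-20T00:00:00.000Z' },
  ],
  '30d',
  new Date('2026-03-25T00:00:00.000Z'),
);
```

With a 30-day window measured from 2026-03-25, only `old.json` is selected. Since prune deletes files, committing cassettes before running it keeps the operation reversible.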

Edge Cases

| Situation | Behavior |
|---|---|
| Same request multiple times | Nth call with same hash returns Nth recording (ordered matching) |
| No match in cassette (playback) | CassetteMismatchError with closest match and Jaccard similarity score |
| No match in cassette (auto) | Record live call, append to existing cassette |
| API error responses (4xx/5xx) | Recorded as-is and replayed faithfully |
| Streaming drops mid-stream | Caught, marked incomplete: true, finish_reason: 'error' |
| Parallel requests | Content-based hash matching -- order-independent |
| Sensitive headers | Authorization and x-api-key auto-stripped by Fetch adapter |
| Corrupted cassette JSON | CassetteCorruptError with file path and parse error details |
| Cassette version mismatch | CassetteVersionError with hint to run migration command |
| Concurrent test writes | Atomic write via temp file + rename prevents corruption |
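The first and sixth rows (ordered matching for repeated requests, order independence across different requests) can both be explained by a per-key cursor. The sketch below shows that idea with hypothetical names (`OrderedReplayer`, `RecordedInteraction`); it is not the Matcher's actual code.

```typescript
interface RecordedInteraction {
  matchKey: string;
  body: string;
}

class OrderedReplayer {
  // How many times each match key has been served so far.
  private readonly cursors = new Map<string, number>();
  private readonly interactions: RecordedInteraction[];

  constructor(interactions: RecordedInteraction[]) {
    this.interactions = interactions;
  }

  // The Nth request with a given key gets the Nth recording with that key.
  next(matchKey: string): string {
    const seen = this.cursors.get(matchKey) ?? 0;
    const candidates = this.interactions.filter((i) => i.matchKey === matchKey);
    const hit = candidates[seen];
    if (hit === undefined) {
      throw new Error(`No recording #${seen + 1} for ${matchKey}`);
    }
    this.cursors.set(matchKey, seen + 1);
    return hit.body;
  }
}

const tape: RecordedInteraction[] = [
  { matchKey: 'k1', body: 'first' },
  { matchKey: 'k2', body: 'other' },
  { matchKey: 'k1', body: 'second' },
];
const replayer = new OrderedReplayer(tape);
const replayOrder = [replayer.next('k1'), replayer.next('k2'), replayer.next('k1')];
```

Because each key keeps its own cursor, interleaving requests for `k2` does not disturb the sequence served for `k1`.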

Error Messages

All error classes include actionable hints:

CassetteMismatchError: No matching recording found

  Request:
    provider: openai
    method: chat.completions.create
    model: gpt-4o
    first message: "What is 3+3?"

  Closest match in cassette (similarity: 0.85):
    provider: openai
    method: chat.completions.create
    model: gpt-4o
    first message: "What is 2+2?"

  Hint: Run with mode: 'auto' to record this new interaction.

CassetteNotFoundError: Cassette not found: ./cassettes/test.json
  Hint: Run with mode: 'record' or 'auto' to create it.

CassetteVersionError: Cassette version mismatch: ./cassettes/test.json
  Found: v0, Expected: v1
  Hint: Run `agent-test-recorder update ./cassettes/test.json` to migrate.

CassettePathError: Invalid cassette path: ../secret.txt
  Cassette path must end with .json
  Hint: Rename the file to use a .json extension (e.g., "my-test.json").

CassetteSizeError: Cassette file too large: ./cassettes/huge.json
  Size: 55.0 MB, Max: 50.0 MB
  Hint: Split your cassette into smaller files, or pass { maxSize: <bytes> } to readCassette().
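The CassettePathError case covers three checks: null bytes, a non-.json extension, and path traversal. A rough sketch of such checks with Node's path module follows. `checkCassettePath` is a hypothetical stand-in for validateCassettePath(), and the order of the checks is an assumption inferred from the example message, where ../secret.txt is reported for its extension.

```typescript
import { isAbsolute, relative, resolve, sep } from 'node:path';

// Returns the resolved path, or throws if the path is unsafe or not a .json file.
function checkCassettePath(filePath: string, baseDir: string): string {
  if (filePath.includes('\0')) {
    throw new Error('Cassette path contains a null byte');
  }
  if (!filePath.endsWith('.json')) {
    throw new Error('Cassette path must end with .json');
  }
  const root = resolve(baseDir);
  const full = resolve(root, filePath);
  const rel = relative(root, full);
  // A relative path that climbs out of root starts with ".." (or is absolute on another drive).
  if (rel === '..' || rel.startsWith(`..${sep}`) || isAbsolute(rel)) {
    throw new Error('Cassette path escapes the base directory');
  }
  return full;
}

const okPath = checkCassettePath('cassettes/a.json', '/repo');
```

Resolving before comparing matters: a string check for `..` alone would miss inputs such as `cassettes/../../secret.json`.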

Real-World Examples

Testing an AI agent end-to-end

import { withCassette } from 'agent-test-recorder/vitest';
import { OpenAIAdapter } from 'agent-test-recorder';
import OpenAI from 'openai';
import { MyAgent } from '../src/agent.js';

const client = new OpenAI();

describe('MyAgent', () => {
  it('handles a multi-step research task', withCassette('research-task', {
    setup: (recorder) => {
      const adapter = new OpenAIAdapter(recorder);
      adapter.instrument(client);
      return { teardown: () => adapter.restore(client) };
    },
  }, async () => {
    const agent = new MyAgent(client);
    const result = await agent.research('What causes northern lights?');

    // These assertions are deterministic because the LLM responses are replayed
    expect(result.sources).toHaveLength(3);
    expect(result.summary).toContain('solar wind');
    expect(result.toolCalls).toContain('web_search');
  }));
});

Cost monitoring during recording

import { Recorder } from 'agent-test-recorder';

let totalTokens = 0;

const recorder = new Recorder({
  cassettePath: './cassettes/expensive-test.json',
  mode: 'record',
  onRecord: (interaction) => {
    const usage = interaction.response.tokenUsage;
    if (usage) {
      totalTokens += usage.prompt + usage.completion;
      console.log(`[${interaction.request.params.model}] ${usage.prompt}+${usage.completion} tokens`);
    }
  },
});

// ... call recorder.start(), instrument an adapter, and run your test ...

await recorder.stop();
console.log(`Total tokens used: ${totalTokens}`);
// Next run with mode: 'auto' → 0 tokens, 0 cost

Using multiple providers in one test

// Assumes openaiClient, anthropicClient, and a shared messages array are defined in the test file
it('compares OpenAI and Anthropic', withCassette('provider-compare', {
  setup: (recorder) => {
    const oai = new OpenAIAdapter(recorder);
    const ant = new AnthropicAdapter(recorder);
    oai.instrument(openaiClient);
    ant.instrument(anthropicClient);
    return {
      teardown: () => {
        oai.restore(openaiClient);
        ant.restore(anthropicClient);
      },
    };
  },
}, async () => {
  const [gptResult, claudeResult] = await Promise.all([
    openaiClient.chat.completions.create({ model: 'gpt-4o', messages }),
    anthropicClient.messages.create({ model: 'claude-sonnet-4-20250514', max_tokens: 1024, messages }),
  ]);
  // Both responses are recorded in the same cassette and replayed together
}));

API Reference

Exported Classes

| Class | Description |
|---|---|
| Recorder | Core record/replay engine |
| Matcher | Request matching and similarity scoring |
| BaseAdapter | Abstract base class for SDK adapters |
| OpenAIAdapter | Adapter for OpenAI SDK (chat.completions.create, embeddings.create) |
| AnthropicAdapter | Adapter for Anthropic SDK (messages.create) |
| FetchAdapter | Adapter for generic HTTP via globalThis.fetch |

Exported Functions

| Function | Signature | Description |
|---|---|---|
| createEmptyCassette | (name: string) => Cassette | Create a new empty cassette with metadata |
| readCassette | (filePath: string, options?: { autoMigrate?: boolean; maxSize?: number }) => Promise<Cassette> | Read and validate a cassette file; auto-migrate old versions when autoMigrate: true; reject files exceeding maxSize bytes (default 50 MB) |
| validateCassettePath | (filePath: string, baseDir?: string) => string | Validate and normalize a cassette path; prevents path traversal and injection |
| validateCassette | (data: unknown) => CassetteValidationError[] | Validate cassette schema; returns array of errors (empty = valid) |
| writeCassette | (filePath: string, cassette: Cassette) => Promise<void> | Atomically write a cassette to disk |
| migrateCassette | (cassette: any) => Cassette | Upgrade a cassette object to the latest version |

Exported Error Classes

| Error | When thrown |
|---|---|
| CassetteMismatchError | No matching recorded interaction in playback mode |
| CassetteNotFoundError | Cassette file does not exist in playback mode |
| CassetteCorruptError | Cassette file is not valid JSON |
| CassetteVersionError | Cassette meta.version does not match CASSETTE_VERSION |
| CassettePathError | Invalid cassette path (path traversal, null bytes, non-.json) |
| CassetteSizeError | Cassette file exceeds maximum size limit |
| RecorderNotStartedError | handle() called before start() |
| AdapterInstrumentError | Adapter instrument() received an invalid client |

Exported Functions (continued)

| Function | Signature | Description |
|---|---|---|
| sanitizeParams | (params: Record<string, unknown>) => Record<string, unknown> | Deep-strip sensitive keys (api_key, token, authorization, etc.) from params |
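"Deep-strip" here means removing sensitive keys at every nesting level, including inside arrays. The sketch below shows that recursion under assumptions: `stripSensitive` is a hypothetical stand-in for sanitizeParams, it covers only the three keys named in the table (the full list is not given), and the case-insensitive key match is a guess.

```typescript
// Only the keys named in the table above; the library's full list is not documented here.
const SENSITIVE_KEYS = new Set(['api_key', 'token', 'authorization']);

// Recursively copy a value, omitting sensitive keys at any depth.
function stripSensitive(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(stripSensitive);
  if (value !== null && typeof value === 'object') {
    const clean: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      if (!SENSITIVE_KEYS.has(key.toLowerCase())) {
        clean[key] = stripSensitive(inner);
      }
    }
    return clean;
  }
  return value;
}

const sanitized = stripSensitive({
  model: 'gpt-4o',
  api_key: 'sk-test',
  nested: { token: 't', keep: 1 },
  list: [{ authorization: 'x', ok: true }],
});
```

The input object is left untouched and a cleaned copy is returned, so the live request can still carry its credentials while the recorded params do not.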

Exported Constants

| Constant | Value | Description |
|---|---|---|
| CASSETTE_VERSION | 1 | Current cassette format version |
| MAX_CASSETTE_SIZE | 50 * 1024 * 1024 | Maximum cassette file size in bytes (50 MB) |

Exported Types

type RecordMode = 'record' | 'playback' | 'auto' | 'passthrough' | 'update';

interface RecorderConfig {
  cassettePath: string;
  mode: RecordMode;
  matcherOptions?: MatcherOptions;
  onRecord?: (interaction: Interaction) => void;
  onReplay?: (interaction: Interaction) => void;
  onMismatch?: (request: InteractionRequest) => void;
}

interface MatcherOptions {
  ignoreFields?: string[];
  custom?: (request: InteractionRequest, recorded: InteractionRequest) => boolean;
}

interface Cassette {
  meta: CassetteMeta;
  interactions: Interaction[];
}

interface CassetteMeta {
  version: number;
  name: string;
  createdAt: string;
  updatedAt: string;
}

interface Interaction {
  request: InteractionRequest;
  response: InteractionResponse;
  timestamp: string;
  matchKey: string;
}

interface InteractionRequest {
  provider: string;
  method: string;
  params: Record<string, unknown>;
}

interface InteractionResponse {
  body: Record<string, unknown>;
  tokenUsage?: { prompt: number; completion: number };
  latency?: number;
  streaming?: boolean;
  incomplete?: boolean;
}

interface FetchAdapterOptions {
  urlPattern: RegExp;
}

interface CassetteOptions {
  mode?: RecordMode;
  basePath?: string;
  matcherOptions?: MatcherOptions;
  setup?: (recorder: Recorder) => { teardown: () => void };
}

interface CassetteValidationError {
  path: string;
  message: string;
}

const CASSETTE_VERSION = 1;

Recorder Methods

| Method | Signature | Description |
|---|---|---|
| start() | () => Promise<void> | Load cassette (if exists), prepare for recording/playback |
| handle() | (request, handler) => Promise<InteractionResponse> | Match or record an interaction |
| stop() | () => Promise<void> | Save cassette (if recording), release resources |
| getRecordedCount() | () => number | Number of interactions recorded in this session |
| getInteractionCount() | () => number | Total interactions (recorded + existing cassette) |
| getSummary() | () => { mode, recorded, replayed, cassettePath, totalInteractions, cassetteInteractions } | Session summary object |

Adapter Methods

| Method | OpenAI | Anthropic | Fetch |
|---|---|---|---|
| instrument() | instrument(client) | instrument(client) | instrument() |
| restore() | restore(client) | restore(client) | restore() |
| isInstrumented() | () => boolean | () => boolean | () => boolean |

CI/CD Integration

Run tests in CI without API keys -- cassettes are committed to git and replayed automatically:

# GitHub Actions example
- name: Run tests
  run: npm run ci
  # No OPENAI_API_KEY or ANTHROPIC_API_KEY needed!
  # Cassettes in __cassettes__/ provide all recorded responses.

The ci script runs tests with verbose output and coverage summary. No API keys are required in CI as long as cassettes have been recorded locally and committed to git.

Cassette Schema Validation

Use validateCassette() in CI to catch corrupted or hand-edited cassettes:

import { validateCassette, readCassette } from 'agent-test-recorder';

const cassette = await readCassette('./cassettes/my-test.json');
const errors = validateCassette(cassette);
if (errors.length > 0) {
  console.error('Invalid cassette:', errors);
  process.exit(1);
}

Advanced Usage

Custom Matcher with ignoreFields

Dynamic fields like request_id, timestamp, or session_id change on every API call, breaking hash-based matching. Use ignoreFields to strip them before computing the match key:

const recorder = new Recorder({
  cassettePath: './cassettes/test.json',
  mode: 'auto',
  matcherOptions: {
    // These fields are removed from params before SHA-256 hashing
    ignoreFields: ['stream', 'request_id', 'timestamp', 'session_id'],
  },
});

For more control, use a fully custom matcher function. When custom is set, hash-based matching is skipped entirely:

const recorder = new Recorder({
  cassettePath: './cassettes/test.json',
  mode: 'auto',
  matcherOptions: {
    custom: (request, recorded) => {
      // Match by model + first message only (ignore system prompts, temperature, etc.)
      return request.params.model === recorded.params.model
        && JSON.stringify(request.params.messages?.[0])
           === JSON.stringify(recorded.params.messages?.[0]);
    },
  },
});

Monitoring with Event Hooks

The onRecord, onReplay, and onMismatch hooks give visibility into what the recorder is doing. Useful for debugging, cost tracking, and test diagnostics.

const recorder = new Recorder({
  cassettePath: './cassettes/test.json',
  mode: 'auto',
  onRecord: (interaction) => {
    // Fired when a new interaction is recorded (live API call)
    const { provider, method } = interaction.request;
    const tokens = interaction.response.tokenUsage;
    console.log(`[RECORD] ${provider}/${method} — ${tokens?.prompt ?? 0}+${tokens?.completion ?? 0} tokens`);
  },
  onReplay: (interaction) => {
    // Fired when a recorded interaction is replayed from cassette
    console.log(`[REPLAY] ${interaction.request.method} (${interaction.response.latency}ms saved)`);
  },
  onMismatch: (request) => {
    // Fired when a request has no match in playback mode (before throwing)
    console.warn(`[MISMATCH] No recording for ${request.method} — will throw CassetteMismatchError`);
  },
});

Test Reporting with getSummary()

After a test run, call getSummary() to see what happened:

await recorder.stop();
const summary = recorder.getSummary();
console.log(summary);
// {
//   mode: 'auto',
//   recorded: 2,        ← new API calls made this run
//   replayed: 5,        ← interactions served from cassette
//   cassettePath: './cassettes/test.json',
//   totalInteractions: 7,
//   cassetteInteractions: 5
// }

This is especially useful in CI to verify that no live API calls are being made:

afterAll(() => {
  const summary = recorder.getSummary();
  if (summary.recorded > 0) {
    console.warn(`WARNING: ${summary.recorded} live API call(s) recorded in CI!`);
  }
});

Using FetchAdapter for Non-SDK Providers

The FetchAdapter intercepts globalThis.fetch, making it work with any HTTP-based LLM API -- including providers without official SDKs (Gemini, Cohere, Mistral, local models):

import { Recorder, FetchAdapter } from 'agent-test-recorder';

const recorder = new Recorder({ cassettePath: './cassettes/multi.json', mode: 'auto' });

// Match multiple providers with a single regex
const adapter = new FetchAdapter(recorder, {
  urlPattern: /api\.openai\.com|api\.anthropic\.com|api\.cohere\.ai|localhost:11434/,
});

await recorder.start();
adapter.instrument();

// All matching fetch() calls are now recorded/replayed
const response = await fetch('https://api.cohere.ai/v1/generate', {
  method: 'POST',
  headers: { 'Authorization': 'Bearer ...', 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt: 'Hello', model: 'command' }),
});

adapter.restore();
await recorder.stop();

URLs not matching the pattern pass through to the real fetch unmodified.

Migration Guide

Upgrading Cassettes When the Library Version Changes

When agent-test-recorder updates its cassette format (indicated by meta.version), old cassettes will throw CassetteVersionError on load. There are three ways to handle this:

Option 1: CLI migration (recommended)

# Migrate a single cassette
agent-test-recorder update ./cassettes/my-test.json

# Migrate all cassettes in a directory
agent-test-recorder update --all -d ./tests

This reads each cassette, runs the internal migrateCassette() function, and writes the updated file back.

Option 2: Auto-migration in code

Pass autoMigrate: true to readCassette(). The cassette is upgraded in memory; call writeCassette() to persist:

import { readCassette, writeCassette } from 'agent-test-recorder';

const cassette = await readCassette('./cassettes/old.json', { autoMigrate: true });
await writeCassette('./cassettes/old.json', cassette); // persist the migration

Option 3: Re-record from scratch

If your test environment has access to the live API, the simplest approach is to delete old cassettes and re-record:

rm -rf ./tests/__cassettes__
npm test  # with mode: 'auto', cassettes are recreated on first run

Version History

| Cassette Version | Library Version | Changes |
|---|---|---|
| v1 | 0.1.0+ | Initial format: meta.version, interactions[], matchKey |

Best Practices for Migration

Always commit cassettes to git before migrating. If migration produces unexpected results, you can revert.
Run tests after migration to verify replayed responses still match your assertions.
Review diffs in pull requests. Migration may reformat JSON or add new metadata fields -- this is expected.
Pin your library version in CI until you're ready to migrate: "agent-test-recorder": "0.1.0" (exact version).

Troubleshooting

"Cannot find module 'openai'"
Install the SDK: npm install openai (or @anthropic-ai/sdk for Anthropic).

Cassette not being saved
Make sure you call await recorder.stop() after your test, or use withCassette(), which handles this automatically.

"CassetteMismatchError" in playback mode
Your request has changed since recording. Use mode: 'auto' to record the new request, or mode: 'update' to re-record everything.

ESM vs CommonJS
This package uses ESM ("type": "module"). If your project uses CommonJS, add "type": "module" to your package.json or use the .mjs extension (.mts for TypeScript).

Why is stream in ignoreFields by default?
The stream: true/false parameter changes the request hash but not the response content. Ignoring it means a recording made with streaming works for non-streaming replay too.

License

MIT -- see LICENSE