Agent Testing Library

A powerful, flexible testing framework for AI agents and multi-agent systems with built-in AI Judge verification. Supports both traditional test frameworks (Bun/Vitest) and standalone JSON-based testing.

Philosophy

Test behavior, not implementation: focus on what agents do, not how they do it.

Features

  • 🔌 Adapter System - Works with any agent framework (Apteva, LangChain, CrewAI, etc.)
  • 🤖 AI Judge - AI-based verification for non-deterministic outputs
  • 📋 JSON Test Definitions - Declarative, portable test suites
  • 🚀 Standalone Mode - No test framework required
  • 👥 Multi-Agent Native - First-class support for agent interactions
  • Node & Bun Compatible - Works everywhere
  • 🎯 Developer-Friendly - Familiar, intuitive API
  • 🧪 Rich Assertions - Custom matchers for agent testing

Installation

npm install agent-testing-library
# or
yarn add agent-testing-library
# or
bun add agent-testing-library

Quick Start

Option 1: JSON-Based Testing (Standalone)

Perfect for CI/CD, serverless functions, or simple test automation.

1. Create a JSON test file:

{
  "name": "Agent Test Suite",
  "config": {
    "adapter": {
      "type": "apteva",
      "baseUrl": "http://localhost:3000/agents",
      "apiKey": "agt_...",
      "projectId": "1"
    },
    "aiJudge": {
      "enabled": true
    }
  },
  "agents": [
    {
      "id": "my-agent",
      "type": "create",
      "config": {
        "provider": "anthropic",
        "model": "claude-haiku-4-5",
        "systemPrompt": "You are a helpful assistant."
      }
    }
  ],
  "tests": [
    {
      "name": "should answer questions",
      "agent": "my-agent",
      "message": "What is 2+2?",
      "assertions": [
        {"type": "status", "value": "success"},
        {"type": "contains", "value": "4"},
        {
          "type": "ai-judge",
          "criteria": ["Response correctly answers the math question"],
          "minScore": 0.8
        }
      ]
    }
  ]
}

2. Run it (no test framework needed):

import { runJsonTest } from 'agent-testing-library';

const results = await runJsonTest('./my-test.json');
console.log(results);
// {
//   suite: "Agent Test Suite",
//   passed: true,
//   results: [...]
// }

Or run it straight from the command line:

node -e "import('agent-testing-library').then(({runJsonTest})=>runJsonTest('./test.json'))"

Option 2: Code-Based Testing (Traditional)

Full programmatic control with test frameworks.

import { describe, it, expect, beforeAll, afterAll } from 'bun:test'; // or 'vitest'
import { createAgentTester, AIJudge } from 'agent-testing-library';
import { AptevaAdapter } from 'agent-testing-library/adapters';
import { initMatchers } from 'agent-testing-library/matchers';

const aiJudge = new AIJudge(); // Auto-detects API key from env
const tester = createAgentTester({
  adapter: new AptevaAdapter({
    baseUrl: process.env.APTEVA_URL,
    apiKey: process.env.APTEVA_API_KEY,
    projectId: process.env.APTEVA_PROJECT_ID,
  }),
  aiJudge,
});

initMatchers(aiJudge);

describe('My Agent', () => {
  let agent;

  beforeAll(async () => {
    agent = await tester.createAgent({
      name: 'test-agent',
      config: {
        provider: 'anthropic',
        model: 'claude-haiku-4-5',
        systemPrompt: 'You are a helpful assistant.',
      },
    });
  });

  afterAll(async () => {
    await tester.cleanup();
  });

  it('should respond to greetings', async () => {
    const response = await agent.chat('Hello!');

    expect(response).toHaveStatus('success');
    expect(response.message).toBeDefined();
  });

  it('should provide accurate answers', async () => {
    const response = await agent.chat('What is 2+2?');

    // AI Judge verification
    await expect(response).toMatchAIExpectation({
      criteria: ['Response correctly states that 2+2=4'],
    });
  });
});

JSON Test Format

Basic Structure

{
  "name": "Test Suite Name",
  "description": "Optional description",
  "config": {
    "adapter": {
      "type": "apteva",
      "baseUrl": "${APTEVA_URL}",
      "apiKey": "${APTEVA_API_KEY}",
      "projectId": "${APTEVA_PROJECT_ID}"
    },
    "aiJudge": {
      "enabled": true,
      "provider": "openai",
      "model": "gpt-4"
    }
  },
  "agents": [...],
  "tests": [...]
}

Agent Types

Create New Agent:

{
  "id": "my-agent",
  "type": "create",
  "config": {
    "name": "agent-name",
    "provider": "anthropic",
    "model": "claude-haiku-4-5",
    "systemPrompt": "You are..."
  }
}

Use Existing Agent:

{
  "id": "existing-agent",
  "type": "use",
  "agentId": "agent-abc123"
}

Assertion Types

Status Check:

{"type": "status", "value": "success"}

Content Match:

{"type": "contains", "value": "expected text"}

AI Judge Evaluation:

{
  "type": "ai-judge",
  "criteria": [
    "Response is accurate",
    "Response is helpful"
  ],
  "minScore": 0.8
}

Environment Variables

Use ${VAR_NAME} to inject environment variables:

{
  "config": {
    "adapter": {
      "apiKey": "${APTEVA_API_KEY}"
    }
  }
}

Programmatic API

Standalone Function Testing

import { createAgentTester, AIJudge } from 'agent-testing-library';
import { AptevaAdapter } from 'agent-testing-library/adapters';

async function testMyAgent() {
  const aiJudge = new AIJudge(); // Auto-detects OPENAI_API_KEY or ANTHROPIC_API_KEY

  const tester = createAgentTester({
    adapter: new AptevaAdapter({
      baseUrl: process.env.APTEVA_URL,
      apiKey: process.env.APTEVA_API_KEY,
      projectId: process.env.APTEVA_PROJECT_ID,
    }),
    aiJudge,
  });

  // Create agent
  const agent = await tester.createAgent({
    name: 'my-agent',
    config: {
      provider: 'anthropic',
      model: 'claude-haiku-4-5',
      systemPrompt: 'You are helpful.',
    },
  });

  // Test
  const response = await agent.chat('What is 2+2?');

  // AI Judge evaluation
  const evaluation = await aiJudge.evaluate({
    response: response.message,
    criteria: ['Response correctly answers that 2+2=4'],
    trace: response.trace,
  });

  // Cleanup
  await tester.cleanup();

  return {
    passed: evaluation.passed,
    score: evaluation.score,
    feedback: evaluation.feedback,
  };
}

// Run it
testMyAgent().then(console.log);

Using Existing Agents

// Attach to an existing agent (it won't be deleted during cleanup)
const agent = await tester.useAgent('agent-abc123');

// Or create a persistent agent that survives cleanup
const persistentAgent = await tester.createAgent({
  name: 'persistent-agent',
  skipCleanup: true, // won't be deleted during cleanup
  config: {...},
});

Multi-Agent Testing

JSON format:

{
  "agents": [
    {"id": "researcher", "type": "create", "config": {...}},
    {"id": "writer", "type": "create", "config": {...}}
  ],
  "tests": [
    {
      "name": "agent coordination",
      "agent": "researcher",
      "message": "Research AI safety",
      "store": "research"
    },
    {
      "name": "write summary",
      "agent": "writer",
      "message": "Summarize: ${research.message}",
      "assertions": [...]
    }
  ]
}

Code format:

const researcher = await tester.createAgent({...});
const writer = await tester.createAgent({...});

const research = await researcher.chat('Research AI safety');
const article = await writer.chat(`Write about: ${research.message}`);

AI Judge

AI Judge provides intelligent verification of non-deterministic agent outputs.

Auto-Detection

// Automatically detects OPENAI_API_KEY or ANTHROPIC_API_KEY from environment
const aiJudge = new AIJudge();

Manual Configuration

const aiJudge = new AIJudge({
  provider: 'openai',
  apiKey: process.env.OPENAI_API_KEY,
  model: 'gpt-4o',
});

Evaluation

const evaluation = await aiJudge.evaluate({
  response: 'Agent response...',
  criteria: [
    'Response is accurate',
    'Response is helpful',
    'Response is concise'
  ],
});

console.log(evaluation);
// {
//   passed: true,
//   score: 0.95,
//   feedback: "The response meets all criteria..."
// }

Custom Matchers

import { initMatchers } from 'agent-testing-library/matchers';

initMatchers(aiJudge);

// Status
expect(response).toHaveStatus('success');

// Content
expect(response).toContainMessage('hello');
expect(response).toMatchPattern(/\d+/);

// AI-based
await expect(response).toMatchAIExpectation({
  criteria: 'Response is helpful and accurate',
});

await expect(response).toBeFactuallyAccurate({
  groundTruth: 'Expected facts...',
});

Environment Variables

Create a .env file:

# Apteva (or your adapter)
APTEVA_URL=http://localhost:3000/agents
APTEVA_API_KEY=agt_...
APTEVA_PROJECT_ID=1

# AI Judge (one or both)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

The library automatically loads .env files when using the JSON test runner.

Running Tests

JSON Tests (Standalone)

# Create a runner script
echo 'import("agent-testing-library").then(({runJsonTest})=>runJsonTest("./test.json"))' > run.mjs
node run.mjs

# Or use in your code
import { runJsonTest } from 'agent-testing-library';
const results = await runJsonTest('./test.json');

Code Tests (Framework)

# With Bun
bun test

# With Vitest/Node
npm test

# Specific file
bun test ./tests/my-test.js

Examples

The package includes comprehensive examples:

  • examples/basic-apteva.js - Basic agent testing
  • examples/basic-apteva-tools.js - Agent with MCP tools
  • examples/basic-apteva-multi.js - Multi-agent coordination
  • examples/existing-agent-test.js - Using existing agents
  • examples/run-json-test.js - JSON test runner
  • tests/json/simple-apteva-test.json - JSON test example

API Reference

runJsonTest(jsonPath)

Run a JSON test suite.

Returns: Promise<Object>

{
  suite: string,
  passed: boolean,
  duration: number,
  totalTests: number,
  passedTests: number,
  failedTests: number,
  results: Array<{
    name: string,
    passed: boolean,
    duration: number,
    response: string,
    score?: number,
    feedback?: string,
    assertions: Array<{...}>
  }>
}
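
Since the result object reports per-test outcomes, a short script can gate CI on it. A minimal sketch using only the fields documented above ('./test.json' is a placeholder path):

import { runJsonTest } from 'agent-testing-library';

const results = await runJsonTest('./test.json');

// Surface each failed test by name, with AI Judge feedback when present
for (const test of results.results) {
  if (!test.passed) {
    console.error(`FAIL: ${test.name}` + (test.feedback ? ` (${test.feedback})` : ''));
  }
}

console.log(`${results.passedTests}/${results.totalTests} passed in ${results.duration}ms`);

// A non-zero exit code makes CI mark the job as failed
process.exitCode = results.passed ? 0 : 1;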

createAgentTester(config)

Create a test runner instance.

Options (combined in the sketch below):

  • adapter - Adapter instance (required)
  • aiJudge - AI Judge instance (optional)
  • timeout - Default timeout in ms (default: 30000)
  • cleanup - Auto-cleanup agents (default: true)

Returns: Tester instance with methods:

  • createAgent(config) - Create new agent
  • useAgent(agentId) - Use existing agent
  • cleanup() - Cleanup all agents
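
A minimal sketch wiring the options and methods together, reusing the AptevaAdapter and environment variables from earlier sections (the timeout and cleanup values are only illustrations):

import { createAgentTester, AIJudge } from 'agent-testing-library';
import { AptevaAdapter } from 'agent-testing-library/adapters';

const tester = createAgentTester({
  adapter: new AptevaAdapter({
    baseUrl: process.env.APTEVA_URL,
    apiKey: process.env.APTEVA_API_KEY,
    projectId: process.env.APTEVA_PROJECT_ID,
  }),
  aiJudge: new AIJudge(), // optional; auto-detects API keys from env
  timeout: 60000,         // raise the 30000 ms default
  cleanup: true,          // auto-delete created agents (the default)
});

const agent = await tester.createAgent({
  name: 'reference-example',
  config: {
    provider: 'anthropic',
    model: 'claude-haiku-4-5',
    systemPrompt: 'You are helpful.',
  },
});

await agent.chat('Hello!');
await tester.cleanup();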

AIJudge(config)

Create an AI Judge instance.

Options:

  • provider - 'openai' or 'anthropic' (optional, auto-detected)
  • apiKey - API key (optional, auto-detected from env)
  • model - Model name (optional)

Methods:

  • evaluate({ response, criteria, trace }) - Evaluate response

Adapters

Built-in Adapters

  • AptevaAdapter - For Apteva agent platform

Creating Custom Adapters

import { BaseAgentAdapter } from 'agent-testing-library/adapters';

export class MyAdapter extends BaseAgentAdapter {
  async createAgent(config) {
    // Provision an agent on your platform and return its handle
  }

  async sendMessage(agentId, message, options) {
    // Deliver a message and return the agent's response
  }

  async getAgent(agentId) {
    // Look up an existing agent by id
  }

  async deleteAgent(agentId) {
    // Tear the agent down (called by tester.cleanup())
  }
}
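
To make the contract concrete, below is a hedged sketch of an in-memory echo adapter, useful for trying the library without a real agent platform. The { status, message } response shape mirrors what the built-in matchers check; the id scheme and the return values of createAgent and getAgent are assumptions:

import { BaseAgentAdapter } from 'agent-testing-library/adapters';

export class EchoAdapter extends BaseAgentAdapter {
  agents = new Map(); // agentId -> config

  async createAgent(config) {
    const id = `echo-${this.agents.size + 1}`;
    this.agents.set(id, config);
    return { id, ...config }; // assumed handle shape
  }

  async sendMessage(agentId, message, options) {
    // Mirrors what toHaveStatus('success') and response.message expect
    return { status: 'success', message: `echo: ${message}` };
  }

  async getAgent(agentId) {
    return this.agents.get(agentId);
  }

  async deleteAgent(agentId) {
    this.agents.delete(agentId);
  }
}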

License

MIT

Contributing

Contributions welcome! Please check GitHub issues.


Built with ❤️ for the AI agent community