Agent Testing Library

A powerful, flexible testing framework for AI agents and multi-agent systems with built-in AI Judge verification. Supports both traditional test frameworks (Bun/Vitest) and standalone JSON-based testing.

Philosophy

Test behavior, not implementation: focus on what agents do, not how they do it.

Features

  • 🔌 Adapter System - Works with any agent framework (Apteva, LangChain, CrewAI, etc.)
  • 🤖 AI Judge - AI-based verification for non-deterministic outputs
  • 📋 JSON Test Definitions - Declarative, portable test suites
  • 🚀 Standalone Mode - No test framework required
  • 👥 Multi-Agent Native - First-class support for agent interactions
  • Node & Bun Compatible - Works everywhere
  • 🎯 Developer-Friendly - Familiar, intuitive API
  • 🧪 Rich Assertions - Custom matchers for agent testing

Installation

npm install agent-testing-library
# or
yarn add agent-testing-library
# or
bun add agent-testing-library

Quick Start

Option 1: JSON-Based Testing (Standalone)

Perfect for CI/CD, serverless functions, or simple test automation.

1. Create a JSON test file:

{
  "name": "Agent Test Suite",
  "config": {
    "adapter": {
      "type": "apteva",
      "baseUrl": "http://localhost:3000/agents",
      "apiKey": "agt_...",
      "projectId": "1"
    },
    "aiJudge": {
      "enabled": true
    }
  },
  "agents": [
    {
      "id": "my-agent",
      "type": "create",
      "config": {
        "provider": "anthropic",
        "model": "claude-haiku-4-5",
        "systemPrompt": "You are a helpful assistant."
      }
    }
  ],
  "tests": [
    {
      "name": "should answer questions",
      "agent": "my-agent",
      "message": "What is 2+2?",
      "assertions": [
        {"type": "status", "value": "success"},
        {"type": "contains", "value": "4"},
        {
          "type": "ai-judge",
          "criteria": ["Response correctly answers the math question"],
          "minScore": 0.8
        }
      ]
    }
  ]
}

2. Run it (no test framework needed):

import { runJsonTest } from 'agent-testing-library';

const results = await runJsonTest('./my-test.json');
console.log(results);
// {
//   suite: "Agent Test Suite",
//   passed: true,
//   results: [...]
// }

Or run it straight from the command line:

node -e "import('agent-testing-library').then(({runJsonTest})=>runJsonTest('./test.json'))"

Option 2: Code-Based Testing (Traditional)

Full programmatic control with test frameworks.

import { describe, it, expect, beforeAll, afterAll } from 'bun:test'; // or 'vitest'
import { createAgentTester, AIJudge } from 'agent-testing-library';
import { AptevaAdapter } from 'agent-testing-library/adapters';
import { initMatchers } from 'agent-testing-library/matchers';

const aiJudge = new AIJudge(); // Auto-detects API key from env
const tester = createAgentTester({
  adapter: new AptevaAdapter({
    baseUrl: process.env.APTEVA_URL,
    apiKey: process.env.APTEVA_API_KEY,
    projectId: process.env.APTEVA_PROJECT_ID,
  }),
  aiJudge,
});

initMatchers(aiJudge);

describe('My Agent', () => {
  let agent;

  beforeAll(async () => {
    agent = await tester.createAgent({
      name: 'test-agent',
      config: {
        provider: 'anthropic',
        model: 'claude-haiku-4-5',
        systemPrompt: 'You are a helpful assistant.',
      },
    });
  });

  afterAll(async () => {
    await tester.cleanup();
  });

  it('should respond to greetings', async () => {
    const response = await agent.chat('Hello!');

    expect(response).toHaveStatus('success');
    expect(response.message).toBeDefined();
  });

  it('should provide accurate answers', async () => {
    const response = await agent.chat('What is 2+2?');

    // AI Judge verification
    await expect(response).toMatchAIExpectation({
      criteria: ['Response correctly states that 2+2=4'],
    });
  });
});

JSON Test Format

Basic Structure

{
  "name": "Test Suite Name",
  "description": "Optional description",
  "config": {
    "adapter": {
      "type": "apteva",
      "baseUrl": "${APTEVA_URL}",
      "apiKey": "${APTEVA_API_KEY}",
      "projectId": "${APTEVA_PROJECT_ID}"
    },
    "aiJudge": {
      "enabled": true,
      "provider": "openai",
      "model": "gpt-4"
    }
  },
  "agents": [...],
  "tests": [...]
}

Agent Types

Create New Agent:

{
  "id": "my-agent",
  "type": "create",
  "config": {
    "name": "agent-name",
    "provider": "anthropic",
    "model": "claude-haiku-4-5",
    "systemPrompt": "You are..."
  }
}

Use Existing Agent:

{
  "id": "existing-agent",
  "type": "use",
  "agentId": "agent-abc123"
}

Assertion Types

Status Check:

{"type": "status", "value": "success"}

Content Match:

{"type": "contains", "value": "expected text"}

AI Judge Evaluation:

{
  "type": "ai-judge",
  "criteria": [
    "Response is accurate",
    "Response is helpful"
  ],
  "minScore": 0.8
}

Environment Variables

Use ${VAR_NAME} to inject environment variables:

{
  "config": {
    "adapter": {
      "apiKey": "${APTEVA_API_KEY}"
    }
  }
}

Programmatic API

Standalone Function Testing

import { createAgentTester, AIJudge } from 'agent-testing-library';
import { AptevaAdapter } from 'agent-testing-library/adapters';

async function testMyAgent() {
  const aiJudge = new AIJudge(); // Auto-detects OPENAI_API_KEY or ANTHROPIC_API_KEY

  const tester = createAgentTester({
    adapter: new AptevaAdapter({
      baseUrl: process.env.APTEVA_URL,
      apiKey: process.env.APTEVA_API_KEY,
      projectId: process.env.APTEVA_PROJECT_ID,
    }),
    aiJudge,
  });

  // Create agent
  const agent = await tester.createAgent({
    name: 'my-agent',
    config: {
      provider: 'anthropic',
      model: 'claude-haiku-4-5',
      systemPrompt: 'You are helpful.',
    },
  });

  // Test
  const response = await agent.chat('What is 2+2?');

  // AI Judge evaluation
  const evaluation = await aiJudge.evaluate({
    response: response.message,
    criteria: ['Response correctly answers that 2+2=4'],
    trace: response.trace,
  });

  // Cleanup
  await tester.cleanup();

  return {
    passed: evaluation.passed,
    score: evaluation.score,
    feedback: evaluation.feedback,
  };
}

// Run it
testMyAgent().then(console.log);

Using Existing Agents

// Attach to an existing agent (it won't be deleted during cleanup)
const agent = await tester.useAgent('agent-abc123');

// Or create a persistent agent that survives cleanup
const persistentAgent = await tester.createAgent({
  name: 'persistent-agent',
  skipCleanup: true, // won't be deleted during cleanup
  config: {...},
});

Multi-Agent Testing

JSON format:

{
  "agents": [
    {"id": "researcher", "type": "create", "config": {...}},
    {"id": "writer", "type": "create", "config": {...}}
  ],
  "tests": [
    {
      "name": "agent coordination",
      "agent": "researcher",
      "message": "Research AI safety",
      "store": "research"
    },
    {
      "name": "write summary",
      "agent": "writer",
      "message": "Summarize: ${research.message}",
      "assertions": [...]
    }
  ]
}

Code format:

const researcher = await tester.createAgent({...});
const writer = await tester.createAgent({...});

const research = await researcher.chat('Research AI safety');
const article = await writer.chat(`Write about: ${research.message}`);

AI Judge

AI Judge provides intelligent verification of non-deterministic agent outputs.

Auto-Detection

// Automatically detects OPENAI_API_KEY or ANTHROPIC_API_KEY from environment
const aiJudge = new AIJudge();

Manual Configuration

const aiJudge = new AIJudge({
  provider: 'openai',
  apiKey: process.env.OPENAI_API_KEY,
  model: 'gpt-4o',
});

Evaluation

const evaluation = await aiJudge.evaluate({
  response: 'Agent response...',
  criteria: [
    'Response is accurate',
    'Response is helpful',
    'Response is concise'
  ],
});

console.log(evaluation);
// {
//   passed: true,
//   score: 0.95,
//   feedback: "The response meets all criteria..."
// }

Custom Matchers

import { initMatchers } from 'agent-testing-library/matchers';

initMatchers(aiJudge);

// Status
expect(response).toHaveStatus('success');

// Content
expect(response).toContainMessage('hello');
expect(response).toMatchPattern(/\d+/);

// AI-based
await expect(response).toMatchAIExpectation({
  criteria: 'Response is helpful and accurate',
});

await expect(response).toBeFactuallyAccurate({
  groundTruth: 'Expected facts...',
});

Environment Variables

Create a .env file:

# Apteva (or your adapter)
APTEVA_URL=http://localhost:3000/agents
APTEVA_API_KEY=agt_...
APTEVA_PROJECT_ID=1

# AI Judge (one or both)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

The library automatically loads .env files when using the JSON test runner.

Running Tests

JSON Tests (Standalone)

# Create a runner script
echo 'import("agent-testing-library").then(({runJsonTest})=>runJsonTest("./test.json"))' > run.mjs
node run.mjs

# Or use in your code
import { runJsonTest } from 'agent-testing-library';
const results = await runJsonTest('./test.json');

Code Tests (Framework)

# With Bun
bun test

# With Vitest/Node
npm test

# Specific file
bun test ./tests/my-test.js

Examples

The package includes comprehensive examples:

  • examples/basic-apteva.js - Basic agent testing
  • examples/basic-apteva-tools.js - Agent with MCP tools
  • examples/basic-apteva-multi.js - Multi-agent coordination
  • examples/existing-agent-test.js - Using existing agents
  • examples/run-json-test.js - JSON test runner
  • tests/json/simple-apteva-test.json - JSON test example

API Reference

runJsonTest(jsonPath)

Run a JSON test suite.

Returns: Promise<Object>

{
  suite: string,
  passed: boolean,
  duration: number,
  totalTests: number,
  passedTests: number,
  failedTests: number,
  results: Array<{
    name: string,
    passed: boolean,
    duration: number,
    response: string,
    score?: number,
    feedback?: string,
    assertions: Array<{...}>
  }>
}
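
Since the result object reports per-test outcomes, a short script can gate CI on it. A minimal sketch using only the fields documented above ('./test.json' is a placeholder path):

import { runJsonTest } from 'agent-testing-library';

const results = await runJsonTest('./test.json');

// Surface each failed test by name, with AI Judge feedback when present
for (const test of results.results) {
  if (!test.passed) {
    console.error(`FAIL: ${test.name}` + (test.feedback ? ` (${test.feedback})` : ''));
  }
}

console.log(`${results.passedTests}/${results.totalTests} passed in ${results.duration}ms`);

// A non-zero exit code makes CI mark the job as failed
process.exitCode = results.passed ? 0 : 1;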

createAgentTester(config)

Create a test runner instance.

Options (combined in the sketch below):

  • adapter - Adapter instance (required)
  • aiJudge - AI Judge instance (optional)
  • timeout - Default timeout in ms (default: 30000)
  • cleanup - Auto-cleanup agents (default: true)

Returns: Tester instance with methods:

  • createAgent(config) - Create new agent
  • useAgent(agentId) - Use existing agent
  • cleanup() - Cleanup all agents
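
A minimal sketch wiring the options and methods together, reusing the AptevaAdapter and environment variables from earlier sections (the timeout and cleanup values are only illustrations):

import { createAgentTester, AIJudge } from 'agent-testing-library';
import { AptevaAdapter } from 'agent-testing-library/adapters';

const tester = createAgentTester({
  adapter: new AptevaAdapter({
    baseUrl: process.env.APTEVA_URL,
    apiKey: process.env.APTEVA_API_KEY,
    projectId: process.env.APTEVA_PROJECT_ID,
  }),
  aiJudge: new AIJudge(), // optional; auto-detects API keys from env
  timeout: 60000,         // raise the 30000 ms default
  cleanup: true,          // auto-delete created agents (the default)
});

const agent = await tester.createAgent({
  name: 'reference-example',
  config: {
    provider: 'anthropic',
    model: 'claude-haiku-4-5',
    systemPrompt: 'You are helpful.',
  },
});

await agent.chat('Hello!');
await tester.cleanup();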

AIJudge(config)

Create an AI Judge instance.

Options:

  • provider - 'openai' or 'anthropic' (optional, auto-detected)
  • apiKey - API key (optional, auto-detected from env)
  • model - Model name (optional)

Methods:

  • evaluate({ response, criteria, trace }) - Evaluate response

Adapters

Built-in Adapters

  • AptevaAdapter - For Apteva agent platform

Creating Custom Adapters

import { BaseAgentAdapter } from 'agent-testing-library/adapters';

export class MyAdapter extends BaseAgentAdapter {
  async createAgent(config) {
    // Provision an agent on your platform and return its handle
  }

  async sendMessage(agentId, message, options) {
    // Deliver a message and return the agent's response
  }

  async getAgent(agentId) {
    // Look up an existing agent by id
  }

  async deleteAgent(agentId) {
    // Tear the agent down (called by tester.cleanup())
  }
}
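
To make the contract concrete, below is a hedged sketch of an in-memory echo adapter, useful for trying the library without a real agent platform. The { status, message } response shape mirrors what the built-in matchers check; the id scheme and the return values of createAgent and getAgent are assumptions:

import { BaseAgentAdapter } from 'agent-testing-library/adapters';

export class EchoAdapter extends BaseAgentAdapter {
  agents = new Map(); // agentId -> config

  async createAgent(config) {
    const id = `echo-${this.agents.size + 1}`;
    this.agents.set(id, config);
    return { id, ...config }; // assumed handle shape
  }

  async sendMessage(agentId, message, options) {
    // Mirrors what toHaveStatus('success') and response.message expect
    return { status: 'success', message: `echo: ${message}` };
  }

  async getAgent(agentId) {
    return this.agents.get(agentId);
  }

  async deleteAgent(agentId) {
    this.agents.delete(agentId);
  }
}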

License

MIT

Contributing

Contributions welcome! Please check GitHub issues.


Built with ❤️ for the AI agent community