Agent Testing Library
A powerful, flexible testing framework for AI agents and multi-agent systems with built-in AI Judge verification. Supports both traditional test frameworks (Bun/Vitest) and standalone JSON-based testing.
Philosophy
Test behavior, not implementation - Focus on what agents do, not how they do it.
Features
- 🔌 Adapter System - Works with any agent framework (Apteva, LangChain, CrewAI, etc.)
- 🤖 AI Judge - AI-based verification for non-deterministic outputs
- 📋 JSON Test Definitions - Declarative, portable test suites
- 🚀 Standalone Mode - No test framework required
- 👥 Multi-Agent Native - First-class support for agent interactions
- ⚡ Node & Bun Compatible - Works everywhere
- 🎯 Developer-Friendly - Familiar, intuitive API
- 🧪 Rich Assertions - Custom matchers for agent testing
Installation
npm install agent-testing-library
# or
yarn add agent-testing-library
# or
bun add agent-testing-library
Quick Start
Option 1: JSON-Based Testing (Standalone)
Perfect for CI/CD, serverless functions, or simple test automation.
1. Create a JSON test file:
{
"name": "Agent Test Suite",
"config": {
"adapter": {
"type": "apteva",
"baseUrl": "http://localhost:3000/agents",
"apiKey": "agt_...",
"projectId": "1"
},
"aiJudge": {
"enabled": true
}
},
"agents": [
{
"id": "my-agent",
"type": "create",
"config": {
"provider": "anthropic",
"model": "claude-haiku-4-5",
"systemPrompt": "You are a helpful assistant."
}
}
],
"tests": [
{
"name": "should answer questions",
"agent": "my-agent",
"message": "What is 2+2?",
"assertions": [
{"type": "status", "value": "success"},
{"type": "contains", "value": "4"},
{
"type": "ai-judge",
"criteria": ["Response correctly answers the math question"],
"minScore": 0.8
}
]
}
]
}
2. Run it (no test framework needed):
import { runJsonTest } from 'agent-testing-library';
const results = await runJsonTest('./my-test.json');
console.log(results);
// {
// suite: "Agent Test Suite",
// passed: true,
// results: [...]
// }
Or use the CLI runner:
node -e "import('agent-testing-library').then(({runJsonTest})=>runJsonTest('./test.json'))"
Option 2: Code-Based Testing (Traditional)
Full programmatic control with test frameworks.
import { describe, it, expect, beforeAll, afterAll } from 'bun:test'; // or 'vitest'
import { createAgentTester, AIJudge } from 'agent-testing-library';
import { AptevaAdapter } from 'agent-testing-library/adapters';
import { initMatchers } from 'agent-testing-library/matchers';
const aiJudge = new AIJudge(); // Auto-detects API key from env
const tester = createAgentTester({
adapter: new AptevaAdapter({
baseUrl: process.env.APTEVA_URL,
apiKey: process.env.APTEVA_API_KEY,
projectId: process.env.APTEVA_PROJECT_ID,
}),
aiJudge,
});
initMatchers(aiJudge);
describe('My Agent', () => {
let agent;
beforeAll(async () => {
agent = await tester.createAgent({
name: 'test-agent',
config: {
provider: 'anthropic',
model: 'claude-haiku-4-5',
systemPrompt: 'You are a helpful assistant.',
},
});
});
afterAll(async () => {
await tester.cleanup();
});
it('should respond to greetings', async () => {
const response = await agent.chat('Hello!');
expect(response).toHaveStatus('success');
expect(response.message).toBeDefined();
});
it('should provide accurate answers', async () => {
const response = await agent.chat('What is 2+2?');
// AI Judge verification
await expect(response).toMatchAIExpectation({
criteria: ['Response correctly states that 2+2=4'],
});
});
});
JSON Test Format
Basic Structure
{
"name": "Test Suite Name",
"description": "Optional description",
"config": {
"adapter": {
"type": "apteva",
"baseUrl": "${APTEVA_URL}",
"apiKey": "${APTEVA_API_KEY}",
"projectId": "${APTEVA_PROJECT_ID}"
},
"aiJudge": {
"enabled": true,
"provider": "openai",
"model": "gpt-4"
}
},
"agents": [...],
"tests": [...]
}
Agent Types
Create New Agent:
{
"id": "my-agent",
"type": "create",
"config": {
"name": "agent-name",
"provider": "anthropic",
"model": "claude-haiku-4-5",
"systemPrompt": "You are..."
}
}
Use Existing Agent:
{
"id": "existing-agent",
"type": "use",
"agentId": "agent-abc123"
}
Assertion Types
Status Check:
{"type": "status", "value": "success"}Content Match:
{"type": "contains", "value": "expected text"}AI Judge Evaluation:
{
"type": "ai-judge",
"criteria": [
"Response is accurate",
"Response is helpful"
],
"minScore": 0.8
}
Environment Variables
Use ${VAR_NAME} to inject environment variables:
{
"config": {
"adapter": {
"apiKey": "${APTEVA_API_KEY}"
}
}
}
Programmatic API
Standalone Function Testing
import { createAgentTester, AIJudge } from 'agent-testing-library';
import { AptevaAdapter } from 'agent-testing-library/adapters';
async function testMyAgent() {
const aiJudge = new AIJudge(); // Auto-detects OPENAI_API_KEY or ANTHROPIC_API_KEY
const tester = createAgentTester({
adapter: new AptevaAdapter({
baseUrl: process.env.APTEVA_URL,
apiKey: process.env.APTEVA_API_KEY,
projectId: process.env.APTEVA_PROJECT_ID,
}),
aiJudge,
});
// Create agent
const agent = await tester.createAgent({
name: 'my-agent',
config: {
provider: 'anthropic',
model: 'claude-haiku-4-5',
systemPrompt: 'You are helpful.',
},
});
// Test
const response = await agent.chat('What is 2+2?');
// AI Judge evaluation
const evaluation = await aiJudge.evaluate({
response: response.message,
criteria: ['Response correctly answers that 2+2=4'],
trace: response.trace,
});
// Cleanup
await tester.cleanup();
return {
passed: evaluation.passed,
score: evaluation.score,
feedback: evaluation.feedback,
};
}
// Run it
testMyAgent().then(console.log);
Using Existing Agents
// Attach to existing agent (won't be deleted)
const agent = await tester.useAgent('agent-abc123');
// Create persistent agent
const persistentAgent = await tester.createAgent({
name: 'persistent-agent',
skipCleanup: true, // Won't be deleted during cleanup
config: {...},
});
Multi-Agent Testing
// JSON format
{
"agents": [
{"id": "researcher", "type": "create", "config": {...}},
{"id": "writer", "type": "create", "config": {...}}
],
"tests": [
{
"name": "agent coordination",
"agent": "researcher",
"message": "Research AI safety",
"store": "research"
},
{
"name": "write summary",
"agent": "writer",
"message": "Summarize: ${research.message}",
"assertions": [...]
}
]
}
In the JSON format, store saves a test's response under that key so later tests can interpolate it (here, ${research.message}).
// Code format
const researcher = await tester.createAgent({...});
const writer = await tester.createAgent({...});
const research = await researcher.chat('Research AI safety');
const article = await writer.chat(`Write about: ${research.message}`);
AI Judge
AI Judge provides intelligent verification of non-deterministic agent outputs.
Auto-Detection
// Automatically detects OPENAI_API_KEY or ANTHROPIC_API_KEY from environment
const aiJudge = new AIJudge();
Manual Configuration
const aiJudge = new AIJudge({
provider: 'openai',
apiKey: process.env.OPENAI_API_KEY,
model: 'gpt-4o',
});
Evaluation
const evaluation = await aiJudge.evaluate({
response: 'Agent response...',
criteria: [
'Response is accurate',
'Response is helpful',
'Response is concise'
],
});
console.log(evaluation);
// {
// passed: true,
// score: 0.95,
// feedback: "The response meets all criteria..."
// }
Custom Matchers
import { initMatchers } from 'agent-testing-library/matchers';
initMatchers(aiJudge);
// Status
expect(response).toHaveStatus('success');
// Content
expect(response).toContainMessage('hello');
expect(response).toMatchPattern(/\d+/);
// AI-based
await expect(response).toMatchAIExpectation({
criteria: 'Response is helpful and accurate',
});
await expect(response).toBeFactuallyAccurate({
groundTruth: 'Expected facts...',
});
Environment Variables
Create a .env file:
# Apteva (or your adapter)
APTEVA_URL=http://localhost:3000/agents
APTEVA_API_KEY=agt_...
APTEVA_PROJECT_ID=1
# AI Judge (one or both)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
The library automatically loads .env files when using the JSON test runner.
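For code-based tests, you can have Node itself load the file (Node 20.6+) or preload dotenv; neither is part of this library (run.mjs here is the runner script created in Running Tests below):
# Node 20.6+ reads .env natively
node --env-file=.env run.mjs
# Older Node versions: preload dotenv (npm install dotenv)
node -r dotenv/config run.mjs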
Running Tests
JSON Tests (Standalone)
# Create a runner script
echo 'import("agent-testing-library").then(({runJsonTest})=>runJsonTest("./test.json"))' > run.mjs
node run.mjs
# Or use in your code
import { runJsonTest } from 'agent-testing-library';
const results = await runJsonTest('./test.json');
Code Tests (Framework)
# With Bun
bun test
# With Vitest/Node
npm test
# Specific file
bun test ./tests/my-test.js
Examples
The package includes comprehensive examples:
- examples/basic-apteva.js - Basic agent testing
- examples/basic-apteva-tools.js - Agent with MCP tools
- examples/basic-apteva-multi.js - Multi-agent coordination
- examples/existing-agent-test.js - Using existing agents
- examples/run-json-test.js - JSON test runner
- tests/json/simple-apteva-test.json - JSON test example
API Reference
runJsonTest(jsonPath)
Run a JSON test suite.
Returns: Promise<Object>
{
suite: string,
passed: boolean,
duration: number,
totalTests: number,
passedTests: number,
failedTests: number,
results: Array<{
name: string,
passed: boolean,
duration: number,
response: string,
score?: number,
feedback?: string,
assertions: Array<{...}>
}>
}
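In CI, the returned fields can gate the job; a minimal sketch using the shape above:
import { runJsonTest } from 'agent-testing-library';

const report = await runJsonTest('./test.json');
for (const r of report.results.filter((t) => !t.passed)) {
  console.error(`FAIL ${r.name}: ${r.feedback ?? 'assertion failed'}`);
}
process.exitCode = report.passed ? 0 : 1; // non-zero exit fails the CI step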
createAgentTester(config)
Create a test runner instance.
Options:
- adapter - Adapter instance (required)
- aiJudge - AI Judge instance (optional)
- timeout - Default timeout in ms (default: 30000)
- cleanup - Auto-cleanup agents (default: true)
Returns: Tester instance with methods:
- createAgent(config) - Create new agent
- useAgent(agentId) - Use existing agent
- cleanup() - Cleanup all agents
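For example, a tester that tolerates slow agents and keeps them around after the run (a sketch; adapter config elided):
import { createAgentTester } from 'agent-testing-library';
import { AptevaAdapter } from 'agent-testing-library/adapters';

const tester = createAgentTester({
  adapter: new AptevaAdapter({ /* baseUrl, apiKey, projectId */ }),
  timeout: 60000,  // allow up to 60s per call (default 30000)
  cleanup: false,  // don't delete agents automatically
});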
AIJudge(config)
Create an AI Judge instance.
Options:
- provider - 'openai' or 'anthropic' (optional, auto-detected)
- apiKey - API key (optional, auto-detected from env)
- model - Model name (optional)
Methods:
- evaluate({ response, criteria, trace }) - Evaluate response
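For instance, acting on a failed evaluation (reply comes from a prior agent.chat call; fields as documented above):
const verdict = await aiJudge.evaluate({
  response: reply.message,
  criteria: ['Answer states the correct result'],
});
if (!verdict.passed) {
  console.error(`AI Judge score ${verdict.score}: ${verdict.feedback}`);
}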
Adapters
Built-in Adapters
- AptevaAdapter - For Apteva agent platform
Creating Custom Adapters
import { BaseAgentAdapter } from 'agent-testing-library/adapters';
export class MyAdapter extends BaseAgentAdapter {
  // Create the agent on your platform and return its record;
  // later calls receive the agent's id.
  async createAgent(config) {
    // Implement
  }
  // Send a message and resolve with the reply. The examples in this
  // README read { status, message, trace } off the response.
  async sendMessage(agentId, message, options) {
    // Implement
  }
  // Look up an existing agent by id.
  async getAgent(agentId) {
    // Implement
  }
  // Delete the agent; invoked during tester.cleanup().
  async deleteAgent(agentId) {
    // Implement
  }
}
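The custom adapter then plugs into the tester the same way as the built-in one (a sketch; MyAdapter's constructor options are whatever your subclass accepts):
import { createAgentTester } from 'agent-testing-library';
import { MyAdapter } from './my-adapter.js';

const tester = createAgentTester({
  adapter: new MyAdapter({ baseUrl: 'http://localhost:8080' }), // hypothetical option
});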
License
MIT
Contributing
Contributions welcome! Please check GitHub issues.
Built with ❤️ for the AI agent community
