agentic-test
v3.0.0
A lightweight, Jest-like testing framework for AI agent workflows. Test tool calls, outputs, latency, and costs with a familiar describe/test API.
Why agentic-test?
Existing tools for evaluating LLMs are heavy platforms (LangSmith), Python-only (DeepEval), or focused on prompt comparison (promptfoo). None of them gives you a simple, lightweight npm package for testing multi-step AI agent workflows.
agentic-test fills that gap:
- 🎯 Jest-like API: describe, test, beforeAll, afterAll (you already know how to use it)
- 🔧 Agent-specific assertions: toolWasCalled(), toolCallOrder(), outputSemanticallyMatches()
- ⚡ Zero-config: works with any agent via a simple adapter pattern
- 🎭 Mock provider: deterministic testing without LLM API calls
- 📊 Cost tracking: monitor token usage and estimated costs
- 🔄 CI/CD ready: JUnit XML, GitHub Actions annotations, JSON reporters
- 📦 Tiny footprint: only 2 runtime dependencies
Installation
npm install agentic-test --save-dev
Quick Start
1. Create a test file
Create __agent_tests__/my-agent.agent.test.ts:
import { describe, test, createMockAgent } from 'agentic-test';
import {
  outputContains,
  toolWasCalled,
  toolCalledWith,
  completedWithin,
  toolCallOrder,
} from 'agentic-test/assertions';

// Mock agent for demo (replace with your real agent)
const agent = createMockAgent()
  .on('weather')
  .respondWith({
    output: 'The weather in New York is sunny, 72°F.',
    toolCalls: [
      { name: 'getWeather', arguments: { city: 'New York' }, result: { temp: 72 } },
    ],
    tokens: 50,
  })
  .on('*')
  .respondWith({ output: "I don't understand." })
  .build();

describe('Weather Agent', { adapter: agent }, () => {
  test('fetches weather for a city', {
    input: 'What is the weather in New York?',
    assertions: [
      toolWasCalled('getWeather'),
      toolCalledWith('getWeather', { city: 'New York' }),
      outputContains('sunny'),
      completedWithin(5000),
    ],
  });

  test('handles unknown requests', {
    input: 'Tell me a joke',
    assertions: [
      outputContains("don't understand"),
    ],
  });
});
2. Connect your real agent
Replace the mock with your actual agent using createAdapter():
import { createAdapter } from 'agentic-test';

// myLangChainAgent is your own agent instance; it is not provided by agentic-test
const agent = createAdapter(async (input) => {
  const result = await myLangChainAgent.invoke({ input });
  return {
    output: result.output,
    toolCalls: result.intermediateSteps.map(step => ({
      name: step.action.tool,
      arguments: step.action.toolInput,
      result: step.observation,
    })),
    tokens: result.llmOutput?.tokenUsage?.totalTokens ?? 0,
    duration: 0, // auto-measured
  };
});
3. Run tests
npx agentic-test run
Output:
🤖 agentic-test v0.1.0
Running 1 test suite(s)...
Weather Agent
✓ fetches weather for a city (15ms) [50 tokens] [1 tool calls]
✓ handles unknown requests (12ms)
──────────────────────────────────────────────────
PASS 2 passed (2 total)
Duration: 0.03s | Tokens: 50 | Suites: 1
Assertions
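Each entry in a test's assertions array checks one property of the agent's response, as in the Quick Start. One way to picture this is as a named predicate over the response. The sketch below is only a mental model: AgentResponse, Assertion, containsText, calledTool, calledInOrder, and failedAssertions are hypothetical names, not the package's API, and treating tool order as a subsequence match is an assumption.

```typescript
// Hypothetical shapes for illustration; the real agentic-test types may differ.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
  result?: unknown;
}

interface AgentResponse {
  output: string;
  toolCalls: ToolCall[];
  tokens: number;
  duration: number; // milliseconds
}

// An assertion is modeled here as a named predicate over the response.
interface Assertion {
  name: string;
  check: (response: AgentResponse) => boolean;
}

const containsText = (text: string): Assertion => ({
  name: `output contains "${text}"`,
  check: (r) => r.output.includes(text),
});

const calledTool = (tool: string): Assertion => ({
  name: `tool "${tool}" was called`,
  check: (r) => r.toolCalls.some((c) => c.name === tool),
});

// Assumption: the expected names must appear as a subsequence of the actual calls.
const calledInOrder = (expected: string[]): Assertion => ({
  name: `tools called in order: ${expected.join(' -> ')}`,
  check: (r) => {
    let next = 0;
    for (const call of r.toolCalls) {
      if (next < expected.length && call.name === expected[next]) next++;
    }
    return next === expected.length;
  },
});

// Returns the names of the assertions that did not hold for this response.
function failedAssertions(response: AgentResponse, assertions: Assertion[]): string[] {
  return assertions.filter((a) => !a.check(response)).map((a) => a.name);
}

const sample: AgentResponse = {
  output: 'The weather in New York is sunny, 72°F.',
  toolCalls: [{ name: 'getWeather', arguments: { city: 'New York' }, result: { temp: 72 } }],
  tokens: 50,
  duration: 15,
};

const failures = failedAssertions(sample, [
  containsText('sunny'),
  calledTool('getWeather'),
  calledInOrder(['getWeather', 'sendEmail']), // sendEmail never ran, so this one fails
]);
console.log(failures);
```

Modeling assertions as plain data with a check function is what lets a runner report every failed expectation by name instead of stopping at the first one.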
Output Assertions
import {
  outputContains,           // substring check
  outputDoesNotContain,     // negation
  outputMatches,            // regex match
  outputStartsWith,         // prefix check
  outputEndsWith,           // suffix check
  outputContainsIgnoreCase, // case-insensitive
  outputMinLength,          // minimum length
  outputMaxLength,          // maximum length
  outputIsNotEmpty,         // non-empty check
} from 'agentic-test/assertions';
Tool Call Assertions
import {
  toolWasCalled,      // verify tool was invoked
  toolNotCalled,      // verify tool was NOT invoked
  toolCalledWith,     // verify tool args (partial match)
  toolCallOrder,      // verify execution sequence
  toolCalledTimes,    // verify call count
  totalToolCalls,     // verify total call count
  toolReturnedResult, // verify tool result
} from 'agentic-test/assertions';
Performance Assertions
import {
  completedWithin, // latency budget (ms)
  tokensBudget,    // max tokens
  maxToolCalls,    // prevent infinite loops
  minToolCalls,    // ensure minimum work
  costBudget,      // estimated cost ($)
} from 'agentic-test/assertions';
Semantic Assertions
import {
  outputSemanticallyMatches, // word-overlap similarity
  noHallucination,           // groundedness check
  custom,                    // user-defined assertion
} from 'agentic-test/assertions';
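The comment above describes outputSemanticallyMatches as word-overlap similarity. The exact metric the package uses is not stated here, so the following is only a sketch of one common choice, Jaccard similarity over word sets. The names tokenize and wordOverlap are hypothetical and are not part of agentic-test.

```typescript
// Lowercase the text and collect its distinct words, ignoring punctuation.
function tokenize(text: string): Set<string> {
  const words: string[] = text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  return new Set<string>(words);
}

// Jaccard similarity: shared words divided by all distinct words in either text.
// Returns a value between 0 (no overlap) and 1 (identical word sets).
function wordOverlap(a: string, b: string): number {
  const setA = tokenize(a);
  const setB = tokenize(b);
  if (setA.size === 0 && setB.size === 0) return 1; // two empty texts count as identical
  let shared = 0;
  setA.forEach((word) => {
    if (setB.has(word)) shared++;
  });
  return shared / (setA.size + setB.size - shared);
}

// 2 shared words ("is", "sunny") out of 6 distinct words overall.
const score = wordOverlap('The weather is sunny', 'It is sunny today');
console.log(score.toFixed(2));
```

A measure like this is cheap and deterministic, but it only sees shared vocabulary: a correct paraphrase that uses different words scores low, so any threshold is best tuned against real agent outputs.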
// Custom assertion example
custom('output is valid JSON', (response) => {
  try { JSON.parse(response.output); return true; }
  catch { return false; }
});
Mock Agent
Test without LLM API calls using the mock provider:
import { createMockAgent } from 'agentic-test';

const mock = createMockAgent()
  .on('book a flight') // string pattern (substring match)
  .respondWith({
    output: 'Flight booked!',
    toolCalls: [{ name: 'bookFlight', arguments: { from: 'NYC' } }],
    tokens: 100,
  })
  .on(/order #\d+/) // regex pattern
  .respondWith({ output: 'Order found' })
  .on('*') // default fallback
  .respondWith({ output: 'Unknown request' })
  .build();
Reporters
| Reporter | Use Case |
|---|---|
| console (default) | Pretty terminal output with colors |
| json | Machine-readable JSON |
| junit | CI/CD pipelines (Jenkins, CircleCI) |
| github-actions | Inline PR annotations |
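For the junit row, it may help to see roughly what such a report contains. The following is a sketch of a JUnit-style XML serializer, not agentic-test's own reporter: TestResult, escapeXml, and toJUnitXml are hypothetical names, and the attributes the package actually emits may differ.

```typescript
// Hypothetical per-test result; the package's real result type may differ.
interface TestResult {
  name: string;
  durationMs: number;
  failure?: string; // failure message, absent when the test passed
}

// Escape the characters that are not allowed inside XML attribute values.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Serialize one suite into the testsuites/testsuite/testcase layout CI tools read.
function toJUnitXml(suite: string, results: TestResult[]): string {
  const failures = results.filter((r) => r.failure !== undefined).length;
  const totalSeconds = results.reduce((sum, r) => sum + r.durationMs, 0) / 1000;
  const cases = results.map((r) => {
    const open =
      `    <testcase name="${escapeXml(r.name)}" classname="${escapeXml(suite)}"` +
      ` time="${(r.durationMs / 1000).toFixed(3)}"`;
    return r.failure === undefined
      ? `${open}/>`
      : `${open}>\n      <failure message="${escapeXml(r.failure)}"/>\n    </testcase>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<testsuites>',
    `  <testsuite name="${escapeXml(suite)}" tests="${results.length}"` +
      ` failures="${failures}" time="${totalSeconds.toFixed(3)}">`,
    ...cases,
    '  </testsuite>',
    '</testsuites>',
  ].join('\n');
}

const xml = toJUnitXml('Weather Agent', [
  { name: 'fetches weather for a city', durationMs: 15 },
  { name: 'handles <unknown> requests', durationMs: 12, failure: 'expected output to contain "sorry"' },
]);
console.log(xml);
```

CI servers typically pick such a file up from a configured path, which is why the junit and json reporters are paired with an --output flag in the commands that follow.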
npx agentic-test run --reporter json --output results.json
npx agentic-test run --reporter junit --output results.xml
npx agentic-test run --reporter github-actions
CLI Reference
npx agentic-test run # run all *.agent.test.ts files
npx agentic-test run --filter weather # filter by name
npx agentic-test run --reporter json # change reporter
npx agentic-test run --retries 3 # retry flaky tests
npx agentic-test run --timeout 10000 # set timeout (ms)
npx agentic-test init # scaffold example test
Lifecycle Hooks
import { describe, test, beforeAll, afterAll, beforeEach, afterEach } from 'agentic-test';

describe('My Agent', { adapter: agent }, () => {
  beforeAll(async () => { /* setup once */ });
  afterAll(async () => { /* cleanup */ });
  beforeEach(async () => { /* before each test */ });
  afterEach(async () => { /* after each test */ });

  test('...', { input: '...', assertions: [...] });
});
Skip & Only
import { describe, test } from 'agentic-test';
// Skip
describe.skip('Skipped Suite', { adapter }, () => { ... });
test.skip('skipped test', { ... });
// Focus
describe.only('Only this suite', { adapter }, () => { ... });
test.only('only this test', { ... });
Programmatic Usage
import { AgenticTestRunner, ConsoleReporter, describe, test, getRegisteredSuites } from 'agentic-test';
// Register suites...
describe('...', { adapter }, () => { ... });
// Run programmatically
const runner = new AgenticTestRunner();
runner.addReporter(new ConsoleReporter());
const result = await runner.run(getRegisteredSuites());
console.log(`${result.passed}/${result.totalTests} tests passed`);
Roadmap
- [ ] Multi-agent orchestration testing (CrewAI, AutoGen)
- [ ] Streaming response evaluation
- [ ] Snapshot testing (record/replay agent runs)
- [ ] Statistical mode (run N times, assert on distributions)
- [ ] Built-in OpenAI and LangChain adapter packages
- [ ] Web dashboard for test results visualization
- [ ] VS Code extension
Contributing
Contributions are welcome! Please open an issue or PR on GitHub.
License
MIT © Satyam Das
