@certaworks/agent-test-harness
v0.1.0
Write repeatable unit and integration tests for agent pipelines, reasoning traces, tools, and consistency.
Agent Test Harness
Type: Local SDK / CLI test framework
Value: Helps teams write repeatable unit and integration tests for agent outputs, tool traces, latency, cost reporting, mock tool behavior, and consistency.
Current Status
Complete as a local product slice. It is ready to use as a local TypeScript SDK and CLI harness, with deterministic fixtures and CI-friendly report artifacts.
Implemented Local Slice
- JSON fixture loading with schema validation and stable error messages
- Deterministic mock agent runner with scripted tool calls
- Output, JSON-field, tool-call, tool-input, tool-output, tool-sequence, cost, latency, and custom assertions
- Consistency runner for repeated output checks, including suite/CLI repeat handling
- JSON, Markdown, and JUnit report writers with local run history
- agent-test CLI for fixture execution and pass/fail exit codes
- Package exports, runtime files, generated declarations, and packed CLI artifact checks
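The consistency runner in the list above can be pictured with a small sketch: call the same agent several times with the same prompt and check that every output matches. The names here (`AgentFn`, `checkConsistency`, `mockAgent`) are hypothetical and are not the package's actual exports. The agent is synchronous only to keep the example short.

```typescript
// Hypothetical types for illustration; not the package's actual API.
type AgentFn = (prompt: string) => string;

interface ConsistencyResult {
  runs: number;
  distinctOutputs: string[];
  consistent: boolean;
}

// Run the agent `repeat` times and report whether every output matched.
function checkConsistency(
  agent: AgentFn,
  prompt: string,
  repeat: number,
): ConsistencyResult {
  if (!Number.isInteger(repeat) || repeat < 1) {
    throw new RangeError("repeat must be a positive integer");
  }
  const seen = new Set<string>();
  for (let i = 0; i < repeat; i++) {
    seen.add(agent(prompt));
  }
  return {
    runs: repeat,
    distinctOutputs: Array.from(seen),
    consistent: seen.size === 1,
  };
}

// A deterministic mock agent: the same prompt always yields the same text.
const mockAgent: AgentFn = (prompt) => `echo: ${prompt}`;

const consistencyResult = checkConsistency(mockAgent, "weather in Tulsa", 3);
console.log(consistencyResult.consistent);
```

Collecting outputs in a `Set` keeps the check simple: one distinct output means the runs agreed, and the distinct values themselves are available for a failure report.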
CLI
agent-test suite.json --out .agent-test-results --reporter json --reporter markdown --reporter junit
Fixture example:
{
  "tests": [
    {
      "id": "weather-search",
      "description": "uses search tool",
      "prompt": "weather in Tulsa",
      "mockTools": [
        { "name": "search", "response": { "result": "sunny" } }
      ],
      "scriptedToolCalls": [
        { "tool": "search", "input": { "query": "Tulsa weather" } }
      ],
      "assertions": [
        { "type": "tool_called", "value": "search" },
        { "type": "tool_input", "value": "search.query=Tulsa weather" },
        { "type": "tool_output", "value": "search.result=sunny" }
      ]
    }
  ]
}
Reports
The CLI and SDK can write:
- latest.json
- latest.md
- latest.xml for JUnit-compatible CI readers
- history.json for local trend records
The custom assertion type is SDK-only because it requires a JavaScript function. JSON fixtures reject custom assertions and should use the built-in assertion types instead.
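To make the built-in tool assertions from the fixture example more concrete, here is a minimal sketch of how values such as search.query=Tulsa weather might be checked against a recorded tool trace. It assumes the value format is `<tool>.<field>=<expected>` and that comparison is by string equality; both are inferred from the fixture and not confirmed behavior. `ToolCall`, `ToolAssertion`, `parseFieldSpec`, and `evaluateAssertion` are hypothetical names, not SDK exports.

```typescript
// Hypothetical shapes modeled on the fixture fields; the real SDK types may differ.
interface ToolCall {
  tool: string;
  input: Record<string, unknown>;
  output?: Record<string, unknown>;
}

interface ToolAssertion {
  type: "tool_called" | "tool_input" | "tool_output";
  value: string;
}

// Split "search.query=Tulsa weather" into tool, field, and expected text.
// Assumption: the first "." separates tool from field, the first "=" starts the value.
function parseFieldSpec(value: string): { tool: string; field: string; expected: string } {
  const dot = value.indexOf(".");
  const eq = value.indexOf("=");
  if (dot < 1 || eq < dot + 2) {
    throw new Error(`expected "<tool>.<field>=<value>", got "${value}"`);
  }
  return {
    tool: value.slice(0, dot),
    field: value.slice(dot + 1, eq),
    expected: value.slice(eq + 1),
  };
}

// Return true if at least one recorded call satisfies the assertion.
function evaluateAssertion(assertion: ToolAssertion, calls: ToolCall[]): boolean {
  if (assertion.type === "tool_called") {
    return calls.some((c) => c.tool === assertion.value);
  }
  const { tool, field, expected } = parseFieldSpec(assertion.value);
  return calls.some((c) => {
    if (c.tool !== tool) return false;
    const source = assertion.type === "tool_input" ? c.input : c.output;
    return source !== undefined && field in source && String(source[field]) === expected;
  });
}

// Trace matching the fixture: one scripted search call with its mocked response.
const toolTrace: ToolCall[] = [
  { tool: "search", input: { query: "Tulsa weather" }, output: { result: "sunny" } },
];

const fixtureAssertions: ToolAssertion[] = [
  { type: "tool_called", value: "search" },
  { type: "tool_input", value: "search.query=Tulsa weather" },
  { type: "tool_output", value: "search.result=sunny" },
];

const allPassed = fixtureAssertions.every((a) => evaluateAssertion(a, toolTrace));
console.log(allPassed);
```

Keeping assertion values as plain strings is what lets them live in JSON fixtures; anything that needs arbitrary logic falls to the SDK-only custom assertion.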
Verification
Fresh suite verification on 2026-05-28:
- npm test passed, 33/33 tests
- npm run build passed
- CLI smoke passed for fixture loading, failed-suite exit code, JSON/Markdown/JUnit report writing, history writing, and --help
Current Limits
- No hosted runner, managed CI service, account model, public npm publication, or live checkout is included.
- No live provider adapter is bundled; the shipped runner is deterministic and mock-driven.
- JUnit output is CI-friendly, but CI integration templates are still on the roadmap.
- History is local artifact history, not a hosted analytics dashboard.
