@certaworks/agent-test-harness

v0.1.0

Published

23 days ago

Write repeatable unit and integration tests for agent pipelines, reasoning traces, tools, and consistency.

Downloads

100

Vulnerabilities: 0 High, 0 Medium, 0 Low

blairhall

certaworks mcp ai-agent agent-safety testing test-harness agent-testing

Agent Test Harness

Type: Local SDK / CLI test framework

Value: Helps teams write repeatable unit and integration tests for agent outputs, tool traces, latency, cost reporting, mock tool behavior, and consistency.

Current Status

The local product slice is complete. It is ready to use as a local TypeScript SDK and CLI harness, with deterministic fixtures and CI-friendly report artifacts.

Implemented Local Slice

JSON fixture loading with schema validation and stable error messages
Deterministic mock agent runner with scripted tool calls
Output, JSON-field, tool-call, tool-input, tool-output, tool-sequence, cost, latency, and custom assertions
Consistency runner for repeated output checks, including suite/CLI repeat handling
JSON, Markdown, and JUnit report writers with local run history
agent-test CLI for fixture execution and pass/fail exit codes
Package exports, runtime files, generated declarations, and packed CLI artifact checks
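
The consistency check listed above can be pictured as running the same prompt several times and comparing the outputs. The following is a minimal sketch under that assumption; `AgentFn`, `ConsistencyResult`, `checkConsistency`, and `mockAgent` are hypothetical names for illustration and are not taken from the package's API. The agent is synchronous here only to keep the sketch short.

```typescript
// Hypothetical shapes for illustration; not the package's actual API.
type AgentFn = (prompt: string) => string;

interface ConsistencyResult {
  runs: number;
  distinctOutputs: string[];
  consistent: boolean;
}

// Run the same prompt `repeat` times and report whether every output matched.
function checkConsistency(
  agent: AgentFn,
  prompt: string,
  repeat: number
): ConsistencyResult {
  if (!Number.isInteger(repeat) || repeat < 1) {
    throw new RangeError("repeat must be a positive integer");
  }
  const seen = new Set<string>();
  for (let i = 0; i < repeat; i++) {
    seen.add(agent(prompt));
  }
  return {
    runs: repeat,
    distinctOutputs: Array.from(seen),
    consistent: seen.size === 1,
  };
}

// A deterministic mock agent returns the same output for the same prompt.
const mockAgent: AgentFn = (prompt) => `echo: ${prompt}`;

const result = checkConsistency(mockAgent, "weather in Tulsa", 3);
console.log(result.consistent, result.distinctOutputs.length);
```

Collecting the distinct outputs, rather than returning only a boolean, lets a report show what actually differed when a run is inconsistent.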

CLI

agent-test suite.json --out .agent-test-results --reporter json --reporter markdown --reporter junit

Fixture example:

{
  "tests": [
    {
      "id": "weather-search",
      "description": "uses search tool",
      "prompt": "weather in Tulsa",
      "mockTools": [
        { "name": "search", "response": { "result": "sunny" } }
      ],
      "scriptedToolCalls": [
        { "tool": "search", "input": { "query": "Tulsa weather" } }
      ],
      "assertions": [
        { "type": "tool_called", "value": "search" },
        { "type": "tool_input", "value": "search.query=Tulsa weather" },
        { "type": "tool_output", "value": "search.result=sunny" }
      ]
    }
  ]
}
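
The `tool_input` value in the fixture above reads like `<tool>.<field>=<expected>`. The sketch below shows one way such a string could be parsed and checked against scripted tool calls. It is an assumption about the format based only on this example; `ToolCall`, `parseExpectation`, and `toolInputMatches` are hypothetical names, and the harness itself may parse or compare values differently.

```typescript
// Hypothetical types mirroring the fixture fields above; the real SDK types may differ.
interface ToolCall {
  tool: string;
  input: Record<string, unknown>;
}

interface ParsedExpectation {
  tool: string;
  field: string;
  expected: string;
}

// Split "search.query=Tulsa weather" into tool, field, and expected value.
function parseExpectation(value: string): ParsedExpectation {
  const dot = value.indexOf(".");
  const eq = value.indexOf("=");
  // Require a non-empty tool name, then a non-empty field name before "=".
  if (dot <= 0 || eq <= dot + 1) {
    throw new Error(`Expected "<tool>.<field>=<value>", got "${value}"`);
  }
  return {
    tool: value.slice(0, dot),
    field: value.slice(dot + 1, eq),
    expected: value.slice(eq + 1),
  };
}

// True if any call to the named tool carried the expected input field value.
function toolInputMatches(calls: ToolCall[], value: string): boolean {
  const { tool, field, expected } = parseExpectation(value);
  return calls.some(
    (call) =>
      call.tool === tool &&
      Object.prototype.hasOwnProperty.call(call.input, field) &&
      String(call.input[field]) === expected
  );
}

const scripted: ToolCall[] = [
  { tool: "search", input: { query: "Tulsa weather" } },
];
console.log(toolInputMatches(scripted, "search.query=Tulsa weather"));
```

Comparing through `String(...)` keeps the sketch simple, but it means the number `5` and the string `"5"` are treated as equal; a stricter comparison would need typed expected values.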

Reports

The CLI and SDK can write:

latest.json
latest.md
latest.xml for JUnit-compatible CI readers
history.json for local trend records
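
For readers unfamiliar with JUnit-style XML, the sketch below shows roughly what a JUnit-compatible report contains: a `testsuite` element with counts and one `testcase` per test, with a `failure` child when a test fails. This illustrates the general format only. The exact layout of `latest.xml` is not documented here, and `TestResult`, `escapeXml`, and `toJUnitXml` are hypothetical names.

```typescript
// Hypothetical result shape for illustration; not the package's actual report model.
interface TestResult {
  id: string;
  passed: boolean;
  message?: string;
  durationMs: number;
}

// Escape characters that are not allowed raw inside XML attribute values.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a minimal JUnit-style document: one <testcase> per result.
function toJUnitXml(suiteName: string, results: TestResult[]): string {
  const failures = results.filter((r) => !r.passed).length;
  const cases = results.map((r) => {
    const seconds = (r.durationMs / 1000).toFixed(3);
    const open = `  <testcase name="${escapeXml(r.id)}" time="${seconds}"`;
    if (r.passed) {
      return `${open}/>`;
    }
    const message = escapeXml(r.message ?? "assertion failed");
    return `${open}>\n    <failure message="${message}"/>\n  </testcase>`;
  });
  return [
    `<?xml version="1.0" encoding="UTF-8"?>`,
    `<testsuite name="${escapeXml(suiteName)}" tests="${results.length}" failures="${failures}">`,
    ...cases,
    `</testsuite>`,
  ].join("\n");
}

console.log(
  toJUnitXml("suite.json", [
    { id: "weather-search", passed: true, durationMs: 12 },
  ])
);
```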

Custom assertions are SDK-only because they require a JavaScript function. JSON fixtures reject custom assertions and should use the built-in assertion types instead.
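
The reason a custom assertion cannot live in a JSON fixture is that functions are not serializable: `JSON.stringify` omits function-valued properties. The sketch below illustrates this with a hypothetical custom assertion; `AgentRun`, `AssertionOutcome`, `CustomAssertion`, and `mentionsCondition` are assumed names for illustration, not the SDK's actual types or registration API.

```typescript
// Hypothetical shapes for illustration; the SDK's real types may differ.
interface AgentRun {
  output: string;
  latencyMs: number;
}

interface AssertionOutcome {
  passed: boolean;
  message: string;
}

type CustomAssertion = (run: AgentRun) => AssertionOutcome;

// A custom check that needs real code: does the output name a weather condition?
const mentionsCondition: CustomAssertion = (run) => {
  const ok = /sunny|cloudy|rain/i.test(run.output);
  return {
    passed: ok,
    message: ok
      ? "output names a weather condition"
      : `no weather condition found in "${run.output}"`,
  };
};

const outcome = mentionsCondition({ output: "It is sunny in Tulsa", latencyMs: 40 });
console.log(outcome.passed, outcome.message);

// Function-valued properties are dropped during JSON serialization,
// so the assertion cannot be carried by a fixture file.
console.log(JSON.stringify({ type: "custom", check: mentionsCondition }));
```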

Verification

Fresh suite verification on 2026-05-28:

npm test passed, 33/33 tests
npm run build passed
CLI smoke passed for fixture loading, failed-suite exit code, JSON/Markdown/JUnit report writing, history writing, and --help

Current Limits

No hosted runner, managed CI service, account model, public npm publication, or live checkout is included.
No live provider adapter is bundled; the shipped runner is deterministic and mock-driven.
JUnit output is CI-friendly, but CI integration templates are still on the roadmap.
History is local artifact history, not a hosted analytics dashboard.
