agent-sentinel
v0.1.1
Published
Test harness for AI agent workflows. pytest for AI agents.
Downloads
165
Maintainers
Readme
agent-sentinel
Test harness for AI agent workflows. pytest for AI agents.
Define multi-turn agent conversations as YAML test cases, run them against any HTTP endpoint, and validate responses with matchers (contains, regex, JSON schema, tool calls).
Install
npm install -g agent-sentinelQuick Start
Create a test file tests/my-agent.yaml:
name: "booking-agent-happy-path"
description: "Agent books a reservation"
agent:
endpoint: "http://localhost:3000/chat"
method: "POST"
steps:
- user: "Book me a table for 2 at Nobu tonight at 8pm"
expect:
contains: ["Nobu", "8pm"]
tool_calls:
- name: "create_booking"
max_latency_ms: 5000
- user: "Actually, make it 4 people"
expect:
contains: ["4", "updated"]
not_contains: ["error"]
tool_calls:
- name: "update_booking"
args_match:
guests: 4
assertions:
total_cost_max: 0.05
total_latency_max_ms: 15000Run it:
sentinel run tests/my-agent.yamlOr run all tests in a directory:
sentinel run tests/CLI Options
sentinel run <path> [options]
Options:
-e, --endpoint <url> Override agent endpoint
-m, --method <method> Override HTTP method (GET, POST, PUT, PATCH, DELETE)
-H, --header <key=value> Add HTTP headers (repeatable)
-t, --timeout-ms <ms> Override request timeoutMatchers
| Matcher | Description |
|---------|-------------|
| contains | Response includes all specified strings |
| not_contains | Response excludes all specified strings |
| regex | Response matches regex patterns (with optional flags) |
| schema | Response body validates against JSON schema |
| tool_calls | Agent invoked expected tools with matching args |
| max_latency_ms | Per-step latency threshold |
Assertions
Test-level assertions checked after all steps complete:
total_cost_max- Maximum total cost across all stepstotal_latency_max_ms- Maximum total latency across all steps
Agent Endpoint Contract
Your agent endpoint should accept POST requests with:
{
"messages": [
{ "role": "user", "content": "..." },
{ "role": "assistant", "content": "..." }
]
}And return:
{
"content": "Agent response text",
"tool_calls": [{ "name": "tool_name", "args": {} }],
"cost": 0.001
}Roadmap
- [ ] CLI target adapter (test command-line agents)
- [ ] SDK target adapter (test agents via code)
- [ ] LLM-as-judge evaluator (semantic correctness)
- [ ] Regression detection (baseline comparison)
- [ ] JSON/JUnit/HTML reporters
- [ ] Parallel test execution
License
MIT
