agent-sentinel

v0.1.1, published 3 months ago

Maintainer: mordecai1010

Test harness for AI agent workflows. pytest for AI agents.

Vulnerabilities: 0 high, 0 medium, 0 low

Keywords: ai, agent, testing, validation, llm, evaluation

Define multi-turn agent conversations as YAML test cases, run them against any HTTP endpoint, and validate responses with matchers (contains, regex, JSON schema, tool calls).

Install

npm install -g agent-sentinel

Quick Start

Create a test file tests/my-agent.yaml:

name: "booking-agent-happy-path"
description: "Agent books a reservation"
agent:
  endpoint: "http://localhost:3000/chat"
  method: "POST"

steps:
  - user: "Book me a table for 2 at Nobu tonight at 8pm"
    expect:
      contains: ["Nobu", "8pm"]
      tool_calls:
        - name: "create_booking"
    max_latency_ms: 5000

  - user: "Actually, make it 4 people"
    expect:
      contains: ["4", "updated"]
      not_contains: ["error"]
      tool_calls:
        - name: "update_booking"
          args_match:
            guests: 4

assertions:
  total_cost_max: 0.05
  total_latency_max_ms: 15000

Run it:

sentinel run tests/my-agent.yaml

Or run all tests in a directory:

sentinel run tests/
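
The `tool_calls` expectation in the test file above, with its `args_match` block, reads like a subset comparison: the expected tool must appear in the response, and every listed argument must match, while extra arguments are ignored. The sketch below illustrates that reading. It is not taken from the agent-sentinel source; the names `ActualToolCall`, `ExpectedToolCall`, `toolCallMatches`, and `missingToolCalls` are hypothetical, and the subset semantics are an assumption.

```typescript
// Hypothetical shapes, assumed from the YAML and the endpoint contract in this README.
interface ActualToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface ExpectedToolCall {
  name: string;
  args_match?: Record<string, unknown>;
}

// True when the names agree and every expected argument equals the actual one.
// Values are compared via JSON.stringify, which is enough for primitives and
// simple arrays, but is sensitive to key order in nested objects.
function toolCallMatches(actual: ActualToolCall, expected: ExpectedToolCall): boolean {
  if (actual.name !== expected.name) return false;
  const wanted = expected.args_match ?? {};
  return Object.keys(wanted).every(
    (key) => JSON.stringify(actual.args[key]) === JSON.stringify(wanted[key]),
  );
}

// Returns the names of expected tool calls that no actual call satisfied.
function missingToolCalls(actual: ActualToolCall[], expected: ExpectedToolCall[]): string[] {
  return expected
    .filter((e) => !actual.some((a) => toolCallMatches(a, e)))
    .map((e) => e.name);
}
```

Under this assumption, a response that calls `update_booking` with `{ guests: 4, id: "b1" }` would satisfy the second step of the example test, because only `guests` is checked.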

CLI Options

sentinel run <path> [options]

Options:
  -e, --endpoint <url>       Override agent endpoint
  -m, --method <method>      Override HTTP method (GET, POST, PUT, PATCH, DELETE)
  -H, --header <key=value>   Add HTTP headers (repeatable)
  -t, --timeout-ms <ms>      Override request timeout

Matchers

| Matcher | Description |
|---------|-------------|
| contains | Response includes all specified strings |
| not_contains | Response excludes all specified strings |
| regex | Response matches regex patterns (with optional flags) |
| schema | Response body validates against JSON schema |
| tool_calls | Agent invoked expected tools with matching args |
| max_latency_ms | Per-step latency threshold |
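
To make the text matchers concrete, here is a minimal sketch of how `contains`, `not_contains`, and `regex` might be evaluated against a response string. It is an illustration, not the package's implementation. The `TextExpectation` type and `checkText` function are hypothetical, matching is assumed to be case-sensitive, and the `{ pattern, flags }` shape for regex entries is a guess at how "optional flags" could be expressed.

```typescript
// Hypothetical expectation shape; the real YAML schema for `regex` may differ.
interface TextExpectation {
  contains?: string[];
  not_contains?: string[];
  regex?: { pattern: string; flags?: string }[];
}

// Returns human-readable failures; an empty array means every matcher passed.
function checkText(response: string, expect: TextExpectation): string[] {
  const failures: string[] = [];

  // contains: every listed string must appear in the response.
  for (const needle of expect.contains ?? []) {
    if (!response.includes(needle)) failures.push(`missing "${needle}"`);
  }

  // not_contains: none of the listed strings may appear.
  for (const needle of expect.not_contains ?? []) {
    if (response.includes(needle)) failures.push(`unexpected "${needle}"`);
  }

  // regex: every pattern must match somewhere in the response.
  for (const { pattern, flags } of expect.regex ?? []) {
    if (!new RegExp(pattern, flags).test(response)) {
      failures.push(`no match for /${pattern}/${flags ?? ""}`);
    }
  }

  return failures;
}
```

Collecting failures instead of stopping at the first one lets a report show everything that went wrong in a step at once.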

Assertions

Test-level assertions are checked after all steps complete:

total_cost_max - Maximum total cost across all steps
total_latency_max_ms - Maximum total latency across all steps
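
As a rough sketch of what these totals imply, the check below sums per-step cost and latency and compares them with the limits. The `StepResult` and `TestAssertions` types and the `checkAssertions` function are hypothetical. Two assumptions are baked in: a step without a reported `cost` counts as zero, and a total equal to the limit still passes.

```typescript
// Hypothetical per-step measurement recorded by the harness.
interface StepResult {
  latencyMs: number;
  cost?: number; // assumed optional; missing cost is treated as 0
}

// Mirrors the `assertions` block of a test file.
interface TestAssertions {
  total_cost_max?: number;
  total_latency_max_ms?: number;
}

// Returns one message per violated limit; an empty array means the test passed.
function checkAssertions(steps: StepResult[], limits: TestAssertions): string[] {
  const totalCost = steps.reduce((sum, s) => sum + (s.cost ?? 0), 0);
  const totalLatency = steps.reduce((sum, s) => sum + s.latencyMs, 0);
  const failures: string[] = [];

  if (limits.total_cost_max !== undefined && totalCost > limits.total_cost_max) {
    failures.push(`total cost ${totalCost} exceeds ${limits.total_cost_max}`);
  }
  if (limits.total_latency_max_ms !== undefined && totalLatency > limits.total_latency_max_ms) {
    failures.push(`total latency ${totalLatency}ms exceeds ${limits.total_latency_max_ms}ms`);
  }
  return failures;
}
```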

Agent Endpoint Contract

Your agent endpoint should accept POST requests with a JSON body of this form:

{
  "messages": [
    { "role": "user", "content": "..." },
    { "role": "assistant", "content": "..." }
  ]
}

It should return a JSON response of this form:

{
  "content": "Agent response text",
  "tool_calls": [{ "name": "tool_name", "args": {} }],
  "cost": 0.001
}
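
The contract above can be written down as types, which may help when building an agent that the harness can call. This is a sketch under assumptions: the type names, `handleChat`, and `nextRequest` are hypothetical, `tool_calls` and `cost` are treated as optional, and the idea that each turn resends the full message history is inferred from the `messages` array rather than confirmed by the README.

```typescript
// Request and response shapes transcribed from the JSON examples above.
interface Message {
  role: "user" | "assistant";
  content: string;
}

interface AgentRequest {
  messages: Message[];
}

interface AgentResponse {
  content: string;
  tool_calls?: { name: string; args: Record<string, unknown> }[]; // assumed optional
  cost?: number; // assumed optional
}

// Toy agent: echoes the latest message. A real agent would call a model here.
function handleChat(req: AgentRequest): AgentResponse {
  const last = req.messages[req.messages.length - 1];
  return { content: `You said: ${last?.content ?? ""}`, tool_calls: [], cost: 0 };
}

// Builds the request for the next turn by appending the agent's reply and the
// next user message to the history (assumed multi-turn behaviour). The previous
// request is left unmodified.
function nextRequest(prev: AgentRequest, reply: AgentResponse, userText: string): AgentRequest {
  return {
    messages: [
      ...prev.messages,
      { role: "assistant", content: reply.content },
      { role: "user", content: userText },
    ],
  };
}
```

`handleChat` is a pure function, so it can be wired into whichever HTTP framework serves the endpoint and unit-tested without a running server.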

Roadmap

[ ] CLI target adapter (test command-line agents)
[ ] SDK target adapter (test agents via code)
[ ] LLM-as-judge evaluator (semantic correctness)
[ ] Regression detection (baseline comparison)
[ ] JSON/JUnit/HTML reporters
[ ] Parallel test execution

License

MIT
