@ariaflowagents/eval
v0.7.0
Deterministic conversation replay and assertions for AriaFlow transcripts
AriaFlow Eval
@ariaflowagents/eval provides deterministic replay and assertions for AriaFlow transcript events.
This package is intentionally separate from @ariaflowagents/core to avoid runtime bloat.
What It Solves
- Validate event contracts without depending on exact LLM wording.
- Replay stored .jsonl transcripts in CI.
- Catch regressions in tool-call integrity and flow behavior.
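To illustrate the kind of structural check involved, here is a minimal self-contained sketch of parsing a JSONL transcript and validating event order. The `type` field and the event shape are assumptions for this example only, not the package's actual schema:

```typescript
// Minimal sketch of a structural transcript check.
// The `type` field and event names are hypothetical, not the real schema.
interface TranscriptEvent {
  type: string;           // e.g. 'input', 'tool-call', 'tool-result', 'done'
  [key: string]: unknown; // other fields are ignored by this structural check
}

// Parse one JSON object per non-empty line (the JSONL convention).
function parseJsonl(text: string): TranscriptEvent[] {
  return text
    .split('\n')
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as TranscriptEvent);
}

// True if `expected` appears as an in-order subsequence of the event types.
function hasEventOrder(events: TranscriptEvent[], expected: string[]): boolean {
  let i = 0;
  for (const e of events) {
    if (i < expected.length && e.type === expected[i]) i++;
  }
  return i === expected.length;
}
```

Because only event types are compared, a check like this passes regardless of what text the model produced inside each event.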
Install
bun add @ariaflowagents/eval
Example
import { TranscriptReplay } from '@ariaflowagents/eval';
const replay = await TranscriptReplay.fromFile('./transcripts/run.jsonl');
replay
.expectEventOrder(['input', 'tool-call', 'tool-result', 'done'])
.expectToolCalled('start_order')
.expectNoToolMismatches()
.expectNoErrors()
.expectDone();
Replay Tests from Stored Transcripts
This package is the intended home for replay tests.
Typical workflow:
- Run production-like examples and store transcript files.
- Commit selected golden transcripts.
- In CI, load those files with TranscriptReplay.
- Assert structural contracts (event order, tool integrity, flow end behavior).
This makes tests stable even when model wording changes.
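To see why structural contracts stay stable, compare two runs whose wording differs but whose event skeleton matches. This is a sketch using a hypothetical `type`/`name` event shape, not the package's real transcript schema:

```typescript
// Two runs with different model wording but identical structure.
// Event field names here are illustrative, not the real schema.
const runA = [
  { type: 'input', text: 'I want to order a pizza' },
  { type: 'tool-call', name: 'start_order' },
  { type: 'done' },
];
const runB = [
  { type: 'input', text: 'Could you start an order for me?' },
  { type: 'tool-call', name: 'start_order' },
  { type: 'done' },
];

// A structural assertion compares only the event types, so both runs
// satisfy the same contract even though the wording changed.
const skeleton = (run: { type: string }[]) => run.map((e) => e.type).join('>');

console.log(skeleton(runA) === skeleton(runB)); // true
```

A wording-based assertion would flake on every model update; the skeleton comparison does not.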
Golden Fixtures
Golden fixtures are committed in fixtures/golden/*.jsonl, with expectations in fixtures/golden.manifest.json.
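As a rough picture of how a manifest can drive the suite, the sketch below pairs each fixture path with its expected structure and iterates over the entries. The field names (`eventOrder`, `toolsCalled`) are hypothetical, purely illustrative; the real schema is whatever fixtures/golden.manifest.json actually contains:

```typescript
// Hypothetical manifest shape — illustrative only; see
// fixtures/golden.manifest.json for the real schema.
const manifest: Record<string, { eventOrder: string[]; toolsCalled: string[] }> = {
  'fixtures/golden/start-order.jsonl': {
    eventOrder: ['input', 'tool-call', 'tool-result', 'done'],
    toolsCalled: ['start_order'],
  },
};

for (const [fixture, expected] of Object.entries(manifest)) {
  // In a real suite, each fixture would be replayed and asserted here,
  // e.g. via TranscriptReplay.fromFile(fixture).
  console.log(fixture, expected.eventOrder.length, expected.toolsCalled.length);
}
```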
Run the suite:
bun run --filter '@ariaflowagents/eval' test:golden
Workspace shortcut from repo root:
bun run test:golden