@kuralle-agents/eval

v0.6.1

Published 13 hours ago

Deterministic conversation replay and assertions for Kuralle transcripts

Deterministic replay and assertions for Kuralle transcript events — kept separate from @kuralle-agents/core to avoid test-only bloat in production bundles.

Install

npm install @kuralle-agents/eval

No peer dependencies required.

What it does

Load stored .jsonl transcripts, replay the event stream, and assert structural contracts — event order, tool-call integrity, flow end behavior — without depending on exact LLM wording.
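
To make the idea of a structural contract concrete, here is a minimal, self-contained sketch of an event-order check over JSONL text. It does not use this package. The event shape (a `type` field per line), the helper names `parseJsonl` and `hasEventOrder`, and the subsequence matching rule are all assumptions made for illustration; the real transcript schema and the matching rule used by TranscriptReplay may differ.

```typescript
// Hypothetical event shape: the real Kuralle transcript schema may differ.
interface TranscriptEvent {
  type: string;
  [key: string]: unknown;
}

// Parse JSONL text: one JSON object per non-empty line.
function parseJsonl(text: string): TranscriptEvent[] {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as TranscriptEvent);
}

// True when `expected` appears in `events` as an ordered subsequence of types.
// Events of other types in between (for example model text) are ignored.
function hasEventOrder(events: TranscriptEvent[], expected: string[]): boolean {
  let next = 0;
  for (const event of events) {
    if (next < expected.length && event.type === expected[next]) {
      next++;
    }
  }
  return next === expected.length;
}

// Illustrative transcript; the wording of the text events is irrelevant to the check.
const sampleJsonl = [
  '{"type":"input","text":"I want to order a pizza"}',
  '{"type":"tool-call","tool":"start_order"}',
  '{"type":"tool-result","tool":"start_order"}',
  '{"type":"text","text":"Sure, starting your order."}',
  '{"type":"done"}',
].join('\n');

const sampleEvents = parseJsonl(sampleJsonl);
const inOrder = hasEventOrder(sampleEvents, ['input', 'tool-call', 'tool-result', 'done']);
const reversed = hasEventOrder(sampleEvents, ['done', 'input']);
```

Because only the `type` fields are compared, rewording either `text` value leaves `inOrder` unchanged, which is the property that keeps such checks stable across model updates.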

Key exports:

TranscriptReplay — fluent assertion API over a loaded transcript.
ReplayAssertionError — thrown when a TranscriptReplay assertion fails.
readTranscriptFile / readTranscriptDirectory / listTranscriptFiles — I/O helpers.
loadGoldenManifest / runGoldenSuite — golden-fixture suite runner.
registerScorer / getScorer / listScorers — pluggable scoring registry.
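
The scoring registry is described only by its export names here, so the following is a generic sketch of the pluggable-registry pattern, not the package's implementation. The `ScorerRegistry` class, the `Scorer` signature, and the `tool-call-ratio` scorer are hypothetical; the actual signatures of registerScorer, getScorer, and listScorers may differ.

```typescript
// Hypothetical scorer signature: takes transcript events, returns a number in [0, 1].
type Scorer = (events: ReadonlyArray<{ type: string }>) => number;

// Hypothetical registry keyed by scorer id; not the package's own registry.
class ScorerRegistry {
  private readonly scorers = new Map<string, Scorer>();

  // Reject duplicate ids so one plugin cannot silently replace another.
  register(id: string, scorer: Scorer): void {
    if (this.scorers.has(id)) {
      throw new Error(`Scorer "${id}" is already registered`);
    }
    this.scorers.set(id, scorer);
  }

  get(id: string): Scorer | undefined {
    return this.scorers.get(id);
  }

  // Sorted ids, so listings are deterministic.
  list(): string[] {
    return Array.from(this.scorers.keys()).sort();
  }
}

const scorerRegistry = new ScorerRegistry();

// Example scorer: fraction of events that are tool calls.
scorerRegistry.register('tool-call-ratio', (events) =>
  events.length === 0
    ? 0
    : events.filter((e) => e.type === 'tool-call').length / events.length,
);

const toolCallRatio = scorerRegistry.get('tool-call-ratio')?.([
  { type: 'input' },
  { type: 'tool-call' },
  { type: 'tool-result' },
  { type: 'done' },
]);
```

Looking scorers up by id keeps the caller decoupled from any particular scoring function, which is the usual motivation for a registry of this kind.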

Usage

import { TranscriptReplay } from '@kuralle-agents/eval';

const replay = await TranscriptReplay.fromFile('./transcripts/order-flow.jsonl');

replay
  .expectEventOrder(['input', 'tool-call', 'tool-result', 'done'])
  .expectToolCalled('start_order')
  .expectNoToolMismatches()
  .expectNoErrors()
  .expectDone();
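
The README does not define what counts as a tool mismatch. One plausible reading is that every tool call should be paired with a tool result carrying the same identifier. The sketch below illustrates that reading only; the `toolCallId` field and the `findUnmatchedToolCalls` helper are hypothetical and are not part of this package.

```typescript
// Hypothetical event shape; `toolCallId` is an assumed field name.
interface ToolEvent {
  type: string;
  toolCallId?: string;
}

// Returns a description of every call without a result and every result without a call.
function findUnmatchedToolCalls(events: ToolEvent[]): string[] {
  const pending = new Set<string>();
  const problems: string[] = [];

  for (const event of events) {
    if (event.toolCallId === undefined) {
      continue; // not a tool event under this assumed schema
    }
    if (event.type === 'tool-call') {
      pending.add(event.toolCallId);
    } else if (event.type === 'tool-result') {
      // Set.delete returns false when the id was never recorded as a call.
      if (!pending.delete(event.toolCallId)) {
        problems.push(`result without call: ${event.toolCallId}`);
      }
    }
  }

  // Anything still pending never received a result.
  pending.forEach((id) => {
    problems.push(`call without result: ${id}`);
  });
  return problems;
}

const pairedEvents: ToolEvent[] = [
  { type: 'input' },
  { type: 'tool-call', toolCallId: 'a' },
  { type: 'tool-result', toolCallId: 'a' },
  { type: 'done' },
];

const brokenEvents: ToolEvent[] = [
  { type: 'tool-call', toolCallId: 'a' },
  { type: 'tool-result', toolCallId: 'b' },
];

const pairedProblems = findUnmatchedToolCalls(pairedEvents);
const brokenProblems = findUnmatchedToolCalls(brokenEvents);
```

Returning a list of problems rather than throwing on the first one makes it easier to report every mismatch in a single test failure.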

Golden fixtures

Commit golden transcripts as .jsonl files and run the suite in CI:

import { loadGoldenManifest, runGoldenSuite } from '@kuralle-agents/eval';

const manifest = await loadGoldenManifest('./fixtures/golden.manifest.json');
const results = await runGoldenSuite(manifest);

Typical workflow:

Run production-like examples; store transcript files.
Commit selected golden transcripts under fixtures/.
In CI, load with TranscriptReplay and assert structural contracts.

Tests stay stable even when model wording changes.
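
The workflow above amounts to running one structural check per committed fixture and reporting which ones failed. The sketch below shows that loop in isolation. The `GoldenCase` and `GoldenResult` shapes and the `runCases` helper are hypothetical and say nothing about the manifest format or the return type of runGoldenSuite.

```typescript
// Hypothetical shapes; not the package's manifest or result types.
interface GoldenCase {
  name: string;
  check: () => void; // throws when a structural contract is violated
}

interface GoldenResult {
  name: string;
  passed: boolean;
  message?: string;
}

// Run every case and collect failures instead of stopping at the first one.
function runCases(cases: GoldenCase[]): GoldenResult[] {
  return cases.map((goldenCase): GoldenResult => {
    try {
      goldenCase.check();
      return { name: goldenCase.name, passed: true };
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      return { name: goldenCase.name, passed: false, message };
    }
  });
}

// Two illustrative cases: one passing, one violating a contract.
const suiteResults = runCases([
  { name: 'order-flow', check: () => {} },
  {
    name: 'refund-flow',
    check: () => {
      throw new Error('expected a "done" event');
    },
  },
]);

const failedNames = suiteResults.filter((r) => !r.passed).map((r) => r.name);
```

In a CI script, a non-empty `failedNames` would typically translate into a non-zero exit code so the pipeline fails when any golden transcript breaks its contract.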

Related

@kuralle-agents/core: runtime that produces the transcript events this package replays.
