
@elsium-ai/testing

v0.6.0 · 1,220 downloads · License: MIT

Testing utilities, mock providers, fixtures, and eval framework for ElsiumAI.

Install

npm install @elsium-ai/testing --save-dev

What's Inside

| Category | Exports | Description |
|---|---|---|
| Mock Provider | mockProvider, MockProvider, MockProviderOptions, MockResponseConfig | Zero-latency LLM provider for unit tests |
| Fixtures | createFixture, loadFixture, createRecorder, Fixture, FixtureEntry, FixtureRecorder | Record, save, and replay request/response pairs |
| Eval | runEvalSuite, formatEvalReport, EvalCase, EvalCriterion, EvalResult, CriterionResult, EvalSuiteConfig, EvalSuiteResult, LLMJudge | Evaluation framework with built-in and custom criteria |
| Snapshot | createSnapshotStore, hashOutput, testSnapshot, PromptSnapshot, SnapshotStore, SnapshotTestResult | Hash-based snapshot testing for LLM outputs |
| Prompts | createPromptRegistry, definePrompt, PromptDefinition, PromptDiff, DiffLine, PromptRegistry | Versioned prompt registry with diff and rendering |
| Regression | createRegressionSuite, RegressionBaseline, RegressionResult, RegressionDetail, RegressionSuite | Baseline-driven regression detection |
| Replay | createReplayRecorder, createReplayPlayer, ReplayEntry, ReplayRecorder, ReplayPlayer | Record and replay raw LLM completion calls |
| Pinning | createPinStore, pinOutput, Pin, PinStore, PinResult | Pin expected outputs and detect drift |
| Determinism | assertDeterministic, assertStable, DeterminismResult, StabilityResult | Verify output consistency across repeated runs |


Mock Provider

Create a mock LLMProvider that returns pre-configured responses without making real API calls.

MockResponseConfig

interface MockResponseConfig {
  content?: string
  toolCalls?: Array<{
    id?: string
    name: string
    arguments: Record<string, unknown>
  }>
  stopReason?: 'end_turn' | 'tool_use' | 'max_tokens' | 'stop_sequence'
  usage?: Partial<TokenUsage>
  model?: string
  delay?: number
}

MockProviderOptions

interface MockProviderOptions {
  responses?: MockResponseConfig[]
  defaultResponse?: MockResponseConfig
  onRequest?: (request: CompletionRequest) => void
}

| Field | Description |
|---|---|
| responses | Ordered list of responses returned sequentially per call |
| defaultResponse | Fallback response used when responses is exhausted |
| onRequest | Callback invoked on every request (useful for assertions) |

MockProvider

interface MockProvider extends LLMProvider {
  readonly calls: CompletionRequest[]
  readonly callCount: number
  reset(): void
}

Extends the standard LLMProvider interface with inspection helpers. calls stores every CompletionRequest received, callCount returns the total, and reset() clears both the call log and the response index.

mockProvider()

Creates a mock provider instance.

function mockProvider(options?: MockProviderOptions): MockProvider

| Parameter | Type | Default | Description |
|---|---|---|---|
| options | MockProviderOptions | {} | Configuration for responses and callbacks |

Returns: MockProvider

import { mockProvider } from '@elsium-ai/testing'

const mock = mockProvider({
  responses: [
    { content: 'Hello!' },
    { content: 'Goodbye!', stopReason: 'end_turn' },
  ],
  defaultResponse: { content: 'Default reply' },
  onRequest: (req) => console.log('Model:', req.model),
})

const first = await mock.complete({ messages: [{ role: 'user', content: 'Hi' }] })
// first.message.content === 'Hello!'

console.log(mock.callCount) // 1
mock.reset()
console.log(mock.callCount) // 0

Fixtures

Capture request/response pairs as reusable fixtures that can be serialized to JSON and replayed as mock providers.

FixtureEntry

interface FixtureEntry {
  request: {
    messages: Array<{ role: string; content: string }>
    model?: string
    system?: string
  }
  response: MockResponseConfig
  timestamp?: string
}

Fixture

interface Fixture {
  readonly name: string
  readonly entries: FixtureEntry[]
  toProvider(options?: { matching?: 'sequential' | 'request-hash' }): MockProvider
  toJSON(): string
}

| Method | Description |
|---|---|
| toProvider() | Converts the fixture into a MockProvider. Pass { matching: 'request-hash' } to match responses by message content hash instead of sequential order. |
| toJSON() | Serializes the fixture (with timestamps) to a JSON string. |

createFixture()

Creates a fixture from a name and an array of entries.

function createFixture(name: string, entries: FixtureEntry[]): Fixture

| Parameter | Type | Description |
|---|---|---|
| name | string | Human-readable fixture name |
| entries | FixtureEntry[] | Array of request/response pairs |

Returns: Fixture

import { createFixture } from '@elsium-ai/testing'

const fixture = createFixture('greeting-flow', [
  {
    request: { messages: [{ role: 'user', content: 'Hello' }] },
    response: { content: 'Hi there!' },
  },
])

const provider = fixture.toProvider()
const res = await provider.complete({
  messages: [{ role: 'user', content: 'Hello' }],
})
// res.message.content === 'Hi there!'
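The 'request-hash' matching mode can be pictured as a lookup table keyed by a hash of the messages, so responses match even when calls arrive out of order. A standalone sketch of that idea (illustrative only; the package's internal hash inputs may differ):

```typescript
import { createHash } from 'node:crypto'

type Msg = { role: string; content: string }

// Hash the serialized messages to form a lookup key (illustrative only).
const hashRequest = (messages: Msg[]): string =>
  createHash('sha256').update(JSON.stringify(messages)).digest('hex')

const responses = new Map<string, string>()
responses.set(hashRequest([{ role: 'user', content: 'Hello' }]), 'Hi there!')

// The same messages always map to the same response, regardless of call order.
const lookup = (messages: Msg[]): string | undefined =>
  responses.get(hashRequest(messages))

console.log(lookup([{ role: 'user', content: 'Hello' }])) // 'Hi there!'
```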

loadFixture()

Deserializes a JSON string back into a Fixture.

function loadFixture(json: string): Fixture

| Parameter | Type | Description |
|---|---|---|
| json | string | JSON string previously produced by fixture.toJSON() |

Returns: Fixture

import { createFixture, loadFixture } from '@elsium-ai/testing'

const original = createFixture('test', [
  {
    request: { messages: [{ role: 'user', content: 'ping' }] },
    response: { content: 'pong' },
  },
])

const json = original.toJSON()
const restored = loadFixture(json)
// restored.name === 'test'

FixtureRecorder

interface FixtureRecorder {
  wrap(provider: MockProvider): MockProvider
  getEntries(): FixtureEntry[]
  toFixture(name: string): Fixture
  clear(): void
}

createRecorder()

Creates a recorder that intercepts complete() calls and captures request/response pairs.

function createRecorder(): FixtureRecorder

Returns: FixtureRecorder

import { mockProvider, createRecorder } from '@elsium-ai/testing'

const recorder = createRecorder()
const mock = mockProvider({ responses: [{ content: 'recorded response' }] })
const wrapped = recorder.wrap(mock)

await wrapped.complete({
  messages: [{ role: 'user', content: 'capture this' }],
})

const fixture = recorder.toFixture('my-fixture')
console.log(fixture.entries.length) // 1

Eval

A structured evaluation framework for assessing LLM outputs against configurable criteria.

EvalCase

interface EvalCase {
  name: string
  input: string
  expected?: string
  criteria?: EvalCriterion[]
  tags?: string[]
}

EvalCriterion

A discriminated union of all supported criterion types:

type EvalCriterion =
  | { type: 'contains'; value: string; caseSensitive?: boolean }
  | { type: 'not_contains'; value: string; caseSensitive?: boolean }
  | { type: 'matches'; pattern: string; flags?: string }
  | { type: 'length_min'; value: number }
  | { type: 'length_max'; value: number }
  | { type: 'json_valid' }
  | { type: 'json_matches'; schema: Record<string, unknown> }
  | { type: 'custom'; name: string; fn: (output: string) => boolean }
  | { type: 'llm_judge'; prompt: string; judge: LLMJudge; threshold?: number }
  | { type: 'semantic_similarity'; reference: string; threshold?: number }
  | { type: 'factual_accuracy'; facts: string[]; threshold?: number }

| Criterion | Description |
|---|---|
| contains | Output must contain value (case-insensitive by default) |
| not_contains | Output must not contain value |
| matches | Output must match the regex pattern |
| length_min | Output length must be at least value characters |
| length_max | Output length must be at most value characters |
| json_valid | Output must be valid JSON |
| json_matches | Output must be valid JSON matching schema (key presence + type check) |
| custom | Output is passed to fn; must return true to pass |
| llm_judge | An LLM judge scores the output; must meet threshold (default 0.7) |
| semantic_similarity | Word-overlap similarity against reference; must meet threshold (default 0.7) |
| factual_accuracy | Checks how many facts appear in the output; must meet threshold (default 0.7) |

LLMJudge

type LLMJudge = (prompt: string) => Promise<{ score: number; reasoning: string }>
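Since an LLMJudge is just an async function, unit tests can stub one with a fixed scoring rule instead of calling a real model. A hypothetical stub (not part of the package; a real judge would call an LLM and parse a score from its reply):

```typescript
type LLMJudge = (prompt: string) => Promise<{ score: number; reasoning: string }>

// Stubbed judge: scores 1 when the judged prompt mentions the expected
// label, 0 otherwise. Purely for deterministic tests.
const stubJudge: LLMJudge = async (prompt) =>
  prompt.includes('positive')
    ? { score: 1, reasoning: 'Expected label present in judged output.' }
    : { score: 0, reasoning: 'Expected label missing.' }
```

A criterion like { type: 'llm_judge', prompt: '...', judge: stubJudge } then passes whenever the returned score meets the threshold (default 0.7).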

EvalResult

interface EvalResult {
  name: string
  passed: boolean
  score: number
  criteria: CriterionResult[]
  input: string
  output: string
  durationMs: number
  tags: string[]
}

CriterionResult

interface CriterionResult {
  type: string
  passed: boolean
  message: string
}

EvalSuiteConfig

interface EvalSuiteConfig {
  name: string
  cases: EvalCase[]
  runner: (input: string) => Promise<string>
  concurrency?: number
}

| Field | Description |
|---|---|
| name | Suite name for reporting |
| cases | Array of eval cases to run |
| runner | Function that takes an input string and returns the LLM output |
| concurrency | Max parallel eval cases (default 1 for sequential execution) |
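The concurrency option can be pictured as a bounded worker pool: at most N cases in flight at once, each worker claiming the next unprocessed case. A standalone sketch of that pattern (hypothetical; not the package's internal implementation):

```typescript
// Run `fn` over `items` with at most `limit` calls in flight at once.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length)
  let next = 0
  const worker = async () => {
    while (next < items.length) {
      const i = next++ // claim the next index (single-threaded, so no race)
      results[i] = await fn(items[i])
    }
  }
  const workerCount = Math.max(1, Math.min(limit, items.length))
  await Promise.all(Array.from({ length: workerCount }, worker))
  return results
}
```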

EvalSuiteResult

interface EvalSuiteResult {
  name: string
  total: number
  passed: number
  failed: number
  score: number
  results: EvalResult[]
  durationMs: number
}

runEvalSuite()

Runs all eval cases through the runner and evaluates each against its criteria.

function runEvalSuite(config: EvalSuiteConfig): Promise<EvalSuiteResult>

| Parameter | Type | Description |
|---|---|---|
| config | EvalSuiteConfig | Suite configuration including cases and runner |

Returns: Promise<EvalSuiteResult>

import { runEvalSuite, formatEvalReport } from '@elsium-ai/testing'

const result = await runEvalSuite({
  name: 'Sentiment classifier',
  cases: [
    {
      name: 'positive review',
      input: 'This product is amazing!',
      criteria: [
        { type: 'contains', value: 'positive' },
        { type: 'length_max', value: 50 },
      ],
    },
    {
      name: 'negative review',
      input: 'Terrible experience.',
      expected: 'negative',
    },
  ],
  runner: async (input) => {
    // Call your LLM or classifier here
    return input.includes('amazing') ? 'positive' : 'negative'
  },
  concurrency: 2,
})

console.log(result.score)  // 0..1
console.log(result.passed) // number of passing cases

formatEvalReport()

Formats an EvalSuiteResult into a human-readable string report.

function formatEvalReport(result: EvalSuiteResult): string

| Parameter | Type | Description |
|---|---|---|
| result | EvalSuiteResult | The result object returned by runEvalSuite |

Returns: string

import { runEvalSuite, formatEvalReport } from '@elsium-ai/testing'

const result = await runEvalSuite({ /* ... */ })
console.log(formatEvalReport(result))
// Output:
//   Eval Suite: Sentiment classifier
//   --------------------------------------------------
//   [PASS] positive review (3ms)
//   [PASS] negative review (1ms)
//   --------------------------------------------------
//   Score: 100.0% | 2/2 passed | 4ms

Snapshot

Hash-based snapshot testing that detects when LLM outputs change between runs.

PromptSnapshot

interface PromptSnapshot {
  name: string
  request: {
    system?: string
    messages: Array<{ role: string; content: string }>
    model?: string
  }
  outputHash: string
  timestamp: string
}

SnapshotStore

interface SnapshotStore {
  get(name: string): PromptSnapshot | undefined
  set(name: string, snapshot: PromptSnapshot): void
  getAll(): PromptSnapshot[]
  toJSON(): string
}

createSnapshotStore()

Creates an in-memory snapshot store, optionally seeded with existing snapshots.

function createSnapshotStore(existing?: PromptSnapshot[]): SnapshotStore

| Parameter | Type | Default | Description |
|---|---|---|---|
| existing | PromptSnapshot[] | undefined | Previously saved snapshots to preload |

Returns: SnapshotStore

import { createSnapshotStore } from '@elsium-ai/testing'

const store = createSnapshotStore()
console.log(store.getAll().length) // 0

hashOutput()

Produces a SHA-256 hex digest of the given string.

function hashOutput(output: string): string

| Parameter | Type | Description |
|---|---|---|
| output | string | The output string to hash |

Returns: string -- SHA-256 hex hash

import { hashOutput } from '@elsium-ai/testing'

const hash = hashOutput('Hello, World!')
// 'dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f'
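Assuming hashOutput is a plain SHA-256 over the UTF-8 string, the same digest can be regenerated outside the package with Node's built-in crypto module, which is handy for recomputing expected hashes in CI:

```typescript
import { createHash } from 'node:crypto'

// SHA-256 hex digest of the UTF-8 bytes of the input string.
const expected = createHash('sha256').update('Hello, World!', 'utf8').digest('hex')
console.log(expected)
// 'dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f'
```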

SnapshotTestResult

interface SnapshotTestResult {
  name: string
  status: 'new' | 'match' | 'changed'
  previousHash?: string
  currentHash: string
  output: string
}

testSnapshot()

Runs the provided function, hashes its output, and compares the hash against the stored snapshot.

function testSnapshot(
  name: string,
  store: SnapshotStore,
  runner: () => Promise<string>,
  request?: Partial<CompletionRequest>,
): Promise<SnapshotTestResult>

| Parameter | Type | Description |
|---|---|---|
| name | string | Unique snapshot name |
| store | SnapshotStore | The store to read/write snapshots |
| runner | () => Promise<string> | Function that produces the output to snapshot |
| request | Partial<CompletionRequest> | Optional request metadata stored in the snapshot |

Returns: Promise<SnapshotTestResult> -- status is 'new' on first run, 'match' if the hash is unchanged, or 'changed' if it differs.

import { createSnapshotStore, testSnapshot } from '@elsium-ai/testing'

const store = createSnapshotStore()

const result = await testSnapshot('greeting', store, async () => {
  return 'Hello, world!'
})

console.log(result.status) // 'new' on first run

const result2 = await testSnapshot('greeting', store, async () => {
  return 'Hello, world!'
})

console.log(result2.status) // 'match'

Prompts

A versioned prompt registry for managing, rendering, and diffing prompt templates.

PromptDefinition

interface PromptDefinition {
  name: string
  version: string
  content: string
  variables: string[]
  metadata?: Record<string, unknown>
}

PromptDiff

interface PromptDiff {
  name: string
  fromVersion: string
  toVersion: string
  changes: DiffLine[]
}

DiffLine

interface DiffLine {
  type: 'added' | 'removed' | 'unchanged'
  lineNumber: number
  content: string
}

PromptRegistry

interface PromptRegistry {
  register(name: string, prompt: PromptDefinition): void
  get(name: string, version?: string): PromptDefinition | undefined
  getLatest(name: string): PromptDefinition | undefined
  list(): Array<{ name: string; versions: string[] }>
  diff(name: string, fromVersion: string, toVersion: string): PromptDiff | null
  render(name: string, variables: Record<string, string>, version?: string): string
  getVersions(name: string): string[]
}

| Method | Description |
|---|---|
| register | Stores a prompt under its name and version |
| get | Retrieves a specific version, or the latest if version is omitted |
| getLatest | Returns the highest semver version for a prompt |
| list | Lists all prompt names with their available versions |
| diff | Computes a line-by-line diff between two versions |
| render | Replaces {{variable}} placeholders in the prompt content |
| getVersions | Returns all versions for a prompt sorted by semver |
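The {{variable}} substitution that render performs can be sketched as a regex replace (illustrative only; the registry's actual implementation may differ, e.g. in how unknown variables are handled):

```typescript
// Replace each {{name}} placeholder with the matching variable,
// leaving unknown placeholders untouched.
const renderTemplate = (template: string, vars: Record<string, string>): string =>
  template.replace(/\{\{(\w+)\}\}/g, (match, name) => vars[name] ?? match)

console.log(renderTemplate('Summarize: {{text}}', { text: 'A long article...' }))
// 'Summarize: A long article...'
```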

definePrompt()

A convenience function that returns a shallow copy of the given prompt definition.

function definePrompt(config: PromptDefinition): PromptDefinition

| Parameter | Type | Description |
|---|---|---|
| config | PromptDefinition | The prompt definition to copy |

Returns: PromptDefinition

import { definePrompt } from '@elsium-ai/testing'

const prompt = definePrompt({
  name: 'classifier',
  version: '1.0.0',
  content: 'Classify the following text as {{label}}: {{text}}',
  variables: ['label', 'text'],
})

createPromptRegistry()

Creates an empty prompt registry.

function createPromptRegistry(): PromptRegistry

Returns: PromptRegistry

import { definePrompt, createPromptRegistry } from '@elsium-ai/testing'

const registry = createPromptRegistry()

const v1 = definePrompt({
  name: 'summarizer',
  version: '1.0.0',
  content: 'Summarize: {{text}}',
  variables: ['text'],
})

const v2 = definePrompt({
  name: 'summarizer',
  version: '2.0.0',
  content: 'Provide a concise summary of: {{text}}',
  variables: ['text'],
})

registry.register('summarizer', v1)
registry.register('summarizer', v2)

// Render with the latest version
const output = registry.render('summarizer', { text: 'A long article...' })
// 'Provide a concise summary of: A long article...'

// Diff between versions
const diff = registry.diff('summarizer', '1.0.0', '2.0.0')
// diff.changes includes added/removed/unchanged lines

Regression

Baseline-driven regression detection that compares current LLM outputs to previously recorded baselines.

RegressionBaseline

interface RegressionBaseline {
  name: string
  cases: Array<{
    input: string
    output: string
    score: number
    timestamp: number
  }>
  createdAt: number
  updatedAt: number
}

RegressionResult

interface RegressionResult {
  name: string
  totalCases: number
  regressions: RegressionDetail[]
  improvements: RegressionDetail[]
  unchanged: number
  overallScore: number
  baselineScore: number
}

RegressionDetail

interface RegressionDetail {
  input: string
  baselineOutput: string
  currentOutput: string
  baselineScore: number
  currentScore: number
  delta: number
}

RegressionSuite

interface RegressionSuite {
  load(path: string): Promise<void>
  save(path: string): Promise<void>
  run(
    runner: (input: string) => Promise<string>,
    scorer?: (input: string, output: string) => Promise<number>,
  ): Promise<RegressionResult>
  addCase(input: string, output: string, score: number): void
  readonly baseline: RegressionBaseline | null
}

| Method | Description |
|---|---|
| load | Reads a baseline JSON file from disk |
| save | Writes the current baseline to disk (creates directories as needed) |
| run | Runs all baseline cases through runner, compares scores, and classifies regressions (delta < -0.1), improvements (delta > 0.1), or unchanged |
| addCase | Adds or updates a case in the baseline |
| baseline | Read-only access to the current baseline (or null) |
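The classification run applies to each case's score delta uses the ±0.1 cutoffs described above, and can be expressed directly as a standalone sketch:

```typescript
type Classification = 'regression' | 'improvement' | 'unchanged'

// delta = currentScore - baselineScore
const classify = (delta: number): Classification =>
  delta < -0.1 ? 'regression' : delta > 0.1 ? 'improvement' : 'unchanged'

console.log(classify(-0.3)) // 'regression'
console.log(classify(0.05)) // 'unchanged'
console.log(classify(0.2))  // 'improvement'
```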

createRegressionSuite()

Creates a new regression suite with the given name.

function createRegressionSuite(name: string): RegressionSuite

| Parameter | Type | Description |
|---|---|---|
| name | string | Name for the regression suite |

Returns: RegressionSuite

import { createRegressionSuite } from '@elsium-ai/testing'

const suite = createRegressionSuite('qa-bot')

// Build baseline
suite.addCase('What is 2+2?', '4', 1.0)
suite.addCase('Capital of France?', 'Paris', 1.0)
await suite.save('./baselines/qa-bot.json')

// Later, run against the baseline
await suite.load('./baselines/qa-bot.json')
const result = await suite.run(async (input) => {
  // Call your LLM here
  return 'some answer'
})

console.log(result.regressions.length) // number of regressions detected
console.log(result.overallScore)        // aggregate score across all cases

Replay

Record raw CompletionRequest / LLMResponse pairs and replay them deterministically in tests.

ReplayEntry

interface ReplayEntry {
  request: CompletionRequest
  response: LLMResponse
  timestamp: number
}

ReplayRecorder

interface ReplayRecorder {
  wrap(
    completeFn: (req: CompletionRequest) => Promise<LLMResponse>,
  ): (req: CompletionRequest) => Promise<LLMResponse>
  getEntries(): ReplayEntry[]
  toJSON(): string
  clear(): void
}

createReplayRecorder()

Creates a recorder that wraps a completion function and captures every request/response pair.

function createReplayRecorder(): ReplayRecorder

Returns: ReplayRecorder

import { createReplayRecorder } from '@elsium-ai/testing'

const recorder = createReplayRecorder()
const wrappedComplete = recorder.wrap(provider.complete.bind(provider))

// Use wrappedComplete in place of provider.complete — all calls are recorded
const response = await wrappedComplete({
  messages: [{ role: 'user', content: 'Hello' }],
})

// Save for later replay
const json = recorder.toJSON()

ReplayPlayer

interface ReplayPlayer {
  complete(request: CompletionRequest): Promise<LLMResponse>
  readonly remaining: number
}

createReplayPlayer()

Creates a player that replays recorded responses sequentially, regardless of the incoming request.

function createReplayPlayer(entriesOrJson: ReplayEntry[] | string): ReplayPlayer

| Parameter | Type | Description |
|---|---|---|
| entriesOrJson | ReplayEntry[] \| string | An array of replay entries, or a JSON string produced by recorder.toJSON() |

Returns: ReplayPlayer

Throws an error with the message 'Replay exhausted: no more recorded responses' if complete() is called after all entries have been consumed.

import { createReplayRecorder, createReplayPlayer } from '@elsium-ai/testing'

// Record
const recorder = createReplayRecorder()
const wrapped = recorder.wrap(provider.complete.bind(provider))
await wrapped({ messages: [{ role: 'user', content: 'Hi' }] })

// Replay
const player = createReplayPlayer(recorder.getEntries())
console.log(player.remaining) // 1

const replayed = await player.complete({
  messages: [{ role: 'user', content: 'Hi' }],
})
console.log(player.remaining) // 0
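The player's sequential semantics (ignore the incoming request, return the next recorded response, throw once exhausted) can be sketched standalone; the real player returns full LLMResponse objects rather than strings:

```typescript
// Minimal stand-in for sequential replay semantics (illustrative only).
function makeSequentialPlayer<T>(entries: T[]) {
  let index = 0
  return {
    next(): T {
      if (index >= entries.length) {
        throw new Error('Replay exhausted: no more recorded responses')
      }
      return entries[index++] // always the next entry, regardless of input
    },
    get remaining(): number {
      return entries.length - index
    },
  }
}

const demo = makeSequentialPlayer(['first', 'second'])
console.log(demo.next())    // 'first'
console.log(demo.remaining) // 1
```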

Pinning

Pin LLM outputs to specific prompt + config combinations and detect when outputs drift.

Pin

interface Pin {
  promptHash: string
  configHash: string
  outputHash: string
  outputText: string
  model?: string
  createdAt: number
}

PinStore

interface PinStore {
  get(key: string): Pin | undefined
  set(key: string, pin: Pin): void
  delete(key: string): boolean
  getAll(): Pin[]
  toJSON(): string
}

PinResult

interface PinResult {
  status: 'new' | 'match' | 'mismatch'
  pin: Pin
  previousPin?: Pin
}

createPinStore()

Creates an in-memory pin store, optionally preloaded with existing pins.

function createPinStore(existing?: Pin[]): PinStore

| Parameter | Type | Default | Description |
|---|---|---|---|
| existing | Pin[] | undefined | Previously saved pins to preload |

Returns: PinStore

import { createPinStore } from '@elsium-ai/testing'

const store = createPinStore()
console.log(store.getAll().length) // 0

pinOutput()

Runs a function, hashes its output along with the prompt and config, and compares against any previously stored pin.

function pinOutput(
  name: string,
  store: PinStore,
  runner: () => Promise<string>,
  config: {
    prompt: string
    model?: string
    temperature?: number
    seed?: number
  },
  options?: { assert?: boolean },
): Promise<PinResult>

| Parameter | Type | Description |
|---|---|---|
| name | string | Human-readable name for the pin (used in error messages) |
| store | PinStore | The store to read/write pins |
| runner | () => Promise<string> | Function that produces the output to pin |
| config | object | Prompt text and model config used to generate the hash key |
| options.assert | boolean | When true, throws an ElsiumError on mismatch instead of returning |

Returns: Promise<PinResult> -- status is 'new' on first run, 'match' if output is identical, or 'mismatch' if the output has changed.

import { createPinStore, pinOutput } from '@elsium-ai/testing'

const store = createPinStore()

const result = await pinOutput(
  'greeting-pin',
  store,
  async () => 'Hello, world!',
  { prompt: 'Say hello', model: 'gpt-4', temperature: 0 },
)

console.log(result.status) // 'new'

// Run again with the same output
const result2 = await pinOutput(
  'greeting-pin',
  store,
  async () => 'Hello, world!',
  { prompt: 'Say hello', model: 'gpt-4', temperature: 0 },
)

console.log(result2.status) // 'match'

// Run with assert mode in CI
await pinOutput(
  'greeting-pin',
  store,
  async () => 'Different output!',
  { prompt: 'Say hello', model: 'gpt-4', temperature: 0 },
  { assert: true }, // throws ElsiumError on mismatch
)

Determinism

Verify that an LLM function produces consistent outputs across multiple invocations.

DeterminismResult

interface DeterminismResult {
  deterministic: boolean
  runs: number
  uniqueOutputs: number
  outputs: string[]
  variance: number
}

StabilityResult

interface StabilityResult {
  stable: boolean
  runs: number
  uniqueOutputs: number
  outputs: Array<{ output: string; timestamp: number }>
  variance: number
}

assertDeterministic()

Runs a function multiple times and verifies that all outputs are identical (or within the specified tolerance).

function assertDeterministic(
  fn: (seed?: number) => Promise<string>,
  options?: {
    runs?: number
    seed?: number
    tolerance?: number
  },
): Promise<DeterminismResult>

| Parameter | Type | Default | Description |
|---|---|---|---|
| fn | (seed?: number) => Promise<string> | -- | The function to test for determinism |
| options.runs | number | 5 | Number of times to invoke fn |
| options.seed | number | undefined | Seed passed to fn on each invocation |
| options.tolerance | number | 0 | Maximum allowed variance (0 = strictly deterministic) |

Returns: Promise<DeterminismResult>

Throws an ElsiumError when tolerance is 0 (the default) and outputs are not identical.

import { assertDeterministic } from '@elsium-ai/testing'

const result = await assertDeterministic(
  async (seed) => {
    // Call your LLM with temperature: 0 and the provided seed
    return 'consistent output'
  },
  { runs: 5, seed: 42, tolerance: 0 },
)

console.log(result.deterministic) // true
console.log(result.uniqueOutputs) // 1
console.log(result.variance)      // 0
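At its core the check is: invoke the function repeatedly and count distinct outputs. A standalone sketch of that loop (omitting the package's variance metric and error handling):

```typescript
// Run `fn` several times and report whether every output is identical.
async function checkDeterminism(
  fn: (seed?: number) => Promise<string>,
  runs = 5,
  seed?: number,
) {
  const outputs: string[] = []
  for (let i = 0; i < runs; i++) outputs.push(await fn(seed))
  const uniqueOutputs = new Set(outputs).size
  return { deterministic: uniqueOutputs === 1, runs, uniqueOutputs, outputs }
}

const check = await checkDeterminism(async () => 'consistent output', 3)
console.log(check.deterministic) // true
```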

assertStable()

Runs a function multiple times with a delay between invocations to verify temporal stability.

function assertStable(
  fn: (seed?: number) => Promise<string>,
  options?: {
    intervalMs?: number
    runs?: number
    seed?: number
  },
): Promise<StabilityResult>

| Parameter | Type | Default | Description |
|---|---|---|---|
| fn | (seed?: number) => Promise<string> | -- | The function to test for stability |
| options.intervalMs | number | 100 | Delay in milliseconds between runs |
| options.runs | number | 3 | Number of times to invoke fn |
| options.seed | number | undefined | Seed passed to fn on each invocation |

Returns: Promise<StabilityResult>

import { assertStable } from '@elsium-ai/testing'

const result = await assertStable(
  async (seed) => {
    return 'same output every time'
  },
  { intervalMs: 200, runs: 3, seed: 42 },
)

console.log(result.stable)        // true
console.log(result.uniqueOutputs) // 1
console.log(result.outputs)       // [{ output: '...', timestamp: ... }, ...]

Part of ElsiumAI

This package is the testing layer of the ElsiumAI framework. See the full documentation for guides and examples.

License

MIT