llm-assert

v1.0.0

Published

2 months ago

Semantic AI assertions for Jest and Vitest. Test LLM outputs with expect()-style syntax.

Vulnerabilities: 0 high, 0 medium, 0 low

swapneilbasutkar

llm testing jest vitest ai assertions eval evaluation openai anthropic ollama semantic-testing nlp chatbot-testing


import { llmExpect } from "llm-assert";

test("customer service reply is relevant and professional", async () => {
  const reply = await getAIResponse("How do I get a refund?");
  await llmExpect(reply).toBeRelevantTo("refund policy");
  await llmExpect(reply).toMatchTone("professional and empathetic");
  await llmExpect(reply).not.toContainHallucination({ context: refundDocs });
});

Why llm-assert?

Testing AI outputs is hard. You can't use expect(output).toBe("exact string") when the response varies on every run. Existing tools tend to be Python-only, platform-specific, or tied to a separate CLI workflow.

llm-assert drops into your existing Jest or Vitest test suite with zero friction:

- Familiar API -- if you know expect().toBe(), you know llmExpect().toBeRelevantTo()
- Zero config to start -- set OPENAI_API_KEY and go
- No vendor lock-in -- supports OpenAI, Anthropic, and free local models via Ollama
- Cost-aware -- built-in caching so you don't burn API credits on every test run
- Production-ready -- clear error messages, timeout handling, per-assertion overrides

Installation

npm install llm-assert --save-dev

Set your API key:

export OPENAI_API_KEY=sk-...

Or use Ollama for free local inference (no API key needed):

# Install Ollama from https://ollama.ai, then:
ollama pull llama3

Quick Start

llmExpect() throws on failure, so it works with any test runner. Jest and Vitest report the thrown error as a failed test:

import { llmExpect } from "llm-assert";

test("AI summary is relevant", async () => {
  const summary = await myAI.summarize(article);
  await llmExpect(summary).toBeRelevantTo("climate change");
});

No setup file needed. No custom matchers to register. Just import and use.

All Assertions

`toBeRelevantTo(topic: string)`

Checks if the text is semantically relevant to a topic.

await llmExpect("We process refunds within 5-7 days.").toBeRelevantTo("refund policy");
// PASSES

await llmExpect("Our office is in New York.").toBeRelevantTo("refund policy");
// FAILS -- not relevant

`toMatchTone(tone: string)`

Checks if the text matches an expected tone.

await llmExpect("Dear customer, we sincerely apologize.").toMatchTone("professional, empathetic");
// PASSES

await llmExpect("lol idk just deal with it").toMatchTone("professional");
// FAILS

`toContainHallucination(options?)`

Checks for fabricated information. Typically used with .not:

const context = "Our company was founded in 2015 in Austin, Texas.";
const response = "The company was founded in 2015 in Austin, Texas by John Smith.";

await llmExpect(response).not.toContainHallucination({ context });
// FAILS -- "John Smith" is hallucinated

`toBeFactuallyCorrect(options?)`

Checks factual accuracy against context or general knowledge.

await llmExpect("Water boils at 100°C at sea level.").toBeFactuallyCorrect();
// PASSES

`toSatisfy(criteria: string)`

The flexible escape hatch. Pass any natural language criteria:

await llmExpect(response).toSatisfy("contains a call-to-action and mentions pricing");
await llmExpect(response).toSatisfy("is written in valid markdown format");
await llmExpect(response).toSatisfy("answers the question without being condescending");

`toHaveSentiment(sentiment: string)`

Checks the emotional sentiment.

await llmExpect("This product is amazing!").toHaveSentiment("positive");
await llmExpect("I'm very disappointed.").toHaveSentiment("negative");

`toBeSafe(options?)`

Checks for harmful content (hate speech, PII leakage, prompt injection, etc.):

await llmExpect(response).toBeSafe();

// With custom categories:
await llmExpect(response).toBeSafe({
  categories: ["pii_leakage", "prompt_injection"],
});

Negation

All matchers support .not:

await llmExpect(response).not.toContainHallucination({ context });
await llmExpect(response).not.toMatchTone("aggressive");

Configuration

Programmatic (recommended for test setups)

// jest.setup.ts or vitest.setup.ts
import { configureLLMAssert } from "llm-assert";

configureLLMAssert({
  provider: "openai",       // "openai" | "anthropic" | "ollama"
  model: "gpt-4o-mini",
  cache: true,
  verbose: false,
});

Environment variables

LLM_ASSERT_PROVIDER=openai
LLM_ASSERT_MODEL=gpt-4o-mini
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
LLM_ASSERT_CACHE=true
LLM_ASSERT_VERBOSE=false
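
Environment variables are always strings, so flags such as LLM_ASSERT_CACHE have to be turned into booleans somewhere. The sketch below shows one way the variables above could map onto the config fields. `configFromEnv` and `parseBool` are hypothetical helpers written for illustration, not part of llm-assert, and the fallback values are the OpenAI defaults from this README; the library's own parsing rules may differ.

```typescript
// Hypothetical helpers for illustration only; llm-assert reads these
// variables itself and its exact parsing rules are not documented here.
type Env = Record<string, string | undefined>;

interface EnvConfig {
  provider: string;
  model: string;
  cache: boolean;
  verbose: boolean;
}

// Assumption: only the literal string "true" (case-insensitive) enables a flag.
function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined) return fallback;
  return value.trim().toLowerCase() === "true";
}

// Falls back to the OpenAI defaults documented in this README.
function configFromEnv(env: Env): EnvConfig {
  return {
    provider: env.LLM_ASSERT_PROVIDER ?? "openai",
    model: env.LLM_ASSERT_MODEL ?? "gpt-4o-mini",
    cache: parseBool(env.LLM_ASSERT_CACHE, true),
    verbose: parseBool(env.LLM_ASSERT_VERBOSE, false),
  };
}

console.log(configFromEnv({ LLM_ASSERT_PROVIDER: "ollama", LLM_ASSERT_MODEL: "llama3" }));
```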

Per-assertion overrides

Every matcher accepts an optional options object as its last argument:

await llmExpect(text).toBeRelevantTo("topic", {
  threshold: 0.9,          // stricter than default 0.7
  model: "gpt-4o",         // use a better model for this check
  provider: "anthropic",   // use a different provider
  timeout: 60000,          // longer timeout
  cache: false,            // skip cache for this assertion
});
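
To make the precedence concrete, here is a minimal sketch of how per-assertion options could be layered over the global config. It assumes that any field set on the assertion wins and that everything else falls back to the global value. `resolveOptions` is a hypothetical helper, not an llm-assert export, and the interfaces are trimmed copies of the documented types so the snippet stands alone.

```typescript
// Trimmed, redeclared copies of the documented config shapes so the
// sketch is self-contained. Nothing here is imported from llm-assert.
interface GlobalConfig {
  provider: string;
  model: string;
  defaultThreshold: number;
  timeout: number;
  cache: boolean;
}

interface AssertionOptions {
  threshold?: number;
  model?: string;
  provider?: string;
  cache?: boolean;
  timeout?: number;
}

interface ResolvedOptions {
  provider: string;
  model: string;
  threshold: number;
  timeout: number;
  cache: boolean;
}

// Default values as documented in this README.
const globalConfig: GlobalConfig = {
  provider: "openai",
  model: "gpt-4o-mini",
  defaultThreshold: 0.7,
  timeout: 30000,
  cache: true,
};

// Hypothetical helper. Assumption: a field set on the assertion overrides
// the global value. `??` keeps an explicit `cache: false` intact.
function resolveOptions(
  config: GlobalConfig,
  overrides: AssertionOptions = {},
): ResolvedOptions {
  return {
    provider: overrides.provider ?? config.provider,
    model: overrides.model ?? config.model,
    threshold: overrides.threshold ?? config.defaultThreshold,
    timeout: overrides.timeout ?? config.timeout,
    cache: overrides.cache ?? config.cache,
  };
}

console.log(resolveOptions(globalConfig, { threshold: 0.9, cache: false }));
```

Using `??` rather than `||` matters in this sketch: `cache: false` and a threshold of 0 are legitimate overrides and should not be replaced by the defaults.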

Providers

OpenAI (default)

Uses gpt-4o-mini by default. Set OPENAI_API_KEY or pass apiKey in the config.

configureLLMAssert({ provider: "openai", model: "gpt-4o" });

Anthropic

Uses claude-sonnet-4-20250514 by default. Set ANTHROPIC_API_KEY.

configureLLMAssert({ provider: "anthropic" });

Ollama (free, local)

No API key needed. Requires Ollama running locally.

configureLLMAssert({
  provider: "ollama",
  model: "llama3",
  ollamaBaseUrl: "http://localhost:11434", // default
});

Caching

By default, llm-assert caches LLM responses to avoid redundant API calls. The cache is stored in .llm-assert-cache/ and keyed on assertion type, input text, criteria, and model name.

Add .llm-assert-cache/ to your .gitignore.

Cache entries expire after 7 days. Clear manually:

import { clearLLMAssertCache } from "llm-assert";
await clearLLMAssertCache();
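
As an illustration of the keying and expiry rules described above, the sketch below derives a key from the four documented parts and checks an entry against the 7-day TTL. The serialization (a JSON array hashed with SHA-256) and the names `cacheKey` and `isExpired` are assumptions made for this example; the library's actual on-disk format is not documented here.

```typescript
import { createHash } from "node:crypto";

// The four documented key parts. The interface itself is hypothetical.
interface CacheKeyParts {
  assertion: string; // e.g. "toBeRelevantTo"
  input: string;     // the text under test
  criteria: string;  // e.g. the topic or tone
  model: string;     // e.g. "gpt-4o-mini"
}

// Assumption: parts are serialized as a JSON array and hashed with SHA-256.
// A JSON array avoids collisions that plain string concatenation could cause.
function cacheKey(parts: CacheKeyParts): string {
  const payload = JSON.stringify([
    parts.assertion,
    parts.input,
    parts.criteria,
    parts.model,
  ]);
  return createHash("sha256").update(payload).digest("hex");
}

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// An entry is stale once its age exceeds the TTL.
function isExpired(
  createdAtMs: number,
  nowMs: number,
  ttlMs: number = SEVEN_DAYS_MS,
): boolean {
  return nowMs - createdAtMs > ttlMs;
}

const key = cacheKey({
  assertion: "toBeRelevantTo",
  input: "We process refunds within 5-7 days.",
  criteria: "refund policy",
  model: "gpt-4o-mini",
});
console.log(key.length); // 64 hex characters
```

Because the model name is part of the key, switching models produces a different key, so a verdict cached for one model is not reused for another.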

Jest Integration

Option 1: Use `llmExpect()` directly (no setup needed)

import { llmExpect } from "llm-assert";

test("AI response quality", async () => {
  await llmExpect(response).toBeRelevantTo("topic");
});

Option 2: Extend Jest's `expect` with `toLLMMatch`

// jest.setup.ts
import "llm-assert/jest";

// jest.config.js
module.exports = {
  setupFilesAfterEnv: ["./jest.setup.ts"],
};

// In your tests:
await expect(response).toLLMMatch({
  relevantTo: "refund policy",
  tone: "professional",
});

Vitest Integration

// vitest.setup.ts
import "llm-assert/vitest";
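
The setup file above still has to be registered with Vitest. A minimal sketch follows, assuming the setup file lives next to the config as ./vitest.setup.ts (the path is an assumption; adjust it to your project). `setupFiles` is Vitest's standard option for this.

```typescript
// vitest.config.ts
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Runs before each test file, so the llm-assert/vitest import takes effect.
    setupFiles: ["./vitest.setup.ts"],
  },
});
```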

Error Messages

Failed assertions produce clear, actionable errors:

LLMAssertionError: toBeRelevantTo

  Expected: text to be relevant to "refund policy"
  Received: "Our office hours are 9am to 5pm Monday through Friday."

  Score:     0.15 / 0.70 (threshold)
  Reasoning: The text discusses office hours, which is unrelated to
             refund policies, return processes, or money-back guarantees.

  Provider:  openai (gpt-4o-mini)
  Cached:    false
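
The Score line reads as "score / threshold": in the example above, 0.15 falls short of 0.70, so the assertion fails. A minimal sketch of that decision, assuming a score at or above the threshold passes and that .not simply inverts the outcome. `assertionPasses` is a hypothetical helper, not part of llm-assert, and the library may treat boundary values or negation differently.

```typescript
// Hypothetical helper for illustration only.
// Assumptions: score >= threshold passes; `.not` inverts the outcome.
function assertionPasses(
  score: number,
  threshold: number = 0.7,
  negated: boolean = false,
): boolean {
  const meetsThreshold = score >= threshold;
  return negated ? !meetsThreshold : meetsThreshold;
}

console.log(assertionPasses(0.15));            // false: 0.15 is below 0.70
console.log(assertionPasses(0.15, 0.7, true)); // true: the negated form passes
console.log(assertionPasses(0.85, 0.9));       // false: a stricter threshold
```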

API Reference

Functions

| Function | Description |
|----------|-------------|
| llmExpect(text) | Create a semantic assertion on text |
| configureLLMAssert(config) | Set global configuration |
| defineConfig(config) | Helper for config files |
| clearLLMAssertCache() | Clear all cached results |

Types

interface LLMAssertConfig {
  provider: string;          // "openai" | "anthropic" | "ollama"
  model: string;             // model name
  apiKey?: string;           // API key
  ollamaBaseUrl?: string;    // Ollama URL (default: http://localhost:11434)
  defaultThreshold: number;  // global threshold (default: 0.7)
  timeout: number;           // ms (default: 30000)
  maxRetries: number;        // retry count (default: 2)
  cache: boolean;            // enable caching (default: true)
  cacheDir: string;          // cache directory (default: .llm-assert-cache)
  cacheTTL: number;          // cache TTL in ms (default: 7 days)
  verbose: boolean;          // verbose logging (default: false)
}

interface AssertionOptions {
  threshold?: number;
  model?: string;
  provider?: string;
  cache?: boolean;
  timeout?: number;
}

License

MIT
