llm-assert
v1.0.0
Semantic AI assertions for Jest and Vitest. Test LLM outputs with expect()-style syntax.
import { llmExpect } from "llm-assert";
test("customer service reply is relevant and professional", async () => {
const reply = await getAIResponse("How do I get a refund?");
await llmExpect(reply).toBeRelevantTo("refund policy");
await llmExpect(reply).toMatchTone("professional and empathetic");
await llmExpect(reply).not.toContainHallucination({ context: refundDocs });
});
Why llm-assert?
Testing AI outputs is hard. You can't use expect(output).toBe("exact string") when the response varies from run to run. Existing tools tend to be Python-only, platform-specific, or built around a separate CLI workflow.
llm-assert drops into your existing Jest or Vitest test suite with zero friction:
- Familiar API -- if you know expect().toBe(), you know llmExpect().toBeRelevantTo()
- Zero config to start -- set OPENAI_API_KEY and go
- No vendor lock-in -- supports OpenAI, Anthropic, and free local models via Ollama
- Cost-aware -- built-in caching so you don't burn API credits on every test run
- Production-ready -- clear error messages, timeout handling, per-assertion overrides
Installation
npm install llm-assert --save-dev
Set your API key:
export OPENAI_API_KEY=sk-...
Or use Ollama for free local inference (no API key needed):
# Install Ollama from https://ollama.ai, then:
ollama pull llama3
Quick Start
llmExpect() works with any test runner. It throws on failure, which Jest/Vitest catch automatically:
import { llmExpect } from "llm-assert";
test("AI summary is relevant", async () => {
const summary = await myAI.summarize(article);
await llmExpect(summary).toBeRelevantTo("climate change");
});
No setup file needed. No custom matchers to register. Just import and use.
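Under the hood, an assertion like this has to turn a fuzzy judgment into pass or fail. The sketch below shows that general pattern with the model call stubbed out: a score in [0, 1] is compared against a threshold (0.7, the documented default), `.not` flips the polarity, and a failure throws so that Jest or Vitest report it. `SemanticExpectation` and `toScoreAtLeast` are hypothetical names for illustration, not llm-assert's API; the real matchers obtain their score from an LLM.

```typescript
// Hypothetical illustration of the expect()-style pattern; not llm-assert's source.
// The score is passed in directly; in llm-assert it would come from an LLM judge.
class SemanticExpectation {
  private readonly score: number;
  private readonly negated: boolean;

  constructor(score: number, negated = false) {
    this.score = score;
    this.negated = negated;
  }

  // `.not` returns a fresh expectation with the pass/fail polarity flipped.
  get not(): SemanticExpectation {
    return new SemanticExpectation(this.score, !this.negated);
  }

  // Passes when score >= threshold (or, under `.not`, when it is below).
  // Throwing is all a test runner needs to mark the test as failed.
  toScoreAtLeast(threshold = 0.7): void {
    const matched = this.score >= threshold;
    if (matched === this.negated) {
      const expectation = this.negated ? "below" : "at least";
      throw new Error(
        `Expected score ${expectation} ${threshold.toFixed(2)}, got ${this.score.toFixed(2)}`,
      );
    }
  }
}

new SemanticExpectation(0.85).toScoreAtLeast();     // passes
new SemanticExpectation(0.15).not.toScoreAtLeast(); // passes
```

Because failure is signalled purely by throwing, the same object works under any runner that treats a rejected async test as a failure.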
All Assertions
toBeRelevantTo(topic: string)
Checks if the text is semantically relevant to a topic.
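The error output shown later in this README reports a score, a threshold, and the model's reasoning, which suggests each matcher asks an LLM to grade the text. Model replies are not guaranteed to be well-formed, so a judge of this kind needs a defensive parser. A minimal sketch, assuming the judge is asked to answer with JSON of the form `{"score": number, "reasoning": string}`; that format and `parseJudgeReply` are assumptions, not llm-assert's documented protocol.

```typescript
// Hypothetical judge-reply parser; the JSON shape is an assumption for illustration.
interface JudgeVerdict {
  score: number; // clamped to [0, 1]
  reasoning: string;
}

function parseJudgeReply(raw: string): JudgeVerdict {
  // Models sometimes wrap JSON in prose or code fences, so take the outermost {...}.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end < start) {
    throw new Error(`Judge reply contained no JSON object: ${raw}`);
  }
  const parsed: unknown = JSON.parse(raw.slice(start, end + 1));
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Judge reply was not a JSON object");
  }
  const { score, reasoning } = parsed as { score?: unknown; reasoning?: unknown };
  // JSON.parse can yield Infinity for huge literals like 1e999, so require a finite number.
  if (typeof score !== "number" || !Number.isFinite(score)) {
    throw new Error("Judge reply is missing a numeric score");
  }
  return {
    score: Math.min(1, Math.max(0, score)),
    reasoning: typeof reasoning === "string" ? reasoning : "",
  };
}

const verdict = parseJudgeReply(
  'Sure! {"score": 0.15, "reasoning": "Discusses office hours, not refunds."}',
);
// verdict.score === 0.15
```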
await llmExpect("We process refunds within 5-7 days.").toBeRelevantTo("refund policy");
// PASSES
await llmExpect("Our office is in New York.").toBeRelevantTo("refund policy");
// FAILS -- not relevant
toMatchTone(tone: string)
Checks if the text matches an expected tone.
await llmExpect("Dear customer, we sincerely apologize.").toMatchTone("professional, empathetic");
// PASSES
await llmExpect("lol idk just deal with it").toMatchTone("professional");
// FAILS
toContainHallucination(options?)
Checks for fabricated information. Typically used with .not:
const context = "Our company was founded in 2015 in Austin, Texas.";
const response = "The company was founded in 2015 in Austin, Texas by John Smith.";
await llmExpect(response).not.toContainHallucination({ context });
// FAILS -- "John Smith" is not supported by the context, so .not.toContainHallucination() throws
toBeFactuallyCorrect(options?)
Checks factual accuracy against context or general knowledge.
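Grading "against context or general knowledge" implies two modes. One way a judge prompt could switch between them is sketched below, assuming the matcher accepts an optional `context` string like toContainHallucination does; the `FactCheckOptions` shape, `buildFactCheckPrompt`, and the prompt wording are all hypothetical, not llm-assert's actual prompts.

```typescript
// Hypothetical prompt builder; option names and wording are illustrative assumptions.
interface FactCheckOptions {
  context?: string;
}

function buildFactCheckPrompt(claim: string, options: FactCheckOptions = {}): string {
  // With context, the judge is restricted to it; without, it falls back to general knowledge.
  const source = options.context
    ? `Judge ONLY against this reference context; treat anything it does not support as incorrect:\n"""${options.context}"""`
    : "No reference context was supplied; judge against well-established general knowledge.";
  return [
    "You are grading the factual accuracy of a statement.",
    source,
    `Statement:\n"""${claim}"""`,
    'Reply with JSON: {"score": <number from 0 to 1>, "reasoning": "<one sentence>"}',
  ].join("\n\n");
}
```

Restricting the judge to the supplied context is what lets a context-grounded check flag claims that may be true in the world but are absent from your documents.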
await llmExpect("Water boils at 100°C at sea level.").toBeFactuallyCorrect();
// PASSES
toSatisfy(criteria: string)
The flexible escape hatch. Pass any natural language criteria:
await llmExpect(response).toSatisfy("contains a call-to-action and mentions pricing");
await llmExpect(response).toSatisfy("is written in valid markdown format");
await llmExpect(response).toSatisfy("answers the question without being condescending");
toHaveSentiment(sentiment: string)
Checks the emotional sentiment.
await llmExpect("This product is amazing!").toHaveSentiment("positive");
await llmExpect("I'm very disappointed.").toHaveSentiment("negative");
toBeSafe(options?)
Checks for harmful content (hate speech, PII leakage, prompt injection, etc.):
await llmExpect(response).toBeSafe();
// With custom categories:
await llmExpect(response).toBeSafe({
categories: ["pii_leakage", "prompt_injection"],
});
Negation
All matchers support .not:
await llmExpect(response).not.toContainHallucination({ context });
await llmExpect(response).not.toMatchTone("aggressive");
Configuration
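Settings can come from several layers: the built-in defaults listed under Types, a global configureLLMAssert() call, and per-assertion options, which override the global value for a single check. The sketch below shows that layering with a merge that skips `undefined` values, since a plain object spread would let `{ model: undefined }` clobber a configured model. The `Settings` type and `resolveSettings` helper are hypothetical; environment variables are left out because their precedence relative to configureLLMAssert() isn't documented.

```typescript
// Hypothetical settings layering; defaults mirror the values listed under Types.
interface Settings {
  provider: string;
  model: string;
  threshold: number;
  timeout: number; // ms
  cache: boolean;
}

const DEFAULTS: Settings = {
  provider: "openai",
  model: "gpt-4o-mini",
  threshold: 0.7,
  timeout: 30000,
  cache: true,
};

// Later layers win, but only for keys they actually set.
function resolveSettings(...layers: Partial<Settings>[]): Settings {
  const result: Settings = { ...DEFAULTS };
  for (const layer of layers) {
    // Drop undefined values so they cannot overwrite earlier layers.
    const defined = Object.fromEntries(
      Object.entries(layer).filter(([, value]) => value !== undefined),
    );
    Object.assign(result, defined);
  }
  return result;
}

const globalConfig: Partial<Settings> = { model: "gpt-4o", cache: false };
const perAssertion: Partial<Settings> = { threshold: 0.9, model: undefined };
const effective = resolveSettings(globalConfig, perAssertion);
// effective.model === "gpt-4o", effective.threshold === 0.9
```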
Programmatic (recommended for test setups)
// jest.setup.ts or vitest.setup.ts
import { configureLLMAssert } from "llm-assert";
configureLLMAssert({
provider: "openai", // "openai" | "anthropic" | "ollama"
model: "gpt-4o-mini",
cache: true,
verbose: false,
});
Environment variables
LLM_ASSERT_PROVIDER=openai
LLM_ASSERT_MODEL=gpt-4o-mini
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
LLM_ASSERT_CACHE=true
LLM_ASSERT_VERBOSE=false
Per-assertion overrides
Every matcher accepts an options object:
await llmExpect(text).toBeRelevantTo("topic", {
threshold: 0.9, // stricter than default 0.7
model: "gpt-4o", // use a better model for this check
provider: "anthropic", // use a different provider
timeout: 60000, // longer timeout
cache: false, // skip cache for this assertion
});
Providers
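The providers differ mainly in their default model and the credential they need, as described in the subsections that follow. A hypothetical pre-flight check, not part of llm-assert, that you might call from a setup file to fail fast with a readable message when a key is missing. It only checks environment variables and ignores the `apiKey` config option; Ollama is given no default model because none is stated here.

```typescript
// Hypothetical pre-flight check; provider facts come from the Providers subsections.
type ProviderName = "openai" | "anthropic" | "ollama";

interface ProviderInfo {
  defaultModel?: string; // undefined: set `model` explicitly
  apiKeyVar?: string;    // undefined: no API key required
}

const PROVIDERS: Record<ProviderName, ProviderInfo> = {
  openai: { defaultModel: "gpt-4o-mini", apiKeyVar: "OPENAI_API_KEY" },
  anthropic: { defaultModel: "claude-sonnet-4-20250514", apiKeyVar: "ANTHROPIC_API_KEY" },
  ollama: {},
};

// `env` is passed in (e.g. process.env) so the check is easy to test.
// Returns the model that would be used, or throws with an actionable message.
function checkProvider(
  provider: ProviderName,
  env: Record<string, string | undefined>,
  model?: string,
): string {
  const info = PROVIDERS[provider];
  if (info.apiKeyVar && !env[info.apiKeyVar]) {
    throw new Error(`${provider} needs ${info.apiKeyVar} to be set`);
  }
  const resolved = model ?? info.defaultModel;
  if (!resolved) {
    throw new Error(`No model configured for ${provider}`);
  }
  return resolved;
}
```

In a Node setup file this would be called as `checkProvider("openai", process.env)`.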
OpenAI (default)
Uses gpt-4o-mini by default. Set OPENAI_API_KEY or pass apiKey in the config.
configureLLMAssert({ provider: "openai", model: "gpt-4o" });
Anthropic
Uses claude-sonnet-4-20250514 by default. Set ANTHROPIC_API_KEY.
configureLLMAssert({ provider: "anthropic" });
Ollama (free, local)
No API key needed. Requires Ollama running locally.
configureLLMAssert({
provider: "ollama",
model: "llama3",
ollamaBaseUrl: "http://localhost:11434", // default
});
Caching
By default, llm-assert caches LLM responses to avoid redundant API calls. The cache is stored in .llm-assert-cache/ and keyed on assertion type + input text + criteria + model name.
Add .llm-assert-cache/ to your .gitignore.
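A sketch of deriving a key from those four inputs and checking expiry against the 7-day default TTL, assuming each entry records when it was written. `cacheKey` and `isExpired` are hypothetical helpers; llm-assert's on-disk format isn't documented. Encoding the parts as a JSON array, rather than concatenating them, keeps ("ab", "c") and ("a", "bc") from producing the same key; a file-based cache would then typically hash this string (for example with SHA-256) to get a safe filename.

```typescript
// Hypothetical cache helpers; the key inputs follow the Caching section above.
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000; // documented default cacheTTL

function cacheKey(
  assertionType: string,
  inputText: string,
  criteria: string,
  modelName: string,
): string {
  // An array encoding keeps field boundaries unambiguous.
  return JSON.stringify([assertionType, inputText, criteria, modelName]);
}

// An entry is stale once more than ttlMs has elapsed since it was stored.
function isExpired(storedAtMs: number, nowMs: number, ttlMs = SEVEN_DAYS_MS): boolean {
  return nowMs - storedAtMs > ttlMs;
}

const key = cacheKey(
  "toBeRelevantTo",
  "We process refunds within 5-7 days.",
  "refund policy",
  "gpt-4o-mini",
);
```

Including the model name in the key means switching models (for example via a per-assertion override) never serves a verdict produced by a different model.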
Cache entries expire after 7 days (the cacheTTL default). To clear the cache manually:
import { clearLLMAssertCache } from "llm-assert";
await clearLLMAssertCache();
Jest Integration
Option 1: Use llmExpect() directly (no setup needed)
import { llmExpect } from "llm-assert";
test("AI response quality", async () => {
await llmExpect(response).toBeRelevantTo("topic");
});
Option 2: Extend Jest's expect with toLLMMatch
// jest.setup.ts
import "llm-assert/jest";
// jest.config.js
module.exports = {
setupFilesAfterEnv: ["./jest.setup.ts"],
};
// In your tests:
await expect(response).toLLMMatch({
relevantTo: "refund policy",
tone: "professional",
});
Vitest Integration
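Vitest loads setup files listed under `test.setupFiles` in its config, so registering the setup file shown next would look something like this, assuming it is saved as vitest.setup.ts at the project root:

```typescript
// vitest.config.ts
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Vitest runs these before each test file, so the llm-assert/vitest import applies everywhere.
    setupFiles: ["./vitest.setup.ts"],
  },
});
```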
// vitest.setup.ts
import "llm-assert/vitest";
Error Messages
Failed assertions produce clear, actionable errors:
LLMAssertionError: toBeRelevantTo
Expected: text to be relevant to "refund policy"
Received: "Our office hours are 9am to 5pm Monday through Friday."
Score: 0.15 / 0.70 (threshold)
Reasoning: The text discusses office hours, which is unrelated to
refund policies, return processes, or money-back guarantees.
Provider: openai (gpt-4o-mini)
Cached: false
API Reference
Functions
| Function | Description |
|----------|-------------|
| llmExpect(text) | Create a semantic assertion on text |
| configureLLMAssert(config) | Set global configuration |
| defineConfig(config) | Helper for config files |
| clearLLMAssertCache() | Clear all cached results |
Types
interface LLMAssertConfig {
provider: string; // "openai" | "anthropic" | "ollama"
model: string; // model name
apiKey?: string; // API key
ollamaBaseUrl?: string; // Ollama URL (default: http://localhost:11434)
defaultThreshold: number; // global threshold (default: 0.7)
timeout: number; // ms (default: 30000)
maxRetries: number; // retry count (default: 2)
cache: boolean; // enable caching (default: true)
cacheDir: string; // cache directory (default: .llm-assert-cache)
cacheTTL: number; // cache TTL in ms (default: 7 days)
verbose: boolean; // verbose logging (default: false)
}
interface AssertionOptions {
threshold?: number;
model?: string;
provider?: string;
cache?: boolean;
timeout?: number;
}
License
MIT
