
vibe-match v0.1.1

AI-powered semantic matchers for testing with Vitest and Jest

vibe-match

Deterministic tests for non-deterministic AI outputs.

Testing AI agents, LLM responses, and generated content is hard: traditional testing frameworks are built around exact value matching, while model outputs vary from run to run. vibe-match solves this by using LLMs and embeddings to evaluate meaning, not exact text. You can write assertions that ask "Is this answer semantically equivalent?" or "Does this response satisfy a set of criteria?" and get reliable test results.

// Instead of brittle exact matching...
expect(response).toBe("The capital of France is Paris."); // ❌ Breaks on minor variations

// ...use semantic matching that understands meaning
await expect(response).toBeSimilarTo("Paris is France's capital"); // ✅ Works!

Quick Start

npm install -D vibe-match

import { vibeMatchers, type VibeMatchConfig } from "vibe-match";

const config: VibeMatchConfig = {
  apiKeys: {
    openai: process.env.OPENAI_API_KEY,
  },
  languageModel: "openai:gpt-4o-mini",
  embeddingModel: "openai:text-embedding-3-small",
};

// Extend expect with semantic matchers (works with Vitest, Bun Test, and any Jest-compatible testing framework)
expect.extend(vibeMatchers(config));

// Now use them in your tests!
test("AI response is semantically correct", async () => {
  const response = await myAgent.ask("What's the capital of France?");

  await expect(response).toBeSimilarTo("Paris is the capital of France");
});

Why vibe-match?

AI outputs are inherently non-deterministic. The same prompt can produce different phrasings, structures, and word choices, so traditional approaches built on exact string comparison break as soon as the wording shifts.

vibe-match introduces semantic assertions that evaluate meaning rather than syntax:

| Approach          | "The capital of France is Paris" vs "Paris is France's capital" |
| ----------------- | --------------------------------------------------------------- |
| toBe()            | ❌ Fails                                                         |
| toMatch()         | ❌ Fails                                                         |
| toBeSimilarTo()   | ✅ Passes                                                        |


Installation

1. Install the package:

# npm
npm install -D vibe-match

# bun
bun add -D vibe-match

# yarn
yarn add -D vibe-match

# pnpm
pnpm add -D vibe-match

2. Import and extend expect:

vibe-match provides custom matchers that need to be registered with your test framework. Add this to your test setup file:

import { vibeMatchers, type VibeMatchConfig } from "vibe-match";

const config: VibeMatchConfig = {
  apiKeys: {
    openai: process.env.OPENAI_API_KEY,
    // Add other providers as needed
  },
  languageModel: "openai:gpt-4o-mini", // The default llm model unless overridden by a test
  embeddingModel: "openai:text-embedding-3-small", // The default embedding model unless overridden by a test
};

// Register the semantic matchers with expect
expect.extend(vibeMatchers(config));

3. Configure your test framework to use the setup file:

For Vitest, add to vitest.config.ts:

import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    setupFiles: ["./tests/setup.ts"],
  },
});

For Jest, add to jest.config.js:

module.exports = {
  setupFilesAfterEnv: ["./jest.setup.ts"],
};

Matchers

toBeSimilarTo(expected, options?)

Tests if two strings are semantically similar. Uses an LLM to evaluate whether the texts convey the same meaning.

await expect("I love cheese pizza").toBeSimilarTo(
  "Cheese pizza is my favorite"
);

Options:

| Option          | Type                            | Default  | Description                                |
| --------------- | ------------------------------- | -------- | ------------------------------------------ |
| level           | "loose" \| "normal" \| "strict" | "normal" | How strictly to judge similarity           |
| threshold       | number                          | 0.75     | Percentage of samples that must pass (0–1) |
| samples         | number                          | 5        | Number of LLM evaluations to run           |
| systemPrompt    | string                          | —        | Override the default prompt                |
| languageModel   | LanguageModelV1                 | —        | Override the configured model              |

Similarity Levels:

  • loose — The core meaning is the same. Minor differences in detail are acceptable.
  • normal — Meaningfully similar with only small ambiguities allowed.
  • strict — Functionally identical. Same meaning, same intent, same structure.

// Loose: "roughly the same idea"
await expect("The meeting is at 3pm").toBeSimilarTo("We meet at three", {
  level: "loose",
});

// Strict: "functionally identical"
await expect("Error: File not found").toBeSimilarTo("Error: File not found", {
  level: "strict",
});

toMention(concept, options?)

Tests if text mentions a concept—directly, by description, or by implication.

await expect("The red and yellow clown mascot smiled at customers").toMention(
  "McDonald's"
);
// ✅ Passes — describes Ronald McDonald

Options:

| Option          | Type              | Default | Description                                |
| --------------- | ----------------- | ------- | ------------------------------------------ |
| threshold       | number            | 0.75    | Percentage of samples that must pass (0–1) |
| samples         | number            | 5       | Number of LLM evaluations to run           |
| systemPrompt    | string            | —       | Override the default prompt                |
| languageModel   | LanguageModelV1   | —       | Override the configured model              |

What counts as "mentioning":

  • Direct reference: "I use JavaScript daily" mentions JavaScript
  • Category membership: "The cat sat on the mat" mentions animals
  • Implication: "We need to optimize queries" mentions performance
  • Synonyms: "She was feeling blue" mentions sadness

await expect(summary).toMention("quarterly revenue");
await expect(response).toMention("customer satisfaction");
await expect(article).toMention("climate change");

toSatisfyCriteria(criteria, options?)

Tests if text satisfies one or more natural language criteria. Perfect for evaluating quality, tone, completeness, or any custom requirements.

// Single criterion
await expect(response).toSatisfyCriteria("Addresses the user by name");

// Multiple criteria
await expect(response).toSatisfyCriteria([
  "Addresses the user by name",
  "Provides a specific solution",
  "Maintains a professional tone",
]);

Options:

| Option          | Type                              | Default | Description                                           |
| --------------- | --------------------------------- | ------- | ----------------------------------------------------- |
| mode            | "all" \| "any" \| "threshold"     | "all"   | How to aggregate multiple criteria                    |
| threshold       | number                            | 0.75    | For "threshold" mode: percentage that must pass (0–1) |
| samples         | number                            | 3       | LLM evaluations per criterion                         |
| systemPrompt    | string                            | —       | Override the default prompt                           |
| languageModel   | LanguageModelV1                   | —       | Override the configured model                         |

Aggregation Modes:

  • all — Every criterion must pass (logical AND)
  • any — At least one criterion must pass (logical OR)
  • threshold — A percentage of criteria must pass

// All criteria must pass
await expect(email).toSatisfyCriteria(
  [
    "Has a clear subject line",
    "Includes a call to action",
    "Contains contact information",
  ],
  { mode: "all" }
);

// At least 80% of criteria must pass
await expect(report).toSatisfyCriteria(
  [
    "Includes executive summary",
    "Contains data visualizations",
    "Cites sources",
    "Has recommendations section",
  ],
  { mode: "threshold", threshold: 0.8 }
);

toBeVectorSimilarTo(expected, options?)

Tests semantic similarity using embedding vectors and cosine similarity. Faster and cheaper than LLM-based matchers, but less nuanced.

await expect("A guide to baking bread").toBeVectorSimilarTo(
  "Bread baking tutorial"
);

Options:

| Option           | Type             | Default | Description                     |
| ---------------- | ---------------- | ------- | ------------------------------- |
| threshold        | number           | 0.85    | Minimum cosine similarity (0–1) |
| embeddingModel   | EmbeddingModel   | —       | Override the configured model   |

When to use vector similarity:

  • ✅ High-volume testing where cost matters
  • ✅ Comparing longer documents
  • ✅ Quick semantic similarity checks
  • ❌ Nuanced meaning comparison (use toBeSimilarTo)
  • ❌ Detecting implications or indirect mentions

// Compare document similarity
await expect(generatedDoc).toBeVectorSimilarTo(referenceDoc, {
  threshold: 0.9,
});

Configuration

Provider Setup

vibe-match supports multiple AI providers. Configure with a simple string format: "provider:model-name".

OpenAI

const config: VibeMatchConfig = {
  apiKeys: {
    openai: process.env.OPENAI_API_KEY,
  },
  languageModel: "openai:gpt-4o-mini",
  embeddingModel: "openai:text-embedding-3-small",
};

Anthropic

Anthropic doesn't provide embedding models, so pair with OpenAI for embeddings:

const config: VibeMatchConfig = {
  apiKeys: {
    anthropic: process.env.ANTHROPIC_API_KEY,
    openai: process.env.OPENAI_API_KEY,
  },
  languageModel: "anthropic:claude-sonnet-4-20250514",
  embeddingModel: "openai:text-embedding-3-small",
};

Google (Gemini)

const config: VibeMatchConfig = {
  apiKeys: {
    google: process.env.GOOGLE_GENERATIVE_AI_API_KEY,
  },
  languageModel: "google:gemini-2.0-flash",
  embeddingModel: "google:text-embedding-004",
};

Mistral

const config: VibeMatchConfig = {
  apiKeys: {
    mistral: process.env.MISTRAL_API_KEY,
  },
  languageModel: "mistral:mistral-small-latest",
  embeddingModel: "mistral:mistral-embed",
};

xAI (Grok)

xAI doesn't provide embedding models, so pair with OpenAI for embeddings:

const config: VibeMatchConfig = {
  apiKeys: {
    xai: process.env.XAI_API_KEY,
    openai: process.env.OPENAI_API_KEY,
  },
  languageModel: "xai:grok-2-1212",
  embeddingModel: "openai:text-embedding-3-small",
};

Provider Capabilities

| Provider   | Language Models | Embedding Models |
| ---------- | --------------- | ---------------- |
| OpenAI     | ✅              | ✅               |
| Anthropic  | ✅              | ❌               |
| Google     | ✅              | ✅               |
| Mistral    | ✅              | ✅               |
| xAI (Grok) | ✅              | ❌               |


Advanced Configuration

Custom AI SDK Models

For OpenAI-compatible APIs (OpenRouter, Together AI, Fireworks, etc.) or advanced configuration, pass AI SDK model instances directly:

OpenRouter

import { createOpenAI } from "@ai-sdk/openai";

const openrouter = createOpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

const config: VibeMatchConfig = {
  languageModel: openrouter("anthropic/claude-sonnet-4"),
  embeddingModel: openrouter.embedding("openai/text-embedding-3-small"),
};

Together AI

import { createOpenAI } from "@ai-sdk/openai";

const together = createOpenAI({
  baseURL: "https://api.together.xyz/v1",
  apiKey: process.env.TOGETHER_API_KEY,
});

const config: VibeMatchConfig = {
  languageModel: together("meta-llama/Llama-3.3-70B-Instruct-Turbo"),
  embeddingModel: together.embedding(
    "togethercomputer/m2-bert-80M-8k-retrieval"
  ),
};

Azure OpenAI

import { createAzure } from "@ai-sdk/azure";
import { createOpenAI } from "@ai-sdk/openai";

const azure = createAzure({
  resourceName: "my-resource",
  apiKey: process.env.AZURE_API_KEY,
});

const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY });

const config: VibeMatchConfig = {
  languageModel: azure("my-gpt-4-deployment"),
  embeddingModel: openai.embedding("text-embedding-3-small"),
};

Per-Test Model Overrides

Override the model for specific assertions without changing global configuration:

import { createOpenAI } from "@ai-sdk/openai";

const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY });
const gpt4o = openai("gpt-4o"); // More powerful model

// Use GPT-4o for this critical test only
await expect(response).toSatisfyCriteria(
  ["Is factually accurate", "Cites sources"],
  { languageModel: gpt4o }
);

// Use a larger embedding model for higher precision
const largeEmbedding = openai.embedding("text-embedding-3-large");
await expect(document).toBeVectorSimilarTo(reference, {
  embeddingModel: largeEmbedding,
  threshold: 0.95,
});

Custom Prompts

Override the default LLM prompts globally or per-assertion:

// Global prompt override
const config: VibeMatchConfig = {
  apiKeys: { openai: process.env.OPENAI_API_KEY },
  languageModel: "openai:gpt-4o-mini",
  embeddingModel: "openai:text-embedding-3-small",
  prompts: {
    toBeSimilarTo: "Your custom similarity prompt...",
    toMention: "Your custom mention detection prompt...",
    toSatisfyCriteria: "Your custom criteria evaluation prompt...",
  },
};

// Per-assertion prompt override
await expect(response).toBeSimilarTo(expected, {
  systemPrompt: "Be extra strict about numerical accuracy...",
});

Test Setup

Vitest

// tests/setup.ts
import { vibeMatchers, type VibeMatchConfig } from "vibe-match";

const config: VibeMatchConfig = {
  apiKeys: {
    openai: process.env.OPENAI_API_KEY,
  },
  languageModel: "openai:gpt-4o-mini",
  embeddingModel: "openai:text-embedding-3-small",
};

expect.extend(vibeMatchers(config));

// vitest.config.ts
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    setupFiles: ["./tests/setup.ts"],
  },
});

Jest

// jest.setup.ts
import { vibeMatchers, type VibeMatchConfig } from "vibe-match";

const config: VibeMatchConfig = {
  apiKeys: {
    openai: process.env.OPENAI_API_KEY,
  },
  languageModel: "openai:gpt-4o-mini",
  embeddingModel: "openai:text-embedding-3-small",
};

expect.extend(vibeMatchers(config));

// jest.config.js
module.exports = {
  setupFilesAfterEnv: ["./jest.setup.ts"],
};

TypeScript Support

For full type inference on custom matchers, add to your test setup:

// tests/setup.ts
import { vibeMatchers } from "vibe-match";

interface VibeMatchers<R = unknown> {
  toBeSimilarTo(
    expected: string,
    options?: import("vibe-match").ToBeSimilarToOptions
  ): Promise<R>;
  toMention(
    concept: string,
    options?: import("vibe-match").ToMentionOptions
  ): Promise<R>;
  toSatisfyCriteria(
    criteria: string | string[],
    options?: import("vibe-match").ToSatisfyCriteriaOptions
  ): Promise<R>;
  toBeVectorSimilarTo(
    expected: string,
    options?: import("vibe-match").VectorSimilarityOptions
  ): Promise<R>;
}

declare module "vitest" {
  interface Assertion<T = any> extends VibeMatchers<T> {}
  interface AsymmetricMatchersContaining extends VibeMatchers {}
}

// For Jest
declare global {
  namespace jest {
    interface Matchers<R> extends VibeMatchers<R> {}
  }
}

API Reference

vibeMatchers(config)

Creates the matcher functions to extend expect.

import { vibeMatchers, type VibeMatchConfig } from "vibe-match";

expect.extend(vibeMatchers(config));

Configuration Types

VibeMatchConfig

type VibeMatchConfig = VibeMatchStringConfig | VibeMatchCustomModelConfig;

VibeMatchStringConfig

For built-in providers with string-based model selection:

interface VibeMatchStringConfig {
  apiKeys: {
    openai?: string;
    anthropic?: string;
    google?: string;
    mistral?: string;
    xai?: string;
  };
  languageModel: LanguageModelString; // e.g., "openai:gpt-4o-mini"
  embeddingModel: EmbeddingModelString; // e.g., "openai:text-embedding-3-small"
  prompts?: VibeMatchPrompts;
}

VibeMatchCustomModelConfig

For custom AI SDK model instances:

interface VibeMatchCustomModelConfig {
  languageModel: LanguageModelV1;
  embeddingModel: EmbeddingModel<string>;
  prompts?: VibeMatchPrompts;
}

Exported Types

import type {
  VibeMatchConfig,
  VibeMatchStringConfig,
  VibeMatchCustomModelConfig,
  VibeMatchPrompts,
  VibeMatchApiKeys,
  LanguageModelString,
  EmbeddingModelString,
  ToBeSimilarToOptions,
  ToMentionOptions,
  ToSatisfyCriteriaOptions,
  VectorSimilarityOptions,
  // AI SDK types for model overrides
  LanguageModelV1,
  EmbeddingModel,
} from "vibe-match";

Default Prompts

Export default prompts for reference or extension:

import {
  DEFAULT_SIMILARITY_PROMPT,
  DEFAULT_MENTION_PROMPT,
  DEFAULT_SATISFY_CRITERIA_PROMPT,
} from "vibe-match";
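For example, a shipped prompt can be extended rather than replaced, then applied to a single assertion via the systemPrompt option. A sketch (the added instruction text is illustrative, not part of the package):

```typescript
import { DEFAULT_SIMILARITY_PROMPT } from "vibe-match";

// Start from the package's default similarity prompt and append a
// project-specific rule, instead of rewriting the prompt from scratch.
const strictNumbersPrompt =
  DEFAULT_SIMILARITY_PROMPT +
  "\nTreat any difference in numbers, dates, or units as a mismatch.";

// Apply the extended prompt to one assertion only; the global
// configuration stays untouched.
await expect(response).toBeSimilarTo(expected, {
  systemPrompt: strictNumbersPrompt,
});
```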

Reliability & Sampling

LLMs can give inconsistent answers. vibe-match addresses this with multi-sample evaluation:

  1. Each assertion runs multiple LLM calls (configurable via samples)
  2. Results are aggregated using a threshold (configurable via threshold)
  3. The assertion passes if enough samples agree

// Run 7 evaluations, pass if 6+ agree (≥85%)
await expect(response).toBeSimilarTo(expected, {
  samples: 7,
  threshold: 0.85,
});

For toSatisfyCriteria with multiple criteria, sampling works per-criterion:

await expect(response).toSatisfyCriteria(
  ["Criterion A", "Criterion B", "Criterion C"],
  {
    samples: 5, // 5 evaluations per criterion
    mode: "all", // All criteria must pass
  }
);
// Total LLM calls: 3 criteria × 5 samples = 15

FAQ

How can you use LLMs to test an LLM?

Good question! Obviously this is not a perfect solution, but vibe-match does several things to make LLM evaluation behave more deterministically. By default, each test is evaluated by the model several times and the results are aggregated, giving a wider view of how the model judges the test rather than relying on a single response. You can increase the number of samples for even more consistency; the trade-off is more LLM calls and higher costs. See the Reliability & Sampling section for more details.
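In practice that means the sampling options from the Reliability & Sampling section are the lever: for a flaky assertion, raising samples (and, if needed, threshold) trades extra LLM calls for steadier verdicts. A sketch:

```typescript
// 9 evaluations instead of the default 5; pass only if at least
// 8 agree (8/9 ≈ 0.89 ≥ 0.85). More calls, more stable results.
await expect(response).toBeSimilarTo(expected, {
  samples: 9,
  threshold: 0.85,
});
```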

How can I write effective tests with vibe-match?

LLMs perform best at evaluating content when given very narrow criteria to look for. The more specific you can be, the more likely your tests are to return consistent results. The toSatisfyCriteria matcher is particularly useful for this, as it allows you to specify a list of specific criteria that the response must satisfy. For example:

await expect(response).toSatisfyCriteria([
  "Addresses the user by name",
  "Includes <name of business> in the response at least once",
  "Includes 2 calls to action",
]);

See the Matchers section for more details.


License

MIT