playwright-ai-matchers

v2.2.0

Provider-agnostic AI matchers for Playwright's expect() — ships with Claude Opus 4.7 (prompt caching + adaptive thinking), OpenAI, and Gemini adapters

Semantic assertions for Playwright's expect(), powered by LLMs. Validate intent, truthfulness, tone, and meaning instead of exact strings.

import { test, expect } from '@playwright/test';
import 'playwright-ai-matchers';

test('support bot is empathetic', async ({ page }) => {
  // Double quotes so the apostrophes in the text do not end the string early.
  const response = "I'm so sorry for the delay. I've escalated your case with high priority.";
  await expect(response).toHaveSentiment('empathetic');
});

Works with plain strings and Playwright Locators — text is extracted automatically:

await expect(page.locator('.hero')).toSatisfy('has a clear call to action');

Why

Traditional matchers (toContain, toMatch) break against LLM variability. They can't tell you whether a response hallucinated a fact, maintained the right tone, or fulfilled its purpose — only whether specific characters are present.

This library adds matchers that delegate validation to an LLM judge (Claude, GPT, or Gemini), return pass: boolean, and — on failure — surface the exact reason the verdict was reached.

Error: Expected response to convey "empathetic" sentiment, but it didn't.
Model:     claude-opus-4-7 (effort: medium)
Reason:    Tone is purely procedural ("Submit a ticket via the portal") — no acknowledgment of frustration.
Received:  "Submit a ticket via the portal."
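
As a rough illustration of the idea above, the sketch below turns a judge verdict into the `{ pass, message }` shape that Playwright custom matchers return. The `Verdict` and `MatcherResult` types and the `toSentimentResult` function are hypothetical names made up for this example; they are not the library's exports, and the real implementation may differ.

```typescript
// Hypothetical shapes for illustration only; not the library's actual types.
interface Verdict {
  pass: boolean;
  reason: string;
}

interface MatcherResult {
  pass: boolean;
  message: () => string;
}

// Build a Playwright-style matcher result from a judge verdict.
// The message layout mirrors the sample failure output shown above.
function toSentimentResult(
  received: string,
  sentiment: string,
  verdict: Verdict,
  model: string,
  effort: string,
): MatcherResult {
  const message = (): string =>
    [
      `Expected response to convey "${sentiment}" sentiment, but it didn't.`,
      `Model:     ${model} (effort: ${effort})`,
      `Reason:    ${verdict.reason}`,
      `Received:  "${received}"`,
    ].join('\n');
  return { pass: verdict.pass, message };
}
```

The message is a function rather than a string so it is only built when an assertion actually fails.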

Installation

npm install --save-dev playwright-ai-matchers

Install the peer dependency for one provider:

# Anthropic Claude (default — recommended for prompt caching + adaptive thinking)
npm install --save-dev @anthropic-ai/sdk

# OpenAI
npm install --save-dev openai

# Google Gemini
npm install --save-dev @google/generative-ai

Requires @playwright/test >= 1.40.


Setup

Export an API key for the provider you want to use. The library auto-detects which key is present:

export ANTHROPIC_API_KEY=sk-ant-...
# or
export OPENAI_API_KEY=sk-...
# or
export GOOGLE_API_KEY=AIza...   # (alias: GEMINI_API_KEY)
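
One way key auto-detection could work is sketched below. The `detectProvider` function and the `ProviderName` type are hypothetical, and the precedence used when several keys are set (Anthropic, then OpenAI, then Google) is an assumption; the README does not say which key wins.

```typescript
// Hypothetical names for illustration; not part of the library's API.
type ProviderName = 'anthropic' | 'openai' | 'gemini';

// ASSUMPTION: this order decides which provider wins when several keys are set.
const KEY_TABLE: ReadonlyArray<[string, ProviderName]> = [
  ['ANTHROPIC_API_KEY', 'anthropic'],
  ['OPENAI_API_KEY', 'openai'],
  ['GOOGLE_API_KEY', 'gemini'],
  ['GEMINI_API_KEY', 'gemini'], // alias mentioned in the README
];

// Return the first provider whose key is present and non-empty, or null.
function detectProvider(
  env: Record<string, string | undefined>,
): ProviderName | null {
  for (const [key, provider] of KEY_TABLE) {
    const value = env[key];
    if (value !== undefined && value.trim() !== '') {
      return provider;
    }
  }
  return null;
}
```

In a Node process this would be called as `detectProvider(process.env)`.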

One import in your test file registers all matchers:

import 'playwright-ai-matchers';

No expect.extend() call needed.


Matchers

Matchers take a natural-language argument (toBeHelpful() is the exception and takes none) and accept an optional { effort, provider, retries } config.

toSatisfy(criterion)

The response meets an arbitrary criterion expressed in plain language.

await expect(response).toSatisfy('explains the three parts of a JWT');

toMeanSomethingAbout(topic)

The response genuinely engages with a topic.

await expect(response).toMeanSomethingAbout('pricing');
await expect(response).not.toMeanSomethingAbout('billing');

toHallucinate(context)

The response invents facts not present in the provided context. Use with .not to assert fidelity.

const groundTruth = 'The Pro plan costs $49/month. No Enterprise plan is publicly listed.';
await expect(response).not.toHallucinate(groundTruth);

toBeHelpful()

The response is substantive — not a refusal, error message, or empty reply.

await expect(response).toBeHelpful();

toHaveIntent(intent)

The response expresses or enacts a communicative intent.

await expect(response).toHaveIntent('scheduling a meeting with the user');

toHaveSentiment(sentiment)

The response conveys an emotional tone.

await expect(response).toHaveSentiment('empathetic');
await expect(response).not.toHaveSentiment('aggressive');

Locator support

All matchers accept a Playwright Locator in place of a string. The text content is extracted automatically via innerText():

test('hero section has a clear CTA', async ({ page }) => {
  await page.goto('https://example.com');
  await expect(page.locator('main')).toSatisfy('has a clear call to action');
  await expect(page.locator('.hero')).toHaveIntent('attracting visitors to a trial or demo');
});
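
The string-or-Locator handling described above can be pictured with the minimal sketch below. `TextSource` is a hypothetical structural stand-in for the one Locator method involved (`innerText()`), and `resolveText` is an illustrative name; the library's internal extraction code may look different.

```typescript
// Structural stand-in for the part of a Playwright Locator used here.
// Hypothetical type for illustration; a real Locator satisfies this shape.
interface TextSource {
  innerText(): Promise<string>;
}

// Return the text to evaluate: strings pass through unchanged,
// Locator-like objects are read via innerText().
async function resolveText(received: string | TextSource): Promise<string> {
  if (typeof received === 'string') {
    return received;
  }
  return received.innerText();
}
```

Because the helper only depends on `innerText()`, it can be exercised in unit tests with a plain object instead of a browser page.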

Effort levels

Each matcher accepts { effort: 'low' | 'medium' | 'high' | 'xhigh' }. Default: medium.

await expect(response).toSatisfy('reasoning is logically sound', { effort: 'high' });

| Effort | When to use |
|--------|-------------|
| low | Obvious cases, high-volume, fast CI |
| medium | Most cases (default) |
| high | Ambiguous criteria, borderline cases |
| xhigh | Critical reviews, compliance, legal evaluations |

Higher effort spends more LLM reasoning tokens, which improves verdicts on hard cases at the price of higher cost and latency.


Retry logic

Matchers automatically retry on transient API errors. The default is 2 retries with exponential backoff. Override per matcher:

await expect(response).toSatisfy('criterion', { retries: 3 });

Set retries: 0 to disable retries entirely.
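
For readers unfamiliar with the pattern, here is a minimal sketch of retrying with exponential backoff. `withRetries` is a hypothetical helper, the 500 ms base delay is an assumption (the README does not state one), and unlike the library this sketch retries on every error rather than only transient API errors.

```typescript
// Hypothetical helper for illustration; not the library's implementation.
async function withRetries<T>(
  call: () => Promise<T>,
  retries = 2, // default stated in the README
  baseDelayMs = 500, // ASSUMPTION: the README does not give a delay
): Promise<T> {
  let lastError: unknown;
  // One initial attempt plus up to `retries` further attempts.
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await call();
    } catch (error) {
      lastError = error;
      if (attempt === retries) break; // out of attempts
      // Delay doubles each time: base, 2x base, 4x base, ...
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

With `retries` set to 0 the call runs exactly once and any error propagates immediately, which matches the "disable retries" behavior described above.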


Cross-run caching

Wrap any provider in CachedProvider to cache evaluation results to disk between CI runs. Identical inputs (text + criteria + model + effort) return the cached verdict without an API call.

import { ClaudeProvider, CachedProvider, setDefaultProvider } from 'playwright-ai-matchers';

setDefaultProvider(
  new CachedProvider(new ClaudeProvider(), {
    ttlSeconds: 86400,  // 24 hours
    namespace: 'v1',    // bump this to bust the cache after rubric changes
  })
);

Cache files are stored in .playwright-ai-cache/ in the project root. Add it to .gitignore.
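
The caching rule (same text, criteria, model, and effort gives the same verdict until the TTL expires) can be sketched as below. This is an in-memory stand-in, not the library's on-disk format: `createVerdictCache`, `CacheInput`, and `CachedVerdict` are hypothetical names, and folding the namespace into the key is an assumption about how cache busting works.

```typescript
// Hypothetical types for illustration; the real cache writes files to disk.
interface CacheInput {
  text: string;
  criteria: string;
  model: string;
  effort: string;
}

interface CachedVerdict {
  pass: boolean;
  reason: string;
}

function createVerdictCache(
  ttlSeconds: number,
  namespace: string,
  now: () => number = Date.now, // injectable clock, handy for tests
) {
  const entries = new Map<string, { verdict: CachedVerdict; storedAt: number }>();

  // ASSUMPTION: the namespace is part of the key, so bumping it
  // makes every previously stored verdict unreachable.
  const keyOf = (input: CacheInput): string =>
    JSON.stringify([namespace, input.text, input.criteria, input.model, input.effort]);

  return {
    get(input: CacheInput): CachedVerdict | undefined {
      const hit = entries.get(keyOf(input));
      if (hit === undefined) return undefined;
      if (now() - hit.storedAt > ttlSeconds * 1000) return undefined; // expired
      return hit.verdict;
    },
    set(input: CacheInput, verdict: CachedVerdict): void {
      entries.set(keyOf(input), { verdict, storedAt: now() });
    },
  };
}
```

Changing any one of the four inputs produces a different key, so an edited criterion or a different effort level always triggers a fresh evaluation.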


Providers

If you export only one API key, the library uses it. To force a provider globally:

import { setDefaultProvider, ClaudeProvider } from 'playwright-ai-matchers';

setDefaultProvider(new ClaudeProvider({ model: 'claude-opus-4-7' }));

Or pass a provider per matcher:

import { OpenAIProvider } from 'playwright-ai-matchers';

await expect(response).toSatisfy('criterion', {
  provider: new OpenAIProvider({ model: 'gpt-4o' }),
});
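
The relationship between the global default and a per-matcher override can be pictured as a simple fallback. The sketch below uses hypothetical names (`JudgeProvider`, `setDefaultJudge`, `pickJudge`) and only assumes that a per-matcher provider takes priority over the global one; it is not the library's resolution code.

```typescript
// Hypothetical minimal provider shape for illustration.
interface JudgeProvider {
  name: string;
}

interface MatcherConfig {
  provider?: JudgeProvider;
}

let defaultJudge: JudgeProvider | null = null;

// Register the provider used when a matcher call does not pass its own.
function setDefaultJudge(provider: JudgeProvider): void {
  defaultJudge = provider;
}

// ASSUMPTION: a per-matcher provider wins over the global default.
function pickJudge(config?: MatcherConfig): JudgeProvider {
  const chosen = config?.provider ?? defaultJudge;
  if (chosen === null) {
    throw new Error('no provider configured');
  }
  return chosen;
}
```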

| Feature | Claude (Anthropic) | OpenAI | Gemini | Ollama (local) |
|---------|:-----------------:|:------:|:------:|:--------------:|
| Semantic evaluation | ✅ | ✅ | ✅ | ✅ |
| Prompt caching | ✅ native | ⚠️ auto | ❌ | ❌ |
| Adaptive thinking | ✅ | ✅ | ✅ | ❌ |
| No API key needed | ❌ | ❌ | ❌ | ✅ |
| Runs offline | ❌ | ❌ | ❌ | ✅ |

Default is Claude Opus 4.7 — prompt caching makes the ~10k-token rubric cheap after the first assertion in a run.

Ollama — run evaluations locally, no API key

Use any model available in Ollama without sending data to external APIs:

# Install Ollama, then pull a model
ollama pull llama3.2

import { setDefaultProvider, OllamaProvider } from 'playwright-ai-matchers';

setDefaultProvider(new OllamaProvider({ model: 'llama3.2' }));

Or set environment variables (no code change needed):

export OLLAMA_MODEL=llama3.2
# optional: export OLLAMA_BASE_URL=http://localhost:11434

Recommended models for evaluation quality: llama3.2, qwen2.5, mistral, phi4, gemma2

Note: Local models are less consistent than Claude or GPT-4o on ambiguous criteria. Use effort: 'high' for borderline cases and validate your setup with a few known-pass / known-fail examples before relying on results in CI.


Cost & latency

Each assertion makes one LLM call.

  • Latency: ~1–3s with effort: 'medium'; 3–8s with high
  • Cost: with Claude Opus 4.7 + prompt caching in repeated suites, ~$0.01–0.03 per assertion
  • CI: set workers: 1 or 2 if you hit rate limits
  • Tip: use CachedProvider in CI to avoid re-evaluating identical assertions across runs
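
To budget a suite from the figures above, a back-of-the-envelope estimate is usually enough. The sketch below multiplies the assertion count by the quoted per-assertion range; `estimateRunCost` and the `cachedFraction` parameter are hypothetical, and it assumes assertions served from CachedProvider cost nothing because they skip the API call.

```typescript
interface CostRange {
  low: number;
  high: number;
}

// Per-assertion range quoted in the README (USD, Claude Opus 4.7 with prompt caching).
const PER_ASSERTION_USD: CostRange = { low: 0.01, high: 0.03 };

// Estimate the cost of one run.
// cachedFraction: share of assertions answered from the cross-run cache (0 to 1).
// ASSUMPTION: cached assertions are free.
function estimateRunCost(assertions: number, cachedFraction = 0): CostRange {
  if (assertions < 0 || cachedFraction < 0 || cachedFraction > 1) {
    throw new RangeError('assertions must be >= 0 and cachedFraction within [0, 1]');
  }
  const billable = assertions * (1 - cachedFraction);
  return {
    low: billable * PER_ASSERTION_USD.low,
    high: billable * PER_ASSERTION_USD.high,
  };
}
```

For example, a suite of 200 uncached assertions comes out at roughly $2 to $6 per run, and half that if every second assertion is served from the cache.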

CI (GitHub Actions)

- name: Run Playwright tests
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
  run: npx playwright test

Troubleshooting

no provider API key detected

Export ANTHROPIC_API_KEY, OPENAI_API_KEY, or GOOGLE_API_KEY before running tests.

Claude did not call submit_evaluation

Usually a rate limit or a truncated response. The matcher retries automatically (up to retries times). Lower effort to low if it persists.

Property 'toSatisfy' not found

The spec file is missing the side-effect import: import 'playwright-ai-matchers'.

Matcher receives a Locator instead of a string

Pass the Locator directly. Text extraction is automatic as of v2.1.


Examples

See test/demo.spec.ts for a demo with all matchers against fixed strings.

See examples/ for a real E2E test against an AI chat interface.

See docs/GUIDE.md for the full guide: when to use each matcher, common patterns (live web, APIs, RAG), CI, costs, and troubleshooting.


License

MIT © Germán Gordón