playwright-vibe-check
v1.0.2
LLM-based visual testing for Playwright, using OpenAI and Claude to evaluate UI against natural language specifications.
Instead of brittle pixel-perfect screenshot comparisons, describe what your UI should look like in plain English and let AI evaluate whether it matches.
Installation
npm install playwright-vibe-check

Quick Start

- Set up your API keys in a .env file or environment variables:

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

- Import the test fixtures in your test file:

import { test, expect } from 'playwright-vibe-check';

test('should display a proper login form', async ({ page, vibeCheck }) => {
  await page.goto('https://example.com/login');
  await vibeCheck(
    page,
    'A login form with email and password input fields, and a blue submit button'
  );
});

API
vibeCheck(target, specification, options?)
Evaluates whether a page or element matches a natural language specification.
Parameters:
- target: Page | Locator - The page or element to check
- specification: string - Natural language description of expected UI
- options?: VibeCheckOptions - Optional configuration
Options:
interface VibeCheckOptions {
  name?: string;                             // Custom screenshot name
  provider?: 'openai' | 'anthropic';         // LLM provider to use
  confidenceThreshold?: number;              // 0-1, default 0.8
  includeRawResponse?: boolean;              // Include raw LLM response
  maxRetries?: number;                       // Retry count on failure
  modelParameters?: Record<string, unknown>; // Provider-specific params
}

Returns: Promise<VibeCheckResult>

interface VibeCheckResult {
  verdict: 'yes' | 'no';
  confidence: number;
  reasoning?: string;
  failReason?: string;
  suggestions?: string[];
  screenshotPath: string;
}

configureVibes(options)
Configure global settings for all vibe checks in the current test.
test('my test', async ({ page, vibeCheck, configureVibes }) => {
  configureVibes({
    defaultProvider: 'anthropic',
    evaluation: {
      confidenceThreshold: 0.9,
      maxRetries: 3,
    },
  });

  await page.goto('https://example.com');
  await vibeCheck(page, 'A beautiful homepage');
});

Usage Examples
Basic Page Check
import { test, expect } from 'playwright-vibe-check';

test('homepage has correct layout', async ({ page, vibeCheck }) => {
  await page.goto('https://example.com');
  await vibeCheck(
    page,
    'A professional homepage with a navigation bar at the top, hero section with a call-to-action button, and footer at the bottom'
  );
});

Element-Specific Check
test('button has correct styling', async ({ page, vibeCheck }) => {
  await page.goto('https://example.com');
  const submitButton = page.locator('button[type="submit"]');
  await vibeCheck(
    submitButton,
    'A blue button with white text saying "Submit", with rounded corners'
  );
});

Custom Provider Per Check
test('verify with Anthropic', async ({ page, vibeCheck }) => {
  await page.goto('https://example.com');
  await vibeCheck(
    page,
    'A clean, minimal design',
    { provider: 'anthropic' }
  );
});

Chaining Multiple Checks
test('verify multiple elements', async ({ page, vibeCheck }) => {
  await page.goto('https://example.com');

  // Check the overall page
  await vibeCheck(page, 'A well-structured webpage');

  // Check specific elements
  await vibeCheck(
    page.locator('header'),
    'A navigation bar with logo on the left and menu items on the right'
  );
  await vibeCheck(
    page.locator('footer'),
    'A footer with copyright notice and social media links'
  );
});

Custom Confidence Threshold
test('relaxed visual check', async ({ page, vibeCheck }) => {
  await page.goto('https://example.com');

  // Lower threshold for more lenient matching
  await vibeCheck(
    page,
    'A webpage with some content',
    { confidenceThreshold: 0.6 }
  );
});

Configuration
Environment Variables
| Variable | Description |
|----------|-------------|
| OPENAI_API_KEY | OpenAI API key for GPT-5.2 vision |
| ANTHROPIC_API_KEY | Anthropic API key for Claude Sonnet 4.5 vision |
| VIBE_DEFAULT_PROVIDER | Default provider (openai or anthropic) |
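
The README does not say what happens when a key is missing or when VIBE_DEFAULT_PROVIDER holds an unexpected value. As a rough sketch, a pre-flight check along these lines could catch misconfiguration before any screenshot is sent. The helper name resolveProvider and the fallback order (OpenAI first, then Anthropic) are assumptions made for illustration, not part of playwright-vibe-check.

```typescript
type Provider = 'openai' | 'anthropic';

// Environment variable that holds the API key for each provider
// (names taken from the table above).
const KEY_VARS: Record<Provider, string> = {
  openai: 'OPENAI_API_KEY',
  anthropic: 'ANTHROPIC_API_KEY',
};

function isProvider(value: string): value is Provider {
  return value === 'openai' || value === 'anthropic';
}

// Hypothetical pre-flight helper; not an API of playwright-vibe-check.
function resolveProvider(env: Record<string, string | undefined>): Provider {
  const requested = env['VIBE_DEFAULT_PROVIDER'];
  let candidates: Provider[];

  if (!requested) {
    // Assumption: with no explicit default, try OpenAI first, then Anthropic.
    candidates = ['openai', 'anthropic'];
  } else if (isProvider(requested)) {
    candidates = [requested];
  } else {
    throw new Error(
      `VIBE_DEFAULT_PROVIDER must be "openai" or "anthropic", got "${requested}"`
    );
  }

  // Return the first candidate whose API key is set and non-empty.
  for (const provider of candidates) {
    if (env[KEY_VARS[provider]]) {
      return provider;
    }
  }

  const missing = candidates.map((p) => KEY_VARS[p]).join(' or ');
  throw new Error(`Missing API key: set ${missing}`);
}
```

In a Playwright global setup file this could be called as resolveProvider(process.env), so that a missing key fails the run once instead of failing every vibe check.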
Custom Test Instance
For advanced configuration, create a custom test instance:
import { createVibeTest } from 'playwright-vibe-check';

export const test = createVibeTest({
  defaultProvider: 'anthropic',
  evaluation: {
    confidenceThreshold: 0.85,
    includeRawResponse: true,
    maxRetries: 3,
    modelParameters: {
      temperature: 0.1,
    },
  },
});

export { expect } from 'playwright-vibe-check';

Then import from your custom file:

import { test, expect } from './my-test-setup';

How It Works
Screenshot Capture: When vibeCheck() is called, a screenshot of the target (page or element) is captured.
LLM Evaluation: The screenshot and your specification are sent to the configured LLM (OpenAI GPT-5.2 or Anthropic Claude Sonnet 4.5).
Confidence Scoring: The LLM analyzes the image and returns:
- verdict: Whether the UI matches (yes or no)
- confidence: A score from 0.0 to 1.0
- reasoning: Explanation of the decision
- suggestions: Ideas for improvement (if applicable)
Assertion: If the confidence is below the threshold or verdict is "no", the test fails with a detailed error message.
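
The Assertion step can be pictured as a small function over the returned result. The sketch below is not the library's implementation: assertVibe and the error message layout are hypothetical, and only the pass/fail rule (verdict "yes" with confidence at or above the threshold, default 0.8) comes from the description above.

```typescript
// Mirrors the VibeCheckResult shape documented in the API section.
interface VibeCheckResult {
  verdict: 'yes' | 'no';
  confidence: number;
  reasoning?: string;
  failReason?: string;
  suggestions?: string[];
  screenshotPath: string;
}

// Hypothetical helper illustrating the pass/fail rule. The real error
// format used by the library is not documented here.
function assertVibe(result: VibeCheckResult, confidenceThreshold = 0.8): void {
  const passed =
    result.verdict === 'yes' && result.confidence >= confidenceThreshold;
  if (passed) {
    return;
  }

  // Collect whatever detail the result carries into one error message.
  const lines = [
    `Vibe check failed: verdict=${result.verdict}, ` +
      `confidence=${result.confidence.toFixed(2)}, ` +
      `threshold=${confidenceThreshold}`,
  ];
  if (result.failReason) {
    lines.push(`Reason: ${result.failReason}`);
  }
  if (result.suggestions && result.suggestions.length > 0) {
    lines.push(`Suggestions: ${result.suggestions.join('; ')}`);
  }
  lines.push(`Screenshot: ${result.screenshotPath}`);
  throw new Error(lines.join('\n'));
}
```

Under this rule a confident "no" fails just like an unconfident "yes", which is why lowering confidenceThreshold loosens a check but never turns a "no" verdict into a pass.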
Provider Comparison
| Feature | OpenAI (GPT-5.2) | Anthropic (Claude Sonnet 4.5) |
|---------|------------------|-------------------------------|
| Default Model | gpt-5.2 | claude-sonnet-4-5-20250929 |
| Image Analysis | Excellent | Excellent |
| Speed | Fast | Fast |
| Cost | See OpenAI pricing | See Anthropic pricing |
Best Practices
Be Specific: Write clear, specific specifications that describe exactly what you expect to see.
Focus on Key Elements: Don't try to describe every pixel. Focus on the important visual aspects.
Use Appropriate Thresholds: Lower thresholds for rough checks, higher for critical UI elements.
Combine with Traditional Tests: Use vibe checks alongside traditional assertions for comprehensive testing.
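
One way to keep thresholds consistent across a suite is to name a few tiers once and reuse them. This is only a sketch: the tier names and the thresholdOptions helper are hypothetical, and the values simply reuse numbers that already appear in this README (0.6 from the relaxed example, the 0.8 default, 0.9 from the configureVibes example).

```typescript
type Criticality = 'rough' | 'standard' | 'critical';

// Illustrative tiers; tune these for your own suite.
const CONFIDENCE_THRESHOLDS: Record<Criticality, number> = {
  rough: 0.6,    // lenient smoke-style checks
  standard: 0.8, // matches the documented default
  critical: 0.9, // checkout forms, login, other key UI
};

// Builds the options fragment to pass as the third argument of vibeCheck.
function thresholdOptions(level: Criticality): { confidenceThreshold: number } {
  return { confidenceThreshold: CONFIDENCE_THRESHOLDS[level] };
}
```

A check on an important element would then read vibeCheck(page.locator('form'), 'A login form with email and password fields', thresholdOptions('critical')).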
License
MIT
