@thebylito/checkmate v0.3.0

AI-driven e2e test automation framework built on Playwright Test and OpenAI API
# checkmate
AI test automation that actually works. Write tests in plain English, without locators, and with less code.
## Why?
Spending countless hours building and maintaining E2E tests that look like this?

```typescript
await page.goto('https://www.google.com')
const searchBox = page.getByRole('combobox', { name: 'Search', exact: true })
await searchBox.fill('playwright test automation')
await searchBox.press('Enter')
await expect(page.getByRole('link', { name: 'playwright' })
    .filter({ hasText: 'playwright.dev' })
    .first(), 'playwright.dev link should be visible')
    .toBeVisible({ timeout: 30 * 1000 })
```

Try checkmate!
```typescript
await ai.run({
    action: `
        Navigate to google.com
        Type 'playwright test automation' in the search bar
        Press Enter key`,
    expect: `
        Search results contain the playwright.dev link`,
})
```

## What You Get
- ✅ Zero Locators - Write tests in plain English
- ✅ Self-Healing - Tests adapt to UI changes automatically
- ✅ Any Provider - Gemini, Claude, Groq, GPT, xAI, or local models
- ✅ Web & Salesforce - Basic support out of the box
- ✅ Cost Optimized - Built-in token management and budgeting
- ✅ Full Playwright - Reports, traces, debugging - all included
## Get Started in 5 Minutes

### Prerequisites
### 1. Install

```shell
git clone https://github.com/dawiddiwad/checkmate.git
cd checkmate
npm run checkmate:install
```

### 2. Configure .env

Using an OpenAI API key and default settings:

```
OPENAI_API_KEY=#your_api_key_here
```

For other providers, set the base URL and model:

```
OPENAI_BASE_URL=https://api.groq.com/openai/v1
OPENAI_MODEL=openai/gpt-oss-20b
```

### 3. Run Tests

```shell
npm run test:web:example
```

### 4. View Report

```shell
npm run show:report
```

## Writing Tests
Import the `test` from `./test/fixtures/checkmate` and use the `ai` fixture to run AI-driven tests alongside standard Playwright features. checkmate tests are written in natural language by specifying `action` and `expect`:
```typescript
import { test } from '../../fixtures/checkmate'

test('google search', async ({ ai }) => {
    await ai.run({
        action: `
            Open the browser and navigate to google.com.
            Type 'playwright test automation' in the search bar.
            Press Enter key.`,
        expect: `
            Search results contain the 'playwright.dev' link`,
    })
})
```

That's it. No page objects, no selectors, no locators. Peace on Earth.
Browser settings (viewport, headless mode, video recording, timeouts, etc.) are configured in playwright.config.ts using Playwright's standard configuration mechanism.
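A minimal sketch of such a config (the values below are illustrative defaults, not checkmate's shipped configuration):

```typescript
// playwright.config.ts - illustrative values only, not checkmate's shipped defaults
import { defineConfig } from '@playwright/test'

export default defineConfig({
    timeout: 120_000, // generous per-test timeout: AI steps add model round-trips
    use: {
        headless: true,
        viewport: { width: 1280, height: 720 },
        video: 'retain-on-failure', // keep recordings only for failing tests
        trace: 'on-first-retry',
    },
})
```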
## Programmatic API
If you use checkmate without the fixture wrapper, the public entry point is `CheckmateRunner`:

```typescript
import { CheckmateRunner } from 'checkmate-exp'

const ai = new CheckmateRunner(page)
await ai.run({
    action: 'Open the pricing page',
    expect: 'Pricing details are visible',
})
```

See guide for detailed examples and best practices.
## Costs

Costs vary with the model and provider, test complexity, and the number of steps. checkmate includes built-in token usage monitoring.

Cost estimates with `gpt-oss-20b` hosted on groq.com, which offers an optimal balance of cost and performance:
- Simple test (~5 steps): ~$0.001 - $0.01
- Complex test (~20 steps): ~$0.01 - $0.05
- Full E2E suite (~50 complex tests): ~$1.00 - $2.00
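The suite figure can be sanity-checked with quick arithmetic (the midpoint cost below is an assumption derived from the ranges above, not a measured value):

```typescript
// Back-of-envelope: 50 complex tests at the midpoint of the ~$0.01 - $0.05 range.
// The midpoint is an assumed average, not a measured per-test cost.
const costPerComplexTest = 0.03 // assumed midpoint, USD
const suiteSize = 50
const estimate = costPerComplexTest * suiteSize

console.log(`Estimated suite cost: ~$${estimate.toFixed(2)}`) // ~$1.50
```

Actual spend depends on snapshot size and retries, so treat this as an order-of-magnitude check against the built-in token monitoring, not a budget.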
See guide for detailed cost control and monitoring options.
## Common Issues

### AI makes incorrect decisions

- Provide precise descriptions in `action` and more focused assertions in `expect`
- Reference specific element identifiers and roles (for example: text, label, button, list)
- Break complex workflows into single-action steps; use a step-by-step approach
### Tests loop during step execution

- Increase `OPENAI_TEMPERATURE` to encourage exploration
- Use a reasoning/thinking model (if available) to improve planning and avoid repetitive loops
### High token costs

- Enable snapshot filtering with `CHECKMATE_SNAPSHOT_FILTERING=true` to automatically score and narrow elements based on `action` and `expect`. Use `topPercent` to control how much of the scored snapshot to keep for a step.
- Set a lower reasoning effort with `OPENAI_REASONING_EFFORT`
- Consider disabling `OPENAI_INCLUDE_SCREENSHOT_IN_SNAPSHOT`
- Use a cheaper model; lower-end models often perform well (e.g., `gpt-5.4-nano` or `gpt-oss-20b`)
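To illustrate the `topPercent` idea, here is a hypothetical sketch of score-based snapshot narrowing (the element shape, scores, and function name are invented for illustration; checkmate's actual filtering logic may differ):

```typescript
// Hypothetical sketch: keep only the top-scoring share of snapshot elements.
// Element shape and scores are made up for illustration.
type ScoredElement = { selectorHint: string; score: number }

function keepTopPercent(elements: ScoredElement[], topPercent: number): ScoredElement[] {
    const sorted = [...elements].sort((a, b) => b.score - a.score)
    const keep = Math.max(1, Math.ceil(sorted.length * (topPercent / 100)))
    return sorted.slice(0, keep)
}

const snapshot: ScoredElement[] = [
    { selectorHint: 'combobox "Search"', score: 0.92 },
    { selectorHint: 'link "playwright.dev"', score: 0.88 },
    { selectorHint: 'contentinfo "Privacy"', score: 0.10 },
    { selectorHint: 'img "Logo"', score: 0.05 },
]

// Keeping the top 50% roughly halves the element context sent to the model.
console.log(keepTopPercent(snapshot, 50).map(e => e.selectorHint))
```

Lower `topPercent` values trade recall for token savings: irrelevant footer and decorative elements drop out first, but overly aggressive filtering can hide the element a step actually needs.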
See guide for detailed configuration options and troubleshooting tips.
## FAQ

### Which models work best?

You can use any model trained for tool use. Here are the best picks based on extensive testing:

- Highly recommended: `gpt-oss-20b` hosted on groq.com. Groq's infrastructure is optimized for minimal latency and fast inference, making it ideal for E2E test automation.
- Google's `gemini-2.5-flash` offers an excellent balance of cost and performance if you prefer major cloud providers.
- OpenAI's `gpt-5-mini`, `gpt-5.4-nano`, and xAI's `grok-4-1-fast-reasoning` also work well and keep costs relatively low.
### Can I use local models?

Yes - checkmate works with any OpenAI-compatible API, including local models via LM Studio, Ollama, or llama.cpp. I recommend `qwen3.5-4b`. It is fast (≈100 tokens/sec on an RTX 3060 Ti; ≈40 tokens/sec on Apple M3) and performs surprisingly well for E2E testing.
### Does it work with CI/CD?
Absolutely. Use checkmate as part of your existing Playwright Test suites in any CI/CD pipeline. You can mix AI‑driven steps and traditional tests as needed.
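For example, a pipeline job might simply run the scripts from this README (a sketch only; the `CI_OPENAI_KEY` secret name is an assumption, and steps should be adapted to your runner):

```shell
# Hypothetical CI job steps - adapt secret names and reporting to your pipeline
npm run checkmate:install                                 # install dependencies (script from this README)
OPENAI_API_KEY="$CI_OPENAI_KEY" npm run test:web:example  # run the example suite
# then publish playwright-report/ as a build artifact for the HTML report
```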
### Is this production-ready?
It depends. If you can accept some non‑deterministic behavior and leverage LLMs' randomness to help address the pesticide paradox, checkmate can be production-ready. In many cases, the maintenance savings, faster development, and benefits of non‑linear execution outweigh occasional hiccups.
If you require 100% deterministic tests at all times, traditional Playwright remains the better choice.
### Best part?
You can mix both approaches within the same test suite, combining AI-driven and traditional tests as needed:

```typescript
// traditional playwright actions:
await page.goto('https://www.google.com')
const searchBox = page.getByRole('combobox', { name: 'Search', exact: true })
await searchBox.fill('playwright test automation')
await searchBox.press('Enter')

// ai-driven actions and assertions:
await ai.run({
    action: 'Click on the link that leads to playwright.dev',
    expect: 'The playwright.dev homepage is displayed',
})
```

## Documentation
## Contributing
I'd love your help! Key areas:
- Additional tool integrations (API testing, Salesforce, etc.)
- Further cost optimization techniques
- Context and prompt engineering improvements
- Error handling and recovery
See roadmap for future plans and development
## License

MIT. See the license file for details.
## Why I built this?
Test automation shouldn't require a PhD in XPath. This project explores how AI can make it accessible to anyone.
Less coding, more testing.
Built with ❤️ by Dawid Dobrowolski
