npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

passmark

v1.0.4

Published

The open-source AI framework for regression testing.

Downloads

319

Readme

Passmark

Passmark is an open-source AI framework for regression testing. Most AI testing tools focus on testing a single PR. Passmark is designed for continuous regression testing of applications, where the goal is to catch regressions as soon as they occur, without needing to update AI prompts or retrain models.

Passmark uses AI models to execute natural language steps via Playwright, with intelligent caching, auto-healing, and multi-model assertion verification.

Quick Start

npm init playwright@latest passmark-project # select the default options and set language to TypeScript
cd passmark-project
npm install passmark

We need at least one model from Anthropic and one from Google to use Passmark's multi-model consensus features. Set the required environment variables in .env:

ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_GENERATIVE_AI_API_KEY=AIza...

Alternatively, you can use an AI gateway like Vercel AI Gateway to route requests to multiple providers without managing individual API keys. If you choose this option, set AI_GATEWAY_API_KEY instead. Note: we support only Vercel AI Gateway at the moment, but we're planning to add support for other gateways in the future.

Set your Playwright project to read .env by adding the following to playwright.config.ts (after import { defineConfig, devices } from '@playwright/test';):

import dotenv from 'dotenv';
import path from 'path';

dotenv.config({ path: path.resolve(__dirname, '.env') });

Make sure you install dotenv by running npm install dotenv.

Now, paste the following code into tests/example.spec.ts:

import { test, expect } from "@playwright/test";
import { runSteps } from "passmark";

test.use({
  headless: !!process.env.CI,
});

test("Shopping cart tests", async ({ page }) => {
  test.setTimeout(60_000); // increase timeout for AI execution
  await runSteps({
    page,
    userFlow: "Add product to cart",
    steps: [
      { description: "Navigate to https://demo.vercel.store" },
      { description: "Click Acme Circles T-Shirt" },
      { description: "Select color", data: { value: "White" } },
      { description: "Select size", data: { value: "S" } },
      { description: "Add to cart", waitUntil: "My Cart is visible" },
    ],
    assertions: [{ assertion: "You can see My Cart with Acme Circles T-Shirt" }],
    test,
    expect
  });
});

If you are using Vercel AI Gateway, you can add the following to the above code:

import { runSteps, configure } from "passmark";

configure({
  ai: {
    gateway: "vercel" // Make sure to set AI_GATEWAY_API_KEY in your .env file
  }
});

To run the test, use:

npx playwright test example.spec.ts --project chromium

After the test completes, you can run npx playwright show-report to see a detailed report of the test execution, including an AI summary at the top, provided by Passmark.

Features

  • Core ExecutionrunSteps() and runUserFlow() for flexible test orchestration in natural language, with smart caching and auto-healing
  • Multi-Model Assertion Engine — Consensus-based validation using Claude and Gemini, with an arbiter model to resolve disagreements
  • Redis-Based Step Caching — Cache-first execution with AI fallback and automatic self-healing when cached steps fail
  • Configurable AI Models — 8 dedicated model slots for step execution, assertions, extraction, and more
  • AI Gateway Support — Route requests through Vercel AI Gateway or connect directly to provider SDKs
  • Dynamic Placeholders — Inject values at runtime with {{run.*}}, {{global.*}}, {{data.*}}, and {{email.*}} expressions for repeatable and data-driven tests
  • Email Extraction — Pluggable email provider interface with a built-in emailsink provider
  • AI-Powered Data Extraction — Extract structured values from page snapshots and URLs using AI
  • Smart Wait Conditions — AI-evaluated wait conditions with exponential backoff. No rigid selectors or time-based waits needed.
  • Secure Script Runner — AST-validated Playwright script execution with an allowlisted API surface
  • Telemetry — Optional Axiom and OpenTelemetry tracing via environment variables
  • Structured Logging — Pino-based logger with configurable log levels
  • Global Configuration — Single configure() entry point for models, gateway, email provider, and upload path

Core Functions

runSteps(options: RunStepsOptions)

Executes a sequence of steps using AI with caching. Each step is described in natural language and executed via Playwright.

await runSteps({
  page,
  userFlow: "Checkout Flow",
  steps: [
    { description: "Add item to cart" },
    { description: "Go to checkout" },
    { description: "Fill in shipping details", data: { value: "123 Main St" } },
  ],
  assertions: [{ assertion: "Order confirmation is displayed" }],
  test,
  expect,
});

runUserFlow(options: UserFlowOptions)

Runs a complete user flow as a single AI agent call. Best for exploratory testing where exact steps are flexible.

const result = await runUserFlow({
  page,
  userFlow: "Complete a purchase",
  steps: "Navigate to store, add an item, checkout with test card",
  effort: "high", // by default "low" uses gemini-3-flash for faster execution; "high" uses gemini-3.1-pro-preview for deeper thinking
});

assert(options: AssertionOptions)

Multi-model consensus assertion. Runs Claude and Gemini in parallel; if they disagree, a third model arbitrates.

const result = await assert({
  page,
  assertion: "The dashboard shows 3 active projects",
  expect,
});

Configuration

Call configure() once before using any functions:

import { configure } from "passmark";

configure({
  ai: {
    gateway: "none", // "none" (default) or "vercel"
    models: {
      stepExecution: "google/gemini-3-flash",
      utility: "google/gemini-2.5-flash",
    },
  },
  uploadBasePath: "./uploads",
});

Environment Variables

| Variable | Required | Default | Description | |----------|----------|---------|-------------| | REDIS_URL | No | - | Redis connection URL for step caching and global state | | ANTHROPIC_API_KEY | Yes | - | Anthropic API key for Claude models | | GOOGLE_GENERATIVE_AI_API_KEY | Yes | - | Google API key for Gemini models | | AI_GATEWAY_API_KEY | If gateway=vercel | - | Vercel AI Gateway API key | | AXIOM_TOKEN | No | - | Axiom token for OpenTelemetry tracing | | AXIOM_DATASET | No | - | Axiom dataset for trace storage | | PASSMARK_LOG_LEVEL | No | info | Log level: debug, info, warn, error, silent |

Model Configuration

All models are configurable via configure({ ai: { models: { ... } } }):

| Key | Default | Used For | |-----|---------|----------| | stepExecution | google/gemini-3-flash | Executing individual steps | | userFlowLow | google/gemini-3-flash-preview | User flow execution (low effort) | | userFlowHigh | google/gemini-3.1-pro-preview | User flow execution (high effort) | | assertionPrimary | anthropic/claude-4.5-haiku | Primary assertion model (Claude) | | assertionSecondary | google/gemini-3-flash | Secondary assertion model (Gemini) | | assertionArbiter | google/gemini-3.1-pro-preview | Arbiter for assertion disagreements | | utility | google/gemini-2.5-flash | Data extraction, wait conditions |

Caching

Passmark caches successful step actions in Redis. On subsequent runs, cached steps execute directly without AI calls, dramatically reducing latency and cost.

  • Steps are cached by userFlow + step.description
  • Set bypassCache: true on individual steps or the entire run to force AI execution
  • Cache is automatically bypassed on Playwright retries
  • Caching only applies to runSteps. As of now, only those AI executions that are single-step are cached as multi-step actions can vary widely and are less likely to be identical on subsequent runs. We're exploring ways to safely cache multi-step flows.

Telemetry

Telemetry is opt-in. Set AXIOM_TOKEN and AXIOM_DATASET to enable OpenTelemetry tracing via Axiom. All AI calls are wrapped with withSpan for observability.

Without these env vars, telemetry is a no-op.

Configure Axiom to get a rich dashboard like this:

Axiom Dashboard

Email Extraction

Configure an email provider for testing flows that involve email verification. By default, you can use the emailsink provider, which provides disposable email addresses and an API to fetch received emails. The free tier doesn't need any credentials, but for more reliability and flexible rate limits, you can sign up for an account and use your EMAILSINK_API_KEY. Reach out to us if you want to get an API key.

import { configure } from "passmark";
import { emailsinkProvider } from "passmark/providers/emailsink";

configure({
  email: emailsinkProvider({ apiKey: process.env.EMAILSINK_API_KEY }),
});

Or implement a custom provider:

configure({
  email: {
    domain: "my-test-mail.com",
    extractContent: async ({ email, prompt }) => {
      // Fetch and extract content from your email service
      return extractedValue;
    },
  },
});

Use in steps with the {{email.*}} placeholder pattern:

{
  description: "Enter the verification code",
  data: { value: "{{email.otp:get the 6 digit verification code:{{run.dynamicEmail}}}}" }
}

Placeholder System

Dynamic values can be injected into step data using placeholders:

| Pattern | Scope | Description | |---------|-------|-------------| | {{run.email}} | Single test | Random email (faker) | | {{run.dynamicEmail}} | Single test | Email using configured domain | | {{run.fullName}} | Single test | Random full name | | {{run.shortid}} | Single test | Short unique ID | | {{run.phoneNumber}} | Single test | Random phone number | | {{global.email}} | All tests in an execution | Shared across runSteps calls with same executionId | | {{global.dynamicEmail}} | All tests in an execution | Shared dynamic email | | {{data.key}} | Per project | Stored in Redis, managed via project settings | | {{email.type:prompt}} | Resolved lazily | Extract content from received email |

Architecture Overview

Step Request
    |
    v
[Cache Check] --hit--> [Execute Cached Action] --success--> Done
    |                          |
    miss                     fail (auto-heal)
    |                          |
    v                          v
[AI Execution] ---------> [Cache Result]
    |
    v
[Assertions] (Claude + Gemini consensus)

Known Limitations

  • Tests are not comprehensive at the moment. We welcome contributions to expand test coverage, especially around edge cases and failure modes.

Contributing

See CONTRIBUTING.md for development setup, code style, and PR workflow.

License

FSL-1.1-Apache-2.0 - Functional Source License, Version 1.1, with Apache 2.0 future license.