@mcp-testing/server-tester


Playwright-based testing framework for MCP servers

[!WARNING] Experimental Project - This library is in active development. APIs may change, and we welcome contributions, feedback, and collaboration as we evolve the framework. See CONTRIBUTING.md for details.

@mcp-testing/server-tester is a comprehensive testing and evaluation framework for Model Context Protocol (MCP) servers. It provides first-class Playwright fixtures, data-driven eval datasets, and optional LLM-as-a-judge scoring.

What's Included

This framework provides two complementary approaches for testing MCP servers:

1. Automated Testing (Playwright Tests)

Write deterministic, automated tests using standard Playwright patterns with MCP-specific fixtures. Perfect for:

  • Direct tool calls with expected outputs
  • Protocol conformance validation
  • Integration testing with your MCP server
  • CI/CD pipelines

import { test, expect } from '@mcp-testing/server-tester/fixtures/mcp';

test('read a file', async ({ mcp }) => {
  const result = await mcp.callTool('read_file', { path: '/tmp/test.txt' });
  expect(result.content).toContain('Hello');
});

2. Evaluation Datasets (Evals) ⚠️ Experimental

Run deeper, more subjective analysis using dataset-driven evaluations. Includes:

  • Schema validation (deterministic)
  • Text and regex pattern matching (deterministic)
  • LLM-as-a-judge scoring (non-deterministic)

Note: Evals, particularly those using LLM-as-a-judge, are highly experimental due to their non-deterministic nature. Results may vary between runs, and prompts may need tuning for your specific use case.

const result = await runEvalDataset({ dataset, expectations }, { mcp });
expect(result.passed).toBe(result.total);

Features

  • 🎭 Playwright Integration - Use MCP servers in Playwright tests with idiomatic fixtures
  • 📊 Matrix Evals - Run dataset-driven evaluations across multiple transports
  • 📸 Snapshot Testing - Capture and compare deterministic responses with optional sanitizers for variable data
  • 🤖 LLM-as-a-Judge - Optional semantic evaluation using OpenAI or Anthropic
  • 🔌 Multiple Transports - Support for both stdio (local) and HTTP (remote) connections
  • Protocol Conformance - Built-in checks for MCP spec compliance

Installation

npm install --save-dev @mcp-testing/server-tester @playwright/test zod

Note: Additional dependencies for LLM-as-a-judge are optional and only needed if you plan to use semantic evaluation:

# For OpenAI judge (optional)
npm install --save-dev openai @openai/agents

# For Anthropic judge (optional)
npm install --save-dev @anthropic-ai/sdk

Quick Start

Initialize with CLI

The fastest way to get started:

npx mcp-test init

# Follow the interactive prompts to create:
# - playwright.config.ts (configured for your MCP server)
# - tests/mcp.spec.ts (example tests)
# - data/example-dataset.json (sample eval dataset)
# - package.json (with all dependencies)

See the CLI Guide for all options.

Example: Testing in Action

Here's what a complete test suite looks like (following the layered testing pattern):

// tests/mcp.spec.ts
import { test, expect } from '@mcp-testing/server-tester/fixtures/mcp';
import {
  loadEvalDataset,
  runEvalDataset,
  createSchemaExpectation,
} from '@mcp-testing/server-tester';
import { z } from 'zod';

// Layer 1: MCP Protocol Conformance
test.describe('MCP Protocol Conformance', () => {
  test('should return valid server info', async ({ mcp }) => {
    const info = mcp.getServerInfo();
    expect(info).toBeTruthy();
    expect(info?.name).toBeTruthy();
    expect(info?.version).toBeTruthy();
  });

  test('should list available tools', async ({ mcp }) => {
    const tools = await mcp.listTools();
    expect(Array.isArray(tools)).toBe(true);
    expect(tools.length).toBeGreaterThan(0);
  });

  test('should handle invalid tool gracefully', async ({ mcp }) => {
    const result = await mcp.callTool('nonexistent_tool', {});
    expect(result.isError).toBe(true);
  });
});

// Layer 2: Direct Tool Testing
test.describe('File Operations', () => {
  test('should read a file', async ({ mcp }) => {
    const result = await mcp.callTool('read_file', {
      path: '/tmp/test.txt',
    });
    expect(result.content).toContain('Hello');
  });
});

// Layer 3: Eval Datasets
test('file operations eval', async ({ mcp }) => {
  const FileContentSchema = z.object({
    content: z.string(),
  });

  const dataset = await loadEvalDataset('./data/evals.json', {
    schemas: { 'file-content': FileContentSchema },
  });

  const result = await runEvalDataset(
    {
      dataset,
      expectations: {
        schema: createSchemaExpectation(dataset),
      },
    },
    { mcp }
  );

  expect(result.passed).toBe(result.total);
});

// data/evals.json
{
  "name": "file-ops",
  "cases": [
    {
      "id": "read-config",
      "toolName": "read_file",
      "args": { "path": "/tmp/config.json" },
      "expectedSchemaName": "file-content"
    },
    {
      "id": "read-readme",
      "toolName": "read_file",
      "args": { "path": "/tmp/README.md" },
      "expectedTextContains": ["# Welcome", "## Installation"]
    }
  ]
}

// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  projects: [
    {
      name: 'mcp-local',
      use: {
        mcpConfig: {
          transport: 'stdio',
          command: 'node',
          args: ['path/to/your/server.js'],
        },
      },
    },
  ],
});

Documentation

Examples

The examples/ directory contains complete working examples:

Real MCP Server Tests:

  • filesystem-server/ - Test suite for Anthropic's Filesystem MCP server

    • Demonstrates fixturify-project for isolated test fixtures
    • Zod schema validation for JSON files
    • 5 Playwright tests, 11 eval dataset cases
  • sqlite-server/ - Test suite for SQLite MCP server

    • Demonstrates better-sqlite3 for database testing
    • Custom expectations for record count validation
    • 11 Playwright tests, 14 eval dataset cases

Basic Patterns:

Each example includes complete test suites, eval datasets, and npm scripts. See examples/README.md for detailed documentation.

Key Concepts

Fixtures

Access MCP servers in tests via Playwright fixtures:

  • mcpClient: Client - Raw MCP SDK client
  • mcp: MCPFixtureApi - High-level test API with helper methods
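
For illustration, here is a minimal sketch that uses both fixtures side by side. The mcp helpers are the ones shown elsewhere in this README; the shape of the raw client's response follows the MCP SDK and may differ in your version.

// Sketch: the high-level fixture next to the raw MCP SDK client.
import { test, expect } from '@mcp-testing/server-tester/fixtures/mcp';

test('fixtures overview', async ({ mcp, mcpClient }) => {
  // High-level helper API
  const tools = await mcp.listTools();
  expect(tools.length).toBeGreaterThan(0);

  // Raw SDK client for protocol-level access (response shape per the MCP SDK)
  const raw = await mcpClient.listTools();
  expect(raw.tools.length).toBe(tools.length);
});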

Expectations

Validate tool responses with multiple expectation types:

  • Exact Match - Structured JSON equality
  • Schema - Zod validation
  • Text Contains - Substring matching (great for markdown)
  • Regex - Pattern matching
  • LLM Judge - Semantic evaluation

See Expectations Guide for details.

Transports

Connect to MCP servers via:

  • stdio - Local server processes
  • HTTP - Remote servers

See Transports Guide for configuration.
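
By analogy with the stdio project in the Quick Start config above, an HTTP project might look like the sketch below. The url option name is an assumption; check the Transports Guide for the exact keys.

// playwright.config.ts (sketch): an HTTP-transport project; option names assumed
import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  projects: [
    {
      name: 'mcp-remote',
      use: {
        mcpConfig: {
          transport: 'http',
          url: 'https://api.example.com/mcp',
        },
      },
    },
  ],
});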

Snapshot Testing

Snapshot testing captures tool responses and compares them against stored baselines. This works best for deterministic responses like help text, configuration, or schema discovery.

Note: For responses with timestamps, IDs, or live data, use sanitizers to normalize variable content, or consider schema validation instead.

# Generate dataset with snapshot expectations
npx mcp-test generate --snapshot -o data/evals.json

# First run captures snapshots
npx playwright test

# Update snapshots when server behavior changes
npx playwright test --update-snapshots

For responses with variable data, use sanitizers:

{
  "id": "get-user",
  "toolName": "get_user",
  "args": { "id": "123" },
  "expectedSnapshot": "user-profile",
  "snapshotSanitizers": ["uuid", "iso-date", { "remove": ["lastLoginAt"] }]
}

See the Expectations Guide for when to use snapshots vs other validation methods.

CLI OAuth Authentication

For MCP servers that require OAuth authentication, the framework provides a CLI-based OAuth flow:

Interactive Login

# Authenticate with an MCP server (opens browser)
npx mcp-test login https://api.example.com/mcp

# Force re-authentication
npx mcp-test login https://api.example.com/mcp --force

Token Storage

Tokens are cached locally and automatically refreshed when expired.

Storage locations:

  • Linux: $XDG_STATE_HOME/mcp-tests/<server-key>/ or ~/.local/state/mcp-tests/<server-key>/
  • macOS: ~/.local/state/mcp-tests/<server-key>/
  • Windows: %LOCALAPPDATA%\mcp-tests\<server-key>\

Security:

  • Directory permissions: 0700 (owner only)
  • File permissions: 0600 (owner read/write only)
  • Files stored: tokens.json, client.json, server.json

Use --state-dir to override the storage location.
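
For example (the path is illustrative):

npx mcp-test login https://api.example.com/mcp --state-dir ./.mcp-auth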

Programmatic Usage

import { CLIOAuthClient } from '@mcp-testing/server-tester';

const client = new CLIOAuthClient({
  mcpServerUrl: 'https://api.example.com/mcp',
});

// Get a valid access token (cached, refreshed, or new)
const result = await client.getAccessToken();
console.log(`Token: ${result.accessToken}`);

CI/CD Usage (GitHub Actions)

For automated testing in CI, tokens can be provided via environment variables:

# .github/workflows/mcp-tests.yml
name: MCP Tests
on: [push, pull_request] # trigger events are illustrative
jobs:
  test:
    runs-on: ubuntu-latest
    env:
      MCP_ACCESS_TOKEN: ${{ secrets.MCP_ACCESS_TOKEN }}
      MCP_REFRESH_TOKEN: ${{ secrets.MCP_REFRESH_TOKEN }}
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run test:playwright

To set up GitHub Actions secrets:

  1. Authenticate locally: npx mcp-test login <server-url>
  2. Export tokens for GitHub: npx mcp-test token <server-url> --format gh
  3. Run the output gh secret set commands (requires GitHub CLI)

The token command supports multiple formats:

  • env (default) - Shell-compatible KEY=value pairs
  • json - JSON object for scripting
  • gh - Ready-to-paste GitHub CLI commands
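
For example (invocations assembled from the options above; output omitted):

npx mcp-test token https://api.example.com/mcp                # env (default)
npx mcp-test token https://api.example.com/mcp --format json  # JSON for scripting
npx mcp-test token https://api.example.com/mcp --format gh    # gh secret set commands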

See the CLI Guide for details.

Alternatively, inject tokens programmatically in your test setup:

import { injectTokens } from '@mcp-testing/server-tester';

// In globalSetup.ts
await injectTokens('https://api.example.com/mcp', {
  accessToken: process.env.MCP_ACCESS_TOKEN!,
  tokenType: 'Bearer',
});

UI Reporter

Interactive web UI for visualizing test results:

(Screenshot: MCP Test Reporter UI)

Add to your playwright.config.ts:

export default defineConfig({
  reporter: [['list'], ['@mcp-testing/server-tester/reporters/mcpReporter']],
});

See UI Reporter Guide for features and usage.

Support

License

MIT

Contributing

Contributions welcome! See Development Guide for setup instructions.

Credits

Built with: