@mcp-testing/server-tester

Version: v0.10.2
Published: 4 hours ago

Playwright-based testing and evaluation framework for MCP servers

Vulnerabilities: 0 High, 0 Medium, 0 Low
Maintainers: scalvert, rwjblue, hjdivad
Keywords: playwright, mcp, model-context-protocol, evals, testing, llm, server-testing

@mcp-testing/server-tester

Playwright-based testing framework for MCP servers

Warning: Experimental project. This library is in active development, and APIs may change. We welcome contributions, feedback, and collaboration as the framework evolves. See CONTRIBUTING.md for details.

@mcp-testing/server-tester is a comprehensive testing and evaluation framework for Model Context Protocol (MCP) servers. It provides first-class Playwright fixtures, data-driven eval datasets, and optional LLM-as-a-judge scoring.

What's Included

This framework provides two complementary approaches for testing MCP servers:

1. Automated Testing (Playwright Tests)

Write deterministic, automated tests using standard Playwright patterns with MCP-specific fixtures. Perfect for:

Direct tool calls with expected outputs
Protocol conformance validation
Integration testing with your MCP server
CI/CD pipelines

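import { test, expect } from '@mcp-testing/server-tester/fixtures/mcp';

// Call a tool directly through the `mcp` fixture and assert on its response
test('read a file', async ({ mcp }) => {
  const result = await mcp.callTool('read_file', { path: '/tmp/test.txt' });
  expect(result.content).toContain('Hello');
});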

2. Evaluation Datasets (Evals) ⚠️ Experimental

Run deeper, more subjective analysis using dataset-driven evaluations. Includes:

Schema validation (deterministic)
Text and regex pattern matching (deterministic)
LLM-as-a-judge scoring (non-deterministic)

Note: Evals, particularly those using LLM-as-a-judge, are highly experimental due to their non-deterministic nature. Results may vary between runs, and prompts may need tuning for your specific use case.

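// Inside a test that receives the `mcp` fixture, with `dataset` and `expectations` already defined
const result = await runEvalDataset({ dataset, expectations }, { mcp });
expect(result.passed).toBe(result.total);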

Features

🎭 Playwright Integration - Use MCP servers in Playwright tests with idiomatic fixtures
📊 Matrix Evals - Run dataset-driven evaluations across multiple transports
📸 Snapshot Testing - Capture and compare deterministic responses with optional sanitizers for variable data
🤖 LLM-as-a-Judge - Optional semantic evaluation using OpenAI or Anthropic
🔌 Multiple Transports - Support for both stdio (local) and HTTP (remote) connections
✅ Protocol Conformance - Built-in checks for MCP spec compliance

Installation

npm install --save-dev @mcp-testing/server-tester @playwright/test zod

Note: Additional dependencies for LLM-as-a-judge are optional and only needed if you plan to use semantic evaluation:

# For OpenAI judge (optional)
npm install --save-dev openai @openai/agents

# For Anthropic judge (optional)
npm install --save-dev @anthropic-ai/sdk

Quick Start

Initialize with CLI

The fastest way to get started:

npx mcp-test init

# Follow the interactive prompts to create:
# - playwright.config.ts (configured for your MCP server)
# - tests/mcp.spec.ts (example tests)
# - data/example-dataset.json (sample eval dataset)
# - package.json (with all dependencies)

See the CLI Guide for all options.

Example: Testing in Action

Here's what a complete test suite looks like (following the layered testing pattern):

// tests/mcp.spec.ts
import { test, expect } from '@mcp-testing/server-tester/fixtures/mcp';
import {
  loadEvalDataset,
  runEvalDataset,
  createSchemaExpectation,
} from '@mcp-testing/server-tester';
import { z } from 'zod';

// Layer 1: MCP Protocol Conformance
test.describe('MCP Protocol Conformance', () => {
  test('should return valid server info', async ({ mcp }) => {
    const info = mcp.getServerInfo();
    expect(info).toBeTruthy();
    expect(info?.name).toBeTruthy();
    expect(info?.version).toBeTruthy();
  });

  test('should list available tools', async ({ mcp }) => {
    const tools = await mcp.listTools();
    expect(Array.isArray(tools)).toBe(true);
    expect(tools.length).toBeGreaterThan(0);
  });

  test('should handle invalid tool gracefully', async ({ mcp }) => {
    const result = await mcp.callTool('nonexistent_tool', {});
    expect(result.isError).toBe(true);
  });
});

// Layer 2: Direct Tool Testing
test.describe('File Operations', () => {
  test('should read a file', async ({ mcp }) => {
    const result = await mcp.callTool('read_file', {
      path: '/tmp/test.txt',
    });
    expect(result.content).toContain('Hello');
  });
});

// Layer 3: Eval Datasets
test('file operations eval', async ({ mcp }) => {
  const FileContentSchema = z.object({
    content: z.string(),
  });

  const dataset = await loadEvalDataset('./data/evals.json', {
    schemas: { 'file-content': FileContentSchema },
  });

  const result = await runEvalDataset(
    {
      dataset,
      expectations: {
        schema: createSchemaExpectation(dataset),
      },
    },
    { mcp }
  );

  expect(result.passed).toBe(result.total);
});

// data/evals.json
{
  "name": "file-ops",
  "cases": [
    {
      "id": "read-config",
      "toolName": "read_file",
      "args": { "path": "/tmp/config.json" },
      "expectedSchemaName": "file-content"
    },
    {
      "id": "read-readme",
      "toolName": "read_file",
      "args": { "path": "/tmp/README.md" },
      "expectedTextContains": ["# Welcome", "## Installation"]
    }
  ]
}
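
The dataset file above can be read as a small typed structure. The following is a minimal sketch of that shape, inferred only from the JSON examples in this README. The `EvalCase` and `EvalDataset` interfaces and the `findDatasetProblems` helper are hypothetical names for illustration and are not part of the library's API; the library's real types may differ.

```typescript
// Hypothetical types inferred from the JSON examples in this README;
// the library's real type definitions may differ.
interface EvalCase {
  id: string;
  toolName: string;
  args: Record<string, unknown>;
  expectedSchemaName?: string;
  expectedTextContains?: string[];
}

interface EvalDataset {
  name: string;
  cases: EvalCase[];
}

// Returns a list of human-readable problems; an empty list means the
// dataset passed these basic structural checks.
function findDatasetProblems(dataset: EvalDataset, knownSchemas: string[]): string[] {
  const problems: string[] = [];
  const seenIds = new Set<string>();
  for (const c of dataset.cases) {
    if (seenIds.has(c.id)) problems.push(`duplicate case id: ${c.id}`);
    seenIds.add(c.id);
    if (c.expectedSchemaName !== undefined && !knownSchemas.includes(c.expectedSchemaName)) {
      problems.push(`case ${c.id} references unknown schema: ${c.expectedSchemaName}`);
    }
  }
  return problems;
}

const fileOpsDataset: EvalDataset = {
  name: 'file-ops',
  cases: [
    {
      id: 'read-config',
      toolName: 'read_file',
      args: { path: '/tmp/config.json' },
      expectedSchemaName: 'file-content',
    },
    {
      id: 'read-readme',
      toolName: 'read_file',
      args: { path: '/tmp/README.md' },
      expectedTextContains: ['# Welcome', '## Installation'],
    },
  ],
};

// The schema name is registered, so no problems are reported.
console.log(findDatasetProblems(fileOpsDataset, ['file-content'])); // []
```

A check like this can catch a case that references a schema name never passed to `loadEvalDataset` before any tool is called.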

// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  projects: [
    {
      name: 'mcp-local',
      use: {
        mcpConfig: {
          transport: 'stdio',
          command: 'node',
          args: ['path/to/your/server.js'],
        },
      },
    },
  ],
});

Documentation

Quick Start Guide - Detailed setup and configuration
Expectations - All validation types (exact, schema, regex, text contains, snapshot, LLM judge)
API Reference - Complete API documentation
CLI Commands - init, generate, login, and token command details
UI Reporter - Interactive web UI for test results
Transports - Stdio vs HTTP configuration
Development - Contributing and building

Examples

The examples/ directory contains complete working examples:

Real MCP Server Tests:

filesystem-server/ - Test suite for Anthropic's Filesystem MCP server
  - Demonstrates fixturify-project for isolated test fixtures
  - Zod schema validation for JSON files
  - 5 Playwright tests, 11 eval dataset cases
sqlite-server/ - Test suite for SQLite MCP server
  - Demonstrates better-sqlite3 for database testing
  - Custom expectations for record count validation
  - 11 Playwright tests, 14 eval dataset cases

Basic Patterns:

basic-playwright-usage/ - Simple Playwright test patterns

Each example includes complete test suites, eval datasets, and npm scripts. See examples/README.md for detailed documentation.

Key Concepts

Fixtures

Access MCP servers in tests via Playwright fixtures:

mcpClient: Client - Raw MCP SDK client
mcp: MCPFixtureApi - High-level test API with helper methods

Expectations

Validate tool responses with multiple expectation types:

Exact Match - Structured JSON equality
Schema - Zod validation
Text Contains - Substring matching (great for markdown)
Regex - Pattern matching
LLM Judge - Semantic evaluation

See Expectations Guide for details.
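
To make the deterministic expectation types more concrete, here is a minimal sketch of what text-contains and regex checks could look like. The `checkTextContains` and `checkRegex` helpers and the `CheckResult` type are hypothetical and written for illustration only; they are not the library's implementations.

```typescript
// Hypothetical helpers illustrating the deterministic checks; these are not
// the library's implementations.
interface CheckResult {
  pass: boolean;
  details: string;
}

// Text Contains: every expected substring must appear in the response text.
function checkTextContains(text: string, expected: string[]): CheckResult {
  const missing = expected.filter((s) => !text.includes(s));
  return {
    pass: missing.length === 0,
    details: missing.length === 0 ? 'all substrings found' : `missing: ${missing.join(', ')}`,
  };
}

// Regex: every pattern must match somewhere in the response text.
function checkRegex(text: string, patterns: string[]): CheckResult {
  const failed = patterns.filter((p) => !new RegExp(p).test(text));
  return {
    pass: failed.length === 0,
    details: failed.length === 0 ? 'all patterns matched' : `no match for: ${failed.join(', ')}`,
  };
}

const readmeText = '# Welcome\n\nVersion 1.2.3\n\n## Installation\n';

console.log(checkTextContains(readmeText, ['# Welcome', '## Installation']).pass); // true
console.log(checkRegex(readmeText, ['Version \\d+\\.\\d+\\.\\d+', '^## Usage$']).details);
// no match for: ^## Usage$
```

Both checks return the same result for the same input, which is why they suit CI better than LLM-as-a-judge scoring.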

Transports

Connect to MCP servers via:

stdio - Local server processes
HTTP - Remote servers

See Transports Guide for configuration.

Snapshot Testing

Snapshot testing captures tool responses and compares them against stored baselines. This works best for deterministic responses like help text, configuration, or schema discovery.

Note: For responses with timestamps, IDs, or live data, use sanitizers to normalize variable content, or consider schema validation instead.

# Generate dataset with snapshot expectations
npx mcp-test generate --snapshot -o data/evals.json

# First run captures snapshots
npx playwright test

# Update snapshots when server behavior changes
npx playwright test --update-snapshots

For responses with variable data, use sanitizers:

{
  "id": "get-user",
  "toolName": "get_user",
  "args": { "id": "123" },
  "expectedSnapshot": "user-profile",
  "snapshotSanitizers": ["uuid", "iso-date", { "remove": ["lastLoginAt"] }]
}

See the Expectations Guide for when to use snapshots vs other validation methods.
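
As an illustration of what sanitizers do, the sketch below normalizes a response before it is compared against a baseline. The sanitizer names mirror the JSON example above, but the `sanitize` function, the regular expressions, and the `[UUID]` and `[ISO_DATE]` placeholders are assumptions made for this example, not the library's implementation.

```typescript
// Hypothetical sanitizer sketch; names mirror the JSON example above, but the
// behavior shown here is an assumption, not the library's implementation.
type Sanitizer = 'uuid' | 'iso-date' | { remove: string[] };

const UUID_RE = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;
const ISO_DATE_RE = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})/g;

// Recursively replaces variable strings with placeholders and drops removed keys.
function sanitize(value: unknown, sanitizers: Sanitizer[]): unknown {
  if (typeof value === 'string') {
    let out = value;
    for (const s of sanitizers) {
      if (s === 'uuid') out = out.replace(UUID_RE, '[UUID]');
      if (s === 'iso-date') out = out.replace(ISO_DATE_RE, '[ISO_DATE]');
    }
    return out;
  }
  if (Array.isArray(value)) return value.map((v) => sanitize(v, sanitizers));
  if (value !== null && typeof value === 'object') {
    const removed = new Set<string>();
    for (const s of sanitizers) {
      if (typeof s === 'object') s.remove.forEach((k) => removed.add(k));
    }
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      if (!removed.has(key)) out[key] = sanitize(source[key], sanitizers);
    }
    return out;
  }
  return value;
}

const sanitizedProfile = sanitize(
  {
    id: '3f2b8c1e-9d4a-4e6f-8a7b-1c2d3e4f5a6b',
    createdAt: '2024-01-15T10:30:00Z',
    lastLoginAt: '2024-06-01T08:00:00Z',
    name: 'Ada',
  },
  ['uuid', 'iso-date', { remove: ['lastLoginAt'] }]
);

console.log(JSON.stringify(sanitizedProfile));
// {"id":"[UUID]","createdAt":"[ISO_DATE]","name":"Ada"}
```

After sanitizing, two runs that differ only in IDs or timestamps produce identical output, so the snapshot comparison stays stable.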

CLI OAuth Authentication

For MCP servers that require OAuth authentication, the framework provides a CLI-based OAuth flow:

Interactive Login

# Authenticate with an MCP server (opens browser)
npx mcp-test login https://api.example.com/mcp

# Force re-authentication
npx mcp-test login https://api.example.com/mcp --force

Token Storage

Tokens are cached locally and automatically refreshed when expired.

Storage locations:

Linux: $XDG_STATE_HOME/mcp-tests/<server-key>/ or ~/.local/state/mcp-tests/<server-key>/
macOS: ~/.local/state/mcp-tests/<server-key>/
Windows: %LOCALAPPDATA%\mcp-tests\<server-key>\

Security:

Directory permissions: 0700 (owner only)
File permissions: 0600 (owner read/write only)
Files stored: tokens.json, client.json, server.json

Use --state-dir to override the storage location.
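
The storage locations listed above can be summarized as a small lookup. This is a sketch under stated assumptions: `resolveTokenDir` and `PathInputs` are hypothetical names, the way the library derives `<server-key>` is not described here and is therefore passed in as a parameter, and the Windows fallback used when `LOCALAPPDATA` is unset is a guess. The `--state-dir` override is intentionally left out because its exact semantics are not documented in this README.

```typescript
// Hypothetical resolver mirroring the documented locations. How the library
// derives <server-key> is not described here, so it is passed in as a parameter.
type Platform = 'linux' | 'darwin' | 'win32';

interface PathInputs {
  platform: Platform;
  homeDir: string;
  env: Record<string, string | undefined>;
  serverKey: string;
}

function resolveTokenDir(inputs: PathInputs): string {
  const { platform, homeDir, env, serverKey } = inputs;

  if (platform === 'win32') {
    // Assumption: fall back to the conventional AppData\Local path if LOCALAPPDATA is unset.
    const winBase = env.LOCALAPPDATA || `${homeDir}\\AppData\\Local`;
    return `${winBase}\\mcp-tests\\${serverKey}`;
  }

  // Per the list above, only Linux consults XDG_STATE_HOME; macOS uses ~/.local/state.
  const xdg = platform === 'linux' ? env.XDG_STATE_HOME : undefined;
  const base = xdg || `${homeDir}/.local/state`;
  return `${base}/mcp-tests/${serverKey}`;
}

console.log(
  resolveTokenDir({
    platform: 'linux',
    homeDir: '/home/ada',
    env: { XDG_STATE_HOME: '/data/state' },
    serverKey: 'example',
  })
); // /data/state/mcp-tests/example

console.log(
  resolveTokenDir({ platform: 'darwin', homeDir: '/Users/ada', env: {}, serverKey: 'example' })
); // /Users/ada/.local/state/mcp-tests/example
```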

Programmatic Usage

import { CLIOAuthClient } from '@mcp-testing/server-tester';

const client = new CLIOAuthClient({
  mcpServerUrl: 'https://api.example.com/mcp',
});

// Get a valid access token (cached, refreshed, or new)
const result = await client.getAccessToken();
console.log(`Token: ${result.accessToken}`);
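
The "cached, refreshed, or new" comment above implies a three-way decision. The sketch below shows one plausible way to express it as a pure function. The `StoredTokens` shape, the `chooseTokenAction` helper, and the 30-second expiry margin are all assumptions made for illustration; they do not describe `CLIOAuthClient` internals.

```typescript
// Hypothetical decision logic for a cached -> refreshed -> new token flow.
// The types and the expiry margin are illustrative, not CLIOAuthClient internals.
interface StoredTokens {
  accessToken: string;
  refreshToken?: string;
  expiresAt: number; // epoch milliseconds
}

type TokenAction = 'use-cached' | 'refresh' | 'login';

function chooseTokenAction(
  cached: StoredTokens | undefined,
  now: number,
  marginMs: number = 30000
): TokenAction {
  // Nothing stored yet: a full interactive login is the only option.
  if (!cached) return 'login';
  // Still valid, with a safety margin so the token does not expire mid-request.
  if (cached.expiresAt - marginMs > now) return 'use-cached';
  // Expired (or about to): refresh if possible, otherwise log in again.
  return cached.refreshToken ? 'refresh' : 'login';
}

const currentTime = 1000000;

console.log(chooseTokenAction({ accessToken: 'a', expiresAt: currentTime + 3600000 }, currentTime));
// use-cached
console.log(
  chooseTokenAction({ accessToken: 'a', refreshToken: 'r', expiresAt: currentTime + 10000 }, currentTime)
);
// refresh
console.log(chooseTokenAction({ accessToken: 'a', expiresAt: currentTime - 1 }, currentTime));
// login
```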

CI/CD Usage (GitHub Actions)

For automated testing in CI, tokens can be provided via environment variables:

# .github/workflows/mcp-tests.yml
jobs:
  test:
    runs-on: ubuntu-latest
    env:
      MCP_ACCESS_TOKEN: ${{ secrets.MCP_ACCESS_TOKEN }}
      MCP_REFRESH_TOKEN: ${{ secrets.MCP_REFRESH_TOKEN }}
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run test:playwright

To set up GitHub Actions secrets:

1. Authenticate locally: npx mcp-test login <server-url>
2. Export tokens for GitHub: npx mcp-test token <server-url> --format gh
3. Run the gh secret set commands that the previous step prints (requires GitHub CLI)

The token command supports multiple formats:

env (default) - Shell-compatible KEY=value pairs
json - JSON object for scripting
gh - Ready-to-paste GitHub CLI commands
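
To show how the three formats relate, here is a minimal sketch that renders the same tokens each way. The `formatTokens` and `shellQuote` helpers are hypothetical, and the exact output of `mcp-test token` (quoting, key names in the JSON form) may differ. The variable names `MCP_ACCESS_TOKEN` and `MCP_REFRESH_TOKEN` come from the workflow above, and `gh secret set NAME --body VALUE` is the GitHub CLI's own syntax.

```typescript
// Hypothetical formatter; the real `mcp-test token` output may differ.
type TokenFormat = 'env' | 'json' | 'gh';

interface ExportedTokens {
  accessToken: string;
  refreshToken?: string;
}

// Wrap a value in single quotes for POSIX shells, escaping embedded quotes.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

function formatTokens(tokens: ExportedTokens, format: TokenFormat): string {
  const vars: Array<[string, string]> = [['MCP_ACCESS_TOKEN', tokens.accessToken]];
  if (tokens.refreshToken) vars.push(['MCP_REFRESH_TOKEN', tokens.refreshToken]);

  switch (format) {
    case 'env':
      return vars.map(([k, v]) => `${k}=${shellQuote(v)}`).join('\n');
    case 'json': {
      const obj: Record<string, string> = {};
      for (const [k, v] of vars) obj[k] = v;
      return JSON.stringify(obj);
    }
    case 'gh':
      return vars.map(([k, v]) => `gh secret set ${k} --body ${shellQuote(v)}`).join('\n');
  }
}

const sampleTokens: ExportedTokens = { accessToken: 'abc', refreshToken: 'def' };

console.log(formatTokens(sampleTokens, 'env'));
// MCP_ACCESS_TOKEN='abc'
// MCP_REFRESH_TOKEN='def'
console.log(formatTokens(sampleTokens, 'gh'));
// gh secret set MCP_ACCESS_TOKEN --body 'abc'
// gh secret set MCP_REFRESH_TOKEN --body 'def'
```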

See the CLI Guide for details.

Alternatively, inject tokens programmatically in your test setup:

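// globalSetup.ts
import { injectTokens } from '@mcp-testing/server-tester';

// Playwright expects the global setup file to export a function
export default async function globalSetup() {
  await injectTokens('https://api.example.com/mcp', {
    accessToken: process.env.MCP_ACCESS_TOKEN!,
    tokenType: 'Bearer',
  });
}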

UI Reporter

Interactive web UI for visualizing test results:


Add to your playwright.config.ts:

import { defineConfig } from '@playwright/test';

export default defineConfig({
  reporter: [['list'], ['@mcp-testing/server-tester/reporters/mcpReporter']],
});

See UI Reporter Guide for features and usage.

Support

Documentation: See docs/ directory
Examples: See examples/ directory
Issues: GitHub Issues

License

MIT

Contributing

Contributions welcome! See Development Guide for setup instructions.

Credits

Built with:
