@mcp-testing/server-tester
v0.10.2
Playwright-based testing and evaluation framework for MCP servers
[!WARNING] Experimental Project - This library is in active development. APIs may change, and we welcome contributions, feedback, and collaboration as we evolve the framework. See CONTRIBUTING.md for details.
@mcp-testing/server-tester is a comprehensive testing and evaluation framework for Model Context Protocol (MCP) servers. It provides first-class Playwright fixtures, data-driven eval datasets, and optional LLM-as-a-judge scoring.
What's Included
This framework provides two complementary approaches for testing MCP servers:
1. Automated Testing (Playwright Tests)
Write deterministic, automated tests using standard Playwright patterns with MCP-specific fixtures. Perfect for:
- Direct tool calls with expected outputs
- Protocol conformance validation
- Integration testing with your MCP server
- CI/CD pipelines
test('read a file', async ({ mcp }) => {
  const result = await mcp.callTool('read_file', { path: '/tmp/test.txt' });
  expect(result.content).toContain('Hello');
});
2. Evaluation Datasets (Evals) ⚠️ Experimental
Run deeper, more subjective analysis using dataset-driven evaluations. Includes:
- Schema validation (deterministic)
- Text and regex pattern matching (deterministic)
- LLM-as-a-judge scoring (non-deterministic)
Note: Evals, particularly those using LLM-as-a-judge, are highly experimental due to their non-deterministic nature. Results may vary between runs, and prompts may need tuning for your specific use case.
const result = await runEvalDataset({ dataset, expectations }, { mcp });
expect(result.passed).toBe(result.total);
Features
- 🎭 Playwright Integration - Use MCP servers in Playwright tests with idiomatic fixtures
- 📊 Matrix Evals - Run dataset-driven evaluations across multiple transports
- 📸 Snapshot Testing - Capture and compare deterministic responses with optional sanitizers for variable data
- 🤖 LLM-as-a-Judge - Optional semantic evaluation using OpenAI or Anthropic
- 🔌 Multiple Transports - Support for both stdio (local) and HTTP (remote) connections
- ✅ Protocol Conformance - Built-in checks for MCP spec compliance
Installation
npm install --save-dev @mcp-testing/server-tester @playwright/test zod
Note: Additional dependencies for LLM-as-a-judge are optional and only needed if you plan to use semantic evaluation:
# For OpenAI judge (optional)
npm install --save-dev openai @openai/agents
# For Anthropic judge (optional)
npm install --save-dev @anthropic-ai/sdk
Quick Start
Initialize with CLI
The fastest way to get started:
npx mcp-test init
# Follow the interactive prompts to create:
# - playwright.config.ts (configured for your MCP server)
# - tests/mcp.spec.ts (example tests)
# - data/example-dataset.json (sample eval dataset)
# - package.json (with all dependencies)
See the CLI Guide for all options.
Example: Testing in Action
Here's what a complete test suite looks like (following the layered testing pattern):
// tests/mcp.spec.ts
import { test, expect } from '@mcp-testing/server-tester/fixtures/mcp';
import {
  loadEvalDataset,
  runEvalDataset,
  createSchemaExpectation,
} from '@mcp-testing/server-tester';
import { z } from 'zod';
// Layer 1: MCP Protocol Conformance
test.describe('MCP Protocol Conformance', () => {
  test('should return valid server info', async ({ mcp }) => {
    const info = mcp.getServerInfo();
    expect(info).toBeTruthy();
    expect(info?.name).toBeTruthy();
    expect(info?.version).toBeTruthy();
  });
  test('should list available tools', async ({ mcp }) => {
    const tools = await mcp.listTools();
    expect(Array.isArray(tools)).toBe(true);
    expect(tools.length).toBeGreaterThan(0);
  });
  test('should handle invalid tool gracefully', async ({ mcp }) => {
    const result = await mcp.callTool('nonexistent_tool', {});
    expect(result.isError).toBe(true);
  });
});
// Layer 2: Direct Tool Testing
test.describe('File Operations', () => {
  test('should read a file', async ({ mcp }) => {
    const result = await mcp.callTool('read_file', {
      path: '/tmp/test.txt',
    });
    expect(result.content).toContain('Hello');
  });
});
// Layer 3: Eval Datasets
test('file operations eval', async ({ mcp }) => {
  const FileContentSchema = z.object({
    content: z.string(),
  });
  const dataset = await loadEvalDataset('./data/evals.json', {
    schemas: { 'file-content': FileContentSchema },
  });
  const result = await runEvalDataset(
    {
      dataset,
      expectations: {
        schema: createSchemaExpectation(dataset),
      },
    },
    { mcp }
  );
  expect(result.passed).toBe(result.total);
});
// data/evals.json
{
  "name": "file-ops",
  "cases": [
    {
      "id": "read-config",
      "toolName": "read_file",
      "args": { "path": "/tmp/config.json" },
      "expectedSchemaName": "file-content"
    },
    {
      "id": "read-readme",
      "toolName": "read_file",
      "args": { "path": "/tmp/README.md" },
      "expectedTextContains": ["# Welcome", "## Installation"]
    }
  ]
}
// playwright.config.ts
import { defineConfig } from '@playwright/test';
export default defineConfig({
  testDir: './tests',
  projects: [
    {
      name: 'mcp-local',
      use: {
        mcpConfig: {
          transport: 'stdio',
          command: 'node',
          args: ['path/to/your/server.js'],
        },
      },
    },
  ],
});
Documentation
- Quick Start Guide - Detailed setup and configuration
- Expectations - All validation types (exact, schema, regex, text contains, snapshot, LLM judge)
- API Reference - Complete API documentation
- CLI Commands - init, generate, login, and token command details
- UI Reporter - Interactive web UI for test results
- Transports - Stdio vs HTTP configuration
- Development - Contributing and building
Examples
The examples/ directory contains complete working examples:
Real MCP Server Tests:
- filesystem-server/ - Test suite for Anthropic's Filesystem MCP server
  - Demonstrates fixturify-project for isolated test fixtures
  - Zod schema validation for JSON files
  - 5 Playwright tests, 11 eval dataset cases
- sqlite-server/ - Test suite for SQLite MCP server
  - Demonstrates better-sqlite3 for database testing
  - Custom expectations for record count validation
  - 11 Playwright tests, 14 eval dataset cases
Basic Patterns:
- basic-playwright-usage/ - Simple Playwright test patterns
Each example includes complete test suites, eval datasets, and npm scripts. See examples/README.md for detailed documentation.
Key Concepts
Fixtures
Access MCP servers in tests via Playwright fixtures:
- mcpClient: Client - Raw MCP SDK client
- mcp: MCPFixtureApi - High-level test API with helper methods
Expectations
Validate tool responses with multiple expectation types:
- Exact Match - Structured JSON equality
- Schema - Zod validation
- Text Contains - Substring matching (great for markdown)
- Regex - Pattern matching
- LLM Judge - Semantic evaluation
See Expectations Guide for details.
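To make the deterministic expectation types more concrete, here is a minimal, self-contained sketch of how text-contains and regex checks could work. The names checkTextContains, checkRegex, and CheckResult are hypothetical and chosen for illustration; this is not the library's implementation, only an approximation of the idea behind fields such as expectedTextContains.

```typescript
// Illustrative sketch only: these helpers are hypothetical and are NOT part of
// the @mcp-testing/server-tester API.

interface CheckResult {
  pass: boolean;
  details: string;
}

// Text Contains: every expected substring must appear in the response text.
function checkTextContains(text: string, expected: string[]): CheckResult {
  const missing = expected.filter((needle) => !text.includes(needle));
  return {
    pass: missing.length === 0,
    details:
      missing.length === 0
        ? 'all substrings found'
        : `missing: ${missing.join(', ')}`,
  };
}

// Regex: every pattern must match somewhere in the response text.
// An invalid pattern string makes `new RegExp` throw.
function checkRegex(text: string, patterns: string[]): CheckResult {
  const failed = patterns.filter((p) => !new RegExp(p).test(text));
  return {
    pass: failed.length === 0,
    details:
      failed.length === 0
        ? 'all patterns matched'
        : `no match for: ${failed.join(', ')}`,
  };
}

// Example: a markdown response checked both ways.
const readme = '# Welcome\n\nSome intro.\n\n## Installation\n';
console.log(checkTextContains(readme, ['# Welcome', '## Installation']));
console.log(checkRegex(readme, ['##\\s+Installation']));
```

Both checks are pure string operations, so the same input always gives the same result. That is why they suit CI better than LLM-judge scoring.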
Transports
Connect to MCP servers via:
- stdio - Local server processes
- HTTP - Remote servers
See Transports Guide for configuration.
Snapshot Testing
Snapshot testing captures tool responses and compares them against stored baselines. This works best for deterministic responses like help text, configuration, or schema discovery.
Note: For responses with timestamps, IDs, or live data, use sanitizers to normalize variable content, or consider schema validation instead.
# Generate dataset with snapshot expectations
npx mcp-test generate --snapshot -o data/evals.json
# First run captures snapshots
npx playwright test
# Update snapshots when server behavior changes
npx playwright test --update-snapshots
For responses with variable data, use sanitizers:
{
  "id": "get-user",
  "toolName": "get_user",
  "args": { "id": "123" },
  "expectedSnapshot": "user-profile",
  "snapshotSanitizers": ["uuid", "iso-date", { "remove": ["lastLoginAt"] }]
}
See the Expectations Guide for when to use snapshots vs other validation methods.
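The sketch below illustrates what sanitizers like the ones in the snapshotSanitizers example might do conceptually before a response is compared against its baseline. It is an assumption-laden illustration, not the library's code: the sanitize function, the Sanitizer type, the regular expressions, and the [UUID] and [ISO_DATE] placeholder strings are all hypothetical.

```typescript
// Illustrative sketch only: the real sanitizers in @mcp-testing/server-tester
// may use different patterns and placeholders.

type Sanitizer = 'uuid' | 'iso-date' | { remove: string[] };

const UUID_RE =
  /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;
const ISO_DATE_RE =
  /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})/g;

// Recursively normalize variable data so two runs produce the same snapshot.
function sanitize(value: unknown, sanitizers: Sanitizer[]): unknown {
  if (typeof value === 'string') {
    let out = value;
    for (const s of sanitizers) {
      if (s === 'uuid') out = out.replace(UUID_RE, '[UUID]');
      else if (s === 'iso-date') out = out.replace(ISO_DATE_RE, '[ISO_DATE]');
    }
    return out;
  }
  if (Array.isArray(value)) {
    return value.map((item) => sanitize(item, sanitizers));
  }
  if (value !== null && typeof value === 'object') {
    // Collect every field name listed in a { remove: [...] } sanitizer.
    const removed = new Set<string>();
    for (const s of sanitizers) {
      if (typeof s === 'object') s.remove.forEach((key) => removed.add(key));
    }
    const source = value as Record<string, unknown>;
    const result: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      if (!removed.has(key)) result[key] = sanitize(source[key], sanitizers);
    }
    return result;
  }
  return value; // numbers, booleans, null, undefined pass through unchanged
}

// Example with made-up data matching the get-user case above.
const response = {
  id: '123e4567-e89b-12d3-a456-426614174000',
  createdAt: '2024-01-02T03:04:05.000Z',
  lastLoginAt: '2024-05-06T07:08:09Z',
  name: 'Ada',
};
console.log(sanitize(response, ['uuid', 'iso-date', { remove: ['lastLoginAt'] }]));
```

Replacing values with fixed placeholders keeps the shape of the response in the snapshot, while removing a field drops it from the comparison entirely.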
CLI OAuth Authentication
For MCP servers that require OAuth authentication, the framework provides a CLI-based OAuth flow:
Interactive Login
# Authenticate with an MCP server (opens browser)
npx mcp-test login https://api.example.com/mcp
# Force re-authentication
npx mcp-test login https://api.example.com/mcp --force
Token Storage
Tokens are cached locally and automatically refreshed when expired.
Storage locations:
- Linux: $XDG_STATE_HOME/mcp-tests/<server-key>/ or ~/.local/state/mcp-tests/<server-key>/
- macOS: ~/.local/state/mcp-tests/<server-key>/
- Windows: %LOCALAPPDATA%\mcp-tests\<server-key>\
Security:
- Directory permissions: 0700 (owner only)
- File permissions: 0600 (owner read/write only)
- Files stored: tokens.json, client.json, server.json
Use --state-dir to override the storage location.
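The storage locations listed above can be read as a small resolution rule. The following sketch encodes that rule as a pure function so it is easy to reason about. resolveStateDir is a hypothetical helper, not a library export. The Windows fallback when LOCALAPPDATA is unset is an assumption, and the server key is treated as an opaque string because the README does not say how it is derived.

```typescript
// Illustrative sketch only: resolveStateDir is hypothetical and mirrors the
// documented locations, not the library's actual code.

interface EnvLike {
  [key: string]: string | undefined;
}

function resolveStateDir(
  platform: string, // e.g. 'linux', 'darwin', 'win32' (values of process.platform)
  env: EnvLike,
  homeDir: string,
  serverKey: string // opaque; how the real key is derived is not documented here
): string {
  if (platform === 'win32') {
    // Assumption: fall back to <home>\AppData\Local when LOCALAPPDATA is unset.
    const base = env.LOCALAPPDATA ?? `${homeDir}\\AppData\\Local`;
    return `${base}\\mcp-tests\\${serverKey}`;
  }
  // Per the list above, only Linux consults XDG_STATE_HOME.
  const base =
    platform === 'linux' && env.XDG_STATE_HOME
      ? env.XDG_STATE_HOME
      : `${homeDir}/.local/state`;
  return `${base}/mcp-tests/${serverKey}`;
}

// Example calls with made-up values.
console.log(resolveStateDir('linux', { XDG_STATE_HOME: '/xdg' }, '/home/u', 'example-key'));
console.log(resolveStateDir('darwin', {}, '/Users/u', 'example-key'));
```

In a real script the arguments would typically come from process.platform, process.env, and os.homedir(). Passing them in explicitly keeps the rule testable on any machine.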
Programmatic Usage
import { CLIOAuthClient } from '@mcp-testing/server-tester';
const client = new CLIOAuthClient({
  mcpServerUrl: 'https://api.example.com/mcp',
});
// Get a valid access token (cached, refreshed, or new)
const result = await client.getAccessToken();
console.log(`Token: ${result.accessToken}`);
CI/CD Usage (GitHub Actions)
For automated testing in CI, tokens can be provided via environment variables:
# .github/workflows/mcp-tests.yml
jobs:
  test:
    runs-on: ubuntu-latest
    env:
      MCP_ACCESS_TOKEN: ${{ secrets.MCP_ACCESS_TOKEN }}
      MCP_REFRESH_TOKEN: ${{ secrets.MCP_REFRESH_TOKEN }}
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run test:playwright
To set up GitHub Actions secrets:
- Authenticate locally: npx mcp-test login <server-url>
- Export tokens for GitHub: npx mcp-test token <server-url> --format gh
- Run the output gh secret set commands (requires GitHub CLI)
The token command supports multiple formats:
- env (default) - Shell-compatible KEY=value pairs
- json - JSON object for scripting
- gh - Ready-to-paste GitHub CLI commands
See the CLI Guide for details.
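As a rough illustration of how the three formats differ, the sketch below renders the same token pair in each one. formatTokens is a hypothetical function and the exact output of mcp-test token may differ. The variable names MCP_ACCESS_TOKEN and MCP_REFRESH_TOKEN come from the workflow above, and the gh lines assume the GitHub CLI's gh secret set NAME --body VALUE form.

```typescript
// Illustrative sketch only: formatTokens is hypothetical and the real CLI
// output may be shaped differently.

type TokenFormat = 'env' | 'json' | 'gh';

interface Tokens {
  accessToken: string;
  refreshToken?: string;
}

function formatTokens(tokens: Tokens, format: TokenFormat): string {
  const vars: Array<[string, string]> = [
    ['MCP_ACCESS_TOKEN', tokens.accessToken],
  ];
  if (tokens.refreshToken) {
    vars.push(['MCP_REFRESH_TOKEN', tokens.refreshToken]);
  }

  if (format === 'env') {
    // Shell-compatible KEY=value pairs, one per line.
    return vars.map(([key, value]) => `${key}=${value}`).join('\n');
  }
  if (format === 'json') {
    // A JSON object for scripting.
    const obj: Record<string, string> = {};
    for (const [key, value] of vars) obj[key] = value;
    return JSON.stringify(obj, null, 2);
  }
  // 'gh': one GitHub CLI command per secret. Values are not shell-escaped
  // here; a real tool would need to handle quotes and special characters.
  return vars
    .map(([key, value]) => `gh secret set ${key} --body "${value}"`)
    .join('\n');
}

// Example with placeholder token values.
const demo: Tokens = { accessToken: 'abc', refreshToken: 'def' };
console.log(formatTokens(demo, 'env'));
console.log(formatTokens(demo, 'json'));
console.log(formatTokens(demo, 'gh'));
```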
Alternatively, inject tokens programmatically in your test setup:
import { injectTokens } from '@mcp-testing/server-tester';
// In globalSetup.ts
await injectTokens('https://api.example.com/mcp', {
  accessToken: process.env.MCP_ACCESS_TOKEN!,
  tokenType: 'Bearer',
});
UI Reporter
Interactive web UI for visualizing test results:

Add to your playwright.config.ts:
export default defineConfig({
  reporter: [['list'], ['@mcp-testing/server-tester/reporters/mcpReporter']],
});
See UI Reporter Guide for features and usage.
Support
- Documentation: See docs/ directory
- Examples: See examples/ directory
- Issues: GitHub Issues
License
MIT
Contributing
Contributions welcome! See Development Guide for setup instructions.
Credits
Built with:
