mcpspec
v1.2.2
The definitive MCP server testing platform
Why MCPSpec?
Deterministic. Unlike LLM-based testing, MCPSpec runs are fast and repeatable. No flaky tests, no token costs. Ideal for CI.
Secure. Catch Tool Poisoning (prompt injection in tool descriptions) and Excessive Agency (destructive tools without safeguards) before they reach production.
Collaborative. Record server interactions once, share mock servers with your team. Frontend developers and CI pipelines can test against mocks — no API keys, no live dependencies.
Quick Start
# Install
npm install -g mcpspec
# Explore a server interactively
mcpspec inspect "npx @modelcontextprotocol/server-filesystem /tmp"
# Record a session — no test code needed
mcpspec record start "npx my-server"
# Generate a mock for your team
mcpspec mock my-recording --generate ./mocks/server.js
# Add CI gating in 30 seconds
mcpspec ci-init
Try it in 30 seconds with a pre-built community collection — no setup required:
mcpspec test examples/collections/servers/filesystem.yaml
mcpspec test examples/collections/servers/time.yaml --tag smoke
See all 70 community tests for 7 popular MCP servers.
Record & Mock
Record a session once. Replay it to catch regressions. Mock it for CI. No API keys required.
# 1. Record a session against your real server
mcpspec record start "npx my-server"
mcpspec> .call get_user {"id": "1"}
mcpspec> .call list_items {}
mcpspec> .save my-api
# 2. Replay against a new version — catch regressions instantly
mcpspec record replay my-api "npx my-server-v2"
# 3. Start a mock server — drop-in replacement, zero dependencies
mcpspec mock my-api
# 4. Generate a standalone .js file — commit to your repo
mcpspec mock my-api --generate ./mocks/server.js
node ./mocks/server.js
Replay output shows exactly what changed:
Replaying 3 steps...
1/3 get_user (id=1)... [OK] 42ms → {"name": "Alice"}
2/3 list_items... [CHANGED] 38ms → {"items": [...]}
3/3 create_item (name=test)... [OK] 51ms → {"id": "abc"}
Summary: 2 matched, 1 changed, 0 added, 0 removed
Mock options:
| Option | Effect |
|--------|--------|
| --mode match (default) | Exact input match first, then next queued response per tool |
| --mode sequential | Tape/cassette style — responses served in recorded order |
| --latency original | Simulate original response times |
| --latency 100 | Fixed 100ms delay |
| --on-missing empty | Return empty instead of error for unrecorded tools |
| --generate <path> | Output standalone .js file (only needs @modelcontextprotocol/sdk) |
Manage recordings:
mcpspec record list # List saved recordings
mcpspec record delete my-session # Delete a recording
CI/CD Integration
ci-init generates ready-to-use pipeline configurations. Deterministic exit codes and JUnit/JSON/TAP reporters for seamless CI integration.
mcpspec ci-init # Interactive wizard
mcpspec ci-init --platform github # GitHub Actions
mcpspec ci-init --platform gitlab # GitLab CI
mcpspec ci-init --platform shell # Shell script
mcpspec ci-init --checks test,audit,score # Choose checks
mcpspec ci-init --fail-on medium # Audit severity gate
mcpspec ci-init --min-score 70 # MCP Score threshold
mcpspec ci-init --force # Overwrite/replace existing
Auto-detects platform from .github/ or .gitlab-ci.yml. GitLab --force replaces only the mcpspec job block, preserving other jobs.
GitHub Actions example (generated by mcpspec ci-init --platform github):
name: MCP Server Tests
on: [push, pull_request]
jobs:
  mcpspec:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '22'
      - run: npm install -g mcpspec
      - name: Run tests
        run: mcpspec test --ci --reporter junit --output results.xml
      - name: Security audit
        run: mcpspec audit "npx my-server" --fail-on high
      - name: Quality gate
        run: mcpspec score "npx my-server" --min-score 80
      - uses: mikepenz/action-junit-report@v4
        if: always()
        with:
          report_paths: results.xml
Exit codes:
| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | Test failure |
| 2 | Runtime error |
| 3 | Configuration error |
| 4 | Connection error |
| 5 | Timeout |
| 6 | Security findings above threshold |
| 7 | Validation error |
| 130 | Interrupted (Ctrl+C) |
Reporters: console (default), json, junit, html, tap.
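In a custom pipeline script, these exit codes can drive branching directly. A minimal sketch — `explain_exit` is a hypothetical helper written for this example, not part of mcpspec; the mapping simply mirrors the table above:

```shell
# Hypothetical helper (not part of mcpspec): translate an exit code
# from the table above into a human-readable verdict.
explain_exit() {
  case "$1" in
    0)   echo "success" ;;
    1)   echo "test failure" ;;
    6)   echo "security findings above threshold" ;;
    130) echo "interrupted (Ctrl+C)" ;;
    *)   echo "error (code $1)" ;;
  esac
}

# Typical use (commented out — requires mcpspec to be installed):
# mcpspec test --ci; explain_exit $?
explain_exit 6
```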
Security Audit
Prevent Tool Poisoning. 8 security rules covering traditional vulnerabilities and LLM-specific threats. A safety filter auto-skips destructive tools, and --dry-run previews targets before scanning.
mcpspec audit "npx my-server" # Passive (safe)
mcpspec audit "npx my-server" --mode active # Active probing
mcpspec audit "npx my-server" --fail-on medium # CI gate
mcpspec audit "npx my-server" --exclude-tools delete # Skip tools
mcpspec audit "npx my-server" --dry-run # Preview targets
Security rules:
| Rule | Mode | What it detects |
|------|------|-----------------|
| Tool Poisoning | Passive | LLM prompt injection in descriptions, hidden Unicode, cross-tool manipulation |
| Excessive Agency | Passive | Destructive tools without confirmation params, arbitrary code execution |
| Path Traversal | Passive | ../../etc/passwd style directory escape attacks |
| Input Validation | Passive | Missing constraints (enum, pattern, min/max) on tool inputs |
| Info Disclosure | Passive | Leaked paths, stack traces, API keys in tool descriptions |
| Resource Exhaustion | Active | Unbounded loops, large allocations, recursion |
| Auth Bypass | Active | Missing auth checks, hardcoded credentials |
| Injection | Active | SQL and command injection in tool inputs |
Scan modes:
- Passive (default) — 5 rules, analyzes metadata only, no tool calls. Safe for production.
- Active — All 8 rules, sends test payloads. Requires confirmation prompt.
- Aggressive — All 8 rules with more exhaustive probing. Requires confirmation prompt.
Active/aggressive modes auto-skip tools matching destructive patterns (delete_*, drop_*, destroy_*, etc.) and require explicit confirmation unless --acknowledge-risk is passed.
Each finding includes severity (info/low/medium/high/critical), description, evidence, and remediation advice.
Test Collections
Write tests in YAML with 10 assertion types, environments, variable extraction, tags, retries, and parallel execution.
name: Filesystem Tests
server: npx @modelcontextprotocol/server-filesystem /tmp
tests:
  - name: Read a file
    call: read_file
    with:
      path: /tmp/test.txt
    expect:
      - exists: $.content
      - type: [$.content, string]
  - name: Handle missing file
    call: read_file
    with:
      path: /tmp/nonexistent.txt
    expectError: true
Advanced features — environments, tags, retries, variable extraction, expressions:
schemaVersion: "1.0"
name: Advanced Tests
server:
  command: npx
  args: ["my-mcp-server"]
  env:
    NODE_ENV: test
environments:
  dev:
    variables:
      BASE_PATH: /tmp/dev
  staging:
    variables:
      BASE_PATH: /tmp/staging
defaultEnvironment: dev
tests:
  - id: create-data
    name: Create data
    tags: [smoke, write]
    timeout: 5000
    retries: 2
    call: create_item
    with:
      name: "test-item"
    assertions:
      - type: schema
      - type: exists
        path: $.id
      - type: latency
        maxMs: 1000
    extract:
      - name: itemId
        path: $.id
  - id: verify-data
    name: Verify created data
    tags: [smoke, read]
    call: get_item
    with:
      id: "{{itemId}}"
    assertions:
      - type: equals
        path: $.name
        value: "test-item"
      - type: expression
        expr: "response.id == itemId"
Assertion types:
| Type | Description | Example |
|------|-------------|---------|
| schema | Validate response structure | type: schema |
| equals | Exact match (deep comparison) | path: $.id, value: 123 |
| contains | Array or string contains value | path: $.tags, value: "active" |
| exists | Path exists and is not null | path: $.name |
| matches | Regex pattern match | path: $.email, pattern: ".*@.*" |
| type | Type check | path: $.count, expected: number |
| length | Array/string length | path: $.items, operator: gt, value: 0 |
| latency | Response time threshold | maxMs: 1000 |
| mimeType | Content type validation | expected: "image/png" |
| expression | Safe expression eval | expr: "response.total > 0" |
Expressions use expr-eval — comparisons, logical operators, property access, math. No arbitrary code execution.
Shorthand format for common assertions:
expect:
  - exists: $.field
  - equals: [$.id, 123]
  - contains: [$.tags, "active"]
  - matches: [$.email, ".*@.*"]
Run options:
mcpspec test ./tests.yaml # Specific file
mcpspec test --env staging # Switch environment
mcpspec test --tag @smoke # Filter by tag
mcpspec test --parallel 4 # Parallel execution
mcpspec test --reporter junit --output results.xml
mcpspec test --baseline main # Compare against baseline
mcpspec test --watch # Re-run on file changes
mcpspec test --ci # CI mode (no colors)
MCP Score
A 0-100 quality rating across 5 weighted categories with opinionated schema linting. Use as a CI gate or generate a badge for your README.
mcpspec score "npx my-server"
mcpspec score "npx my-server" --badge badge.svg # Generate SVG badge
mcpspec score "npx my-server" --min-score 80 # Fail if below threshold
Scoring categories:
| Category (weight) | What it measures |
|--------------------|-----------------|
| Documentation (25%) | Percentage of tools and resources with descriptions |
| Schema Quality (25%) | Property types, descriptions, required fields, constraints (enum/pattern/min/max), naming conventions |
| Error Handling (20%) | Structured error responses (isError: true) vs. crashes on bad input |
| Responsiveness (15%) | Median latency: <100ms = 100, <500ms = 80, <1s = 60, <5s = 40 |
| Security (15%) | Findings from passive security scan: 0 = 100, <=2 = 70, <=5 = 40 |
Schema quality uses 6 sub-criteria: structure (20%), property types (20%), descriptions (20%), required fields (15%), constraints (15%), naming conventions (10%).
The --badge flag generates a shields.io-style SVG badge for your README.
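To make the weighting concrete, here is the arithmetic under an explicit assumption: that the total is a plain weighted average of the five category scores using the documented weights (the README doesn't spell out the aggregation formula, so treat this as illustrative, not authoritative):

```shell
# Illustrative only: assumed weighted average of category scores
# (Documentation 25%, Schema 25%, Errors 20%, Responsiveness 15%, Security 15%).
doc=90; schema=80; err=100; resp=80; sec=70
total=$(( (doc*25 + schema*25 + err*20 + resp*15 + sec*15) / 100 ))
echo "MCP Score: $total"   # prints: MCP Score: 85
```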
More Features
Interactive Inspector
Connect to any MCP server and explore its capabilities in a live REPL.
mcpspec inspect "npx @modelcontextprotocol/server-filesystem /tmp"
| Command | Description |
|---------|-------------|
| .tools | List all available tools with descriptions |
| .resources | List all available resources (URIs) |
| .call <tool> <json> | Call a tool with JSON input |
| .schema <tool> | Display tool's JSON Schema input spec |
| .info | Show server info (name, version, capabilities) |
| .help | Show help |
| .exit | Disconnect and exit |
Performance Benchmarks
Measure latency and throughput with statistical analysis across hundreds of iterations.
mcpspec bench "npx my-server" # 100 iterations
mcpspec bench "npx my-server" --iterations 500
mcpspec bench "npx my-server" --tool read_file
mcpspec bench "npx my-server" --warmup 10
Output includes min/max/mean/median/P95/P99 latency, standard deviation, and throughput (calls/sec). Warmup iterations (default: 5) are excluded from measurements.
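P95 means the latency below which 95% of samples fall. A small sketch of the nearest-rank method on a toy sample (mcpspec's exact percentile method isn't documented here, so this is purely illustrative):

```shell
# Nearest-rank P95 over a toy sample of ten latencies (ms).
latencies="12 15 18 20 22 25 30 45 60 120"
n=10
rank=$(( (95 * n + 99) / 100 ))   # ceil(0.95 * n) = 10th value
p95=$(printf '%s\n' $latencies | sort -n | sed -n "${rank}p")
echo "P95: ${p95}ms"              # prints: P95: 120ms
```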
Doc Generator
Auto-generate Markdown or HTML documentation from server introspection. Zero manual writing.
mcpspec docs "npx my-server" # Markdown to stdout
mcpspec docs "npx my-server" --format html # HTML output
mcpspec docs "npx my-server" --output ./docs # Write to directory
Web Dashboard
A full React UI for managing servers, running tests, viewing audit results, and more. Dark mode included.
mcpspec ui # Opens localhost:6274
mcpspec ui --port 8080 # Custom port
mcpspec ui --no-open # Don't auto-open browser
10 pages: Dashboard, Servers, Collections, Runs, Inspector, Recordings, Audit, Benchmark, Docs, Score.
Baselines & Comparison
Save test runs as baselines and detect regressions between versions.
mcpspec baseline save main # Save current run
mcpspec baseline list # List all baselines
mcpspec test --baseline main # Compare against baseline
mcpspec compare --baseline main # Explicit comparison
mcpspec compare <run-id-1> <run-id-2> # Compare two runs
Transports
MCPSpec supports 3 transport types for connecting to MCP servers:
| Transport | Use case | Connection |
|-----------|----------|------------|
| stdio | Local processes | Spawns child process, communicates via stdin/stdout |
| SSE | Server-Sent Events | Connects to HTTP SSE endpoint |
| HTTP | Streamable HTTP | POST requests to HTTP endpoint |
# stdio (default)
server:
  command: npx
  args: ["my-mcp-server"]
# SSE
server:
  transport: sse
  url: http://localhost:3000/sse
# HTTP
server:
  transport: http
  url: http://localhost:3000/mcp
Commands
| Command | Description |
|---------|-------------|
| mcpspec record start <server> | Record an inspector session — .call, .save, .steps |
| mcpspec record replay <name> <server> | Replay a recording and diff against original |
| mcpspec mock <recording> | Mock server from recording — --mode, --latency, --on-missing, --generate |
| mcpspec test [collection] | Run test collections with --env, --tag, --parallel, --reporter, --watch, --ci |
| mcpspec audit <server> | Security scan — --mode, --fail-on, --exclude-tools, --dry-run |
| mcpspec score <server> | Quality score (0-100) — --badge badge.svg, --min-score |
| mcpspec ci-init | Generate CI config — --platform github\|gitlab\|shell, --checks, --fail-on, --force |
| mcpspec inspect <server> | Interactive REPL — .tools, .call, .schema, .resources, .info |
| mcpspec bench <server> | Performance benchmark — --iterations, --tool, --args, --warmup |
| mcpspec docs <server> | Generate docs — --format markdown\|html, --output <dir> |
| mcpspec compare | Compare test runs or --baseline <name> |
| mcpspec baseline save <name> | Save/list baselines for regression detection |
| mcpspec record list | List saved recordings |
| mcpspec record delete <name> | Delete a saved recording |
| mcpspec init [dir] | Scaffold project — --template minimal\|standard\|full |
| mcpspec ui | Launch web dashboard on localhost:6274 |
Community Collections
Pre-built test suites for popular MCP servers in examples/collections/servers/:
| Collection | Server | Tests |
|------------|--------|-------|
| filesystem.yaml | @modelcontextprotocol/server-filesystem | 12 |
| memory.yaml | @modelcontextprotocol/server-memory | 10 |
| everything.yaml | @modelcontextprotocol/server-everything | 11 |
| fetch.yaml | @modelcontextprotocol/server-fetch | 7 |
| time.yaml | @modelcontextprotocol/server-time | 10 |
| chrome-devtools.yaml | chrome-devtools-mcp | 11 |
| github.yaml | @modelcontextprotocol/server-github | 9 |
70 tests covering tool discovery, read/write operations, error handling, security edge cases, and latency.
mcpspec test examples/collections/servers/filesystem.yaml
mcpspec test examples/collections/servers/time.yaml --tag smoke
Architecture
| Package | Description |
|---------|-------------|
| @mcpspec/shared | Types, Zod schemas, constants |
| @mcpspec/core | MCP client, test runner, assertions, security scanner (8 rules), profiler, doc generator, scorer, recording/replay, mock server |
| @mcpspec/cli | 13 CLI commands built with Commander.js |
| @mcpspec/server | Hono HTTP server with REST API + WebSocket |
| @mcpspec/ui | React SPA — TanStack Router, TanStack Query, Tailwind, shadcn/ui |
Development
git clone https://github.com/light-handle/mcpspec.git
cd mcpspec
pnpm install && pnpm build
pnpm test # 334 tests across core + server
License
MIT
