mcpspec
v1.2.2
The definitive MCP server testing platform
Why MCPSpec?
Deterministic. Unlike LLM-based testing, MCPSpec runs are fast and repeatable. No flaky tests, no token costs. Ideal for CI.
Secure. Catch Tool Poisoning (prompt injection in tool descriptions) and Excessive Agency (destructive tools without safeguards) before they reach production.
Collaborative. Record server interactions once, share mock servers with your team. Frontend developers and CI pipelines can test against mocks — no API keys, no live dependencies.
Quick Start
# Install
npm install -g mcpspec
# Explore a server interactively
mcpspec inspect "npx @modelcontextprotocol/server-filesystem /tmp"
# Record a session — no test code needed
mcpspec record start "npx my-server"
# Generate a mock for your team
mcpspec mock my-recording --generate ./mocks/server.js
# Add CI gating in 30 seconds
mcpspec ci-init
Try it in 30 seconds with a pre-built community collection — no setup required:
mcpspec test examples/collections/servers/filesystem.yaml
mcpspec test examples/collections/servers/time.yaml --tag smoke
See all 70 community tests for 7 popular MCP servers.
Record & Mock
Record a session once. Replay it to catch regressions. Mock it for CI. No API keys required.
# 1. Record a session against your real server
mcpspec record start "npx my-server"
mcpspec> .call get_user {"id": "1"}
mcpspec> .call list_items {}
mcpspec> .save my-api
# 2. Replay against a new version — catch regressions instantly
mcpspec record replay my-api "npx my-server-v2"
# 3. Start a mock server — drop-in replacement, zero dependencies
mcpspec mock my-api
# 4. Generate a standalone .js file — commit to your repo
mcpspec mock my-api --generate ./mocks/server.js
node ./mocks/server.js
Replay output shows exactly what changed:
Replaying 3 steps...
1/3 get_user (id=1)... [OK] 42ms → {"name": "Alice"}
2/3 list_items... [CHANGED] 38ms → {"items": [...]}
3/3 create_item (name=test)... [OK] 51ms → {"id": "abc"}
Summary: 2 matched, 1 changed, 0 added, 0 removed
Mock options:
| Option | Effect |
|--------|--------|
| --mode match (default) | Exact input match first, then next queued response per tool |
| --mode sequential | Tape/cassette style — responses served in recorded order |
| --latency original | Simulate original response times |
| --latency 100 | Fixed 100ms delay |
| --on-missing empty | Return empty instead of error for unrecorded tools |
| --generate <path> | Output standalone .js file (only needs @modelcontextprotocol/sdk) |
Manage recordings:
mcpspec record list # List saved recordings
mcpspec record delete my-session # Delete a recording
CI/CD Integration
ci-init generates ready-to-use pipeline configurations. Deterministic exit codes and JUnit/JSON/TAP reporters for seamless CI integration.
mcpspec ci-init # Interactive wizard
mcpspec ci-init --platform github # GitHub Actions
mcpspec ci-init --platform gitlab # GitLab CI
mcpspec ci-init --platform shell # Shell script
mcpspec ci-init --checks test,audit,score # Choose checks
mcpspec ci-init --fail-on medium # Audit severity gate
mcpspec ci-init --min-score 70 # MCP Score threshold
mcpspec ci-init --force # Overwrite/replace existing
Auto-detects platform from .github/ or .gitlab-ci.yml. GitLab --force replaces only the mcpspec job block, preserving other jobs.
GitHub Actions example (generated by mcpspec ci-init --platform github):
name: MCP Server Tests
on: [push, pull_request]
jobs:
  mcpspec:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '22'
      - run: npm install -g mcpspec
      - name: Run tests
        run: mcpspec test --ci --reporter junit --output results.xml
      - name: Security audit
        run: mcpspec audit "npx my-server" --fail-on high
      - name: Quality gate
        run: mcpspec score "npx my-server" --min-score 80
      - uses: mikepenz/action-junit-report@v4
        if: always()
        with:
          report_paths: results.xml
Exit codes:
| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | Test failure |
| 2 | Runtime error |
| 3 | Configuration error |
| 4 | Connection error |
| 5 | Timeout |
| 6 | Security findings above threshold |
| 7 | Validation error |
| 130 | Interrupted (Ctrl+C) |
Reporters: console (default), json, junit, html, tap.
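In a custom pipeline script, these exit codes can drive branching directly. A minimal sketch — `explain_exit` is a hypothetical helper written for this example, not part of mcpspec; the mapping simply mirrors the table above:

```shell
# Hypothetical helper (not part of mcpspec): translate an exit code
# from the table above into a human-readable verdict.
explain_exit() {
  case "$1" in
    0)   echo "success" ;;
    1)   echo "test failure" ;;
    6)   echo "security findings above threshold" ;;
    130) echo "interrupted (Ctrl+C)" ;;
    *)   echo "error (code $1)" ;;
  esac
}

# Typical use (commented out — requires mcpspec to be installed):
# mcpspec test --ci; explain_exit $?
explain_exit 6
```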
Security Audit
Prevent Tool Poisoning. 8 security rules covering traditional vulnerabilities and LLM-specific threats. A safety filter auto-skips destructive tools, and --dry-run previews targets before scanning.
mcpspec audit "npx my-server" # Passive (safe)
mcpspec audit "npx my-server" --mode active # Active probing
mcpspec audit "npx my-server" --fail-on medium # CI gate
mcpspec audit "npx my-server" --exclude-tools delete # Skip tools
mcpspec audit "npx my-server" --dry-run # Preview targets
Security rules:
| Rule | Mode | What it detects |
|------|------|-----------------|
| Tool Poisoning | Passive | LLM prompt injection in descriptions, hidden Unicode, cross-tool manipulation |
| Excessive Agency | Passive | Destructive tools without confirmation params, arbitrary code execution |
| Path Traversal | Passive | ../../etc/passwd style directory escape attacks |
| Input Validation | Passive | Missing constraints (enum, pattern, min/max) on tool inputs |
| Info Disclosure | Passive | Leaked paths, stack traces, API keys in tool descriptions |
| Resource Exhaustion | Active | Unbounded loops, large allocations, recursion |
| Auth Bypass | Active | Missing auth checks, hardcoded credentials |
| Injection | Active | SQL and command injection in tool inputs |
Scan modes:
- Passive (default) — 5 rules, analyzes metadata only, no tool calls. Safe for production.
- Active — All 8 rules, sends test payloads. Requires confirmation prompt.
- Aggressive — All 8 rules with more exhaustive probing. Requires confirmation prompt.
Active/aggressive modes auto-skip tools matching destructive patterns (delete_*, drop_*, destroy_*, etc.) and require explicit confirmation unless --acknowledge-risk is passed.
Each finding includes severity (info/low/medium/high/critical), description, evidence, and remediation advice.
Test Collections
Write tests in YAML with 10 assertion types, environments, variable extraction, tags, retries, and parallel execution.
name: Filesystem Tests
server: npx @modelcontextprotocol/server-filesystem /tmp
tests:
  - name: Read a file
    call: read_file
    with:
      path: /tmp/test.txt
    expect:
      - exists: $.content
      - type: [$.content, string]
  - name: Handle missing file
    call: read_file
    with:
      path: /tmp/nonexistent.txt
    expectError: true
Advanced features — environments, tags, retries, variable extraction, expressions:
schemaVersion: "1.0"
name: Advanced Tests
server:
  command: npx
  args: ["my-mcp-server"]
  env:
    NODE_ENV: test
environments:
  dev:
    variables:
      BASE_PATH: /tmp/dev
  staging:
    variables:
      BASE_PATH: /tmp/staging
defaultEnvironment: dev
tests:
  - id: create-data
    name: Create data
    tags: [smoke, write]
    timeout: 5000
    retries: 2
    call: create_item
    with:
      name: "test-item"
    assertions:
      - type: schema
      - type: exists
        path: $.id
      - type: latency
        maxMs: 1000
    extract:
      - name: itemId
        path: $.id
  - id: verify-data
    name: Verify created data
    tags: [smoke, read]
    call: get_item
    with:
      id: "{{itemId}}"
    assertions:
      - type: equals
        path: $.name
        value: "test-item"
      - type: expression
        expr: "response.id == itemId"
Assertion types:
| Type | Description | Example |
|------|-------------|---------|
| schema | Validate response structure | type: schema |
| equals | Exact match (deep comparison) | path: $.id, value: 123 |
| contains | Array or string contains value | path: $.tags, value: "active" |
| exists | Path exists and is not null | path: $.name |
| matches | Regex pattern match | path: $.email, pattern: ".*@.*" |
| type | Type check | path: $.count, expected: number |
| length | Array/string length | path: $.items, operator: gt, value: 0 |
| latency | Response time threshold | maxMs: 1000 |
| mimeType | Content type validation | expected: "image/png" |
| expression | Safe expression eval | expr: "response.total > 0" |
Expressions use expr-eval — comparisons, logical operators, property access, math. No arbitrary code execution.
Shorthand format for common assertions:
expect:
  - exists: $.field
  - equals: [$.id, 123]
  - contains: [$.tags, "active"]
  - matches: [$.email, ".*@.*"]
Run options:
mcpspec test ./tests.yaml # Specific file
mcpspec test --env staging # Switch environment
mcpspec test --tag @smoke # Filter by tag
mcpspec test --parallel 4 # Parallel execution
mcpspec test --reporter junit --output results.xml
mcpspec test --baseline main # Compare against baseline
mcpspec test --watch # Re-run on file changes
mcpspec test --ci # CI mode (no colors)
MCP Score
A 0-100 quality rating across 5 weighted categories with opinionated schema linting. Use as a CI gate or generate a badge for your README.
mcpspec score "npx my-server"
mcpspec score "npx my-server" --badge badge.svg # Generate SVG badge
mcpspec score "npx my-server" --min-score 80 # Fail if below threshold
Scoring categories:
| Category (weight) | What it measures |
|--------------------|-----------------|
| Documentation (25%) | Percentage of tools and resources with descriptions |
| Schema Quality (25%) | Property types, descriptions, required fields, constraints (enum/pattern/min/max), naming conventions |
| Error Handling (20%) | Structured error responses (isError: true) vs. crashes on bad input |
| Responsiveness (15%) | Median latency: <100ms = 100, <500ms = 80, <1s = 60, <5s = 40 |
| Security (15%) | Findings from passive security scan: 0 = 100, <=2 = 70, <=5 = 40 |
Schema quality uses 6 sub-criteria: structure (20%), property types (20%), descriptions (20%), required fields (15%), constraints (15%), naming conventions (10%).
The --badge flag generates a shields.io-style SVG badge for your README.
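To make the weighting concrete, here is the arithmetic under an explicit assumption: that the total is a plain weighted average of the five category scores using the documented weights (the README doesn't spell out the aggregation formula, so treat this as illustrative, not authoritative):

```shell
# Illustrative only: assumed weighted average of category scores
# (Documentation 25%, Schema 25%, Errors 20%, Responsiveness 15%, Security 15%).
doc=90; schema=80; err=100; resp=80; sec=70
total=$(( (doc*25 + schema*25 + err*20 + resp*15 + sec*15) / 100 ))
echo "MCP Score: $total"   # prints: MCP Score: 85
```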
More Features
Interactive Inspector
Connect to any MCP server and explore its capabilities in a live REPL.
mcpspec inspect "npx @modelcontextprotocol/server-filesystem /tmp"
| Command | Description |
|---------|-------------|
| .tools | List all available tools with descriptions |
| .resources | List all available resources (URIs) |
| .call <tool> <json> | Call a tool with JSON input |
| .schema <tool> | Display tool's JSON Schema input spec |
| .info | Show server info (name, version, capabilities) |
| .help | Show help |
| .exit | Disconnect and exit |
Performance Benchmarks
Measure latency and throughput with statistical analysis across hundreds of iterations.
mcpspec bench "npx my-server" # 100 iterations
mcpspec bench "npx my-server" --iterations 500
mcpspec bench "npx my-server" --tool read_file
mcpspec bench "npx my-server" --warmup 10
Output includes min/max/mean/median/P95/P99 latency, standard deviation, and throughput (calls/sec). Warmup iterations (default: 5) are excluded from measurements.
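P95 means the latency below which 95% of samples fall. A small sketch of the nearest-rank method on a toy sample (mcpspec's exact percentile method isn't documented here, so this is purely illustrative):

```shell
# Nearest-rank P95 over a toy sample of ten latencies (ms).
latencies="12 15 18 20 22 25 30 45 60 120"
n=10
rank=$(( (95 * n + 99) / 100 ))   # ceil(0.95 * n) = 10th value
p95=$(printf '%s\n' $latencies | sort -n | sed -n "${rank}p")
echo "P95: ${p95}ms"              # prints: P95: 120ms
```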
Doc Generator
Auto-generate Markdown or HTML documentation from server introspection. Zero manual writing.
mcpspec docs "npx my-server" # Markdown to stdout
mcpspec docs "npx my-server" --format html # HTML output
mcpspec docs "npx my-server" --output ./docs # Write to directory
Web Dashboard
A full React UI for managing servers, running tests, viewing audit results, and more. Dark mode included.
mcpspec ui # Opens localhost:6274
mcpspec ui --port 8080 # Custom port
mcpspec ui --no-open # Don't auto-open browser
10 pages: Dashboard, Servers, Collections, Runs, Inspector, Recordings, Audit, Benchmark, Docs, Score.
Baselines & Comparison
Save test runs as baselines and detect regressions between versions.
mcpspec baseline save main # Save current run
mcpspec baseline list # List all baselines
mcpspec test --baseline main # Compare against baseline
mcpspec compare --baseline main # Explicit comparison
mcpspec compare <run-id-1> <run-id-2> # Compare two runs
Transports
MCPSpec supports 3 transport types for connecting to MCP servers:
| Transport | Use case | Connection |
|-----------|----------|------------|
| stdio | Local processes | Spawns child process, communicates via stdin/stdout |
| SSE | Server-Sent Events | Connects to HTTP SSE endpoint |
| HTTP | Streamable HTTP | POST requests to HTTP endpoint |
# stdio (default)
server:
  command: npx
  args: ["my-mcp-server"]
# SSE
server:
  transport: sse
  url: http://localhost:3000/sse
# HTTP
server:
  transport: http
  url: http://localhost:3000/mcp
Commands
| Command | Description |
|---------|-------------|
| mcpspec record start <server> | Record an inspector session — .call, .save, .steps |
| mcpspec record replay <name> <server> | Replay a recording and diff against original |
| mcpspec mock <recording> | Mock server from recording — --mode, --latency, --on-missing, --generate |
| mcpspec test [collection] | Run test collections with --env, --tag, --parallel, --reporter, --watch, --ci |
| mcpspec audit <server> | Security scan — --mode, --fail-on, --exclude-tools, --dry-run |
| mcpspec score <server> | Quality score (0-100) — --badge badge.svg, --min-score |
| mcpspec ci-init | Generate CI config — --platform github\|gitlab\|shell, --checks, --fail-on, --force |
| mcpspec inspect <server> | Interactive REPL — .tools, .call, .schema, .resources, .info |
| mcpspec bench <server> | Performance benchmark — --iterations, --tool, --args, --warmup |
| mcpspec docs <server> | Generate docs — --format markdown\|html, --output <dir> |
| mcpspec compare | Compare test runs or --baseline <name> |
| mcpspec baseline save <name> | Save/list baselines for regression detection |
| mcpspec record list | List saved recordings |
| mcpspec record delete <name> | Delete a saved recording |
| mcpspec init [dir] | Scaffold project — --template minimal\|standard\|full |
| mcpspec ui | Launch web dashboard on localhost:6274 |
Community Collections
Pre-built test suites for popular MCP servers in examples/collections/servers/:
| Collection | Server | Tests |
|------------|--------|-------|
| filesystem.yaml | @modelcontextprotocol/server-filesystem | 12 |
| memory.yaml | @modelcontextprotocol/server-memory | 10 |
| everything.yaml | @modelcontextprotocol/server-everything | 11 |
| fetch.yaml | @modelcontextprotocol/server-fetch | 7 |
| time.yaml | @modelcontextprotocol/server-time | 10 |
| chrome-devtools.yaml | chrome-devtools-mcp | 11 |
| github.yaml | @modelcontextprotocol/server-github | 9 |
70 tests covering tool discovery, read/write operations, error handling, security edge cases, and latency.
mcpspec test examples/collections/servers/filesystem.yaml
mcpspec test examples/collections/servers/time.yaml --tag smoke
Architecture
| Package | Description |
|---------|-------------|
| @mcpspec/shared | Types, Zod schemas, constants |
| @mcpspec/core | MCP client, test runner, assertions, security scanner (8 rules), profiler, doc generator, scorer, recording/replay, mock server |
| @mcpspec/cli | 13 CLI commands built with Commander.js |
| @mcpspec/server | Hono HTTP server with REST API + WebSocket |
| @mcpspec/ui | React SPA — TanStack Router, TanStack Query, Tailwind, shadcn/ui |
Development
git clone https://github.com/light-handle/mcpspec.git
cd mcpspec
pnpm install && pnpm build
pnpm test # 334 tests across core + server
License
MIT
