mcp-quality-gate

v0.1.2

Published

2 months ago

Quality gate for MCP servers — compliance, security, and efficiency testing

Quality gate for MCP servers. Like npm audit for packages, but for Model Context Protocol servers.

When an LLM connects to your MCP server, it trusts whatever you expose. Bad tool schemas mean bad tool calls. Missing descriptions mean the model guesses. 50+ tools flood the context window. Leaked environment variables expose secrets. mcp-quality-gate catches all of this in one command.

npx mcp-quality-gate validate "npx -y @modelcontextprotocol/server-filesystem /tmp"

One command. Four dimensions. 0-100 score.

What It Catches

mcp-quality-gate scores every MCP server across four dimensions:

| Dimension | Weight | What It Checks | Why It Matters |
|-----------|:------:|----------------|----------------|
| Compliance | 40 pts | Protocol conformance: init, tool listing, tool calls, resources, prompts, error handling | A server that doesn't follow the spec breaks every client |
| Quality | 25 pts | Parameter descriptions, description length, deprecated tools, duplicate schemas, schema consistency | LLMs need good descriptions to make correct tool calls. 72% undocumented params = 72% guesswork |
| Security | 20 pts | Environment variable exposure, code execution surfaces, destructive operations without warnings | Tools run with user permissions. A get-env tool leaks every secret on the machine |
| Efficiency | 15 pts | Tool count, total schema token cost | Every tool schema eats context. 21 tools at 3000 tokens leaves less room for actual conversation |

Install

npm install -g mcp-quality-gate

Requires Node.js >= 22.

Usage

# Test any stdio MCP server
mcp-quality-gate validate "npx -y @modelcontextprotocol/server-filesystem /tmp"

# Test with environment variables
mcp-quality-gate validate "npx -y @supabase/mcp-server-supabase@latest --read-only --project-ref REF" \
  --env "SUPABASE_ACCESS_TOKEN=your-token"

# JSON output for CI/CD pipelines
mcp-quality-gate validate "./my-server" --reporter json --output report.json

# Fail CI if score is below threshold
mcp-quality-gate validate "./my-server" --threshold 80

# Test HTTP/SSE servers
mcp-quality-gate validate "http://localhost:3000/mcp" --transport http

Real-World Benchmarks

Tested against official MCP reference servers (April 2026). These are real results from live server connections, not synthetic data:

| Server | Score | Compliance | Quality | Efficiency | Security | What mcp-quality-gate Found |
|--------|:-----:|:----------:|:-------:|:----------:|:--------:|---|
| Memory | 98 | 40/40 | 23/25 | 15/15 | 20/20 | 50% of parameters have no descriptions, so LLMs have to guess the argument format |
| Sequential Thinking | 98 | 40/40 | 23/25 | 15/15 | 20/20 | 500+ character description wastes context tokens on a single tool |
| Everything | 88 | 40/40 | 23/25 | 15/15 | 10/20 | get-env tool leaks environment variables. Duplicate schemas across tools |
| Filesystem | 81 | 40/40 | 11/25 | 15/15 | 15/20 | 72% of params undocumented, read_file marked deprecated but still listed, duplicate schemas |
| Playwright | 81 | 40/40 | 19/25 | 12/15 | 10/20 | 21 tools consuming 3000+ schema tokens, code execution surfaces, short descriptions |

Servers tested: @modelcontextprotocol/server-memory, @modelcontextprotocol/server-sequential-thinking, @modelcontextprotocol/server-everything, @modelcontextprotocol/server-filesystem, @anthropic/mcp-server-playwright.

Example Output

mcp-quality-gate v0.1.0
Server: npx -y @modelcontextprotocol/server-filesystem /tmp

  lifecycle
    PASS Server reports name and version (0ms)
    PASS Server reports capabilities (0ms)
    PASS Server responds to ping (1ms)

  tools
    PASS Server lists tools without error (5ms)
    PASS Tool definitions have required fields (6ms)
    PASS Tool names follow naming convention (8ms)
    PASS Tool inputSchema has type object (4ms)
    PASS Can call a listed tool (10ms)
    PASS Calling nonexistent tool returns error (1ms)
    PASS Tool descriptions are present (8ms)

  resources
    SKIP Server lists resources without error
    SKIP Resource definitions have required fields
    SKIP Resource descriptions are present
    SKIP Can read a listed resource

  prompts
    SKIP Server lists prompts without error
    SKIP Prompt definitions have required fields
    SKIP Can get a listed prompt

  efficiency
    14 tools, ~3057 schema tokens

  quality
    Param description coverage: 28%
    Deprecated: read_file
    Duplicates: read_file, read_text_file
    CRIT 18 of 25 parameters lack descriptions (72%)
    CRIT 1 deprecated tool(s) still listed: read_file
    WARN Tools with identical schemas: read_file, read_text_file

  security
    WARN "write_file" performs destructive operations — description warns of risk

Results: 10 passed, 7 skipped (45ms)
Score: 81/100
  compliance 40/40 | quality 11/25 | efficiency 15/15 | security 15/20

Compliance Tests (17)

mcp-quality-gate connects to your server, makes real protocol calls, and verifies behavior. This is not static analysis — it's a live test suite that actually calls your tools with generated arguments.

| Category | Tests | What's Verified |
|----------|:-----:|-----------------|
| Lifecycle | 3 | Server init (name, version, capabilities), ping response |
| Tools | 7 | Tool listing, required fields, naming conventions, schema structure, live tool invocation with auto-generated arguments, error handling for nonexistent tools, description presence |
| Resources | 4 | Resource listing, required fields, descriptions, resource read |
| Prompts | 3 | Prompt listing, required fields, prompt retrieval |

Tests are skipped (not failed) when a server doesn't advertise a capability. A tools-only server won't lose points for missing resources.
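
The skip rule can be pictured as a small decision function. This is only a sketch: the type names and the `decide` helper are hypothetical and are not taken from mcp-quality-gate's source. It assumes a category runs only when the server's advertised capabilities include it, and that lifecycle checks always run.

```typescript
// Hypothetical shapes for illustration; not mcp-quality-gate's internal types.
type Category = "lifecycle" | "tools" | "resources" | "prompts";
type Capabilities = Partial<Record<"tools" | "resources" | "prompts", unknown>>;
type Decision = "run" | "skip";

function decide(category: Category, caps: Capabilities): Decision {
  // Lifecycle checks apply to every server, whatever it advertises.
  if (category === "lifecycle") return "run";
  // A capability the server does not advertise leads to a skip, not a failure.
  return caps[category] !== undefined ? "run" : "skip";
}

// A tools-only server: resources and prompts are skipped.
const toolsOnly: Capabilities = { tools: {} };
const categories: Category[] = ["lifecycle", "tools", "resources", "prompts"];
const plan = categories.map((c) => `${c}:${decide(c, toolsOnly)}`);
console.log(plan.join(" "));
```

For the tools-only server this prints `lifecycle:run tools:run resources:skip prompts:skip`, which mirrors the PASS/SKIP split in the example output above.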

Full Test Reference

| ID | Test | Severity |
|----|------|----------|
| lifecycle-init-01 | Server reports name and version | Critical |
| lifecycle-init-02 | Server reports capabilities | Critical |
| lifecycle-init-03 | Server responds to ping | High |
| tools-list-01 | Server lists tools without error | Critical |
| tools-list-02 | Tool definitions have required fields | Critical |
| tools-list-03 | Tool names follow naming convention | Medium |
| tools-list-04 | Tool inputSchema has type "object" | High |
| tools-call-01 | Can call a listed tool | Critical |
| tools-call-02 | Calling nonexistent tool returns error | High |
| tools-call-03 | Tool descriptions are present | Medium |
| resources-list-01 | Server lists resources without error | Critical |
| resources-list-02 | Resource definitions have required fields | Critical |
| resources-list-03 | Resource descriptions are present | Medium |
| resources-read-01 | Can read a listed resource | Critical |
| prompts-list-01 | Server lists prompts without error | Critical |
| prompts-list-02 | Prompt definitions have required fields | Critical |
| prompts-get-01 | Can get a listed prompt | Critical |
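
These IDs are what the --skip and --only flags accept as comma-separated lists. A rough sketch of how such filtering could work follows. The `selectTests` helper is hypothetical, and the rule that --skip is applied after --only is an assumption, since the precedence is not documented here.

```typescript
// Hypothetical helper; the real CLI's precedence between --only and --skip may differ.
function selectTests(all: string[], only?: string, skip?: string): string[] {
  // Turn "a, b,,c" into a list of trimmed, non-empty IDs.
  const parse = (csv?: string): string[] =>
    (csv ?? "")
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s.length > 0);
  const onlyIds = parse(only);
  const skipIds = parse(skip);
  return all.filter(
    (id) =>
      (onlyIds.length === 0 || onlyIds.indexOf(id) !== -1) &&
      skipIds.indexOf(id) === -1,
  );
}

const ids = ["lifecycle-init-01", "tools-list-01", "tools-call-01", "resources-read-01"];

// Everything except the live tool call.
console.log(selectTests(ids, undefined, "tools-call-01"));
// Only the two tools tests.
console.log(selectTests(ids, "tools-list-01, tools-call-01"));
```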

Quality Analysis

Checks how well your tool schemas help LLMs understand and use your tools:

| Check | What It Catches |
|-------|-----------------|
| Parameter description coverage | Parameters without descriptions, the #1 cause of incorrect LLM tool calls |
| Description quality (short) | Descriptions under 20 characters, too brief for LLMs to understand intent |
| Description quality (verbose) | Descriptions over 500 characters, which waste context tokens |
| Deprecated tool detection | Tools marked deprecated that are still listed, which confuses tool selection |
| Duplicate tool detection | Tools with identical input schemas, which suggests redundant or versioned tools |
| Required/default mismatch | Required parameters with default values, which send contradictory schema signals |
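
Two of these checks are simple enough to sketch. The code below is an illustration rather than the analyzer's actual implementation: `ToolDef`, `paramCoverage`, and `descriptionIssue` are hypothetical names, and only the 20 and 500 character limits come from the table above.

```typescript
// Minimal tool shape (name, description, inputSchema); hypothetical, for illustration.
interface ToolDef {
  name: string;
  description?: string;
  inputSchema: { properties?: Record<string, { description?: string }> };
}

// Limits from the table above: under 20 characters is short, over 500 is verbose.
const MIN_DESC = 20;
const MAX_DESC = 500;

// Count how many parameters across all tools carry a non-empty description.
function paramCoverage(tools: ToolDef[]): { documented: number; total: number } {
  let documented = 0;
  let total = 0;
  for (const tool of tools) {
    const props = tool.inputSchema.properties ?? {};
    for (const key of Object.keys(props)) {
      total += 1;
      const desc = props[key].description;
      if (desc !== undefined && desc.trim().length > 0) documented += 1;
    }
  }
  return { documented, total };
}

// Flag a tool description that falls outside the length limits.
function descriptionIssue(tool: ToolDef): "short" | "verbose" | null {
  const len = (tool.description ?? "").length;
  if (len < MIN_DESC) return "short";
  if (len > MAX_DESC) return "verbose";
  return null;
}

const readFile: ToolDef = {
  name: "read_file",
  description: "Read a file",
  inputSchema: {
    properties: { path: {}, encoding: { description: "Text encoding, e.g. utf-8" } },
  },
};

const coverage = paramCoverage([readFile]);
console.log(`${coverage.documented} of ${coverage.total} parameters documented`);
console.log(descriptionIssue(readFile));
```

Here one of two parameters is documented, and the 11-character description is flagged as `short`.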

Security Analysis

Static analysis on tool definitions to detect common security antipatterns:

| Check | What It Catches |
|-------|-----------------|
| Environment variable exposure | Tools like get-env that leak secrets to the LLM |
| Code execution detection | Tools accepting code, script, or eval parameters, which are arbitrary execution surfaces |
| Dangerous default patterns | Destructive operations (write, delete, drop) without proper warning descriptions |
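
To make the idea of static checks on tool definitions concrete, here is a rough sketch. Every pattern in it is an assumption chosen for illustration (the regular expressions, the warning vocabulary, and the `scanTool` name are all hypothetical); the analyzer's real rules are not described in this document.

```typescript
// Hypothetical tool shape and finding type, for illustration only.
interface ToolInfo {
  name: string;
  description?: string;
  inputSchema: { properties?: Record<string, unknown> };
}

type FindingKind = "env-exposure" | "code-execution" | "destructive-no-warning";
interface Finding {
  tool: string;
  kind: FindingKind;
}

// Illustrative patterns; assumptions, not the analyzer's actual rules.
const ENV_NAME = /(^|[-_])env([-_]|$)/i; // matches get-env, env_dump
const CODE_PARAMS = ["code", "script", "eval"];
const DESTRUCTIVE = /(write|delete|drop)/i;
const WARNING_WORDS = /(caution|warning|overwrite|irreversible|destructive)/i;

function scanTool(tool: ToolInfo): Finding[] {
  const findings: Finding[] = [];
  if (ENV_NAME.test(tool.name)) {
    findings.push({ tool: tool.name, kind: "env-exposure" });
  }
  const params = Object.keys(tool.inputSchema.properties ?? {});
  if (params.some((p) => CODE_PARAMS.indexOf(p.toLowerCase()) !== -1)) {
    findings.push({ tool: tool.name, kind: "code-execution" });
  }
  // A destructive-sounding name is only flagged when the description carries no warning.
  if (DESTRUCTIVE.test(tool.name) && !WARNING_WORDS.test(tool.description ?? "")) {
    findings.push({ tool: tool.name, kind: "destructive-no-warning" });
  }
  return findings;
}

const sampleTools: ToolInfo[] = [
  { name: "get-env", inputSchema: {} },
  { name: "run_script", description: "Runs a script", inputSchema: { properties: { script: {} } } },
  { name: "delete_record", description: "Deletes a record by id", inputSchema: { properties: { id: {} } } },
  { name: "write_file", description: "Overwrites the target file. Use with caution.", inputSchema: {} },
];

for (const tool of sampleTools) {
  console.log(tool.name, scanTool(tool).map((f) => f.kind));
}
```

Name-based matching like this is cheap but crude: it can miss a dangerous tool with an innocent name and flag a harmless one, which is why the roadmap lists dynamic security testing separately.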

Efficiency Analysis

Catches tool proliferation and schema bloat — the top causes of poor LLM performance with MCP servers:

| Metric | Warning | Critical | Why |
|--------|:-------:|:--------:|-----|
| Tool count | > 20 | > 50 | More tools = more tokens in every request = less room for conversation |
| Schema tokens | > 10,000 | > 30,000 | Token budget is finite. Schema overhead competes with actual content |

Token estimation uses a chars/4 heuristic, which lands within roughly 15% of tiktoken counts for JSON schemas.
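
The heuristic and the thresholds in the table can be sketched in a few lines. This is an illustration under assumptions: the function names are hypothetical, and both the choice to serialize each schema with `JSON.stringify` and the rounding up are guesses about details the text does not specify.

```typescript
type Level = "ok" | "warning" | "critical";

// chars/4 heuristic: serialize each schema and divide the character count by 4.
// Serializing with JSON.stringify and rounding up are assumptions.
function estimateSchemaTokens(schemas: object[]): number {
  const chars = schemas.reduce<number>((sum, s) => sum + JSON.stringify(s).length, 0);
  return Math.ceil(chars / 4);
}

// Thresholds are exclusive, as in the table: "> 20" warns, "> 50" is critical.
function classify(value: number, warnAbove: number, critAbove: number): Level {
  if (value > critAbove) return "critical";
  if (value > warnAbove) return "warning";
  return "ok";
}

const schema = {
  type: "object",
  properties: { path: { type: "string", description: "Absolute path to read" } },
  required: ["path"],
};
console.log(estimateSchemaTokens([schema]));

console.log(classify(21, 20, 50)); // tool count of 21
console.log(classify(3057, 10000, 30000)); // ~3057 schema tokens
```

With the documented defaults, 21 tools classifies as `warning` and about 3057 schema tokens as `ok`.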

Scoring

Composite 0-100 score. Compliance is awarded in proportion to the pass rate of the tests that ran; the other three dimensions start at their maximum and lose points for each finding:

| Dimension | Max | How Points Are Computed |
|-----------|:---:|-------------------------|
| Compliance | 40 | (passed / total_run) * 40 |
| Quality | 25 | -5 per critical, -2 per warning |
| Efficiency | 15 | -8 per critical, -3 per warning |
| Security | 20 | -10 per critical, -5 per warning |

Skipping a dimension with --skip-* flags means those points are not awarded. A server with --skip-security can score at most 80.
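
The scoring rules translate directly into arithmetic. The sketch below follows the table, with three assumptions that the document does not state: a dimension never drops below zero, the compliance value is rounded to the nearest integer, and a run with zero compliance tests earns zero compliance points. The names `ScoreInput` and `computeScore` are hypothetical.

```typescript
interface Findings {
  critical: number;
  warning: number;
}

interface ScoreInput {
  passed: number; // compliance tests that passed
  totalRun: number; // compliance tests that ran (skipped tests excluded)
  quality?: Findings; // undefined means the dimension was skipped
  efficiency?: Findings;
  security?: Findings;
}

// Start at the maximum and subtract per finding; a skipped dimension earns nothing.
function deduct(max: number, perCritical: number, perWarning: number, f?: Findings): number {
  if (f === undefined) return 0;
  return Math.max(0, max - perCritical * f.critical - perWarning * f.warning); // floor at 0 is an assumption
}

function computeScore(input: ScoreInput) {
  const compliance =
    input.totalRun === 0 ? 0 : Math.round((input.passed / input.totalRun) * 40);
  const quality = deduct(25, 5, 2, input.quality);
  const efficiency = deduct(15, 8, 3, input.efficiency);
  const security = deduct(20, 10, 5, input.security);
  return { compliance, quality, efficiency, security, total: compliance + quality + efficiency + security };
}

// 10 of 10 tests pass, one quality warning, one security critical.
const example = computeScore({
  passed: 10,
  totalRun: 10,
  quality: { critical: 0, warning: 1 },
  efficiency: { critical: 0, warning: 0 },
  security: { critical: 1, warning: 0 },
});
console.log(example);
```

This input yields 40 + 23 + 15 + 10 = 88. Leaving `security` undefined, the equivalent of --skip-security, caps the total at 80.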

CLI Reference

| Flag | Description | Default |
|------|-------------|---------|
| -t, --transport | Transport type (stdio or http) | stdio |
| -r, --reporter | Output format (console or json) | console |
| -o, --output | Write report to file | |
| --threshold | Minimum passing score (0-100); exit 1 if below | |
| --timeout | Test timeout in ms | 30000 |
| --skip | Comma-separated test IDs to skip | |
| --only | Comma-separated test IDs to run | |
| -e, --env | Environment variables as KEY=VAL,KEY2=VAL2 | |
| --max-tools | Critical threshold for tool count | 50 |
| --max-schema-tokens | Critical threshold for schema tokens | 30000 |
| --skip-efficiency | Skip efficiency analysis | |
| --skip-quality | Skip quality analysis | |
| --skip-security | Skip security analysis | |
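
The KEY=VAL,KEY2=VAL2 format accepted by --env is easy to picture as a parser. The following is a sketch, not the CLI's actual code: `parseEnvFlag` is a hypothetical name, and splitting each pair at the first `=` is an assumption.

```typescript
// Hypothetical parser for the KEY=VAL,KEY2=VAL2 format; not the CLI's actual code.
function parseEnvFlag(value: string): Record<string, string> {
  const env: Record<string, string> = {};
  for (const pair of value.split(",")) {
    const idx = pair.indexOf("=");
    if (idx <= 0) continue; // ignore entries with no key or no "="
    // Split at the first "=" so values may themselves contain "=".
    env[pair.slice(0, idx).trim()] = pair.slice(idx + 1);
  }
  return env;
}

console.log(parseEnvFlag("SUPABASE_ACCESS_TOKEN=your-token,LOG_LEVEL=debug"));
```

A naive comma split like this cannot represent values that contain commas; how the real CLI treats that case is not documented here.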

CI/CD Integration

Add to your GitHub Actions workflow:

- name: Test MCP Server
  run: npx mcp-quality-gate validate "./my-server" --threshold 80 --reporter json --output mcp-quality-gate-report.json

mcp-quality-gate exits with code 1 when the score falls below --threshold, failing the CI step.

Programmatic API

import {
  createMCPClient,
  listAllTools,
  runTests,
  complianceTests,
  analyzeEfficiency,
  analyzeQuality,
  analyzeSecurity,
  ConsoleReporter,
} from "mcp-quality-gate";

const client = await createMCPClient({
  command: "node",
  args: ["./my-server.js"],
  transport: "stdio",
});

const tools = await listAllTools(client);
const efficiency = analyzeEfficiency(tools);
const quality = analyzeQuality(tools);
const security = analyzeSecurity(tools);

const result = await runTests(
  complianceTests,
  { client, timeout: 10000 },
  undefined,
  "my-server",
  efficiency,
  quality,
  security,
);

console.log(new ConsoleReporter().format(result));
await client.close();

Architecture

mcp-quality-gate
├── CLI (Commander)         → parse args, orchestrate
├── MCP Client Wrapper      → connect via stdio or HTTP, manage lifecycle
├── Compliance Tests (17)   → live protocol verification
│   ├── Lifecycle (3)       → init, capabilities, ping
│   ├── Tools (7)           → list, fields, naming, schema, call, errors, descriptions
│   ├── Resources (4)       → list, fields, descriptions, read
│   └── Prompts (3)         → list, fields, get
├── Quality Analyzer        → param descriptions, description length, deprecated, duplicates
├── Security Analyzer       → env exposure, code execution, dangerous defaults
├── Efficiency Analyzer     → tool count, schema token estimation
├── Score Calculator        → 4-dimension weighted composite (40+25+15+20=100)
└── Reporters               → console (colored), JSON (CI/CD)

Releases

mcp-quality-gate follows Semantic Versioning:

- 0.x.y: pre-1.0, API may change between minor versions
- PATCH (0.x.Y): bug fixes, new compliance tests, doc updates
- MINOR (0.X.0): new analyzer dimensions, new reporters, CLI flags
- MAJOR (X.0.0): breaking API changes, scoring formula changes

How Releases Work

1. Bump version in package.json
2. Update CHANGELOG.md with the new version entry
3. Merge to main
4. CI automatically runs lint + test + build, publishes to npm with provenance, and creates a GitHub Release with tag v{version}

The prepublishOnly script runs lint && build && test as a safety gate. See CONTRIBUTING.md for full release instructions.

Roadmap

- [x] v0.1: Compliance tests (lifecycle, tools, resources, prompts), quality + security + efficiency analysis, 4-dimension scoring, CI/CD workflows
- [ ] v0.2: Transport compliance tests (HTTP/SSE edge cases), response schema validation, capability refusal tests
- [ ] v0.3: mcp-quality-gate init scaffolding, GitHub Action for CI, performance benchmarking
- [ ] v1.0: Dynamic security testing, MCP server registry scanner, stable API

Contributing

See CONTRIBUTING.md for development setup, code standards, and how to add tests.

Security

See SECURITY.md for reporting vulnerabilities.

Code of Conduct

See CODE_OF_CONDUCT.md.

License

MIT
