mcp-fit
v0.1.1
Published
MCP server agent-usability scorer — scores and auto-fixes tool descriptions
mcp-fit
Score MCP servers for agent-usability — then auto-fix them.
Plenty of tools let you expose an MCP server. None tell you whether it is actually agent-friendly: clean namespacing, strict params, lean typed outputs, helpful errors, low tool-selection confusion. mcp-fit does.
It connects to a target MCP server, scores it across five contract-usability axes, runs real agent tasks against it, and — in fix mode — rewrites the server's tool and parameter descriptions to measurably raise that score, proving the gain with a before/after delta.
The scorecard axes are the provider-side dual of the RubricRefine tool-use contract taxonomy (arXiv 2605.09730): namespacing, tool-selection-confusion, param-strictness, output-leanness, error-helpfulness.
Quickstart
Scanning your own server needs no clone: run npx mcp-fit scan -- <your-server-command>. The walkthrough below uses the repo's bundled strawman fixture, so it is clone-based.
Score the bundled strawman server (a deliberately bad in-memory note store):
# 1. Clone and install
git clone <repo-url> mcp-fit && cd mcp-fit
npm install
# 2. Install strawman dependencies
cd fixtures/strawman-server && npm install && cd ../..
# 3. Build mcp-fit
npm run build
# 4. Scan the strawman — renders a scorecard and writes compat.json + evals.jsonl
node dist/cli.js scan \
--out ./out \
-- fixtures/strawman-server/node_modules/.bin/tsx fixtures/strawman-server/server.ts
Expected output (lint-only: the deterministic badge scores only the axes static lint can verify; behavioural axes are eval-only):
┌────────────────────────────────────────────────────────────┐
│  mcp-fit scorecard · strawman v0.1.0 (stdio)               │
├────────────────────────────────────────────────────────────┤
│  Axis                        Score   Grade  Findings       │
├────────────────────────────────────────────────────────────┤
│  namespacing                 9 /10   A      0err 1warn     │
│  tool-selection-confusion    — /10   ·      eval-only      │
│  param-strictness            1 /10   F      7err 0warn     │
│  output-leanness             — /10   ·      eval-only      │
│  error-helpfulness           — /10   ·      eval-only      │
├────────────────────────────────────────────────────────────┤
│  LINT SCORE (deterministic)  5.6 / 10                      │
│  WEIGHTED AGGREGATE          5.6 / 10  [grade: C]          │
└────────────────────────────────────────────────────────────┘
The — axes are eval-only: static lint cannot grade runtime output shape, error quality, or tool-selection confusion, so the deterministic badge does not claim a verdict on them. Run scan --eval (needs ANTHROPIC_API_KEY) to score them stochastically.
Keyless red→green (no API key)
fixtures/strawman-fixed-server is the strawman with clean contracts. Scan both and compare the deterministic LINT SCORE — a reproducible before/after with no LLM call:
# bad: 5.6 / 10 (param-strictness F)
node bin/mcp-fit scan -- fixtures/strawman-server/node_modules/.bin/tsx fixtures/strawman-server/server.ts
# fixed: 10 / 10 (A)
node bin/mcp-fit scan -- fixtures/strawman-fixed-server/node_modules/.bin/tsx fixtures/strawman-fixed-server/server.ts
See sample-artifacts/ for a pre-generated compat.json and evals.jsonl from the strawman run.
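The before/after comparison can be turned into a small regression check. The sketch below assumes the two deterministic lint scores have already been obtained (for example, read off the two scorecards); lintDelta and assertImproved are hypothetical helpers for illustration, not part of mcp-fit.

```typescript
// Hypothetical helpers (not part of mcp-fit): compare two lint scores.

/** Difference between two lint scores, rounded to one decimal place. */
function lintDelta(before: number, after: number): number {
  return Math.round((after - before) * 10) / 10;
}

/** Throws if the "after" score is not at least `minGain` higher than "before". */
function assertImproved(before: number, after: number, minGain = 0.1): void {
  const gain = lintDelta(before, after);
  if (gain < minGain) {
    throw new Error(`lint score did not improve enough: ${before} -> ${after}`);
  }
}

// Scores taken from the two strawman scans above.
const strawmanGain = lintDelta(5.6, 10);
console.log(`lint delta: +${strawmanGain.toFixed(1)}`); // prints "lint delta: +4.4"
assertImproved(5.6, 10);
```

Because the lint score is deterministic, a check like this should give the same result on every run, which is what makes it usable as a CI gate.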
Auto-fix mode
Generate improved descriptions and show the before/after delta:
node dist/cli.js fix \
--out ./out \
-- fixtures/strawman-server/node_modules/.bin/tsx fixtures/strawman-server/server.ts
Note: fix calls the Claude API. Set ANTHROPIC_API_KEY in your environment, or copy .env.example to .env first.
SSE / HTTP transport
node dist/cli.js scan --sse http://localhost:3001/sse
node dist/cli.js fix --sse http://localhost:3001/sse --out ./out
After npm link or npm install -g mcp-fit
mcp-fit scan -- node my-server.js
mcp-fit fix --out ./results -- npx -y @my-org/my-server
mcp-fit help
CLI reference
mcp-fit scan [--out <dir>] -- <command> [args...]
mcp-fit scan [--out <dir>] --sse <url>
mcp-fit fix [--out <dir>] -- <command> [args...]
mcp-fit fix [--out <dir>] --sse <url>
mcp-fit help
| Option | Default | Description |
|--------|---------|-------------|
| --out <dir> | . | Directory for emitted artifacts |
| --sse <url> | — | SSE transport URL (instead of -- cmd) |
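To make the grammar above concrete, here is a minimal sketch of how such a command line could be split at the -- separator. It is an illustration only, covering just the two options in the table: parseCli, Target and ParsedCli are hypothetical names, and mcp-fit's actual parser may behave differently.

```typescript
// Hypothetical sketch of the documented grammar; not mcp-fit's actual parser.

type Target =
  | { kind: "stdio"; command: string; args: string[] }
  | { kind: "sse"; url: string };

interface ParsedCli {
  mode: "scan" | "fix";
  outDir: string;
  target: Target;
}

function parseCli(argv: string[]): ParsedCli {
  const [mode, ...rest] = argv;
  if (mode !== "scan" && mode !== "fix") {
    throw new Error(`unknown mode: ${mode}`);
  }

  // Everything after the first "--" is the server command, passed through untouched.
  const sep = rest.indexOf("--");
  const options = sep === -1 ? rest : rest.slice(0, sep);
  const serverCmd = sep === -1 ? [] : rest.slice(sep + 1);

  let outDir = "."; // documented default for --out
  let sseUrl: string | undefined;
  for (let i = 0; i < options.length; i++) {
    const value = options[i + 1];
    if (options[i] === "--out" && value !== undefined) {
      outDir = value;
      i++; // skip the consumed value
    } else if (options[i] === "--sse" && value !== undefined) {
      sseUrl = value;
      i++;
    } else {
      throw new Error(`unexpected option: ${options[i]}`);
    }
  }

  if (sseUrl !== undefined) {
    return { mode, outDir, target: { kind: "sse", url: sseUrl } };
  }
  const [command, ...args] = serverCmd;
  if (command === undefined) {
    throw new Error("expected `-- <command>` or `--sse <url>`");
  }
  return { mode, outDir, target: { kind: "stdio", command, args } };
}

console.log(parseCli(["scan", "--out", "./out", "--", "node", "my-server.js"]));
```

Under this reading of the grammar, anything placed after --, including a flag such as --out, would be treated as part of the server command, so mcp-fit's own options belong before the separator.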
Scorecard axes
| Axis | Lineage | Measures |
|------|---------|----------|
| namespacing | tool-choice | tools distinguishable; documented path obvious |
| tool-selection-confusion | tool-choice | overlapping / ambiguous tools that mislead selection |
| param-strictness | call-signature | unambiguous signatures; clear required args |
| output-leanness | output-contract | typed values vs labeled prose / token bloat |
| error-helpfulness | provider-only | errors that guide recovery vs opaque failures |
Scores are 1–10 ordinal (10 = trivially correct; 1–4 = very easy to get wrong).
The lint score is deterministic and badge-able. The eval score (stochastic, requires --eval) is reported with variance.
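The idea of an aggregate that only counts the axes lint can grade can be sketched as a weighted mean that skips ungraded axes. This is an assumption-laden illustration: the Axis names come from the table above, but aggregate, the null convention for eval-only axes, and the equal weights are hypothetical, and mcp-fit's real weighting is not documented here.

```typescript
// Hypothetical sketch: mcp-fit's real weights and rounding are not documented here.

type Axis =
  | "namespacing"
  | "tool-selection-confusion"
  | "param-strictness"
  | "output-leanness"
  | "error-helpfulness";

// null marks an axis that static lint cannot grade (eval-only).
type AxisScores = Record<Axis, number | null>;

/** Weighted mean over graded axes only; null if no axis was graded. */
function aggregate(scores: AxisScores, weights: Record<Axis, number>): number | null {
  let weighted = 0;
  let totalWeight = 0;
  for (const axis of Object.keys(scores) as Axis[]) {
    const score = scores[axis];
    if (score === null) continue; // ungraded axes contribute nothing
    weighted += score * weights[axis];
    totalWeight += weights[axis];
  }
  return totalWeight === 0 ? null : weighted / totalWeight;
}

// Lint-only scores from the strawman scorecard in the Quickstart.
const strawmanScores: AxisScores = {
  "namespacing": 9,
  "tool-selection-confusion": null,
  "param-strictness": 1,
  "output-leanness": null,
  "error-helpfulness": null,
};

// Equal weights are an assumption made purely for illustration.
const equalWeights: Record<Axis, number> = {
  "namespacing": 1,
  "tool-selection-confusion": 1,
  "param-strictness": 1,
  "output-leanness": 1,
  "error-helpfulness": 1,
};

console.log(aggregate(strawmanScores, equalWeights)); // prints 5
```

With equal weights the strawman comes out at 5, whereas the scorecard reports 5.6, so the tool evidently weights the axes differently; treat the sketch as a model of the skipping behaviour, not of the exact numbers.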
Artifacts
| File | Schema | Content |
|------|--------|---------|
| compat.json | schemas/compat.schema.json | Full scorecard (all axes, findings, aggregate) |
| evals.jsonl | schemas/evals.schema.json | Per-task agent traces (one JSON object per line) |
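Since evals.jsonl holds one JSON object per line, it can be consumed with a few lines of code. The sketch below is a generic JSON Lines reader; parseJsonl is a hypothetical helper, and the sample records are made up, since the real fields are defined by schemas/evals.schema.json.

```typescript
// Hypothetical helper: parses JSON Lines text into one value per non-empty line.
function parseJsonl(text: string): unknown[] {
  const records: unknown[] = [];
  const lines = text.split(/\r?\n/);
  lines.forEach((line, index) => {
    if (line.trim() === "") return; // tolerate blank lines and a trailing newline
    try {
      records.push(JSON.parse(line));
    } catch (err) {
      throw new Error(`invalid JSON on line ${index + 1}: ${(err as Error).message}`);
    }
  });
  return records;
}

// Made-up sample; the real fields are defined by schemas/evals.schema.json.
const sampleJsonl = '{"task":"example-1","ok":true}\n{"task":"example-2","ok":false}\n';
const traces = parseJsonl(sampleJsonl);
console.log(`${traces.length} trace(s)`); // prints "2 trace(s)"
```

To read a real run, pass the file contents instead of the sample, for example readFileSync("out/evals.jsonl", "utf8") from node:fs when the scan was run with --out ./out.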
Development
npm run typecheck # tsc --noEmit
npm run build # compile src/ → dist/
npm test # vitest run
Security
ANTHROPIC_API_KEY: required only for fix mode and eval. Never committed; load from .env (gitignored).
mcp-fit spawns and queries servers with your consent; it never auto-runs an untrusted server without an explicit command.
Architecture
src/connect/   MCP client, transports, introspect, proxy    B-001
src/lint/      deterministic rule engine + rules            B-002
fixtures/      strawman bad server + task corpus            B-003
src/report/    artifact emitter + schema validation         B-004
src/eval/      dynamic-eval runner (Claude SDK harness)     B-005
src/score/     scorer + contract-rubric loop                B-006
src/fix/       description rewriter + re-validate + delta   B-007
src/cli.ts     CLI entry point (this bead)                  B-008
Source of truth: specs/mcp-fit/spec.md · Implementation plan: plan.md · Issue tracking: tasks.md
License
Apache-2.0 — see LICENSE.
