@polygraphso/litmus
v0.12.1
Published
Behavioral litmus harness for MCP servers — grade a server A–F (tool-output injection, egress, sensitive-data, adversarial-input) with reproducible, content-addressed evidence. Ships a CLI and an MCP server with a run_litmus tool for AI agents.
Maintainers
Readme
@polygraphso/litmus
The behavioral litmus harness for MCP servers, from polygraph.so.
It connects to an MCP server the way an agent would, fingerprints its exact tool surface, and runs four probe categories — C-01 tool-output injection (static, dynamic, and second-order — one tool's output weaponized as another's input), C-02 permission/egress (in a hardened default-deny Docker sandbox, matched host and port), C-03 sensitive-data handling (planted canaries), C-04 adversarial-input handling (malformed/oversized and jailbreak inputs) — then grades the server A–F and produces a deterministic, content-addressed evidence bundle.
A passing grade is a measurement, not a guarantee. The methodology and its disclosed limits live at polygraph.so.
Install
npm i -g @polygraphso/litmus
# …or run without installing (note the -p flag — the package ships two bins,
# `polygraphso-litmus` and `polygraphso-litmus-mcp`, so npx needs to be told which):
npx -y -p @polygraphso/litmus polygraphso-litmus litmus npm/@modelcontextprotocol/server-filesystemRequires Node ≥ 18. Docker is optional — without it, C-02 (egress) is skipped and the grade is capped at B for that run.
CLI
polygraphso-litmus litmus <registry-ref | https-url | path-to-mcp> # grade a server
polygraphso-litmus litmus --json <ref> # machine-readable evidence bundle
polygraphso-litmus litmus --timeout <seconds> <ref> # cap the whole run (default 900s)
polygraphso-litmus check <ref> # look up a published gradeExamples:
# a remote https target runs no local code — graded directly
polygraphso-litmus litmus https://example.com/mcp
# a registry ref or local file launches the TARGET's own code. Grade it sandboxed:
LITMUS_STDIO_ISOLATION=docker polygraphso-litmus litmus npm/@modelcontextprotocol/server-filesystem
# …or, without Docker, opt in to running it on this host:
polygraphso-litmus litmus --unsafe-host-exec npm/@modelcontextprotocol/server-filesystemHost-execution safety. Grading a registry ref (npm/…, pypi/…) or a local
path launches the target's own code. By default the CLI refuses to do that on
your host: set LITMUS_STDIO_ISOLATION=docker to run the target only inside the
hardened sandbox, or pass --unsafe-host-exec to accept host execution. Remote
https:// targets run no local code and need neither.
The litmus command exits non-zero on a failing grade (D/F), so it scripts in CI.
To dispute a published grade, just re-run litmus against the same server: the harness is
open and deterministic, so a re-run reproduces the grade — or refutes it.
Use it from an AI agent (MCP server)
The package ships a stdio MCP server, polygraphso-litmus-mcp, so it works in any
MCP-capable client. It exposes two tools:
run_litmus— actively grade a server now (runs the harness end-to-end) and return the grade and the evidence. Optionalbearer(andheaderentries, each"Key: Value") grade a token-gatedhttps://MCP target — sent to that origin only, ignored for stdio/local targets, the same plumbing as the CLI's--bearer/--header. Grading a registry ref or local path launches the target's own code, so it requiresunsafe_host_exec: trueunlessLITMUS_STDIO_ISOLATION=dockeris set (the MCP mirror of--unsafe-host-exec).verify_attestation— passively read a server's already-published grade before trusting or paying it.
It also registers two prompts that show up as slash commands — in Claude Code,
/mcp__polygraph-litmus__grade <server_ref> (run a fresh grade) and
/mcp__polygraph-litmus__check <server_ref> (read a published grade); other
clients surface the same prompts in their own UI. For a cleaner pair of commands
in Claude Code — /polygraph:grade and /polygraph:check — install the plugin
(below), which wires up this server and both commands in one step.
Prerequisites: Node ≥ 18. Docker is optional (without it, C-02 egress is
skipped and the grade caps at B). Set POLYGRAPH_API_URL=https://polygraph.so so
verify_attestation can look up published grades.
Heads-up: grade publishing is still rolling out, so
verify_attestationcommonly returnsnot_availabletoday — that means unevaluated, not a failing grade. To grade a server right now, userun_litmus.
Claude Code: one-click plugin (recommended)
The plugin bundles this MCP server and adds the /polygraph:grade and
/polygraph:check commands — one install does everything:
/plugin marketplace add polygraphso/litmus
/plugin install polygraph@polygraphsoThen just run /polygraph:grade npm/@modelcontextprotocol/server-filesystem.
Prefer to wire the server up by hand, or using another client? Add it once, then just talk to your agent.
Claude Code — one command:
claude mcp add polygraph-litmus -e POLYGRAPH_API_URL=https://polygraph.so \
-- npx -y -p @polygraphso/litmus polygraphso-litmus-mcpCursor — one-click install:
Or wire it up by hand — Claude Desktop (claude_desktop_config.json) / Cursor (~/.cursor/mcp.json):
{
"mcpServers": {
"polygraph-litmus": {
"command": "npx",
"args": ["-y", "-p", "@polygraphso/litmus", "polygraphso-litmus-mcp"],
"env": { "POLYGRAPH_API_URL": "https://polygraph.so" }
}
}
}The
-pflag is required: this package ships two bins, so plainnpx @polygraphso/litmuscan't tell which to run. If you installed globally (npm i -g @polygraphso/litmus) you can instead use"command": "polygraphso-litmus-mcp"with no args.
Any other MCP client / the Claude Agent SDK: spawn the same stdio command —
npx -y -p @polygraphso/litmus polygraphso-litmus-mcp.
Then just ask your agent
Run polygraph against
npm/@modelcontextprotocol/server-filesystemand tell me the grade.
The agent calls run_litmus, which launches that server in the harness, runs
C-01/C-02/C-03/C-04, and returns the grade (A–F), the per-category results, and the
tool-surface fingerprint. Use verify_attestation instead to read a grade
that's already published.
run_litmus launches the target server's code to exercise it (egress-sandboxed
when Docker is present). It needs no wallet or RPC.
ChatGPT and other remote clients
ChatGPT's MCP support expects a remote Streamable-HTTP server; this package is stdio-only, so you can't point ChatGPT at it directly. If you self-host, bridge stdio over HTTP yourself — e.g.
npx -y supergateway --stdio "npx -y -p @polygraphso/litmus polygraphso-litmus-mcp" --port 8000(or mcp-proxy) — then point your client
at that endpoint. polygraph does not host this for you; the bridge runs on your own
machine.
Troubleshooting
- Two bins /
npx:npxneeds-p @polygraphso/litmusplus the bin name (polygraphso-litmusorpolygraphso-litmus-mcp); plainnpx @polygraphso/litmuscan't choose which to run. Installed globally? Use the bin name directly, no-p. - Docker optional: without Docker, C-02 (egress) is skipped and the grade caps
at B — the C-02 row reads
skippedwith reasonno sandbox (Docker unavailable). Not a failure, just unverified. verify_attestationsayslookup_failed: the grade index or RPC was unreachable — that's unknown, not no grade. Retry; checkPOLYGRAPH_API_URL.
Grade a skill
Claude Code / Agent Skills (a SKILL.md plus an optional bundle) are graded by a
separate static litmus (litmus-skill-v1). It scans the skill's bytes — S-01
prompt injection in the body, S-03 data-exfiltration instructions, S-04
dangerous commands in bundled executable scripts — and content-hashes the whole
directory. The letter is A/B/D/F.
This is a static scan: it does not execute the skill or its scripts, so an A
means the static checks were clean, not that the skill is behaviorally safe. A
command the skill builds or fetches at runtime is not visible to it.
CLI
polygraphso-litmus-skill <path-to-skill-dir> # grade a local skill folder (must contain SKILL.md)
polygraphso-litmus-skill --json <path-to-skill-dir> # machine-readable safety + quality bundlesIt also prints a separate, advisory quality signal (well-formed / issues /
malformed) — never an A–F letter, never minted. Its deterministic checks
(frontmatter + bundled-link resolution) always run; the optional LLM-judged axes
(honesty, coherence) run only when a judge is available:
Inside an agent (the MCP tool below): the host agent's own model judges via MCP sampling — no key, any provider.
Standalone: bring your own key for any OpenAI-compatible endpoint:
export LITMUS_LLM_API_KEY=… # your key (any OpenAI-compatible endpoint) export LITMUS_LLM_MODEL=gpt-4o # a model the endpoint serves export LITMUS_LLM_BASE_URL=https://api.openai.com/v1 # optional; defaults to OpenAI # Other providers via their OpenAI-compatible endpoint, e.g.: # Claude: LITMUS_LLM_BASE_URL=https://api.anthropic.com/v1 LITMUS_LLM_MODEL=claude-sonnet-4-6 # Gemini: LITMUS_LLM_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai LITMUS_LLM_MODEL=gemini-2.5-flashWith neither, the judged axes are skipped — the grade and deterministic quality still run. The core never needs a key.
From an AI agent (MCP)
The same polygraphso-litmus-mcp server exposes two skill tools (plus grade-skill /
check-skill prompts):
run_skill_litmus— grade a local skill directory now (static; uses the host model via sampling for the quality axes, no key).verify_skill_attestation— read a skill's already-published grade by itsskill_ref(source/owner/repo#path, e.g.github/anthropics/skills#skills/pdf). It returns the attestedcontentHash; recompute the skill's hash and require equality before installing — the content hash, not the version, is the trust anchor.
Library
import { runLitmus, gateDecision, liveFingerprint, readAttestation } from "@polygraphso/litmus";
const bundle = await runLitmus("npm/@modelcontextprotocol/server-filesystem");
console.log(bundle.grade, bundle.gradeRationale);
// Skills: static safety grade + a separate advisory quality bundle.
import { runSkillLitmus, runSkillQuality } from "@polygraphso/litmus";
const skill = runSkillLitmus("./skills/my-skill");
console.log(skill.grade, skill.contentHash);License
Apache-2.0
