@runtime-judgement/mcp-server

v0.1.0

Runtime Judgement MCP server — verify, attribute, and snapshot from inside Claude Code / Codex CLI / Aider. Three tools that fit the inner-loop verification position from the Sprint 11 capability roadmap.

@runtime-judgement/mcp-server

A Model Context Protocol server that gives Claude Code, Codex CLI, Aider and any other MCP-aware coding agent three Runtime Judgement tools to call mid-session, before commit:

  • rj.verify_change — run the snapshot suite against the current pipeline to verify a patch hasn't regressed the locked-in behaviour.
  • rj.attribute_trace — ingest a failed trace, attribute the root cause, return the cited cause + L1/L4 verdict + suggested fix.
  • rj.suggest_snapshot — lock a verdict in as a regression snapshot so the next rj.verify_change call guards against it.

This is the inner-loop verification position from the Sprint 11 capability roadmap (§7 Demo 6). The coding agent never leaves its session to check the web app — verification happens inside the same prompt cycle.

The first dog-food customer is this repo: the test plan for the package is to point Claude Code at the runtime-judgement-app codebase, have it make a patch, and call rj.verify_change against the canonical suite before committing. Eat the dog food.


One-liner setup

Note: @runtime-judgement/mcp-server is not yet published to npm. Until it is, install from source (see below).

Install from source:

git clone https://github.com/rambo-01/runtime-judgement-app
cd runtime-judgement-app
pnpm install
pnpm --filter @runtime-judgement/mcp-server build

Then point your MCP config at:

{
  "command": "node",
  "args": ["/absolute/path/to/runtime-judgement-app/packages/rj-mcp-server/dist/index.js"]
}

The path-with-spaces fix (10ba427) is already in main, so paths like ~/Claude Code/... work correctly.

Once published to npm, the one-liner will be:

# Works in Claude Code, Cursor (via Claude), Windsurf, Cline, and any MCP-compatible tool
npx @runtime-judgement/mcp-server

The server will prompt for your API key on first run and write it to ~/.rj-config.json.

Or set it directly:

RJ_API_KEY=rj_live_... npx @runtime-judgement/mcp-server

Get your key at https://runtime-judgement.app/app/settings/api-keys


Quick start with Claude Code

  1. Get your RJ API key from https://runtime-judgement.app/app/integrations.

  2. Add to your ~/.config/claude-code/mcp.json:

    {
      "rj": {
        "command": "npx",
        "args": ["-y", "@runtime-judgement/mcp-server"],
        "env": {
          "RJ_API_URL": "https://runtime-judgement-app.vercel.app",
          "RJ_API_KEY": "rj_..."
        }
      }
    }

  3. Restart Claude Code. The rj MCP server appears in your tool list with three tools: rj.attribute_trace, rj.verify_change, rj.suggest_snapshot.

  4. Ask Claude: "Read the trace at ~/Downloads/demo-trace.json and call rj.attribute_trace on it. The user-visible failure surfaced in span lookup_order_status."

  5. Claude calls rj.attribute_trace and returns the cited cause + suggested fix as inline Markdown — no need to leave the terminal.

  6. After patching, ask Claude: "Now call rj.verify_change against suite 01HZ... to confirm the fix." Claude reports the verdict (pass / regression / drift) so you know whether to commit.

Each tool returns a human_readable Markdown field alongside the structured JSON, so Claude Code can surface a clean summary inline. The structured payload is preserved for machine consumers and for the next agent step.

Available tools

| Tool | Purpose | One-line example |
|---|---|---|
| rj.attribute_trace | Attribute a failed trace to its root cause | rj.attribute_trace({ trace: <otel json>, errorSpanId: "lookup_order_status" }) |
| rj.verify_change | Verify a patch hasn't regressed locked-in behaviour | rj.verify_change({ suiteId: "01HZSUITE..." }) |
| rj.suggest_snapshot | Lock an attribution in as a regression snapshot | rj.suggest_snapshot({ attributionId: "01HZATTR...", name: "tool-args-guard" }) |

Full input/output reference is in the Tool reference section below.


5-minute setup

1. Install

Note: @runtime-judgement/mcp-server is not yet on npm. Until it is published, see the "One-liner setup" section above for the install-from-source path.

Once published, the server can be run via npx:

npx @runtime-judgement/mcp-server

Or installed globally:

pnpm add -g @runtime-judgement/mcp-server

2. Set the environment

# Runtime Judgement — get these from https://runtime-judgement-app.vercel.app/app/settings
export RJ_API_URL="https://runtime-judgement-app.vercel.app"
export RJ_API_KEY="rj_..."

The server inherits whatever env it's launched in. The agent's own config file (Claude Code: ~/.config/claude-code/mcp_servers.json; Codex CLI: ~/.config/codex/mcp_servers.toml) is the right place to declare these.

3. Register with your agent

Claude Code

{
  "mcpServers": {
    "runtime-judgement": {
      "command": "npx",
      "args": ["@runtime-judgement/mcp-server"],
      "env": {
        "RJ_API_URL": "https://runtime-judgement-app.vercel.app",
        "RJ_API_KEY": "rj_..."
      }
    }
  }
}

Codex CLI

[mcp_servers.runtime-judgement]
command = "npx"
args = ["@runtime-judgement/mcp-server"]
env = { RJ_API_URL = "https://runtime-judgement-app.vercel.app", RJ_API_KEY = "rj_..." }

Aider

# .aider.conf.yml
mcp_servers:
  - name: runtime-judgement
    command: ["npx", "@runtime-judgement/mcp-server"]
    env:
      RJ_API_URL: https://runtime-judgement-app.vercel.app
      RJ_API_KEY: rj_...

Use without Claude Code (any MCP host)

Any tool that supports the Model Context Protocol can connect to this server. Use the following generic JSON config block and adapt the key names to your host:

{
  "mcpServers": {
    "runtime-judgement": {
      "command": "npx",
      "args": ["-y", "@runtime-judgement/mcp-server"],
      "env": {
        "RJ_API_KEY": "rj_live_..."
      }
    }
  }
}

| Host | Config file location |
|---|---|
| Claude Code | ~/.claude/mcp.json or .claude/mcp.json in the project root |
| Cursor | .cursor/mcp.json in the project root |
| Windsurf | ~/.codeium/windsurf/mcp_config.json |
| Cline | VS Code settings → Cline MCP Servers |
| Any stdio-MCP host | Point command at npx @runtime-judgement/mcp-server and pass RJ_API_KEY via env |

4. Use it

The agent will see three tools in its tool list. Ask:

"Before you commit this patch, run rj.verify_change against suite 01HZ... and tell me the verdict."

The tool will POST to the suite-run endpoint, wait for the result, and return verdict counts + cited spans so the agent can decide whether to proceed.


Tool reference

rj.verify_change

| Arg | Type | Required | Notes |
|---|---|---|---|
| suiteId | string | yes | Snapshot suite ULID |
| tags | string[] | no | Run only snapshots tagged with these tags |
| perturbation | object | no | Forward-compat hint about what the agent changed |

Returns:

{
  suiteId: string
  suiteName?: string
  verdict: "pass" | "regression" | "drift" | "error" | "empty"
  counts: {
    total: number
    passed: number
    changedIntentional: number
    changedUnexpected: number
    skipped: number
    errored: number
  }
  outcomes: Array<{
    snapshotId: string
    status: "passed" | "changed-intentional" | "changed-unexpected" | "skipped" | "error"
    citedSpanIds?: string[]
    message?: string
  }>
  runIds: string[]
  durationMs?: number
  spendUsd?: number
}

The verdict field is the single-string summary the agent should look at:

  • pass — every snapshot is unchanged. Proceed with commit.
  • regression — at least one snapshot's verdict changed unexpectedly. Stop and inspect outcomes to see which.
  • drift — every change is changed-intentional. The agent should confirm with the user whether the intentional drift is what was wanted.
  • error — at least one snapshot failed to replay (judge timeout, pipeline crash, etc.). Surface in agent output as a transient issue.
  • empty — suite has no snapshots. Configuration issue, not a verdict.
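The verdict handling above can be sketched as a small dispatch function. This is an illustration only, not code from the package: the five verdict strings come from the return type above, while `AgentAction` and `actionForVerdict` are hypothetical names introduced here.

```typescript
// The five verdict strings are taken from the rj.verify_change return type.
type Verdict = "pass" | "regression" | "drift" | "error" | "empty";

// Hypothetical action names, chosen for this sketch only.
type AgentAction =
  | "commit"
  | "inspect-outcomes"
  | "confirm-with-user"
  | "report-transient-error"
  | "fix-suite-config";

// Map each verdict to the follow-up the list above describes.
function actionForVerdict(verdict: Verdict): AgentAction {
  switch (verdict) {
    case "pass":
      return "commit"; // every snapshot unchanged
    case "regression":
      return "inspect-outcomes"; // at least one unexpected change
    case "drift":
      return "confirm-with-user"; // only intentional changes
    case "error":
      return "report-transient-error"; // a snapshot failed to replay
    case "empty":
      return "fix-suite-config"; // suite has no snapshots
  }
}
```

Keeping the switch exhaustive over the union means the compiler would flag a new verdict string if one were added to the type.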

rj.attribute_trace

| Arg | Type | Required | Notes |
|---|---|---|---|
| trace | object | yes | Raw trace JSON (OTEL gen-ai / LangSmith / custom) |
| errorSpanId | string | yes | Span where the user-visible failure surfaced |
| errorDescription | string | no | Human description (improves judge precision) |
| errorEvidence | string | no | Verbatim quote from the failure |
| pipeline | string | no | Pipeline name override (defaults to q72-k1) |

Returns:

{
  attributionId: string
  traceId: string
  sourceFormat: "otel-genai" | "langsmith" | "custom-json"
  spanCount?: number
  deduped: boolean
  l1: { axis: string; confidence: number }
  l4: { category: string; confidence: number }
  citedSpans: string[]
  explanation: string | null
  suggestedFix: string | null
  cost: { usd?: number } | null
  algoVersions: Record<string, unknown> | null
}

Two-call dance under the hood: ingest the trace via POST /api/traces, then run the attribution pipeline via POST /api/attributions. Both calls share the same Bearer token.
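A rough sketch of that two-call sequence follows. Only the two endpoint paths and the shared Bearer token come from the text above. The request bodies, the `traceId` and `attributionId` response fields on the HTTP responses, and the `PostJson` helper type are assumptions made for illustration; the real server may differ.

```typescript
// Injected HTTP helper so the sketch runs without a network (hypothetical type).
type PostJson = (
  url: string,
  headers: Record<string, string>,
  body: unknown,
) => Promise<Record<string, unknown>>;

interface AttributeInput {
  trace: object;
  errorSpanId: string;
}

async function attributeTrace(
  input: AttributeInput,
  apiUrl: string,
  apiKey: string,
  postJson: PostJson,
): Promise<{ traceId: string; attributionId: string }> {
  // Both calls share the same Bearer token.
  const headers: Record<string, string> = {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };

  // Call 1: ingest the trace. Body shape and response field are assumed.
  const ingested = await postJson(`${apiUrl}/api/traces`, headers, {
    trace: input.trace,
  });
  const traceId = String(ingested["traceId"]);

  // Call 2: run the attribution pipeline against the ingested trace.
  const attributed = await postJson(`${apiUrl}/api/attributions`, headers, {
    traceId,
    errorSpanId: input.errorSpanId,
  });

  return { traceId, attributionId: String(attributed["attributionId"]) };
}
```

Passing `postJson` in as a parameter keeps the ordering of the two calls easy to test with a fake.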

rj.suggest_snapshot

| Arg | Type | Required | Notes |
|---|---|---|---|
| attributionId | string | yes | ULID from rj.attribute_trace |
| name | string | yes | Human-readable snapshot name (unique per user) |
| description | string | no | Longer description for the suite UI |
| suiteName | string | no | Add to this suite (lazy-created); else "Unfiled" |

Returns:

{
  snapshotId: string
  name: string
  suiteId?: string
  nextStep: string  // human hint pointing the agent at rj.verify_change
}

Useful 409/404 hints in structuredContent:

  • hint: "name_conflict" — pick a different name.
  • hint: "attribution_not_found" — the attributionId is wrong or belongs to a different user.
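One way an agent-side wrapper might react to these two hints is sketched below. The hint strings are the ones listed above; the `Recovery` shape, the `recoverFromHint` name, and the numeric-suffix renaming strategy are hypothetical choices for this example.

```typescript
// Hint strings come from the list above.
type SnapshotHint = "name_conflict" | "attribution_not_found";

// Hypothetical recovery shape for this sketch.
type Recovery =
  | { kind: "retry"; name: string }
  | { kind: "abort"; reason: string };

function recoverFromHint(
  hint: SnapshotHint,
  name: string,
  attempt: number,
): Recovery {
  if (hint === "name_conflict") {
    // Pick a different name; a numeric suffix is one simple option.
    return { kind: "retry", name: `${name}-${attempt + 1}` };
  }
  // A wrong or foreign attributionId cannot be fixed by retrying.
  return {
    kind: "abort",
    reason: "attributionId is wrong or belongs to a different user",
  };
}
```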

How the loop fits together

                ┌──────────────────────────────────────┐
                │ agent runs your test / observes bug  │
                └──────────────────────────────────────┘
                                  │
                                  ▼
                          rj.attribute_trace
                                  │
                                  ▼
                         rj.suggest_snapshot
                                  │
                                  ▼
                   ┌────────────────────────────────────┐
                   │ agent writes a patch               │
                   └────────────────────────────────────┘
                                  │
                                  ▼
                           rj.verify_change
                                  │
                                  ▼
                ┌──────────────────────────────────────┐
                │ verdict=pass → commit                │
                │ verdict=regression → fix + re-loop   │
                │ verdict=drift → confirm w/ user      │
                └──────────────────────────────────────┘

This is the same loop the human follows on the web app — collapsed into three tool calls the agent can make without leaving its session.
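The loop in the diagram can be sketched as a single function. The tool names, argument names, and verdict strings come from this README; `CallTool`, `runInnerLoop`, `applyPatch`, and the returned step names are hypothetical, and the sketch assumes rj.suggest_snapshot returns a `suiteId` (it is optional in the return type, so the sketch stops when it is missing).

```typescript
// Generic tool caller, standing in for whatever MCP client the agent uses.
type CallTool = (
  name: string,
  args: Record<string, unknown>,
) => Promise<Record<string, unknown>>;

type NextStep = "commit" | "fix-and-reloop" | "confirm-with-user" | "stop";

async function runInnerLoop(
  callTool: CallTool,
  trace: object,
  errorSpanId: string,
  snapshotName: string,
  applyPatch: () => Promise<void>,
): Promise<NextStep> {
  // 1. Attribute the observed failure.
  const attribution = await callTool("rj.attribute_trace", { trace, errorSpanId });

  // 2. Lock the verdict in as a regression snapshot.
  const snapshot = await callTool("rj.suggest_snapshot", {
    attributionId: attribution["attributionId"],
    name: snapshotName,
  });
  const suiteId = snapshot["suiteId"];
  if (typeof suiteId !== "string") return "stop"; // no suite to verify against

  // 3. The agent writes its patch.
  await applyPatch();

  // 4. Verify the patch and branch on the verdict.
  const result = await callTool("rj.verify_change", { suiteId });
  switch (result["verdict"]) {
    case "pass":
      return "commit";
    case "regression":
      return "fix-and-reloop";
    case "drift":
      return "confirm-with-user";
    default:
      return "stop"; // "error" and "empty" are not shown in the diagram
  }
}
```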


SDK availability + handling

The server depends on @modelcontextprotocol/sdk for the JSON-RPC transport. If you're building from source in an environment where the SDK can't be installed (no internet during build, restricted registry, etc.), the tool modules under src/tools/* are fully usable as a library — they have zero SDK imports and can be called directly:

import verifyChange from "@runtime-judgement/mcp-server/tools/verify-change"

const result = await verifyChange.invoke(
  { suiteId: "01HZ..." },
  { env: process.env },
)

The transport layer (src/index.ts) is the only part that imports the SDK. If you need to run the tools without the SDK, import the tool modules directly and wire your own transport.
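As a sketch of what "wire your own transport" could look like, the dispatcher below routes a tool name to a module exposing the `invoke(args, { env })` shape shown in the example above. The `ToolModule` and `ToolContext` interfaces and the `makeDispatcher` helper are assumptions for illustration, and the usage note uses a stub rather than the real tool modules.

```typescript
// Context shape inferred from the `{ env: process.env }` argument above.
interface ToolContext {
  env: Record<string, string | undefined>;
}

// Assumed module shape: an object with an async `invoke` method.
interface ToolModule {
  invoke(args: Record<string, unknown>, ctx: ToolContext): Promise<unknown>;
}

// Build a function that dispatches a tool call by name.
function makeDispatcher(
  tools: Record<string, ToolModule>,
  ctx: ToolContext,
): (name: string, args: Record<string, unknown>) => Promise<unknown> {
  return async (name, args) => {
    const tool = tools[name];
    if (tool === undefined) {
      throw new Error(`unknown tool: ${name}`);
    }
    return tool.invoke(args, ctx);
  };
}
```

A custom transport would then parse its own request format, call the dispatcher with the tool name and arguments, and serialize the result back to the caller.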


Development

pnpm install --filter @runtime-judgement/mcp-server
pnpm --filter @runtime-judgement/mcp-server build
pnpm --filter @runtime-judgement/mcp-server test

Output lands in dist/ with .d.ts declarations. The binary entrypoint is dist/index.js (referenced by the bin field in package.json).


What's not in v0.1

  • No HTTP transport — stdio only. Future work: an HTTP wrapper for hosted deployments.
  • No streaming — the snapshot suite run is synchronous up to the 300s RJ function timeout. Long-running suites should be queued and polled (deferred to v0.2).
  • No tool-side caching — every rj.verify_change triggers a fresh run server-side. RJ has subgraph caching (Sprint 5 / migration 0007) that takes care of within-trace deduplication, but cross-call cache is the user's job.
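Since cross-call caching is left to the caller, one possible approach is to memoize results by suite plus a fingerprint of the code under test. This is a sketch under stated assumptions, not a feature of the package: `withCrossCallCache`, the `Verify` type, and the `fingerprint` callback (for example, the current commit hash) are all hypothetical.

```typescript
type VerifyResult = { verdict: string };
type Verify = (suiteId: string) => Promise<VerifyResult>;

// Wrap a verify function so identical (suite, fingerprint) pairs reuse one run.
function withCrossCallCache(verify: Verify, fingerprint: () => string): Verify {
  const cache = new Map<string, Promise<VerifyResult>>();
  return (suiteId) => {
    const key = `${suiteId}:${fingerprint()}`;
    const cached = cache.get(key);
    if (cached !== undefined) return cached;

    const fresh = verify(suiteId);
    cache.set(key, fresh);
    // Drop failed runs so a transient error is not served from the cache.
    fresh.catch(() => cache.delete(key));
    return fresh;
  };
}
```

The fingerprint has to change whenever the patch changes; otherwise a stale verdict would be returned for new code.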

Dog-food test

The package's own first customer is the runtime-judgement-app repo this package lives inside. The acceptance test for v0.1 is:

  1. Spawn Claude Code in the repo root with this server registered.
  2. Have it make a patch to (say) the compressor's heuristic file.
  3. Have it call rj.verify_change against the canonical regression suite.
  4. Confirm the verdict matches what pnpm bench reports independently.

If those match, the tool is honest. If they diverge, file a bug — the divergence is the failure attribution.