@inference-relay/mcp
v1.3.0
MCP server for inference-relay: 19 IDE tools for financial intelligence, health monitoring, security, and fleet management
MCP server for inference-relay. Turns your IDE into a live operational console — query costs, monitor health, and verify cryptographic identity from inside Claude Code, Cursor, or any MCP-compatible client.
Patent Pending. (c) 2026 L2B II LLC. Proprietary — see LICENSE.
What you get
- 19 tools across financial intelligence, operational health, security, manifest sync, and fleet management
- Natural-language access — ask your IDE "show me my inference-relay savings this week" and the LLM picks the right tool
- Cryptographic identity verification — RS256-signed JWS handshake confirms api.inference-relay.com is genuine, not a MITM
- No prompt content ever leaves your machine: a type-level guarantee, queryable via verify_privacy_integrity
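
The identity check listed above relies on RS256, an RSA signature over a SHA-256 digest, applied to a compact JWS. The sketch below shows how such a verification works in general, using only Node's built-in crypto module. It is not the package's handshake code: the key pair is generated locally for the demo, and `signJws`, `verifyJws`, and the `iss` payload field are hypothetical names chosen for illustration.

```typescript
import { createSign, createVerify, generateKeyPairSync } from "node:crypto";

// JWS uses unpadded base64url for all three segments.
const toB64Url = (data: Buffer): string => data.toString("base64url");

// Build a compact JWS (header.payload.signature) signed with RS256.
function signJws(payload: Record<string, unknown>, privateKeyPem: string): string {
  const header = toB64Url(Buffer.from(JSON.stringify({ alg: "RS256" })));
  const body = toB64Url(Buffer.from(JSON.stringify(payload)));
  const signature = createSign("RSA-SHA256").update(`${header}.${body}`).sign(privateKeyPem);
  return `${header}.${body}.${toB64Url(signature)}`;
}

// Return true only if the token declares RS256 and the signature matches the public key.
function verifyJws(token: string, publicKeyPem: string): boolean {
  const parts = token.split(".");
  if (parts.length !== 3) return false;
  const [header, body, signature] = parts;
  let alg: unknown;
  try {
    alg = JSON.parse(Buffer.from(header, "base64url").toString("utf8")).alg;
  } catch {
    return false; // header is not valid JSON
  }
  if (alg !== "RS256") return false; // reject "none" and any other algorithm
  return createVerify("RSA-SHA256")
    .update(`${header}.${body}`)
    .verify(publicKeyPem, Buffer.from(signature, "base64url"));
}

// Demo with a throwaway key pair (assumption: a real client would pin the server's public key).
const { publicKey, privateKey } = generateKeyPairSync("rsa", {
  modulusLength: 2048,
  publicKeyEncoding: { type: "spki", format: "pem" },
  privateKeyEncoding: { type: "pkcs8", format: "pem" },
});
const token = signJws({ iss: "example-issuer" }, privateKey);
console.log(verifyJws(token, publicKey)); // true
```

A token whose payload is swapped after signing fails the same check, which is the property that lets a client detect an impostor endpoint.
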
Prerequisites
- An inference-relay license key (get one at inference-relay.com/pricing)
- Node.js 18+
- An MCP-compatible client (Claude Code, Claude Desktop, Cursor, or any other)
Setup
Claude Code (recommended — one command)

    claude mcp add inference-relay \
      --env IR_LICENSE_KEY=ir_live_xxxx \
      -- npx -y @inference-relay/mcp

Replace ir_live_xxxx with your real license key. Then restart Claude Code (the MCP server is loaded at session start) and ask:
"List the inference-relay MCP tools you have access to."
You should see 19 tool names.
Claude Desktop / Cursor / other MCP clients
Add to your client's MCP config file (e.g. ~/Library/Application Support/Claude/claude_desktop_config.json for Claude Desktop on macOS):

    {
      "mcpServers": {
        "inference-relay": {
          "command": "npx",
          "args": ["-y", "@inference-relay/mcp"],
          "env": {
            "IR_LICENSE_KEY": "ir_live_xxxx"
          }
        }
      }
    }

The -y flag is important: without it, npx blocks waiting for "OK to install? (y/N)" on first run, and the MCP stdio handshake hangs.
After saving the config, restart your IDE and try the verification prompt above.
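
The two requirements in this section, the -y flag and the IR_LICENSE_KEY entry, can be checked before restarting. The following is a minimal sketch, assuming the config has the `mcpServers` shape described in this section; `checkRelayEntry` and the interface names are hypothetical helpers, not part of the package.

```typescript
// Shape of one server entry, limited to the fields used in this README.
interface McpServerEntry {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

interface McpConfig {
  mcpServers?: Record<string, McpServerEntry>;
}

// Return a list of human-readable problems; an empty list means the entry looks fine.
function checkRelayEntry(config: McpConfig, serverName = "inference-relay"): string[] {
  const entry = config.mcpServers?.[serverName];
  if (!entry) return [`no "${serverName}" entry under mcpServers`];

  const problems: string[] = [];
  const args = entry.args ?? [];
  // npx accepts either -y or --yes to skip the install prompt.
  if (entry.command === "npx" && !args.includes("-y") && !args.includes("--yes")) {
    problems.push("npx is missing -y, so the first run may block on the install prompt");
  }
  if (!entry.env?.IR_LICENSE_KEY) {
    problems.push("env.IR_LICENSE_KEY is not set");
  }
  return problems;
}

const exampleConfig: McpConfig = {
  mcpServers: {
    "inference-relay": {
      command: "npx",
      args: ["-y", "@inference-relay/mcp"],
      env: { IR_LICENSE_KEY: "ir_live_xxxx" },
    },
  },
};
console.log(checkRelayEntry(exampleConfig)); // []
```
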
Try these prompts
Once installed, ask your IDE any of these (the LLM will pick the right tool automatically):
| Prompt | Tool used | What it proves |
|---|---|---|
| "Show me inference-relay model pricing." | get_model_pricing | MCP dispatch works (no auth, instant) |
| "Verify inference-relay privacy integrity." | verify_privacy_integrity | Type-level "no prompt content" guarantee |
| "What's my inference-relay usage cap?" | monitor_usage_caps | License auth + live Convex roundtrip |
| "Validate the inference-relay JWS handshake." | validate_jws_handshake | Cryptographic identity of api.inference-relay.com |
| "Show me my inference-relay fleet status." | get_fleet_status | All license keys you own |
| "Show me inference-relay savings this month." | get_savings_summary | Real-time P&L vs direct API cost |
The 19 tools
Financial Intelligence (5)
| Tool | What it does |
|---|---|
| get_savings_summary | Real-time P&L: relay cost vs direct API cost, gross margin delta |
| analyze_workflow_efficiency | Per-model ROI ranking from recent events |
| get_projected_burn | Monthly forecast: with relay vs without relay |
| monitor_usage_caps | Current usage against tier cap with gauge visualization |
| get_model_pricing | Per-model pricing table for all supported providers |
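
To make the financial terms concrete, the arithmetic behind a savings figure and a monthly forecast can be sketched as below. The formulas are assumptions for illustration (a simple difference and a linear extrapolation), the dollar amounts are invented, and `summarizeSavings` and `projectMonthlyBurn` are hypothetical names; the tools themselves may compute these differently.

```typescript
// Savings from routing through the relay, given two cost totals in USD.
function summarizeSavings(directCostUsd: number, relayCostUsd: number) {
  const savedUsd = directCostUsd - relayCostUsd;
  // Guard against division by zero when there was no direct spend.
  const savedPct = directCostUsd > 0 ? (savedUsd / directCostUsd) * 100 : 0;
  return { savedUsd, savedPct };
}

// Linear month-end projection from spend so far (an assumed forecasting model).
function projectMonthlyBurn(
  spendToDateUsd: number,
  daysElapsed: number,
  daysInMonth: number,
): number {
  if (daysElapsed <= 0) throw new Error("daysElapsed must be positive");
  return (spendToDateUsd / daysElapsed) * daysInMonth;
}

// Invented example: $200 direct vs $150 through the relay, 10 days into a 30-day month.
console.log(summarizeSavings(200, 150)); // { savedUsd: 50, savedPct: 25 }
console.log(projectMonthlyBurn(120, 10, 30)); // 360
```
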
Operational Health (5)
| Tool | What it does |
|---|---|
| probe_provider_availability | Status grid: CLI, Anthropic, OpenAI, Ollama |
| get_duration_benchmarks | p50 / p95 / p99 latency per provider |
| list_fallback_events | Recent cascade events with failure reasons |
| explain_fallback_chain | Plain-English narration of a fallback chain string |
| probe_environment | Bloomberg-style environment diagnostic |
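
For readers unfamiliar with the p50 / p95 / p99 notation, the sketch below computes those percentiles from a list of latency samples using the nearest-rank method. The method choice and the `percentile` helper are assumptions for illustration; the source does not say which percentile definition get_duration_benchmarks uses.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point error in p / 100.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Invented latency samples in milliseconds.
const latenciesMs = [120, 95, 310, 150, 101, 98, 2400, 130, 110, 105];
console.log({
  p50: percentile(latenciesMs, 50),
  p95: percentile(latenciesMs, 95),
  p99: percentile(latenciesMs, 99),
}); // { p50: 110, p95: 2400, p99: 2400 }
```

With only ten samples, p95 and p99 both land on the single slowest request, which is why tail percentiles are more informative over larger windows.
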
Security & Compliance (4)
| Tool | What it does |
|---|---|
| verify_privacy_integrity | Compile-time type guarantee: no prompt content in audit events |
| get_audit_trail | Audit entries with optional SHA-256 hash chain verification (Pro) |
| scan_leak_telemetry | Runtime scan for anomalous string lengths in telemetry |
| validate_jws_handshake | RS256 verification of /v1/validate and /v1/manifest |
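
A SHA-256 hash chain, as mentioned for get_audit_trail, links each entry to the hash of the one before it so that editing any earlier entry invalidates everything after it. The sketch below illustrates the general technique only. The entry shape (`payload`, `prevHash`, `hash`), the all-zero genesis value, and the helper names are assumptions, not the package's actual audit format.

```typescript
import { createHash } from "node:crypto";

// Hypothetical audit entry: a serialized event plus the two hashes that form the chain.
interface AuditEntry {
  payload: string;
  prevHash: string;
  hash: string;
}

// Assumed starting value for the first entry's prevHash.
const GENESIS_HASH = "0".repeat(64);

const sha256Hex = (input: string): string =>
  createHash("sha256").update(input).digest("hex");

// Append an entry whose hash covers the previous hash and its own payload.
function appendEntry(chain: AuditEntry[], payload: string): AuditEntry[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS_HASH;
  return [...chain, { payload, prevHash, hash: sha256Hex(prevHash + payload) }];
}

// Recompute every link; any edited payload or broken link makes this return false.
function verifyChain(chain: AuditEntry[]): boolean {
  let expectedPrev = GENESIS_HASH;
  for (const entry of chain) {
    if (entry.prevHash !== expectedPrev) return false;
    if (entry.hash !== sha256Hex(entry.prevHash + entry.payload)) return false;
    expectedPrev = entry.hash;
  }
  return true;
}

let auditChain: AuditEntry[] = [];
for (const event of ["key_rotated", "tier_changed", "cap_warning"]) {
  auditChain = appendEntry(auditChain, event);
}
console.log(verifyChain(auditChain)); // true
```
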
Manifest Sync (2)
| Tool | What it does |
|---|---|
| check_manifest_sync | Compare local LAST_KNOWN_GOOD against remote manifest |
| simulate_cli_drift | What-if: simulate a manifest field change and report impact |
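
Comparing a local manifest against a remote one amounts to a field-by-field diff. The sketch below shows one way to do that, assuming a flat manifest of string fields. The `Manifest` type, `diffManifests`, and the field names `cliVersion` and `schema` are hypothetical; the real manifest structure is not documented here.

```typescript
// Assumed manifest shape: a flat map of field name to string value.
type Manifest = Record<string, string>;

interface FieldDrift {
  field: string;
  local: string | undefined;
  remote: string | undefined;
}

// List every field that is missing on one side or differs between the two manifests.
function diffManifests(local: Manifest, remote: Manifest): FieldDrift[] {
  const fields = Array.from(new Set([...Object.keys(local), ...Object.keys(remote)]));
  return fields
    .sort()
    .filter((field) => local[field] !== remote[field])
    .map((field) => ({ field, local: local[field], remote: remote[field] }));
}

// Invented example values.
const lastKnownGood: Manifest = { cliVersion: "1.2.0", schema: "3" };
const remoteManifest: Manifest = { cliVersion: "1.3.0", schema: "3" };
console.log(diffManifests(lastKnownGood, remoteManifest));
// [ { field: "cliVersion", local: "1.2.0", remote: "1.3.0" } ]
```

An empty result means the two manifests agree on every field.
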
Fleet Management (3)
| Tool | What it does |
|---|---|
| rotate_relay_keys | Generate new key, revoke old (Pro/Enterprise) |
| get_fleet_status | Status table for all keys in the fleet |
| get_activity_log | Operational activity log (rotations, tier changes, cap warnings, etc.) |
Troubleshooting
MCP server shows as "failed to connect" in your IDE
The -y flag may be missing. npx without -y blocks waiting for install confirmation, and the MCP stdio handshake times out. Use npx -y @inference-relay/mcp.
License-gated tools return "IR_LICENSE_KEY environment variable is required"
The env block in your MCP config wasn't picked up. Verify:

    claude mcp get inference-relay

A working entry looks like:

    inference-relay:
      Scope: Local config (private to you in this project)
      Status: ✓ Connected
      Type: stdio
      Command: npx
      Args: -y @inference-relay/mcp
      Environment:
        IR_LICENSE_KEY=ir_live_xxxx

The two things to confirm: Status: ✓ Connected and an Environment: block containing your IR_LICENSE_KEY. If either is missing, remove and re-add with the --env flag.
Tools return "401 Invalid license key"
The key is wrong, expired, or revoked. Check it in the dashboard at inference-relay.com/dashboard, or run npx inference-relay verify to test the key independently.
npx keeps re-downloading the package on every invocation
That's normal for the first few runs — npm caches the package after a few uses. To pre-cache: npm install -g @inference-relay/mcp and use command: "inference-relay-mcp" in your config instead of npx.
