@sovereign-labs/mcp-proxy

v0.8.1

Published

3 months ago

Governed transport for MCP. Structural invariants for any tool server.

vibestarter

mcp audit-trail governance ai-agents tool-use receipts tamper-evident proxy

@sovereign-labs/mcp-proxy

See what your agent did. Verify nobody tampered with the record. Stop broken retries. Coach agents toward recovery.

A drop-in governance proxy for any MCP tool server. One command to add tamper-evident receipts, failure memory, convergence physics, and authority tracking.

See It In Action (1 command)

npx @sovereign-labs/mcp-proxy --demo

No config. No server. Watch governance happen: receipts, failure memory, automatic blocking, hash chain verification — all in 5 seconds.

The Trust Demo (4 steps)

# 1. Govern your filesystem server
npx @sovereign-labs/mcp-proxy --wrap filesystem

# 2. Use Claude Code normally — every tool call is now receipted

# 3. See what happened (plain English)
npx @sovereign-labs/mcp-proxy --explain --state-dir .governance-filesystem

# 4. Verify the record is intact
npx @sovereign-labs/mcp-proxy --verify --state-dir .governance-filesystem

That's it. Your agent doesn't know the proxy exists. Your MCP server doesn't know either. But now you have proof.

Step 3 gives you a plain-English summary of what the agent did. Use --view instead for the full per-receipt timeline.

The Problem

MCP lets AI agents call tools — filesystems, databases, APIs. But nothing records what they did.

Agent runs:
  write_file("/app/config.json", content)    -> Permission denied
  write_file("/app/config.json", content)    -> Permission denied
  write_file("/app/config.json", content)    -> Permission denied
  ...
  write_file("/app/config.json", content)    -> Permission denied    x 37

No audit trail.  No failure memory.  No way to prove what happened.

After a session, you can't answer basic questions:

What did the agent actually do?
Did it repeat the same mistake?
Who authorized that action?
Was the audit trail modified after the fact?

What You Get

Agent (Claude, GPT, etc.)
  | stdio
@sovereign-labs/mcp-proxy
  |-- Record: tamper-evident receipt for every call
  |-- Learn:  failures seed constraints (don't repeat mistakes)
  |-- Coach:  block messages include prior context + winning strategies
  |-- Guard:  block calls that match known failures (exact + strategy-class)
  |-- Track:  controller identity + authority epoch
  | stdio
Your MCP Server (filesystem, database, anything)

No changes to your agent. No changes to your MCP server. Drop-in.

Four Guarantees

Receipts — Every tool call produces a hash-chained record. Like git commits for tool execution. Tamper with one receipt and the chain breaks.
Constraints — When a tool call fails, the proxy fingerprints the failure and blocks identical calls within a TTL window. In strict mode (the default), your agent can't repeat the same mistake; in advisory mode the repeat is logged and forwarded. Strategy-class matching blocks entire failure patterns, not just exact duplicates.
Coaching — Blocked calls get rich context: what failed, why it was blocked, what worked before, and actionable suggestions. The proxy doesn't just say "no" — it says "no, and here's what worked last time."
Authority — A stable controller ID and monotonic epoch counter. You can prove which controller was active and whether authority was still valid when a call was made.
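
To make the Authority guarantee concrete, here is a minimal sketch of an epoch check. It is an illustration under assumptions, not the proxy's implementation: the `Authority` and `SessionBinding` shapes and the function names are hypothetical, and the real authority.json format may differ.

```typescript
// Hypothetical shapes; the proxy's actual authority.json schema is not shown in this README.
interface Authority {
  controllerId: string; // stable across restarts
  epoch: number;        // monotonic: only ever increases
}

interface SessionBinding {
  controllerId: string;
  epoch: number; // epoch observed when the session started
}

// Bind a session to the controller and epoch that are current right now.
function bindSession(authority: Authority): SessionBinding {
  return { controllerId: authority.controllerId, epoch: authority.epoch };
}

// Advancing the epoch yields a new record; bindings made earlier become stale.
function bumpAuthority(authority: Authority): Authority {
  return { ...authority, epoch: authority.epoch + 1 };
}

// A call is authorized only if its binding matches the current controller and epoch.
function isAuthorityValid(current: Authority, session: SessionBinding): boolean {
  return session.controllerId === current.controllerId && session.epoch === current.epoch;
}
```

Under this model, a session bound at epoch 1 stops validating as soon as the epoch is bumped to 2, which is the behavior the README describes for stale sessions.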

Quick Start

Option A: Wrap an existing server (recommended)

If you already have MCP servers in .mcp.json:

npx @sovereign-labs/mcp-proxy --wrap filesystem

Done. Restart your MCP client. To remove governance later:

npx @sovereign-labs/mcp-proxy --unwrap filesystem

Receipts are preserved even after unwrapping.

Option B: Direct proxy mode

npx @sovereign-labs/mcp-proxy --upstream "npx -y @modelcontextprotocol/server-filesystem /tmp"

Option C: Manual .mcp.json

{
  "mcpServers": {
    "governed-filesystem": {
      "command": "npx",
      "args": [
        "-y", "@sovereign-labs/mcp-proxy",
        "--upstream", "npx -y @modelcontextprotocol/server-filesystem /tmp"
      ]
    }
  }
}

Convergence Physics (v0.8.0)

Three features move the proxy from passively recording calls to actively steering recovery: it blocks strategies that failed, scopes constraints by failure type, and coaches the agent on what worked before.

1. Strategy-Class Constraints

v0.7 blocked exact tool+target matches. v0.8 also blocks the strategy that failed.

When a rewrite_page strategy fails on server.js, the proxy doesn't just block write_file → server.js — it blocks ALL rewrite_page attempts for the TTL window. The agent must try a different strategy (smaller edits, targeted changes) instead of repeating the same broad approach on a different file.

Two-pass constraint matching:

Pass 1 (exact): Same tool + same target → blocked
Pass 2 (action class): Any call classified as the same strategy → blocked
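
The two passes can be sketched as a single lookup over stored constraints. This is a hedged illustration: the `Constraint` and `Call` shapes, the `kind` field, and `matchConstraints` are hypothetical names, not the proxy's schema.

```typescript
// Hypothetical record shapes for illustration only.
interface Constraint {
  kind: "exact" | "action_class";
  tool: string;
  target: string;
  actionClass?: string; // set on action-class constraints
  expiresAt: number;    // epoch milliseconds
}

interface Call {
  tool: string;
  target: string;
  actionClass?: string; // assumed to be computed before the gate runs
}

type Match =
  | { blocked: false }
  | { blocked: true; matchType: "exact" | "action_class" };

function matchConstraints(call: Call, constraints: Constraint[], now: number): Match {
  // Ignore constraints whose TTL window has passed.
  const live = constraints.filter((c) => c.expiresAt > now);

  // Pass 1: same tool and same target.
  if (live.some((c) => c.kind === "exact" && c.tool === call.tool && c.target === call.target)) {
    return { blocked: true, matchType: "exact" };
  }

  // Pass 2: any call classified as the same strategy, whatever the target.
  if (
    call.actionClass !== undefined &&
    live.some((c) => c.kind === "action_class" && c.actionClass === call.actionClass)
  ) {
    return { blocked: true, matchType: "action_class" };
  }

  return { blocked: false };
}
```

Running the exact pass first means a blocked call is reported with the most specific reason available.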

Five action classes (heuristic, zero LLM):

| Action Class | Detection |
|-------------|-----------|
| schema_migration | SQL file or migration directory touched |
| global_replace | Same search pattern across 3+ files |
| rewrite_page | Full file creation or >50% content replacement |
| style_overhaul | More than 5 CSS property changes |
| unrelated_edit | Touched files don't match predicate surface |

Action class is computed before the governance gate — the proxy knows what kind of strategy the agent is attempting before deciding whether to allow it.
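
A heuristic classifier along these lines might look like the sketch below. The thresholds come from the table; everything else is an assumption made for illustration: the `EditSummary` shape, the rule ordering, treating a path segment named `migrations` as a migration directory, and reading "don't match predicate surface" as "no touched file is in the declared surface".

```typescript
type ActionClass =
  | "schema_migration"
  | "global_replace"
  | "rewrite_page"
  | "style_overhaul"
  | "unrelated_edit";

// Hypothetical summary of a proposed change; the proxy's real inputs are not documented here.
interface EditSummary {
  files: string[];                   // paths the call touches
  createsFullFile: boolean;          // true when a whole file is created
  replacedFraction: number;          // share of existing content replaced, 0..1
  cssPropertyChanges: number;        // number of CSS properties changed
  filesSharingSearchPattern: number; // files hit by the same search pattern
  predicateSurface: string[];        // paths covered by declared intent; empty if none declared
}

// Returns the first matching class, or undefined when no heuristic fires.
function classifyAction(e: EditSummary): ActionClass | undefined {
  const touchesMigration = e.files.some(
    (f) => f.endsWith(".sql") || f.split("/").includes("migrations"),
  );
  if (touchesMigration) return "schema_migration";
  if (e.filesSharingSearchPattern >= 3) return "global_replace";
  if (e.createsFullFile || e.replacedFraction > 0.5) return "rewrite_page";
  if (e.cssPropertyChanges > 5) return "style_overhaul";
  const hasSurface = e.predicateSurface.length > 0;
  if (hasSurface && !e.files.some((f) => e.predicateSurface.includes(f))) return "unrelated_edit";
  return undefined;
}
```

Because every rule is a plain comparison, classification is deterministic and needs no LLM call, which matches the "heuristic, zero LLM" description.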

2. Scoped Transfer with Decay

Not all failures are the agent's fault. Infrastructure flakes (DNS timeout, container not ready) shouldn't poison the constraint store the same way code bugs do.

Failure classification:

| FailureKind | Meaning | Scope | TTL |
|-------------|---------|-------|-----|
| harness_fault | Infrastructure issue (DNS, timeout, SSH) | Session only | 10 min |
| app_failure | Agent's code/approach was wrong | Cross-session | 1 hour (exact), 30 min (action class) |
| unknown | Can't determine | Cross-session | 1 hour |

Session cleanup: On proxy startup, session-scoped constraints and expired constraints are purged. Infrastructure flakes from a previous session don't carry over.

Winning patterns persist: When a tool call succeeds with a strategy that differs from a prior failure's strategy (same target or tool, different action class), the winning strategy is recorded in patterns.json with a 24-hour TTL. These patterns survive across sessions and are surfaced in coaching responses.
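
The scope and TTL rules above, together with the startup purge, can be sketched as follows. The type and function names are hypothetical, and the sketch assumes `unknown` failures get a 1-hour TTL for both match kinds, since the table lists only one value.

```typescript
type FailureKind = "harness_fault" | "app_failure" | "unknown";
type MatchKind = "exact" | "action_class";
type Scope = "session" | "cross_session";

const MINUTE_MS = 60 * 1000;

// Map a classified failure to the scope and TTL given in the table.
function constraintPolicy(kind: FailureKind, match: MatchKind): { scope: Scope; ttlMs: number } {
  if (kind === "harness_fault") return { scope: "session", ttlMs: 10 * MINUTE_MS };
  if (kind === "app_failure" && match === "action_class") {
    return { scope: "cross_session", ttlMs: 30 * MINUTE_MS };
  }
  return { scope: "cross_session", ttlMs: 60 * MINUTE_MS };
}

// Hypothetical minimal view of a stored constraint.
interface StoredConstraint {
  scope: Scope;
  expiresAt: number; // epoch milliseconds
}

// On startup, keep only cross-session constraints that have not expired.
function purgeOnStartup<T extends StoredConstraint>(stored: T[], now: number): T[] {
  return stored.filter((c) => c.scope === "cross_session" && c.expiresAt > now);
}
```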

3. Coaching Responses

When the proxy blocks a call, it doesn't return a generic error. It returns a structured coaching message with everything the agent needs to recover:

[GOVERNANCE BLOCKED] constraint

Tool: write_file
Target: /app/server.js
Reason: G2 BLOCKED: write_file+/app/server.js (known failure)

--- Prior Failure Context ---
Failure signature: syntax_error
Error: SyntaxError: Unexpected token '}'
Failed strategy: rewrite_page
Match type: ACTION CLASS — this entire strategy class is blocked, not just this specific target
Constraint expires in: 23 min

--- What Has Worked Before ---
  Strategy: style_overhaul (seen 3x, last 45min ago)
  Tool: write_file → /app/server.js

--- Suggestions ---
• Try a different strategy than "rewrite_page"
• Make smaller, more targeted changes instead of broad rewrites
• Previously successful strategies: style_overhaul

Four sections:

What was blocked — Tool, target, reason
Prior failure context — The failure signature, error snippet, action class, match type, and time until constraint expires
What has worked before — Winning patterns from prior sessions matching this failure signature (top 3 by recency)
Actionable suggestions — Strategy-specific guidance (different for constraint violations, authority issues, containment mismatches, convergence blocks, and budget exhaustion)
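
As a rough illustration of how the four sections fit together, the sketch below renders a message in the same layout as the example. The `BlockContext` and `WinningPattern` shapes and `renderCoaching` are hypothetical; the proxy's real message builder and its full suggestion logic are not shown in this README.

```typescript
// Hypothetical inputs for illustration only.
interface WinningPattern {
  strategy: string;
  seen: number;
  lastSeenMinAgo: number;
}

interface BlockContext {
  tool: string;
  target: string;
  reason: string;
  failureSignature: string;
  error: string;
  failedStrategy: string;
  expiresInMin: number;
  patterns: WinningPattern[];
}

function renderCoaching(ctx: BlockContext): string {
  // Top 3 winning patterns, most recent first.
  const top = [...ctx.patterns]
    .sort((a, b) => a.lastSeenMinAgo - b.lastSeenMinAgo)
    .slice(0, 3);

  const lines: string[] = [
    "[GOVERNANCE BLOCKED] constraint",
    "",
    `Tool: ${ctx.tool}`,
    `Target: ${ctx.target}`,
    `Reason: ${ctx.reason}`,
    "",
    "--- Prior Failure Context ---",
    `Failure signature: ${ctx.failureSignature}`,
    `Error: ${ctx.error}`,
    `Failed strategy: ${ctx.failedStrategy}`,
    `Constraint expires in: ${ctx.expiresInMin} min`,
    "",
    "--- What Has Worked Before ---",
    ...top.map((p) => `  Strategy: ${p.strategy} (seen ${p.seen}x, last ${p.lastSeenMinAgo}min ago)`),
    "",
    "--- Suggestions ---",
    `• Try a different strategy than "${ctx.failedStrategy}"`,
  ];
  if (top.length > 0) {
    lines.push(`• Previously successful strategies: ${top.map((p) => p.strategy).join(", ")}`);
  }
  return lines.join("\n");
}
```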

CLI Reference

# Setup
npx @sovereign-labs/mcp-proxy --wrap <server>     # Govern an existing MCP server
npx @sovereign-labs/mcp-proxy --unwrap <server>   # Restore original config

# Try it
npx @sovereign-labs/mcp-proxy --demo              # Interactive demo (no config needed)

# Inspection (offline, no proxy needed)
npx @sovereign-labs/mcp-proxy --view              # Per-receipt timeline
npx @sovereign-labs/mcp-proxy --view --tool write  # Filter by tool name
npx @sovereign-labs/mcp-proxy --view --outcome error  # Show only failures
npx @sovereign-labs/mcp-proxy --receipts          # Session summary
npx @sovereign-labs/mcp-proxy --verify            # Tamper detection
npx @sovereign-labs/mcp-proxy --explain           # Plain-English summary

# Explain with LLM enhancement (optional — any provider)
npx @sovereign-labs/mcp-proxy --explain --llm openai --api-key sk-...
npx @sovereign-labs/mcp-proxy --explain --llm anthropic --api-key sk-ant-...
npx @sovereign-labs/mcp-proxy --explain --llm gemini --api-key AIza...
npx @sovereign-labs/mcp-proxy --explain --llm ollama              # localhost
npx @sovereign-labs/mcp-proxy --explain --llm ollama --model llama3.2

# Proxy mode
npx @sovereign-labs/mcp-proxy --upstream "command"
npx @sovereign-labs/mcp-proxy --upstream "command" --enforcement advisory
npx @sovereign-labs/mcp-proxy --upstream "command" --state-dir ./my-state
npx @sovereign-labs/mcp-proxy --upstream "command" --schema strict
npx @sovereign-labs/mcp-proxy --upstream "command" --webhook https://example.com/hook

What --view Shows

  RECEIPT LEDGER
  ===============================================================
  controller:  311036af...
  integrity:   verified
  showing:     4 receipts
  ---------------------------------------------------------------

  ok #  1  2026-03-06 14:22:03    42ms  read_file
           target: /tmp/config.json
           hash: 8c1a7d3b4e2f9a01...

  ok #  2  2026-03-06 14:22:04   103ms  write_file [MUTATION]
           target: /tmp/config.json
           hash: 3f7b2c1d8e4a6509...

  !! #  3  2026-03-06 14:22:05    38ms  write_file [MUTATION]
           target: /tmp/secret.key
           error: Permission denied
           hash: 9d2e4f6a1b3c7508...

  -- #  4  2026-03-06 14:22:05     1ms  write_file [MUTATION]
           target: /tmp/secret.key
           blocked by: write_file+/tmp/secret.key (known failure)
           hash: 5a8b3c2d1e4f7609...

  ---------------------------------------------------------------
  4 receipts  |  3 mutations  |  1 blocked  |  1 errors

What --verify Shows

  CHAIN VERIFICATION
  ===============================================================

  receipts:          47
  chain depth:       47
  integrity:         all hashes verified
  controller:        311036af...
  first hash:        a8ba7720...
  last hash:         e8aa80ef...

If anyone modifies a receipt after the fact:

  integrity:         TAMPERED at seq 23

  The receipt chain has been tampered with or corrupted.
  The break was detected at sequence number 23.
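
The general technique behind this kind of check is a hash chain: each receipt's hash covers its own content plus the previous receipt's hash, so editing any receipt invalidates it and everything after it. The sketch below shows the idea with SHA-256 from Node's `crypto` module. The `Receipt` shape, the genesis value, and the hashing layout are assumptions for illustration; the proxy's actual receipt format is not documented here.

```typescript
import { createHash } from "node:crypto";

// Hypothetical receipt shape for illustration.
interface Receipt {
  seq: number;
  prevHash: string;
  payload: string; // serialized call data
  hash: string;
}

// Assumed placeholder "previous hash" for the first receipt.
const GENESIS = "0".repeat(64);

function hashReceipt(seq: number, prevHash: string, payload: string): string {
  return createHash("sha256").update(`${seq}\n${prevHash}\n${payload}`).digest("hex");
}

// Append a receipt whose hash commits to the previous receipt's hash.
function appendReceipt(chain: Receipt[], payload: string): Receipt[] {
  const last = chain[chain.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  const seq = chain.length + 1;
  return [...chain, { seq, prevHash, payload, hash: hashReceipt(seq, prevHash, payload) }];
}

// Returns the sequence number of the first broken receipt, or null if the chain is intact.
function verifyChain(chain: Receipt[]): number | null {
  let prev = GENESIS;
  for (const r of chain) {
    const linked = r.prevHash === prev;
    const selfConsistent = r.hash === hashReceipt(r.seq, r.prevHash, r.payload);
    if (!linked || !selfConsistent) return r.seq;
    prev = r.hash;
  }
  return null;
}
```

In this model, recomputing a tampered receipt's own hash does not hide the edit, because the next receipt still points at the old hash.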

What --explain Shows

A plain-English summary of what the agent did, generated from receipt data — no LLM required.

  WHAT HAPPENED
  ───────────────────────────────────────────────────────────────

  Purpose:      Attempt operation (with failure prevention)

  The agent examined several resources to understand the current
  state. It made changes across 3 resources, primarily the
  configuration file, the server code and a source file. It
  encountered errors accessing a sensitive file (was denied
  access). The proxy blocked 1 repeated operation to prevent
  wasted retries.

  Impact:       5 reads  ·  3 changes  ·  1 blocked  ·  1 error

  Bottom line:  One operation failed, and the proxy blocked 1
                retry to prevent repeating the same mistake.

  This summary was generated from verifiable execution receipts.
  Run --verify to confirm the record has not been altered.

For large sessions (hundreds of calls), it automatically switches to aggregate mode:

  Purpose:      Update configuration and apply changes

  The agent examined many resources to understand the current
  state. It made changes across 20 resources, primarily the
  football, the clear and the message. It restarted services
  so changes would take effect. It encountered 13 errors
  across 4 resources.

  Impact:       799 reads  ·  73 changes  ·  13 errors

  Bottom line:  859 operations succeeded, but 13 operations failed.

LLM Enhancement (optional)

Pass --llm to get a richer narrative from your own LLM provider. The heuristic summary always works as a fallback.

npx @sovereign-labs/mcp-proxy --explain --llm openai --api-key sk-...
npx @sovereign-labs/mcp-proxy --explain --llm anthropic --api-key sk-ant-...
npx @sovereign-labs/mcp-proxy --explain --llm gemini --api-key AIza...
npx @sovereign-labs/mcp-proxy --explain --llm ollama

The LLM receives a compressed summary of receipts (not raw data) and produces a narrative. If the LLM call fails, the heuristic output is shown instead. No dependencies — just HTTP calls.

Smart Defaults (v0.7.0)

Schema validation defaults to warn — catches hallucinated tool parameters without any configuration:

# Default: warns on invalid parameters (no flag needed)
npx @sovereign-labs/mcp-proxy --upstream "command"

# Explicit modes
npx @sovereign-labs/mcp-proxy --upstream "command" --schema off     # No validation
npx @sovereign-labs/mcp-proxy --upstream "command" --schema strict  # Block invalid calls
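
The three modes can be pictured with the simplified check below. It is a sketch under assumptions: `ToolSchema` reduces a tool's input schema to lists of known and required parameter names, and `checkParams` is a hypothetical helper, whereas the proxy presumably validates against the schema the upstream server advertises.

```typescript
type SchemaMode = "off" | "warn" | "strict";

// Simplified stand-in for a tool's input schema.
interface ToolSchema {
  properties: string[]; // parameter names the tool declares
  required: string[];   // subset that must be present
}

interface SchemaResult {
  forward: boolean;   // whether the call is passed to the upstream server
  warnings: string[]; // problems found, if any
}

function checkParams(mode: SchemaMode, schema: ToolSchema, args: Record<string, unknown>): SchemaResult {
  if (mode === "off") return { forward: true, warnings: [] };

  const problems: string[] = [];
  for (const name of schema.required) {
    if (!(name in args)) problems.push(`missing required parameter: ${name}`);
  }
  for (const name of Object.keys(args)) {
    if (!schema.properties.includes(name)) problems.push(`unknown parameter: ${name}`);
  }

  // warn: always forward, but report problems. strict: forward only clean calls.
  return { forward: mode === "warn" || problems.length === 0, warnings: problems };
}
```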

Narrative Exit Summary

When a session ends, the proxy prints a plain-language summary alongside the stats:

Your agent made 47 tool calls over 3.2 minutes.
It read 35 resources and modified 8. 2 calls were blocked
(constraint violation, budget exceeded). No loops detected.

Always printed — no flag needed.

Webhooks

Fire-and-forget notifications on three events: blocked, loop_detected, session_complete.

# Generic webhook
npx @sovereign-labs/mcp-proxy --upstream "command" --webhook https://example.com/hook

# Multiple webhooks
npx @sovereign-labs/mcp-proxy --upstream "command" \
  --webhook https://hook1.example.com \
  --webhook https://hook2.example.com

Discord & Telegram Auto-Detection

Set environment variables and the proxy picks them up automatically — no --webhook flag needed:

# Discord: just the webhook URL
export DISCORD_WEBHOOK="https://discord.com/api/webhooks/123/abc"

# Telegram: bot token + chat ID
export TELEGRAM_BOT_TOKEN="bot123"
export TELEGRAM_CHAT_ID="456"

Discord gets formatted messages with emoji and Markdown. Telegram messages are sent with parse_mode: 'Markdown'. Generic endpoints get the raw event JSON.

Every webhook call has a 2-second abort timeout, so the proxy never blocks on delivery.
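
A delivery loop with an abort timeout can be sketched as below. The event payload shape and the names `notifyAll` and `PostFn` are assumptions for illustration; the transport is injected so the sketch does not depend on any particular HTTP client. In Node 18+, the global `fetch` can be passed as `post`.

```typescript
// Hypothetical event payload; the proxy's real JSON fields are not documented here.
interface GovernanceEvent {
  type: "blocked" | "loop_detected" | "session_complete";
  detail: string;
}

// Minimal transport signature, compatible with the global fetch.
type PostFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal: AbortSignal },
) => Promise<unknown>;

// Posts the event to every URL. Failures and timeouts are swallowed and reported as false.
async function notifyAll(
  urls: string[],
  event: GovernanceEvent,
  post: PostFn,
  timeoutMs = 2000,
): Promise<boolean[]> {
  return Promise.all(
    urls.map(async (url) => {
      const controller = new AbortController();
      const timer = setTimeout(() => controller.abort(), timeoutMs);
      try {
        await post(url, {
          method: "POST",
          headers: { "content-type": "application/json" },
          body: JSON.stringify(event),
          signal: controller.signal,
        });
        return true;
      } catch {
        return false; // a failed or aborted delivery never propagates to the caller
      } finally {
        clearTimeout(timer);
      }
    }),
  );
}
```

A caller that wants fire-and-forget behavior simply does not await the returned promise; since it never rejects, nothing is left unhandled.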

Enforcement Modes

| Mode | On constraint violation | Receipts |
|------|----------------------|----------|
| strict (default) | Block the call | Always |
| advisory | Log + forward anyway | Always |

Start with advisory to see what the proxy catches without blocking:

npx @sovereign-labs/mcp-proxy --wrap filesystem --enforcement advisory
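
The difference between the two modes comes down to one branch, sketched here with hypothetical names (`GateDecision`, `applyEnforcement`); it is not the proxy's code.

```typescript
type Enforcement = "strict" | "advisory";

// Hypothetical decision record for illustration.
interface GateDecision {
  forward: boolean;   // whether the call reaches the upstream server
  writeReceipt: true; // receipts are written in both modes
  note?: string;      // the violation, kept for the log or receipt
}

function applyEnforcement(mode: Enforcement, violation: string | null): GateDecision {
  if (violation === null) return { forward: true, writeReceipt: true };
  // strict blocks the call; advisory records the violation and forwards anyway.
  return { forward: mode === "advisory", writeReceipt: true, note: violation };
}
```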

How Constraint Learning Works

Basic (exact match — v0.7)

1. Agent calls write_file({ path: "/app/config.json", content: "..." })
2. Upstream returns error: "Permission denied"
3. Proxy fingerprints: tool=write_file, target=/app/config.json, sig=permission_denied
4. Constraint stored with 1-hour TTL
5. Agent tries same call again -> BLOCKED (strict) or annotated (advisory)
6. Agent tries write_file on a DIFFERENT path -> allowed (target-specific)
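
Step 3 can be sketched as a small normalization function. The rule list below covers only the two signatures that appear in this README, and the `unknown` fallback, `SIGNATURE_RULES`, `fingerprintFailure`, and `constraintKey` are assumptions made for illustration; the real fingerprinting lives in the kernel and is not shown here.

```typescript
interface Fingerprint {
  tool: string;
  target: string;
  sig: string;
}

// Hypothetical rules: error text pattern -> normalized signature.
const SIGNATURE_RULES: Array<[RegExp, string]> = [
  [/permission denied/i, "permission_denied"],
  [/syntaxerror/i, "syntax_error"],
];

// Reduce a failed call to a stable tool/target/signature triple.
function fingerprintFailure(tool: string, target: string, errorText: string): Fingerprint {
  const rule = SIGNATURE_RULES.find(([pattern]) => pattern.test(errorText));
  return { tool, target, sig: rule ? rule[1] : "unknown" };
}

// Key in the tool+target form that the block messages in this README display.
function constraintKey(fp: Fingerprint): string {
  return `${fp.tool}+${fp.target}`;
}
```

Because the key includes the target, the same tool on a different path produces a different key, which is why step 6 is allowed.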

Strategy-class (action class match — v0.8)

1. Agent submits edits that rewrite 80% of server.js
2. Upstream returns error: "SyntaxError: Unexpected token"
3. Proxy fingerprints: sig=syntax_error, actionClass=rewrite_page
4. Two constraints created:
   a. Exact: write_file + server.js (1hr TTL, cross-session)
   b. Action-class: rewrite_page (30min TTL, cross-session)
5. Agent tries rewriting server.js -> BLOCKED (exact match)
6. Agent tries rewriting index.html -> BLOCKED (action-class match)
7. Agent tries a small, targeted edit to server.js -> ALLOWED (different strategy)

Winning pattern detection (v0.8)

1. Prior constraint exists: sig=syntax_error, actionClass=rewrite_page
2. Agent makes a small targeted edit to server.js -> succeeds
3. Proxy detects: same target, different strategy -> winning pattern!
4. Pattern recorded: "For syntax_error, targeted edits work" (24hr TTL)
5. Next time rewrite_page is blocked, coaching includes:
   "What Has Worked Before: targeted edit (seen 1x)"

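The detection rule in the steps above (same target or tool, different strategy, call succeeded) can be sketched as follows. All names and shapes here are hypothetical, and refreshing the 24-hour TTL on a repeat win is an assumption; the real patterns.json format is not documented in this README.

```typescript
// Hypothetical shapes for illustration only.
interface PriorFailure {
  tool: string;
  target: string;
  sig: string;
  actionClass: string;
}

interface SuccessfulCall {
  tool: string;
  target: string;
  actionClass: string;
}

interface Pattern {
  sig: string;      // failure signature this strategy recovered from
  strategy: string; // the strategy that worked
  seen: number;
  expiresAt: number; // epoch milliseconds
}

const PATTERN_TTL_MS = 24 * 60 * 60 * 1000;

// Record a win when a success shares a tool or target with a prior failure
// but used a different strategy. Returns a new pattern list.
function recordWin(
  failures: PriorFailure[],
  call: SuccessfulCall,
  patterns: Pattern[],
  now: number,
): Pattern[] {
  const related = failures.find(
    (f) => (f.target === call.target || f.tool === call.tool) && f.actionClass !== call.actionClass,
  );
  if (!related) return patterns;

  const existing = patterns.find((p) => p.sig === related.sig && p.strategy === call.actionClass);
  if (existing) {
    return patterns.map((p) =>
      p === existing ? { ...p, seen: p.seen + 1, expiresAt: now + PATTERN_TTL_MS } : p,
    );
  }
  return [
    ...patterns,
    { sig: related.sig, strategy: call.actionClass, seen: 1, expiresAt: now + PATTERN_TTL_MS },
  ];
}
```
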
Scoped decay (v0.8)

1. DNS timeout during staging -> harness_fault -> session scope (10min)
2. SyntaxError in agent's code -> app_failure -> cross-session (1hr)
3. Proxy restarts -> session-scoped constraints purged
4. Cross-session constraints survive (app_failure remains, DNS timeout gone)

Governance Meta-Tools

The proxy injects tools the agent can call:

| Tool | What it does |
|------|-------------|
| governance_status | Controller ID, epoch, constraint count, receipt count |
| governance_bump_authority | Advance epoch (invalidates stale sessions) |
| governance_declare_intent | Declare goal + predicates for containment |
| governance_clear_intent | Clear declared intent |
| governance_convergence_status | Loop detection state |
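
From the agent's side these are ordinary MCP tools, so they are invoked with a standard `tools/call` request. The sketch below builds one for governance_status. The `toolCall` helper is hypothetical, and it assumes governance_status takes no arguments, which this README does not state; argument shapes for the other meta-tools are not documented here and are not guessed at.

```typescript
// JSON-RPC 2.0 request shape used by MCP for tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that builds a tools/call request.
function toolCall(id: number, name: string, args: Record<string, unknown> = {}): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Over the stdio transport, each message is one line of JSON.
const statusRequestLine = JSON.stringify(toolCall(1, "governance_status")) + "\n";
```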

State Directory

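All state lives in .governance/ by default, or in the directory passed to --state-dir. (The Trust Demo above inspects .governance-filesystem after wrapping the filesystem server.)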

| File | Contents |
|------|----------|
| receipts.jsonl | Append-only hash-chained audit trail |
| constraints.json | Failure fingerprints with scope + TTL + action class |
| patterns.json | Winning strategies from prior recoveries (24hr decay) |
| controller.json | Stable controller UUID |
| authority.json | Authority epoch + session binding |

Programmatic API

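import { startProxy, createGovernedProxy } from '@sovereign-labs/mcp-proxy';

// Top-level await requires an ES module context.
await startProxy({
  upstream: 'npx -y @modelcontextprotocol/server-filesystem /tmp', // command that launches the upstream MCP server
  stateDir: '.governance',  // directory for receipts, constraints, and patterns
  enforcement: 'strict',    // or 'advisory' to log violations without blocking
});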

Built On

@sovereign-labs/kernel — 7 governance invariants as pure functions. The proxy uses the kernel for hash chaining, failure fingerprinting, constraint checking, and authority validation.

Requirements

Node.js >= 18 (for npx) or Bun >= 1.0 (for bunx)
Any MCP-compatible tool server as upstream

Questions or bugs?

Open an issue or email [email protected].

License

MIT
