@eigenart/agentshield-mcp

v0.1.6, published 2 months ago

MCP server for AgentShield — detect prompt injection, jailbreak, and social-engineering attempts in any text before your agent processes it.

Vulnerabilities: 0 high, 0 medium, 0 low

Maintainer: eigenart-dev

Keywords: mcp, model-context-protocol, prompt-injection, jailbreak-detection, llm-security, ai-security, agent-security, guardrails, runtime-gateway, real-time-classifier, agentshield, claude, cursor, cline


Official MCP (Model Context Protocol) server for AgentShield — the runtime gateway and real-time classifier that detects prompt-injection, jailbreak, and social-engineering attempts in text while your agent is running, not in an offline audit pass.

Works with any MCP-compatible client: Claude Desktop, Cursor, Cline, Zed, Continue, and custom agents. Each request is a single-shot classification with a p50 latency of about 2.4 ms, so the server is designed to sit in the agent's hot path on every untrusted input.

What it does

Exposes one tool to the agent: classify_text. Call it on any untrusted text (user messages, retrieved documents, web scrapes, third-party tool outputs) and get back a per-request verdict.

{
  "is_injection": true,
  "confidence": 0.94,
  "category": "jailbreak",
  "latency_ms": 2.4,
  "model": "agentshield-minilm-v2",
  "request_id": "req_01HX…"
}
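
If your agent code handles this verdict directly, it can help to validate the JSON before acting on it. The following is a minimal sketch, not part of the package: the `Verdict` interface mirrors the field names in the sample above, and `parseVerdict` is a hypothetical helper. It assumes every field is always present and that `confidence` lies in the range 0 to 1, neither of which the sample alone guarantees.

```typescript
// Shape of the verdict, with field names taken from the sample response above.
interface Verdict {
  is_injection: boolean;
  confidence: number;
  category: string;
  latency_ms: number;
  model: string;
  request_id: string;
}

// Hypothetical helper: check an unknown JSON value before trusting it.
function parseVerdict(raw: unknown): Verdict {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("verdict must be a JSON object");
  }
  const { is_injection, confidence, category, latency_ms, model, request_id } =
    raw as Record<string, unknown>;

  if (typeof is_injection !== "boolean") {
    throw new Error("is_injection must be a boolean");
  }
  // Assumption: confidence is a probability-like score in [0, 1].
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) {
    throw new Error("confidence must be a number in [0, 1]");
  }
  if (typeof latency_ms !== "number") {
    throw new Error("latency_ms must be a number");
  }
  if (
    typeof category !== "string" ||
    typeof model !== "string" ||
    typeof request_id !== "string"
  ) {
    throw new Error("category, model, and request_id must be strings");
  }
  return { is_injection, confidence, category, latency_ms, model, request_id };
}
```

Failing loudly on a malformed verdict is a deliberate choice here: for a security check, treating an unreadable response as "safe" would defeat the purpose.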

The classifier is hosted at api.agentshield.pro, so there is no local GPU and no model download. Free tier: 100 classifications/day, no credit card.

Install (Claude Desktop)

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "agentshield": {
      "command": "npx",
      "args": ["-y", "@eigenart/agentshield-mcp"],
      "env": {
        "AGENTSHIELD_API_KEY": "ask_your_key_here"
      }
    }
  }
}

Restart Claude Desktop. The classify_text tool will be available.

Install (Cursor / Cline / Zed / Continue)

The pattern is the same: each client has its own MCP config path, but the command, args, and env entries are identical to the Claude Desktop snippet above. See your client's MCP docs for the exact file.
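
If you manage these config files from a script, the server entry can be merged into an already-parsed config without touching other servers. This is a rough sketch under one assumption: the client stores servers under a top-level `mcpServers` key, as Claude Desktop does. Some clients use a different key or file format, so check their docs first. `withAgentShield` and the two interfaces are hypothetical names introduced for illustration.

```typescript
// Minimal shape of one MCP server entry, matching the Claude Desktop snippet.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

// Assumption: servers live under a top-level "mcpServers" key.
interface McpClientConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Return a copy of the config with the agentshield entry added or replaced.
// Other servers and unrelated settings are preserved.
function withAgentShield(config: McpClientConfig, apiKey: string): McpClientConfig {
  return {
    ...config,
    mcpServers: {
      ...config.mcpServers,
      agentshield: {
        command: "npx",
        args: ["-y", "@eigenart/agentshield-mcp"],
        env: { AGENTSHIELD_API_KEY: apiKey },
      },
    },
  };
}
```

The function returns a new object rather than mutating its input, so the original config can still be written back unchanged if something goes wrong.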

Get an API key

Free tier, no credit card: agentshield.pro/signup.

Usage pattern (for your agent)

The tool description already tells the agent when to use this, but the core rule is:

Before your agent processes any external/untrusted text, call classify_text. If is_injection=true and confidence ≥ 0.8, refuse to act and escalate.

Typical sources of untrusted text:

User messages from public channels
RAG / retrieved documents / web scrapes
Tool-call results from third-party services
Filenames, issue titles, commit messages from external contributors
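
The core rule above can be sketched as a small decision function. This is an illustration only: `decide`, `VerdictLike`, and `Decision` are hypothetical names, and the 0.8 default simply restates the threshold from the rule. What "escalate" means (logging, human review, dropping the input) is up to your agent.

```typescript
// Only the two verdict fields the rule depends on.
interface VerdictLike {
  is_injection: boolean;
  confidence: number;
}

type Decision = "proceed" | "refuse_and_escalate";

// Refuse when the classifier flags an injection with confidence at or above
// the threshold; otherwise let the agent continue.
function decide(verdict: VerdictLike, threshold = 0.8): Decision {
  return verdict.is_injection && verdict.confidence >= threshold
    ? "refuse_and_escalate"
    : "proceed";
}
```

Under this rule a flagged input below the threshold still proceeds. If your agent takes high-impact actions, you may prefer a lower threshold or a third "needs review" outcome.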

Environment variables

| Variable | Required | Default | Purpose |
|---|---|---|---|
| AGENTSHIELD_API_KEY | yes | (none) | Your API key from agentshield.pro |
| AGENTSHIELD_BASE_URL | no | https://api.agentshield.pro | Override for self-hosted gateway |
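
For a custom agent that talks to the gateway itself, the same two variables could be resolved as follows. This is a sketch of the table's semantics, not the package's actual implementation: `resolveSettings` and `AgentShieldSettings` are hypothetical, and trimming whitespace and stripping trailing slashes are illustrative choices.

```typescript
interface AgentShieldSettings {
  apiKey: string;
  baseUrl: string;
}

// Default taken from the table above.
const DEFAULT_BASE_URL = "https://api.agentshield.pro";

// Read settings from an environment-like object (for example process.env in Node).
function resolveSettings(env: Record<string, string | undefined>): AgentShieldSettings {
  const apiKey = env.AGENTSHIELD_API_KEY?.trim();
  if (!apiKey) {
    // The key is required, so fail early rather than send unauthenticated requests.
    throw new Error("AGENTSHIELD_API_KEY is required");
  }
  // Fall back to the hosted gateway when no override is set.
  const baseUrl = env.AGENTSHIELD_BASE_URL?.trim() || DEFAULT_BASE_URL;
  // Illustrative choice: drop trailing slashes so paths can be appended safely.
  return { apiKey, baseUrl: baseUrl.replace(/\/+$/, "") };
}
```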

Benchmark

Public, reproducible: agentshield.pro/benchmark

F1: 0.956 headline (5 of 6 public datasets, 4,666 samples; the jackhhao role-play set is analyzed separately) and 0.921 on the full set (all 6 datasets, 5,972 samples). The datasets span EN/DE/ES/ZH/FR plus encoding obfuscation.
Latency: p50 2.44 ms (gateway + GPU classifier)
Dataset and scoring script are open source.
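
For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall, which reduces to 2·TP / (2·TP + FP + FN). The snippet below is a generic sketch of that formula, not the published scoring script, which may aggregate across datasets differently. The function name is hypothetical.

```typescript
// F1 from raw counts: true positives, false positives, false negatives.
// Returns 0 when there are no positives at all, to avoid dividing by zero.
function f1Score(tp: number, fp: number, fn: number): number {
  const denominator = 2 * tp + fp + fn;
  return denominator === 0 ? 0 : (2 * tp) / denominator;
}
```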

Roadmap

v0.2 — check_output tool (output-side secret/PII leak detection, layer 3 of the Gateway)
v0.2 — get_usage tool (rate-limit status for the current API key, so the agent can self-manage budget)
v0.3 — streaming / batch classification
v0.3 — local-first mode (ship a distilled classifier in the package, zero network)

File issues at github.com/dl-eigenart/agentshield-platform/issues.

Related

Python SDK: pip install agentshield-sdk (the import remains from agentshield import AgentShield)
ElizaOS plugin (Solana transaction guard): @eigenart/agentshield
Full product & pricing: agentshield.pro

Not an audit tool

AgentShield is a runtime classifier for live agent traffic. If you are looking for a one-shot pre-deployment OWASP-LLM-Top-10 scan of your own prompts, that is a different product category — use a static audit tool for that and pair it with AgentShield at runtime.
