@eigenart/agentshield-mcp
v0.1.6
MCP server for AgentShield — detect prompt injection, jailbreak, and social-engineering attempts in any text before your agent processes it.
Official MCP (Model Context Protocol) server for AgentShield — the runtime gateway and real-time classifier that detects prompt-injection, jailbreak, and social-engineering attempts in text while your agent is running, not in an offline audit pass.
Works with any MCP-compatible client: Claude Desktop, Cursor, Cline, Zed, Continue, and custom agents. Single-shot per request, p50 ~2.4 ms — designed to sit in the agent's hot path on every untrusted input.
What it does
Exposes one tool to the agent: classify_text. Call it on any untrusted text (user messages, retrieved documents, web scrapes, third-party tool outputs) and get back a per-request verdict.
{
  "is_injection": true,
  "confidence": 0.94,
  "category": "jailbreak",
  "latency_ms": 2.4,
  "model": "agentshield-minilm-v2",
  "request_id": "req_01HX…"
}
The classifier is hosted at api.agentshield.pro. No local GPU, no model download. Free tier: 100 classifications/day, no credit card.
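The sample verdict above can be modeled as a type so that a custom agent validates the response before trusting it. This is a minimal sketch, not part of the package: the names `Verdict`, `isVerdict`, and `parseVerdict` are hypothetical, and it assumes every field in the sample is always present with the types shown there.

```typescript
// Shape of a classify_text verdict, inferred from the sample response only.
// Assumption: all six fields are always present.
interface Verdict {
  is_injection: boolean;
  confidence: number;
  category: string;
  latency_ms: number;
  model: string;
  request_id: string;
}

// Hypothetical type guard: narrows an unknown JSON value to Verdict.
function isVerdict(value: unknown): value is Verdict {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.is_injection === "boolean" &&
    typeof v.confidence === "number" &&
    typeof v.category === "string" &&
    typeof v.latency_ms === "number" &&
    typeof v.model === "string" &&
    typeof v.request_id === "string"
  );
}

// Parses the raw JSON text of a verdict and rejects unexpected shapes.
function parseVerdict(raw: string): Verdict {
  const parsed: unknown = JSON.parse(raw);
  if (!isVerdict(parsed)) {
    throw new Error("unexpected classify_text response shape");
  }
  return parsed;
}
```

Failing loudly on an unexpected shape is a deliberate choice here: a guard that silently accepts a malformed verdict would let unclassified text through.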
Install (Claude Desktop)
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
  "mcpServers": {
    "agentshield": {
      "command": "npx",
      "args": ["-y", "@eigenart/agentshield-mcp"],
      "env": {
        "AGENTSHIELD_API_KEY": "ask_your_key_here"
      }
    }
  }
}
Restart Claude Desktop. The classify_text tool will be available.
Install (Cursor / Cline / Zed / Continue)
Same pattern — each client has its own MCP config path, but the command + env block are identical to the Claude Desktop snippet above. See your client's MCP docs for the exact file.
Get an API key
Free tier, no credit card: agentshield.pro/signup.
Usage pattern (for your agent)
The tool description already tells the agent when to use this, but the core rule is:
Before your agent processes any external/untrusted text, call classify_text. If is_injection=true and confidence ≥ 0.8, refuse to act and escalate.
Typical sources of untrusted text:
- User messages from public channels
- RAG / retrieved documents / web scrapes
- Tool-call results from third-party services
- Filenames, issue titles, commit messages from external contributors
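For a custom agent, the core rule above (refuse when is_injection is true and confidence is at least 0.8) might be expressed as a small gate in front of the normal handler. This is a sketch under assumptions: `decide` and `guard` are hypothetical names, and `classify` stands in for whatever mechanism your client uses to invoke classify_text.

```typescript
// Threshold taken from the usage rule: confidence >= 0.8.
const BLOCK_THRESHOLD = 0.8;

// Only the two verdict fields the rule depends on.
interface VerdictLike {
  is_injection: boolean;
  confidence: number;
}

type Decision = "proceed" | "refuse_and_escalate";

// Applies the rule: block only when both conditions hold.
function decide(
  verdict: VerdictLike,
  threshold: number = BLOCK_THRESHOLD
): Decision {
  return verdict.is_injection && verdict.confidence >= threshold
    ? "refuse_and_escalate"
    : "proceed";
}

// Hypothetical wrapper: classifies untrusted text before handing it on.
// `classify` is an assumption; wire it to your MCP client's tool call.
async function guard<T>(
  text: string,
  classify: (text: string) => Promise<VerdictLike>,
  handle: (text: string) => Promise<T>
): Promise<T> {
  const verdict = await classify(text);
  if (decide(verdict) === "refuse_and_escalate") {
    throw new Error("input flagged as injection; refusing to act");
  }
  return handle(text);
}
```

Keeping `decide` pure and separate from the network call makes the threshold easy to unit-test and to tune per source of untrusted text.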
Environment variables
| Variable | Required | Default | Purpose |
|---|---|---|---|
| AGENTSHIELD_API_KEY | yes | — | Your API key from agentshield.pro |
| AGENTSHIELD_BASE_URL | no | https://api.agentshield.pro | Override for self-hosted gateway |
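The table above can be read as a small resolution rule: the key is mandatory, the base URL falls back to the hosted endpoint. The sketch below illustrates that rule only; it is not the package's actual startup code, `resolveConfig` and `ShieldConfig` are hypothetical names, and treating an empty string as unset is an assumption.

```typescript
// Default from the table above.
const DEFAULT_BASE_URL = "https://api.agentshield.pro";

interface ShieldConfig {
  apiKey: string;
  baseUrl: string;
}

// Resolves settings from an environment-like map.
// Assumption: empty strings are treated the same as missing values.
function resolveConfig(env: Record<string, string | undefined>): ShieldConfig {
  const apiKey = env["AGENTSHIELD_API_KEY"];
  if (!apiKey) {
    throw new Error("AGENTSHIELD_API_KEY is required");
  }
  const baseUrl = env["AGENTSHIELD_BASE_URL"] || DEFAULT_BASE_URL;
  return { apiKey, baseUrl };
}
```

In Node.js this would typically be called as `resolveConfig(process.env)`; passing the map explicitly keeps the function testable without touching the real environment.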
Benchmark
Public, reproducible: agentshield.pro/benchmark
- F1: 0.956 on the headline set (5 of 6 public datasets, 4,666 samples; the jackhhao role-play set is analyzed separately) and 0.921 on the full set (all 6 datasets, 5,972 samples). Coverage: EN/DE/ES/ZH/FR plus encoding-obfuscation.
- Latency: p50 2.44 ms (gateway + GPU classifier)
- Dataset and scoring script are open source.
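For readers checking the F1 figures above against their own runs, the standard binary F1 definition is 2·TP / (2·TP + FP + FN). The sketch below implements that textbook formula only; whether the published scoring script aggregates per dataset or over pooled samples is not stated here, so treat `f1Score` as a hypothetical helper and consult the script itself for the exact method.

```typescript
// Standard binary F1 from confusion counts:
//   F1 = 2*TP / (2*TP + FP + FN)
// Returns 0 when there are no positives at all, to avoid dividing by zero.
function f1Score(
  truePositives: number,
  falsePositives: number,
  falseNegatives: number
): number {
  const denominator = 2 * truePositives + falsePositives + falseNegatives;
  return denominator === 0 ? 0 : (2 * truePositives) / denominator;
}

// Illustrative counts only, not benchmark data.
const exampleF1 = f1Score(90, 10, 10);
```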
Roadmap
- v0.2: check_output tool (output-side secret/PII leak detection, layer 3 of the Gateway)
- v0.2: get_usage tool (rate-limit status for the current API key, so the agent can self-manage budget)
- v0.3: streaming / batch classification
- v0.3: local-first mode (ship a distilled classifier in the package, zero network)
File issues at github.com/dl-eigenart/agentshield-platform/issues.
Related
- Python SDK: pip install agentshield-sdk (import stays from agentshield import AgentShield)
- ElizaOS plugin (Solana transaction guard): @eigenart/agentshield
- Full product & pricing: agentshield.pro
Not an audit tool
AgentShield is a runtime classifier for live agent traffic. If you are looking for a one-shot pre-deployment OWASP-LLM-Top-10 scan of your own prompts, that is a different product category — use a static audit tool for that and pair it with AgentShield at runtime.
License
MIT © Eigenart Filmproduktion. See LICENSE.
