@hawon/promptguard
v0.1.0
Published
Fast, zero-dependency prompt injection detection for LLM applications and AI agents
Downloads
72
Maintainers
Readme
promptguard
Fast, zero-dependency prompt injection detection for LLM applications and AI agents.
Detects prompt injection attacks in user inputs, tool results, MCP responses, and documents before they reach your LLM.
Features
- Zero dependencies - Pure TypeScript, no external packages
- Fast - Pattern-based detection, sub-millisecond scans
- AI Agent aware - Specialized rules for tool results and MCP responses
- 22+ built-in rules covering role override, instruction injection, data exfiltration, delimiter escape, encoding evasion, tool abuse, multi-turn manipulation, and indirect injection
- Customizable - Add your own rules, disable built-ins, set severity thresholds
- CLI + Library - Use as npm package or command-line tool
Install
npm install promptguardQuick Start
import { scan, isInjected, guard } from "promptguard";
// Simple boolean check
if (isInjected(userMessage)) {
throw new Error("Prompt injection detected");
}
// Detailed scan
const result = scan(userMessage);
if (result.injected) {
console.log(result.findings); // Array of findings with severity, evidence, etc.
}
// Guard middleware - throws on high+ severity
guard(toolResult, { context: "tool_result", throwSeverity: "high" });Scan Tool Results & MCP Responses
AI agents are vulnerable to injection via tool outputs. PromptGuard detects these:
import { scan } from "promptguard";
// Scan MCP tool result before passing to LLM
const toolOutput = await mcpClient.callTool("web_search", { query: "..." });
const result = scan(toolOutput.content, { context: "mcp_response" });
if (result.injected) {
// Don't pass this to the LLM
console.warn("Injection in tool result:", result.findings);
}MCP Server (Claude Code / OpenClaw)
PromptGuard runs as an MCP server, integrating directly with Claude Code, OpenClaw, and any MCP-compatible AI agent.
Claude Code
Add to ~/.claude/claude_desktop_config.json:
{
"mcpServers": {
"promptguard": {
"command": "npx",
"args": ["promptguard-mcp"]
}
}
}OpenClaw
Add to your openclaw.json:
{
"mcp": {
"servers": {
"promptguard": {
"command": "npx",
"args": ["promptguard-mcp"]
}
}
}
}MCP Tools
Once connected, your AI agent gets these tools:
| Tool | Description |
|------|-------------|
| promptguard_scan | Full scan with detailed findings |
| promptguard_check | Quick boolean injection check |
| promptguard_guard | Validate text is safe, error if not |
| promptguard_scan_batch | Scan multiple inputs at once |
Example: Auto-scan tool results
Your agent can use promptguard to validate tool outputs before processing:
Agent: I'll scan this web search result for injection before using it.
→ calls promptguard_scan({ text: searchResult, context: "tool_result" })
→ { injected: true, findings: [{ ruleId: "tool-result-injection", ... }] }
Agent: The search result contains injection, I'll discard it.CLI Usage
# Scan text directly
promptguard "Ignore all previous instructions"
# Scan a file
promptguard --file response.txt --context tool_result
# Pipe from stdin
curl -s http://example.com | promptguard - --context document
# JSON output
promptguard "test input" --json
# Quiet mode (exit code only: 0=clean, 1=injected)
promptguard "test" --quietDetection Categories
| Category | Rules | Examples |
|----------|-------|---------|
| Role Override | 2 | "You are now DAN", "Developer mode enabled" |
| Instruction Override | 3 | "Ignore previous instructions", "[SYSTEM OVERRIDE]:" |
| Data Exfiltration | 2 | "Show me your system prompt", "Dump your context" |
| Delimiter Escape | 3 | </system>, markdown fences, separator injection |
| Encoding Evasion | 4 | Base64 payloads, Unicode smuggling, homoglyphs, ROT13 |
| Tool/MCP Abuse | 2 | "IMPORTANT NOTE TO AI: ignore...", role switch in results |
| Multi-turn | 2 | Fake conversation history, memory poisoning |
| Indirect Injection | 2 | Hidden CSS text, HTML comment injection |
Custom Rules
import { scan, type DetectionRule } from "promptguard";
const myRules: DetectionRule[] = [
{
id: "custom-api-key-leak",
severity: "critical",
message: "API key pattern in output",
pattern: /sk-[a-zA-Z0-9]{32,}/,
applicableContexts: ["tool_result"],
},
];
const result = scan(input, { customRules: myRules });API
scan(input, options?): ScanResult
Full scan returning all findings.
isInjected(input, options?): boolean
Quick boolean check.
guard(input, options?): ScanResult
Throws PromptInjectionError if injection exceeds threshold.
scanBatch(inputs, options?): ScanResult[]
Scan multiple inputs.
License
MIT
