@hawon/promptguard

v0.1.0

Published

19 days ago

Fast, zero-dependency prompt injection detection for LLM applications and AI agents

Downloads

0High
0Medium
0Low

hawon

prompt-injection llm security ai-agent mcp detection sanitization guardrails

promptguard

Fast, zero-dependency prompt injection detection for LLM applications and AI agents.

Detects prompt injection attacks in user inputs, tool results, MCP responses, and documents before they reach your LLM.

Features

Zero dependencies - Pure TypeScript, no external packages
Fast - Pattern-based detection, sub-millisecond scans
AI Agent aware - Specialized rules for tool results and MCP responses
22+ built-in rules covering role override, instruction injection, data exfiltration, delimiter escape, encoding evasion, tool abuse, multi-turn manipulation, and indirect injection
Customizable - Add your own rules, disable built-ins, set severity thresholds
CLI + Library - Use as npm package or command-line tool

Install

npm install promptguard

Quick Start

import { scan, isInjected, guard } from "promptguard";

// Simple boolean check
if (isInjected(userMessage)) {
  throw new Error("Prompt injection detected");
}

// Detailed scan
const result = scan(userMessage);
if (result.injected) {
  console.log(result.findings); // Array of findings with severity, evidence, etc.
}

// Guard middleware - throws on high+ severity
guard(toolResult, { context: "tool_result", throwSeverity: "high" });

Scan Tool Results & MCP Responses

AI agents are vulnerable to injection via tool outputs. PromptGuard detects these:

import { scan } from "promptguard";

// Scan MCP tool result before passing to LLM
const toolOutput = await mcpClient.callTool("web_search", { query: "..." });
const result = scan(toolOutput.content, { context: "mcp_response" });

if (result.injected) {
  // Don't pass this to the LLM
  console.warn("Injection in tool result:", result.findings);
}

MCP Server (Claude Code / OpenClaw)

PromptGuard runs as an MCP server, integrating directly with Claude Code, OpenClaw, and any MCP-compatible AI agent.

Claude Code

Add to ~/.claude/claude_desktop_config.json:

{
  "mcpServers": {
    "promptguard": {
      "command": "npx",
      "args": ["promptguard-mcp"]
    }
  }
}

OpenClaw

Add to your openclaw.json:

{
  "mcp": {
    "servers": {
      "promptguard": {
        "command": "npx",
        "args": ["promptguard-mcp"]
      }
    }
  }
}

MCP Tools

Once connected, your AI agent gets these tools:

| Tool | Description | |------|-------------| | promptguard_scan | Full scan with detailed findings | | promptguard_check | Quick boolean injection check | | promptguard_guard | Validate text is safe, error if not | | promptguard_scan_batch | Scan multiple inputs at once |

Example: Auto-scan tool results

Your agent can use promptguard to validate tool outputs before processing:

Agent: I'll scan this web search result for injection before using it.
→ calls promptguard_scan({ text: searchResult, context: "tool_result" })
→ { injected: true, findings: [{ ruleId: "tool-result-injection", ... }] }
Agent: The search result contains injection, I'll discard it.

CLI Usage

# Scan text directly
promptguard "Ignore all previous instructions"

# Scan a file
promptguard --file response.txt --context tool_result

# Pipe from stdin
curl -s http://example.com | promptguard - --context document

# JSON output
promptguard "test input" --json

# Quiet mode (exit code only: 0=clean, 1=injected)
promptguard "test" --quiet

Detection Categories

| Category | Rules | Examples | |----------|-------|---------| | Role Override | 2 | "You are now DAN", "Developer mode enabled" | | Instruction Override | 3 | "Ignore previous instructions", "[SYSTEM OVERRIDE]:" | | Data Exfiltration | 2 | "Show me your system prompt", "Dump your context" | | Delimiter Escape | 3 | </system>, markdown fences, separator injection | | Encoding Evasion | 4 | Base64 payloads, Unicode smuggling, homoglyphs, ROT13 | | Tool/MCP Abuse | 2 | "IMPORTANT NOTE TO AI: ignore...", role switch in results | | Multi-turn | 2 | Fake conversation history, memory poisoning | | Indirect Injection | 2 | Hidden CSS text, HTML comment injection |

Custom Rules

import { scan, type DetectionRule } from "promptguard";

const myRules: DetectionRule[] = [
  {
    id: "custom-api-key-leak",
    severity: "critical",
    message: "API key pattern in output",
    pattern: /sk-[a-zA-Z0-9]{32,}/,
    applicableContexts: ["tool_result"],
  },
];

const result = scan(input, { customRules: myRules });

API

`scan(input, options?): ScanResult`

Full scan returning all findings.

`isInjected(input, options?): boolean`

Quick boolean check.

`guard(input, options?): ScanResult`

Throws PromptInjectionError if injection exceeds threshold.

`scanBatch(inputs, options?): ScanResult[]`

Scan multiple inputs.

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme