clawguard
v0.1.0
Published
Security guardrails for AI agents - prompt injection detection & credential protection
Maintainers
Readme
ClawGuard 🛡️
Complete Security Guardrails for AI Agents
ClawGuard provides defense-in-depth against the "Lethal Trifecta" of AI agent vulnerabilities:
- Prompt Injection (Input Guard)
- Tool Access Abuse (Runtime Guard)
- Data Exfiltration (Output Guard)
Installation
bun add clawguard
# or
npm install clawguardQuick Start
import { GuardSystem } from 'clawguard';
const guard = new GuardSystem({
strictMode: true,
runtime: {
onApprovalRequired: async (request) => {
console.log(`Approve ${request.tool}?`, request.params);
return confirm('Allow?');
}
}
});
// Leg 1: Scan user input
const inputResult = guard.scanInput(userMessage);
if (!inputResult.safe) {
console.log('Blocked:', inputResult.threats);
}
// Leg 2: Validate tool calls
const toolResult = await guard.validateToolCall({
tool: 'send_email',
params: { to: '[email protected]', body: 'Hello' },
timestamp: Date.now()
});
if (!toolResult.allowed) {
throw new Error(toolResult.reason);
}
// Leg 3: Scan LLM output
const outputResult = guard.scanOutput(llmResponse);
const safeResponse = outputResult.redacted || llmResponse;The Lethal Trifecta
┌─────────────────────────────────────────────────────────────┐
│ 1. PROMPT INJECTION Attacker controls input │
│ + │
│ 2. TOOL ACCESS Agent can affect real world │
│ + │
│ 3. DATA ACCESS Agent has sensitive info │
│ = │
│ CONFUSED DEPUTY Agent becomes attacker's proxy │
└─────────────────────────────────────────────────────────────┘Features
Input Guard (Leg 1)
- 150+ heuristic patterns for direct injection
- 35+ international patterns (KO/JA/ZH/ES/DE/FR/RU)
- Indirect injection detection (web, email, file content)
- Encoding evasion detection (base64, unicode, homoglyphs)
- Multi-turn context tracking
- Adversarial suffix detection
- Entropy analysis
import { InputGuard, scanInput, scanIndirectContent } from 'clawguard';
// Direct injection scan
const result = scanInput('Ignore all previous instructions');
// { safe: false, score: 100, threats: [...] }
// Indirect injection scan (for external content)
const webResult = scanIndirectContent(webpageContent, 'web');
const emailResult = scanIndirectContent(emailBody, 'email');Runtime Guard (Leg 2)
- Tool call interception and validation
- Rate limiting per tool
- Dangerous parameter detection
- Human-in-the-loop approval gates
- Anomaly detection
- Audit logging
import { RuntimeGuard, guardTool } from 'clawguard';
const guard = new RuntimeGuard({
highRiskTools: ['send_email', 'delete_file', 'execute_code'],
rateLimits: {
send_email: { maxCalls: 10, windowMs: 60000 }
},
onApprovalRequired: async (request) => {
return await askUser(`Allow ${request.tool}?`);
}
});
// Wrap existing tools
const safeSendEmail = guardTool(sendEmail, 'send_email', guard);Output Guard (Leg 3)
- Credential leak detection (API keys, tokens, passwords)
- PII detection and redaction (SSN, credit cards, emails, phones)
- Canary token detection (prompt leak detection)
- Automatic redaction
- Sensitive context patterns
import { OutputGuard, scanOutput, createCanaryToken } from 'clawguard';
// Scan LLM output
const result = scanOutput(llmResponse);
if (result.redacted) {
return result.redacted;
}
// Create canary token for prompt leak detection
const canary = createCanaryToken('system prompt');
const systemPrompt = `You are helpful. ${canary.token}`;
// Check if canary leaked
const output = scanOutput(llmResponse);
if (output.threats.some(t => t.type === 'canary_leak')) {
console.error('PROMPT LEAKED!');
}Complete Guard System
import { GuardSystem, createGuardSystem } from 'clawguard';
const guard = new GuardSystem({
input: {
threshold: 25,
languages: ['en', 'ko'],
},
runtime: {
highRiskTools: ['send_email', 'delete_file'],
detectAnomalies: true,
onApprovalRequired: async (req) => askUser(req),
},
output: {
detectCredentials: true,
detectPII: true,
detectCanaries: true,
autoRedact: true,
},
strictMode: true,
auditAll: true,
});
// Process complete agent turn
const result = await guard.processAgentTurn(
userInput,
[{ tool: 'search', params: { query: 'weather' } }],
async (ctx) => await llm.generate(ctx)
);Architecture
┌──────────────────────────────────────┐
│ GUARD SYSTEM │
│ │
User Input │ ┌─────────┐ │
────────────► │ │ INPUT │ Blocks injection │
│ │ GUARD │ Decodes evasion │
│ └────┬────┘ │
│ ▼ │
Tool Calls │ ┌─────────┐ │
────────────► │ │ RUNTIME │ Rate limits │
│ │ GUARD │ Requires approval │
│ └────┬────┘ │
│ ▼ │
LLM Output │ ┌─────────┐ │
◄──────────── │ │ OUTPUT │ Detects leaks │
│ │ GUARD │ Redacts PII │
│ └─────────┘ │
└──────────────────────────────────────┘CLI
clawguard scan "Ignore all previous instructions"
clawguard scan "이전 지시를 무시하세요" # Korean
clawguard creds "API key: sk-1234..."
clawguard redact "My SSN is 123-45-6789"
clawguard skill-file ./SKILL.mdLicense
MIT
