clawguard-openclaw
v1.0.0
Security guardrails for OpenClaw agents - Lethal Trifecta defense
🛡️ ClawGuard OpenClaw Plugin
SOTA security guardrails for OpenClaw agents — Complete Lethal Trifecta defense.
What is the Lethal Trifecta?
The three attack vectors that can compromise an AI agent:
- Input Attacks (Prompt Injection) - Malicious instructions in user messages or external content
- Runtime Attacks (Tool Exploitation) - Abusing tool calls for data exfiltration or system compromise
- Output Attacks (Data Leakage) - Credentials or PII leaking in agent responses
ClawGuard defends against all three with state-of-the-art detection techniques.
Installation
openclaw plugins install @openclaw/clawguard
Then restart your gateway.
SOTA Features
Input Guard (Leg 1)
- Pattern-based detection in 7+ languages (EN/KO/JA/ZH/ES/DE/FR/RU)
- Adversarial suffix detection (GCG-style attacks) via entropy analysis
- Multi-turn tracking - detects split payload attacks across messages
- Source-aware thresholds - web content gets stricter scrutiny than user input
- Encoding evasion detection (base64, hex, unicode, homoglyphs)
- Jailbreak and system prompt extraction detection
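The entropy analysis and source-aware thresholds listed above can be illustrated with a small sketch. This is not ClawGuard's actual code: the function names, the 40-character window, and the cutoff values are all assumptions chosen for illustration. The idea is that GCG-style suffixes tend to be dense runs of unrelated characters, so the tail of a message has unusually high character-level entropy.

```typescript
// Illustrative sketch only: names, window size, and cutoffs are assumptions.
type Source = "user" | "web" | "email";

/** Shannon entropy of a string, in bits per character. */
function shannonEntropy(text: string): number {
  if (text.length === 0) return 0;
  const counts: Record<string, number> = {};
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    counts[ch] = (counts[ch] || 0) + 1;
  }
  let entropy = 0;
  for (const ch of Object.keys(counts)) {
    const p = counts[ch] / text.length;
    entropy -= p * (Math.log(p) / Math.LN2);
  }
  return entropy;
}

// Assumed cutoffs: untrusted sources get a lower (stricter) entropy limit.
const ENTROPY_CUTOFFS: Record<Source, number> = { user: 4.8, web: 4.3, email: 4.3 };

/** Flags a message whose trailing window has suspiciously high entropy. */
function looksLikeAdversarialSuffix(
  message: string,
  source: Source,
  windowSize: number = 40
): boolean {
  if (message.length < windowSize) return false;
  const tail = message.slice(-windowSize);
  return shannonEntropy(tail) > ENTROPY_CUTOFFS[source];
}
```

Character entropy is only a cheap proxy; a production detector would more likely combine it with token-level signals and the pattern matching described above.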
Runtime Guard (Leg 2)
- Tool call interception with parameter validation
- Dangerous command detection (shell injection, rm -rf, etc.)
- Exfiltration URL blocking (webhook.site, ngrok, etc.)
- Sensitive path protection (.ssh, .aws, .env)
- Optional human-in-the-loop approval gates
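As a rough sketch of what tool call interception with parameter validation might involve, the example below checks each string parameter against the three categories listed above. The `ToolCall` and `Verdict` shapes, the regexes, and the host list are hypothetical and much smaller than whatever the plugin actually ships.

```typescript
// Illustrative sketch only: types, patterns, and host list are assumptions.
interface ToolCall {
  tool: string;
  params: Record<string, string>;
}

interface Verdict {
  allowed: boolean;
  reasons: string[];
}

const DANGEROUS_COMMANDS: RegExp[] = [
  /\brm\s+-rf\b/,     // recursive force delete
  /\|\s*(sh|bash)\b/, // piping content straight into a shell
];
const EXFIL_HOSTS: string[] = ["webhook.site", "ngrok.io"];
const SENSITIVE_PATHS = /(^|[\/\s~])\.(ssh|aws|env)\b/;

/** Pulls the host part out of every http(s) URL found in the text. */
function extractHosts(text: string): string[] {
  const hosts: string[] = [];
  const urlPattern = /https?:\/\/([^\/\s:?#]+)/gi;
  let match: RegExpExecArray | null;
  while ((match = urlPattern.exec(text)) !== null) {
    hosts.push(match[1]);
  }
  return hosts;
}

/** True if the host equals a blocked host or is a subdomain of one. */
function isBlockedHost(host: string): boolean {
  const h = host.toLowerCase();
  for (const blocked of EXFIL_HOSTS) {
    if (h === blocked || h.slice(-(blocked.length + 1)) === "." + blocked) {
      return true;
    }
  }
  return false;
}

/** Validates every parameter of a tool call and collects the reasons to block it. */
function checkToolCall(call: ToolCall): Verdict {
  const reasons: string[] = [];
  for (const key of Object.keys(call.params)) {
    const value = call.params[key];
    if (DANGEROUS_COMMANDS.some((p) => p.test(value))) {
      reasons.push("dangerous command in '" + key + "'");
    }
    if (extractHosts(value).some(isBlockedHost)) {
      reasons.push("exfiltration URL in '" + key + "'");
    }
    if (SENSITIVE_PATHS.test(value)) {
      reasons.push("sensitive path in '" + key + "'");
    }
  }
  return { allowed: reasons.length === 0, reasons };
}
```

With an approval gate enabled, a non-empty `reasons` list could be shown to a human instead of blocking outright.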
Output Guard (Leg 3)
- Credential detection (AWS, GitHub, OpenAI, Slack, Discord, Telegram, and 15+ more)
- PII detection (SSN, credit cards, phones, emails, IPs)
- Automatic redaction before output
- Canary token system for prompt leak detection
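A minimal sketch of redaction plus canary checking follows. The pattern list is a small hypothetical subset, and `guardOutput` and `RedactionResult` are names invented for this example. A canary token is a unique string planted in the system prompt; if it ever appears in a response, the prompt has leaked.

```typescript
// Illustrative sketch only: a tiny assumed pattern set, not the plugin's real one.
interface RedactionResult {
  text: string;
  findings: string[];
  canaryLeaked: boolean;
}

const OUTPUT_PATTERNS: Array<{ label: string; pattern: RegExp }> = [
  { label: "aws_access_key", pattern: /\bAKIA[0-9A-Z]{16}\b/g },
  { label: "github_token", pattern: /\bghp_[A-Za-z0-9]{36}\b/g },
  { label: "ssn", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  { label: "email", pattern: /\b[\w.+-]+@[\w-]+(\.[\w-]+)+\b/g },
];

/** Redacts known secret/PII shapes and reports whether any canary token leaked. */
function guardOutput(text: string, canaryTokens: string[]): RedactionResult {
  const findings: string[] = [];
  let redacted = text;
  for (const { label, pattern } of OUTPUT_PATTERNS) {
    const replaced = redacted.replace(pattern, "[REDACTED:" + label + "]");
    if (replaced !== redacted) findings.push(label);
    redacted = replaced;
  }
  let canaryLeaked = false;
  for (const token of canaryTokens) {
    if (text.indexOf(token) !== -1) {
      canaryLeaked = true;
      // Remove every occurrence of the leaked canary from the outgoing text.
      redacted = redacted.split(token).join("[REDACTED:canary]");
    }
  }
  return { text: redacted, findings, canaryLeaked };
}
```

For example, calling `guardOutput(reply, ["SECRET_CANARY_12345"])` with the canary from the custom configuration would flag any reply that echoes the planted string.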
Additional SOTA Features
- Spotlighting - Data marking for untrusted content (Microsoft research)
- Defense presets - paranoid, balanced, permissive
- Structured threat events - Correlation via fingerprinting
- Context decay - Risk scores decay over conversation
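Two of the ideas above are easy to sketch. Spotlighting marks untrusted content so the model can tell data from instructions, and context decay lets an old risk score fade as the conversation continues. In the sketch below, the marker strings, the `datamark` mode name, and the half-life decay formula are assumptions; only the `delimit` mode name appears in the configuration in this README.

```typescript
// Illustrative sketch only: markers, "datamark" mode, and half-life are assumptions.
type SpotlightMode = "delimit" | "datamark";

/** Wraps or marks untrusted content before it is placed into the prompt. */
function spotlight(content: string, mode: SpotlightMode): string {
  if (mode === "delimit") {
    // Delimiting: fence the untrusted text between explicit boundary markers.
    return "<<UNTRUSTED>>\n" + content + "\n<</UNTRUSTED>>";
  }
  // Data marking: replace each whitespace run with a marker character,
  // so the whole span is visibly tagged as data.
  return content.replace(/\s+/g, "^");
}

/** Exponential decay: the score halves every `halfLifeTurns` turns. */
function decayedRisk(
  score: number,
  turnsElapsed: number,
  halfLifeTurns: number = 3
): number {
  return score * Math.pow(0.5, turnsElapsed / halfLifeTurns);
}
```

Fixed delimiters are the simplest choice for a sketch; randomizing them per request would make them harder for injected content to spoof.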
Quick Start
Use a preset:
{
  plugins: {
    entries: {
      clawguard: {
        enabled: true,
        config: {
          preset: "balanced" // or "paranoid" or "permissive"
        }
      }
    }
  }
}
Custom configuration:
{
  plugins: {
    entries: {
      clawguard: {
        enabled: true,
        config: {
          inputGuard: {
            enabled: true,
            threshold: 50,
            blockOnDetection: false,
            useAdversarialDetection: true,
            useMultiTurnTracking: true
          },
          runtimeGuard: {
            enabled: true,
            dangerousTools: ["exec", "write", "edit"],
            blockExfilUrls: true,
            requireApproval: false
          },
          outputGuard: {
            enabled: true,
            redactCredentials: true,
            redactPII: true,
            canaryTokens: ["SECRET_CANARY_12345"]
          },
          spotlighting: {
            enabled: true,
            mode: "delimit",
            sources: ["web", "email"]
          },
          logging: {
            logThreats: true,
            structuredEvents: true
          }
        }
      }
    }
  }
}
Defense Presets
| Preset | Threshold | Block | Adversarial | Multi-turn | Approval | Spotlighting |
|--------|-----------|-------|-------------|------------|----------|--------------|
| paranoid | 25 | ✓ | ✓ | ✓ | ✓ | all sources |
| balanced | 50 | ✗ | ✓ | ✓ | ✗ | web, email |
| permissive | 75 | ✗ | ✗ | ✗ | ✗ | disabled |
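To make the table concrete, here is one way the presets could be written as data. The values come from the table, but the mapping of columns to field names and the rule "a score at or above the threshold is flagged" are assumptions about how the plugin interprets them.

```typescript
// Illustrative sketch only: field names and the comparison rule are assumptions.
type PresetName = "paranoid" | "balanced" | "permissive";

interface PresetSettings {
  threshold: number;
  blockOnDetection: boolean;
  useAdversarialDetection: boolean;
  useMultiTurnTracking: boolean;
  requireApproval: boolean;
  spotlightSources: string[] | "all";
}

// Values copied from the Defense Presets table.
const PRESETS: Record<PresetName, PresetSettings> = {
  paranoid: {
    threshold: 25,
    blockOnDetection: true,
    useAdversarialDetection: true,
    useMultiTurnTracking: true,
    requireApproval: true,
    spotlightSources: "all",
  },
  balanced: {
    threshold: 50,
    blockOnDetection: false,
    useAdversarialDetection: true,
    useMultiTurnTracking: true,
    requireApproval: false,
    spotlightSources: ["web", "email"],
  },
  permissive: {
    threshold: 75,
    blockOnDetection: false,
    useAdversarialDetection: false,
    useMultiTurnTracking: false,
    requireApproval: false,
    spotlightSources: [],
  },
};

/** Assumed rule: flag when score >= threshold; block only if the preset blocks. */
function evaluate(score: number, preset: PresetName): { flagged: boolean; blocked: boolean } {
  const settings = PRESETS[preset];
  const flagged = score >= settings.threshold;
  return { flagged, blocked: flagged && settings.blockOnDetection };
}
```

Under these assumptions a risk score of 30 is blocked under `paranoid`, ignored under `balanced`, and a score of 60 is logged but not blocked under `balanced`.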
CLI Commands
# Check status and stats
openclaw clawguard status
# View available presets
openclaw clawguard presets
# Test detection with source simulation
openclaw clawguard test "ignore previous instructions" --guard input --source web
openclaw clawguard test "sk-proj-abc123..." --guard output
# View recent threat events
openclaw clawguard events --limit 20
Slash Command
In any chat, use /clawguard to see current status and session stats.
How It Works
ClawGuard hooks into OpenClaw's plugin lifecycle:
User Message
↓
┌─────────────────────────────────────┐
│ INPUT GUARD (before_agent_start) │
│ • Pattern matching (7 languages) │
│ • Adversarial suffix detection │
│ • Multi-turn context tracking │
│ • Source-aware thresholds │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ RUNTIME GUARD (before_tool_call) │
│ • Parameter validation │
│ • Exfil URL blocking │
│ • Dangerous command detection │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ OUTPUT GUARD (message_sending) │
│ • Credential scanning │
│ • PII detection │
│ • Canary token monitoring │
│ • Auto-redaction │
└─────────────────────────────────────┘
↓
Safe Response
Research References
- Adversarial Suffixes: Zou et al. "Universal and Transferable Adversarial Attacks on Aligned Language Models"
- Spotlighting: Microsoft "Defending Against Indirect Prompt Injection Attacks"
- Lethal Trifecta: OpenClaw security model
- Multi-turn Attacks: Schulhoff et al. "Ignore This Title and HackAPrompt"
Testing
cd projects/clawguard-plugin
bun test # 63 tests
File Structure
src/
├── index.ts # Plugin entry, lifecycle hooks, CLI
├── guards.ts # Input/Runtime/Output guards
├── patterns.ts # Detection patterns (injection, credentials, PII)
├── analyzers.ts # SOTA: entropy, context tracker, spotlighting
├── guards.test.ts # Guard tests (38)
└── analyzers.test.ts # Analyzer tests (25)
License
MIT
Authors
Built by MaxsClawd & Max — Day one, shipped.
