clawguard-openclaw

v1.0.0

Published

13 days ago

Security guardrails for OpenClaw agents - Lethal Trifecta defense

maxliss

openclaw plugin security prompt-injection guardrails

🛡️ ClawGuard OpenClaw Plugin

SOTA security guardrails for OpenClaw agents — Complete Lethal Trifecta defense.

What is the Lethal Trifecta?

The three attack vectors that can compromise an AI agent:

Input Attacks (Prompt Injection) - Malicious instructions in user messages or external content
Runtime Attacks (Tool Exploitation) - Abusing tool calls for data exfiltration or system compromise
Output Attacks (Data Leakage) - Credentials or PII leaking in agent responses

ClawGuard covers each vector with a dedicated guard: an Input Guard, a Runtime Guard, and an Output Guard.

Installation

openclaw plugins install @openclaw/clawguard

Then restart your gateway.

SOTA Features

Input Guard (Leg 1)

Pattern-based detection in 8 languages (EN/KO/JA/ZH/ES/DE/FR/RU)
Adversarial suffix detection (GCG-style attacks) via entropy analysis
Multi-turn tracking - detects split payload attacks across messages
Source-aware thresholds - web content gets stricter scrutiny than user input
Encoding evasion detection (base64, hex, unicode, homoglyphs)
Jailbreak and system prompt extraction detection
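
As a rough illustration of the entropy analysis mentioned above, the sketch below scores the tail of a message by Shannon entropy and flags it when the tail looks close to random. The function names, the 32-character window, and the 4.5 bits-per-character cutoff are assumptions made for this example and are not taken from ClawGuard's source.

```typescript
// Shannon entropy of a string, in bits per character.
function shannonEntropy(text: string): number {
  const chars = Array.from(text);
  if (chars.length === 0) return 0;
  const counts = new Map<string, number>();
  for (const ch of chars) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let entropy = 0;
  counts.forEach((n) => {
    const p = n / chars.length;
    entropy -= p * Math.log2(p);
  });
  return entropy;
}

// Hypothetical heuristic: flag a message whose trailing window is close to
// random. Window size and cutoff are illustrative, not ClawGuard's values.
function hasSuspiciousSuffix(message: string, windowSize = 32, minBits = 4.5): boolean {
  const chars = Array.from(message);
  if (chars.length < windowSize) return false;
  const tail = chars.slice(-windowSize).join("");
  return shannonEntropy(tail) >= minBits;
}
```

Ordinary prose rarely uses more than about fifteen distinct characters in a 32-character window, which caps its entropy below 4 bits, while optimizer-generated suffixes tend to mix case, digits, and punctuation. A real detector would likely combine this signal with others to limit false positives on code or hashes.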

Runtime Guard (Leg 2)

Tool call interception with parameter validation
Dangerous command detection (shell injection, rm -rf, etc.)
Exfiltration URL blocking (webhook.site, ngrok, etc.)
Sensitive path protection (.ssh, .aws, .env)
Optional human-in-the-loop approval gates
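
The checks listed above could be combined along the following lines. This is a minimal sketch: the `ToolCall` shape, the rule lists, and `checkToolCall` are hypothetical names invented for the example, and the patterns only cover the cases named in the list (`rm -rf`, webhook.site, ngrok, `.ssh`, `.aws`, `.env`).

```typescript
// Hypothetical shape of an intercepted tool call (not OpenClaw's actual type).
interface ToolCall {
  tool: string;
  params: Record<string, string>;
}

interface Verdict {
  allowed: boolean;
  reasons: string[];
}

// Illustrative rules drawn from the examples in the feature list.
const DANGEROUS_COMMANDS: RegExp[] = [
  /\brm\s+-(?:[a-z]*r[a-z]*f|[a-z]*f[a-z]*r)\b/i, // rm -rf, rm -fr
  /[;&|]\s*(?:curl|wget|nc)\b/i, // command chained onto a network tool
];
const EXFIL_HOST_SUFFIXES = ["webhook.site", "ngrok.io"];
const SENSITIVE_SEGMENTS = [".ssh", ".aws", ".env"];

// Pull host names out of any http(s) URLs embedded in a parameter value.
function extractHosts(value: string): string[] {
  const hosts: string[] = [];
  const urlPattern = /https?:\/\/([^\/\s:?#]+)/gi;
  let match: RegExpExecArray | null;
  while ((match = urlPattern.exec(value)) !== null) {
    hosts.push(match[1].toLowerCase());
  }
  return hosts;
}

function checkToolCall(call: ToolCall): Verdict {
  const reasons: string[] = [];
  for (const key of Object.keys(call.params)) {
    const value = call.params[key];
    if (DANGEROUS_COMMANDS.some((re) => re.test(value))) {
      reasons.push("dangerous command");
    }
    const exfil = extractHosts(value).some((host) =>
      EXFIL_HOST_SUFFIXES.some((s) => host === s || host.endsWith("." + s)));
    if (exfil) reasons.push("exfiltration URL");
    // Compare whole path segments so "my.environment" does not match ".env".
    const segments = value.split(/[\/\s]+/);
    if (segments.some((seg) => SENSITIVE_SEGMENTS.indexOf(seg) !== -1)) {
      reasons.push("sensitive path");
    }
  }
  return { allowed: reasons.length === 0, reasons };
}
```

Returning the reasons alongside the verdict keeps the decision auditable and gives an approval gate something concrete to show a human reviewer.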

Output Guard (Leg 3)

Credential detection (AWS, GitHub, OpenAI, Slack, Discord, Telegram, and 15+ more)
PII detection (SSN, credit cards, phones, emails, IPs)
Automatic redaction before output
Canary token system for prompt leak detection
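
To make the redaction and canary steps concrete, here is a small sketch. The pattern list is a tiny illustrative subset (an AWS access key ID, a classic GitHub personal access token, a US SSN, and an email address), and `guardOutput` plus the `[REDACTED:label]` placeholder format are assumptions made for this example rather than ClawGuard's actual output.

```typescript
interface RedactionResult {
  text: string;
  findings: string[];
  canaryLeaked: boolean;
}

// Illustrative subset of credential and PII patterns.
const OUTPUT_PATTERNS: Array<{ label: string; pattern: RegExp }> = [
  { label: "aws_access_key", pattern: /\bAKIA[0-9A-Z]{16}\b/g },
  { label: "github_token", pattern: /\bghp_[A-Za-z0-9]{36}\b/g },
  { label: "ssn", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  { label: "email", pattern: /\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b/g },
];

function guardOutput(text: string, canaryTokens: string[]): RedactionResult {
  const findings: string[] = [];
  let redacted = text;
  for (const { label, pattern } of OUTPUT_PATTERNS) {
    // Replace every match and record which kind of secret was found.
    redacted = redacted.replace(pattern, () => {
      findings.push(label);
      return `[REDACTED:${label}]`;
    });
  }
  // A canary planted in the system prompt should never appear in output.
  const canaryLeaked = canaryTokens.some((token) => text.indexOf(token) !== -1);
  return { text: redacted, findings, canaryLeaked };
}
```

The canary check runs against the original text, so a leak is still reported even if a redaction pattern happens to overlap the token.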

Additional SOTA Features

Spotlighting - Data marking for untrusted content (Microsoft research)
Defense presets - paranoid, balanced, permissive
Structured threat events - Correlation via fingerprinting
Context decay - Risk scores decay over conversation
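
One plausible reading of context decay, combined with multi-turn tracking, is an accumulated session score that shrinks by a fixed factor on every turn. The sketch below assumes a decay factor of 0.5 and a threshold of 50; the class name and both numbers are illustrative and do not come from ClawGuard's source.

```typescript
// Tracks a per-session risk score. Each new turn first decays the
// carried-over score, then adds the turn's own score.
class SessionRiskTracker {
  private score = 0;
  private readonly decayFactor: number;
  private readonly threshold: number;

  constructor(decayFactor = 0.5, threshold = 50) {
    this.decayFactor = decayFactor;
    this.threshold = threshold;
  }

  addTurn(turnScore: number): number {
    this.score = this.score * this.decayFactor + turnScore;
    return this.score;
  }

  isOverThreshold(): boolean {
    return this.score >= this.threshold;
  }
}

const tracker = new SessionRiskTracker();
tracker.addTurn(30); // 30
tracker.addTurn(30); // 45
tracker.addTurn(30); // 52.5, now over the threshold of 50
```

Three mildly suspicious turns in a row cross the threshold even though none would alone, which is the split-payload case. A single benign turn afterwards halves the score again, so an old false positive does not taint the rest of the conversation.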

Quick Start

Use a preset:

{
  plugins: {
    entries: {
      clawguard: {
        enabled: true,
        config: {
          preset: "balanced"  // or "paranoid" or "permissive"
        }
      }
    }
  }
}

Custom configuration:

{
  plugins: {
    entries: {
      clawguard: {
        enabled: true,
        config: {
          inputGuard: {
            enabled: true,
            threshold: 50,
            blockOnDetection: false,
            useAdversarialDetection: true,
            useMultiTurnTracking: true
          },
          runtimeGuard: {
            enabled: true,
            dangerousTools: ["exec", "write", "edit"],
            blockExfilUrls: true,
            requireApproval: false
          },
          outputGuard: {
            enabled: true,
            redactCredentials: true,
            redactPII: true,
            canaryTokens: ["SECRET_CANARY_12345"]
          },
          spotlighting: {
            enabled: true,
            mode: "delimit",
            sources: ["web", "email"]
          },
          logging: {
            logThreats: true,
            structuredEvents: true
          }
        }
      }
    }
  }
}
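
The `spotlighting` block above sets `mode: "delimit"` for `web` and `email` sources. In the Spotlighting paper, delimiting means wrapping untrusted text in markers that the system prompt tells the model to treat as data. A minimal sketch of that idea follows; the marker strings, the `Source` type, and the `spotlight` function are assumptions made for this example, not ClawGuard's implementation.

```typescript
type Source = "user" | "web" | "email";

interface SpotlightConfig {
  enabled: boolean;
  mode: "delimit";
  sources: Source[];
}

// Wrap content from configured sources in explicit delimiters.
function spotlight(content: string, source: Source, config: SpotlightConfig): string {
  if (!config.enabled || config.sources.indexOf(source) === -1) return content;
  // Strip delimiter look-alikes so the content cannot close the block early.
  const cleaned = content.replace(/<<|>>/g, "");
  return `<<UNTRUSTED source=${source}>>\n${cleaned}\n<<END_UNTRUSTED>>`;
}

const spotlightConfig: SpotlightConfig = {
  enabled: true,
  mode: "delimit",
  sources: ["web", "email"],
};
```

With this configuration, direct user input passes through unchanged while fetched web pages and email bodies arrive wrapped, so the model can be instructed never to follow instructions found between the markers.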

Defense Presets

| Preset | Threshold | Block | Adversarial | Multi-turn | Approval | Spotlighting |
|--------|-----------|-------|-------------|------------|----------|--------------|
| paranoid | 25 | ✓ | ✓ | ✓ | ✓ | all sources |
| balanced | 50 | ✗ | ✓ | ✓ | ✗ | web, email |
| permissive | 75 | ✗ | ✗ | ✗ | ✗ | disabled |
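
The preset table can be read as a lookup from preset name to the options shown under Custom configuration. The sketch below encodes the table and shows how the threshold and block columns might interact. It assumes risk scores on a 0-100 scale where a score at or above the threshold counts as a detection, and that a non-blocking detection is only reported (`"warn"`); the `decide` function and both assumptions are illustrative.

```typescript
type PresetName = "paranoid" | "balanced" | "permissive";

interface PresetValues {
  threshold: number;
  blockOnDetection: boolean;
  useAdversarialDetection: boolean;
  useMultiTurnTracking: boolean;
  requireApproval: boolean;
  spotlightSources: "all" | string[];
}

// Values transcribed from the preset table.
const PRESETS: Record<PresetName, PresetValues> = {
  paranoid: {
    threshold: 25, blockOnDetection: true, useAdversarialDetection: true,
    useMultiTurnTracking: true, requireApproval: true, spotlightSources: "all",
  },
  balanced: {
    threshold: 50, blockOnDetection: false, useAdversarialDetection: true,
    useMultiTurnTracking: true, requireApproval: false, spotlightSources: ["web", "email"],
  },
  permissive: {
    threshold: 75, blockOnDetection: false, useAdversarialDetection: false,
    useMultiTurnTracking: false, requireApproval: false, spotlightSources: [],
  },
};

type Decision = "allow" | "warn" | "block";

// Lower thresholds are stricter: the same score trips "paranoid" first.
function decide(score: number, preset: PresetName): Decision {
  const p = PRESETS[preset];
  if (score < p.threshold) return "allow";
  return p.blockOnDetection ? "block" : "warn";
}
```

Under these assumptions a message scoring 40 is blocked by `paranoid`, allowed by `balanced`, and allowed by `permissive`, while a score of 60 produces a warning under `balanced`.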

CLI Commands

# Check status and stats
openclaw clawguard status

# View available presets
openclaw clawguard presets

# Test detection with source simulation
openclaw clawguard test "ignore previous instructions" --guard input --source web
openclaw clawguard test "sk-proj-abc123..." --guard output

# View recent threat events
openclaw clawguard events --limit 20

Slash Command

In any chat, use /clawguard to see current status and session stats.

How It Works

ClawGuard hooks into OpenClaw's plugin lifecycle:

User Message
     ↓
┌─────────────────────────────────────┐
│  INPUT GUARD (before_agent_start)   │
│  • Pattern matching (8 languages)   │
│  • Adversarial suffix detection     │
│  • Multi-turn context tracking      │
│  • Source-aware thresholds          │
└─────────────────────────────────────┘
     ↓
┌─────────────────────────────────────┐
│  RUNTIME GUARD (before_tool_call)   │
│  • Parameter validation             │
│  • Exfil URL blocking               │
│  • Dangerous command detection      │
└─────────────────────────────────────┘
     ↓
┌─────────────────────────────────────┐
│  OUTPUT GUARD (message_sending)     │
│  • Credential scanning              │
│  • PII detection                    │
│  • Canary token monitoring          │
│  • Auto-redaction                   │
└─────────────────────────────────────┘
     ↓
Safe Response
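
The flow in the diagram can be sketched as three handlers attached to the named lifecycle hooks. The `HookRegistry` class below is a stand-in written for this example and is not OpenClaw's plugin API; only the hook names (`before_agent_start`, `before_tool_call`, `message_sending`) come from the diagram, and the handlers are deliberately trivial.

```typescript
type HookName = "before_agent_start" | "before_tool_call" | "message_sending";

interface HookResult {
  payload: string;
  blocked: boolean;
}

type Handler = (payload: string) => HookResult;

// Stand-in registry for illustration only.
class HookRegistry {
  private handlers = new Map<HookName, Handler[]>();

  on(name: HookName, handler: Handler): void {
    const list = this.handlers.get(name) ?? [];
    list.push(handler);
    this.handlers.set(name, list);
  }

  // Run handlers in registration order, stopping at the first block.
  run(name: HookName, payload: string): HookResult {
    let current: HookResult = { payload, blocked: false };
    for (const handler of this.handlers.get(name) ?? []) {
      current = handler(current.payload);
      if (current.blocked) break;
    }
    return current;
  }
}

const hooks = new HookRegistry();

// Input guard: inspect the incoming message before the agent starts.
hooks.on("before_agent_start", (msg) => ({
  payload: msg,
  blocked: /ignore previous instructions/i.test(msg),
}));

// Runtime guard: inspect a tool call before it executes.
hooks.on("before_tool_call", (cmd) => ({
  payload: cmd,
  blocked: /\brm\s+-rf\b/.test(cmd),
}));

// Output guard: rewrite the outgoing message rather than blocking it.
hooks.on("message_sending", (out) => ({
  payload: out.replace(/\bAKIA[0-9A-Z]{16}\b/g, "[REDACTED]"),
  blocked: false,
}));
```

The input and runtime stages return a block decision, while the output stage transforms the payload, which mirrors the difference between rejecting a request and redacting a response.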

Research References

Adversarial Suffixes: Zou et al. "Universal and Transferable Adversarial Attacks on Aligned Language Models"
Spotlighting: Hines et al. (Microsoft) "Defending Against Indirect Prompt Injection Attacks With Spotlighting"
Lethal Trifecta: OpenClaw security model
Multi-turn Attacks: Schulhoff et al. "Ignore This Title and HackAPrompt"

Testing

cd projects/clawguard-plugin
bun test  # 63 tests

File Structure

src/
├── index.ts         # Plugin entry, lifecycle hooks, CLI
├── guards.ts        # Input/Runtime/Output guards
├── patterns.ts      # Detection patterns (injection, credentials, PII)
├── analyzers.ts     # SOTA: entropy, context tracker, spotlighting
├── guards.test.ts   # Guard tests (38)
└── analyzers.test.ts # Analyzer tests (25)

License

MIT

Authors

Built by MaxsClawd & Max — Day one, shipped.