@neuzhou/clawguard
v1.0.3
AI Agent Immune System - Security scanner, PII sanitizer, intent-action mismatch detector. 285+ patterns, OWASP Agentic AI Top 10. 100% local.
🛡️ ClawGuard
AI Agent Security & Observability Platform
285+ security patterns across 9 rule categories. Risk Score Engine with attack chain detection. Insider Threat Detection. Policy Engine for tool call governance. Zero native dependencies. SARIF output. Built for OpenClaw, works with any AI agent framework.
🔥 Why This Exists
AI agents have access to your files, tools, shell, and secrets. A single prompt injection can:
- Exfiltrate API keys via tool calls
- Hijack the agent's identity by overwriting personality files
- Register shadow MCP servers to intercept tool calls
- Install backdoored skills with obfuscated reverse shells
Beyond injected attacks, the agent itself can become the threat through self-preservation, deception, or goal misalignment.
ClawGuard catches these attacks before they execute.
🚀 Installation
As OpenClaw Skill (static scanning)
clawhub install ClawGuard-ai
Then ask your agent: "scan my skills for security threats"
As OpenClaw Hook Pack (real-time protection)
openclaw hooks install ClawGuard-ai
openclaw hooks enable ClawGuard-ai-guard
openclaw hooks enable ClawGuard-ai-policy
Every message is now automatically scanned. Critical threats trigger alerts.
As CLI Tool
npx ClawGuard-ai scan ./path/to/scan
As npm Library
npm install ClawGuard-ai
import { runSecurityScan, calculateRisk, evaluateToolCall } from 'ClawGuard-ai';
⚡ Quick Start
# Scan a skill directory for threats
npx ClawGuard-ai scan ./skills/
# Scan with strict mode (exit code 1 on high/critical findings)
npx ClawGuard-ai scan ./skills/ --strict
# Output SARIF for GitHub Code Scanning
npx ClawGuard-ai scan . --format sarif > results.sarif
# Generate default config
npx ClawGuard-ai init
🏗️ Architecture
┌──────────────────────────────────────────────────────┐
│ ClawGuard │
├──────────┬──────────┬──────────┬─────────────────────┤
│ CLI │ Hooks │ Scanner │ Dashboard :19790 │
├──────────┴──────────┴──────────┴─────────────────────┤
│ ┌──────────────┐ ┌─────────────┐ ┌────────────────┐ │
│ │ Risk Engine │ │Policy Engine│ │Insider Threat │ │
│ │ Score 0-100 │ │ allow/deny │ │ AI Misalign. │ │
│ │ Chain Detect │ │ exec/file/ │ │ 5 categories │ │
│ │ Multipliers │ │ browser/msg │ │ 39 patterns │ │
│ └──────────────┘ └─────────────┘ └────────────────┘ │
├──────────────────────────────────────────────────────┤
│ Security Engine — 285+ Patterns │
│ • Prompt Injection (93) • Data Leakage (62) │
│ • Insider Threat (39) • Supply Chain (35) │
│ • Identity Protection (19)• MCP Security (20) │
│ • File Protection (16) • Anomaly Detection │
│ • Compliance │
├──────────────────────────────────────────────────────┤
│ Exporters: JSONL · Syslog/CEF · Webhook · SARIF │
└──────────────────────────────────────────────────────┘
🗂️ Rule Categories
OWASP Agentic AI Top 10 Mapping
| Rule | OWASP Category | Patterns | Severity Range |
|---|---|---|---|
| prompt-injection | LLM01: Prompt Injection | 93 | warning → critical |
| data-leakage | LLM06: Sensitive Information Disclosure | 62 | info → critical |
| insider-threat | Agentic AI: Misalignment | 39 | warning → critical |
| supply-chain | Agentic AI: Supply Chain | 35 | warning → critical |
| mcp-security | Agentic AI: Tool Manipulation | 20 | warning → critical |
| identity-protection | Agentic AI: Identity Hijacking | 19 | warning → critical |
| file-protection | LLM07: Insecure Plugin Design | 16 | warning → critical |
| anomaly-detection | LLM04: Model Denial of Service | 6+ | warning → high |
| compliance | LLM09: Overreliance | 5+ | info → warning |
🎯 Key Features
Risk Score Engine
Weighted scoring with attack chain detection and multiplier system:
import { calculateRisk } from 'ClawGuard-ai';
const result = calculateRisk(findings);
// → { score: 87, verdict: 'MALICIOUS', icon: '🔴',
//     attackChains: ['credential-exfiltration'],
//     enrichedFindings: [...] }
- Severity weights: critical=40, high=15, medium=5, low=2
- Confidence scoring: every finding carries a confidence (0-1)
- Attack chain detection: auto-correlates findings into combo attacks
  - credential + exfiltration → 2.2x multiplier
  - identity-hijack + persistence → score ≥ 90
  - prompt-injection + worm → 1.2x multiplier
- Verdicts: ✅ CLEAN / 🟡 LOW / 🟠 SUSPICIOUS / 🔴 MALICIOUS
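The scoring rules listed above can be illustrated with a small standalone sketch. It is not ClawGuard's implementation: the function name `sketchRiskScore`, the `category` labels used to match chains, the verdict thresholds, the choice to scale each weight by confidence, and the cap at 100 are all assumptions made for illustration. Only the severity weights and the three chain rules come from the list above.

```javascript
// Severity weights as listed above.
const SEVERITY_WEIGHTS = { critical: 40, high: 15, medium: 5, low: 2 };

// Chain rules as listed above. The chain names and the category labels
// used for matching are hypothetical, chosen for this sketch.
const CHAINS = [
  { name: 'credential-exfiltration', needs: ['credential', 'exfiltration'], multiplier: 2.2 },
  { name: 'identity-persistence', needs: ['identity-hijack', 'persistence'], floor: 90 },
  { name: 'injection-worm', needs: ['prompt-injection', 'worm'], multiplier: 1.2 },
];

// Verdict thresholds are assumptions; the README does not state them.
function verdictFor(score) {
  if (score === 0) return 'CLEAN';
  if (score < 30) return 'LOW';
  if (score < 70) return 'SUSPICIOUS';
  return 'MALICIOUS';
}

function sketchRiskScore(findings) {
  // Base score: sum of severity weights, each scaled by confidence (0-1).
  let score = 0;
  for (const f of findings) {
    const weight = SEVERITY_WEIGHTS[f.severity] ?? 0;
    score += weight * (f.confidence ?? 1);
  }

  // A chain fires when every category it needs appears among the findings.
  const categories = new Set(findings.map((f) => f.category));
  const attackChains = [];
  for (const chain of CHAINS) {
    if (!chain.needs.every((c) => categories.has(c))) continue;
    attackChains.push(chain.name);
    if (chain.multiplier) score *= chain.multiplier;
    if (chain.floor) score = Math.max(score, chain.floor);
  }

  // Clamp to the 0-100 range shown in the architecture diagram.
  score = Math.min(100, Math.round(score));
  return { score, verdict: verdictFor(score), attackChains };
}
```

With these assumptions, one critical `credential` finding plus one high `exfiltration` finding gives a base of 55, which the 2.2x multiplier pushes past the cap, so the result is clamped to 100.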
🧠 Insider Threat Detection
Based on Anthropic's research on agentic misalignment, this module detects when AI agents themselves become threats:
- Self-Preservation (16 patterns): kill switch bypass, self-replication
- Information Leverage: reading secrets + composing threats
- Goal Conflict Reasoning: prioritizing own goals over user instructions
- Deception: impersonation, suppressing transparency
- Unauthorized Data Sharing: exfiltration planning, steganographic hiding
import { detectInsiderThreats } from 'ClawGuard-ai';
const threats = detectInsiderThreats(agentOutput);
🚦 Policy Engine
Evaluate tool call safety against configurable policies:
import { evaluateToolCall } from 'ClawGuard-ai';
const decision = evaluateToolCall('exec', { command: 'rm -rf /' });
// → { decision: 'deny', tool: 'exec', reason: 'Dangerous command', severity: 'critical' }
YAML policy configuration:
policies:
  exec:
    dangerous_commands:
      - rm -rf
      - mkfs
      - curl|bash
  file:
    deny_read:
      - /etc/shadow
      - '*.pem'
    deny_write:
      - '*.env'
  browser:
    block_domains:
      - evil.com
🔍 Prompt Injection: 13 Sub-Categories
- Direct instruction override — "ignore previous instructions"
- Role confusion / jailbreaks — DAN, developer mode
- Delimiter attacks — chat template delimiters
- Invisible Unicode — zero-width chars, directional overrides
- Multi-language — 12 languages (CN/JP/KR/AR/FR/DE/IT/RU...)
- Encoding evasion — Base64, hex, URL-encoded
- Indirect / embedded — HTML comments, tool output cascading
- Multi-turn manipulation — false memories, fake agreements
- Payload cascading — template injection, string interpolation
- Context window stuffing — oversized messages
- Prompt worm — self-replication, agent-to-agent propagation
- Trust exploitation — authority claims, fake audits
- Safeguard bypass — retry-on-block, rephrase-to-bypass
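To make one of these sub-categories concrete, here is a minimal sketch of an invisible-Unicode check. It is an illustration only, not one of ClawGuard's own patterns: the function name `findInvisibleChars` and the exact code point ranges are assumptions chosen to cover zero-width characters and directional controls.

```javascript
// Zero-width characters and bidirectional controls that can hide or reorder text:
// U+200B-U+200F (zero-width space/joiners, LRM/RLM), U+202A-U+202E (embeddings
// and overrides), U+2060-U+2064 (word joiner, invisible operators),
// U+2066-U+2069 (isolates), U+FEFF (zero-width no-break space / BOM).
const INVISIBLE_CHARS = /[\u200B-\u200F\u202A-\u202E\u2060-\u2064\u2066-\u2069\uFEFF]/g;

// Return the position and code point of every invisible character in the text.
function findInvisibleChars(text) {
  const hits = [];
  for (const match of text.matchAll(INVISIBLE_CHARS)) {
    const hex = match[0].codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
    hits.push({ index: match.index, codePoint: 'U+' + hex });
  }
  return hits;
}
```

A check this simple will also flag legitimate uses, such as the zero-width joiner inside emoji sequences or directional marks in right-to-left text, so a real scanner would likely weigh such hits by context rather than block on them outright.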
🔧 Programmatic Usage
import {
  runSecurityScan,
  calculateRisk,
  evaluateToolCall,
  detectInsiderThreats,
} from 'ClawGuard-ai';
// Scan content
const findings = runSecurityScan(message.content, 'inbound', context);
// Get risk score
const risk = calculateRisk(findings);
if (risk.verdict === 'MALICIOUS') { /* block */ }
// Check tool calls
const decision = evaluateToolCall('exec', { command }, policies);
if (decision.decision === 'deny') { /* reject */ }
// Check for insider threats
const threats = detectInsiderThreats(agentOutput);
📤 GitHub Actions / SARIF Integration
- name: Security Scan
  run: npx ClawGuard-ai scan . --format sarif > results.sarif
- name: Upload SARIF
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif
🛡️ Real-Time Protection (OpenClaw Hooks)
Install as a hook pack for automatic protection on every message:
openclaw hooks install ClawGuard-ai
openclaw hooks enable ClawGuard-ai-guard # Scans every message
openclaw hooks enable ClawGuard-ai-policy # Enforces tool call policies
ClawGuard-ai-guard: Hooks into message:received and message:sent, runs all 285+ patterns, logs findings, and alerts on critical/high threats.
ClawGuard-ai-policy: Evaluates outbound tool calls against security policies, blocks dangerous commands, and protects sensitive files.
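The kind of check the policy hook performs can be sketched as follows. This is a simplified illustration, not the package's code: `sketchEvaluateToolCall`, `globToRegExp`, and the argument shapes (`command`, `path`, `mode`, `url`) are hypothetical names. The policy object mirrors the YAML example in the Policy Engine section, and the returned object follows the shape of the `evaluateToolCall` output shown there.

```javascript
// Policy object mirroring the YAML example from the Policy Engine section.
const policies = {
  exec: { dangerous_commands: ['rm -rf', 'mkfs', 'curl|bash'] },
  file: { deny_read: ['/etc/shadow', '*.pem'], deny_write: ['*.env'] },
  browser: { block_domains: ['evil.com'] },
};

// Convert a simple glob such as '*.pem' into an anchored RegExp.
// Only '*' is treated as a wildcard; everything else is matched literally.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&').replace(/\*/g, '.*');
  return new RegExp('^' + escaped + '$');
}

function sketchEvaluateToolCall(tool, args, policy = policies) {
  const deny = (reason, severity) => ({ decision: 'deny', tool, reason, severity });

  if (tool === 'exec') {
    // Naive substring match: it catches 'rm -rf /' but would miss variants
    // such as 'curl URL | bash', which need real command parsing.
    const hit = policy.exec.dangerous_commands.find((c) => args.command.includes(c));
    if (hit) return deny('Dangerous command', 'critical');
  }

  if (tool === 'file') {
    // args.mode ('read' or 'write') selects which deny list applies.
    const globs = args.mode === 'write' ? policy.file.deny_write : policy.file.deny_read;
    if (globs.some((g) => globToRegExp(g).test(args.path))) {
      return deny('Protected file', 'high');
    }
  }

  if (tool === 'browser') {
    let host;
    try {
      host = new URL(args.url).hostname;
    } catch {
      return deny('Unparseable URL', 'medium');
    }
    // Block the listed domain and any of its subdomains.
    const blocked = policy.browser.block_domains.some(
      (d) => host === d || host.endsWith('.' + d)
    );
    if (blocked) return deny('Blocked domain', 'high');
  }

  return { decision: 'allow', tool };
}
```

For example, `sketchEvaluateToolCall('file', { path: '/app/.env', mode: 'write' })` is denied by the `'*.env'` rule, while reading the same path is allowed because `deny_read` does not list it. The severity values attached to file and browser denials are placeholders.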
📚 References
- OWASP Top 10 for LLM Applications
- OWASP Agentic AI Top 10 (2026)
- Anthropic: Research on Agentic Misalignment
- OWASP Guide for Secure MCP Server Development
📜 License
AGPL-3.0 © Kang Zhou
ClawGuard is dual-licensed:
- Open Source: AGPL-3.0 — free for open-source use
- Commercial: Commercial License — for proprietary/SaaS use
Contributors must agree to our CLA to enable dual licensing.
For commercial inquiries: [email protected]
