@neuzhou/clawguard

v1.0.3

Published 2 months ago

AI Agent Immune System - Security scanner, PII sanitizer, intent-action mismatch detector. 285+ patterns, OWASP Agentic AI Top 10. 100% local.

kazhou

ai-agent security scanner pii sanitizer owasp prompt-injection clawguard ai-safety guardrails openclaw

🛡️ ClawGuard

AI Agent Security & Observability Platform

285+ security patterns across 9 rule categories. Risk Score Engine with attack chain detection. Insider Threat Detection. Policy Engine for tool call governance. Zero native dependencies. SARIF output. Built for OpenClaw, works with any AI agent framework.

🔥 Why This Exists

AI agents have access to your files, tools, shell, and secrets. A single prompt injection can:

Exfiltrate API keys via tool calls
Hijack the agent's identity by overwriting personality files
Register shadow MCP servers to intercept tool calls
Install backdoored skills with obfuscated reverse shells

The agent itself can also become the threat through self-preservation, deception, or goal misalignment.

ClawGuard catches these attacks before they execute.

🚀 Installation

As OpenClaw Skill (static scanning)

clawhub install ClawGuard-ai

Then ask your agent: "scan my skills for security threats"

As OpenClaw Hook Pack (real-time protection)

openclaw hooks install ClawGuard-ai
openclaw hooks enable ClawGuard-ai-guard
openclaw hooks enable ClawGuard-ai-policy

Every message is now scanned automatically. Critical and high-severity threats trigger alerts.

As CLI Tool

npx ClawGuard-ai scan ./path/to/scan

As npm Library

npm install ClawGuard-ai

import { runSecurityScan, calculateRisk, evaluateToolCall } from 'ClawGuard-ai';

⚡ Quick Start

# Scan a skill directory for threats
npx ClawGuard-ai scan ./skills/

# Scan with strict mode (exit code 1 on high/critical findings)
npx ClawGuard-ai scan ./skills/ --strict

# Output SARIF for GitHub Code Scanning
npx ClawGuard-ai scan . --format sarif > results.sarif

# Generate default config
npx ClawGuard-ai init

🏗️ Architecture

┌──────────────────────────────────────────────────────┐
│                      ClawGuard                       │
├──────────┬──────────┬──────────┬─────────────────────┤
│  CLI     │  Hooks   │ Scanner  │   Dashboard :19790  │
├──────────┴──────────┴──────────┴─────────────────────┤
│  ┌──────────────┐ ┌─────────────┐ ┌────────────────┐ │
│  │ Risk Engine  │ │Policy Engine│ │Insider Threat  │ │
│  │ Score 0-100  │ │ allow/deny  │ │ AI Misalign.   │ │
│  │ Chain Detect │ │ exec/file/  │ │ 5 categories   │ │
│  │ Multipliers  │ │ browser/msg │ │ 39 patterns    │ │
│  └──────────────┘ └─────────────┘ └────────────────┘ │
├──────────────────────────────────────────────────────┤
│            Security Engine: 285+ Patterns            │
│  • Prompt Injection (93)   • Data Leakage (62)       │
│  • Insider Threat (39)     • Supply Chain (35)       │
│  • Identity Protection (19)• MCP Security (20)       │
│  • File Protection (16)    • Anomaly Detection       │
│  • Compliance                                        │
├──────────────────────────────────────────────────────┤
│  Exporters: JSONL · Syslog/CEF · Webhook · SARIF     │
└──────────────────────────────────────────────────────┘

🗂️ Rule Categories

OWASP Agentic AI Top 10 Mapping

| Rule | OWASP Category | Patterns | Severity Range |
|---|---|---|---|
| prompt-injection | LLM01: Prompt Injection | 93 | warning → critical |
| data-leakage | LLM06: Sensitive Information Disclosure | 62 | info → critical |
| insider-threat | Agentic AI: Misalignment | 39 | warning → critical |
| supply-chain | Agentic AI: Supply Chain | 35 | warning → critical |
| mcp-security | Agentic AI: Tool Manipulation | 20 | warning → critical |
| identity-protection | Agentic AI: Identity Hijacking | 19 | warning → critical |
| file-protection | LLM07: Insecure Plugin Design | 16 | warning → critical |
| anomaly-detection | LLM04: Model Denial of Service | 6+ | warning → high |
| compliance | LLM09: Overreliance | 5+ | info → warning |

🎯 Key Features

Risk Score Engine

Weighted scoring with attack chain detection and multiplier system:

import { calculateRisk } from 'ClawGuard-ai';

const result = calculateRisk(findings);
// → { score: 87, verdict: 'MALICIOUS', icon: '🔴',
//    attackChains: ['credential-exfiltration'],
//    enrichedFindings: [...] }

Severity weights: critical=40, high=15, medium=5, low=2
Confidence scoring: every finding carries a confidence (0-1)
Attack chain detection: auto-correlates findings into combo attacks
- credential + exfiltration → 2.2x multiplier
- identity-hijack + persistence → score ≥ 90
- prompt-injection + worm → 1.2x multiplier
Verdicts: ✅ CLEAN / 🟡 LOW / 🟠 SUSPICIOUS / 🔴 MALICIOUS
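
The scoring rules above can be sketched roughly as follows. This is an illustration only, not ClawGuard's actual implementation: the `Finding` shape, the category labels, the `identity-persistence` and `prompt-worm` chain names, scaling each weight by confidence, clamping to 100, and the `sketchRiskScore` name are all assumptions. Only the severity weights, the two multipliers, and the score floor of 90 come from the list above.

```typescript
// Hypothetical finding shape; the library's real type may differ.
type Severity = 'critical' | 'high' | 'medium' | 'low';

interface Finding {
  category: string;   // assumed labels such as 'credential' or 'exfiltration'
  severity: Severity;
  confidence: number; // 0-1
}

// Severity weights as listed in this README.
const WEIGHTS: Record<Severity, number> = { critical: 40, high: 15, medium: 5, low: 2 };

interface ChainRule {
  name: string;
  needs: string[];     // categories that must all be present
  multiplier?: number; // scales the running score
  floor?: number;      // minimum score once the chain is detected
}

// Chain rules as listed in this README; the category labels are assumptions.
const CHAINS: ChainRule[] = [
  { name: 'credential-exfiltration', needs: ['credential', 'exfiltration'], multiplier: 2.2 },
  { name: 'identity-persistence', needs: ['identity-hijack', 'persistence'], floor: 90 },
  { name: 'prompt-worm', needs: ['prompt-injection', 'worm'], multiplier: 1.2 },
];

function sketchRiskScore(findings: Finding[]): { score: number; attackChains: string[] } {
  // Assumption: each finding contributes weight * confidence.
  let score = 0;
  for (const f of findings) score += WEIGHTS[f.severity] * f.confidence;

  const attackChains: string[] = [];
  for (const chain of CHAINS) {
    const complete = chain.needs.every((c) => findings.some((f) => f.category === c));
    if (!complete) continue;
    attackChains.push(chain.name);
    if (chain.multiplier !== undefined) score *= chain.multiplier;
    if (chain.floor !== undefined) score = Math.max(score, chain.floor);
  }

  // Assumption: the final score is rounded and clamped to 0-100.
  return { score: Math.min(100, Math.round(score)), attackChains };
}
```

Under these assumptions, two fully confident `high` findings in the `credential` and `exfiltration` categories give a base of 30, which the 2.2x multiplier raises to 66.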

🧠 Insider Threat Detection

Based on Anthropic's research on agentic misalignment, detects when AI agents themselves become threats:

Self-Preservation (16 patterns): kill switch bypass, self-replication
Information Leverage: reading secrets + composing threats
Goal Conflict Reasoning: prioritizing own goals over user instructions
Deception: impersonation, suppressing transparency
Unauthorized Data Sharing: exfiltration planning, steganographic hiding

import { detectInsiderThreats } from 'ClawGuard-ai';
const threats = detectInsiderThreats(agentOutput);

🚦 Policy Engine

Evaluate tool call safety against configurable policies:

import { evaluateToolCall } from 'ClawGuard-ai';

const decision = evaluateToolCall('exec', { command: 'rm -rf /' });
// → { decision: 'deny', tool: 'exec', reason: 'Dangerous command', severity: 'critical' }

YAML policy configuration:

policies:
  exec:
    dangerous_commands:
      - rm -rf
      - mkfs
      - curl|bash
  file:
    deny_read:
      - /etc/shadow
      - '*.pem'
    deny_write:
      - '*.env'
  browser:
    block_domains:
      - evil.com
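
To illustrate how a policy like the one above might be applied to a tool call, here is a minimal sketch. It does not describe ClawGuard's implementation: the `Policies` type, the `sketchEvaluate` function, the `action`, `path`, and `url` argument names, the in-order substring matching for commands, and the `*`-only glob handling are assumptions made for the example.

```typescript
// Hypothetical in-memory form of the YAML policy above.
interface Policies {
  exec?: { dangerous_commands?: string[] };
  file?: { deny_read?: string[]; deny_write?: string[] };
  browser?: { block_domains?: string[] };
}

interface SketchDecision {
  decision: 'allow' | 'deny';
  tool: string;
  reason?: string;
}

// Convert a simple glob such as '*.pem' into an anchored RegExp (only '*' is supported).
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&').replace(/\*/g, '.*');
  return new RegExp('^' + escaped + '$');
}

// Assumption: an entry like 'curl|bash' means "curl, later followed by bash".
function containsInOrder(command: string, entry: string): boolean {
  let from = 0;
  for (const part of entry.split('|')) {
    const needle = part.trim();
    const at = command.indexOf(needle, from);
    if (at === -1) return false;
    from = at + needle.length;
  }
  return true;
}

function sketchEvaluate(tool: string, args: Record<string, string>, policies: Policies): SketchDecision {
  const deny = (reason: string): SketchDecision => ({ decision: 'deny', tool, reason });

  if (tool === 'exec') {
    const command = args.command || '';
    for (const entry of policies.exec?.dangerous_commands ?? []) {
      if (containsInOrder(command, entry)) return deny('Dangerous command: ' + entry);
    }
  }
  if (tool === 'file') {
    const globs = (args.action === 'write' ? policies.file?.deny_write : policies.file?.deny_read) ?? [];
    for (const glob of globs) {
      if (globToRegExp(glob).test(args.path || '')) return deny('Protected path: ' + glob);
    }
  }
  if (tool === 'browser') {
    // Extract the host portion of the URL without relying on a URL parser.
    const match = /^[a-z][a-z0-9+.-]*:\/\/([^/:?#]+)/i.exec(args.url || '');
    const host = match ? match[1].toLowerCase() : '';
    for (const domain of policies.browser?.block_domains ?? []) {
      // Deny the domain itself and any of its subdomains.
      if (host === domain || host.slice(-(domain.length + 1)) === '.' + domain) {
        return deny('Blocked domain: ' + domain);
      }
    }
  }
  return { decision: 'allow', tool };
}

// The YAML policy above, expressed as a plain object for this sketch.
const examplePolicies: Policies = {
  exec: { dangerous_commands: ['rm -rf', 'mkfs', 'curl|bash'] },
  file: { deny_read: ['/etc/shadow', '*.pem'], deny_write: ['*.env'] },
  browser: { block_domains: ['evil.com'] },
};
```

With `examplePolicies`, this sketch denies `rm -rf /`, a read of any `.pem` file, and a write to any `.env` file. It allows a read of a `.env` file because `*.env` appears only under `deny_write`.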

🔍 Prompt Injection — 13 Sub-Categories

Direct instruction override — "ignore previous instructions"
Role confusion / jailbreaks — DAN, developer mode
Delimiter attacks — chat template delimiters
Invisible Unicode — zero-width chars, directional overrides
Multi-language — 12 languages (CN/JP/KR/AR/FR/DE/IT/RU...)
Encoding evasion — Base64, hex, URL-encoded
Indirect / embedded — HTML comments, tool output cascading
Multi-turn manipulation — false memories, fake agreements
Payload cascading — template injection, string interpolation
Context window stuffing — oversized messages
Prompt worm — self-replication, agent-to-agent propagation
Trust exploitation — authority claims, fake audits
Safeguard bypass — retry-on-block, rephrase-to-bypass
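
As a rough illustration of pattern-based detection for two of these sub-categories, the sketch below checks for a direct instruction override and for invisible Unicode. The rule IDs, the regular expressions, and the `sketchScan` function are hypothetical examples and are not taken from ClawGuard's rule set.

```typescript
interface PatternRule {
  id: string;       // hypothetical rule identifier
  category: string; // sub-category name, assumed for this sketch
  regex: RegExp;
}

const SKETCH_RULES: PatternRule[] = [
  {
    id: 'override-1',
    category: 'direct-instruction-override',
    // Matches phrases such as "ignore previous instructions" or "ignore all prior instructions".
    regex: /ignore\s+(all\s+)?(previous|prior)\s+instructions/i,
  },
  {
    id: 'unicode-1',
    category: 'invisible-unicode',
    // Zero-width characters (U+200B to U+200D, U+FEFF) and directional overrides (U+202A to U+202E).
    regex: /[\u200B-\u200D\uFEFF\u202A-\u202E]/,
  },
];

// Return the categories of all rules that match the text.
function sketchScan(text: string): string[] {
  return SKETCH_RULES.filter((rule) => rule.regex.test(text)).map((rule) => rule.category);
}
```

A real rule set needs more care than this. For example, U+200D (zero-width joiner) also appears in legitimate emoji sequences, so a production pattern would have to account for that to avoid false positives.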

🔧 Programmatic Usage

import {
  runSecurityScan,
  calculateRisk,
  evaluateToolCall,
  detectInsiderThreats,
} from 'ClawGuard-ai';

// Scan content (`message` and `context` are provided by your agent runtime)
const findings = runSecurityScan(message.content, 'inbound', context);

// Get risk score
const risk = calculateRisk(findings);
if (risk.verdict === 'MALICIOUS') { /* block */ }

// Check tool calls (`command` is the requested shell command, `policies` is your loaded policy config)
const decision = evaluateToolCall('exec', { command }, policies);
if (decision.decision === 'deny') { /* reject */ }

// Check for insider threats (`agentOutput` is the text the agent produced)
const threats = detectInsiderThreats(agentOutput);

📤 GitHub Actions / SARIF Integration

- name: Security Scan
  run: npx ClawGuard-ai scan . --format sarif > results.sarif

- name: Upload SARIF
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif
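
For orientation, the file uploaded in this workflow follows the SARIF 2.1.0 format. The sketch below builds a minimal SARIF log from a hypothetical finding. The `SketchFinding` shape, the severity-to-level mapping, and the `toSarif` helper are assumptions for illustration and do not describe ClawGuard's actual exporter or its exact output.

```typescript
// Hypothetical finding shape for this sketch.
interface SketchFinding {
  ruleId: string;
  severity: 'critical' | 'high' | 'medium' | 'low';
  message: string;
  file: string;
  line: number;
}

// Assumed mapping from severities to SARIF result levels ('error', 'warning', 'note').
function toLevel(severity: SketchFinding['severity']): string {
  if (severity === 'critical' || severity === 'high') return 'error';
  return severity === 'medium' ? 'warning' : 'note';
}

function toSarif(findings: SketchFinding[]) {
  // List each rule once in the tool driver section.
  const ruleIds = findings.map((f) => f.ruleId).filter((id, i, all) => all.indexOf(id) === i);
  return {
    $schema: 'https://json.schemastore.org/sarif-2.1.0.json',
    version: '2.1.0',
    runs: [
      {
        tool: { driver: { name: 'ClawGuard', rules: ruleIds.map((id) => ({ id })) } },
        results: findings.map((f) => ({
          ruleId: f.ruleId,
          level: toLevel(f.severity),
          message: { text: f.message },
          locations: [
            {
              physicalLocation: {
                artifactLocation: { uri: f.file },
                region: { startLine: f.line },
              },
            },
          ],
        })),
      },
    ],
  };
}
```

Serializing the result with `JSON.stringify(toSarif(findings), null, 2)` yields a document with the same overall structure as the one the upload step consumes.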

🛡️ Real-Time Protection (OpenClaw Hooks)

Install as a hook pack for automatic protection on every message:

openclaw hooks install ClawGuard-ai
openclaw hooks enable ClawGuard-ai-guard    # Scans every message
openclaw hooks enable ClawGuard-ai-policy   # Enforces tool call policies

ClawGuard-ai-guard — Hooks into message:received and message:sent, runs all 285+ patterns, logs findings, and alerts on critical/high threats.

ClawGuard-ai-policy — Evaluates outbound tool calls against security policies, blocks dangerous commands, and protects sensitive files.

📜 License

ClawGuard is dual-licensed:

Open Source: AGPL-3.0 — free for open-source use
Commercial: Commercial License — for proprietary/SaaS use

Contributors must agree to our CLA to enable dual licensing.

For commercial inquiries: [email protected]