@capsulesecurity/clawguard

v0.1.5

Published

4 months ago

Security guard plugin for OpenClaw - uses LLM as a Judge to detect and block risky tool calls

0High
0Medium
0Low

lidanhazoutcapsule

openclaw security llm agent guardrails

ClawGuard by Capsule

ClawGuard

A security guard plugin for OpenClaw that monitors and validates tool calls before execution using an LLM as a Judge approach for risk detection.

Features

Tool Call Logging - Logs full JSON of every tool call before execution
LLM as a Judge - Uses a secondary LLM to judge and evaluate tool calls for security risks
Configurable Blocking - Automatically blocks high/critical risk operations based on the judge's verdict
Custom Judge Prompts - Override the default judging prompt for security evaluation

Installation

openclaw plugins install @capsulesecurity/clawguard

Configuration

| Option | Type | Default | Description | |--------|------|---------|-------------| | enabled | boolean | true | Enable or disable the plugin | | logToolCalls | boolean | true | Log full tool call JSON to logger | | securityCheckEnabled | boolean | true | Enable LLM as a Judge for security evaluation | | securityPrompt | string | (built-in) | Custom prompt for the judge LLM | | blockOnRisk | boolean | true | Block tool calls judged as high/critical risk | | timeoutMs | number | 15000 | Timeout for judge evaluation in milliseconds | | maxContextWords | number | 2000 | Maximum words of session context to include | | gatewayHost | string | 127.0.0.1 | Gateway host for LLM calls | | gatewayPort | number | 18789 | Gateway port for LLM calls |

Example Configuration

{
  "plugins": {
    "capsule-claw-guard": {
      "enabled": true,
      "logToolCalls": true,
      "securityCheckEnabled": true,
      "blockOnRisk": true,
      "timeoutMs": 20000
    }
  }
}

Security Risks Evaluated

The judge LLM evaluates tool calls for:

Command injection (shell commands with untrusted input)
Path traversal attacks (accessing files outside allowed directories)
Sensitive data exposure (reading credentials, secrets, private keys)
Destructive operations (deleting important files, dropping databases)
Network attacks (unauthorized external requests, data exfiltration)
Privilege escalation attempts
Malicious file operations (writing to system directories)
SQL injection patterns
Code execution with untrusted input
Rogue agent behavior (attempts to bypass safety controls, deceptive actions, unauthorized autonomous operations)

Custom Judge Prompt

You can provide a custom prompt for the judge LLM using the securityPrompt configuration option. Use {TOOL_CALL_JSON} as a placeholder for the tool call data:

{
  "plugins": {
    "capsule-claw-guard": {
      "securityPrompt": "You are a security judge. Evaluate this tool call:\n{TOOL_CALL_JSON}\n\nReturn your verdict as JSON: {\"isRisk\": boolean, \"riskLevel\": \"none\"|\"low\"|\"medium\"|\"high\"|\"critical\", \"riskType\": string, \"reason\": string}"
    }
  }
}

Requirements

The plugin makes HTTP calls to the OpenClaw Gateway's /v1/chat/completions endpoint for LLM evaluation. This requires:

Gateway running: The OpenClaw gateway must be running and accessible
Enable chat completions endpoint: Set gateway.http.endpoints.chatCompletions.enabled to true in your config:
```
openclaw config set gateway.http.endpoints.chatCompletions.enabled true
```
Authentication (optional): If your gateway requires authentication, set one of:
- OPENCLAW_GATEWAY_TOKEN environment variable
- OPENCLAW_GATEWAY_PASSWORD environment variable

How It Works

The plugin hooks into before_tool_call events
Logs the full tool call JSON (if logging enabled)
Loads the session context from session files (limited by maxContextWords)
Sends both the tool call and session context to the judge LLM for security evaluation
The judge returns a verdict with risk level and reasoning
If judged as high/critical risk and blocking is enabled, the tool call is blocked
All verdicts are logged for audit purposes

Session Context

The plugin loads conversation history from the session files to provide context for the judge LLM. This allows the judge to make more informed decisions by understanding the conversation flow that led to the tool call.

Session files are located at ~/.openclaw/agents/{agentId}/sessions/*.jsonl
The context is limited by word count (default: 2000 words) to manage token usage
Most recent messages are prioritized when truncating
Only user and assistant messages are included (system messages are filtered out)

License

MIT