guardclaw

v0.1.10

Published

3 months ago

AI Agent Safety Monitor with LLM-based Command Safeguard

0High
0Medium
0Low

zaixishang

gyq1101

openclaw agent monitoring safety ai llm

The Problem

AI agents usually fail in one of two ways:

Too loose: dangerous operations run with little control
Too strict: safe operations keep interrupting the user

GuardClaw sits between the agent and its tools, scores each action with a local or cloud judge, and makes a practical decision:

safe actions continue without friction
suspicious actions are surfaced for approval

Quick Start

Requires Node.js >= 18.

npm install -g guardclaw
guardclaw start

Takes about 30 seconds. No account needed. Uninstall any time with npm uninstall -g guardclaw.

First launch opens an interactive wizard:

Evaluation mode: local / mixed / cloud
LLM backend: local (LM Studio, Ollama, built-in MLX) and/or cloud (Claude, OpenAI Codex, MiniMax, Kimi, OpenRouter, Gemini, OpenAI). Cloud providers support OAuth or API key.
Response mode: Auto (warn and flag risky calls) or Monitor only (log without intervention)
Agent connections: auto-detects installed agents and installs hooks/plugins with one confirm

Re-run any time with guardclaw setup. Restart the target agent after installing hooks.

Supported Agents

Works with 7 major coding agents out of the box. Full pre-tool blocking on 6, shell-only on Cursor.

| Agent | Integration | Pre-tool blocking | Approval flow | Notes | |-------|------------|:-----------------:|:-------------:|-------| | Claude Code | HTTP hooks | ✅ | ✅ | Full support | | Codex CLI | Command hooks | ✅ | ✅ | Full support | | Gemini CLI | HTTP hooks | ✅ | ✅ | Full support | | OpenCode | HTTP hooks | ✅ | ✅ | Full support | | OpenClaw | WebSocket plugin | ✅ | ✅ | Full support; requires gateway | | Cursor | Shell hooks | ⚠️ | ✅ | Shell commands only; file operations (read/write/edit) are not intercepted | | GitHub Copilot CLI | HTTP hooks (shared with CC) | ✅ | ✅ | Full support via Claude Code hook endpoint |

Product Tour

Dashboard

The web dashboard (localhost:3002) is the central control plane: event timeline, risk filters, session visibility, blocking toggles.

Dashboard

Security Scan

Static checks for MCP configuration, secrets exposure, and agentic-risk patterns.

Security Scan

Menu Bar App (macOS)

GuardClawBar lives in the macOS menu bar so you can monitor GuardClaw without keeping the dashboard tab open. The popover shows live per-agent event counts, recent risky calls, and a quick toggle for blocking mode. Each agent (Claude Code, Codex, Gemini, OpenClaw) has its own tab. Approval prompts fire as native notifications so you can allow or deny right from the corner of your screen.

How It Works

Agent calls a tool (exec, read, write, browser, etc.)
GuardClaw captures context and sends it to the local judge model
The model returns risk score + verdict + reasoning
GuardClaw logs, allows, or gates execution based on policy

Risk Tiers

| Score | Verdict | Behavior | | ----- | ------- | -------- | | 1-3 | SAFE | Runs normally | | 4-7 | WARNING | Runs with stronger audit signal | | 8-10 | HIGH RISK | Requires approval / blocking |

Architecture

GuardClaw has four core subsystems. The short version:

Two-Stage Judge. A local judge model (via LM Studio / Ollama / MLX) scores every tool call. High-risk calls (score ≥ 8) escalate to a cloud judge (Claude) with richer context.
Multi-Level Security Memory. Four levels of memory (raw events → session brief → project context → global knowledge) designed to catch long-range attacks that unfold over hundreds of tool calls.
Adaptive Memory & Chain Analysis. Learns from your approve/deny decisions, tracks tool-call sequences per session, and flags multi-step exfiltration like read ~/.ssh/id_rsa → curl evil.com.
Active Intervention. Injects safety guidance into the agent's context before risky calls, dual-channel approval (agent dialog + dashboard + optional Telegram/Discord/WhatsApp push), circuit breaker on repeated denials, credential scanning on tool output, prompt injection detection, skill security review, and DTrace syscall monitoring (macOS).

Fast Paths (before the LLM)

Three checks run before the local judge to keep latency low:

High-risk patterns. Regex match on known-bad commands (curl | bash, nc -e) → instant score 9.
Safe fast-path. Allowlist of safe commands (git status, npm test) → instant score 1.
Agent permissions. Reads each agent's own config and auto-allows anything the agent already permits.

CLI Reference

guardclaw start          # start server (opens dashboard)
guardclaw stop           # stop server
guardclaw setup          # re-run the interactive setup wizard
guardclaw status         # server & judge status
guardclaw check <cmd>    # manually risk-score a command
guardclaw help           # full command list

Full CLI documentation with every command, flag, and option: https://tobyge.github.io/GuardClaw/docs/cli/overview

Development

To hack on GuardClaw itself, install from source:

git clone https://github.com/TobyGE/GuardClaw.git
cd GuardClaw
nvm use || nvm install
npm ci && npm ci --prefix client
npm run build
npm link
guardclaw start

npm run dev runs server (nodemon) + client (Vite) concurrently. See CLAUDE.md for the architecture overview.

Feedback & Issues

We genuinely want to hear from you. GuardClaw is early, and real-world usage is the best way to make it better.

🐛 Found a bug? Open an issue
💡 Have a feature idea? Open an issue — any suggestion is welcome
🤔 Stuck on setup? Open an issue — no question is too small
❤️ Something you love? Open an issue too — we want to know what's working

All feedback, big or small, goes through GitHub Issues. No template required, no account hoops — just say what's on your mind.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

The Problem

Quick Start

Supported Agents

Product Tour

Dashboard

Security Scan

Menu Bar App (macOS)

How It Works

Risk Tiers

Architecture

Fast Paths (before the LLM)

CLI Reference

Development

Feedback & Issues

Links