@vainplex/openclaw-governance

v0.11.2

Contextual, learning, cross-agent governance for AI agents. Includes Response Gate for pre-response enforcement.

openclaw governance ai-agents policy-engine trust audit

📦 This plugin is part of the Vainplex OpenClaw Suite — a collection of production plugins that turn OpenClaw into a self-governing, learning system. See the monorepo for the full picture.

In February 2026, UC Berkeley's Center for Long-Term Cybersecurity published a 67-page framework for governing autonomous AI agents. The same month, Microsoft's Cyber Pulse report revealed that 80% of Fortune 500 companies now run active AI agents — and 29% of employees use unsanctioned ones. Microsoft followed up with a threat analysis specific to OpenClaw, outlining identity, isolation, and runtime risks for self-hosted agents.

The gap is clear: agents are everywhere, governance is nowhere. The Berkeley framework defines what's needed. The existing tools — scanners, input/output filters, output validators — cover fragments. None of them do contextual, learning, runtime governance across agents.

This plugin does. It implements 9 of Berkeley's 12 core requirements today, with the remaining 3 designed and scheduled.

Zero runtime dependencies. Hundreds of tests. Production since February 2026.

Berkeley/Microsoft Compliance Mapping

UC Berkeley's Agentic AI Risk-Management Standards Profile and Microsoft's governance requirements define what responsible agent infrastructure looks like. Here's where this plugin stands:

| Requirement | Our Implementation | Status |
|---|---|---|
| Agent Registry | Trust config with per-agent scores, all 9 agents registered | ✅ Implemented |
| Access Control / Least Privilege | Per-agent tool blocking, trust tier-based permissions | ✅ Implemented |
| Real-time Monitoring | Every tool call evaluated against policies before execution | ✅ Implemented |
| Activity Logging / Audit Trail | Append-only JSONL, ISO 27001 / SOC 2 / NIS2 control mapping | ✅ Implemented |
| Emergency Controls | Night Mode (time-based blocking), Rate Limiter (frequency cap) | ✅ Implemented |
| Cascading Agent Policies | Cross-agent governance: parent policies propagate to sub-agents | ✅ Implemented |
| Autonomy Levels | Trust tiers (0–100, five levels), functionally equivalent to Berkeley's L0–L5 | ✅ Implemented |
| Credential Protection | 3-layer redaction with SHA-256 vault, 17 built-in patterns, fail-closed | ✅ Implemented |
| Human-in-the-Loop | Approval 2FA: TOTP-based approval for agent tool calls. Session approval mode: one code unlocks 10 minutes of auto-approved execution. | ✅ Implemented |
| Semantic Intent Analysis | LLM-powered intent classification before tool execution | 📋 Planned |
| Multi-Agent Interaction Monitoring | Agent-to-agent message governance | 📋 Planned |
| Tamper-evident Audit | Hash-chain audit trail for compliance verification | 📋 Planned |

9 implemented. 3 planned. Production since 2026-02-18.

They Scan. We Govern.

Most tools in this space solve a piece of the problem. None of them solve the whole thing.

| Tool | What It Does | What's Missing |
|---|---|---|
| Invariant Labs → Snyk | Runtime guardrails, MCP scanning, trace analysis | Acquired by Snyk; enterprise-only. No trust scores. No cross-agent governance. No compliance audit trail. |
| NVIDIA NeMo Guardrails | Input/output filtering, topical control | Filters messages, not tool calls. No agent context. No trust awareness. No multi-agent policies. |
| GuardrailsAI | Output validation, schema enforcement | Validates what comes out. No idea who called what, when, or whether they should have. Python-only. |
| SecureClaw | 56 audit checks, 5 hardening modules, OWASP-aligned | Scanner, not runtime. Tells you what's wrong but doesn't prevent it. No policies, no trust. |
| OpenClaw built-in | Tool allowlists, realpath containment, plugin sandboxing | Static config. No trust scoring. No time-awareness. No learning. No compliance mapping. |

The difference: those tools operate on inputs and outputs. This plugin operates on decisions — which tool, which agent, what time, what trust level, what frequency, what context. Then it decides, logs, and learns.

As Peter Steinberger noted, this is what a trust model for AI agents should look like.

What It Does

Agent calls exec("git push origin main")
  → Governance evaluates: tool + time + trust + frequency + context
  → Verdict: DENY — "Forge cannot push to main (trust: restricted, score: 32)"
  → Audit record written (JSONL, compliance-mapped)
  → Agent gets a clear rejection reason

Core Features

Contextual Policies — Not just "which tool" but "which tool, when, by whom, at what risk level"
Learning Trust — Score 0–100, five tiers, decay on inactivity. Sub-agents can never exceed parent's trust.
Cross-Agent Governance — Parent policies cascade to sub-agents. Deny on main = deny on forge.
Compliance Audit Trail — Append-only JSONL with ISO 27001/SOC 2/NIS2 control mapping.

v0.6: Session Trust (RFC-008)

Trust is not a config value. It's earned per conversation.

Two-Tier Trust Model — Persistent agent trust (configured baseline) + ephemeral session trust (earned in real-time). A fresh session starts at 70% of agent trust and climbs with successful tool calls.
Session Signals — Success (+1), policy block (−2), credential violation (−10). Clean streak bonus after 10 consecutive good calls.
Ceiling & Floor — Sessions can earn up to 120% of agent trust, but can always drop to zero.
Adaptive Display — [Governance] Agent: main (60/trusted) | Session: 42/standard | Policies: 4

No existing governance tool implements session-level trust. Static per-agent allowlists don't capture that the same agent performs differently across sessions.
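
The session-trust rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the plugin's implementation: the `SessionTrust` class, its method names, and the size and timing of the streak bonus (`streakBonus`) are hypothetical. Only the 70% starting point, the +1/−2/−10 signals, the 10-call streak, the 120% ceiling, and the zero floor come from the text.

```typescript
type Signal = "success" | "policy_block" | "credential_violation";

// Deltas taken from the Session Signals line above.
const DELTAS: Record<Signal, number> = {
  success: 1,
  policy_block: -2,
  credential_violation: -10,
};

class SessionTrust {
  score: number;
  private streak = 0;
  private readonly ceiling: number;
  private readonly streakBonus: number;

  // streakBonus is a hypothetical value; the README does not state its size.
  constructor(agentTrust: number, streakBonus = 1) {
    this.score = (agentTrust * 70) / 100;    // fresh session: 70% of agent trust
    this.ceiling = (agentTrust * 120) / 100; // sessions can earn up to 120%
    this.streakBonus = streakBonus;
  }

  record(signal: Signal): number {
    let delta = DELTAS[signal];
    if (signal === "success") {
      this.streak += 1;
      // Assumption: the bonus is granted on every 10th consecutive good call.
      if (this.streak % 10 === 0) delta += this.streakBonus;
    } else {
      this.streak = 0; // any violation breaks the clean streak
    }
    // Clamp between the floor (0) and the ceiling (120% of agent trust).
    this.score = Math.min(this.ceiling, Math.max(0, this.score + delta));
    return this.score;
  }
}

const session = new SessionTrust(60);
console.log(session.score); // 42, the value shown as "Session: 42/standard" above
```

With an agent trust of 60, the session starts at 42 and can climb no higher than 72, while a few credential violations are enough to take it to zero.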

v0.5 Features

Output Validation (RFC-006) — Detects unverified numeric claims, contradictions, and hallucinated system states. Configurable LLM gate for external communications.
Redaction Layer (RFC-007) — 3-layer defense-in-depth for credentials, PII, and financial data. SHA-256 vault, fail-closed mode, 17 built-in patterns.
Fact Registry — Register known facts (from live systems or static files). Claims are checked against facts with fuzzy numeric matching.

Quick Start

Install

npm install @vainplex/openclaw-governance

Minimal Config (`openclaw.json`)

{
  "plugins": {
    "entries": {
      "openclaw-governance": { "enabled": true }
    }
  }
}

External Config (`~/.openclaw/plugins/openclaw-governance/config.json`)

{
  "enabled": true,
  "timezone": "Europe/Berlin",
  "failMode": "open",
  "trust": {
    "defaults": {
      "main": 60,
      "forge": 45,
      "*": 10
    }
  },
  "builtinPolicies": {
    "nightMode": { "start": "23:00", "end": "06:00" },
    "credentialGuard": true,
    "productionSafeguard": true,
    "rateLimiter": { "maxPerMinute": 15 }
  },
  "outputValidation": {
    "enabled": true,
    "unverifiedClaimPolicy": "flag"
  },
  "redaction": {
    "enabled": true,
    "categories": ["credential", "pii", "financial"],
    "failMode": "closed"
  }
}

🛡️ Agent Firewall (v0.9.0)

Real-time security intelligence for AI agents. Integrates ShieldAPI and ERC-8004 on-chain reputation into the governance layer.

What It Does

URL Threat Detection — Checks outbound URLs for phishing, malware, brand impersonation
Prompt Injection Detection — Scans tool parameters for adversarial inputs (208 patterns)
Domain Reputation — DNS, blacklist, SSL, SPF/DMARC checks on extracted domains
On-Chain Reputation — ERC-8004 agent identity + reputation from Base blockchain
Trust Enrichment — Security events automatically adjust agent trust scores
x402 Auto-Pay — Automatic USDC micropayments when free tier exhausted

Quick Start

Minimum config — add to your governance config:

{
  "agentFirewall": {
    "enabled": true
  }
}

That's it. Defaults: flag mode (warn, don't block), ShieldAPI at shield.vainplex.dev, 5s timeout, fail-open.

🟢 NVIDIA NemoClaw Integration

Vainplex Governance is 100% compatible with NVIDIA NemoClaw and OpenShell out of the box.

While NemoClaw provides OS-level sandboxing (Landlock, seccomp), Vainplex acts as the Policy Decision Point inside the sandbox, providing Human-in-the-Loop 2FA and verifiable Merkle-Tree audit trails.

Blueprint Configuration

Since NemoClaw strictly isolates network namespaces, you must allowlist the following endpoints in your nemoclaw-blueprint.yaml for Vainplex to function correctly:

network_policies:
  allowlist:
    - domain: "shield.vainplex.dev" # For Agent Firewall / URL Threat Detection
      port: 443
    - domain: "your-nats-cluster.internal" # For EventStore Merkle-Tree Auditing
      port: 4222

Full Config Reference

{
  "agentFirewall": {
    "enabled": true,
    "mode": "flag",
    "baseUrl": "https://shield.vainplex.dev",
    "timeoutMs": 5000,
    "maxUrlsPerMessage": 10,
    "domainAllowlist": ["mycompany.com", "*.internal.corp"],
    "fallbackOnError": "allow",
    "promptCheck": {
      "enabled": true,
      "tools": ["exec", "write", "edit", "sessions_spawn"],
      "minConfidence": 0.7
    },
    "cache": {
      "ttlSeconds": 3600,
      "maxEntries": 256
    },
    "trustEnrichment": {
      "enabled": true
    },
    "walletKey": "${SHIELDAPI_WALLET_KEY}",
    "erc8004": {
      "enabled": true,
      "chain": "base",
      "agentMapping": {
        "myagent": 16700
      }
    }
  }
}

Config Keys

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| enabled | boolean | false | Enable Agent Firewall |
| mode | "flag" \| "block" | "flag" | Flag = warn only, Block = deny on threat |
| baseUrl | string | https://shield.vainplex.dev | ShieldAPI endpoint |
| timeoutMs | number | 5000 | Request timeout (ms) |
| maxUrlsPerMessage | number | 10 | Max URLs to check per message |
| domainAllowlist | string[] | [] | Additional domains to skip (supports *. globs) |
| fallbackOnError | "allow" \| "block" | "allow" | Behavior when ShieldAPI is unreachable |
| walletKey | string | (none) | Wallet key for x402 auto-pay |
| promptCheck.enabled | boolean | true | Enable prompt injection checking |
| promptCheck.tools | string[] | ["exec","write","edit","sessions_spawn"] | Tools to check |
| promptCheck.minConfidence | number | 0.7 | Min confidence to trigger |
| cache.ttlSeconds | number | 3600 | Cache TTL (seconds) |
| cache.maxEntries | number | 256 | Max cache entries per check type |
| trustEnrichment.enabled | boolean | true | Feed security events into trust scores |
| erc8004.enabled | boolean | false | Enable on-chain reputation lookup |
| erc8004.chain | string | "base" | Blockchain: base, ethereum, polygon |
| erc8004.agentMapping | object | {} | Map agent IDs to on-chain IDs |

Environment Variables

| Variable | Description |
|----------|-------------|
| AGENT_FIREWALL_WALLET_KEY | Wallet key for x402 payments |
| SHIELDAPI_WALLET_KEY | Alternative wallet key env var |

`/firewall` Command

Type /firewall to see:

Current mode (flag/block)
Cache stats (size, hits, misses)
x402 wallet status
ERC-8004 status

Modes

| Mode | URL Threat | Prompt Injection | Domain Risk |
|------|-----------|-----------------|-------------|
| flag | Logs warning, message goes through | Logs warning, tool call proceeds | Logs warning |
| block | Message cancelled | Tool call blocked | Message cancelled |

Built-in Domain Allowlist

These domains are never checked (plus any you add via domainAllowlist): github.com, *.github.com, npmjs.com, api.openai.com, api.anthropic.com, *.vainplex.dev, *.vainplex.de
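
How an allowlist with `*.` globs might be matched is sketched below. The function names are hypothetical, and the matching rule is an assumption: here `*.example.com` matches any subdomain but not the apex domain, which would explain why both `github.com` and `*.github.com` appear in the built-in list. The plugin's actual matcher may differ.

```typescript
// Built-in entries copied from the list above.
const BUILTIN_ALLOWLIST: string[] = [
  "github.com", "*.github.com", "npmjs.com", "api.openai.com",
  "api.anthropic.com", "*.vainplex.dev", "*.vainplex.de",
];

// Assumption: "*.example.com" covers subdomains only, not "example.com" itself.
function matchesPattern(host: string, pattern: string): boolean {
  const h = host.toLowerCase();
  const p = pattern.toLowerCase();
  if (p.startsWith("*.")) {
    const suffix = p.slice(1); // ".example.com"
    return h.length > suffix.length && h.endsWith(suffix);
  }
  return h === p;
}

// `extra` stands in for the user's domainAllowlist entries.
function isAllowlisted(host: string, extra: string[] = []): boolean {
  return [...BUILTIN_ALLOWLIST, ...extra].some((p) => matchesPattern(host, p));
}

console.log(isAllowlisted("api.github.com"));                       // true
console.log(isAllowlisted("evilgithub.com"));                       // false
console.log(isAllowlisted("db.internal.corp", ["*.internal.corp"])); // true
```

Anchoring the glob on the leading dot keeps look-alike hosts such as `evilgithub.com` from slipping through a naive suffix check.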

Free Tier

ShieldAPI offers 3 free calls per endpoint per day. After that:

With walletKey: Automatic x402 USDC micropayment ($0.001-$0.01)
Without walletKey: Falls back to "unknown" risk (fail-open)

🔒 Approval 2FA (v0.11.0)

TOTP-based Human-in-the-Loop for agent tool calls. When a security-critical agent (e.g., your pentesting agent) tries to run exec, the system:

Blocks the tool call via before_tool_call hook
Batches multiple commands within a 3-second window
Sends a notification to a dedicated Matrix room
Waits for a 6-digit TOTP code from an authorized approver
Approves the batch — and starts a Session Approval window

Session Approval

One TOTP code doesn't just approve one command — it unlocks all exec calls from that agent for a configurable duration (default: 10 minutes). No more entering codes for every nmap step.

🔒 APPROVAL REQUIRED (1 command)
Agent: vera
1. exec: nmap -sV -T4 127.0.0.1 --top-ports 20
Enter TOTP code (5min timeout)
✨ One code approves ALL commands for 10 minutes
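
A session approval window of this kind can be modelled as a per-agent expiry timestamp. The sketch below is illustrative only: `SessionApprovals` and its methods are hypothetical names, and the clock is passed in explicitly so the behaviour is easy to test. Validating the TOTP code itself is out of scope here.

```typescript
class SessionApprovals {
  // agent id -> timestamp (ms) at which that agent's approval window closes
  private readonly expiresAt = new Map<string, number>();
  private readonly durationMs: number;

  constructor(sessionDurationMinutes = 10) {
    this.durationMs = sessionDurationMinutes * 60_000;
  }

  // Call once an approver's TOTP code has been validated elsewhere.
  open(agentId: string, nowMs: number): void {
    this.expiresAt.set(agentId, nowMs + this.durationMs);
  }

  // True while the agent's window is open; approval is per agent, not global.
  isApproved(agentId: string, nowMs: number): boolean {
    const expiry = this.expiresAt.get(agentId);
    if (expiry === undefined) return false;
    if (nowMs >= expiry) {
      this.expiresAt.delete(agentId); // drop the expired session
      return false;
    }
    return true;
  }
}

const approvals = new SessionApprovals(10);
approvals.open("vera", 0);
console.log(approvals.isApproved("vera", 5 * 60_000));  // true: inside the window
console.log(approvals.isApproved("forge", 5 * 60_000)); // false: different agent
console.log(approvals.isApproved("vera", 10 * 60_000)); // false: window has closed
```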

Architecture

No dependency on OpenClaw's exec-approval system — works independently via plugin hooks
Dedicated Matrix bot (@governance:yourserver) sends notifications
Independent Matrix poller (2s interval) — reads TOTP codes directly from the governance room, no reliance on OpenClaw's Matrix sync
TOTP replay protection — same code can't be used twice within the same period
Periodic cleanup — expired sessions and cooldowns cleaned every 5 minutes

Configuration

{
  "approval2fa": {
    "enabled": true,
    "totpSecret": "YOUR_BASE32_SECRET",
    "totpIssuer": "Vainplex Governance",
    "totpLabel": "Agent Approval",
    "timeoutSeconds": 300,
    "maxAttempts": 3,
    "cooldownSeconds": 900,
    "batchWindowMs": 3000,
    "sessionDurationMinutes": 10,
    "approvers": ["@admin:yourserver.dev"],
    "notifyChannel": "room:!yourRoomId:yourserver.dev"
  }
}

Matrix Notification Setup

Create a dedicated Matrix bot account and a secrets file:

~/.openclaw/plugins/openclaw-governance/matrix-notify.json:

{
  "homeserverUrl": "https://matrix.yourserver.dev",
  "accessToken": "syt_your_bot_token",
  "userId": "@governance:yourserver.dev"
}

The bot needs to be invited to the notification room. The plugin's built-in Matrix poller reads responses directly — no need to configure the bot as an OpenClaw agent.

Policy Setup

Create a policy that triggers 2FA for specific agents/tools:

{
  "id": "agent-2fa",
  "priority": 200,
  "scope": {
    "hooks": ["before_tool_call"],
    "agents": ["vera"]
  },
  "rules": [{
    "id": "exec-requires-2fa",
    "conditions": [
      { "type": "tool", "name": "exec" }
    ],
    "effect": {
      "action": "2fa"
    }
  }]
}

Config Keys

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| enabled | boolean | false | Enable Approval 2FA |
| totpSecret | string | (none) | Base32-encoded TOTP secret (shared with authenticator app) |
| totpIssuer | string | (none) | TOTP issuer name (shown in authenticator) |
| totpLabel | string | (none) | TOTP label (shown in authenticator) |
| timeoutSeconds | number | 300 | Seconds before auto-deny |
| maxAttempts | number | 3 | Max wrong codes before cooldown |
| cooldownSeconds | number | 900 | Cooldown after max attempts exceeded |
| batchWindowMs | number | 3000 | Debounce window for batching commands |
| sessionDurationMinutes | number | 10 | Auto-approve duration after valid TOTP |
| approvers | string[] | (none) | Matrix user IDs authorized to approve |
| notifyChannel | string | (none) | Matrix room for notifications (room:!id:server) |

Security Properties

Fail-closed — if 2FA check errors, the tool call is blocked
Approver-only — only configured Matrix users can enter codes
Replay-protected — same TOTP code rejected within the same 30s period
Rate-limited — 3 wrong codes → 15 minute cooldown
Session-scoped — approval is per-agent, not global
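
The replay and rate-limit properties above can be illustrated with a small state machine. This is a sketch under assumptions: `TotpAttemptGuard` is a hypothetical name, the TOTP check itself is represented by a boolean computed elsewhere, and whether a replayed code counts as a failed attempt is not stated in the README (here it does not). Cleanup of old entries is omitted.

```typescript
type Outcome = "approved" | "rejected" | "replayed" | "cooldown";

class TotpAttemptGuard {
  private readonly usedCodes = new Set<string>(); // entries look like "<period>:<code>"
  private readonly maxAttempts: number;
  private readonly cooldownMs: number;
  private readonly periodMs: number;
  private failures = 0;
  private cooldownUntilMs = 0;

  constructor(maxAttempts = 3, cooldownSeconds = 900, periodSeconds = 30) {
    this.maxAttempts = maxAttempts;
    this.cooldownMs = cooldownSeconds * 1000;
    this.periodMs = periodSeconds * 1000;
  }

  // codeIsValid is the result of a real TOTP check, which this sketch does not perform.
  submit(code: string, codeIsValid: boolean, nowMs: number): Outcome {
    if (nowMs < this.cooldownUntilMs) return "cooldown";
    const key = `${Math.floor(nowMs / this.periodMs)}:${code}`;
    if (this.usedCodes.has(key)) return "replayed"; // same code, same period
    if (!codeIsValid) {
      this.failures += 1;
      if (this.failures >= this.maxAttempts) {
        this.cooldownUntilMs = nowMs + this.cooldownMs; // 3 wrong codes: 15 min lockout
        this.failures = 0;
      }
      return "rejected";
    }
    this.usedCodes.add(key);
    this.failures = 0;
    return "approved";
  }
}

const guard = new TotpAttemptGuard();
console.log(guard.submit("123456", true, 0));    // "approved"
console.log(guard.submit("123456", true, 1000)); // "replayed"
```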

Redaction Layer (RFC-007)

3-layer defense-in-depth against credential, PII, and financial data leakage.

What It Protects

| Layer | Hook | When | Can Modify? |
|-------|------|------|-------------|
| Layer 1 | tool_result_persist | Before tool output is written to transcript | ✅ Yes (sync) |
| Layer 2 | message_sending | Before outbound messages to channels | ✅ Yes (modifying) |
| Layer 2b | before_message_write | Before message persistence | ✅ Yes (sync) |

17 Built-in Patterns

| Category | Patterns |
|----------|----------|
| Credential | OpenAI API key, Anthropic key, Google API key, GitHub PAT/server token, GitLab PAT, Private key headers, Bearer tokens, Key-value credentials, AWS access key, Generic API key (sk-*), Basic Auth |
| PII | Email addresses, Phone numbers (international) |
| Financial | Credit card numbers (Luhn-valid), IBAN, US SSN |
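
"Luhn-valid" means a digit run is only treated as a credit card number if it passes the Luhn checksum, which cuts false positives on arbitrary long numbers. The standard checksum is shown below; the function name, the 13 to 19 digit length bound, and the stripping of spaces and hyphens are assumptions for this sketch, not details taken from the plugin.

```typescript
function isLuhnValid(candidate: string): boolean {
  const digits = candidate.replace(/[\s-]/g, ""); // tolerate "4111 1111 ..." grouping
  if (!/^\d{13,19}$/.test(digits)) return false;  // assumed card-number length range
  let sum = 0;
  let doubleIt = false; // the rightmost digit is not doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // char code of "0" is 48
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9; // same as summing the two digits of the product
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return sum % 10 === 0;
}

console.log(isLuhnValid("4111 1111 1111 1111")); // true (widely used test number)
console.log(isLuhnValid("4111 1111 1111 1112")); // false
```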

How It Works

Tool returns: "Found key sk_test_51Ss4R2..."
  → Layer 1: Pattern match → Replace with [REDACTED:api_key:a3f2]
  → SHA-256 hash stored in vault (1h TTL)
  → Transcript gets redacted version
  → If agent needs the real value later: vault resolves placeholder in before_tool_call
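
The Layer 1 step in the flow above (pattern match, placeholder, hash) might look roughly like this. Everything here is a sketch: the `sk_(test|live)_` pattern is a hypothetical stand-in for the built-in patterns, the hash function is injected rather than hard-wired, and treating the 4-character placeholder suffix as a hash prefix is an assumption. Vault storage, TTL handling, and placeholder resolution are not shown.

```typescript
interface RedactionPattern {
  label: string;
  regex: RegExp; // must carry the "g" flag so every occurrence is replaced
}

// Hypothetical pattern for illustration only.
const PATTERNS: RedactionPattern[] = [
  { label: "api_key", regex: /sk_(?:test|live)_[A-Za-z0-9]{8,}/g },
];

interface RedactionResult {
  text: string;     // redacted text, safe to persist
  hashes: string[]; // one hash per finding; plaintext is never kept
}

function redact(text: string, sha256Hex: (value: string) => string): RedactionResult {
  const hashes: string[] = [];
  let out = text;
  for (const { label, regex } of PATTERNS) {
    out = out.replace(regex, (secret: string) => {
      const hash = sha256Hex(secret);
      hashes.push(hash);
      // Assumption: the short suffix in the placeholder is a prefix of the hash.
      return `[REDACTED:${label}:${hash.slice(0, 4)}]`;
    });
  }
  return { text: out, hashes };
}
```

On Node.js, `sha256Hex` could be supplied as `(v) => createHash("sha256").update(v).digest("hex")` using `createHash` from `node:crypto`, which keeps the sketch consistent with the "Node.js builtins only" constraint.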

Configuration

{
  "redaction": {
    "enabled": true,
    "categories": ["credential", "pii", "financial"],
    "vaultExpirySeconds": 3600,
    "failMode": "closed",
    "customPatterns": [
      {
        "name": "internal-token",
        "regex": "MYAPP_[A-Z0-9]{32}",
        "category": "credential"
      }
    ],
    "allowlist": {
      "piiAllowedChannels": [],
      "financialAllowedChannels": [],
      "exemptTools": ["web_search"],
      "exemptAgents": []
    },
    "performanceBudgetMs": 5
  }
}

Security Invariants

Credentials can NEVER be allowlisted — even exempt tools get credential-only scanning
fail-closed — on redaction errors, output is suppressed entirely
SHA-256 vault — no plaintext storage, hash collision handling, TTL-based expiry
No secrets in logs — audit entries log categories and counts, never values

Known Limitations

This section is explicit about what the Redaction Layer does and doesn't protect.

| ✅ Protected | ❌ Not Protected |
|-------------|-----------------|
| Tool outputs written to transcript | Live-streamed tool output (before persist) |
| Outbound messages to channels | Inbound user messages |
| Audit log entries | LLM context window (keys sent by user) |
| Persisted conversation history | Third-party tool-internal logging |

Why? OpenClaw streams tool output to the LLM in real-time for responsiveness. The tool_result_persist hook fires after streaming but before writing to the transcript. This means:

If a tool returns a secret, the LLM sees it during the current turn (streaming)
But the transcript and audit logs get the redacted version
The LLM's response goes through Layer 2 (message_sending) — so secrets won't appear in outbound messages

For maximum protection: Don't store secrets in files that agents can cat. Use a vault (Vaultwarden, 1Password CLI) and let agents fetch secrets via dedicated tools that you exempt from redaction.

Output Validation (RFC-006)

Detects and flags potentially hallucinated or unverified claims in agent output.

Detectors

| Detector | What It Catches |
|----------|----------------|
| system_state | "The server is running" without live verification |
| entity_name | Incorrect names for known entities |
| existence | "Feature X exists" claims without evidence |
| operational_status | "Service Y is healthy" without live check |

Fact Registry

{
  "outputValidation": {
    "enabled": true,
    "factRegistries": [{
      "id": "system-live",
      "facts": [
        { "subject": "governance-tests", "predicate": "count", "value": "771", "source": "vitest" },
        { "subject": "nats-events", "predicate": "count", "value": "255908", "source": "nats stream ls" }
      ]
    }],
    "unverifiedClaimPolicy": "flag"
  }
}
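
To make "fuzzy numeric matching" concrete, here is one way a numeric claim could be compared against a registered fact. The `Fact` shape mirrors the config above; the function name, the verdict labels, and the relative tolerance of 5% are assumptions, since the README does not say how fuzzy the match is.

```typescript
interface Fact {
  subject: string;
  predicate: string;
  value: string;
  source: string;
}

type Verdict = "verified" | "contradicted" | "unverified";

// relTolerance is a hypothetical knob; 0.05 means "within 5% of the registered value".
function checkNumericClaim(
  facts: Fact[],
  subject: string,
  predicate: string,
  claimed: number,
  relTolerance = 0.05,
): Verdict {
  const fact = facts.find((f) => f.subject === subject && f.predicate === predicate);
  if (!fact) return "unverified"; // nothing registered to check against
  const actual = Number(fact.value);
  if (!Number.isFinite(actual)) return "unverified"; // non-numeric fact value
  const allowed = Math.abs(actual) * relTolerance;
  return Math.abs(claimed - actual) <= allowed ? "verified" : "contradicted";
}

const registry: Fact[] = [
  { subject: "governance-tests", predicate: "count", value: "771", source: "vitest" },
];
console.log(checkNumericClaim(registry, "governance-tests", "count", 780)); // "verified"
console.log(checkNumericClaim(registry, "governance-tests", "count", 900)); // "contradicted"
console.log(checkNumericClaim(registry, "uptime", "days", 30));             // "unverified"
```

A claim with no matching fact stays "unverified", which is the case the unverifiedClaimPolicy setting then decides how to handle.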

Policies

| Policy | Effect |
|--------|--------|
| ignore | No action on unverified claims |
| flag | Add [UNVERIFIED] annotation |
| warn | Log warning |
| block | Block the message entirely |

LLM Gate (Optional)

For external communications (email, message tool, sessions_send), an optional LLM validator can verify claims against the fact registry before sending:

{
  "outputValidation": {
    "llmValidator": {
      "enabled": true,
      "model": "gemini/gemini-3-flash-preview",
      "failMode": "open",
      "maxRetries": 2,
      "cacheSeconds": 300
    }
  }
}

Policy Examples

"No dangerous commands at night"

{
  "id": "night-guard",
  "rules": [{
    "id": "deny-exec-at-night",
    "conditions": [
      { "type": "tool", "name": ["exec", "gateway", "cron"] },
      { "type": "time", "after": "23:00", "before": "07:00" }
    ],
    "effect": { "action": "deny", "reason": "High-risk tools blocked during night hours" }
  }]
}
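
A window such as 23:00 to 07:00 wraps past midnight, so a plain "after AND before" comparison never matches. The sketch below shows the usual fix. The function names are hypothetical, the start is treated as inclusive and the end as exclusive by assumption, and time-zone resolution (the `timezone` setting) is assumed to have happened before the `HH:MM` string is produced.

```typescript
function toMinutes(hhmm: string): number {
  const match = /^(\d{2}):(\d{2})$/.exec(hhmm);
  if (!match) throw new Error(`Invalid time: ${hhmm}`);
  return Number(match[1]) * 60 + Number(match[2]);
}

function inWindow(now: string, after: string, before: string): boolean {
  const t = toMinutes(now);
  const start = toMinutes(after);
  const end = toMinutes(before);
  // Same-day window (09:00 to 17:00): both bounds must hold.
  // Wrapping window (23:00 to 07:00): either bound is enough.
  return start <= end ? t >= start && t < end : t >= start || t < end;
}

console.log(inWindow("23:30", "23:00", "07:00")); // true
console.log(inWindow("02:15", "23:00", "07:00")); // true
console.log(inWindow("12:00", "23:00", "07:00")); // false
```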

"Only trusted agents can spawn sub-agents"

{
  "id": "spawn-control",
  "rules": [{
    "id": "require-trust",
    "conditions": [
      { "type": "tool", "name": "sessions_spawn" },
      { "type": "agent", "maxScore": 39 }
    ],
    "effect": { "action": "deny", "reason": "Agents below score 40 cannot spawn sub-agents" }
  }]
}

Condition Types

| Type | What it checks |
|------|---------------|
| tool | Tool name, parameters (exact, glob, regex) |
| time | Hour, day-of-week, named windows |
| agent | Agent ID, trust tier, score range |
| context | Conversation, message content, channel |
| risk | Computed risk level |
| frequency | Actions per time window |
| any | OR: at least one sub-condition |
| not | Negation |

All conditions in a rule are AND-combined. Use any for OR logic.
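
The AND/OR/NOT semantics can be sketched as a small recursive evaluator. Only two leaf condition types are modelled, and the nested field names `conditions` (for `any`) and `condition` (for `not`) are assumptions, since the README does not show the schema for those two types.

```typescript
interface EvalContext {
  tool: string;
  agentScore: number;
}

type Condition =
  | { type: "tool"; name: string | string[] }
  | { type: "agent"; maxScore: number }
  | { type: "any"; conditions: Condition[] } // hypothetical field name
  | { type: "not"; condition: Condition };   // hypothetical field name

function matches(c: Condition, ctx: EvalContext): boolean {
  switch (c.type) {
    case "tool":
      return (Array.isArray(c.name) ? c.name : [c.name]).includes(ctx.tool);
    case "agent":
      return ctx.agentScore <= c.maxScore;
    case "any":
      return c.conditions.some((sub) => matches(sub, ctx)); // OR
    case "not":
      return !matches(c.condition, ctx);
  }
}

// All top-level conditions of a rule are AND-combined.
function ruleApplies(conditions: Condition[], ctx: EvalContext): boolean {
  return conditions.every((c) => matches(c, ctx));
}

const spawnControl: Condition[] = [
  { type: "tool", name: "sessions_spawn" },
  { type: "agent", maxScore: 39 },
];
console.log(ruleApplies(spawnControl, { tool: "sessions_spawn", agentScore: 32 })); // true: deny
console.log(ruleApplies(spawnControl, { tool: "sessions_spawn", agentScore: 45 })); // false
```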

Trust System

| Tier | Score | Capability |
|------|-------|------------|
| untrusted | 0–19 | Read-only, no external actions |
| restricted | 20–39 | Basic operations, no production |
| standard | 40–59 | Normal operation |
| trusted | 60–79 | Extended permissions, can spawn agents |
| privileged | 80–100 | Full autonomy |

Trust modifiers: +0.1/success, -2/violation, +0.5/day age, +0.3/day clean streak. Decay: ×0.95 after 30 days inactive. Sub-agents inherit parent's trust ceiling.
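
The tier boundaries and the sub-agent ceiling translate directly into code. The sketch below uses hypothetical function names, and it assumes the ×0.95 decay is applied once when inactivity exceeds 30 days; the README does not say whether the factor repeats.

```typescript
type Tier = "untrusted" | "restricted" | "standard" | "trusted" | "privileged";

// Boundaries taken from the tier table above.
function tierFor(score: number): Tier {
  if (score < 20) return "untrusted";
  if (score < 40) return "restricted";
  if (score < 60) return "standard";
  if (score < 80) return "trusted";
  return "privileged";
}

// A sub-agent's effective trust never exceeds its parent's.
function effectiveTrust(ownScore: number, parentScore?: number): number {
  return parentScore === undefined ? ownScore : Math.min(ownScore, parentScore);
}

// Assumption: a single ×0.95 step once the agent has been inactive for more than 30 days.
function applyDecay(score: number, daysInactive: number): number {
  return daysInactive > 30 ? score * 0.95 : score;
}

console.log(tierFor(32));                     // "restricted"
console.log(tierFor(60));                     // "trusted"
console.log(tierFor(effectiveTrust(70, 45))); // "standard": capped by the parent
```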

Built-in Policies

| Policy | What it does |
|--------|-------------|
| nightMode | Blocks risky tools during off-hours |
| credentialGuard | Blocks access to secrets, .env, passwords |
| productionSafeguard | Blocks systemctl, docker rm, destructive ops |
| rateLimiter | Throttles tool calls per minute |

Audit Trail

Every decision → ~/.openclaw/plugins/openclaw-governance/governance/audit/YYYY-MM-DD.jsonl:

One file per day, auto-cleaned after retentionDays
Sensitive data redacted before write
Each record maps to compliance controls (ISO 27001, SOC 2, NIS2)

Performance

Policy evaluation: <5ms for 10+ regex policies
Redaction scan: <5ms for typical tool output
Zero runtime dependencies (Node.js builtins only)
Pre-compiled regex cache, ring buffer frequency tracking
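
Ring-buffer frequency tracking keeps the cost of a rate check constant: with a buffer of exactly `limit` timestamps, the slot about to be overwritten always holds the oldest of the last `limit` calls. The sketch below shows the idea for a per-minute cap; `FrequencyTracker` is a hypothetical name, `limit` is assumed to be at least 1, and the plugin's real data structure may differ.

```typescript
class FrequencyTracker {
  private readonly stamps: number[];
  private readonly limit: number;
  private readonly windowMs: number;
  private next = 0;  // slot that will be overwritten next (the oldest entry once full)
  private count = 0; // number of slots filled so far

  constructor(limit: number, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.stamps = new Array<number>(limit).fill(0);
  }

  // Records a call at nowMs; returns false (and records nothing) if the cap is hit.
  tryRecord(nowMs: number): boolean {
    if (this.count === this.limit) {
      const oldest = this.stamps[this.next];
      // If the oldest of the last `limit` calls is still inside the window, deny.
      if (oldest !== undefined && nowMs - oldest < this.windowMs) return false;
    }
    this.stamps[this.next] = nowMs;
    this.next = (this.next + 1) % this.limit;
    if (this.count < this.limit) this.count += 1;
    return true;
  }
}

const limiter = new FrequencyTracker(15); // mirrors "maxPerMinute": 15
let allowed = 0;
for (let i = 0; i < 20; i++) {
  if (limiter.tryRecord(i * 1000)) allowed += 1; // one call per second
}
console.log(allowed); // 15
```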

Requirements

Node.js ≥ 22.0.0
OpenClaw gateway

Part of the Vainplex OpenClaw Suite

| Plugin | Description |
|--------|-------------|
| @vainplex/nats-eventstore | NATS JetStream event persistence + audit trail |
| @vainplex/openclaw-cortex | Conversation intelligence: threads, decisions, boot context, trace analysis |
| @vainplex/openclaw-governance | Policy engine: trust scores, credential redaction, production safeguards |
| @vainplex/openclaw-knowledge-engine | Entity and relationship extraction from conversations |
| @vainplex/openclaw-sitrep | Situation reports: health, goals, timers aggregated |
| @vainplex/openclaw-leuko | Cognitive immune system: health checks, anomaly detection |
| @vainplex/openclaw-membrane | Episodic memory bridge via gRPC |

Full suite: alberthild/vainplex-openclaw
