@vainplex/openclaw-governance
v0.11.2
Published
Contextual, learning, cross-agent governance for AI agents Includes Response Gate for pre-response enforcement.
Maintainers
Readme
📦 This plugin is part of the Vainplex OpenClaw Suite — a collection of production plugins that turn OpenClaw into a self-governing, learning system. See the monorepo for the full picture.
@vainplex/openclaw-governance
In February 2026, UC Berkeley's Center for Long-Term Cybersecurity published a 67-page framework for governing autonomous AI agents. The same month, Microsoft's Cyber Pulse report revealed that 80% of Fortune 500 companies now run active AI agents — and 29% of employees use unsanctioned ones. Microsoft followed up with a threat analysis specific to OpenClaw, outlining identity, isolation, and runtime risks for self-hosted agents.
The gap is clear: agents are everywhere, governance is nowhere. The Berkeley framework defines what's needed. The existing tools — scanners, input/output filters, output validators — cover fragments. None of them do contextual, learning, runtime governance across agents.
This plugin does. It implements 8 of Berkeley's 12 core requirements today, with the remaining 4 designed and scheduled.
Zero runtime dependencies. Hundreds of tests. Production since February 2026.
Berkeley/Microsoft Compliance Mapping
UC Berkeley's Agentic AI Risk-Management Standards Profile and Microsoft's governance requirements define what responsible agent infrastructure looks like. Here's where this plugin stands:
| Requirement | Our Implementation | Status | |---|---|---| | Agent Registry | Trust config with per-agent scores, all 9 agents registered | ✅ Implemented | | Access Control / Least Privilege | Per-agent tool blocking, trust tier-based permissions | ✅ Implemented | | Real-time Monitoring | Every tool call evaluated against policies before execution | ✅ Implemented | | Activity Logging / Audit Trail | Append-only JSONL, ISO 27001 / SOC 2 / NIS2 control mapping | ✅ Implemented | | Emergency Controls | Night Mode (time-based blocking), Rate Limiter (frequency cap) | ✅ Implemented | | Cascading Agent Policies | Cross-agent governance — parent policies propagate to sub-agents | ✅ Implemented | | Autonomy Levels | Trust tiers (0–100, five levels) — functionally equivalent to Berkeley's L0–L5 | ✅ Implemented | | Credential Protection | 3-layer redaction with SHA-256 vault, 17 built-in patterns, fail-closed | ✅ Implemented | | Human-in-the-Loop | Approval 2FA — TOTP-based approval for agent tool calls. Session approval mode: one code unlocks 10 minutes of auto-approved execution. | ✅ Implemented | | Semantic Intent Analysis | LLM-powered intent classification before tool execution | 📋 Planned | | Multi-Agent Interaction Monitoring | Agent-to-agent message governance | 📋 Planned | | Tamper-evident Audit | Hash-chain audit trail for compliance verification | 📋 Planned |
9 implemented. 3 planned. Production since 2026-02-18.
They Scan. We Govern.
Most tools in this space solve a piece of the problem. None of them solve the whole thing.
| Tool | What It Does | What's Missing | |---|---|---| | Invariant Labs → Snyk | Runtime guardrails, MCP scanning, trace analysis | Acquired by Snyk — enterprise-only. No trust scores. No cross-agent governance. No compliance audit trail. | | NVIDIA NeMo Guardrails | Input/output filtering, topical control | Filters messages, not tool calls. No agent context. No trust awareness. No multi-agent policies. | | GuardrailsAI | Output validation, schema enforcement | Validates what comes out. No idea who called what, when, or whether they should have. Python-only. | | SecureClaw | 56 audit checks, 5 hardening modules, OWASP-aligned | Scanner, not runtime. Tells you what's wrong — doesn't prevent it. No policies, no trust. | | OpenClaw built-in | Tool allowlists, realpath containment, plugin sandboxing | Static config. No trust scoring. No time-awareness. No learning. No compliance mapping. |
The difference: those tools operate on inputs and outputs. This plugin operates on decisions — which tool, which agent, what time, what trust level, what frequency, what context. Then it decides, logs, and learns.
As Peter Steinberger noted, this is what a trust model for AI agents should look like.
What It Does
Agent calls exec("git push origin main")
→ Governance evaluates: tool + time + trust + frequency + context
→ Verdict: DENY — "Forge cannot push to main (trust: restricted, score: 32)"
→ Audit record written (JSONL, compliance-mapped)
→ Agent gets a clear rejection reasonCore Features
- Contextual Policies — Not just "which tool" but "which tool, when, by whom, at what risk level"
- Learning Trust — Score 0–100, five tiers, decay on inactivity. Sub-agents can never exceed parent's trust.
- Cross-Agent Governance — Parent policies cascade to sub-agents. Deny on main = deny on forge.
- Compliance Audit Trail — Append-only JSONL with ISO 27001/SOC 2/NIS2 control mapping.
v0.6: Session Trust (RFC-008)
Trust is not a config value. It's earned per conversation.
- Two-Tier Trust Model — Persistent agent trust (configured baseline) + ephemeral session trust (earned in real-time). A fresh session starts at 70% of agent trust and climbs with successful tool calls.
- Session Signals — Success (+1), policy block (−2), credential violation (−10). Clean streak bonus after 10 consecutive good calls.
- Ceiling & Floor — Sessions can earn up to 120% of agent trust, but can always drop to zero.
- Adaptive Display —
[Governance] Agent: main (60/trusted) | Session: 42/standard | Policies: 4
No existing governance tool implements session-level trust. Static per-agent allowlists don't capture that the same agent performs differently across sessions.
v0.5 Features
- Output Validation (RFC-006) — Detects unverified numeric claims, contradictions, and hallucinated system states. Configurable LLM gate for external communications.
- Redaction Layer (RFC-007) — 3-layer defense-in-depth for credentials, PII, and financial data. SHA-256 vault, fail-closed mode, 17 built-in patterns.
- Fact Registry — Register known facts (from live systems or static files). Claims are checked against facts with fuzzy numeric matching.
Quick Start
Install
npm install @vainplex/openclaw-governanceMinimal Config (openclaw.json)
{
"plugins": {
"entries": {
"openclaw-governance": { "enabled": true }
}
}
}External Config (~/.openclaw/plugins/openclaw-governance/config.json)
{
"enabled": true,
"timezone": "Europe/Berlin",
"failMode": "open",
"trust": {
"defaults": {
"main": 60,
"forge": 45,
"*": 10
}
},
"builtinPolicies": {
"nightMode": { "start": "23:00", "end": "06:00" },
"credentialGuard": true,
"productionSafeguard": true,
"rateLimiter": { "maxPerMinute": 15 }
},
"outputValidation": {
"enabled": true,
"unverifiedClaimPolicy": "flag"
},
"redaction": {
"enabled": true,
"categories": ["credential", "pii", "financial"],
"failMode": "closed"
}
}🛡️ Agent Firewall (v0.9.0)
Real-time security intelligence for AI agents. Integrates ShieldAPI and ERC-8004 on-chain reputation into the governance layer.
What It Does
- URL Threat Detection — Checks outbound URLs for phishing, malware, brand impersonation
- Prompt Injection Detection — Scans tool parameters for adversarial inputs (208 patterns)
- Domain Reputation — DNS, blacklist, SSL, SPF/DMARC checks on extracted domains
- On-Chain Reputation — ERC-8004 agent identity + reputation from Base blockchain
- Trust Enrichment — Security events automatically adjust agent trust scores
- x402 Auto-Pay — Automatic USDC micropayments when free tier exhausted
Quick Start
Minimum config — add to your governance config:
{
"agentFirewall": {
"enabled": true
}
}That's it. Defaults: flag mode (warn, don't block), ShieldAPI at shield.vainplex.dev, 5s timeout, fail-open.
🟢 NVIDIA NemoClaw Integration
Vainplex Governance is 100% compatible with NVIDIA NemoClaw and OpenShell out of the box.
While NemoClaw provides OS-level sandboxing (Landlock, seccomp), Vainplex acts as the Policy Decision Point inside the sandbox, providing Human-in-the-Loop 2FA and verifiable Merkle-Tree audit trails.
Blueprint Configuration
Since NemoClaw strictly isolates network namespaces, you must allowlist the following endpoints in your nemoclaw-blueprint.yaml for Vainplex to function correctly:
network_policies:
allowlist:
- domain: "shield.vainplex.dev" # For Agent Firewall / URL Threat Detection
port: 443
- domain: "your-nats-cluster.internal" # For EventStore Merkle-Tree Auditing
port: 4222Full Config Reference
{
"agentFirewall": {
"enabled": true,
"mode": "flag",
"baseUrl": "https://shield.vainplex.dev",
"timeoutMs": 5000,
"maxUrlsPerMessage": 10,
"domainAllowlist": ["mycompany.com", "*.internal.corp"],
"fallbackOnError": "allow",
"promptCheck": {
"enabled": true,
"tools": ["exec", "write", "edit", "sessions_spawn"],
"minConfidence": 0.7
},
"cache": {
"ttlSeconds": 3600,
"maxEntries": 256
},
"trustEnrichment": {
"enabled": true
},
"walletKey": "${SHIELDAPI_WALLET_KEY}",
"erc8004": {
"enabled": true,
"chain": "base",
"agentMapping": {
"myagent": 16700
}
}
}
}Config Keys
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| enabled | boolean | false | Enable Agent Firewall |
| mode | "flag" | "block" | "flag" | Flag = warn only, Block = deny on threat |
| baseUrl | string | https://shield.vainplex.dev | ShieldAPI endpoint |
| timeoutMs | number | 5000 | Request timeout (ms) |
| maxUrlsPerMessage | number | 10 | Max URLs to check per message |
| domainAllowlist | string[] | [] | Additional domains to skip (supports *. globs) |
| fallbackOnError | "allow" | "block" | "allow" | Behavior when ShieldAPI is unreachable |
| walletKey | string | — | Wallet key for x402 auto-pay |
| promptCheck.enabled | boolean | true | Enable prompt injection checking |
| promptCheck.tools | string[] | ["exec","write","edit","sessions_spawn"] | Tools to check |
| promptCheck.minConfidence | number | 0.7 | Min confidence to trigger |
| cache.ttlSeconds | number | 3600 | Cache TTL |
| cache.maxEntries | number | 256 | Max cache entries per check type |
| trustEnrichment.enabled | boolean | true | Feed security events into trust scores |
| erc8004.enabled | boolean | false | Enable on-chain reputation lookup |
| erc8004.chain | string | "base" | Blockchain: base, ethereum, polygon |
| erc8004.agentMapping | object | {} | Map agent IDs to on-chain IDs |
Environment Variables
| Variable | Description |
|----------|-------------|
| AGENT_FIREWALL_WALLET_KEY | Wallet key for x402 payments |
| SHIELDAPI_WALLET_KEY | Alternative wallet key env var |
/firewall Command
Type /firewall to see:
- Current mode (flag/block)
- Cache stats (size, hits, misses)
- x402 wallet status
- ERC-8004 status
Modes
| Mode | URL Threat | Prompt Injection | Domain Risk | |------|-----------|-----------------|-------------| | flag | Logs warning, message goes through | Logs warning, tool call proceeds | Logs warning | | block | Message cancelled | Tool call blocked | Message cancelled |
Built-in Domain Allowlist
These domains are never checked (plus any you add via domainAllowlist):
github.com, *.github.com, npmjs.com, api.openai.com, api.anthropic.com, *.vainplex.dev, *.vainplex.de
Free Tier
ShieldAPI offers 3 free calls per endpoint per day. After that:
- With
walletKey: Automatic x402 USDC micropayment ($0.001-$0.01) - Without
walletKey: Falls back to "unknown" risk (fail-open)
🔒 Approval 2FA (v0.11.0)
TOTP-based Human-in-the-Loop for agent tool calls. When a security-critical agent (e.g., your pentesting agent) tries to run exec, the system:
- Blocks the tool call via
before_tool_callhook - Batches multiple commands within a 3-second window
- Sends a notification to a dedicated Matrix room
- Waits for a 6-digit TOTP code from an authorized approver
- Approves the batch — and starts a Session Approval window
Session Approval
One TOTP code doesn't just approve one command — it unlocks all exec calls from that agent for a configurable duration (default: 10 minutes). No more entering codes for every nmap step.
🔒 APPROVAL REQUIRED (1 command)
Agent: vera
1. exec: nmap -sV -T4 127.0.0.1 --top-ports 20
Enter TOTP code (5min timeout)
✨ One code approves ALL commands for 10 minutesArchitecture
- No dependency on OpenClaw's exec-approval system — works independently via plugin hooks
- Dedicated Matrix bot (
@governance:yourserver) sends notifications - Independent Matrix poller (2s interval) — reads TOTP codes directly from the governance room, no reliance on OpenClaw's Matrix sync
- TOTP replay protection — same code can't be used twice within the same period
- Periodic cleanup — expired sessions and cooldowns cleaned every 5 minutes
Configuration
{
"approval2fa": {
"enabled": true,
"totpSecret": "YOUR_BASE32_SECRET",
"totpIssuer": "Vainplex Governance",
"totpLabel": "Agent Approval",
"timeoutSeconds": 300,
"maxAttempts": 3,
"cooldownSeconds": 900,
"batchWindowMs": 3000,
"sessionDurationMinutes": 10,
"approvers": ["@admin:yourserver.dev"],
"notifyChannel": "room:!yourRoomId:yourserver.dev"
}
}Matrix Notification Setup
Create a dedicated Matrix bot account and a secrets file:
~/.openclaw/plugins/openclaw-governance/matrix-notify.json:
{
"homeserverUrl": "https://matrix.yourserver.dev",
"accessToken": "syt_your_bot_token",
"userId": "@governance:yourserver.dev"
}The bot needs to be invited to the notification room. The plugin's built-in Matrix poller reads responses directly — no need to configure the bot as an OpenClaw agent.
Policy Setup
Create a policy that triggers 2FA for specific agents/tools:
{
"id": "agent-2fa",
"priority": 200,
"scope": {
"hooks": ["before_tool_call"],
"agents": ["vera"]
},
"rules": [{
"id": "exec-requires-2fa",
"conditions": [
{ "type": "tool", "name": "exec" }
],
"effect": {
"action": "2fa"
}
}]
}Config Keys
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| enabled | boolean | false | Enable Approval 2FA |
| totpSecret | string | — | Base32-encoded TOTP secret (shared with authenticator app) |
| totpIssuer | string | — | TOTP issuer name (shown in authenticator) |
| totpLabel | string | — | TOTP label (shown in authenticator) |
| timeoutSeconds | number | 300 | Seconds before auto-deny |
| maxAttempts | number | 3 | Max wrong codes before cooldown |
| cooldownSeconds | number | 900 | Cooldown after max attempts exceeded |
| batchWindowMs | number | 3000 | Debounce window for batching commands |
| sessionDurationMinutes | number | 10 | Auto-approve duration after valid TOTP |
| approvers | string[] | — | Matrix user IDs authorized to approve |
| notifyChannel | string | — | Matrix room for notifications (room:!id:server) |
Security Properties
- Fail-closed — if 2FA check errors, the tool call is blocked
- Approver-only — only configured Matrix users can enter codes
- Replay-protected — same TOTP code rejected within the same 30s period
- Rate-limited — 3 wrong codes → 15 minute cooldown
- Session-scoped — approval is per-agent, not global
Redaction Layer (RFC-007)
3-layer defense-in-depth against credential, PII, and financial data leakage.
What It Protects
| Layer | Hook | When | Can Modify? |
|-------|------|------|-------------|
| Layer 1 | tool_result_persist | Before tool output is written to transcript | ✅ Yes (sync) |
| Layer 2 | message_sending | Before outbound messages to channels | ✅ Yes (modifying) |
| Layer 2b | before_message_write | Before message persistence | ✅ Yes (sync) |
17 Built-in Patterns
| Category | Patterns |
|----------|----------|
| Credential | OpenAI API key, Anthropic key, Google API key, GitHub PAT/server token, GitLab PAT, Private key headers, Bearer tokens, Key-value credentials, AWS access key, Generic API key (sk-*), Basic Auth |
| PII | Email addresses, Phone numbers (international) |
| Financial | Credit card numbers (Luhn-valid), IBAN, US SSN |
How It Works
Tool returns: "Found key sk_test_51Ss4R2..."
→ Layer 1: Pattern match → Replace with [REDACTED:api_key:a3f2]
→ SHA-256 hash stored in vault (1h TTL)
→ Transcript gets redacted version
→ If agent needs the real value later: vault resolves placeholder in before_tool_callConfiguration
{
"redaction": {
"enabled": true,
"categories": ["credential", "pii", "financial"],
"vaultExpirySeconds": 3600,
"failMode": "closed",
"customPatterns": [
{
"name": "internal-token",
"regex": "MYAPP_[A-Z0-9]{32}",
"category": "credential"
}
],
"allowlist": {
"piiAllowedChannels": [],
"financialAllowedChannels": [],
"exemptTools": ["web_search"],
"exemptAgents": []
},
"performanceBudgetMs": 5
}
}Security Invariants
- Credentials can NEVER be allowlisted — even exempt tools get credential-only scanning
- fail-closed — on redaction errors, output is suppressed entirely
- SHA-256 vault — no plaintext storage, hash collision handling, TTL-based expiry
- No secrets in logs — audit entries log categories and counts, never values
Known Limitations
Be honest about what this does and doesn't protect.
| ✅ Protected | ❌ Not Protected | |-------------|-----------------| | Tool outputs written to transcript | Live-streamed tool output (before persist) | | Outbound messages to channels | Inbound user messages | | Audit log entries | LLM context window (keys sent by user) | | Persisted conversation history | Third-party tool-internal logging |
Why? OpenClaw streams tool output to the LLM in real-time for responsiveness. The tool_result_persist hook fires after streaming but before writing to the transcript. This means:
- If a tool returns a secret, the LLM sees it during the current turn (streaming)
- But the transcript and audit logs get the redacted version
- The LLM's response goes through Layer 2 (
message_sending) — so secrets won't appear in outbound messages
For maximum protection: Don't store secrets in files that agents can cat. Use a vault (Vaultwarden, 1Password CLI) and let agents fetch secrets via dedicated tools that you exempt from redaction.
Output Validation (RFC-006)
Detects and flags potentially hallucinated or unverified claims in agent output.
Detectors
| Detector | What It Catches |
|----------|----------------|
| system_state | "The server is running" without live verification |
| entity_name | Incorrect names for known entities |
| existence | "Feature X exists" claims without evidence |
| operational_status | "Service Y is healthy" without live check |
Fact Registry
Register known facts for claim verification:
{
"outputValidation": {
"enabled": true,
"factRegistries": [{
"id": "system-live",
"facts": [
{ "subject": "governance-tests", "predicate": "count", "value": "771", "source": "vitest" },
{ "subject": "nats-events", "predicate": "count", "value": "255908", "source": "nats stream ls" }
]
}],
"unverifiedClaimPolicy": "flag"
}
}Policies
| Policy | Effect |
|--------|--------|
| ignore | No action on unverified claims |
| flag | Add [UNVERIFIED] annotation |
| warn | Log warning |
| block | Block the message entirely |
LLM Gate (Optional)
For external communications (email, message tool, sessions_send), an optional LLM validator can verify claims against the fact registry before sending:
{
"outputValidation": {
"llmValidator": {
"enabled": true,
"model": "gemini/gemini-3-flash-preview",
"failMode": "open",
"maxRetries": 2,
"cacheSeconds": 300
}
}
}Policy Examples
"No dangerous commands at night"
{
"id": "night-guard",
"rules": [{
"id": "deny-exec-at-night",
"conditions": [
{ "type": "tool", "name": ["exec", "gateway", "cron"] },
{ "type": "time", "after": "23:00", "before": "07:00" }
],
"effect": { "action": "deny", "reason": "High-risk tools blocked during night hours" }
}]
}"Only trusted agents can spawn sub-agents"
{
"id": "spawn-control",
"rules": [{
"id": "require-trust",
"conditions": [
{ "type": "tool", "name": "sessions_spawn" },
{ "type": "agent", "maxScore": 39 }
],
"effect": { "action": "deny", "reason": "Agents below score 40 cannot spawn sub-agents" }
}]
}Condition Types
| Type | What it checks |
|------|---------------|
| tool | Tool name, parameters (exact, glob, regex) |
| time | Hour, day-of-week, named windows |
| agent | Agent ID, trust tier, score range |
| context | Conversation, message content, channel |
| risk | Computed risk level |
| frequency | Actions per time window |
| any | OR — at least one sub-condition |
| not | Negation |
All conditions in a rule are AND-combined. Use any for OR logic.
Trust System
| Tier | Score | Capability |
|------|-------|------------|
| untrusted | 0–19 | Read-only, no external actions |
| restricted | 20–39 | Basic operations, no production |
| standard | 40–59 | Normal operation |
| trusted | 60–79 | Extended permissions, can spawn agents |
| privileged | 80–100 | Full autonomy |
Trust modifiers: +0.1/success, -2/violation, +0.5/day age, +0.3/day clean streak. Decay: ×0.95 after 30 days inactive. Sub-agents inherit parent's trust ceiling.
Built-in Policies
| Policy | What it does |
|--------|-------------|
| nightMode | Blocks risky tools during off-hours |
| credentialGuard | Blocks access to secrets, .env, passwords |
| productionSafeguard | Blocks systemctl, docker rm, destructive ops |
| rateLimiter | Throttles tool calls per minute |
Audit Trail
Every decision → ~/.openclaw/plugins/openclaw-governance/governance/audit/YYYY-MM-DD.jsonl:
- One file per day, auto-cleaned after
retentionDays - Sensitive data redacted before write
- Each record maps to compliance controls (ISO 27001, SOC 2, NIS2)
Performance
- Policy evaluation: <5ms for 10+ regex policies
- Redaction scan: <5ms for typical tool output
- Zero runtime dependencies (Node.js builtins only)
- Pre-compiled regex cache, ring buffer frequency tracking
Requirements
- Node.js ≥ 22.0.0
- OpenClaw gateway
Part of the Vainplex OpenClaw Suite
| Plugin | Description | |--------|-------------| | @vainplex/nats-eventstore | NATS JetStream event persistence + audit trail | | @vainplex/openclaw-cortex | Conversation intelligence — threads, decisions, boot context, trace analysis | | @vainplex/openclaw-governance | Policy engine — trust scores, credential redaction, production safeguards | | @vainplex/openclaw-knowledge-engine | Entity and relationship extraction from conversations | | @vainplex/openclaw-sitrep | Situation reports — health, goals, timers aggregated | | @vainplex/openclaw-leuko | Cognitive immune system — health checks, anomaly detection | | @vainplex/openclaw-membrane | Episodic memory bridge via gRPC |
Full suite: alberthild/vainplex-openclaw
License
MIT © Albert Hild
