agentradius
v0.5.1
Framework-agnostic security layer for AI agent orchestrators
"The more data & control you give to the AI agent: (A) the more it can help you AND (B) the more it can hurt you." — Lex Fridman
The problem
Your agent has root access to your machine. Your security layer is a system prompt that says "please be careful." Think about that for a second.
Intelligence is scaling. Access is scaling. Security is not. One bad prompt and your agent reads ~/.ssh/id_rsa or runs rm -rf /. One malicious skill and your credentials are gone before you notice.
You could review every action manually, but then why have an agent?
Why this matters
Numbers from 78 validated research sources (114 analyzed), Feb 2026:
| | |
|---|---|
| 13.4% | Of 3,984 marketplace skills scanned, 534 had critical issues. 76 were confirmed malicious, with install-time scripts that stole credentials. |
| 6 / 6 | Researchers tested six coding agents for tool injection. All six gave up remote code execution through poisoned tool metadata. |
| 85%+ | A 78-study survey of prompt-based guardrails found most break under adaptive red-team attacks. The LLM can't reliably police itself. |
A regex match on rm -rf is true or false. The agent can't talk its way past it.
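To make the point concrete, here is a minimal TypeScript sketch of a deterministic pattern check. The two patterns mirror the command_guard example later in this README; the function name `checkCommand` and the verdict shape are illustrative assumptions, not the library's API.

```typescript
// Hypothetical sketch, not the agentradius implementation.
// Patterns copied from the command_guard config example in this README.
const denyPatterns: RegExp[] = [/^sudo\s/, /rm\s+-rf/];

type CommandVerdict =
  | { action: "allow" }
  | { action: "deny"; pattern: string };

// Returns a deny verdict for the first pattern that matches, otherwise allow.
function checkCommand(command: string): CommandVerdict {
  for (const pattern of denyPatterns) {
    if (pattern.test(command)) {
      return { action: "deny", pattern: pattern.source };
    }
  }
  return { action: "allow" };
}

console.log(checkCommand("sudo rm -rf /").action); // "deny"
console.log(checkCommand("ls -la /tmp").action);   // "allow"
```

The check has no model in the loop: the same input always yields the same verdict, whatever text surrounds the command.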
What RADIUS does
RADIUS sits between the agent and every tool call. Before anything executes, it runs through a pipeline of modules and gets a verdict: allow, deny, modify (patch arguments), challenge (ask a human), or alert (log and continue).
You pick which modules run and how strict each one is. Blocking ~/.ssh reads but allowing /tmp is one line in fs_guard. Requiring Telegram approval for Bash but not for Read is one rule in approval_gate. Modules are independent, so you configure each one separately.
You start with a profile (local, standard, or unbounded) that sets sensible defaults, then adjust whatever you want in radius.yaml:

    npm install agentradius
    npx agentradius init --framework openclaw --profile standard

Two commands and you have filesystem locks, shell blocking, secret redaction, rate limits, and an audit log.
Modules
Deterministic modules. Enable only what you need.
| Module | What it does |
|--------|-------------|
| kill_switch | Emergency stop. Set an env var or drop a file and all risky actions halt. |
| self_defense | Locks control-plane files (config/hooks) as immutable and detects tampering. |
| tripwire_guard | Optional honeytokens. Touching a tripwire can deny immediately or trigger kill switch. |
| tool_policy | Allow or deny by tool name. Optional argument schema validation. Default deny. |
| fs_guard | Blocks file access outside allowed paths. ~/.ssh, ~/.aws, /etc are unreachable. |
| command_guard | Matches shell patterns: sudo, rm -rf, pipe chains. Blocked before execution. |
| exec_sandbox | Wraps commands in bwrap. Restricted filesystem and network access. |
| egress_guard | Outbound network filter. Allowlist by domain, IP, port. Everything else is dropped. |
| output_dlp | Catches secrets in output: AWS keys, tokens, private certs. Redacts or blocks. |
| rate_budget | Caps tool calls per minute. Stops runaway loops. |
| repetition_guard | Optional loop brake for identical tool calls repeated N times in a row. |
| skill_scanner | Inspects skills at load time for injection payloads: zero-width chars, base64 blobs, exfil URLs. |
| approval_gate | Routes risky operations to Telegram or an HTTP endpoint for human approval. |
| verdict_provider | Optional external verdict provider integration (deterministic adapter contract). |
| audit | Append-only log of every decision. Every action, every timestamp. |
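As one illustration of how a deterministic module from the table might decide, here is a sketch of an fs_guard-style path check. The field names follow the fs_guard config example in this README; `checkPath`, `normalize`, and `isInside` are hypothetical helpers, and the sketch does not resolve symlinks, which a real guard would need to consider.

```typescript
// Hypothetical sketch in the spirit of fs_guard; not the library's code.
interface FsPolicy {
  allowedPaths: string[];
  blockedPaths: string[];
  blockedBasenames: string[];
}

// Collapse "." and ".." segments in an absolute POSIX path.
function normalize(absPath: string): string {
  const out: string[] = [];
  for (const seg of absPath.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

// True when `path` equals `root` or sits underneath it.
function isInside(path: string, root: string): boolean {
  return path === root || path.startsWith(root.endsWith("/") ? root : root + "/");
}

// Blocked basenames and blocked paths win over allowed paths; anything else is denied.
function checkPath(rawPath: string, policy: FsPolicy, home: string): "allow" | "deny" {
  const expand = (p: string): string => normalize(p.startsWith("~/") ? home + p.slice(1) : p);
  const path = expand(rawPath);
  const basename = path.slice(path.lastIndexOf("/") + 1);
  if (policy.blockedBasenames.includes(basename)) return "deny";
  if (policy.blockedPaths.some((b) => isInside(path, expand(b)))) return "deny";
  return policy.allowedPaths.some((a) => isInside(path, expand(a))) ? "allow" : "deny";
}

const examplePolicy: FsPolicy = {
  allowedPaths: ["/work", "/tmp"],
  blockedPaths: ["~/.ssh", "~/.aws"],
  blockedBasenames: [".env"],
};
console.log(checkPath("~/.ssh/id_rsa", examplePolicy, "/home/dev"));       // "deny"
console.log(checkPath("/work/../etc/shadow", examplePolicy, "/home/dev")); // "deny"
console.log(checkPath("/tmp/scratch.txt", examplePolicy, "/home/dev"));    // "allow"
```

Normalizing before comparing is the important step: without it, a path such as /work/../etc/shadow would pass a naive prefix check.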
New in 0.5.x:
- self_defense for immutable control-plane protection (opt-in)
- tripwire_guard for honeytoken tripwires (opt-in)
- repetition_guard for repeated identical tool-call loops (opt-in)
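For a sense of what a loop brake like repetition_guard involves, here is a small sketch of streak tracking. `createRepetitionTracker` is a hypothetical name, and the choice to deny on the Nth identical call (rather than the call after it) is an assumption; the README only says the module reacts to identical calls repeated N times in a row.

```typescript
// Hypothetical sketch of streak tracking; not the agentradius implementation.
interface ToolCallSketch {
  name: string;
  arguments: Record<string, unknown>;
}

// Returns an observer that denies once the same call is seen `threshold` times in a row.
function createRepetitionTracker(threshold: number) {
  let lastKey: string | null = null;
  let streak = 0;
  return function observe(call: ToolCallSketch): "allow" | "deny" {
    // Assumption: identical calls serialize identically (same key order in arguments).
    const key = call.name + ":" + JSON.stringify(call.arguments);
    streak = key === lastKey ? streak + 1 : 1;
    lastKey = key;
    return streak >= threshold ? "deny" : "allow";
  };
}

const observe = createRepetitionTracker(3);
const sameCall: ToolCallSketch = { name: "Bash", arguments: { command: "ls" } };
console.log([observe(sameCall), observe(sameCall), observe(sameCall)]);
// [ 'allow', 'allow', 'deny' ]
```

In subprocess-based setups the streak state would have to live in a persistent store rather than in a closure, since each hook invocation starts a fresh process.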
Three postures
One config change. Pick the containment level that matches your context.
local -- production, billing, credentials. Default deny. Sandbox required. 30 calls/min.
standard -- development, staging, daily work. Default deny. Secrets redacted. 60 calls/min.
unbounded -- research, brainstorming, migration. Logs everything, blocks nothing. 120 calls/min.
Install

    npm install agentradius

Get running

    npx agentradius init --framework openclaw --profile standard
    npx agentradius doctor   # verify setup
    npx agentradius pentest  # test your defenses

This creates radius.yaml and wires the adapter for your orchestrator.
Supported frameworks: openclaw, nanobot, claude-telegram, generic.
What gets generated:
- openclaw: .radius/openclaw-hook.command.sh, .radius/openclaw-hooks.json
- claude-telegram: .radius/claude-telegram.module.yaml, .radius/claude-tool-hook.command.sh, auto-patched .claude/settings.local.json
Hook scripts resolve config via $SCRIPT_DIR so they work regardless of shell working directory.
Usage
As a library

    import { RadiusRuntime, GuardPhase } from 'agentradius';

    const guard = new RadiusRuntime({
      configPath: './radius.yaml',
      framework: 'openclaw'
    });

    const result = await guard.evaluateEvent({
      phase: GuardPhase.PRE_TOOL,
      framework: 'openclaw',
      sessionId: 'session-1',
      toolCall: {
        name: 'Bash',
        arguments: { command: 'cat ~/.ssh/id_rsa' },
      },
      metadata: {},
    });
    // result.finalAction === 'deny'
    // result.reason === 'fs_guard: path ~/.ssh/id_rsa is outside allowed paths'

As a hook (stdin/stdout)

    echo '{"tool_name":"Bash","tool_input":{"command":"sudo rm -rf /"}}' | npx agentradius hook

As a server

    npx agentradius serve --port 3000

Configuration

    global:
      profile: standard
      workspace: ${CWD}
      defaultAction: deny
    modules:
      - kill_switch
      - tool_policy
      - fs_guard
      - command_guard
      - output_dlp
      - rate_budget
      - audit
    moduleConfig:
      kill_switch:
        enabled: true
        envVar: RADIUS_KILL_SWITCH
        filePath: ./.radius/KILL_SWITCH
      fs_guard:
        allowedPaths:
          - ${workspace}
          - /tmp
        blockedPaths:
          - ~/.ssh
          - ~/.aws
        blockedBasenames:
          - .env
          - .env.local
          - .envrc
      command_guard:
        denyPatterns:
          - "^sudo\\s"
          - "rm\\s+-rf"
      rate_budget:
        windowSec: 60
        maxCallsPerWindow: 60
        store:
          engine: sqlite
          path: ./.radius/state.db
          required: false # set true when node:sqlite is available (Node 22+)

Optional hardening modules (all opt-in):

    modules:
      - self_defense
      - tripwire_guard
      - repetition_guard
      - exec_sandbox
    moduleConfig:
      self_defense:
        immutablePaths:
          - ./radius.yaml
          - ./.radius/**
        onWriteAttempt: deny
        onHashMismatch: kill_switch
      tripwire_guard:
        fileTokens:
          - /workspace/.tripwire/salary_2026.csv
        envTokens:
          - RADIUS_TRIPWIRE_SECRET
        onTrip: kill_switch
      repetition_guard:
        threshold: 3
        cooldownSec: 60
        onRepeat: deny
        store:
          engine: sqlite
          path: ./.radius/state.db
          required: false # set true when node:sqlite is available (Node 22+)
      exec_sandbox:
        engine: bwrap
        shareNetwork: true
        childPolicy:
          network: deny

Template variables: ${workspace}, ${HOME}, ${CWD}, and any environment variable.
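As an illustration of what template variable substitution could look like, here is a short sketch. The function `expandTemplate` is hypothetical, and leaving unknown variables untouched is an assumption; the README does not say how RADIUS treats unresolved names or in which order it looks them up.

```typescript
// Hypothetical sketch of ${NAME} expansion; the real resolution rules are not documented here.
function expandTemplate(
  value: string,
  vars: Record<string, string | undefined>,
): string {
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (match, varName: string) => {
    const resolved = vars[varName];
    // Assumption: leave unknown variables as-is so a typo stays visible instead of becoming "".
    return resolved === undefined ? match : resolved;
  });
}

const templateVars = { workspace: "/srv/project", HOME: "/home/dev", CWD: "/srv/project" };
console.log(expandTemplate("${workspace}/.radius/state.db", templateVars));
// /srv/project/.radius/state.db
```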
Approvals
approval_gate routes risky tools to Telegram or HTTP for human confirmation. Both support sync_wait mode.
Telegram callbacks: Approve (one action) · Allow 30m (temporary lease) · Deny
HTTP expects a POST returning {"status":"approved"}, {"status":"denied"}, {"status":"approved_temporary","ttlSec":1800}, or {"status":"error","reason":"..."}.
A pending workflow is also supported for bridge architectures:
- The initial POST may return {"status":"pending","pollUrl":"https://.../status/<id>","retryAfterMs":500}
- RADIUS polls pollUrl until it gets a final status or times out

    approval:
      channels:
        telegram:
          enabled: true
          transport: polling
          botToken: ${TELEGRAM_BOT_TOKEN}
          allowedChatIds: []
          approverUserIds: []
        http:
          enabled: false
          url: http://127.0.0.1:3101/approvals/resolve
          timeoutMs: 10000
      store:
        engine: sqlite
        path: ./.radius/state.db
        required: true
    moduleConfig:
      approval_gate:
        autoRouting:
          defaultChannel: telegram
          frameworkDefaults:
            openclaw: telegram
            generic: http
        rules:
          - tool: "Bash"
            channel: auto
            prompt: 'Approve execution of "Bash"?'
            timeoutSec: 90

Allow 30m only bypasses repeated approval prompts. All other modules still enforce normally.
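For anyone building the HTTP side of this, the response shapes described in this section can be modeled as a discriminated union. The sketch below is an assumption-laden illustration: the status values and field names come from this README, while `parseApprovalResponse`, `nextStep`, and the choice to treat malformed or error responses as a denial are hypothetical.

```typescript
// Status shapes copied from this README; everything else is a hypothetical sketch.
type ApprovalResponse =
  | { status: "approved" }
  | { status: "denied" }
  | { status: "approved_temporary"; ttlSec: number }
  | { status: "error"; reason: string }
  | { status: "pending"; pollUrl: string; retryAfterMs: number };

type NextStep =
  | { kind: "final"; decision: "allow" | "deny"; leaseSec?: number }
  | { kind: "poll"; url: string; delayMs: number };

// Validate an untrusted JSON body; anything malformed becomes an error response.
function parseApprovalResponse(body: unknown): ApprovalResponse {
  const obj = (typeof body === "object" && body !== null ? body : {}) as Record<string, unknown>;
  const ttlSec = obj["ttlSec"];
  const pollUrl = obj["pollUrl"];
  const retryAfterMs = obj["retryAfterMs"];
  const reason = obj["reason"];
  switch (obj["status"]) {
    case "approved":
      return { status: "approved" };
    case "denied":
      return { status: "denied" };
    case "approved_temporary":
      if (typeof ttlSec === "number" && ttlSec > 0) return { status: "approved_temporary", ttlSec };
      break;
    case "pending":
      if (typeof pollUrl === "string" && typeof retryAfterMs === "number") {
        return { status: "pending", pollUrl, retryAfterMs };
      }
      break;
    case "error":
      return { status: "error", reason: typeof reason === "string" ? reason : "unspecified" };
  }
  return { status: "error", reason: "malformed approval response" };
}

// Decide what the caller does next. Assumption: errors fail closed (deny).
function nextStep(res: ApprovalResponse): NextStep {
  switch (res.status) {
    case "approved":
      return { kind: "final", decision: "allow" };
    case "approved_temporary":
      return { kind: "final", decision: "allow", leaseSec: res.ttlSec };
    case "pending":
      return { kind: "poll", url: res.pollUrl, delayMs: res.retryAfterMs };
    case "denied":
    case "error":
      return { kind: "final", decision: "deny" };
  }
}

console.log(nextStep(parseApprovalResponse({ status: "approved_temporary", ttlSec: 1800 })));
```

Keeping the parsing and the decision separate means a polling loop only has to call `nextStep` repeatedly until it sees a final step or runs out of time.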
Single-bot topology note (Telegram):
- If your orchestrator already consumes Telegram updates for the same bot token, avoid running two polling consumers.
- For one-bot setups, prefer approval.channel=http and bridge approvals through your existing bot service.
OpenClaw subprocess compatibility
OpenClaw hooks run as subprocesses, so in-memory state resets on every tool call. Anything that needs to persist across calls requires SQLite:

    approval:
      store:
        engine: sqlite
        path: ./.radius/state.db
        required: false # set true when node:sqlite is available (Node 22+)
    moduleConfig:
      rate_budget:
        store:
          engine: sqlite
          path: ./.radius/state.db
          required: false # set true when node:sqlite is available (Node 22+)

| Module | Subprocess mode | Note |
|--------|----------------|------|
| kill_switch, tool_policy, fs_guard, command_guard, audit | Works | Stateless or file/env based |
| self_defense, tripwire_guard | Works | File-system tripwire and immutable checks are subprocess-safe |
| approval_gate + Allow 30m | Works | SQLite lease store persists across processes |
| rate_budget | Works | SQLite store keeps counters across processes |
| repetition_guard | Works | Use SQLite store for cross-process streak tracking |
| output_dlp | Partial | Requires PostToolUse hook wiring |
| egress_guard | Works | Preflight policy; kernel egress needs OS firewall |
| exec_sandbox | Platform dependent | Linux bwrap; non-Linux needs equivalent |
| skill_scanner | Not triggered by PreToolUse | Run via npx agentradius scan or CI |
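To see why counters need a persistent store in subprocess mode, consider this sketch of a fixed-window rate budget behind a small store interface. `CounterStore` and `allowCall` are hypothetical names, a `Map` stands in for SQLite, and the fixed-window algorithm is an assumption; the README does not say which windowing strategy rate_budget uses.

```typescript
// Hypothetical sketch: a fixed-window counter; a Map stands in for the SQLite store.
interface WindowState {
  windowStart: number;
  count: number;
}

interface CounterStore {
  get(key: string): WindowState | undefined;
  set(key: string, value: WindowState): void;
}

// Returns true and records the call if the session is still within budget.
function allowCall(
  store: CounterStore,
  sessionId: string,
  nowMs: number,
  windowSec: number,
  maxCallsPerWindow: number,
): boolean {
  const windowMs = windowSec * 1000;
  const prev = store.get(sessionId);
  // Reuse the current window if it is still open, otherwise start a new one.
  const state: WindowState =
    prev !== undefined && nowMs - prev.windowStart < windowMs
      ? prev
      : { windowStart: nowMs, count: 0 };
  if (state.count >= maxCallsPerWindow) return false;
  store.set(sessionId, { windowStart: state.windowStart, count: state.count + 1 });
  return true;
}

// A store that survives across calls enforces the budget...
const sharedStore = new Map<string, WindowState>();
const sharedResults = [0, 1, 2, 3].map((i) => allowCall(sharedStore, "s1", 1000 + i, 60, 3));
console.log(sharedResults); // [ true, true, true, false ]

// ...while a fresh store per call (like in-memory state in a new subprocess) never blocks.
const freshResults = [0, 1, 2, 3].map((i) => allowCall(new Map<string, WindowState>(), "s1", 1000 + i, 60, 3));
console.log(freshResults); // [ true, true, true, true ]
```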
Custom adapter
For Claude Code-based orchestrators with a custom runtime or protocol, see examples/claude-custom-adapter-runner.mjs.
It maps the Claude hook payload to a canonical GuardEvent, runs the pipeline, and maps the result back to Claude response JSON.
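The two mapping steps might look roughly like the sketch below. This is not the content of the example file: the payload and event field names are taken from the snippets in this README, while the `"pre_tool"` string, the optional `session_id` field, the empty-object response for non-blocking results, and the decision to block on an unresolved challenge are all assumptions made for illustration.

```typescript
// Hypothetical sketch of an adapter's two mapping steps; not the shipped example.
interface ClaudeHookPayload {
  hook_event_name: string;
  tool_name: string;
  tool_input: Record<string, unknown>;
  session_id?: string; // assumed optional field
}

interface GuardEventSketch {
  phase: "pre_tool" | "post_tool"; // stand-in for GuardPhase; real values may differ
  framework: string;
  sessionId: string;
  toolCall: { name: string; arguments: Record<string, unknown> };
  metadata: Record<string, unknown>;
}

interface PipelineResultSketch {
  finalAction: "allow" | "deny" | "modify" | "challenge" | "alert";
  reason?: string;
}

interface ClaudeResponseSketch {
  decision?: "block";
  reason?: string;
}

// Step 1: Claude hook payload -> canonical event.
function toGuardEvent(payload: ClaudeHookPayload): GuardEventSketch {
  return {
    phase: payload.hook_event_name === "PreToolUse" ? "pre_tool" : "post_tool",
    framework: "generic",
    sessionId: payload.session_id ?? "unknown-session",
    toolCall: { name: payload.tool_name, arguments: payload.tool_input },
    metadata: {},
  };
}

// Step 3: pipeline result -> Claude response JSON (step 2, the pipeline, is omitted here).
function toClaudeResponse(result: PipelineResultSketch): ClaudeResponseSketch {
  if (result.finalAction === "deny" || result.finalAction === "challenge") {
    return { decision: "block", reason: result.reason ?? "blocked by policy" };
  }
  return {}; // assumption: an empty object lets the tool call proceed
}

const samplePayload: ClaudeHookPayload = {
  hook_event_name: "PreToolUse",
  tool_name: "Bash",
  tool_input: { command: "sudo id" },
};
console.log(toGuardEvent(samplePayload).toolCall);
console.log(JSON.stringify(toClaudeResponse({ finalAction: "deny", reason: "command_guard: denied by pattern" })));
```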
    echo '{"hook_event_name":"PreToolUse","tool_name":"Bash","tool_input":{"command":"sudo id"}}' \
      | node ./examples/claude-custom-adapter-runner.mjs --config ./radius.yaml
    # {"decision":"block","reason":"command_guard: denied by pattern ..."}

Threat coverage
Covered
| Attack | What stops it |
|--------|---------------|
| Credential theft (cat ~/.ssh/id_rsa) | fs_guard |
| System file access (/etc/shadow) | fs_guard |
| Privilege escalation (sudo ...) | command_guard |
| Destructive shell (rm -rf /) | command_guard |
| Secret leakage in output (AKIA..., ghp_...) | output_dlp |
| Runaway loops (500 calls/min) | rate_budget |
| Emergency freeze | kill_switch |
| Skill supply chain (hidden instructions) | skill_scanner |
| Unsigned skill installs | skill_scanner provenance policy |
| Dotenv harvest (.env reads) | fs_guard + command_guard |
| Network exfiltration | egress_guard |
| Sandbox escape | exec_sandbox (bwrap) |
| Unapproved tool use | tool_policy |
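As an example of the output-side checks in the table, here is a sketch of pattern-based secret redaction. The two patterns target the key formats the table names (AWS access key IDs starting with AKIA and GitHub tokens starting with ghp_); the function `redactSecrets`, the replacement text, and the exact pattern set are assumptions and not what output_dlp necessarily ships with.

```typescript
// Hypothetical sketch of output redaction; the real output_dlp pattern set may differ.
const secretPatterns: { label: string; regex: RegExp }[] = [
  { label: "aws_access_key_id", regex: /AKIA[0-9A-Z]{16}/g },
  { label: "github_token", regex: /ghp_[A-Za-z0-9]{36}/g },
];

// Replace each match with a labeled placeholder and report what was found.
function redactSecrets(text: string): { text: string; findings: string[] } {
  const findings: string[] = [];
  let redacted = text;
  for (const { label, regex } of secretPatterns) {
    redacted = redacted.replace(regex, () => {
      findings.push(label);
      return "[REDACTED:" + label + "]";
    });
  }
  return { text: redacted, findings };
}

// AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
console.log(redactSecrets("export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE").text);
// export AWS_ACCESS_KEY_ID=[REDACTED:aws_access_key_id]
```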
Not covered (v0.4)
- Prompt injection at model level. Jailbreaks that produce harmful text without tool calls. RADIUS only sees tool calls and outputs, not the model's reasoning.
- Semantic attacks via allowed tools. Reading an allowed file, then sending its contents via an allowed API. Modules check independently; they don't reason about intent.
- Token/cost budgets. Rate limiting counts calls, not tokens or dollars.
- Multi-tenant isolation. One config per runtime, no user-level policy separation.
- OS-level exploits. exec_sandbox uses bwrap, not a VM. A kernel exploit bypasses it.
Tests
92 tests across 10 suites. ~500ms.

    npm test

CI regression
.github/workflows/security-regression.yml runs build, tests, and pentest on every push:

    npx agentradius init --framework generic --profile standard --output /tmp/radius-ci.yaml
    npx agentradius pentest --config /tmp/radius-ci.yaml

Built-in pentest

    npx agentradius pentest
    [OK ] fs_guard blocks /etc/passwd
    [OK ] command_guard blocks sudo chain
    [OK ] fs_guard blocks dotenv file reads
    [OK ] output_dlp detects tool-output secret
    [OK ] output_dlp detects response secret
    [OK ] skill_scanner catches malicious skill
    [OK ] skill_scanner catches tool metadata poisoning
    [OK ] rate_budget blocks runaway loop
    [WARN] egress_guard blocks outbound exfiltration
    [OK ] adapters handle malformed payloads

Audit metrics

    npx agentradius audit --json

Reports intervention rate, detection latency, kill-switch activations, sandbox coverage, provenance coverage, and dotenv exposure posture.
How it works

    Orchestrator event
    -> Adapter (converts to canonical format)
    -> Pipeline (modules run in config order)
    -> first DENY or CHALLENGE wins, patches compose, alerts accumulate
    -> Adapter (converts back to orchestrator format)
    -> Response

Modules run in config order. If any module returns DENY or CHALLENGE, the pipeline stops. MODIFY patches are deep-merged. If an enforce-mode module throws, it fails closed (denies). Observe-mode errors log and continue.
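The combination rules above can be sketched in a few dozen lines. This is an illustration under stated assumptions, not the library's pipeline: the type names, the `runPipeline` and `deepMerge` helpers, and the way a thrown error is converted into a verdict are all hypothetical, and the sketch only uses a module's mode to decide how to treat errors.

```typescript
// Hypothetical sketch of the verdict-combination rules; not the agentradius pipeline.
type Verdict =
  | { action: "allow" }
  | { action: "deny"; reason: string }
  | { action: "challenge"; reason: string }
  | { action: "modify"; patch: Record<string, unknown> }
  | { action: "alert"; message: string };

interface ModuleSketch {
  name: string;
  mode: "enforce" | "observe";
  evaluate(args: Record<string, unknown>): Verdict;
}

interface PipelineOutcome {
  finalAction: "allow" | "deny" | "challenge";
  reason?: string;
  arguments: Record<string, unknown>;
  alerts: string[];
}

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Recursively merge `patch` into a copy of `base`; non-object values overwrite.
function deepMerge(base: Record<string, unknown>, patch: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = { ...base };
  for (const key of Object.keys(patch)) {
    const existing = out[key];
    const incoming = patch[key];
    out[key] = isPlainObject(existing) && isPlainObject(incoming) ? deepMerge(existing, incoming) : incoming;
  }
  return out;
}

function runPipeline(modules: ModuleSketch[], args: Record<string, unknown>): PipelineOutcome {
  let current = args;
  const alerts: string[] = [];
  for (const mod of modules) {
    let verdict: Verdict;
    try {
      verdict = mod.evaluate(current);
    } catch {
      // Enforce mode fails closed; observe mode records the error and moves on.
      verdict =
        mod.mode === "enforce"
          ? { action: "deny", reason: "module error (fail closed)" }
          : { action: "alert", message: "module error (observe mode)" };
    }
    if (verdict.action === "deny" || verdict.action === "challenge") {
      // First DENY or CHALLENGE wins: stop immediately.
      return { finalAction: verdict.action, reason: mod.name + ": " + verdict.reason, arguments: current, alerts };
    }
    if (verdict.action === "modify") current = deepMerge(current, verdict.patch);
    if (verdict.action === "alert") alerts.push(mod.name + ": " + verdict.message);
  }
  return { finalAction: "allow", arguments: current, alerts };
}
```

Because the loop returns on the first DENY or CHALLENGE, module order in the config matters: cheap, strict checks placed early prevent later modules from running at all.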
Requirements
- Node.js >= 20
- Node.js 22+ for persistent state (node:sqlite for approval leases, rate budgets)
- bwrap (optional, exec_sandbox on Linux)
Credits
Security philosophy and threat model based on research by Dima Matskevich:
License
MIT
