agentradius

v0.5.1

Published

a month ago

Framework-agnostic security layer for AI agent orchestrators



"The more data & control you give to the AI agent: (A) the more it can help you AND (B) the more it can hurt you." — Lex Fridman

The problem

Your agent has root access to your machine. Your security layer is a system prompt that says "please be careful." Think about that for a second.

Intelligence is scaling. Access is scaling. Security is not. One bad prompt and your agent reads ~/.ssh/id_rsa or runs rm -rf /. One malicious skill and your credentials are gone before you notice.

You could review every action manually, but then why have an agent?

Why this matters

Numbers from 78 validated research sources (114 analyzed), Feb 2026:

| | |
|---|---|
| 13.4% | Of 3,984 marketplace skills scanned, 534 had critical issues. 76 were confirmed malicious, with install-time scripts that stole credentials. |
| 6 / 6 | Researchers tested six coding agents for tool injection. All six gave up remote code execution through poisoned tool metadata. |
| 85%+ | A 78-study survey of prompt-based guardrails found most break under adaptive red-team attacks. The LLM can't reliably police itself. |

A regex match on rm -rf is true or false. The agent can't talk its way past it.
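To make the point concrete, here is a minimal sketch of a deterministic pattern check. It reuses the two deny patterns from the sample radius.yaml in the Configuration section; the helper name `checkCommand` is hypothetical and is not part of the agentradius API.

```typescript
// Deny patterns copied from the sample radius.yaml in the Configuration section.
const denyPatterns: RegExp[] = [/^sudo\s/, /rm\s+-rf/];

// Returns the first matching pattern, or null when no pattern matches.
// checkCommand is a hypothetical helper, not the library's own function.
function checkCommand(command: string): RegExp | null {
  for (const pattern of denyPatterns) {
    if (pattern.test(command)) return pattern;
  }
  return null;
}

console.log(checkCommand("sudo rm -rf /") !== null); // true
console.log(checkCommand("ls -la /tmp") !== null); // false
```

The result depends only on the command string, so no wording in the prompt can change it.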

What RADIUS does

RADIUS sits between the agent and every tool call. Before anything executes, it runs through a pipeline of modules and gets a verdict: allow, deny, modify (patch arguments), challenge (ask a human), or alert (log and continue).

You pick which modules run and how strict each one is. Blocking ~/.ssh reads but allowing /tmp is one line in fs_guard. Requiring Telegram approval for Bash but not for Read is one rule in approval_gate. Modules are independent, so you configure each one separately.

You start with a profile (local, standard, or unbounded) that sets sensible defaults, then adjust whatever you want in radius.yaml:

npm install agentradius
npx agentradius init --framework openclaw --profile standard

Two commands and you have filesystem locks, shell blocking, secret redaction, rate limits, and an audit log.

Modules

Deterministic modules. Enable only what you need.

| Module | What it does |
|--------|-------------|
| kill_switch | Emergency stop. Set an env var or drop a file, all risky actions halt. |
| self_defense | Locks control-plane files (config/hooks) as immutable and detects tampering. |
| tripwire_guard | Optional honeytokens. Touching a tripwire can deny immediately or trigger kill switch. |
| tool_policy | Allow or deny by tool name. Optional argument schema validation. Default deny. |
| fs_guard | Blocks file access outside allowed paths. ~/.ssh, ~/.aws, /etc are unreachable. |
| command_guard | Matches shell patterns: sudo, rm -rf, pipe chains. Blocked before execution. |
| exec_sandbox | Wraps commands in bwrap. Restricted filesystem and network access. |
| egress_guard | Outbound network filter. Allowlist by domain, IP, port. Everything else is dropped. |
| output_dlp | Catches secrets in output: AWS keys, tokens, private certs. Redacts or blocks. |
| rate_budget | Caps tool calls per minute. Stops runaway loops. |
| repetition_guard | Optional loop brake for identical tool calls repeated N times in a row. |
| skill_scanner | Inspects skills at load time for injection payloads: zero-width chars, base64 blobs, exfil URLs. |
| approval_gate | Routes risky operations to Telegram or an HTTP endpoint for human approval. |
| verdict_provider | Optional external verdict provider integration (deterministic adapter contract). |
| audit | Append-only log of every decision. Every action, every timestamp. |

New in 0.5.x:

self_defense for immutable control-plane protection (opt-in)
tripwire_guard for honeytoken tripwires (opt-in)
repetition_guard for repeated identical tool-call loops (opt-in)

Three postures

One config change. Pick the containment level that matches your context.

local -- production, billing, credentials. Default deny. Sandbox required. 30 calls/min.

standard -- development, staging, daily work. Default deny. Secrets redacted. 60 calls/min.

unbounded -- research, brainstorming, migration. Logs everything, blocks nothing. 120 calls/min.

Install

npm install agentradius

Get running

npx agentradius init --framework openclaw --profile standard
npx agentradius doctor    # verify setup
npx agentradius pentest   # test your defenses

This creates radius.yaml and wires the adapter for your orchestrator.

Supported frameworks: openclaw, nanobot, claude-telegram, generic.

What gets generated:

openclaw: .radius/openclaw-hook.command.sh, .radius/openclaw-hooks.json
claude-telegram: .radius/claude-telegram.module.yaml, .radius/claude-tool-hook.command.sh, auto-patched .claude/settings.local.json

Hook scripts resolve config via $SCRIPT_DIR so they work regardless of shell working directory.

Usage

As a library

import { RadiusRuntime, GuardPhase } from 'agentradius';

const guard = new RadiusRuntime({
  configPath: './radius.yaml',
  framework: 'openclaw'
});

const result = await guard.evaluateEvent({
  phase: GuardPhase.PRE_TOOL,
  framework: 'openclaw',
  sessionId: 'session-1',
  toolCall: {
    name: 'Bash',
    arguments: { command: 'cat ~/.ssh/id_rsa' },
  },
  metadata: {},
});

// result.finalAction === 'deny'
// result.reason === 'fs_guard: path ~/.ssh/id_rsa is outside allowed paths'

As a hook (stdin/stdout)

echo '{"tool_name":"Bash","tool_input":{"command":"sudo rm -rf /"}}' | npx agentradius hook

As a server

npx agentradius serve --port 3000

Configuration

global:
  profile: standard
  workspace: ${CWD}
  defaultAction: deny

modules:
  - kill_switch
  - tool_policy
  - fs_guard
  - command_guard
  - output_dlp
  - rate_budget
  - audit

moduleConfig:
  kill_switch:
    enabled: true
    envVar: RADIUS_KILL_SWITCH
    filePath: ./.radius/KILL_SWITCH

  fs_guard:
    allowedPaths:
      - ${workspace}
      - /tmp
    blockedPaths:
      - ~/.ssh
      - ~/.aws
    blockedBasenames:
      - .env
      - .env.local
      - .envrc

  command_guard:
    denyPatterns:
      - "^sudo\\s"
      - "rm\\s+-rf"

  rate_budget:
    windowSec: 60
    maxCallsPerWindow: 60
    store:
      engine: sqlite
      path: ./.radius/state.db
      required: false # set true when node:sqlite is available (Node 22+)
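The fs_guard keys above can be read as a three-step check. The sketch below is an illustration only: the names `FsGuardConfig`, `normalizePath`, and `isPathAllowed` are hypothetical. It assumes absolute POSIX paths and assumes that blocked entries take precedence over allowed ones. It does not resolve symlinks, which a real guard would need to handle.

```typescript
// Hypothetical shape mirroring the fs_guard keys in the sample config.
interface FsGuardConfig {
  allowedPaths: string[];
  blockedPaths: string[];
  blockedBasenames: string[];
}

// Expand a leading "~" and collapse "." and ".." segments of an absolute POSIX path.
function normalizePath(path: string, homeDir: string): string {
  const expanded = path === "~" || path.startsWith("~/") ? homeDir + path.slice(1) : path;
  const parts: string[] = [];
  for (const segment of expanded.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") parts.pop();
    else parts.push(segment);
  }
  return "/" + parts.join("/");
}

// True when target equals root or lies underneath it.
function isInside(target: string, root: string): boolean {
  return target === root || target.startsWith(root === "/" ? "/" : root + "/");
}

// Assumption: blocked basenames and blocked paths win over allowed paths.
function isPathAllowed(path: string, config: FsGuardConfig, homeDir: string): boolean {
  const target = normalizePath(path, homeDir);
  const basename = target.slice(target.lastIndexOf("/") + 1);
  if (config.blockedBasenames.includes(basename)) return false;
  if (config.blockedPaths.some((p) => isInside(target, normalizePath(p, homeDir)))) return false;
  return config.allowedPaths.some((p) => isInside(target, normalizePath(p, homeDir)));
}

const fsConfig: FsGuardConfig = {
  allowedPaths: ["/home/dev/project", "/tmp"],
  blockedPaths: ["~/.ssh", "~/.aws"],
  blockedBasenames: [".env", ".env.local", ".envrc"],
};
const homeDir = "/home/dev";

console.log(isPathAllowed("/tmp/build.log", fsConfig, homeDir)); // true
console.log(isPathAllowed("~/.ssh/id_rsa", fsConfig, homeDir)); // false
console.log(isPathAllowed("/home/dev/project/.env", fsConfig, homeDir)); // false
console.log(isPathAllowed("/tmp/../etc/shadow", fsConfig, homeDir)); // false
```

Normalizing before comparing matters: without it, a path such as /tmp/../etc/shadow would pass a naive prefix test against /tmp.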

Optional hardening modules (all opt-in):

modules:
  - self_defense
  - tripwire_guard
  - repetition_guard
  - exec_sandbox

moduleConfig:
  self_defense:
    immutablePaths:
      - ./radius.yaml
      - ./.radius/**
    onWriteAttempt: deny
    onHashMismatch: kill_switch

  tripwire_guard:
    fileTokens:
      - /workspace/.tripwire/salary_2026.csv
    envTokens:
      - RADIUS_TRIPWIRE_SECRET
    onTrip: kill_switch

  repetition_guard:
    threshold: 3
    cooldownSec: 60
    onRepeat: deny
    store:
      engine: sqlite
      path: ./.radius/state.db
      required: false # set true when node:sqlite is available (Node 22+)

  exec_sandbox:
    engine: bwrap
    shareNetwork: true
    childPolicy:
      network: deny

Template variables: ${workspace}, ${HOME}, ${CWD}, and any environment variable.
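As a rough illustration of how such placeholders can be resolved, the sketch below substitutes ${NAME} tokens from a lookup table. The function name `expandTemplate` is hypothetical, and leaving unknown names untouched is an assumption; the real loader may treat them differently.

```typescript
// Replace ${NAME} placeholders from a lookup table.
// Assumption: unknown names are left as-is rather than raising an error.
function expandTemplate(value: string, vars: Record<string, string | undefined>): string {
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (match: string, key: string) => {
    const resolved = vars[key];
    return resolved !== undefined ? resolved : match;
  });
}

// Example values only; a real run would take these from the process environment.
const templateVars: Record<string, string | undefined> = {
  workspace: "/home/dev/project",
  HOME: "/home/dev",
  CWD: "/home/dev/project",
};

console.log(expandTemplate("${workspace}/.radius/state.db", templateVars));
// /home/dev/project/.radius/state.db
console.log(expandTemplate("${UNKNOWN}/x", templateVars));
// ${UNKNOWN}/x
```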

Approvals

approval_gate routes risky tools to Telegram or HTTP for human confirmation. Both support sync_wait mode.

Telegram callbacks: Approve (one action) · Allow 30m (temporary lease) · Deny

HTTP expects a POST returning {"status":"approved"}, {"status":"denied"}, {"status":"approved_temporary","ttlSec":1800}, or {"status":"error","reason":"..."}.

Pending workflow is also supported for bridge architectures:

initial POST may return {"status":"pending","pollUrl":"https://.../status/<id>","retryAfterMs":500}
RADIUS polls pollUrl until final status or timeout

approval:
  channels:
    telegram:
      enabled: true
      transport: polling
      botToken: ${TELEGRAM_BOT_TOKEN}
      allowedChatIds: []
      approverUserIds: []
    http:
      enabled: false
      url: http://127.0.0.1:3101/approvals/resolve
      timeoutMs: 10000
  store:
    engine: sqlite
    path: ./.radius/state.db
    required: true

moduleConfig:
  approval_gate:
    autoRouting:
      defaultChannel: telegram
      frameworkDefaults:
        openclaw: telegram
        generic: http
    rules:
      - tool: "Bash"
        channel: auto
        prompt: 'Approve execution of "Bash"?'
        timeoutSec: 90

Allow 30m only bypasses repeated approval prompts. All other modules still enforce normally.
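The HTTP response bodies described in this section map naturally onto a discriminated union. The sketch below shows one way a client could interpret them. The type and function names are hypothetical, and treating an error status as a denial is an assumption, not documented behavior.

```typescript
// Response bodies listed in this section, modeled as a discriminated union.
type ApprovalResponse =
  | { status: "approved" }
  | { status: "denied" }
  | { status: "approved_temporary"; ttlSec: number }
  | { status: "error"; reason: string }
  | { status: "pending"; pollUrl: string; retryAfterMs: number };

// Hypothetical internal outcome; these names are not taken from agentradius.
type ApprovalOutcome =
  | { kind: "allow_once" }
  | { kind: "allow_until"; leaseExpiresAtMs: number }
  | { kind: "deny"; reason: string }
  | { kind: "poll"; url: string; waitMs: number };

function interpretApproval(response: ApprovalResponse, nowMs: number): ApprovalOutcome {
  switch (response.status) {
    case "approved":
      return { kind: "allow_once" };
    case "approved_temporary":
      // A temporary lease skips further approval prompts until it expires.
      return { kind: "allow_until", leaseExpiresAtMs: nowMs + response.ttlSec * 1000 };
    case "denied":
      return { kind: "deny", reason: "denied by approver" };
    case "error":
      // Assumption: a backend error fails closed.
      return { kind: "deny", reason: response.reason };
    case "pending":
      return { kind: "poll", url: response.pollUrl, waitMs: response.retryAfterMs };
  }
}

const leaseOutcome = interpretApproval({ status: "approved_temporary", ttlSec: 1800 }, 0);
console.log(leaseOutcome); // { kind: 'allow_until', leaseExpiresAtMs: 1800000 }
```

A lease produced this way would only suppress repeated prompts; every other module would still evaluate each call.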

Single-bot topology note (Telegram):

If your orchestrator already consumes Telegram updates for the same bot token, avoid running two polling consumers.
For one-bot setups, prefer approval.channel=http and bridge approvals through your existing bot service.

OpenClaw subprocess compatibility

OpenClaw hooks run as subprocesses, so in-memory state resets on every tool call. Anything that needs to persist across calls requires SQLite:

approval:
  store:
    engine: sqlite
    path: ./.radius/state.db
    required: false # set true when node:sqlite is available (Node 22+)

moduleConfig:
  rate_budget:
    store:
      engine: sqlite
      path: ./.radius/state.db
      required: false # set true when node:sqlite is available (Node 22+)

| Module | Subprocess mode | Note |
|--------|----------------|------|
| kill_switch, tool_policy, fs_guard, command_guard, audit | Works | Stateless or file/env based |
| self_defense, tripwire_guard | Works | File-system tripwire and immutable checks are subprocess-safe |
| approval_gate + Allow 30m | Works | SQLite lease store persists across processes |
| rate_budget | Works | SQLite store keeps counters across processes |
| repetition_guard | Works | Use SQLite store for cross-process streak tracking |
| output_dlp | Partial | Requires PostToolUse hook wiring |
| egress_guard | Works | Preflight policy; kernel egress needs OS firewall |
| exec_sandbox | Platform dependent | Linux bwrap; non-Linux needs equivalent |
| skill_scanner | Not triggered by PreToolUse | Run via npx agentradius scan or CI |
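To show why a persistent store matters here, the sketch below keeps rate-limit state behind a small store interface, so the limiter itself holds nothing in memory between calls. Everything in it is an assumption made for illustration: the `CounterStore` interface, the `MemoryStore` stand-in (a SQLite-backed store would take its place in subprocess mode), and the sliding-window algorithm, which may differ from what rate_budget actually does.

```typescript
// Minimal persistence contract. A Map stands in for SQLite so the sketch runs anywhere.
interface CounterStore {
  get(key: string): number[];
  set(key: string, timestamps: number[]): void;
}

class MemoryStore implements CounterStore {
  private data = new Map<string, number[]>();
  get(key: string): number[] {
    return this.data.get(key) ?? [];
  }
  set(key: string, timestamps: number[]): void {
    this.data.set(key, timestamps);
  }
}

// Sliding window: allow a call only if fewer than maxCalls landed in the last windowSec.
// All state is read from and written back to the store on every call.
function tryConsume(
  store: CounterStore,
  sessionId: string,
  nowMs: number,
  windowSec: number,
  maxCalls: number,
): boolean {
  const cutoff = nowMs - windowSec * 1000;
  const recent = store.get(sessionId).filter((t) => t > cutoff);
  if (recent.length >= maxCalls) {
    store.set(sessionId, recent);
    return false;
  }
  recent.push(nowMs);
  store.set(sessionId, recent);
  return true;
}

// 65 calls within a few milliseconds against the standard profile limit of 60 per 60s.
const budgetStore = new MemoryStore();
let allowedCalls = 0;
for (let i = 0; i < 65; i++) {
  if (tryConsume(budgetStore, "session-1", 1000 + i, 60, 60)) allowedCalls++;
}
console.log(allowedCalls); // 60
```

If the store were recreated on each invocation, as in-memory state is under subprocess hooks, every call would see an empty history and the limit would never trigger.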

Custom adapter

For Claude Code-based orchestrators with custom runtime/protocol, see examples/claude-custom-adapter-runner.mjs.

The runner maps the Claude hook payload to a canonical GuardEvent, runs the pipeline, and maps the result back to Claude response JSON.

echo '{"hook_event_name":"PreToolUse","tool_name":"Bash","tool_input":{"command":"sudo id"}}' \
  | node ./examples/claude-custom-adapter-runner.mjs --config ./radius.yaml
# {"decision":"block","reason":"command_guard: denied by pattern ..."}
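The two mapping steps can be sketched as a pair of pure functions. This is not the code in examples/claude-custom-adapter-runner.mjs: the type names, the phase strings, and the empty-object response for an allowed call are all assumptions. Only the input payload and the block response follow the example above.

```typescript
// Payload fields taken from the example command above.
interface HookPayload {
  hook_event_name: string;
  tool_name: string;
  tool_input: Record<string, unknown>;
}

// Hypothetical canonical shape, modeled on the fields passed to evaluateEvent in the Usage section.
interface CanonicalEvent {
  phase: "pre_tool" | "post_tool"; // placeholder strings; the real GuardPhase values may differ
  framework: string;
  sessionId: string;
  toolCall: { name: string; arguments: Record<string, unknown> };
  metadata: Record<string, unknown>;
}

interface PipelineResult {
  finalAction: "allow" | "deny";
  reason?: string;
}

// Step 1: hook payload JSON to canonical event.
function toCanonical(raw: string, sessionId: string): CanonicalEvent {
  const payload = JSON.parse(raw) as HookPayload;
  return {
    phase: payload.hook_event_name === "PreToolUse" ? "pre_tool" : "post_tool",
    framework: "generic",
    sessionId,
    toolCall: { name: payload.tool_name, arguments: payload.tool_input },
    metadata: {},
  };
}

// Step 2: pipeline result back to hook response JSON.
function toHookResponse(result: PipelineResult): string {
  if (result.finalAction === "deny") {
    return JSON.stringify({ decision: "block", reason: result.reason ?? "denied" });
  }
  // Assumption: an empty object signals "no objection" for allowed calls.
  return JSON.stringify({});
}

const canonical = toCanonical(
  '{"hook_event_name":"PreToolUse","tool_name":"Bash","tool_input":{"command":"sudo id"}}',
  "session-1",
);
console.log(canonical.toolCall.name); // Bash
console.log(toHookResponse({ finalAction: "deny", reason: "command_guard: denied by pattern" }));
// {"decision":"block","reason":"command_guard: denied by pattern"}
```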

Threat coverage

Covered

| Attack | What stops it |
|--------|---------------|
| Credential theft (cat ~/.ssh/id_rsa) | fs_guard |
| System file access (/etc/shadow) | fs_guard |
| Privilege escalation (sudo ...) | command_guard |
| Destructive shell (rm -rf /) | command_guard |
| Secret leakage in output (AKIA..., ghp_...) | output_dlp |
| Runaway loops (500 calls/min) | rate_budget |
| Emergency freeze | kill_switch |
| Skill supply chain (hidden instructions) | skill_scanner |
| Unsigned skill installs | skill_scanner provenance policy |
| Dotenv harvest (.env reads) | fs_guard + command_guard |
| Network exfiltration | egress_guard |
| Sandbox escape | exec_sandbox (bwrap) |
| Unapproved tool use | tool_policy |
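As an illustration of the secret-leakage row, the sketch below redacts two token shapes from tool output: AWS access key IDs (AKIA followed by 16 uppercase letters or digits) and GitHub personal access tokens (ghp_ followed by 36 alphanumerics). The names `secretPatterns` and `redactSecrets` are hypothetical, and the rule set that output_dlp ships with is presumably broader.

```typescript
// Two widely documented token shapes; a real rule set would cover many more.
const secretPatterns: { label: string; pattern: RegExp }[] = [
  { label: "aws_access_key_id", pattern: /\bAKIA[0-9A-Z]{16}\b/g },
  { label: "github_token", pattern: /\bghp_[A-Za-z0-9]{36}\b/g },
];

// Replace every match with a labeled marker and report which rules fired.
function redactSecrets(text: string): { text: string; hits: string[] } {
  const hits: string[] = [];
  let redacted = text;
  for (const { label, pattern } of secretPatterns) {
    redacted = redacted.replace(pattern, () => {
      hits.push(label);
      return `[REDACTED:${label}]`;
    });
  }
  return { text: redacted, hits };
}

// AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
const dlpSample = redactSecrets("key=AKIAIOSFODNN7EXAMPLE done");
console.log(dlpSample.text); // key=[REDACTED:aws_access_key_id] done
console.log(dlpSample.hits); // [ 'aws_access_key_id' ]
```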

Not covered (v0.4)

Prompt injection at model level. Jailbreaks that produce harmful text without tool calls. RADIUS only sees tool calls and outputs, not the model's reasoning.
Semantic attacks via allowed tools. Reading an allowed file, then sending its contents via an allowed API. Modules check independently; they don't reason about intent.
Token/cost budgets. Rate limiting counts calls, not tokens or dollars.
Multi-tenant isolation. One config per runtime, no user-level policy separation.
OS-level exploits. exec_sandbox uses bwrap, not a VM. A kernel exploit bypasses it.

Tests

92 tests across 10 suites. ~500ms.

npm test

CI regression

.github/workflows/security-regression.yml runs build, tests, and pentest on every push:

npx agentradius init --framework generic --profile standard --output /tmp/radius-ci.yaml
npx agentradius pentest --config /tmp/radius-ci.yaml

Built-in pentest

npx agentradius pentest

  [OK  ] fs_guard blocks /etc/passwd
  [OK  ] command_guard blocks sudo chain
  [OK  ] fs_guard blocks dotenv file reads
  [OK  ] output_dlp detects tool-output secret
  [OK  ] output_dlp detects response secret
  [OK  ] skill_scanner catches malicious skill
  [OK  ] skill_scanner catches tool metadata poisoning
  [OK  ] rate_budget blocks runaway loop
  [WARN] egress_guard blocks outbound exfiltration
  [OK  ] adapters handle malformed payloads

Audit metrics

npx agentradius audit --json

The report covers intervention rate, detection latency, kill-switch activations, sandbox coverage, provenance coverage, and dotenv exposure posture.

How it works

Orchestrator event
  -> Adapter (converts to canonical format)
    -> Pipeline (modules run in config order)
      -> first DENY or CHALLENGE wins, patches compose, alerts accumulate
    -> Adapter (converts back to orchestrator format)
  -> Response

Modules run in config order. If any module returns DENY or CHALLENGE, the pipeline stops. MODIFY patches are deep-merged. If an enforce-mode module throws, it fails closed (denies). Observe-mode errors log and continue.
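The rules in the paragraph above can be sketched as a single loop. This is a simplified model, not the library's implementation: every type and function name is hypothetical, modules are synchronous, and observe mode is modeled only for thrown errors because that is the only observe-mode behavior described here.

```typescript
type Action = "allow" | "deny" | "challenge" | "modify" | "alert";
type Args = { [key: string]: unknown };

interface ModuleVerdict { action: Action; reason?: string; patch?: Args }
interface GuardModule {
  name: string;
  mode: "enforce" | "observe";
  evaluate(args: Args): ModuleVerdict;
}
interface PipelineOutcome { finalAction: Action; reason?: string; args: Args; alerts: string[] }

function isPlainObject(v: unknown): v is Args {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Recursively merge patch into base; non-object values in patch overwrite.
function deepMerge(base: Args, patch: Args): Args {
  const out: Args = { ...base };
  for (const key of Object.keys(patch)) {
    const a = out[key];
    const b = patch[key];
    out[key] = isPlainObject(a) && isPlainObject(b) ? deepMerge(a, b) : b;
  }
  return out;
}

function runPipeline(modules: GuardModule[], initialArgs: Args): PipelineOutcome {
  let args = initialArgs;
  let modified = false;
  const alerts: string[] = [];
  for (const mod of modules) {
    let verdict: ModuleVerdict;
    try {
      verdict = mod.evaluate(args);
    } catch {
      // Enforce-mode errors fail closed; observe-mode errors are logged and skipped.
      if (mod.mode === "enforce") {
        return { finalAction: "deny", reason: `${mod.name}: module error`, args, alerts };
      }
      alerts.push(`${mod.name}: error in observe mode`);
      continue;
    }
    // First DENY or CHALLENGE stops the pipeline.
    if (verdict.action === "deny" || verdict.action === "challenge") {
      return { finalAction: verdict.action, reason: `${mod.name}: ${verdict.reason ?? ""}`, args, alerts };
    }
    if (verdict.action === "modify" && verdict.patch) {
      args = deepMerge(args, verdict.patch);
      modified = true;
    }
    if (verdict.action === "alert") alerts.push(`${mod.name}: ${verdict.reason ?? "alert"}`);
  }
  return { finalAction: modified ? "modify" : "allow", args, alerts };
}

const pipelineOutcome = runPipeline(
  [
    { name: "noisy", mode: "observe", evaluate: () => { throw new Error("boom"); } },
    { name: "patcher", mode: "enforce", evaluate: () => ({ action: "modify", patch: { env: { SAFE: "1" } } }) },
    {
      name: "command_guard",
      mode: "enforce",
      evaluate: (a) =>
        String(a.command).startsWith("sudo ")
          ? { action: "deny", reason: "denied by pattern" }
          : { action: "allow" },
    },
  ],
  { command: "sudo id", env: { PATH: "/usr/bin" } },
);
console.log(pipelineOutcome.finalAction, pipelineOutcome.reason);
// deny command_guard: denied by pattern
```

Because the loop returns on the first deny, module order in the config decides which module's reason the caller sees.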

Requirements

Node.js >= 20
Node.js 22+ for persistent state (node:sqlite for approval leases, rate budgets)
bwrap (optional, exec_sandbox on Linux)

Credits

Security philosophy and threat model based on research by Dima Matskevich.

License

MIT