openclaw-agentic-security

v1.5.1

Published

3 months ago

Security gateway for AI agent API calls with interceptor hooks and runtime policy validation


alexanderfedin

anthropic gateway interceptor llm openai security

@openclaw/agentic-security

Security isolation layer for LLM-powered applications. Guards against secret leakage, prompt injection, data exfiltration, and unauthorized tool execution.

Quick Start

import Anthropic from "@anthropic-ai/sdk";
import { wrapAnthropic } from "@openclaw/agentic-security";

const client = wrapAnthropic(new Anthropic());
// Secrets are redacted before reaching the LLM — use client normally

OpenAI equivalent:

import OpenAI from "openai";
import { wrapOpenAI } from "@openclaw/agentic-security";

const client = wrapOpenAI(new OpenAI());

Installation

npm install @openclaw/agentic-security
# Peer deps (only what you use):
npm install @anthropic-ai/sdk    # for Anthropic
npm install openai               # for OpenAI

Why Use This?

The Threat Model

LLM-powered applications face security risks that traditional web security tools do not address.

Secrets leaking into LLM prompts. API keys, database credentials, tokens, and PII flow into prompt content through user inputs, tool outputs, and retrieved documents. Once in the prompt, secrets travel to the LLM provider's infrastructure and may appear in model responses. @openclaw/agentic-security scans every request with entropy-based detection and regex patterns before the SDK sends it. Secrets are redacted or the request is rejected — configurable per policy.
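
To make the combination of regex patterns and entropy scoring concrete, here is a minimal sketch. It is not the library's detector: the function names, the two patterns, and the 4.0 bits-per-character threshold are assumptions chosen for illustration.

```typescript
// Illustrative sketch only: the patterns, threshold, and function names are
// assumptions, not the detectors shipped in @openclaw/agentic-security.

// Shannon entropy of a string, in bits per character.
function shannonEntropy(s: string): number {
  const chars = Array.from(s);
  const counts = new Map<string, number>();
  for (const ch of chars) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let bits = 0;
  for (const n of counts.values()) {
    const p = n / chars.length;
    bits -= p * Math.log2(p);
  }
  return bits;
}

// Regex pass: credential formats with a recognizable shape.
const KNOWN_PATTERNS: RegExp[] = [
  /AKIA[0-9A-Z]{16}/g, // AWS access key ID
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/g, // PEM private key header
];

// Entropy pass: long token-like runs that look random.
const CANDIDATE_TOKEN = /[A-Za-z0-9+\/_=-]{20,}/g;
const ENTROPY_THRESHOLD = 4.0; // bits per character, chosen for illustration

function redactSecrets(text: string): string {
  let out = text;
  for (const pattern of KNOWN_PATTERNS) out = out.replace(pattern, "[REDACTED]");
  return out.replace(CANDIDATE_TOKEN, (token) =>
    shannonEntropy(token) >= ENTROPY_THRESHOLD ? "[REDACTED]" : token,
  );
}
```

The regex pass catches well-known formats cheaply, while the entropy pass catches unfamiliar tokens at the cost of occasional false positives on long random-looking identifiers.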

Prompt injection via tool outputs and user content. Attacker-controlled content (web pages, files, user messages) can contain instructions that override the system prompt or escalate privileges. The library runs heuristic detection for role confusion, delimiter breaking, and encoding attacks on every request.
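
A heuristic check of this kind can be sketched as a small rule table. The rule names follow the three categories above, but the specific patterns and the `detectInjection` function are hypothetical and far narrower than what a real detector would need.

```typescript
// Illustrative sketch only: these rules are assumptions, not the library's heuristics.
interface Finding {
  rule: string;
  excerpt: string;
}

const RULES: { rule: string; pattern: RegExp }[] = [
  // Role confusion: content that tries to speak as, or override, the system.
  { rule: "role-confusion", pattern: /ignore (all )?(previous|prior) instructions/i },
  { rule: "role-confusion", pattern: /^\s*(system|assistant)\s*:/im },
  // Delimiter breaking: closing or opening tags that mimic prompt structure.
  { rule: "delimiter-breaking", pattern: /<\/?(system|instructions)>/i },
  // Encoding tricks: a long base64-like run that may hide instructions.
  { rule: "encoding", pattern: /[A-Za-z0-9+\/]{40,}={0,2}/ },
];

function detectInjection(content: string): Finding[] {
  const findings: Finding[] = [];
  for (const { rule, pattern } of RULES) {
    const match = pattern.exec(content);
    if (match) findings.push({ rule, excerpt: match[0].slice(0, 40) });
  }
  return findings;
}
```

Pattern lists like this only cover known phrasings, which is why the library describes its detection as heuristic.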

Data exfiltration through tool results and network calls. LLMs can be manipulated into exfiltrating data through tool calls, network requests, or by embedding sensitive content in responses. The library monitors egress channels, filters allowed domains, and detects DNS tunneling. PII anonymization tokenizes sensitive values before they reach the LLM and restores them after.
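
The tokenize-then-restore round trip for PII can be illustrated with a small in-memory vault. This is a sketch under stated assumptions: the `PiiVault` class, the email-only regex, and the `pii-N@example.invalid` token shape are all hypothetical and only hint at format preservation.

```typescript
// Illustrative sketch only: PiiVault and its token format are assumptions,
// not the library's anonymizer.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

class PiiVault {
  private readonly tokenToValue = new Map<string, string>();
  private readonly valueToToken = new Map<string, string>();

  // Replace each email with a stable token that still looks like an email.
  anonymize(text: string): string {
    return text.replace(EMAIL, (email) => {
      let token = this.valueToToken.get(email);
      if (token === undefined) {
        token = `pii-${this.tokenToValue.size + 1}@example.invalid`;
        this.tokenToValue.set(token, email);
        this.valueToToken.set(email, token);
      }
      return token;
    });
  }

  // Restore the original values in a model response.
  deanonymize(text: string): string {
    let out = text;
    for (const [token, value] of this.tokenToValue) {
      out = out.split(token).join(value);
    }
    return out;
  }
}
```

Reusing the same token for a repeated value lets the model still reason about "the same person" without ever seeing the real address.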

Unauthorized tool execution. Claude Code and similar agents execute tools (Bash, file writes, web search) at the LLM's direction. Without controls, a compromised prompt can instruct the agent to run arbitrary commands. The library intercepts every tool call, checks it against an allowlist/denylist, validates parameters against a schema, and applies RBAC trust levels before execution.
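
The order of checks described above (denylist, allowlist, trust level, parameter validation) can be sketched as follows. The types, the three trust levels, and `checkToolCall` are hypothetical names for illustration and do not reflect the library's policy schema.

```typescript
// Illustrative sketch only: these types and names are assumptions.
type TrustLevel = "untrusted" | "standard" | "elevated";
const TRUST_RANK: Record<TrustLevel, number> = { untrusted: 0, standard: 1, elevated: 2 };

interface ToolRule {
  minTrust: TrustLevel;
  // Returns a problem description, or null when the input is acceptable.
  validate: (input: Record<string, unknown>) => string | null;
}

interface ToolPolicy {
  denylist: Set<string>;
  allowlist: Map<string, ToolRule>;
}

type Decision = { action: "allow" } | { action: "reject"; reason: string };

function checkToolCall(
  policy: ToolPolicy,
  toolName: string,
  input: Record<string, unknown>,
  callerTrust: TrustLevel,
): Decision {
  if (policy.denylist.has(toolName)) {
    return { action: "reject", reason: `${toolName} is denylisted` };
  }
  const rule = policy.allowlist.get(toolName);
  if (!rule) return { action: "reject", reason: `${toolName} is not allowlisted` };
  if (TRUST_RANK[callerTrust] < TRUST_RANK[rule.minTrust]) {
    return { action: "reject", reason: `${toolName} requires ${rule.minTrust} trust` };
  }
  const problem = rule.validate(input);
  if (problem !== null) return { action: "reject", reason: problem };
  return { action: "allow" };
}

const examplePolicy: ToolPolicy = {
  denylist: new Set(["WebFetch"]),
  allowlist: new Map<string, ToolRule>([
    ["Bash", {
      minTrust: "elevated",
      validate: (input) =>
        typeof input["command"] === "string" ? null : "command must be a string",
    }],
    ["Read", {
      minTrust: "standard",
      validate: (input) =>
        typeof input["path"] === "string" ? null : "path must be a string",
    }],
  ]),
};
```

Checking the denylist first means an explicit deny always wins, and unknown tools are rejected by default rather than allowed.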

No audit trail. Without observability, you cannot prove what the LLM did, detect anomalies, or meet compliance obligations. The library emits structured audit log entries and OpenTelemetry spans for every security event.
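
As a rough picture of what a structured audit entry might look like, the sketch below serializes one security event as a single JSON line. The field names and the `toAuditLine` helper are assumptions; the library's actual entry schema is documented separately.

```typescript
// Illustrative sketch only: this entry shape is an assumption, not the library's schema.
interface AuditEntry {
  timestamp: string; // ISO 8601
  sessionId: string;
  interceptor: string;
  action: "allow" | "redact" | "warn" | "reject";
  detail: string; // describe the event; never include the raw secret or PII
}

// Serialize one event as a JSON line, suitable for append-only log storage.
function toAuditLine(
  event: Omit<AuditEntry, "timestamp">,
  now: Date = new Date(),
): string {
  const entry: AuditEntry = { timestamp: now.toISOString(), ...event };
  return JSON.stringify(entry);
}
```

One JSON object per line keeps the log easy to ship to whatever store you use for review and retention.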

Comparison to Alternatives

| Approach | What it misses |
| ----------------------------- | --------------------------------------------------------------------------------------------------------------------------------- |
| Scan input before sending | Does not cover secrets in LLM responses, tool outputs, or retrieved content |
| Request-level WAF | Does not understand LLM semantics, cannot intercept tool calls, no session isolation |
| Other LLM security libraries | None provide session-per-workspace isolation + tool interception for Claude Code |
| Rolling your own interceptors | Interceptor priority ordering is subtle and error-prone; PII tokenization with de-anonymization is complex to implement correctly |

What This Library Does NOT Do

It is not a network firewall. Use your infrastructure's egress controls for that.
It is not a full compliance platform. It produces audit logs; you store and review them.
It is not a WAF. It does not inspect HTTP traffic outside LLM SDK calls.
It does not prevent all prompt injection. Heuristic detection covers known patterns; novel attacks may pass.

Feature Overview

| Capability | Description |
| ----------------------------- | ---------------------------------------------------------------------------------------- |
| Secret detection | Entropy + regex scanning of requests and responses; redact or reject mode |
| PII anonymization | Format-preserving tokenization (name, email, phone, SSN, card); de-anonymize on response |
| Prompt injection detection | Heuristic detection for role confusion, delimiter attacks, encoding tricks |
| Output validation | Schema validation and content sanitization for SQL/XSS/command injection patterns |
| Tool security (RBAC) | Allowlist/denylist, parameter schema validation, trust levels per tool |
| Network control | Egress domain filtering, DNS tunneling detection, raw IP blocking |
| Session hardening | Workspace-scoped session IDs, expiry, replay detection |
| Health check + graceful drain | healthCheck() for readiness probes; drain() for SIGTERM handling |
| OTel observability | Span per interceptor, structured audit log entries, Prometheus metrics |
| Compliance presets | HIPAA, SOC 2, GDPR preset factories with all options pre-configured |

Security Presets

The library ships three built-in security levels and three industry-specific compliance presets. Each preset's full option values are documented in docs/user-guide/index.md.

| Preset | Secret Redaction | Prompt Injection | Output Filtering | Network Control | Tool Security | Observability | Use Case |
| ------------------------ | ---------------- | ---------------- | ---------------- | --------------- | ------------- | ------------- | ------------------------- |
| createMinimalPolicy() | ✓ (redact) | | | | | | Prototyping, evaluation |
| createBalancedPolicy() | ✓ (redact) | ✓ (warn) | ✓ (warn) | | | ✓ | Development, testing |
| createStrictPolicy() | ✓ (reject) | ✓ (reject) | ✓ (sanitize) | ✓ (enforce) | ✓ | ✓ | Production, high-security |
| createHIPAAPolicy() | ✓ (reject) | ✓ (reject) | ✓ (sanitize) | ✓ (enforce) | ✓ | ✓ (PHI) | Healthcare data |
| createSOC2Policy() | ✓ (reject) | ✓ (reject) | ✓ (sanitize) | ✓ (enforce) | ✓ | ✓ (audit) | Enterprise compliance |
| createGDPRPolicy() | ✓ (redact) | ✓ (warn) | ✓ (warn) | | | ✓ (erasure) | European data processing |

Modes:

redact: Replace secrets/PII with placeholders, continue
warn: Log violation, continue
reject: Block request/response on violation
sanitize: Remove dangerous patterns, continue
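
The four modes differ only in what happens once a violation has been found. A minimal sketch of that dispatch, assuming hypothetical `Violation` and `Outcome` shapes that are not part of the library's API:

```typescript
// Illustrative sketch only: Violation, Outcome, and applyMode are assumptions.
type Mode = "redact" | "warn" | "reject" | "sanitize";

interface Violation {
  kind: string; // e.g. "secret", "xss"
  match: string; // exact substring that triggered the rule
}

type Outcome =
  | { blocked: true; reason: string }
  | { blocked: false; content: string; warnings: string[] };

function replaceEvery(text: string, target: string, replacement: string): string {
  return text.split(target).join(replacement);
}

function applyMode(mode: Mode, content: string, violations: Violation[]): Outcome {
  if (violations.length === 0) return { blocked: false, content, warnings: [] };
  const kinds = violations.map((v) => v.kind);
  switch (mode) {
    case "reject": // block the request or response
      return { blocked: true, reason: `blocked: ${kinds.join(", ")}` };
    case "warn": // log and continue with unchanged content
      return { blocked: false, content, warnings: kinds };
    case "redact": // swap each match for a placeholder and continue
      return {
        blocked: false,
        content: violations.reduce(
          (text, v) => replaceEvery(text, v.match, `[REDACTED:${v.kind}]`),
          content,
        ),
        warnings: [],
      };
    case "sanitize": // strip each match and continue
      return {
        blocked: false,
        content: violations.reduce((text, v) => replaceEvery(text, v.match, ""), content),
        warnings: [],
      };
  }
}
```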

Use a preset as a starting point and override specific fields:

import { createStrictPolicy, wrapAnthropic } from "@openclaw/agentic-security";
import Anthropic from "@anthropic-ai/sdk";

const policy = createStrictPolicy();
policy.networkControl.egress.allowedDomains = ["api.myservice.com"];

const client = wrapAnthropic(new Anthropic(), policy);
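
For intuition about what an egress allowlist such as allowedDomains has to decide, here is a standalone sketch of host matching. The `isEgressAllowed` function is hypothetical, and its rules (subdomains of an allowed domain pass, raw IP literals are blocked) are assumptions about reasonable behavior rather than a description of the library's matcher.

```typescript
// Illustrative sketch only: the matching rules here are assumptions.
function isEgressAllowed(url: string, allowedDomains: string[]): boolean {
  let host: string;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return false; // unparseable URLs are never allowed
  }
  // Block raw IPv4 literals and bracketed IPv6 literals.
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(host) || host.startsWith("[")) return false;
  return allowedDomains.some((entry) => {
    const domain = entry.toLowerCase();
    // Match the exact host or a true subdomain, never a mere suffix or substring.
    return host === domain || host.endsWith("." + domain);
  });
}
```

Comparing parsed hostnames instead of searching the URL string avoids bypasses such as placing the allowed domain in a query parameter or as a prefix of an attacker-controlled host.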

Claude Code CLI Integration

createClaudeCodeSecurity() provides a purpose-built wrapper for Claude Code agents. It intercepts tool calls before execution and sanitizes tool results before they re-enter the conversation. Each workspace session is isolated by a derived session ID so replay attacks across workspaces are blocked.

import { createClaudeCodeSecurity, deriveSessionId } from "@openclaw/agentic-security";

const security = createClaudeCodeSecurity({ model: "claude-sonnet-4-6" });
const sessionId = deriveSessionId({ workspacePath: process.cwd() });

// Before the tool runs:
const result = await security.interceptToolCall({
  sessionId,
  toolType: "bash",
  toolName: "Bash",
  input: { command: "ls" },
  timestamp: Date.now(),
});
if (result.action === "reject") throw new Error(result.errorMessage);

// Execute the tool yourself (here, stdout holds its captured output), then scan the output:
const out = await security.interceptToolResult({
  sessionId,
  toolType: "bash",
  toolName: "Bash",
  output: stdout,
  durationMs: 50,
  timestamp: Date.now(),
});
// Only the "redact" action is handled here; handle any other actions as your policy requires.
const safe = out.action === "redact" ? out.sanitizedOutput : stdout;

Call security.healthCheck() before accepting traffic and security.drain() on SIGTERM.

For the full integration pattern — model presets, session configuration, OTel wiring — see docs/user-guide/index.md.
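
How deriveSessionId() computes its value is not described here. As one plausible construction for a workspace-scoped ID, the sketch below keys an HMAC on a server-side secret, so an ID issued for one workspace does not validate for another. The function names and the `secret` parameter are assumptions.

```typescript
// Illustrative sketch only: one possible derivation, not the library's implementation.
import { createHmac, timingSafeEqual } from "node:crypto";
import { resolve } from "node:path";

// Derive a stable ID from the normalized workspace path and a secret key.
function sketchDeriveSessionId(workspacePath: string, secret: string): string {
  return createHmac("sha256", secret).update(resolve(workspacePath)).digest("hex");
}

// Reject IDs that were derived for a different workspace (cross-workspace replay).
function belongsToWorkspace(
  sessionId: string,
  workspacePath: string,
  secret: string,
): boolean {
  const expected = Buffer.from(sketchDeriveSessionId(workspacePath, secret), "hex");
  const actual = Buffer.from(sessionId, "hex");
  return actual.length === expected.length && timingSafeEqual(actual, expected);
}
```

Normalizing the path first means equivalent spellings of the same directory map to the same session, and the constant-time comparison avoids leaking how much of a guessed ID was correct.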

Health Check and Drain

Use healthCheck() to implement a readiness probe before your process accepts traffic. Use drain() on SIGTERM to let active sessions complete before shutdown.

// Before accepting traffic:
const health = security.healthCheck();
// {
//   rateLimitHeadroom: 1,
//   sessionCount: 0,
//   killSwitchActive: false,
//   circuitBreakerState: "closed"
// }

// On SIGTERM, drain active sessions before shutdown.
// Awaiting covers the case where drain() returns a promise.
process.on("SIGTERM", async () => {
  await security.drain();
  process.exit(0);
});
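
A readiness probe needs to turn the health snapshot into a yes/no answer. The sketch below shows one way to do that using the four fields shown above. The interpretation of each field (for example, that a rateLimitHeadroom of 0 means no capacity) and the `evaluateReadiness` helper are assumptions.

```typescript
// Illustrative sketch only: the readiness rules are assumptions about field semantics.
interface HealthSnapshot {
  rateLimitHeadroom: number;
  sessionCount: number;
  killSwitchActive: boolean;
  circuitBreakerState: string;
}

function evaluateReadiness(health: HealthSnapshot): { ready: boolean; reasons: string[] } {
  const reasons: string[] = [];
  if (health.killSwitchActive) reasons.push("kill switch is active");
  if (health.circuitBreakerState !== "closed") {
    reasons.push(`circuit breaker is ${health.circuitBreakerState}`);
  }
  if (health.rateLimitHeadroom <= 0) reasons.push("no rate-limit headroom");
  return { ready: reasons.length === 0, reasons };
}
```

Returning the reasons alongside the boolean makes a failing probe easier to diagnose from its response body or logs.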

Architecture Overview

The security library operates as an interceptor pipeline that wraps SDK clients. Every request and response flows through a prioritized chain of security checks.

graph LR
    A[User Input] --> B[Interceptor Pipeline]
    B --> C[Request Hooks]
    C --> D[SDK Client]
    D --> E[LLM API]
    E --> F[Response]
    F --> G[Response Hooks]
    G --> H[User Output]

    subgraph "Request Flow (Ascending Priority)"
        C --> C1[1: Kill Switch]
        C1 --> C2[3: Rate Limiting]
        C2 --> C3[5: Tool Security]
        C3 --> C4[7: Network Control]
        C4 --> C5[10: Context Isolation]
        C5 --> C6[20: Prompt Injection Detection]
        C6 --> C7[30: PII Anonymization]
        C7 --> C8[100: Secret Scanning]
    end

    subgraph "Response Flow (Descending Priority)"
        G --> G1[200: Audit Logging]
        G1 --> G2[100: Secret Scanning]
        G2 --> G3[90: Schema Validation]
        G3 --> G4[80: PII De-Anonymization]
        G4 --> G5[70: Content Sanitization]
    end

Interceptor Priority Stack

| Priority | Interceptor | Request | Response | Purpose |
| -------- | -------------------- | ------- | -------- | ------------------------------------------ |
| 1 | Kill Switch | ✓ | | Emergency halt mechanism |
| 3 | Rate Limiting | ✓ | | Prevent DoS/abuse |
| 5 | Tool Security | ✓ | | RBAC, allowlist/denylist, parameter checks |
| 7 | Network Control | ✓ | | Egress filtering, DNS security |
| 10 | Context Isolation | ✓ | | Provider-specific wrapping |
| 20 | Prompt Injection | ✓ | | Detect role confusion, delimiter attacks |
| 30 | PII Anonymization | ✓ | | Tokenize sensitive data (request) |
| 70 | Content Sanitization | | ✓ | Remove SQL/XSS/command injection patterns |
| 80 | PII De-Anonymization | | ✓ | Restore original PII (response) |
| 90 | Schema Validation | | ✓ | Enforce response structure |
| 100 | Secret Scanning | ✓ | ✓ | Detect leaked credentials |
| 200 | Audit Logging | ✓ | ✓ | Record all security events |

Interceptors with lower priority numbers run first on requests and last on responses.
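
The ordering rule can be sketched with a tiny pipeline runner: sort ascending by priority for requests, descending for responses, and skip interceptors that have no hook for the current phase. The `Interceptor` shape and `runPipeline` function are hypothetical, and the toy hooks stand in for the real checks.

```typescript
// Illustrative sketch only: the Interceptor shape and toy hooks are assumptions.
type Hook = (payload: string) => string;

interface Interceptor {
  name: string;
  priority: number;
  onRequest?: Hook;
  onResponse?: Hook;
}

function runPipeline(
  interceptors: Interceptor[],
  phase: "request" | "response",
  payload: string,
): { payload: string; order: string[] } {
  const direction = phase === "request" ? 1 : -1; // ascending vs. descending
  const ordered = [...interceptors].sort(
    (a, b) => direction * (a.priority - b.priority),
  );
  const order: string[] = [];
  let current = payload;
  for (const interceptor of ordered) {
    const hook = phase === "request" ? interceptor.onRequest : interceptor.onResponse;
    if (!hook) continue; // this interceptor does not take part in this phase
    current = hook(current);
    order.push(interceptor.name);
  }
  return { payload: current, order };
}

const swap = (from: string, to: string): Hook => (text) => text.split(from).join(to);

const chain: Interceptor[] = [
  { name: "Secret Scanning", priority: 100,
    onRequest: swap("sk-live-123", "[REDACTED]"),
    onResponse: swap("sk-live-123", "[REDACTED]") },
  { name: "PII De-Anonymization", priority: 80, onResponse: swap("<EMAIL_1>", "alice@corp.com") },
  { name: "PII Anonymization", priority: 30, onRequest: swap("alice@corp.com", "<EMAIL_1>") },
  { name: "Kill Switch", priority: 1, onRequest: (text) => text },
];
```

Running the response phase in the reverse direction means secret scanning sees the model output before PII is restored, so restored values are never mistaken for leaked content.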

Documentation

docs/user-guide/ — full integration patterns, split by topic
- docs/user-guide/index.md — overview, preset option values, and links to each topic
- docs/user-guide/session-hardening.md — workspace isolation, session expiry, replay detection
- docs/user-guide/pii-anonymization.md — format-preserving tokenization, vault retention, de-anonymization
- docs/user-guide/ner-bridge.md — subprocess NER bridge architecture, configuration, data flow
- docs/user-guide/otel.md — span names, attributes, example trace output, Prometheus metrics
docs/api-reference.md — complete API reference for all exported functions and types
docs/setup-guide.md — preset configuration reference with all option values
docs/integration-guide.md — Express, Koa, raw API integration patterns
docs/security-model.md — threat model deep dive and OWASP LLM Top 10 mapping

Development Prerequisites

To build and test this package from a clone of the repository:

Node.js >= 22 — Required. Use nvm or fnm to manage Node versions.
pnpm — Required for workspace-level installs from the repo root (npm install -g pnpm). The package itself can be used with npm or yarn as a consumer.

No other system dependencies are required for the core package. Optional features (NER subprocess bridge, OTel) have peer dependency requirements documented in the user guide.

Contributing

Contributions are welcome. Please open an issue or PR on GitHub.

License

MIT License — see LICENSE for details.

Built with OpenClaw — Security-first LLM application framework.