@espadalabs/action-firewall

v0.1.2

Published 17 days ago

Agent Action Firewall — a deterministic gate that blocks the lethal trifecta by construction. A framework-agnostic TypeScript library that any agent runtime (OpenAI SDK, LangChain, custom loops) can wrap around its tool-calling layer.

Vulnerabilities: 0 High, 0 Medium, 0 Low

Maintainer: saifaldin14

Keywords: espada action-firewall agent-security prompt-injection lethal-trifecta taint-tracking llm-security ai-safety

The Agent Action Firewall — a deterministic gate that closes the lethal trifecta by construction.

A framework-agnostic TypeScript library you wrap around any agent runtime — OpenAI SDK, LangChain, custom loops. Three hooks, zero runtime dependencies, fail-closed by default.

Why

The lethal trifecta is an agent that simultaneously has (a) access to private data, (b) the ability to ingest untrusted external content, and (c) the ability to exfiltrate. It cannot be defended against by classifier accuracy alone: detector-only approaches keep losing to novel injection vectors. The only durable defense is to break at least one leg by construction.

The Action Firewall breaks the untrusted-ingest → privileged-action leg. When a tool result enters the conversation from outside the user's trust boundary, the surrounding conversation is flagged as untrusted. While flagged, any tool call whose capability is state-changing, exfil-capable, or credential-emitting is denied unless a verified human approval is attached.

The flag is monotonic: once flagged, a conversation stays flagged until explicitly cleared via an audited operator action.
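The gating rule described above can be modelled in a few lines. This is a minimal sketch, not the library's implementation: the `TaintGate` class, its method names, and the `Capability` string values are hypothetical and exist only to illustrate a monotonic flag that gates privileged tool calls.

```typescript
// Illustrative model only; not the library's internals.
// The capability names mirror the three classes named in the text.
type Capability = "state-changing" | "exfil-capable" | "credential-emitting";

class TaintGate {
  // Conversations that have ingested untrusted content.
  private readonly flagged = new Set<string>();

  // Monotonic: marking can only add to the set, never remove from it.
  markUntrusted(conversationId: string): void {
    this.flagged.add(conversationId);
  }

  // The only way back to "trusted" is an explicit operator action.
  clearByOperator(conversationId: string): void {
    this.flagged.delete(conversationId);
  }

  // A call is permitted if the conversation is clean, the tool has no
  // gated capability, or a verified human approval is attached.
  permits(
    conversationId: string,
    capabilities: readonly Capability[],
    humanApproved: boolean,
  ): boolean {
    if (!this.flagged.has(conversationId)) return true;
    if (capabilities.length === 0) return true;
    return humanApproved;
  }
}
```

In this model a read-only tool (empty capability list) keeps working in a flagged conversation, while anything privileged waits for a human.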

Install

pnpm add @espadalabs/action-firewall

Quick start

import { ActionFirewall } from "@espadalabs/action-firewall";

const firewall = new ActionFirewall({
  mode: "enforce",
  statePath: "/var/lib/espada/firewall.json",
  approvalVerifier: async (id) => slackBot.isApproved(id),
  logger: console,
  onAudit: (event) => auditLog.append(event),
});

// 1. After every tool result is persisted:
firewall.processToolResult({
  conversationId: turn.conversationId,
  toolName: result.toolName,
  toolCallId: result.toolCallId,
  message: result.message,
});

// 2. Before every tool call is dispatched:
const verdict = await firewall.inspectToolCall({
  conversationId: turn.conversationId,
  toolName: call.name,
  params: call.params,
});
if (verdict.decision === "block") {
  return refuse(verdict.reason);
}

// 3. Before every LLM call:
const annotation = firewall.getSystemPromptAnnotation(turn.conversationId);
if (annotation) systemPrompt = `${annotation}\n\n${systemPrompt}`;

Modes

| Mode | processToolResult flags? | inspectToolCall returns |
| --------- | -------------------------- | ---------------------------------------------------------------------- |
| off | No | Always allow. Useful for local dev. |
| audit | Yes | allow or require-approval. Never blocks. Use to baseline behavior. |
| enforce | Yes | allow or block. Production posture. |
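The mode table can be read as a small decision function. The sketch below is an assumption-laden model, not the library's code: `decisionFor` and the `gated` parameter are hypothetical names, where `gated` stands for "the conversation is flagged and the tool has a blocking capability".

```typescript
type FirewallMode = "off" | "audit" | "enforce";
type Decision = "allow" | "require-approval" | "block";

// Maps a mode and a gating condition to the decision listed in the table.
// `gated` is a hypothetical input: flagged conversation AND blocking capability.
function decisionFor(mode: FirewallMode, gated: boolean): Decision {
  // "off" always allows; an ungated call is allowed in every mode.
  if (mode === "off" || !gated) return "allow";
  // "audit" reports but never blocks; "enforce" blocks.
  return mode === "audit" ? "require-approval" : "block";
}
```

Starting in audit mode and switching to enforce once the require-approval rate looks reasonable changes only the last branch, which is why audit mode is useful for baselining.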

API

`new ActionFirewall(options?)`

| Option | Type | Default | Notes |
| ---------------------- | ---------------------------------------------------- | ------------ | ------------------------------------------------------------------------------------- |
| mode | "off" \| "audit" \| "enforce" | "audit" | |
| statePath | string \| undefined | undefined | File path for persistence. In-memory if omitted. |
| toolCapabilities | Record<string, ToolCapability \| ToolCapability[]> | {} | Augments the built-in tool → capability map. |
| blockingCapabilities | readonly ToolCapability[] | all three | Narrow to allow / deny specific capability classes. |
| approvalVerifier | (id: string) => Promise<boolean> | undefined | Required to honor approvalCorrelationId on tool calls. Fail-closed if absent. |
| logger | { info?, warn?, error? } | {} | Compatible with console. |
| onAudit | (event: FirewallAuditEvent) => void | () => void | Sink for marked-untrusted, blocked, bypass-allowed, bypass-denied, cleared. |

Methods

processToolResult(input) → boolean — inspects a tool result and flags the conversation if it carries untrusted external content. Returns whether new evidence was appended.
inspectToolCall(input) → Promise<ToolCallVerdict> — returns allow, block, or require-approval.
getSystemPromptAnnotation(conversationId) → string — text to prepend to the system prompt for flagged conversations.
getStatus(conversationId) → ConversationStatus — snapshot of flag + evidence.
clear(conversationId) → void — audited clear. Use sparingly.
listFlagged() → readonly string[] — IDs of every flagged conversation.
flush() → void — sync state to disk.

Verdict shape

type ToolCallVerdict =
  | { decision: "allow" }
  | { decision: "block"; reason: string; blockingCapabilities: readonly ToolCapability[] }
  | { decision: "require-approval"; reason: string; blockingCapabilities: readonly ToolCapability[] };
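Because the verdict is a discriminated union on `decision`, callers can handle it with an exhaustive `switch`. The sketch below restates the type so it compiles on its own. The `ToolCapability` string values are an assumption (this section does not spell them out), and `describeVerdict` is a hypothetical helper.

```typescript
// Assumed values; the section names the three capability classes but not their exact spelling.
type ToolCapability = "state-changing" | "exfil-capable" | "credential-emitting";

type ToolCallVerdict =
  | { decision: "allow" }
  | { decision: "block"; reason: string; blockingCapabilities: readonly ToolCapability[] }
  | { decision: "require-approval"; reason: string; blockingCapabilities: readonly ToolCapability[] };

// Turns a verdict into a short action string for logs or UI.
function describeVerdict(verdict: ToolCallVerdict): string {
  switch (verdict.decision) {
    case "allow":
      return "dispatch";
    case "block":
      return `refuse: ${verdict.reason} [${verdict.blockingCapabilities.join(", ")}]`;
    case "require-approval":
      return `ask a human: ${verdict.reason} [${verdict.blockingCapabilities.join(", ")}]`;
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a new decision is added.
      const unreachable: never = verdict;
      return unreachable;
    }
  }
}
```

The `never` assignment means a future fourth decision would surface as a type error instead of silently falling through.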

Approval bypass

Attach a token to the tool call params (top-level or under metadata):

await firewall.inspectToolCall({
  conversationId: "c1",
  toolName: "bash",
  params: { approvalCorrelationId: "appr-xyz" },
});

The firewall calls your approvalVerifier(id). If it returns true, the call is allowed and bypass-allowed is emitted. If it returns false, throws, or no verifier is configured, the call stays blocked (fail-closed) and bypass-denied is emitted.
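The fail-closed behavior described above can be sketched as follows. This is not the library's code: `findApprovalId` and `isApproved` are hypothetical helpers, and the sketch assumes the key under `metadata` is also named `approvalCorrelationId`.

```typescript
type ApprovalVerifier = (id: string) => Promise<boolean>;

// Looks for the token at the top level first, then under `metadata`.
// Assumption: the nested key has the same name as the top-level one.
function findApprovalId(params: Record<string, unknown>): string | undefined {
  const direct = params["approvalCorrelationId"];
  if (typeof direct === "string") return direct;
  const metadata = params["metadata"];
  if (typeof metadata === "object" && metadata !== null) {
    const nested = (metadata as Record<string, unknown>)["approvalCorrelationId"];
    if (typeof nested === "string") return nested;
  }
  return undefined;
}

// Fail-closed: only an explicit `true` from the verifier lets the call through.
async function isApproved(
  params: Record<string, unknown>,
  verifier?: ApprovalVerifier,
): Promise<boolean> {
  const id = findApprovalId(params);
  if (id === undefined || verifier === undefined) return false;
  try {
    return (await verifier(id)) === true;
  } catch {
    return false; // a throwing verifier counts as a denial
  }
}
```

Treating a missing verifier, a missing token, a thrown error, and a `false` result identically keeps every failure path on the "stay blocked" side.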

How flagging works (the four-rule detector)

A tool result flags its conversation if any of the following hold:

R1 — message.metadata.external_origin === true.
R2 — message.content contains <<<EXTERNAL_UNTRUSTED_CONTENT>>> or its matching close marker.
R3 — toolName is in the default external-origin table (web_fetch, fetch_url, search_web, read_email, rag_query, …).
R4 — message.content matches one of the prompt-injection regexes (e.g. /ignore (all )?previous instructions/i).

The detector is intentionally permissive on the marking side and conservative on the gating side. A false-positive flag is cheap: the worst case is one extra approval prompt. A false negative at the gate can let data be exfiltrated.
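The four rules can be approximated with a small pure function. This is a sketch under stated assumptions, not the shipped detector: `matchedRules` and `ToolMessage` are hypothetical names, the tool table contains only the names listed in R3 (the real table is longer), the pattern list contains only the one regex quoted in R4, and the close marker is not checked because its exact text is not given here.

```typescript
// Hypothetical message shape covering only the fields the rules mention.
interface ToolMessage {
  content: string;
  metadata?: { external_origin?: boolean };
}

// Only the tool names listed in the text; the actual default table has more entries.
const EXTERNAL_ORIGIN_TOOLS: ReadonlySet<string> = new Set([
  "web_fetch",
  "fetch_url",
  "search_web",
  "read_email",
  "rag_query",
]);

const OPEN_MARKER = "<<<EXTERNAL_UNTRUSTED_CONTENT>>>";

// Only the single example pattern quoted in the text.
const INJECTION_PATTERNS: readonly RegExp[] = [/ignore (all )?previous instructions/i];

// Returns the ids of the rules that fired; an empty array means "do not flag".
function matchedRules(toolName: string, message: ToolMessage): string[] {
  const hits: string[] = [];
  if (message.metadata?.external_origin === true) hits.push("R1");
  if (message.content.includes(OPEN_MARKER)) hits.push("R2");
  if (EXTERNAL_ORIGIN_TOOLS.has(toolName)) hits.push("R3");
  if (INJECTION_PATTERNS.some((re) => re.test(message.content))) hits.push("R4");
  return hits;
}
```

Any non-empty result would flag the conversation, which matches the "any of the following" wording: the rules are a union, so adding a rule can only widen what gets marked.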

Stability

The class shape, ToolCallVerdict shape, audit event names, and the three modes are stable. The detector rules and default tool tables are versioned and may grow over minor releases.

License

Elastic License 2.0 — see LICENSE.

The Elastic License 2.0 is a source-available license. You can read, modify, and redistribute the source. You cannot:

Provide the software as a hosted or managed service that offers users access to a substantial set of its features or functionality.
Move, change, disable, or circumvent license-key functionality.
Alter, remove, or obscure licensing, copyright, or other notices.

For everything else — embedding inside your own agent runtime, using it inside your company's internal tools, building products on top of it — the license is permissive. If your use case requires terms beyond ELv2, contact the maintainer for a commercial license.
