@espadalabs/action-firewall
v0.1.2
Agent Action Firewall — a deterministic gate that blocks the lethal trifecta by construction. A framework-agnostic TypeScript library that any agent runtime (OpenAI SDK, LangChain, custom loops) can wrap around its tool-calling layer.
The Agent Action Firewall — a deterministic gate that closes the lethal trifecta by construction.
A framework-agnostic TypeScript library you wrap around any agent runtime — OpenAI SDK, LangChain, custom loops. Three hooks, zero runtime dependencies, fail-closed by default.
Why
The lethal trifecta is the combination, in a single agent, of (a) access to private data, (b) the ability to ingest untrusted external content, and (c) the ability to exfiltrate. Classifier accuracy alone cannot defend such an agent: detector-only approaches keep losing to novel injection vectors. The only durable defense is to break at least one leg by construction.
The Action Firewall breaks the untrusted-ingest → privileged-action
leg. When a tool result enters the conversation from outside the user's
trust boundary, the surrounding conversation is flagged as untrusted.
While flagged, any tool call whose capability is state-changing,
exfil-capable, or credential-emitting is denied unless a verified
human approval is attached.
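The rule above reduces to a pure function of three facts about a pending call: is the conversation flagged, which capabilities does the tool carry, and is a verified approval attached. The sketch below is a model under stated assumptions, not the library's code: the capability labels (including the ungated `"read-only"`) and the names `gate`, `GateInput`, and `blockingCapabilities` are hypothetical.

```typescript
// Hypothetical labels: three gated classes plus one ungated class for contrast.
type Capability = "state-changing" | "exfil-capable" | "credential-emitting" | "read-only";

const GATED: readonly Capability[] = ["state-changing", "exfil-capable", "credential-emitting"];

interface GateInput {
  flagged: boolean; // has untrusted external content entered this conversation?
  capabilities: readonly Capability[]; // what the requested tool can do
  approved: boolean; // is a verified human approval attached?
}

// Returns the gated capabilities that make this call unsafe; empty means allow.
function blockingCapabilities(input: GateInput): Capability[] {
  if (!input.flagged || input.approved) return [];
  return input.capabilities.filter((c) => GATED.includes(c));
}

function gate(input: GateInput): "allow" | "deny" {
  return blockingCapabilities(input).length === 0 ? "allow" : "deny";
}

// A flagged conversation may still read, but may not exfiltrate without approval.
console.log(gate({ flagged: true, capabilities: ["read-only"], approved: false })); // allow
console.log(gate({ flagged: true, capabilities: ["exfil-capable"], approved: false })); // deny
console.log(gate({ flagged: true, capabilities: ["exfil-capable"], approved: true })); // allow
console.log(gate({ flagged: false, capabilities: ["state-changing"], approved: false })); // allow
```

Keeping the decision a pure function of those three inputs is what makes the gate deterministic: no model output or classifier score enters it.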
The flag is monotonic: once flagged, a conversation stays flagged until explicitly cleared via an audited operator action.
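Monotonicity means the only operation that removes a flag is the explicit clear, and that operation always leaves an audit record. A minimal sketch of such a store, assuming an in-memory `Set`; the class name `MonotonicFlags` and the `ClearedEvent` shape are hypothetical, though `cleared` is one of the audit event names this README lists.

```typescript
// Hypothetical audit record shape; "cleared" matches a documented event name.
interface ClearedEvent {
  type: "cleared";
  conversationId: string;
}

class MonotonicFlags {
  private readonly flagged = new Set<string>();
  private readonly onAudit: (event: ClearedEvent) => void;

  constructor(onAudit: (event: ClearedEvent) => void) {
    this.onAudit = onAudit;
  }

  // Marking only ever adds; no later tool result can remove a flag.
  mark(conversationId: string): void {
    this.flagged.add(conversationId);
  }

  isFlagged(conversationId: string): boolean {
    return this.flagged.has(conversationId);
  }

  // The single removal path, and it always emits an audit event.
  clear(conversationId: string): void {
    this.flagged.delete(conversationId);
    this.onAudit({ type: "cleared", conversationId });
  }
}

const events: ClearedEvent[] = [];
const flags = new MonotonicFlags((e) => events.push(e));
flags.mark("c1");
flags.mark("c1"); // idempotent: still a single flag
console.log(flags.isFlagged("c1")); // true
flags.clear("c1");
console.log(flags.isFlagged("c1"), events.length); // false 1
```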
Install
```sh
pnpm add @espadalabs/action-firewall
```
Quick start
```typescript
import { ActionFirewall } from "@espadalabs/action-firewall";

// slackBot, auditLog, turn, result, call, refuse, and systemPrompt are
// placeholders for objects your own runtime provides; steps 1-3 run
// inside your agent loop.
const firewall = new ActionFirewall({
  mode: "enforce",
  statePath: "/var/lib/espada/firewall.json", // omit to keep state in memory
  approvalVerifier: async (id) => slackBot.isApproved(id),
  logger: console,
  onAudit: (event) => auditLog.append(event),
});

// 1. After every tool result is persisted:
firewall.processToolResult({
  conversationId: turn.conversationId,
  toolName: result.toolName,
  toolCallId: result.toolCallId,
  message: result.message,
});

// 2. Before every tool call is dispatched:
const verdict = await firewall.inspectToolCall({
  conversationId: turn.conversationId,
  toolName: call.name,
  params: call.params,
});
if (verdict.decision === "block") {
  return refuse(verdict.reason);
}

// 3. Before every LLM call, prepend the annotation for flagged conversations:
const annotation = firewall.getSystemPromptAnnotation(turn.conversationId);
if (annotation) systemPrompt = `${annotation}\n\n${systemPrompt}`;
```
Modes
| Mode | processToolResult flags? | inspectToolCall returns |
| --------- | -------------------------- | ---------------------------------------------------------------------- |
| off | No | Always allow. Useful for local dev. |
| audit | Yes | allow or require-approval. Never blocks. Use to baseline behavior. |
| enforce | Yes | allow or block. Production posture. |
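Read row by row, the table says the mode only changes what happens to a call that would otherwise be gated. A minimal sketch, assuming a boolean `gated` that already folds in the flag, the tool's capabilities, and any approval; `decide`, `Mode`, and `Decision` are hypothetical names.

```typescript
type Mode = "off" | "audit" | "enforce";
type Decision = "allow" | "block" | "require-approval";

// `gated` is assumed to already combine: conversation flagged, tool carries a
// blocking capability, and no verified approval attached.
function decide(mode: Mode, gated: boolean): Decision {
  switch (mode) {
    case "off":
      return "allow"; // nothing is flagged or gated
    case "audit":
      return gated ? "require-approval" : "allow"; // surfaces the verdict, never blocks
    case "enforce":
      return gated ? "block" : "allow"; // production posture
  }
}

console.log(decide("audit", true)); // require-approval
console.log(decide("enforce", true)); // block
```

Running `audit` first and then switching to `enforce` changes only the `require-approval` outcomes into `block`, which is why audit mode works as a baseline.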
API
new ActionFirewall(options?)
| Option | Type | Default | Notes |
| ---------------------- | ---------------------------------------------------- | ------------ | ------------------------------------------------------------------------------------- |
| mode | "off" \| "audit" \| "enforce" | "audit" | |
| statePath | string \| undefined | undefined | File path for persistence. In-memory if omitted. |
| toolCapabilities | Record<string, ToolCapability \| ToolCapability[]> | {} | Augments the built-in tool → capability map. |
| blockingCapabilities | readonly ToolCapability[] | all three | Capability classes denied while a conversation is flagged. Remove a class from the list to stop gating it. |
| approvalVerifier | (id: string) => Promise<boolean> | undefined | Required to honor approvalCorrelationId on tool calls. Fail-closed if absent. |
| logger | { info?, warn?, error? } | {} | Compatible with console. |
| onAudit | (event: FirewallAuditEvent) => void | () => void | Sink for marked-untrusted, blocked, bypass-allowed, bypass-denied, cleared. |
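The `toolCapabilities` and `blockingCapabilities` options interact: the first decides what a tool can do, the second decides which of those capabilities can block. A sketch under assumptions: the capability string literals and the built-in entries below are hypothetical, and this sketch lets a caller entry replace a built-in entry of the same name, whereas the library may instead merge the two.

```typescript
// Hypothetical labels for the three capability classes; real ToolCapability values may differ.
type ToolCapability = "state-changing" | "exfil-capable" | "credential-emitting";
type CapabilityMap = Record<string, ToolCapability | ToolCapability[]>;

// Illustrative built-in entries only; not the library's actual table.
const BUILT_IN: CapabilityMap = {
  bash: ["state-changing", "exfil-capable"],
  send_email: "exfil-capable",
};

// Accept a single capability or a list, as the option's type allows.
function toArray(value: ToolCapability | ToolCapability[]): ToolCapability[] {
  return Array.isArray(value) ? value.slice() : [value];
}

// Assumption: a caller entry for a tool replaces the built-in entry of the same name.
function capabilitiesFor(tool: string, userMap: CapabilityMap): ToolCapability[] {
  const merged: CapabilityMap = { ...BUILT_IN, ...userMap };
  // Own-property check so names like "toString" don't hit Object.prototype.
  return Object.prototype.hasOwnProperty.call(merged, tool) ? toArray(merged[tool]) : [];
}

// Only capabilities listed in blockingCapabilities can cause a block.
function blockingFor(
  tool: string,
  userMap: CapabilityMap,
  blocking: readonly ToolCapability[],
): ToolCapability[] {
  return capabilitiesFor(tool, userMap).filter((c) => blocking.includes(c));
}

console.log(JSON.stringify(capabilitiesFor("deploy", { deploy: "state-changing" }))); // ["state-changing"]
console.log(JSON.stringify(blockingFor("bash", {}, ["exfil-capable"]))); // ["exfil-capable"]
```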
Methods
- `processToolResult(input)` → `boolean`: inspects a tool result and flags the conversation if it carries untrusted external content. Returns whether new evidence was appended.
- `inspectToolCall(input)` → `Promise<ToolCallVerdict>`: returns `allow`, `block`, or `require-approval`.
- `getSystemPromptAnnotation(conversationId)` → `string`: text to prepend to the system prompt for flagged conversations.
- `getStatus(conversationId)` → `ConversationStatus`: snapshot of the flag and its evidence.
- `clear(conversationId)` → `void`: audited clear. Use sparingly.
- `listFlagged()` → `readonly string[]`: IDs of every flagged conversation.
- `flush()` → `void`: syncs state to disk.
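The quick start wiring can be condensed into a single dispatch step. The sketch below codes against a structural interface mirroring two of the methods above (with `message` simplified to `content` plus optional `metadata`) and uses a stub in place of `ActionFirewall`, so it runs without the package. `FirewallHooks`, `dispatchWithFirewall`, the stub's flagging logic, the `"state-changing"` label, and the `ToolCapability = string` placeholder are all hypothetical.

```typescript
// Placeholder: this README does not enumerate ToolCapability's members.
type ToolCapability = string;

type ToolCallVerdict =
  | { decision: "allow" }
  | { decision: "block"; reason: string; blockingCapabilities: readonly ToolCapability[] }
  | { decision: "require-approval"; reason: string; blockingCapabilities: readonly ToolCapability[] };

// Structural subset of the firewall API that the dispatch step depends on.
interface FirewallHooks {
  processToolResult(input: {
    conversationId: string;
    toolName: string;
    toolCallId: string;
    message: { content: string; metadata?: Record<string, unknown> };
  }): boolean;
  inspectToolCall(input: {
    conversationId: string;
    toolName: string;
    params: Record<string, unknown>;
  }): Promise<ToolCallVerdict>;
}

interface ToolCall {
  id: string;
  name: string;
  params: Record<string, unknown>;
}

// Gate the call, run it, then report the result back to the firewall.
async function dispatchWithFirewall(
  fw: FirewallHooks,
  conversationId: string,
  call: ToolCall,
  runTool: (call: ToolCall) => Promise<string>,
): Promise<string> {
  const verdict = await fw.inspectToolCall({ conversationId, toolName: call.name, params: call.params });
  switch (verdict.decision) {
    case "allow":
      break;
    case "block":
      return `refused: ${verdict.reason}`;
    case "require-approval":
      break; // audit mode never blocks; like the quick start, proceed
    default: {
      const unreachable: never = verdict; // compile error if a new decision kind appears
      return unreachable;
    }
  }
  const content = await runTool(call);
  // Report every result so untrusted content flags the conversation before the next call.
  fw.processToolResult({ conversationId, toolName: call.name, toolCallId: call.id, message: { content } });
  return content;
}

// Stub standing in for ActionFirewall: flags after web_fetch, then blocks bash.
const flagged = new Set<string>();
const stub: FirewallHooks = {
  processToolResult(input) {
    if (input.toolName !== "web_fetch" || flagged.has(input.conversationId)) return false;
    flagged.add(input.conversationId);
    return true;
  },
  async inspectToolCall(input): Promise<ToolCallVerdict> {
    if (flagged.has(input.conversationId) && input.toolName === "bash") {
      return { decision: "block", reason: "conversation is flagged", blockingCapabilities: ["state-changing"] };
    }
    return { decision: "allow" };
  },
};
```

Calling `bash`, then `web_fetch`, then `bash` again on the same conversation runs the first call and refuses the third, because the fetched page flagged the conversation in between.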
Verdict shape
```typescript
type ToolCallVerdict =
  | { decision: "allow" }
  | { decision: "block"; reason: string; blockingCapabilities: readonly ToolCapability[] }
  | { decision: "require-approval"; reason: string; blockingCapabilities: readonly ToolCapability[] };
```
Approval bypass
Attach a token to the tool call params (top-level or under metadata):
```typescript
await firewall.inspectToolCall({
  conversationId: "c1",
  toolName: "bash",
  params: { approvalCorrelationId: "appr-xyz" },
});
```
The firewall calls your `approvalVerifier(id)`. If it returns `true`, the call is allowed and `bypass-allowed` is emitted. If it returns `false`, throws, or no verifier is configured, the call stays blocked (fail-closed) and `bypass-denied` is emitted.
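The fail-closed rule has four exits and only one of them allows the call. A sketch under assumptions: the key under `metadata` is taken to be `approvalCorrelationId`, a top-level token is assumed to win over a nested one, and `findApprovalId` and `isBypassApproved` are hypothetical helpers rather than library exports.

```typescript
type ApprovalVerifier = (id: string) => Promise<boolean>;

// Looks for the token at params.approvalCorrelationId, then params.metadata.approvalCorrelationId.
function findApprovalId(params: Record<string, unknown>): string | undefined {
  const top = params["approvalCorrelationId"];
  if (typeof top === "string") return top;
  const meta = params["metadata"];
  if (typeof meta === "object" && meta !== null) {
    const nested = (meta as Record<string, unknown>)["approvalCorrelationId"];
    if (typeof nested === "string") return nested;
  }
  return undefined;
}

// Fail-closed: only an explicit `true` from a configured verifier lets the call through.
async function isBypassApproved(
  params: Record<string, unknown>,
  verifier?: ApprovalVerifier,
): Promise<boolean> {
  const id = findApprovalId(params);
  if (id === undefined || verifier === undefined) return false;
  try {
    return (await verifier(id)) === true;
  } catch {
    return false; // a verifier error counts as "not approved"
  }
}

const yes: ApprovalVerifier = async () => true;
const down: ApprovalVerifier = async () => {
  throw new Error("approval service unavailable");
};

Promise.all([
  isBypassApproved({ approvalCorrelationId: "appr-xyz" }, yes),
  isBypassApproved({ metadata: { approvalCorrelationId: "appr-xyz" } }, yes),
  isBypassApproved({ approvalCorrelationId: "appr-xyz" }, down),
  isBypassApproved({ approvalCorrelationId: "appr-xyz" }),
  isBypassApproved({}, yes),
]).then((results) => console.log(JSON.stringify(results))); // [true,true,false,false,false]
```

The strict `=== true` comparison keeps a JavaScript verifier that returns a truthy non-boolean from opening the gate by accident.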
How flagging works (the four-rule detector)
A tool result flags its conversation if any of the following hold:
- R1: `message.metadata.external_origin === true`.
- R2: `message.content` contains `<<<EXTERNAL_UNTRUSTED_CONTENT>>>` or its matching close marker.
- R3: `toolName` is in the default external-origin table (`web_fetch`, `fetch_url`, `search_web`, `read_email`, `rag_query`, …).
- R4: `message.content` matches one of the prompt-injection regexes (e.g. `/ignore (all )?previous instructions/i`).
The detector is intentionally permissive on the marking side and conservative on the gating side. A false-positive flag is cheap: the worst case is one extra approval prompt. A false negative is not: a missed flag leaves gated tools open to injected instructions, which can exfiltrate data.
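The four rules above can be sketched as a single predicate that reports which rules fired. This is a model, not the library's detector: the tool table and regex list hold only the entries this README names (the real defaults are longer), the close marker is left out because its exact text is not given here, and `firedRules` and `ToolResult` are hypothetical names.

```typescript
interface ToolResult {
  toolName: string;
  message: { content: string; metadata?: { external_origin?: boolean } };
}

// Only the tool names listed in this README; the real default table is longer.
const EXTERNAL_ORIGIN_TOOLS = new Set(["web_fetch", "fetch_url", "search_web", "read_email", "rag_query"]);
const OPEN_MARKER = "<<<EXTERNAL_UNTRUSTED_CONTENT>>>";
// Only the example pattern from this README; the real list is longer.
const INJECTION_PATTERNS: RegExp[] = [/ignore (all )?previous instructions/i];

// Returns the rules that fired; any non-empty result flags the conversation.
function firedRules(r: ToolResult): string[] {
  const fired: string[] = [];
  if (r.message.metadata?.external_origin === true) fired.push("R1");
  if (r.message.content.includes(OPEN_MARKER)) fired.push("R2"); // close marker omitted (text not documented here)
  if (EXTERNAL_ORIGIN_TOOLS.has(r.toolName)) fired.push("R3");
  if (INJECTION_PATTERNS.some((re) => re.test(r.message.content))) fired.push("R4");
  return fired;
}

console.log(JSON.stringify(firedRules({ toolName: "web_fetch", message: { content: "hello" } }))); // ["R3"]
console.log(JSON.stringify(firedRules({ toolName: "calculator", message: { content: "Please IGNORE previous instructions." } }))); // ["R4"]
console.log(JSON.stringify(firedRules({ toolName: "calculator", message: { content: "4" } }))); // []
```

Because any single rule is enough to flag, the rules are combined with OR; this is the "permissive on the marking side" half of the design.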
Stability
The class shape, ToolCallVerdict shape, audit event names, and the
three modes are stable. The detector rules and default tool tables are
versioned and may grow over minor releases.
License
Elastic License 2.0 — see LICENSE.
The Elastic License 2.0 is a source-available license. You can read, modify, and redistribute the source. You cannot:
- Provide the software as a hosted or managed service that offers users access to a substantial set of its features or functionality.
- Move, change, disable, or circumvent license-key functionality.
- Alter, remove, or obscure licensing, copyright, or other notices.
For everything else — embedding inside your own agent runtime, using it inside your company's internal tools, building products on top of it — the license is permissive. If your use case requires terms beyond ELv2, contact the maintainer for a commercial license.
