@allenwu06/agentguard
v0.1.0
Static AST linter that flags missing guards on AI-agent tool calls — unguarded shell/exec, filesystem, network, and dynamic-eval sinks with no input validation or allowlist. Conservative (low false positives). Ships as a CLI, an ESLint plugin, and a GitHu
agentguard
A building inspector for the missing safety railings around your AI agent's tools.
When you give an AI agent tools, the AI decides what to pass into them — and those values can end up running shell commands, touching your files, hitting the network, or executing code. The safety railing is a check on the way in: validate the input, check it against an allow/deny list, or ask a human first. agentguard reads your code and points at the dangerous tool actions that have no such railing — while staying quiet about the ones you've already guarded, so it doesn't nag you about safe code.
npx @allenwu06/agentguard ./src

It comes three ways so it fits wherever you already work: a standalone
command-line tool (plain / --json / --sarif output), an ESLint
plugin (it slots into the linter most JS projects already run), and a thin
GitHub Action. No install ceremony, no config required, no API key, no
network, no AI involved. It never runs the code it checks — it only reads
it.
Why this exists
When you give an LLM (a large language model — the kind of AI behind chat assistants) a set of tools, every tool handler is a way in for an attacker: the model — or anything that can steer it, like a prompt injection hidden in a web page it fetched, a poisoned document, or a malicious sub-agent — chooses the arguments, and those arguments flow into a shell command, the filesystem, the network, or code execution. The single most effective defense is a railing on that path: validate the input, check it against an allow/deny list, or require a human to confirm before the action happens.
A Show HN post that scanned 16 open-source AI-agent repos reported that 76% of tool calls had no guards at all.
That 76% figure is from that outside scan — this tool did not measure or reproduce it. agentguard also does not publish any accuracy number for itself (a code-reading tool like this cannot honestly claim one — see Limitations). It exists so you can run a deliberately cautious check on your own code, in seconds, and see your unguarded tool actions before they ship.
What it detects
This is the developer reference. agentguard reads your code with a real JS/TS parser: it builds an AST (an abstract syntax tree, the structured form of your code that a compiler uses) instead of matching raw text. It then flags something only when all of these are true:
- the function is an agent tool handler, registered through an agent framework API it recognizes (MCP server.tool/setRequestHandler, LangChain DynamicStructuredTool/StructuredTool/tool(fn,{…}), Vercel AI SDK tool({execute}), LlamaIndex FunctionTool.from, defineTool/createTool/registerTool-style wrappers);
- a dangerous operation (a "sink", meaning a spot where input causes a real side effect: a shell command, a file write, a network call, code execution) is reachable inside it, or in a same-file helper it calls, followed one hop;
- the dangerous argument actually traces back to the handler's model-controlled input (a fixed/constant argument never counts, so it never fires);
- no railing is present on that path.
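To make the four conditions concrete, here is a minimal sketch of the kind of handler they describe. The local registerTool wrapper and the git_log tool are hypothetical stand-ins, not a real SDK API. Whether agentguard flags this exact shape depends on its recognizers, so treat it as an illustration, not a guaranteed finding.

```typescript
import { exec } from "node:child_process";

// Hypothetical stand-in for a registerTool-style wrapper; not a real SDK API.
type ToolHandler = (args: { ref: string }) => Promise<string>;
const tools = new Map<string, ToolHandler>();

function registerTool(name: string, handler: ToolHandler): void {
  tools.set(name, handler);
}

// Same-file helper, one hop away from the handler: the sink lives here.
function runGit(ref: string): Promise<string> {
  return new Promise((resolve, reject) => {
    // Sink: a shell command string built from a model-controlled value.
    exec(`git log -n 5 ${ref}`, (err, stdout) => (err ? reject(err) : resolve(stdout)));
  });
}

// Handler: `args.ref` is chosen by the model, and nothing validates it,
// checks it against an allowlist, or asks a human before runGit runs.
registerTool("git_log", async (args) => runGit(args.ref));
```

All four conditions hold in this sketch: a registered handler, a shell sink one hop away in the same file, an argument that comes from the handler input, and no railing on the path. Replacing `args.ref` with a constant such as `"main"` would break the third condition.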
| Rule | Severity | What it flags |
|------|----------|---------------|
| AG001 | critical | Unguarded child_process exec/spawn/execFile/fork from tool input |
| AG002 | high | Unguarded sensitive filesystem write (write/unlink/rm/rename/chmod, incl. fs-extra) from tool input |
| AG003 | high | Unguarded outbound network (fetch/axios/http(s)/got/undici) to a tool-input URL |
| AG004 | critical | Unguarded dynamic code execution (eval / new Function / vm) of tool input |
| AG005 | high | Unguarded sensitive filesystem read (path traversal / secret exfiltration) of a tool-input path |
Each finding reports: rule id, severity, file:line:column, the exact
sink, why it fired, and a concrete remediation.
What counts as a railing (a "guard") — and makes agentguard stay silent
A "guard" is the railing referred to above. agentguard recognizes three families of them. If one is on the path, the finding does not fire:
- Input validation / schema (G1): the tool is registered with a schema (Zod / JSON-schema / parameters/inputSchema), or the handler runs .parse/.safeParse/.validate/.assert/.cast (Zod / Joi / Yup / ajv / superstruct / io-ts) on the input. A value reassigned from a validation result is treated as untainted.
- Allowlist / denylist (G2): a membership/predicate decision on the tainted value (Set.has, Array.includes, a regex .test, startsWith/endsWith, …) that can stop the sink (an if + throw/return/reject, a guard clause, a ternary) before the sink.
- Human confirmation (G3): an awaited/used call whose name matches confirm/approve/requireApproval/askUser/humanInTheLoop/checkpoint/ … before the sink.
A check that appears after the dangerous call does not count — order and reachability are respected.
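As an illustration of the allowlist family (G2) and of why ordering matters, the following sketch guards a made-up git_log handler. The handler name, the ALLOWED_REFS set, and its contents are assumptions for the example, and framework registration is omitted for brevity.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical allowlist for a made-up git_log tool.
const ALLOWED_REFS = new Set(["main", "develop"]);

function gitLogHandler(args: { ref: string }): string {
  // G2: a membership check on the model-controlled value, with a throw
  // that stops execution before the sink is reached.
  if (!ALLOWED_REFS.has(args.ref)) {
    throw new Error(`ref not allowed: ${args.ref}`);
  }

  // Sink, reached only for allowlisted refs. execFileSync passes arguments
  // as an array and does not spawn a shell by default.
  return execFileSync("git", ["log", "-n", "5", args.ref], { encoding: "utf8" });

  // If the same `if` sat after the execFileSync call instead, the command
  // would already have run, which is why a late check does not count.
}
```

The throw is what makes the check a railing: a membership test whose result is ignored could not stop the sink.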
The conservative philosophy (the whole point)
A linter that cries wolf gets uninstalled. Low false positives is not a nice-to-have here — it is the product.
agentguard deliberately accepts false negatives to keep false positives near zero:
- It is silent on guarded code, on non-agent code, and on test / spec / fixture files by default.
- It only fires when the model-controlled value plausibly reaches the sink (literal/constant arguments never fire).
- It does not guess unknown SDK shapes. If your tool is registered through an API it doesn't recognize, it stays silent rather than guess.
- It is intra-module and conservative about data flow. A guard in another file it can't see will produce a false negative — that is the intended trade.
If agentguard ever flags code you believe is safe, that's the bug that matters most to us — please report it. We tune toward silence, not noise.
Install / run
Zero-install via npx. A local path is scanned fully offline.
# scan your agent's source
npx @allenwu06/agentguard ./src
# multiple paths
npx @allenwu06/agentguard ./src ./packages/agent
# machine-readable for tooling / CI
npx @allenwu06/agentguard ./src --json
# SARIF v2.1.0 (a standard scan-results format GitHub understands) —
# upload it so findings show in GitHub's code scanning tab
npx @allenwu06/agentguard ./src --sarif > agentguard.sarif
# gate on severity: any high or critical finding fails the command
npx @allenwu06/agentguard ./src --fail-on high # (high is the default)
# include test files in the scan (off by default)
npx @allenwu06/agentguard ./src --include-tests

Exit codes (so you can wire it into CI)
CI ("continuous integration" — the automated checks that run on every code push) reads these exit codes:
| code | meaning |
|------|---------|
| 0 | scan completed, gate not tripped. Also internal error — see below. |
| 1 | scan completed and a finding met/exceeded --fail-on (default high). |
| 2 | usage error (bad arguments). |
Fails open on purpose: if agentguard itself breaks (a bug, a folder it
can't read, a file it can't parse), it prints a loud error and exits 0. A
linter that is itself broken must never block every build. A single
unparseable file degrades the scan; it never aborts it. If you want it to be a
hard stop, make it a required check with --fail-on set, so a missing
or zero result is visible rather than silently passing.
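A CI wrapper could interpret these codes roughly as sketched below. The function names and the reportWasWritten flag are hypothetical. This README does not say whether a report file is produced after an internal error, so checking for the report is one possible policy, not documented behavior.

```typescript
// Exit codes as documented in the table above.
type ExitMeaning =
  | "passed-or-internal-error"
  | "gate-tripped"
  | "usage-error"
  | "undocumented";

function interpretExitCode(code: number): ExitMeaning {
  switch (code) {
    case 0:
      // Fails open: 0 alone does not prove that a clean scan happened.
      return "passed-or-internal-error";
    case 1:
      return "gate-tripped";
    case 2:
      return "usage-error";
    default:
      return "undocumented";
  }
}

// Hypothetical stricter policy for a required check: block on any non-zero
// code, and also when exit 0 arrives without the report the job expected.
function shouldBlockBuild(code: number, reportWasWritten: boolean): boolean {
  if (code !== 0) return true;
  return !reportWasWritten;
}
```

For example, `shouldBlockBuild(0, false)` returns `true`, which makes a scan that exited 0 without producing its expected output visible instead of passing silently.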
Use it as an ESLint plugin
The plugin reuses the exact same detection core as the CLI — ESLint's
Program node is an ESTree Program, so there is no re-parse and no
behavioral drift between the CLI and the plugin.
ESLint flat config (eslint.config.js, ESLint 9+):
import agentguard from "@allenwu06/agentguard/eslint-plugin";

export default [
  // turn the rule on as an error everywhere
  agentguard.configs.recommended,
  // …or wire it by hand for more control:
  {
    plugins: { agentguard },
    rules: {
      "agentguard/no-unguarded-tool-call": ["error", { includeTests: false }],
    },
  },
];

To lint TypeScript, use @typescript-eslint/parser for your .ts/.tsx
files as usual; plain espree covers .js/.jsx. The single rule
agentguard/no-unguarded-tool-call reports one message per finding, keyed
by the rule id (AG001…AG005) so you can eslint-disable a specific
class without silencing the rest. There is no autofix — choosing the
right guard (which allowlist? validate or confirm?) is a human decision, and
a wrong autofix here is worse than the finding.
Use it as a GitHub Action
# .github/workflows/agentguard.yml
name: agentguard
on: [push, pull_request]
jobs:
  agentguard:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: portfolio-foundry/agent-guard-lint@v0 # or a pinned SHA
        with:
          path: "src"
          fail-on: "high"
          sarif: "true"
      - uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: agentguard.sarif

The Action emits inline PR annotations, sets outputs
(total/critical/high/medium/low/gate/sarif-file), and fails
open on internal error. No tokens or secrets required.
Configuration (optional)
agentguard works with zero config. To tune it, add
.agentguardrc.json at the repo root (or pass --config <file>):
{
  "severities": { "AG003": "medium", "AG005": "off" },
  "ignore": ["legacy/**", "scripts/*.js"],
  "allow": ["AG-1a2b3c4d"]
}

- severities: remap a rule's severity, or set it to "off".
- ignore: path globs to skip (anchored; * = non-/, ** = anything). Note: the moat is low false positives, not silencing, so prefer fixing or allow-listing a reviewed finding over a broad ignore.
- allow: stable finding ids (the AG-xxxxxxxx in --json) that you have reviewed and accepted; they stop gating without hiding new findings.
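The glob semantics stated for ignore (anchored, * stops at /, ** crosses it) can be illustrated with a small translator to a regular expression. This is a sketch of those stated semantics only. The globToRegExp name is hypothetical, and agentguard's actual matcher (including what the anchor is relative to) may differ.

```typescript
// Translate a glob into an anchored RegExp using the semantics stated above:
//   *  matches any run of characters except "/"
//   ** matches anything, including "/"
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        pattern += ".*";
        i++; // consume the second "*"
      } else {
        pattern += "[^/]*";
      }
    } else {
      // Escape regex metacharacters so they match literally.
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  // Anchor at both ends so the whole path has to match.
  return new RegExp(`^${pattern}$`);
}

const legacy = globToRegExp("legacy/**");
const scripts = globToRegExp("scripts/*.js");
```

Under these semantics `legacy` matches `legacy/a/b.js` at any depth, while `scripts` matches `scripts/build.js` but not `scripts/sub/build.js` or `src/scripts/build.js`.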
Limitations (read this)
agentguard is conservative static analysis, not a prover. It is honest about what it cannot do:
- Static analysis cannot prove reachability. Whether a sink is actually reachable from untrusted input at runtime is undecidable in general. agentguard uses AST heuristics and a deliberately shallow, intra-module taint model; it is intentionally tuned to miss rather than to cry wolf.
- One-hop, same-module helper following only. A sink behind a deep call chain, or in another module, or behind dynamic dispatch, is a false negative by design.
- Known-SDK recognition only. If your tools are registered through an API agentguard doesn't recognize, it stays silent (no guessing). Open an issue with the shape and we'll consider adding it conservatively.
- Guards it can't see are assumed absent → it errs toward not firing, but a guard implemented in a way it doesn't model (e.g. a custom validator with an unrecognized name) can still yield a false negative.
- Language scope: JavaScript and TypeScript (.js .jsx .mjs .cjs .ts .tsx .mts .cts). Minified/bundled/.d.ts files are skipped (a bundle is not the audited source).
- No autofix, no severity inflation, no invented metrics. It reports what it can defend.
A clean agentguard run means "no unguarded tool-call sink that this conservative analysis can see" — not "this agent is safe." Use it as one fast layer, not the only one.
What it does NOT do
No network. No telemetry. No API key. No LLM/AI. It never executes the code
it scans, never resolves your tsconfig/project, and the analysis core is a
pure function (source → findings) — which is why the whole thing is unit
tested offline.
Development
npm ci
npm test # full suite, offline, no key
npm run dogfood # agentguard scans its own source → must be 0 findings

The test suite proves the three pillars on committed fixtures:
clean = 0, vulnerable = every rule fires at the right location &
severity, borderline-legit = 0 false positives (the moat). It also
runs the ESLint plugin through the official RuleTester and validates the
SARIF v2.1.0 shape GitHub code scanning ingests.
Feedback
False positives and false negatives are the most useful thing you can send.
See FEEDBACK.md. The zero-friction path: open an issue and
add the agentguard-feedback label (there's an issue template). What you
write is captured and read exactly as written — not summarized.
License
MIT.
