whattheagent

v0.3.1

Published

2 months ago

Local-first capability discovery and governance CLI for AI agent workspaces.

rosh1801

WhatTheAgent

Why this exists

X-obsessed engineers and degens are installing every new AI tool, skill, and MCP server they can get their hands on. That energy is how the ecosystem moves, but as agents start running real parts of how we ship code, manage incidents, and touch infrastructure, "I added this thing yesterday" stops being a good answer to "what can your agent actually do?" We have to be responsible about it. The risk is rarely any single tool. It's the combinations:

A skill that can read .env files. Fine.
A skill that can POST to webhooks. Fine.
The same skill can do both. That's the shape of credential exfiltration, regardless of intent.

Existing tools don't catch this. SAST scans your code. SCA scans your dependencies. Neither understands what your agent can do once you give it a skill, an MCP server, and a script. There's no good answer to "my agent has these 47 capabilities — what should I worry about?"

WhatTheAgent is that answer. One command surfaces the capability chains that matter — credential_access + external_send → Data Exfiltration, execute_code + network_access → Remote Execution — with the exact files and lines, plus a one-shot way to acknowledge intentional ones (wta ack) so future scans only flag what changed.

npm install -g whattheagent
wta understand . --chat            # phone-readable summary
wta understand . --output .wta --open   # local HTML report

Catches what single-capability scanners miss. Each tool sees one risk; chains are emergent.
Local and static. No login, no upload, no LLM, no script execution, no MCP server startup.
Built for AI workflows. Emits human HTML, agent-readable JSON, a chat-friendly summary for personal agents, and a fix plan for Codex / Claude Code / Cursor / OpenClaw / Hermes.

What you'll see

WhatTheAgent doesn't dump every capability into one big alarm list. Findings land in one of three tiers:

Inventory — "your skill reads files." Useful to know, no action needed.
Needs attention — "your agent has Burp wired up via MCP." Often legitimate, but you should acknowledge it once. Approve in policy or scope it down.
Risk chain — "this skill can read .env AND post to a webhook." That's the data-exfil shape, regardless of intent. Add a guardrail before the next run.

The point isn't to flag everything. The point is to make intentional capabilities cheap to acknowledge and unintentional combinations impossible to miss.
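The chain idea can be pictured as a rule that fires only when one component holds every capability in a pair. The following is a minimal sketch, not WhatTheAgent's implementation: the `Component` and `ChainRule` types are hypothetical, and the rule table contains only the two chains named in this README. How the tool chooses between Needs attention and Risk chain for a given component is not described here, so the sketch covers chain matching only.

```typescript
// Hypothetical types for illustration; not WhatTheAgent's real schema.
interface Component {
  id: string;
  capabilities: string[];
}

interface ChainRule {
  chain: string;
  requires: string[];
}

// Only the two chains named in this README; the real rule set is presumably larger.
const CHAIN_RULES: ChainRule[] = [
  { chain: "Data Exfiltration", requires: ["credential_access", "external_send"] },
  { chain: "Remote Execution", requires: ["execute_code", "network_access"] },
];

// Return the name of every chain whose required capabilities all sit on one component.
function detectChains(component: Component): string[] {
  const have = new Set(component.capabilities);
  return CHAIN_RULES
    .filter((rule) => rule.requires.every((cap) => have.has(cap)))
    .map((rule) => rule.chain);
}

// Example components (ids are made up for the sketch).
const readerOnly: Component = { id: "skill.env-reader", capabilities: ["credential_access"] };
const invoiceReview: Component = {
  id: "skill.invoice-review",
  capabilities: ["credential_access", "external_send"],
};

console.log(detectChains(readerOnly));    // no chains: a single capability is inventory
console.log(detectChains(invoiceReview)); // one chain: Data Exfiltration
```

Neither capability is alarming alone; the finding only appears once both land on the same component.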

"Is Burp risky? It's a legitimate tool."

Yes — and yes. The flag is correct: Burp Suite can execute code and does access the network, and an LLM driving it has a remote-execution-capable tool one prompt away. WhatTheAgent surfaces it under Needs attention, not as malicious. Acknowledge it once in your policy file:

expected:
  - component: "mcp.burp"
    capability: "execute_code"
    reason: "Burp Suite MCP — security testing tool, intentional."

After that, Burp moves to "Expected" and only changes to its capabilities reappear.
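One way to picture the effect of an `expected` entry: findings whose (component, capability) pair appears in the policy move to Expected, and everything else stays under Needs attention. This is a sketch under assumptions, with hypothetical `Finding` and `ExpectedEntry` types that mirror the YAML fields above; it is not the tool's real data model.

```typescript
// Hypothetical shapes mirroring the YAML fields above; not the tool's real types.
interface ExpectedEntry {
  component: string;
  capability: string;
  reason: string;
}

interface Finding {
  component: string;
  capability: string;
}

// Split findings into those covered by an expected entry and those still needing review.
function partitionByPolicy(findings: Finding[], expected: ExpectedEntry[]) {
  const approved = new Set(expected.map((e) => `${e.component}::${e.capability}`));
  const split = { expected: [] as Finding[], needsAttention: [] as Finding[] };
  for (const finding of findings) {
    const key = `${finding.component}::${finding.capability}`;
    (approved.has(key) ? split.expected : split.needsAttention).push(finding);
  }
  return split;
}

const burpPolicy: ExpectedEntry[] = [
  { component: "mcp.burp", capability: "execute_code", reason: "Burp Suite MCP, intentional." },
];

// A later scan where Burp has gained a capability the policy does not mention.
const laterScan: Finding[] = [
  { component: "mcp.burp", capability: "execute_code" },
  { component: "mcp.burp", capability: "network_access" },
];

const split = partitionByPolicy(laterScan, burpPolicy);
console.log(split.expected.length, split.needsAttention.length); // 1 1
```

The acknowledged capability stays quiet, while the capability that appeared afterwards is the only thing left to review.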

Install and run

npm install -g whattheagent
wta understand . --output .wta --open    # write report.html and open it
wta plan . --for-codex                   # hand a fix plan to your coding agent

Sanity check: wta --help (or the longer whattheagent --help) lists every command. Requires Node 20 or newer.

Choose your path

WhatTheAgent has two modes (personal agents and workspace stations) and two ways to integrate it (agent instructions and GitHub Actions):

| Mode | Use this for | Start here |
|---|---|---|
| Personal agents | OpenClaw, Hermes, local skills, memory, scripts, MCP servers | Personal Agents |
| Workspace stations | Codex, Claude Code, Cursor, Kiro, Windsurf, VS Code, team repos | Workspace Stations |
| Agent instructions | Paste into Claude, Codex, OpenClaw, Hermes, or another agent | Agent Instructions |
| GitHub Actions | CI reports for PRs and agent workspaces | GitHub Actions |

The full docs hub is in readme/.

Core loop:

User understands and approves.
Agent implements.
WhatTheAgent verifies.

Drop into an agent

Two ways:

wta instructions --for-claude     # or --for-codex / --for-openclaw / --for-hermes

…produces a copy-paste prompt that tells your agent to baseline, summarize, suggest guardrails, and re-check whenever something changes.

A ready-made Hermes / OpenClaw skill lives at skills/whattheagent-safety-check.skill.md. It orchestrates wta diff-baseline --chat, posts the message verbatim to the user, and translates approve / guardrail / remove replies into wta ack (or wta ack-batch for "approve all").

GitHub Actions

Run WhatTheAgent in CI and upload the local .wta report:

name: WhatTheAgent

on:
  pull_request:
  push:
    branches: [main]

jobs:
  whattheagent:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm install -g whattheagent
      - run: wta understand . --output .wta --json --no-color
      - run: wta instructions --for-codex --output .wta/codex-instructions.md
      - uses: actions/upload-artifact@v4
        with:
          name: whattheagent-report
          path: .wta/

A reusable example lives at:

examples/github-action/whattheagent.yml

Development From Source

git clone https://github.com/Rosh1106/WhatTheAgent.git
cd WhatTheAgent
npm install
npm run build
npm link

Local development without linking:

npm install
npm run build
npm run dev -- scan examples/risky-agent
npm run dev -- instructions --for-claude

Both binaries work after global install or linking:

whattheagent scan .
wta scan .

Commands

Core commands:

wta understand . --output .wta
wta compatibility
wta instructions --for-claude
wta plan . --for-codex
wta graph . --json
wta diff old.json new.json

Personal-agent approval flow:

wta understand . --profile hermes --output .wta
wta baseline . --profile hermes --output .wta
wta diff-baseline . --profile hermes --output .wta
wta init-policy . --profile openclaw

Agent-friendly flags:

wta understand . --json --no-color --quiet --output .wta

For CI gating and GitHub Code Scanning:

# SARIF 2.1.0 to stdout — pipe directly to github/codeql-action/upload-sarif
wta understand . --sarif > results.sarif

# Or rely on the side-effect file (always written when --output is set)
wta understand . --output .wta            # writes .wta/results.sarif
# upload .wta/results.sarif from there.

# Exit non-zero so CI fails on findings at or above the chosen severity
wta understand . --fail-on critical       # only fail on critical chains/gaps
wta understand . --fail-on high           # high or critical
wta understand . --fail-on medium         # medium, high, or critical

--fail-on accepts none (default), low, medium, high, critical. SARIF-style aliases (note, warning, error) are accepted too. The verdict prints to stderr so it never pollutes a --sarif or --json stdout pipeline.
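The threshold rule reads as "fail when any finding is at or above the chosen level". Below is a minimal sketch of that comparison, assuming the four severities are ordered low < medium < high < critical. The function name is hypothetical, and the SARIF-style aliases are left out because their mapping to severities is not stated here.

```typescript
type Severity = "low" | "medium" | "high" | "critical";
type FailOn = "none" | Severity;

// Ascending order, matching the values listed for --fail-on.
const ORDER: Severity[] = ["low", "medium", "high", "critical"];

// True when at least one finding is at or above the threshold.
// "none" (the default) never fails the run.
function shouldFail(found: Severity[], failOn: FailOn): boolean {
  if (failOn === "none") return false;
  const threshold = ORDER.indexOf(failOn);
  return found.some((severity) => ORDER.indexOf(severity) >= threshold);
}

console.log(shouldFail(["medium"], "high"));             // false
console.log(shouldFail(["medium", "critical"], "high")); // true
console.log(shouldFail(["critical"], "none"));           // false
```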

Skip extra paths beyond the built-in defaults:

wta understand . --exclude vendor --exclude '**/*.generated.*'
wta understand . --exclude 'vendor,scratch,**/build/**'   # quoted so the shell does not expand the glob

--exclude is repeatable, comma-separated, and accepts either a bare directory name (auto-wrapped to **/<name>/**) or any glob. Defaults already skip node_modules, dist, .venv, __pycache__, .claude/plugins/marketplaces, .claude/plugins/cache, and similar caches.
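The normalization described above can be sketched as follows. The function name is hypothetical, and the test for "bare directory name" (no glob characters and no slash) is an assumption; the tool's actual heuristic may differ.

```typescript
// Expand repeated and comma-separated --exclude values into glob patterns.
// Assumption: a value with no glob characters and no slash is a bare directory name.
function normalizeExcludes(values: string[]): string[] {
  const patterns: string[] = [];
  for (const value of values) {
    for (const part of value.split(",")) {
      const trimmed = part.trim();
      if (trimmed === "") continue;
      const looksLikeGlob = /[*?\[\]{}\/]/.test(trimmed);
      patterns.push(looksLikeGlob ? trimmed : `**/${trimmed}/**`);
    }
  }
  return patterns;
}

// Repeated flags: --exclude vendor --exclude '**/*.generated.*'
console.log(normalizeExcludes(["vendor", "**/*.generated.*"]));

// One comma-separated flag: --exclude 'vendor,scratch,**/build/**'
console.log(normalizeExcludes(["vendor,scratch,**/build/**"]));
```

The first call yields `**/vendor/**` and `**/*.generated.*`; the second yields `**/vendor/**`, `**/scratch/**`, and `**/build/**`.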

Open the rendered HTML report directly after a scan:

wta understand . --open

Acknowledge intentional capabilities

After your first scan, expect to see entries that are powerful but intentional (Burp MCP, GitHub MCP, your CI release script). Two ways to move them out of Needs attention and into Expected:

# Bulk: scan once and seed wta.policy.yaml with every detected capability as expected.
wta init-policy . --from-scan --profile personal-agent

# Targeted: ack a single component (or a single capability of one).
wta ack mcp.burp execute_code --reason "Burp Suite, security testing tool"
wta ack mcp.github --reason "Read-only GitHub MCP, approved"

Without an explicit capability, wta ack reads the current scan and acknowledges every capability that component has. Re-running an ack is a no-op — duplicates are detected by (component, capability).
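The two behaviours in that paragraph, idempotent acks and fan-out when no capability is given, can be sketched like this. The types and function names are hypothetical stand-ins for the policy file and the scan result, not WhatTheAgent's internals.

```typescript
// Hypothetical in-memory stand-in for the expected list in wta.policy.yaml.
interface ExpectedEntry {
  component: string;
  capability: string;
  reason: string;
}

// Append an entry unless the (component, capability) pair already exists.
// Returns true only when the policy actually changed.
function ack(policy: ExpectedEntry[], entry: ExpectedEntry): boolean {
  const exists = policy.some(
    (e) => e.component === entry.component && e.capability === entry.capability,
  );
  if (exists) return false;
  policy.push(entry);
  return true;
}

// With no explicit capability, acknowledge every capability the scan detected
// for the component. Returns how many entries were newly added.
function ackAll(policy: ExpectedEntry[], component: string, detected: string[], reason: string): number {
  let added = 0;
  for (const capability of detected) {
    if (ack(policy, { component, capability, reason })) added += 1;
  }
  return added;
}

const demoPolicy: ExpectedEntry[] = [];
const burpAck = { component: "mcp.burp", capability: "execute_code", reason: "Burp Suite, security testing tool" };

console.log(ack(demoPolicy, burpAck)); // true: first ack adds the entry
console.log(ack(demoPolicy, burpAck)); // false: re-running is a no-op
console.log(ackAll(demoPolicy, "mcp.burp", ["execute_code", "network_access"], "approved")); // 1
```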

For agent integrations that compose the reason at runtime (and want to avoid shell-quoting headaches), pipe it on stdin instead of passing --reason:

echo "internal finance pipeline, sends invoices to staging webhook" \
  | wta ack skill.invoice-review --reason-from-stdin

To approve many components in one call (the "approve all" intent from the chat skill), pipe a JSON array to wta ack-batch:

cat <<'JSON' | wta ack-batch --reason "approved during onboarding"
[
  { "componentId": "mcp.burp" },
  { "componentId": "skill.invoice-review", "capability": "external_send", "reason": "specific override" },
  { "componentId": "mcp.github" }
]
JSON

Each item may set its own capability and reason; otherwise it inherits the batch --reason and fans out to every capability detected for that component. Output reports added / already-present / skipped counts (--json returns the structured form for the agent to log).
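The inheritance and fan-out rules for a batch can be sketched as below. This is an illustration under assumptions: the types and the `ackBatch` function are hypothetical, the `detected` map stands in for the latest scan, and counting a component with no detected capabilities as "skipped" is a guess at what that counter means.

```typescript
interface BatchItem {
  componentId: string;
  capability?: string;
  reason?: string;
}

interface PolicyEntry {
  component: string;
  capability: string;
  reason: string;
}

interface BatchCounts {
  added: number;
  alreadyPresent: number;
  skipped: number;
}

function ackBatch(
  policy: PolicyEntry[],
  detected: Record<string, string[]>, // component id -> capabilities from the latest scan
  items: BatchItem[],
  batchReason: string,
): BatchCounts {
  const counts: BatchCounts = { added: 0, alreadyPresent: 0, skipped: 0 };
  for (const item of items) {
    // An explicit capability wins; otherwise fan out to everything detected.
    const capabilities = item.capability ? [item.capability] : (detected[item.componentId] ?? []);
    if (capabilities.length === 0) {
      counts.skipped += 1; // assumption: nothing known to acknowledge for this component
      continue;
    }
    for (const capability of capabilities) {
      const exists = policy.some((e) => e.component === item.componentId && e.capability === capability);
      if (exists) {
        counts.alreadyPresent += 1;
        continue;
      }
      // A per-item reason overrides the batch-level reason.
      policy.push({ component: item.componentId, capability, reason: item.reason ?? batchReason });
      counts.added += 1;
    }
  }
  return counts;
}

// Made-up scan result for the three components in the JSON above.
const scanResult: Record<string, string[]> = {
  "mcp.burp": ["execute_code", "network_access"],
  "skill.invoice-review": ["credential_access", "external_send"],
  "mcp.github": ["network_access"],
};
const batchItems: BatchItem[] = [
  { componentId: "mcp.burp" },
  { componentId: "skill.invoice-review", capability: "external_send", reason: "specific override" },
  { componentId: "mcp.github" },
];
const batchPolicy: PolicyEntry[] = [];

console.log(ackBatch(batchPolicy, scanResult, batchItems, "approved during onboarding"));
```

With this made-up scan, the first run adds four entries (two for `mcp.burp`, one each for the others); running the same batch again adds nothing and reports four already present.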

Chat-style summary for personal agents (Hermes / OpenClaw / Telegram bots)

If your agent talks to you over chat, report.html is the wrong UI. Use --chat:

wta understand . --chat             # phone-readable markdown to stdout
wta understand . --chat --json      # structured { message, items[], actions } for the agent
wta diff-baseline . --chat --json   # same shape, only newly added skills

The chat output looks like this:

🔴 1 new skill · 1 new risk chain

invoice-review (Skill)
   skills/invoice-review/SKILL.md
   credential_access → external_send
   data exfiltration
   Component can read credentials and send data externally.

What do you want to do?
   ✅ approve — I trust this, add to policy
   🛡  guardrail — require approval / scope it down
   🚫 remove — delete it

Both outputs are also written to .wta/chat-message.md (the markdown above) and .wta/chat-actions.json (per-item {approve, guardrail, remove} commands keyed to wta ack).

A ready-to-drop-in skill that orchestrates this conversation is at skills/whattheagent-safety-check.skill.md. It tells the agent: run wta diff-baseline --chat --json, post message verbatim to the user, listen for "approve / guardrail / remove" intent, and run the matching command from actions[]. It explicitly forbids approving without a user reason, deleting files, or paraphrasing the message — the chat output is already designed to fit a phone screen.
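The reply-handling rule in that skill can be sketched as a small lookup with one guard: never approve without a user reason. Everything here is hypothetical, including the `ItemActions` shape (the exact JSON schema of the actions is not shown in this README) and the placeholder command strings; only the approve command form is taken from the ack examples above.

```typescript
type Intent = "approve" | "guardrail" | "remove";

// Hypothetical per-item shape: one ready-to-run command string per intent.
interface ItemActions {
  approve: string;
  guardrail: string;
  remove: string;
}

// Reduce a reply to one of the three intents, or null if it is unclear.
function parseIntent(reply: string): Intent | null {
  const word = reply.trim().toLowerCase();
  if (word === "approve" || word === "guardrail" || word === "remove") return word;
  return null;
}

// Pick the command to run. Returns null when the agent should ask again,
// including when the user approves without giving a reason.
function commandFor(reply: string, userReason: string, itemActions: ItemActions): string | null {
  const intent = parseIntent(reply);
  if (intent === null) return null;
  if (intent === "approve" && userReason.trim() === "") return null;
  return itemActions[intent];
}

// Placeholder commands; real values would come from the actions in the JSON output.
const exampleActions: ItemActions = {
  approve: "wta ack skill.invoice-review --reason-from-stdin",
  guardrail: "<guardrail command from chat-actions.json>",
  remove: "<remove command from chat-actions.json>",
};

console.log(commandFor("approve", "", exampleActions));                 // null: reason required
console.log(commandFor("approve", "finance pipeline", exampleActions)); // the approve command
```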

understand writes:

.wta/
  understand.json
  capability-graph.json
  fix-plan.md
  report.html
  agent-context.json

The report is split into:

detected setup
what your agent can do
needs attention
expected or acknowledged capabilities
suggested fixes
coding-agent fix plan

MCP servers are shown directly as MCP servers in reports and summaries.

For the current known-client path table, see Compatibility.

Workspace Detection

WhatTheAgent automatically detects workspace surfaces from files it can see:

generic MCP: .mcp.json, mcp.json
Cursor MCP: .cursor/mcp.json
VS Code MCP: .vscode/mcp.json
Claude Desktop MCP: claude_desktop_config.json
skills: SKILL.md
scripts: scripts/**/*.py|js|ts|sh
policy: wta.policy.yaml, .wta/policy.*
CI: GitHub workflows that run wta or whattheagent

The client list is intentionally simple: WhatTheAgent checks known config and skills paths, parses MCP server configs when present, and reports only the surfaces it can prove from local files.
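The path list above amounts to a lookup from file path to surface. Here is a rough sketch of that lookup, assuming workspace-relative paths with forward slashes. The function name and the returned labels are hypothetical, and whether the tool matches these files only at the workspace root or at any depth is not stated here. CI detection is omitted because it depends on workflow file contents, not just paths.

```typescript
// Map a workspace-relative path to the surface it indicates, or null if none applies.
function classifySurface(path: string): string | null {
  const fileName = path.split("/").pop() ?? "";
  if (path === ".cursor/mcp.json") return "Cursor MCP";
  if (path === ".vscode/mcp.json") return "VS Code MCP";
  if (fileName === "claude_desktop_config.json") return "Claude Desktop MCP";
  if (path === ".mcp.json" || path === "mcp.json") return "generic MCP";
  if (fileName === "SKILL.md") return "skill";
  if (/^scripts\/.+\.(py|js|ts|sh)$/.test(path)) return "script";
  if (path === "wta.policy.yaml" || path.startsWith(".wta/policy.")) return "policy";
  return null;
}

console.log(classifySurface(".cursor/mcp.json"));               // Cursor MCP
console.log(classifySurface("skills/invoice-review/SKILL.md")); // skill
console.log(classifySurface("scripts/deploy/release.sh"));      // script
console.log(classifySurface("README.md"));                      // null
```

A path that matches nothing yields no surface, which fits the "reports only the surfaces it can prove from local files" stance.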

List known client paths:

wta compatibility
wta compatibility --json

Continuous Check Loop

For personal agents:

wta baseline . --profile personal-agent --output .wta
wta diff-baseline . --profile personal-agent --output .wta

Run the diff daily, or whenever a new skill or MCP server is added. The agent instruction should summarize new capabilities and ask whether to accept them or add guardrails.
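Conceptually, the diff step reports capabilities present now that were absent from the baseline. The sketch below assumes both snapshots reduce to a map from component id to capability list; the `CapabilityMap` type and function name are hypothetical, and the real baseline files may hold much more than this.

```typescript
// Hypothetical snapshot shape: component id -> capabilities seen in that scan.
type CapabilityMap = Record<string, string[]>;

interface NewCapability {
  component: string;
  capability: string;
}

// List every (component, capability) pair present now but absent from the baseline.
function newSinceBaseline(baseline: CapabilityMap, current: CapabilityMap): NewCapability[] {
  const fresh: NewCapability[] = [];
  for (const component of Object.keys(current)) {
    const before = new Set(baseline[component] ?? []);
    for (const capability of current[component]) {
      if (!before.has(capability)) fresh.push({ component, capability });
    }
  }
  return fresh;
}

const baselineSnapshot: CapabilityMap = { "mcp.github": ["network_access"] };
const todaySnapshot: CapabilityMap = {
  "mcp.github": ["network_access"],
  "skill.invoice-review": ["credential_access", "external_send"],
};

// Only the newly installed skill shows up; the unchanged MCP server stays quiet.
console.log(newSinceBaseline(baselineSnapshot, todaySnapshot));
```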

Policy

expected:
  - component: "mcp.github-readonly"
    capability: "network_access"
    reason: "GitHub read-only MCP server needs network access to api.github.com."

Policy does not hide inventory. It moves approved capabilities out of "needs attention" so users can focus on real changes.

Static and Local First

WhatTheAgent runs locally. It requires no login and never uploads scan data, calls an API, uses an LLM, executes scripts, or starts MCP servers. See docs/ROADMAP.md#non-goals for the explicit list of things WhatTheAgent does not (and will not) try to do.

Example

npm run dev -- understand examples/risky-agent --output .wta
npm run dev -- plan examples/risky-agent --for-codex
npm run dev -- baseline examples/hermes-personal-agent --profile hermes --output .wta
npm run dev -- instructions --for-claude

The example workspace intentionally triggers:

external_send
credential_access
execute_code
network_access
data exfiltration and remote execution risk chains

Additional fixtures cover common review cases:

npm run dev -- understand examples/benign-agent
npm run dev -- understand examples/cursor-agent
npm run dev -- understand examples/claude-desktop-agent
npm run dev -- understand examples/vscode-agent
npm run dev -- understand examples/expected-github-tool
npm run dev -- understand examples/risky-finance-agent
npm run dev -- understand examples/critical-payment-agent

benign-agent shows low-noise inventory and ordinary observations
expected-github-tool shows an expected MCP server declared by policy
risky-finance-agent triggers credential plus external-send risk
critical-payment-agent triggers payment and order-placement risk

Status

WhatTheAgent is pre-1.0. The CLI surface, scan output schema, and policy YAML format may still change. Detection patterns will tighten over time as more workspaces are scanned and false-positive cases are reported.

What's stable today:

wta understand, wta scan, wta graph, wta diff
wta plan, wta instructions, wta compatibility
wta init-policy, wta ack, wta ack-batch
wta baseline, wta diff-baseline
--chat, --open, --exclude, --from-scan, --reason-from-stdin
The HTML report, SVG visual chains, and chat-summary output formats

What's explicitly out of scope: sandbox capability probing and runtime enforcement. See docs/ROADMAP.md#non-goals for the reasoning. WhatTheAgent stays local and static. If you need a sandbox, use gVisor, nsjail, or Docker; if you need runtime enforcement, use the agent runtime's own controls.

See docs/ROADMAP.md for what's planned next.

Tests

npm install
npm test           # 173 tests, ~700ms
npm run typecheck
npm run build

Test coverage includes risk classification, chain detection, sensitivity scoring, finding lifecycle, MCP and skill parsers, secret redaction, SVG and HTML report stability (with HTML-injection escape), the chat-summary builder, the policy-mutation engine (ack + ack-batch), and end-to-end scans of the example fixtures.

Contributing

WhatTheAgent is open to contributions. Start with CONTRIBUTING.md — it has the dev setup, the test rules ("no PR without a regression test for whatever you fixed or added"), and the commit-style guide.

If you have a security report, please follow SECURITY.md instead of opening a public issue.

For day-to-day questions, the readme/ directory has audience-specific docs:

readme/personal-agents.md — for OpenClaw / Hermes / personal-agent users
readme/workspace-stations.md — for Codex / Claude Code / Cursor / VS Code repos
readme/agent-instructions.md — copy-paste prompts for agents
readme/compatibility.md — known-client paths and MCP config locations

License

MIT.