hackmyagent

v0.25.0

Published

5 days ago

Find it. Break it. Fix it. The hacker's toolkit for AI agents.

ecolibria

ai agent security mcp claude cursor skills hardening scanner attack benchmark runtime-protection

HackMyAgent

OpenA2A: CLI · HackMyAgent · Secretless · AIM · Browser Guard · DVAA

Security scanner, red-team toolkit, and behavioural simulator for AI agents. Apache 2.0.

Quick start

npx hackmyagent secure

  my-project  v1.0.0 · library · 47 files analyzed
  3 critical issues found

  Security  ━━━━━━━━━━━━━━━━━━━━ 42/100

  ── Observations ────────────────────────────────────────────
  Surfaces    library · 47 files
  Checks      209 static · 12 semantic (NanoMind AST) · 0 skipped
  Categories  credentials (3 critical) · MCP (2 high) · 18 others clear
  Verdict     Not safe to ship. Fix 3 critical issues before using this in production.

  ── Findings ────────────────────────────────────────────────
  │ CRITICAL  Exposed API key in .env
  │ .env:3
  │ Anthropic API key (sk-ant-api03-****) detected in plaintext.
  │ Verify: sed -n '3p' '.env'
  │ Fix: hackmyagent secure --fix

No config files. No flags required. Exit code 1 if any critical or high finding fires.

What it finds

209 static checks across 44 categories. Credentials, MCP configs, OpenClaw and NemoClaw, Unicode steganography, CVEs, governance, supply chain, memory and RAG poisoning, agent identity, sandbox escape.
29 NanoMind semantic checks. Every artifact (skill, MCP config, SOUL.md, system prompt) compiles into an Abstract Security Tree (AST), and seven AST analyzers run against that tree: capability, credential, governance, scope, prompt, code, stego. Pattern matching misses undeclared capabilities, constraint weakness, scope mismatches, and scanner-evasion attempts; AST queries catch them. (29 is the size of the fixed semantic-check catalog. The Checks line in scan output, e.g. 12 semantic (NanoMind AST) above, reports the number of artifacts compiled in that particular run, not the catalog size.)
164 adversarial payloads across 16 categories. Prompt injection, jailbreak, data exfiltration, capability abuse, context manipulation, MCP and A2A exploitation, memory weaponisation, context window, supply chain, tool shadow, parser differential, persistent agent, fake tool, context lifecycle, policy enforcement integrity.
20-probe behavioural simulation under --deep. Observes what a skill actually does, not only what it declares.
Self-securing. Every binary verifies itself on startup against an embedded SHA-256 manifest. Post-install tampered binaries enter QUARANTINE mode (exit code 3) with a per-file forensics report. Symlink-redirected manifests are rejected.
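The self-verification idea above can be pictured with a small sketch. This is not HackMyAgent's implementation: the manifest shape (path mapped to a hex digest) and the function names are assumptions made for illustration, and only Node's built-in crypto module is used.

```typescript
import { createHash } from "node:crypto";

// Hypothetical manifest shape: relative file path -> expected hex SHA-256 digest.
type Manifest = Record<string, string>;

function sha256Hex(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Returns every manifest path whose content is missing or hashes differently.
function findTampered(manifest: Manifest, files: ReadonlyMap<string, string>): string[] {
  const tampered: string[] = [];
  Object.keys(manifest).forEach((path) => {
    const content = files.get(path);
    if (content === undefined || sha256Hex(content) !== manifest[path]) {
      tampered.push(path);
    }
  });
  return tampered;
}

// Illustrative usage: a one-file install that is modified after the manifest was built.
const exampleManifest: Manifest = { "dist/cli.js": sha256Hex("console.log('ok')") };
const exampleFiles = new Map([["dist/cli.js", "console.log('patched')"]]);
console.log(findTampered(exampleManifest, exampleFiles));
```

A real verifier would also have to protect the manifest itself, which is why the text above mentions rejecting symlink-redirected manifests.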

Full catalogue: docs/SECURITY_CHECKS.md.

Install

npm

npx hackmyagent secure          # run without installing
npm install -g hackmyagent      # global install
npm install --save-dev hackmyagent

Requires Node.js 18 or later.

Homebrew

brew install opena2a-org/tap/hackmyagent

From source

git clone https://github.com/opena2a-org/hackmyagent.git
cd hackmyagent
npm install
npm run build
node dist/cli.js secure

Verifying what was installed

Every release publishes via npm Trusted Publishing with SLSA v1 provenance. No long-lived NPM_TOKEN. GitHub Actions exchanges its OIDC token with npm at publish time.

npm view hackmyagent dist.attestations --json
# Expects non-empty result with predicateType "https://slsa.dev/provenance/v1"
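
To automate the check above, a script could parse the JSON printed by npm view and look for the SLSA v1 predicate type. The sketch below is a hedged example: it deliberately does not assume the exact shape of the attestations object and instead searches any nested predicateType field. The function names are hypothetical.

```typescript
const SLSA_V1 = "https://slsa.dev/provenance/v1";

// Recursively searches a parsed JSON value for predicateType === SLSA_V1.
function hasSlsaV1Provenance(value: unknown): boolean {
  if (Array.isArray(value)) return value.some((item) => hasSlsaV1Provenance(item));
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    return Object.keys(record).some(
      (key) => (key === "predicateType" && record[key] === SLSA_V1) || hasSlsaV1Provenance(record[key]),
    );
  }
  return false;
}

// Takes the raw stdout of the npm view command; empty output means no attestations.
function attestationsShowSlsaV1(rawStdout: string): boolean {
  const trimmed = rawStdout.trim();
  if (trimmed === "") return false;
  return hasSlsaV1Provenance(JSON.parse(trimmed));
}
```

The raw string would typically come from running the npm view command through a child process and capturing stdout.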

Scan anything

hackmyagent check <target> accepts most of the surfaces below. secure scans your own project, scan inventories external infrastructure, and scan-soul scans governance.

| Surface | Command | What gets scanned |
|---|---|---|
| Your own project | hackmyagent secure | 209 static checks + NanoMind on current directory |
| A local directory | hackmyagent check ./my-agent/ | tree + auto-detected artifacts |
| An npm package | hackmyagent check express | downloads tarball, scans before you install |
| A PyPI package | hackmyagent check pip:requests | downloads sdist, scans before you install |
| A GitHub repo | hackmyagent check getsentry/sentry-mcp | clones, scans, reports |
| A published skill | hackmyagent check @publisher/skill | signature verification + semantic checks |
| A local skill directory | hackmyagent check ./my-skill/ | skill files + SOUL.md + manifest |
| An MCP server config | hackmyagent check ./my-mcp-server/ | MCP config + declared tools + scope + dependencies |
| An A2A agent card | hackmyagent check ./my-agent/ | agent-card capabilities + identity |
| A URL tarball | hackmyagent check https://ex.com/pkg.tar.gz | downloads, scans |
| External infrastructure | hackmyagent scan example.com | external AI-endpoint inventory |
| Governance (SOUL.md) | hackmyagent scan-soul | SOUL.md against OASB-2 behavioural controls |

secure vs check vs red-team vs attack

secure: your own project. Full static + semantic scan, auto-fix option, designed for CI and recurring use.
check: something you don't own yet. Pre-install trust check for any surface above.
red-team: adaptive attacks against a specific skill, MCP, or SOUL. You've scanned it; now see if it resists.
attack: test a live endpoint or local simulation with 164 pre-built adversarial payloads.

Commands

`secure` (security scan)

hackmyagent secure                            # scan current directory
hackmyagent secure --fix                      # auto-fix issues with rollback
hackmyagent secure --fix --dry-run            # preview fixes
hackmyagent secure --deep                     # full behavioural simulation (20 probes)
hackmyagent secure --static-only              # static checks only, faster
hackmyagent secure --ignore CRED-001,GIT-002  # skip specific check IDs
hackmyagent secure --json                     # JSON output for CI
hackmyagent secure --ci                       # non-interactive, exit non-zero on findings
hackmyagent secure --publish                  # push anonymised results to the OpenA2A Registry
hackmyagent secure -b oasb-1                  # OASB-1 benchmark (L1, L2, L3)
hackmyagent secure -b oasb-1 --fail-below 70  # CI gate
hackmyagent secure --nanomind                 # AI analyst: per-finding narratives + coverage escalations

Output shows an Observations block (surfaces, checks, categories, verdict) and a per-finding list. Every HIGH or CRITICAL finding has a file:line location and a runnable Fix: command.

NanoMind semantic analysis

Runs automatically on every secure scan. On first use, HackMyAgent (HMA) downloads a 5.5 MB ONNX classifier from HuggingFace (opena2a/nanomind-security-classifier, a 3M-parameter Mamba TME model) and caches it locally. No external calls after that.

7 AST analyzers: capability, credential, governance, scope, prompt, code, stego.
9 attack classes: exfiltration, injection, privilege_escalation, persistence, credential_abuse, lateral_movement, social_engineering, policy_violation, benign.
--deep adds the 20-probe behavioural simulation.
--static-only disables the semantic layer.
--nanomind opts into the generative analyst (a specialist model, not the classifier). It produces per-finding threat narratives on HIGH or CRITICAL findings, plus a coverage sweep over artifacts the deterministic checks did not flag. Analyst verdicts from that sweep surface as advisory escalations for human review and never change the score, findings, or exit code.
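
As a rough intuition for what the capability analyzer listed above looks for, an undeclared capability is one that an artifact appears to use but never declares. The toy sketch below shows that comparison as a set difference. The ToyArtifact type and the capability names are invented for illustration and are not the NanoMind AST types.

```typescript
// Toy model only: real NanoMind ASTs carry far more structure than two string lists.
interface ToyArtifact {
  declared: string[]; // capabilities the artifact says it needs
  inferred: string[]; // capabilities its content appears to exercise
}

// Returns the inferred capabilities that were never declared.
function undeclaredCapabilities(artifact: ToyArtifact): string[] {
  const declared = new Set(artifact.declared);
  return artifact.inferred.filter((capability) => !declared.has(capability));
}

// Hypothetical skill that declares file reads but also appears to make network calls.
const toySkill: ToyArtifact = {
  declared: ["fs.read"],
  inferred: ["fs.read", "net.http"],
};
console.log(undeclaredCapabilities(toySkill));
```

A plain text pattern sees only the strings in the file; the comparison needs both sides (declared and inferred) in a structured form, which is the point of compiling to a tree first.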

`red-team` (adaptive attack engine)

hackmyagent red-team ./my-skill.md             # red-team a skill file
hackmyagent red-team ./SOUL.md --iterations 10 # more attack iterations
hackmyagent red-team ./mcp-config.json --json  # JSON output

Generates target-specific attacks from the artifact's own language and constraints. It iterates up to 5 times per category, maps the defences it encounters, and produces specific remediation.

`attack` (payload battery)

hackmyagent attack --local                                             # local simulation
hackmyagent attack --local --system-prompt "You are helpful"           # with custom system prompt
hackmyagent attack https://api.example.com/v1/chat                     # test a live endpoint
hackmyagent attack --local --category prompt-injection                 # single category
hackmyagent attack https://api.example.com --fail-on-vulnerable medium # CI gate

164 payloads across 16 categories. Intensity tiers: passive (28 payloads, observation only), active (111 payloads, default), aggressive (164 payloads, includes creative or risky probes).

Only test systems you own or have written authorisation to test.

Need a target to practice on? DVAA is an intentionally vulnerable agent fleet. Break an agent there, then red-team it here:

# Start the DVAA fleet (separate terminal)
docker run -p 7001-7008:7001-7008 -p 7010-7016:7010-7016 -p 7020-7021:7020-7021 -p 9000:9000 opena2a/dvaa:0.9.1

# Red-team LegacyBot, the most vulnerable agent
hackmyagent attack http://localhost:7003/v1/chat/completions --api-format openai --intensity passive

hackmyagent attack red-teaming a live DVAA agent: 100/100 CRITICAL, 28 of 28 attacks successful across 14 categories

`scan-soul` and `harden-soul` (governance)

hackmyagent scan-soul                     # scan current directory for SOUL.md
hackmyagent scan-soul --deep              # LLM semantic analysis (requires ANTHROPIC_API_KEY)
hackmyagent scan-soul --fail-below 60     # CI gate
hackmyagent scan-soul --explain           # print the 9-domain governance model and exit
hackmyagent harden-soul                   # generate or update governance sections
hackmyagent harden-soul --dry-run         # preview without writing

Auto-detects governance file in this priority: SOUL.md, system-prompt.md, CLAUDE.md, .cursorrules, agent-config.yaml.
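
The priority order above amounts to "first match wins". A minimal sketch of that selection follows, assuming the caller has already listed which candidate files exist. The function name is hypothetical and this is not the scanner's own code.

```typescript
// Priority order as documented above; earlier entries win.
const GOVERNANCE_PRIORITY = [
  "SOUL.md",
  "system-prompt.md",
  "CLAUDE.md",
  ".cursorrules",
  "agent-config.yaml",
] as const;

// Returns the highest-priority governance file present, or undefined if none exist.
function pickGovernanceFile(present: ReadonlySet<string>): string | undefined {
  return GOVERNANCE_PRIORITY.find((name) => present.has(name));
}

// Illustrative usage: a project that has both CLAUDE.md and .cursorrules.
console.log(pickGovernanceFile(new Set(["CLAUDE.md", ".cursorrules"])));
```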

`detect` (shadow AI audit)

hackmyagent detect                              # audit current directory
hackmyagent detect /path/to/project             # audit a specific project
hackmyagent detect --json                       # machine-readable output
hackmyagent detect --export-csv inventory.csv   # asset inventory for CMDB

Inventory of AI tools, MCP servers, and governance gaps across your machine. Detects Claude Code, Cursor, Copilot, and similar tools; MCP configurations (project-local and machine-wide); AI config files with credential references or broad permission grants; and SOUL.md files.

`trust`, `explain`, `nanomind`

hackmyagent trust server-filesystem      # MCP shorthand trust lookup against the Registry
hackmyagent trust --audit package.json   # audit every dependency
hackmyagent explain CRED-001             # explain a check finding
hackmyagent nanomind setup               # install the optional generative analyst daemon
hackmyagent nanomind status              # check model and runtime status

Optional AAP gate on `trust`

hackmyagent trust can be gated by the Agent Authorization Protocol. When --grant is set, the CLI presents an ATX and a grant reference to the local Secretless broker before any Registry lookup. The broker is the policy decision point; the CLI proceeds only if the broker authorizes.

hackmyagent trust express \
  --grant grant://hackmyagent-trust \
  --atx ~/.opena2a/atx.json

Outcomes:

Broker authorizes -> trust proceeds.
Broker denies (HTTP 403) -> exit 3 with a pointer to ~/.secretless-ai/policies/. AAP §6.6: the denial is opaque; reasons live only in the broker's signed audit log.
Broker unreachable -> exit 4 with a secretless broker start hint.
Broker returns an unexpected status -> exit 6. The response body is never echoed to the user.
No --grant flag -> trust runs exactly as before; the gate is opt-in.
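
The outcome list above can be read as a small decision function. The sketch below models it under two assumptions that the text does not state: that authorization is signalled by a 2xx status, and that a transport failure is represented as a separate "unreachable" result. All type and function names are hypothetical.

```typescript
// Hypothetical result of contacting the local broker.
type BrokerResult = { kind: "response"; status: number } | { kind: "unreachable" };

type GateDecision = { proceed: true } | { proceed: false; exitCode: 3 | 4 | 6 };

function decideGate(result: BrokerResult): GateDecision {
  if (result.kind === "unreachable") return { proceed: false, exitCode: 4 }; // broker not running
  if (result.status === 403) return { proceed: false, exitCode: 3 }; // opaque denial
  // Assumption: any 2xx status means the broker authorized the request.
  if (result.status >= 200 && result.status < 300) return { proceed: true };
  return { proceed: false, exitCode: 6 }; // unexpected status, body never echoed
}
```

Keeping the decision in one pure function makes the opt-in gate easy to test without a running broker.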

This is the second TypeScript AAP consumer (after opena2a protect --grant, opena2a-org/opena2a#179). At the CLI surface, it defends against T-3002 (cross-tenant grant leakage), T-3003 (over-broad credential scope), T-3006 (credential leaking into agent context), and T-8002 (audit attribution gap).

OpenClaw and NemoClaw auto-detection

hackmyagent secure auto-detects OpenClaw and NemoClaw installations (.openclaw/, .moltbot/, .nemoclaw/, openclaw.json, openclaw.plugin.json). When detected, 28 NemoClaw plus 34 OpenClaw checks run alongside the standard suite. No separate command needed.

Using with opena2a-cli

opena2a-cli is the unified CLI for the OpenA2A security tools. HackMyAgent powers opena2a review, opena2a scan, opena2a protect, opena2a benchmark, and opena2a scan-soul.

npm install -g opena2a-cli
opena2a review

Runtime protection (ARP)

ARP monitors AI agents during execution with three intelligence layers: rule-based pattern matching (40+ patterns), statistical anomaly detection, and LLM-assisted assessment.

opena2a runtime init     # generate config
opena2a runtime start    # start monitoring
opena2a runtime status   # check status

ARP also runs as an HTTP reverse proxy for inspecting OpenAI API, MCP, and A2A protocol traffic. Configure via opena2a runtime.

CI/CD integration

All commands support --json and --ci flags.

name: Agent Security
on: [push, pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npx hackmyagent secure --json > security-report.json
      - run: npx hackmyagent secure -b oasb-1 --fail-below 70

SARIF for the GitHub Security tab:

- run: npx hackmyagent attack --local -f sarif -o results.sarif --fail-on-vulnerable medium
- uses: github/codeql-action/upload-sarif@v3
  with: { sarif_file: results.sarif }

Pre-commit hook:

#!/bin/sh
# .git/hooks/pre-commit
npx hackmyagent secure --ignore LOG-001,RATE-001

Exit codes

| Code | Meaning |
|---|---|
| 0 | Clean. No critical or high issues. |
| 1 | Critical or high severity issues found. |
| 2 | Incomplete scan. One or more plugins failed. |
| 3 | QUARANTINE. Binary integrity check failed (tampered installation). |
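
A wrapper script can turn the exit codes in the table above into a pipeline decision. The sketch below is one possible mapping; the outcome labels and the deploy policy are assumptions for illustration. Note that the trust --grant gate described earlier uses its own codes (3, 4, 6), which this sketch does not cover.

```typescript
type ScanOutcome = "clean" | "findings" | "incomplete" | "quarantine" | "unknown";

// Maps a scan exit code to a named outcome, following the table above.
function classifyExit(code: number): ScanOutcome {
  switch (code) {
    case 0:
      return "clean";
    case 1:
      return "findings";
    case 2:
      return "incomplete";
    case 3:
      return "quarantine";
    default:
      return "unknown";
  }
}

// Hypothetical CI policy: only a clean scan may proceed to deploy.
function mayDeploy(code: number): boolean {
  return classifyExit(code) === "clean";
}
```

Treating code 2 as a failure rather than a pass avoids shipping on a scan that never finished.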

Auto-fix catalogue

| Check | Issue | Auto-fix |
|---|---|---|
| CRED-001 | Exposed API keys | Replace with env-var reference |
| GIT-001 | Missing .gitignore | Create with secure defaults |
| GIT-002 | Incomplete .gitignore | Add missing patterns |
| PERM-001 | Overly permissive files | Set restrictive permissions |
| MCP-001 | Root filesystem access | Scope to project directory |
| NET-001 | Bound to 0.0.0.0 | Bind to 127.0.0.1 |
| GATEWAY-001 | Gateway bound to 0.0.0.0 | Bind to 127.0.0.1 |
| GATEWAY-003 | Plaintext token | Replace with ${OPENCLAW_AUTH_TOKEN} |
| GATEWAY-004 | Approvals disabled | Enable approvals |
| GATEWAY-005 | Sandbox disabled | Enable sandbox |

Use --dry-run to preview changes. Backups live in .hackmyagent-backup/. Rollback with hackmyagent rollback.
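
To make the preview-then-apply idea concrete, here is a minimal sketch of a dry-run for a NET-001 style fix (0.0.0.0 rewritten to 127.0.0.1). It is an illustration of the pattern only, not the tool's fixer: the PlannedChange type and both function names are hypothetical, and a real fixer would parse the config rather than substitute text.

```typescript
interface PlannedChange {
  line: number; // 1-based line number
  before: string;
  after: string;
}

// Dry run: computes the changes without modifying the input text.
function planLoopbackFix(text: string): PlannedChange[] {
  const changes: PlannedChange[] = [];
  text.split("\n").forEach((before, index) => {
    const after = before.split("0.0.0.0").join("127.0.0.1");
    if (after !== before) changes.push({ line: index + 1, before, after });
  });
  return changes;
}

// Apply: produces new text; the original string doubles as the backup for rollback.
function applyChanges(text: string, changes: PlannedChange[]): string {
  const lines = text.split("\n");
  for (const change of changes) lines[change.line - 1] = change.after;
  return lines.join("\n");
}
```

Separating planning from applying is what lets a preview and a real run share the same logic.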

Programmatic API

import { HardeningScanner, AgentRuntimeProtection, AttackScanner } from 'hackmyagent';

import {
  SemanticCompiler,
  analyzeCapabilities,
  analyzeCredentials,
  analyzeGovernance,
  analyzeScope,
  analyzePrompt,
  analyzeCode,
  getTMEClassifier,
} from 'hackmyagent/nanomind-core';

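import { readFile } from 'node:fs/promises';

// Load the artifact to analyze; the path here is only an example.
const skillContent = await readFile('my-skill.skill.md', 'utf8');

const compiler = new SemanticCompiler();
const { ast } = await compiler.compile(skillContent, 'my-skill.skill.md');
// ast.intentClassification: 'benign' | 'suspicious' | 'malicious'
// ast.inferredCapabilities, ast.declaredConstraints, ast.inferredRiskSurface
const findings = analyzeCapabilities(ast);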

Plugin authoring: docs/PLUGIN_API.md.

Use cases

| Guide | Time |
|---|---|
| Scan my agent | 5 min |
| Red-team MCP servers | 10 min |
| Secure OpenClaw | 10 min |
| CI/CD pipeline | 5 min |

Full index: docs/USE-CASES.md.

Contributing

Apache 2.0. PRs from outside the org welcome. CONTRIBUTING.md has the dev loop, test conventions, and pre-push review gates.

git clone https://github.com/opena2a-org/hackmyagent.git
cd hackmyagent && npm install && npm run build && npm test

Security issues: [email protected] (coordinated disclosure, response within 24 hours).

License

Apache-2.0. See LICENSE.