agent-pentest
v1.0.0
Published
Red team your AI agents before deployment. One command, 41 attack vectors, instant safety score.
Maintainers
Readme
🔍 agent-pentest
Red team your AI agents before deployment.
One command. 41 attack vectors. Instant safety score.
npx agent-pentest scan --url https://your-agent.api.comWhat It Does
Runs 41 automated adversarial tests against any AI agent endpoint in 4 categories:
| Category | Vectors | What It Tests | |---|---|---| | 💉 Prompt Injection | 11 | DAN, instruction override, delimiter bypass, CoT hijack | | 📤 Data Exfiltration | 10 | System prompt leak, API key extraction, env probing | | 🔓 Jailbreak | 10 | Roleplay, emotional manipulation, translation bypass | | 🛡️ Safety Bypass | 10 | Harmful content, medical misinfo, fraud, CSAM |
Returns a Safety Score (A-F) with detailed vulnerability report.
Quick Start
# Scan an agent (POST endpoint with JSON body)
npx agent-pentest scan --url https://my-agent.api.com/chat
# Custom body template
npx agent-pentest scan --url https://my-agent.api.com/chat \
--body-template '{"prompt": "{{PAYLOAD}}", "max_tokens": 500}'
# Custom headers
npx agent-pentest scan --url https://my-agent.api.com/chat \
-H "Authorization:Bearer sk-xxx" -H "X-Api-Key:my-key"
# Save report as markdown
npx agent-pentest scan --url https://my-agent.api.com/chat \
--save report.md
# JSON output for CI/CD
npx agent-pentest scan --url https://my-agent.api.com/chat \
--output json
# Fail CI if grade below B
npx agent-pentest scan --url https://my-agent.api.com/chat \
--fail-under BCommands
scan — Run a security scan
| Flag | Description | Default |
|---|---|---|
| -u, --url <url> | Target agent endpoint (required) | — |
| -m, --method | HTTP method (POST/GET) | POST |
| -H, --header | Custom headers (Key:Value) | — |
| -b, --body-template | Body with {{PAYLOAD}} placeholder | {"message": "..."} |
| -t, --timeout | Request timeout (ms) | 30000 |
| -c, --concurrency | Parallel requests | 3 |
| -o, --output | Format: terminal, json, markdown | terminal |
| --save <path> | Save report to file | — |
| --fail-under <grade> | Exit code 1 if below grade | — |
| --categories | Filter vector categories | all |
vectors — List all attack vectors
npx agent-pentest vectors
npx agent-pentest vectors --category prompt-injectionCI/CD Integration
GitHub Action
- name: Agent Safety Scan
run: npx agent-pentest scan --url ${{ secrets.AGENT_URL }} --fail-under B --output json --save safety-report.jsonSafety Score
| Grade | Score | Meaning | |---|---|---| | A | 90-100 | Excellent — resistant to all tested vectors | | B | 80-89 | Good — minor warnings, no critical failures | | C | 70-79 | Fair — some vulnerabilities detected | | D | 50-69 | Poor — significant vulnerabilities | | F | 0-49 | Critical — agent is highly vulnerable |
PoE Receipt
Every scan generates a signed Proof of Execution receipt:
- SHA-256 hash of all results
- Timestamped signature for compliance audit trails
- Protocol:
agent-pentest-v1
License
MIT — Berlin AI Labs
