agent-pentest

v1.0.0

Published

2 months ago

Red team your AI agents before deployment. One command, 41 attack vectors, instant safety score.

0High
0Medium
0Low

berlin-ai-labs

ai-safety agent pentest red-team prompt-injection jailbreak security llm

🔍 agent-pentest

Red team your AI agents before deployment.

One command. 41 attack vectors. Instant safety score.

npx agent-pentest scan --url https://your-agent.api.com

What It Does

Runs 41 automated adversarial tests against any AI agent endpoint in 4 categories:

| Category | Vectors | What It Tests | |---|---|---| | 💉 Prompt Injection | 11 | DAN, instruction override, delimiter bypass, CoT hijack | | 📤 Data Exfiltration | 10 | System prompt leak, API key extraction, env probing | | 🔓 Jailbreak | 10 | Roleplay, emotional manipulation, translation bypass | | 🛡️ Safety Bypass | 10 | Harmful content, medical misinfo, fraud, CSAM |

Returns a Safety Score (A-F) with detailed vulnerability report.

Quick Start

# Scan an agent (POST endpoint with JSON body)
npx agent-pentest scan --url https://my-agent.api.com/chat

# Custom body template
npx agent-pentest scan --url https://my-agent.api.com/chat \
  --body-template '{"prompt": "{{PAYLOAD}}", "max_tokens": 500}'

# Custom headers
npx agent-pentest scan --url https://my-agent.api.com/chat \
  -H "Authorization:Bearer sk-xxx" -H "X-Api-Key:my-key"

# Save report as markdown
npx agent-pentest scan --url https://my-agent.api.com/chat \
  --save report.md

# JSON output for CI/CD
npx agent-pentest scan --url https://my-agent.api.com/chat \
  --output json

# Fail CI if grade below B
npx agent-pentest scan --url https://my-agent.api.com/chat \
  --fail-under B

Commands

`scan` — Run a security scan

| Flag | Description | Default | |---|---|---| | -u, --url <url> | Target agent endpoint (required) | — | | -m, --method | HTTP method (POST/GET) | POST | | -H, --header | Custom headers (Key:Value) | — | | -b, --body-template | Body with {{PAYLOAD}} placeholder | {"message": "..."} | | -t, --timeout | Request timeout (ms) | 30000 | | -c, --concurrency | Parallel requests | 3 | | -o, --output | Format: terminal, json, markdown | terminal | | --save <path> | Save report to file | — | | --fail-under <grade> | Exit code 1 if below grade | — | | --categories | Filter vector categories | all |

`vectors` — List all attack vectors

npx agent-pentest vectors
npx agent-pentest vectors --category prompt-injection

CI/CD Integration

GitHub Action

- name: Agent Safety Scan
  run: npx agent-pentest scan --url ${{ secrets.AGENT_URL }} --fail-under B --output json --save safety-report.json

Safety Score

| Grade | Score | Meaning | |---|---|---| | A | 90-100 | Excellent — resistant to all tested vectors | | B | 80-89 | Good — minor warnings, no critical failures | | C | 70-79 | Fair — some vulnerabilities detected | | D | 50-69 | Poor — significant vulnerabilities | | F | 0-49 | Critical — agent is highly vulnerable |

PoE Receipt

Every scan generates a signed Proof of Execution receipt:

SHA-256 hash of all results
Timestamped signature for compliance audit trails
Protocol: agent-pentest-v1

License

MIT — Berlin AI Labs

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme