agent-red-team

v0.1.0

Published

10 days ago

Security scanner for AI coding agents — test your setup against prompt injection, credential exposure, identity tampering, and more.

gemini2026

ai security scanner agent red-team prompt-injection claude cursor openclaw mcp ai-safety

agent-red-team

How secure is your AI coding agent? Find out in 3 seconds.

npx agent-red-team

Zero-install security scanner that tests your AI coding agent setup against real attack vectors. Works with Claude Code, Cursor, OpenClaw, NemoClaw, and any MCP-connected agent.

What It Tests

| Category | Weight | What's Checked |
|---|---|---|
| Injection Resistance | 25% | Prompt injection scanning, input validation, base64 rescan, unicode normalization |
| Credential Exposure | 20% | SSH keys, AWS creds, .env files, sandbox isolation, env var leaks |
| Identity Tampering | 15% | CLAUDE.md, .cursorrules, SOUL.md write protection, file ownership |
| Behavioral Evasion | 20% | Session tracking, trifecta detection (read-process-exfil), policy escalation |
| Network Isolation | 10% | Sandbox network rules, SSRF protection, domain allowlisting, egress proxy |
| Audit Integrity | 10% | Audit logging, hash chain integrity, tamper-evident config, export API |
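To illustrate why the Injection Resistance row lists unicode normalization, here is a minimal sketch of how a disguised payload can slip past a plain pattern match and be caught after normalization. The pattern list and the function names `naiveScan` and `normalizedScan` are hypothetical and are not taken from the scanner's source.

```typescript
// Hypothetical, deliberately tiny pattern list, for illustration only.
const PATTERNS: RegExp[] = [/ignore (all )?previous instructions/i];

// Matches patterns against the raw text, with no preprocessing.
function naiveScan(text: string): boolean {
  return PATTERNS.some((p) => p.test(text));
}

// Normalizes first, then scans.
function normalizedScan(text: string): boolean {
  const normalized = text
    .normalize("NFKC") // fold fullwidth and other compatibility forms to ASCII
    .replace(/[\u200B-\u200D\uFEFF]/g, ""); // strip zero-width characters
  return naiveScan(normalized);
}

// "ignore" spelled with fullwidth letters, plus a zero-width space
// hidden inside "instructions".
const disguised =
  "\uFF49\uFF47\uFF4E\uFF4F\uFF52\uFF45 previous instruc\u200Btions";

console.log(naiveScan(disguised)); // false
console.log(normalizedScan(disguised)); // true
```

A scanner that skips normalization would report the disguised string as clean, which is presumably the gap the check is meant to surface.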

Sample Output

Agent Red Team v0.1.0
Target: Claude Code (settings.json, CLAUDE.md, mcp-configured)
Runtime: darwin

INJECTION RESISTANCE           ████████████░░░░  60/100
  ✗ Injection scanner configured    (high)
  ✓ Input validation present
  ✗ Base64 decode and rescan        (medium)
  ✗ Unicode normalization           (medium)
  ✓ Injection pattern coverage

CREDENTIAL EXPOSURE            █████████████████ 95/100
  ✓ SSH keys protected
  ✓ AWS credentials protected
  ✗ Project .env file exposure      (medium)
  ✓ Sandbox blocks credential reads

IDENTITY TAMPERING             ████████████░░░░  67/100
  ✗ CLAUDE.md write-protected       (high)
  ✓ Identity file guard active
  ✓ Identity file ownership

BEHAVIORAL EVASION             ░░░░░░░░░░░░░░░░   0/100
  ✗ Session tracking configured     (high)
  ✗ Trifecta detection              (high)
  ✗ Policy escalation               (high)
  ✗ Multi-step chain guard          (medium)

NETWORK ISOLATION              ████████░░░░░░░░  50/100
  ✓ Sandbox network restrictions
  ✗ SSRF protection                 (high)
  ✗ Domain allowlisting             (medium)
  ✗ Egress proxy configured         (medium)

AUDIT INTEGRITY                ░░░░░░░░░░░░░░░░   0/100
  ✗ Audit logging enabled           (high)
  ✗ Hash chain integrity            (high)
  ✗ Tamper-evident configuration    (medium)
  ✗ Audit export API available      (low)

---
OVERALL SCORE                  ██████████░░░░░░  63/100
Grade: C

Share: "My AI agent scored 63/100 on agent-red-team ⚠️"

CLI Reference

Usage: agent-red-team [options]

Options:
  --target <agent>     Override auto-detection (claude-code|openclaw|nemoclaw|cursor)
  --mcp-url <url>      MCP server URL for generic testing
  --active             Enable active probing mode
  --json               Output JSON report to stdout
  --category <name>    Run only one category:
                         injection | credentials | identity
                         behavioral | network | audit
  --verbose            Show individual test details and explanations
  --no-color           Disable colored output
  -V, --version        Output version number
  -h, --help           Display help

Examples

# Scan everything (auto-detects your agent)
npx agent-red-team

# Target a specific agent
npx agent-red-team --target claude-code

# Only check credential exposure
npx agent-red-team --category credentials

# Verbose output with all details
npx agent-red-team --verbose

# JSON output for CI pipelines
npx agent-red-team --json > report.json

# Active probing mode (attempts real attack sequences)
npx agent-red-team --active
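
A CI pipeline that wants a stricter gate than the built-in exit code could post-process the JSON report. This README does not document the report schema, so the `Report` shape below (`overall`, `categories[].name`, `categories[].score`) and the helper `categoriesBelow` are assumptions made for illustration; check the actual `--json` output before relying on any field name.

```typescript
// ASSUMED report shape; the real --json output may use different fields.
interface CategoryReport {
  name: string;
  score: number | null; // null stands in for an N/A category
}

interface Report {
  overall: number;
  categories: CategoryReport[];
}

// Returns the names of testable categories scoring below the threshold.
function categoriesBelow(report: Report, threshold: number): string[] {
  return report.categories
    .filter((c) => c.score !== null && c.score < threshold)
    .map((c) => c.name);
}

// Stand-in for the contents of report.json.
const raw =
  '{"overall":63,"categories":[' +
  '{"name":"injection","score":60},' +
  '{"name":"credentials","score":95},' +
  '{"name":"audit","score":null}]}';

const report = JSON.parse(raw) as Report;
const failing = categoriesBelow(report, 70);
console.log(failing); // [ 'injection' ]
```

In a real pipeline the script would read `report.json` from disk and exit non-zero when `failing` is non-empty.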

Scoring Methodology

Each category produces a score from 0-100 based on the ratio of passed checks.

Categories that cannot be tested (e.g., no agent detected for that check) are marked N/A and their weight is redistributed proportionally across testable categories.
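
The calculation described above can be sketched as follows. The type `CategoryResult`, the function names, and the rounding are assumptions; the weights come from the What It Tests table, and the real scorer may weigh individual checks differently.

```typescript
// Hypothetical shape; the package's internals may differ.
interface CategoryResult {
  name: string;
  weight: number; // percent, from the "What It Tests" table
  passed: number; // checks that passed
  total: number; // checks that ran; 0 means the category is N/A
}

// Category score: ratio of passed checks scaled to 0-100, or null for N/A.
function categoryScore(c: CategoryResult): number | null {
  if (c.total === 0) return null;
  return Math.round((c.passed / c.total) * 100);
}

// Overall score: weighted mean over testable categories only. Dividing by
// the sum of the remaining weights redistributes N/A weight proportionally.
function overallScore(categories: CategoryResult[]): number {
  let weighted = 0;
  let weightSum = 0;
  for (const c of categories) {
    const score = categoryScore(c);
    if (score === null) continue;
    weighted += score * c.weight;
    weightSum += c.weight;
  }
  return weightSum === 0 ? 0 : Math.round(weighted / weightSum);
}

// Made-up results, with Identity Tampering marked N/A.
const example: CategoryResult[] = [
  { name: "injection", weight: 25, passed: 4, total: 5 }, // 80
  { name: "credentials", weight: 20, passed: 3, total: 4 }, // 75
  { name: "identity", weight: 15, passed: 0, total: 0 }, // N/A
  { name: "behavioral", weight: 20, passed: 2, total: 4 }, // 50
  { name: "network", weight: 10, passed: 1, total: 4 }, // 25
  { name: "audit", weight: 10, passed: 0, total: 4 }, // 0
];

console.log(overallScore(example)); // 56, i.e. 4750 / 85 rounded
```

With Identity Tampering excluded, the remaining weights sum to 85, so each testable category effectively counts for weight / 85 of the total.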

Letter grades:

| Grade | Score Range |
|---|---|
| A+ | 90-100 |
| A | 80-89 |
| B | 70-79 |
| C | 60-69 |
| D | 50-59 |
| D- | 40-49 |
| F | 0-39 |

The CLI exits with code 1 if the overall score is below 40 (grade F), making it suitable for CI gates.
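
As a minimal sketch, the grade table and exit-code rule map to the two functions below. The names `gradeFor` and `exitCodeFor` are hypothetical, and the exit code of 0 for passing scores is an assumption, since only the failing case is stated above.

```typescript
// Maps an overall score (0-100) to the letter grade from the table.
function gradeFor(score: number): string {
  if (score >= 90) return "A+";
  if (score >= 80) return "A";
  if (score >= 70) return "B";
  if (score >= 60) return "C";
  if (score >= 50) return "D";
  if (score >= 40) return "D-";
  return "F";
}

// Exit code 1 below 40 (grade F); 0 otherwise is an assumption.
function exitCodeFor(score: number): number {
  return score < 40 ? 1 : 0;
}

console.log(gradeFor(63), exitCodeFor(63)); // C 0
console.log(gradeFor(39), exitCodeFor(39)); // F 1
```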

How to Improve Your Score

- Injection Resistance: Deploy Gatekeeper for injection scanning middleware
- Credential Exposure: Use sandbox profiles, restrict file permissions, avoid env var secrets
- Identity Tampering: Make CLAUDE.md and .cursorrules read-only, use Gatekeeper's identity guard
- Behavioral Evasion: Enable session tracking with trifecta detection
- Network Isolation: Configure sandbox network rules and egress proxy
- Audit Integrity: Enable tamper-evident audit logging with hash chains

Contributing

Contributions are welcome. Please open an issue first to discuss what you would like to change.

# Development setup
git clone https://github.com/knowledge2ai/agent-red-team.git
cd agent-red-team
npm install
npm run build
npm test

# Run locally
node dist/cli.js --verbose

When adding new attack checks:

1. Create or modify the relevant module in src/attacks/
2. Each check returns an AttackResult with name, passed, severity, and detail
3. Add tests in test/
4. Run npm test before submitting a PR
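
A new check might look roughly like the sketch below. Only the four field names (name, passed, severity, detail) come from this README; the field types, the `Severity` union (taken from the labels in the sample output), and the function `checkWriteProtected` are assumptions, so match them against the definitions in the repository before copying.

```typescript
// ASSUMED types; the repository's AttackResult definition may differ.
type Severity = "high" | "medium" | "low";

interface AttackResult {
  name: string;
  passed: boolean;
  severity: Severity;
  detail: string;
}

// Hypothetical check: passes when no write bit is set in the POSIX mode.
// Taking the mode as an argument keeps the check easy to unit test.
function checkWriteProtected(fileName: string, mode: number): AttackResult {
  const writable = (mode & 0o222) !== 0;
  return {
    name: `${fileName} write-protected`,
    passed: !writable,
    severity: "high",
    detail: writable
      ? `${fileName} is writable (mode ${(mode & 0o777).toString(8)})`
      : `${fileName} has no write bits set`,
  };
}

console.log(checkWriteProtected("CLAUDE.md", 0o644).passed); // false
console.log(checkWriteProtected("CLAUDE.md", 0o444).passed); // true
```

In the real module the mode would presumably come from a file stat call rather than a parameter.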

License

Apache 2.0

Built by the team at Knowledge2 -- enterprise AI infrastructure with live knowledge feeds.
