crowbar-security

v0.1.3

Published

3 months ago

autonomous black-box web penetration testing. give it a URL, it finds everything exploitable.


security penetration-testing vulnerability-scanner web-security pentest owasp sqli xss ssrf dast appsec devsecops ci-cd sarif

crowbar

autonomous web penetration testing. give it a url, it finds everything exploitable.

what it does

crowbar crawls a live website, fingerprints the stack, selects attack vectors, exploits vulnerabilities, and hands you a proof-of-concept report. no source code needed. no manual endpoint mapping. point and shoot.

npx crowbar-security scan https://target.com

crowbar v0.1.0

target: https://target.com
[recon] passive: 14 subdomains, 23 historical endpoints
[recon] complete: 47 endpoints, 12 forms, 8 JS files
[mapper] stack: server=nginx, language=Node.js, framework=Express, database=PostgreSQL, waf=Cloudflare
[mapper] found 6 hidden parameters across 3 endpoints
[mapper] 142 attack vectors planned
[mapper] payload order randomized (anti-fingerprinting)
[attack] 142/142 (5 vulns found)
[verify] 4 vulnerabilities confirmed
[chain] 2 attack chains identified
[report] saved: ./crowbar-report.md, ./crowbar-report.json

crowbar scan complete in 27.3s
4 vulnerabilities confirmed across 47 endpoints

install

npm install -g crowbar-security
npx playwright install chromium

requires node 20+.

api keys

on first run, crowbar prompts for an API key automatically. or set it manually:

crowbar config set anthropic-key sk-ant-YOUR_KEY

ANTHROPIC_API_KEY and/or OPENAI_API_KEY -- enables AI-driven payload generation, strategy adaptation, chain discovery, and ambiguous response analysis. crowbar works without these but loses its smartest features. typical scan costs $1-5 in API spend.
get a key: https://console.anthropic.com/settings/keys or https://platform.openai.com/api-keys

max settings

crowbar scan https://target.com \
  --ai-aggressive \
  --swarm \
  --max-requests 5000 \
  --max-depth 5 \
  --rate-limit 50 \
  --format json,md,html,sarif \
  --output ./crowbar-report

cli

9 commands:

# full autonomous scan
crowbar scan https://target.com

# recon only (no attacks)
crowbar recon https://target.com

# attack specific endpoint
crowbar attack https://target.com --endpoint /api/users

# re-verify findings from a previous report
crowbar verify ./crowbar-report.json

# CI/CD pipeline scan (non-interactive, exits with status code)
crowbar ci https://target.com --fail-on high --webhook https://hooks.slack.com/xxx

# scan multiple targets from a file
crowbar multi targets.txt --parallel 3

# continuous monitoring (periodic scans, delta alerts)
crowbar watch https://target.com --interval 6h --webhook https://hooks.slack.com/xxx

# web dashboard (localhost only)
crowbar serve --port 3333

# manage API keys and config
crowbar config set anthropic-key sk-ant-...
crowbar config list

scan options

# authentication
crowbar scan https://target.com --cookie "session=abc123"
crowbar scan https://target.com --bearer "eyJ..."

# modes
crowbar scan https://target.com --mode stealth
crowbar scan https://target.com --mode aggressive

# output formats
crowbar scan https://target.com --format md,json,html,sarif,csv

# external knowledge bases
crowbar scan https://target.com --nuclei-templates ~/nuclei-templates
crowbar scan https://target.com --wordlist ~/SecLists
crowbar scan https://target.com --payloads ~/PayloadsAllTheThings

# incremental scanning (only test new/changed endpoints)
crowbar scan https://target.com --incremental

# resume interrupted scan
crowbar scan https://target.com --resume

# with pinata integration (white-box-guided)
crowbar scan https://target.com --gaps gaps.json

# dry run (recon + mapping, no attacks)
crowbar scan https://target.com --dry-run

# scope and safety
crowbar scan https://target.com --rate-limit 5 --max-requests 1000
crowbar scan https://target.com --scope "target.com,api.target.com"
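
the --scope flag takes a comma-separated host list, and the safety section mentions DNS pre-resolution and private IP blocking. a minimal TypeScript sketch of what such a check could look like, assuming exact host matching and IPv4 only. the names parseScope, isPrivateIPv4, and inScope are hypothetical and not crowbar's internals:

```typescript
// hypothetical sketch of a scope allowlist check; not crowbar's actual code.

/** split a --scope value such as "target.com,api.target.com" into hosts. */
function parseScope(scope: string): Set<string> {
  return new Set(
    scope
      .split(",")
      .map((h) => h.trim().toLowerCase())
      .filter((h) => h.length > 0)
  );
}

/** true if ip is an IPv4 literal in a loopback, private, or link-local range. */
function isPrivateIPv4(ip: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(ip);
  if (m === null) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

/**
 * allow a request only if the host is allowlisted and the address it
 * resolved to (looked up by the caller beforehand) is not private.
 */
function inScope(host: string, resolvedIp: string, allow: Set<string>): boolean {
  return allow.has(host.toLowerCase()) && !isPrivateIPv4(resolvedIp);
}
```

passing the resolved address in as an argument keeps the check pure and easy to test. a real implementation would also need IPv6 handling and protection against DNS rebinding between the lookup and the request.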

ci/cd integration

# .github/workflows/security.yml
- uses: crowbar-security/scan@v1
  with:
    target: https://staging.yourapp.com
    fail-on: high
    format: sarif
    auth-cookie: ${{ secrets.SESSION_COOKIE }}

the GitHub Action runs crowbar ci, uploads SARIF to the GitHub Security tab, and fails the build if findings exceed the severity threshold.
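
a minimal TypeScript sketch of how a fail-on threshold can map to a process exit code. it assumes three severity levels and treats a finding at or above the threshold as blocking. both are assumptions, and exitCodeFor is a hypothetical name:

```typescript
// hypothetical sketch of severity gating for CI; not crowbar's actual code.

type Severity = "low" | "medium" | "high";

// assumed ordering: higher rank means more severe.
const RANK: Record<Severity, number> = { low: 1, medium: 2, high: 3 };

interface Finding {
  id: string;
  severity: Severity;
}

/** return 1 if any finding is at or above the threshold, else 0. */
function exitCodeFor(findings: Finding[], failOn: Severity): number {
  const blocking = findings.some((f) => RANK[f.severity] >= RANK[failOn]);
  return blocking ? 1 : 0;
}
```

a non-zero exit code is what makes the CI job fail, so the gate needs no CI-specific logic beyond this.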

how it works

URL
 |
RECON --> MAPPER --> ATTACKER --> VERIFIER --> REPORTER
 |          |          |            |
 v          v          v            v
          KNOWLEDGE GRAPH
          (queryable by all phases)
              |
           AI BRAIN
   (strategy, adaptation, chains)

12-phase pipeline:

passive recon: DNS enumeration (100+ subdomain wordlist), certificate transparency (crt.sh), Wayback Machine historical endpoints
active recon: Playwright crawl with network interception (captures every real API call SPAs make), JS bundle analysis, source map harvesting, path probing (~150 common paths), SPA interaction (clicks buttons, navigates Angular/React routes)
tech fingerprinting: 60+ detection rules across headers, cookies, error messages, body patterns, WAF signatures
auto-authentication: discovers login endpoints, tries SQLi bypass + default credentials, registers test accounts, stores JWT for authenticated testing
attack planning: context-aware attack selection. source-aware priority boosting when --repo is provided
attack execution: 41 attack plugins + targeted probes against well-known API paths. 5-layer WAF evasion
verification: deterministic plugin verification + AI fallback for ambiguous cases
proof-by-exploitation: replays every confirmed vuln in a real Playwright browser, captures screenshots as evidence
autonomous exploitation: ReAct agent loop uses Claude tool_use to escalate each vuln to maximum impact (UNION data extraction, IDOR enumeration, credential theft)
swarm deep-dive (--swarm): 6 specialist AI agents attack in parallel, adversarial QA agent filters false positives
chain discovery: 8 chain templates plus AI-driven novel chain discovery
reporting: markdown, JSON, HTML, SARIF, CSV with curl PoCs and compliance mapping
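
for the SARIF output in the reporting phase, a minimal TypeScript sketch of mapping findings into a SARIF 2.1.0 log. the Finding shape and the severity-to-level mapping are assumptions for illustration. only the SARIF field names (version, runs, tool.driver.name, results, ruleId, level, message.text, locations) come from the SARIF format itself:

```typescript
// hypothetical sketch of finding -> SARIF conversion; not crowbar's actual code.

type Severity = "low" | "medium" | "high";

interface Finding {
  ruleId: string; // e.g. "sqli-error-based" (illustrative id)
  severity: Severity;
  title: string;
  url: string;
}

// assumed mapping onto SARIF result levels.
const LEVEL: Record<Severity, "note" | "warning" | "error"> = {
  low: "note",
  medium: "warning",
  high: "error",
};

/** build a minimal SARIF 2.1.0 log object from a list of findings. */
function toSarif(findings: Finding[]) {
  return {
    version: "2.1.0",
    runs: [
      {
        tool: { driver: { name: "crowbar" } },
        results: findings.map((f) => ({
          ruleId: f.ruleId,
          level: LEVEL[f.severity],
          message: { text: f.title },
          locations: [
            { physicalLocation: { artifactLocation: { uri: f.url } } },
          ],
        })),
      },
    ],
  };
}
```

serializing the result with JSON.stringify gives a file that SARIF consumers can read. a fuller report would also populate tool.driver.rules.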

41 attack plugins

injection: SQL injection (error + blind boolean + blind timing + UNION), NoSQL injection (MongoDB operators), command injection (output + timing), server-side template injection (Jinja2, Twig, Freemarker, Velocity, ERB, Pug), XML external entity (file + SSRF + parameter entity), second-order SQLi/XSS (cross-endpoint correlation)

cross-site: reflected XSS (context-aware: HTML body, attribute, script, URL), stored XSS, DOM XSS (Playwright source-sink tracing), CSRF (cross-origin token check), postMessage origin validation

access control: IDOR (sequential + UUID probing), BOLA (cross-object authorization, method override), CORS misconfiguration (origin reflection, null trust, regex bypass), forced browsing, auth bypass (default credentials), mass assignment (27 sensitive fields), broken access control (method + header injection), rate limit bypass (header-based IP spoofing)

infrastructure: SSRF (localhost, cloud metadata AWS/GCP/Azure/DigitalOcean, 12 IP bypass variants), path traversal (encoding variants, null byte), subdomain takeover (13 service fingerprints), host header injection, WebSocket security (cross-site hijacking, origin validation)

code execution: malicious file upload (web shells, polyglot, extension bypass), prototype pollution (server-side + client-side via Playwright), insecure deserialization

auth: JWT algorithm confusion (alg none + admin forgery), OAuth/OIDC (state parameter CSRF, redirect_uri bypass with 10 evasion variants, scope escalation, implicit flow token exposure)

logic: race conditions (parallel TOCTOU), workflow bypass (step skipping, state machine violation), open redirect, GraphQL (introspection dump, batching, field suggestion)

caching: web cache poisoning (unkeyed header injection, unkeyed parameter injection, path-based cache deception, delimiter discrepancy, hop-by-hop header abuse)

external: known CVE detection via nuclei templates

waf evasion

5 escalating layers, triggered automatically when a WAF blocks:

encoding: URL, double URL, unicode, HTML entity
structural: case randomization, SQL comment insertion, null bytes
http-level: content-type switching, HTTP parameter pollution
protocol-level: chunked transfer encoding obfuscation
network-level: IP spoofing headers (10 variants), proxy rotation

detects Cloudflare, AWS WAF, ModSecurity, Akamai, Imperva, F5, Azure WAF.
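
the escalation idea can be sketched in TypeScript as a loop over ordered layers that stops at the first candidate the WAF lets through. this is a simplified illustration, assuming only the encoding layer (URL and double URL encoding) and a caller-supplied isBlocked callback. LAYERS and escalate are hypothetical names:

```typescript
// hypothetical sketch of layered escalation; not crowbar's actual code.

type Transform = (payload: string) => string;

interface Layer {
  name: string;
  transforms: Transform[];
}

// only the first layer is sketched here; later layers would be appended in order.
const LAYERS: Layer[] = [
  {
    name: "encoding",
    transforms: [
      (p) => encodeURIComponent(p), // URL encoding
      (p) => encodeURIComponent(encodeURIComponent(p)), // double URL encoding
    ],
  },
];

/**
 * try the raw payload first, then each transform layer by layer.
 * returns the first candidate that is not blocked, or null if all are blocked.
 */
function escalate(
  payload: string,
  isBlocked: (candidate: string) => boolean
): { layer: string; candidate: string } | null {
  if (!isBlocked(payload)) return { layer: "none", candidate: payload };
  for (const layer of LAYERS) {
    for (const transform of layer.transforms) {
      const candidate = transform(payload);
      if (!isBlocked(candidate)) return { layer: layer.name, candidate };
    }
  }
  return null;
}
```

in practice isBlocked would send the candidate to the target and look for a block page or status code. keeping it as a callback lets the escalation order be tested without network access.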

ai brain

uses Anthropic Claude and OpenAI GPT with model routing: cheap models (gpt-4o-mini) for response parsing and payload generation, expensive models (claude sonnet) for strategy planning, verification, and chain discovery.

cost tracking: every AI call tracked. configurable budget cap (default $10). typical scan costs $1-5. works without AI too.
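
a minimal TypeScript sketch of task-based model routing with a budget cap. the task names follow the description above, but the Task and Tier types, the BudgetTracker class, and the per-call cost estimates are assumptions for illustration:

```typescript
// hypothetical sketch of model routing with a spend cap; not crowbar's actual code.

type Task =
  | "response-parsing"
  | "payload-generation"
  | "strategy"
  | "verification"
  | "chain-discovery";

type Tier = "cheap" | "expensive";

// cheap tier for high-volume tasks, expensive tier for reasoning-heavy ones.
const ROUTES: Record<Task, Tier> = {
  "response-parsing": "cheap",
  "payload-generation": "cheap",
  strategy: "expensive",
  verification: "expensive",
  "chain-discovery": "expensive",
};

class BudgetTracker {
  private spentUsd = 0;
  private readonly capUsd: number;

  constructor(capUsd: number = 10) {
    this.capUsd = capUsd;
  }

  /** true if a call with this estimated cost still fits under the cap. */
  canSpend(estimateUsd: number): boolean {
    return this.spentUsd + estimateUsd <= this.capUsd;
  }

  /** record the actual cost of a completed call. */
  record(costUsd: number): void {
    this.spentUsd += costUsd;
  }
}

/** pick a tier for the task, or null when the budget is exhausted. */
function route(task: Task, budget: BudgetTracker, estimateUsd: number): Tier | null {
  return budget.canSpend(estimateUsd) ? ROUTES[task] : null;
}
```

returning null instead of throwing lets the caller fall back to the non-AI path once the cap is reached.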

--ai-aggressive mode

crowbar scan https://target.com --ai-aggressive

enables 4 AI-powered capabilities that turn crowbar from a template matcher into an adaptive hacker:

AI recon expansion: after crawling, the AI analyzes discovered URL patterns and suggests hidden endpoints (admin panels, API versioning, debug endpoints). crowbar probes each suggestion and adds confirmed ones to the attack surface.
AI target prioritization: the AI ranks which endpoints are most likely exploitable based on the full knowledge graph (tech stack, parameter names, response patterns).
AI novel payload generation: when a plugin's template payloads all fail, the AI generates 3 creative payloads that differ structurally from the failures, considering the detected tech stack and WAF. research shows 80%+ WAF bypass rates with LLM-generated payloads.
AI response analysis: each AI-generated payload's response is analyzed by the AI to determine if exploitation succeeded, catching edge cases that regex patterns miss.

per-endpoint cost guard (max 3 AI calls per endpoint) prevents runaway spend. all AI decisions are logged with [ai] prefix for auditability.

autonomous exploitation (ReAct agent)

when --ai-aggressive is enabled, crowbar doesn't just detect vulnerabilities -- it exploits them to demonstrate maximum impact. after confirming a finding, a ReAct (reasoning + acting) agent loop takes over:

the agent reasons about what exploitation steps to take
executes HTTP requests as actions (UNION SELECT enumeration, data extraction, privilege escalation)
observes the response and adapts its strategy
repeats until it achieves concrete impact or exhausts approaches (max 15 steps)

for SQLi, this means going from "error-based detection" to "extracted 3 user records including password hashes via UNION SELECT." for SSRF, from "internal service accessible" to "read AWS IAM credentials from metadata endpoint." for IDOR, from "sequential ID accepted" to "enumerated 10 user records with PII."

the agent uses Claude's tool_use API for structured action execution. each exploit attempt costs ~$0.50-1.00. exploit logs saved to {output}/exploits/exploit-results.json with full step-by-step reasoning traces.

this is the "no exploit, no report" philosophy: every finding in the report has proven, demonstrated impact, not just pattern-matched detection.
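
the reason, act, observe cycle with a step cap can be sketched in TypeScript like this. it is a simplified, synchronous illustration: decide stands in for the model call and act for the HTTP request, and both are hypothetical callbacks rather than crowbar's API:

```typescript
// hypothetical sketch of a bounded ReAct loop; not crowbar's actual code.

interface Step {
  thought: string;
  action: string;
  observation: string;
}

interface Decision {
  thought: string;
  action: string | null; // null means the agent is done or out of ideas
}

/**
 * run the loop: decide on an action from the history so far, execute it,
 * record the observation, and repeat until done or maxSteps is reached.
 */
function reactLoop(
  decide: (history: Step[]) => Decision,
  act: (action: string) => string,
  maxSteps: number = 15
): Step[] {
  const history: Step[] = [];
  while (history.length < maxSteps) {
    const decision = decide(history);
    if (decision.action === null) break;
    history.push({
      thought: decision.thought,
      action: decision.action,
      observation: act(decision.action),
    });
  }
  return history;
}
```

the returned history is the kind of step-by-step trace that could be written to an exploit log. a real agent would make decide and act asynchronous.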

external knowledge bases

crowbar ships with its own payloads and wordlists, but scales massively with external repos:

nuclei-templates: --nuclei-templates ~/nuclei-templates loads YAML templates as a fast-pass known-CVE layer. runs before the AI engine. supports status/word/regex matchers, extractors, AND/OR conditions
SecLists: --wordlist ~/SecLists expands path discovery, parameter enumeration, subdomain bruteforce, and attack payloads. 10-100x coverage boost over built-in lists
PayloadsAllTheThings: --payloads ~/PayloadsAllTheThings imports comprehensive attack payloads organized by type (SQLi, XSS, SSRF, SSTI, XXE, LFI, RCE, NoSQLi)

continuous monitoring

crowbar watch https://target.com --interval 6h --webhook https://hooks.slack.com/xxx

runs periodic scans, maintains a history of findings, alerts when new vulnerabilities appear, and tracks when old ones get fixed. configurable interval (1h, 6h, 1d, 7d), with a safety cap on the maximum number of runs. webhook alerts fire only on deltas, not every scan.
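
a minimal TypeScript sketch of delta detection between two scans. it assumes a finding is identified by its type, endpoint, and parameter. the Finding shape and the fingerprint, diffScans, and shouldAlert names are hypothetical:

```typescript
// hypothetical sketch of scan-to-scan delta detection; not crowbar's actual code.

interface Finding {
  type: string; // e.g. "sqli"
  endpoint: string; // e.g. "/api/users"
  param?: string; // e.g. "id"
}

/** stable identity for a finding across scans (assumed key fields). */
function fingerprint(f: Finding): string {
  return [f.type, f.endpoint, f.param ?? ""].join("|");
}

/** compare the previous scan with the current one. */
function diffScans(
  prev: Finding[],
  curr: Finding[]
): { added: Finding[]; fixed: Finding[] } {
  const prevKeys = new Set(prev.map(fingerprint));
  const currKeys = new Set(curr.map(fingerprint));
  return {
    added: curr.filter((f) => !prevKeys.has(fingerprint(f))),
    fixed: prev.filter((f) => !currKeys.has(fingerprint(f))),
  };
}

/** fire the webhook only when something changed. */
function shouldAlert(delta: { added: Finding[]; fixed: Finding[] }): boolean {
  return delta.added.length > 0 || delta.fixed.length > 0;
}
```

two identical scans produce an empty delta, so no webhook fires.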

web dashboard

crowbar serve --port 3333

dark-theme web UI at http://127.0.0.1:3333. start scans, view history, inspect vulnerabilities, track remediation. binds to localhost only -- never exposed to the network. optional API key auth via CROWBAR_API_KEY env var. REST API at /api/ for programmatic access. rate limited to 60 req/min. max 3 concurrent scans.

benchmark results

tested against OWASP Juice Shop (110 challenges, the industry standard):

| | crowbar | ZAP | human pentest |
|---|---|---|---|
| vulns found | 46 | 13 | 18 |
| challenges solved | 35-42 / 110 | ~8 | ~20 |
| vuln types | 12 | 3-4 | 5+ |
| cost | $0.04 | free | $15-30k |
| time | 15-25 min | 15-30 min | 2-4 weeks |
| auto-auth | SQLi bypass | none | manual |
| logic flaws | param omission, race conditions, boundary values | no | yes |

crowbar autonomously broke into Juice Shop via SQLi auth bypass, discovered 300+ endpoints across 91 JS files, solved 35-42 challenges including DOM XSS, UNION credential extraction, null byte file access, JWT forgery, nOAuth password derivation, race condition exploitation, and CAPTCHA bypass. 46 vulnerability findings across 12 types with 10 attack chains. no source code, no manual configuration, no OpenAPI spec.

swarm mode

crowbar scan https://target.com --swarm

6 specialist AI agents attack in parallel, each with deep domain knowledge:

| specialist | focus |
|---|---|
| injection | SQLi (error/blind/UNION), NoSQLi, command injection, SSTI, XXE |
| xss | reflected, stored, DOM, postMessage, CSP bypass |
| access-control | IDOR, BOLA, CORS, forced browsing, mass assignment |
| infrastructure | SSRF, path traversal, subdomain takeover, cache poisoning |
| auth | JWT, OAuth/OIDC, default credentials, rate limit bypass |
| logic | race conditions, workflow bypass, GraphQL, file upload |

after specialists finish, an adversarial QA agent reviews every finding and tries to disprove it. only findings that survive scrutiny reach the report.

interactive mode (--swarm --interactive) generates a prompt for Claude Code Agent Teams, spawning each specialist in its own tmux pane for real-time observation.

validation and benchmarks

five benchmark suites covering every major standard:

validation ladder (quick smoke tests)

./scripts/validation.sh fixture          # built-in test fixture
./scripts/validation.sh dvwa-low         # DVWA security=low
./scripts/validation.sh juice-shop       # OWASP Juice Shop
./scripts/validation.sh all              # run everything

XBOW benchmark (the gold standard for AI pentesters, 104 CTF challenges)

./scripts/xbow-benchmark.sh --limit 10  # first 10 challenges
./scripts/xbow-benchmark.sh             # all 104 (1-2 hours)

each challenge gets a random flag injected at build time. crowbar must extract the flag through exploitation to pass. directly comparable to Shannon (96.15%) and XBOW (85%). also runs on GitHub Actions (workflow_dispatch).

API security (OWASP API Top 10)

./scripts/benchmark-apis.sh crapi       # crAPI: BOLA, mass assignment, auth bypass, SSRF
./scripts/benchmark-apis.sh vampi       # VAmPI: SQLi, BOLA, enumeration, rate limiting
./scripts/benchmark-apis.sh             # both

DAST comparison (pentest-tools.com methodology: TP/FP/FN scoring)

./scripts/benchmark-dast.sh dvwa        # DVWA with vuln manifest scoring
./scripts/benchmark-dast.sh crystals    # Broken Crystals (React/Node.js)
./scripts/benchmark-dast.sh             # both, produces TP/FP/FN rates

scores against known vulnerability manifests, directly comparable to Acunetix, Burp Suite, Qualys, Rapid7, ZAP from the 2024 pentest-tools.com benchmark.
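
TP/FP/FN scoring against a manifest comes down to set comparison. a minimal TypeScript sketch, assuming each vulnerability is identified by a string id. the id format and the scoreAgainstManifest name are hypothetical:

```typescript
// hypothetical sketch of manifest scoring; not the benchmark script's actual code.

interface Score {
  tp: number; // reported and in the manifest
  fp: number; // reported but not in the manifest
  fn: number; // in the manifest but not reported
  precision: number; // tp / (tp + fp)
  recall: number; // tp / (tp + fn)
}

/** score reported finding ids against the known-vulnerability manifest. */
function scoreAgainstManifest(expected: string[], reported: string[]): Score {
  const expectedSet = new Set(expected);
  const reportedUnique = Array.from(new Set(reported));
  const tp = reportedUnique.filter((id) => expectedSet.has(id)).length;
  const fp = reportedUnique.length - tp;
  const fn = expectedSet.size - tp;
  return {
    tp,
    fp,
    fn,
    precision: reportedUnique.length === 0 ? 0 : tp / reportedUnique.length,
    recall: expectedSet.size === 0 ? 0 : tp / expectedSet.size,
  };
}
```

deduplicating reported ids first keeps a finding that was reported twice from inflating the true positive count.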

OWASP Top 10 coverage (HTB AI Range equivalent)

./scripts/benchmark-htb.sh              # maps findings to all 10 OWASP categories

tests against Juice Shop with OWASP Top 10 category mapping. also documents HackTheBox MCP integration for Cursor (see .cursor/mcp.json.example).

honeypot detection

analyzes target responses for 5 honeypot signals before wasting attack budget: suspiciously high vuln rate (>90%), fake SQL errors on non-SQL input, artificially consistent timing, known honeypot signatures (HFish, Cowrie, T-Pot), fingerprint mismatches.
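
a minimal TypeScript sketch of how the five signals could be evaluated from probe statistics. only the 90% vuln-rate threshold comes from the text above. the ProbeStats shape, the timing threshold, the minimum sample count, and the substring-based signature match are all assumptions for illustration:

```typescript
// hypothetical sketch of honeypot signal checks; not crowbar's actual code.

interface ProbeStats {
  probesSent: number;
  probesFlaggedVulnerable: number;
  sqlErrorsOnBenignInput: number; // SQL errors returned for non-SQL input
  responseTimesMs: number[];
  bodySample: string;
  headerStack: string; // stack claimed by response headers
  behaviorStack: string; // stack inferred from behavior
}

const KNOWN_SIGNATURES = ["hfish", "cowrie", "t-pot"];

/** standard deviation divided by mean; Infinity when undefined. */
function coefficientOfVariation(xs: number[]): number {
  if (xs.length === 0) return Infinity;
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  if (mean === 0) return Infinity;
  const variance =
    xs.reduce((a, b) => a + (b - mean) * (b - mean), 0) / xs.length;
  return Math.sqrt(variance) / mean;
}

/** return the names of the signals that fired. */
function honeypotSignals(s: ProbeStats): string[] {
  const signals: string[] = [];
  if (s.probesSent > 0 && s.probesFlaggedVulnerable / s.probesSent > 0.9) {
    signals.push("vuln-rate");
  }
  if (s.sqlErrorsOnBenignInput > 0) signals.push("fake-sql-errors");
  // assumed: at least 5 samples and under 1% relative spread.
  if (
    s.responseTimesMs.length >= 5 &&
    coefficientOfVariation(s.responseTimesMs) < 0.01
  ) {
    signals.push("consistent-timing");
  }
  const body = s.bodySample.toLowerCase();
  if (KNOWN_SIGNATURES.some((sig) => body.includes(sig))) {
    signals.push("known-signature");
  }
  if (s.headerStack !== s.behaviorStack) signals.push("fingerprint-mismatch");
  return signals;
}
```

returning the list of fired signals rather than a boolean leaves the abort-or-warn decision to the caller.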

pipeline integration

crowbar is the third tool in a security pipeline:

pinata (white-box) scans source code, outputs gaps.json

whackamole (gray-box) attacks known endpoints, generates and verifies fixes

crowbar (black-box) needs nothing. optionally pass --gaps gaps.json for guided priorities

pinata (knows code)  -->  whackamole (knows gaps)  -->  crowbar (knows nothing)

safety

scope enforcement: every request checked against domain allowlist. DNS pre-resolution. private IP blocking
banned targets: .gov, .mil, .edu, major platforms blocked by default
rate limiting: configurable with hard cap at 100 req/s. adaptive slowdown on 429
destructive prevention: no DELETE/DROP by default. explicit flag required
request logging: every request/response as compressed JSONL audit trail
confirmation prompt: explicit yes/no before first attack
cost cap: AI budget enforced. scan completes gracefully if exceeded
payload randomization: shuffled per scan to avoid fingerprinting
honeypot detection: aborts or warns before wasting budget on decoys
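
the rate limiting item can be sketched in TypeScript like this. the 100 req/s hard cap comes from the list above. halving on 429, the 10% recovery step, and the 1 req/s floor are assumptions, and AdaptiveRateLimiter is a hypothetical name:

```typescript
// hypothetical sketch of a capped, adaptive rate limiter; not crowbar's actual code.

const HARD_CAP_RPS = 100;

class AdaptiveRateLimiter {
  private readonly configuredRps: number;
  private currentRps: number;

  constructor(requestedRps: number) {
    // clamp the requested rate between 1 and the hard cap.
    this.configuredRps = Math.min(Math.max(requestedRps, 1), HARD_CAP_RPS);
    this.currentRps = this.configuredRps;
  }

  /** minimum delay between requests, in milliseconds. */
  get delayMs(): number {
    return 1000 / this.currentRps;
  }

  /** feed back each response status to adjust the rate. */
  onResponse(status: number): void {
    if (status === 429) {
      // back off: halve the rate, but never drop below 1 req/s.
      this.currentRps = Math.max(this.currentRps / 2, 1);
    } else {
      // recover gradually, never above the configured rate.
      this.currentRps = Math.min(this.currentRps * 1.1, this.configuredRps);
    }
  }
}
```

a caller would wait delayMs between requests and call onResponse after each one, so a burst of 429 responses slows the scan and a run of successes speeds it back up.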

compliance

vulnerability reports include regulatory compliance references:

PCI-DSS (payment card security)
OWASP (web application security)
HIPAA (healthcare data)
SOC2 (service organization controls)
GDPR (data protection)

tech stack

TypeScript, Node.js 20+
Playwright (browser crawling, DOM XSS, client-side pollution, postMessage)
commander.js (CLI)
zod (runtime type validation)
Anthropic SDK + OpenAI SDK (AI brain)
vitest (testing)
tsup (bundling)

development

npm run dev      # watch mode
npm test         # run tests (1249 tests)
npm run build    # production build (571KB)
npm run lint     # type check

legal

this tool is for authorized security testing only. users must have explicit written permission to test targets. unauthorized access to computer systems is illegal. report findings responsibly.

license

MIT