vulcn
v1.1.1
Security evals for the AI era. Probes · Targets · Graders · Proof. Confirmed XSS / SQLi / BOLA / prompt-injection / MCP-RCE with reproducible proof attached to every finding.
vulcn is a security evaluation framework in the mold of Promptfoo / DeepEval,
built around security probes and security graders: deterministic validators
for web vulnerabilities (XSS, SQLi, BOLA, RCE, path traversal) and calibrated
LLM-as-judge graders for AI-shaped surfaces (LLM endpoints, MCP servers,
agents). Every finding ships with reproducible proof.
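To make the "deterministic validator" idea concrete, here is a minimal TypeScript sketch of a grader that turns one probe result into a finding that carries its own proof. The `ProbeResult` and `Finding` shapes and `gradeReflection` are hypothetical, not vulcn's API, and an unescaped reflection is a weaker signal than the browser-execution confirmation the web-classics suite describes.

```typescript
// Hypothetical shapes for illustration; vulcn's real finding schema may differ.
interface ProbeResult {
  payload: string;      // what the probe injected
  request: string;      // the request that carried it, e.g. "GET /search?q=..."
  responseBody: string; // what the target sent back
}

interface Finding {
  rule: string;
  payload: string;
  request: string;
  evidence: string; // excerpt of the response that proves the issue
}

// Deterministic: the verdict depends only on the observed bytes, so re-sending
// the same request reproduces the same result. No model is involved.
function gradeReflection(result: ProbeResult): Finding | null {
  if (result.payload === "") return null;
  const idx = result.responseBody.indexOf(result.payload);
  if (idx === -1) return null; // payload absent or encoded: no finding

  // Keep a little surrounding context so the evidence is readable on its own.
  const start = Math.max(0, idx - 20);
  const end = idx + result.payload.length + 20;
  return {
    rule: "reflected-unescaped-input",
    payload: result.payload,
    request: result.request,
    evidence: result.responseBody.slice(start, end),
  };
}
```

An LLM-as-judge grader would fill the same `Finding` slot, but its verdict comes from a model, which is why those graders need calibration against labeled data.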
```sh
# Scan a deployed preview:
npx vulcn scan https://your.app

# Two-identity scan unlocks BOLA / broken-auth:
npx vulcn scan https://your.app/dashboard \
  --user alice --pass "$ALICE_PASS" \
  --user2 bob --pass2 "$BOB_PASS"

# LLM-endpoint scan (needs ANTHROPIC_API_KEY or another provider):
npx vulcn scan https://your.app/api/chat --kind llm-endpoint

# Save findings, patch the bug, re-verify:
npx vulcn scan https://your.app --json > findings.json
# (apply your fix)
npx vulcn verify findings.json   # PASS = fixed, FAIL = still exploitable
```

Docs: vulcn.dev/docs · Repo: vulcnize/vulcn · Community probes: vulcnize/probes
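The two-identity scan relies on cross-user replay: a request made legitimately by alice is replayed with bob's session. A minimal sketch of the comparison step, assuming both responses have already been captured; the `Observed` type and `gradeCrossUserReplay` are hypothetical, and a real check would normalize volatile fields (timestamps, CSRF tokens) before comparing bodies.

```typescript
// Hypothetical model of one captured response; not vulcn's internal type.
interface Observed {
  status: number;
  body: string;
}

type Verdict = "bola" | "enforced" | "inconclusive";

const isSuccess = (status: number): boolean => status >= 200 && status < 300;

// owner: alice requesting her own object; other: bob replaying the same request.
function gradeCrossUserReplay(owner: Observed, other: Observed): Verdict {
  // Without a working baseline there is nothing to compare against.
  if (!isSuccess(owner.status)) return "inconclusive";

  // The server refused or hid the object from the second identity.
  if (other.status === 401 || other.status === 403 || other.status === 404) {
    return "enforced";
  }

  // Bob was served exactly what alice sees: ownership is not being checked.
  if (isSuccess(other.status) && other.body === owner.body) return "bola";

  return "inconclusive";
}
```

Returning "inconclusive" rather than guessing keeps the grader in line with the confirmed-findings-only rule: only an exact cross-user match is reported.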
Suites shipping in v1
| Suite | What it scans for | Grader |
| ----------------- | ------------------------------------------------------------------ | ------------- |
| passive | Security headers, cookie flags, CORS, info-disclosure | deterministic |
| web-classics | Reflected XSS (browser-execution-confirmed), SQLi (error / boolean / timing) | deterministic |
| access-control | BOLA / IDOR (two-identity cross-user replay), broken-auth | deterministic |
| llm-endpoint | Prompt injection, jailbreak, system-prompt leak, exfil, tool-misuse | LLM-judge |
| mcp-tool | RCE / path traversal / tool-definition poisoning via MCP servers | deterministic |
| community | YAML probe templates loaded via --probes <dir> | deterministic |
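The passive suite needs no attack traffic; it inspects what an ordinary response already reveals. A sketch of two such deterministic checks, assuming response headers arrive as a plain object. The header list and both function names are illustrative, not vulcn's rule set.

```typescript
// Illustrative subset of headers a passive check might require.
const REQUIRED_HEADERS = [
  "content-security-policy",
  "strict-transport-security",
  "x-content-type-options",
];

// Header names are case-insensitive, so compare in lowercase.
function missingSecurityHeaders(headers: Record<string, string>): string[] {
  const present = new Set(Object.keys(headers).map((k) => k.toLowerCase()));
  return REQUIRED_HEADERS.filter((name) => !present.has(name));
}

// Inspects one Set-Cookie value: "name=value; Attr; Attr=x; ..."
function weakCookieFlags(setCookie: string): string[] {
  const attrs = setCookie
    .split(";")
    .slice(1) // drop the name=value pair
    .map((a) => a.trim().toLowerCase());
  const issues: string[] = [];
  if (!attrs.includes("secure")) issues.push("missing Secure");
  if (!attrs.includes("httponly")) issues.push("missing HttpOnly");
  return issues;
}
```

For example, `weakCookieFlags("sid=abc; Path=/; HttpOnly")` reports only the missing `Secure` attribute.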
Principles
- Proof over probability. Confirmed findings only: every finding ships with the payload, the request, and the observed evidence.
- Calibrated, never vibe-judged. Every grader can be measured against a labeled benchmark via `vulcn calibrate <suite>`.
- Self-sufficient. No fixtures, no recorded flows, no AppSec team required. Vulcn drives its own login from `--user` / `--pass`.
- Developer-first. Agent-native. CLI, MCP server (`vulcn mcp`), GitHub Action, Claude Code / Cursor skill, HTML report.
- One coherent thing. One package. No plugin SDK. No driver abstraction.
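`vulcn calibrate <suite>` reports precision, recall, and F1 against a labeled benchmark. The arithmetic behind those three numbers, sketched over a hypothetical case format (`LabeledCase` and `score` are not vulcn's benchmark schema or API):

```typescript
// One benchmark case: the ground-truth label plus the grader's verdict.
interface LabeledCase {
  id: string;
  vulnerable: boolean; // ground truth
  flagged: boolean;    // what the grader reported
}

interface Scores {
  precision: number; // of everything flagged, how much was real
  recall: number;    // of everything real, how much was flagged
  f1: number;        // harmonic mean of the two
}

function score(cases: LabeledCase[]): Scores {
  let tp = 0;
  let fp = 0;
  let fn = 0;
  for (const c of cases) {
    if (c.flagged && c.vulnerable) tp++;
    else if (c.flagged && !c.vulnerable) fp++;
    else if (!c.flagged && c.vulnerable) fn++;
    // true negatives do not enter any of the three metrics
  }
  const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 0 : tp / (tp + fn);
  const f1 =
    precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}
```

With 3 true positives, 1 false positive, 1 false negative, and 1 true negative, precision and recall are both 3/4, so F1 is 0.75. Precision maps most directly onto "proof over probability", since every false positive is a reported finding that should not exist.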
Install
```sh
npm install -g vulcn

# One-time: install Chromium for the browser-tier graders (XSS).
npx playwright install chromium

# Optional: set an AI provider for the llm-endpoint suite.
export ANTHROPIC_API_KEY=sk-ant-...
# Or: OPENAI_API_KEY / GOOGLE_GENERATIVE_AI_API_KEY / OLLAMA_HOST.
```

CLI surface
```sh
vulcn scan <url>          # run a security scan
vulcn verify <findings>   # re-run scan probes to confirm a fix
vulcn calibrate <suite>   # report precision / recall / F1 vs. a benchmark
vulcn mcp                 # start the MCP server (stdio transport)
```

Run `vulcn <command> --help` for all flags.
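The PASS / FAIL contract of `vulcn verify` can be pictured as re-running every saved finding and checking whether any still reproduces. A minimal synchronous sketch, assuming a caller-supplied `reproduce` callback that stands in for re-sending the stored request and re-grading the response. `StoredFinding`, `reproduce`, and `verifyFindings` are hypothetical names; the real command's internals are not documented here, and a real implementation would be asynchronous.

```typescript
// Hypothetical saved-finding shape (e.g. one entry of findings.json).
interface StoredFinding {
  id: string;
  request: string;
  payload: string;
}

// Returns true if the finding still reproduces against the patched target.
type Reproduce = (finding: StoredFinding) => boolean;

interface VerifyResult {
  status: "PASS" | "FAIL";
  stillExploitable: string[]; // ids of findings that reproduced again
}

function verifyFindings(findings: StoredFinding[], reproduce: Reproduce): VerifyResult {
  const stillExploitable = findings.filter((f) => reproduce(f)).map((f) => f.id);
  // PASS only when no saved finding reproduces: the fix holds for all of them.
  return { status: stillExploitable.length === 0 ? "PASS" : "FAIL", stillExploitable };
}
```

Listing which findings still reproduce, not just the overall verdict, makes a partial fix easy to spot.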
Agent / CI integration
- MCP server: `vulcn mcp` exposes `scan` / `verify` / `get_findings` / `calibrate` as tools that Claude Code, Cursor, or any MCP-aware agent can call.
- GitHub Action: `actions/vulcn-scan/action.yml` runs Vulcn on a PR, uploads SARIF to Code Scanning, and optionally posts confirmed findings as a PR comment.
- Agent skill: `skills/vulcn/SKILL.md` is a drop-in Claude Code skill with proactive-trigger guidance and CLI examples.
- HTML report: `vulcn scan --html report.html` writes a self-contained, Vulcan-styled report.
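For the GitHub Action's SARIF upload, each finding has to be expressed as a SARIF 2.1.0 result. A sketch of one plausible mapping: the input fields (`ruleId`, `severity`, `url`, `message`) and the severity-to-level table are assumptions, not the Action's actual behavior, while the output uses SARIF's `version`, `runs[].tool.driver.name`, and `results[]` structure with its standard `error` / `warning` / `note` levels.

```typescript
// Hypothetical finding shape; the Action's real input format is not documented here.
interface ScanFinding {
  ruleId: string;
  severity: "high" | "medium" | "low";
  url: string;
  message: string;
}

// Assumed mapping from finding severity onto SARIF result levels.
const LEVEL: Record<ScanFinding["severity"], "error" | "warning" | "note"> = {
  high: "error",
  medium: "warning",
  low: "note",
};

function toSarif(findings: ScanFinding[]) {
  return {
    version: "2.1.0",
    runs: [
      {
        tool: { driver: { name: "vulcn" } },
        results: findings.map((f) => ({
          ruleId: f.ruleId,
          level: LEVEL[f.severity],
          message: { text: f.message },
          // A web finding points at the scanned URL rather than a source file.
          locations: [{ physicalLocation: { artifactLocation: { uri: f.url } } }],
        })),
      },
    ],
  };
}
```

Serializing the result with `JSON.stringify(toSarif(findings), null, 2)` yields a file in the shape Code Scanning ingests.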
Community templates
Authoring a YAML probe template? See `templates/CONTRIBUTING.md` for the
schema, severity guidelines, and PR review checklist. Load your local
templates with `vulcn scan --probes ./your-templates/`.
Status
Stardate 79234.5 · v0.1.0 launch candidate. Per-suite F1 numbers are published at vulcn.dev/docs/calibration/results.
License
MIT. See LICENSE.
🖖 Live long and ship securely.
