@phoenixaihub/vuln-harvest

v0.1.0 · Published 2 months ago

AI-guided vulnerability discovery framework. Agentic harness: hypothesis → PoC → verify → triage.

Maintainer: phoenixaihub

Keywords: security, vulnerability, ai, llm, static-analysis, sarif, xss, sqli, code-scanning

VulnHarvest — AI-Guided Vulnerability Discovery Framework

Problem

Security teams lack open-source tooling for using LLMs to discover vulnerabilities systematically in their codebases. Mozilla has reported that an internal pipeline using Claude Mythos found 423 Firefox bugs in one month, including 15-year-old use-after-free (UAF) bugs, sandbox escapes, and race conditions, but it has not open-sourced that pipeline. Meanwhile, attackers are using AI-assisted vulnerability scanning to make offense cheaper, and defenders need equivalent tooling.

Solution

Open-source agentic harness for AI-guided vulnerability discovery:

Hypothesis Generation — LLM analyzes code patterns, generates vulnerability hypotheses
PoC Creation — Automated proof-of-concept test generation per hypothesis
Verification — Execute PoCs in sandboxed environment, confirm exploitability
Deduplication — Match against known CVEs, filter false positives
Triage — Severity classification (CVSS-like scoring), report generation
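The five stages above amount to one loop per hypothesis. The sketch below is a hypothetical illustration of that control flow, not VulnHarvest's actual code: the types `Hypothesis`, `PoC`, `Finding`, and `Stages`, the SHA-256 location fingerprint used for deduplication, and the `knownFingerprints` set standing in for known-CVE matching are all assumptions. Stages are synchronous for brevity, whereas a real harness would await LLM and sandbox calls. The severity bands follow the CVSS v3.x qualitative rating scale (Low 0.1 to 3.9, Medium 4.0 to 6.9, High 7.0 to 8.9, Critical 9.0 to 10.0).

```typescript
import { createHash } from "node:crypto";

// Hypothetical data shapes; VulnHarvest's real types may differ.
interface Hypothesis {
  file: string;
  line: number;
  vulnClass: string; // e.g. "sqli", "xss", "path-traversal"
  rationale: string; // the model's explanation
}

interface PoC {
  hypothesis: Hypothesis;
  script: string; // test code to execute in the sandbox
}

type Severity = "None" | "Low" | "Medium" | "High" | "Critical";

interface Finding {
  hypothesis: Hypothesis;
  score: number; // CVSS-like base score, 0.0 to 10.0
  severity: Severity;
  fingerprint: string;
}

// Injected stage implementations (LLM prompts, sandbox runs, scoring) are assumed.
interface Stages {
  generateHypotheses(source: string): Hypothesis[];
  generatePoC(h: Hypothesis): PoC;
  verify(poc: PoC): boolean; // true when the PoC reproduces the bug
  score(h: Hypothesis): number;
}

// CVSS v3.x qualitative severity rating scale.
function ratingFor(score: number): Severity {
  if (score === 0) return "None";
  if (score < 4.0) return "Low";
  if (score < 7.0) return "Medium";
  if (score < 9.0) return "High";
  return "Critical";
}

// Identical (class, file, line) triples hash to the same key.
function fingerprint(h: Hypothesis): string {
  return createHash("sha256").update(`${h.vulnClass}|${h.file}|${h.line}`).digest("hex");
}

function runPipeline(
  source: string,
  stages: Stages,
  knownFingerprints: Set<string> = new Set(), // stand-in for known-CVE matching
): Finding[] {
  const seen = new Set(knownFingerprints);
  const findings: Finding[] = [];
  for (const h of stages.generateHypotheses(source)) { // 1. hypothesis generation
    const poc = stages.generatePoC(h);                 // 2. PoC creation
    if (!stages.verify(poc)) continue;                 // 3. verification: drop unconfirmed
    const fp = fingerprint(h);
    if (seen.has(fp)) continue;                        // 4. deduplication
    seen.add(fp);
    const score = stages.score(h);                     // 5. triage
    findings.push({ hypothesis: h, score, severity: ratingFor(score), fingerprint: fp });
  }
  return findings.sort((a, b) => b.score - a.score); // most severe first
}
```

This keeps the stage order listed above; checking fingerprints before verification would save sandbox runs, at the cost of departing from that order.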

Project-agnostic: bring your own codebase, your own model, your own CI.
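One way to support "your own model" is to hide every provider behind a small interface and treat model output as untrusted input. In the sketch below, `LLMBackend`, `hypothesisPrompt`, `parseHypotheses`, `generateHypotheses`, and `StubBackend` are hypothetical names, and the prompt wording and JSON response format are assumptions rather than VulnHarvest's documented behavior.

```typescript
// Hypothetical provider-neutral interface; not VulnHarvest's published API.
interface LLMBackend {
  readonly name: string;
  complete(prompt: string): Promise<string>;
}

interface Hypothesis {
  file: string;
  line: number;
  vulnClass: string;
  rationale: string;
}

// Prompt wording and the requested JSON shape are illustrative assumptions.
function hypothesisPrompt(file: string, code: string): string {
  return [
    "You are auditing source code for security vulnerabilities.",
    `File: ${file}`,
    "Reply with only a JSON array of objects with keys: line, vulnClass, rationale.",
    "--- BEGIN CODE ---",
    code,
    "--- END CODE ---",
  ].join("\n");
}

// Model output is untrusted: malformed JSON or entries are dropped, not thrown.
function parseHypotheses(file: string, raw: string): Hypothesis[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return [];
  }
  if (!Array.isArray(data)) return [];
  const out: Hypothesis[] = [];
  for (const item of data) {
    if (typeof item !== "object" || item === null) continue;
    const { line, vulnClass, rationale } = item as Record<string, unknown>;
    if (typeof line !== "number" || typeof vulnClass !== "string") continue;
    out.push({ file, line, vulnClass, rationale: typeof rationale === "string" ? rationale : "" });
  }
  return out;
}

async function generateHypotheses(backend: LLMBackend, file: string, code: string): Promise<Hypothesis[]> {
  return parseHypotheses(file, await backend.complete(hypothesisPrompt(file, code)));
}

// Canned backend for offline tests; real adapters would call a provider SDK.
class StubBackend implements LLMBackend {
  readonly name = "stub";
  private readonly reply: string;
  constructor(reply: string) {
    this.reply = reply;
  }
  async complete(_prompt: string): Promise<string> {
    return this.reply;
  }
}
```

An OpenAI, Anthropic, or local-model adapter would implement `complete` with that provider's client; the rest of the harness would not need to change.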

Market

TAM: $15B+ application security testing market (growing 20%+ YoY)
Validated adjacent players: Snyk ($8.5B valuation), Semgrep (OSS + commercial), CodeQL (GitHub/Microsoft)
Gap: none of these use LLMs for hypothesis-driven discovery; they rely on pattern matching or static analysis. VulnHarvest aims to be the next generation.
Mozilla proof point: 423 bugs in one month, including bugs that had survived years of fuzzing, which suggests the approach works at scale

Architecture

┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  Code Ingest │───▶│  Hypothesis  │───▶│  PoC Gen     │
│  (AST/CFG)   │    │  Generator   │    │  Engine      │
└──────────────┘    └──────────────┘    └──────────────┘
                                               │
┌──────────────┐    ┌──────────────┐    ┌──────▼───────┐
│  Reporter    │◀───│  Triage &    │◀───│  Sandbox     │
│  (SARIF)     │    │  Dedup       │    │  Executor    │
└──────────────┘    └──────────────┘    └──────────────┘
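The Sandbox Executor stage runs untrusted, model-generated PoCs, so it should deny network access and cap resources. This is a minimal sketch assuming the Docker CLI is installed on the host; `SandboxOptions`, `dockerArgs`, and `runInSandbox` are hypothetical names, and the limits (256 MB memory, one CPU, 64 processes, 30-second timeout) are illustrative defaults, not project settings. The flags themselves are standard `docker run` options.

```typescript
import { execFile } from "node:child_process";

interface SandboxOptions {
  image: string;      // container image with the target's runtime (assumed)
  pocDir: string;     // host directory holding the PoC, mounted read-only
  command: string[];  // command to run inside the container
  memory?: string;    // Docker memory limit
  timeoutMs?: number;
}

interface SandboxResult {
  exitCode: number | null; // null if the docker process was killed or could not be spawned
  stdout: string;
  stderr: string;
  timedOut: boolean;       // true if Node killed the process (normally the timeout)
}

// `docker run` flags that isolate an untrusted PoC.
function dockerArgs(opts: SandboxOptions): string[] {
  return [
    "run", "--rm",
    "--network=none",                    // no network access
    `--memory=${opts.memory ?? "256m"}`, // cap RAM
    "--cpus=1",
    "--pids-limit=64",                   // limit process count (fork bombs)
    "--read-only",                       // read-only root filesystem...
    "--tmpfs=/tmp",                      // ...with a writable scratch directory
    "--cap-drop=ALL",                    // drop all Linux capabilities
    "--security-opt=no-new-privileges",
    "-v", `${opts.pocDir}:/poc:ro`,
    "-w", "/poc",
    opts.image,
    ...opts.command,
  ];
}

function runInSandbox(opts: SandboxOptions): Promise<SandboxResult> {
  return new Promise((resolve) => {
    execFile(
      "docker",
      dockerArgs(opts),
      { timeout: opts.timeoutMs ?? 30_000, encoding: "utf8" },
      (err, stdout, stderr) => {
        const code = err ? err.code : 0;
        resolve({
          exitCode: typeof code === "number" ? code : null,
          stdout: String(stdout),
          stderr: String(stderr),
          timedOut: err?.killed === true,
        });
      },
    );
  });
}
```

How a PoC signals a confirmed bug (an exit code, a marker on stdout) is left to the PoC generator; the sketch only returns the raw result.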

TypeScript/Node.js CLI
SARIF output for CI/CD integration
Pluggable LLM backend (OpenAI, Anthropic, local models)
Sandboxed execution (Docker-based)
CVE database integration
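SARIF output is what lets CI systems such as GitHub code scanning ingest findings. The sketch below emits a minimal SARIF 2.1.0 log; the `Finding` shape, the one-rule-per-vulnerability-class layout, and the score-to-`level` mapping are assumptions. The `security-severity` rule property is a GitHub code scanning convention for ranking security alerts, not part of the core SARIF schema.

```typescript
// Hypothetical finding shape produced by the triage stage.
interface Finding {
  vulnClass: string; // used as the SARIF ruleId, e.g. "sqli"
  file: string;      // repository-relative path
  line: number;      // 1-based
  message: string;
  score: number;     // CVSS-like score, 0.0 to 10.0
}

type SarifLevel = "error" | "warning" | "note";

// Assumed mapping from score to SARIF level (bands follow CVSS v3.x ratings).
function levelFor(score: number): SarifLevel {
  if (score >= 7.0) return "error";   // High, Critical
  if (score >= 4.0) return "warning"; // Medium
  return "note";                      // Low, None
}

function toSarif(findings: Finding[], toolVersion: string): object {
  // One rule per vulnerability class; keep the highest score seen for that class.
  const rules = new Map<string, number>();
  for (const f of findings) {
    rules.set(f.vulnClass, Math.max(rules.get(f.vulnClass) ?? 0, f.score));
  }
  return {
    $schema: "https://json.schemastore.org/sarif-2.1.0.json",
    version: "2.1.0",
    runs: [
      {
        tool: {
          driver: {
            name: "VulnHarvest",
            version: toolVersion,
            rules: [...rules].map(([id, score]) => ({
              id,
              shortDescription: { text: `Potential ${id} vulnerability` },
              // GitHub code scanning reads this property to rank security alerts.
              properties: { "security-severity": score.toFixed(1) },
            })),
          },
        },
        results: findings.map((f) => ({
          ruleId: f.vulnClass,
          level: levelFor(f.score),
          message: { text: f.message },
          locations: [
            {
              physicalLocation: {
                artifactLocation: { uri: f.file },
                region: { startLine: f.line },
              },
            },
          ],
        })),
      },
    ],
  };
}
```

Writing `JSON.stringify(toSarif(findings, "0.1.0"), null, 2)` to a `.sarif` file produces a log that SARIF-aware CI tooling can upload.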

Competitive Landscape

| Tool | Approach | LLM-Guided? | Open Source? |
|------|----------|-------------|--------------|
| Semgrep | Pattern matching | No | Yes |
| CodeQL | Dataflow analysis | No | Partial |
| Snyk Code | ML pattern detection | Partial | No |
| Mozilla/Mythos | LLM hypothesis | Yes | No (internal) |
| VulnHarvest | LLM hypothesis + PoC | Yes | Yes |

Verdict: BUILD

Rationale:

Technical moat: Agentic harness with hypothesis→PoC→verify loop is novel in open source
Market timing: Mozilla's results suggest the approach works, and no open-source equivalent exists
Brand fit: Extends phoenix-assistant security cluster (mcp-security-scanner, agent-security-scanner, etc.)
Feasibility: MVP scope is achievable — hypothesis gen + PoC for common vulnerability classes (XSS, SQLi, path traversal, buffer overflow patterns)
Converging signals: 3+ independent sources (Mozilla/Mythos, 357 HN points; jefftk.com, 367 HN points; Karpathy supply-chain alert; Xeiaso, 831 HN points)

MVP Scope

Code ingestion (AST parsing for JS/TS/Python/C)
Hypothesis generation via LLM (configurable model)
PoC test generation for top 5 vulnerability classes
Basic sandbox execution
SARIF output
CLI interface: vulnharvest scan ./src --model claude-sonnet
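The MVP command line could be parsed with Node's built-in `util.parseArgs` (Node.js 18.3 or later), avoiding a CLI dependency. Only `scan <path>` and `--model` come from the scope above; `ScanCommand`, `parseCli`, the `-m`/`-o` short forms, and the `--output` flag for a SARIF file path are hypothetical additions for illustration.

```typescript
import { parseArgs } from "node:util";

interface ScanCommand {
  target: string;  // directory to scan, e.g. ./src
  model: string;   // model identifier passed to the LLM backend
  output?: string; // hypothetical SARIF output path
}

const USAGE = "usage: vulnharvest scan <path> --model <name> [--output <file.sarif>]";

function parseCli(argv: string[]): ScanCommand {
  const { values, positionals } = parseArgs({
    args: argv,
    options: {
      model: { type: "string", short: "m" },
      output: { type: "string", short: "o" },
    },
    allowPositionals: true, // "scan" and the target path are positionals
  });
  const [command, target] = positionals;
  const model = values.model;
  if (command !== "scan" || target === undefined || typeof model !== "string") {
    throw new Error(USAGE);
  }
  return { target, model, output: typeof values.output === "string" ? values.output : undefined };
}
```

In the installed binary, `parseCli(process.argv.slice(2))` would receive the arguments that follow the Node executable and script path; unknown flags make `parseArgs` throw because strict mode is on by default.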
