
mcp-web-recon-agent

v0.8.1


MCP server for analyst-first owned-target web security assessments of authenticated, high-friction apps


web-recon-agent

Analyst-first web application assessment workflow for authenticated, challenge-heavy, and edge-protected targets. It combines passive AI analysis, operator-controlled browser depth, deterministic active probing, and verification-first reporting to find chainable auth, workflow, API, and exposure risk without collapsing into generic scanner noise.

v0.8.1 — 9 passive AI agents + 32 deterministic active agents + 4 advanced AI attack-intelligence modules.

New in v0.8.x:

  • Hypothesis Engine generates ranked attack hypotheses from the target graph and cross-scan knowledge
  • Adaptive Probing executes multi-step curl probe chains guided by hypotheses, with domain-scoped URL validation
  • Attack Chain Composer identifies multi-step exploitation chains from confirmed findings
  • Cross-Scan Knowledge Graph persists Bayesian-scored attack patterns across scans for continuous learning

Active coverage now includes direct origin-exposure verification for edge/CDN-protected targets, in addition to subdomain enumeration, JS secret scanning, source-map leakage, cloud storage, WAF detection, CSRF, file upload handling, race-condition signals, workflow mutation, SQLi/LFI, HTTP smuggling, DNS/email security, host header injection, OAuth/OIDC/SAML surface mapping, OpenAPI abuse surface mapping, safe API mutation confirmation, open redirect, GraphQL, JWT analysis, prototype pollution, cache poisoning, OAST scaffolding, SSRF, and XXE. A shared recon-core powers three product policies: saas, analyst, and enterprise-assisted.

This repo should be understood as a verified assessment workflow, not a generic "AI scanner." The strongest fit is authenticated apps, Cloudflare or challenge-heavy targets, SPA-only surface, role/object access-control depth, workflow or API abuse discovery, and owned-target origin/deployment exposure checks. If you want the product wedge and homepage positioning spelled out, see PRODUCT_POSITIONING.md, HOMEPAGE_COPY.md, the target filter in TARGET_SELECTION_RUBRIC.md, and the execution plan in ROADMAP-90-DAYS.md.

Roadmap

Product Wedge

  • Best fit: authenticated apps, admin/staff back offices, SPAs, challenge-heavy targets, workflow-heavy products
  • Core differentiators: browser-attached scanning, authenticated role/object diffing, workflow recording, safe API mutation, origin-exposure verification, verification-first reporting, disclosure-safe handoff
  • Not the goal: compete as a generic public internet scanner or judge product quality by whether it can "take over" a static single-page site
  • Simpler edge-protected sites still matter: the owned-target story there is origin exposure, JS secrets, source maps, leaked backend APIs, and deployment/storage mistakes
  • Future hosted packaging should expose this through a remote MCP control plane over scan jobs, not by billing raw tool invocations
  • Use TARGET_SELECTION_RUBRIC.md before scanning to classify domains as outreach, benchmark-only, design-partner, or skip

Primary workflows

| Workflow | Command | What it does |
|----------|---------|--------------|
| Discover | npm run discover -- <url> [--browser] [--output-dir path] | Passive-first discovery for new targets. Uses the passive/full entrypoint without active probing. |
| Verify | npm run verify -- <url> [--profiles profiles.json] [--browser] [--output-dir path] | Owned-target verification loop. Runs the deterministic active-only entrypoint and auto-adds --i-own-this. |
| Retest | npm run retest -- <url> [--profiles profiles.json] [--output-dir path] | Verification plus baseline diffing. Auto-wires --baseline output/<domain>/report.json by default, or <output-dir>/report.json when --output-dir is set, unless you pass --baseline yourself. |

Low-level entrypoints are still available when you want finer control:

| Entrypoint | Command | What it does |
|------------|---------|--------------|
| Passive | npx tsx src/index.ts <url> | 9 AI agents crawl the target and analyze headers, auth, dependencies, exposed files, privacy/compliance, API surface, content security, and business logic |
| Passive + Browser | npx tsx src/index.ts <url> --browser | Passive AI analysis plus optional browser discovery for SPA routes, forms, storage, XHR/fetch capture, and challenge-aware coverage reporting |
| Active | npx tsx src/active-only.ts <url> --i-own-this [--scan-preset balanced\|owned-aggressive] | Sends real HTTP probes across 32 deterministic agents — your own site only |
| Full | npx tsx src/index.ts <url> --active --i-own-this [--profiles profiles.json] [--browser] [--scan-preset balanced\|owned-aggressive] | Passive + active in sequence, with optional scripted login automation, MFA/manual checkpoints, authenticated role/object diffing, workflow recording, and browser discovery |

Requirements

  • Node.js 18+
  • curl and openssl (pre-installed on macOS/Linux)
  • dig (pre-installed on macOS/Linux — for subdomain enumeration)
  • nmap (optional — for port scanning)
  • nikto (optional — for 6,700+ web vulnerability checks)
  • playwright (optional — required for --browser, login automation, and browser workflow capture)
  • Claude Code CLI (claude) installed globally — required for passive/full mode only
  • ANTHROPIC_API_KEY (optional but recommended) — enables automatic passive/report retry through Anthropic API auth when Claude Code quota or subscription auth is exhausted
  • OPENAI_API_KEY (optional) — required when --passive-provider openai

Installation

git clone https://github.com/joepangallo/web-recon-agent.git
cd web-recon-agent
npm install

npm package for MCP clients

The public MCP package is published as mcp-web-recon-agent and runs over stdio:

npx -y mcp-web-recon-agent

At minimum, set MCP_TARGET_ALLOWLIST before launching it:

MCP_TARGET_ALLOWLIST="app.example.com,staging.example.com" npx -y mcp-web-recon-agent
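Most MCP clients launch stdio servers from a JSON settings file. A minimal sketch of such an entry, assuming the common mcpServers configuration shape used by popular clients (the "web-recon-agent" key is an arbitrary label, not something this package mandates):

```json
{
  "mcpServers": {
    "web-recon-agent": {
      "command": "npx",
      "args": ["-y", "mcp-web-recon-agent"],
      "env": {
        "MCP_TARGET_ALLOWLIST": "app.example.com,staging.example.com"
      }
    }
  }
}
```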

Usage

Exact npm command syntax

When invoking through npm run, put -- before the scan arguments:

npm run discover -- https://example.com
npm run discover -- https://example.com --browser --output-dir output/example-discover
npm run verify -- https://app.example.com --profiles profiles.json --product-profile analyst
npm run verify -- https://app.example.com --profiles profiles.json --product-profile analyst --output-dir output/app-verify
npm run retest -- https://app.example.com --profiles profiles.json --product-profile analyst
npm run retest -- https://app.example.com --profiles profiles.json --product-profile analyst --output-dir output/app-retest
npm run scan -- https://example.com
npm run scan -- https://app.example.com --browser --product-profile analyst
npm run active -- https://app.example.com --i-own-this --profiles profiles.json --product-profile analyst
ANTHROPIC_API_KEY=your_key_here npm run scan -- https://app.example.com --active --i-own-this --passive-provider claude --passive-model claude-opus-4-6 --passive-effort max --passive-thinking adaptive
OPENAI_API_KEY=your_key_here npm run scan -- https://app.example.com --active --i-own-this --browser --product-profile analyst --passive-provider openai --passive-model gpt-5.1-codex-max --passive-effort max
npm run scan -- https://app.example.com --active --i-own-this --profiles profiles.json --owned-aggressive --product-profile analyst

Use npm run discover -- ..., npm run verify -- ..., and npm run retest -- ... as the default product workflow. Keep npm run scan -- ... and npm run active -- ... for low-level control, and npm run report:render -- ... / npm run report:disclosure -- ... for report helpers. If you omit --passive-provider, the passive/report path defaults to claude.

Workflow shorthand

npm run discover -- https://app.example.com --browser
npm run verify -- https://app.example.com --profiles profiles.json --browser --product-profile analyst
npm run retest -- https://app.example.com --profiles profiles.json --product-profile analyst

Use distinct --output-dir values when you want clean passive and active artifact sets instead of reusing output/<domain> across multiple runs.

  • discover maps to src/index.ts without --active.
  • verify maps to src/active-only.ts and automatically adds --i-own-this.
  • retest maps to src/active-only.ts, automatically adds --i-own-this, and injects --baseline output/<domain>/report.json by default, or --baseline <output-dir>/report.json when --output-dir is set, unless you provide --baseline yourself.
  • If you already know you want the deep owned-target pass, verify and retest still accept --owned-aggressive.
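The baseline wiring that retest performs can be sketched in plain shell. This is an illustration of the documented defaults, not the tool's actual implementation:

```shell
# Sketch of retest's documented --baseline default resolution.
# Assumption: this mirrors the behavior described above, not the real code.
domain="app.example.com"   # derived from the target URL
output_dir=""              # value of --output-dir, empty when unset
explicit_baseline=""       # value of --baseline, empty when unset

if [ -n "$explicit_baseline" ]; then
  baseline="$explicit_baseline"          # operator override always wins
elif [ -n "$output_dir" ]; then
  baseline="$output_dir/report.json"     # --output-dir set
else
  baseline="output/$domain/report.json"  # default per-domain location
fi

echo "$baseline"
```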

Product profiles

  • saas — passive-only product policy for low-friction self-serve scans. It blocks active probing, browser discovery, authenticated session profiles, scripted login automation, and premium consulting outputs.
  • analyst — full local consulting workflow. This is the default profile and preserves the current deep assessment behavior.
  • enterprise-assisted — same deep scan surface as analyst, intended for higher-touch managed assessments.

Use --product-profile <name> on either entrypoint, or set "productProfile" in profiles.json.
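For example, a minimal profiles.json that pins the policy might look like this. Only the productProfile key is documented here; real profile files also carry session profiles and policy settings, so start from profiles.example.json:

```json
{
  "productProfile": "analyst"
}
```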

Passive scan (AI agents — read-only)

npx tsx src/index.ts https://example.com
npx tsx src/index.ts example.com
npx tsx src/index.ts https://app.example.com --browser --product-profile analyst
npx tsx src/index.ts https://app.example.com --passive-provider claude --passive-model claude-opus-4-6 --passive-effort max --passive-thinking adaptive
npx tsx src/index.ts https://app.example.com --passive-provider openai --passive-model gpt-5.1-codex-max --passive-effort max
npm run scan -- https://app.example.com --passive-provider openai --passive-model gpt-5.1-codex-max --passive-effort max

Runs 9 AI agents in parallel against the target:

  • Recon — crawls site, builds intelligence context (llm.txt)
  • Headers — analyzes all HTTP security response headers
  • Disclosure — probes for exposed files, backups, VCS directories
  • Auth — analyzes authentication and session security
  • Deps — fingerprints JS/CSS library versions against known CVEs
  • Privacy — analyzes privacy policy, cookie consent, GDPR/CCPA compliance, tracking scripts
  • API Discovery — maps API endpoints from JavaScript, finds undocumented routes, analyzes auth patterns
  • Content Security — detects PII exposure, internal IPs, debug artifacts, developer comments in source
  • Business Logic — analyzes forms for IDOR, price manipulation, privilege escalation, account flow weaknesses

Default passive/full scans stop there. They do not run the optional hypothesis engine, adaptive probing, attack-chain composer, or consult the cross-scan knowledge graph unless you explicitly enable those flags later.

If Claude Code quota or subscription auth is exhausted during passive/full mode, passive agents and the report writer now retry automatically through the same Agent SDK using ANTHROPIC_API_KEY when it is set. Without that env var, the run fails with a message telling you the API fallback is unavailable.

Use --passive-provider claude|openai to choose the passive/report backend; if omitted, the provider defaults to claude. claude keeps the current Claude Agent SDK path with built-in WebFetch, Read, and Write; openai runs the same prompts and tool contract through the OpenAI Responses API with local implementations of those same tools. --passive-model, --passive-effort, --passive-thinking, and --passive-thinking-budget then tune the selected provider. The --passive-thinking-budget flag is primarily useful on the Claude path; on the OpenAI path, --passive-effort is the main reasoning control. The OpenAI path retries transient 429 and 5xx Responses API failures with bounded backoff before failing the passive agent.

analyst and enterprise-assisted write the full consulting artifact set: output/<domain>/report.md, report.json, report.html, artifact-index.html, coverage.json, coverage-diff.{json,md}, target-graph.json, expansion-plan.json, planner-execution.json, assertion-evaluation.json, disclosure-summary.{json,md}, handoff-gate.json, hallucination-flags.{json,md}, false-positive-review.json, scan-drift.{json,md}, scan-drift-history.json, report.csv, report.sarif.json, jira-export.json, asset-inventory.{md,json}, evidence-pack/, remediation-pack.{md,json}, fix-plan.{md,json}, fix-verification.{md,json,sh}, shared-observations.jsonl, shared-observations.md, and llm.txt.

If --browser is enabled, they also write browser-capture.json, browser-evidence/, and browser-state/. When authenticated profiles are present, the tool can also write session-bootstrap.json, session-traces/, workflow-recorder.json, workflow-recordings/, workflow-recordings/action-traces/, workflow-artifacts/, state-replay.json, role-diff-matrix.json, object-diff-matrix.json, exact-object-diff.json, file-upload-probes.json, workflow-mutations.json, and api-mutations.json. All agent runs also emit raw-evidence/manifest.json, and the verifier keeps a shared local learning store at output/_shared/false-positive-learning.json. If a baseline report is available, it also writes retest-diff.{md,json}. After manual review, you can optionally add reviewed handoff copies such as report-reviewed.md, report-reviewed.html, and report-reviewed.pdf.

The saas profile keeps only the core passive report artifacts (report.{md,json,html}, asset-inventory.{md,json}, exports, shared observations, planner/verification/disclosure/drift snapshots, and an optional retest diff). Persisted handoff artifacts are redacted on write, so cookies, bearer tokens, CSRF values, typed credentials, and obvious PII are not copied into reports or evidence bundles by default. browser-state/ is intentionally kept as live session material for reuse and should be treated as sensitive, local-only state.

Active scan (direct probes — your own site only)

npx tsx src/active-only.ts https://yoursite.com --i-own-this
npx tsx src/active-only.ts https://app.example.com --i-own-this --profiles profiles.json --product-profile analyst
npx tsx src/active-only.ts https://app.example.com --i-own-this --profiles profiles.json --owned-aggressive --product-profile analyst
npm run active -- https://app.example.com --i-own-this --profiles profiles.json --product-profile analyst

Use profiles.example.json as the starting template for authenticated user/admin session profiles, scripted login steps, optional manualCheckpoint MFA pauses, OAST settings, suppression rules with optional expiry, export toggles, optional browser settings, and target policies.

If a profile includes loginFlow.steps, the tool uses Playwright to log in first, captures fresh cookies into session-bootstrap.json, persists browser state under output/<domain>/browser-state/, and reuses those cookies for later active probes. If passive recon captures a challenge page, auth wall, maintenance shell, or generic error shell instead of real site context, the original llm.txt is quarantined under output/<domain>/llm-quarantine/, and later browser capture can supplement the trusted context. If you want schema-aware API analysis from runtime responses, set browser.captureApiBodies to true. For challenge-heavy sites, use --browser-headed --browser-manual-challenge, or set browser.headless=false and browser.manualChallenge=true, so the browser can pause on bot-protection interstitials and persist the resulting clearance state.

Target policies let you scope scans without code edits: choose a preset (conservative, balanced, aggressive, owned-aggressive), allow or block path patterns, skip specific agents, cap workflow and mutation volume, and toggle screenshot and HTML evidence capture. owned-aggressive is reserved for infrastructure you control: it auto-enables browser discovery, API body capture, the hypothesis engine, adaptive probing, attack-chain composition, and higher workflow/mutation budgets, while still blocking obviously destructive paths by default.

Assertion packs let you turn owned-target expectations into first-class scan invariants. They live in profiles.json, run after normal finding verification, and write assertion-evaluation.json; failed assertions become confirmed assertion-pack findings, so they participate in report counts, evidence packs, and handoff gating.

Coverage gates let you turn partial browser coverage into a failing run: use --fail-on-partial, --fail-on-blocked-required-path, or --min-required-path-coverage <0-100> in CI, or set coverageGates in profiles.json. Required paths can be weighted and marked critical so the most important authenticated routes count more heavily than low-value pages. The scan only marks completion.status=partial when agents fail or critical required paths remain blocked or missing; non-critical challenge and policy-skipped pages still appear in coverage.json and handoff scoring without automatically failing the whole run.
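A coverageGates fragment for profiles.json might look like the following. The key names inside coverageGates are an assumption inferred from the CLI flags, so check profiles.example.json for the authoritative shape:

```json
"coverageGates": {
  "failOnPartial": true,
  "failOnBlockedRequiredPath": true,
  "minRequiredPathCoverage": 80
}
```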

"requiredPaths": [
  { "path": "/billing", "profileId": "user", "weight": 3, "critical": true },
  { "path": "/reports", "profileId": "admin", "weight": 5, "critical": true },
  { "path": "/api/account", "weight": 2 }
]

"assertions": [
  {
    "id": "admin-only-reports",
    "title": "Only the admin profile may reach /reports",
    "type": "route-access",
    "path": "/reports",
    "allowProfiles": ["admin"],
    "denyProfiles": ["anonymous", "user"]
  },
  {
    "id": "no-origin-or-js-leaks",
    "title": "Owned target should not expose origin bypass or same-origin secret leakage",
    "type": "finding-absence",
    "severity": "critical",
    "match": {
      "agents": ["active:origin-exposure", "active:secrets", "active:sourcemap"],
      "verificationStatuses": ["confirmed", "strong-signal"],
      "minimumSeverity": "medium"
    }
  }
]
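With the requiredPaths example above, weighted coverage is presumably covered weight over total weight. A small sketch of that arithmetic, assuming /billing and /api/account resolved while /reports stayed blocked (the formula is an assumption based on the description, not the tool's actual scoring code):

```shell
# Weighted required-path coverage sketch (assumed formula: covered weight / total weight).
# Covered: /billing (weight 3), /api/account (weight 2). Missing: /reports (weight 5).
pct=$(awk 'BEGIN { covered = 3 + 2; total = 3 + 5 + 2; printf "%.0f", 100 * covered / total }')
echo "$pct"
```

Because the missing /reports path is weighted and critical, it drags coverage well below a --min-required-path-coverage 80 gate even though two of three paths resolved.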

Useful browser flags:

  • --browser-headed launches the browser in headed mode.
  • --browser-manual-challenge pauses on detected challenge interstitials and waits for them to clear.
  • --browser-channel chrome uses an installed browser channel instead of bundled Chromium.
  • --browser-connect-url http://127.0.0.1:9222 attaches to an operator-launched Chromium browser over CDP.
  • --browser-user-data-dir /path/to/profile launches a persistent Chromium profile and reuses its browser state.
  • --browser-user-agent, --browser-locale, --browser-timezone, and --browser-proxy-server let you align the browser with the target environment.
  • --fail-on-partial, --fail-on-blocked-required-path, --fail-on-critical-required-path, and --min-required-path-coverage 80 convert incomplete coverage into a non-zero exit for CI or benchmark gating.
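In CI, those flags turn coverage shortfalls into ordinary non-zero exits, so no special integration is needed beyond checking the exit code. A sketch of the pattern, where run_scan is a hypothetical stand-in for the real npm run verify invocation:

```shell
# Stand-in for: npm run verify -- https://app.example.com \
#   --fail-on-partial --min-required-path-coverage 80
# Here it simulates a gated run that exits non-zero.
run_scan() { return 2; }

if run_scan; then
  result="coverage gates passed"
else
  result="coverage gates failed"   # CI should stop the pipeline here
fi
echo "$result"
```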

Operator attach helper:

npm run browser:attach -- --channel chrome --target https://app.example.com
npm run browser:attach -- --channel chrome --target https://app.example.com --profiles profiles.json --product-profile analyst --owned-aggressive --manual-challenge
npm run browser:attach -- --channel chrome --target https://app.example.com --profiles profiles.json --product-profile analyst --owned-aggressive --manual-challenge --write-config output/browser-attach-plan.json

That launches a local browser with remote debugging enabled and prints ready-to-run discover, verify, and retest commands using the matching --browser-connect-url and --browser-user-data-dir settings. Add --write-config when you want a reusable attach plan artifact for handoff or repeated challenge-heavy scans.

Private MCP server

This repo now includes a private remote MCP server for design-partner and owned-target workflows:

cp mcp-server-config.example.json mcp-server-config.json
export MCP_API_KEYS="replace-with-long-random-key"
export MCP_TARGET_ALLOWLIST="app.example.com,staging.example.com"
export MCP_OWNED_TARGETS="app.example.com"
npm run mcp

Runtime endpoints:

  • GET /health
  • GET /healthz
  • POST /mcp

Use ops/private-mcp/README.md for deployment notes and the sample systemd unit.

MCP Registry metadata

The checked-in MCP Registry manifest is server.json. It maps the registry server name io.github.joepangallo/web-recon-agent to the npm package mcp-web-recon-agent.

Operator review helper:

npm run review:false-positives -- list --output-dir output/example.com
npm run review:false-positives -- show --output-dir output/example.com --signature 'missing_direct_evidence::headers::header:strict-transport-security'
npm run review:false-positives -- mark --output-dir output/example.com --signature 'missing_direct_evidence::headers::header:strict-transport-security' --decision false-positive --note 'Reviewed during triage'
npm run review:false-positives -- stats --output-dir output/example.com

Use that helper to review pending verifier signatures without hand-editing false-positive-review.json.

Trust and reviewed-handoff helpers:

node .claude/skills/recon-triage/scripts/report-trust-score.mjs output/example.com
npm run report:disclosure -- output/example.com
npm run report:disclosure -- output/example.com --recheck-live
npm run report:disclosure -- output/example.com --strict-outreach
npm run report:disclosure -- output/example.com --strict-notify
npm run report:recheck -- output/example.com
npm run report:recheck -- output/example.com --strict-outreach
npm run report:recheck -- output/example.com --strict-notify
npm run report:decide -- output/example.com
npm run report:notify -- --output-root output --mode both --dry-run
npm run report:render -- output/example.com

Use the trust scorer before external outreach or client handoff. Use npm run report:disclosure -- output/<domain> to regenerate disclosure-summary.{json,md} and handoff-gate.json from an existing report.json. Add --recheck-live when you also want a live sendability-recheck.{json,md} artifact for the first-contact candidates, or run npm run report:recheck -- output/<domain> separately.

Add --strict-outreach to either helper when you want a fail-closed external send/no-send gate: the command exits non-zero unless at least two outreach-grade findings still reproduce on a live recheck. Add --strict-notify when you want a stricter cron/operator-email gate: the command exits non-zero unless the handoff trust data is clean, the report still looks like outreach, and every selected first-contact finding survives the live recheck with no pending reviews, downgrades, unsupported passive claims, or critical coverage gaps.

Use npm run report:decide -- output/<domain> when you want one blunt operator answer written to notification-decision.{json,md}; it refreshes disclosure-summary, handoff-gate, and sendability-recheck in the same pass. The disclosure summary labels the current output as outreach or benchmark-only, and the sendability recheck reports three machine-friendly lanes: notificationDisposition=alert for immediate operator mail, notificationDisposition=digest for blocked-but-interesting runs that belong in a lower-priority summary, and notificationDisposition=none for archive-only output. The notification decision turns that into a simple action: cold-outreach, digest-review, or archive-only. If notification-decision.json says operatorAction: "cold-outreach", you can reach out; otherwise, do not.

npm run report:notify -- --output-root output --mode alert|digest|both is the unattended runner: it keeps state in output/_shared/notification-state.json, auto-bootstraps existing runs without spamming you, and only sends mail for new alert or digest fingerprints.

Use npm run report:render -- output/<domain> after you have prepared a curated report-reviewed.md and want matching report-reviewed.html and report-reviewed.pdf copies. report:render fails closed when handoff-gate.json is blocked unless you explicitly pass --force. Use target-graph.json and expansion-plan.json when you want the scan to deepen from evidence-backed gaps instead of from a bigger fixed agent list.

The --i-own-this flag is required and confirms you own or have written authorization to test the target. Only use active scanning on sites you control. The active-only.ts entrypoint requires analyst or enterprise-assisted; the saas profile blocks active depth by design.

Runs 32 always-on active agents in sequence, plus authenticated role/object differential analysis when --profiles provides usable session profiles:

| Agent | What it tests |
|-------|---------------|
| SSL/TLS | TLS version support (SSLv3, TLS 1.0/1.1/1.2/1.3), weak ciphers (NULL, RC4, DES/3DES, anon DH), cert expiry, HTTP→HTTPS redirect |
| Fuzz | 700+ paths — config files, backups, logs, admin panels, API endpoints, Spring Boot actuators, GraphQL, OAuth, cloud configs, source control artifacts |
| HTTP Methods | TRACE (XST), PUT (file upload), DELETE, DEBUG (IIS), OPTIONS method enumeration |
| CORS | Arbitrary origin reflection, null origin, domain-bypass patterns (evil.com, null, subdomain tricks) |
| Rate Limit | Burst anonymous GET requests to auth/API-adjacent routes — detects missing generic request throttling signals |
| nmap | Top 1000 TCP + top 100 UDP ports, service/version detection, vuln scripts (Heartbleed, POODLE, ssl-enum-ciphers), OS detection |
| nikto | 6,700+ web vulnerability checks (10 min cap) |
| Subdomain | DNS enumeration of 100 common subdomains — finds staging/dev/admin/internal hosts, wildcard DNS detection, HTTP probes |
| Origin Exposure | Verifies whether origin-like hosts or direct IP resolution can serve the production hostname outside the edge/CDN path |
| Source Maps | Discovers .js.map files — flags exposed source maps that reveal original code, sensitive file paths, and framework internals |
| Secrets | Scans HTML + same-origin JS files against 20 secret patterns: AWS keys, Stripe live keys, GitHub tokens, Slack tokens, database URLs, private keys, JWT, generic API keys |
| Cloud Storage | Probes 50 S3/Azure/GCS bucket name candidates — flags publicly listable and public-readable buckets |
| WAF | Fingerprints 10 WAF products (Cloudflare, AWS WAF, Akamai, Imperva, Fastly, Sucuri, ModSecurity, Barracuda, F5, Azure App GW), blocking vs monitor mode |
| CSRF | Extracts HTML forms from 12 auth/account pages, checks for CSRF tokens on POST forms, analyzes SameSite cookie attributes |
| File Upload | Discovers upload forms/routes, sends harmless TXT/SVG/HTML multipart probes, checks active-content acceptance and public retrieval of uploaded markers |
| Race Conditions | Sends short duplicate bursts to conservative workflow candidates and flags strong duplicate-acceptance or inconsistent-state signals without claiming financial abuse by default |
| Workflow Mutation | Replays safe workflow requests with mutated privilege, tenant, pricing, status, and redirect fields to find hidden-field trust and business-logic abuse signals |
| SQLi/LFI | Injects payloads into discovered URL params — detects SQL error strings, LFI confirmation (/etc/passwd), error disclosure |
| HTTP Smuggling | Probes CL.TE/TE.CL conditions, detects reverse proxy presence, chunked TE support — reports risk without destructive desync |
| DNS/Email Security | SPF, DKIM (20 selectors), DMARC policy/enforcement level, CAA records, DNSSEC, MX records, zone transfer attempt |
| Host Header Injection | Tests Host, X-Forwarded-Host, X-Host headers for reflection — detects password reset poisoning and cache poisoning vectors |
| Open Redirect | Tests 30 redirect params across 10 auth/checkout paths with 6 bypass payloads (absolute, protocol-relative, backslash, encoding) |
| GraphQL Security | Detects GraphQL endpoints, tests introspection, field suggestions, and batch query abuse |
| API Abuse Surface | Parses readable OpenAPI/Swagger docs and captured API response shapes to flag high-risk writable fields, object-scoped operations, and admin-like routes |
| Safe API Mutation | Replays captured preview/quote/validate-style API requests with conservative high-risk field mutations and flags materially different accepted responses |
| JWT Analysis | Discovers JWTs in responses/cookies, decodes headers/payloads, tests alg:none bypass, weak HS256 secrets, missing exp claims, sensitive payload data |
| OAuth / OIDC / SAML | Mines same-origin auth surface, probes discovery/metadata docs, and runs cautious redirect URI, state, nonce, and logout heuristics |
| Prototype Pollution | Injects __proto__ and constructor.prototype payloads via URL params and JSON body — detects reflection, error disclosure, and response anomalies |
| Cache Poisoning | Tests 9 unkeyed headers (X-Forwarded-Host, X-Original-URL, etc.) for reflection with active caching — detects scheme downgrade via X-Forwarded-Scheme |
| OAST Scaffolding | Generates blind SSRF/XXE/webhook payloads and dispatches low-risk callback probes when an OAST receiver is configured |
| SSRF | Tests 30 URL params with 16 SSRF targets (AWS/GCP/Azure metadata, localhost, RFC1918) — content-based and timing-based detection |
| XXE | Discovers XML-accepting endpoints, tests classic/parameter/SOAP XXE payloads for /etc/passwd file read |
| Role Diff | Compares anonymous/user/admin responses on sensitive paths when session profiles are provided — flags access-control gaps and privileged surface overlap |
| Object Diff | Replays discovered object-like URLs across two non-admin accounts — flags IDOR and tenant-isolation signals when cross-user responses look equivalent |

If a previous passive scan exists for the exact same target URL, active results replace the previous active section in report.md and refresh the active findings in report.json. Different target URLs on the same hostname start a fresh report.

Full scan (passive + active)

From a regular terminal (not inside Claude Code):

npx tsx src/index.ts https://yoursite.com --active --i-own-this
npx tsx src/index.ts https://app.example.com --active --i-own-this --profiles profiles.json --browser --baseline output/app.example.com/report.json --scan-preset aggressive --product-profile analyst
npx tsx src/index.ts https://app.example.com --active --i-own-this --profiles profiles.json --owned-aggressive --baseline output/app.example.com/report.json --product-profile analyst
npm run scan -- https://app.example.com --active --i-own-this --profiles profiles.json --owned-aggressive --baseline output/app.example.com/report.json --product-profile analyst

Runs all passive AI agents first, then active deterministic probes, then writes a deterministic structured report plus Markdown and HTML reports.

With --profiles, scripted login automation can capture fresh cookies, pause for MFA/manual checkpoints, validate authenticated probe paths, and enable authenticated role/object diff testing. With --browser, a browser-backed crawl captures surface across all configured browser-capable profiles, aggregates the results into browser-capture.json, bridges same-origin auth headers from SPA requests into later API probes, and records challenge or policy-skipped pages separately from true critical coverage gaps. The API abuse, safe API mutation, workflow mutation, role diff, and object diff agents consume per-profile browser capture directly, so role-specific surfaces are not flattened into a single session view.

The report layer performs semantic surface clustering for overlapping auth/API/workflow/access-control findings and emits target-graph.json plus expansion-plan.json, so deeper follow-up work is driven by required-path gaps, auth/workflow value, and duplicate evidence instead of raw agent count. If you already cleared login, MFA, or a bot challenge in a real browser, prefer --browser-connect-url or --browser-user-data-dir over trying to make headless Chromium look stealthier. The profile config also supports OAST settings, suppression rules, export toggles, coverage gates, and path or agent policy controls.
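At its simplest, semantic surface clustering can be pictured as grouping findings by a normalized surface key. This is a sketch under the assumption that a path plus a coarse finding class is enough to group duplicates; the real report layer does considerably more than this, and the `Finding` shape here is illustrative:

```typescript
interface Finding { id: string; path: string; className: string; }

// Normalize /api/orders/1001 and /api/orders/2002 onto one surface key so
// duplicate evidence strengthens a cluster instead of inflating the count.
function surfaceKey(f: Finding): string {
  const normalizedPath = f.path.replace(/\/\d+(?=\/|$)/g, "/{id}");
  return `${f.className}:${normalizedPath}`;
}

function clusterFindings(findings: Finding[]): Map<string, Finding[]> {
  const clusters = new Map<string, Finding[]>();
  for (const f of findings) {
    const key = surfaceKey(f);
    clusters.set(key, [...(clusters.get(key) ?? []), f]);
  }
  return clusters;
}
```

The payoff is that two agents reporting the same access-control gap on different object IDs end up as one cluster with two pieces of evidence, not two findings.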

Local OAST receiver

npm run oast:receiver -- --port 8787 --output output/oast-receiver.jsonl --manifest output/oast-receiver-manifest.json --token change-me

Use the printed baseUrl, logPath, and manifestPath values in profiles.example.json. When oast.logPath or oast.manifestPath is configured, the OAST agent will correlate matching callback hits into confirmed findings automatically.
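The correlation step can be approximated as follows — a minimal sketch, assuming each dispatched payload embeds a unique token and the receiver appends one JSON object per line. The `token` and `sourceIp` field names are assumptions for illustration, not the receiver's actual schema:

```typescript
// Match dispatched OAST payload tokens against receiver callback log lines.
interface CallbackHit { token: string; sourceIp: string; }

function correlateOastHits(
  dispatchedTokens: string[],
  jsonlLog: string,
): Map<string, CallbackHit[]> {
  const hits = new Map<string, CallbackHit[]>();
  const known = new Set(dispatchedTokens);
  for (const line of jsonlLog.split("\n")) {
    if (!line.trim()) continue;
    let entry: CallbackHit | undefined;
    try {
      entry = JSON.parse(line);
    } catch {
      // skip partially written or malformed lines
    }
    if (!entry || !known.has(entry.token)) continue;
    hits.set(entry.token, [...(hits.get(entry.token) ?? []), entry]);
  }
  return hits;
}
```

Any token that resolves to a non-empty hit list is a blind callback the target actually made, which is what lets the agent promote a payload to a confirmed finding.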

Note: If you run this from inside a Claude Code session, use active-only.ts for the active phase instead — the Claude Agent SDK cannot spawn nested sessions. If you select --passive-provider openai, passive/full scans no longer depend on nested Claude sessions and can run from inside Codex as long as OPENAI_API_KEY is set.

Attack simulation (client demos)

npx tsx src/attack-sim.ts https://yoursite.com --i-own-this-or-have-authorization

Generates HTML proof-of-concept files for client presentations: clickjacking demo, XSS reflection, rate-limit brute-force visualization, and information disclosure summary. Non-destructive — no state modified.

Benchmark lab

npm run benchmark:lab
npx tsx src/index.ts http://127.0.0.1:4010 --active --i-own-this --profiles benchmarks/lab/profiles.json --owned-aggressive --product-profile analyst
npx tsx src/benchmark.ts benchmarks/lab/benchmark-spec.json --output benchmark-results/lab-results.json --scorecard-output benchmark-results/lab-scorecard.md
npx tsx src/benchmark.ts benchmarks/golden/benchmark-spec.json --output benchmark-results/golden-results.json --scorecard-output benchmark-results/golden-scorecard.md

The local benchmark lab ships a seeded app for authorization drift, workflow mutation, safe API mutation replay, browser evidence capture, inline/public upload exposure, and static/edge-style exposures such as same-origin JS secrets and public source maps. Its benchmark spec scores seeded-weakness recall so you can track whether the scanner is actually finding dangerous seeded conditions instead of just emitting more findings. The golden benchmark suite validates static report-control behavior such as planner execution metadata, disclosure-safe summaries, and handoff gating. Benchmark runs now emit both raw JSON and a markdown scorecard so CI and release notes can link to one human-readable artifact with per-track pass rate, average score, scenario checks, and recall. See benchmarks/lab/README.md and benchmarks/golden/README.md for the full flow.

The golden benchmark spec now also carries release-gate policies. src/benchmark.ts exits non-zero when either a scenario fails or the benchmark policies fail, so CI can block merges and releases on trust or recall regressions instead of only on raw test failures.
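Seeded-weakness recall reduces to a simple ratio. A sketch of the scoring idea, assuming exact-ID matching between seeded conditions and reported findings (the real benchmark.ts matching logic is richer than this):

```typescript
// Fraction of planted weaknesses the scan actually reported.
function seededRecall(seededIds: string[], reportedIds: string[]): number {
  if (seededIds.length === 0) return 1; // nothing seeded: vacuously perfect
  const reported = new Set(reportedIds);
  const found = seededIds.filter((id) => reported.has(id)).length;
  return found / seededIds.length;
}
```

A release-gate policy can then be a threshold on this number, so adding noisy findings never improves the score — only finding more of the seeded conditions does.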

AI attack intelligence (v0.8.0, advanced and off by default)

# Enable all AI attack intelligence features explicitly
npx tsx src/index.ts https://yoursite.com --active --i-own-this --hypothesis-engine --adaptive-probing --attack-chains

# Or use the owned-target depth preset to enable them together
npx tsx src/index.ts https://yoursite.com --active --i-own-this --owned-aggressive --product-profile analyst

# Customize hypothesis count
npx tsx src/index.ts https://yoursite.com --active --i-own-this --hypothesis-engine --hypothesis-max 30

# Adaptive probing with custom limits
npx tsx src/index.ts https://yoursite.com --active --i-own-this --hypothesis-engine --adaptive-probing --adaptive-max-iter 15 --adaptive-timeout-ms 90000

# Manage cross-scan knowledge graph
npm run knowledge -- list
npm run knowledge -- stats
npm run knowledge -- show kp-wordpress-sqli
npm run knowledge -- prune

These modules are opt-in advanced analysis. They are not part of the default scan path, and the cross-scan knowledge graph is only consulted when you explicitly enable the hypothesis engine. The owned-aggressive preset is the one shortcut that enables them together for an owned target.

Four AI-driven modules that learn across scans:

  • Hypothesis Engine (Phase 2C) — After passive analysis, an AI agent analyzes the target graph, tech stack signals, crawl context, and knowledge from prior scans to generate ranked attack hypotheses. Each hypothesis includes technique, target endpoints, expected signals, and confidence level. Output: hypothesis-plan.json.

  • Adaptive Probing (Phase 2.6) — After active probes, executes multi-step curl probe chains guided by hypotheses. Each chain tests a specific technique (sqli, xss, path-traversal, ssrf, cors-bypass, etc.) with technique-specific payloads and response signal analysis. All probes are domain-scoped via isUrlAllowed(). Output: adaptive-probes.json.

  • Attack Chain Composer (Phase 3.5) — After the structured report, an AI agent identifies which confirmed findings can be chained into multi-step attack scenarios with composite severity higher than individual findings. Output: attack-chains.json.

  • Cross-Scan Knowledge Graph — Persistent store at output/_shared/scan-knowledge.json with Bayesian confidence scoring. Learns attack patterns, tech stack signatures, and path patterns from confirmed findings only. Knowledge informs future hypothesis prioritization as a weak prior, not as substitute evidence for the current target.
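The Bayesian confidence scoring can be pictured as a Beta-posterior update over confirmed versus refuted observations — a sketch only, assuming the store tracks simple hit/miss counts per pattern, which may not match the actual scan-knowledge.json internals:

```typescript
// Beta(1,1) prior; each confirmed finding for a pattern counts as a success,
// each refuted probe as a failure. The posterior mean is the confidence.
function patternConfidence(confirmed: number, refuted: number): number {
  const alpha = 1 + confirmed;
  const beta = 1 + refuted;
  return alpha / (alpha + beta);
}
```

With zero observations the prior mean is 0.5 and a single confirmation only nudges it, which is consistent with the knowledge graph acting as a weak prior rather than substitute evidence for the current target.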

Output

All output goes to output/<domain>/:

output/
  _shared/
    false-positive-learning.json
    scan-knowledge.json   # Cross-scan knowledge graph (v0.8.0)
  example.com/
    llm.txt         # Raw crawl intelligence context (passive scan)
    llm-quarantine/ # Quarantined degraded passive recon context, when applicable
    report.md       # Full human-readable report
    report.json     # Structured findings with confidence/verification metadata
    report-reviewed.md   # Optional manually reviewed handoff report
    report-reviewed.html # Optional rendered HTML copy of the reviewed report
    report-reviewed.pdf  # Optional rendered PDF copy of the reviewed report
    coverage.json   # Machine-readable coverage, challenge status, and required-path status
    coverage-diff.json
    coverage-diff.md
    target-graph.json
    expansion-plan.json
    planner-execution.json
    assertion-evaluation.json
    disclosure-summary.json
    disclosure-summary.md
    handoff-gate.json
    hallucination-flags.json
    hallucination-flags.md
    false-positive-review.json
    scan-drift.json
    scan-drift.md
    scan-drift-history.json
    shared-observations.jsonl
    shared-observations.md
    report.csv
    report.sarif.json
    jira-export.json
    asset-inventory.md
    asset-inventory.json
    evidence-pack/
    remediation-pack.md
    remediation-pack.json
    fix-plan.md
    fix-plan.json
    fix-verification.md
    fix-verification.json
    fix-verification.sh
    retest-diff.md
    retest-diff.json
    session-bootstrap.json
    session-traces/
    browser-capture.json
    browser-evidence/
    browser-state/
    raw-evidence/
    workflow-artifacts/
    workflow-recorder.json
    workflow-recordings/
    workflow-recordings/action-traces/
    state-replay.json
    role-diff-matrix.json
    object-diff-matrix.json
    exact-object-diff.json
    file-upload-probes.json
    race-probes.json
    workflow-mutations.json
    api-abuse-surface.json
    api-mutations.json
    oauth-surface.json
    oast-payloads.json
    hypothesis-plan.json    # Ranked attack hypotheses (v0.8.0)
    adaptive-probes.json    # Multi-step probe chain results (v0.8.0)
    attack-chains.json      # Exploitation chain compositions (v0.8.0)
    poc-*.html      # Attack simulation PoC files (attack-sim only)

browser-capture.json keeps the original top-level pages, apiEndpoints, and requests arrays for backward compatibility, and now also includes a profiles array with per-profile page coverage, captured auth-header names, and browser-session metadata. That lets downstream agents stay compatible while coverage and reporting stay profile-aware.
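That dual layout can be sketched as a type plus a flatten step. Field names beyond pages, apiEndpoints, requests, and profiles are assumptions for illustration, not the exact on-disk schema:

```typescript
interface ProfileCapture {
  id: string;
  pages: string[];
  authHeaderNames: string[]; // header names only, never values
}

interface BrowserCapture {
  pages: string[];            // legacy flattened view
  apiEndpoints: string[];     // legacy flattened view
  requests: string[];         // legacy flattened view
  profiles: ProfileCapture[]; // new per-profile view
}

// Legacy consumers keep reading the flattened arrays; profile-aware consumers
// read profiles[]. The flattened view is just the deduplicated union.
function flattenPages(profiles: ProfileCapture[]): string[] {
  return [...new Set(profiles.flatMap((p) => p.pages))];
}
```

Because the flattened arrays are derivable from the per-profile data, the two views cannot drift apart.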

report.json structure

{
  "metadata": {
    "target": "https://example.com",
    "scanDate": "2026-03-10T...",
    "tool": "web-recon-agent v0.8.0",
    "mode": "passive+active",
    "agentsRun": ["headers", "active:ssl", "active:role-diff"],
    "runId": "scan-2026-03-11T...",
    "productProfile": "analyst",
    "profiles": [{"id": "admin", "label": "Admin", "role": "admin"}],
    "browserEnabled": true,
    "coverageReportPath": "output/example.com/coverage.json",
    "coverageDiffPath": "output/example.com/coverage-diff.json",
    "targetGraphPath": "output/example.com/target-graph.json",
    "expansionPlanPath": "output/example.com/expansion-plan.json",
    "plannerExecutionPath": "output/example.com/planner-execution.json",
    "assertionEvaluationPath": "output/example.com/assertion-evaluation.json",
    "disclosureSummaryPath": "output/example.com/disclosure-summary.json",
    "handoffGatePath": "output/example.com/handoff-gate.json",
    "findingVerificationPath": "output/example.com/hallucination-flags.json",
    "falsePositiveLearningPath": "output/_shared/false-positive-learning.json",
    "falsePositiveReviewPath": "output/example.com/false-positive-review.json",
    "scanDriftPath": "output/example.com/scan-drift.json",
    "findingMerge": {
      "duplicateGroups": 1,
      "mergedFindings": 1,
      "sourceFindingsCollapsed": 1
    },
    "targetGraphSummary": {
      "totalNodes": 18,
      "totalEdges": 11,
      "uncoveredRequiredNodes": 2
    },
    "expansionPlanSummary": {
      "totalActions": 7,
      "p0": 2,
      "p1": 2,
      "p2": 2,
      "p3": 1,
      "topLabels": ["Deepen /billing", "Deepen /wp-json/api/v1/token"]
    },
    "plannerExecutionSummary": {
      "executedActions": 5,
      "publicRouteFindings": 2,
      "authBarrierFindings": 2,
      "unresolvedRoutes": 1
    },
    "disclosureSummarySummary": {
      "included": 4,
      "deferred": 7,
      "firstContactReady": ["redirect-1", "tls-1", "exposure-1"]
    },
    "handoffGateSummary": {
      "status": "blocked",
      "score": 62.6,
      "band": "mixed"
    },
    "findingVerification": {
      "passiveFindingsReviewed": 6,
      "corroboratedFindings": 2,
      "flaggedFindings": 1,
      "downgradedFindings": 1,
      "suppressedFindings": 0,
      "learningKnownSignatures": 4,
      "learningDecisionsApplied": 1,
      "learningMatchedFlags": 1,
      "learningSuppressedFlags": 0,
      "reviewableSignatures": 2
    },
    "scanDrift": {
      "alertCount": 1,
      "highSeverityAlerts": 1,
      "mediumSeverityAlerts": 0,
      "lowSeverityAlerts": 0,
      "summaries": ["Passive-vs-deterministic conflicts increased from 0 to 1."]
    },
    "completion": {
      "status": "partial",
      "plannedAgents": ["headers", "active:ssl", "active:role-diff"],
      "executedAgents": ["headers", "active:ssl"],
      "failedAgents": ["active:role-diff"],
      "skippedAgents": []
    },
    "coverage": {
      "planned": 3,
      "executed": 2,
      "succeeded": 2,
      "failed": 0,
      "skipped": 1,
      "challengeCoverage": {
        "blockedPages": 1,
        "resolvedPages": 2,
        "affectedAgents": ["browser-discovery"]
      },
      "requiredPathCoverage": {
        "total": 3,
        "captured": 2,
        "blocked": 1,
        "missing": 0
      }
    },
    "rawEvidenceManifestPath": "output/example.com/raw-evidence/manifest.json",
    "baselineCompared": "output/example.com/report.json",
    "hypothesisPlanPath": "output/example.com/hypothesis-plan.json",
    "hypothesisPlanSummary": {
      "totalHypotheses": 15,
      "byTechnique": { "sqli": 3, "xss": 2, "auth-bypass": 4 },
      "highConfidence": 5,
      "knowledgeInformed": 3,
      "targetEndpointCount": 12
    },
    "adaptiveProbesPath": "output/example.com/adaptive-probes.json",
    "adaptiveProbesSummary": {
      "totalChains": 15,
      "confirmed": 2,
      "refuted": 8,
      "inconclusive": 3,
      "timedOut": 1,
      "limitReached": 1,
      "findingsGenerated": 2,
      "totalStepsExecuted": 47
    },
    "attackChainsPath": "output/example.com/attack-chains.json",
    "attackChainsSummary": {
      "totalChains": 3,
      "bySeverity": { "critical": 1, "high": 2 },
      "uniqueFindingsInvolved": 6,
      "elevatedSeverityChains": 3
    }
  },
  "summary": {
    "critical": 2,
    "high": 3,
    "medium": 1,
    "low": 0,
    "info": 4,
    "total": 10,
    "derived": 1,
    "overallTotal": 11,
    "confirmed": 5,
    "strongSignal": 3,
    "inconclusive": 2
  },
  "findings": [
    {
      "id": "active-ssl-001",
      "agent": "active:ssl",
      "severity": "critical",
      "title": "...",
      "description": "...",
      "evidence": "...",
      "recommendation": "...",
      "reproCommand": "curl -i 'https://example.com/api/orders/1001' -H 'Cookie: <session>'",
      "confidence": "high",
      "verificationStatus": "confirmed",
      "fixPriority": "p0",
      "artifactRefs": ["output/example.com/object-diff-matrix.json"],
      "affectedProfiles": ["anonymous", "admin"],
      "tags": ["transport-tls"],
      "trustNotes": ["Verifier confirmed the missing transport control on a fresh probe."]
    }
  ],
  "derivedFindings": [
    {
      "id": "attack-chain-api-authorization",
      "agent": "reporting:attack-chain",
      "findingClass": "derived",
      "severity": "high",
      "verificationStatus": "strong-signal"
    }
  ]
}

  • Use coverage.json when you need deterministic gating in CI or client handoff automation. It records agent coverage, browser challenge coverage, and required browser paths that were captured, blocked, redirected to login, policy-skipped, or missed.
  • Use coverage-diff.json and coverage-diff.md when you have a baseline report and want to detect required-path regressions between runs.
  • Use target-graph.json when you want a normalized inventory of the routes, APIs, forms, auth surfaces, and object/workflow nodes the scan actually touched.
  • Use expansion-plan.json when you want the ranked next queue for deeper probing based on required-path gaps, auth/workflow value, duplicate signals, and current verification strength.
  • Use planner-execution.json to see what the bounded planner executor actually re-probed from that queue.
  • Use assertion-evaluation.json when you want owned-target invariants such as route access or "this class of finding must stay absent" checks to fail closed and show up in the report as confirmed assertion-pack items.
  • Use disclosure-summary.{json,md} when you need a disclosure-safe subset of the report for initial outreach or security-team handoff; it now distinguishes the broader disclosure-safe set from an explicit firstContactStatus and firstContactReady subset, and it labels the current result as outreach or benchmark-only.
  • Use sendability-recheck.{json,md} when you want live confirmation that first-contact findings like public logs, debug pages, staging/dev hosts, username enumeration, API/docs exposure, blocked-but-present artifacts, and auth discovery endpoints still reproduce.
  • Add --strict-outreach when you want that helper to fail closed unless at least two outreach-grade findings still reproduce.
  • Add --strict-notify when you want unattended automation to fail closed unless the report is clean enough that an operator email is worth trusting at face value. Otherwise, use sendability-recheck.json's notificationDisposition field to split blocked-but-interesting runs into a digest lane without promoting them to immediate alerts.
  • Use handoff-gate.json to enforce a fail-closed reviewed-report workflow before rendering report-reviewed.html or report-reviewed.pdf; it now reports both the full reviewed-handoff decision and whether a first-contact packet is still safe to send.
  • Use hallucination-flags.json and hallucination-flags.md to review passive findings that were downgraded or flagged by the verifier.
  • Use false-positive-review.json to mark recurring verifier noise as false-positive or valid, and use npm run review:false-positives -- stats --output-dir output/<domain> when you want the current learning backlog and per-agent hotspot summary.
  • Use output/_shared/false-positive-learning.json to audit what the local verifier has learned to suppress or down-weight.
  • Use scan-drift.json, scan-drift.md, and scan-drift-history.json to compare scan trust and coverage quality against both a manual baseline and recent run history.
  • Use node .claude/skills/recon-triage/scripts/report-trust-score.mjs output/<domain> when you need a single trust score before disclosure or handoff, and create report-reviewed.md only after removing downgraded or inconclusive claims from the original report.
  • If you upgrade the trust pipeline or want a completely cold start, delete output/ entirely, recreate output/_shared/, and re-run scans so disclosure summaries, learning stores, and knowledge priors are regenerated from current code.
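A CI gate over coverage.json can be as small as the sketch below. It assumes the requiredPathCoverage shape shown in the report.json example; whether to tolerate challenge-blocked paths, and how to map the boolean to an exit code, is the caller's choice:

```typescript
interface RequiredPathCoverage {
  total: number;
  captured: number;
  blocked: number;
  missing: number;
}

// Fail closed: a run passes only when every required path was captured,
// or at least accounted for by an explicit challenge block.
function requiredPathsPass(c: RequiredPathCoverage, allowBlocked = true): boolean {
  if (c.missing > 0) return false;
  if (!allowBlocked && c.blocked > 0) return false;
  return c.captured + (allowBlocked ? c.blocked : 0) >= c.total;
}
```

In a pipeline step you would read coverage.json, call this, and `process.exit(1)` on failure so the handoff stage never runs against incomplete coverage.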

Severity levels

| Level | Meaning |
|-------|---------|
| critical | Immediate exploitation risk — fix now |
| high | Significant risk — fix within days |
| medium | Moderate risk — fix in next sprint |
| low | Minor issue — fix when convenient |
| info | Informational — no action required |

npm scripts

npm start                   # Passive scan (requires URL arg)
npm run discover            # Discover workflow (requires URL arg)
npm run verify              # Verify workflow (requires URL arg)
npm run retest              # Retest workflow (requires URL arg)
npm run active              # Active scan (requires URL and --i-own-this)
npm run scan:browser        # Passive scan with browser discovery
npm run scan:assets         # Multi-asset orchestration from assets.example.json
npm run review:false-positives -- list --output-dir output/example.com
npm run benchmark           # Validate reports and optionally write JSON/scorecard output
npm run benchmark:lab       # Start the local seeded benchmark target
npm run oast:receiver       # Start a local OAST receiver and JSONL log
npm run typecheck           # TypeScript validation

Agent architecture

index.ts
├── recon-core/
│   ├── product-policy.ts      — product-profile gating (`saas`, `analyst`, `enterprise-assisted`)
│   └── tasks.ts               — shared task builders for supplemental, passive, and active phases
├── scan-config.ts             — profiles, browser, OAST, suppression expiry, target policy presets, baseline config
├── Phase 1 — Recon agent (Claude Agent SDK)
│   └── Crawls site, writes llm.txt
├── Shared observation bus
│   └── Writes shared-observations.jsonl / .md so later passive waves can subscribe to earlier results
├── Phase 1.25 — Session bootstrap (optional, Playwright)
│   └── Replays scripted login flows, captures fresh cookies
│       and validates authenticated probe paths with action traces
├── Phase 1.5 — Browser discovery (optional, Playwright)
│   └── Writes browser-capture.json, inspects forms/storage/XHR
├── Phase 1.75 — Workflow replay (optional, Playwright)
│   └── Replays authenticated pages, captures DOM sink, state-pair, and computed-value signals
├── Phase 1.85 — Workflow recorder (optional, Playwright)
│   └── Writes action-level workflow-recorder.json, per-profile workflow-recordings/, action traces, and browser evidence artifacts
├── Phase 2A — Discovery analysis agents (Claude Agent SDK, parallel)
│   ├── headers.ts
│   ├── disclosure.ts
│   ├── deps.ts
│   ├── api-discovery.ts      — API endpoint mapping from JS
│   └── content-security.ts   — PII, debug info, data exposure
├── Phase 2B — Reasoning analysis agents (Claude Agent SDK, parallel)
│   ├── auth.ts
│   ├── privacy.ts            — privacy policy, GDPR/CCPA, tracking
│   └── business-logic.ts     — forms, IDOR, logic flaws
├── Phase 2.5 — Active agents (curl/openssl/dig, sequential) [--active only]
│   ├── active/ssl.ts          — TLS/cert analysis
│   ├── active/fuzz.ts         — 700+ path directory fuzzing
│   ├── active/http-methods.ts — dangerous HTTP method testing
│   ├── active/cors.ts         — CORS policy bypass testing
│   ├── active/rate-limit.ts   — rate limit detection
│   ├── active/nmap.ts         — port scan + vuln scripts
│   ├── active/nikto.ts        — 6,700+ web vuln checks
│   ├── active/subdomain.ts    — DNS subdomain enumeration
│   ├── active/sourcemap.ts    — JS source map discovery
│   ├── active/secrets.ts      — JS/HTML secret scanning
│   ├── active/cloud.ts        — S3/Azure/GCS bucket enumeration
│   ├── active/waf.ts          — WAF fingerprinting
│   ├── active/csrf.ts         — CSRF token + SameSite analysis
│   ├── active/file-upload.ts  — safe upload probes + public retrieval checks
│   ├── active/race.ts         — duplicate-request / race-condition signals
│   ├── active/workflow-mutate.ts — stateful workflow mutation replay
│   ├── active/sqli.ts         — SQL injection / LFI detection
│   ├── active/smuggling.ts    — HTTP request smuggling detection
│   ├── active/dns.ts          — SPF/DKIM/DMARC/CAA/DNSSEC/zone transfer
│   ├── active/hostheader.ts   — Host header injection + reset poisoning
│   ├── active/redirect.ts     — Open redirect detection
│   ├── active/graphql.ts      — GraphQL introspection/suggestions/batching
│   ├── active/api-abuse.ts    — OpenAPI/runtime abuse surface analysis
│   ├── active/api-mutate.ts   — safe field-level API mutation confirmation
│   ├── active/jwt.ts          — JWT alg:none/weak secrets/claim analysis
│   ├── active/oauth.ts        — OAuth/OIDC/SAML surface mapping + heuristics
│   ├── active/proto.ts        — Prototype pollution detection
│   ├── active/cache.ts        — Web cache poisoning via unkeyed headers
│   ├── active/oast.ts         — blind-callback/OAST payload generation
│   ├── active/ssrf.ts         — SSRF via URL params + timing detection
│   ├── active/xxe.ts          — XML External Entity injection
│   ├── active/role-diff.ts    — profile-aware authorization diffing
│   └── active/object-diff.ts  — cross-account object replay / IDOR signals
├── Structured reporting layer
│   └── reporting.ts           — deterministic report.json, remediation-pack, fix-plan, fix-verification, HTML outputs, retest-diff, evidence pack, asset inventory, exports
└── Phase 3 — Report agent (Claude Agent SDK)
    └── Writes report.md from structured report.json

active-only.ts             # Phase 2.5 only — no SDK required
attack-sim.ts              # PoC demonstration mode
asset-orchestrator.ts      # Batch-run passive/active/full scans from an asset plan
benchmark.ts               # Validate expected findings against prior reports
oast-receiver.ts           # Local callback receiver + JSONL correlation log

The passive agents use the Claude Agent SDK — each runs as an autonomous sub-agent with access to WebFetch, Read, and Write tools. The active agents use curl, openssl, dig, nmap, and nikto directly via Node.js subprocesses — no AI inference, deterministic probes.
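A deterministic probe in that style is just a subprocess invocation with a fixed argument list. The sketch below is hypothetical — the argument choices are illustrative, not the actual flags the active/*.ts agents use:

```typescript
// Build a reproducible curl invocation: bounded timeout, discrete header
// arguments, and no shell string interpolation from crawled URLs.
function buildCurlArgs(
  url: string,
  method = "GET",
  headers: Record<string, string> = {},
): string[] {
  const args = ["-sS", "-i", "--max-time", "15", "-X", method];
  for (const [name, value] of Object.entries(headers)) {
    args.push("-H", `${name}: ${value}`);
  }
  args.push(url);
  return args;
}

// Run with child_process.execFile("curl", buildCurlArgs(...)) rather than
// exec(), so a crawled URL can never be interpreted by a shell.
```

Because the argument list is pure data, the same probe replays byte-for-byte in repro commands and retest runs.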

Runtime artifacts

Generated scan output lives under output/<domain>/ and is intentionally gitignored. The main artifacts are:

  • report.json for the canonical structured report
  • report.md and report.html for human-readable output
  • optional report-reviewed.{md,html,pdf} for a manually curated handoff version
  • raw-evidence/manifest.json for per-agent raw transcripts
  • evidence-pack/manifest.json for per-finding evidence bundles
  • target-graph.json, expansion-plan.json, and planner-execution.json for evidence-driven follow-up planning and bounded re-probes
  • disclosure-summary.{json,md} and handoff-gate.json for disclosure-safe packaging and strict reviewed-report gating
  • hallucination-flags.{json,md} for passive finding verifier output
  • false-positive-review.json and output/_shared/false-positive-learning.json for local verifier learning and reviewable suppression
  • scan-drift.{json,md} and scan-drift-history.json for machine-readable scan quality drift metrics and rolling history

Assessment depth

The right way to go deeper is not to keep multiplying agents until the target gets hammered.

Preferred model:

  • discover the surface once
  • build a shared target graph of routes, forms, APIs, objects, roles, and evidence
  • prioritize high-value targets such as required paths, authenticated workflows, and strong-signal findings
  • launch targeted deterministic micro-probes only where evidence justifies more depth
  • verify, merge, and score the results before expanding again

Why this matters:

  • more agents alone mostly increases duplicate noise, verification burden, and WAF or rate-limit pressure
  • Cloudflare, auth walls, and SPA state usually punish indiscriminate traffic more than they reward it
  • adaptive depth improves coverage and trust at the same time; agent sprawl usually trades one for the other

The next depth direction for this repo is an adaptive expansion planner, not just a bigger fixed agent count.

A2A boundary

A2A-style communication can make sense here, but only at the orchestration boundary.

Good fit:

  • export a scoped target graph or evidence bundle to a remote specialist
  • request a bounded task such as workflow review, auth review, or API abuse triage
  • ingest a structured result back into the local report pipeline

Bad fit:

  • replacing the internal local run with agent-to-agent chatter for every phase
  • letting multiple remote agents independently rediscover and hammer the same target
  • bypassing the existing verifier, merge, coverage, and trust layers

If A2A is added later, it should be:

  • optional
  • scope-limited
  • budgeted
  • authenticated
  • treated as another evidence source, not as the core runtime transport

Trust model

Implemented today:

  • recon-context health checks that quarantine degraded llm.txt inputs before downstream passive analysis
  • report-level confidence and verificationStatus enrichment for all findings
  • a claim-normalization layer so verifier, merge, and disclosure logic operate on atomic findings such as missing headers, blocked sensitive artifacts, auth surfaces, and CSRF gaps instead of raw narrative prose
  • a shared passive observation bus so later passive agents can use earlier observations as hints
  • raw evidence manifests, per-finding evidence packs, and fix-verification bundles for human retesting
  • baseline retest diffs for introduced, resolved, persisted, and severity or verification-status changes
  • a post-passive verifier that flags unsupported passive claims, missing evidence, passive-vs-deterministic conflicts, hedged claims, missing artifacts, unreachable claimed URLs, cookie/session issues, browser-storage claims, and CSRF contradictions, then downgrades or corroborates findings accordingly
  • verifier-driven claim rewrites that distinguish blocked sensitive artifacts from truly public exposure and prevent irrelevant route-existence conflicts from downgrading unrelated findings
  • a scan-drift stage that snapshots scan-output trust metrics, keeps a rolling per-target history, compares against both explicit baselines and recent history, and surfaces learning-aware drift alerts
  • local false-positive learning that can down-weight or suppress recurring soft verifier flags after review, plus a review:false-positives CLI for triage
  • per-agent and per-flag-type precision reporting derived from the local review store, plus automatic confidence penalties for hotspot agents with poor historical precision or large pending review backlogs
  • a confirmation-aware cross-agent merge layer that distinguishes dedup-only merges from passive findings independently reinforced by deterministic agents
  • a repo-local report-trust-score helper that condenses verification mix, verifier flags, review backlog, and overreach hints into a single handoff score
  • a two-tier disclosure flow where the full reviewed report can remain blocked while a smaller first-contact-safe subset is still explicitly allowed
  • a reviewed-report workflow where operators can promote a cleaned report-reviewed.md into matching HTML/PDF handoff copies with npm run report:render -- output/<domain>
  • benchmark and unit-test coverage for hallucination suppression, confirmation-aware merge, review tooling, and drift regression paths
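The shape of a handoff trust score can be sketched as a weighted mix of verification outcomes minus a noise penalty. This is a simplified stand-in, not the formula the actual report-trust-score.mjs helper uses, and the weights are assumptions:

```typescript
interface TrustInputs {
  confirmed: number;      // deterministically verified findings
  strongSignal: number;   // corroborated but not fully confirmed
  inconclusive: number;   // unverified claims
  flagged: number;        // verifier-flagged passive claims
  pendingReviews: number; // unreviewed false-positive backlog
}

// 0–100: verified evidence raises the score; unresolved noise drags it down.
function trustScore(t: TrustInputs): number {
  const total = t.confirmed + t.strongSignal + t.inconclusive;
  if (total === 0) return 0;
  const evidence = (t.confirmed + 0.6 * t.strongSignal) / total;
  const noisePenalty = Math.min(0.4, 0.05 * (t.flagged + t.pendingReviews));
  return Math.round(100 * Math.max(0, evidence - noisePenalty));
}
```

The key property is that the score is monotone in verified evidence and monotone-decreasing in open flags, so clearing the review backlog is the only way to recover points short of re-verifying findings.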

Planned next:

  • an adaptive expansion planner that deepens coverage only when evidence, required paths, or strong signals justify more probes
  • broader deterministic verifier coverage for additional auth/config and exposure families beyond the current cookie, storage, CSRF, header, redirect, and URL checks
  • richer confirmation clustering for complex overlaps where multiple agents describe the same access-control or workflow issue with different titles
  • an optional A2A boundary for handing off scoped target graphs or evidence bundles to remote specialist agents without replacing the local trust pipeline
  • an optional shared review backend if local-only learning becomes a collaboration bottleneck
  • more end-to-end benchmark lab scenarios that exercise review application and rolling drift history across multiple live runs

The intended reuse model is to borrow the scoring and false-positive-learning patterns from the existing swarm hallucination tooling and the baseline math from the drift detector, while keeping web-recon-agent self-contained. The plan is explicitly not to embed the full swarm runtime here.

See VERIFICATION-ROADMAP.md for the detailed design notes.

Ethics and authorization

This tool is for authorized security testing only.

  • Passive mode sends only standard HTTP GET requests — equivalent to a browser visit
  • Active mode sends additional probes (method testing, fuzzing, burst requests, injection payloads) that go beyond normal browsing
  • Always obtain written authorization before running active scans against systems you do not own
  • The --i-own-this flag is a self-certification of authorization — use it responsibly
  • SQLi/LFI injection tests use read-only payloads — they detect errors and file disclosure, never modify data