@whitenoisenpm/testforge-mcp
v0.37.0
TestForge MCP Server — AI-powered testing in your IDE. Analyzes code for security, unit tests, load, accessibility, vision alignment, scope coverage, and stack quality.
AI-powered testing in your IDE. The TestForge MCP server integrates with Cursor, VS Code, Windsurf, Claude Code, and any MCP-compatible editor to provide real-time code analysis — entirely on your machine.
```bash
npx -y @whitenoisenpm/testforge-mcp@latest   # start the server → http://localhost:33221
npx -y @whitenoisenpm/testforge-mcp setup    # interactive config wizard (AI provider, port, secret)
npx -y @whitenoisenpm/testforge-mcp --help   # full env-var reference
```

Tier-1 (22 dimensions) needs no config. For Tier-2 (LLM test generation + sims), run setup once: it configures an AI provider (OpenRouter cloud or a local model server like Ollama / LM Studio) and writes ~/.testforge/.env. There is no database to install; run history is auto-stored in SQLite at ~/.testforge/history.db.
What it does
| Dimension category | Examples |
|---|---|
| Security (SAST) | SQL/NoSQL injection, eval, XSS, sensitive data in logs/responses, hardcoded secrets, CORS misconfig, OWASP coverage |
| Quality | Unit-test coverage, mutation-score estimate, predictive risk, dead-code, license/supply-chain audit |
| Performance & resilience | Load profile, rate limiting, caching, n+1 query patterns, chaos resilience |
| Product & ops | Vision/goal alignment (observability, analytics, feature flags), scope coverage, stack quality, DORA estimate, agentic-scale prediction |
| UI | Accessibility (WCAG-ish): alt text, form labels, visual-regression hints |
All Tier-1 analysis is regex/static — fast, no LLM calls, deterministic. Same input → same output. Tier 2 (Generate & Run) layers an LLM on top: it writes real tests (Vitest / pytest / Go) for the top findings — grounded in your actual source, and importing & executing your real code when it safely can — and runs them in a sandboxed Docker container. Simulate goes further: it boots your app and exercises the running system (load, chaos, real-code unit tests in the booted image, and a Playwright browser crawl + LLM-authored user journeys). See the Tier 2 and Simulate sections below.
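To make "regex/static, deterministic" concrete, here is a minimal sketch of what a line-based static rule can look like. It is not TestForge's actual rule set: the `Finding` shape only borrows field names from the `/generate-and-run` payload shown later in this README, and the `scanForEval` function, the `no-eval` rule id, and the pattern itself are hypothetical.

```typescript
// Hypothetical finding shape; field names mirror the /generate-and-run payload.
interface Finding {
  title: string;
  filePath: string;
  lineNumber: number;
  severity: "critical" | "high" | "medium" | "low";
  rule: string;
}

// Illustrative pattern only: a direct eval(...) call. No "g" flag, so
// RegExp.test() keeps no state between lines.
const EVAL_CALL = /\beval\s*\(/;

// Pure function of (filePath, source): the same input always yields the same findings.
function scanForEval(filePath: string, source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((line, index) => {
    if (EVAL_CALL.test(line)) {
      findings.push({
        title: "Use of eval()",
        filePath,
        lineNumber: index + 1, // report 1-based line numbers
        severity: "high",
        rule: "no-eval", // hypothetical rule id
      });
    }
  });
  return findings;
}
```

A rule of this kind needs no LLM call and no network access, which is why a Tier-1 style scan can be fast and repeatable. The trade-off is precision: a plain regex also matches inside comments and strings.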
Quick Start (Tier 1)
```bash
# Start the server (port 33221)
npx @whitenoisenpm/testforge-mcp@latest

# Dashboard:
open http://localhost:33221
```

The dashboard lets you paste a local project path or a public GitHub URL, runs the full 22-dimension analysis, and persists each run to SQLite at ~/.testforge/history.db so /reports shows your history. Everything stays on your machine, and no API keys are required.
Tier 2 — Generate & Run (LLM tests + sandbox)
Added in v0.25.0. Tier 1 keeps working without any extra setup. Tier 2 needs an AI provider (cloud key or a local model server) and Docker running.
Easiest: npx @whitenoisenpm/testforge-mcp setup — pick OpenRouter or a local server (Ollama/LM Studio), and it writes the config for you. Or set env vars manually:
```bash
# 1. Choose an AI provider (Option A or Option B)
# Option A: OpenRouter (cloud). Free key at https://openrouter.ai/keys
export OPENROUTER_API_KEY=sk-or-v1-...

# Option B: local model server (Ollama/LM Studio/vLLM), free + private, no key:
#   ollama pull qwen2.5-coder:14b
export TESTFORGE_LLM_BASE_URL=http://localhost:11434/v1
export TESTFORGE_PRIMARY_MODEL=qwen2.5-coder:14b
# (from Docker, use http://host.docker.internal:11434/v1)

# 2. Make sure Docker is running (Docker Desktop on macOS / Windows;
#    docker daemon on Linux). No image build step needed: the runner
#    image is pulled from GHCR on first use.

# 3. Start the server with the key in env
OPENROUTER_API_KEY=$OPENROUTER_API_KEY \
  npx @whitenoisenpm/testforge-mcp@latest

# 4. In the dashboard, click "🤖 Generate Tests (Tier 2)"
#    under any analysis report. The first call pulls
#    ghcr.io/t4tarzan/testforge-runner:latest (~92 MB, ~10s).
#    Subsequent calls reuse the local image and run in ~1s.
```

What it does: takes the top-3 highest-severity findings from a Tier-1 run, sends each to the LLM with a Zod-enforced schema (filename, content, reasoning), then runs the generated tests in a node:22-slim / pytest / Go container (--network=none, --rm, caps dropped) with the framework's JSON reporter.
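The "top-3 highest-severity findings" step can be sketched as a small pure function. This is an illustration under assumptions, not the server's code: the severity labels and their ordering (`critical` > `high` > `medium` > `low`) are inferred from labels used elsewhere in this README, and `pickTopFindings` is a hypothetical name.

```typescript
type Severity = "critical" | "high" | "medium" | "low";

// Minimal finding shape for the sketch; the real payload carries more fields.
interface Finding {
  title: string;
  severity: Severity;
}

// Assumed ordering: lower rank means more severe.
const SEVERITY_RANK: Record<Severity, number> = {
  critical: 0,
  high: 1,
  medium: 2,
  low: 3,
};

// Returns up to maxFindings items, most severe first. The input is copied
// before sorting, and Array.prototype.sort is stable, so findings of equal
// severity keep their original relative order.
function pickTopFindings<T extends { severity: Severity }>(
  findings: T[],
  maxFindings = 3,
): T[] {
  return [...findings]
    .sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity])
    .slice(0, maxFindings);
}
```

The default of 3 matches the `maxFindings` value used in the endpoint example in this section.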
Tests are grounded in your real code, not a description of it:
- Each finding ships the actual source at the flagged line, so the generated test reproduces your logic — not a generic example.
- When the finding's file is a leaf module (imports only Node built-ins), the test imports and executes the real module in the sandbox, so the pass/fail reflects your actual code.
- For deeper coverage (modules with dependencies), the Simulate `wired` lane runs tests against the real code inside the booted app image, where its dependencies already resolve. See Simulate below.
Polyglot since v0.29: .ts/.js → Vitest, .py → pytest, .go → go test, each in its matching sandbox image.
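The extension-to-framework routing above amounts to a lookup. The sketch below mirrors only the mapping stated in this README; `frameworkFor` and the `Framework` type are hypothetical names, and how the server handles other extensions is not documented here, so the sketch returns `undefined` for them.

```typescript
type Framework = "vitest" | "pytest" | "go test";

// Maps a source file to the test framework named in the README:
// .ts/.js → Vitest, .py → pytest, .go → go test.
function frameworkFor(filePath: string): Framework | undefined {
  const dot = filePath.lastIndexOf(".");
  const ext = dot === -1 ? "" : filePath.slice(dot).toLowerCase();
  switch (ext) {
    case ".ts":
    case ".js":
      return "vitest";
    case ".py":
      return "pytest";
    case ".go":
      return "go test";
    default:
      return undefined; // not covered by the documented mapping
  }
}
```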
Provider stack (default models via OpenRouter; override either, or point at a local server with TESTFORGE_LLM_BASE_URL):
| Model | Role | Override |
|---|---|---|
| deepseek/deepseek-v4-flash | Primary — cheap, fast, capable coder | TESTFORGE_PRIMARY_MODEL |
| moonshotai/kimi-k2.6 | Fallback — different provider (hit when primary rate-limits/fails) | TESTFORGE_FALLBACK_MODEL |
Endpoint shape:
```bash
curl -X POST http://localhost:33221/generate-and-run \
  -H "Content-Type: application/json" \
  -d '{
    "findings": [{ "title": "…", "description": "…", "filePath": "…",
                   "lineNumber": 42, "severity": "high", "rule": "…",
                   "fixSuggestion": "…" }],
    "maxFindings": 3,
    "cluster": "edge-case"
  }'
# → { generationId, provider, generationMs, runMs,
#     results: [{ finding, file: { filename, content, reasoning },
#                 attempts: [{ model, ok, durationMs }] }],
#     run: { numPassedTests, numFailedTests, files: [...] } }
```

History endpoints:
| Endpoint | Returns |
|---|---|
| GET /api/generations | List of recent Tier-2 generations (id, cluster, provider, pass/fail counts) |
| GET /api/generations/:id | One generation with the full payload (source files + run details) |
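Cost at OpenRouter list prices: roughly $0.02 per Tier-2 invocation (3 generations × ~1.5k output tokens at Qwen 3.7 Max pricing). That estimate is based on Qwen 3.7 Max, the primary model when Tier 2 shipped in 0.25.0, so expect a different figure with the current default models or with your own overrides. Sandbox compute is free locally.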
Self-host vs managed: the local MCP runs Tier 2 with no quota — you BYOK OpenRouter and pay them directly. The managed SaaS at testforge.run gates Tier 2 to the Forge plan ($99/mo, 100 iterations/mo) and handles the keys for you.
Simulate — exercise the running app
Where Tier-2 runs sandboxed unit tests, Simulate boots your app and drives the running system. It needs a root `Dockerfile` (or `docker-compose`) it can build, plus Docker running. It is async (real sims take minutes): you get a `jobId`, then poll.
```bash
# Kick off (opt into the lanes you want via "dimensions")
curl -X POST http://localhost:33221/simulate \
  -H "Content-Type: application/json" \
  -d '{"repoUrl":"https://github.com/owner/repo",
       "dimensions":["load","chaos","wired","e2e"],
       "journeys":2, "maxPages":8}'
# → { jobId, statusUrl }

# Poll for phased progress + the final result
curl http://localhost:33221/simulate/<jobId>
```

It clones → detects how to boot (Dockerfile/compose) → builds + boots the app once on an isolated network → runs the requested lanes against it → tears down. If the app can't be auto-booted, each lane returns an honest ranReal:false + reason (and a static fallback for load/chaos).
| Lane (dimensions) | What it does |
|---|---|
| load (default) | autocannon ramp (10→500 concurrency) → p50/p90/p99, rps, error rate, breaking-point concurrency |
| chaos | baseline load → inject a fault (restart/pause) → errorRateDuringFault + recoverySeconds |
| agent | ramps a fleet of think-time agents → maxHealthyAgents |
| wired | generates node:test files that import & run your real code inside the booted image (deps resolve from the image; Node apps, v1) |
| e2e | Playwright crawls the running app → console errors, 4xx/5xx, axe a11y violations. Add journeys:N for LLM-authored user journeys (navigate/click/fill/assert) run as a deterministic step-DSL |
maxPages bounds the e2e crawl (default 8); journeys (0 = smoke only) sets how many user journeys the model authors. concurrencyLevels, durationPerLevelSec, faultType tune load/chaos.
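Because Simulate is asynchronous, a client has to poll `GET /simulate/<jobId>` until the job finishes. The helper below is a generic sketch of that loop, not part of TestForge: `pollUntil` and `PollOptions` are hypothetical names, and because this README does not document the status payload, both the fetch call and the "done" check are injected by the caller.

```typescript
interface PollOptions {
  intervalMs?: number; // delay between polls
  maxAttempts?: number; // give up after this many polls
}

// Calls getStatus() until isDone(status) is true, waiting intervalMs between
// attempts. Rejects if the job has not finished after maxAttempts polls.
async function pollUntil<T>(
  getStatus: () => Promise<T>,
  isDone: (status: T) => boolean,
  { intervalMs = 5000, maxAttempts = 120 }: PollOptions = {},
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await getStatus();
    if (isDone(status)) return status;
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`job not finished after ${maxAttempts} polls`);
}
```

In practice `getStatus` would wrap something like `fetch("http://localhost:33221/simulate/" + jobId).then((r) => r.json())`, and `isDone` would inspect whichever completion field the response actually carries. Check a real response before relying on a specific field name.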
Manual MCP Setup
Cursor / Windsurf / Claude Desktop
Open IDE settings → MCP → add server:
```json
{
  "mcpServers": {
    "testforge": {
      "command": "npx",
      "args": ["-y", "@whitenoisenpm/testforge-mcp@latest"],
      "env": {
        "TESTFORGE_MCP_PORT": "33221",
        "OPENROUTER_API_KEY": "sk-or-v1-… (optional, only needed for Tier 2)"
      }
    }
  }
}
```

VS Code
Use the Continue / Cline extension and add the same JSON to its MCP config block.
MCP Tools
| Tool | What it does | Latency |
|---|---|---|
| testforge_analyze | Synchronous: scan codebase structure (files, endpoints, dependencies, tech stack) | seconds |
| testforge_quick_scan | Async: security + unit dimensions only. Streams progress via SSE. | ~30s |
| testforge_test | Async: full suite across all dimensions. Streams progress via SSE. Persists summary to SQLite on completion (since 0.2.19). | 1–5 min |
| testforge_report | Get or generate a structured PRD report for a completed test run | seconds |
REST API (running standalone)
```bash
# Health
curl http://localhost:33221/health
# → {"status":"ok","version":"0.36.5"}

# Public-status check (for badges/uptime)
curl http://localhost:33221/api/reports/latest
# → 404 {"error":"No reports yet"} if SQLite is empty;
#   the most recent report otherwise (no more seed/demo data fallback).

# Synchronous full analysis of a public repo
curl -X POST http://localhost:33221/clone-and-analyze \
  -H "Content-Type: application/json" \
  -d '{"repoUrl":"https://github.com/owner/repo"}'

# Async test run (background, streams via SSE)
curl -X POST http://localhost:33221/test \
  -H "Content-Type: application/json" \
  -d '{"projectPath":"/path/to/local/project"}'
# → {"testRunId":"...","status":"running","streamUrl":"/mcp/sse"}

# Progress for a specific run
curl http://localhost:33221/test/<testRunId>/progress

# List recent persisted runs (from SQLite)
curl http://localhost:33221/reports

# Single report by id
curl http://localhost:33221/report-view/<reportId>
```

Local data
| File | Contents |
|---|---|
| ~/.testforge/history.db | SQLite with a reports table — one row per analyze / test run, including per-dimension scores and the full JSON blob in full_data. WAL mode. |
| ~/.testforge/history.db-wal, .db-shm | SQLite WAL sidecars. |
| /tmp/testforge-repos/ (or $TMP_DIR) | Temp clones of public repos for /clone-and-analyze. Deleted after each analysis. |
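With Tier-1 analysis, your source never leaves the machine: the dashboard is local, the analyzers are local, the DB is local. Its only outbound calls are the git clone step (when you give it a public URL) and dependency lookups for license/supply-chain checks. Tier 2 and Simulate additionally pull their runner images from GHCR, and Tier 2 sends the flagged source to whichever LLM provider you configured (OpenRouter, or a local model server if you want that traffic to stay on your machine too).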
Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| TESTFORGE_MCP_PORT | 33221 | Server port. 33221 chosen to avoid common dev-server collisions (3000/3001/5173/8080). |
| TESTFORGE_MCP_HOST | 0.0.0.0 | Bind address. Set 127.0.0.1 to listen loopback-only — e.g. when a reverse proxy / Tailscale Serve already fronts the port and a wildcard bind would collide. Docker/managed deploys keep the wildcard. |
| TMP_DIR | /tmp/testforge-repos | Where /clone-and-analyze puts temp checkouts. |
| LOG_LEVEL | info | Fastify logger level (debug, info, warn, error). |
| DATABASE_URL | — | Optional. If set, the server can fall back to Neon for read-replica history. Not required for local-only use. |
| OPENROUTER_API_KEY | — | Tier 2 only. OpenRouter key for LLM test generation. Without it, POST /generate-and-run returns 503. Get one at https://openrouter.ai/. |
| TESTFORGE_LLM_BASE_URL | — | Tier 2 (local model). Point at an OpenAI-compatible local server (Ollama http://localhost:11434/v1, LM Studio, vLLM) for free, private, keyless generation. From Docker use host.docker.internal. |
| TESTFORGE_PRIMARY_MODEL | deepseek/deepseek-v4-flash | Tier 2 primary model. Any OpenRouter (or local) model id works. |
| TESTFORGE_FALLBACK_MODEL | moonshotai/kimi-k2.6 | Tier 2 fallback when the primary errors or rejects the schema. |
| TESTFORGE_RUNNER_IMAGE | ghcr.io/t4tarzan/testforge-runner:latest | Tier 2 unit-test sandbox image (+ _PYTHON / _GO variants). Auto-pulled from GHCR; builds locally from the bundled Dockerfile if the pull fails. |
| TESTFORGE_LOADGEN_IMAGE | ghcr.io/t4tarzan/testforge-loadgen:latest | Simulate load/chaos driver (autocannon). Override to a local build if needed. |
| TESTFORGE_E2E_IMAGE | ghcr.io/t4tarzan/testforge-e2e:latest | Simulate e2e crawler (Playwright + Chromium + axe). Override to a local build if needed. |
| TESTFORGE_RUNNER_TAG | v0.36.x | Version tag for the runner images (:latest is cached and never re-pulled, so fixes ship on a fresh tag). |
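As a reading aid for the table above, this sketch shows how the four basic server settings and their documented defaults could be resolved from an environment map. It is an assumption-laden illustration, not the server's implementation: `resolveConfig`, `ServerConfig`, and the port validation are hypothetical, and only the variable names and default values come from the table.

```typescript
interface ServerConfig {
  port: number;
  host: string;
  tmpDir: string;
  logLevel: string;
}

type Env = Record<string, string | undefined>;

// Applies the defaults documented in the table when a variable is unset.
function resolveConfig(env: Env): ServerConfig {
  const port = Number.parseInt(env.TESTFORGE_MCP_PORT ?? "33221", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    // Validation is this sketch's own choice; the README does not say how
    // the server reacts to a malformed port.
    throw new Error(`invalid TESTFORGE_MCP_PORT: ${env.TESTFORGE_MCP_PORT}`);
  }
  return {
    port,
    host: env.TESTFORGE_MCP_HOST ?? "0.0.0.0",
    tmpDir: env.TMP_DIR ?? "/tmp/testforge-repos",
    logLevel: env.LOG_LEVEL ?? "info",
  };
}
```

In a Node process this would be called as `resolveConfig(process.env)`. For example, setting `TESTFORGE_MCP_HOST=127.0.0.1` yields a loopback-only bind, as the table describes.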
Changelog highlights
- 0.37.0: Tier-2 runs your real code, and Simulate exercises the running app. Tier-2 test generation now ships the actual source at the flagged line (so tests reproduce your logic, not a generic example), imports & executes the real module for leaf findings (Node built-ins only), and the new Simulate `wired` lane runs `node:test` files against your real code inside the booted app image (dependencies and all). The Simulate engine also gains a browser/E2E lane: Playwright crawls the running app (console errors, 4xx/5xx, axe a11y) and, with `journeys:N`, runs LLM-authored user journeys as a deterministic step-DSL. New runner images `testforge-loadgen` + `testforge-e2e` (multi-arch GHCR, local-build fallback); `TESTFORGE_MCP_HOST` makes the bind address configurable. (Tier-1 stays at 22 dimensions; this work is all Tier-2 / Simulate.)
- 0.29.0–0.36.x: Simulate runtime engine (load/agent/chaos, then the Kubernetes runtime tier), polyglot Tier-2 (pytest + Go test), the analyzer flywheel, and precision passes. Per-release detail in git history and [[Evolution]] (`docs/knowledge/Evolution.md`).
- 0.28.4: Accessibility analyzer skips test paths. Same suppression the security analyzer got in 0.27.0: a11y per-file checks now skip `tests/`, `__tests__/`, `e2e/`, `*.spec.*`, etc. Test fixtures routinely contain intentional a11y violations (TestForge's own `a11y-jsx` fixture is deliberately broken to test the analyzer), so flagging them is noise. Removes 9 fixture findings from the TestForge self-audit + fixes it for any user with a11y component tests.
- 0.28.3: Security + a11y precision pass (false-positive cleanup). Caught by the Supabase + TestForge self-audit reports, which showed mostly false positives. Four targeted fixes:
  - SQL/NoSQL sink receiver-awareness: `isDbQueryCall` matched generic method names (`get`, `find`, `all`, `run`, `count`) regardless of receiver, so `urlParams.get()`, `Promise.all()`, `map.get()`, and an HTTP `get()` helper all tripped the "SQL injection" critical. Now split into STRONG methods (`query`/`exec`/`execute`/`raw`/`findOne`/`findMany`/`findUnique`/`findFirst`/`aggregate`, which always fire) vs WEAK methods (`find`/`get`/`all`/`run`/`count`, which fire only when the receiver looks like a DB handle: `db`/`conn`/`client`/`pool`/`knex`/`prisma`/`sequelize`/`collection`/`mongoose`/etc.). Supabase criticals dropped 14 → 1.
  - Minified / vendored skip: files under `vendor/`, `monaco-editor/`, `*.min.js`, or with minified content shape (single >5k-char line, or >500-char average line) are excluded from per-file security analysis. Killed the monaco-editor `workerMain.js` findings.
  - Placeholder secrets: `checkHardcodedSecret` skips bracketed/templated values (`[YOUR-PASSWORD]`, `<password>`, `${…}`, `{{…}}`) and common placeholder words (`your-`/`example`/`changeme`/`dummy`/`sample`/…) + all-symbol masks. Killed the 4 Supabase connection-string-UI password findings.
  - Luminance-aware contrast: `checkColorContrast` matched ANY hex (`#[a-fA-F0-9]{3,6}`), so near-black text like `#12101A` fired "low contrast", which produced 100 false positives on the TestForge self-audit alone. Now parses the hex, computes WCAG relative luminance, and only flags genuinely light text (luminance > 0.55). Tailwind pattern narrowed to gray-300/400 (gray-500 passes AA on white). TestForge a11y findings dropped 175 → 87.
  - Tests: 211 → 221 (+10).
- 0.28.2: Coverage + Mutation dimension-correctness fixes. Both dimensions were producing 0% scores on perfectly testable repos for reasons that turned out to be bugs in their signals, caught by the LangChain in-the-wild report (4,849 test cases but coverage 0%) and the TestForge self-audit (mutation always 0).
  - Coverage (`runUnitAnalysis`): function-name matching between test descriptions and source function names returns ~0 on library code where test names don't echo function names (`it('chain handles long context')` doesn't match `function formatDate()`). New rule: when the precise heuristic returns near-zero on a project with substantial tests (≥10 test files AND test:source ratio ≥ 0.1), fall back to the test-to-source-file ratio as an honest secondary signal. App code keeps its precise score (TestForge stayed at 72%); libraries get a real number (LangChain went 0% → 31%).
  - Mutation (`runMutationAnalysis`): the `hasTestFramework` early-return checked the ROOT `package.json` devDeps only, ignoring workspace members. TestForge's vitest lives in `mcp-server/package.json`; root has none, so root devDeps:[] → "no test framework" → score 0, even though the unit-analyzer correctly detected 17 vitest files via AST. Replaced the misleading devDeps signal with the actual test-file count (which the function already iterates anyway). Also extended the test-file regex to recognize pytest (`test_*.py`, `*_test.py`) and Go (`*_test.go`) conventions so cross-language repos count their tests properly. TestForge mutation went 0 → 35; LangChain went 0 → 54.
  - Tests: 208 → 211. Two synthetic-fixture tests prove the ratio-fallback engages on abstract-test-name projects + the mutation analyzer scores >0 when test files exist regardless of root devDeps.
- 0.28.1: Vulnerable-dependency check is version-aware. Caught by the TestForge self-audit: `express ^5.2.1` fired "Potentially Vulnerable Dependency" even though the CVE in our table is on `<4.17.3`. `checkInsecureDependencies` previously matched by package name alone, ignoring the declared spec. Now collects version specs from every `package.json` in `fileContents` and short-circuits the finding when the spec's major version is strictly greater than the vulnerable upper bound's major (e.g. `^5.2.1` vs `<4.17.3` → safe → no finding). When the spec is unknowable (git+URL, "latest", workspace alias), still fires conservatively, which is safer than hiding a real vuln. New finding `description` now embeds the declared spec for actionability ("declared as \"express@^4.16.0\""). Tests: 205 → 208 (+3 covering the three branches: safe-major, vulnerable-major, unknowable-spec).
- 0.28.0: Go native support.
`.go` files now count in totalFiles + lines. New endpoint regex covers Gin / Echo / Chi / Fiber / Gorilla Mux (`r.GET("/path", h)`, `app.Get(...)`, `mux.HandleFunc(...)`) plus stdlib `http.HandleFunc`. New `parseGoMod()` parses `require` blocks (single-line + grouped form), skips `// indirect` deps, normalizes module paths to short package names (`github.com/gin-gonic/gin` → `gin`), and correctly handles semantic-import-versioning suffixes (`jackc/pgx/v5` → `pgx`, not `v5`). Tech-stack tagging covers Gin, Echo, Chi, Fiber, Gorilla Mux, GORM/sqlx, Cobra, Viper, gRPC, structured logging (Zap/Zerolog/Logrus), testify/ginkgo, PostgreSQL (pgx/pq). Function-name extraction handles both package-level `func Name(...)` and receiver methods `func (r *T) Name(...)`. `go` moved from `UNSUPPORTED_EXT_TO_LANG` to `NATIVE_EXTS`. Same conventional-monorepo recursion (`libs/*`, `packages/*`, `apps/*`, `services/*`) applies to `go.mod`. New fixture `tests/fixtures/polyglot-go/` (Gin server + GORM repo + go.mod with indirect deps + semver-path suffix) covers all paths. Tests: 197 → 205.
- 0.27.2: Accessibility analyzer ignores non-UI files + reports
`applicable: false` on non-UI repos. Caught by the in-the-wild LangChain report: scored 10/100 on Accessibility because the per-file loop ran `checkLinkText` on `README.md` and emitted 44 false-positive "Empty Link" findings. The glob fallback was already filtered to `.{html,tsx,jsx,vue,svelte}` but when called with a pre-populated `fileContents` (the orchestrator path), the loop iterated every file. Now hard-filters to UI files at the loop level. New `applicable: boolean` on `A11yReport` is `true` when the repo has any UI files, `false` otherwise (Python lib, CLI, data-science repo). Surfaced in the dashboard JSON so non-UI repos can be rendered as N/A instead of a fake score. Three new test cases prove the LangChain regression won't recur. Tests: 194 → 197.
- 0.27.1: "Missing Rate Limiting" check now only fires on web apps. Caught by the in-the-wild LangChain report: a pure Python library with no web framework still got a medium "Missing Rate Limiting" finding because
`checkMissingRateLimit` fired unconditionally on any project without a `rate-limit` package. 0.27.1 gates the check on a `WEB_FRAMEWORK_DEPS` set covering JS (Express, Fastify, Koa, Hono, NestJS, Next, Remix, Astro, Nuxt, SvelteKit, h3, Polka, etc.) and Python (FastAPI, Flask, Django, Starlette, Sanic, Tornado, Aiohttp, Litestar, etc.). Libraries, CLIs, data-science repos, and other non-web projects no longer get the finding. Three new test cases cover: (1) libs-monorepo fixture (no web framework) emits zero rate-limit findings, (2) polyglot-python fixture (FastAPI) still emits one, (3) vulnerable-app fixture (Express) still emits one. Tests: 191 → 194.
- 0.27.0: Security findings in test paths are now suppressed. Per-file security analysis (SQL/NoSQL injection, RCE sinks, path traversal, open redirect, reflected XSS, hardcoded secrets, etc.) is skipped on any path matching common test conventions:
`tests/`, `test/`, `__tests__/`, `__mocks__/`, `__fixtures__/`, `e2e/`, `specs/`, `fixtures/`, `cypress/`, `playwright/` dir segments anywhere in the path; `*.test.{js,jsx,ts,tsx,mjs,cjs,mts,cts}` and `*.spec.*` suffixes; pytest `test_*.py` and `*_test.py` filenames; and `.d.ts` declaration files (which have no runtime). Triggered by the in-the-wild Supabase report: 125 "critical" findings were almost all SQL-string-concat in `e2e/studio/features/*.spec.ts` where building the string is exactly what the test is testing. Project-level checks (rate-limiting, vulnerable dependencies, missing security headers) are unaffected, since those are real signals regardless of where test files live. New exported `isTestPath()` helper. New fixture `tests/fixtures/test-path-suppression/` proves a production file with a SQL-injection pattern still flags while four sibling test-path files (matching `.test.js` / `e2e/` / `__tests__/` / `tests/` conventions) carrying the identical pattern do not. Tests: 189 → 191.
- 0.26.2: Conventional-monorepo recursion (
`libs/`, `packages/`, `apps/`, `services/`). A real-world test on `langchain-ai/langchain` (run via the In-the-Wild showcase pipeline) caught 0.26.1 returning `deps: 0` because LangChain ships every package under `libs/<name>/pyproject.toml` without declaring `[tool.uv.workspace]` at root. The workspace-recursion in 0.26.1 only followed declared workspaces. 0.26.2 also globs `libs/*/<manifest>`, `packages/*/<manifest>`, `apps/*/<manifest>`, `services/*/<manifest>` for `pyproject.toml`, `package.json`, and `requirements.txt`. New helper `discoverConventionalMembers(root, manifest)` in `code-scanner.ts`. Same input, very different output on the real LangChain clone: `deps: 0, techStack: []` → `deps: 27 + 66 dev-deps, techStack: 5` (Pydantic, SQLAlchemy, pytest, httpx/requests, Playwright). New fixture `tests/fixtures/libs-monorepo/` mirrors the LangChain shape. Tests: 184 → 189.
- 0.26.1: Monorepo / workspace recursion for Python + Node. A real-world test on
`tiangolo/full-stack-fastapi-template` exposed that 0.26.0 detected endpoints + pytest files but returned `dependencies: 0, techStack: []` because the manifest-discovery code only read the root `pyproject.toml` / `package.json`. The actual deps lived in `backend/pyproject.toml` (uv workspace member) and `frontend/package.json` (bun workspace member). Fixes: (1) parse `[tool.uv.workspace] members = [...]` and recurse, (2) parse `package.json "workspaces": [...]` (handles globs like `packages/*`) and recurse, also reads `pnpm-workspace.yaml`, (3) parse PEP 735 `[dependency-groups]` (Astral's new standard for dev/test/docs groups), (4) `peerDependencies` rolled into runtime so framework targeting (React/Vue/Svelte) shows in techStack, (5) `@playwright/test` recognized as Playwright. Critical bug fix: the PEP 621 `dependencies = [...]` parser used a non-greedy regex that silently truncated arrays at the first `]`, which is inside `"fastapi[standard]"`. New `extractTomlArrayBody()` helper does string-aware bracket balancing. Same input → very different output: full-stack-fastapi-template went from `deps: 0, techStack: []` to `deps: 49 + 21 dev, techStack: 11` in 25ms. New fixture `tests/fixtures/uv-workspace/` mirrors the failing real-world layout. Tests: 179 → 184.
- 0.26.0: Python support (FastAPI / Flask / Django / pytest). Closes the polyglot blind spot that produced false-positive reports on Next.js+FastAPI repos like dclawstack/dclaw-monitor (analyzer was claiming "0 endpoints / no test framework / 0 test files" while the backend actually had 54 routes and 13 pytest files). Code-scanner now: (1) includes
`.py` in the file glob, (2) parses `requirements.txt` / `requirements-dev.txt` / `pyproject.toml` (PEP 621 + Poetry tables) with version-spec / extras / env-marker / comment handling, (3) regex-detects FastAPI/Starlette/Flask/Django routes (`@router.get(...)`, `@app.route(...)`, `path(...)`), (4) emits a `languageCoverage` field (natively-analyzed % + counts of skipped languages like Go/Ruby/Rust/Java). Unit-analyzer counts `test_*.py` / `*_test.py` / `tests/**/*.py` pytest files via `def test_…` regex and adds `pytest` to the frameworks list. Tech-stack now includes FastAPI, Flask, Django, Starlette, SQLAlchemy, Pydantic, Alembic, Celery, pytest, Uvicorn/Gunicorn, APScheduler, OpenTelemetry, PostgreSQL (asyncpg/psycopg2). Dashboard shows an amber banner whenever `languageCoverage` < 100%, naming each unsupported language with its file count, so "0 endpoints" no longer masquerades as "no endpoints" when really we just didn't read the files. New fixture `tests/fixtures/polyglot-python/` (FastAPI + requirements.txt + pyproject.toml + Next.js); tests: 166 → 179.
- 0.25.2: Runner image published to GHCR (
`ghcr.io/t4tarzan/testforge-runner:0.25.2`). No more manual `docker build` step on first Tier-2 use: the MCP auto-pulls the image (~92 MB) on the first `/generate-and-run` call. Existing `testforge-runner:local` builds still work via `TESTFORGE_RUNNER_IMAGE` override.
- 0.25.1: `/health` now reports the correct version (was hardcoded to "0.6.0").
- 0.25.0: Tier 2: Generate & Run. New
`POST /generate-and-run` endpoint takes findings from a Tier-1 report, generates one Vitest file per finding via OpenRouter (primary: Qwen 3.7 Max, fallback: DeepSeek V4 Flash), executes them inside a pre-baked Docker container (`node:22-slim` + vitest, `--network=none --rm`), and returns structured pass/fail JSON. Provider rotation is automatic on rate-limit or schema rejection; both attempts are recorded. New `~/.testforge/history.db.generations` table persists every iteration. New `GET /api/generations` + `GET /api/generations/:id` endpoints. Dashboard grows a "🤖 Generate Tests (Tier 2)" button under any report. Env overrides: `TESTFORGE_PRIMARY_MODEL`, `TESTFORGE_FALLBACK_MODEL`, `TESTFORGE_RUNNER_IMAGE`. Self-host has no quota (BYOK pays OpenRouter); managed SaaS gates Tier 2 to the Forge plan ($99/mo · 100 iterations/mo). Verified end-to-end against `tinyhttp/malibu`: 3 findings → 3 Vitest files → sandbox run in ~45s total. Demo video at https://testforge.run/malibu-tier2.mp4.
- 0.24.0: Dimension deepening, pass 16. Stack analysis polished: substring traps eliminated and new signals added. Old code:
`dep.includes('vite')` matched `vitest`, `vitest-mock-extended`, `vite-something-else` (vitest is a test framework, NOT a bundler, so this was a false strength). Now uses strict Sets per category for: test frameworks (jest/vitest/mocha/ava/tap/node-tap/@japa/runner/uvu/tape), lint tools (eslint/prettier/@biomejs/biome/rome/standard/xo/oxlint), ORMs (Prisma/Drizzle/TypeORM/Sequelize/Mongoose/MikroORM/Kysely/Knex/Objection), caches (Redis/ioredis/@upstash/memcached/lru-cache/node-cache/cache-manager), monorepo (Turbo/Nx/Lerna/Rush/Changesets), modern bundlers (vite/esbuild/SWC/Turbopack/Parcel/Rspack/Rollup), and new categories: modern frameworks (Next/Remix/Astro/Nuxt/SvelteKit/SolidStart/Qwik/Hono/h3), runtime validation (Zod/Yup/Joi/ajv/Valibot/Arktype/Effect/io-ts/class-validator), tRPC, TS runtimes (tsx/ts-node/esno). New tsconfig strict-mode detection: parses `tsconfig.json` and emits a low-severity finding when TypeScript is present but `compilerOptions.strict` is not true. New "API server without validation library" finding: medium severity, fires only when a server framework is detected (Express/Fastify/Koa/Hono/NestJS) and no Zod/Yup/Joi/Valibot is in deps. Monorepo detection now also looks at `nx.json` and `pnpm-workspace.yaml` (not just turbo.json). Tests: 159 → 166. New fixtures `tests/fixtures/stack-modern/` (Next + Hono + Prisma + Zod + tRPC + Vitest + Vite + Biome + tsconfig strict) and `tests/fixtures/stack-legacy/` (Express + Mongo + no TS + `vite-something-else` and `vitest-mock-extended` traps that must NOT count).
- 0.23.0: Dimension deepening, pass 15. Visual regression and property-based testing both move from substring soup to AST-aware signals. Visual regression: new
`lib/visual-regression.ts` walks JSXAttribute nodes for the `style` attribute, counts REAL `style={{…}}` props (not lines containing "style="), and inspects each object property's string value for hardcoded pixel values (`/(\d{2,4})px\b/g`) and inline hex color literals (`#abc` / `#abcdef` / `#abcdef00`). Findings fire at proper thresholds (≥3 files with inline styles + no CSS Modules → medium; ≥10 hardcoded px / ≥5 inline colors → low). Property-based testing: new `lib/property-based.ts` removes the previous noisy "function with `this.*` > 1 is impure" heuristic (fired on every class method) and replaces substring checks with proper AST detection: imports of `fast-check` / `jsverify` / `@fast-check/vitest`; `fc.assert()` / `fc.property()` / `fc.check()` call sites; `typeof x === '…'` / `Array.isArray(x)` / `x instanceof Class` type guards; `assert(...)` / `invariant(...)` runtime invariants. New scope-aware findings: "no framework", "framework but no `fc.assert` calls" (catches `import` without usage), "no runtime invariants". Tests: 150 → 159. New fixtures: `tests/fixtures/visual-quality/` (Bad.tsx with 3 components heavy in inline styles + comment trap; Good.tsx using CSS Modules) and `tests/fixtures/property-quality/` (util.js with type guards + assert.ok; util.property.test.js with two `fc.property` invariants).
- 0.22.0: Dimension deepening, pass 14. Edge-case detection moves from broken line-level checks (the old code asked "does the ENTIRE PROJECT contain
`.length`?" to decide if any array access was bounds-checked, which read as a false clean on every real project) to AST-aware footgun detection. New `lib/edge-cases.ts` catches six real bug shapes: (1) `parseInt(x)` without explicit radix (MDN best-practice); (2) `JSON.parse(x)` outside a try/catch (range-tracks try blocks via pre-pass); (3) `new Date(nonLiteralString)` (Invalid Date silently breaks downstream math); (4) loose equality `==` / `!=` (with the `== null` exception preserved as canonical nullish check); (5) `Number(x)` used inline in a binary expression / return / member access where guarding is structurally impossible (parent-aware Babel traverse to skip `const n = Number(x)` cases); (6) `switch` without `default:`. Public report grows a `byRule` field with hit counts per rule. Score: weighted cost per rule (JSON-parse 4pts, parseInt/Number 2pts, switch/loose 1pt). Tests: 141 → 150. New fixture `tests/fixtures/edge-cases/` with `src/bad.js` (every rule fires) and `src/good.js` (well-guarded variants; nothing fires, including `== null` and `const n = Number(x); if (isNaN(n))…`).
- 0.21.0: Dimension deepening, pass 13. Vision and Scope analyzers cleaned up from substring soup to precise matching. The old code had several broken cases the new module fixes:
`analytics` substring matched `cache-analytics`, `crypto-analytics-lib` (false positive); `author` in a README matched the `auth` feature; the implementation check scanned the README itself so any feature documented there was automatically "implemented." New `lib/strategic-signals.ts` consolidates: case-insensitive README discovery (`README.md`, `Readme.md`, `readme.md`, `docs/README.md`); explicit Features-section extraction via markdown parsing (capture content under `## Features` until the next `##` heading or EOF); strict dep-name sets for product analytics, feature flags, error tracking, APM, with no more `.includes()` traps; word-boundary keyword matching (`\b<word>\b`) so `auth` no longer matches `author`. Vision dimension drops its CI/CD finding (now lives only in DORA, pass 12) to avoid double-surfacing. Code-scanner now loads `.md` files (was excluded, hence README invisible). Scope's implementation check explicitly excludes README/markdown/package.json from the haystack so docs can't satisfy their own claims. Tests: 133 → 141. New fixtures `tests/fixtures/strategic-strong/` (every documented feature actually implemented) and `tests/fixtures/strategic-weak/` (Payments + Notifications in README without implementation, plus `cache-analytics` / `crypto-analytics-lib` deps as substring traps).
- 0.20.0: Dimension deepening, pass 12. DORA metrics reframed from fabricated estimates (`"Daily (estimated)"`) to honest capability framing. Real DORA needs git/deploy history; a static analyzer can't see how often the team deploys. What it CAN see is the STATIC SIGNALS that map to each axis. New `lib/dora-signals.ts` extracts: CI workflow files (`.github/workflows/*.yml`, `.gitlab-ci.yml`, `.circleci/config.yml`, etc.), parsed via js-yaml to count jobs, detect type-check steps, and identify deploy jobs by name pattern; deployment platform configs (Dockerfile, vercel.json, render.yaml, fly.toml, app.yaml, netlify.toml, serverless.yml, terraform/, kubernetes/, helm/); observability deps (Sentry, Datadog, NewRelic, OpenTelemetry, Honeycomb, Rollbar, Bugsnag, 20+ specific dep names in all); structured logging deps (pino, winston, bunyan, roarr); feature-flag deps (LaunchDarkly, Statsig, Unleash, Flagsmith, Posthog, GrowthBook, Split, ConfigCat); CODEOWNERS / branch-protection files. Each of the 4 DORA axes is now described as `Capability: Good | Partial | Weak` rather than a fake frequency string. Per-axis findings fire only when the matching capability is weak: missing CI, missing deploy automation, missing type-check in CI, missing observability, missing feature flags, missing CODEOWNERS. Code-scanner extended to load `.github/**`, `Dockerfile`, `Procfile`, `CODEOWNERS` (extensionless config files were silently excluded before). Tests: 125 → 133. New fixtures `tests/fixtures/dora-mature/` (full CI workflow + Dockerfile + CODEOWNERS + Sentry + pino + Posthog) and `tests/fixtures/dora-immature/` (nothing wired).
- 0.19.0: Dimension deepening, pass 11. Mutation testing moves from a single test-to-source ratio to AST-based assertion-quality analysis. True mutation testing requires running mutated code (out of scope for a static analyzer), but assertion shapes are a strong proxy for mutation-kill rate and statically observable. New
`lib/mutation-quality.ts` walks each test file's AST, classifies every assertion call into strong (`toBe`, `toEqual`, `toThrow`, `toBeInstanceOf`, `toHaveLength`, `toMatchObject`, `toHaveBeenCalledWith`, `toBeCloseTo`, comparison matchers, …), weak (`toBeTruthy`, `toBeFalsy`, `toBeDefined`, `toBeNull`, `toBeNaN`, `toHaveBeenCalled` — a mutation `42 → 41` still satisfies `toBeTruthy()`), snapshot (`toMatchSnapshot`, `toMatchInlineSnapshot`), or other. Handles `.not` / `.resolves` / `.rejects` modifiers, AVA `t.X` style, Chai `should.X`. Public report grows `assertionStats` (per file) + `assertionTotals` (project rollup including `weakRatio`, `snapshotRatio`, `overallVariety` = count of distinct strong-matcher types). Score model: base from test-to-source ratio plus adjustments (+5 if variety ≥ 5, −10 if weakRatio > 0.3, −5 if snapshotRatio > 0.5, +10 if Stryker present). Bounded `[10, 90]`. New findings: medium "test file(s) dominated by weak assertions" (>50% weak); low "snapshot-dominated test file(s)" (≥90% snapshot); low "low matcher variety" (<4 distinct strong types across the project); existing "Stryker not configured" preserved. Tests: 118 → 125. New fixture `tests/fixtures/mutation-quality/` with three contrast test files (strong / weak / snapshot) plus a focused source under test.
- 0.18.0 — Dimension deepening, pass 10. Chaos / resilience analyzer moves from substring matching (
`allContent.includes('SIGTERM')` — any comment with the word fooled it) to AST-based detection of actual resilience patterns. New `lib/chaos-patterns.ts` walks parsed ASTs for: graceful shutdown handlers (`process.on('SIGTERM'|'SIGINT', ...)`); process-level safety nets (`process.on('unhandledRejection'|'uncaughtException', ...)`); retry library imports + call sites (p-retry / async-retry / axios-retry / exponential-backoff / cockatiel); manual retry loops (for/while + try/catch + setTimeout — heuristic); Express global error middleware (4-arg handler signature); Fastify setErrorHandler; new AbortController() instantiation; Idempotency-Key header reads. Public report grows a `patterns` field with `ChaosPatternHit[]` per category. New findings: critical "no try/catch anywhere"; high "no graceful shutdown" / "no global error handler"; medium "no retry/backoff" / "no unhandledRejection guard"; medium "payment code without Idempotency-Key" (only fires when stripe/payment deps detected). Pass 6 covered circuit breakers + outbound timeouts; pass 10 doesn't duplicate those. Tests: 109 → 118. New fixtures `tests/fixtures/chaos-resilient/` (every pattern wired) and `tests/fixtures/chaos-fragile/` (nothing wired, plus comment-trap mentioning the keywords).
- 0.17.0 — Dimension deepening, pass 9. License compliance check rewritten from broken to functional. The previous version had a
`knownGPL` list containing `['react', 'vue', 'angular', 'moment', 'underscore']` — all of which are MIT-licensed. It also never populated the `copyleftDeps` array it promised in its return type. The new version: walks `node_modules/` (when present) and reads each package's `license` field; categorizes per SPDX into permissive (MIT/ISC/Apache-2.0/BSD-*/0BSD/CC0/Unlicense), copyleftWeak (LGPL/MPL/EPL — LGPL correctly classified as weak even though it contains "GPL"), copyleftStrong (GPL/AGPL/OSL/SSPL — incompatible with proprietary distribution), proprietary (UNLICENSED, "SEE LICENSE IN …"), or unknown (missing field). Emits per-category findings: strong copyleft = HIGH severity with the warning about source-disclosure obligations; weak copyleft = MEDIUM with the linking-exception caveat; UNLICENSED = MEDIUM; missing field = LOW. When `node_modules/` isn't present, emits an honest "license audit could not run" finding instead of silently returning a fake clean report. Public report shape grows: `inspected`, `byCategory` (counts per category), `strongCopyleft` / `weakCopyleft` (full package lists), plus the existing fields preserved. `runLicenseCheck(deps, projectPath?)` — back-compat, but you need projectPath for the audit to actually run. Both `index.ts` and `mcp-server.ts` call sites updated. Tests: 101 → 109. New fixture `tests/fixtures/license-mixed/` with a synthetic `node_modules/` containing MIT, GPL-3.0, LGPL-2.1, UNLICENSED, no-license, and a scoped `@scope/scoped-mit` package; 8 tests including a focused unit test for `categorizeLicense` covering 11 edge cases.
- 0.16.0 — Dimension deepening, pass 8. Supply-chain audit becomes lockfile-aware. New
`lib/supply-chain.ts` reads `package-lock.json` (lockfileVersion 2/3, npm v7+), surfaces the full transitive dependency graph, and adds detection for: (1) non-registry sources — packages installed via `git+`, `github:`, `file:`, `link:`, or `http://` (skip npm's tarball signing); (2) missing integrity hashes — registry-resolved entries with no `integrity` SRI; (3) duplicate-version drift — same package resolved to multiple versions; (4) transitive CVE matches — the existing hardcoded vuln list now scans EVERY entry in the lockfile, not just direct deps. New "no lockfile" finding when a project has no `package-lock.json` at all. CVE catalogue extended to include minimist, word-wrap, jsonwebtoken alongside the existing list. Public report shape grows fields: `totalTransitive`, `nonRegistrySources`, `missingIntegrity`, `duplicateVersions`. `runSupplyChainAudit(deps, devDeps, projectPath?)` is back-compat: old two-arg callers still work but only see direct-dep CVEs (with the new "no lockfile" finding emitted to flag the limitation). Both `index.ts` (HTTP server) and `mcp-server.ts` (test runner) call sites updated to pass projectPath. Tests: 93 → 101. New fixtures `tests/fixtures/supply-chain-dirty/` (lock with all four red flags) and `tests/fixtures/supply-chain-clean/` (negative).
- 0.15.0 — Dimension deepening, pass 7. OWASP Top 10 (2021) coverage redesigned to be honest about what the analyzer can and can't detect. The previous version counted "categories with any finding" as "covered" — which inverted the meaning (more vulnerabilities = higher score). The new report distinguishes three orthogonal signals: analyzer-coverage (which categories the analyzer ships rules for, project-independent), project-findings (per-category severity breakdown from this project's findings), and gaps (categories the analyzer doesn't yet cover — currently A08 Software Integrity and A10 SSRF, explicitly flagged). Public report grows
`byCategory: OwaspCategoryReport[]` with severity-bucketed counts per code and the detector categories that contributed to each. New rollup findings: any OWASP code with ≥1 critical or ≥3 high findings surfaces as a category-level finding (e.g. `A03:2021 — Injection: 4 finding(s)` with severity breakdown) so dashboards can show the OWASP framing. New `lib/owasp-map.ts` is the single source of truth for security-category → OWASP code mapping (a finding can map to multiple codes; CORS now correctly maps to A05, not A01). Tests: 86 → 93. New cases assert score-stability across finding count (analyzer-coverage doesn't change when project findings change), correct bucketing, rollup triggers, no-false-positive on sparse low-severity findings, plus an integration test mapping vulnerable-fixture findings into OWASP categories.
- 0.14.0 — Dimension deepening, pass 6. Load analyzer moves from substring-matching to AST-based middleware + call-pattern detection. New
`lib/load-patterns.ts` walks parsed ASTs for: `app.use(rateLimit(...))` / `fastify.register(fastifyRateLimit)` style middleware registration; `app.use(compression())`; cache calls (`redis.get/set`, `cache.get/set`, etc. — receiver-name discriminated); pool constructions (`new Pool({...})`, `mysql.createPool`); timeout configurations (`server.timeout = N`, `axios.create({ timeout })`, `fetch(url, { signal })`); health endpoints (actual route registration matching `/health`, `/ready`, `/live`, `/healthz`, `/status`); circuit breaker imports (opossum/brakes/cockatiel) and `breaker.fire()` calls; sync I/O inside route handlers (`readFileSync` / `writeFileSync` / `execSync` etc. — a new HIGH-severity finding for a real production performance bug). Public report shape grows a `patterns` object with file+line locations for every detected hit. Bug fixes: circuit-breaker rule had a precedence bug (`!hasCircuitBreaker && allContent.includes('fetch') || allContent.includes('axios')`) that fired on every codebase containing the word "axios"; now it correctly requires both external-call presence AND missing breaker. Boolean flags (`hasRateLimiting`, `hasCaching`, etc.) are now backed by AST hits OR explicit strong-evidence deps (not loose substring matches like `'cache'`). Tests: 79 → 86. New fixtures: `tests/fixtures/load-resilient/` (every pattern correctly wired) and `tests/fixtures/load-fragile/` (nothing wired, plus sync I/O in handlers, plus comments that mention the keywords as a regex trap).
- 0.13.0 — Dimension deepening, pass 5. Accessibility analyzer for JSX/TSX moves from line-by-line regex to AST-based JSX attribute inspection. New
`lib/a11y-jsx.ts` walks `JSXElement` nodes (parent of `JSXOpeningElement` so it can see children) and runs proper attribute-aware checks. New rules: `img-no-alt` (HIGH, WCAG 1.1.1); `button-no-accessible-name` (HIGH, WCAG 4.1.2) — icon-only buttons that have only a self-closing `<svg/>` child with no `aria-label` are now correctly flagged; `anchor-no-accessible-name` (HIGH, WCAG 2.4.4); `anchor-target-blank-no-noopener` (MEDIUM — tab-nabbing risk); `input-no-label` (MEDIUM, WCAG 3.3.2) — excludes `type="hidden" | "submit" | "button"`; `clickable-non-interactive` (MEDIUM, WCAG 2.1.1) — `<div onClick>` without `role="button" tabIndex={0}`; `aria-empty` (MEDIUM) — `aria-label=""` is worse than no aria-label. Children-aware accessible-name resolution: text content, expression containers (string/template/identifier), and recursive child JSX elements all contribute. Same-content `<a target="_blank" rel="noopener noreferrer">` no longer fires the tab-nabbing rule (rel parses correctly). Findings carry the matching WCAG criterion in the public `wcagCriterion` field. HTML/Vue/Svelte files keep the prior regex path (Babel doesn't parse them). Tests: 69 → 79. New fixture `tests/fixtures/a11y-jsx/` with 11 JSX patterns covering 6 anti-patterns + their accessible counterparts; 10 tests assert each detection plus negative cases for the accessible variants and `<input type="hidden">`.
- 0.12.0 — Dimension deepening, pass 4. Predictive failures goes from 5 project-level heuristic counts to cross-signal per-file risk aggregation. New
`lib/predictive.ts` ingests signals from other dimensions (security findings by severity, N+1 hits, dead exports) plus AST-derived cyclomatic complexity (new `lib/complexity.ts`) and TODO/FIXME density per file, and produces a ranked list of risk hotspots. Each hotspot carries a `reasons[]` breakdown so you know exactly why a file scored high (e.g. "security: 2 critical · 1 N+1 hit · hot function (cc=23 in `processOrder`)"). The aggregator runs in two modes: standalone (derives N+1, dead-code, complexity itself) or cross-signal (caller passes pre-computed findings — preferred when running the full pipeline). The dimension's public report grows a `topRiskyFiles: FileRisk[]` field; up to 5 hotspots also surface as findings with `category: 'Predictive'` and severity scaling with the per-file score. Deterministic by construction — same inputs → same scores; weights centralized in `lib/predictive.ts`. Replaces the previous brace-counting "max nesting" heuristic which over-fired on arrow functions and template literals. Tests: 64 → 69. New test cases cover hotspot surfacing, cross-signal aggregation, multi-reason scoring, the `Risk hotspot:` finding shape, and the Low-risk no-signal floor.
- 0.11.0 — Dimension deepening, pass 3. Contract analysis goes from a substring check on filenames to real cross-referencing between OpenAPI/Swagger specs and AST-discovered routes. New
`lib/openapi-parse.ts` loads `openapi.{yaml,yml,json}` / `swagger.*` / `api-spec.*` files (parses with js-yaml, validates the `openapi:` / `swagger:` root, extracts `paths` → method tuples + operationIds). New `lib/endpoint-discovery.ts` AST-walks source files for `app.get` / `router.post` / `fastify.put` / etc., recording the canonical `(method, path)` of each registration. `canonicalPath` normalizes `/users/{id}` (OpenAPI) and `/users/:id` (Express) to the same shape so they match. New findings emitted: undocumented endpoints (in code, missing from spec), orphan endpoints (in spec, no handler in code), invalid-but-named spec files (file looks like a spec but no `openapi:` root / parse error), and a smarter missing-versioning check (only fires when a spec exists to compare). `code-scanner.ts` now loads YAML/JSON files (with package-lock and yarn-lock explicitly excluded). Tests: 58 → 64. New fixtures: `tests/fixtures/contracts/` (spec + Express server with intentional mismatches: 2 matched, 2 undocumented, 1 orphan) and `tests/fixtures/contracts-missing/` (8 endpoints, no spec at all).
- 0.10.0 — Dimension deepening, pass 2. Unit-analyzer goes from a regex test-counter to an AST-aware test-quality analyzer. New
`lib/test-quality.ts` walks each parseable test file and produces a structured `TestFileQuality` per file. New report shape carries a `quality` block at the top level: `{ totalCases, skippedCases, focusedCases, assertionlessCases, emptyCases, isolatedTestFiles }`. Detection of new anti-patterns: `it.skip` / `xit` / `it.todo` (skipped — rot risk); `it.only` / `fit` (focused — silently kills sibling tests in CI); test bodies with NO recognized assertion call (expect/assert/should/t.X/snapshot matchers from Jest/Vitest/Mocha/Chai/AVA/tap/node:test/Testing Library); empty test bodies (only comments / trivial statements); test files that import nothing project-relative (testing only the framework). Recognized frameworks expanded: Jest, Vitest, Mocha, AVA, Node Tap, node:test, Testing Library, Chai. Tests: 51 → 58. New fixture `tests/fixtures/test-quality/` with healthy + 5 unhealthy patterns; 7 tests assert each detection.
- 0.9.0 — Dimension deepening, pass 1. The keepers across the 21 dimensions stay at 21; the depth inside each one grows. This pass takes N+1 detection and dead-code detection from regex-and-substring heuristics to AST-aware analysis using the same Babel + visitor + cross-file infrastructure the security spine uses.<br><br>**N+1 detection** — new
`lib/n-plus-one.ts`. Walks parsed ASTs for db sinks (`.query` / `.exec` / `.findOne` / `.findUnique`, `` sql` ``, `prisma.x`, `mongoose.x`, `sequelize.x`) nested inside `for` / `for-of` / `for-in` / `while` / `do-while`, plus the higher-order `arr.forEach/map/filter/reduce/some/every/find/flatMap` forms (callback body = loop body). Skips calls already wrapped in `Promise.all` / `Promise.allSettled` (parallelised, not N+1). Replaces the prior `{` / `}` line-counter that over-fired on inner closures and missed db calls in arrow-function loop bodies.<br><br>**Dead-code detection** — new `lib/dead-code.ts`. For each project file, the AST yields its declared/exported symbols + every referenced identifier + every imported module specifier. An exported symbol is "dead" iff no OTHER file references its name. Replaces the prior `allContent.includes(name)` heuristic that flagged nothing because every symbol's own declaration line contained its name. **Unused-deps** check now matches on the module ROOT (`lodash`) so `import { get } from 'lodash/get'` counts as a use — covers a common false-positive that wrongly flagged sub-path-only imports.<br><br>Tests: 41 → 51. New fixtures at `tests/fixtures/n-plus-one/` (3 positive cases: for-of, forEach, classic for; 2 negative cases: `Promise.all`-wrapped, no-loop) and `tests/fixtures/dead-code/` (used vs. unused exports, sub-path import, genuinely-unused dep). Limitations called out in the source: cross-file dead-code is name-based (global-scope collisions over-count as "used"), and N+1 doesn't follow function calls into closures (intentional — would explode FP rate).
- 0.8.1 — Patch. Internal cleanup matching the repo-wide lint backlog clearance (127 → 0 errors). Type tightening across
`mcp-server.ts`, `local-db.ts`, `types.d.ts`: replaced `any` / `as any` with `Awaited<ReturnType<…>>`, `unknown` with narrowing, and concrete shapes (e.g. new `ReportRow`). Dead-code purges (unused regex constants in `accessibility-analyzer.ts`, unused `findParamForExpression` in `function-summaries.ts`, dead imports in `mcp-server.ts` / `test-runner.ts`). `catch (err: any)` → `(err)` then `(err as Error).message`. No behavior changes; same analyzer outputs on the same fixtures (41/41 tests pass).
- 0.8.0 — Spine, Phase 4c. User-authored rules DSL. Projects can drop a
`.testforge/rules.yaml` (or `.yml` / `.json`) at the repo root to declare custom pattern detectors that ride on top of the built-in analyzer — no fork required. Each rule has `id`, `title`, `severity`, `category`, an optional `description` / `fixSuggestion`, and a `match` block. Match shapes in v1: `callee` (exact dotted match, string or array), `calleeRegex` (anchored as written), `taintedArg` (require the arg at this index to come back tainted via the Phase 2 engine), and `argRegex` (require the string-literal arg at this index to match). Taint-gated rules get HIGH confidence (real source-to-sink flow); shape-only rules get MEDIUM. Malformed rules log a one-shot warning and are skipped — one bad rule never aborts analysis. Up to 200 rules per project. Rules can also be supplied programmatically via the new `userRules?: UserRule[]` config field (overrides the on-disk file). Tests: 36 → 41; new `tests/fixtures/user-rules/` exercises all three match shapes plus the no-fire negative paths.
- 0.7.0 — Spine, Phase 4b. Cross-file taint propagation. New
`lib/cross-file-summaries.ts` walks every parseable file in a single pre-pass, computes the per-file function summary table (from Phase 4a), then publishes the ones that carry sinks under `<resolvedPath>::<exportName>` keys. A companion `lib/module-resolver.ts` resolves relative imports against the candidate file set (`.ts`, `.tsx`, `.mts`, `.cts`, `.js`, `.jsx`, `.mjs`, `.cjs`, plus `/index.*` directory-imports and explicit-extension swaps) without touching disk. Each file gets its own `collectFileImports` map of "local-name → cross-file key" — handles ESM (`import { x }`, `import x`, `import * as ns`) and CJS (`const { x } = require(...)`, `const x = require(...).y`, `const ns = require(...)`). The analyzer's `checkCrossFunctionSinkCall` now consults the cross-file index for both direct identifier calls and `ns.X` member calls, emitting findings at the call site of the importing file. Deferred: re-exports (`export { x } from './y'`), tsconfig path aliases, node_modules resolution, dynamic `require()`. Tests: 30 → 36; new `helpers/db-helper.js` (CJS) + `helpers/redirect-helper.js` (ESM) + `cross-file-cjs.js` + `cross-file-esm.js` fixture set.
- 0.6.0 — Spine, Phase 4a. Cross-function taint propagation (intra-file). New
`lib/function-summaries.ts` builds a per-file table summarizing each named/aliased function: which parameters land in a sink (and which category), which sanitizers wrap them, whether the return value propagates taint. The analyzer then emits findings at the call site when a helper with a sink summary is called with tainted arguments. Catches `function runQuery(q) { db.query(q); }` + `runQuery('...' + req.body.x)` as critical/high SQL injection. Handles named declarations, aliased function expressions (`const fn = function() {…}`), arrow functions (`const fn = (a, b) => …`). Per-helper intra-procedural taint runs to a small fixpoint so chains `param → const A = param + '…' → const B = A → sink(B)` resolve cleanly. Deferred: cross-file resolution (Phase 4b), higher-order references like `[].map(handler)`. Tests: 25 → 30, new `cross-function.js` fixture covering SQL inj / open redirect / path traversal / XSS via helpers.
- 0.5.0 — Spine, Phase 3. Structured fix suggestions. Each finding can now carry
`fix: { description, before, after, importsNeeded?, applicable }`. `applicable: true` means "safe to apply mechanically" — the dashboard / CLI can offer a one-click apply (still asking confirmation). `applicable: false` means "directional advice, the rewrite needs human judgment." Categories that auto-rewrite: SQL injection (concat / template → parameterized form with `$N` placeholders + bind array), hardcoded named secrets (`const api_key = 'sk_…'` → `const api_key = process.env.API_KEY`), reflected XSS via `res.send` (wrap argument with `escape()`), innerHTML / dangerouslySetInnerHTML (wrap with `DOMPurify.sanitize(...)`). Description-only suggestions for `eval` / `Function` / `exec`, open redirect, path traversal, CORS wildcard, sensitive field in `res.json` (destructure-omit). Public response shape stays additive; old consumers unaffected.
- 0.4.0 — Spine, Phase 2. Generalized intra-procedural taint tracking across all sinks (was only SQL injection in 0.3.0). New
`lib/taint.ts` engine: per-file table of `Map<localName, {source, sanitizers[]}>`, expression-tree walker that traces taint through identifiers, member access, template literals, string concat, conditional/logical ops, and `JSON.parse`. Recognizes 20+ sanitizers (DOMPurify, sanitize-html, escape, path.normalize, parseInt/Number, encodeURIComponent, allowlist.includes()/.has()). New per-finding `flow` field — narrative like "argument flows from request through DOMPurify.sanitize". `confidence` semantics tightened: HIGH = source→sink no sanitizer, MEDIUM = sanitizer in path, LOW = pattern matched without taint. All 6 sink categories (SQL inj, RCE, path traversal, open redirect, reflected XSS, DOM XSS) now share the same engine — adding a new source or sanitizer extends all of them at once.
- 0.3.0 — Spine, Phase 1. Security analyzer moved from line-level regex to a Babel AST traversal. New per-finding
`confidence` field (high/medium/low). Inline suppression comments (`// testforge-disable-next-line <category>` and `// testforge-disable-file <category>`). Findings now carry a `column` number alongside the line. File-size cap (500 KB) and per-file 250 ms parse-and-traverse budget. Basic intra-procedural taint: SQL injection detection catches `const q = '…' + req.x; db.query(q);` shape, not just inline interpolation. False-positive corpus and true-positive corpus added under `tests/fixtures/` to lock in the new precision. `eval()` re-categorized from XSS to "Dangerous Functions" (more accurate — it's RCE, not script-injection). Old consumers unaffected: the public response shape is additive-only.
- 0.2.19 —
`/test` and `/quick-scan` now persist their summary to `~/.testforge/history.db` on completion (previously written to in-memory Maps only — runs evaporated on restart).
- 0.2.18 — Default port changed from
`3001` → `33221` to avoid local-dev collisions. `/api/reports/latest` returns 404 when the local DB is empty instead of fabricated seed data. `fast-json-stringify` listed as direct dep (defensive against npx cache quirks). `/health` now reports the actual package version.
- 0.2.17 and earlier — see git history.
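
The circuit-breaker precedence bug described in the 0.14.0 entry comes down to `&&` binding tighter than `||` in JavaScript. The sketch below reproduces that behavior in isolation. Only the buggy expression is quoted from the changelog; the function names `buggyRule` and `fixedRule` are hypothetical, and the fixed form is an assumption about what "requires both external-call presence AND missing breaker" means, not the shipped code.

```typescript
// Hypothetical helpers illustrating the 0.14.0 fix; not TestForge's actual source.

// Buggy shape quoted in the changelog. Because && binds tighter than ||, this
// parses as (!hasCircuitBreaker && includes('fetch')) || includes('axios').
function buggyRule(hasCircuitBreaker: boolean, allContent: string): boolean {
  return !hasCircuitBreaker && allContent.includes('fetch') || allContent.includes('axios');
}

// Assumed fix: group the external-call check first, then require a missing breaker.
function fixedRule(hasCircuitBreaker: boolean, allContent: string): boolean {
  const makesExternalCalls = allContent.includes('fetch') || allContent.includes('axios');
  return makesExternalCalls && !hasCircuitBreaker;
}

const source = "import axios from 'axios'; // breaker is configured elsewhere";

console.log(buggyRule(true, source));  // true: fires even though a breaker exists
console.log(fixedRule(true, source));  // false: breaker present, nothing to report
console.log(fixedRule(false, source)); // true: external calls without a breaker
```

Naming the intermediate `makesExternalCalls` value, rather than only adding parentheses, keeps the intent readable and makes the same mistake harder to reintroduce.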
License
MIT
