@yasserkhanorg/impact-gate
v2.4.0
Impact Gate
Shift QA left: analyze your PR diff, identify affected user flows, and generate the exact E2E tests you need. One command takes you from git diff to a ready-to-run test file.
What It Does
impact-gate review --generate is the primary workflow. Given a git diff, it:
- Identifies which user flows your PR affects (behavior analysis)
- Finds existing test coverage for those flows
- Generates the actual E2E test files for uncovered flows
- Outputs ready-to-run test code
npx impact-gate review --path . --since origin/main --generate
That's it. The developer runs the test, commits it, done.
The same diff-aware engine works for pull requests against main, release branches, hotfixes, and any "what changed between these refs?" check. Without --generate, the review command outputs a human-readable report (or a CI PR comment with --ci-comment-path).
Product Shape
| Level | Commands | What They Are For |
|------|----------|-------------------|
| Recommended | review, review --generate | Unified PR review: what changed, what's tested, what's missing, and optionally generate the missing tests |
| Core CI Workflow | impact, plan, gate | Lower-level building blocks: impact analysis, coverage planning, and CI gating |
| Defect Prediction | predict, predict-feedback | Research-backed defect risk scoring from git diffs, with optional LLM semantic analysis and calibration |
| AI Generation | generate, heal, analyze, finalize-generated-tests | Standalone test generation and healing (review --generate is the preferred entry point) |
| Setup and Calibration | init, train, bootstrap, traceability-*, feedback, cost-report, llm-health | Build the manifest, feed execution data back in, and inspect cost/provider health |
| Advanced / Experimental | crew, MCP mode, plugins, impact-gate-qa | Deeper orchestration and browser-driven workflows |
Known Limitations
- AI test generation requires an LLM provider: an Anthropic or OpenAI API key, or a local Ollama instance. The review report works without one.
- Generation quality improves with a good route-families.json manifest and a knowledge graph.
- The review command without --generate (and without --deep) is fully deterministic and free.
- The strict profile is the most opinionated path. Start with defaults and opt into stricter heuristics once your mappings are mature.
Free Tier
These commands work with zero LLM cost and do not require an API key:
| Command | What It Does |
|---------|-------------|
| review | Unified PR review: behavior analysis, coverage gaps, defect risk, test recommendations |
| impact | Deterministic impact analysis from a git diff |
| plan | Coverage-gap detection and recommended run set |
| gate | CI coverage gate that exits non-zero below a threshold |
| predict | Research-backed defect risk scoring from a git diff |
| train --no-enrich | Build route-families.json with the scanner only |
| bootstrap | Generate route-families.json from a knowledge graph |
| traceability-capture | Extract test-file relationships from Playwright JSON reports |
| traceability-ingest | Merge traceability mappings into the rolling manifest |
| feedback | Ingest recommendation outcomes for calibration |
| predict-feedback | Record prediction outcomes for calibration |
| cost-report | View LLM cost breakdown from past runs |
AI features (review --generate, review --deep, predict --deep, generate, heal, and train with its default LLM enrichment) use Anthropic, OpenAI, or a local Ollama instance.
Start Here
Install:
npm install -D @yasserkhanorg/impact-gate
Requires Node.js >= 20. Ships both CommonJS and ESM builds.
Review a PR (recommended starting point)
# See what your PR changes, what's tested, and what's missing
npx impact-gate review --path . --since origin/main
This outputs a behavior-aware report: which user flows changed, existing test coverage, coverage gaps, defect risk score, and specific test recommendations. No API key needed.
Generate tests for uncovered flows
# Review + generate ready-to-run test files for the gaps
npx impact-gate review --path . --since origin/main --generate
# Dry run: generate test files without executing them
npx impact-gate review --path . --since origin/main --generate --dry-run
Requires an LLM provider: ANTHROPIC_API_KEY, OPENAI_API_KEY, or a local Ollama instance (OLLAMA_BASE_URL). The generator uses your project's existing test patterns, page objects, and API surface to produce grounded test code.
Review options
# Deep mode: add LLM-powered semantic risk analysis
npx impact-gate review --path . --since origin/main --deep
# JSON output for tooling
npx impact-gate review --path . --since origin/main --json
# Write a PR comment file for CI
npx impact-gate review --path . --since origin/main --ci-comment-path comment.md
# Custom output directory for generated tests
npx impact-gate review --path . --since origin/main --generate --generate-output ./generated-tests
CI gating
For CI pipelines that need a pass/fail gate:
# Exit non-zero if coverage is below threshold
npx impact-gate gate --path . --threshold 80
# Exit non-zero if defect risk exceeds threshold
npx impact-gate review --path . --since origin/main --predict-threshold 0.7
Lower-level commands
The review command combines impact, plan, and predict into one. You can still use them individually:
# Just the impact analysis
npx impact-gate impact --path . --since origin/main
# Just the coverage plan
npx impact-gate plan --path . --since origin/main
# Release readiness check against a tag
npx impact-gate plan --path . --since v2.1.0
Notes:
- impact prints a deterministic summary to stdout.
- plan writes .e2e-ai-agents/plan.json and .e2e-ai-agents/ci-summary.md.
- gate expects a threshold in the range 0-100 and exits 1 when the threshold is missed.
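The gate rule in the last note amounts to a percentage comparison. Below is a minimal TypeScript sketch, assuming coverage is the share of affected flows that already have a matching spec (this definition is an assumption; the tool's exact formula is not documented here). `gateExitCode` is a hypothetical helper, not part of the package API:

```typescript
// Hypothetical helper illustrating the gate rule; not exported by impact-gate.
// Assumption: coverage = covered affected flows / all affected flows, as a 0-100 percentage.
function gateExitCode(coveredFlows: number, totalFlows: number, threshold: number): 0 | 1 {
  if (!(threshold >= 0 && threshold <= 100)) {
    throw new RangeError(`threshold must be in the range 0-100, got ${threshold}`);
  }
  // Assumption: when no flows are affected there is nothing to miss, so treat coverage as 100%.
  const coverage = totalFlows === 0 ? 100 : (coveredFlows * 100) / totalFlows;
  // Exit 1 only when coverage falls strictly below the threshold.
  return coverage < threshold ? 1 : 0;
}

console.log(gateExitCode(7, 10, 80)); // 1 (70% < 80)
console.log(gateExitCode(8, 10, 80)); // 0 (80% meets the threshold)
```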
Defect Prediction
Research-backed defect risk scoring that works on any repo with zero config. No LLM required.
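As rough intuition for how change-level risk scoring works, the sketch below combines two ingredients: the normalized entropy of how a change's modified lines are spread across files (one of the change metrics in Kamei et al. 2013, cited below) and a logistic model whose output is compared against a threshold such as 0.7. The choice of metrics and every weight are illustrative assumptions, not impact-gate's trained model:

```typescript
// Illustrative only: these metrics and weights are assumptions, not impact-gate's model.
interface ChangeMetrics {
  filesTouched: number;      // NF in Kamei et al.'s terminology
  linesAdded: number;        // LA
  linesDeleted: number;      // LD
  normalizedEntropy: number; // 0..1, how scattered the change is across files
}

// Shannon entropy of modified lines across files, normalized by log(n) so the
// result lies in [0, 1]. A change confined to a single file has entropy 0.
function normalizedChangeEntropy(linesChangedPerFile: number[]): number {
  const touched = linesChangedPerFile.filter((n) => n > 0);
  const total = touched.reduce((sum, n) => sum + n, 0);
  if (touched.length <= 1) return 0;
  let h = 0;
  for (const n of touched) {
    const p = n / total;
    h -= p * Math.log(p);
  }
  return h / Math.log(touched.length);
}

// Logistic model: risk = 1 / (1 + e^(-z)), where z = bias + weighted metrics.
// The bias and weights here are made up for illustration.
function riskScore(m: ChangeMetrics): number {
  const z =
    -3.0 +
    0.15 * m.filesTouched +
    0.004 * (m.linesAdded + m.linesDeleted) +
    1.5 * m.normalizedEntropy;
  return 1 / (1 + Math.exp(-z));
}

const perFile = [120, 80, 40, 10]; // lines changed in each of four files
const metrics: ChangeMetrics = {
  filesTouched: perFile.length,
  linesAdded: 200,
  linesDeleted: 50,
  normalizedEntropy: normalizedChangeEntropy(perFile),
};
const risk = riskScore(metrics);
console.log(`risk ${risk.toFixed(2)}: ${risk > 0.7 ? "exceeds" : "within"} a 0.7 threshold`);
```

A real model learns its weights from labeled outcomes, which is the role the feedback and retraining commands play.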
# Score a PR against main
npx impact-gate predict --path . --since origin/main
# With LLM semantic analysis (~$0.02/PR)
npx impact-gate predict --path . --since origin/main --deep
# CI gate: exit 1 if risk exceeds threshold
npx impact-gate predict --path . --since origin/main --predict-threshold 0.7
The engine combines 14 change-level metrics (Kamei et al. 2013), code complexity deltas (Hassan 2009), and an optional LLM semantic layer that flags removed error handling, weakened validation, and risky patterns. Scores improve over time with feedback:
# Record outcome after a PR ships
npx impact-gate predict-feedback --outcome clean --ref abc123
# Retrain weights after 50+ labeled samples
npx impact-gate predict --train
Function-Level Accuracy (Knowledge Graph)
For function-level impact analysis, generate a code knowledge graph:
# Graphify: deterministic AST extraction, 20 languages, no LLM cost
pip install graphifyy
graphify .
# impact-gate auto-detects the KG and enables function-level output
npx impact-gate review --path . --since origin/main
With a KG, the review shows which specific functions are untested:
Untested Functions:
❌ ClearChannelManagedCategory (channel_category.go) -- called by: patchChannel
Tested Functions:
✅ SetChannelManagedCategory -- tested by: managed_categories.spec.ts
impact-gate also supports Understand-Anything knowledge graphs. See the Knowledge Graph Guide for details.
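Conceptually, the function-level view answers two questions per changed function: who calls it, and which specs reach it. Here is a minimal sketch, assuming a simplified graph made of call edges and spec-to-function edges (the real Graphify and Understand-Anything formats are different and are not reproduced here). Only direct test edges are considered, and the names reuse the sample output above:

```typescript
// Hypothetical, simplified knowledge-graph shape; not the Graphify or Understand-Anything schema.
interface KnowledgeGraph {
  calls: Array<[string, string]>;    // [caller, callee]
  testedBy: Array<[string, string]>; // [spec file, function it exercises]
}

interface FunctionReport {
  fn: string;
  callers: string[];
  specs: string[];
}

// Splits changed functions into untested and tested, using direct edges only.
function classifyChangedFunctions(kg: KnowledgeGraph, changed: string[]) {
  const untested: FunctionReport[] = [];
  const tested: FunctionReport[] = [];
  for (const fn of changed) {
    const callers = kg.calls.filter(([, callee]) => callee === fn).map(([caller]) => caller);
    const specs = kg.testedBy.filter(([, target]) => target === fn).map(([spec]) => spec);
    (specs.length > 0 ? tested : untested).push({ fn, callers, specs });
  }
  return { untested, tested };
}

const kg: KnowledgeGraph = {
  calls: [["patchChannel", "ClearChannelManagedCategory"]],
  testedBy: [["managed_categories.spec.ts", "SetChannelManagedCategory"]],
};
const report = classifyChangedFunctions(kg, ["ClearChannelManagedCategory", "SetChannelManagedCategory"]);
for (const r of report.untested) console.log(`❌ ${r.fn} -- called by: ${r.callers.join(", ")}`);
for (const r of report.tested) console.log(`✅ ${r.fn} -- tested by: ${r.specs.join(", ")}`);
```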
Dogfood Proof
The current repo includes a full dogfood run at dogfood/2026-03-28/README.md.
- Playwright example: synthetic auth change -> impact, plan, and gate all behaved as expected
- Cypress example: synthetic dashboard change -> parity proof for the same deterministic flow
- Self dogfood: heuristic fallback grouped changes truthfully, but still read too optimistically for a package-style repo
Takeaway:
- the strongest product path is still an app-shaped Playwright/Cypress repo with a maintained manifest
- zero-config / heuristic fallback is useful for orientation, but it should not be treated as equally trustworthy for release decisions
Setup and Calibration
These commands help the core CI workflow become accurate and project-aware.
# Build the manifest from the repo structure
npx impact-gate train --path /path/to/project --no-enrich
# Or bootstrap it from an Understand-Anything knowledge graph
npx impact-gate bootstrap --path /path/to/project [--kg-path ./knowledge-graph.json]
# Feed execution data back into the manifest
npx impact-gate traceability-capture --path /path/to/project --traceability-report ./playwright-report.json
npx impact-gate traceability-ingest --path /path/to/project --traceability-input ./traceability-input.json
# Calibration and diagnostics
npx impact-gate feedback --path /path/to/project --feedback-input ./feedback.json
npx impact-gate cost-report --path /path/to/project
npx impact-gate llm-health
AI-Powered Test Generation
The recommended way to generate tests is through review --generate, which feeds the review's uncovered recommendations directly into the test generator:
npx impact-gate review --path . --since origin/main --generate
For standalone generation (e.g., from a pre-built plan), the individual commands are still available:
# Generate tests from a plan or scenario file
npx impact-gate generate --path /path/to/project [--scenarios <path>]
# Heal flaky or failing specs from a Playwright report
npx impact-gate heal --path /path/to/project --traceability-report ./playwright-report.json
# Stage generated tests, commit, and optionally open a PR
npx impact-gate finalize-generated-tests --path /path/to/project --create-pr
# All-in-one wrapper (legacy): impact + coverage + optional generation/healing
npx impact-gate analyze --path /path/to/project [--generate] [--heal]
How Hallucinations Are Tackled
The AI path is intentionally constrained instead of trusting raw LLM output.
- Deterministic first: impact analysis, coverage planning, and release-diff planning work without an LLM. The AI layer comes after the diff and coverage evidence are already established.
- Local API surface grounding: generation prompts are built from discovered page objects, helpers, method signatures, and inherited methods from your own repository.
- Prompt-level constraints: the generator is explicitly told to use only known methods and to fall back to raw Playwright selectors when a method is not available.
- Prompt sanitization: flow names, evidence, and user-action strings are sanitized before being injected into prompts.
- Hallucination detection gate: generated code is scanned for method calls that do not exist in the discovered API surface. Suspicious specs are blocked by default instead of being written into the main specs directory.
- Needs-review quarantine: blocked specs are written to generated-needs-review/ so teams can inspect them manually rather than accidentally trusting them in CI.
- Verification after generation: written specs go through compile checks and smoke-run verification. Failing specs are moved out of the trusted path.
This is why the strongest product story is still: deterministic diff -> test plan -> optional AI assistance with guardrails.
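The hallucination detection gate boils down to comparing the methods a generated spec calls against the discovered API surface. Below is a minimal regex-based sketch, assuming the API surface is a map from page-object variable names to their known methods. A production check would more likely walk a TypeScript AST, and `findUnknownCalls` is hypothetical:

```typescript
// Hypothetical API surface: page-object variable name -> known method names,
// including inherited ones. The real discovery step is not shown here.
type ApiSurface = Map<string, Set<string>>;

interface UnknownCall {
  receiver: string;
  method: string;
  line: number;
}

// Flags `receiver.method(` calls where the receiver is a known page object but the
// method is not in its discovered surface. Other receivers (page, expect, ...) are ignored.
function findUnknownCalls(code: string, surface: ApiSurface): UnknownCall[] {
  const found: UnknownCall[] = [];
  const callPattern = /\b([A-Za-z_$][\w$]*)\.([A-Za-z_$][\w$]*)\s*\(/g;
  code.split("\n").forEach((text, index) => {
    callPattern.lastIndex = 0;
    let match: RegExpExecArray | null;
    while ((match = callPattern.exec(text)) !== null) {
      const receiver = match[1];
      const method = match[2];
      const known = surface.get(receiver);
      if (known && !known.has(method)) {
        found.push({ receiver, method, line: index + 1 });
      }
    }
  });
  return found;
}

const surface: ApiSurface = new Map([
  ["channelsPage", new Set(["goto", "createChannel", "archiveChannel"])],
]);
const generated = [
  "await channelsPage.goto();",
  "await channelsPage.createChannel('town-square');",
  "await channelsPage.renameChannelFast('town-square');",
].join("\n");

const unknown = findUnknownCalls(generated, surface);
// In the flow described above, any hit would block the spec and route it for manual review.
console.log(unknown.length > 0 ? "blocked for review" : "ok", unknown);
```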
Advanced / Experimental
These features are real, but they are not the clearest place to start if your goal is simple CI coverage decisions.
Multi-Agent Crew
The Crew orchestrates deeper multi-agent workflows on top of the same impact-analysis foundation. Use it when you want richer strategy output, structured test design, or end-to-end generation pipelines.
# Quick strategy recommendations
npx impact-gate crew --workflow quick-check --path /path/to/project --tests-root ./e2e-tests --since origin/master
# Full design-only workflow
npx impact-gate crew --workflow design-only --path /path/to/project --tests-root ./e2e-tests --since origin/master
# End-to-end workflow
npx impact-gate crew --workflow full-qa --path /path/to/project --tests-root ./e2e-tests --since origin/master
Built-in safeguards include budget enforcement, provider circuit breaking, and structured output for downstream tooling.
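Provider circuit breaking is a standard resilience pattern: after repeated failures, stop calling a provider for a cooldown period instead of spending budget on requests that are likely to fail. The following is a generic sketch of that pattern, assuming a failure threshold and cooldown chosen purely for illustration; it is not the Crew's internal implementation:

```typescript
type BreakerState = "closed" | "open" | "half-open";

// Generic circuit breaker; thresholds are illustrative, not the Crew's settings.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";

  constructor(
    private readonly failureThreshold = 3,
    private readonly cooldownMs = 30000,
    private readonly now: () => number = Date.now,
  ) {}

  // Once the cooldown has elapsed, an open breaker lets a trial call through.
  getState(): BreakerState {
    if (this.state === "open" && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = "half-open";
    }
    return this.state;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.getState() === "open") {
      throw new Error("circuit open: provider temporarily disabled");
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.state = "closed";
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial call, or too many consecutive failures, (re)opens the circuit.
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

A caller would wrap each provider request in `breaker.call(...)` and switch to another provider while the circuit is open. The injectable clock keeps the cooldown logic testable.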
Plugins
External agents can register into crew workflows via the plugins config:
import type {AgentPlugin, AgentTask, AgentResult, CrewContext} from '@yasserkhanorg/impact-gate';
const myPlugin: AgentPlugin = {
    role: 'my-custom-analyzer',
    phase: 'understand',
    runAfter: ['impact-analyst'],
    async execute(task: AgentTask, ctx: CrewContext): Promise<AgentResult> {
        // Custom analysis goes here; this stub reports success with no output.
        return {role: 'my-custom-analyzer', status: 'success', output: null, warnings: []};
    },
};
export default myPlugin;
npx impact-gate crew --plugins ./my-plugin.ts --workflow full-qa --path ./app
See docs/PLUGIN_API_STABILITY.md for the API contract and stability guarantees.
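The runAfter field suggests that agents are ordered by declared dependencies. Here is a minimal sketch of one way such ordering could work: a topological sort (Kahn's algorithm) over role names, with cycle detection. This is an assumption about the semantics, not the orchestrator's documented behavior, and it ignores phases entirely:

```typescript
// Minimal shape borrowed from the plugin example: a role plus optional runAfter roles.
interface OrderedAgent {
  role: string;
  runAfter?: string[];
}

function orderAgents(agents: OrderedAgent[]): string[] {
  const roles = new Set(agents.map((a) => a.role));
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const a of agents) {
    inDegree.set(a.role, 0);
    dependents.set(a.role, []);
  }
  for (const a of agents) {
    for (const dep of a.runAfter ?? []) {
      // Dependencies on roles that are not registered are ignored in this sketch.
      if (!roles.has(dep)) continue;
      dependents.get(dep)!.push(a.role);
      inDegree.set(a.role, inDegree.get(a.role)! + 1);
    }
  }
  // Start with agents that wait on nothing, then release dependents as their inputs finish.
  const ready = agents.filter((a) => inDegree.get(a.role) === 0).map((a) => a.role);
  const order: string[] = [];
  while (ready.length > 0) {
    const role = ready.shift()!;
    order.push(role);
    for (const next of dependents.get(role)!) {
      const remaining = inDegree.get(next)! - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  if (order.length !== agents.length) {
    throw new Error("cycle detected in runAfter dependencies");
  }
  return order;
}

console.log(orderAgents([
  { role: "my-custom-analyzer", runAfter: ["impact-analyst"] },
  { role: "impact-analyst" },
]));
```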
Programmatic API
import {
    CrewOrchestrator,
    ImpactAnalystAgent,
    CrossImpactAgent,
    RegressionAdvisorAgent,
    StrategistAgent,
    TestDesignerAgent,
} from '@yasserkhanorg/impact-gate';
const orchestrator = new CrewOrchestrator();
orchestrator.registerAgent(new ImpactAnalystAgent());
orchestrator.registerAgent(new CrossImpactAgent());
orchestrator.registerAgent(new RegressionAdvisorAgent());
orchestrator.registerAgent(new StrategistAgent());
orchestrator.registerAgent(new TestDesignerAgent());
// Top-level await needs an ES module context; otherwise wrap this in an async function.
const result = await orchestrator.run({
    appPath: './webapp',
    testsRoot: './e2e-tests',
    gitSince: 'origin/master',
    workflow: 'design-only',
});
console.log(result.context.strategyEntries);
console.log(result.context.testDesigns);
console.log(result.context.crossImpacts);
Route-Families Training
What it produces
The train command builds a knowledge map of your codebase: a single JSON file (route-families.json) that maps source files to features, test directories, and user flows. This is not ML training; no model is trained. Each entry in the manifest describes one feature family, for example:
{
  "id": "channels",
  "routes": ["/{team}/channels/{channel}"],
  "priority": "P0",
  "webappPaths": ["src/components/channel_header/**"],
  "serverPaths": ["server/channels/api4/channel*.go", "server/channels/app/channel*.go"],
  "specDirs": ["specs/functional/channels/"],
  "userFlows": ["Create channel", "Archive channel", "Search in channel"],
  "components": ["ChannelHeader", "ChannelSidebar"]
}
Why the tool needs this
When a PR changes server/channels/app/channel.go, the tool needs to answer: "which E2E tests should I run?" Without the manifest, it has no idea. With it:
channel.go changed
→ belongs to "channels" family
→ specs are in specs/functional/channels/
→ run those tests
→ flag if coverage is missing for the affected user flows
Every downstream command (impact, plan, generate, heal, impact-gate-qa) reads this manifest to understand the codebase.
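The lookup in that chain can be sketched as glob matching over a family's path patterns. Below is a minimal TypeScript sketch, assuming a subset of the manifest fields from the example entry and a deliberately simple glob dialect (`**` matches across directories, `*` matches within one path segment). The tool's real matcher may support more syntax:

```typescript
// Subset of the manifest fields shown in the example entry.
interface RouteFamily {
  id: string;
  webappPaths?: string[];
  serverPaths?: string[];
  specDirs?: string[];
}

// Convert a simple glob into an anchored RegExp: `**` matches across slashes,
// `*` matches within a single path segment, everything else is literal.
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*" && glob[i + 1] === "*") {
      source += ".*";
      i++;
    } else if (ch === "*") {
      source += "[^/]*";
    } else {
      source += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${source}$`);
}

// Returns every family whose source globs match the changed file.
function familiesForFile(file: string, families: RouteFamily[]): RouteFamily[] {
  return families.filter((family) =>
    [...(family.webappPaths ?? []), ...(family.serverPaths ?? [])].some((glob) =>
      globToRegExp(glob).test(file),
    ),
  );
}

const manifest: RouteFamily[] = [
  {
    id: "channels",
    webappPaths: ["src/components/channel_header/**"],
    serverPaths: ["server/channels/api4/channel*.go", "server/channels/app/channel*.go"],
    specDirs: ["specs/functional/channels/"],
  },
];

for (const family of familiesForFile("server/channels/app/channel.go", manifest)) {
  console.log(`${family.id}: run specs in ${(family.specDirs ?? []).join(", ")}`);
}
```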
How scanning works
The scanner uses 4 strategies to build the file → family mapping:
- Directory matching: src/channels/ + tests/channels/ share a name → channels family
- Test-derived: specs/functional/channels/drafts/ exists with spec files → drafts family (even if source code is scattered across components/actions/reducers)
- Server-derived: api4/channel.go + app/channel.go + store/channel_store.go span 3 backend tiers → channel family (related files like channel_bookmark.go are grouped under the parent)
- Name-matched: src/utils/channels.ts or server/public/model/channel.go basename matches → added to the channels family's paths
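The name-matched strategy depends on normalizing basenames so that variants like channels.ts, channel.go, and channel_store.go compare equal to a family id. Here is a minimal sketch of such normalization. The suffix list and the naive singularization rule are illustrative assumptions, not the scanner's actual rules:

```typescript
// Illustrative normalization: strip directories and extension, drop a common
// tier suffix, and singularize a trailing "s". These rules are assumptions.
const TIER_SUFFIXES = ["_store", "_service", "_handler", "_test"];

function normalizeName(pathOrId: string): string {
  const base = pathOrId.split("/").pop() ?? pathOrId;
  let name = base.replace(/\.[^.]+$/, "").toLowerCase();
  for (const suffix of TIER_SUFFIXES) {
    if (name.endsWith(suffix)) {
      name = name.slice(0, -suffix.length);
      break;
    }
  }
  return name.endsWith("s") && name.length > 3 ? name.slice(0, -1) : name;
}

function nameMatchesFamily(file: string, familyId: string): boolean {
  return normalizeName(file) === normalizeName(familyId);
}

console.log(nameMatchesFamily("src/utils/channels.ts", "channels"));                   // true
console.log(nameMatchesFamily("server/public/model/channel.go", "channels"));          // true
console.log(nameMatchesFamily("server/channels/store/channel_store.go", "channels"));  // true
console.log(nameMatchesFamily("server/public/model/channel_bookmark.go", "channels")); // false
```

Related files such as channel_bookmark.go do not match by name here; per the list above, the server-derived strategy is what groups them under the parent family.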
What LLM enrichment adds
The scanner finds files. The LLM reads code samples and adds semantic metadata the scanner can't determine:
- Accurate URL routes (/{team}/channels/{channel} instead of guessed /channels)
- Priority classification (P0 critical user flow vs P2 nice-to-have)
- Human-readable user flows ("Create channel", "Search messages")
- React component and page object names
This metadata makes impact analysis smarter — it can prioritize P0 flows and suggest specific test scenarios.
What validation does
The --validate flag measures manifest coverage against real git history. It does not produce training data; it is a quality check:
835 commits → 5105 changed files → 3223 bound to a family = 63% coverage
This tells you whether the manifest is complete enough. If coverage were 30%, impact analysis would be blind to most code changes.
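The coverage figure is the share of changed files that the manifest binds to at least one family. Below is a minimal sketch of that arithmetic. It assumes each (commit, file) pair counts once, since whether the tool deduplicates files across commits is not stated. The prefix-based `isBound` predicate is a hypothetical stand-in for the real manifest lookup:

```typescript
// Each commit lists the files it changed; in this sketch a file counts once per commit.
interface CommitChanges {
  sha: string;
  files: string[];
}

function manifestCoverage(commits: CommitChanges[], isBound: (file: string) => boolean) {
  let changed = 0;
  let bound = 0;
  for (const commit of commits) {
    for (const file of commit.files) {
      changed++;
      if (isBound(file)) bound++;
    }
  }
  const percent = changed === 0 ? 0 : Math.round((bound / changed) * 100);
  return { commits: commits.length, changed, bound, percent };
}

// Hypothetical prefix-based binding standing in for the real manifest lookup.
const familyPrefixes = ["server/channels/", "src/components/channel_header/"];
const isBound = (file: string) => familyPrefixes.some((p) => file.startsWith(p));

const report = manifestCoverage(
  [
    { sha: "a1", files: ["server/channels/app/channel.go", "README.md"] },
    { sha: "b2", files: ["src/components/channel_header/index.tsx", "server/channels/api4/channel.go"] },
  ],
  isBound,
);
// Prints: 2 commits → 4 changed files → 3 bound = 75% coverage
console.log(`${report.commits} commits → ${report.changed} changed files → ${report.bound} bound = ${report.percent}% coverage`);
```

The figures quoted above follow the same arithmetic: 3223 / 5105 rounds to 63%.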
Usage
# Scan your codebase + LLM enrichment (default)
npx impact-gate train --path /path/to/project
# Offline mode (no LLM, no API key needed)
npx impact-gate train --path /path/to/project --no-enrich
# Validate accuracy against recent git history
npx impact-gate train --path /path/to/project --validate --since HEAD~50
# Full pipeline: scan + enrich + validate
npx impact-gate train --path /path/to/project --validate --since HEAD~20
Why LLM enrichment is on by default: The manifest gives AI context for impact analysis, scenario suggestion, and bug detection. AI-generated context produces better AI reasoning downstream. Use --no-enrich for offline/free operation or to avoid sending code snippets to third-party LLM APIs.
Training loop: Run train → review route-families.json → run train --validate to check coverage % → fix gaps → repeat.
Additional flags:
- --verbose / -v: DEBUG-level output with timing for each phase
- --json: structured JSON log output (for CI pipelines)
- --server-path: explicit path to backend server root
- --budget-usd: max LLM spend (default: $0.50, max: $10)
Output:
- <testsRoot>/.e2e-ai-agents/route-families.json: the manifest
- <testsRoot>/.e2e-ai-agents/train-report.json: timing data, family counts, coverage stats, LLM metrics
Configuration
Create impact-gate.config.json in your project (auto-discovered):
{
  "path": ".",
  "profile": "strict",
  "testsRoot": ".",
  "mode": "impact",
  "framework": "auto",
  "git": { "since": "origin/master" },
  "impact": {
    "dependencyGraph": { "enabled": true, "maxDepth": 3 },
    "traceability": { "enabled": true },
    "aiFlow": { "enabled": true, "provider": "anthropic" }
  },
  "pipeline": {
    "enabled": false,
    "scenarios": 3,
    "outputDir": "specs/functional/ai-assisted",
    "mcp": false
  },
  "policy": {
    "enforcementMode": "block",
    "blockOnActions": ["must-add-tests"]
  }
}
Analysis Profiles
Profiles are not the same thing as frameworks. They control analysis strictness and project-specific conventions.
| Profile | Description |
|---------|-------------|
| default | Standard analysis behavior for most repositories |
| strict | Stricter handling of heuristic-only mappings and more opinionated analysis defaults |
Framework detection is separate. The CLI can auto-detect Playwright, Cypress, pytest, supertest, and Selenium usage from the project structure and dependencies.
Key options
- testsRoot: path to tests when they live outside the app root
- profile: default or strict
- impact.dependencyGraph: static reverse dependency graph for transitive impact
- impact.traceability: file-to-test mapping from CI execution data
- impact.aiFlow: LLM-powered flow mapping through the configured provider
- pipeline.mcp: use Playwright MCP server for browser-aware generation/healing
- policy.enforcementMode: advisory, warn, or block
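To catch typos in impact-gate.config.json before running the CLI, you can describe the fields shown above with a TypeScript type. This minimal sketch is assembled only from the example config and the key options list. The package may export its own, more complete config type. `checkConfig` is a hypothetical helper that validates only the two enumerated values documented here:

```typescript
// Field names taken from the example config; value sets from the documented options.
type Profile = "default" | "strict";
type EnforcementMode = "advisory" | "warn" | "block";

interface ImpactGateConfig {
  path?: string;
  profile?: Profile;
  testsRoot?: string;
  mode?: string;
  framework?: string;
  git?: { since?: string };
  impact?: {
    dependencyGraph?: { enabled?: boolean; maxDepth?: number };
    traceability?: { enabled?: boolean };
    aiFlow?: { enabled?: boolean; provider?: string };
  };
  pipeline?: { enabled?: boolean; scenarios?: number; outputDir?: string; mcp?: boolean };
  policy?: { enforcementMode?: EnforcementMode; blockOnActions?: string[] };
}

const PROFILES: readonly string[] = ["default", "strict"];
const ENFORCEMENT_MODES: readonly string[] = ["advisory", "warn", "block"];

// Returns a list of problems found in an already-parsed JSON value.
function checkConfig(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null) return ["config must be a JSON object"];
  const cfg = raw as ImpactGateConfig;
  const problems: string[] = [];
  if (cfg.profile !== undefined && PROFILES.indexOf(cfg.profile) === -1) {
    problems.push(`profile must be one of ${PROFILES.join(", ")}`);
  }
  const mode = cfg.policy?.enforcementMode;
  if (mode !== undefined && ENFORCEMENT_MODES.indexOf(mode) === -1) {
    problems.push(`policy.enforcementMode must be one of ${ENFORCEMENT_MODES.join(", ")}`);
  }
  return problems;
}

const example = {
  profile: "strict",
  testsRoot: ".",
  policy: { enforcementMode: "block", blockOnActions: ["must-add-tests"] },
};
console.log(checkConfig(example));                // no problems
console.log(checkConfig({ profile: "lenient" })); // one problem reported
```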
CI Integration
GitHub Actions
- name: PR Impact Review
  run: |
    npx impact-gate review \
      --path . \
      --since origin/${{ github.base_ref }} \
      --ci-comment-path comment.md
- name: Coverage Gate
  run: |
    npx impact-gate gate --path . --threshold 80
The review command with --ci-comment-path writes a markdown summary for PR comments. The gate command exits non-zero when coverage is below the threshold.
For the full plan artifacts (plan.json, ci-summary.md, metrics-summary.json), use the plan command:
- name: Build coverage plan
  run: |
    npx impact-gate plan \
      --config ./impact-gate.config.json \
      --since origin/${{ github.base_ref }} \
      --fail-on-must-add-tests \
      --github-output "$GITHUB_OUTPUT"
See examples/github-actions/pr-impact.yml for a complete workflow template.
Pipeline Modes
Package Native (default)
Strategy-based test templates with quality guardrails and iterative heal attempts. The strongest path today is still a repo whose impact analysis and manifest quality are already in good shape.
MCP Mode (--pipeline-mcp)
Uses the official Playwright Test Agent loop (planner/generator/healer) with Claude CLI orchestration. Validates generated specs against discovered local API surface to block hallucinated methods.
- --pipeline-mcp-only: fail if MCP setup fails (no silent fallback)
- --pipeline-mcp-allow-fallback: fall back to package-native if MCP is unavailable
- --pipeline-mcp-timeout-ms: per-command timeout
- --pipeline-mcp-retries: retry count for transient failures
Agentic Generation (generate command)
LLM-powered generate-run-fix loop: generates a spec, runs it, analyzes failures, and iterates up to --max-attempts times.
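The generate-run-fix loop can be sketched as a bounded retry around three steps. This minimal TypeScript sketch assumes injected `generate`, `run`, and `fix` callbacks, which are hypothetical names standing in for the LLM call, the test runner, and failure analysis. `maxAttempts` plays the role of --max-attempts:

```typescript
interface RunResult {
  passed: boolean;
  failureOutput: string;
}

// Hypothetical step callbacks; the real command wires these to an LLM and a test runner.
interface AgenticSteps {
  generate: () => Promise<string>;
  run: (spec: string) => Promise<RunResult>;
  fix: (spec: string, failureOutput: string) => Promise<string>;
}

interface LoopOutcome {
  spec: string;
  passed: boolean;
  attempts: number;
}

async function generateRunFix(steps: AgenticSteps, maxAttempts: number): Promise<LoopOutcome> {
  if (maxAttempts < 1) throw new RangeError("maxAttempts must be at least 1");
  let spec = await steps.generate();
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await steps.run(spec);
    if (result.passed) return { spec, passed: true, attempts: attempt };
    // Ask for a fix only if another attempt remains.
    if (attempt < maxAttempts) {
      spec = await steps.fix(spec, result.failureOutput);
    }
  }
  // Out of attempts: return the last spec, marked as failing.
  return { spec, passed: false, attempts: maxAttempts };
}
```

Injecting the steps keeps the loop itself deterministic and testable with fakes, while the bounded attempt count caps LLM spend per spec.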
LLM Providers
Used internally for AI enrichment, test generation, and healing.
# Anthropic
export ANTHROPIC_API_KEY=sk-ant-...
# OpenAI
export OPENAI_API_KEY=sk-...
# Ollama (free, local)
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_MODEL=deepseek-r1:7b
Programmatic provider usage:
import { AnthropicProvider } from '@yasserkhanorg/impact-gate';
const claude = new AnthropicProvider({
  apiKey: process.env.ANTHROPIC_API_KEY
});
const response = await claude.generateText('Analyze test failure');
Factory pattern with auto-detection, hybrid mode (free local + premium fallback), and custom OpenAI-compatible endpoints are also supported. See the provider API exports for full details.
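Hybrid mode, as described, tries a free local model first and falls back to a premium provider. Below is a minimal sketch of that fallback pattern, assuming only a `generateText(prompt)` method like the one in the example above. The `TextProvider` interface and `HybridProvider` class are hypothetical and are not the package's factory API:

```typescript
// Hypothetical provider shape, mirroring the generateText call shown above.
interface TextProvider {
  name: string;
  generateText(prompt: string): Promise<string>;
}

class HybridProvider implements TextProvider {
  name = "hybrid";

  constructor(private readonly local: TextProvider, private readonly premium: TextProvider) {}

  async generateText(prompt: string): Promise<string> {
    try {
      const text = await this.local.generateText(prompt);
      // Treat an empty answer as a failure worth escalating.
      if (text.trim().length > 0) return text;
    } catch {
      // Local model unavailable or errored; fall through to the premium provider.
    }
    return this.premium.generateText(prompt);
  }
}
```

Because the wrapper exposes the same shape as the providers it wraps, callers do not need to know whether a response came from the local or the premium model.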
Advanced / Experimental: MCP Server
Exposes 6 tools for test agents (Playwright v1.56+):
import { E2EAgentsMCPServer } from '@yasserkhanorg/impact-gate/mcp';
const server = new E2EAgentsMCPServer();
// Tools: discover_tests, read_file, write_file, run_tests, get_git_changes, get_repository_context
Security: write_file is restricted to test spec files (*.spec.ts, *.test.ts) and the .e2e-ai-agents/ directory. Path traversal and symlink escape are blocked. Rate limited to 100 requests/minute.
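The write_file restriction can be approximated with path normalization. Here is a minimal, dependency-free sketch, assuming workspace-relative POSIX-style paths and an .e2e-ai-agents/ directory at the workspace root. It covers the allowed-file rule and `..` traversal only. A real server would also resolve symlinks (for example with Node's `fs.realpathSync`) before checking, which this sketch omits. `isWritableTestPath` is hypothetical:

```typescript
// Resolve a workspace-relative POSIX path, returning null if it escapes the root.
function resolveInsideRoot(requested: string): string | null {
  if (requested.startsWith("/")) return null; // absolute paths are rejected outright
  const parts: string[] = [];
  for (const segment of requested.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      if (parts.length === 0) return null; // would climb above the workspace root
      parts.pop();
    } else {
      parts.push(segment);
    }
  }
  return parts.length === 0 ? null : parts.join("/");
}

// Allowed: *.spec.ts or *.test.ts anywhere in the workspace, or files under .e2e-ai-agents/.
function isWritableTestPath(requested: string): boolean {
  const resolved = resolveInsideRoot(requested.replace(/\\/g, "/"));
  if (resolved === null) return false;
  if (resolved.startsWith(".e2e-ai-agents/")) return true;
  return /\.(spec|test)\.ts$/.test(resolved);
}

console.log(isWritableTestPath("specs/login.spec.ts"));      // true
console.log(isWritableTestPath("specs/../../etc/passwd"));   // false
console.log(isWritableTestPath(".e2e-ai-agents/plan.json")); // true
```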
Traceability
Build file-to-test mappings from CI execution data:
- Capture: extract test-file relationships from Playwright JSON reports
- Ingest: merge into a rolling manifest (.e2e-ai-agents/traceability.json)
- Query: impact analysis uses the manifest to map changed files to relevant tests
Tuning flags: --traceability-min-hits, --traceability-max-files-per-test, --traceability-max-age-days.
Schemas: schemas/traceability-input.schema.json
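The capture, ingest, and query flow amounts to keeping hit counts per (file, test) pair and filtering them at query time. Below is a minimal sketch that assumes meanings suggested by the tuning flag names: a pair must be seen at least `minHits` times, and tests that touch more than `maxFilesPerTest` files are treated as too broad to be informative. The data shapes are hypothetical, not the traceability.json schema, and age-based expiry (--traceability-max-age-days) is left out:

```typescript
// One captured run: which source files each test exercised (hypothetical shape).
type CapturedRun = Map<string, string[]>; // test id -> files

class TraceabilityIndex {
  // file -> (test -> hit count), accumulated across ingested runs.
  private hits = new Map<string, Map<string, number>>();
  // test -> every file it has ever touched, used to measure breadth.
  private filesPerTest = new Map<string, Set<string>>();

  ingest(run: CapturedRun): void {
    run.forEach((files, test) => {
      const seen = this.filesPerTest.get(test) ?? new Set<string>();
      for (const file of files) {
        seen.add(file);
        const perTest = this.hits.get(file) ?? new Map<string, number>();
        perTest.set(test, (perTest.get(test) ?? 0) + 1);
        this.hits.set(file, perTest);
      }
      this.filesPerTest.set(test, seen);
    });
  }

  testsFor(changedFile: string, minHits: number, maxFilesPerTest: number): string[] {
    const perTest = this.hits.get(changedFile);
    if (!perTest) return [];
    const result: string[] = [];
    perTest.forEach((count, test) => {
      const breadth = this.filesPerTest.get(test)?.size ?? 0;
      if (count >= minHits && breadth <= maxFilesPerTest) result.push(test);
    });
    return result.sort();
  }
}
```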
Artifacts
| File | Written by | Purpose |
|------|-----------|---------|
| route-families.json | train | Route family manifest |
| train-report.json | train | Training timings, coverage, LLM metrics |
| plan.json | plan | Coverage plan with gaps, decisions, metrics |
| ci-summary.md | plan | Markdown for PR comments |
| metrics.jsonl | plan | Append-only run metrics |
| metrics-summary.json | plan | Aggregated metrics |
| traceability.json | traceability-ingest | File-to-test manifest |
| traceability-state.json | traceability-ingest | Rolling counts |
| feedback.json | feedback | Recommendation outcomes |
| calibration.json | feedback | Precision/recall calibration |
| flaky-tests.json | feedback | Flaky test scores |
| agentic-summary.json | generate | Agentic generation results |
| review-generate-summary.json | review --generate | Review-driven generation results with scenario mapping |
All written under <testsRoot>/.e2e-ai-agents/.
Advanced / Experimental: Autonomous QA Agent (impact-gate-qa)
An autonomous QA engineer that can take a diff or a feature prompt, open a real browser, navigate changed features, hunt edge cases, generate follow-up specs, heal failures, and produce a findings report. Built on top of agent-browser and the Anthropic tool-use API.
If you want the full product story and the natural-language front door, start with the Autonomous Browser QA guide and the QA Skill Guide for Codex and Claude examples using /qa.
Quick Start
# PR mode — test features changed since origin/main
npx impact-gate-qa pr --since origin/main --base-url http://localhost:3000
# Hunt mode — deep-test a specific area
npx impact-gate-qa hunt "settings panel" --base-url http://localhost:3000
# Release mode — systematic exploration of all critical flows
npx impact-gate-qa release --base-url http://localhost:3000 --time 30
# Fix mode — verify healed specs
npx impact-gate-qa fix --base-url http://localhost:3000
Architecture
- Phase 1 (Script): runs impact-gate impact/plan to determine scope, then executes matched Playwright specs.
- Phase 2 (Explore): LLM-driven browser loop of observe (accessibility snapshot) → think → act (click/fill/navigate) → record findings. Includes stuck detection, multi-user testing, console error capture, and vision-based analysis.
- Phase 3 (Report): generates a structured report with findings, per-flow sign-off, and a release-readiness verdict (go/no-go/conditional).
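Stuck detection in an observe-think-act loop is commonly done by noticing that the page has stopped changing. Here is a minimal sketch, assuming the agent compares serialized accessibility snapshots and calls itself stuck when the last few are identical. The window size and the raw string comparison are illustrative choices, not impact-gate-qa's documented heuristic:

```typescript
class StuckDetector {
  private recent: string[] = [];

  constructor(private readonly window = 3) {}

  // Record an observation (for example a serialized accessibility snapshot).
  // Returns true when the last `window` observations are all identical.
  observe(snapshot: string): boolean {
    this.recent.push(snapshot);
    if (this.recent.length > this.window) this.recent.shift();
    return this.recent.length === this.window && this.recent.every((s) => s === this.recent[0]);
  }

  // Call after the agent changes strategy (e.g. navigates elsewhere).
  reset(): void {
    this.recent = [];
  }
}
```

When `observe` returns true, an explore loop would typically try a different action or move on to the next flow rather than repeat the same step.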
Options
| Flag | Default | Description |
|------|---------|-------------|
| --base-url | Required | Application URL |
| --time | 15 | Time limit in minutes |
| --budget | 2.00 | Max LLM spend in USD |
| --phase | all | Run only 1, 2, or 3 |
| --headed | off | Keep browser visible |
| --since | — | Git ref for diff-based scoping |
| --tests-root | — | Path to Playwright tests directory |
Requires agent-browser CLI (npm install -g agent-browser) and ANTHROPIC_API_KEY.
Production Usage
The recommended production workflow:
- Developer runs review --generate locally before pushing, getting a report of what changed plus ready-to-run tests for uncovered flows.
- CI runs review --ci-comment-path comment.md to write a PR comment with coverage status and risk.
- CI gate uses gate --threshold to block merges with insufficient coverage.
- Traceability data from test runs feeds back into the manifest, improving accuracy over time.
License
Apache 2.0
