@yasserkhanorg/e2e-agents

v1.5.0

Published

a day ago

AI-powered E2E test impact analysis, generation, healing, and autonomous QA for frontend repositories.

What It Does

Given a git diff, e2e-ai-agents determines which E2E test flows are impacted, identifies coverage gaps, and can generate or heal Playwright tests — all from the CLI. The companion e2e-qa-agent goes further: it opens a real browser, explores your app autonomously, and produces a QA report with findings and a release-readiness verdict.

Pipeline: impact → plan → generate → heal → finalize

Installation

npm install @yasserkhanorg/e2e-agents

Requires Node.js >= 20. Ships both CommonJS and ESM builds.

CLI Commands

# All-in-one: impact + plan + optional generate/heal
npx e2e-ai-agents analyze --path /path/to/project [--generate] [--heal]

# Analyze which flows are impacted by code changes
npx e2e-ai-agents impact --path /path/to/project

# Generate a coverage plan with gap analysis
npx e2e-ai-agents plan --path /path/to/project

# Generate tests for uncovered gaps (requires plan output)
npx e2e-ai-agents generate --path /path/to/project

# Heal flaky/failing specs from a Playwright report
npx e2e-ai-agents heal --path /path/to/project --traceability-report ./playwright-report.json

# Stage generated tests, commit, and open a PR
npx e2e-ai-agents finalize-generated-tests --path /path/to/project --create-pr

# Ingest test execution data for traceability
npx e2e-ai-agents traceability-capture --path /path/to/project --traceability-report ./playwright-report.json
npx e2e-ai-agents traceability-ingest --path /path/to/project --traceability-input ./traceability-input.json

# Ingest recommendation feedback for calibration
npx e2e-ai-agents feedback --path /path/to/project --feedback-input ./feedback.json

# Test LLM provider connectivity
npx e2e-ai-agents llm-health

plan and suggest are aliases. analyze is a convenience wrapper that runs impact + plan and optionally generation/healing in one invocation. Use --help for all available flags.

Route-Families Training

Route-families map your source files to features, test directories, and user flows. They are the context that powers accurate impact analysis. The train command bootstraps and maintains this manifest.

# Scan your codebase + LLM enrichment (default)
npx e2e-ai-agents train --path /path/to/project

# Offline mode (no LLM, no API key needed)
npx e2e-ai-agents train --path /path/to/project --no-enrich

# Validate accuracy against recent git history
npx e2e-ai-agents train --path /path/to/project --validate --since HEAD~50

# Full pipeline: scan + enrich + validate
npx e2e-ai-agents train --path /path/to/project --validate --since HEAD~20

Why LLM enrichment is on by default: The manifest exists to give AI context for impact analysis, scenario suggestion, and bug detection. AI-generated context produces better AI reasoning downstream. Use --no-enrich for offline/free operation or to avoid sending code snippets to third-party LLM APIs.

Training loop: Run train → review the generated route-families.json → run train --validate to check coverage % → fix gaps → repeat until 95%+.

The train command:

Scans your project structure (frontend src/, backend server/, test dirs)
Matches source directories to test directories by name
Enriches with LLM (priority, user flows, routes, components)
Merges intelligently with any existing manifest (preserves human curation)
Validates against git history to measure accuracy

Output is written to <testsRoot>/.e2e-ai-agents/route-families.json.
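
The name-based matching step and the coverage percentage checked by train --validate can be pictured with a small sketch. Everything in it is an assumption made for illustration: the RouteFamily shape and the matchByName and coverage helpers are hypothetical, and the real route-families.json schema and matching rules may differ.

```typescript
// Hypothetical shape of one route-family entry; the real schema may differ.
interface RouteFamily {
  id: string;
  sourceDirs: string[];
  testDirs: string[];
}

// Last path segment, lowercased, with "-" and "_" removed,
// so "channel_settings" and "channel-settings" compare equal.
function dirKey(dir: string): string {
  const parts = dir.split("/").filter((p) => p.length > 0);
  const last = parts.length > 0 ? parts[parts.length - 1] : "";
  return last.toLowerCase().replace(/[-_]/g, "");
}

// Pair each source directory with test directories that share its normalized name.
function matchByName(sourceDirs: string[], testDirs: string[]): RouteFamily[] {
  const result: RouteFamily[] = [];
  for (const src of sourceDirs) {
    const key = dirKey(src);
    const tests = testDirs.filter((t) => dirKey(t) === key);
    if (tests.length > 0) {
      result.push({ id: key, sourceDirs: [src], testDirs: tests });
    }
  }
  return result;
}

// Percentage of changed files that live under some family's source directory.
function coverage(changedFiles: string[], known: RouteFamily[]): number {
  if (changedFiles.length === 0) return 100;
  const covered = changedFiles.filter((file) =>
    known.some((f) => f.sourceDirs.some((d) => file.indexOf(d + "/") === 0))
  );
  return (covered.length / changedFiles.length) * 100;
}

const families = matchByName(
  ["src/components/channel_settings", "src/components/threads"],
  ["specs/functional/channel-settings", "specs/functional/search"]
);
const pct = coverage(
  ["src/components/channel_settings/index.tsx", "src/components/threads/list.tsx"],
  families
);
console.log(families.length, pct);
```

In this toy input only one of the two changed files maps to a family, which is the kind of gap the training loop asks you to fix by hand before re-running validation.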

Configuration

Create e2e-ai-agents.config.json in your project (auto-discovered):

{
  "path": ".",
  "profile": "mattermost",
  "testsRoot": ".",
  "mode": "impact",
  "framework": "auto",
  "git": { "since": "origin/master" },
  "impact": {
    "dependencyGraph": { "enabled": true, "maxDepth": 3 },
    "traceability": { "enabled": true },
    "aiFlow": { "enabled": true, "provider": "anthropic" }
  },
  "pipeline": {
    "enabled": false,
    "scenarios": 3,
    "outputDir": "specs/functional/ai-assisted",
    "mcp": false
  },
  "policy": {
    "enforcementMode": "block",
    "blockOnActions": ["must-add-tests"]
  }
}

Key options:

testsRoot — path to tests when they live outside the app root
profile — default or mattermost (strict mode with escalation for heuristic-only mappings)
impact.dependencyGraph — static reverse dependency graph for transitive impact
impact.traceability — file-to-test mapping from CI execution data
impact.aiFlow — LLM-powered flow mapping (requires ANTHROPIC_API_KEY)
pipeline.mcp — use Playwright MCP server for browser-aware generation/healing
policy.enforcementMode — advisory, warn, or block
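
If you load the config from your own tooling, typing it can catch mistakes early. The following is a minimal sketch that mirrors only the fields shown in the example above. The E2EConfig type and the checkConfig helper are hypothetical, are not exported by the package, and assume that every field is optional.

```typescript
// Unions follow the values documented above; optionality is an assumption.
type EnforcementMode = "advisory" | "warn" | "block";
type Profile = "default" | "mattermost";

interface E2EConfig {
  path?: string;
  profile?: Profile;
  testsRoot?: string;
  impact?: {
    dependencyGraph?: { enabled: boolean; maxDepth?: number };
    traceability?: { enabled: boolean };
    aiFlow?: { enabled: boolean; provider?: string };
  };
  policy?: { enforcementMode?: EnforcementMode; blockOnActions?: string[] };
}

// Hypothetical pre-flight check: returns readable problems instead of throwing.
function checkConfig(
  config: E2EConfig,
  env: Record<string, string | undefined>
): string[] {
  const problems: string[] = [];
  const impact = config.impact;
  const aiFlow = impact ? impact.aiFlow : undefined;
  if (aiFlow && aiFlow.enabled && !env["ANTHROPIC_API_KEY"]) {
    problems.push("impact.aiFlow is enabled but ANTHROPIC_API_KEY is not set");
  }
  const graph = impact ? impact.dependencyGraph : undefined;
  if (graph && graph.enabled && graph.maxDepth !== undefined && graph.maxDepth < 1) {
    problems.push("impact.dependencyGraph.maxDepth must be at least 1");
  }
  return problems;
}

const exampleConfig: E2EConfig = {
  path: ".",
  profile: "mattermost",
  testsRoot: ".",
  impact: {
    dependencyGraph: { enabled: true, maxDepth: 3 },
    traceability: { enabled: true },
    aiFlow: { enabled: true, provider: "anthropic" },
  },
  policy: { enforcementMode: "block", blockOnActions: ["must-add-tests"] },
};

const withoutKey = checkConfig(exampleConfig, {});
const withKey = checkConfig(exampleConfig, { ANTHROPIC_API_KEY: "sk-ant-example" });
console.log(withoutKey, withKey);
```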

CI Integration

GitHub Actions

- name: Run E2E coverage check
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
  run: |
    npx e2e-ai-agents plan \
      --config ./e2e-ai-agents.config.json \
      --since origin/${{ github.base_ref }} \
      --fail-on-must-add-tests \
      --github-output "$GITHUB_OUTPUT"

The plan command writes:

.e2e-ai-agents/plan.json — structured plan with runSet, confidence, decision
.e2e-ai-agents/ci-summary.md — markdown summary for PR comments
.e2e-ai-agents/metrics-summary.json — run metrics

Use --fail-on-must-add-tests to exit non-zero when uncovered P0/P1 gaps exist. Use --github-output to expose outputs to subsequent workflow steps.

See examples/github-actions/pr-impact.yml for a complete workflow template.
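
A later workflow step can apply its own gate to the parsed plan.json. The sketch below imitates the documented behavior of --fail-on-must-add-tests (non-zero exit when uncovered P0/P1 gaps exist). Only runSet, confidence, and decision are named in this README; their types, the gaps field, and both helper functions are assumptions, so check them against a real plan.json before relying on them.

```typescript
// Hypothetical subset of plan.json. The `gaps` field and all field types are assumed.
interface PlanGap {
  flow: string;
  priority: "P0" | "P1" | "P2";
  covered: boolean;
}

interface Plan {
  runSet: string[];
  confidence: number;
  decision: string;
  gaps: PlanGap[];
}

// Gaps that should block a merge: uncovered and P0 or P1.
function blockingGaps(plan: Plan): PlanGap[] {
  return plan.gaps.filter(
    (g) => !g.covered && (g.priority === "P0" || g.priority === "P1")
  );
}

// Exit code a custom gate script could return.
function exitCodeFor(plan: Plan): number {
  return blockingGaps(plan).length > 0 ? 1 : 0;
}

const samplePlan: Plan = {
  runSet: ["specs/functional/channels/settings.spec.ts"],
  confidence: 0.8,
  decision: "must-add-tests",
  gaps: [
    { flow: "channel settings", priority: "P0", covered: false },
    { flow: "emoji picker", priority: "P2", covered: false },
    { flow: "login", priority: "P1", covered: true },
  ],
};

const blocking = blockingGaps(samplePlan);
const exitCode = exitCodeFor(samplePlan);
console.log(blocking.map((g) => g.flow), exitCode);
```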

Pipeline Modes

Package Native (default)

Strategy-based Playwright test templates with quality guardrails (no test.describe, single tag) and iterative heal attempts.

MCP Mode (`--pipeline-mcp`)

Uses the official Playwright Test Agent loop (planner/generator/healer) with Claude CLI orchestration. Validates generated specs against the discovered local API surface to block hallucinated methods.

--pipeline-mcp-only — fail if MCP setup fails (no silent fallback)
--pipeline-mcp-allow-fallback — fall back to package-native if MCP unavailable
--pipeline-mcp-timeout-ms — per-command timeout
--pipeline-mcp-retries — retry count for transient failures
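
The interaction between retries, MCP-only, and fallback can be read as a small decision procedure. This is a sketch of one plausible reading of the flags above, not the package's actual control flow. The resolvePipelineMode function, the McpPolicy type, and the injected tryMcp callback are all hypothetical.

```typescript
type PipelineMode = "mcp" | "package-native";

interface McpPolicy {
  retries: number;        // extra attempts after the first failure
  allowFallback: boolean; // true models --pipeline-mcp-allow-fallback, false models --pipeline-mcp-only
}

// tryMcp is injected and returns true when the MCP-backed step succeeded.
function resolvePipelineMode(tryMcp: () => boolean, policy: McpPolicy): PipelineMode {
  for (let attempt = 0; attempt <= policy.retries; attempt++) {
    if (tryMcp()) return "mcp";
  }
  if (policy.allowFallback) return "package-native";
  throw new Error("MCP unavailable and fallback is disabled");
}

let calls = 0;
const flaky = (): boolean => ++calls >= 3; // fails twice, then succeeds

const modeAfterRetries = resolvePipelineMode(flaky, { retries: 2, allowFallback: false });
const modeAfterFallback = resolvePipelineMode(() => false, { retries: 1, allowFallback: true });
console.log(modeAfterRetries, modeAfterFallback);
```

With fallback disabled and every attempt failing, the sketch throws, which matches the "no silent fallback" intent of --pipeline-mcp-only.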

Agentic Generation (`generate` command)

LLM-powered generate-run-fix loop: generates a spec, runs it, analyzes failures, and iterates up to --max-attempts times.
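
The loop can be sketched as follows. This is a simplified, synchronous illustration under stated assumptions: generateRunFix, RunResult, and LoopResult are hypothetical names, the generate and run callbacks stand in for the LLM call and the Playwright run, and the real implementation is presumably asynchronous.

```typescript
interface RunResult {
  passed: boolean;
  failureLog: string;
}

interface LoopResult {
  passed: boolean;
  attempts: number;
  spec: string;
}

// generate receives the previous spec and its failure log (both null on the first attempt).
function generateRunFix(
  generate: (previousSpec: string | null, failureLog: string | null) => string,
  run: (spec: string) => RunResult,
  maxAttempts: number
): LoopResult {
  let spec: string | null = null;
  let failureLog: string | null = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    spec = generate(spec, failureLog);
    const result = run(spec);
    if (result.passed) {
      return { passed: true, attempts: attempt, spec: spec };
    }
    failureLog = result.failureLog; // feed the failure back into the next attempt
  }
  return { passed: false, attempts: maxAttempts, spec: spec === null ? "" : spec };
}

// Fake generator: the first draft is broken, the revision is fixed.
const fakeGenerate = (prev: string | null): string => (prev === null ? "draft" : "fixed");
const fakeRun = (spec: string): RunResult =>
  spec === "fixed"
    ? { passed: true, failureLog: "" }
    : { passed: false, failureLog: "locator not found" };

const loopResult = generateRunFix(fakeGenerate, fakeRun, 3);
console.log(loopResult);
```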

LLM Providers

Used internally for AI enrichment, test generation, and healing.

# Anthropic (default)
export ANTHROPIC_API_KEY=sk-ant-...

# OpenAI
export OPENAI_API_KEY=sk-...

# Ollama (free, local)
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_MODEL=deepseek-r1:7b

Programmatic provider usage:

import { AnthropicProvider } from '@yasserkhanorg/e2e-agents';

const claude = new AnthropicProvider({
    apiKey: process.env.ANTHROPIC_API_KEY
});
const response = await claude.generateText('Analyze test failure');

Factory pattern with auto-detection, hybrid mode (free local + premium fallback), and custom OpenAI-compatible endpoints are also supported. See the provider API exports for full details.
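
To make auto-detection and hybrid mode concrete, here is a rough sketch of how a provider could be picked from the environment variables listed above. It does not use the package's factory. The chooseProvider function, the detection order, and the hybrid rule (local Ollama first, hosted provider as fallback) are assumptions for illustration.

```typescript
type ProviderName = "anthropic" | "openai" | "ollama";
type Env = Record<string, string | undefined>;

interface ProviderChoice {
  primary: ProviderName;
  fallback: ProviderName | null;
}

// Providers whose environment variable is set, in an assumed priority order.
function availableProviders(env: Env): ProviderName[] {
  const found: ProviderName[] = [];
  if (env["ANTHROPIC_API_KEY"]) found.push("anthropic");
  if (env["OPENAI_API_KEY"]) found.push("openai");
  if (env["OLLAMA_BASE_URL"]) found.push("ollama");
  return found;
}

// In hybrid mode, prefer the free local provider and keep a hosted one as fallback.
function chooseProvider(env: Env, hybrid: boolean): ProviderChoice | null {
  const found = availableProviders(env);
  if (found.length === 0) return null;
  const hosted = found.filter((p) => p !== "ollama");
  if (hybrid && found.indexOf("ollama") !== -1 && hosted.length > 0) {
    return { primary: "ollama", fallback: hosted[0] };
  }
  return { primary: found[0], fallback: null };
}

const sampleEnv: Env = {
  OPENAI_API_KEY: "sk-example",
  OLLAMA_BASE_URL: "http://localhost:11434",
};
const hybridChoice = chooseProvider(sampleEnv, true);
const plainChoice = chooseProvider(sampleEnv, false);
console.log(hybridChoice, plainChoice);
```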

MCP Server

Exposes 6 tools for test agents (Playwright v1.56+):

import { E2EAgentsMCPServer } from '@yasserkhanorg/e2e-agents/mcp';

const server = new E2EAgentsMCPServer();
// Tools: discover_tests, read_file, write_file, run_tests, get_git_changes, get_repository_context

Security: write_file is restricted to test spec files (*.spec.ts, *.test.ts) and the .e2e-ai-agents/ directory. Path traversal and symlink escape are blocked. Rate limited to 100 requests/minute.
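
The write_file restriction can be illustrated with a pure path check. This sketch is not the server's code. The isWriteAllowed and normalizeRelative helpers are hypothetical, paths are assumed to be relative to the project root, and symlink escape is not covered because detecting it requires resolving real paths on disk.

```typescript
// Collapse "." and ".." segments. Returns null for absolute paths
// and for paths that climb above the root.
function normalizeRelative(path: string): string | null {
  const unified = path.replace(/\\/g, "/");
  if (unified.charAt(0) === "/" || /^[A-Za-z]:/.test(unified)) return null;
  const out: string[] = [];
  for (const segment of unified.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      if (out.length === 0) return null; // would escape the root
      out.pop();
    } else {
      out.push(segment);
    }
  }
  return out.join("/");
}

// Allow only spec/test files and anything under .e2e-ai-agents/.
function isWriteAllowed(path: string): boolean {
  const normalized = normalizeRelative(path);
  if (normalized === null || normalized === "") return false;
  if (normalized.indexOf(".e2e-ai-agents/") === 0) return true;
  return /\.(spec|test)\.ts$/.test(normalized);
}

const checks = [
  "specs/login.spec.ts",
  ".e2e-ai-agents/plan.json",
  "src/app.ts",
  "../outside.spec.ts",
  "specs/../../outside.spec.ts",
].map((p) => isWriteAllowed(p));
console.log(checks);
```

Normalizing before matching matters: a path such as specs/../src/app.ts would otherwise look like it sits in the specs directory.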

Traceability

Build file-to-test mappings from CI execution data:

Capture — extract test-file relationships from Playwright JSON reports
Ingest — merge into a rolling manifest (.e2e-ai-agents/traceability.json)
Query — impact analysis uses the manifest to map changed files to relevant tests

Tuning flags: --traceability-min-hits, --traceability-max-files-per-test, --traceability-max-age-days.
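
The ingest and query steps can be pictured with an in-memory sketch. The Manifest layout, the ingest and testsFor helpers, and the way minimum hits and maximum age are applied are assumptions made for illustration. The real traceability.json format is described by the schema referenced below and may look quite different.

```typescript
// Hypothetical manifest: source file -> test file -> hit count and last-seen day.
interface Hit {
  hits: number;
  lastSeenDay: number;
}
type Manifest = Record<string, Record<string, Hit>>;

interface Observation {
  testFile: string;
  sourceFiles: string[];
}

// Merge one CI run's observations into the rolling manifest (updates it in place).
function ingest(manifest: Manifest, observations: Observation[], today: number): Manifest {
  for (const obs of observations) {
    for (const src of obs.sourceFiles) {
      const tests = manifest[src] || (manifest[src] = {});
      const prev = tests[obs.testFile];
      tests[obs.testFile] = { hits: prev ? prev.hits + 1 : 1, lastSeenDay: today };
    }
  }
  return manifest;
}

// Tests linked to a changed file, filtered the way min-hits and max-age-days might be.
function testsFor(
  manifest: Manifest,
  changedFile: string,
  minHits: number,
  maxAgeDays: number,
  today: number
): string[] {
  const tests = manifest[changedFile] || {};
  return Object.keys(tests)
    .filter((t) => tests[t].hits >= minHits && today - tests[t].lastSeenDay <= maxAgeDays)
    .sort();
}

const manifest: Manifest = {};
ingest(manifest, [
  { testFile: "a.spec.ts", sourceFiles: ["src/x.ts"] },
  { testFile: "b.spec.ts", sourceFiles: ["src/x.ts"] },
], 1);
ingest(manifest, [{ testFile: "a.spec.ts", sourceFiles: ["src/x.ts"] }], 2);

const relevant = testsFor(manifest, "src/x.ts", 2, 30, 2);
console.log(relevant);
```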

Schemas: schemas/traceability-input.schema.json

Artifacts

| File | Written by | Purpose |
|------|-----------|---------|
| plan.json | plan | Coverage plan with gaps, decisions, metrics |
| ci-summary.md | plan | Markdown for PR comments |
| metrics.jsonl | plan | Append-only run metrics |
| metrics-summary.json | plan | Aggregated metrics |
| traceability.json | traceability-ingest | File-to-test manifest |
| traceability-state.json | traceability-ingest | Rolling counts |
| feedback.json | feedback | Recommendation outcomes |
| calibration.json | feedback | Precision/recall calibration |
| flaky-tests.json | feedback | Flaky test scores |
| agentic-summary.json | generate | Agentic generation results |

All written under <testsRoot>/.e2e-ai-agents/.

Autonomous QA Agent (`e2e-qa-agent`)

An autonomous QA engineer that opens a real browser, navigates to changed features, tries edge cases, and produces a findings report — all unsupervised. Built on top of agent-browser and the Anthropic tool-use API.

Quick Start

# PR mode — test features changed since origin/main
npx e2e-qa-agent pr --since origin/main --base-url http://localhost:8065

# Hunt mode — deep-test a specific area
npx e2e-qa-agent hunt "channel settings" --base-url http://localhost:8065

# Release mode — systematic exploration of all critical flows
npx e2e-qa-agent release --base-url http://localhost:8065 --time 30

# Fix mode — verify healed specs
npx e2e-qa-agent fix --base-url http://localhost:8065

Architecture

Phase 1 (Script) — Runs e2e-ai-agents impact/plan to determine scope, then executes matched Playwright specs.
Phase 2 (Explore) — LLM-driven browser loop: observe (accessibility snapshot) → think → act (click/fill/navigate) → record findings. Includes stuck detection, multi-user testing, console error capture, and vision-based analysis.
Phase 3 (Report) — Generates a structured report with findings, per-flow sign-off, and a release-readiness verdict (go/no-go/conditional).
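
The Phase 2 loop can be sketched in a few lines. This is an illustration under stated assumptions, not the agent's implementation. The explore function, its callbacks, and the stuck rule (the same snapshot seen several times in a row) are hypothetical, and the real loop is presumably asynchronous and bounded by time and budget as well.

```typescript
interface Action {
  kind: "click" | "fill" | "navigate" | "done";
  target: string;
}

interface ExploreResult {
  steps: number;
  stoppedBecause: "done" | "stuck" | "step-limit";
}

function explore(
  observe: () => string,               // e.g. an accessibility snapshot
  think: (snapshot: string) => Action, // e.g. an LLM call choosing the next action
  act: (action: Action) => void,       // e.g. a browser command
  maxSteps: number,
  stuckThreshold: number
): ExploreResult {
  let lastSnapshot = "";
  let repeats = 0;
  for (let step = 1; step <= maxSteps; step++) {
    const snapshot = observe();
    repeats = snapshot === lastSnapshot ? repeats + 1 : 0;
    if (repeats >= stuckThreshold) return { steps: step, stoppedBecause: "stuck" };
    lastSnapshot = snapshot;
    const action = think(snapshot);
    if (action.kind === "done") return { steps: step, stoppedBecause: "done" };
    act(action);
  }
  return { steps: maxSteps, stoppedBecause: "step-limit" };
}

// Fake browser: after the second page, clicks no longer change anything.
const pages = ["home", "settings", "settings", "settings"];
let pageIndex = 0;
const exploreResult = explore(
  () => pages[Math.min(pageIndex, pages.length - 1)],
  () => ({ kind: "click", target: "next" }),
  () => { pageIndex++; },
  10,
  2
);
console.log(exploreResult);
```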

Options

| Flag | Default | Description |
|------|---------|-------------|
| --base-url | http://localhost:8065 | Application URL |
| --time | 15 | Time limit in minutes |
| --budget | 2.00 | Max LLM spend in USD |
| --phase | all | Run only 1, 2, or 3 |
| --headed | off | Keep browser visible |
| --since | (none) | Git ref for diff-based scoping |
| --tests-root | (none) | Path to Playwright tests directory |

Requires agent-browser CLI (npm install -g agent-browser) and ANTHROPIC_API_KEY.

Production Usage

Used by Mattermost for CI-integrated E2E coverage gating, test generation, and spec healing. See the Mattermost Playwright integration for a real-world example.

License

Apache 2.0
