qaa-agent

v1.9.2

Published 18 days ago

QA Automation Agent for Claude Code — multi-agent pipeline that analyzes repos, generates tests, validates, and creates PRs


martinbackhaus

qa automation testing claude-code playwright jest pytest ai-agent

QAA - QA Automation Agent

Multi-agent QA pipeline for Claude Code. Analyzes any codebase, generates a complete test suite following industry standards, validates everything, and delivers the result as a draft pull request.

scan → map → research → analyze → plan → generate → validate → deliver

The Problem

Starting from zero is painful — a new project with no tests means weeks of setup
Coverage gaps are invisible — without analysis, teams don't know what's missing until production breaks
Standards drift — different team members write tests differently: inconsistent locators, vague assertions, mixed naming
QA is always behind dev — features ship faster than tests get written

The Solution

QAA runs a pipeline of 12 specialized AI agents organized into eight stages:

| Stage | What happens | Output |
|-------|--------------|--------|
| Scan | Detects framework, language, testable surfaces | SCAN_MANIFEST.md |
| Research | Investigates testing ecosystem via Context7 MCP and official docs | TESTING_STACK.md, FRAMEWORK_CAPABILITIES.md |
| Map | Deep-scans codebase with 4 parallel agents (testability, risk, patterns, existing tests) | 8 codebase documents |
| Analyze | Produces risk assessment, test inventory, testing pyramid | QA_ANALYSIS.md, TEST_INVENTORY.md |
| Plan | Groups test cases by feature, assigns to files, resolves dependencies | GENERATION_PLAN.md |
| Generate | Writes test files, POMs, fixtures, configs following project standards | Test suite on disk |
| Validate | 4-layer validation (syntax, structure, dependencies, logic) with auto-fix | VALIDATION_REPORT.md |
| Deliver | Creates branch, commits per stage, opens draft PR | Pull request URL |
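The stage table describes a sequential hand-off in which later agents build on earlier artifacts. Below is a minimal sketch of such a runner. It assumes each stage sees the artifacts produced so far and that the pipeline stops at the first failing stage. `runPipeline`, `StageRunner`, and the stop policy are hypothetical, not QAA's internals. The order follows the stage table, which lists Research before Map.

```typescript
// Hypothetical sequential pipeline runner. Stage order follows the stage table;
// the runner signature and "stop at first failure" policy are assumptions.
type StageName =
  | "scan" | "research" | "map" | "analyze"
  | "plan" | "generate" | "validate" | "deliver";

const STAGE_ORDER: StageName[] = [
  "scan", "research", "map", "analyze", "plan", "generate", "validate", "deliver",
];

interface StageResult {
  ok: boolean;
  artifacts: string[]; // files or outputs the stage produced
}

// A stage runner receives everything produced by earlier stages.
type StageRunner = (stage: StageName, produced: readonly string[]) => StageResult;

interface PipelineOutcome {
  completed: StageName[];
  failedAt?: StageName;
  artifacts: string[];
}

function runPipeline(runStage: StageRunner): PipelineOutcome {
  const outcome: PipelineOutcome = { completed: [], artifacts: [] };
  for (const stage of STAGE_ORDER) {
    const result = runStage(stage, outcome.artifacts);
    if (!result.ok) {
      // Later stages depend on this one, so stop here.
      outcome.failedAt = stage;
      return outcome;
    }
    outcome.completed.push(stage);
    outcome.artifacts.push(...result.artifacts);
  }
  return outcome;
}
```

Stopping early reflects the fact that, for example, Plan has nothing to work with if Analyze produced no inventory.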

Install

npx qaa-agent

The interactive installer:

Copies agents, commands, skills, templates, and workflows into your runtime directory
Registers two MCP servers in your user-scope config (~/.claude.json) so they're available in all projects:
- Playwright MCP — live browser control for E2E tests and locator extraction
- Context7 MCP — up-to-date library documentation on demand
Merges required permissions into settings.json
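The MCP registration step can be pictured as a non-destructive merge into the existing config. This is a minimal sketch under two assumptions: the installer keeps any server entry you already have under the same name, and it adds only the missing ones. `mergeMcpServers` and the types are hypothetical. The server commands match the manual fallback config further down.

```typescript
// Hypothetical sketch of merging MCP server entries into a parsed ~/.claude.json.
// Merge policy (keep existing entries, add missing ones) is an assumption.
interface McpServer {
  command: string;
  args: string[];
}

interface ClaudeConfig {
  mcpServers?: Record<string, McpServer>;
  [key: string]: unknown; // all other user settings are preserved untouched
}

const BUNDLED_SERVERS: Record<string, McpServer> = {
  playwright: { command: "npx", args: ["@playwright/mcp@latest"] },
  context7: { command: "npx", args: ["-y", "@upstash/context7-mcp@latest"] },
};

// Returns a new config object plus the names of servers that were added.
function mergeMcpServers(config: ClaudeConfig): { config: ClaudeConfig; added: string[] } {
  const servers: Record<string, McpServer> = { ...(config.mcpServers ?? {}) };
  const added: string[] = [];
  for (const [name, spec] of Object.entries(BUNDLED_SERVERS)) {
    if (!Object.prototype.hasOwnProperty.call(servers, name)) {
      servers[name] = spec;
      added.push(name);
    }
  }
  return { config: { ...config, mcpServers: servers }, added };
}
```

Running the merge twice adds nothing the second time, which is the property you want from a re-runnable installer.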

Supported runtimes: Claude Code, OpenCode

Install scope: Global (~/.claude/, available in all projects) or Local (./.claude/, this project only)

Requirements

Node.js 18+
Claude Code installed

Bundled MCP servers

Both MCP servers are registered automatically in ~/.claude.json when you run npx qaa-agent. No manual setup required — once installed, they're available in every Claude Code project on your machine.

Playwright MCP — live browser control

Uses @playwright/mcp to:

Open a real browser and navigate your running app
Extract actual locators (data-testid, ARIA roles, labels) from live pages
Run E2E tests, capture failures, and auto-fix locator mismatches
Build a persistent Locator Registry (.qa-output/locators/) that caches real locators across features
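Caching only pays off if the registry keeps the most robust locator it has seen for each element. The sketch below models that in memory. The assumptions are a `page::element` key and a "lower tier wins" rule based on the Locator Hierarchy described under Standards Enforced. The on-disk format of `.qa-output/locators/` is not documented here, and `LocatorRegistry` is a hypothetical name.

```typescript
// Hypothetical in-memory model of a locator registry. The key format and the
// "keep the lowest tier" rule are assumptions, not QAA's file format.
type Tier = 1 | 2 | 3 | 4; // 1 = data-testid / ARIA role, 4 = CSS / XPath

interface LocatorEntry {
  selector: string; // e.g. a Playwright locator expression
  tier: Tier;
}

class LocatorRegistry {
  private readonly entries = new Map<string, LocatorEntry>();

  // Key by page + element so features sharing a page reuse cached locators.
  private key(page: string, element: string): string {
    return `${page}::${element}`;
  }

  get(page: string, element: string): LocatorEntry | undefined {
    return this.entries.get(this.key(page, element));
  }

  // Store the entry only if it is more robust (lower tier) than the cached one.
  record(page: string, element: string, entry: LocatorEntry): LocatorEntry {
    const k = this.key(page, element);
    const current = this.entries.get(k);
    if (current === undefined || entry.tier < current.tier) {
      this.entries.set(k, entry);
      return entry;
    }
    return current;
  }

  // Plain object suitable for JSON.stringify when persisting.
  toJSON(): Record<string, LocatorEntry> {
    const out: Record<string, LocatorEntry> = {};
    this.entries.forEach((entry, k) => {
      out[k] = entry;
    });
    return out;
  }
}
```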

Context7 MCP — up-to-date library docs

Uses @upstash/context7-mcp to:

Fetch the latest documentation for Playwright, Cypress, Jest, Vitest, pytest, and any other library the agent is working with
Keep generated tests aligned with current framework APIs instead of outdated training data
Free tier: ~60 requests/hour, ~3,300 tokens/query
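With a quota of roughly 60 requests per hour, a long session can run out of budget. One way to stay under an approximate limit like that is a sliding-window throttle on your own calls. This is a generic sketch: the class name and numbers are illustrative, and nothing here implies that QAA's agents throttle this way.

```typescript
// Hypothetical client-side sliding-window throttle for an approximate
// "60 requests per hour" quota. Illustrative only.
class SlidingWindowLimiter {
  private readonly timestamps: number[] = [];
  private readonly limit: number;    // max requests per window
  private readonly windowMs: number; // window length in milliseconds

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the request if it fits in the current window.
  tryAcquire(now: number = Date.now()): boolean {
    // Drop requests that have aged out of the window.
    while (this.timestamps.length > 0 && now - this.timestamps[0] >= this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}

const docsLimiter = new SlidingWindowLimiter(60, 60 * 60 * 1000);
```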

Verifying the MCPs are connected

Open Claude Code in any project and type /mcp. You should see both playwright and context7 listed as connected.

Manual config (fallback)

If automatic registration fails, add the servers manually to ~/.claude.json:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    },
    "context7": {
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp@latest"]
    }
  }
}
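After editing the file by hand, you can sanity-check it before restarting Claude Code. The sketch below assumes the servers live under the top-level `mcpServers` key, as in the JSON above. It takes the file's text as input (reading from disk is left to the caller) and reports which of the two expected servers are missing or have no `command`. `missingMcpServers` is a hypothetical helper.

```typescript
// Hypothetical check of ~/.claude.json contents for the two bundled servers.
const EXPECTED_SERVERS = ["playwright", "context7"] as const;

function missingMcpServers(configText: string): string[] {
  const parsed: unknown = JSON.parse(configText);
  const servers: unknown =
    typeof parsed === "object" && parsed !== null
      ? (parsed as { mcpServers?: unknown }).mcpServers
      : undefined;
  return EXPECTED_SERVERS.filter((name) => {
    // No mcpServers object at all: everything is missing.
    if (typeof servers !== "object" || servers === null) return true;
    const entry: unknown = (servers as Record<string, unknown>)[name];
    if (typeof entry !== "object" || entry === null) return true;
    // An entry without a string command cannot be launched.
    return typeof (entry as { command?: unknown }).command !== "string";
  });
}
```

An empty result only means the config is well-formed. Use `/mcp` to confirm the servers actually connect.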

Quick Start

New project, no tests

/qa-start --dev-repo ./myproject --auto

Runs the full pipeline end-to-end: scan, map, analyze, plan, generate, validate, and deliver as a draft PR.

Mature project, new feature

/qa-map                                          # build the "brain" (once)
/qa-create-test "password reset"                 # generate tests using codebase knowledge
/qa-pr --ticket PROJ-123 "password reset tests"  # ship as draft PR

From a Jira ticket

/qa-from-ticket https://company.atlassian.net/browse/PROJ-456
/qa-pr --ticket PROJ-456 "login flow tests"

Fix broken tests after a deploy

/qa-fix ./tests/e2e/checkout*
/qa-pr --ticket PROJ-789 "fix checkout tests"

Commands

| Command | Purpose |
|---------|---------|
| /qa-start | Full pipeline end-to-end (scan through PR) |
| /qa-research | Research testing ecosystem via Context7 MCP |
| /qa-map | Deep codebase analysis with 4 parallel agents |
| /qa-create-test <feature> | Generate tests for a specific feature |
| /qa-fix [path] | Diagnose and fix broken tests |
| /qa-audit [path] | 6-dimension quality audit with scoring |
| /qa-pr | Create a draft pull request from QA artifacts |
| /qa-testid [path] | Inject data-testid attributes into components |

Additional Commands

| Command | Purpose |
|---------|---------|
| /qa-from-ticket <url> | Generate tests from a Jira, Linear, or GitHub issue |
| /qa-analyze | Analyze a repo without generating tests |
| /qa-validate [path] | Validate test files against standards |
| /qa-gap | Find coverage gaps between dev and QA repos |
| /qa-report | Generate a QA status report |
| /qa-audit | Full quality audit with weighted scoring |
| /qa-blueprint | Generate QA repo structure from scratch |
| /qa-research | Research best testing stack for a project |
| /qa-pom | Generate Page Object Models |
| /update-test | Improve existing tests incrementally |

Run any command in Claude Code to see full usage and available flags.

Three Workflows

QAA adapts to the project's QA maturity:

Option 1: No QA repo yet — Full pipeline from scratch. Produces a complete test suite, repo blueprint, and draft PR.

/qa-start --dev-repo ./myproject

Option 2: Immature QA repo — Scans both repos, fixes broken tests, fills coverage gaps, standardizes existing tests.

/qa-start --dev-repo ./myproject --qa-repo ./tests

Option 3: Mature QA repo — Surgical additions only. Finds thin coverage areas and adds targeted tests without touching working code.

/qa-start --dev-repo ./myproject --qa-repo ./tests

The "Brain" — Codebase Map

Before generating anything, QAA maps the codebase with 4 parallel agents producing 8 documents:

| Focus | Documents | What they cover |
|-------|-----------|-----------------|
| Testability | TESTABILITY.md, TEST_SURFACE.md | What's testable, entry points, mock boundaries |
| Risk | RISK_MAP.md, CRITICAL_PATHS.md | Business-critical paths, security-sensitive areas |
| Patterns | CODE_PATTERNS.md, API_CONTRACTS.md | Naming conventions, API shapes, import style |
| Existing tests | TEST_ASSESSMENT.md, COVERAGE_GAPS.md | Current quality, frameworks, gaps |

Every downstream agent reads these documents. The result: generated tests feel native to the codebase, not generic boilerplate.
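Downstream agents depend on these documents, so an incomplete map degrades everything after it. Here is a minimal pre-flight check, assuming you have a list of the file names present in the map output directory. The grouping mirrors the table; `missingMapDocuments` is hypothetical, and how QAA actually locates the files is not specified here.

```typescript
// Hypothetical completeness check for the eight codebase-map documents.
const MAP_DOCUMENTS: Record<string, readonly string[]> = {
  testability: ["TESTABILITY.md", "TEST_SURFACE.md"],
  risk: ["RISK_MAP.md", "CRITICAL_PATHS.md"],
  patterns: ["CODE_PATTERNS.md", "API_CONTRACTS.md"],
  existingTests: ["TEST_ASSESSMENT.md", "COVERAGE_GAPS.md"],
};

// Returns missing documents grouped by focus area; an empty object means complete.
function missingMapDocuments(present: readonly string[]): Record<string, string[]> {
  const have = new Set(present);
  const missing: Record<string, string[]> = {};
  for (const [focus, docs] of Object.entries(MAP_DOCUMENTS)) {
    const absent = docs.filter((doc) => !have.has(doc));
    if (absent.length > 0) missing[focus] = absent;
  }
  return missing;
}
```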

Standards Enforced

Every generated artifact follows strict rules:

Testing Pyramid

         /  E2E  \        3-5%   (critical path smoke only)
        /  API    \       20-25% (endpoints + contracts)
       / Integration\     10-15% (component interactions)
      /    Unit      \    60-70% (business logic, pure functions)
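The pyramid bands are percentages of the total test count. For example, a 200-test plan stays inside the E2E band with 6 to 10 E2E tests. The sketch below checks a planned distribution against the bands copied from the diagram. `checkPyramid` is a hypothetical helper, not part of QAA.

```typescript
// Hypothetical checker comparing planned test counts against the pyramid bands.
type Layer = "e2e" | "api" | "integration" | "unit";

// [min, max] percent of the total, copied from the diagram above.
const PYRAMID_BANDS: Record<Layer, [number, number]> = {
  e2e: [3, 5],
  api: [20, 25],
  integration: [10, 15],
  unit: [60, 70],
};

interface BandResult {
  layer: Layer;
  percent: number; // share of the total, rounded to one decimal place
  withinBand: boolean;
}

function checkPyramid(counts: Record<Layer, number>): BandResult[] {
  const total = counts.e2e + counts.api + counts.integration + counts.unit;
  if (total === 0) throw new Error("no tests planned");
  const layers: Layer[] = ["e2e", "api", "integration", "unit"];
  return layers.map((layer) => {
    const percent = (counts[layer] / total) * 100;
    const [min, max] = PYRAMID_BANDS[layer];
    return {
      layer,
      percent: Math.round(percent * 10) / 10,
      withinBand: percent >= min && percent <= max, // compare unrounded value
    };
  });
}
```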

Locator Hierarchy

Tier 1 (Best): data-testid, ARIA roles with accessible names
Tier 2 (Good): Form labels, placeholders, visible text
Tier 3 (Acceptable): Alt text, title attributes
Tier 4 (Last Resort): CSS selectors, XPath — always with a // TODO comment
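The hierarchy maps naturally onto Playwright's built-in locator methods (`getByTestId`, `getByRole`, `getByLabel`, `getByPlaceholder`, `getByText`, `getByAltText`, `getByTitle`). This is a rough sketch of a tier classifier over locator expressions written as strings. String matching is a simplification of what a validator would do on the parsed code. The sketch also does not verify that `getByRole` passes an accessible `name`. `locatorTier` and `needsTodo` are hypothetical.

```typescript
// Hypothetical tier classifier for Playwright-style locator expressions.
type LocatorTier = 1 | 2 | 3 | 4;

const TIER_BY_METHOD: Array<[RegExp, LocatorTier]> = [
  [/^getByTestId\(/, 1],
  [/^getByRole\(/, 1], // simplification: does not check for a name option
  [/^getByLabel\(/, 2],
  [/^getByPlaceholder\(/, 2],
  [/^getByText\(/, 2],
  [/^getByAltText\(/, 3],
  [/^getByTitle\(/, 3],
];

function locatorTier(expression: string): LocatorTier {
  const expr = expression.trim().replace(/^page\./, "");
  for (const [pattern, tier] of TIER_BY_METHOD) {
    if (pattern.test(expr)) return tier;
  }
  return 4; // anything else, e.g. page.locator('.css') or XPath, is last resort
}

// Tier 4 locators must carry a // TODO comment on the preceding line.
function needsTodo(expression: string, precedingLine: string): boolean {
  return locatorTier(expression) === 4 && !precedingLine.includes("// TODO");
}
```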

Page Object Model

One class per page, no god objects
No assertions in POMs — assertions belong in test specs
Locators as readonly properties
Every POM extends a shared BasePage
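Here is a minimal POM that follows these four rules. So the sketch runs without dependencies, `PageLike` and `LocatorLike` are hypothetical structural stand-ins for a small subset of Playwright's `Page` and `Locator`. In a real suite you would import `Page` and `Locator` from `@playwright/test` instead. The `/login` path and the test IDs are made up for illustration.

```typescript
// Hypothetical stand-ins for the subset of Playwright's Locator/Page used here.
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}

interface PageLike {
  goto(url: string): Promise<unknown>;
  getByTestId(testId: string): LocatorLike;
}

// Shared base class: holds the page and common navigation only.
abstract class BasePage {
  protected readonly page: PageLike;
  abstract readonly path: string;

  constructor(page: PageLike) {
    this.page = page;
  }

  async open(): Promise<void> {
    await this.page.goto(this.path);
  }
}

// One class per page, locators as readonly properties, no assertions.
class LoginPage extends BasePage {
  readonly path = "/login";
  readonly emailInput: LocatorLike;
  readonly passwordInput: LocatorLike;
  readonly submitButton: LocatorLike;

  constructor(page: PageLike) {
    super(page);
    this.emailInput = page.getByTestId("login-email");
    this.passwordInput = page.getByTestId("login-password");
    this.submitButton = page.getByTestId("login-submit");
  }

  // Performs the action only; the spec decides what to assert afterwards.
  async login(email: string, password: string): Promise<void> {
    await this.emailInput.fill(email);
    await this.passwordInput.fill(password);
    await this.submitButton.click();
  }
}
```

Keeping `expect` out of `LoginPage` lets one page object serve both a "valid login" spec and an "invalid password" spec, each with its own assertions.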

Assertion Quality

// Good — concrete values
expect(response.status).toBe(200);
expect(data.name).toBe('Test User');

// Bad — never do this
expect(response.status).toBeTruthy();
expect(data).toBeDefined();

Test Case IDs

Every test case has a unique ID following the pattern:

UT-MODULE-001 — unit tests
INT-MODULE-001 — integration tests
API-RESOURCE-001 — API tests
E2E-FLOW-001 — E2E tests
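These IDs can be validated and generated mechanically. This sketch assumes the middle segment uses upper-case letters, digits, and underscores, and that the counter is always three digits. The source shows only examples, not a formal grammar. `isValidTestId` and `nextTestId` are hypothetical helpers.

```typescript
// Hypothetical validator/generator for the test case ID scheme.
type TestIdPrefix = "UT" | "INT" | "API" | "E2E";

const TEST_ID_PATTERN = /^(UT|INT|API|E2E)-([A-Z][A-Z0-9_]*)-(\d{3})$/;

function isValidTestId(id: string): boolean {
  return TEST_ID_PATTERN.test(id);
}

// Next sequential ID for a prefix/segment pair, given the IDs already in use.
function nextTestId(prefix: TestIdPrefix, segment: string, existing: readonly string[]): string {
  let max = 0;
  for (const id of existing) {
    const match = TEST_ID_PATTERN.exec(id);
    if (match && match[1] === prefix && match[2] === segment) {
      max = Math.max(max, Number(match[3]));
    }
  }
  if (max >= 999) throw new Error(`no IDs left for ${prefix}-${segment}`);
  return `${prefix}-${segment}-${String(max + 1).padStart(3, "0")}`;
}
```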

Validation

Generated tests pass through four validation layers, with up to three auto-fix loops:

Syntax — does it parse? Are imports correct?
Structure — POM rules, file organization, naming conventions
Dependencies — all imports resolve, mocks set up correctly
Logic — assertions are concrete, locators follow tier hierarchy
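This is a minimal sketch of the validate and auto-fix loop. It runs every layer, hands any issues to a fixer, and gives up after three fix loops, passing whatever remains on for classification. Only a toy Logic-layer check is implemented: it flags the vague matchers called out under Assertion Quality. The other layers and the fixer are caller-supplied. All names are hypothetical.

```typescript
// Hypothetical validate/auto-fix loop with a toy Logic-layer check.
type LayerName = "syntax" | "structure" | "dependencies" | "logic";
type LayerCheck = (source: string) => string[]; // returns issue messages

// Toy Logic check: flag matchers the assertion rules call vague.
const vagueMatchers: LayerCheck = (source) => {
  const issues: string[] = [];
  for (const matcher of ["toBeTruthy", "toBeDefined"]) {
    if (source.includes(`.${matcher}(`)) issues.push(`vague assertion .${matcher}()`);
  }
  return issues;
};

function runLayers(layers: ReadonlyArray<[LayerName, LayerCheck]>, source: string): string[] {
  const issues: string[] = [];
  for (const [name, check] of layers) {
    for (const message of check(source)) issues.push(`${name}: ${message}`);
  }
  return issues;
}

interface ValidationRun {
  passed: boolean;
  loops: number;       // fix attempts actually made
  remaining: string[]; // issues left over for failure classification
  source: string;
}

function validateWithAutoFix(
  source: string,
  layers: ReadonlyArray<[LayerName, LayerCheck]>,
  fix: (source: string, issues: string[]) => string,
  maxLoops = 3,
): ValidationRun {
  let current = source;
  let issues = runLayers(layers, current);
  let loops = 0;
  while (issues.length > 0 && loops < maxLoops) {
    current = fix(current, issues);
    loops += 1;
    issues = runLayers(layers, current);
  }
  return { passed: issues.length === 0, loops, remaining: issues, source: current };
}
```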

If issues remain, the Bug Detective classifies each failure:

| Classification | Action |
|----------------|--------|
| APPLICATION BUG | Flagged for the developer; not auto-fixed |
| TEST CODE ERROR | Auto-fixed when confidence is HIGH |
| ENVIRONMENT ISSUE | Documented with setup instructions |
| INCONCLUSIVE | Flagged with evidence for manual review |
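The classification table reads as a small decision function. In this sketch, the labels mirror the table with underscores. What happens to a TEST CODE ERROR below HIGH confidence is not stated in the source; routing it to manual review is an assumption. `triage` and the action names are hypothetical.

```typescript
// Hypothetical encoding of the Bug Detective classification table.
type Classification = "APPLICATION_BUG" | "TEST_CODE_ERROR" | "ENVIRONMENT_ISSUE" | "INCONCLUSIVE";
type Confidence = "HIGH" | "MEDIUM" | "LOW";
type TriageAction = "flag-for-developer" | "auto-fix" | "document-setup" | "manual-review";

const BASE_ACTION: Record<Classification, TriageAction> = {
  APPLICATION_BUG: "flag-for-developer", // never auto-fixed: the test may be right
  TEST_CODE_ERROR: "auto-fix",
  ENVIRONMENT_ISSUE: "document-setup",
  INCONCLUSIVE: "manual-review",
};

function triage(classification: Classification, confidence: Confidence): TriageAction {
  if (classification === "TEST_CODE_ERROR" && confidence !== "HIGH") {
    return "manual-review"; // assumed fallback below HIGH confidence
  }
  return BASE_ACTION[classification];
}
```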

Framework Support

QAA auto-detects the project's existing stack and matches it:

Languages: JavaScript/TypeScript, Python, Java, .NET/C#, Go, Ruby, PHP, Rust

Test Frameworks: Playwright, Cypress, Jest, Vitest, pytest, Selenium, and more

Build Tools: Vite, Next.js, Nuxt, Angular, Vue, Webpack, SvelteKit

Git Platforms: GitHub, Azure DevOps, GitLab

Learning System

QAA remembers your preferences across sessions. When you correct it — "use Playwright, not Cypress" or "our branches start with feature/" — it saves the rule permanently to MY_PREFERENCES.md. Every agent reads your preferences before generating output.

Your team's conventions always win over defaults.
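"Conventions win over defaults" amounts to layering saved preferences on top of built-in settings. This sketch assumes MY_PREFERENCES.md holds simple `- key: value` bullet lines. The real file format is not documented here, and `parsePreferences` and `effectiveSettings` are hypothetical.

```typescript
// Hypothetical preference layering: saved preferences override defaults.
// Assumes "- key: value" bullet lines; the real MY_PREFERENCES.md format may differ.
function parsePreferences(markdown: string): Record<string, string> {
  const prefs: Record<string, string> = {};
  for (const line of markdown.split("\n")) {
    const match = /^\s*[-*]\s*([\w.-]+)\s*:\s*(.+?)\s*$/.exec(line);
    if (match) prefs[match[1]] = match[2];
  }
  return prefs;
}

function effectiveSettings(defaults: Record<string, string>, preferencesMd: string): Record<string, string> {
  // The later spread wins, so any saved preference overrides the default.
  return { ...defaults, ...parsePreferences(preferencesMd) };
}
```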

Architecture

qaa-agent/
  agents/          # 12 specialized QA agents
  commands/        # 7 slash commands (user-facing entry points)
  skills/          # 6 reusable skills
  templates/       # 10 artifact templates (output format contracts)
  workflows/       # 7 workflow orchestration specs
  bin/             # Installer and CLI tools
  docs/            # User documentation
  CLAUDE.md        # QA standards (read by every agent)
  .mcp.json        # Playwright + Context7 MCP server config
  settings.json    # Claude Code permissions

Agents

| Agent | Responsibility |
|-------|---------------|
| qa-scanner | Framework detection, file tree scanning |
| qa-codebase-mapper | 4-parallel-agent deep analysis |
| qa-analyzer | Risk assessment, test inventory, pyramid |
| qa-planner | Test case grouping, file assignment |
| qa-executor | Test file, POM, fixture generation |
| qa-validator | 4-layer validation with auto-fix |
| qa-e2e-runner | Browser-based test execution via Playwright MCP |
| qa-bug-detective | Failure classification with evidence |
| qa-testid-injector | data-testid attribute injection |
| qa-project-researcher | Testing stack research |
| qa-discovery | Project discovery |
| qa-pipeline-orchestrator | Pipeline coordination |

Git Workflow

QAA follows strict git conventions:

Branch: qa/auto-{project}-{date} (e.g., qa/auto-shopflow-2026-03-18)
Commits: One per agent stage — qa(scanner): produce SCAN_MANIFEST.md for shopflow
PR: Draft PR with analysis summary, test counts, coverage metrics, validation status
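The branch and commit conventions are simple string templates. This sketch reproduces the two examples above. It assumes the date is the UTC calendar day in YYYY-MM-DD form; the source shows only one example. `qaBranchName` and `qaCommitMessage` are hypothetical helpers.

```typescript
// Hypothetical helpers reproducing the branch and commit naming conventions.
function qaBranchName(project: string, date: Date): string {
  const isoDay = date.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
  return `qa/auto-${project}-${isoDay}`;
}

// One commit per agent stage: qa(<stage>): <summary> for <project>
function qaCommitMessage(stage: string, summary: string, project: string): string {
  return `qa(${stage}): ${summary} for ${project}`;
}
```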

Documentation

All documentation is included in the installed package under docs/, templates/, and CLAUDE.md.

License

MIT