qaclaw
v2.0.13
Autonomous QA agent - runs browser-based tests via MCP
> [!WARNING]
> Early stage. Expect rough edges, breaking changes, and incomplete coverage of edge cases.
Quickstart
npm install -g qaclaw
Or run directly without installing:
That's it. No config files, no test framework, no selectors. Just describe what to test.
How it works
qaclaw runs your instructions through a staged reasoning process:
prompt → Plan → Execute → Recover → Result
                   ↑         |
                   └── Ask ───┘

| Stage | What happens |
|-------|-------------|
| Plan | A preflight model interprets your instructions into discrete steps |
| Execute | The primary model drives the browser, sees the page, and decides what to do |
| Recover | If stuck, it escalates to a fallback model to unblock itself |
| Ask | Only if both models fail does it ask for help, via stdin (CLI) or MCP |
Answers are persisted so the same question is never asked twice.
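The staged flow above can be pictured as a small loop. The following is a minimal sketch, not qaclaw's actual source: `Stages`, `runTest`, the `"done"`/`"stuck"` outcomes, and the question wording are all hypothetical names chosen for illustration, and the models are stubbed as plain functions.

```typescript
// Hypothetical types: none of these names come from qaclaw's source.
type Outcome = "done" | "stuck";

interface Stages {
  plan: (prompt: string) => string[];               // Plan: prompt -> discrete steps
  primary: (step: string, hint?: string) => Outcome; // Execute: primary model
  fallback: (step: string) => Outcome;               // Recover: fallback model
  ask: (question: string) => string;                 // Ask: stdin (CLI) or MCP caller
}

interface RunResult {
  status: "completed" | "failed";
  asked: number; // how many new questions were asked during this run
}

// `answers` stands in for the persisted answer store, so a question that was
// answered in an earlier run is not asked again.
function runTest(prompt: string, s: Stages, answers: Record<string, string>): RunResult {
  let asked = 0;
  for (const step of s.plan(prompt)) {
    if (s.primary(step) === "done") continue;   // Execute
    if (s.fallback(step) === "done") continue;  // Recover
    const question = `How should I proceed with "${step}"?`;
    if (!(question in answers)) {               // Ask, only if both models are stuck
      answers[question] = s.ask(question);
      asked++;
    }
    // Hand the answer back to the primary model and retry the step.
    if (s.primary(step, answers[question]) === "stuck") {
      return { status: "failed", asked };
    }
  }
  return { status: "completed", asked };
}
```

Passing the answer store in as a parameter keeps the sketch testable: running the same prompt twice against the same store asks the question only on the first run.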
MCP configuration
When installed as an MCP server, the calling AI tool (Claude Code, Cursor, etc.) acts as the supervisor. It reads your codebase, understands the product, and answers qaclaw's questions with full context, so no human needs to be in the loop.
Add to your AI tool's MCP config:
{
  "mcpServers": {
    "qaclaw": {
      "command": "npx",
      "args": ["qaclaw@latest"],
      "env": {
        "GOOGLE_API_KEY": "your-key",
        "QA_TARGET_URL": "http://localhost:3100"
      }
    }
  }
}
Works with Claude Code, Cursor, Windsurf, Continue, Open Code, and any other MCP-capable tool.
To run from a local clone instead:
{
  "mcpServers": {
    "qaclaw": {
      "command": "node",
      "args": ["mcp.js"],
      "cwd": "/path/to/qaclaw"
    }
  }
}
CLI usage
qaclaw "Go to /settings, change timezone to PST, verify it shows in the header"
In CLI mode, clarifications are handled interactively via stdin instead of the MCP bridge.
API keys
MCP users: set your key in the MCP config env block shown above.
CLI users: export in your shell profile:
# Pick one provider
export GOOGLE_API_KEY=your-key # google/* models (default)
export ANTHROPIC_API_KEY=your-key # anthropic/* models
export OPENAI_API_KEY=your-key # openai/* models
Configuration
All configuration is via environment variables.
| Variable | Default | Description |
|----------|---------|-------------|
| QA_TARGET_URL | http://localhost:3100 | URL of the app to test |
| QA_MODEL | google/gemini-2.5-pro | Primary model (provider/model) |
| QA_FALLBACK_MODEL | google/gemini-2.5-flash | Fallback model for recovery |
| QA_PLANNER_MODEL | same as fallback | Preflight planner model |
| QA_HEADLESS | true | Set false to watch the browser |
| QA_VIEWPORT_WIDTH | 1920 | Browser viewport width |
| QA_VIEWPORT_HEIGHT | 1080 | Browser viewport height |
| QA_CHROME_PROFILE | (see tips) | Path to a Chrome user data dir |
| QA_CACHE_TTL_DAYS | 7 | Delete stagehand cache entries older than N days |
| QA_CACHE_MAX_MB | 500 | Delete oldest cache entries when total exceeds N MB |
Model format is provider/model-name. The agent picks the right API key based on the provider prefix.
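As an illustration of the prefix rule, the lookup might work roughly like this. This is a sketch under assumptions, not the agent's real code: `resolveModel`, `KEY_VARS`, and the error messages are hypothetical, while the variable names and the default model come from the tables above. The environment is passed in as a plain object so the function has no hidden dependencies.

```typescript
// Provider prefix -> API key variable, as listed in the "API keys" section.
const KEY_VARS: Record<string, string> = {
  google: "GOOGLE_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
};

type Env = Record<string, string | undefined>;

interface ResolvedModel {
  provider: string;
  model: string;
  keyVar: string; // name of the environment variable holding the key
}

// Hypothetical helper: splits QA_MODEL on the first "/" and checks that the
// matching API key is present.
function resolveModel(env: Env): ResolvedModel {
  const spec = env["QA_MODEL"] || "google/gemini-2.5-pro"; // documented default
  const slash = spec.indexOf("/");
  if (slash <= 0 || slash === spec.length - 1) {
    throw new Error(`Expected provider/model-name, got "${spec}"`);
  }
  const provider = spec.slice(0, slash);
  const keyVar = KEY_VARS[provider];
  if (!keyVar) throw new Error(`Unknown provider "${provider}"`);
  if (!env[keyVar]) throw new Error(`${keyVar} is not set`);
  return { provider, model: spec.slice(slash + 1), keyVar };
}
```

In a Node process this would be called as `resolveModel(process.env)`.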
Commands and skills
Create shortcuts for AI tools that support project-level commands:
Run a QA test. Call the `test` MCP tool with $ARGUMENTS as the prompt.
If the response has a `question`, answer it with `respond` or ask the user.
Repeat until status is completed or failed. Report the results.
When asked to run QA tests, use the `test` MCP tool with the user's instructions.
Handle clarifications by calling `respond`. Report pass/fail results.
Other tools: the MCP tool descriptions are self-documenting. Most tools figure out the protocol from the descriptions alone.
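The protocol those instructions describe (call `test`, answer any `question` with `respond`, repeat until a final status) can be sketched as a loop. This is illustrative only: `runQa`, `CallTool`, the argument keys `prompt` and `answer`, and the `maxRounds` guard are assumptions, and the call is kept synchronous for brevity where a real MCP client call would be asynchronous.

```typescript
// Response shape inferred from the README: either a final status or a question.
interface ToolResponse {
  status?: string;
  question?: string;
}

// Hypothetical stand-in for an MCP client's tool-call function.
type CallTool = (name: "test" | "respond", args: Record<string, string>) => ToolResponse;

function runQa(
  prompt: string,
  callTool: CallTool,
  answer: (question: string) => string, // the supervisor: an AI tool or a human
  maxRounds: number = 10                // assumed safety limit, not a qaclaw setting
): ToolResponse {
  let res = callTool("test", { prompt });
  let rounds = 0;
  while (res.status !== "completed" && res.status !== "failed") {
    if (res.question === undefined) {
      throw new Error("Response has neither a final status nor a question");
    }
    if (rounds++ >= maxRounds) throw new Error("Too many clarification rounds");
    res = callTool("respond", { answer: answer(res.question) });
  }
  return res;
}
```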
Architecture
Communication flow
Happy path: test runs without questions:
sequenceDiagram
participant Caller
participant MCP
participant Agent as Runner
Caller->>MCP: test(prompt)
MCP->>Agent: spawn
activate Agent
Note over Agent: plan → execute → audit
Agent-->>MCP: exit(0)
deactivate Agent
MCP-->>Caller: { status: completed }
With clarification: agent gets stuck and needs input:
sequenceDiagram
participant Caller
participant MCP
participant Agent as Runner
Caller->>MCP: test(prompt)
MCP->>Agent: spawn
activate Agent
Note over Agent: gets stuck
Agent-)MCP: writes question
MCP-->>Caller: { question: "..." }
Caller->>MCP: respond(answer)
MCP-)Agent: writes answer
Note over Agent: continues
Agent-->>MCP: exit(0)
deactivate Agent
MCP-->>Caller: { status: completed }
Why this split?
The caller doesn't drive the browser. The agent does.
| Benefit | Detail |
|---------|--------|
| Fire and forget | Send instructions, get results. No browser state management. |
| Model agnostic | Claude, GPT, Gemini, local models, anything that speaks MCP. |
| Autonomous recovery | Handles stuck situations, model escalation, and retries on its own. |
Agent internals
Model escalation
Primary model ──stuck──→ Fallback model ──stuck──→ Ask caller ──answer──→ Primary model
Clarifications
When the agent hits something ambiguous, it asks. Answers are persisted to .qa-agent/clarifications.json and scoped by prompt hash. The same question is never asked twice.
| Scope | Behavior |
|-------|----------|
| Prompt scoped | Tied to a specific test prompt. Only loaded when that exact prompt runs again. |
| Global | Loaded for every test. Useful for shared knowledge like credentials. |
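The two scopes can be illustrated with a small filter. The on-disk format of .qa-agent/clarifications.json is not documented here, so the `Clarification` shape below (an optional `promptHash`, where absence means global) and both function names are hypothetical.

```typescript
// Hypothetical record shape; the real clarifications.json format may differ.
interface Clarification {
  question: string;
  answer: string;
  promptHash?: string; // assumed encoding: undefined means global scope
}

// Select what a run of a given prompt would load: every global entry plus
// the entries scoped to exactly this prompt's hash.
function clarificationsFor(all: Clarification[], promptHash: string): Clarification[] {
  return all.filter((c) => c.promptHash === undefined || c.promptHash === promptHash);
}

// Look up a previously stored answer so the same question is not asked again.
function knownAnswer(all: Clarification[], promptHash: string, question: string): string | undefined {
  for (const c of clarificationsFor(all, promptHash)) {
    if (c.question === question) return c.answer;
  }
  return undefined;
}
```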
Recipes
After a successful run, the agent saves the action sequence as a recipe in .qa-agent/recipes.json. On repeat runs the recipe is injected as suggested steps, the planner is skipped entirely, and matching clarifications are merged in. If the UI has changed and the recipe fails, the agent falls back to exploration automatically.
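The replay-then-fall-back behavior described above might be sketched as follows. This is an assumption-laden illustration: `RecipeStore` (keyed here by the prompt text), `runWithRecipes`, and the `replay`/`explore` callbacks are hypothetical, and the real recipes.json layout may differ.

```typescript
// Hypothetical in-memory stand-in for .qa-agent/recipes.json.
type RecipeStore = Record<string, string[]>;

interface ExploreOutcome {
  ok: boolean;
  steps: string[]; // action sequence the agent ended up taking
}

type RunMode = "recipe" | "exploration" | "failed";

function runWithRecipes(
  prompt: string,
  store: RecipeStore,
  replay: (steps: string[]) => boolean, // try the saved steps, planner skipped
  explore: () => ExploreOutcome         // full plan-and-execute run
): RunMode {
  const recipe = store[prompt];
  if (recipe !== undefined && replay(recipe)) return "recipe";

  // No recipe yet, or the UI changed and the recipe failed: explore instead.
  const outcome = explore();
  if (!outcome.ok) return "failed";
  store[prompt] = outcome.steps; // save the successful sequence for next time
  return "exploration";
}
```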
Caching
Stagehand's built-in caching (.qa-agent/stagehand-cache/) stores LLM responses for identical page states. Combined with recipes and clarifications, this lets repeat runs skip work that was already done, which makes them faster and cheaper.
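The two cache limits from the configuration table (QA_CACHE_TTL_DAYS and QA_CACHE_MAX_MB) suggest a two-pass pruning rule. The sketch below shows one way such a rule could be computed. It is not qaclaw's or Stagehand's implementation: `CacheEntry` and `entriesToDelete` are hypothetical, and treating 1 MB as 1024 × 1024 bytes is an assumption.

```typescript
// Hypothetical description of one cache file.
interface CacheEntry {
  path: string;
  sizeBytes: number;
  modifiedMs: number; // last-modified time, epoch milliseconds
}

// Returns the paths to delete: first everything older than the TTL, then the
// oldest remaining entries until the total size fits under the limit.
function entriesToDelete(
  entries: CacheEntry[],
  nowMs: number,
  ttlDays: number = 7,  // documented default of QA_CACHE_TTL_DAYS
  maxMb: number = 500   // documented default of QA_CACHE_MAX_MB
): string[] {
  const cutoffMs = nowMs - ttlDays * 24 * 60 * 60 * 1000;
  const doomed: string[] = [];
  const kept: CacheEntry[] = [];
  for (const e of entries) {
    if (e.modifiedMs < cutoffMs) doomed.push(e.path);
    else kept.push(e);
  }

  kept.sort((a, b) => a.modifiedMs - b.modifiedMs); // oldest first
  let totalBytes = 0;
  for (const e of kept) totalBytes += e.sizeBytes;

  const limitBytes = maxMb * 1024 * 1024; // assumption: binary megabytes
  for (const e of kept) {
    if (totalBytes <= limitBytes) break;
    doomed.push(e.path);
    totalBytes -= e.sizeBytes;
  }
  return doomed;
}
```

Keeping this as a pure function over file metadata separates the decision from the file-system calls that would actually delete the entries.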
Audit phase
Include expected outcomes in your prompt and a separate agent pass verifies each one:
✅ PASSED: timezone shows PST in the header
❌ FAILED: notification preference still shows "email", expected "slack"
⚠️ UNKNOWN: cannot verify email was sent (requires inbox access)
Tips
Bypassing authentication
Create a dedicated Chrome profile, log in once manually, then point qaclaw at it:
# 1. Create profile and log in
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --user-data-dir="$HOME/Library/Application Support/Google/Chrome-QAClaw" \
  --no-first-run
# 2. Log in to your app, then close the browser
# 3. Set the env var
export QA_CHROME_PROFILE="$HOME/Library/Application Support/Google/Chrome-QAClaw"
Watching the agent
QA_HEADLESS=false qaclaw "your test instructions"
Writing good prompts
- Be specific: "Go to /users, click 'Add User', fill in name 'Test User', click Save, verify 'Test User' appears in the list"
- Include expected outcomes: append "Expected outcome: the user appears with status 'Active'" to trigger the audit phase
- Describe steps in order: the planner handles dependencies
Logs
All output goes to .qa-agent/runner.log. MCP responses are truncated to ~8000 chars. Check the log for the full trace.
Night shift
The night shift agents workflow is an ideal fit. Queue test instructions before you leave for the day, let the agent work through them overnight, and come back to results and a fully populated memory in the morning. Each run gets faster as recipes and clarifications accumulate.
Built with
- Stagehand: AI browser automation framework
