@qagent/cli
v0.4.0
AI-driven end-to-end test runner: prose goals, an LLM drives the browser, and an LLM judge verifies the result.
QAgent
AI-driven end-to-end browser test runner. You write a goal in natural language; a driver LLM picks browser actions, Playwright drives the browser, and a separate LLM judge verifies the result.
Runs interactively for humans (live progress, ✓/✗ summary) or streams JSON events for AI agents like Claude Code (--reporter=ndjson).
Status: pre-1.0, experimental. One inline goal per invocation; multi-goal specs and orchestration are not yet built. Cost scales with snapshot size, driver turns, and verifier calls; --max-turns (default 50) is currently the main spending knob.
Quick Start
Requirements:
- Node.js 20 or newer.
- An OpenRouter API key.
- A Playwright Chromium browser install, or a reachable system Chrome.
npm install -g @qagent/cli
npx playwright install chromium
qagent config set apiKey sk-or-...
qagent config set model qwen/qwen3.5-flash-02-23
qagent "Open https://example.com and verify that the page heading exists"
Output:
▶ Open https://example.com and verify that the page heading exists
1 navigate https://example.com 2.6s
2 done "The page heading 'Example Domain' exists." 2.4s
✓ PASS — The final snapshot confirms the presence of the heading 'Example Domain'.
2 turns · 5.0s · $0.0001
Browser Install
QAgent does not download browsers during npm install. Install Chromium once on each machine or CI image:
npx playwright install chromium
On Linux CI images that are missing browser system libraries, run:
npx playwright install-deps chromium
npx playwright install chromium
If a run fails with a Playwright message like "Executable doesn't exist" or asks you to run playwright install, install Chromium with the commands above and retry. If your machine already has Google Chrome installed, QAgent tries that first and falls back to Playwright's bundled Chromium.
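The Chrome-first, Chromium-fallback behaviour described above can be sketched with Playwright's `channel` launch option. This is a minimal illustration, not QAgent's actual code: `launchWithFallback` is a hypothetical helper, and it takes the `chromium` object as a parameter so the sketch makes no assumption about how QAgent wires Playwright internally.

```javascript
// Hypothetical helper: try system Chrome first, then Playwright's bundled Chromium.
// `chromium` is expected to be the object exported by the `playwright` package.
async function launchWithFallback(chromium, options = {}) {
  try {
    // channel: "chrome" asks Playwright to use the installed Google Chrome.
    return await chromium.launch({ ...options, channel: "chrome" });
  } catch (chromeError) {
    // No usable system Chrome: fall back to the bundled Chromium build.
    // If this also throws (for example "Executable doesn't exist"), the
    // caller still needs to run `npx playwright install chromium`.
    return chromium.launch(options);
  }
}

// Usage (assumes the `playwright` package is installed):
//   const { chromium } = require("playwright");
//   const browser = await launchWithFallback(chromium, { headless: true });
```

Passing `chromium` in keeps the helper easy to exercise with a stub when no browser is installed.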
OpenRouter Setup
OpenRouter is currently the only supported LLM provider. QAgent uses one model for browser-driving actions and, by default, the same model as the LLM judge verifier.
Create an API key at openrouter.ai/keys.
Store it once:
qagent config set apiKey sk-or-...
Pick a model known to the installed pi-ai OpenRouter registry:
qagent config set model qwen/qwen3.5-flash-02-23
Optionally use a different verifier model:
qagent config set verifierModel qwen/qwen3.5-flash-02-23
For CI, prefer env vars over config files:
QAGENT_API_KEY=sk-or-... QAGENT_MODEL=qwen/qwen3.5-flash-02-23 qagent "<goal>"
OPENROUTER_API_KEY is accepted as a fallback for QAGENT_API_KEY. If QAgent says unknown model, choose a model ID supported by the installed package version.
Use Cases
| I want to... | Run |
|---|---|
| Run one goal locally | qagent "<goal>" |
| Stream events to an AI agent | qagent "<goal>" --reporter=ndjson |
| Save a JSON trace file | qagent "<goal>" --reporter=trace |
| Watch the browser | qagent "<goal>" --headed |
Reporters
| Name | Output |
|---|---|
| list (default) | Live human-readable progress with ✓/✗, color, per-turn timing |
| ndjson | One JSON event per turn streamed to stdout, ending with a done envelope |
| json | Single JSON object dumped at the end |
| trace | Writes results/<YYYY-MM-DDTHH-MM>H<HASH>.json (path overridable with --output-dir); confirmation goes to stderr so machine-readable reporters keep stdout clean |
Compose with a comma: --reporter=list,trace. Default is list.
Configuration
QAgent reads from ~/.config/qagent/config.json (user, XDG-style) and ./qagent.config.json (project; only the file in your current working directory, no walk-up).
Resolution order (highest first): CLI flag → env var → project config → user config → built-in default.
qagent config set apiKey sk-or-...
qagent config set --project model anthropic/claude-sonnet-4.5
qagent config list # show effective values + their sources
qagent config --help # all keys, types, defaults, valid values
Recognized keys: model, verifierModel, apiKey, maxTurns, testTimeout, networkTimeout, actionTimeout, reporter, outputDir, headed.
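To make the resolution order concrete, here is a minimal sketch of a highest-priority-first lookup. It is illustrative only and not QAgent's implementation: `resolveSetting`, `LAYER_ORDER`, and the layer objects are hypothetical names, and the sample values are made up.

```javascript
// Illustrative only: resolve one setting across layers, highest priority first.
// The layer names mirror the documented order; the function itself is hypothetical.
const LAYER_ORDER = ["flag", "env", "project", "user", "default"];

function resolveSetting(key, layers) {
  for (const source of LAYER_ORDER) {
    const layer = layers[source] || {};
    // A layer only counts when it actually defines the key.
    if (layer[key] !== undefined) {
      return { value: layer[key], source };
    }
  }
  return { value: undefined, source: null };
}

const resolved = resolveSetting("maxTurns", {
  flag: {},
  env: {},
  project: { maxTurns: 20 },
  user: { maxTurns: 80 },
  default: { maxTurns: 50 },
});
console.log(resolved); // the project config wins over the user config and the default
```

Returning the source alongside the value mirrors what `qagent config list` is described as showing (effective values plus their sources).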
For AI Agents
QAgent is built so a parent agent (Claude Code, CI scripts) can run goals and consume results structurally.
Stable exit codes:
| Code | Meaning |
|---|---|
| 0 | Goal passed |
| 1 | Goal failed (verifier said fail) |
| 2 | Config or setup error (missing key, bad flag, unknown reporter) |
| 3 | Runtime error (browser crash, network) |
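A parent script can branch on these codes directly. The sketch below maps each documented code to its meaning; `classifyExit` is a hypothetical helper, and the retry policy is an assumption of this example rather than anything QAgent prescribes.

```javascript
// Exit-code meanings copied from the table above.
const EXIT_MEANINGS = {
  0: "pass",
  1: "fail",
  2: "config error",
  3: "runtime error",
};

// Hypothetical helper: decide what a CI wrapper should do with a finished run.
function classifyExit(code) {
  const meaning = EXIT_MEANINGS[code];
  if (meaning === undefined) {
    // Anything else (e.g. null when the process was killed by a signal) is undocumented.
    return { meaning: "unknown", retryable: false };
  }
  // Assumption: only runtime errors (browser crash, network) are worth retrying;
  // a failed goal or a bad config will not fix itself on a rerun.
  return { meaning, retryable: code === 3 };
}

// Usage with Node's child_process (assumes qagent is on PATH):
//   const { spawnSync } = require("node:child_process");
//   const run = spawnSync("qagent", ["<goal>", "--reporter=ndjson"], { encoding: "utf8" });
//   console.log(classifyExit(run.status));
```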
ndjson event schema — qagent "<goal>" --reporter=ndjson emits one JSON object per line on stdout. Two event types: turn (one per LLM-driven action during the run) and done (a single final envelope, always last).
// turn event — fields:
{
"event": "turn", // string, always "turn"
"turn": 1, // number, sequential, starts at 1
"atMs": 1594, // number, ms since run start (cumulative)
"action": { // object, the action emitted by the driver LLM
"action": "navigate", // string, one of: navigate | click | fill | wait | done | fail
"url": "https://...", // string (navigate)
"ref": "e6", // string (click | fill — snapshot ref)
"value": "...", // string (fill)
"ms": 1500, // number (wait — requested duration)
"summary": "...", // string (done — driver's natural-language verdict)
"reason": "..." // string (fail — driver's natural-language reason)
},
"target": "Sign in", // string, optional — human label resolved from ref (click | fill)
"url": "https://.../page", // string, page URL after the action
"ms": 180, // number, browser-action duration; absent for ref-miss errors
"error": "ref e87 not in snapshot" // string, present only when the action errored
}
// done event — always the last line on stdout, regardless of outcome:
{
"event": "done", // string, always "done"
"goal": "...", // string, the input goal verbatim
"outcome": "pass", // string, one of: pass | fail | error (matches exit code 0 | 1 | 3)
"evidence": "...", // string, the verifier's one-sentence rationale (always present)
"turns": 2, // number, total LLM turns executed
"elapsedMs": 4933, // number, total wall time
"driverCost": 0.0001, // number, USD — driver (executor) LLM only
"verifierCost": 0.00003, // number, USD — verifier LLM only (0 if verifier didn't run)
"totalCost": 0.00013, // number, USD — driverCost + verifierCost
"driverTokens": 1424, // number, driver total tokens (input + output, incl. cache)
"verifierTokens": 320, // number, verifier total tokens (0 if verifier didn't run)
"totalTokens": 1744, // number, driverTokens + verifierTokens
"finalUrl": "https://...", // string
"warnings": [] // string[], may include verifier-fallback notices; often empty
}
A done event is emitted even on outcome: fail and outcome: error. The envelope shape is stable; only outcome and evidence differ.
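As a rough sketch of consuming this stream from Node, the function below splits captured stdout into turn events and the final done envelope. `parseRun` is a hypothetical helper, and the sample lines are made-up values shaped like the schema above, not real QAgent output.

```javascript
// Hypothetical consumer: split captured stdout into turn events and the done envelope.
function parseRun(stdoutText) {
  const events = stdoutText
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));

  const turns = events.filter((e) => e.event === "turn");
  const done = events.find((e) => e.event === "done") || null;
  // Turns whose action errored carry an `error` string.
  const erroredTurns = turns.filter((t) => typeof t.error === "string");
  return { turns, done, erroredTurns };
}

// Sample input shaped like the schema above (values are invented for illustration).
const sample = [
  '{"event":"turn","turn":1,"atMs":1594,"action":{"action":"navigate","url":"https://example.com"},"url":"https://example.com","ms":180}',
  '{"event":"turn","turn":2,"atMs":2100,"action":{"action":"click","ref":"e87"},"url":"https://example.com","error":"ref e87 not in snapshot"}',
  '{"event":"done","goal":"demo","outcome":"fail","evidence":"...","turns":2,"elapsedMs":4933,"driverCost":0.0001,"verifierCost":0.00003,"totalCost":0.00013,"driverTokens":1424,"verifierTokens":320,"totalTokens":1744,"finalUrl":"https://example.com","warnings":[]}',
].join("\n");

const run = parseRun(sample);
console.log(run.done.outcome, run.turns.length, run.erroredTurns.length); // prints: fail 2 1
```

Because the done envelope keeps the same shape for every outcome, a consumer like this only needs to branch on `done.outcome`.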
Pipe-friendly recipes:
qagent "<goal>" --reporter=ndjson | jq -c . # consume the full event stream
qagent "<goal>" --reporter=ndjson | tail -1 | jq -r .outcome # just pass / fail / error
qagent "<goal>" --reporter=ndjson,trace # stream + persist trace file
Stderr stays clean: only the trace reporter writes its path confirmation there, so piping stdout into jq always works.
CI tips
- Pass the API key via env var (QAGENT_API_KEY or OPENROUTER_API_KEY). Avoid --api-key sk-or-... on argv (visible in ps and most CI job logs) and avoid qagent config set apiKey ... in CI scripts (writes to ~/.config/qagent/config.json on the runner, where it leaks across cached or shared workers).
- Tune the wall-clock budget. --test-timeout caps the loop in seconds (default 300 = 5 min); the verifier still runs against whatever state the loop left behind, so the run terminates with a real verdict instead of hanging. Wrap with timeout(1) only as a belt-and-braces backstop:
  qagent "<goal>" --test-timeout=600 --reporter=ndjson
  timeout 11m qagent "<goal>" --test-timeout=600 --reporter=ndjson # hard kill if even the verifier hangs
- Browsers don't auto-install. Run npx playwright install chromium once per runner image. On minimal Linux images, run npx playwright install-deps chromium first.
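The first two tips can be combined in a small Node wrapper that passes the key through the environment and never places it on argv. This is a sketch under stated assumptions: `buildInvocation` is a hypothetical helper, and `CI_OPENROUTER_SECRET` in the usage comment is a made-up name for wherever your CI exposes the secret. Only flags and environment variables documented in this README are used.

```javascript
// Hypothetical helper: build a qagent invocation that keeps the key out of argv.
function buildInvocation(goal, { apiKey, model, testTimeoutSeconds = 300 }, baseEnv = process.env) {
  if (!apiKey) {
    throw new Error("apiKey is required (pass it from the CI secret store)");
  }
  // The key travels in the environment, not in the argument list.
  const args = [goal, `--test-timeout=${testTimeoutSeconds}`, "--reporter=ndjson"];
  const env = { ...baseEnv, QAGENT_API_KEY: apiKey };
  if (model) {
    env.QAGENT_MODEL = model;
  }
  return { command: "qagent", args, env };
}

// Usage (assumes qagent is on PATH; CI_OPENROUTER_SECRET is a made-up variable name):
//   const { spawnSync } = require("node:child_process");
//   const { command, args, env } = buildInvocation("<goal>", {
//     apiKey: process.env.CI_OPENROUTER_SECRET,
//     testTimeoutSeconds: 600,
//   });
//   // The 11-minute timeout plays the role of the timeout(1) backstop.
//   const result = spawnSync(command, args, { env, encoding: "utf8", timeout: 11 * 60 * 1000 });
```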
Philosophy
- Two-stage: a driver LLM picks the next action; a judge LLM verifies the end-state. Browser tools (click, fill, navigate) are deterministic Playwright calls.
- No spec files yet — one inline goal per invocation.
- No classes, no folders, no TypeScript. Functions and modules.
- OpenRouter only for now: select supported OpenRouter model IDs via the model config key.
CLI Reference
qagent [options] "<goal>"
qagent config <subcommand> [args]
qagent --help | --version
Run options:
--model <id> LLM model
--verifier-model <id> Verifier model (defaults to --model)
--api-key <key> OpenRouter key
--max-turns <n> Turn cap (default 50)
--test-timeout <s> Wall-clock loop budget in seconds; verifier still runs after (default 300)
--network-timeout <s> Per page.goto + post-action networkidle wait, in seconds (default 30)
--action-timeout <s> Per click/fill in seconds; doubles as blocked-element detector (default 2)
--reporter <list> Comma-separated: list,json,ndjson,trace (default list)
--output-dir <path> Where trace files land (default results/)
--headed Show the browser window
Config subcommands:
qagent config set [--project] <key> <value>
qagent config list
qagent config --help
Environment:
QAGENT_API_KEY, OPENROUTER_API_KEY, QAGENT_MODEL
QAGENT_TEST_TIMEOUT, QAGENT_NETWORK_TIMEOUT, QAGENT_ACTION_TIMEOUT (seconds)
BASIC_AUTH_USER, BASIC_AUTH_PASS (per-page httpCredentials)
Resolution: flag > env > project > user > built-in default
Exit codes: 0 pass | 1 fail | 2 config error | 3 runtime error
Issues
Bug reports and feature requests welcome on GitHub Issues.
License
MIT.
