tester-h
v0.2.5
Published
tester-h — run an H QA agent against your web app from the command line.
Downloads
604
Maintainers
Readme
tester-h
████████╗███████╗███████╗████████╗███████╗██████╗ ██╗ ██╗
╚══██╔══╝██╔════╝██╔════╝╚══██╔══╝██╔════╝██╔══██╗ ██║ ██║
██║ █████╗ ███████╗ ██║ █████╗ ██████╔╝█████╗███████║
██║ ██╔══╝ ╚════██║ ██║ ██╔══╝ ██╔══██╗╚════╝██╔══██║
██║ ███████╗███████║ ██║ ███████╗██║ ██║ ██║ ██║
╚═╝ ╚══════╝╚══════╝ ╚═╝ ╚══════╝╚═╝ ╚═╝ ╚═╝ ╚═╝
QA for the web, driven by a vision agent.tester-h is a QA agent in your terminal. Hand it a URL and a sentence; it drives a real browser, verifies the rendered product with a vision model, runs deterministic SEO / accessibility / link / hygiene audits on every page it touches, and answers with a structured pass / fail report listing every issue and its severity.
Built on H Company's Holo3 vision-language model. Runs locally out of the box.
npx tester-h --url https://your-staging.example.com \
"Sign up flow works; check SEO and a11y while you're there" trajectory 6f5e0d…
· navigate: https://your-staging.example.com
· audit_summary: 2 issues — [high] seo.meta_description.missing, [medium] a11y.img_missing_alt
· click: Sign up
· type: [email protected]
· click: Submit
· read_text("h1") → "Welcome, foo"
────────── findings ──────────
[CRITICAL] FAIL html_hygiene/http.status.404 — Page returned HTTP 404.
[HIGH] FAIL seo/seo.meta_description.missing — No <meta name="description">.
[HIGH] FAIL a11y/a11y.form_control_no_name — 2 form controls without an accessible name.
[MEDIUM] WARN seo/seo.canonical.missing — No <link rel="canonical"> declared.
✗ FAIL (12 actions)Exit 0 on pass, 1 on fail.
Why
QA agents that "look at a screenshot and decide" miss too much. tester-h splits the work into two lanes that play to their strengths:
- Vision decides what only humans can: is the CTA the brand red, is the layout broken, does the hero image render, is the focus ring visible.
- Deterministic audits decide everything else: does the page have a meta description, are images missing alt text, are any links 404, is the heading hierarchy sane, is there mixed content.
The text on the page is never OCR'd from pixels — the agent reads it straight from the DOM. The verdict you get back is a structured JSON object listing every check that ran and every issue with a severity, so it pipes cleanly into CI, dashboards, or another agent.
Install
Requires Node.js 20+.
npm install -g tester-hFirst install pulls a Chromium binary via Playwright (~150 MB). If it fails behind a proxy or in CI, run it manually:
npx playwright install chromiumWhere everything lives
Everything tester-h reads or writes sits under a single global folder:
~/.tester-h/
├── config.json # auth (api_key, agent_id) — mode 0600
├── tester-h.yaml # defaults (url, viewport, timeout, …)
└── trajectories/ # recorded runs
├── signup.json
├── signup-mobile.json
└── checkout.jsonNo per-project files, no surprises about which directory wins. tester-h init
creates the folder once. Override with TESTER_H_CONFIG_DIR=/some/path if you
genuinely need a different location.
Authenticate
Get a key from hcompany.ai, then:
tester-h login
# › paste your key (hidden)The key is written to ~/.tester-h/config.json with mode 0600. You can
also pass --api-key, set HAI_API_KEY, or put defaults in
~/.tester-h/tester-h.yaml.
Run a test
tester-h --url https://staging.example.com \
"On /pricing, the primary CTA is brand red and clicking it lands on /signup"The agent navigates, observes, acts, and answers. Every page it lands on gets audited automatically. The output ends with a verdict line and a sorted findings table.
Useful flags:
--url <url> Where to start (default http://localhost:3000)
--device <name> mobile | tablet | desktop
--viewport <WxH> Override viewport size
--headed Show the browser
--max-steps <n> Hard cap on agent turns (default 60)
--timeout <s> Wall-clock timeout (default 600)
--base-url <url> Self-host Holo3? Point here (OpenAI-compatible /v1)
--model <id> Override the model id
--json Emit NDJSON events on stdout
--debug Stream every event for troubleshootingRecord once, replay forever
Add --record <name> to save a hybrid trajectory. Every click captures
the element that was clicked — a stable Playwright selector
([data-testid], role=…[name="…"], text=…) — alongside the pixel
coordinates and the agent's semantic intent. Replay across viewports and
browsers without re-recording.
Bare names land in .tester-h/trajectories/; pass a path with / to write
elsewhere.
# Record (writes .tester-h/trajectories/signup.json)
tester-h --url http://localhost:3000 --record signup \
"Sign up flow works end-to-end"
# Replay by name (looks in .tester-h/trajectories/)
tester-h replay signup
# Strict replay — primitives only, no fallback
tester-h replay signup --url https://staging.example.com --strict
# Different viewport — clicks still land because they're element-keyed
tester-h replay signup --viewport 390x844
# Different browser
tester-h replay signup --browser firefoxWhen a semantic-fallback recovery happens (e.g. mobile UI hides nav behind a
burger menu), the recovered primitive sequence is written back into the
trajectory under a per-(browser × viewport) key — so the next replay of
that exact combo skips the fallback. Other variants stay untouched. Use
--no-update to disable the write-back.
Replays exit 0 on full success including recoveries, 1 otherwise. Pair
with --json for machine-parseable per-step status.
Browser × viewport matrix
One trajectory, every combination, one command:
tester-h replay signup.json --matrixDefault matrix is 3 browsers × 3 viewports = 9 runs:
1920×1080 1280×720 mobile
chromium ✓ pass ✓ pass ✓ pass
firefox ✓ pass ✓ pass ✗ fail
webkit ✓ pass ✓ pass ✓ pass
✗ 1/9 combinations failedCustomise the axes:
tester-h replay signup.json --matrix \
--browsers chromium,webkit \
--viewports 1920x1080,390x844First time you use firefox or webkit, install the binaries:
npx playwright install firefox webkitWhat the audit covers
Run automatically on every page navigation. Available standalone via the
agent's dom_audit tool.
| Category | Checks |
| -------------- | ------------------------------------------------------------------------------------------------- |
| seo | title, meta description, canonical, viewport, charset, html lang, h1 count, Open Graph, JSON-LD, noindex |
| a11y | missing alt text, unlabeled form controls, button names, empty anchors, duplicate ids, heading-skip |
| links | HEAD-fetches every <a href> and reports 4xx / 5xx / timeouts |
| html_hygiene | doctype, mixed content, inline event handlers, dangling anchors |
| console | errors and unhandled rejections captured during the page session |
Login walls are respected. When a page returns 401 or 403 the audit is
skipped and no findings are raised — gated pages are expected, not defects.
404 / 5xx do raise a critical finding.
Verdict format
Every run ends with a structured JSON block at the bottom of the agent's
reply. With --json, the end event carries it on stdout.
{
"verdict": "fail",
"summary": "Sign-up flow works but the form has a11y issues and the page is missing core SEO meta.",
"checks_run": [
"behavior.signup_redirect",
"visual.cta_color",
"seo.title",
"seo.meta_description",
"a11y.form_control_no_name",
"links.checked"
],
"findings": [
{
"id": "a11y.form_control_no_name",
"category": "a11y",
"severity": "high",
"status": "fail",
"message": "2 form controls without an accessible name.",
"evidence": "<input name=\"email\" type=\"email\">, <input name=\"password\" type=\"password\">",
"remediation": "Associate each control with a <label for=\"id\">, wrap it in <label>, or set aria-label."
}
]
}Severity: critical · high · medium · low · info.
Pipe through jq:
tester-h --json --url https://example.com "Verify the homepage" \
| tail -1 | jq '.findings[] | select(.severity=="high")'Use from Claude Code (MCP)
tester-h mcp runs an MCP server on stdio. Wire it into Claude Code:
claude mcp add tester-h -- tester-h mcpThen in Claude Code:
Use tester-h to verify the signup flow on http://localhost:3000 — both behavior and a11y — and save the trajectory so I can replay it.
Tools exposed:
run_qa(url, instruction, …)— full agent run; returns the verdict + findingsreplay(trajectory_path, …)— deterministic replay with semantic fallbackvisual_check(url, question, …)— one-shot screenshot + vision question
Use from any agent framework (A2A)
tester-h serve starts an A2A-compatible
HTTP server.
tester-h serve --a2a-port 18794
# AgentCard at http://127.0.0.1:18794/.well-known/agent.jsonSend a task via JSON-RPC message/send with the URL on the first line and
the scenario after.
Self-host the model
The hosted H Models API is the default. To run Holo3 on your own hardware, point at any OpenAI-compatible endpoint:
tester-h --url http://localhost:3000 \
--base-url http://localhost:8000/v1 \
--model Hcompany/Holo3-35B-A3B \
"Verify the cart total matches the line items"35B fits on a recent MacBook Pro at Q4; 122B (--model Hcompany/Holo3-122B-A10B)
gives the best quality but needs a beefier rig. vLLM and llama.cpp both work
— see holo-desktop for production-grade
launch scripts.
Cloud agent (AGP)
If you have an agent deployed on the H platform, route through AGP instead of running the loop locally:
tester-h --cloud --agent-id <your-agent-id> \
--url https://staging.example.com \
"Sign up flow works"--cloud (or --agp) takes over; the local Holo3 loop is skipped. Setting
--agent-id alone implies --cloud.
Project config
tester-h init drops a tester-h.yaml. CLI flags override file values;
${HAI_API_KEY} expands from env.
api_key: ${HAI_API_KEY}
url: http://localhost:3000
timeout: 600
viewport: 1920x1080Troubleshooting
Failed to launch Chromium→npx playwright install chromiumAuthentication failed→tester-h whoami, thentester-h loginNo verdict produced→ re-run with--debug, or raise--max-stepsCtrl-C doesn't stop right away→ give it a beat; the CLI cancels cleanly
Exit codes
| Code | Meaning |
| ---- | ---------------------------------------------- |
| 0 | Agent reported pass |
| 1 | fail, unknown verdict, timeout, or run error |
License
Proprietary — see LICENSE. © H Company.
