npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

aidlc-testagent

v0.6.0

Published

Open-source AI test agent for web, desktop, API & mobile. Point it at a target, approve the plan, get a passing test suite. No telemetry.

Readme

aidlc-testagent

npm license

Point it at a target, approve the plan, get a passing test suite back.

An open-source AI test agent for web, Electron desktop, REST API, and mobile targets. Grounded on your project's existing requirements, manual test cases, and business rules, it proposes a test pipeline, lets you confirm it, then generates, executes, and self-heals real, committable Playwright specs.

  • 🧠 One shared agent loop, surface-agnostic: explore → plan → [confirm] → generate → execute (+stability) → heal.
  • 🕹️ Manual explore mode: navigate the app yourself while the agent watches, auto-snapping each DOM-idle state — then reuse the recorded session for plan/run without re-opening the browser.
  • 🔌 Pluggable adapters behind one contract. Phase 1 ships playwright-web and playwright-electron (shared driver + perception).
  • 📚 Context-grounded: human intent (requirements → manual tests → business rules) outranks source code; intent/implementation disagreements are surfaced as findings, not silently coded around.
  • 🙋 Human-in-the-loop: a confirmation gate shows the proposed pipeline and writes plan.md before any code is generated.
  • 🧪 Reliability built in: auth/session reuse, a flaky-test stability gate, and a cost budget guard are first-class.
  • 🔐 No telemetry. No model API keys. See Security & Privacy.

Phase 1 (this build): Web + Electron desktop + the reliability core. REST and iOS adapters, the traceability report, --diff mode, and Claude Code skill / MCP packaging arrive in later phases (see Roadmap).


No telemetry

aidlc-testagent collects and transmits no usage data of any kind — no analytics SDK, no phone-home, by default or otherwise. It also stores no model API keys: all reasoning is delegated to a locally logged-in LLM CLI (e.g. claude), whose auth and billing are entirely its own responsibility. This is a stated product promise, not just a default.

How it works

                ┌─────────────────────────────────────────────┐
                │              CORE (shared brain)              │
                │  orchestrator → planner →[CONFIRM]→ generator │
                │       → executor (+stability gate) → healer   │
                └───────────────────┬───────────────────────────┘
                                    │  ADAPTER CONTRACT
        ┌───────────────┬───────────┼───────────┬───────────────┐
        ▼               ▼           ▼           ▼               ▼
  playwright-web  playwright-    rest-api    appium-ios     (future:
                  electron       (Phase 2)   (Phase 4)       android, …)

The core never imports Playwright or any surface library — it talks only to an adapter (which owns perception and driving) and to an LlmProvider (which shells out to a local CLI). Swapping the reasoning engine is a one-line config change.

Quickstart

# 1. Install
npm install
npx playwright install chromium

# 2. Make sure a local LLM CLI is logged in (default: Claude Code)
claude --version        # https://docs.claude.com/claude-code

# 3. Configure interactively (or hand-write testagent.config.yaml)
npx ata config

# 4. Propose a plan (no code generated)
npx ata plan todomvc

# 5. Run the full loop: plan → confirm → generate → execute → heal
npx ata run todomvc

# 6. Gate CI: run every target, PASS/FAIL table, non-zero exit on failure
npx ata validate

(ata is the short alias for the aidlc-testagent binary.)

Scoped runs

ata run saucedemo --feature docs/requirements/checkout.md   # one requirement file
ata run saucedemo --scope checkout                          # a declared feature
ata run saucedemo --diff main                               # flows near a git diff (Phase 3)
ata run todomvc --yes                                       # skip the confirmation prompt
ata run todomvc --dry-run                                   # generate only, skip execution
ata run todomvc --plan generated/todomvc/plan.json          # generate from an approved/edited plan
ata run todomvc --reuse                                     # skip plan+generate, run existing specs

Manual explore — navigate once, plan many times

Drive the browser yourself — the agent turns it into a test.

Navigate your app freely while ata watches silently. Every DOM idle and every click is auto-snapped with a meaningful name (click "Generate" — studio). Pin checkpoints, record use-case flows, preview each step's screenshot. Hit Done, name the session, and the agent writes the test doc.

# Step 1 — Open the browser, navigate freely, click "✅ Done" when finished.
ata explore myapp --manual --headed

# Step 2 — Plan and run using the recorded session. No browser reopened.
ata plan myapp --reuse-perception
ata run  myapp --reuse-perception --yes

Record once, run many times — including in CI.

--manual also works inline on plan and run:

ata plan myapp --manual --headed          # explore → plan in one shot
ata run  myapp --manual --headed --yes    # explore → plan → generate → execute

Auto step naming

Every step is named automatically from the DOM — no typing required as you navigate:

  • Click actions — prefixed with the element label: click "Save" — Edit Profile
  • DOM context — falls back in order: open dialog title → page heading → URL path segment → page title

In-browser toolbar

| Button | What it does | | --- | --- | | 📌 Checkpoint | Lists all captured steps. Select one, name it, optionally mark as common precondition (e.g. "after-login"). Saved to checkpoints/<name>.json. Each step shows a 👁 eye button to preview its screenshot. | | 🎬 Use case | Pick a start step and name the flow, navigate through it, then click 🏁 End. The agent calls the LLM and produces a markdown manual-test doc under use-cases/<name>.md. | | 🌐 N | Live counter of XHR/fetch calls captured so far. Click to open a popup table: method, URL path, status (colour-coded), round-trip time. Shows the last 50 entries, newest first. | | ✅ Done | Opens the review panel. |

Done → review panel

  1. Test case name (optional) — name the entire session as a use case. The agent generates a markdown test doc covering all steps.
  2. Step list — every captured step with an editable name input and its URL path.
  3. 👁 eye toggle — dim by default; click to preview that step's screenshot full-screen (icon turns blue). Click again, press ESC, or click the dark background to close.
  4. Output paths info block — shows exactly where perception.json, HAR file(s), and use-case docs will land. The HAR filename updates live as you type the test case name.
  5. Keep exploring to continue, ✅ Save & Done to save everything and write all artefacts.

Saved artefacts

| Path | Description | | --- | --- | | generated/<target>/perception.json | Full journey (accessibility tree + step names). Reuse with --reuse-perception. | | generated/<target>/<target>-<name>.har | HAR 1.2 file for each use case (XHR/fetch only, sliced by step timestamps). Falls back to <target>-explore.har when no use cases are defined. | | generated/<target>/use-cases/<name>.md | LLM-generated markdown test doc for each recorded use case. | | generated/<target>/checkpoints/<name>.json | Named checkpoints (common preconditions). | | .auth/<target>.json | Browser storageState captured after auth. Generated specs load it automatically. |

Target-level config:

explore:
  strategy: manual          # auto (default) | manual
  idle_timeout_ms: 2000     # ms of DOM quiet before auto-snap (default 2000)

auth:
  strategy: reuse-state     # load session saved by the last explore run
  store_state: .auth/myapp.json

Configuration

A root testagent.config.yaml plus one file per target. Credentials are never in YAML — credentials_env names the env vars to read at run time.

# testagent.config.yaml
version: 1
env: staging-only          # refuse non-staging/non-public hosts (allowlist enforced)
llm:
  provider: claude-cli     # shell out to a locally logged-in CLI
  model: claude-sonnet-4-6
  bare: true               # reproducible CI runs
defaults:
  max_heal_attempts: 2
  approval: prompt         # prompt | auto | manual-edit
  max_budget_usd: 2.00     # cost guard per target run; 0 = unlimited
  stability:
    runs: 3                # run each new test N times; accept only if all pass
    quarantine: true       # flaky tests → quarantine, not the green suite
targets:
  - include: targets/public/*.target.yaml
  - include: targets/private/*.target.yaml   # gitignored; optional
# targets/public/saucedemo.target.yaml
name: saucedemo
adapter: playwright-web
url: https://www.saucedemo.com
auth:
  strategy: form
  credentials_env: SAUCE_USER, SAUCE_PASS    # read from env, never YAML
  store_state: .auth/saucedemo.json          # reusable session (gitignored)
context:                                     # grounding, in trust order
  requirements: [docs/requirements/checkout.md]
  manual_tests: [test-cases/checkout.md]
scope:
  feature: checkout
success:
  min_scenarios: 3
  must_pass: true
# output_dir: ../other-repo/tests/ata-generated   # write specs to an external repo
# surface_guide: "Custom Playwright guidance…"    # override LLM surface prompt

Don't want to learn the schema? Run ata config — it auto-detects your stack (Electron dep → playwright-electron, OpenAPI file → rest-api, else playwright-web; prefills context globs from docs/requirements/, test-cases/, docs/domain/), prompts for the rest, validates with the same Zod schema the loader uses, and never overwrites files silently.

CLI

| Command | What it does | | --- | --- | | ata explore <target> | Observe the target and save perception.json for reuse (supports --manual) | | ata validate | Run all targets, print a PASS/FAIL table, exit non-zero on failure (CI gate) | | ata plan <target> | Propose a pipeline + write plan.md (never generates) | | ata run <target> | plan → confirm → generate → execute → heal | | ata list | List configured targets and adapters | | ata config [add\|target <name>\|show] | Interactive wizard / inspect effective config | | ata ask <prompt> | Ask the AI a question about config or setup (answers in plain language) | | ata guide | Print a step-by-step getting-started reference | | ata report <target> | Requirement → test traceability matrix (Phase 2) |

Flags: --feature <file>, --scope <name>, --diff <ref>, --yes, --plan <file>, --dry-run, --reuse, --manual, --reuse-perception, --headed, --refresh-auth, -c/--config <path>.

Approval: an interactive TTY prompts to confirm/edit/abort; --yes (or approval: auto) skips it; CI / non-TTY auto-approve so pipelines never hang on stdin. plan never generates.

Definition of done (per target)

A target PASSES when it produced ≥ min_scenarios scenarios, the generated suite executes green (must_pass) and survives the stability gate (no accepted test is flaky), any healing stayed within max_heal_attempts, and the run stayed within max_budget_usd. ata validate aggregates all targets and exits non-zero if any required target fails.

Generated output

Each run writes to generated/<target>/:

  • perception.json — recorded journey (multi-step accessibility snapshots). Reuse with --reuse-perception.
  • plan.md — the human-readable, editable plan (review it; re-run with --plan).
  • plan.json — the machine plan the generator consumes.
  • tests/*.spec.ts + pages/*real, committable @playwright/test specs.
  • checkpoints/<name>.json — named checkpoints captured during manual explore (common preconditions).
  • use-cases/<name>.md — LLM-generated manual-test docs for each recorded use case.
  • <target>-<name>.har — HAR 1.2 network log, one per use case (or <target>-explore.har when no use cases). Import into Chrome DevTools or Insomnia to inspect API calls.

Whether to commit generated/ is your call (it's gitignored by default). Recommendation: commit for public targets, ignore for private.

Security & Privacy

  • No telemetry, ever (see above).
  • No model API keys in the repo — reasoning is delegated to a local CLI.
  • env: staging-only is enforced in code: production-looking hosts are refused unless explicitly allowlisted (allow_hosts) or env: any.
  • Public repo = public targets only. Internal apps, staging URLs, binaries, and flows live in targets/private/ (gitignored). Credentials come from env vars / CI secrets, never YAML; saved sessions live under .auth/ (gitignored).
  • Action guardrails (Phase 1 stub): destructive verbs (delete/pay/send) are flagged/blocked by a conservative default policy + config hook; a full policy engine comes later.

Architecture notes

  • src/adapters/adapter.ts — the contract (interface + shared types). The most important file; everything depends on it.
  • src/core/ — orchestrator, planner, pipeline (confirm gate), generator, executor (stability gate), healer, budget, and the provider-agnostic LLM client. core/ never imports Playwright and depends only on the LlmProvider interface.
  • src/config/ — Zod schema + loader + grounding-context reader + the wizard.
  • src/auth/ — credential resolution (env), session storage, reuse.

Roadmap

| Phase | Scope | | --- | --- | | 1 (this build) | Web + Electron + reliability core (auth, stability gate, cost guard, grounding, scope, wizard, CLI, CI) + manual explore mode + ata explore command | | 2 | REST API adapter (OpenAPI → request+assertion tests) + requirement→test traceability report | | 3 | --diff scoped mode + Claude Code skill / MCP packaging | | 4 | iOS (appium-ios, simulator) | | 5+ | Android, native desktop, vision-fallback perception, self-eval harness, action-guardrail policy engine |

Development

npm run typecheck   # tsc --noEmit (strict)
npm run lint        # eslint
npm run build       # compile to dist/

License

MIT — see LICENSE.