
bangonit

v0.5.17

AI-powered E2E testing tool

Bang On It!

Bang On It! bangs on your apps so you don't have to.

Bang On It! replaces annoying manual QA and flaky end-to-end tests with a CLI-friendly AI agent that launches a real browser, reads your test plan, and executes it autonomously — clicking, typing, navigating, and verifying everything works.

Demo video

Quick Start


# Run a single test
npx bangonit run --plan \
  "Go to localhost:3000, login as [email protected] (password: 12345), \
  and click all the buttons in the dashboard and make sure they work"

# Run a suite of test plans
npx bangonit run testplans/*.md --concurrency 3

# Record a video of your test run in the recordings/ directory
npx bangonit run testplans/*.md --record

# Initialize your project with full CI integration and S3 video recording storage
npx bangonit init

Where it fits

  1. Local development — Replace the "click around and see if it works" step. Write a test plan once, run it every time you change something. Faster feedback than manual testing, and you don't forget to check the edge cases.

  2. CI, instead of unit tests — Run your test plans on every PR. Your actual app, in a real browser, doing real things. If the tests pass, it works. If they don't, it doesn't. That's the only thing that matters.

  3. Staging gate before prod — Point Bang On It! at your staging environment and run the full suite before promoting to production. Catch regressions where they matter: in an environment that looks like the real thing.

Why This Matters

As Simon Willison aptly put it, the job of an engineer is to deliver code you have proven to work. AI coding agents have solved the writing part. It's the proving-it-works part that's unsolved.

Today, two things gate that process: testing and code review. Both are slow, expensive, and breaking right now.

The amount of AI-generated code is exploding. Coding agents are shipping real PRs today, and the volume is only going up. Code review cannot scale to match this. You can't 10x the volume of PRs and expect the same number of humans to review them thoughtfully. Inevitably, code review as we know it is going away, and the fastest teams today have already figured that out.

That leaves testing as the last line of defense. If you're not reading every line, you need confidence that the code works — that it does what it claims, in a real browser, with real user interactions.

And let's be honest about unit tests: they don't prove much. Unit tests prove that your function returns the right value when you pass it the right mock. They don't prove your app works. Integration tests and end-to-end tests are the only real proof that your shit actually works — that a user can click a button, fill out a form, and get the result they expect.

The problem is that traditional E2E testing is its own bottleneck. Writing Selenium or Playwright tests is slow, maintaining them is painful, and they break whenever the UI changes. Nobody writes enough of them, and the ones that exist are often flaky and ignored.

Bang On It! removes that bottleneck. You describe what to test in plain English. An AI agent launches a real browser and executes the test — clicking buttons, filling forms, navigating pages, and verifying results. No selectors, no page objects, no flaky waits. Tests that take minutes to write instead of hours, and that don't break when you rename a CSS class because the agent interprets the website just like a real user would.

The loop is simple: agent writes code, agent tests code, human reviews the test plans. And because test plans are plain English, anyone on the team — PMs, designers, QA — can write and review them, not just engineers. Bang On It! is the testing layer for that loop.

Bang On It! vs...

vs Claude Code with computer use / browser automation MCP

Claude Code can drive a browser via Anthropic's computer use or third-party browser MCP servers. The approach is similar in spirit — an AI agent interacting with a real browser using natural language instructions. But Claude Code is a general-purpose coding agent, not a testing tool. It can do browser testing, the same way Playwright can, but it wasn't built for it.

  • Purpose-built browser tooling. Claude Code's computer use takes full-screen screenshots and uses pixel coordinates for every interaction — it's controlling a generic desktop, not a browser specifically. Bang On It! uses the browser's accessibility tree for element identification (fast, text-based, no image processing needed) and only falls back to screenshots when visual verification is required (charts, colors, layout). This hybrid approach is dramatically faster — a text snapshot is returned in milliseconds vs. capturing, transmitting, and processing a full screenshot on every single action.
  • Batched actions. Claude Code executes one browser action per tool call — click, wait for response, screenshot, next action. Bang On It! batches multiple actions into a single tool call (navigate + click + type + observe) and only captures page state at the end. Fewer round-trips to the model means faster test execution.
  • Parallel test execution. Claude Code runs one browser session at a time. Bang On It! is a real test runner that can run N agents in parallel (--concurrency N), each with its own isolated browser partition — separate cookies, localStorage, and session state. A 10-test suite runs in the time of 2 tests, not 10.
  • Session recordings. Bang On It! records every test run as a self-contained HTML replay with a multi-agent timeline view — video clips, console logs, and tool invocations all synced together. Share a link, not a screenshot. Claude Code has no recording capability.
  • Real-time UI. Bang On It! includes a live observation UI where you can watch agents execute in real time — see the cursor move, watch pages load, monitor progress across parallel agents. Claude Code outputs text to a terminal.
  • CI-native. Bang On It! generates GitHub Actions workflows out of the box, handles headless execution, uploads results as artifacts, and optionally pushes recordings to S3. Wiring Claude Code into CI for browser testing is a DIY project.
  • Realistic input simulation. Claude Code's computer use moves the mouse in straight lines and types text as a single string. Bang On It! drives the browser through CDP Input.dispatchMouseEvent and Electron sendInputEvent — mouse movements follow eased Bézier-style curves, keystrokes fire individual keyDown/char/keyUp events with randomized 30–100ms delays. This catches hover states, drag interactions, debounced inputs, and event listeners that only trigger on real input events.

Claude Code is an excellent coding agent. But for the specific job of testing a web app in a browser — fast, in parallel, with recordings and a UI — Bang On It! is purpose-built for exactly that.

vs OSS browser automation (Playwright, Selenium, Cypress)

These are powerful browser automation libraries — but they're libraries, not testing tools. You write code: selectors, page objects, explicit waits, retry logic, assertion helpers. When the UI changes, your selectors break and you're back in the maintenance treadmill.

Development speed:

  • No test code. Test plans are plain English Markdown files. A checkout flow test is 5 lines of English, not 50–100 lines of code. PMs, designers, and QA testers can write and review test plans without learning a framework or writing a line of code.
  • No selectors. Bang On It! uses the browser's accessibility tree to identify elements, the same way a screen reader does. No CSS selectors or XPath expressions to break when you rename a class or restructure your markup.
  • No page objects. The AI agent interprets the page semantically on every interaction. There's no abstraction layer to keep in sync with your UI.

Debuggability:

  • Session recordings. Every test run can be recorded as a self-contained HTML replay with video clips, console logs, and tool invocations all synced on a timeline. Share a link, not a log file.
  • Real-time UI. Watch agents execute live — see the cursor move, watch pages load, monitor progress across parallel agents. Playwright gives you a trace viewer after the fact; Bang On It! lets you watch in real time.
  • Real browser. Bang On It! launches actual Chromium via Electron. Multi-tab, cross-origin, OAuth popups, file downloads — all work naturally. Cypress runs in an iframe and can't test these natively.

De-flaking:

  • No explicit waits. Bang On It! tracks real network activity via Chrome DevTools Protocol. It knows when the page is idle — no waitForSelector, no sleep(2000), no polling.
  • Self-correcting execution. If a button moves, changes label, or is behind a modal, the agent adapts. A Playwright script just fails.
  • Realistic input simulation. These tools dispatch events synthetically — element.click() and element.value = "text" bypass the browser's input pipeline entirely. Bang On It! sends real input events through CDP — eased mouse curves, per-character keystroke events with natural timing, real mouseWheel events for scrolling. This catches hover menus that don't open, inputs that don't validate on blur, drag-and-drop that doesn't work, and custom components that listen for native events.

These tools are great for building custom browser automation. But if the goal is testing, you're writing and maintaining a lot of infrastructure that Bang On It! eliminates entirely.

vs Commercial AI record-and-replay

These tools use AI to help you create and maintain traditional selector-based tests — typically through record-and-replay with smart locators that auto-heal when elements move. They reduce maintenance, but the underlying model is still the same: a recorded script of UI interactions replayed deterministically.

  • Generative, not recorded. Bang On It! doesn't record and replay a fixed script. The AI agent reads your test plan and decides how to execute it on each run. If the UI changes, the agent figures out the new path — it doesn't try to heal a stale recording.
  • No vendor lock-in. Test plans are Markdown files in your repo. No proprietary test format, no cloud dashboard required, no per-seat pricing. You own your tests.
  • Runs locally and in CI. These tools are typically cloud-hosted services. Bang On It! runs on your machine or in your own CI pipeline. Your app never leaves your network.
  • Understands intent, not just actions. A recorded test replays "click the third button in the sidebar." Bang On It! executes "verify the user can navigate to settings" — if the settings link moves from the sidebar to a top nav, the test still passes.

vs Manual QA

Manual QA catches things automated tests miss — but it doesn't scale, it's slow, and humans get tired. The same tester clicking through the same flow for the 50th time will miss things.

Bang On It! runs the same tests with the same thoroughness every time, in parallel, in seconds. Write the test plan once, run it on every PR. And because test plans are plain English, your QA team can write them directly — translating their domain knowledge into automated tests without waiting on engineering. Keep manual QA for exploratory testing where human judgment matters; let Bang On It! handle the repetitive verification.

Performance and realistic simulation

Two technical advantages that cut across all comparisons:

Batched actions with parallel agents. Bang On It! executes multiple actions per tool call in a single batch — navigate, click, type, and observe in one round-trip instead of one-action-at-a-time. Combined with --concurrency N to run multiple agents in parallel (each with its own isolated browser session), a full test suite finishes in a fraction of the wall-clock time. A 10-test suite at --concurrency 5 runs in roughly the time of 2 tests, not 10.

Real input events, not DOM hacks. Most testing tools dispatch events synthetically — calling element.click() or element.value = "text" directly in the DOM. This skips the browser's actual input pipeline, which means hover states don't trigger, drag-and-drop doesn't work, debounced inputs behave differently, and event listeners attached to mousedown/mousemove/mouseup never fire.

Bang On It! drives the browser through CDP Input.dispatchMouseEvent and Electron sendInputEvent:

  • Mouse movement follows eased Bézier-style curves with distance-proportional speed, triggering every mousemove, mouseenter, and mouseover handler along the path.
  • Clicks fire the full mousedown → mouseup → click sequence at real coordinates, with proper clickCount for double-clicks.
  • Typing sends individual keyDown → char → keyUp events per character with randomized 30–100ms delays between keystrokes, triggering input, change, and keypress handlers exactly as a human typist would.
  • Scrolling dispatches real mouseWheel events instead of calling scrollTo.

This catches an entire class of bugs that synthetic-event tools miss: broken drag-and-drop, hover menus that don't open, inputs that don't validate on blur, custom components that listen for native events.

Install

npm install -g bangonit

The package provides three aliases: bangonit, bang-on-it, and boi.

Getting started

# Launch the interactive UI
boi run

# Run a specific test plan
boi run testplans/checkout-flow.md

# Run an inline test
boi run --plan "Go to my-app.com, sign up with a test account, and verify the dashboard loads"

Set up a project

# Creates config, test plan directories, and optionally GitHub Actions CI
boi init

# Run all test plans
boi run

# Run just the smoke tests
boi run testplans/smoke/

# Filter test plans by name
boi run -t checkout

Config File

Run boi init to create a .bangonit/config.toml in your project root:

testplans = "testplans"
# recordings_dir = "recordings"  # default
# anthropic_api_key = "${ANTHROPIC_API_KEY}"

# Optional: upload recordings to S3 (or any S3-compatible provider)
[s3]
bucket = "my-recordings"
region = "us-east-1"
prefix = "bangonit"
# endpoint = "nyc3.digitaloceanspaces.com"  # for DigitalOcean Spaces, MinIO, etc.
# access_key = "${AWS_ACCESS_KEY_ID}"
# secret_key = "${AWS_SECRET_ACCESS_KEY}"

All fields are optional. The config is loaded from .bangonit/config.toml in the current directory (or any parent up to the repo root) by default, or from a custom path with --config <path>.

Any string value supports ${ENV_VAR} interpolation, so you can reference environment variables without committing secrets.

  • testplans — directory of .md test plan files. When set, boi run auto-discovers plans. Without it, boi run launches the interactive UI.
  • recordings_dir — where session recordings are written (default: recordings).
  • [s3] — optional S3 upload for recordings. Works with any S3-compatible provider — set endpoint for DigitalOcean Spaces, Backblaze B2, MinIO, etc.

How It Works

You write test plans in plain English (or Markdown). Bang On It! spins up a real Chromium browser and an AI agent that:

  1. Reads your test plan
  2. Navigates to the target site
  3. Interacts with the page — clicks buttons, fills forms, navigates around
  4. Observes the results via DOM snapshots and screenshots
  5. Reports pass/fail with a summary of what happened

Test Plans

Test plans are Markdown files. boi init creates a recommended two-tier structure:

testplans/
  smoke/          # is the app alive? — run on every push/PR
    homepage.md
    login.md
  acceptance/     # does the app do what it should? — core user journeys
    checkout.md
    onboarding.md
  regression/     # did we break something? — bug fixes, edge cases
    issue-123.md

  • Smoke tests should be quick and focused — verify critical paths still work.
  • Acceptance tests cover core user journeys and happy paths.
  • Regression tests lock down bug fixes and edge cases so they don't recur.
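
For illustration, a hypothetical smoke plan (say, testplans/smoke/homepage.md) can be just a few lines:

```markdown
---
name: Homepage loads
---

## Steps

1. Navigate to http://localhost:3000
2. Verify the page loads without errors and the main heading is visible
3. Verify the primary navigation links are present
```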

Smoke tests run on every commit. The daily full run discovers everything recursively (smoke + acceptance + regression). Run boi run testplans/smoke/ to run just the smoke tests. Here's an example test plan:

---
name: Add Todos
retries: 1
---

## Steps

1. Navigate to http://localhost:3000
2. Verify the page loads with a heading and input field
3. Type "Buy groceries" and press Enter
4. Verify the todo appears in the list
5. Verify the footer shows "1 item left"

The --- frontmatter is optional. name sets the display name, retries enables auto-retry on failure. You can also set retries for all tests via --retries N on the CLI (frontmatter takes precedence).

Filtering Tests

Run a subset of test plans by name (requires testplans set in .bangonit/config.toml):

# Only run test plans with "checkout" in the filename
boi run -t checkout

# Equivalent
boi run --filter checkout

Project-Level System Prompt

You can customize the AI agent's behavior per-project by creating a .bangonit/system_prompt.sh script. This script is executed before each test run, and its stdout becomes the project-level system prompt. boi init creates one for you.

#!/bin/bash
# .bangonit/system_prompt.sh
echo "The app is running on http://localhost:${DEV_SERVER_PORT}"

Environment variables are available for interpolation, making it easy to pass dynamic values like server ports or base URLs.
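
Because the prompt is just stdout, the script can branch on environment and add project-specific guardrails. A slightly fuller sketch — DEV_SERVER_PORT and TEST_USER_EMAIL are hypothetical variables your dev environment might export, not ones Bang On It! defines:

```shell
#!/bin/bash
# .bangonit/system_prompt.sh — hypothetical sketch; adapt names to your project.
# Fall back to sensible defaults when the variables aren't exported.
PORT="${DEV_SERVER_PORT:-3000}"
echo "The app under test runs at http://localhost:${PORT}."
echo "Log in with the seeded account ${TEST_USER_EMAIL:-test@example.com} when a test requires authentication."
echo "Never use the 'Delete account' button — the database is shared across test runs."
```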

Claude Code Skills

boi init installs two Claude Code skills into your project:

  • /test — Run tests locally. Pass file paths, directories, or a filter (e.g. /test testplans/smoke/, /test -t login).
  • /create-test — Generate a new test plan from a description (e.g. /create-test user can reset their password). Reads your codebase to write accurate steps and places the file in the right directory.

Auto-generated Tests from Git Changes

Use --since to automatically generate test plans from your recent commits:

# Test changes since the last 3 commits
boi run --since HEAD~3

# Test changes since a tag
boi run --since v1.0.0

# Test changes from the last hour
boi run --since "1 hour ago"

# In CI: test changes in a PR
MERGE_BASE=$(git merge-base origin/main HEAD)
boi run --since "$MERGE_BASE" --headless --exit

Bang On It! analyzes the git diff, uses an LLM to identify user-facing changes, and generates targeted test plans. Changes that don't affect user behavior (refactoring, comments, CI config) produce no tests.

Session Recordings

Record test runs with --record. Each recording produces a self-contained HTML replay viewer. Configure where they're saved with recordings_dir in your config, and optionally upload to S3 via the [s3] config section.

CLI Reference

Usage: boi <command> [options]

Commands:
  run [files...] [options]   Run test plans (or launch interactive UI)
  init                       Set up config, test plans, and optionally CI

Run options:
  -t, --filter <text>        Filter test plans by name substring
  --config <path>            Path to config file (default: .bangonit/config.toml)
  --plan <text>              Inline test plan (instead of file)
  --since <ref>              Auto-generate test plans from git changes since a commit/tag/time

  --additional-system-prompt <text>  Additional system prompt text appended to test plan
  --record                   Record session replay
  --retries <n>              Retry failed tests N times
  --headless                 Run without showing the browser window
  --exit                     Exit immediately after tests complete
  --keep-open                Keep the browser window open after tests pass
  --json                     Stream NDJSON events to stdout
  --console                  Forward browser console logs to stdout
  --output <file>            Write JSON results to file
  --concurrency <n>          Number of parallel agents (default: 1)
  --timeout <seconds>        Test timeout in seconds (0 = none)
  --help                     Show this help message

In CI environments, --headless and --exit default to true automatically.

CI Usage

boi init optionally generates two GitHub Actions workflows:

  • Smoke tests (bangonit-smoke.yml) — runs testplans/smoke/ on every push and PR
  • Full tests (bangonit-full.yml) — runs all test plans daily at 6pm local time

In CI, --headless and --exit are enabled automatically.

Multiple Tests

Run multiple test plans in parallel:

boi run testplans/login.md testplans/checkout.md testplans/search.md --concurrency 3

Each test gets its own browser session with isolated cookies and storage.

Requirements

  • Node.js >= 18
  • ANTHROPIC_API_KEY — set via environment variable, .env file, or anthropic_api_key in .bangonit/config.toml
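
For example, a minimal .env file in the project root (placeholder key shown, never commit a real one):

```
ANTHROPIC_API_KEY=sk-ant-api03-xxxxxxxx
```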

License

MIT