runly-cli
v0.3.0
Playwright tests in plain English. Lives in your repo. Runs anywhere.
Runly
Plain-English browser tests for AI agents. MCP-native. Lives in your repo.
npm install -g runly-cli
runly test "open google.com and search playwright"
No dashboard. No subscription. No signup.
What's new in v0.3
- MCP server: Claude Code, Cursor, and Windsurf can drive browsers directly through Runly
- runly import: convert any Playwright .spec.ts / .spec.js file to plain English
- @use modules: reusable test flows with variable overrides
- verify ai: … assertions: LLM-backed semantic checks
- {{faker.*}} expressions: synthetic test data without fixtures
Why Runly
Most browser testing tools today fall into one of two camps:
Camp 1: Raw Playwright/Cypress. You write 20 lines of boilerplate for every test. Selectors break whenever the markup changes. Debugging is an ordeal.
Camp 2: Enterprise SaaS. $500/month minimum. Tests live in their cloud dashboard, not your repo. You can't grep them, diff them, or review them in a PR.
Runly is Camp 3: tests-as-code in plain English, driven by your AI.
| Feature | Raw Playwright | Enterprise SaaS | Runly |
|---|---|---|---|
| Tests in your repo | ✓ | ✗ | ✓ |
| Plain English | ✗ | ✓ | ✓ |
| Self-healing with AI | ✗ | ✓ | ✓ (AI mode) |
| Semantic AI assertions | ✗ | ✓ | ✓ (AI mode) |
| Reusable modules | ✓ | ✓ | ✓ |
| Callable from Claude Code / Cursor (MCP) | ✗ | ✗ | ✓ |
| Imports existing Playwright tests | n/a | ✗ | ✓ |
| Free & open source | ✓ | ✗ | ✓ |
| Works offline | ✓ | ✗ | ✓ (without AI mode) |
| Git-diffable | ✓ | ✗ | ✓ |
| Pipe to jq | ✗ | ✗ | ✓ |
Use with Claude Code, Cursor, Windsurf (MCP)
Runly ships as a Model Context Protocol server. Add it once to your agent's config and your IDE can run browser tests on demand.
~/.claude.json:
{
  "mcpServers": {
    "runly": { "command": "runly", "args": ["mcp"] }
  }
}
Then in Claude Code:
"Use runly_open_url to smoke-test https://staging.myapp.dev."
Claude calls Runly, the browser opens, and the result streams back as a tool response.
Six tools exposed over MCP
| Tool | Does |
|---|---|
| runly_test | Run one English instruction ("open x.com and verify login visible") |
| runly_run_file | Run a .runly file or directory, with tag/grep filters |
| runly_list_tests | Discover .runly files under a path without running them |
| runly_inspect | Find DOM elements on a page by English description |
| runly_open_url | Quick URL health check (status + title + duration) |
| runly_last_run | Fetch the most recent run's metadata |
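Under the hood, an MCP client invokes these tools with JSON-RPC 2.0 tools/call requests, and over the stdio transport each message is a single line of JSON. The sketch below builds such a request for runly_open_url. The argument name url is an assumption for illustration, not Runly's documented input schema; a client would read the real schema from tools/list.

```typescript
// Sketch of the JSON-RPC 2.0 message an MCP client sends to call a tool.
// The "url" argument name for runly_open_url is a hypothetical example;
// check the tool's advertised inputSchema (via tools/list) for the real one.

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// The stdio transport frames each message as one line of JSON,
// so the serialized message must not contain embedded newlines.
function frameForStdio(message: ToolCallRequest): string {
  return JSON.stringify(message) + "\n";
}

const request = buildToolCall(1, "runly_open_url", { url: "https://staging.myapp.dev" });
const line = frameForStdio(request);
console.log(line.trimEnd());
```

A real client completes the MCP initialize handshake before sending any tools/call; that step is omitted here.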
Migrate from Playwright in 60 seconds
Already have a .spec.ts suite? Convert it:
runly import tests/e2e/
Runly rewrites each page.goto, page.click, page.fill, and expect(…).toBeVisible call (and similar calls) into plain-English .runly lines. describe / test nesting is preserved as tags. Lines it can't translate are flagged # TODO (unconverted), so nothing is dropped silently.
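The rewrite rules above can be pictured as a line-by-line translator. The sketch below is a hypothetical, regex-based illustration that covers only the call shapes used in the example that follows; it ignores describe nesting and is not Runly's actual importer (which lives in src/importer/). Anything it doesn't recognize becomes a # TODO (unconverted) line, mirroring the behavior described above.

```typescript
// Hypothetical mini-importer: maps a few Playwright call shapes to .runly lines.
// A real importer would use a proper parser (e.g. the TypeScript compiler API);
// regexes keep this sketch short.

type Rule = { pattern: RegExp; toRunly: (m: RegExpMatchArray) => string | null };

const rules: Rule[] = [
  // test('name', async ({ page }) => {   ->   @name: name
  { pattern: /^test\(\s*['"](.+?)['"]\s*,/, toRunly: (m) => `@name: ${m[1]}` },
  // closing brace of the test body produces no output
  { pattern: /^\}\);?$/, toRunly: () => null },
  // await page.goto('URL');   ->   open URL
  { pattern: /^await page\.goto\(['"](.+?)['"]\);?$/, toRunly: (m) => `open ${m[1]}` },
  // await page.getByPlaceholder('X').fill('V');   ->   type V in X
  {
    pattern: /^await page\.getByPlaceholder\(['"](.+?)['"]\)\.fill\(['"](.*?)['"]\);?$/,
    toRunly: (m) => `type ${m[2]} in ${m[1]}`,
  },
  // await page.getByRole('button', { name: 'Login' }).click();   ->   click Login button
  {
    pattern: /^await page\.getByRole\(['"](\w+)['"],\s*\{\s*name:\s*['"](.+?)['"]\s*\}\)\.click\(\);?$/,
    toRunly: (m) => `click ${m[2]} ${m[1]}`,
  },
  // await expect(page).toHaveURL(/fragment/);   ->   verify url contains fragment
  {
    pattern: /^await expect\(page\)\.toHaveURL\(\/(.+?)\/\);?$/,
    toRunly: (m) => `verify url contains ${m[1]}`,
  },
];

function importSpec(source: string): string[] {
  const out: string[] = [];
  for (const raw of source.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("import ")) continue; // skip blanks and imports
    const rule = rules.find((r) => r.pattern.test(line));
    if (!rule) {
      out.push(`# TODO (unconverted) ${line}`); // never drop a line silently
      continue;
    }
    const converted = rule.toRunly(line.match(rule.pattern)!);
    if (converted !== null) out.push(converted);
  }
  return out;
}
```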
For example, this .spec.ts:
import { test, expect } from '@playwright/test';
test('standard user logs in', async ({ page }) => {
  await page.goto('https://www.saucedemo.com/');
  await page.getByPlaceholder('Username').fill('standard_user');
  await page.getByPlaceholder('Password').fill('secret_sauce');
  await page.getByRole('button', { name: 'Login' }).click();
  await expect(page).toHaveURL(/inventory/);
});
…becomes this .runly:
@name: standard user logs in
open https://www.saucedemo.com/
type standard_user in Username
type secret_sauce in Password
click Login button
verify url contains inventory
30-Second Demo
1. Write a test in plain English and save it as tests/login.runly:
@name: Login flow works
@tags: critical, auth
open {{site}}/login
type {{user}} in username
type {{pass}} in password
click sign in button
verify dashboard is visible
2. Run it:
runly run tests/ --vars '{"site":"https://app.acme.dev","user":"admin","pass":"secret"}'
3. See results:
──────────────────────────────────────────────────────────
Suite Results 1 passed, 0 failed, 3829ms total
──────────────────────────────────────────────────────────
1. ✓ Login flow works 3829ms [critical,auth]
ALL PASSED 1/1 tests
4. Commit the test file to your repo. Review it in PRs, grep it, and diff it like any other code.
Installation
npm install -g runly-cli
npx playwright install chromium
runly doctor # verify setup
That's the full install. No accounts. No config files. No cloud services.
Quick Start
One-off test
runly test "open google.com and search playwright"
Test files in your repo
my-app/
├── src/
└── tests/
    ├── login.runly
    ├── checkout.runly
    └── smoke.runly
runly run tests/ # all tests
runly run tests/ --tag critical # filter by tag
runly run tests/ --parallel 4 # run 4 at a time
runly run tests/login.runly # single file
runly run tests/ --grep "login|auth" # filter by content
Interactive REPL
runly # drops into a shell with history, tab-completion,
      # and a persistent browser pool for instant reruns
In GitHub Actions
# .github/workflows/test.yml
on: [push, pull_request]   # example triggers (every workflow needs an on: key)
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ritikpal1122/Runly@v1
        with:
          tests: tests/
Pipe to jq / scripts
# NDJSON output for scripting
runly run tests/ --json | jq '.results[] | select(.success == false)'
The .runly Format
Plain text. One instruction per line. Metadata lines start with @. Comments start with #.
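To make those rules concrete, here is a minimal parser sketch. It assumes metadata lines have the form @key: value, keeps any other @ line (such as @use) in order with the steps so it can be expanded later, and splits a file into separate tests on lines containing only --- (the separator described under "Multiple tests in one file"). It is an illustration, not Runly's parser.

```typescript
// Hypothetical sketch of a .runly file parser (not Runly's implementation).
interface RunlyTest {
  meta: Record<string, string>; // @name, @tags, @timeout, @retry, ...
  steps: string[];              // instructions (and non-metadata @ directives) in order
}

const META_LINE = /^@(\w+):\s*(.*)$/; // e.g. "@timeout: 30"

function parseRunly(source: string): RunlyTest[] {
  const tests: RunlyTest[] = [];
  let current: RunlyTest = { meta: {}, steps: [] };

  const flush = () => {
    // Only keep blocks that actually contain something.
    if (current.steps.length > 0 || Object.keys(current.meta).length > 0) tests.push(current);
    current = { meta: {}, steps: [] };
  };

  for (const raw of source.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue; // blank lines and comments
    if (line === "---") { flush(); continue; }          // test separator
    const meta = line.match(META_LINE);
    if (meta) current.meta[meta[1]] = meta[2];
    else current.steps.push(line); // plain instruction, or a directive like "@use ..."
  }
  flush();
  return tests;
}

// Helper: "@tags: auth, critical" -> ["auth", "critical"]
function tagsOf(t: RunlyTest): string[] {
  return (t.meta.tags ?? "").split(",").map((s) => s.trim()).filter((s) => s !== "");
}
```

Note that "@use login.runly with …" does not match @key: value (there is no colon right after the key), so it stays in steps.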
# Tests the entire signup flow
@name: Signup flow works
@tags: auth, critical, signup
@timeout: 30
@retry: 2
open {{site_url}}/signup
type {{email}} in email field
type {{password}} in password field
click create account button
verify welcome message is visible
verify url contains /dashboard
Multiple tests in one file
Separate with ---:
@name: Homepage loads
open example.com
verify Example Domain
---
@name: Title is correct
open example.com
verify title contains Example
Reusable modules (@use)
Keep shared flows DRY. login.runly:
@name: Login module
open {{site}}/login
type {{user}} in username field
type {{pass}} in password field
click sign in button
checkout.runly:
@name: Checkout
@tags: critical, e2e
@use login.runly with user=admin pass=secret site=https://staging.app
click add to cart for the premium plan
click checkout button
verify order confirmation is visible
@use inlines the module's steps with your variable overrides applied, and detects cycles, such as two modules that @use each other.
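One way to implement that inlining is a recursive expansion that substitutes the with overrides into the module's steps and tracks the chain of modules currently being expanded, so a cycle raises an error instead of recursing forever. This sketch assumes modules are already loaded into a map keyed by file name, that override values contain no spaces, and that a module's own metadata lines (like @name) are not inlined; these are illustrative choices, not Runly's documented behavior.

```typescript
// Hypothetical sketch of @use expansion with cycle detection.
// modules: file name -> raw lines of that module (already read from disk).
type Modules = Map<string, string[]>;

const USE_LINE = /^@use\s+(\S+)(?:\s+with\s+(.*))?$/;

// "user=admin pass=secret" -> { user: "admin", pass: "secret" }
function parseOverrides(text: string | undefined): Record<string, string> {
  const vars: Record<string, string> = {};
  for (const pair of (text ?? "").split(/\s+/).filter(Boolean)) {
    const eq = pair.indexOf("=");
    if (eq > 0) vars[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return vars;
}

// Replace {{key}} placeholders that have an override; leave the rest for later.
function applyOverrides(line: string, vars: Record<string, string>): string {
  return line.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (whole, key: string) =>
    Object.prototype.hasOwnProperty.call(vars, key) ? vars[key] : whole,
  );
}

function expandUses(lines: string[], modules: Modules, chain: string[] = []): string[] {
  const out: string[] = [];
  for (const line of lines) {
    const use = line.trim().match(USE_LINE);
    if (!use) { out.push(line); continue; }
    const [, file, withText] = use;
    if (chain.includes(file)) {
      throw new Error(`@use cycle: ${[...chain, file].join(" -> ")}`);
    }
    const body = modules.get(file);
    if (!body) throw new Error(`@use: unknown module ${file}`);
    const vars = parseOverrides(withText);
    // Skip the module's own metadata (e.g. "@name: Login module") and comments.
    const steps = body.filter((l) => !/^@\w+:/.test(l.trim()) && !l.trim().startsWith("#"));
    // Expand nested @use lines first, with this module added to the chain.
    for (const step of expandUses(steps, modules, [...chain, file])) {
      out.push(applyOverrides(step, vars));
    }
  }
  return out;
}
```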
AI assertions (verify ai: …)
When exact text matching is too brittle, ask the LLM to make the call:
open https://shop.example/cart
verify ai: the cart total equals the sum of all item prices plus tax
verify ai: a promo banner with this month's discount is visible
verify ai: no error messages are shown anywhere on the page
On each verify ai: step, Runly snapshots the DOM, asks Claude for a strict pass/fail verdict with reasoning, and logs the evidence.
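A strict verdict is easiest to enforce when the model is asked for a fixed, machine-readable reply and anything else counts as a failure. The sketch below assumes a hypothetical JSON reply of the form {"pass": boolean, "reasoning": string} and a size-capped DOM snapshot; the prompt wording, the size limit, and the reply format are illustrative assumptions, and the actual Claude API call is left out.

```typescript
// Hypothetical sketch of the prompt/verdict plumbing around a `verify ai:` step.
// Only building the prompt and parsing the reply are shown; the model call is omitted.

interface Verdict { pass: boolean; reasoning: string }

const MAX_DOM_CHARS = 20_000; // assumed budget to keep the prompt small

function buildAssertionPrompt(assertion: string, domSnapshot: string): string {
  const dom = domSnapshot.length > MAX_DOM_CHARS
    ? domSnapshot.slice(0, MAX_DOM_CHARS) + "\n<!-- truncated -->"
    : domSnapshot;
  return [
    "You are checking a web page against an assertion.",
    `Assertion: ${assertion}`,
    'Reply with JSON only, exactly: {"pass": true|false, "reasoning": "..."}',
    "Page DOM:",
    dom,
  ].join("\n");
}

// Strict parsing: anything that is not the expected JSON shape is a failure,
// so an ambiguous or chatty reply can never turn into a false pass.
function parseVerdict(reply: string): Verdict {
  try {
    const data: unknown = JSON.parse(reply.trim());
    if (
      typeof data === "object" && data !== null &&
      typeof (data as Record<string, unknown>).pass === "boolean" &&
      typeof (data as Record<string, unknown>).reasoning === "string"
    ) {
      const { pass, reasoning } = data as Verdict;
      return { pass, reasoning };
    }
  } catch {
    // not JSON at all: fall through to the failure verdict below
  }
  return { pass: false, reasoning: `Unparseable verdict: ${reply.slice(0, 200)}` };
}
```

Failing closed on a malformed reply keeps a flaky model response from silently passing a test.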
Synthetic data (Faker)
Any {{faker.*}} expression resolves at interpolation time:
type {{faker.person.firstName}} in first name
type {{faker.internet.email}} in email
type {{faker.location.zipCode}} in zip
type {{faker.phone.number}} in phone
Variables
Use {{variable}} syntax. Values come from:
- Inline: --vars '{"key":"value"}'
- From a file: --vars-file path/to/vars.json
- Global defaults: ~/.runly/vars/*.json
- Synthetic values: {{faker.*}}
- Inline overrides: @use … with key=value
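Resolution can be sketched as merging those sources into one map and then replacing each {{name}} placeholder. Two assumptions here are not stated above: the precedence order (global defaults, then --vars-file, then inline --vars, with later sources winning) and the behavior for unknown names (an error rather than leaving the placeholder). The faker.* values come from a stand-in generator so the block stays self-contained; a real setup might call @faker-js/faker instead. @use overrides are applied when a module is inlined and are not modeled here.

```typescript
// Hypothetical sketch of {{variable}} resolution (not Runly's implementation).
type Vars = Record<string, string>;

// Assumed precedence: later sources override earlier ones.
function mergeVarSources(globalDefaults: Vars, varsFile: Vars, inlineVars: Vars): Vars {
  return { ...globalDefaults, ...varsFile, ...inlineVars };
}

// Stand-in for a real fake-data library: fixed values keyed by path.
// A real setup could resolve "faker.person.firstName" by calling
// faker.person.firstName() from @faker-js/faker.
type FakeResolver = (path: string) => string | undefined;

const stubFaker: FakeResolver = (path) => {
  const samples: Record<string, string> = {
    "person.firstName": "Ada",
    "internet.email": "ada@example.com",
  };
  return Object.prototype.hasOwnProperty.call(samples, path) ? samples[path] : undefined;
};

function interpolate(line: string, vars: Vars, fake: FakeResolver = stubFaker): string {
  return line.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_whole, name: string) => {
    if (name.startsWith("faker.")) {
      const value = fake(name.slice("faker.".length));
      if (value === undefined) throw new Error(`Unknown faker path: ${name}`);
      return value;
    }
    if (!Object.prototype.hasOwnProperty.call(vars, name)) {
      throw new Error(`Undefined variable: {{${name}}}`);
    }
    return vars[name];
  });
}
```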
Features
Core
- English parser: converts natural language into atomic test steps
- 12-strategy selector engine: finds elements by testid, role, text, label, placeholder, alt, title, and more
- Session reuse: auto-saves login state per domain, so the next run can skip login
- Persistent browser pool: much faster than launching fresh (~50 ms reconnect vs ~2 s launch)
- Variable substitution: {{site}}, {{faker.*}}, and @use with overrides
- Tag filtering: --tag critical runs only what matters
- Parallel execution: --parallel 4 runs up to four tests at once
- Retry on flake: --retry 3 automatically retries failures
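The --parallel and --retry behavior can be sketched as a small worker pool: at most N tests run at once, and a failing test is re-run up to the retry limit before it counts as failed. This is a generic illustration with a hypothetical TestFn type, not Runly's scheduler.

```typescript
// Hypothetical sketch of "--parallel N" plus "--retry R" (not Runly's scheduler).
type TestFn = () => Promise<void>;

interface Outcome { name: string; passed: boolean; attempts: number; error?: string }

async function runWithRetry(name: string, fn: TestFn, retries: number): Promise<Outcome> {
  let lastError = "";
  // One initial attempt plus up to `retries` re-runs.
  for (let attempt = 1; attempt <= retries + 1; attempt++) {
    try {
      await fn();
      return { name, passed: true, attempts: attempt };
    } catch (e) {
      lastError = e instanceof Error ? e.message : String(e);
    }
  }
  return { name, passed: false, attempts: retries + 1, error: lastError };
}

async function runPool(
  tests: Array<[string, TestFn]>,
  parallel: number,
  retries: number,
): Promise<Outcome[]> {
  const results: Outcome[] = new Array(tests.length);
  let next = 0;
  // Each worker keeps claiming the next unstarted test until none are left.
  // JavaScript runs this code on one thread, so `next++` cannot race between workers.
  const worker = async () => {
    while (next < tests.length) {
      const index = next++;
      const [name, fn] = tests[index];
      results[index] = await runWithRetry(name, fn, retries);
    }
  };
  const workerCount = Math.max(1, Math.min(parallel, tests.length));
  await Promise.all(Array.from({ length: workerCount }, worker));
  return results;
}
```

Results keep the input order regardless of which worker finished first, which keeps reports stable between runs.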
AI mode (optional)
When ANTHROPIC_API_KEY is set, Runly becomes agentic:
- Smart parsing: complex English is decomposed into atomic steps
- DOM-aware element finding: Claude picks the right selector when the regex fallback can't
- Self-healing: on step failure, Claude diagnoses the problem and retries with an alternative
- Semantic assertions: verify ai: … routes to Claude with a DOM snapshot
- Learning cache: successful heals are cached, so future runs reuse them without extra API calls
AI mode is completely optional. Runly's core features work without it.
MCP server
- stdio transport: one-line config in Claude Code, Cursor, or any MCP client
- Six tools: test, run_file, list_tests, inspect, open_url, last_run
- No separate state: uses the same local browser and reports as the CLI
Output
- Terminal: colored step-by-step results with a summary badge
- JSON: full report at output/reports/report-{timestamp}.json
- Screenshots: output/screenshots/result-*.png and failure-*.png
- Auto-generated Playwright specs: every runly test / runly run emits a standalone .spec.js (in output/specs/)
- HTML dashboard: runly report generates a view of all past runs
- NDJSON: the --json flag for piping into jq, shell scripts, or CI systems
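The jq filter shown under Quick Start has a direct equivalent in a Node script. This sketch assumes, as that filter implies, that each JSON line carries a results array whose entries have a boolean success field; all other fields are left untyped.

```typescript
// Hypothetical equivalent of:
//   runly run tests/ --json | jq '.results[] | select(.success == false)'
// Assumes each output line is a JSON object with a `results` array.

interface ResultEntry { success: boolean; [field: string]: unknown }

function failuresFromNdjson(output: string): ResultEntry[] {
  const failures: ResultEntry[] = [];
  for (const line of output.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank and trailing lines
    const record = JSON.parse(line) as { results?: ResultEntry[] };
    for (const result of record.results ?? []) {
      if (result.success === false) failures.push(result);
    }
  }
  return failures;
}

// In a script, the NDJSON text could be read from stdin with
// readFileSync(0, "utf8") from node:fs and passed to failuresFromNdjson.
```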
Commands
runly # Interactive REPL
runly run <path> # Run .runly test files
runly test "..." # One-off test from English
runly import <spec.ts> # Convert Playwright specs to .runly
runly mcp # Start MCP server (for Claude Code etc.)
runly watch "..." # Re-run at intervals
runly inspect <url> # Find elements on a page
runly diff A.json B.json # Compare two runs
runly report # HTML dashboard
runly record [url] # Record user actions into an instruction
runly replay last # Re-run last test
runly sessions # Manage saved auth sessions
runly doctor # Health check
runly auth sk-ant-... # Enable AI mode
runly serve # HTTP API server
Philosophy
Tests should live in your repo, not someone else's dashboard.
When tests are in your repo:
- You can git-diff them
- You can review them in PRs
- You can grep them
- You can refactor them like any other code
- You can version them alongside the code they test
- You own them forever
When tests are in a SaaS dashboard:
- They disappear when your subscription lapses
- You can't diff them across branches
- You can't review them in PRs
- You're locked in
- You pay per seat
Runly takes Camp 1's ownership and adds Camp 2's ergonomics — and makes the result addressable by your AI agent.
FAQ
Is this production-ready?
Yes, for small and medium projects. For enterprise needs (SOC 2, on-prem, SSO), you'll want a cloud test platform.
Does it work with React/Vue/Angular?
Yes. Runly uses Playwright under the hood, so any SPA framework works.
Do I need an Anthropic API key?
Not for core usage. The regex parser handles the most common patterns. AI mode is optional: it improves parsing of complex flows and enables self-healing and verify ai: assertions.
How does it compare to Playwright codegen?
Playwright codegen needs a GUI and produces verbose code. Runly is text-first and produces plain-English .runly files that humans can read and edit. Runly also works in the other direction: runly import converts existing Playwright tests into .runly files.
Can I export back to Playwright code?
Yes. Every runly test / runly run auto-emits a standalone .spec.js in output/specs/. Opt out with --no-spec.
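To illustrate that export direction, here is a hypothetical emitter that turns a few .runly step shapes into Playwright Test code. The locator choices (getByPlaceholder for type, getByRole for click) are assumptions for the sketch; the specs Runly actually generates may resolve elements differently.

```typescript
// Hypothetical .runly -> Playwright Test emitter for a handful of step shapes.
const quote = (s: string) => JSON.stringify(s); // produces a safe JS string literal

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function stepToPlaywright(step: string): string {
  let m: RegExpMatchArray | null;
  if ((m = step.match(/^open (\S+)$/))) return `await page.goto(${quote(m[1])});`;
  if ((m = step.match(/^type (.+) in (.+)$/)))
    return `await page.getByPlaceholder(${quote(m[2])}).fill(${quote(m[1])});`;
  if ((m = step.match(/^click (.+) button$/)))
    return `await page.getByRole('button', { name: ${quote(m[1])} }).click();`;
  if ((m = step.match(/^verify url contains (.+)$/)))
    return `await expect(page).toHaveURL(new RegExp(${quote(escapeRegExp(m[1]))}));`;
  return `// TODO (unconverted) ${step}`; // keep unknown steps visible in the output
}

function toSpec(name: string, steps: string[]): string {
  const body = steps.map((s) => `  ${stepToPlaywright(s)}`).join("\n");
  return [
    "import { test, expect } from '@playwright/test';",
    "",
    `test(${quote(name)}, async ({ page }) => {`,
    body,
    "});",
  ].join("\n");
}
```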
What about mobile?
Not yet. Runly currently runs Chromium, Firefox, and WebKit on desktop. Mobile support via Appium is planned.
Does it support visual regression?
Yes — runly test "..." --baseline saves a baseline, --diff compares against it.
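Baseline comparison usually comes down to a per-pixel diff with a tolerance. The sketch below compares two same-sized RGBA buffers and reports the fraction of pixels that differ by more than a threshold in any channel. The threshold and the pass ratio are illustrative assumptions, not Runly's defaults, and decoding PNG screenshots into raw RGBA is left to an image library.

```typescript
// Hypothetical pixel-diff sketch for --baseline / --diff style checks.
// Inputs are raw RGBA bytes (4 bytes per pixel), e.g. decoded from PNG screenshots.

interface DiffResult { differingPixels: number; totalPixels: number; ratio: number }

function diffRgba(
  baseline: Uint8Array,
  current: Uint8Array,
  channelThreshold = 16, // assumed per-channel tolerance (0-255) for anti-aliasing noise
): DiffResult {
  if (baseline.length !== current.length || baseline.length % 4 !== 0) {
    throw new Error("Images must have the same dimensions (RGBA buffers of equal length)");
  }
  const totalPixels = baseline.length / 4;
  let differingPixels = 0;
  for (let i = 0; i < baseline.length; i += 4) {
    // A pixel counts as different if any channel moved more than the threshold.
    for (let c = 0; c < 4; c++) {
      if (Math.abs(baseline[i + c] - current[i + c]) > channelThreshold) {
        differingPixels++;
        break;
      }
    }
  }
  return {
    differingPixels,
    totalPixels,
    ratio: totalPixels === 0 ? 0 : differingPixels / totalPixels,
  };
}

// Assumed pass rule: at most 0.1% of pixels may differ.
const matchesBaseline = (d: DiffResult, maxRatio = 0.001) => d.ratio <= maxRatio;
```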
Does it work from Cursor / Windsurf?
Yes. Both support MCP. Same runly mcp config.
Contributing
Runly is MIT-licensed. The codebase is ~7000 LOC and organized by concern:
src/
  parser/     # Regex tokenizer + action map
  importer/   # Playwright → .runly converter
  ai/         # Claude client, prompts, healer, assertions
  runner/     # Playwright execution + session store + browser pool
  mcp/        # MCP stdio server + tool handlers
  commands/   # CLI commands
  utils/      # Logger, config, paths
See skills/runly.md for a complete architectural reference.
License
MIT
