luna-orbit
v0.4.1
Luna Orbit — AI-powered autonomous regression testing SDK. Drop into any web app (Playwright) or mobile app (Appium). Describe flows in plain English, the agent drives a real browser/device, self-heals, screenshots, and reports.
Luna Orbit
AI-native regression testing. You describe what to test. Luna writes the test plan AND runs it.
luna-orbit auto \
--target https://shop.example.com \
--requirement "Verify a logged-out user can search for a product, add it to cart, and reach checkout"
That's the whole interface. No selectors. No recording. No prior plan file. Luna opens the app, explores it briefly, writes a runnable test plan in markdown, executes the plan, and prints PASS/FAIL with per-step screenshots and an HTML report.
Of course you can also keep your tests as files in your repo (luna-orbit run plans/checkout.md), or have Luna only write plans (luna-orbit author --save), or stand up an HTTP API (luna-orbit serve) — all four flows are first-class.
Built by Lumen AI Solutions. Same MLX agent stack that powers our internal agents (Lysa, Luna): gemma-4-31b-it running locally on a Mac Studio over Tailscale, free per-run. Swap in Claude / GPT / Azure via the LUNA_ORBIT_LLM_* env vars.
How Luna Orbit is different
Compared against 21 competitors (Mabl, QA.tech, Octomind, Reflect, Momentic, Functionize, TestRigor, Stagehand, Browser-Use, Skyvern, Hercules, LaVague, Magnitude, Cypress Cloud, Sauce Labs, BrowserStack, Applitools, Katalon, Testim, Checkly, Anthropic Computer Use). Luna Orbit's four real differentiators:
1. AI-NATIVE: tests write themselves. luna-orbit auto --target URL --requirement "..." is the headline interface: Luna explores the app and authors the plan from one sentence, then runs it. Octomind/QA.tech/Reflect have plan generation but it's locked behind closed-source SaaS that ships your DOM to OpenAI. No OSS competitor offers AI-native authoring + local LLM.
2. Local-LLM default = $0/run, no data leaves your network. Defaults to MLX gemma-4-31b-it on Tailscale. Override to Claude/GPT/Azure via the LUNA_ORBIT_LLM_* env vars. No competitor defaults to local. Stagehand technically supports local but the docs nudge you to OpenAI/Anthropic. Big deal in regulated industries (health, fintech, defense) where SOC2/HIPAA prohibits sending UI traces to a third-party LLM.
3. One markdown plan format → web AND mobile. Same authoring surface for Playwright (web) and Appium (mobile). Stagehand, Browser-Use, Skyvern, Magnitude, Hercules, LaVague are web-only. Mabl/Reflect/Momentic have both but with separate authoring stacks. Luna Orbit unifies them under one platform: web|mobile toggle.
4. Tests live as markdown files in your app repo. Version-controlled, code-reviewed in PRs, diff-able. All SaaS competitors store tests in their cloud DB. OSS Hercules/LaVague come closest but use Gherkin, not plain markdown.
Pricing: MIT-licensed core, free forever for local runs. Hosted dashboard / shareable links / scheduled runners coming as a paid add-on (usage-based).
The four ways to use Luna Orbit
| Interface | When |
|---|---|
| luna-orbit auto --target … --requirement … | Pure AI-native. Don't even write a plan — describe what you want tested, Luna does the rest. Best for smoke tests, exploratory QA, "did this PR break anything obvious." |
| luna-orbit author --target … --requirement … --save plan.md | Have Luna write the plan, then you commit it to your repo. Best when you want repeatable tests but don't want to author them by hand. |
| luna-orbit run plan.md | Run an existing plan. Best for CI — tests are version-controlled markdown files in your app repo. |
| luna-orbit serve | Long-lived HTTP API + tiny dashboard. Best when other services or non-engineers need to trigger runs and see history. |
What it covers
| Platform | Driver | Targets |
|---|---|---|
| Web (platform: web) | agent-browser (Playwright under the hood) | Any website / SPA / webapp — Chrome, Edge, Safari, Firefox |
| Mobile (platform: mobile) | Appium via WebDriverIO | Any iOS or Android app (native, hybrid, or WebView) |
Appium is mobile-only. For web/webapps, Luna Orbit uses Playwright through agent-browser. The same plan format, the same agent loop, the same report — just a different driver under the hood.
Why Luna Orbit vs hand-written Playwright/Cypress
Selector-based E2E suites break the moment someone renames a button. They also can't author themselves. Luna Orbit:
- Self-heals: refs come from the accessibility tree, not CSS. When a ref vanishes, the agent re-snapshots and picks a new one.
- Tests write themselves from intent: 1. Click "Send for signature" is the test. No cy.get('[data-testid=send-btn]').
- Catches semantic bugs: assertions are natural language ("the latest response mentions Trump by name"), checked by the LLM against the live snapshot.
- One package, two platforms: same plan format works for web and mobile.
- Defaults to local LLM: uses our MLX gemma-4-31b-it over Tailscale by default. Free, fast enough, no rate limits. Override to Claude/GPT via LUNA_ORBIT_LLM_* env vars.
- Drop into any platform: installable npm package + CLI, no coupling to any specific app.
Install
npm i -D luna-orbit
# One-time: install the headless browser binary used by the web driver
npx agent-browser install
For mobile: install Appium and its drivers (npm i -g appium && appium driver install uiautomator2 && appium driver install xcuitest) and have the device/simulator running.
Plan format
Markdown with a YAML frontmatter block + ## Steps + ## Assertions:
Web
---
name: My checkout flow
target: https://shop.example.com
platform: web
viewport: 1280x900
max_steps_per_intent: 8
---
## Steps
1. Click "Add to cart" on the first product
2. Open the cart
3. Click "Checkout"
4. Fill the email field with "[email protected]"
5. Click "Continue to payment"
## Assertions
- The order summary shows exactly one item
- The "Continue to payment" page is visible
Mobile (generic Appium, any app)
---
name: Settings smoke
platform: mobile
mobile_mode: appium
capabilities: {"platformName":"Android","appium:automationName":"UiAutomator2","appium:appPackage":"com.android.settings","appium:appActivity":".Settings"}
---
## Steps
1. Tap "Network & internet"
2. Tap "Internet"
## Assertions
- A list of nearby Wi-Fi networks is visible
capabilities: is a single-line JSON object that passes straight to Appium. iOS example: {"platformName":"iOS","appium:automationName":"XCUITest","appium:bundleId":"com.example.MyApp","appium:platformVersion":"17.0","appium:deviceName":"iPhone 15"}.
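To make the plan format concrete, here is a minimal sketch of how such a file could be split into frontmatter, steps, and assertions. It is not the package's own parsePlan: the function name parsePlanSketch, the ParsedPlan shape, and the flat key: value frontmatter handling are all assumptions for illustration. The only behaviour taken from the text above is that capabilities is a single-line JSON object.

```typescript
// Hypothetical sketch, NOT luna-orbit's real parser. Names and shapes are assumptions.
interface ParsedPlan {
  meta: Record<string, string>;            // flat frontmatter keys (name, platform, ...)
  capabilities?: Record<string, unknown>;  // parsed from the single-line JSON value
  steps: string[];
  assertions: string[];
}

function parsePlanSketch(markdown: string): ParsedPlan {
  const plan: ParsedPlan = { meta: {}, steps: [], assertions: [] };
  let section: "none" | "frontmatter" | "steps" | "assertions" = "none";
  let seenFrontmatter = false;

  for (const raw of markdown.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "---") {
      // The first fence opens the frontmatter, the second closes it.
      if (section === "frontmatter") section = "none";
      else if (!seenFrontmatter) { section = "frontmatter"; seenFrontmatter = true; }
      continue;
    }
    if (section === "frontmatter") {
      const idx = line.indexOf(":"); // split on the FIRST colon so URLs and JSON stay intact
      if (idx === -1) continue;
      const key = line.slice(0, idx).trim();
      const value = line.slice(idx + 1).trim();
      if (key === "capabilities") plan.capabilities = JSON.parse(value);
      else plan.meta[key] = value;
      continue;
    }
    if (/^##\s+Steps\b/i.test(line)) { section = "steps"; continue; }
    if (/^##\s+Assertions\b/i.test(line)) { section = "assertions"; continue; }
    if (section === "steps") {
      const m = line.match(/^\d+\.\s+(.*)$/); // "1. Tap ..." becomes "Tap ..."
      if (m) plan.steps.push(m[1]);
    } else if (section === "assertions") {
      const m = line.match(/^-\s+(.*)$/);     // "- A list ..." becomes "A list ..."
      if (m) plan.assertions.push(m[1]);
    }
  }
  return plan;
}

const samplePlan = [
  "---",
  "name: Settings smoke",
  "platform: mobile",
  "mobile_mode: appium",
  'capabilities: {"platformName":"Android","appium:automationName":"UiAutomator2"}',
  "---",
  "## Steps",
  '1. Tap "Network & internet"',
  '2. Tap "Internet"',
  "## Assertions",
  "- A list of nearby Wi-Fi networks is visible",
].join("\n");

const parsedSample = parsePlanSketch(samplePlan);
console.log(parsedSample.meta.platform, parsedSample.steps.length, parsedSample.assertions.length);
```

Splitting each frontmatter line on the first colon is what lets a URL target or a JSON capabilities value pass through unchanged; a real YAML parser would be the safer choice for anything beyond flat keys.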
Mobile (Lumen-specific Hyundai POM)
If you already have a Page-Object-Model suite (we use ~/Code/hma_automation for Hyundai/Genesis dealer flows), use mobile_mode: hma and reference Page.method:
---
name: Dealer onboarding smoke
platform: mobile
mobile_mode: hma
hma_entry: dealer_onboarding_myh.py
---
## Steps
1. HomePage.tap_search
2. DealersPage.tap_first_result
3. DealerDetailPage.tap_schedule_service
CLI
# AI-native one-shot: describe what to test, Luna writes & runs it
luna-orbit auto --target https://shop.example.com --requirement "Verify checkout works"
# Have Luna author a plan, save to disk
luna-orbit author --target https://shop.example.com --requirement "..." --save plan.md
# Run an existing plan
luna-orbit run plans/checkout.md --out ./orbit-out
luna-orbit run plans/checkout.md --headed # see the browser window
# Capture cookies for an authenticated app
luna-orbit login --target https://app.example.com --save fixtures/admin.json
# Long-lived SaaS server: signup, login, multi-user dashboard, API + widget
luna-orbit serve --port 8780 --data-dir ./orbit-data --max-parallel 2
# Scheduled monitoring (auto-load all plans with `cron:` from a directory)
luna-orbit monitor ./plans --port 8780
# Short alias
orbit run plans/checkout.md
Self-host the SaaS (docker-compose up)
git clone https://github.com/BitCodeHub/luna-orbit
cd luna-orbit
docker-compose up
# → http://localhost:8780/ (signup → onboarding → dashboard)
In v0.4 SaaS mode you get:
- Multi-user accounts — signup / login / sessions, bcrypt-hashed passwords
- Per-user workspaces with plan tiers (free/pro/team/enterprise) + monthly run quotas
- Per-workspace API keys — generate / revoke from the settings page
- Onboarding wizard — first-run flow with 5 test templates so a non-engineer can run their first AI test in 30 seconds
- Embeddable widget: drop <script src=".../widget.js" data-key="lo_pk_..."></script> into any web app and get an "AI test this page" button in the corner
Behind a public URL: set LUNA_ORBIT_PUBLIC_URL=https://lunaorbit.your-domain.com so the widget snippet in the dashboard renders the right host. Put it behind your VPN/Cloudflare Access/Tailscale Funnel for production.
run exits 0 on pass, 1 on test failure, 2 on crash.
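A CI wrapper can branch on these exit codes. The sketch below is one possible way to do that, assuming you want to retry crashes but never genuine test failures. The names RunOutcome, classifyExitCode, and shouldRetry are hypothetical and not part of the package; only the meaning of codes 0, 1, and 2 comes from the line above.

```typescript
// Hypothetical helper names; only the 0/1/2 meanings are taken from the README.
type RunOutcome = "passed" | "failed" | "crashed" | "unknown";

function classifyExitCode(code: number | null): RunOutcome {
  switch (code) {
    case 0: return "passed";   // all intents and assertions satisfied
    case 1: return "failed";   // the test ran and found a failure
    case 2: return "crashed";  // the runner itself broke
    default: return "unknown"; // e.g. null when killed by a signal, or an undocumented code
  }
}

// Assumed policy: a crash may be worth another attempt, a real failure is not.
function shouldRetry(code: number | null, attempt: number, maxAttempts = 2): boolean {
  return classifyExitCode(code) === "crashed" && attempt < maxAttempts;
}

console.log(classifyExitCode(1), shouldRetry(2, 1), shouldRetry(1, 1));
```

Keeping "failed" and "crashed" separate avoids masking real regressions behind automatic retries.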
Programmatic
import { runPlan } from "luna-orbit";
const report = await runPlan("plans/checkout.md");
if (!report.passed) process.exit(1);
HTTP API (luna-orbit serve)
For triggering runs from a service, scheduling jobs, or letting a non-engineer kick off a smoke from a tiny built-in dashboard.
luna-orbit serve --port 8780 --data-dir ./orbit-data --max-parallel 2
# Open http://localhost:8780/ → list of recent runs + links to each report
Endpoints:
| Method | Path | Purpose |
|---|---|---|
| POST | /v1/runs | Start a run. Body: { "plan_md": "<markdown>", "options": { "headed": false } } (or "plan_path": "/path/on/server"). Returns { id, status: "queued" }. |
| GET | /v1/runs | List recent runs (?limit=N&status=passed\|failed\|errored\|running). |
| GET | /v1/runs/:id | Status + summary (status, passed, intents_satisfied/total, assertions_pass/total, duration_ms). |
| GET | /v1/runs/:id/report.json | Full machine-readable trace. |
| GET | /v1/runs/:id/report.html | Pre-rendered single-file HTML report with screenshots. |
| GET | /v1/runs/:id/screenshots/:name | Serve a per-step screenshot PNG. |
| GET | /healthz | Liveness + queue depth. |
| GET | / | Tiny built-in dashboard (HTML). |
Auth: set LUNA_ORBIT_API_KEYS=key1,key2,key3. All /v1/* endpoints then require Authorization: Bearer <one-of-them>. With the env var unset, the server is OPEN — fine for local dev, never expose to the internet without keys set.
Concurrency: --max-parallel caps simultaneous runs (default 2). Each run spawns a real Chromium / Appium session, so set this conservatively.
Storage: filesystem. <data-dir>/index.json is the registry; <data-dir>/runs/<id>/ holds each run's plan + report + screenshots. No DB.
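As an illustration of the endpoint table and the auth rule, the following sketch builds the request for POST /v1/runs and decides when to stop polling GET /v1/runs/:id. The helper names buildStartRunRequest and isTerminalStatus are hypothetical, and the Content-Type: application/json header is an assumption, since the table specifies only the body shape. The status strings are the ones listed above.

```typescript
// Hypothetical client-side helpers; not part of the luna-orbit package.
interface StartRunRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildStartRunRequest(
  baseUrl: string,
  planMd: string,
  apiKey?: string,   // omit when LUNA_ORBIT_API_KEYS is unset on the server
  headed = false,
): StartRunRequest {
  // Assumption: the server expects a JSON content type.
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/v1/runs`, // tolerate a trailing slash
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({ plan_md: planMd, options: { headed } }),
    },
  };
}

// "queued" and "running" mean keep polling; anything else is treated as final.
function isTerminalStatus(status: string): boolean {
  return status === "passed" || status === "failed" || status === "errored";
}

const startRun = buildStartRunRequest("http://localhost:8780/", "## Steps\n1. Open the cart", "key1");
console.log(startRun.url, Object.keys(startRun.init.headers).join(","));
```

The returned url and init can be passed straight to fetch(url, init) against a running server; the response should carry the { id, status: "queued" } object described in the table.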
Outbound webhooks
When any run finishes, POST a JSON payload to one or more URLs. Slack and Discord webhook URLs are auto-detected and reshaped — generic URLs receive the canonical payload.
LUNA_ORBIT_WEBHOOKS="https://hooks.slack.com/services/T0/B0/XYZ,https://example.com/ci/luna" \
LUNA_ORBIT_WEBHOOK_SECRET="my-shared-secret" \
luna-orbit serve
Generic payload (sent to non-Slack/Discord URLs):
{
"event": "run.completed",
"run": { "id": "run_…", "plan_name": "Checkout · happy path", "status": "passed",
"passed": true, "intents_satisfied": 5, "intents_total": 5,
"assertions_pass": 3, "assertions_total": 3, "duration_ms": 28341, … },
"summary": "✓ PASS — Checkout · happy path · intents 5/5 · assertions 3/3 · 28.3s",
"emitted_at": "2026-05-04T04:21:12.839Z"
}
If LUNA_ORBIT_WEBHOOK_SECRET is set, an x-luna-orbit-signature: sha256=<hex> header is added. Verify it on the receiving side to defend against forged calls.
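A receiver might verify the signature along these lines. This is a sketch under a stated assumption: the README gives only the header format (sha256=<hex>), so the code assumes the value is an HMAC-SHA256 of the raw request body keyed with the shared secret, which is the common convention for such headers. Confirm against the actual sender before relying on it. The function names are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// ASSUMPTION: signature = "sha256=" + hex(HMAC-SHA256(secret, rawBody)).
// The README does not spell out the algorithm; adjust if the sender differs.
function signBody(rawBody: string, secret: string): string {
  return "sha256=" + createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

function verifySignature(rawBody: string, header: string | undefined, secret: string): boolean {
  if (!header) return false;
  const expected = Buffer.from(signBody(rawBody, secret), "utf8");
  const received = Buffer.from(header, "utf8");
  // timingSafeEqual throws on unequal lengths, so check the length first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

const exampleBody = JSON.stringify({ event: "run.completed", run: { id: "run_1", passed: true } });
const exampleHeader = signBody(exampleBody, "my-shared-secret");
console.log(verifySignature(exampleBody, exampleHeader, "my-shared-secret"));
```

Verify against the raw bytes of the request body, before any JSON parsing, because re-serializing the payload can change whitespace or key order and invalidate the digest.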
Reports
Each run writes:
- <outDir>/report.json: full machine-readable trace (intents, actions, decisions, assertions)
- <outDir>/report.html: single-file HTML rollup with per-intent screenshots
- <outDir>/screenshots/intent-NN.png + final.png
Env vars
| Var | Default | Notes |
|---|---|---|
| LUNA_ORBIT_LLM_BASE_URL | http://100.90.199.128:8084/v1 | OpenAI-compatible chat endpoint. Same gemma-4-31b instance Lysa uses. |
| LUNA_ORBIT_LLM_MODEL | mlx-community/gemma-4-31b-it-bf16 | Model name |
| LUNA_ORBIT_LLM_API_KEY | (none) | Bearer token if the endpoint requires it |
| LUNA_ORBIT_AGENT_BROWSER_BIN | agent-browser | Override path/binary |
| LUNA_ORBIT_APPIUM_URL | http://127.0.0.1:4723 | Appium server URL for mobile |
| LUNA_ORBIT_HMA_DIR | ~/Code/hma_automation | Lumen-specific POM dir (only used when mobile_mode: hma) |
Cloud LLM examples. Point at Anthropic Claude:
export LUNA_ORBIT_LLM_BASE_URL="https://api.anthropic.com/v1"
export LUNA_ORBIT_LLM_MODEL="claude-sonnet-4"
export LUNA_ORBIT_LLM_API_KEY="sk-ant-..."
Or OpenAI:
export LUNA_ORBIT_LLM_BASE_URL="https://api.openai.com/v1"
export LUNA_ORBIT_LLM_MODEL="gpt-4o"
export LUNA_ORBIT_LLM_API_KEY="sk-..."
Legacy QA_PILOT_* env vars are also accepted (Luna Orbit was renamed from qa-pilot on 2026-05-03).
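The lookup order implied by the table and the legacy note could look like the sketch below. Two things are assumptions rather than documented behaviour: that each legacy variable keeps the same suffix under the QA_PILOT_ prefix, and that the LUNA_ORBIT_ name wins when both are set. The function names are hypothetical; the defaults are copied from the table above.

```typescript
// Hypothetical resolution logic; precedence and legacy naming are assumptions.
type Env = Record<string, string | undefined>;

const LLM_DEFAULTS = {
  LLM_BASE_URL: "http://100.90.199.128:8084/v1",
  LLM_MODEL: "mlx-community/gemma-4-31b-it-bf16",
} as const;

// Try the current prefix, then the assumed legacy prefix, then the default.
function readSetting(env: Env, suffix: string, fallback?: string): string | undefined {
  return env[`LUNA_ORBIT_${suffix}`] ?? env[`QA_PILOT_${suffix}`] ?? fallback;
}

function resolveLlmConfig(env: Env) {
  return {
    baseUrl: readSetting(env, "LLM_BASE_URL", LLM_DEFAULTS.LLM_BASE_URL),
    model: readSetting(env, "LLM_MODEL", LLM_DEFAULTS.LLM_MODEL),
    apiKey: readSetting(env, "LLM_API_KEY"), // no default: optional bearer token
  };
}

const resolved = resolveLlmConfig({
  QA_PILOT_LLM_MODEL: "gpt-4o",                          // legacy name only
  LUNA_ORBIT_LLM_BASE_URL: "https://api.openai.com/v1",  // current name
});
console.log(resolved.baseUrl, resolved.model, resolved.apiKey);
```

Taking the environment as a plain object rather than reading process.env directly keeps the function easy to test; pass process.env at the call site.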
Architecture
plan.md ──► parsePlan ──► [intent₁, intent₂, …] ──┐
                                                  ▼
driver (web | mobile) ◄─── agent loop ◄─── LLM ─── for each intent:
   │                            │                    snapshot → LLM picks action
   │                            │                    act → re-snapshot
   │                            │                    repeat until done|give_up|max
   │                            ▼
   │                    final assertions ──► LLM checks each natural-language assertion
   ▼
report.html + report.json + screenshots/
- Web driver = subprocess wrapper around agent-browser CLI (Playwright underneath).
- Mobile driver = WebDriverIO + Appium (generic mode) or subprocess shells to a Python POM (hma mode).
- Agent loop runs LLM-picked actions one at a time; anti-loop guard forces a wait if the same action runs three times in a row.
- Adding a new platform = implement the Driver interface (open, snapshot, act, screenshot, close).
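The Driver interface and the anti-loop guard described above can be sketched as follows. Only the five method names come from the README; every signature, the Action shape, the FakeDriver, and the scripted picker standing in for the LLM are assumptions. The guard is read here as: once the last three executed actions are identical, the next action is replaced by a wait.

```typescript
// Hypothetical shapes: the README names the methods but not their signatures.
interface Action { kind: string; ref?: string; text?: string }

interface Driver {
  open(target: string): Promise<void>;
  snapshot(): Promise<string>;
  act(action: Action): Promise<void>;
  screenshot(name: string): Promise<string>;
  close(): Promise<void>;
}

const actionKey = (a: Action): string => JSON.stringify([a.kind, a.ref ?? null, a.text ?? null]);

// Anti-loop guard (assumed reading): three identical actions in a row force a wait.
function forceWaitIfLooping(history: Action[], proposed: Action): Action {
  const lastThree = history.slice(-3);
  const looping =
    lastThree.length === 3 && lastThree.every((a) => actionKey(a) === actionKey(lastThree[0]));
  return looping ? { kind: "wait" } : proposed;
}

// In-memory stand-in for a real web or mobile driver.
class FakeDriver implements Driver {
  log: string[] = [];
  async open(target: string): Promise<void> { this.log.push(`open ${target}`); }
  async snapshot(): Promise<string> { return `snapshot#${this.log.length}`; }
  async act(action: Action): Promise<void> { this.log.push(action.kind); }
  async screenshot(name: string): Promise<string> { return `${name}.png`; }
  async close(): Promise<void> { this.log.push("close"); }
}

// The picker plays the LLM's role: given a snapshot, choose an action or stop.
type Picker = (snapshot: string) => Action | "done";

async function runIntent(driver: Driver, pick: Picker, maxSteps: number): Promise<Action[]> {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const proposed = pick(await driver.snapshot()); // snapshot, then pick
    if (proposed === "done") break;
    const action = forceWaitIfLooping(history, proposed);
    await driver.act(action);                       // act, then loop to re-snapshot
    history.push(action);
  }
  return history;
}

// A picker that is stuck clicking the same ref; the guard breaks the streak.
runIntent(new FakeDriver(), () => ({ kind: "click", ref: "e1" }), 5)
  .then((h) => console.log(h.map((a) => a.kind).join(","))); // click,click,click,wait,click
```

Because the loop depends only on the Driver interface, a new platform needs nothing beyond its own implementation of the five methods.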
License
MIT. Built by Lumen AI Solutions.
