stars-end
v0.4.0
Lightweight, Playwright-only, natural-language e2e testing agent, influenced by Midscene
A lightweight, Playwright-only library for end-to-end (e2e) testing with
natural language. Hand it a Playwright Page and plain-English instructions, and
it locates elements visually, performs actions, reads structured data, asserts on
what's on screen, and can run an autonomous planning loop to accomplish a goal.
Drop it into your existing Playwright e2e suite to write resilient, intent-based
tests instead of brittle selectors.
Two things it's good at:
- e2e testing: replace brittle selectors with intent; assert and waitFor on what's actually on screen.
- browser automation: hand it a goal and let the agent loop plan, act, and verify in a while-loop until the task is done, no step-by-step scripting.
Visual grounding is powered by Google's Gemini models through the Vercel AI SDK; the model layer is abstracted, so other providers can be slotted in.
import { chromium } from "playwright";
import { z } from "zod";
import { Agent } from "stars-end";
const browser = await chromium.launch();
const page = await browser.newPage({ deviceScaleFactor: 2 });
await page.goto("https://example.com/shop");
const agent = new Agent(page, { model: "gemini-2.5-flash" });
// autonomous: give it a goal and it plans + acts until done
const result = await agent.act("add the cheapest backpack to the cart and go to checkout");
// ...or drive individual steps yourself:
// instant actions
await agent.tap('the "Add to cart" button for the backpack');
await agent.input("[email protected]", "the email field");
await agent.keyboardPress("Enter");
await agent.scroll({ direction: "down", distance: 600 }, "the product list");
// structured read
const items = await agent.query(
z.array(z.object({ name: z.string(), price: z.number() })),
"the cart line items",
);
// assertions
await agent.assert("the order total is $42.00");
await agent.waitFor("the success toast is visible", { timeoutMs: 10_000 });
Install
pnpm add stars-end playwright
# or: npm i stars-end playwright
Set your model key:
export GOOGLE_GENERATIVE_AI_API_KEY=...
playwright is a peer dependency. Model calls go through the Vercel AI SDK
(ai + @ai-sdk/google); swapping providers is mostly a one-liner because the
model layer is abstracted behind a small interface.
Why
- One driver, one focus. Just Playwright and visual grounding, with no Android, iOS, or desktop surfaces, no bridge mode, and no MCP servers.
- Structured output via the AI SDK. Schemas are passed natively to generateObject; no hand-rolled JSON repair on the happy path.
- Deterministic, testable core. The coordinate pipeline (normalized → image pixels → CSS pixels) is pure and fully unit-tested.
- Cheap reruns. An optional XPath-keyed locate cache makes repeated flows fast and deterministic.
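To make the "cheap reruns" point concrete, here is a minimal sketch of how an XPath-keyed locate cache could behave. It assumes the cache maps a prompt to the XPath found on an earlier run and falls back to the model when that XPath no longer resolves. LocateCache, resolveXPath, and locateWithModel are hypothetical names for illustration, not the library's internals.

```typescript
// Hypothetical sketch of an XPath-keyed locate cache; not stars-end's actual code.
type Point = { x: number; y: number };
type Located = Point & { xpath: string };

interface LocateDeps {
  // Returns the element's center for an XPath, or null if it no longer matches.
  resolveXPath(xpath: string): Promise<Point | null>;
  // Expensive path: asks the vision model to find the element.
  locateWithModel(prompt: string): Promise<Located>;
}

class LocateCache {
  private readonly entries = new Map<string, string>(); // prompt -> xpath
  private readonly deps: LocateDeps;

  constructor(deps: LocateDeps) {
    this.deps = deps;
  }

  async locate(prompt: string): Promise<Located> {
    const cachedXPath = this.entries.get(prompt);
    if (cachedXPath !== undefined) {
      const point = await this.deps.resolveXPath(cachedXPath);
      if (point !== null) {
        return { ...point, xpath: cachedXPath }; // cache hit: no model call
      }
      this.entries.delete(prompt); // stale entry: the page has changed
    }
    const located = await this.deps.locateWithModel(prompt);
    this.entries.set(prompt, located.xpath);
    return located;
  }
}
```

On a hit the model is never called, which is what would make a repeated flow both faster and deterministic; a stale XPath simply degrades to the normal visual locate.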
API
const agent = new Agent(page, {
model: "gemini-2.5-flash",
cache: { id: "checkout-flow" }, // optional XPath locate cache
trace: { path: "trace.jsonl" }, // optional JSONL trace
});
// actions
await agent.tap(prompt);
await agent.rightClick(prompt);
await agent.doubleClick(prompt);
await agent.hover(prompt);
await agent.input(value, prompt, { mode: "replace" }); // prompt is optional; mode: "replace" | "clear" | "typeOnly"
await agent.keyboardPress("Control+A");
await agent.scroll({ direction, distance }, prompt); // prompt is optional
// insight
const data = await agent.query(schema, demand);
await agent.assert(assertion); // throws on false
const ok = await agent.check(assertion); // non-throwing
await agent.waitFor(assertion, { timeoutMs });
// low-level + autonomous
const { x, y, rect, xpath } = await agent.locate(prompt);
const result = await agent.act(goal);
await agent.flushTrace();
How it works
The one load-bearing abstraction is locate(instruction) → { x, y } in CSS
pixels. Everything else composes from it:
- an instant action is build context → locate → one driver call
- planning is plan → resolve each action's locate → execute, with replanning and a step history fed back to the model
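The two compositions above can be sketched roughly as follows. This is an illustration under assumed names: Planner, Driver, PlannedAction, instantAction, and runGoal are hypothetical, and the real planner's action schema is not shown in this README.

```typescript
// Hypothetical sketch of the plan -> locate -> execute loop; all names are illustrative.
type Point = { x: number; y: number };
type PlannedAction = { kind: "tap" | "hover"; target: string };
type Plan = { actions: PlannedAction[]; done: boolean };
type Step = { action: PlannedAction; ok: boolean; error?: string };

interface Planner {
  // Produces the next batch of actions from the goal and what has happened so far.
  plan(goal: string, history: Step[]): Promise<Plan>;
}

interface Driver {
  locate(prompt: string): Promise<Point>; // the load-bearing primitive
  perform(kind: PlannedAction["kind"], at: Point): Promise<void>; // one driver call
}

// Instant action: locate the target, then make a single driver call.
async function instantAction(driver: Driver, action: PlannedAction): Promise<void> {
  const point = await driver.locate(action.target);
  await driver.perform(action.kind, point);
}

// Planning: plan, execute each action, feed the step history back, replan until done.
async function runGoal(
  goal: string,
  planner: Planner,
  driver: Driver,
  maxRounds = 10,
): Promise<Step[]> {
  const history: Step[] = [];
  for (let round = 0; round < maxRounds; round++) {
    const plan = await planner.plan(goal, history);
    let failed = false;
    for (const action of plan.actions) {
      try {
        await instantAction(driver, action);
        history.push({ action, ok: true });
      } catch (err) {
        history.push({ action, ok: false, error: String(err) });
        failed = true;
        break; // abandon this batch and replan with the failure in the history
      }
    }
    if (plan.done && !failed) return history;
  }
  throw new Error(`goal not reached within ${maxRounds} planning rounds`);
}
```

The round cap is a design choice of this sketch: without some bound, a planner that never reports completion would loop indefinitely.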
The visual-grounding model returns a bounding box in its own normalized coordinate space; a pure pipeline reorders, denormalizes, validates, and clamps it down to a clickable CSS-pixel point. For dense or tiny targets there's an optional two-stage "deep locate" that crops and upscales the region before re-locating.
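The pipeline described above can be sketched as a pure function. This is an illustration, not the library's code. It assumes the model returns a box as [ymin, xmin, ymax, xmax] on a 0 to 1000 scale (the convention Gemini documents for bounding boxes) and that the screenshot is deviceScaleFactor times larger than the CSS viewport. boxToCssPoint is a hypothetical name, and the step order here (validate first, then reorder, denormalize, clamp) is one reasonable arrangement.

```typescript
// Hypothetical sketch: normalized box -> image pixels -> CSS pixels.
type Box = { left: number; top: number; right: number; bottom: number };
type Size = { width: number; height: number };

function boxToCssPoint(
  raw: [number, number, number, number], // assumed [ymin, xmin, ymax, xmax], each in 0..1000
  image: Size, // screenshot size in image pixels
  deviceScaleFactor: number, // image pixels per CSS pixel
): { x: number; y: number; rect: Box } {
  // Validate: reject NaN/Infinity and nonsensical dimensions up front.
  if (!raw.every((n) => Number.isFinite(n))) {
    throw new Error("box contains non-finite values");
  }
  if (deviceScaleFactor <= 0 || image.width <= 0 || image.height <= 0) {
    throw new Error("image size and deviceScaleFactor must be positive");
  }

  // Reorder: y-first model output to x-first, tolerating swapped corners.
  const [ymin, xmin, ymax, xmax] = raw;
  const nx0 = Math.min(xmin, xmax);
  const nx1 = Math.max(xmin, xmax);
  const ny0 = Math.min(ymin, ymax);
  const ny1 = Math.max(ymin, ymax);

  // Denormalize (0..1000 -> image pixels) and clamp to the screenshot bounds.
  const clamp = (v: number, lo: number, hi: number) => Math.min(Math.max(v, lo), hi);
  const toPx = (n: number, extent: number) => clamp((n * extent) / 1000, 0, extent);
  const img: Box = {
    left: toPx(nx0, image.width),
    right: toPx(nx1, image.width),
    top: toPx(ny0, image.height),
    bottom: toPx(ny1, image.height),
  };
  if (img.right <= img.left || img.bottom <= img.top) {
    throw new Error("box has no area inside the screenshot");
  }

  // Image pixels -> CSS pixels, then take the center as the click point.
  const rect: Box = {
    left: img.left / deviceScaleFactor,
    right: img.right / deviceScaleFactor,
    top: img.top / deviceScaleFactor,
    bottom: img.bottom / deviceScaleFactor,
  };
  return { x: (rect.left + rect.right) / 2, y: (rect.top + rect.bottom) / 2, rect };
}
```

With a page opened at deviceScaleFactor: 2, as in the example at the top, a 2000×1000 screenshot corresponds to a 1000×500 CSS viewport, so every image-pixel coordinate is halved before the click.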
Development
pnpm test # run the suite
pnpm typecheck
pnpm build
pnpm lint # oxlint
pnpm fmt # oxfmt
Live model tests (under test/live/) require GOOGLE_GENERATIVE_AI_API_KEY and
are excluded from the default run.
Acknowledgements
stars-end is heavily influenced by Midscene. Several of its core mechanisms are ported or adapted from Midscene, most notably the visual-grounding coordinate pipeline and the locate / planning approach. Midscene is a much broader project (many platforms and model families); stars-end is a focused, Playwright-only take on the parts we use most.
Huge thanks to the Midscene authors at ByteDance for the battle-tested design this builds on.
License
MIT. Midscene is also MIT-licensed; its copyright notice is included in the LICENSE file for the ported portions.
