st4ck-runner

v0.1.0-alpha.19

Deterministic Playwright-based test runner for st4ck. Replays markdown test recordings authored by the `st4ck` CLI. Library + CLI binary `st4ck-runner`.

edo-ceder

playwright test-runner deterministic regression-testing ai-agents claude-code

st4ck-runner

The Playwright-based test runner that drives recorded tests deterministically. Most users want st4ck, which wraps this runner with the public CLI ergonomics. This package is the lower-level binary the CLI depends on.

npm install st4ck-runner

Provides a single binary: st4ck-runner.

What it does

- Records a session: agent drives a browser via line-delimited JSON IPC; runner captures every primitive into a markdown test file.
- Replays a recorded md test file: deterministic Playwright execution, zero LLM calls, ~10–15× faster than the recording.
- Pauses at agentic blocks: emits a structured envelope to stdout, reads commands from stdin until continue or abort. Chrome stays alive for the whole test.

No daemons. No external services required (the --no-mcp flag skips the optional st4ck MCP integration). One Chrome process per test, cleaned up on exit.
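
The line-delimited JSON IPC described above can be illustrated with a small decoder that buffers partial chunks and emits one parsed message per complete line. This is a minimal sketch, not the runner's implementation: the class name, the helper, and the `type` / `cmd` fields in the sample messages are hypothetical.

```typescript
// Sketch: incremental decoder for line-delimited JSON (one JSON value per line).
// The message fields used below ("type", "cmd") are hypothetical.
class LineJsonDecoder {
  private buffer = "";

  // Feed a raw chunk; returns every message completed by this chunk.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or ""); keep it for the next chunk.
    this.buffer = lines.pop() ?? "";
    return lines
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line) as unknown);
  }
}

// Serialize one message as a single line, ready to write to a stream.
function encodeLine(message: unknown): string {
  return JSON.stringify(message) + "\n";
}

// A message split across two chunks is only emitted once its newline arrives.
const decoder = new LineJsonDecoder();
const first = decoder.push('{"type":"agentic_pause"}\n{"cmd":"cli');
const second = decoder.push('ck"}\n');
```

In this sketch `first` holds the one complete envelope and `second` holds the command that was split across chunks, which is the property a stdin/stdout pause loop relies on.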

Subcommands

`st4ck-runner record <url> --instruction "<text>" [--out <path.md>] [--name <slug>] [--headless]`

Standalone record mode. Launches Chrome, navigates to <url>, synthesizes an agentic_pause envelope with <text> as the brief, and runs the IPC pause loop. On continue, serializes the captured primitives to a markdown test file at --out. Refuses to overwrite existing files.

Emits a record_complete envelope on stdout when done.

`st4ck-runner run --test-file <path.md> [--no-mcp] [--headless]`

Replays a recorded md test file. Skips the MCP integration (no profile pool, no execution log persistence) for a pure local replay. Emits replay_complete on stdout.

`st4ck-runner run <test_case_id> <base_url> [token] [options]`

The integrated path used by the paid platform: fetches the test from the st4ck MCP server, acquires a profile from the project's pool, executes it, and persists a test_executions row. Requires ST4CK_TOKEN (or .mcp.json walk-up).

`st4ck-runner stop-browser <project_id>`

Stops an authoring-mode daemon (Phase 3+ feature; Phase 1/2 stub).

Primitive surface

Every primitive returns an ActionResult { primitive, status, started_at, completed_at, error?, evidence? }.
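
As a rough illustration, the ActionResult shape above could be typed as follows. Only the field names come from this README; the field types, the status values, and the `durationMs` helper are assumptions made for the sketch.

```typescript
// Sketch of the ActionResult shape. Field names are from the README;
// every type annotation below is an assumption.
type ActionStatus = "passed" | "failed"; // hypothetical status values

interface ActionResult {
  primitive: string;
  status: ActionStatus;
  started_at: string; // assumed ISO-8601 timestamp
  completed_at: string; // assumed ISO-8601 timestamp
  error?: string; // assumed to be a message string
  evidence?: { result?: unknown }; // evidence.result is where evaluate/snapshot values land
}

// Hypothetical helper: wall-clock duration of one primitive in milliseconds.
function durationMs(result: ActionResult): number {
  return Date.parse(result.completed_at) - Date.parse(result.started_at);
}

const exampleResult: ActionResult = {
  primitive: "click",
  status: "passed",
  started_at: "2026-04-25T12:34:56.000Z",
  completed_at: "2026-04-25T12:34:56.250Z",
};
const elapsed = durationMs(exampleResult); // 250
```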

Actions (state-changing — captured into the recording):

| Method | Args |
|---|---|
| `click` | `{locator, scope?}` |
| `fill` | `{locator, value, scope?}` |
| `press` | `{key, locator?, scope?}` |
| `select` | `{locator, value, scope?}` |
| `check_box` | `{locator, checked, scope?}` |
| `hover` | `{locator, scope?}` |
| `upload` | `{locator, files: string[], scope?}` |
| `wait_until` | `{kind, locator?, url?, js?, timeout_ms?}`, where kind ∈ visible / hidden / attached / detached / url / networkidle / custom |
| `evaluate` | `{js}`: JS expression evaluated against the page; value lands in `evidence.result` |

Observation (NOT recorded):

| Method | Returns |
|---|---|
| `snapshot` | a11y-tree YAML excerpt in `evidence.result` (Playwright's `ariaSnapshot`) |
| `url` | current page URL |

IPC-driven (Phase 3 LLM primitives — handled by the parent agent, not the runner):

| Method | Mechanic |
|---|---|
| `check` | emits `check_request`, agent answers `check_response` with verdict + reasoning |
| `see` | emits `see_request`, agent answers `see_response.answer:string` |
| `extract` | emits `extract_request` with serialized zod schema; agent answers `extract_response.data`; runner runs `schema.safeParse` locally |

These primitives let the parent agent provide LLM-grade judgment without the runner ever calling an LLM provider — per ADR-004.
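
To make the request/response mechanic concrete, here is a hedged sketch of how a consumer might validate the agent's answers before trusting them. The names `check_response`, `see_response`, `verdict`, `reasoning`, and `answer` come from the table above; the `type` discriminator field, the boolean verdict, and the `parseAgentResponse` function are assumptions for illustration only.

```typescript
// Sketch: validating agent answers received as single JSON lines.
// Assumptions: a "type" field names the message, and verdict is a boolean.
interface CheckResponse {
  type: "check_response";
  verdict: boolean;
  reasoning: string;
}

interface SeeResponse {
  type: "see_response";
  answer: string;
}

type AgentResponse = CheckResponse | SeeResponse;

function parseAgentResponse(line: string): AgentResponse {
  const parsed: unknown = JSON.parse(line);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("agent response must be a JSON object");
  }
  const { type, verdict, reasoning, answer } = parsed as Record<string, unknown>;
  if (type === "check_response" && typeof verdict === "boolean" && typeof reasoning === "string") {
    return { type: "check_response", verdict, reasoning };
  }
  if (type === "see_response" && typeof answer === "string") {
    return { type: "see_response", answer };
  }
  throw new Error("unrecognized or malformed agent response");
}

const checked = parseAgentResponse(
  '{"type":"check_response","verdict":true,"reasoning":"banner is visible"}',
);
```

Rejecting malformed answers at the boundary keeps the judgment with the agent while the runner stays a plain, deterministic consumer of structured data.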

Locators

{ by: "testid", value: "email-input" }
{ by: "role", value: "button", options: { name: "Sign in", exact?: false } }
{ by: "label", value: "Email address", options?: { exact?: false } }
{ by: "placeholder", value: "[email protected]" }
{ by: "text", value: "Forgot password?" }
{ by: "css", value: "form > .submit" }

A locator-priority ladder (primitives/ladder.ts) re-resolves through testid → role+name → label → placeholder → text → css when the primary locator misses on replay (Tier-1 self-heal, free).
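
The ladder idea can be sketched as "order the known locators by priority, then take the first one that still matches". This is not the code in primitives/ladder.ts: how the runner obtains alternative locators is not described here, so the sketch simply assumes a list of candidates is available, and `orderByLadder`, `resolveWithLadder`, and the `matches` callback are hypothetical names.

```typescript
// Sketch of a locator-priority ladder. The priority order is from the README;
// the candidate list and the matcher callback are assumptions.
type LocatorKind = "testid" | "role" | "label" | "placeholder" | "text" | "css";

interface Locator {
  by: LocatorKind;
  value: string;
  options?: { name?: string; exact?: boolean };
}

const LADDER: LocatorKind[] = ["testid", "role", "label", "placeholder", "text", "css"];

// Return a copy of the candidates sorted from most to least preferred.
function orderByLadder(candidates: Locator[]): Locator[] {
  return [...candidates].sort((a, b) => LADDER.indexOf(a.by) - LADDER.indexOf(b.by));
}

// Walk the ladder and return the first candidate the matcher accepts.
async function resolveWithLadder(
  candidates: Locator[],
  matches: (locator: Locator) => Promise<boolean>,
): Promise<Locator | undefined> {
  for (const candidate of orderByLadder(candidates)) {
    if (await matches(candidate)) return candidate;
  }
  return undefined;
}

const candidates: Locator[] = [
  { by: "css", value: "form > .submit" },
  { by: "role", value: "button", options: { name: "Sign in" } },
  { by: "testid", value: "sign-in" },
];

// Fake matcher standing in for a real page: pretend the testid no longer exists.
const fakeMatches = async (locator: Locator): Promise<boolean> => locator.by !== "testid";
const ordered = orderByLadder(candidates).map((l) => l.by);
const resolved = resolveWithLadder(candidates, fakeMatches);
```

With a real Playwright page, the matcher could plausibly be built on calls such as `page.getByTestId(value).count()` or `page.getByRole(...)`, but that mapping is an assumption rather than something this README states.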

md test format

---
name: sign-in-as-alice
base_url: https://app.example.com
created_at: 2026-04-25T12:34:56.000Z
record_source: agentic_pause
---

# sign-in-as-alice

## Blocks

```json
[
  {
    "block": 0,
    "block_type": "frontend",
    "run_type": "serial",
    "browser_window": 1,
    "actions": [
      { "primitive": "fill", "args": { "locator": {"by":"testid","value":"email"}, "value": "[email protected]" } },
      { "primitive": "click", "args": { "locator": {"by":"testid","value":"sign-in"} } }
    ]
  }
]
```

Frontmatter is YAML-lite scalars. Blocks JSON mirrors the executor's `PrimitiveAction` shape — same data, portable wrapper.
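
A reader for this format only needs two steps: split the frontmatter scalars, then parse the JSON fence under the Blocks heading. The following is a minimal sketch under those assumptions; `parseMdTest` is a hypothetical name and the real parser may be stricter.

```typescript
// Sketch: read frontmatter scalars and the JSON blocks fence from an md test file.
// parseMdTest is a hypothetical helper, not the runner's parser.
interface ParsedTest {
  frontmatter: Record<string, string>;
  blocks: unknown[];
}

// Built at runtime so this example does not embed a literal fence marker.
const FENCE = "`".repeat(3);

function parseMdTest(source: string): ParsedTest {
  const fm = /^---\n([\s\S]*?)\n---/.exec(source);
  if (!fm) throw new Error("missing frontmatter");

  const frontmatter: Record<string, string> = {};
  for (const line of fm[1].split("\n")) {
    const idx = line.indexOf(":"); // split on the first colon only
    if (idx === -1) continue;
    frontmatter[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }

  const fenceRe = new RegExp(FENCE + "json\\n([\\s\\S]*?)\\n" + FENCE);
  const fence = fenceRe.exec(source);
  if (!fence) throw new Error("missing json blocks fence");

  const blocks: unknown = JSON.parse(fence[1]);
  if (!Array.isArray(blocks)) throw new Error("Blocks must be a JSON array");
  return { frontmatter, blocks };
}

const sample = [
  "---",
  "name: sign-in-as-alice",
  "base_url: https://app.example.com",
  "---",
  "",
  "## Blocks",
  "",
  FENCE + "json",
  '[{"block":0,"block_type":"frontend","actions":[]}]',
  FENCE,
].join("\n");

const parsedTest = parseMdTest(sample);
```

Splitting on the first colon only is what keeps values such as URLs and timestamps intact.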

---

## Architecture

- **Phase 1.6** locked the full primitive API surface (signatures + error classes).
- **Phase 1.9 / §8.5** specified the IPC pause protocol — line-delimited JSON on stdin/stdout.
- **Phase 2** filled stubs (`press` / `select` / `check_box` / `hover` / `upload`), built the locator ladder, and wired up strict-mode disambiguation and parallel block execution.
- **Phase 3** added the LLM-grade primitives (`check`, `see`, `extract`, `snapshot`) — all driven by IPC pause; runner has zero LLM provider integration in the External mode (this binary).

See the [LLM-native platform plan](https://github.com/edo-ceder/fig-video-scribe/blob/main/docs/modules/st4ck-llm-native/plans/2026-04-20_st4ck-llm-native-platform.md) for the architecture record + ADRs.

---

## Environment

| Variable | Purpose |
|---|---|
| `ST4CK_TOKEN` | MCP bearer token (paid mode) |
| `ST4CK_MCP_URL` | MCP endpoint (default `https://app.st4ck.io/mcp/v3/`) |
| `ST4CK_MCP_DATA_URL` | st4ck-dev data endpoint at `/mcp/dev/` (derived from MCP_URL if unset) |
| `ST4CK_STORAGE_STATE_ROOT` | override storageState root dir |
| `ST4CK_STORAGE_STATE_TTL_MS` | override storageState TTL (default 24h) |
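
For illustration, the defaults in this table could be resolved along these lines. The variable names and default values come from the table; the `readRunnerEnv` function, the result shape, and in particular the rule used to derive the data URL (swapping the last path segment under `/mcp/` for `dev`) are assumptions.

```typescript
// Sketch: resolve runner settings from environment variables.
// The derivation rule for the data URL is an assumption.
interface RunnerEnv {
  token?: string;
  mcpUrl: string;
  mcpDataUrl: string;
  storageStateRoot?: string;
  storageStateTtlMs: number;
}

const DEFAULT_MCP_URL = "https://app.st4ck.io/mcp/v3/";
const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000; // 24h

function readRunnerEnv(env: Record<string, string | undefined>): RunnerEnv {
  const mcpUrl = env["ST4CK_MCP_URL"] ?? DEFAULT_MCP_URL;
  // Assumed derivation: replace the trailing /mcp/<version>/ segment with /mcp/dev/.
  const derivedDataUrl = mcpUrl.replace(/\/mcp\/[^/]+\/?$/, "/mcp/dev/");
  const ttl = Number(env["ST4CK_STORAGE_STATE_TTL_MS"]);
  return {
    token: env["ST4CK_TOKEN"],
    mcpUrl,
    mcpDataUrl: env["ST4CK_MCP_DATA_URL"] ?? derivedDataUrl,
    storageStateRoot: env["ST4CK_STORAGE_STATE_ROOT"],
    // Fall back to the default when the override is missing or not a positive number.
    storageStateTtlMs: Number.isFinite(ttl) && ttl > 0 ? ttl : DEFAULT_TTL_MS,
  };
}

const defaults = readRunnerEnv({});
const overridden = readRunnerEnv({ ST4CK_STORAGE_STATE_TTL_MS: "1000" });
```

Passing the environment as a plain record (for example `process.env` in Node) keeps the function easy to test.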

---

## License

Apache 2.0.

---

## Status

`0.1.0-alpha.0`. Phase 3 of the LLM-native platform plan. Primitive API + IPC protocol stable; CLI surface stable. Internal-mode runner (autonomous LLM via backend MCP proxy, [ADR-005](https://github.com/edo-ceder/fig-video-scribe/blob/main/docs/modules/st4ck-llm-native/plans/2026-04-20_st4ck-llm-native-platform.md#adr-005--internal-mode-uses-backend-mcp-proxy-not-in-runner-sdk)) lands post-alpha.
