@paleo/openclaw-qa-runner
v0.4.1
Dockerised regression-test harness for OpenClaw workspaces: bus, scenario driver, judge, Compose stack.
@paleo/openclaw-qa-runner
Dockerised regression-test harness for OpenClaw workspaces. Drives the agent through two synthetic channels (discord-mock, slack-mock) and asserts the results.
Pair with @paleo/openclaw-channel-mock-core, @paleo/openclaw-discord-mock, @paleo/openclaw-slack-mock.
For internals (topology, Dockerfile pair, mocked-CLI shim, channel plugin mechanics, OpenClaw quirks), see openclaw-qa-architecture.md.
Install
npm i -D @paleo/openclaw-qa-runner @paleo/openclaw-channel-mock-core @paleo/openclaw-discord-mock @paleo/openclaw-slack-mock openclaw
Requires Docker Compose v2.20+ (the overlay uses Compose include:).
Wire package.json scripts:
"scripts": {
"env:build": "openclaw-qa-runner env build",
"env:up": "openclaw-qa-runner env up",
"env:down": "openclaw-qa-runner env down",
"qa": "openclaw-qa-runner qa"
}
Init
npx @paleo/openclaw-qa-runner init <qa-dir>
Drops four files:
- openclaw.json: gateway config (mode local, both channel plugins enabled, main agent placeholder).
- .env.local.example: copy to .env.local, fill ANTHROPIC_API_KEY + OPENCLAW_WORKSPACE_DIR.
- docker-compose.yml: thin overlay that include:s the base from node_modules/.
- Dockerfile: consumer-owned. Inherits the base via FROM paleo/openclaw-qa-runner-base:${QA_RUNNER_BASE_TAG}. Add RUN/COPY/ENV for consumer-specific setup (extra system packages, skills install, etc.).
Configure
Edit openclaw.json:
- agents.list[id=main].model: LiteLLM-style provider/model ref. The template ships a placeholder; OpenClaw fails loudly until you pick one.
- agents.list[id=main].workspace: host path to your OpenClaw workspace, bind-mounted into the gateway. Field name is workspace, not workspaceDir.
- channels.*: both discord-mock and slack-mock blocks point at the same bus.
Drop scenarios under scenarios/<id>.ts. Project fixtures go under projects-fixture/ (bind-mounted to ~/projects/ in the gateway).
Scenarios are loaded by Node 24's built-in TypeScript stripping. Stick to the strip-compatible subset (no enum, namespace, decorators, ctor parameter properties, import =). Shared helpers go under scenarios/_lib/ — discoverScenarios() skips directories.
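As an illustration of staying inside the strip-compatible subset, the sketch below replaces an enum with a const object and a constructor parameter property with an explicit field. The names Verdict and RetryPolicy are hypothetical and not part of this package.

```typescript
// Strip-compatible stand-ins for syntax that type stripping rejects.
// Verdict and RetryPolicy are illustrative names, not package exports.

// Instead of `enum Verdict { Pass, Fail }`: a const object plus a derived union type.
const Verdict = { Pass: "PASS", Fail: "FAIL" } as const;
type Verdict = (typeof Verdict)[keyof typeof Verdict];

// Instead of `constructor(private readonly maxAttempts: number) {}`:
// declare the field and assign it explicitly.
class RetryPolicy {
  readonly maxAttempts: number;

  constructor(maxAttempts: number) {
    this.maxAttempts = maxAttempts;
  }

  // Retry only failed attempts, and only while attempts remain.
  shouldRetry(attempt: number, verdict: Verdict): boolean {
    return verdict === Verdict.Fail && attempt < this.maxAttempts;
  }
}

const policy = new RetryPolicy(3);
console.log(policy.shouldRetry(1, Verdict.Fail)); // true
console.log(policy.shouldRetry(3, Verdict.Fail)); // false
```

Everything type-related here can be erased without changing runtime behaviour, which is the property the loader relies on.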
Env vars (.env.local)
Required:
- ANTHROPIC_API_KEY
- OPENCLAW_WORKSPACE_DIR: host path mounted at /home/claw/.openclaw/workspace.
Optional (defaults relative to the consumer's qa/ dir):
- OPENCLAW_CONFIG_PATH → ./openclaw.json
- QA_PROJECTS_DIR → ./projects-fixture
- QA_SCENARIOS_DIR → ./scenarios
- QA_ARTIFACTS_DIR → ./artifacts
- QA_GATEWAY_LOGS_DIR → ./.gateway-logs
- OPENCLAW_RAW_STREAM=1: also write raw-stream.jsonl alongside the always-on anthropic-payload.jsonl.
QA_PROJECT_DIR, QA_RUNNER_PACKAGE_DIR, CLAW_UID, CLAW_GID are injected by the CLI.
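The required/optional split above can be pictured with a small resolver. This is a sketch only: resolveQaPaths, fromQaDir and the QaPaths shape are hypothetical, the CLI's real resolution logic is not shown in this README, and overrides are passed through unchanged as an assumption.

```typescript
// Hypothetical helper; not the runner's actual implementation.
type Env = Record<string, string | undefined>;

interface QaPaths {
  configPath: string;
  projectsDir: string;
  scenariosDir: string;
  artifactsDir: string;
  gatewayLogsDir: string;
}

// Join a default relative path onto the qa/ dir (simplified: forward slashes only).
function fromQaDir(qaDir: string, relative: string): string {
  return `${qaDir.replace(/\/+$/, "")}/${relative}`;
}

function resolveQaPaths(env: Env, qaDir: string): QaPaths {
  // Both required variables must be present and non-empty.
  for (const required of ["ANTHROPIC_API_KEY", "OPENCLAW_WORKSPACE_DIR"]) {
    if (!env[required]) throw new Error(`missing required env var: ${required}`);
  }
  // Optional variables fall back to the documented defaults under qa/.
  return {
    configPath: env["OPENCLAW_CONFIG_PATH"] ?? fromQaDir(qaDir, "openclaw.json"),
    projectsDir: env["QA_PROJECTS_DIR"] ?? fromQaDir(qaDir, "projects-fixture"),
    scenariosDir: env["QA_SCENARIOS_DIR"] ?? fromQaDir(qaDir, "scenarios"),
    artifactsDir: env["QA_ARTIFACTS_DIR"] ?? fromQaDir(qaDir, "artifacts"),
    gatewayLogsDir: env["QA_GATEWAY_LOGS_DIR"] ?? fromQaDir(qaDir, ".gateway-logs"),
  };
}

const resolved = resolveQaPaths(
  { ANTHROPIC_API_KEY: "sk-example", OPENCLAW_WORKSPACE_DIR: "/srv/workspace" },
  "/repo/qa/",
);
console.log(resolved.scenariosDir); // /repo/qa/scenarios
```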
Run
npm run env:build # build base + consumer image
npm run env:up # bring up bus + gateway (both channels register)
npm run qa -- --channel all <scenario> # one scenario, both channels
npm run qa -- --channel all --all # every scenario, both channels
npm run qa -- --channel discord-mock <scenario> # restrict to one channel
npm run qa -- --channel all --iterations 5 <scenario> # repeat each (scenario, channel) pair 5×
npm run qa -- --channel all --iterations 5 --max-failures 1 <s> # abort a pair after >1 failure
npm run env:down
env:build first builds the base image (paleo/openclaw-qa-runner-base:<pkg-version>) from this package's Dockerfile.base, then builds the consumer image. Layer cache makes repeat base builds near-free; env:up / qa skip the base build when the tag already exists.
Rebuild required after: bumping any @paleo/openclaw-* dependency, edits to openclaw.json, or any change to the consumer Dockerfile.
Scenarios run serially through one gateway. Exit 0 iff every pair passes.
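The interaction of --iterations, --max-failures and the exit code might look roughly like the loop below. It is a sketch under stated assumptions: runPairs and PairResult are hypothetical names, a pair is treated as passing only when it has zero failures, the failing exit code is assumed to be 1, and runOnce is synchronous here for brevity where a real run would be asynchronous.

```typescript
type Channel = "discord-mock" | "slack-mock";

interface PairResult {
  scenario: string;
  channel: Channel;
  passes: number;
  failures: number;
  aborted: boolean;
}

// Hypothetical driver: runs every (scenario, channel) pair serially.
function runPairs(
  scenarios: string[],
  channels: Channel[],
  iterations: number,
  maxFailures: number,
  runOnce: (scenario: string, channel: Channel, iteration: number) => boolean,
): { results: PairResult[]; exitCode: number } {
  const results: PairResult[] = [];
  for (const scenario of scenarios) {
    for (const channel of channels) {
      const result: PairResult = { scenario, channel, passes: 0, failures: 0, aborted: false };
      for (let i = 1; i <= iterations; i++) {
        if (runOnce(scenario, channel, i)) result.passes++;
        else result.failures++;
        // Abort the pair once failures exceed the cap (">1 failure" for a cap of 1).
        if (result.failures > maxFailures) {
          result.aborted = true;
          break;
        }
      }
      results.push(result);
    }
  }
  // Exit 0 only if every pair finished without a failure (assumed definition of "passes").
  const exitCode = results.every((r) => r.failures === 0) ? 0 : 1;
  return { results, exitCode };
}

const demo = runPairs(["greet"], ["discord-mock", "slack-mock"], 5, 1, (_s, channel) => channel === "discord-mock");
console.log(demo.exitCode); // 1
```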
Scenario primitives
From @paleo/openclaw-qa-runner (src/context.ts):
- channel, conversationId, accountId: per-task isolation. Use ctx.conversationId everywhere; never hard-code a value.
- sendInbound(input): push an inbound on the bus.
- poll, waitForOutbound, expectNoOutbound: bus consumers.
- assertRegex, assertEqual, assertLength: structural assertions.
- judgeLLM({ message, rubric, label }): Anthropic-direct judgement (no bus traffic, no gateway).
- mockCli(name, handler): intercepts the gateway's calls to git/npm/pnpm/yarn/claude. Unregistered calls fail the scenario with failure.source = "cliMock".
- log, getCursor.
Prefer structural assertions over judgeLLM; reserve the judge for free-form content claims.
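To make the mockCli contract concrete, here is a toy registry that mimics the described behaviour: registered CLIs are dispatched to their handler, and an unregistered call fails with source "cliMock". It is not the package's implementation; createCliMocks, dispatch, CliResult and the handler signature are all assumptions made for illustration.

```typescript
type CliName = "git" | "npm" | "pnpm" | "yarn" | "claude";

// Assumed result and handler shapes; the real ones are defined in src/context.ts.
interface CliResult {
  exitCode: number;
  stdout: string;
}
type CliHandler = (args: string[]) => CliResult;

// Build an error tagged the way the README describes unregistered calls.
function cliMockFailure(name: string, args: string[]) {
  return Object.assign(new Error(`unregistered CLI call: ${name} ${args.join(" ")}`), {
    source: "cliMock" as const,
  });
}

function createCliMocks() {
  const handlers = new Map<CliName, CliHandler>();
  return {
    // Register a handler for one CLI name.
    mockCli(name: CliName, handler: CliHandler): void {
      handlers.set(name, handler);
    },
    // Route an intercepted call; fail when nothing was registered for it.
    dispatch(name: CliName, args: string[]): CliResult {
      const handler = handlers.get(name);
      if (!handler) throw cliMockFailure(name, args);
      return handler(args);
    },
  };
}

const mocks = createCliMocks();
mocks.mockCli("git", (args) => ({ exitCode: 0, stdout: `git ${args.join(" ")} ok` }));
console.log(mocks.dispatch("git", ["status"]).stdout); // git status ok
```

Failing loudly on unregistered calls keeps a scenario from silently depending on a CLI it never declared.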
Judge model
Defaults to anthropic/claude-haiku-4-5. Override via QA_JUDGE_MODEL on the runner service (set in your consumer overlay). The ref must be LiteLLM-style; only the anthropic/ provider is wired up today. The judge is not an OpenClaw agent — don't configure it in openclaw.json.
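The constraints on QA_JUDGE_MODEL can be sketched as a small validator. parseJudgeModel and ModelRef are hypothetical names; the runner's own parsing and error messages may differ.

```typescript
interface ModelRef {
  provider: string;
  model: string;
}

const DEFAULT_JUDGE_MODEL = "anthropic/claude-haiku-4-5";

// Hypothetical validator for a LiteLLM-style provider/model ref.
function parseJudgeModel(ref: string | undefined): ModelRef {
  // An unset or empty value falls back to the documented default.
  const value = ref && ref.length > 0 ? ref : DEFAULT_JUDGE_MODEL;
  const slash = value.indexOf("/");
  if (slash <= 0 || slash === value.length - 1) {
    throw new Error(`expected provider/model, got "${value}"`);
  }
  const provider = value.slice(0, slash);
  const model = value.slice(slash + 1);
  // Only the anthropic/ provider is wired up today.
  if (provider !== "anthropic") {
    throw new Error(`unsupported judge provider "${provider}"`);
  }
  return { provider, model };
}

console.log(parseJudgeModel(undefined).model); // claude-haiku-4-5
```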
Artifacts
artifacts/<runStamp>/<scenario>-<channel>[-<NN>][-<VERDICT>]/:
- events.jsonl: appended live, survives a runner crash.
- report.json: final ScenarioReport. Merges events.jsonl with agentToolCall entries from the gateway payload log; adds per-scenario cost.
<NN> is the iteration index (omitted when --iterations 1). <VERDICT> is PASS / FAIL, applied by renaming the directory after report.json is written. A directory with no verdict suffix means the run is pending or crashed before rename.
Authoritative types: src/report.ts.
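The directory naming scheme can be expressed as two small helpers. This is a sketch: artifactDirName and verdictOf are hypothetical, and the two-digit zero padding of <NN> is an assumption read off the placeholder, not something this README states.

```typescript
type RunVerdict = "PASS" | "FAIL";

// Build <scenario>-<channel>[-<NN>][-<VERDICT>].
function artifactDirName(
  scenario: string,
  channel: string,
  iteration: number,
  iterations: number,
  verdict?: RunVerdict,
): string {
  const parts = [scenario, channel];
  // <NN> is omitted when --iterations is 1 (two-digit padding is assumed).
  if (iterations > 1) parts.push(String(iteration).padStart(2, "0"));
  // The verdict suffix only appears after report.json is written and the directory is renamed.
  if (verdict) parts.push(verdict);
  return parts.join("-");
}

// A directory without a verdict suffix is pending or crashed before the rename.
function verdictOf(dirName: string): RunVerdict | "PENDING_OR_CRASHED" {
  const match = /-(PASS|FAIL)$/.exec(dirName);
  return match ? (match[1] as RunVerdict) : "PENDING_OR_CRASHED";
}

console.log(artifactDirName("greet", "slack-mock", 3, 5, "PASS")); // greet-slack-mock-03-PASS
console.log(verdictOf("greet-discord-mock")); // PENDING_OR_CRASHED
```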
Channels
Both discord-mock and slack-mock register on every boot. Pick which to drive per scenario via --channel discord-mock|slack-mock|all.
- discord-mock: full Discord-shaped surface; thread-create posts an optional body atomically.
- slack-mock: restricted Slack-shaped surface (react/read/edit/delete/reactions/search). Bare-channel inbounds auto-thread on the triggering message.
Inbound metadata claims Provider / Surface / OriginatingChannel = the registered channel id so the SDK routes tool-schema discovery to the right plugin. Assert on conversation.id / threadId, not envelope formatting.
Target format
Canonical destination param is to. Accepted shapes:
- channel:<id> or bare <id> (channel)
- dm:<id>
- group:<id>
- thread:<channelId>/<threadId>
Actions resolve to → target → channelId.
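The accepted shapes of to can be parsed along these lines. parseTarget and the Target union are hypothetical; the plugins' actual parser, and how dm and group targets map onto a channelId, are not described here.

```typescript
type Target =
  | { kind: "channel"; channelId: string }
  | { kind: "dm"; id: string }
  | { kind: "group"; id: string }
  | { kind: "thread"; channelId: string; threadId: string };

// Hypothetical parser for the documented `to` shapes.
function parseTarget(to: string): Target {
  if (to.length === 0) throw new Error("empty target");
  const colon = to.indexOf(":");
  // A bare <id> is treated as a channel.
  if (colon === -1) return { kind: "channel", channelId: to };

  const prefix = to.slice(0, colon);
  const rest = to.slice(colon + 1);
  if (rest.length === 0) throw new Error(`missing id in target "${to}"`);

  switch (prefix) {
    case "channel":
      return { kind: "channel", channelId: rest };
    case "dm":
      return { kind: "dm", id: rest };
    case "group":
      return { kind: "group", id: rest };
    case "thread": {
      // thread:<channelId>/<threadId> needs both halves.
      const slash = rest.indexOf("/");
      if (slash <= 0 || slash === rest.length - 1) {
        throw new Error(`expected thread:<channelId>/<threadId>, got "${to}"`);
      }
      return { kind: "thread", channelId: rest.slice(0, slash), threadId: rest.slice(slash + 1) };
    }
    default:
      throw new Error(`unknown target prefix "${prefix}"`);
  }
}

console.log(parseTarget("thread:c1/t9")); // { kind: 'thread', channelId: 'c1', threadId: 't9' }
```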
Attribution
The runner package contains no upstream-adapted code. See sibling packages' NOTICE.md for OpenClaw attribution covering the channel plugins.
