@gianfrancopiana/openclaw-autoresearch
v1.0.11
Faithful OpenClaw port of pi-autoresearch.
openclaw-autoresearch
Autonomous experiment loop for any optimization target.
Faithful OpenClaw port of davebcn87/pi-autoresearch, including upstream statistical confidence scoring.
How it works
The agent runs a loop: edit code, run a benchmark, measure the result, keep or discard. Each iteration is logged. The loop runs autonomously until interrupted.
Three tools drive the loop:
| Tool | What it does |
|---|---|
| init_experiment | Configures the session: name, primary metric, unit, direction (lower/higher). Once runs exist, starting a new segment requires reset: true, and the prior segment's best result is carried forward into checkpoint context. |
| run_experiment | Executes a shell command, times it, captures stdout/stderr, parses METRIC name=number lines, and opens a pending experiment window that must be logged before another run can start. |
| log_experiment | Records the pending run. The first logged run in a segment is tagged as the baseline automatically. keep auto-commits to git. discard/crash log without committing, and discard now requires an idea note that is appended to autoresearch.ideas.md. If the prior run_experiment captured the primary metric, log_experiment can infer commit and metric automatically. After 3+ runs in a segment, it also reports a confidence score for the best improvement versus noise. |
In OpenClaw sessions, the plugin uses the host-provided workspaceDir as the normal repo root. Each tool also accepts an optional cwd so callers can explicitly target a nested or non-session repo when needed.
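The METRIC name=number contract that run_experiment parses can be illustrated with a small sketch. This is not the plugin's parser: the function name parseMetricLines and the exact grammar (no whitespace in names, finite numeric values only) are assumptions made for illustration.

```typescript
// Hypothetical helper, not the plugin's actual implementation.
// Collects `METRIC name=number` lines from captured benchmark output.
function parseMetricLines(output: string): Map<string, number> {
  const found = new Map<string, number>();
  for (const line of output.split(/\r?\n/)) {
    const trimmed = line.trim();
    // Assumed grammar: the literal word METRIC, a space, then name=value.
    if (!trimmed.startsWith("METRIC ")) continue;
    const body = trimmed.slice("METRIC ".length).trim();
    const eq = body.indexOf("=");
    if (eq <= 0) continue; // missing "=" or empty name
    const metricName = body.slice(0, eq);
    const raw = body.slice(eq + 1).trim();
    if (raw === "" || /\s/.test(metricName)) continue;
    const value = Number(raw);
    // Ignore values that are not finite numbers.
    if (Number.isFinite(value)) found.set(metricName, value);
  }
  return found;
}

const sampleOutput = [
  "building...",
  "METRIC total_ms=1234.5",
  "METRIC peak_mb=87",
  "METRIC broken=abc",
].join("\n");

const parsedMetrics = parseMetricLines(sampleOutput);
console.log(parsedMetrics.get("total_ms")); // 1234.5
console.log(parsedMetrics.has("broken")); // false
```

A benchmark script only needs to print such lines on stdout; all other output is ignored by this sketch.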
All state lives in six repo-root files:
| File | Purpose |
|---|---|
| autoresearch.md | Session doc. The plugin keeps the Metrics, How to Run, What's Been Tried, and Plugin Checkpoint sections synchronized so resumes are less agent-dependent. |
| autoresearch.sh | Benchmark script. Outputs METRIC name=number lines. |
| autoresearch.jsonl | Structured log: config headers + experiment entries (metric, status, timestamp, segment, commit hash). |
| autoresearch.ideas.md | Backlog of promising ideas not yet tried. Optional. |
| autoresearch.checkpoint.json | Plugin-managed checkpoint: latest logged state, recent runs, and any pending unlogged run. |
| autoresearch.lock | Session lock with PID + timestamp so another agent can detect an active or stale loop before forking a second session. |
The design is file-first: any agent can pick up the repo-root files and continue the loop without prior context.
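As a rough illustration of how another agent might interpret autoresearch.lock before starting a second session, the sketch below classifies a lock as live or stale. The README only states that the lock holds a PID and a timestamp; the field names, the ISO 8601 timestamp format, the age threshold, and the rule that a dead PID or an old timestamp means "stale" are all assumptions.

```typescript
// Hypothetical lock shape: the field names are assumed, not taken from the plugin.
interface LockInfo {
  pid: number;
  timestamp: string; // assumed to be ISO 8601
}

type LockState = "none" | "live" | "stale";

// Classifies a lock. Process liveness is injected so the function stays pure;
// in Node.js, process.kill(pid, 0) is one common way to probe whether a PID exists.
function classifyLock(
  lock: LockInfo | null,
  nowMs: number,
  maxAgeMs: number,
  isPidAlive: (pid: number) => boolean,
): LockState {
  if (lock === null) return "none";
  const startedMs = Date.parse(lock.timestamp);
  if (Number.isNaN(startedMs)) return "stale"; // unreadable timestamp
  if (!isPidAlive(lock.pid)) return "stale"; // owning process is gone
  return nowMs - startedMs > maxAgeMs ? "stale" : "live";
}

const checkTime = Date.parse("2026-03-19T12:05:00Z");
const lockState = classifyLock(
  { pid: 4242, timestamp: "2026-03-19T12:00:00Z" },
  checkTime,
  30 * 60 * 1000, // assumed threshold: 30 minutes
  () => true, // pretend the process is still running
);
console.log(lockState); // "live"
```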
Install
Requires OpenClaw 2026.4.25 or newer.
Needs bash, git, and a git repo.
Use OpenClaw's plugin installer:
openclaw plugins install @gianfrancopiana/openclaw-autoresearch
If you're running from a local OpenClaw checkout, use:
pnpm openclaw plugins install @gianfrancopiana/openclaw-autoresearch
For local plugin development, link your working copy instead of copying files:
openclaw plugins install --link /absolute/path/to/openclaw-autoresearch
# or from a local OpenClaw checkout:
# pnpm openclaw plugins install --link /absolute/path/to/openclaw-autoresearch
For a packaged local install, build the tarball and install that artifact:
npm install
npm pack
openclaw plugins install ./gianfrancopiana-openclaw-autoresearch-<version>.tgz
The install command records the plugin, enables it, and makes it available after restart. OpenClaw reads the package metadata, loads the compiled runtime entry dist/index.js, and finds the manifest in openclaw.plugin.json.
Verify:
- skill: autoresearch-create
- tools: init_experiment, run_experiment, log_experiment, autoresearch_status
- command: /autoresearch (recommended)
- direct skill fallback: /skill autoresearch-create
Prefer the explicit /autoresearch command surface in OpenClaw. The auto-generated native skill alias /autoresearch_create may not trigger reliably on some hosts, so use /skill autoresearch-create if you need to invoke the skill directly.
Workflow Guarantees
- run_experiment refuses to start a second run until the previous one is logged.
- run_experiment parses METRIC name=number lines and stores a pending run so log_experiment can default from the actual benchmark output.
- init_experiment refuses to reset a live history unless reset: true is passed explicitly.
- The first log_experiment in a segment is tagged as the baseline automatically, even if it is later discarded.
- discard logs must include an idea note, and that note is appended to autoresearch.ideas.md.
- During active autoresearch mode, raw benchmark execution through OpenClaw exec/bash is blocked. Use run_experiment instead.
- autoresearch_status warns when a pending run is unlogged, when the canonical branch has drifted, when a stale/live lock exists, or when git history has moved ahead of the last logged experiment. On autoresearch/* branches it explicitly warns not to push unlogged commits.
- After 3+ positive-metric runs in a segment, log_experiment, autoresearch_status, and the synced session doc report a MAD-based confidence score so the agent can distinguish likely wins from noise.
- The plugin updates autoresearch.checkpoint.json, autoresearch.lock, and the plugin-managed sections in autoresearch.md after init, run, and log transitions.
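To make the MAD-based confidence score more concrete, here is one plausible formulation, offered as a sketch and not as the plugin's or upstream's formula: the best improvement over the baseline divided by the median absolute deviation (MAD) of the segment's metric values. The function names, the choice of the first run as baseline, and the handling of a zero MAD are assumptions.

```typescript
type Direction = "lower" | "higher";

// Median of a list; returns NaN for an empty list.
function median(values: number[]): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Hypothetical score: best improvement over the baseline, in units of MAD.
// Returns null when there are fewer than 3 positive runs or no measurable noise.
function confidenceScore(segmentMetrics: number[], direction: Direction): number | null {
  const usable = segmentMetrics.filter((m) => m > 0); // positive-metric runs only
  if (usable.length < 3) return null;
  const baseline = usable[0]; // assumed: first run in the segment is the baseline
  const best = direction === "lower" ? Math.min(...usable) : Math.max(...usable);
  const improvement = direction === "lower" ? baseline - best : best - baseline;
  const center = median(usable);
  const mad = median(usable.map((m) => Math.abs(m - center)));
  if (mad === 0) return null; // noise floor is zero, ratio is undefined
  return improvement / mad;
}

// Baseline 100, best 90, median 99, MAD 1: the improvement is 10x the noise.
console.log(confidenceScore([100, 98, 101, 90, 99], "lower")); // 10
```

Under this formulation, a score near or below 1 would suggest the best result is within ordinary run-to-run variation, while larger values suggest a real improvement.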
Use
In the repo you want to optimize:
- Load the plugin.
- Run /autoresearch or /autoresearch setup <goal>.
- Send a normal message with the goal, command, metric (+ direction), files in scope, and constraints.
- If you need the raw skill invocation, use /skill autoresearch-create.
- The agent writes autoresearch.md and autoresearch.sh, captures or reuses the canonical autoresearch/* branch, runs a baseline with run_experiment, then records it with log_experiment.
- Use /autoresearch or /autoresearch status to re-prime context on a later turn.
To resume an existing session, a new agent reads the repo-root files and continues from where the last one stopped.
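As an illustration of what resuming from files could involve, the sketch below scans the text of autoresearch.jsonl and picks the best kept run in the latest segment. The JSON key names (metric, status, segment, commit) and numeric segment identifiers are assumptions based on the fields listed for the log; the real schema may differ.

```typescript
// Assumed entry shape; the actual keys in autoresearch.jsonl may differ.
interface ExperimentEntry {
  metric: number;
  status: string; // "keep", "discard", or "crash"
  segment: number;
  commit?: string;
}

// Distinguishes experiment entries from config headers and malformed lines.
function isExperimentEntry(value: unknown): value is ExperimentEntry {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record["metric"] === "number" &&
    typeof record["status"] === "string" &&
    typeof record["segment"] === "number"
  );
}

// Returns the best kept run in the most recent segment, or null if none exists.
function bestKeptRun(jsonl: string, direction: "lower" | "higher"): ExperimentEntry | null {
  const entries: ExperimentEntry[] = [];
  for (const line of jsonl.split(/\r?\n/)) {
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    if (isExperimentEntry(parsed)) entries.push(parsed);
  }
  if (entries.length === 0) return null;
  const latestSegment = Math.max(...entries.map((e) => e.segment));
  const kept = entries.filter((e) => e.segment === latestSegment && e.status === "keep");
  if (kept.length === 0) return null;
  return kept.reduce((best, e) => {
    const better = direction === "lower" ? e.metric < best.metric : e.metric > best.metric;
    return better ? e : best;
  });
}

const sampleLog = [
  '{"type":"config","direction":"lower"}',
  '{"metric":100,"status":"keep","segment":0,"commit":"aaa1111"}',
  '{"metric":95,"status":"discard","segment":0}',
  '{"metric":97,"status":"keep","segment":0,"commit":"bbb2222"}',
].join("\n");

console.log(bestKeptRun(sampleLog, "lower")?.commit); // "bbb2222"
```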
User steers
Messages sent while an experiment is running are queued and surfaced after the next log_experiment. The agent finishes the current experiment before incorporating the steer.
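The queueing behavior, together with the rule that a run must be logged before another can start, can be modeled as a small state machine. The class and method names below are hypothetical and only illustrate the described behavior; they are not the plugin's API.

```typescript
// Hypothetical model of the pending-run window and the steer queue.
class ExperimentWindow {
  private pendingRun = false;
  private queuedSteers: string[] = [];

  // Mirrors run_experiment: opens a pending window, refusing a second one.
  startRun(): void {
    if (this.pendingRun) throw new Error("previous run must be logged first");
    this.pendingRun = true;
  }

  // Returns true if the steer applies immediately, false if it was queued.
  steer(message: string): boolean {
    if (!this.pendingRun) return true;
    this.queuedSteers.push(message);
    return false;
  }

  // Mirrors log_experiment: closes the window and surfaces queued steers.
  logRun(): string[] {
    if (!this.pendingRun) throw new Error("no pending run to log");
    this.pendingRun = false;
    const surfaced = this.queuedSteers;
    this.queuedSteers = [];
    return surfaced;
  }
}

const experimentWindow = new ExperimentWindow();
experimentWindow.startRun();
experimentWindow.steer("try a smaller batch size"); // queued, run in progress
console.log(experimentWindow.logRun()); // [ "try a smaller batch size" ]
```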
Ideas backlog
When the agent discovers promising but complex ideas mid-loop, it appends them to autoresearch.ideas.md. Discarded experiments now require an idea note, so failed paths leave behind concrete follow-up suggestions instead of disappearing. On resume, the agent reads the backlog, prunes stale entries, and uses the remaining ideas as experiment paths.
Upstream reference
This port preserves upstream semantics, names, and file contracts while adapting presentation to OpenClaw. There is no Pi-style widget, dashboard, or editor shortcut layer. Remaining differences are tracked in docs/non-parity.md.
- upstream repo: https://github.com/davebcn87/pi-autoresearch
- pinned upstream commit: 2227029fa5712944a36938b5fe59f709cb30ed22 (2227029f)
- later upstream parity cherry-pick: confidence scoring from cf1bbf03debca8f3fb2cca2c3e799b9e23320f87 (cf1bbf0, March 19, 2026)
Validation
npm install --include=dev
npm run check:release-metadata
npm run typecheck
npm test
npm run validate
npm run release:verify
npm run smoke:openclaw-host -- /absolute/path/to/openclaw
npm run smoke:registry-openclaw-host -- <published-version> /absolute/path/to/openclaw
Release instructions, including npm run release:prepare -- <version> --host /absolute/path/to/openclaw and GitHub Actions trusted publishing with npm provenance, live in RELEASING.md.
The local test shim supports typechecking and tests without a full OpenClaw host checkout. Runtime behavior depends on a real OpenClaw host, so run the host smoke test (smoke:openclaw-host) against a current checkout before release.
License
MIT
