autoresearch-cli
v0.1.0
Standalone contract CLI for autonomous, no-human-in-the-loop optimization research on Claude Code. Frame a need, loop (hypothesis → measure → keep/discard), deliver reviewable branches.
autoresearch-cli
Autonomous, no-human-in-the-loop optimization research for Claude Code.
The user states a need once. An agent frames it, runs an experiment loop — try a hypothesis, measure it, keep wins, revert losses, repeat — and delivers reviewable branches. No human in the middle.
This repo is the contract layer the agent leans on. It doesn't drive the agent; the loop lives in the agent. The CLI takes over the parts the agent can't be trusted to do by hand and makes them impossible to get wrong.
The responsibility split
- agent-own: judgment that can't be encoded. Framing the need into a
  measurable target, picking the next hypothesis, reading the result, deciding
  when the work is delivered. (`skill/SKILL.md`)
- cli-own: this binary. A synchronous contract the agent calls: honest
  measurement, keep→commit / loss→revert, the checks gate, confidence,
  bookkeeping. Numbers come only from real execution, never from the agent.
- hooks-own: Claude Code lifecycle hooks that keep the loop alive in the
  gaps between iterations, where no agent is in scope:
  - `Stop` blocks the turn from ending and pushes the next iteration. This is
    the real mechanism behind "never stop", not a prompt instruction.
  - `SessionStart` re-injects session state so a fresh, amnesiac instance
    picks up the relay after a context reset.
  - `PreCompact` guard (state already lives on disk).
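The Stop behaviour in the list above can be sketched as a pure decision function. This is a minimal sketch, not the code in hooks/index.ts. It assumes Claude Code's Stop-hook convention that a JSON object with `decision: "block"` and a `reason` on stdout keeps the turn from ending. The name `stopHookOutput` and the reason text are hypothetical.

```typescript
// Hypothetical sketch of a Stop hook decision; the shipped hook may differ.
// Assumption: Claude Code reads {"decision":"block","reason":...} from the
// hook's stdout as "do not end the turn; hand `reason` back to the agent".

type StopOutput = { decision: "block"; reason: string } | null;

// `doneMarkerExists` stands for "does .auto/.done exist?", checked by the caller.
function stopHookOutput(doneMarkerExists: boolean): StopOutput {
  if (doneMarkerExists) {
    // Work was delivered: emit nothing and let the turn end.
    return null;
  }
  return {
    decision: "block",
    reason: "Loop is still open: pick the next hypothesis, then call `autoresearch run`.",
  };
}

// What a hook process would write to stdout for an unfinished session.
const stopOutput = stopHookOutput(false);
console.log(stopOutput === null ? "" : JSON.stringify(stopOutput));
```

Keeping the decision separate from file and process I/O makes it easy to unit-test; only the caller needs to touch `.auto/.done` and stdout.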
Install
    npm install -g autoresearch-cli   # provides the `autoresearch` binary

Requires Node ≥ 18. The published package ships a compiled bundle
(`dist/autoresearch.js`); the source is TypeScript and runs directly on
Node ≥ 23.6 via `node autoresearch.ts <cmd>` for development.

Then copy the hooks block from `templates/claude-settings.json` into your
project's `.claude/settings.json`.
Commands
    autoresearch init --name N --metric M [--unit U] [--direction lower|higher]
    autoresearch run [--timeout S] [--checks-timeout S]
    autoresearch log --metric N --status keep|discard|crash|checks_failed --desc D \
                     [--metrics JSON] [--asi JSON] [--force]
    autoresearch hook stop|sessionstart|precompact [--cwd DIR]

Add `--json` to `init`/`run`/`log` for machine output.
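The `--direction` flag and the `keep`/`discard` statuses suggest a comparison along the following lines. This is a sketch under assumptions: the function bodies are illustrative, `verdictFor` is a hypothetical name, ties count as "not better", and measurement noise (which the CLI's confidence estimate is meant to handle) is ignored.

```typescript
// Illustrative only: how a metric direction could map a new measurement
// onto the keep/discard statuses accepted by `autoresearch log`.

type Direction = "lower" | "higher";
type Verdict = "keep" | "discard";

// True when `candidate` improves on `best` in the configured direction.
// Assumption: an exact tie is not an improvement.
function isBetter(direction: Direction, candidate: number, best: number): boolean {
  return direction === "lower" ? candidate < best : candidate > best;
}

// Hypothetical helper: translate the comparison into a log status.
function verdictFor(direction: Direction, candidate: number, best: number): Verdict {
  return isBetter(direction, candidate, best) ? "keep" : "discard";
}

// A latency-style metric (lower is better) that dropped from 100 to 90.
console.log(verdictFor("lower", 90, 100)); // prints keep
```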
Measurement contract: `run` resolves the benchmark as `just measure` if a
justfile defines it, else `.auto/measure.sh`. Checks resolve the same way
(`just check` / `.auto/checks.sh`); check time is excluded from the primary
metric. Only `METRIC name=value` lines in the output are trusted as numbers.
You never run git yourself: `log` owns commit and revert, and always preserves
`.auto/`.
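The "only `METRIC name=value` lines are trusted" rule can be illustrated with a small parser. This is a sketch, not the code in core/metrics.ts. The exact grammar is an assumption (a line starting with `METRIC`, whitespace, a name, `=`, and a finite number), and `parseMetricLines` is a hypothetical name.

```typescript
// Sketch: collect `METRIC name=value` lines from benchmark output.
// Assumed grammar: "METRIC", whitespace, name, "=", a finite number.
// Any other line is ignored, so numbers mentioned in prose are never trusted.

function parseMetricLines(output: string): Record<string, number> {
  const metrics: Record<string, number> = {};
  for (const rawLine of output.split(/\r?\n/)) {
    const match = /^METRIC\s+([A-Za-z0-9_.-]+)=(\S+)$/.exec(rawLine.trim());
    if (!match) continue;
    const value = Number(match[2]);
    // Reject NaN and infinities rather than recording a bogus measurement.
    if (Number.isFinite(value)) metrics[match[1]] = value;
  }
  return metrics;
}

const sampleOutput = [
  "warming up...",
  "total time was 12 seconds", // a number in prose: ignored
  "METRIC wall_ms=1234.5",
  "METRIC allocs=42",
  "METRIC broken=fast", // not numeric: ignored
].join("\n");

console.log(parseMetricLines(sampleOutput));
// prints { wall_ms: 1234.5, allocs: 42 }
```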
Delivering
When the agent judges the work done, it groups kept commits into independent,
reviewable units and runs `finalize/finalize.sh <groups.json>` (format in
`finalize/FINALIZE.md`) to cut one clean branch per group from the merge-base,
then creates `.auto/.done` to stop the loop.
State
Everything lives under `.auto/`, which is the single source of truth: the CLI
holds no in-memory state between calls and reconstructs its state from
`.auto/log.jsonl` on each invocation. `.auto/.done` stops the loop;
`.auto/.last-checks.json` and `.auto/.loop-state.json` are internal.
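Reconstructing state from an append-only log on every call can be sketched as a fold over the JSONL records. This is a minimal sketch: the record shape (`metric`, `status`, `desc`) is a hypothetical guess modelled on the `log` flags, not the real `.auto/log.jsonl` schema, and `reconstructState` is an illustrative name.

```typescript
// Hypothetical record shape; the real .auto/log.jsonl schema may differ.
interface LogRecord {
  metric: number;
  status: "keep" | "discard" | "crash" | "checks_failed";
  desc: string;
}

interface SessionState {
  runs: number; // every logged attempt, kept or not
  baseline: number | null; // first kept measurement
  best: number | null; // best kept measurement so far
}

// Fold the whole log into a fresh state object; nothing is cached between calls.
function reconstructState(jsonl: string, direction: "lower" | "higher"): SessionState {
  const state: SessionState = { runs: 0, baseline: null, best: null };
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // tolerate a trailing newline
    const record = JSON.parse(line) as LogRecord;
    state.runs += 1;
    if (record.status !== "keep") continue; // only kept runs move baseline/best
    if (state.baseline === null) state.baseline = record.metric;
    const improves =
      state.best === null ||
      (direction === "lower" ? record.metric < state.best : record.metric > state.best);
    if (improves) state.best = record.metric;
  }
  return state;
}

const sampleLog = [
  '{"metric":100,"status":"keep","desc":"baseline"}',
  '{"metric":120,"status":"discard","desc":"bigger buffer"}',
  '{"metric":90,"status":"keep","desc":"skip redundant parse"}',
].join("\n");

console.log(reconstructState(sampleLog, "lower"));
// prints { runs: 3, baseline: 100, best: 90 }
```

Because the function depends only on the log text, a fresh process (or a fresh agent after a context reset) arrives at the same state as the one that wrote the log.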
Layout
    autoresearch.ts        # entry: arg parsing + dispatch
    core/
      jsonl.ts  paths.ts   # vendored pure modules (session log format + .auto layout)
      stats.ts             # confidence (MAD), isBetter, baseline/best
      metrics.ts           # METRIC parsing, unit inference
      git.ts               # commit / revert (preserving .auto/)
      runner.ts            # just-probe, measure + checks execution, timing
      session.ts           # state reconstruction, config
      commands.ts          # init / run / log handlers
    hooks/index.ts         # stop / sessionstart / precompact
    templates/             # Claude Code settings template
    finalize/              # finalize.sh + groups.json doc
    skill/SKILL.md         # agent-own instructions
    test/                  # node:test suites

Test

    npm test

Provenance
Extracted from pi-autoresearch
and decoupled from the pi runtime. core/jsonl.ts, core/paths.ts, and
finalize/finalize.sh are vendored from that project.
License
MIT
