autoresearch-cli

v0.1.0

Published

12 days ago

Standalone contract CLI for autonomous, no-human-in-the-loop optimization research on Claude Code. Frame a need, loop (hypothesis → measure → keep/discard), deliver reviewable branches.

0High
0Medium
0Low

qq8933

autoresearch claude-code autonomous-agent optimization benchmark experiment-loop

autoresearch-cli

Autonomous, no-human-in-the-loop optimization research for Claude Code.

The user states a need once. An agent frames it, runs an experiment loop — try a hypothesis, measure it, keep wins, revert losses, repeat — and delivers reviewable branches. No human in the middle.

This repo is the contract layer the agent leans on. It doesn't drive the agent; the loop lives in the agent. The CLI just makes the parts the agent can't be trusted to do by hand impossible to get wrong.

The responsibility split

agent-own — judgment that can't be encoded: framing the need into a measurable target, picking the next hypothesis, reading the result, deciding when the work is delivered. (skill/SKILL.md)
cli-own — this binary. A synchronous contract the agent calls: honest measurement, keep→commit / loss→revert, the checks gate, confidence, bookkeeping. Numbers come only from real execution — never from the agent.
hooks-own — Claude Code lifecycle hooks that keep the loop alive in the gaps between iterations, where no agent is in scope:
- Stop blocks the turn from ending and pushes the next iteration — the real mechanism behind "never stop", not a prompt instruction.
- SessionStart re-injects session state so a fresh, amnesiac instance picks up the relay after a context reset.
- PreCompact guard (state already lives on disk).

Install

npm install -g autoresearch-cli   # provides the `autoresearch` binary

Requires Node ≥ 18. The published package ships a compiled bundle (dist/autoresearch.js); the source is TypeScript and runs directly on Node ≥ 23.6 via node autoresearch.ts <cmd> for development.

Then copy the hooks block from templates/claude-settings.json into your project's .claude/settings.json.

Commands

autoresearch init  --name N --metric M [--unit U] [--direction lower|higher]
autoresearch run   [--timeout S] [--checks-timeout S]
autoresearch log   --metric N --status keep|discard|crash|checks_failed --desc D \
                   [--metrics JSON] [--asi JSON] [--force]
autoresearch hook  stop|sessionstart|precompact [--cwd DIR]

Add --json to init/run/log for machine output.

Measurement contract: run resolves the benchmark as just measure if a justfile defines it, else .auto/measure.sh. Checks resolve the same way (just check / .auto/checks.sh); check time is excluded from the primary metric. Only METRIC name=value lines in the output are trusted as numbers.

You never run git yourself — log owns commit and revert, and always preserves .auto/.

Delivering

When the agent judges the work done, it groups kept commits into independent, reviewable units and runs finalize/finalize.sh <groups.json> (format in finalize/FINALIZE.md) to cut one clean branch per group from the merge-base, then creates .auto/.done to stop the loop.

State

Everything lives under .auto/ and is the single source of truth — the CLI holds no in-memory state between calls and reconstructs from .auto/log.jsonl on each invocation. .auto/.done stops the loop; .auto/.last-checks.json and .auto/.loop-state.json are internal.

Layout

autoresearch.ts          # entry: arg parsing + dispatch
core/
  jsonl.ts paths.ts      # vendored pure modules (session log format + .auto layout)
  stats.ts               # confidence (MAD), isBetter, baseline/best
  metrics.ts             # METRIC parsing, unit inference
  git.ts                 # commit / revert (preserving .auto/)
  runner.ts              # just-probe, measure + checks execution, timing
  session.ts             # state reconstruction, config
  commands.ts            # init / run / log handlers
hooks/index.ts           # stop / sessionstart / precompact
templates/               # Claude Code settings template
finalize/                # finalize.sh + groups.json doc
skill/SKILL.md           # agent-own instructions
test/                    # node:test suites

Test

npm test

Provenance

Extracted from pi-autoresearch and decoupled from the pi runtime. core/jsonl.ts, core/paths.ts, and finalize/finalize.sh are vendored from that project.

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

autoresearch-cli

The responsibility split

Install

Commands

Delivering

State

Layout

Test

Provenance

License