autoresearcher
v0.1.8
Benchmark-driven autonomous research CLI for post-quantum and blockchain R&D
autoresearcher
autoresearcher is a standalone terminal CLI for benchmark-driven autonomous research loops.
It runs this cycle repeatedly:
- Run one internal headless agent iteration.
- Run your benchmark command.
- Parse the metric from its output with a regex.
- Keep the iteration only if the metric improved.
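The keep/reject decision above can be sketched roughly as follows (a hypothetical illustration using `sed` and `awk`, not the CLI's actual internals; variable names are made up):

```shell
#!/usr/bin/env sh
# Illustrative sketch of one loop pass; not autoresearcher's real code.
BENCH_OUT="score=0.8123"   # stand-in for the benchmark command's output
BEST="0.8000"              # best metric kept so far

# Extract the metric the way a metricRegex like "score=([0-9.]+)" would.
METRIC=$(printf '%s\n' "$BENCH_OUT" | sed -n 's/.*score=\([0-9.]*\).*/\1/p')

# direction "max": keep the iteration only if the metric strictly improved.
if awk -v m="$METRIC" -v b="$BEST" 'BEGIN { exit !(m > b) }'; then
  echo "keep: metric improved to $METRIC"
else
  echo "reject: metric $METRIC did not beat $BEST"
fi
```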
Quick Start
```shell
npm i -g autoresearcher

# In your research repo:
cd /path/to/your/new-repo
autoresearcher wizard
autoresearcher run --iterations 20
```

Commands

```shell
autoresearcher init
autoresearcher wizard
autoresearcher run [--iterations N]
```

Important Config
The `init` command creates `.autoresearcher/config.json`:

```json
{
  "agentMode": "internal",
  "agentPromptFile": "program.md",
  "agentPrompt": "Improve the benchmark metric while preserving correctness, test behavior, and safety.",
  "agentCommand": "./scripts/agent-step.sh",
  "backendAgent": "",
  "backendModel": "",
  "backendMaxIterations": 1,
  "benchmarkCommand": "./scripts/benchmark.sh",
  "metricRegex": "score=([0-9.]+)",
  "direction": "max",
  "iterations": 20,
  "finalReportPath": ".autoresearcher/reports/final-report-{run_id}.md",
  "researchReportPath": "RESEARCH.md",
  "autoCommit": false,
  "onRejectCommand": "",
  "onKeepCommand": "",
  "onCompleteCommand": "",
  "stopOnAgentFailure": true,
  "streamAgentOutput": true,
  "commitMessageTemplate": "research: improved metric to {metric} (iter {iteration})"
}
```

`agentMode: "internal"` is the default. For a fully custom step command, set `agentMode` to `"command"` and edit `agentCommand`.
In internal mode, backend output is streamed through a status-focused relay so users only see clean autoresearcher loop logs.
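Before a long run, it can help to confirm by hand that your benchmark output actually matches `metricRegex`. A sketch of such a check, using `grep` as a stand-in for the CLI's parser (the sample output string is hypothetical):

```shell
# Hypothetical check that benchmark output matches metricRegex "score=([0-9.]+)".
OUTPUT="epoch 12 done, score=0.8123, time=3.4s"   # example benchmark output
METRIC=$(printf '%s\n' "$OUTPUT" | grep -oE 'score=[0-9.]+' | head -n 1 | cut -d= -f2)
echo "$METRIC"   # prints 0.8123
```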
Example Configs
Default internal mode:
```json
{
  "agentMode": "internal",
  "agentPromptFile": "program.md",
  "agentPrompt": "Improve benchmark with safe, minimal changes.",
  "backendAgent": "amp",
  "backendModel": "claude-sonnet-4-5-20250929",
  "backendMaxIterations": 1,
  "benchmarkCommand": "./scripts/benchmark.sh",
  "metricRegex": "score=([0-9.]+)",
  "direction": "max",
  "iterations": 40,
  "finalReportPath": ".autoresearcher/reports/final-report-{run_id}.md",
  "researchReportPath": "RESEARCH.md",
  "onCompleteCommand": "",
  "autoCommit": false,
  "streamAgentOutput": true
}
```

Custom command mode:
```json
{
  "agentMode": "command",
  "agentCommand": "./scripts/agent-step.sh",
  "benchmarkCommand": "./scripts/benchmark.sh",
  "metricRegex": "score=([0-9.]+)",
  "direction": "max",
  "iterations": 40,
  "finalReportPath": ".autoresearcher/reports/final-report-{run_id}.md",
  "researchReportPath": "RESEARCH.md",
  "onCompleteCommand": ""
}
```

Typical Real Setup
- Start with internal mode and tailor `agentPrompt` to your objective.
- Set one provider key (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or `OPENROUTER_API_KEY`).
- Optionally pin `backendAgent` and `backendModel`.
- Edit `program.md` with your experiment objective and constraints.
- Replace `./scripts/benchmark.sh` so it prints one numeric metric, like `score=0.8123`.
- Set `direction` to `max` or `min`.
- Optionally switch to `agentMode: "command"` and customize `agentCommand`.
- Optionally set `onRejectCommand` to revert non-improving changes.
- Optionally set `onCompleteCommand` to post-process generated reports (for example, convert markdown to PDF).
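As a starting point, a minimal `./scripts/benchmark.sh` might look like this (the fixed value is a placeholder for a real evaluation):

```shell
#!/usr/bin/env sh
# Minimal hypothetical benchmark script: print exactly one metric line
# that a metricRegex like "score=([0-9.]+)" can capture.
set -eu
RESULT="0.8123"            # placeholder; compute a real metric here
echo "score=$RESULT"
```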
Logs
Every run writes JSONL logs to `.autoresearcher/runs/<timestamp>.jsonl`.
Every run also writes a markdown final report using `finalReportPath` (supports the `{run_id}` template token).
Every run writes a synthesized research-result markdown using `researchReportPath` (default: `RESEARCH.md`).
`onCompleteCommand` template variables include `{run_id}`, `{report_path}`, `{research_report_path}`, `{best_metric}`, and `{best_iteration}`.
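For example, a hypothetical `onCompleteCommand` value that converts the final report to PDF (assuming `pandoc` is installed) could use these tokens; this is a config fragment, with the tokens expanded by autoresearcher at run time:

```shell
# Set as "onCompleteCommand" in .autoresearcher/config.json:
pandoc "{report_path}" -o "report-{run_id}.pdf"
```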
