autoresearcher
v0.1.8
Benchmark-driven autonomous research CLI for post-quantum and blockchain R&D
autoresearcher
autoresearcher is a standalone terminal CLI for benchmark-driven autonomous research loops.
It runs this cycle repeatedly:
- Run one internal headless agent iteration.
- Run your benchmark command.
- Parse the metric from its output with a regex.
- Keep the iteration only if the metric improved.
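The keep/reject decision above can be sketched roughly as follows (a hypothetical illustration using `sed` and `awk`, not the CLI's actual internals; variable names are made up):

```shell
#!/usr/bin/env sh
# Illustrative sketch of one loop pass; not autoresearcher's real code.
BENCH_OUT="score=0.8123"   # stand-in for the benchmark command's output
BEST="0.8000"              # best metric kept so far

# Extract the metric the way a metricRegex like "score=([0-9.]+)" would.
METRIC=$(printf '%s\n' "$BENCH_OUT" | sed -n 's/.*score=\([0-9.]*\).*/\1/p')

# direction "max": keep the iteration only if the metric strictly improved.
if awk -v m="$METRIC" -v b="$BEST" 'BEGIN { exit !(m > b) }'; then
  echo "keep: metric improved to $METRIC"
else
  echo "reject: metric $METRIC did not beat $BEST"
fi
```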
Quick Start
```shell
npm i -g autoresearcher

# In your research repo:
cd /path/to/your/new-repo
autoresearcher wizard
autoresearcher run --iterations 20
```

Commands

```shell
autoresearcher init
autoresearcher wizard
autoresearcher run [--iterations N]
```

Important Config
The `init` command creates `.autoresearcher/config.json`:

```json
{
  "agentMode": "internal",
  "agentPromptFile": "program.md",
  "agentPrompt": "Improve the benchmark metric while preserving correctness, test behavior, and safety.",
  "agentCommand": "./scripts/agent-step.sh",
  "backendAgent": "",
  "backendModel": "",
  "backendMaxIterations": 1,
  "benchmarkCommand": "./scripts/benchmark.sh",
  "metricRegex": "score=([0-9.]+)",
  "direction": "max",
  "iterations": 20,
  "finalReportPath": ".autoresearcher/reports/final-report-{run_id}.md",
  "researchReportPath": "RESEARCH.md",
  "autoCommit": false,
  "onRejectCommand": "",
  "onKeepCommand": "",
  "onCompleteCommand": "",
  "stopOnAgentFailure": true,
  "streamAgentOutput": true,
  "commitMessageTemplate": "research: improved metric to {metric} (iter {iteration})"
}
```

`agentMode: "internal"` is the default. For a fully custom step command, set `agentMode` to `"command"` and edit `agentCommand`.
In internal mode, backend output is streamed through a status-focused relay so users only see clean autoresearcher loop logs.
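Before a long run, it can help to confirm by hand that your benchmark output actually matches `metricRegex`. A sketch of such a check, using `grep` as a stand-in for the CLI's parser (the sample output string is hypothetical):

```shell
# Hypothetical check that benchmark output matches metricRegex "score=([0-9.]+)".
OUTPUT="epoch 12 done, score=0.8123, time=3.4s"   # example benchmark output
METRIC=$(printf '%s\n' "$OUTPUT" | grep -oE 'score=[0-9.]+' | head -n 1 | cut -d= -f2)
echo "$METRIC"   # prints 0.8123
```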
Example Configs
Default internal mode:
```json
{
  "agentMode": "internal",
  "agentPromptFile": "program.md",
  "agentPrompt": "Improve benchmark with safe, minimal changes.",
  "backendAgent": "amp",
  "backendModel": "claude-sonnet-4-5-20250929",
  "backendMaxIterations": 1,
  "benchmarkCommand": "./scripts/benchmark.sh",
  "metricRegex": "score=([0-9.]+)",
  "direction": "max",
  "iterations": 40,
  "finalReportPath": ".autoresearcher/reports/final-report-{run_id}.md",
  "researchReportPath": "RESEARCH.md",
  "onCompleteCommand": "",
  "autoCommit": false,
  "streamAgentOutput": true
}
```

Custom command mode:
```json
{
  "agentMode": "command",
  "agentCommand": "./scripts/agent-step.sh",
  "benchmarkCommand": "./scripts/benchmark.sh",
  "metricRegex": "score=([0-9.]+)",
  "direction": "max",
  "iterations": 40,
  "finalReportPath": ".autoresearcher/reports/final-report-{run_id}.md",
  "researchReportPath": "RESEARCH.md",
  "onCompleteCommand": ""
}
```

Typical Real Setup
- Start with internal mode and tailor `agentPrompt` to your objective.
- Set one provider key (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or `OPENROUTER_API_KEY`).
- Optionally pin `backendAgent` and `backendModel`.
- Edit `program.md` with your experiment objective and constraints.
- Replace `./scripts/benchmark.sh` so it prints one numeric metric, like `score=0.8123`.
- Set `direction` to `max` or `min`.
- Optionally switch to `agentMode: "command"` and customize `agentCommand`.
- Optionally set `onRejectCommand` to revert non-improving changes.
- Optionally set `onCompleteCommand` to post-process generated reports (for example, convert markdown to PDF).
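As a starting point, a minimal `./scripts/benchmark.sh` might look like this (the fixed value is a placeholder for a real evaluation):

```shell
#!/usr/bin/env sh
# Minimal hypothetical benchmark script: print exactly one metric line
# that a metricRegex like "score=([0-9.]+)" can capture.
set -eu
RESULT="0.8123"            # placeholder; compute a real metric here
echo "score=$RESULT"
```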
Logs
Every run writes JSONL logs to `.autoresearcher/runs/<timestamp>.jsonl`.
Every run also writes a markdown final report using `finalReportPath` (supports the `{run_id}` template token).
Every run writes a synthesized research-result markdown using `researchReportPath` (default: `RESEARCH.md`).
`onCompleteCommand` template variables include `{run_id}`, `{report_path}`, `{research_report_path}`, `{best_metric}`, and `{best_iteration}`.
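For example, a hypothetical `onCompleteCommand` value that converts the final report to PDF (assuming `pandoc` is installed) could use these tokens; this is a config fragment, with the tokens expanded by autoresearcher at run time:

```shell
# Set as "onCompleteCommand" in .autoresearcher/config.json:
pandoc "{report_path}" -o "report-{run_id}.pdf"
```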
