ralph-research
v0.1.2
Local-first runtime for recursive research improvement.
ralph-research runs a bounded improvement loop over a real artifact:
- define a metric
- generate one candidate change
- evaluate it
- keep only verified improvements
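The four steps above can be sketched as a single function. This is a minimal illustration, not the ralph-research implementation: `run_cycle`, `filler_score`, and `drop_first_filler` are hypothetical names, and the strict "better than the current best by more than epsilon" rule is an assumption about how acceptance might work.

```python
from typing import Callable, Tuple

# Hypothetical filler words used by the toy metric and proposer below.
FILLERS = {"very", "really", "just"}


def filler_score(text: str) -> float:
    """Toy metric: fewer filler words is better, so the count is negated."""
    return -sum(1 for word in text.lower().split() if word in FILLERS)


def drop_first_filler(text: str) -> str:
    """Toy proposer: a bounded change that removes at most one filler word."""
    words = text.split()
    for i, word in enumerate(words):
        if word.lower() in FILLERS:
            return " ".join(words[:i] + words[i + 1:])
    return text


def run_cycle(
    best: str,
    best_score: float,
    propose: Callable[[str], str],
    metric: Callable[[str], float],
    epsilon: float = 0.0,
) -> Tuple[str, float, bool]:
    """Generate one candidate, evaluate it, and keep it only if it improves."""
    candidate = propose(best)
    score = metric(candidate)
    # Assumed acceptance rule: the candidate must beat the best by more than epsilon.
    if score > best_score + epsilon:
        return candidate, score, True
    return best, best_score, False
```

Calling `run_cycle` repeatedly on a draft keeps advancing the accepted text until the proposer can no longer produce a change that scores higher, at which point the previous best is returned unchanged.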
The v0.1 focus is a writing workflow that is runnable in under five minutes on a local machine.
Quickstart
Zero-config demo
npx ralph-research demo writing
This creates a temporary writing repo, runs one accepted cycle, and prints the path plus the run id. The v0.1 demo supports the bundled writing template only.
Template flow
npx ralph-research init --template writing
npx ralph-research run --json
npx ralph-research run --until-target --until-no-improve 3 --json
npx ralph-research inspect run-0001 --json
This path is the v0.1 success bar: init -> run -> inspect should work quickly and produce an acceptance reason you can inspect. The bundled template set is currently writing only.
Core Concepts
- Manifest: ralph.yaml defines the research program.
- Metric: how candidate quality is measured.
- Frontier: the currently accepted best candidate set.
- Ratchet: the acceptance policy that decides whether the frontier advances.
- Proposer: how a bounded candidate change is generated.
- Judge: how qualitative outputs are compared when numeric metrics are not enough.
Writing Template
The bundled writing template is self-contained:
- docs/draft.md: sample draft
- scripts/propose.mjs: bounded rewrite
- scripts/experiment.mjs: output materialization
- scripts/metric.mjs: local heuristic metric
- prompts/judge.md: pairwise judge prompt you can upgrade to later
The default template uses a local command metric so the first run does not require API keys. When you are ready, replace the numeric metric with an llm_judge extractor and use the included pairwise prompt as a starting point.
CLI
rrx validate
rrx doctor
rrx init --template writing
rrx demo writing
rrx run
rrx run --until-target
rrx run --until-target --until-no-improve 3
rrx status
rrx frontier
rrx inspect <runId>
rrx accept <runId>
rrx reject <runId>
rrx serve-mcp --stdio
rrx run --cycles N still executes a finite loop. Progressive modes are opt-in:
- --until-target: keep iterating until manifest.stopping.target is satisfied
- --until-no-improve N: stop after N consecutive cycles without a frontier improvement
- --cycles N with a progressive flag: treat N as a max-cycle cap instead of an exact count
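How the progressive flags might combine can be sketched as one stop check evaluated after each cycle. This is an illustrative sketch only: `StopPolicy`, `stop_reason`, the returned reason strings, and the order in which conditions are checked are all assumptions, and only the ">=" operator appears in this README (the other comparison operators are included hypothetically).

```python
import operator
from dataclasses import dataclass
from typing import Optional

# Only ">=" is shown in the README; the other operators are assumed for illustration.
OPS = {">=": operator.ge, ">": operator.gt, "<=": operator.le, "<": operator.lt}


@dataclass
class StopPolicy:
    max_cycles: Optional[int] = None        # --cycles N, acting as a cap
    target_op: Optional[str] = None         # stopping.target.op (--until-target)
    target_value: Optional[float] = None    # stopping.target.value
    no_improve_limit: Optional[int] = None  # --until-no-improve N


def stop_reason(
    policy: StopPolicy,
    cycles_done: int,
    metric_value: float,
    cycles_without_improvement: int,
) -> Optional[str]:
    """Return a hypothetical reason label if the loop should stop, else None."""
    if policy.target_op is not None and policy.target_value is not None:
        if OPS[policy.target_op](metric_value, policy.target_value):
            return "target_met"
    if policy.no_improve_limit is not None:
        if cycles_without_improvement >= policy.no_improve_limit:
            return "no_improve"
    if policy.max_cycles is not None and cycles_done >= policy.max_cycles:
        return "max_cycles"
    return None
```

With all three limits set, whichever condition is satisfied first ends the loop, which matches the description of `--cycles N` as a cap rather than an exact count when a progressive flag is present.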
rrx status now reports both the persisted latest run snapshot and the runtime view derived from the lock heartbeat, so running (alive) is distinguished from stale (resumable). The output includes heartbeat and last-progress timestamps when available.
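The distinction between the persisted snapshot and the heartbeat-derived view can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `runtime_state` function, the "running" status string, and the 30-second staleness threshold are hypothetical and not taken from ralph-research.

```python
from datetime import datetime, timedelta
from typing import Optional


def runtime_state(
    persisted_status: str,
    last_heartbeat: Optional[datetime],
    now: datetime,
    stale_after: timedelta = timedelta(seconds=30),  # hypothetical threshold
) -> str:
    """Derive a runtime view from the persisted status and the lock heartbeat."""
    # A run that is not marked as running needs no heartbeat check.
    if persisted_status != "running":
        return persisted_status
    # No heartbeat, or one that is too old, suggests the owning process is gone.
    if last_heartbeat is None or now - last_heartbeat > stale_after:
        return "stale (resumable)"
    return "running (alive)"
```

The point of the sketch is that a snapshot saying "running" is not trusted on its own: only a recent heartbeat confirms that a process is still making progress.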
Stopping Targets
Use stopping.target when the workflow contract is "keep going until metric X reaches threshold Y":
metrics:
  catalog:
    - id: exact_rate
      kind: numeric
      direction: maximize
      extractor:
        type: command
        command: "python scripts/metric.py"
        parser: plain_number
frontier:
  strategy: single_best
  primaryMetric: exact_rate
ratchet:
  type: epsilon_improve
  metric: exact_rate
  epsilon: 0
stopping:
  target:
    metric: exact_rate
    op: ">="
    value: 0.8
Metric Diagnostics
If a metric script can explain why a candidate was zeroed or downgraded, prefer JSON output plus parser: json_path so the reason survives into run, decision, and inspect output:
metrics:
  catalog:
    - id: exact_rate
      kind: numeric
      direction: maximize
      extractor:
        type: command
        command: "python scripts/metric.py"
        parser: json_path
        valuePath: $.value

{
  "value": 0,
  "metricId": "overfit_safe_exact_rate",
  "reasons": ["all_missing_features", "normalized_order_leak"]
}
When project.workspace=git, rrx now warns if proposer, experiment, or metric command files are dirty in the working tree, because detached candidate worktrees only see committed baseline content.
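As an illustration of the JSON diagnostics output described in this section, a metric script might look like the sketch below. It is a hypothetical example, not the bundled metric: `build_metric_report`, the exact-match scoring, and the reason codes `no_predictions` and `length_mismatch` are invented for illustration. Only the `value`, `metricId`, and `reasons` keys come from the sample output above.

```python
import json
from typing import Dict, List


def build_metric_report(predictions: List[str], expected: List[str]) -> Dict:
    """Score exact matches and record why a candidate was zeroed, if it was."""
    reasons: List[str] = []
    # Hypothetical reason codes; a real metric would define its own.
    if not predictions:
        reasons.append("no_predictions")
    if len(predictions) != len(expected):
        reasons.append("length_mismatch")

    if reasons:
        value = 0.0  # zero the candidate but keep the explanation
    else:
        matches = sum(p == e for p, e in zip(predictions, expected))
        value = matches / len(expected)

    return {"value": value, "metricId": "exact_rate", "reasons": reasons}


if __name__ == "__main__":
    # Print one JSON object on stdout so a json_path parser can read $.value.
    print(json.dumps(build_metric_report(["a", "b"], ["a", "c"])))
```

Emitting the reasons alongside the value means a zero score is never ambiguous: the same object that carries the number also says why it was assigned.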
MCP
The bundled MCP server currently supports stdio transport and exposes three thin tools backed by the same service layer as the CLI:
- run_research_cycle
- get_research_status
- get_frontier
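For orientation, the sketch below builds the kind of JSON-RPC 2.0 `tools/call` message that an MCP client sends over stdio to invoke a tool by name. The helper `tools_call_request` is hypothetical, and the empty `arguments` object is an assumption, since this README does not document each tool's input schema. A real session also needs the MCP initialize handshake before any tool call, which is omitted here.

```python
import json
from typing import Any, Dict, Optional


def tools_call_request(
    request_id: int,
    tool_name: str,
    arguments: Optional[Dict[str, Any]] = None,
) -> str:
    """Serialize one MCP tools/call request as a single newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # Argument shapes are assumed empty; check the server's tool listing.
        "params": {"name": tool_name, "arguments": arguments or {}},
    }
    # The stdio transport delimits messages with newlines, one message per line.
    return json.dumps(message) + "\n"
```

In practice an MCP client library handles this framing, so the sketch is mainly useful for seeing what travels over the pipe when a tool such as get_frontier is called.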
Design Principles
- local-first execution
- bounded changes
- recoverable state transitions
- trusted signal before automation
- inspectable accept/reject decisions
Development
npm install
npm test
npm run typecheck
npm run build
License
MIT
