@cvr/xp
v0.0.1
Autonomous experiment daemon — LLM-driven optimization of any measurable metric
xp
Autonomous experiment daemon. Point an LLM at any benchmark and it optimizes the metric in a loop.
Install
bun run build # compiles binary to bin/xp + symlinks to ~/.bun/bin/
Usage
# Start an experiment
xp start optimize-fft \
--metric latency --unit ms --direction min \
--benchmark "./bench.sh" \
--objective "reduce FFT latency" \
--provider claude
# Monitor
xp status # current state
xp logs # daemon output
xp logs -f # tail daemon output
xp results # all trial results
xp results --last 5 # last 5 trials
# Steer the agent mid-run
xp steer "try SIMD intrinsics instead of auto-vectorization"
# Stop
xp stop
Commands
| Command | Description |
| ------------------ | ----------------------------------------- |
| start <name> | Initialize and start an experiment |
| stop | Stop the daemon |
| status | Show experiment state (--json) |
| logs | View daemon log (-f to follow) |
| results | Show trial results (--last N, --json) |
| steer <guidance> | Send guidance to the running experiment |
start Flags
| Flag | Description | Default |
| ------------------ | -------------------------------------------- | -------- |
| --metric | Metric name to optimize | required |
| --unit | Metric unit | "" |
| --direction | min or max | required |
| --benchmark | Shell command that emits METRIC name=value | required |
| --objective | What the agent should optimize | required |
| --provider | claude or codex | claude |
| --max-iterations | Budget cap | 50 |
| --max-failures | Max consecutive failures | 5 |
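The flag table above can be read as a typed configuration object. The following is a minimal sketch of that reading: `StartOptions`, `RequiredFlags`, and `withDefaults` are hypothetical names chosen for illustration and are not taken from xp's source. Only the flag names and default values come from the table.

```typescript
// Hypothetical shape for the `xp start` flags. Field names mirror the table,
// but this type is an illustration, not xp's internal API.
type Direction = "min" | "max";
type Provider = "claude" | "codex";

interface StartOptions {
  metric: string;        // --metric (required)
  unit: string;          // --unit, default ""
  direction: Direction;  // --direction (required)
  benchmark: string;     // --benchmark (required)
  objective: string;     // --objective (required)
  provider: Provider;    // --provider, default "claude"
  maxIterations: number; // --max-iterations, default 50
  maxFailures: number;   // --max-failures, default 5
}

// The four required flags have no default; everything else is optional.
type RequiredFlags = Pick<StartOptions, "metric" | "direction" | "benchmark" | "objective">;

// Fill in the documented defaults for any optional flag that was not given.
function withDefaults(flags: RequiredFlags & Partial<StartOptions>): StartOptions {
  return {
    unit: "",
    provider: "claude",
    maxIterations: 50,
    maxFailures: 5,
    ...flags,
  };
}

const options = withDefaults({
  metric: "latency",
  unit: "ms",
  direction: "min",
  benchmark: "./bench.sh",
  objective: "reduce FFT latency",
});
console.log(options.provider, options.maxIterations, options.maxFailures);
```

Here the values match the `xp start optimize-fft` invocation from Usage, with the provider and both budget caps left at their defaults.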
Benchmark Contract
The benchmark command must print metrics to stdout in this format:
METRIC latency=42.5
METRIC throughput=1200
One METRIC name=value per line. The --metric flag selects which one to optimize.
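To make the contract concrete, here is a small sketch of how such output could be parsed. It is not xp's actual parser: the function name `parseMetrics` is hypothetical, and the handling of edge cases is assumed (non-matching lines are skipped, a repeated name keeps its last value, non-numeric values are dropped).

```typescript
// Sketch: extract `METRIC name=value` lines from benchmark stdout.
// Assumptions (not confirmed by xp): other lines are ignored, a repeated
// metric name keeps its last value, and non-numeric values are skipped.
function parseMetrics(stdout: string): Map<string, number> {
  const metrics = new Map<string, number>();
  for (const line of stdout.split(/\r?\n/)) {
    const match = /^METRIC\s+([^\s=]+)=(\S+)$/.exec(line.trim());
    if (!match) continue;
    const value = Number(match[2]);
    if (Number.isFinite(value)) metrics.set(match[1], value);
  }
  return metrics;
}

// Benchmark output may contain unrelated lines alongside the metrics.
const sample = [
  "warming up...",
  "METRIC latency=42.5",
  "METRIC throughput=1200",
].join("\n");

const metrics = parseMetrics(sample);
console.log(metrics.get("latency"));    // 42.5
console.log(metrics.get("throughput")); // 1200
```

With `--metric latency`, only the `latency` entry would be used as the score; the other metrics are simply carried along in the output.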
How It Works
- Baseline: runs the benchmark on the current code to establish a starting point
- Loop: invokes the LLM agent with context (objective, best score, dead ends, user guidance); the agent makes changes in a git worktree, the benchmark runs, and the result is kept or reverted
- Persistence: all events are logged to append-only JSONL, with two-phase decisions for crash safety
- Worktree isolation: experiments run in .xp/worktree/ on an xp/<name> branch, so your working directory stays clean
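The keep-or-revert step of the loop can be sketched as a pure function over a list of trial results. This is an illustration under stated assumptions, not xp's implementation: `replay`, `Decision`, and the event shape are hypothetical, a "failure" is assumed to mean a trial that produced no metric value, and a trial that merely fails to improve is assumed not to count toward `--max-failures`.

```typescript
type Direction = "min" | "max";

// One trial's benchmark value; null stands for a failed trial
// (assumption: the benchmark crashed or printed no metric).
type TrialValue = number | null;

// Hypothetical event shape; xp's real JSONL schema is not documented here.
interface Decision {
  iteration: number;
  value: TrialValue;
  action: "keep" | "revert";
  best: number;
}

// Compare in the direction given by --direction.
function isBetter(candidate: number, best: number, direction: Direction): boolean {
  return direction === "min" ? candidate < best : candidate > best;
}

// Replay trial values against a baseline, keeping only strict improvements.
function replay(
  baseline: number,
  trials: TrialValue[],
  direction: Direction,
  maxIterations = 50,
  maxFailures = 5,
): Decision[] {
  const decisions: Decision[] = [];
  let best = baseline;
  let consecutiveFailures = 0;

  for (const [i, value] of trials.slice(0, maxIterations).entries()) {
    if (value === null) {
      // Failed trial: revert, and stop once the failure budget is used up.
      consecutiveFailures++;
      decisions.push({ iteration: i + 1, value, action: "revert", best });
      if (consecutiveFailures >= maxFailures) break;
      continue;
    }
    consecutiveFailures = 0;
    const keep = isBetter(value, best, direction);
    if (keep) best = value;
    decisions.push({ iteration: i + 1, value, action: keep ? "keep" : "revert", best });
  }
  return decisions;
}

// Serialize one JSON object per line, in the spirit of an append-only JSONL log.
const decisions = replay(42.5, [40.1, 44.0, null, 38.7], "min");
console.log(decisions.map((d) => JSON.stringify(d)).join("\n"));
```

With a baseline of 42.5 and `min` direction, the trials above are kept, reverted, reverted (failed), and kept, ending with a best score of 38.7.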
Development
bun run dev -- --help # run from source
bun run gate # typecheck + lint + fmt + test + build
bun test # tests only