empiric v0.1.5 (published 2 months ago)

A scientific-method CLI for local problem decomposition and software experiments.

Empiric

Empiric is a local CLI for breaking coding problems into small parts and running software experiments with the scientific method.

It helps you:

state coding goals,
break them into atomic problem parts,
keep a categorized backlog of deletion, simplification, rebuild, tuning, measurement, and automation ideas,
audit components for owner, requirement, and deletion proof,
enforce a deletion-first workflow gate before part-linked experiments,
guide agents toward small local experiments,
state falsifiable hypotheses,
design minimal code-change experiments,
run benchmark commands,
capture speed and memory evidence,
record accept/reject/inconclusive interpretations,
track every attempted experiment change, whether it worked, and why,
keep an append-only logbook under .empiric/.

Install locally

npm install
npm run build
npm link

After building, you can also run the local wrapper or invoke the package through npx:

./tools/empiric --help
npx empiric --help

For a concise feature and command map:

empiric info
empiric info --json

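Quick start

The IDs used below (prob_20260429_example, part_20260429_example, exp_20260429_probe, and hyp_20260428_example) are placeholders; substitute the IDs of your own records.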

empiric init

empiric problem add \
  --title "Decide whether parser caching is worth building" \
  --goal "Find the smallest evidence-backed parser caching approach." \
  --context "Repeated parser calls may be slowing the hot path." \
  --success "The agent knows whether to build, skip, or narrow the cache." \
  --constraints "Do not change parser output or public APIs."

empiric part add \
  --problem prob_20260429_example \
  --title "Check for repeated inputs" \
  --question "Do identical parser inputs repeat during one representative run?" \
  --why "Caching only helps if repeated inputs are common enough." \
  --success "A local probe reports repeat counts." \
  --experiment "Print input hashes during the benchmark, then remove the probe." \
  --owner "Parser owner" \
  --requirement "Avoid repeated parser work only when it is observable." \
  --deletion-proof "The no-cache baseline proves repeated parsing is not material."

empiric idea add \
  --category delete \
  --title "Remove receipt retention from hot path" \
  --rationale "Retained receipt data may dominate memory under load." \
  --expected-impact "Lower peak memory at the same TPS." \
  --next "Delete retention and run the capped benchmark."

empiric audit add \
  --part part_20260429_example \
  --component "parser cache" \
  --owner "Parser owner" \
  --requirement "Only keep this if repeated inputs are common." \
  --deletion-proof "Deleting it causes measurable repeated parser work." \
  --step delete \
  --status kept \
  --next "Simplify the retained path."

empiric audit gate --part part_20260429_example

empiric experiment plan \
  --part part_20260429_example \
  --kind probe \
  --title "Measure repeated parser inputs" \
  --change "Add temporary input-hash logging around the parser." \
  --benchmark "npm test -- --bench" \
  --success "The command prints repeat-count evidence." \
  --rollback "Remove the temporary logging."

empiric run --experiment exp_20260429_probe
empiric result --experiment exp_20260429_probe --status accepted --interpretation "Repeated inputs exist." --next "Build the smallest cache."
empiric attempts --experiment exp_20260429_probe
empiric part decide --part part_20260429_example --decision "Build a narrow parser cache." --evidence "The probe found repeated inputs." --next "Implement the smallest cache."
empiric goal
empiric status
empiric problem show --problem prob_20260429_example

empiric hypothesis add \
  --title "Avoid repeated parsing" \
  --statement "Caching parsed input will reduce benchmark duration without increasing max RSS by more than 5%." \
  --speed "10% faster benchmark duration" \
  --memory "no more than 5% higher max RSS" \
  --mechanism "Parsing is repeated for identical inputs in the hot path." \
  --assumptions "The parsed data is immutable during a run." \
  --simpler-baseline "Measure the current parser without code changes." \
  --must-be-true "Parser time must be a meaningful share of total runtime."

empiric experiment plan \
  --hypothesis hyp_20260428_example \
  --title "Cache parsed input by content hash" \
  --change "Add a narrow in-memory cache around the parser." \
  --benchmark "npm test -- --bench" \
  --speed-target "10% faster duration" \
  --memory-target "max RSS within 5%" \
  --rollback "Remove the cache wrapper."

empiric log
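The example hypothesis is falsifiable because it names two numeric thresholds: benchmark duration at least 10% faster, and max RSS no more than 5% higher. The TypeScript sketch below shows one way to check those thresholds against a baseline and a candidate measurement. The `Measurement` type and `evaluateHypothesis` function are hypothetical illustrations, not part of Empiric's API; Empiric records your interpretation rather than exposing this function.

```typescript
// Hypothetical sketch: not part of Empiric's API.
interface Measurement {
  durationMs: number; // benchmark wall-clock duration
  maxRssMb: number;   // peak resident set size
}

type Verdict = "accepted" | "rejected";

// Thresholds taken from the example hypothesis:
// at least 10% faster, and max RSS no more than 5% higher.
const MIN_SPEEDUP = 0.1;
const MAX_RSS_GROWTH = 0.05;

function evaluateHypothesis(baseline: Measurement, candidate: Measurement): Verdict {
  // Fractional reduction in duration (0.1 means 10% faster).
  const speedup = (baseline.durationMs - candidate.durationMs) / baseline.durationMs;
  // Fractional increase in peak memory (0.05 means 5% higher).
  const rssGrowth = (candidate.maxRssMb - baseline.maxRssMb) / baseline.maxRssMb;
  return speedup >= MIN_SPEEDUP && rssGrowth <= MAX_RSS_GROWTH ? "accepted" : "rejected";
}

const baseline = { durationMs: 1000, maxRssMb: 200 };
console.log(evaluateHypothesis(baseline, { durationMs: 880, maxRssMb: 206 })); // accepted
console.log(evaluateHypothesis(baseline, { durationMs: 950, maxRssMb: 200 })); // rejected
```

This single-comparison sketch ignores run-to-run noise; when repeated runs disagree about which side of a threshold they fall on, an inconclusive interpretation is the more honest record.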

Benchmark RESULT ingestion

Empiric can parse harness output lines that begin with RESULT:

RESULT target_tps=40000 delivered=40000 dropped=0 avg_tps=39980 peak_tps=41000 memory_peak_mb=1830 oom=0 generator_failures=0 log=/tmp/run.log

Empiric parses log paths as first-class fields from the log, log_path, logdir, node_log, external_outs, and memory_samples keys. You can supply failure signatures in the line itself (for example signature=p2p_cap_drop,bad_function_call) or with --signature. Empiric also infers common signatures such as oom_kill, generator_fail, delivery_miss, bootstrap_fail, and bad_function_call.
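To make the line format concrete, here is a minimal TypeScript sketch that splits a RESULT line into key=value fields, extracts the log-path keys listed above, and splits a comma-separated signature value. The `parseResultLine` function and `ParsedResult` shape are hypothetical; the sketch assumes values contain no spaces and does not reproduce Empiric's signature inference.

```typescript
// Hypothetical sketch of RESULT-line parsing; not Empiric's implementation.
// Assumes space-separated key=value tokens with no spaces inside values.
const LOG_PATH_KEYS = ["log", "log_path", "logdir", "node_log", "external_outs", "memory_samples"];

interface ParsedResult {
  fields: Record<string, string>;   // every key=value pair, as raw strings
  logPaths: Record<string, string>; // subset of fields that name log locations
  signatures: string[];             // explicit signature=a,b values
}

function parseResultLine(line: string): ParsedResult | null {
  const trimmed = line.trim();
  if (!trimmed.startsWith("RESULT ")) return null; // only lines beginning with RESULT
  const fields: Record<string, string> = {};
  for (const token of trimmed.slice("RESULT ".length).split(/\s+/)) {
    const eq = token.indexOf("=");
    if (eq <= 0) continue; // skip tokens that are not key=value
    fields[token.slice(0, eq)] = token.slice(eq + 1);
  }
  const logPaths: Record<string, string> = {};
  for (const key of LOG_PATH_KEYS) {
    if (key in fields) logPaths[key] = fields[key];
  }
  const signatures = fields.signature ? fields.signature.split(",").filter(Boolean) : [];
  return { fields, logPaths, signatures };
}

const parsed = parseResultLine(
  "RESULT target_tps=40000 delivered=40000 dropped=0 avg_tps=39980 peak_tps=41000 memory_peak_mb=1830 oom=0 generator_failures=0 log=/tmp/run.log",
);
console.log(parsed?.fields.avg_tps); // 39980
console.log(parsed?.logPaths);
```

Keeping every field as a raw string and converting only the keys you need avoids guessing types for fields a harness may add later.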

Pipe benchmark output directly into an experiment:

benchmark-command | empiric result ingest --experiment exp_20260429_probe --from-stdin
empiric result --experiment exp_20260429_probe

Or let empiric run capture and ingest RESULT lines automatically:

empiric run --experiment exp_20260429_probe --build-path ./build/nodeos --flags "2gb p2p"
empiric latest
empiric attempts
empiric ceiling
empiric next
empiric promoted target --target 40000
empiric ledger row --result res_20260429_example --format performance-md
empiric ledger validate

empiric run and empiric result ingest print a compact parsed summary immediately after ingestion. The summary includes the target, delivered and dropped counts, average and peak TPS, memory peak, OOM state, promotion state, clean-run count, config fingerprint, log paths, and signatures.

By default, a performance experiment is promoted after two clean runs with the same normalized config fingerprint. A clean run has no dropped transactions, no generator failures, and no OOM events. The fingerprint is computed from --flags when present, otherwise from the run command, and falls back to unspecified when neither is available.
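The promotion defaults can be read as a small predicate over ingested runs. The TypeScript sketch below assumes a hypothetical `RunSummary` record holding the parsed counters, and uses whitespace collapsing as a stand-in for Empiric's normalization, which is not documented here. It follows the stated fingerprint fallback order: --flags, then the run command, then unspecified. None of these names are Empiric's API.

```typescript
// Hypothetical sketch of the documented promotion defaults; not Empiric's code.
interface RunSummary {
  flags?: string;   // value passed via --flags, if any
  command?: string; // the run command, if any
  dropped: number;
  generatorFailures: number;
  oom: number;
}

// Fallback order from the README: --flags, then run command, then "unspecified".
// Collapsing whitespace is an assumed stand-in for Empiric's real normalization.
function configFingerprint(run: RunSummary): string {
  const source = run.flags?.trim() || run.command?.trim();
  return source ? source.split(/\s+/).join(" ") : "unspecified";
}

function isClean(run: RunSummary): boolean {
  return run.dropped === 0 && run.generatorFailures === 0 && run.oom === 0;
}

// Promoted when at least `required` clean runs share one fingerprint.
function isPromoted(runs: RunSummary[], required = 2): boolean {
  const cleanCounts = new Map<string, number>();
  for (const run of runs.filter(isClean)) {
    const fp = configFingerprint(run);
    cleanCounts.set(fp, (cleanCounts.get(fp) ?? 0) + 1);
  }
  return Array.from(cleanCounts.values()).some((count) => count >= required);
}

const runs: RunSummary[] = [
  { flags: "2gb p2p", dropped: 0, generatorFailures: 0, oom: 0 },
  { flags: "2gb  p2p", dropped: 0, generatorFailures: 0, oom: 0 }, // same after normalization
];
console.log(isPromoted(runs)); // true
```

Grouping by fingerprint before counting means two clean runs under different configurations never combine into a promotion.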

Attempts

Empiric records an attempt every time work is tried through empiric run, empiric result, or empiric result ingest. Attempts answer what has already been tried, whether it worked, and why.

Attempt outcomes are:

successful: interpreted evidence was accepted.
unsuccessful: interpreted evidence was rejected, or the command failed.
inconclusive: interpreted evidence was inconclusive.
uninterpreted: a run completed, but no RESULT line or manual result has explained whether the attempted change worked.
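The outcome rules amount to a small mapping. In the TypeScript sketch below, the `AttemptInput` shape and `attemptOutcome` function are hypothetical. Letting an explicit interpretation take precedence over a failed command is an assumption; the list above only says that a failed command counts as unsuccessful.

```typescript
// Hypothetical sketch; not Empiric's implementation.
type Interpretation = "accepted" | "rejected" | "inconclusive";
type Outcome = "successful" | "unsuccessful" | "inconclusive" | "uninterpreted";

interface AttemptInput {
  interpretation?: Interpretation; // from a manual result, or derived from a RESULT line
  commandFailed: boolean;          // assumed: the run command exited with an error
}

function attemptOutcome(attempt: AttemptInput): Outcome {
  // Assumption: an explicit interpretation wins over the command's exit status.
  switch (attempt.interpretation) {
    case "accepted":
      return "successful";
    case "rejected":
      return "unsuccessful";
    case "inconclusive":
      return "inconclusive";
  }
  // No interpretation: a failed command is unsuccessful, otherwise nothing explains it yet.
  return attempt.commandFailed ? "unsuccessful" : "uninterpreted";
}
```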

Inspect attempt history with:

empiric attempts
empiric attempts --experiment exp_20260429_probe
empiric attempts --problem prob_20260429_example
empiric attempts --part part_20260429_example
empiric latest
empiric status

empiric latest includes the latest attempt, and empiric status includes the attempts linked to the active or selected problem.

Command map

Use --help on any command for exact options. The main command families are:

empiric init: initialize .empiric storage.
empiric info: explain the feature set and command map.
empiric problem add|list|show: record and inspect coding goals.
empiric part add|list|update|decide: break problems into atomic parts and record evidence-backed decisions.
empiric idea add|list|update: manage categorized backlog ideas. Categories are delete, simplify, rebuild, tune, measure, and automate.
empiric audit add|list|gate|update: record component owner, requirement, deletion proof, algorithm step, and audit status.
empiric hypothesis add: record falsifiable performance hypotheses.
empiric experiment plan: plan a minimal code-change experiment linked to a hypothesis, problem, or part.
empiric run: run an experiment command, save stdout/stderr, and optionally auto-ingest RESULT lines.
empiric result: show results or manually record accepted, rejected, or inconclusive evidence.
empiric result ingest: parse RESULT lines from stdin or a file.
empiric attempts: list attempted changes and their outcomes.
empiric latest: show the latest run, result, and attempt.
empiric goal: show the active or selected problem goal.
empiric status: summarize active or selected problem progress, attempts, and performance state.
empiric ceiling: show the current performance ceiling.
empiric next: show the one active next step.
empiric promoted target: show clean runs proving a promoted target.
empiric ledger row|validate: print or validate benchmark ledger rows.
empiric log: print the append-only Empiric log.

Ledger commands use an existing PERFORMCANCE.md file when one is present, since that spelling matches Harbor's canonical scoreboard filename; otherwise they fall back to the existing EXPERIMENTS.md behavior.
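The ledger file selection is an existence check with a fallback. A minimal TypeScript sketch follows; the helper `chooseLedgerFile` is hypothetical, and it takes an injected `exists` predicate so the choice can be tested without touching the file system. It only picks the file name and does not model what the EXPERIMENTS.md behavior does beyond that.

```typescript
// Hypothetical sketch; not Empiric's implementation.
// PERFORMCANCE.md is spelled this way on purpose to match Harbor's scoreboard file.
const PREFERRED_LEDGER = "PERFORMCANCE.md";
const FALLBACK_LEDGER = "EXPERIMENTS.md";

// Prefer the Harbor-style scoreboard when it already exists; otherwise fall back.
function chooseLedgerFile(exists: (path: string) => boolean): string {
  return exists(PREFERRED_LEDGER) ? PREFERRED_LEDGER : FALLBACK_LEDGER;
}
```

In Node.js you could pass `existsSync` from `node:fs`, for example `chooseLedgerFile(existsSync)`, with the repository root as the working directory.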

Agent workflow

Create the problem with empiric problem add.
Break it into atomic parts with empiric part add.
Add backlog ideas with empiric idea add --category delete|simplify|rebuild|tune|measure|automate.
Audit each questioned component with empiric audit add, including owner, requirement, and deletion proof.
Pass empiric audit gate --part ... before planning linked experiments.
Plan and run a linked experiment with empiric experiment plan --part ... and empiric run.
Record a result, inspect empiric attempts, then decide the part with empiric part decide.
Use empiric goal, empiric status, empiric attempts, empiric problem show, and empiric log to see what is known and what remains.

Deletion-first workflow gate

Empiric encodes a practical version of Elon Musk's five-step design algorithm:

Question every requirement: every audited component needs a human owner and requirement.
Delete the part or process: record the deletion proof that would justify keeping it.
Simplify and optimize: only optimize what survived deletion.
Accelerate cycle time: make the benchmark loop faster after the component survives.
Automate last: add automation ideas to the backlog only after the component has been shown to be worth keeping.

empiric audit gate --part ... prints the five-step checklist, the completed steps, and the next step the agent should record. Audit records must move through the algorithm in order: Empiric rejects skipped steps such as simplifying before a valid question/delete audit exists.

Part-linked experiments are blocked until the questioned component has passed the question/delete prerequisite. Later steps can then use experiments to simplify, accelerate, and finally automate. Use --skip-gate only for deliberate exploratory work.

Part-linked automation ideas are blocked until the part has completed question, delete, simplify, and accelerate. Unlinked automation ideas remain allowed as general backlog notes.
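The ordering rules in this section can be modelled as an ordered list of steps. The TypeScript sketch below is an illustration under assumptions: only `delete` appears as an audit --step value in the examples above, so the other step names are assumed, and the three functions correspond to the in-order rule, the experiment gate, and the automation-idea gate. It is not Empiric's implementation.

```typescript
// Hypothetical sketch of the deletion-first gate; not Empiric's implementation.
// Only "delete" is confirmed as a --step value; the other names are assumed.
const STEPS = ["question", "delete", "simplify", "accelerate", "automate"] as const;
type Step = (typeof STEPS)[number];

// A new audit step is allowed only if every earlier step is already completed.
function canRecordStep(completed: Set<Step>, next: Step): boolean {
  return STEPS.slice(0, STEPS.indexOf(next)).every((step) => completed.has(step));
}

// Part-linked experiments need question and delete done, unless --skip-gate is used.
function experimentAllowed(completed: Set<Step>, skipGate = false): boolean {
  return skipGate || (completed.has("question") && completed.has("delete"));
}

// Part-linked automation ideas need the first four steps; unlinked ideas are always allowed.
function automationIdeaAllowed(completed: Set<Step> | null): boolean {
  if (completed === null) return true; // idea is not linked to a part
  return canRecordStep(completed, "automate");
}

const done = new Set<Step>(["question", "delete"]);
console.log(canRecordStep(done, "simplify"));   // true
console.log(canRecordStep(done, "accelerate")); // false: simplify is not done yet
console.log(automationIdeaAllowed(done));       // false
```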

Empiric does not call AI models or edit application code. It gives agents a durable local structure for deciding what to try next.

Storage

Empiric writes readable local state into the target repository:

.empiric/config.json
.empiric/problems/*.json
.empiric/parts/*.json
.empiric/hypotheses/*.json
.empiric/ideas/*.json
.empiric/audits/*.json
.empiric/experiments/*.json
.empiric/attempts/*.json
.empiric/runs/*.json
.empiric/log.md
PERFORMCANCE.md or EXPERIMENTS.md ledger rows for structured benchmark results

Empiric does not modify application code, create git commits, call remote services, or require a specific benchmark framework.
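Because the state is plain JSON files in known directories, other local tools can inspect it directly. The Node.js sketch in TypeScript below counts records per directory. The directory list is copied from above, the function name `countEmpiricRecords` is hypothetical, and record contents are not parsed because their schema is not documented here.

```typescript
import { existsSync, readdirSync } from "node:fs";
import { join } from "node:path";

// Record directories listed in the Storage section.
const RECORD_DIRS = ["problems", "parts", "hypotheses", "ideas", "audits", "experiments", "attempts", "runs"];

// Hypothetical helper: count *.json records per directory under <repoRoot>/.empiric.
function countEmpiricRecords(repoRoot: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const dir of RECORD_DIRS) {
    const path = join(repoRoot, ".empiric", dir);
    counts[dir] = existsSync(path)
      ? readdirSync(path).filter((name) => name.endsWith(".json")).length
      : 0; // directory has not been created yet
  }
  return counts;
}
```

For example, `countEmpiricRecords(process.cwd())` run from the target repository root reports how many problems, parts, experiments, and other records exist.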
