@sigmance/razor
v0.125.8
<p align="center"> <img src="./.github/razor-logo.png" alt="Razor logo" width="220" /> </p>
Fork context:
./razor/ is the default CLI and core development workspace.
Quickstart
Installing and running Razor CLI
Install globally with your preferred package manager. If you use npm:
npm install -g @sigmance/razor
Then run razor to get started:
razor
If you're running into upgrade issues with a global install, see the FAQ entry on updating Razor.
Each GitHub Release contains many executables, but in practice, you likely want one of these:
- macOS
  - Apple Silicon/arm64: razor-aarch64-apple-darwin.tar.gz
  - x86_64 (older Mac hardware): razor-x86_64-apple-darwin.tar.gz
- Linux
  - x86_64: razor-x86_64-unknown-linux-musl.tar.gz
  - arm64: razor-aarch64-unknown-linux-musl.tar.gz
Each archive contains a single entry with the platform baked into the name (e.g., razor-x86_64-unknown-linux-musl), so you likely want to rename it to razor after extracting it.
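As a minimal sketch of picking the right archive, the helper below maps an OS/CPU pair to the asset names listed above. The function names and the alias table are illustrative assumptions, not part of Razor.

```python
# Asset names copied from the release list above.
ASSETS = {
    ("darwin", "arm64"): "razor-aarch64-apple-darwin.tar.gz",
    ("darwin", "x86_64"): "razor-x86_64-apple-darwin.tar.gz",
    ("linux", "x86_64"): "razor-x86_64-unknown-linux-musl.tar.gz",
    ("linux", "arm64"): "razor-aarch64-unknown-linux-musl.tar.gz",
}

# Assumption: common alternate spellings reported by platform.machine().
ARCH_ALIASES = {"aarch64": "arm64", "amd64": "x86_64"}


def release_asset(system: str, machine: str) -> str:
    """Return the archive name for an OS/CPU pair, or raise if it is not listed."""
    arch = ARCH_ALIASES.get(machine.lower(), machine.lower())
    key = (system.lower(), arch)
    if key not in ASSETS:
        raise ValueError(f"no listed Razor archive for {system}/{machine}")
    return ASSETS[key]


def binary_name(asset: str) -> str:
    """Name of the single entry inside the archive (platform baked in)."""
    return asset.removesuffix(".tar.gz")
```

Calling `release_asset(platform.system(), platform.machine())` from the standard `platform` module gives the archive for the current machine; `binary_name` gives the extracted file you would rename to razor.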
Using Razor with your ChatGPT plan
Run razor and select Sign in with ChatGPT. We recommend signing into your ChatGPT account to use Razor as part of your Plus, Pro, Team, Edu, or Enterprise plan. Learn more about what's included in your ChatGPT plan.
You can also use Razor with an API key, but this requires additional setup. If you previously used an API key for usage-based billing, see the migration steps. If you're having trouble with login, please open an issue in this repository.
Model Context Protocol (MCP)
Razor can access MCP servers. To configure them, refer to the config docs.
Deep Research & Oracle commands
Once you run make setup inside /Users/.../deep-research (which creates .venv/ with all dependencies) and record the [mcp_servers.deep_research] / [mcp_servers.deep_research_oracle] entries in config.toml, Razor exposes two slash commands that never require you to specify output paths:
- /deep-research deep "<query>" [--style educational|technical ...] - launches the full deep research workflow. Use shallow or basic as the first argument to run the lighter templates. Example:
/deep-research deep "How does MAP-Elites vary with archive resolution?" --max-iters 8 --confidence 0.9
/deep-research shallow "Summarize the latest AlphaResearch metrics" --with-metadata
/deep-research basic "Provide a high-level definition of novelty search"
All artifacts (cache/, logs/, results/research_report.md|json, timings.json, discovered sources) are automatically written under your repo's workspace/deep-research/<timestamp>/....
- /oracle "<question>" [--context "<extra details>"] - sends a question to the Oracle FastMCP server. Example:
/oracle "What risks do we run when parallelizing evaluator stages?" --context "Current manifest uses 4 workers"
The response is streamed inline and the transcript is saved under workspace/deep-research/oracle/<timestamp>_<slug>/oracle_response.(md|json) so future /evolve plans can cite it.
Set OPENAI_API_KEY in your shell before launching Razor; the CLI forwards it to both MCP servers so secrets never live in config.toml.
Deep research and Oracle outputs are always mirrored into your active repo's workspace; you never need to pass workspace/deep-research manually.
Configuration
Razor CLI supports a rich set of configuration options, with preferences stored in ~/.razor/config.toml (set RAZOR_HOME or legacy CODEX_HOME to override). For full configuration options, see Configuration.
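The lookup can be sketched as follows. This is an illustration only: the README does not say which variable wins when both are set, so the sketch assumes RAZOR_HOME takes precedence over the legacy CODEX_HOME, and the function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def razor_home(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the Razor home directory from the environment."""
    env = os.environ if env is None else env
    # Assumption: RAZOR_HOME is checked before the legacy CODEX_HOME.
    for var in ("RAZOR_HOME", "CODEX_HOME"):
        value = env.get(var)
        if value:
            return Path(value)
    return Path.home() / ".razor"


def config_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Location of config.toml inside the resolved home directory."""
    return razor_home(env) / "config.toml"
```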
Razor autonomy operations
Razor's autonomy operations are documented in Razor Autonomy Usage Guide. Use it as the operator guide for /loop, /schedule, model-facing self-invocation and capability-awareness tools, /tasks, /task, /notifications, /monitor, /cic, /stop, /continue, /remote-control, live monitors, autonomous /evolve:loop, external evolution requests, and remote app-server/TUI operation.
Publishing the Razor fork (@sigmance/razor)
Follow these steps when you need to build or publish the private Razor-scoped package.
- Prep the environment
  - Use Node.js >=22 (e.g. nvm use 22) and enable pnpm: corepack enable && corepack prepare pnpm@<version> --activate.
  - Ensure Python 3.11+, the GitHub CLI (gh auth status), and an authenticated npm session (npm whoami).
  - Install DotSlash for ripgrep manifests: brew install dotslash.
- Stage native binaries
  - Preferred local release build: make build-all-local-artifacts VERSION=0.117.15
  - This produces split npm artifacts: @sigmance/razor plus @sigmance/razor-darwin-arm64, @sigmance/razor-darwin-x64, @sigmance/razor-linux-arm64, and @sigmance/razor-linux-x64.
  - The split package layout keeps each native tarball below npm's payload limit. Do not publish the old all-platform monolithic package.
  - If native binaries already exist in codex-cli/vendor, repack only the npm artifacts with make build-local-npm-packages VERSION=0.117.15.
- Verify the staged CLI
  - make publish-npm VERSION=0.117.15 NPM="npm --dry-run"
  - Optional local install smoke test: install dist/npm/razor-npm-0.117.15.tgz plus the native package for your platform, then run razor --version.
- Publish or install
  - Publish: make publish-npm VERSION=0.117.15
  - If npm requires 2FA for writes, publish with make publish-npm VERSION=0.117.15 NPM_OTP=<code>.
  - Preview publish: make publish-preview VERSION=0.117.15-alpha.1
  - If npm returns a transient 409 Conflict or an already-published 403, rerun make publish-npm; already-visible package versions are skipped and accepted package versions are treated as complete.
  - Local monolithic install for offline testing: make build-local-artifacts VERSION=0.117.15 TARGETS="$(rustc -vV | awk '/host:/{print $2}')" then npm install -g dist/npm/razor-monolithic-npm-0.117.15.tgz
- Bump on new releases
  - Use the Makefile version targets so Cargo and npm metadata stay aligned.
- Makefile shortcuts
  - make version-set VERSION=0.54.0-alpha.1 to record a preview version across Cargo + npm metadata.
  - Remote artifacts: make build-preview VERSION=0.54.0-alpha.1 WORKFLOW_URL=<run-url> then make publish-preview VERSION=0.54.0-alpha.1.
  - Full local release build: make build-all-local-artifacts VERSION=0.54.0-alpha.1.
  - Local monolithic build: make build-local-artifacts VERSION=0.54.0-alpha.1 TARGETS="$(rustc -vV | awk '/host:/{print $2}')".
Local development workflow
Install prerequisites
# Node / pnpm
corepack enable
pnpm install --frozen-lockfile
# Rust targets needed for packaging (host triple + optional Linux musl)
rustup toolchain install 1.90
rustup target add --toolchain 1.90 x86_64-unknown-linux-musl
Tip: add any other targets you plan to ship (for example aarch64-apple-darwin).
Run tests
# Rust workspace (CLI + support crates)
cargo test --manifest-path codex-rs/Cargo.toml -p codex-cli
# TypeScript SDK (optional but mirrors CI)
pnpm --filter @openai/codex-sdk test
Refresh vendor binaries (optional)
# Rebuild razor for the host triple and update codex-cli/vendor
make build-local-vendor TARGETS="$(rustc -vV | awk '/host:/{print $2}')"
# Use cross for non-host targets (requires Docker + cross installed)
make build-local-vendor TARGETS="x86_64-unknown-linux-musl" USE_CROSS=1
Run this when you only need fresh binaries without packaging a tarball yet.
Build Razor artifacts locally
# Build binaries for the host triple (e.g., macOS) and package a monolithic local tarball
make build-local-artifacts VERSION=0.54.0-alpha.1 TARGETS="$(rustc -vV | awk '/host:/{print $2}')"
# Build all release platforms and split publishable npm tarballs
make build-all-local-artifacts VERSION=0.54.0-alpha.1
The publishable tarballs are written to dist/npm/. The monolithic local tarball is named razor-monolithic-npm-<version>.tgz so it cannot be confused with the small publishable wrapper.
Publish the locally built package
npm whoami # ensure you're authenticated
make publish-npm VERSION=0.54.0-alpha.1 # publishes platform packages first, then @sigmance/razor
For preview builds, prefer make publish-preview VERSION=0.54.0-alpha.1 to assign the alpha dist-tag automatically.
Triggering GitHub workflows
- ci.yml runs automatically for every PR and push to main. It stages a smoke npm tarball using the checked-in vendored binaries and runs formatting checks, so there is nothing to trigger manually.
- rust-release.yml performs the full release (multi-platform Rust builds, artifact uploads, npm publish). It triggers when you push a tag named rust-vX.Y.Z or rust-vX.Y.Z-alpha.N.
Example release flow:
VERSION=0.54.0
# Set versions everywhere
make version-set VERSION="$VERSION"
# (Optional) smoke the local build
make build-local-artifacts VERSION="$VERSION" TARGETS="$(rustc -vV | awk '/host:/{print $2}')"
# Commit and tag
git commit -am "chore: release $VERSION"
git tag -a "rust-v$VERSION" -m "Release $VERSION"
git push origin main
git push origin "rust-v$VERSION"
For previews use VERSION=0.54.0-alpha.1; the workflow will publish under the npm alpha tag automatically.
Need to re-run the workflow without a new commit? From GitHub's Actions tab select the desired workflow (ci or rust-release) and click Run workflow (or Re-run jobs on a past run).
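Before pushing a tag, it can help to check that it matches the two shapes rust-release.yml triggers on. The sketch below is a local pre-flight check, not the workflow's own matcher; it assumes X, Y, Z, and N are plain decimal integers, and the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# rust-vX.Y.Z or rust-vX.Y.Z-alpha.N (assumption: all parts are decimal integers).
TAG_RE = re.compile(r"^rust-v(\d+\.\d+\.\d+)(?:-alpha\.(\d+))?$")


def parse_release_tag(tag: str) -> Optional[Tuple[str, bool]]:
    """Return (version, is_preview) for a release tag, or None if it does not match."""
    match = TAG_RE.match(tag)
    if match is None:
        return None
    version, alpha = match.group(1), match.group(2)
    if alpha is not None:
        return (f"{version}-alpha.{alpha}", True)
    return (version, False)
```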
Evolution workflows (/evolve:<command> ...)
The Razor CLI ships a built-in "Evolution Kernel" that plans, mutates, evaluates, and archives candidate solutions without leaving your session. Every workflow is exposed both as a CLI verb (razor evolve:<command> ...) and as a slash command (/evolve:<command> ...) so the streaming UX mirrors /init and /review.
Command overview
| Command | Purpose |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| razor evolve:init <goal> / /evolve:init <goal> | Scaffold a manifest under workspace/evolution/manifests/<goal>.yaml, auto-detecting language, targets, and eval commands from your repo. Supports --description, --template <file>, and --output <file>. |
| razor evolve:plan --manifest <file> / /evolve:plan --manifest <file> | Validate the manifest, create workspace/evolution/jobs/<id>, and snapshot baseline metadata. |
| razor evolve:run [--manifest <file> \| --target <artifact> \| --job-id <id>] [--path <id>] [--dry-run] / /evolve:run --manifest <file> or /evolve:run --target <artifact> | Execute the research -> mutation -> evaluation loop under Razor sandboxing, persisting ideas, iterations, artifacts, and worktrees. Target-driven runs generate a manifest automatically. |
| razor evolve:loop [--program <file> \| --manifest <file> \| --target <artifact> \| --job-id <id>] / /evolve:loop ... | Run the same evolution loop without a fixed iteration cap. Stop it with /evolve:stop --job-id <id> or an operator interruption. |
| razor evolve:resume --job-id <id> / /evolve:resume --job-id <id> | Convenience alias for run --job-id .... |
| razor evolve:steer --job-id <id> --message <text> / /evolve:steer --job-id <id> --message <text> | Queue operator, monitor, subsystem, external-trigger, or system guidance for the next iteration of an existing job. Supports --summary, --source, and --source-id. |
| razor evolve:view --job-id <id> / /evolve:view --job-id <id> | Show job metadata, recent iterations, per-job scoreboards, and the Global archive snapshot aggregated across every job. |
| razor evolve:inspect --job-id <id> / /evolve:inspect --job-id <id> | Render the MAP grid plus queue/island health (pending migrations, recent lineage, island table) and the global archive table for sharing results or debugging novelty coverage. |
| razor evolve:bench --pack <file> / /evolve:bench --pack <file> | Runs the benchmark pack (built-in via --built-in or workspace packs via --project). Bench runs execute each job sequentially; pass --dry-run to preview commands without running them. |
| razor evolve:worktree <list\|create\|prune> / /evolve:worktree <list\|create\|prune> | Inspect or manage the git worktrees/branches Razor creates per evolutionary path. |
Slash commands accept the same flags (for example /evolve:run --manifest ... --dry-run). Run and loop also accept target seed flags when no manifest exists yet: --target <artifact>, --target-kind <code|strategy|model|prompt|system>, --signal <why>, --evidence <text>, and --eval-command <cmd>. View supports --rewards, --research, --peer-review, --datasets, and --steering.
1. Scaffold a manifest (/evolve:init)
Use the init command to generate a manifest that already reflects your repo layout:
# CLI
razor evolve:init circle-packing --description "Improve heuristics" --template docs/examples/evolution/base.yaml
# Slash (inside TUI)
/evolve:init circle-packing --description "Improve heuristics"
Razor inspects Cargo.toml, package.json, pnpm-lock.yaml, pyproject.toml, etc. to pick:
- Language - rust, typescript, or python (falls back to mixed).
- Targets - e.g., src/**/*.rs, apps/**/src/**/*.ts, **/*.py.
- Eval command - cargo test, pnpm test / npm test / yarn test, or pytest.
- Location - workspace/evolution/manifests/<goal>.yaml unless --output overrides it.
Supplying --template copies that file (retaining custom metrics/novelty settings) before patching the goal name and description. Edit the YAML afterward to fine-tune adapters, metrics, or timeouts.
2. Review / customize the manifest
Generated manifests follow the same schema as hand-written ones, so keep editing them in-place. Example:
name: Circle Packing
description: Improve heuristic + benchmarks
language: python
targets:
- src/**/*.py
- tests/**/*.py
eval:
command: "pytest tests/test_circle.py -q"
parallel_evaluations: 2
stages:
- name: "unit"
command: "pytest tests/unit -q"
weight: 0.6
- name: "integration"
command: "pytest tests/integration -q"
weight: 0.4
timeouts:
evaluate_seconds: 300
artifacts:
save_stdout: true
save_stderr: true
max_total_mb: 200
metrics:
combined_score: required
correctness: optional
novelty:
embeddings: disabled
threshold: 0.98
Set parallel_evaluations to >1 to fan evaluation stages out concurrently (best used when the stages are independent test suites). Leave it unset or 1 to retain single-threaded evaluation.
Store manifests anywhere, but keeping them under workspace/evolution/manifests/ aligns with the rest of the evolution storage (workspace/evolution/{jobs,db,artifacts}). A ready-to-edit template lives at docs/examples/evolution/coding_task.yaml, so you can run /evolve:init <goal> --template docs/examples/evolution/coding_task.yaml (or the CLI equivalent) as a starting point.
Target-driven evolution without a manifest
For operator requests or subsystem signals that name an artifact directly, Razor can seed an evolution job without a pre-existing manifest. Use --target with /evolve:run for a bounded campaign, or with /evolve:loop for an autonomous loop that continues until stopped.
/evolve:run --target trading.strategist_prompt --target-kind strategy --signal alpha_decline_3d --iterations 1 --eval-command "cd trading-system-workspace/trading-engine && pytest tests/prompts"
/evolve:loop --target src/services/allocator.py --target-kind code --signal operator_request
/evolve:steer --job-id <id> --message "Prioritize monitor evidence from alpha-watch before the next mutation." --source monitor --source-id alpha-watch
Target-driven runs generate a manifest under workspace/evolution/manifests/generated/ and write the seed request to workspace/evolution/jobs/<job-id>/evolution_seed.json. Known Sigmance aliases such as trading.strategist_prompt and trading.allocator_prompt resolve to the trading-engine prompt templates when those paths exist from the current workspace. Comma-separated explicit targets are also supported.
/evolve:loop uses the same kernel as /evolve:run, but the iteration cap is unbounded. Stop it with /evolve:stop --job-id <id> or a normal interruption. Live steering is supported while a loop is running: /evolve:steer writes pending guidance to workspace/evolution/jobs/<job-id>/steering_events.jsonl, the next iteration injects it into the research plan/report before mutation, and steering_consumption.jsonl records which iteration consumed it. Inspect the queue with /evolve:view --job-id <id> --steering.
Remote UI clients and subsystem services can append the same steering events through the app-server evolution/steer method. Monitor triggers can also carry evolutionSteering metadata so a monitor fire creates its normal TaskRun/notification and, when configured with a jobId, appends a source-typed steering event for the running evolution job.
App-server live monitor loops can produce those monitor fires without a manual JSON-RPC call. See Live monitor loops for the full setup and verification path. In short: start app-server with the scheduler feature enabled (razor --enable scheduler app-server --listen unix://), create an active monitor with /monitor create file|command|task|notification or app-server monitor/create, then let the shared monitor daemon check file, command, task, or notification conditions, debounce repeated observations, honor /stop through the durable autonomy gate, and route matching observations through the same monitor/trigger path. Supported loop kinds are file (path or paths, intervalSecs, debounceSecs), command (command, intervalSecs, timeoutSecs, triggerOn of failure|success|change|always), task (taskStatus, sourceKind, sourceId), and notification (notificationStatus, sourceKind, sourceId, taskRunId). Monitor fires write the task/notification/evolution-steering fabric and can launch immediate background new_agent work with runOnTrigger = "new_agent" or /monitor create ... --auto-run. Command loops should be limited to trusted workspace-owned checks until policy and shell-task hardening are complete.
External trading flywheel requests
Razor can also consume UATS/trading flywheel degradation signals from MongoDB and turn them into tracked evolution tasks. The consumer is disabled by default and starts only when RAZOR_EVOLUTION_REQUESTS_MONGODB_URI is present in the app-server environment.
| Environment variable | Default | Purpose |
| ------------------------------------------- | -------------------------- | ---------------------------------------------------- |
| RAZOR_EVOLUTION_REQUESTS_MONGODB_URI | unset | Enables the consumer and points it at MongoDB. |
| RAZOR_EVOLUTION_REQUESTS_MONGODB_DATABASE | trading_system | Database containing queued requests. |
| RAZOR_EVOLUTION_REQUESTS_COLLECTION | razor_evolution_requests | Collection to poll. |
| RAZOR_EVOLUTION_REQUESTS_POLL_SECS | 30 | Poll interval. |
| RAZOR_EVOLUTION_REQUESTS_ITERATIONS | 1 | Iterations per claimed request. |
| RAZOR_EVOLUTION_REQUESTS_DRY_RUN | unset / false | Runs the evolution task in dry-run mode when truthy. |
| RAZOR_EVOLUTION_REQUESTS_EVAL_COMMAND | inferred when possible | Overrides the generated manifest eval command. |
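A minimal sketch of reading these variables with the defaults from the table. The dataclass, the function name, and the set of values treated as truthy are assumptions for illustration; Razor's own parsing may differ.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "RAZOR_EVOLUTION_REQUESTS_"
# Assumption: the README only says "truthy"; this set is a guess.
TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class ConsumerConfig:
    uri: str
    database: str
    collection: str
    poll_secs: int
    iterations: int
    dry_run: bool
    eval_command: Optional[str]


def load_consumer_config(env: Mapping[str, str]) -> Optional[ConsumerConfig]:
    """Return the consumer settings, or None when the consumer stays disabled."""
    uri = env.get(PREFIX + "MONGODB_URI")
    if not uri:
        return None  # no URI means the consumer never starts
    return ConsumerConfig(
        uri=uri,
        database=env.get(PREFIX + "MONGODB_DATABASE", "trading_system"),
        collection=env.get(PREFIX + "COLLECTION", "razor_evolution_requests"),
        poll_secs=int(env.get(PREFIX + "POLL_SECS", "30")),
        iterations=int(env.get(PREFIX + "ITERATIONS", "1")),
        dry_run=env.get(PREFIX + "DRY_RUN", "").strip().lower() in TRUTHY,
        eval_command=env.get(PREFIX + "EVAL_COMMAND") or None,
    )
```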
The consumer claims requests with status equal to queued or pending, marks them in_progress, writes the raw trigger to workspace/evolution/external-triggers/<request-id>.json, writes a generated manifest under workspace/evolution/manifests/external/, creates a durable TaskRunKind::Evolution, and submits the native Op::Evolution flow in a background thread. Mongo status is then updated to completed, failed, or cancelled with razor_* metadata.
If a request carries steering_job_id (aliases: evolution_job_id or job_id), the consumer treats it as a steering producer instead of starting a new job. It appends to workspace/evolution/jobs/<job-id>/steering_events.jsonl, marks the Mongo document completed, and stores razor_steering_event_id, razor_steering_job_id, razor_steering_events_path, and razor_steering_consumption_path.
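The routing decision described in the two paragraphs above can be sketched as a small classifier over a request document. The function names and return labels are hypothetical, and the order in which the aliases are checked is an assumption; only the field names and status values come from the text.

```python
from typing import Any, Mapping, Optional

# Primary key first, then the documented aliases (lookup order is an assumption).
STEERING_KEYS = ("steering_job_id", "evolution_job_id", "job_id")
CLAIMABLE_STATUSES = {"queued", "pending"}


def steering_job_id(doc: Mapping[str, Any]) -> Optional[str]:
    """Return the job id a request wants to steer, if it names one."""
    for key in STEERING_KEYS:
        value = doc.get(key)
        if value:
            return str(value)
    return None


def classify_request(doc: Mapping[str, Any]) -> str:
    """Label a request as 'skip', 'steer', or 'new_job'."""
    if doc.get("status") not in CLAIMABLE_STATUSES:
        return "skip"
    return "steer" if steering_job_id(doc) else "new_job"
```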
To verify it manually, start the app-server with scheduler/task features and the Mongo environment variables set, insert or wait for a razor_evolution_requests document, then inspect:
- Mongo fields: status, razor_task_run_id, razor_manifest_path, razor_started_at, and terminal razor_completed_at / razor_failed_at.
- Razor task fabric: /tasks, /task <task-id>, or app-server task/list.
- Artifacts: workspace/evolution/external-triggers/, workspace/evolution/manifests/external/, and workspace/evolution/jobs/<job-id>/.
- Steering requests: Mongo razor_steering_event_id plus workspace/evolution/jobs/<job-id>/steering_events.jsonl.
Scheduler backends (manifest.scheduler)
Every manifest picks a scheduler backend:
- local (default) keeps evaluations entirely in-process.
- shell shells out to a user-defined command so you can forward work to Hydra/Slurm wrappers, GitHub Actions dispatchers, etc., without baking those integrations into Razor.
- hydra / slurm remain placeholders; configure shell for custom dispatch today.
For backend: shell, set:
scheduler:
backend: shell
shell:
program: "bash"
args:
- "-lc"
- "{command_sh}"
timeout_secs: 600 # optional override
env:
HYDRA_PROFILE: "research"
The scheduler replaces placeholders inside program, args, and env values:
| Token | Value |
| ---------------- | ------------------------------------------------------------------------- |
| {command_sh} | Original eval command, shell-escaped (e.g., cargo test -- --nocapture). |
| {command_json} | JSON array of the command (e.g., ["cargo","test","--","--nocapture"]). |
| {stage} | Eval stage name (unit, lint, etc.). |
| {label} | Friendly label (defaults to the stage name). |
| {job_id} | Evolution job id. |
| {path_id} | Active path id (defaults to main). |
| {iteration} | 1-based iteration counter. |
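To make the token table concrete, here is a sketch of how such placeholders could be expanded. It is an illustration, not Razor's implementation: the function names are hypothetical, and using `shlex.join` for shell escaping and compact `json.dumps` output are assumptions chosen to reproduce the examples in the table.

```python
import json
import re
import shlex
from typing import Dict, Optional, Sequence

TOKEN_RE = re.compile(r"\{(\w+)\}")


def build_tokens(command: Sequence[str], stage: str, job_id: str, iteration: int,
                 label: Optional[str] = None, path_id: str = "main") -> Dict[str, str]:
    """Build the token -> value map described in the table."""
    return {
        "command_sh": shlex.join(command),
        "command_json": json.dumps(list(command), separators=(",", ":")),
        "stage": stage,
        "label": label or stage,  # label defaults to the stage name
        "job_id": job_id,
        "path_id": path_id,       # defaults to main
        "iteration": str(iteration),
    }


def expand(template: str, tokens: Dict[str, str]) -> str:
    """Replace known {token} placeholders in one pass; leave unknown ones untouched."""
    return TOKEN_RE.sub(lambda m: tokens.get(m.group(1), m.group(0)), template)
```

A single regex pass is used so that text inside a substituted value (for example a command that itself contains `{stage}`) is not expanded a second time.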
Razor also injects these environment variables for shell backends: EVOLUTION_SCHEDULER_COMMAND_JSON, EVOLUTION_SCHEDULER_COMMAND_SH, EVOLUTION_SCHEDULER_STAGE, EVOLUTION_SCHEDULER_LABEL, EVOLUTION_SCHEDULER_JOB_ID, EVOLUTION_SCHEDULER_PATH_ID, and EVOLUTION_SCHEDULER_ITERATION. Your dispatch script can read them to decide how to route the work (submit a Slurm job, run srun, etc.) while keeping Razor's default local orchestration untouched.
Sample wrapper scripts live under docs/examples/evolution/schedulers/:
- hydra-wrapper.sh - forwards commands to hydra run ..., including stage metadata in logs.
- slurm-wrapper.sh - wraps the command in srun --job-name="razor-..." ....
Point scheduler.shell.program/args at one of these scripts (or your own variant) to integrate with remote schedulers without modifying Razor itself. /evolve:view|inspect now print the selected scheduler backend for each job, and OTEL events (codex.evolution_scheduler_*) include queue position + timeout metadata so dashboards can track dispatch latency.
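If you prefer a Python dispatcher over the sample shell wrappers, a sketch along these lines reads the injected EVOLUTION_SCHEDULER_* variables and builds an srun command line. The function name and the exact job-name format are hypothetical (the README only shows a "razor-..." prefix); `--job-name` is a standard srun option.

```python
import json
from typing import List, Mapping


def srun_argv(env: Mapping[str, str]) -> List[str]:
    """Build an srun invocation from the scheduler environment variables."""
    # The JSON form avoids re-parsing a shell-escaped string.
    command = json.loads(env["EVOLUTION_SCHEDULER_COMMAND_JSON"])
    job_id = env.get("EVOLUTION_SCHEDULER_JOB_ID", "unknown")
    stage = env.get("EVOLUTION_SCHEDULER_STAGE", "eval")
    iteration = env.get("EVOLUTION_SCHEDULER_ITERATION", "0")
    # Hypothetical naming scheme for the Slurm job.
    job_name = f"razor-{job_id}-{stage}-{iteration}"
    return ["srun", f"--job-name={job_name}", *command]
```

A wrapper script would typically call `subprocess.run(srun_argv(os.environ), check=True)` so a failing evaluation still surfaces as a non-zero exit code.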
AlphaResearch benchmark packs
Built-in packs live under docs/examples/evolution/benchmarks/ and are available via
razor evolve bench --built-in <name>:
- alpharesearch_pack - runs the entire AlphaResearchComp suite (all nine benchmarks).
- alpharesearch_packing_circles
- alpharesearch_spherical_code
- alpharesearch_minizing_ratio
- alpharesearch_third_autocorrelation
- alpharesearch_autoconvolution
- alpharesearch_littlewood
- alpharesearch_kissing_number
- alpharesearch_heilbronn
- alpharesearch_mstd
Razor embeds the full AlphaResearch manifest set; if these files are missing locally
(for example when running from a packaged binary), the CLI regenerates them under
~/.razor/builtin_evolution_docs/docs/examples/evolution/ the first time you invoke
--built-in. You can inspect or edit the regenerated packs there before re-running
the benchmark.
To customise the suite per workspace, copy one of the packs into
workspace/evolution/benchmarks/ (see workspace/evolution/benchmarks/alpharesearch/full.yaml)
and run it via razor evolve bench --project <name>.
Archive knobs & migrations
archive accepts OpenEvolve-style controls:
archive:
  complexity_bins: 3
  novelty_bins: 3
  migration_interval: 4 # iterations between migration checks
  migration_batch: 2 # max migrations per batch window
  batch_window: 6 # number of recent iterations to consider
  migration_trace_limit: 200 # keep the last N migrations in migrations.jsonl
  feature_dimensions: ["complexity", "novelty", "diversity"]
The first two feature_dimensions entries label the MAP-Elites grid. /evolve view and /evolve inspect now render Archive coverage (<row> rows x <column> columns) using those names, and migration plans reuse the labels so the stored jobs/<id>/archive.json snapshot documents the chosen schema.
Razor writes every migration decision to jobs/<id>/migrations.jsonl (reason, target bin, priority, source success rate). MapMigrator enforces the batch/window limits so multi-island queues don't spawn too many paths at once, and /evolve:inspect highlights the pending scheduler state plus MAP-Elites table for quick triage.
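One plausible reading of how migration_interval, migration_batch, and batch_window interact is sketched below. This is not MapMigrator's code: the class name and the exact windowing rule are assumptions based only on the comments in the archive example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MigrationLimiter:
    migration_interval: int = 4   # iterations between migration checks
    migration_batch: int = 2      # max migrations per batch window
    batch_window: int = 6         # number of recent iterations to consider
    history: List[int] = field(default_factory=list)  # iterations with approved migrations

    def is_check_iteration(self, iteration: int) -> bool:
        return iteration % self.migration_interval == 0

    def remaining(self, iteration: int) -> int:
        """Migrations still allowed within the window ending at this iteration."""
        window_start = iteration - self.batch_window + 1
        recent = sum(1 for it in self.history if it >= window_start)
        return max(0, self.migration_batch - recent)

    def try_migrate(self, iteration: int) -> bool:
        """Approve a migration only on check iterations and within the batch limit."""
        if not self.is_check_iteration(iteration) or self.remaining(iteration) == 0:
            return False
        self.history.append(iteration)
        return True
```

Under this reading, a burst of migration candidates at one check is capped at migration_batch, and the cap only resets once the earlier migrations fall out of the batch_window.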
Idea database & datasets
- Every /evolve:run iteration records DeepEvolve-style research artifacts under workspace/evolution/jobs/<id>/: the web-search plan (web_search_plan.jsonl), search notes (search_notes.jsonl), markdown reports (research_reports/report-<job>-<ts>.md), dataset cache manifests (datasets/<handle>-cache.json), and mutation bundles (coder/debugger transcripts).
- ResearchPlanner now detects common dataset folders (data/, datasets/, inputs/), records cache metadata (dataset_handle, cache_path, cache_mtime, inputs_signature), and exposes it via razor evolve:ideas show --job-id <id>.
- When no mutation_targets.json exists, /evolve:run auto-seeds one from the planner's focus files so MutationEngine can produce multi-file EVOLVE blocks without manual scaffolding; edit that file between iterations to steer the coder/debugger delegates. Each target supports a weight (higher numbers appear earlier in mutation prompts) plus mode (diff, full, crossover) and an optional rationale.
- Razor writes dataset feedback events (missing caches, stale metadata, refreshes) to jobs/<id>/dataset_feedback.jsonl. Inspect them via /evolve view --datasets or razor evolve view --datasets --job-id <id> to keep cache hygiene in sync with research planning.
- Razor writes live guidance to jobs/<id>/steering_events.jsonl and consumed markers to jobs/<id>/steering_consumption.jsonl. ResearchPlanner folds pending steering into the next iteration's hypotheses, reflection prompts, and research report, then the task emits a live Steering update.
- App-server evolution/steer, monitor evolutionSteering trigger metadata, and Mongo steering_job_id requests all append the same event shape, so operator, monitor, subsystem, external-trigger, and system guidance share one durable audit trail.
- ResearchPlanner surfaces follow-up query queues (dataset hygiene warnings, low-scoring ideas, and underexplored mutation targets). These appear under /evolve view --research so operators can triage which avenues to pursue next.
- razor evolve init <goal> now scaffolds manifests directly from the CLI (honoring --description, --template, --output), using the same environment detection as /evolve:init. Follow it with razor evolve plan --manifest <file> to register the job under workspace/evolution/jobs/ without launching the TUI.
Model ensembles & proof checker (AlphaEvolve-style)
manifest.ensembles controls coder/debugger model rotations plus an optional proof checker:
ensembles:
  coder_models:
    - gpt-5-codex
    - gpt-4.1-preview
  debugger_models:
    - gpt-4.1-mini
  proof_checker: "pnpm test -- proof"
- MutationEngine will try each coder model in order until one succeeds (recorded in the mutation summary and idea logs). Debugger models work the same way, and the iteration summary shows Coder ensemble (n): ....
- Evaluator cascades now run the proof checker command after shell stages (unless --dry-run). Pass/fail results stream into /evolve:run, iterations.jsonl, and OTEL telemetry so dashboarding AlphaEvolve-style verification is straightforward.
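The "try each model in order until one succeeds" behaviour can be sketched as a generic fallback loop. The function name and the `attempt` callback are hypothetical stand-ins for whatever MutationEngine actually does per model.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def first_success(models: Sequence[str],
                  attempt: Callable[[str], bool]) -> Tuple[Optional[str], List[str]]:
    """Try models in order; return the first that succeeds plus every model tried."""
    tried: List[str] = []
    for model in models:
        tried.append(model)
        if attempt(model):
            return model, tried
    return None, tried  # every model in the ensemble failed
```

Returning the list of models tried alongside the winner makes it easy to log the rotation in a summary.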
Benchmark packs (razor evolve:bench)
- Store YAML benchmark packs under workspace/evolution/benchmarks/ (sample: docs/examples/evolution/benchmarks/ts_cli_pack.yaml). Each pack declares a name, optional description, and a list of jobs shaped like:
name: TypeScript CLI smoke pack
jobs:
  - manifest: ../../evolution/coding_task.yaml
    path: staging
    iterations: 1
    dry_run: true
  - manifest: ../../evolution/coding_task.yaml
    path: qa
    iterations: 1
- Scaffold goal-aligned packs with razor evolve bench --project-init "<goal>" --manifest workspace/evolution/manifests/<goal>.yaml --challenge "<measurable challenge>". The helper writes a starter pack under workspace/evolution/benchmarks/<slug>.yaml that embeds the challenge text so you can tweak it before running --project <slug>.
- Run razor evolve:bench --pack <pack.yaml> (or /evolve:bench --pack ...). Razor validates the file, confirms the referenced manifests exist, and prints a ready-to-run command list (e.g., /evolve:run --manifest ... --path qa --iterations 1). Execute each command to compare archive metrics across paths/iterations.
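As an illustration of how a pack's jobs map to the printed command list, the sketch below turns an already-parsed pack (a plain dict, for example from a YAML loader) into /evolve:run lines. The function name is hypothetical and the output format is modelled on the example above, not taken from Razor's source.

```python
import shlex
from typing import Any, Dict, List


def pack_commands(pack: Dict[str, Any]) -> List[str]:
    """Render one /evolve:run command per job in a benchmark pack."""
    commands: List[str] = []
    for job in pack.get("jobs", []):
        parts = ["/evolve:run", "--manifest", shlex.quote(str(job["manifest"]))]
        if "path" in job:
            parts += ["--path", shlex.quote(str(job["path"]))]
        if "iterations" in job:
            parts += ["--iterations", str(job["iterations"])]
        if job.get("dry_run"):
            parts.append("--dry-run")
        commands.append(" ".join(parts))
    return commands
```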
3. Plan and run
From the repo root:
# Parse the manifest and snapshot the initial job metadata
razor evolve:plan --manifest workspace/evolution/manifests/circle-packing.yaml
# Start iterating (use --dry-run to skip evaluation execution)
razor evolve:run --manifest workspace/evolution/manifests/circle-packing.yaml --path staging
Slash equivalents:
/evolve:plan --manifest workspace/evolution/manifests/circle-packing.yaml
/evolve:run --manifest workspace/evolution/manifests/circle-packing.yaml --path staging --dry-run
Both entry points now stream four panes in the TUI:
- Research - ResearchPlanner insights, citations, pseudocode, and web-search notes.
- Mutation - Outputs from the coder/debugger delegates (DeepEvolve-style apply_patch diffs plus debugger fixes).
- Evaluation - Eval command, exit code, duration, metrics, and novelty scores.
- Archive - Scoreboard + MAP-style novelty/complexity grid.
All artifacts (manifests, ideas, iterations, job DB, worktrees, archives) live under workspace/evolution/ in the repo you launched Razor from, so you can version them like any other asset.
The mutation stage mirrors DeepEvolve: the ResearchPlanner feeds a codex sub-agent that edits the repo via apply_patch, then a debugger agent inspects the diff and suggests fixes. Their transcripts (and any follow-up patches) are persisted under workspace/evolution/jobs/<id>/mutations/, making every mutation path auditable.
Regression checklist (/evolve:init|plan|run)
Run these commands whenever you tweak manifests, CLI plumbing, or slash handlers to ensure the critical flows remain healthy:
1. razor evolve:init demo-goal --description "Smoke test" --template docs/examples/evolution/coding_task.yaml
   - Confirms manifest scaffolding still inspects the repo correctly and writes to workspace/evolution/manifests/.
2. razor evolve:plan --manifest workspace/evolution/manifests/demo-goal.yaml
   - Verifies manifest parsing, job creation, and idea-log persistence without executing evaluators.
3. razor evolve:run --manifest workspace/evolution/manifests/demo-goal.yaml --path qa --dry-run
   - Ensures Research/Evaluation/Archive panes stream as expected, OTEL tags include the evolution metadata, and worktrees are created for the requested path.
   - Optionally drop --dry-run on a lightweight repo-specific eval command to validate approval prompts, sandbox policies, and artifact uploads.
Record the job id shown in step 3; it should be viewable via razor evolve:view --job-id <id> and the matching worktree should appear in razor evolve:worktree list.
For quick CI-friendly coverage, run make evolve-regression (wraps the CLI prompt builder tests, slash-command parser smoke tests, and the default manifest validator) whenever you change /evolve:* arguments or add new subcommands.
Automated smoke tests
Run these lightweight tests before shipping CLI changes so we keep /evolve:* CLI <-> slash parity and ensure the default manifest template stays valid:
cargo test -p codex-cli -- build_evolve_prompt_supports_init build_evolve_prompt_supports_plan
cargo test -p codex-tui -- chatwidget::tests::parse_evolution_slash_command_init_maps_goal_and_flags chatwidget::tests::parse_evolution_slash_command_run_resolves_manifest_and_flags
cargo test -p codex-core default_template_parses_successfully
The first command exercises the prompt wiring that converts razor evolve:<command> ... invocations into the corresponding slash commands. The second keeps the slash-command parser in sync with the CLI flags, and the last guarantees that docs/examples/evolution/coding_task.yaml remains a well-formed manifest, which protects /evolve:init --template docs/examples/evolution/coding_task.yaml from regressions.
4. Inspect, resume, and share
Each run records a deterministic job id and path id. Use them to resume or inspect:
razor evolve:resume --job-id 20251107-1337-abc1 --path stage-a
razor evolve:view --job-id 20251107-1337-abc1
razor evolve:inspect --job-id 20251107-1337-abc1
razor evolve:view shows the latest branch/worktree, recent iterations, and a Global archive snapshot (rendered from the shared EvolutionDatabase) so you can compare progress across jobs. razor evolve:inspect prints the per-job grid plus the same global table, which is handy for sharing coverage without rerunning anything.
Use razor evolve:worktree list to inspect the git branches created for each path, ... worktree create --job-id <id> --path <path> to spawn one manually, and ... worktree prune --job-id <id> --path <path> [--delete-branch] to clean them up after promotion. Job metadata tracks the winning branch so you can fast-forward it into your main workspace whenever you are ready.
Retrieval via Perplexity MCP
The ResearchPlanner automatically queries a Perplexity MCP server (default tool perplexity/search) so each /evolve:run iteration has fresh external citations. Configure the server exactly like any other MCP entry in config.toml:
[mcp_servers.perplexity]
url = "https://YOUR_PERPLEXITY_ENDPOINT" # or use `command = "perplexity-mcp"` for stdio
bearer_token_env_var = "PERPLEXITY_API_KEY" # populate via env var or secrets manager
enabled_tools = ["search"]
By default Razor calls perplexity/search; override (or disable retrieval entirely) via RAZOR_PERPLEXITY_MCP_TOOL. Example: set RAZOR_PERPLEXITY_MCP_TOOL=perplexity/deep-search to use a different tool, or set it to an empty string to skip MCP retrieval. If the server or tool is unavailable, the planner gracefully falls back to local context.
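The three cases for RAZOR_PERPLEXITY_MCP_TOOL (unset, overridden, empty) can be sketched as a small resolver. The function name is hypothetical, and treating a whitespace-only value as empty is an assumption.

```python
from typing import Mapping, Optional

DEFAULT_TOOL = "perplexity/search"


def retrieval_tool(env: Mapping[str, str]) -> Optional[str]:
    """Return the MCP tool to call, or None when retrieval is disabled."""
    raw = env.get("RAZOR_PERPLEXITY_MCP_TOOL")
    if raw is None:
        return DEFAULT_TOOL   # variable unset: use the default tool
    tool = raw.strip()        # assumption: surrounding whitespace is ignored
    return tool or None       # empty string: skip MCP retrieval
```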
Novelty embeddings (OpenAI-only)
Razor's novelty judge can hit OpenAI's embedding API independently of whatever primary model/provider you selected for the session. That means you can keep using login-based auth (or any configured provider) for regular completions while supplying an OpenAI key solely for embeddings. Set these variables if you want cloud-quality embeddings:
export OPENAI_API_KEY=sk-...
# Optional overrides:
export RAZOR_OPENAI_EMBEDDINGS_MODEL=text-embedding-3-large
export RAZOR_OPENAI_EMBEDDINGS_URL=https://api.openai.com/v1/embeddings
If no key is provided, or the request fails, Razor automatically falls back to hashed (local) embeddings, so the rest of the evolution workflow (and your chosen auth method) keeps working unchanged.
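To illustrate the fallback pattern, here is a sketch that tries a remote embedder and falls back to a local hashed embedding on any failure. The README does not describe how Razor's hashed embeddings are computed; the signed feature-hashing scheme, the 64-dimension size, and all function names below are assumptions.

```python
import hashlib
import math
from typing import Callable, List, Optional


def hashed_embedding(text: str, dims: int = 64) -> List[float]:
    """Feature-hash whitespace tokens into a fixed-size, L2-normalised vector."""
    vec = [0.0] * dims
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dims
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity for vectors that are already normalised."""
    return sum(x * y for x, y in zip(a, b))


def embed(text: str, remote: Optional[Callable[[str], List[float]]] = None) -> List[float]:
    """Use the remote embedder when available; otherwise fall back locally."""
    if remote is not None:
        try:
            return remote(text)
        except Exception:
            pass  # any remote failure falls through to the local embedding
    return hashed_embedding(text)
```

The cosine similarity between a candidate and its nearest archived neighbour could then be compared against a novelty threshold such as the one in the manifest example.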
Execpolicy
See the Execpolicy quickstart to set up rules that govern what commands Razor can execute.
Docs & FAQ
- Getting started
- Configuration
- Sandbox & approvals
- Execpolicy quickstart
- Authentication
- Automating Razor
- Advanced
- Zero data retention (ZDR)
- Contributing
- Install & build
- FAQ
- Open source fund
License
This repository is licensed under the Apache-2.0 License.
