@sigmance/razor

v0.125.8

Published

12 hours ago

taddeusb90

Fork context: ./razor/ is the default CLI and core development workspace.

Quickstart

Installing and running Razor CLI

Install globally with your preferred package manager. If you use npm:

npm install -g @sigmance/razor

Then simply run razor to get started:

razor

If you're running into upgrade issues with a global install, see the FAQ entry on updating Razor.

Each GitHub Release contains many executables, but in practice, you likely want one of these:

macOS
- Apple Silicon/arm64: razor-aarch64-apple-darwin.tar.gz
- x86_64 (older Mac hardware): razor-x86_64-apple-darwin.tar.gz
Linux
- x86_64: razor-x86_64-unknown-linux-musl.tar.gz
- arm64: razor-aarch64-unknown-linux-musl.tar.gz

Each archive contains a single entry with the platform baked into the name (e.g., razor-x86_64-unknown-linux-musl), so you likely want to rename it to razor after extracting it.
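As a rough illustration of the mapping above, the following sketch picks an archive name from the values Python's platform module reports. The lookup table is copied from the list; the helper names release_archive and binary_name are hypothetical and not part of Razor.

```python
# Archive names copied from the release list above; keys are (system, machine)
# pairs in the spelling used by platform.system() and platform.machine().
ARCHIVES = {
    ("Darwin", "arm64"): "razor-aarch64-apple-darwin.tar.gz",
    ("Darwin", "x86_64"): "razor-x86_64-apple-darwin.tar.gz",
    ("Linux", "x86_64"): "razor-x86_64-unknown-linux-musl.tar.gz",
    ("Linux", "aarch64"): "razor-aarch64-unknown-linux-musl.tar.gz",
}

# arm64 and aarch64 name the same architecture; accept either spelling.
ALIASES = {"arm64": ("arm64", "aarch64"), "aarch64": ("aarch64", "arm64")}


def release_archive(system: str, machine: str) -> str:
    """Return the listed archive for a platform, or raise LookupError."""
    for candidate in ALIASES.get(machine, (machine,)):
        name = ARCHIVES.get((system, candidate))
        if name:
            return name
    raise LookupError(f"no listed archive for {system}/{machine}")


def binary_name(archive: str) -> str:
    """Name of the single entry inside the archive (the archive minus .tar.gz)."""
    return archive.removesuffix(".tar.gz")


archive = release_archive("Linux", "x86_64")
print(f"extract {archive}, then rename {binary_name(archive)} to razor")
```

On a real machine you would pass platform.system() and platform.machine() instead of the fixed strings.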

Using Razor with your ChatGPT plan

Run razor and select Sign in with ChatGPT. We recommend signing into your ChatGPT account to use Razor as part of your Plus, Pro, Team, Edu, or Enterprise plan. Learn more about what's included in your ChatGPT plan.

You can also use Razor with an API key, but this requires additional setup. If you previously used an API key for usage-based billing, see the migration steps. If you're having trouble with login, please open an issue in this repository.

Model Context Protocol (MCP)

Razor can access MCP servers. To configure them, refer to the config docs.

Deep Research & Oracle commands

Once you run make setup inside /Users/.../deep-research (which creates .venv/ with all dependencies) and record the [mcp_servers.deep_research] / [mcp_servers.deep_research_oracle] entries in config.toml, Razor exposes two slash commands that never require you to specify output paths:

/deep-research deep "<query>" [--style educational|technical ...] - launches the full deep research workflow. Use shallow or basic as the first argument to run the lighter templates. Examples:
```
/deep-research deep "How does MAP-Elites vary with archive resolution?" --max-iters 8 --confidence 0.9
/deep-research shallow "Summarize the latest AlphaResearch metrics" --with-metadata
/deep-research basic "Provide a high-level definition of novelty search"
```
All artifacts (cache/, logs/, results/research_report.md|json, timings.json, discovered sources) are automatically written under your repo's workspace/deep-research/<timestamp>/....
/oracle "<question>" [--context "<extra details>"] - sends a question to the Oracle FastMCP server. Example:
```
/oracle "What risks do we run when parallelizing evaluator stages?" --context "Current manifest uses 4 workers"
```
The response is streamed inline and the transcript is saved under workspace/deep-research/oracle/<timestamp>_<slug>/oracle_response.(md|json) so future /evolve plans can cite it. Set OPENAI_API_KEY in your shell before launching Razor; the CLI forwards it to both MCP servers so secrets never live in config.toml.

Deep research and Oracle outputs are always mirrored into your active repo's workspace; you never need to pass workspace/deep-research manually.

Configuration

Razor CLI supports a rich set of configuration options, with preferences stored in ~/.razor/config.toml (set RAZOR_HOME or legacy CODEX_HOME to override). For full configuration options, see Configuration.
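The override rule can be pictured with a small sketch. It assumes RAZOR_HOME takes precedence over the legacy CODEX_HOME when both are set, which the text above does not state; resolve_config_path is a hypothetical helper, not a Razor API.

```python
import os
from pathlib import Path


def resolve_config_path(env=None) -> Path:
    """Resolve where config.toml would live for a given environment mapping."""
    env = os.environ if env is None else env
    # Assumption: RAZOR_HOME wins over the legacy CODEX_HOME when both are set.
    for name in ("RAZOR_HOME", "CODEX_HOME"):
        home = env.get(name)
        if home:
            return Path(home) / "config.toml"
    # Default location named in the text above.
    return Path.home() / ".razor" / "config.toml"


print(resolve_config_path())
```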

Razor autonomy operations

Razor's autonomy operations are documented in Razor Autonomy Usage Guide. Use it as the operator guide for /loop, /schedule, model-facing self-invocation and capability-awareness tools, /tasks, /task, /notifications, /monitor, /cic, /stop, /continue, /remote-control, live monitors, autonomous /evolve:loop, external evolution requests, and remote app-server/TUI operation.

Publishing the Razor fork (`@sigmance/razor`)

Follow these steps when you need to build or publish the private Razor-scoped package.

Prep the environment
- Use Node.js >=22 (e.g. nvm use 22) and enable pnpm: corepack enable && corepack prepare pnpm@<version> --activate.
- Ensure Python 3.11+, the GitHub CLI, and an authenticated npm session are available (check with gh auth status and npm whoami).
- Install DotSlash for ripgrep manifests: brew install dotslash.
Stage native binaries

Preferred local release build: make build-all-local-artifacts VERSION=0.117.15
This produces split npm artifacts: @sigmance/razor plus @sigmance/razor-darwin-arm64, @sigmance/razor-darwin-x64, @sigmance/razor-linux-arm64, and @sigmance/razor-linux-x64.
The split package layout keeps each native tarball below npm's payload limit. Do not publish the old all-platform monolithic package.
If native binaries already exist in codex-cli/vendor, repack only the npm artifacts with make build-local-npm-packages VERSION=0.117.15.

Verify the staged CLI
- make publish-npm VERSION=0.117.15 NPM="npm --dry-run"
- Optional local install smoke test: install dist/npm/razor-npm-0.117.15.tgz plus the native package for your platform, then run razor --version.
Publish or install
- Publish: make publish-npm VERSION=0.117.15
- If npm requires 2FA for writes, publish with make publish-npm VERSION=0.117.15 NPM_OTP=<code>.
- Preview publish: make publish-preview VERSION=0.117.15-alpha.1
- If npm returns a transient 409 Conflict or an already-published 403, rerun make publish-npm; already-visible package versions are skipped and accepted package versions are treated as complete.
- Local monolithic install for offline testing: make build-local-artifacts VERSION=0.117.15 TARGETS="$(rustc -vV | awk '/host:/{print $2}')" then npm install -g dist/npm/razor-monolithic-npm-0.117.15.tgz
Bump on new releases
- Use the Makefile version targets so Cargo and npm metadata stay aligned.
Makefile shortcuts
- make version-set VERSION=0.54.0-alpha.1 to record a preview version across Cargo + npm metadata.
- Remote artifacts: make build-preview VERSION=0.54.0-alpha.1 WORKFLOW_URL=<run-url> then make publish-preview VERSION=0.54.0-alpha.1.
- Full local release build: make build-all-local-artifacts VERSION=0.54.0-alpha.1.
- Local monolithic build: make build-local-artifacts VERSION=0.54.0-alpha.1 TARGETS="$(rustc -vV | awk '/host:/{print $2}')".

Local development workflow

Install prerequisites

# Node / pnpm
corepack enable
pnpm install --frozen-lockfile

# Rust targets needed for packaging (host triple + optional Linux musl)
rustup toolchain install 1.90
rustup target add --toolchain 1.90 x86_64-unknown-linux-musl

Tip: add any other targets you plan to ship (for example aarch64-apple-darwin).

Run tests

# Rust workspace (CLI + support crates)
cargo test --manifest-path codex-rs/Cargo.toml -p codex-cli

# TypeScript SDK (optional but mirrors CI)
pnpm --filter @openai/codex-sdk test

Refresh vendor binaries (optional)

# Rebuild razor for the host triple and update codex-cli/vendor
make build-local-vendor TARGETS="$(rustc -vV | awk '/host:/{print $2}')"

# Use cross for non-host targets (requires Docker + cross installed)
make build-local-vendor TARGETS="x86_64-unknown-linux-musl" USE_CROSS=1

Run this when you only need fresh binaries without packaging a tarball yet.

Build Razor artifacts locally

# Build binaries for the host triple (e.g., macOS) and package a monolithic local tarball
make build-local-artifacts VERSION=0.54.0-alpha.1 TARGETS="$(rustc -vV | awk '/host:/{print $2}')"

# Build all release platforms and split publishable npm tarballs
make build-all-local-artifacts VERSION=0.54.0-alpha.1

The publishable tarballs are written to dist/npm/. The monolithic local tarball is named razor-monolithic-npm-<version>.tgz so it cannot be confused with the small publishable wrapper.

Publish the locally built package

npm whoami               # ensure you're authenticated
make publish-npm VERSION=0.54.0-alpha.1   # publishes platform packages first, then @sigmance/razor

For preview builds, prefer make publish-preview VERSION=0.54.0-alpha.1 to assign the alpha dist-tag automatically.

Triggering GitHub workflows

ci.yml runs automatically for every PR and push to main. It stages a smoke npm tarball using the checked-in vendored binaries and runs formatting checks, so there is nothing to trigger manually.
rust-release.yml performs the full release (multi-platform Rust builds, artifact uploads, npm publish). It triggers when you push a tag named rust-vX.Y.Z or rust-vX.Y.Z-alpha.N.

Example release flow:

VERSION=0.54.0
# Set versions everywhere
make version-set VERSION="$VERSION"

# (Optional) smoke the local build
make build-local-artifacts VERSION="$VERSION" TARGETS="$(rustc -vV | awk '/host:/{print $2}')"

# Commit and tag
git commit -am "chore: release $VERSION"
git tag -a "rust-v$VERSION" -m "Release $VERSION"
git push origin main
git push origin "rust-v$VERSION"

For previews use VERSION=0.54.0-alpha.1; the workflow will publish under the npm alpha tag automatically.
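Before pushing a tag it can help to check that it matches the rust-vX.Y.Z or rust-vX.Y.Z-alpha.N shape that rust-release.yml listens for. The sketch below is only a local sanity check; parse_release_tag is a hypothetical helper and the regular expression is an interpretation of the pattern described above, not the workflow's actual filter.

```python
import re

# rust-vX.Y.Z or rust-vX.Y.Z-alpha.N, following the trigger rule described above.
RELEASE_TAG = re.compile(r"^rust-v(\d+)\.(\d+)\.(\d+)(?:-alpha\.(\d+))?$")


def parse_release_tag(tag: str):
    """Return (version, is_preview) for a matching tag, or None otherwise."""
    match = RELEASE_TAG.match(tag)
    if match is None:
        return None
    major, minor, patch, alpha = match.groups()
    version = f"{major}.{minor}.{patch}"
    if alpha is not None:
        return f"{version}-alpha.{alpha}", True
    return version, False


for tag in ("rust-v0.54.0", "rust-v0.54.0-alpha.1", "v0.54.0"):
    print(tag, parse_release_tag(tag))
```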

Need to re-run the workflow without a new commit? From GitHub's Actions tab select the desired workflow (ci or rust-release) and click Run workflow (or Re-run jobs on a past run).

Evolution workflows (`/evolve:<command> ...`)

The Razor CLI ships a built-in "Evolution Kernel" that plans, mutates, evaluates, and archives candidate solutions without leaving your session. Every workflow is exposed both as a CLI verb (razor evolve:<command> ...) and as a slash command (/evolve:<command> ...) so the streaming UX mirrors /init and /review.

Command overview

| Command | Purpose |
| --- | --- |
| razor evolve:init <goal> / /evolve:init <goal> | Scaffold a manifest under workspace/evolution/manifests/<goal>.yaml, auto-detecting language, targets, and eval commands from your repo. Supports --description, --template <file>, and --output <file>. |
| razor evolve:plan --manifest <file> / /evolve:plan --manifest <file> | Validate the manifest, create workspace/evolution/jobs/<id>, and snapshot baseline metadata. |
| razor evolve:run [--manifest <file> \| --target <artifact> \| --job-id <id>] [--path <id>] [--dry-run] / /evolve:run --manifest <file> or /evolve:run --target <artifact> | Execute the research -> mutation -> evaluation loop under Razor sandboxing, persisting ideas, iterations, artifacts, and worktrees. Target-driven runs generate a manifest automatically. |
| razor evolve:loop [--program <file> \| --manifest <file> \| --target <artifact> \| --job-id <id>] / /evolve:loop ... | Run the same evolution loop without a fixed iteration cap. Stop it with /evolve:stop --job-id <id> or an operator interruption. |
| razor evolve:resume --job-id <id> / /evolve:resume --job-id <id> | Convenience alias for run --job-id .... |
| razor evolve:steer --job-id <id> --message <text> / /evolve:steer --job-id <id> --message <text> | Queue operator, monitor, subsystem, external-trigger, or system guidance for the next iteration of an existing job. Supports --summary, --source, and --source-id. |
| razor evolve:view --job-id <id> / /evolve:view --job-id <id> | Show job metadata, recent iterations, per-job scoreboards, and the Global archive snapshot aggregated across every job. |
| razor evolve:inspect --job-id <id> / /evolve:inspect --job-id <id> | Render the MAP grid plus queue/island health (pending migrations, recent lineage, island table) and the global archive table for sharing results or debugging novelty coverage. |
| razor evolve:bench --pack <file> / /evolve:bench --pack <file> | Runs the benchmark pack (built-in via --built-in or workspace packs via --project). Bench runs execute each job sequentially; pass --dry-run to preview commands without running them. |
| razor evolve:worktree <list\|create\|prune> / /evolve:worktree <list\|create\|prune> | Inspect or manage the git worktrees/branches Razor creates per evolutionary path. |

Slash commands accept the same flags (for example /evolve:run --manifest ... --dry-run). Run and loop also accept target seed flags when no manifest exists yet: --target <artifact>, --target-kind <code|strategy|model|prompt|system>, --signal <why>, --evidence <text>, and --eval-command <cmd>. View supports --rewards, --research, --peer-review, --datasets, and --steering.

1. Scaffold a manifest (`/evolve:init`)

Use the init command to generate a manifest that already reflects your repo layout:

# CLI
razor evolve:init circle-packing --description "Improve heuristics" --template docs/examples/evolution/base.yaml

# Slash (inside TUI)
/evolve:init circle-packing --description "Improve heuristics"

Razor inspects Cargo.toml, package.json, pnpm-lock.yaml, pyproject.toml, etc. to pick:

Language - rust, typescript, or python (falls back to mixed).
Targets - e.g., src/**/*.rs, apps/**/src/**/*.ts, **/*.py.
Eval command - cargo test, pnpm test / npm test / yarn test, or pytest.
Location - workspace/evolution/manifests/<goal>.yaml unless --output overrides it.
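
The detection described above can be approximated by a marker-file lookup. This is a simplified sketch, not Razor's detector: the detect helper is hypothetical, it picks pnpm test where the list allows pnpm, npm, or yarn, and it assumes that zero or several markers means "mixed".

```python
# Marker file -> (language, target glob, eval command), following the lists above.
MARKERS = {
    "Cargo.toml": ("rust", "src/**/*.rs", "cargo test"),
    "package.json": ("typescript", "apps/**/src/**/*.ts", "pnpm test"),
    "pyproject.toml": ("python", "**/*.py", "pytest"),
}


def detect(filenames):
    """Guess manifest defaults from the file names found at the repo root."""
    hits = [info for marker, info in MARKERS.items() if marker in filenames]
    if len(hits) == 1:
        language, target, command = hits[0]
        return {"language": language, "targets": [target], "eval_command": command}
    # Assumption: no marker or several markers falls back to "mixed" with no
    # single eval command; the real tool may resolve this differently.
    return {"language": "mixed", "targets": [t for _, t, _ in hits], "eval_command": None}


print(detect({"pyproject.toml", "README.md"}))
print(detect({"Cargo.toml", "package.json"}))
```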

Supplying --template copies that file (retaining custom metrics/novelty settings) before patching the goal name and description. Edit the YAML afterward to fine-tune adapters, metrics, or timeouts.

2. Review / customize the manifest

Generated manifests follow the same schema as hand-written ones, so keep editing them in-place. Example:

name: Circle Packing
description: Improve heuristic + benchmarks
language: python
targets:
  - src/**/*.py
  - tests/**/*.py
eval:
  command: "pytest tests/test_circle.py -q"
  parallel_evaluations: 2
  stages:
    - name: "unit"
      command: "pytest tests/unit -q"
      weight: 0.6
    - name: "integration"
      command: "pytest tests/integration -q"
      weight: 0.4
timeouts:
  evaluate_seconds: 300
artifacts:
  save_stdout: true
  save_stderr: true
  max_total_mb: 200
metrics:
  combined_score: required
  correctness: optional
novelty:
  embeddings: disabled
  threshold: 0.98

Set parallel_evaluations to a value greater than 1 to run evaluation stages concurrently (best used when the stages are independent test suites). Leave it unset or set it to 1 to keep evaluation single-threaded.
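
As a mental model for parallel_evaluations, the sketch below fans stage commands out over a thread pool sized by that setting. It is an illustration under assumptions, not Razor's evaluator: run_stages and run_shell are hypothetical names, and how Razor combines the per-stage weight values is not shown because the manifest example does not specify it.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_shell(stage):
    """Run one stage command through the shell and return its exit code."""
    return subprocess.run(stage["command"], shell=True).returncode


def run_stages(stages, parallel_evaluations=1, runner=run_shell):
    """Run every stage and return {stage name: result}.

    An unset (None) or 1 value runs stages one at a time; larger values let
    that many stages run concurrently.
    """
    workers = max(1, parallel_evaluations or 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input stages.
        results = list(pool.map(runner, stages))
    return {stage["name"]: result for stage, result in zip(stages, results)}


stages = [
    {"name": "unit", "command": "pytest tests/unit -q", "weight": 0.6},
    {"name": "integration", "command": "pytest tests/integration -q", "weight": 0.4},
]
# A stub runner keeps the example side-effect free.
print(run_stages(stages, parallel_evaluations=2, runner=lambda stage: 0))
```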

Store manifests anywhere, but keeping them under workspace/evolution/manifests/ aligns with the rest of the evolution storage (workspace/evolution/{jobs,db,artifacts}). A ready-to-edit template lives at docs/examples/evolution/coding_task.yaml, so you can run /evolve:init <goal> --template docs/examples/evolution/coding_task.yaml (or the CLI equivalent) as a starting point.

Target-driven evolution without a manifest

For operator requests or subsystem signals that name an artifact directly, Razor can seed an evolution job without a pre-existing manifest. Use --target with /evolve:run for a bounded campaign, or with /evolve:loop for an autonomous loop that continues until stopped.

/evolve:run --target trading.strategist_prompt --target-kind strategy --signal alpha_decline_3d --iterations 1 --eval-command "cd trading-system-workspace/trading-engine && pytest tests/prompts"

/evolve:loop --target src/services/allocator.py --target-kind code --signal operator_request

/evolve:steer --job-id <id> --message "Prioritize monitor evidence from alpha-watch before the next mutation." --source monitor --source-id alpha-watch

Target-driven runs generate a manifest under workspace/evolution/manifests/generated/ and write the seed request to workspace/evolution/jobs/<job-id>/evolution_seed.json. Known Sigmance aliases such as trading.strategist_prompt and trading.allocator_prompt resolve to the trading-engine prompt templates when those paths exist from the current workspace. Comma-separated explicit targets are also supported.

/evolve:loop uses the same kernel as /evolve:run, but the iteration cap is unbounded. Stop it with /evolve:stop --job-id <id> or a normal interruption. Live steering is supported while a loop is running: /evolve:steer writes pending guidance to workspace/evolution/jobs/<job-id>/steering_events.jsonl, the next iteration injects it into the research plan/report before mutation, and steering_consumption.jsonl records which iteration consumed it. Inspect the queue with /evolve:view --job-id <id> --steering.

Remote UI clients and subsystem services can append the same steering events through the app-server evolution/steer method. Monitor triggers can also carry evolutionSteering metadata so a monitor fire creates its normal TaskRun/notification and, when configured with a jobId, appends a source-typed steering event for the running evolution job.

App-server live monitor loops can produce those monitor fires without a manual JSON-RPC call. See Live monitor loops for the full setup and verification path. In short: start app-server with the scheduler feature enabled (razor --enable scheduler app-server --listen unix://), create an active monitor with /monitor create file|command|task|notification or app-server monitor/create, then let the shared monitor daemon check file, command, task, or notification conditions, debounce repeated observations, honor /stop through the durable autonomy gate, and route matching observations through the same monitor/trigger path. Supported loop kinds are file (path or paths, intervalSecs, debounceSecs), command (command, intervalSecs, timeoutSecs, triggerOn of failure|success|change|always), task (taskStatus, sourceKind, sourceId), and notification (notificationStatus, sourceKind, sourceId, taskRunId). Monitor fires write the task/notification/evolution-steering fabric and can launch immediate background new_agent work with runOnTrigger = "new_agent" or /monitor create ... --auto-run. Command loops should be limited to trusted workspace-owned checks until policy and shell-task hardening are complete.

External trading flywheel requests

Razor can also consume UATS/trading flywheel degradation signals from MongoDB and turn them into tracked evolution tasks. The consumer is disabled by default and starts only when RAZOR_EVOLUTION_REQUESTS_MONGODB_URI is present in the app-server environment.

| Environment variable | Default | Purpose |
| --- | --- | --- |
| RAZOR_EVOLUTION_REQUESTS_MONGODB_URI | unset | Enables the consumer and points it at MongoDB. |
| RAZOR_EVOLUTION_REQUESTS_MONGODB_DATABASE | trading_system | Database containing queued requests. |
| RAZOR_EVOLUTION_REQUESTS_COLLECTION | razor_evolution_requests | Collection to poll. |
| RAZOR_EVOLUTION_REQUESTS_POLL_SECS | 30 | Poll interval. |
| RAZOR_EVOLUTION_REQUESTS_ITERATIONS | 1 | Iterations per claimed request. |
| RAZOR_EVOLUTION_REQUESTS_DRY_RUN | unset / false | Runs the evolution task in dry-run mode when truthy. |
| RAZOR_EVOLUTION_REQUESTS_EVAL_COMMAND | inferred when possible | Overrides the generated manifest eval command. |
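
To preview how these variables and their defaults combine, a deployment script might resolve them as in the following sketch. ConsumerConfig and load_consumer_config are hypothetical names, and the set of strings treated as truthy is an assumption, since the table only says "truthy".

```python
import os
from dataclasses import dataclass
from typing import Optional

PREFIX = "RAZOR_EVOLUTION_REQUESTS_"
TRUTHY = {"1", "true", "yes", "on"}  # assumption: the table only says "truthy"


@dataclass(frozen=True)
class ConsumerConfig:
    mongodb_uri: str
    database: str
    collection: str
    poll_secs: int
    iterations: int
    dry_run: bool
    eval_command: Optional[str]


def load_consumer_config(env=None) -> Optional[ConsumerConfig]:
    """Return the resolved settings, or None when the consumer stays disabled."""
    env = os.environ if env is None else env
    uri = env.get(PREFIX + "MONGODB_URI")
    if not uri:
        return None  # no URI means the consumer does not start
    return ConsumerConfig(
        mongodb_uri=uri,
        database=env.get(PREFIX + "MONGODB_DATABASE", "trading_system"),
        collection=env.get(PREFIX + "COLLECTION", "razor_evolution_requests"),
        poll_secs=int(env.get(PREFIX + "POLL_SECS", "30")),
        iterations=int(env.get(PREFIX + "ITERATIONS", "1")),
        dry_run=env.get(PREFIX + "DRY_RUN", "").strip().lower() in TRUTHY,
        eval_command=env.get(PREFIX + "EVAL_COMMAND") or None,
    )


print(load_consumer_config({PREFIX + "MONGODB_URI": "mongodb://localhost:27017"}))
```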

The consumer claims requests with status equal to queued or pending, marks them in_progress, writes the raw trigger to workspace/evolution/external-triggers/<request-id>.json, writes a generated manifest under workspace/evolution/manifests/external/, creates a durable TaskRunKind::Evolution, and submits the native Op::Evolution flow in a background thread. Mongo status is then updated to completed, failed, or cancelled with razor_* metadata.

If a request carries steering_job_id (aliases: evolution_job_id or job_id), the consumer treats it as a steering producer instead of starting a new job. It appends to workspace/evolution/jobs/<job-id>/steering_events.jsonl, marks the Mongo document completed, and stores razor_steering_event_id, razor_steering_job_id, razor_steering_events_path, and razor_steering_consumption_path.
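
The steering-versus-new-job decision can be sketched as a lookup over the three field names. The order in which the aliases are checked is an assumption, and steering_job_id and route are hypothetical helpers rather than Razor functions.

```python
# Field names from the paragraph above; checking them in this order is an assumption.
STEERING_KEYS = ("steering_job_id", "evolution_job_id", "job_id")


def steering_job_id(request: dict):
    """Return the job id a request wants to steer, or None if it names none."""
    for key in STEERING_KEYS:
        value = request.get(key)
        if value:
            return str(value)
    return None


def route(request: dict) -> str:
    """Describe what the consumer would do with a claimed request."""
    job_id = steering_job_id(request)
    if job_id is None:
        return "start a new evolution job"
    return f"append to workspace/evolution/jobs/{job_id}/steering_events.jsonl"


print(route({"status": "queued", "evolution_job_id": "20251107-1337-abc1"}))
print(route({"status": "queued"}))
```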

To verify it manually, start the app-server with scheduler/task features and the Mongo environment variables set, insert or wait for a razor_evolution_requests document, then inspect:

Mongo fields: status, razor_task_run_id, razor_manifest_path, razor_started_at, and terminal razor_completed_at / razor_failed_at.
Razor task fabric: /tasks, /task <task-id>, or app-server task/list.
Artifacts: workspace/evolution/external-triggers/, workspace/evolution/manifests/external/, and workspace/evolution/jobs/<job-id>/.
Steering requests: Mongo razor_steering_event_id plus workspace/evolution/jobs/<job-id>/steering_events.jsonl.

Scheduler backends (`manifest.scheduler`)

Every manifest picks a scheduler backend:

local (default) keeps evaluations entirely in-process.
shell shells out to a user-defined command so you can forward work to Hydra/Slurm wrappers, GitHub Actions dispatchers, etc., without baking those integrations into Razor.
hydra / slurm remain placeholders; configure shell for custom dispatch today.

For backend: shell, set:

scheduler:
  backend: shell
  shell:
    program: "bash"
    args:
      - "-lc"
      - "{command_sh}"
    timeout_secs: 600 # optional override
    env:
      HYDRA_PROFILE: "research"

The scheduler replaces placeholders inside program, args, and env values:

| Token | Value |
| --- | --- |
| {command_sh} | Original eval command, shell-escaped (e.g., cargo test -- --nocapture). |
| {command_json} | JSON array of the command (e.g., ["cargo","test","--","--nocapture"]). |
| {stage} | Eval stage name (unit, lint, etc.). |
| {label} | Friendly label (defaults to the stage name). |
| {job_id} | Evolution job id. |
| {path_id} | Active path id (defaults to main). |
| {iteration} | 1-based iteration counter. |
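
To see what a substituted value could look like, the sketch below expands the tokens from the table in a single pass. It approximates the behavior with Python's shlex and json modules; the render helper is hypothetical and Razor's own escaping may differ in detail.

```python
import json
import re
import shlex

# One pattern for all tokens so substituted text is never expanded a second time.
TOKEN = re.compile(r"\{(command_sh|command_json|stage|label|job_id|path_id|iteration)\}")


def render(template, command, stage, job_id, path_id="main", iteration=1, label=None):
    """Expand scheduler placeholders in one program, args, or env value."""
    values = {
        "command_sh": shlex.join(command),  # shell-escaped single string
        "command_json": json.dumps(command, separators=(",", ":")),
        "stage": stage,
        "label": label or stage,  # label defaults to the stage name
        "job_id": job_id,
        "path_id": path_id,
        "iteration": str(iteration),
    }
    return TOKEN.sub(lambda match: values[match.group(1)], template)


command = ["cargo", "test", "--", "--nocapture"]
print(render("{command_sh}", command, stage="unit", job_id="job-1"))
print(render("{command_json}", command, stage="unit", job_id="job-1"))
```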

Razor also injects these environment variables for shell backends: EVOLUTION_SCHEDULER_COMMAND_JSON, EVOLUTION_SCHEDULER_COMMAND_SH, EVOLUTION_SCHEDULER_STAGE, EVOLUTION_SCHEDULER_LABEL, EVOLUTION_SCHEDULER_JOB_ID, EVOLUTION_SCHEDULER_PATH_ID, and EVOLUTION_SCHEDULER_ITERATION. Your dispatch script can read them to decide how to route the work (submit a Slurm job, run srun, etc.) while keeping Razor's default local orchestration untouched. Sample wrapper scripts live under docs/examples/evolution/schedulers/:

hydra-wrapper.sh - forwards commands to hydra run ..., including stage metadata in logs.
slurm-wrapper.sh - wraps the command in srun --job-name="razor-..." ....
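
A custom dispatch script can be written in any language that reads the injected environment. The sketch below builds an srun argument list from those variables; it is not the shipped slurm-wrapper.sh. The job-name format is hypothetical, and it assumes EVOLUTION_SCHEDULER_COMMAND_JSON holds the same JSON array as the {command_json} token.

```python
import json
import os


def build_srun_argv(env=None) -> list:
    """Build an srun invocation from the EVOLUTION_SCHEDULER_* variables."""
    env = os.environ if env is None else env
    # Assumption: this variable carries the eval command as a JSON array.
    command = json.loads(env["EVOLUTION_SCHEDULER_COMMAND_JSON"])
    # Hypothetical naming scheme; the shipped wrapper may use a different one.
    job_name = "razor-{job}-{stage}-{iteration}".format(
        job=env["EVOLUTION_SCHEDULER_JOB_ID"],
        stage=env["EVOLUTION_SCHEDULER_STAGE"],
        iteration=env["EVOLUTION_SCHEDULER_ITERATION"],
    )
    return ["srun", f"--job-name={job_name}", *command]


example_env = {
    "EVOLUTION_SCHEDULER_COMMAND_JSON": '["pytest", "tests/unit", "-q"]',
    "EVOLUTION_SCHEDULER_JOB_ID": "20251107-1337-abc1",
    "EVOLUTION_SCHEDULER_STAGE": "unit",
    "EVOLUTION_SCHEDULER_ITERATION": "3",
}
print(build_srun_argv(example_env))
```

A real wrapper would finish by replacing the process with the built command, for example via os.execvp(argv[0], argv), so the exit code of the dispatched job flows back to Razor.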

Point scheduler.shell.program/args at one of these scripts (or your own variant) to integrate with remote schedulers without modifying Razor itself. /evolve:view|inspect now print the selected scheduler backend for each job, and OTEL events (codex.evolution_scheduler_*) include queue position + timeout metadata so dashboards can track dispatch latency.

AlphaResearch benchmark packs

Built-in packs live under docs/examples/evolution/benchmarks/ and are available via razor evolve bench --built-in <name>:

alpharesearch_pack - runs the entire AlphaResearchComp suite (all nine benchmarks).
alpharesearch_packing_circles
alpharesearch_spherical_code
alpharesearch_minizing_ratio
alpharesearch_third_autocorrelation
alpharesearch_autoconvolution
alpharesearch_littlewood
alpharesearch_kissing_number
alpharesearch_heilbronn
alpharesearch_mstd

Razor embeds the full AlphaResearch manifest set; if these files are missing locally (for example when running from a packaged binary), the CLI regenerates them under ~/.razor/builtin_evolution_docs/docs/examples/evolution/ the first time you invoke --built-in. You can inspect or edit the regenerated packs there before re-running the benchmark.

To customise the suite per workspace, copy one of the packs into workspace/evolution/benchmarks/ (see workspace/evolution/benchmarks/alpharesearch/full.yaml) and run it via razor evolve bench --project <name>.

Archive knobs & migrations

archive accepts OpenEvolve-style controls:

archive:
  complexity_bins: 3
  novelty_bins: 3
  migration_interval: 4 # iterations between migration checks
  migration_batch: 2 # max migrations per batch window
  batch_window: 6 # number of recent iterations to consider
  migration_trace_limit: 200 # keep the last N migrations in migrations.jsonl
  feature_dimensions: ["complexity", "novelty", "diversity"]

The first two feature_dimensions entries label the MAP-Elites grid. /evolve:view and /evolve:inspect now render Archive coverage (<row> rows x <column> columns) using those names, and migration plans reuse the labels so the stored jobs/<id>/archive.json snapshot documents the chosen schema.
Razor writes every migration decision to jobs/<id>/migrations.jsonl (reason, target bin, priority, source success rate). MapMigrator enforces the batch/window limits so multi-island queues don't spawn too many paths at once, and /evolve:inspect highlights the pending scheduler state plus MAP-Elites table for quick triage.
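
For readers new to MAP-Elites, the sketch below shows the general idea behind a binned archive: each candidate lands in a grid cell chosen by its feature values, and a cell keeps only its best-scoring occupant. This is the textbook technique under the assumption that features are normalized to [0, 1]; it is not Razor's archive code, and MapElitesGrid is a hypothetical class.

```python
def bin_index(value: float, bins: int) -> int:
    """Map a value in [0, 1] to one of `bins` equal-width bins."""
    clamped = min(max(value, 0.0), 1.0)
    return min(int(clamped * bins), bins - 1)


class MapElitesGrid:
    """Keep the best candidate per (complexity bin, novelty bin) cell."""

    def __init__(self, complexity_bins: int = 3, novelty_bins: int = 3):
        self.shape = (complexity_bins, novelty_bins)
        self.cells = {}  # (row, column) -> (score, candidate)

    def offer(self, candidate, score, complexity, novelty) -> bool:
        """Insert the candidate if its cell is empty or it beats the incumbent."""
        cell = (bin_index(complexity, self.shape[0]), bin_index(novelty, self.shape[1]))
        incumbent = self.cells.get(cell)
        if incumbent is None or score > incumbent[0]:
            self.cells[cell] = (score, candidate)
            return True
        return False

    def coverage(self) -> float:
        """Fraction of grid cells that hold a candidate."""
        return len(self.cells) / (self.shape[0] * self.shape[1])


grid = MapElitesGrid(complexity_bins=3, novelty_bins=3)
grid.offer("baseline", score=0.5, complexity=0.1, novelty=0.9)
grid.offer("variant", score=0.7, complexity=0.5, novelty=0.5)
print(sorted(grid.cells), grid.coverage())
```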

Idea database & datasets

Every /evolve:run iteration records DeepEvolve-style research artifacts under workspace/evolution/jobs/<id>/: the web-search plan (web_search_plan.jsonl), search notes (search_notes.jsonl), markdown reports (research_reports/report-<job>-<ts>.md), dataset cache manifests (datasets/<handle>-cache.json), and mutation bundles (coder/debugger transcripts).
ResearchPlanner now detects common dataset folders (data/, datasets/, inputs/), records cache metadata (dataset_handle, cache_path, cache_mtime, inputs_signature), and exposes it via razor evolve:ideas show --job-id <id>.
When no mutation_targets.json exists, /evolve:run auto-seeds one from the planner's focus files so MutationEngine can produce multi-file EVOLVE blocks without manual scaffolding; edit that file between iterations to steer the coder/debugger delegates. Each target supports a weight (higher numbers appear earlier in mutation prompts) plus mode (diff, full, crossover) and an optional rationale.
Razor writes dataset feedback events (missing caches, stale metadata, refreshes) to jobs/<id>/dataset_feedback.jsonl. Inspect them via /evolve:view --datasets or razor evolve view --datasets --job-id <id> to keep cache hygiene in sync with research planning.
Razor writes live guidance to jobs/<id>/steering_events.jsonl and consumed markers to jobs/<id>/steering_consumption.jsonl. ResearchPlanner folds pending steering into the next iteration's hypotheses, reflection prompts, and research report, then the task emits a live Steering update.
App-server evolution/steer, monitor evolutionSteering trigger metadata, and Mongo steering_job_id requests all append the same event shape, so operator, monitor, subsystem, external-trigger, and system guidance share one durable audit trail.
ResearchPlanner surfaces follow-up query queues (dataset hygiene warnings, low-scoring ideas, and underexplored mutation targets). These appear under /evolve:view --research so operators can triage which avenues to pursue next.
razor evolve init <goal> now scaffolds manifests directly from the CLI (honoring --description, --template, --output), using the same environment detection as /evolve:init. Follow it with razor evolve plan --manifest <file> to register the job under workspace/evolution/jobs/ without launching the TUI.
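
When hand-editing mutation_targets.json between iterations, it can be useful to preview the order implied by the weight rule described above (higher numbers first). The sketch assumes each entry is an object with a hypothetical path field plus the documented weight, mode, and rationale; the default weight and default mode used here are assumptions as well.

```python
VALID_MODES = {"diff", "full", "crossover"}


def order_targets(targets: list) -> list:
    """Validate modes and return targets sorted by descending weight."""
    for target in targets:
        mode = target.get("mode", "diff")  # assumption: diff when omitted
        if mode not in VALID_MODES:
            raise ValueError(f"unknown mode {mode!r} for {target.get('path')}")
    # sorted() is stable, so equal weights keep their file order.
    return sorted(targets, key=lambda t: t.get("weight", 1.0), reverse=True)


targets = [
    {"path": "src/packing.py", "weight": 1.0, "mode": "diff"},
    {"path": "src/heuristics.py", "weight": 3.0, "mode": "full", "rationale": "hot path"},
    {"path": "tests/test_circle.py", "weight": 1.0, "mode": "diff"},
]
print([t["path"] for t in order_targets(targets)])
```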

Model ensembles & proof checker (AlphaEvolve-style)

manifest.ensembles controls coder/debugger model rotations plus an optional proof checker:

ensembles:
  coder_models:
    - gpt-5-codex
    - gpt-4.1-preview
  debugger_models:
    - gpt-4.1-mini
  proof_checker: "pnpm test -- proof"

MutationEngine will try each coder model in order until one succeeds (recorded in the mutation summary and idea logs). Debugger models work the same way, and the iteration summary shows Coder ensemble (n): ....
Evaluator cascades now run the proof checker command after shell stages (unless --dry-run). Pass/fail results stream into /evolve:run, iterations.jsonl, and OTEL telemetry so dashboarding AlphaEvolve-style verification is straightforward.
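
The try-in-order behavior described for coder and debugger ensembles follows a common fallback pattern, sketched below. The first_success helper and the fake attempt function are hypothetical stand-ins; a real attempt would call the model, which is outside the scope of this example.

```python
def first_success(models, attempt):
    """Try each model in order; return (model, result, earlier_failures)."""
    failures = []
    for model in models:
        try:
            return model, attempt(model), failures
        except Exception as exc:  # a failed attempt moves on to the next model
            failures.append((model, str(exc)))
    raise RuntimeError(f"all models failed: {failures}")


def fake_attempt(model):
    """Stand-in for a mutation attempt: the first model always fails."""
    if model == "gpt-5-codex":
        raise TimeoutError("no patch produced")
    return f"patch from {model}"


print(first_success(["gpt-5-codex", "gpt-4.1-preview"], fake_attempt))
```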

Benchmark packs (`razor evolve:bench`)

Store YAML benchmark packs under workspace/evolution/benchmarks/ (sample: docs/examples/evolution/benchmarks/ts_cli_pack.yaml). Each pack declares a name, optional description, and a list of jobs shaped like:

name: TypeScript CLI smoke pack
jobs:
  - manifest: ../../evolution/coding_task.yaml
    path: staging
    iterations: 1
    dry_run: true
  - manifest: ../../evolution/coding_task.yaml
    path: qa
    iterations: 1

Scaffold goal-aligned packs with razor evolve bench --project-init "<goal>" --manifest workspace/evolution/manifests/<goal>.yaml --challenge "<measurable challenge>". The helper writes a starter pack under workspace/evolution/benchmarks/<slug>.yaml that embeds the challenge text so you can tweak it before running --project <slug>.
Run razor evolve:bench --pack <pack.yaml> (or /evolve:bench --pack ...). Razor validates the file, confirms the referenced manifests exist, and prints a ready-to-run command list (e.g., /evolve:run --manifest ... --path qa --iterations 1). Execute each command to compare archive metrics across paths/iterations.
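
The command list that a pack expands to can be pictured with the sketch below, which turns the jobs from the sample pack into /evolve:run invocations. It takes the pack as an already-parsed dictionary to stay dependency-free; bench_commands is a hypothetical helper and the exact output format of razor evolve:bench may differ.

```python
import shlex


def bench_commands(pack: dict) -> list:
    """Expand a parsed benchmark pack into one /evolve:run command per job."""
    commands = []
    for job in pack["jobs"]:
        parts = ["/evolve:run", "--manifest", job["manifest"]]
        if "path" in job:
            parts += ["--path", job["path"]]
        if "iterations" in job:
            parts += ["--iterations", str(job["iterations"])]
        if job.get("dry_run"):
            parts.append("--dry-run")
        commands.append(shlex.join(parts))
    return commands


pack = {
    "name": "TypeScript CLI smoke pack",
    "jobs": [
        {"manifest": "../../evolution/coding_task.yaml", "path": "staging", "iterations": 1, "dry_run": True},
        {"manifest": "../../evolution/coding_task.yaml", "path": "qa", "iterations": 1},
    ],
}
for command in bench_commands(pack):
    print(command)
```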

3. Plan and run

From the repo root:

# Parse the manifest and snapshot the initial job metadata
razor evolve:plan --manifest workspace/evolution/manifests/circle-packing.yaml

# Start iterating (use --dry-run to skip evaluation execution)
razor evolve:run --manifest workspace/evolution/manifests/circle-packing.yaml --path staging

Slash equivalents:

/evolve:plan --manifest workspace/evolution/manifests/circle-packing.yaml
/evolve:run --manifest workspace/evolution/manifests/circle-packing.yaml --path staging --dry-run

Both entry points now stream four panes in the TUI:

Research - ResearchPlanner insights, citations, pseudocode, and web-search notes.
Mutation - Outputs from the coder/debugger delegates (DeepEvolve-style apply_patch diffs plus debugger fixes).
Evaluation - Eval command, exit code, duration, metrics, and novelty scores.
Archive - Scoreboard + MAP-style novelty/complexity grid.

All artifacts (manifests, ideas, iterations, job DB, worktrees, archives) live under workspace/evolution/ in the repo you launched Razor from, so you can version them like any other asset.

The mutation stage mirrors DeepEvolve: the ResearchPlanner feeds a codex sub-agent that edits the repo via apply_patch, then a debugger agent inspects the diff and suggests fixes. Their transcripts (and any follow-up patches) are persisted under workspace/evolution/jobs/<id>/mutations/, making every mutation path auditable.

Regression checklist (`/evolve:init|plan|run`)

Run these commands whenever you tweak manifests, CLI plumbing, or slash handlers to ensure the critical flows remain healthy:

1. razor evolve:init demo-goal --description "Smoke test" --template docs/examples/evolution/coding_task.yaml
   - Confirms manifest scaffolding still inspects the repo correctly and writes to workspace/evolution/manifests/.
2. razor evolve:plan --manifest workspace/evolution/manifests/demo-goal.yaml
   - Verifies manifest parsing, job creation, and idea-log persistence without executing evaluators.
3. razor evolve:run --manifest workspace/evolution/manifests/demo-goal.yaml --path qa --dry-run
   - Ensures Research/Evaluation/Archive panes stream as expected, OTEL tags include the evolution metadata, and worktrees are created for the requested path.
4. Optionally drop --dry-run on a lightweight repo-specific eval command to validate approval prompts, sandbox policies, and artifact uploads.

Record the job id shown in step 3; it should be viewable via razor evolve:view --job-id <id> and the matching worktree should appear in razor evolve:worktree list.

For quick CI-friendly coverage, run make evolve-regression (wraps the CLI prompt builder tests, slash-command parser smoke tests, and the default manifest validator) whenever you change /evolve:* arguments or add new subcommands.

Automated smoke tests

Run these lightweight tests before shipping CLI changes so we keep /evolve:* CLI <-> slash parity and ensure the default manifest template stays valid:

cargo test -p codex-cli -- build_evolve_prompt_supports_init build_evolve_prompt_supports_plan
cargo test -p codex-tui -- chatwidget::tests::parse_evolution_slash_command_init_maps_goal_and_flags chatwidget::tests::parse_evolution_slash_command_run_resolves_manifest_and_flags
cargo test -p codex-core default_template_parses_successfully

The first command exercises the prompt wiring that converts razor evolve:<command> ... invocations into the corresponding slash commands. The second keeps the slash-command parser in sync with the CLI flags, and the last guarantees that docs/examples/evolution/coding_task.yaml remains a well-formed manifest, which protects /evolve:init --template docs/examples/evolution/coding_task.yaml from regressions.

4. Inspect, resume, and share

Each run records a deterministic job id and path id. Use them to resume or inspect:

razor evolve:resume --job-id 20251107-1337-abc1 --path stage-a
razor evolve:view --job-id 20251107-1337-abc1
razor evolve:inspect --job-id 20251107-1337-abc1

razor evolve:view shows the latest branch/worktree, recent iterations, and a Global archive snapshot (rendered from the shared EvolutionDatabase) so you can compare progress across jobs. razor evolve:inspect prints the per-job grid plus the same global table, which is handy for sharing coverage without rerunning anything.

Use razor evolve:worktree list to inspect the git branches created for each path, ... worktree create --job-id <id> --path <path> to spawn one manually, and ... worktree prune --job-id <id> --path <path> [--delete-branch] to clean them up after promotion. Job metadata tracks the winning branch so you can fast-forward it into your main workspace whenever you are ready.

Retrieval via Perplexity MCP

The ResearchPlanner automatically queries a Perplexity MCP server (default tool perplexity/search) so each /evolve:run iteration has fresh external citations. Configure the server exactly like any other MCP entry in config.toml:

[mcp_servers.perplexity]
url = "https://YOUR_PERPLEXITY_ENDPOINT"          # or use `command = "perplexity-mcp"` for stdio
bearer_token_env_var = "PERPLEXITY_API_KEY"      # populate via env var or secrets manager
enabled_tools = ["search"]

By default Razor calls perplexity/search; override (or disable retrieval entirely) via RAZOR_PERPLEXITY_MCP_TOOL. Example: set RAZOR_PERPLEXITY_MCP_TOOL=perplexity/deep-search to use a different tool, or set it to an empty string to skip MCP retrieval. If the server or tool is unavailable, the planner gracefully falls back to local context.
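
The three cases (unset, overridden, empty) can be summarized in a short sketch. retrieval_tool is a hypothetical helper, and trimming surrounding whitespace before the empty check is an assumption.

```python
import os

DEFAULT_TOOL = "perplexity/search"


def retrieval_tool(env=None):
    """Return the MCP tool to call, or None when retrieval is disabled."""
    env = os.environ if env is None else env
    value = env.get("RAZOR_PERPLEXITY_MCP_TOOL")
    if value is None:
        return DEFAULT_TOOL  # variable unset: use the default tool
    # Assumption: whitespace-only values count as empty, which disables retrieval.
    return value.strip() or None


print(retrieval_tool({}))
print(retrieval_tool({"RAZOR_PERPLEXITY_MCP_TOOL": "perplexity/deep-search"}))
print(retrieval_tool({"RAZOR_PERPLEXITY_MCP_TOOL": ""}))
```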

Novelty embeddings (OpenAI-only)

Razor's novelty judge can hit OpenAI's embedding API independently of whatever primary model/provider you selected for the session. That means you can keep using login-based auth (or any configured provider) for regular completions while supplying an OpenAI key solely for embeddings. Set these variables if you want cloud-quality embeddings:

export OPENAI_API_KEY=sk-...
# Optional overrides:
export RAZOR_OPENAI_EMBEDDINGS_MODEL=text-embedding-3-large
export RAZOR_OPENAI_EMBEDDINGS_URL=https://api.openai.com/v1/embeddings

If no key is provided, or the request fails, Razor automatically falls back to hashed (local) embeddings, so the rest of the evolution workflow (and your chosen auth method) keeps working unchanged.
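
To give a feel for what a hashed local embedding is, the sketch below uses the generic feature-hashing trick: each token is hashed into a fixed-size vector and candidates are compared by cosine similarity. This is a textbook illustration, not Razor's fallback implementation; the vector size, the tokenization, and the reading of the manifest's novelty threshold as a similarity cutoff are all assumptions.

```python
import hashlib
import math


def hashed_embedding(text: str, dims: int = 64) -> list:
    """Embed text by hashing each lowercase token into a signed bucket."""
    vector = [0.0] * dims
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dims
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vector[index] += sign
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector


def cosine(a, b) -> float:
    """Cosine similarity of two vectors that are already unit length (or zero)."""
    return sum(x * y for x, y in zip(a, b))


def is_novel(candidate: str, archive: list, threshold: float = 0.98) -> bool:
    """Assumption: a candidate is novel if no archived text reaches the threshold."""
    embedding = hashed_embedding(candidate)
    return all(cosine(embedding, hashed_embedding(old)) < threshold for old in archive)


archive = ["greedy circle packing with radius sort"]
print(is_novel("greedy circle packing with radius sort", archive))
print(is_novel("simulated annealing over centre coordinates", []))
```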

Execpolicy

See the Execpolicy quickstart to set up rules that govern what commands Razor can execute.

Docs & FAQ

License

This repository is licensed under the Apache-2.0 License.