evolving
v0.1.2
An autonomous CLI that iterates on a codebase toward a user-defined goal — one small, safe, measurable change at a time.
Status: 0.1.0 — first public release. Expect rough edges and API changes before 1.0. Feedback and contributions welcome.
Born from the spark of karpathy/autoresearch, evolving takes that same thrilling idea, an autonomous agent loop relentlessly chasing a goal, and unleashes it on your codebase. Imagine a tireless collaborator that wakes up every cycle, proposes a smart change, proves it works, and keeps only what makes your project better. That's evolving.
What it does
You point evolving at a git repository and a goal — "improve test coverage", "reduce duplication", "clean up TODOs" — and it iterates. Each cycle runs a fixed pipeline:

    Scout → Worker → Gatekeeper
    plan    edit     decide

The Scout reads the repo and proposes one small change. The Worker implements it. The Gatekeeper runs the test suite, evaluates the diff against success criteria, and decides keep or discard. Every cycle ends as keep, discard, or idle (nothing to do); a phase failure is recorded as a discard too. Only keep produces a git commit; everything else leaves no trace.
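The control flow of a single cycle can be sketched as below. This is a minimal illustration of the keep / discard / idle rules described above, not evolving's actual orchestrator: the `Phases` interface, `attempt`, `runCycle`, and `reset` are hypothetical names, and the phases are synchronous stand-ins for what would really be sandboxed subprocess calls.

```typescript
// Hypothetical sketch of one cycle. All names here are illustrative assumptions.
type Outcome = "keep" | "discard" | "idle";

interface Proposal {
  summary: string;
}

interface Phases {
  scout(): Proposal | null; // null means there is nothing to do
  worker(proposal: Proposal): void; // edits files in the worktree
  gatekeeper(proposal: Proposal): boolean; // true means keep
  commit(message: string): void; // creates exactly one commit
  reset(): void; // throws away uncommitted edits
}

// Runs the pipeline once. Any phase may throw; the caller treats that as a discard.
function attempt(phases: Phases): Outcome {
  const proposal = phases.scout();
  if (proposal === null) return "idle";
  phases.worker(proposal);
  if (!phases.gatekeeper(proposal)) return "discard";
  phases.commit(proposal.summary);
  return "keep";
}

function runCycle(phases: Phases): Outcome {
  let outcome: Outcome;
  try {
    outcome = attempt(phases);
  } catch {
    // A phase failure is recorded as a discard too.
    outcome = "discard";
  }
  // Only "keep" leaves a commit behind; a discard is rolled back.
  if (outcome === "discard") phases.reset();
  return outcome;
}
```

Injecting the phases as plain functions keeps the outcome rules testable without spawning any agent.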
A separate Retrospector runs off-pipeline over the run log and proposes edits to the agent prompts — humans evolve the prompts, agents evolve the project.
Safety is part of the design, not a layer on top:
- Worktree isolation. Each run executes in a dedicated git worktree on a throwaway branch. The agents cannot see or modify your working copy.
- Process sandboxing. Every spawned subprocess runs through a kernel-level sandbox (bubblewrap on Linux, sandbox-exec on macOS) with per-role filesystem and network policies. This covers agents and the orchestrator's own test/setup/metric commands alike, since those run agent-written code. Orchestrator credentials are never passed into the sandbox env.
- Schema-validated outputs. Every agent response must satisfy a Zod schema; the runtime gets one retry, then fails the cycle (recorded as a discard).
- Atomic commits. A kept cycle is exactly one commit. Discarded cycles touch no history.
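The "validate, retry once, then fail" rule for agent responses might look roughly like the sketch below. The project uses Zod for its schemas; to stay dependency-free this example swaps in a hand-written validator instead. The `ScoutProposal` shape, `parseProposal`, and `askWithRetry` are hypothetical, and passing the validation error back to the agent as feedback is an assumption, not documented behaviour.

```typescript
// Hypothetical response shape. The real schemas are defined with Zod in the project.
interface ScoutProposal {
  title: string;
  files: string[];
}

type Parsed<T> = { ok: true; value: T } | { ok: false; error: string };

// Hand-rolled stand-in for a schema check (assumption: a Zod schema does this in practice).
function parseProposal(raw: string): Parsed<ScoutProposal> {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "response is not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, error: "expected a JSON object" };
  }
  const { title, files } = data as Record<string, unknown>;
  if (typeof title !== "string" || title.length === 0) {
    return { ok: false, error: "title must be a non-empty string" };
  }
  if (!Array.isArray(files) || !files.every((f): f is string => typeof f === "string")) {
    return { ok: false, error: "files must be an array of strings" };
  }
  return { ok: true, value: { title, files } };
}

// Asks the agent up to maxAttempts times (first try plus one retry by default).
// A final failure is returned to the caller, which would record the cycle as a discard.
function askWithRetry<T>(
  ask: (feedback?: string) => string,
  parse: (raw: string) => Parsed<T>,
  maxAttempts = 2,
): Parsed<T> {
  let feedback: string | undefined;
  let last: Parsed<T> = { ok: false, error: "no attempts made" };
  for (let i = 0; i < maxAttempts; i++) {
    last = parse(ask(feedback));
    if (last.ok) return last;
    feedback = last.error;
  }
  return last;
}
```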
Install

    npm install -g evolving
    # or
    bun add -g evolving

Prerequisites
- Node.js ≥ 20.
- A backend CLI, one of: claude (for the claude-code backend) or codex (for the codex backend). See Runtime backends.
- A sandbox runtime, by platform:
  - Linux: bubblewrap and socat (apt install bubblewrap socat on Debian/Ubuntu). On Ubuntu 24.04+, also enable unprivileged user namespaces: sudo sysctl -w kernel.apparmor_restrict_unprivileged_userns=0.
  - macOS: built-in sandbox-exec. No install step.
  - Windows: not supported. Use WSL2.
- ripgrep (apt install ripgrep / brew install ripgrep), used by the search tool.
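A preflight check for these prerequisites could be sketched as follows. This is an illustration only and not part of evolving: `requiredTools`, `missingTools`, and `nodeIsSupported` are hypothetical helpers. The binary names assume the usual ones, `bwrap` for bubblewrap and `rg` for ripgrep, and the platform strings follow Node's `process.platform` values.

```typescript
// Hypothetical preflight helpers. Names and structure are assumptions for illustration.

// Executables expected on PATH for each supported platform.
function requiredTools(platform: string): string[] {
  switch (platform) {
    case "linux":
      return ["bwrap", "socat", "rg"];
    case "darwin":
      return ["sandbox-exec", "rg"];
    default:
      // Windows is not supported natively; WSL2 reports itself as "linux".
      throw new Error(`unsupported platform: ${platform}`);
  }
}

// Returns the tools that the injected PATH lookup could not find.
function missingTools(platform: string, isOnPath: (tool: string) => boolean): string[] {
  return requiredTools(platform).filter((tool) => !isOnPath(tool));
}

// Checks a version string such as "v20.11.0" against the minimum major version.
function nodeIsSupported(version: string, minMajor = 20): boolean {
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0] ?? "", 10);
  return Number.isInteger(major) && major >= minMajor;
}
```

In a real script, `isOnPath` would wrap a PATH lookup and `nodeIsSupported` would be called with `process.version`.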
Quickstart
From inside any git repository:

    # 1. One-time interactive setup. Picks a backend + model, captures any
    #    API keys to .env (gitignored), scaffolds .evolving/config.json
    #    and per-role agent prompts.
    evolving init

    # 2. Run cycles toward a goal. Each cycle is at most a single commit.
    evolving run --goal "improve test coverage of the parser"

    # 3. See what landed.
    evolving history

evolving run continues looping until you stop it (Ctrl-C) or pass --once. Kept commits land on a per-run branch like evolving/<runId>; merge them back when you're happy:

    evolving merge --run <runId>

CLI surface
| Command | Purpose |
| --- | --- |
| evolving init | Interactive setup: backend, model, API key, prompts, config. Run once per repo. |
| evolving run | Run the Scout → Worker → Gatekeeper loop. Flags: --goal, --once, --label. |
| evolving list | List active and historical runs in this repo. |
| evolving history | Show kept commits and cycle outcomes for a run. |
| evolving merge | Merge a run's branch back into your base branch. Flags: --run, --into, --base. |
| evolving pr | Open a pull request for a run. |
| evolving retrospect | Run the Retrospector over a run's log; proposes prompt edits without modifying code. |
| evolving label | Tag a cycle or run for later filtering. |
| evolving sandbox-violations | Show sandbox deny events for a run. Useful when debugging stuck cycles. |
| evolving uninstall | Remove evolving's state (.evolving/) from the current repo. |
evolving --help lists all flags.
Runtime backends
| Backend | Auth | Notes |
| --- | --- | --- |
| claude-code (default) | Uses your claude CLI subscription. No API key needed. | Recommended for most users. Each phase spawns a fresh claude subprocess with role-scoped sandbox + network policies. |
| codex | Run codex login once; evolving never sees the token. | Drives codex exec --json. Uses Codex's built-in tools and OS sandbox. |
Pick the backend at evolving init. To switch later, edit .evolving/config.json and re-run init if your model/key changed.
Safety model
The point isn't that your repo is untrusted — it's yours. It's that the Worker rewrites it every cycle, unattended, and the orchestrator runs that freshly-written code before you've reviewed it. Two independent layers confine that:
- Worktree partition. Every run gets its own git worktree on a fresh evolving/<runId> branch with gc.auto = 0. Your main worktree is never touched, and a discarded cycle leaves no trace.
- Per-role process sandbox. Every spawned subprocess runs through a kernel-level sandbox, agents and the orchestrator's own test / setup / metric commands alike, so agent-authored code stays worktree-confined no matter which process runs it. Your .env is never forwarded into the sandbox env.
Sandboxing is defense-in-depth, not a guarantee — bypasses exist. The threat model, the full deny/allow rules, and per-platform enforcement live in wiki/sandbox.md.
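To make the per-role idea concrete, here is a rough Linux-only sketch of how a role policy could be turned into a bubblewrap command line with a scrubbed environment. It is not evolving's real rule set, which lives in wiki/sandbox.md: the `RolePolicy` fields, the mounted paths, and the function names are assumptions. The bwrap options themselves (`--ro-bind`, `--bind`, `--symlink`, `--proc`, `--dev`, `--tmpfs`, `--chdir`, `--clearenv`, `--setenv`, `--unshare-net`, `--die-with-parent`) are standard bubblewrap flags.

```typescript
// Hypothetical policy shape. The real per-role rules are documented in wiki/sandbox.md.
interface RolePolicy {
  worktreeWritable: boolean; // may this role modify the worktree?
  network: boolean; // may this role reach the network?
}

// Copies only allowlisted variables, so credentials never reach the sandbox env.
function sandboxEnv(
  parent: Record<string, string | undefined>,
  allow: readonly string[],
): Record<string, string> {
  const env: Record<string, string> = {};
  for (const name of allow) {
    const value = parent[name];
    if (value !== undefined) env[name] = value;
  }
  return env;
}

// Builds the argument list for `bwrap`. Nothing is executed here.
function bwrapArgs(
  worktree: string,
  policy: RolePolicy,
  env: Record<string, string>,
  command: readonly string[],
): string[] {
  const args = [
    "--die-with-parent",
    "--ro-bind", "/usr", "/usr", // system binaries and libraries, read-only
    "--symlink", "usr/bin", "/bin",
    "--symlink", "usr/lib", "/lib",
    "--proc", "/proc",
    "--dev", "/dev",
    "--tmpfs", "/tmp",
    policy.worktreeWritable ? "--bind" : "--ro-bind", worktree, worktree,
    "--chdir", worktree,
    "--clearenv", // start from an empty environment
  ];
  for (const [name, value] of Object.entries(env)) {
    args.push("--setenv", name, value);
  }
  if (!policy.network) args.push("--unshare-net");
  return [...args, ...command];
}
```

A caller would pass the result to a process spawner with `bwrap` as the executable. On macOS the equivalent confinement goes through sandbox-exec, which this sketch does not cover.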
Documentation
The wiki is the authoritative spec — read it before contributing or filing a non-trivial issue.
- wiki/architecture.md: vision, cycle loop, agent roles, safety model, CLI surface.
- wiki/specification.md: schemas, orchestrator pseudocode, config shape, per-agent prompt contracts.
- wiki/runtime.md: how each backend runs a phase (invocation shape, tool wiring, tripwires).
- wiki/sandbox.md: two-layer isolation model; per-platform enforcement.
- wiki/status.md: what is built right now and what is not. The entry point for picking the project up mid-stream.
- wiki/decisions/: ADRs documenting non-obvious design choices.
Contributing
See CONTRIBUTING.md for setup, the bun run gate validation flow, code style, and the ADR process. Security reports: SECURITY.md.
License
Apache-2.0 — see LICENSE.
