evolving

v0.1.2

Published

2 months ago

An autonomous CLI that iterates on a codebase toward a user-defined goal.

jingbanz

autonomous agent cli evolution claude llm codegen orchestrator

evolving

An autonomous CLI that iterates on a codebase toward a user-defined goal — one small, safe, measurable change at a time.

Status: 0.1.0 — first public release. Expect rough edges and API changes before 1.0. Feedback and contributions welcome.

Born from the spark of karpathy/autoresearch, evolving takes that same thrilling idea, an autonomous agent loop relentlessly chasing a goal, and unleashes it on your codebase. Imagine a tireless collaborator that wakes up every cycle, proposes a smart change, proves it works, and keeps only what makes your project better. That's evolving.

What it does

You point evolving at a git repository and a goal — "improve test coverage", "reduce duplication", "clean up TODOs" — and it iterates. Each cycle runs a fixed pipeline:

Scout → Worker → Gatekeeper
 plan    edit     decide

The Scout reads the repo and proposes one small change. The Worker implements it. The Gatekeeper runs the test suite, evaluates the diff against success criteria, and decides keep or discard. Every cycle ends as keep, discard, or idle (nothing to do); a phase failure is recorded as a discard too. Only keep produces a git commit; everything else leaves no trace.
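The pipeline and its three outcomes can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not evolving's actual orchestrator: the `Phases` interface, `runCycle`, and every method name are hypothetical, and the real phases are subprocess calls rather than synchronous functions.

```typescript
// Hypothetical names throughout; this only mirrors the outcomes described above.
type Outcome = "keep" | "discard" | "idle";

interface Plan { summary: string }
interface Diff { patch: string }

interface Phases {
  scout(goal: string): Plan | null;           // null means there is nothing to do
  worker(plan: Plan): Diff;                   // may throw if the phase fails
  gatekeeper(diff: Diff): boolean;            // true means keep
  commit(diff: Diff, message: string): void;  // the single commit of a kept cycle
  reset(): void;                              // drop uncommitted changes
}

function runCycle(goal: string, phases: Phases): Outcome {
  try {
    const plan = phases.scout(goal);
    if (plan === null) return "idle";

    const diff = phases.worker(plan);
    if (!phases.gatekeeper(diff)) {
      phases.reset();
      return "discard";
    }

    phases.commit(diff, plan.summary);
    return "keep";
  } catch {
    // A failure in any phase is recorded as a discard.
    phases.reset();
    return "discard";
  }
}
```

Only the `keep` path reaches `commit`, which is what keeps discarded and idle cycles out of the branch history in this sketch.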

A separate Retrospector runs off-pipeline over the run log and proposes edits to the agent prompts — humans evolve the prompts, agents evolve the project.

Safety is part of the design, not a layer on top:

Worktree isolation. Each run executes in a dedicated git worktree on a throwaway branch. The agents cannot see or modify your working copy.
Process sandboxing. Every spawned subprocess runs through a kernel-level sandbox (bubblewrap on Linux, sandbox-exec on macOS) with per-role filesystem and network policies — agents and the orchestrator's own test/setup/metric commands alike, since those run agent-written code. Orchestrator credentials are never passed into the sandbox env.
Schema-validated outputs. Every agent response must satisfy a Zod schema. If validation fails, the runtime gets one retry; a second failure fails the cycle (recorded as a discard).
Atomic commits. A kept cycle is exactly one commit. Discarded cycles touch no history.
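The validate-then-retry-once rule for agent outputs might look roughly like the sketch below. It is an illustration only: `askWithOneRetry`, `validatePlan`, and the `summary` field are hypothetical, the hand-written check stands in for a Zod schema so the example has no dependencies, and passing the validation error back to the agent as feedback is an assumption rather than documented behaviour.

```typescript
// Hypothetical result type; a real implementation would likely wrap a Zod schema check.
type Validated<T> =
  | { kind: "ok"; value: T }
  | { kind: "invalid"; error: string };

// Ask the agent, validate the reply, and allow exactly one retry.
function askWithOneRetry<T>(
  ask: (feedback?: string) => string,          // returns the agent's raw reply
  validate: (raw: string) => Validated<T>,
): Validated<T> {
  const first = validate(ask());
  if (first.kind === "ok") return first;
  // Second and final attempt (assumption: the error is fed back to the agent).
  return validate(ask(first.error));
}

// Stand-in validator for a reply shaped like {"summary": "..."}.
function validatePlan(raw: string): Validated<{ summary: string }> {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { kind: "invalid", error: "reply is not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { kind: "invalid", error: "reply is not a JSON object" };
  }
  const summary = (data as Record<string, unknown>).summary;
  if (typeof summary !== "string") {
    return { kind: "invalid", error: "missing string field: summary" };
  }
  return { kind: "ok", value: { summary } };
}
```

A caller would treat an `invalid` result from `askWithOneRetry` as a failed phase, so the cycle ends as a discard.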

Install

npm install -g evolving
# or
bun add -g evolving

Prerequisites

Node.js ≥ 20.
A backend CLI, one of:
- claude — Anthropic's Claude Code CLI. No API key needed; uses your Claude subscription. Default.
- codex — OpenAI's Codex CLI. Run codex login once; evolving doesn't capture or persist your credentials.
A sandbox runtime, by platform:
- Linux: bubblewrap and socat (apt install bubblewrap socat on Debian/Ubuntu). On Ubuntu 24.04+, also enable unprivileged user namespaces: sudo sysctl -w kernel.apparmor_restrict_unprivileged_userns=0.
- macOS: built-in sandbox-exec. No install step.
- Windows: not supported. Use WSL2.
ripgrep (apt install ripgrep / brew install ripgrep) — used by the search tool.

Quickstart

From inside any git repository:

# 1. One-time interactive setup. Picks a backend + model, captures any
#    API keys to .env (gitignored), scaffolds .evolving/config.json
#    and per-role agent prompts.
evolving init

# 2. Run cycles toward a goal. Each cycle is at most a single commit.
evolving run --goal "improve test coverage of the parser"

# 3. See what landed.
evolving history

evolving run continues looping until you stop it (Ctrl-C) or pass --once. Kept commits land on a per-run branch like evolving/<runId>; merge them back when you're happy:

evolving merge --run <runId>

CLI surface

| Command | Purpose |
| --- | --- |
| evolving init | Interactive setup: backend, model, API key, prompts, config. Run once per repo. |
| evolving run | Run the Scout → Worker → Gatekeeper loop. Flags: --goal, --once, --label. |
| evolving list | List active and historical runs in this repo. |
| evolving history | Show kept commits and cycle outcomes for a run. |
| evolving merge | Merge a run's branch back into your base branch. Flags: --run, --into, --base. |
| evolving pr | Open a pull request for a run. |
| evolving retrospect | Run the Retrospector over a run's log; proposes prompt edits without modifying code. |
| evolving label | Tag a cycle or run for later filtering. |
| evolving sandbox-violations | Show sandbox deny events for a run. Useful when debugging stuck cycles. |
| evolving uninstall | Remove evolving's state (.evolving/) from the current repo. |

evolving --help lists all flags.

Runtime backends

| Backend | Auth | Notes |
| --- | --- | --- |
| claude-code (default) | Uses your claude CLI subscription. No API key needed. | Recommended for most users. Each phase spawns a fresh claude subprocess with role-scoped sandbox + network policies. |
| codex | Run codex login once; evolving never sees the token. | Drives codex exec --json. Uses Codex's built-in tools and OS sandbox. |

Pick the backend during evolving init. To switch later, edit .evolving/config.json, then re-run evolving init if your model or API key also changed.

Safety model

The point isn't that your repo is untrusted — it's yours. It's that the Worker rewrites it every cycle, unattended, and the orchestrator runs that freshly-written code before you've reviewed it. Two independent layers confine that:

Worktree partition. Every run gets its own git worktree on a fresh evolving/<runId> branch with gc.auto = 0. Your main worktree is never touched, and a discarded cycle leaves no trace.
Per-role process sandbox. Every spawned subprocess runs through a kernel-level sandbox — agents and the orchestrator's own test / setup / metric commands alike, so agent-authored code stays worktree-confined no matter which process runs it. Your .env is never forwarded into the sandbox env.
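One common way to keep credentials out of a child process is to build its environment from an explicit allowlist instead of copying the parent's. The sketch below illustrates that idea only; `sandboxEnv` and the `PASS_THROUGH` list are hypothetical, and the variables evolving actually forwards are not documented here.

```typescript
// Hypothetical allowlist: only these variables are copied into the sandboxed process.
const PASS_THROUGH = ["PATH", "HOME", "LANG", "TERM"];

// Build the child's environment from scratch so that anything not listed
// (API keys, tokens, values loaded from .env) is left behind.
function sandboxEnv(
  parent: Record<string, string | undefined>,
): Record<string, string> {
  const child: Record<string, string> = {};
  for (const name of PASS_THROUGH) {
    const value = parent[name];
    if (value !== undefined) child[name] = value;
  }
  return child;
}
```

In Node.js the result would be passed as the `env` option of `child_process.spawn`, which replaces the inherited environment rather than extending it.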

Sandboxing is defense-in-depth, not a guarantee — bypasses exist. The threat model, the full deny/allow rules, and per-platform enforcement live in wiki/sandbox.md.

Documentation

The wiki is the authoritative spec — read it before contributing or filing a non-trivial issue.

wiki/architecture.md — vision, cycle loop, agent roles, safety model, CLI surface.
wiki/specification.md — schemas, orchestrator pseudocode, config shape, per-agent prompt contracts.
wiki/runtime.md — how each backend runs a phase: invocation shape, tool wiring, tripwires.
wiki/sandbox.md — two-layer isolation model; per-platform enforcement.
wiki/status.md — what is built right now and what is not. The entry point for picking the project up mid-stream.
wiki/decisions/ — ADRs documenting non-obvious design choices.

Contributing

See CONTRIBUTING.md for setup, the bun run gate validation flow, code style, and the ADR process. Security reports: SECURITY.md.

License

Apache-2.0 — see LICENSE.
