@jhlee0619/codexloop
v0.1.4
CodexLoop — iterative improvement loop that drives OpenAI Codex as a multi-role critic (evaluate → suggest → rank → apply → validate → record).
CodexLoop
Iterative improvement loop plugin that drives OpenAI Codex as a multi-role critic. Ships as a single repo that installs as both a Claude Code plugin (via
/plugin marketplace) and a Codex CLI plugin (via the Codex marketplace), plus a standalone cloop shell binary for plain terminals and CI.
- Repository: https://github.com/jhlee0619/CodexLoop
- npm: @jhlee0619/codexloop (installs the cloop binary globally)
- Status: MVP v0.1.0, with six slash commands, background workers, and deterministic convergence. See docs/future-work.md for the roadmap.
What it does
Every iteration strictly follows six steps:
evaluate → suggest → rank → apply → validate → record
Codex is used as a multi-role critic and improver (reviewer, adversarial
critic, solution generator, refactor advisor, test designer). Every
iteration is structurally forced to produce at least two proposals. A
single "judge" Codex call ranks them on six dimensions (correctness,
requirement satisfaction, simplicity, maintainability, riskInverse,
testability), and the runtime deterministically re-computes the winner
from those scores. The runtime then applies the winner with git apply +
git commit, runs your configured test/lint/type commands, and records
the full forensic trace under <target-repo>/.loop/.
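The deterministic re-ranking step described above can be sketched roughly as follows. This is a minimal illustration, not the code in rank.mjs: the equal weights, the camelCase dimension keys, the proposal shape, and the alphabetical tiebreaker are all assumptions made for the example.

```javascript
// Hypothetical sketch of deterministic winner re-computation.
// Dimension keys and equal weights are assumptions, not CodexLoop's real values.
const DIMENSIONS = [
  "correctness",
  "requirementSatisfaction",
  "simplicity",
  "maintainability",
  "riskInverse",
  "testability",
];

const WEIGHTS = Object.fromEntries(
  DIMENSIONS.map((d) => [d, 1 / DIMENSIONS.length])
);

// Weighted sum over the six raw dimension scores (each assumed to be in [0, 1]).
function weightedScore(scores) {
  return DIMENSIONS.reduce((sum, d) => sum + WEIGHTS[d] * scores[d], 0);
}

// Re-rank from raw scores and report whether the judge's own pick was overridden.
function recomputeWinner(proposals, judgePick) {
  if (proposals.length < 2) {
    throw new Error("each iteration must produce at least two proposals");
  }
  const ranked = proposals
    .map((p) => ({ id: p.id, score: weightedScore(p.scores) }))
    // Highest score first; ties broken by id so the result is reproducible.
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
  const winner = ranked[0];
  return {
    winner: winner.id,
    score: winner.score,
    overrodeJudge: winner.id !== judgePick,
  };
}
```

Because the winner depends only on the raw scores, re-running the same ranking input always yields the same pick, even if the judge's own arithmetic was off.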
Loops terminate on: goal-met, negligible-improvement (convergence), plateau, regression / divergence, or a budget limit (iterations / time / codex calls).
Opinionated defaults
- Model: gpt-5.4
- Reasoning effort: xhigh
These are forwarded to codex exec via --model gpt-5.4 -c
model_reasoning_effort=xhigh and applied regardless of your
~/.codex/config.toml, so loops are reproducible across machines and
config changes. Override per loop with /cloop:start --model <m>
--effort <level> or permanently for the active loop with /cloop:model
<m> --effort <level>.
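As a rough illustration of how the defaults and per-loop overrides could turn into codex exec arguments, here is a small sketch. The helper name codexExecArgs and the overrides object are hypothetical; only the --model and -c model_reasoning_effort=... flags come from the text above.

```javascript
// Hypothetical helper: builds the argument list passed to the codex binary.
const DEFAULTS = { model: "gpt-5.4", effort: "xhigh" };

function codexExecArgs(overrides = {}) {
  // Fall back to the defaults when an override is missing or undefined.
  const model = overrides.model ?? DEFAULTS.model;
  const effort = overrides.effort ?? DEFAULTS.effort;
  return ["exec", "--model", model, "-c", `model_reasoning_effort=${effort}`];
}

console.log(codexExecArgs().join(" "));
```

Passing the values explicitly on every call is what makes the result independent of ~/.codex/config.toml.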
Interview + approval (Claude Code and Codex CLI)
When you invoke CodexLoop through an LLM orchestrator (Claude Code or Codex CLI), the plugin runs a five-phase interview before launching the runtime:
- Pre-flight: git repo check, clean working tree, codex/cloop availability
- Goal capture + clarification: rewrite vague input into one concrete immutable sentence
- Plan assembly: infer test/lint/type commands from package.json/Makefile/pyproject.toml, confirm budget, choose --wait vs --background
- Approval: show the full plan and wait for explicit go/no-go
- Invoke: call cloop start … with every confirmed flag
Direct shell callers (cloop start … in a plain terminal or CI job) skip
the interview and take flags as-is, so CodexLoop remains deterministic
and scriptable.
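The plan-assembly phase is carried out by the LLM orchestrator, so there is no fixed algorithm to show. Purely as an illustration of the kind of inference involved, the sketch below derives candidate commands from an already-parsed package.json. The function name, the returned shape, and the script names it looks for are all assumptions.

```javascript
// Hypothetical sketch: suggest test/lint/type commands from package.json scripts.
function inferCommands(pkg) {
  const scripts = pkg?.scripts ?? {};
  // Return "npm run <name>" for the first script name that exists, else null.
  const pick = (...names) => {
    const hit = names.find((n) => typeof scripts[n] === "string");
    return hit ? `npm run ${hit}` : null;
  };
  return {
    testCmd: pick("test"),
    lintCmd: pick("lint"),
    typeCmd: pick("typecheck", "type-check"), // assumed script names
  };
}

console.log(inferCommands({ scripts: { test: "node --test tests/", lint: "eslint ." } }));
```

Any candidate produced this way is only a suggestion; the approval phase still asks for confirmation before anything runs.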
CodexLoop is a peer of the openai-codex plugin, not a replacement.
It uses the same codex CLI binary and expects you to have run
/codex:setup once (or npm install -g @openai/codex).
Requirements
- Node.js 20 or newer.
- git 2.20+ with a clean working tree in the target repository.
- codex CLI installed and authenticated.
Installation
CodexLoop installs on three platforms from the same repo. Pick whichever matches your workflow — they compose freely and do not conflict.
1. One-line install script (universal — recommended)
Handles Claude Code + Codex CLI + plain shell in one go:
curl -fsSL https://raw.githubusercontent.com/jhlee0619/CodexLoop/main/install.sh | sh
The script:
- Downloads the latest main tarball from GitHub into ~/.local/share/cloop/
- Symlinks ~/.local/bin/cloop to the shell wrapper (plain-shell + Codex CLI + CI)
- Symlinks ~/.claude/plugins/cloop/local/0.1.0/ to the install dir if ~/.claude/plugins/ exists (Claude Code plugin)
- Symlinks ~/plugins/cloop to the install dir and adds a cloop entry to ~/.agents/plugins/marketplace.json if codex is on $PATH (Codex CLI plugin)
Override paths with env vars (CLOOP_REPO, CLOOP_REF, CLOOP_INSTALL_DIR,
CLOOP_BIN_DIR). The script is idempotent and safe to re-run for updates.
2. npm global install (shell binary only)
If you only need the cloop shell wrapper (for CI, Codex CLI, or plain
terminals) and do not want the Claude Code plugin:
npm install -g @jhlee0619/codexloop
cloop --help
This installs bin/cloop onto your $PATH and carries the runtime
scripts + prompts + schemas along with it. It does NOT register with
Claude Code; use path 1 or path 3 for that.
Why @jhlee0619/codexloop and not @jhlee0619/CodexLoop? npm requires all package names to be lowercase; scoped names cannot contain capital letters. The package name is the closest lowercase match to the GitHub repo name.
3. Claude Code plugin marketplace
If you only want /cloop:* slash commands inside Claude Code:
/plugin marketplace add jhlee0619/CodexLoop
/plugin install cloop@cloop
Claude Code clones github.com/jhlee0619/CodexLoop into
~/.claude/plugins/marketplaces/cloop/, reads
.claude-plugin/marketplace.json, and installs the single cloop plugin
it lists. The six slash commands appear under /help immediately.
Update later with:
/plugin marketplace update cloop
/plugin install cloop@cloop
4. Codex CLI plugin (manual)
If you want the Codex CLI plugin without running the one-line installer:
# Clone or symlink into Codex's home-local plugin dir:
git clone https://github.com/jhlee0619/CodexLoop.git ~/plugins/cloop
# or, if you already have the repo somewhere:
# ln -s /path/to/CodexLoop ~/plugins/cloop
# Add cloop to your home-local Codex marketplace:
mkdir -p ~/.agents/plugins
cat > ~/.agents/plugins/marketplace.json <<'JSON'
{
"name": "home-local",
"interface": { "displayName": "Home-local plugins" },
"plugins": [
{
"name": "cloop",
"source": { "source": "local", "path": "./plugins/cloop" },
"policy": { "installation": "AVAILABLE", "authentication": "NONE" },
"category": "Automation"
}
]
}
JSON
Restart any active Codex session. Codex will auto-load
skills/cloop/SKILL.md, which activates the skill only when you name
cloop by keyword (see "Codex activation gating" below).
5. Manual clone + symlink (for plugin development)
If you are hacking on CodexLoop itself:
git clone https://github.com/jhlee0619/CodexLoop.git /path/to/CodexLoop
ln -s /path/to/CodexLoop ~/.claude/plugins/cloop/local/0.1.0
ln -s /path/to/CodexLoop ~/plugins/cloop
ln -s /path/to/CodexLoop/bin/cloop ~/.local/bin/cloop
Edits under /path/to/CodexLoop/ take effect immediately because the
plugin caches are symlinks.
Quickstart (Claude Code)
/codex:setup # one-time Codex install + auth (from openai-codex plugin)
# inside the target repo, with a clean working tree — just say what you want:
/cloop:start fix the failing add tests
Claude Code walks you through:
- Goal: reads TASK.md / GOAL.md / PRD.md / AGENTS.md if present, or asks you directly; rewrites your wording into one concrete sentence and confirms.
- Plan: infers plausible --test-cmd / --lint-cmd / --type-cmd candidates from package.json / Makefile / pyproject.toml / go.mod; confirms the budget (iterations, time); defaults the model to gpt-5.4 + xhigh.
- Approval: shows the full plan and asks "Start loop with this plan?" before launching anything. You can edit the goal or plan details without re-typing.
- Run: records the current HEAD as seedCommit, adds .loop/ to .gitignore, and drives the six-step iteration cycle until the verdict is goal-met AND validation exits 0, OR the loop converges, OR a budget limit fires.
Then check /cloop:result --diff for the final state.
Power users can bypass the approval dialog by supplying every flag up front and adding --yes:
/cloop:start --wait --yes \
--goal "fix the failing add test" \
--max-iter 3 \
--test-cmd "node --test tests/" \
--model gpt-5.4 --effort xhigh
Commands
| Command | What it does |
|---|---|
| /cloop:start | Start a new loop. Runs the five-phase interview + approval first (skip with --yes), then launches the runtime. --wait for foreground, --background for OS-detached. |
| /cloop:iterate | Run exactly one iteration synchronously. --model <m> / --effort <e> override the loop's stored values for this iteration only. |
| /cloop:status | Show loop state, iteration history, budget consumption, convergence metrics. |
| /cloop:stop | SIGTERM a running background worker (graceful, waits 60 s). Use --force for SIGKILL. |
| /cloop:result | Dump the full iteration history with ranking breakdowns and validation output. |
| /cloop:model | Show or change the Codex model + reasoning effort used by the active loop. Defaults are gpt-5.4 / xhigh. |
See docs/command-spec.md for every flag on every command.
--wait vs --background
| Flag | Mode | Use when |
|---|---|---|
| --wait | Foreground — cloop start blocks the tool session until the loop terminates. Claude streams every iteration transcript inline. | Short loops, --dry-run, watching reasoning live, budget sanity checks. |
| --background | OS-detached — runtime spawns a detached Node worker via spawnDetached({ detached: true, stdio: "ignore" }).unref(), writes .loop/loop.pid, and exits. The worker runs independently of Claude Code; closing the session does NOT kill it. | Real multi-minute loops. Poll progress with /cloop:status, stop with /cloop:stop. |
Never pass both flags. If you pass neither, the interview recommends one
based on --max-iter and --dry-run.
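The --background behavior in the table relies on Node's standard detached-spawn pattern. The sketch below shows that pattern in isolation; the helper name startBackgroundWorker and its parameters are hypothetical, and the real runtime's worker launch may differ in its details.

```javascript
import { spawn } from "node:child_process";
import { mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical helper: launch a Node worker that outlives the calling process
// and record its PID under the loop directory.
function startBackgroundWorker(workerScript, loopDir, args = []) {
  mkdirSync(loopDir, { recursive: true });
  const child = spawn(process.execPath, [workerScript, ...args], {
    detached: true, // run the worker in its own process group / session
    stdio: "ignore", // no pipes that would tie the worker to the parent
  });
  child.unref(); // allow the parent's event loop to exit without waiting
  writeFileSync(join(loopDir, "loop.pid"), `${child.pid}\n`);
  return child.pid;
}
```

With detached: true, stdio: "ignore", and unref() combined, the parent can exit right after writing the PID file, which is why closing the session does not kill the worker.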
Codex activation gating
Codex auto-loads skills/cloop/SKILL.md on session start, but the skill's
description field gates activation on an explicit keyword match. The
skill only runs when your request contains one of these tokens
(case-insensitive):
- cloop
- codexloop
- CodexLoop
- CODEXLOOP
Activation examples:
| What you say to Codex | Skill activates? |
|---|---|
| "Use cloop to fix the failing auth tests" | yes |
| "codexloop this failing build" | yes |
| "Start a CodexLoop on src/add.js" | yes |
| "CODEXLOOP: fix the broken adds function" | yes |
| "iteratively fix the auth tests" | no (no cloop keyword) |
| "keep trying until the tests pass" | no (no cloop keyword) |
| "loop on this until it works" | no (no cloop keyword) |
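The gating itself is performed by Codex reading the skill's description field, not by code. As an approximation only, a case-insensitive keyword match such as the one below reproduces the outcomes in the table above. The regular expression and the function name are assumptions for illustration.

```javascript
// Hypothetical approximation of the keyword gate; the real decision is made
// by Codex from the skill description, not by this regex.
const CLOOP_KEYWORD = /\b(?:cloop|codexloop)\b/i;

function activatesCloop(request) {
  return CLOOP_KEYWORD.test(request);
}

console.log(activatesCloop("Use cloop to fix the failing auth tests"));
console.log(activatesCloop("loop on this until it works"));
```

The word boundaries keep a plain "loop" from matching, which is consistent with the last three rows of the table.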
When the skill activates, it walks through the same five-phase interview
(pre-flight → goal → plan → approval → invoke) that /cloop:start does
in Claude Code, and only then invokes cloop start … as a shell command.
Defaults (gpt-5.4, xhigh) and the immutable goal hash apply
identically on both platforms.
Codex calls cloop as a regular shell command; because CodexLoop spawns
its own codex exec children, the running Codex session is used only to
orchestrate the wrapper — each loop iteration is an independent,
ephemeral Codex invocation, so the outer Codex session's context is
never burned by the loop.
Using cloop directly from a shell (no LLM)
When the cloop wrapper is on $PATH (via any install path above), you
can drive it directly from any terminal or script. There is no
interview on this path — the runtime takes your flags verbatim, so it
is safe for CI, scripted automation, and power-user shell usage.
# Full-flag form:
cloop start --wait --yes \
--goal "fix the failing add test" \
--test-cmd "node --test tests/" \
--max-iter 3
# Positional shortcut — the goal is the free-form text:
cloop start --wait --yes "fix the failing add test" \
--test-cmd "node --test tests/" --max-iter 3
cloop status
cloop result --diff
cloop stop
The interview only runs when an LLM orchestration layer (Claude via
/cloop:start, or Codex via the skill) sits above cloop. In a plain
shell, cloop is deterministic and trusts its flags.
Safety by default
- Clean working tree required. Every iteration starts from a clean state so git reset --hard is an unambiguous rollback path.
- Atomic state writes. Every write to .loop/state.json uses temp-file-plus-rename so a crash mid-write never corrupts the canonical state.
- Reward-hacking guards. The runtime rejects patches that delete test files, add .skip / xit / .only in test files, or modify tests without a substantive justification. Both the suggest and rank prompts forbid these patterns, and apply.mjs hard-rejects them if they slip through.
- Deterministic ranking. The judge's weighted sum is never trusted blindly: rank.mjs::recomputeWinner re-computes from the raw dimensional scores and uses the runtime's pick if the two disagree.
- Budget caps. --max-iter, --max-time, and --max-calls are enforced before every iteration starts (including dry-run).
- Goal immutability. state.goal.goalHash = sha256(text + acceptanceCriteria + testCmd + lintCmd + typeCmd) is asserted on every iteration. Any drift fails loudly.
- Single loop per repo. An advisory lock at .loop/loop.lock (keyed by worker PID) refuses a second concurrent loop in the same repo.
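Two of the guarantees above, atomic state writes and goal immutability, can be sketched in a few lines of Node. This is an illustration under stated assumptions rather than the runtime's actual code: the function names are hypothetical, acceptanceCriteria is treated as a plain string, and missing fields are assumed to hash as empty strings.

```javascript
import { createHash } from "node:crypto";
import { readFileSync, renameSync, writeFileSync } from "node:fs";

// sha256 over the concatenated goal fields, following the formula in the list above.
// Assumption: absent fields contribute an empty string.
function goalHash(goal) {
  const { text, acceptanceCriteria = "", testCmd = "", lintCmd = "", typeCmd = "" } = goal;
  return createHash("sha256")
    .update(text + acceptanceCriteria + testCmd + lintCmd + typeCmd)
    .digest("hex");
}

// Temp-file-plus-rename: readers see either the old file or the new one,
// never a half-written state.json.
function writeStateAtomic(statePath, state) {
  const tmpPath = `${statePath}.${process.pid}.tmp`;
  writeFileSync(tmpPath, JSON.stringify(state, null, 2));
  renameSync(tmpPath, statePath); // atomic when both paths are on the same filesystem
}

// Load the state and fail loudly if the stored goal no longer matches its hash.
function loadState(statePath) {
  const state = JSON.parse(readFileSync(statePath, "utf8"));
  if (goalHash(state.goal) !== state.goal.goalHash) {
    throw new Error("goal drift detected: goalHash mismatch");
  }
  return state;
}
```

The temp file is created next to the target on purpose: a rename across filesystems is not atomic, so a sibling path is the safer choice.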
Documentation
| File | Purpose |
|---|---|
| docs/architecture.md | Design overview, directory layout, Codex invocation pattern |
| docs/iteration-policy.md | The six-step cycle, role definitions, proposal rules |
| docs/state-schema.md | The .loop/state.json shape + per-iteration dump shape |
| docs/command-spec.md | Every flag on every slash command |
| docs/stopping-criteria.md | Quality score formula + convergence triggers |
| docs/known-limitations.md | What MVP does NOT do (yet) |
| docs/future-work.md | v0.5 / v1 roadmap |
| docs/examples/01-bugfix-loop.md | Step-by-step walk-through of a real bugfix loop |
Updating
How to update an existing CodexLoop installation — depends on how you installed it originally.
npm
npm update -g @jhlee0619/codexloop
Claude Code marketplace
/plugin marketplace update cloop
/plugin install cloop@cloop
install.sh
Re-run the same one-liner (the script is idempotent):
curl -fsSL https://raw.githubusercontent.com/jhlee0619/CodexLoop/main/install.sh | sh
clone + symlink
cd /path/to/CodexLoop
git pull origin main
Because the install is a symlink, git pull is all you need; changes take effect immediately.
v0.1.4 upgrade note
v0.1.4 aligns CodexLoop with OpenAI's published guidance on iterating on difficult problems. Four behavior changes you should know about:
- Dual-threshold stopping. The loop now stops only when BOTH the overall quality score AND the LLM-judge rubric average clear their thresholds (default 0.90 for each). The pre-v0.1.4 implicit single threshold was 0.75. To restore the old behavior: cloop start … --quality-threshold 0.75 --judge-threshold 0.75.
- Below-target stalls no longer stop the loop. Previously a flat quality delta below the threshold terminated the loop with plateau. Now the reviewer must report a bottleneck and the loop continues for up to plateauLimit (default 3) stalled iterations before giving up with reason plateau-exhausted.
- AGENTS.md is treated as authoritative. If the repo has an AGENTS.md (or CLAUDE.md / .cursorrules fallback), its content is injected into every evaluate/suggest/rank prompt as <custom_instructions> (capped at 16KB). Findings and proposals that ignore it are penalized.
- Patch-size discipline. Suggest proposals over 200 diff lines will be penalized at rank time unless the proposal's focusArea justifies scope. The rank step can now demote a sprawling winner (>400 lines) in favor of a tighter runner-up within 0.05 weighted score.
State files from v0.1.3 (version: 2) migrate automatically on first load —
no manual action required.
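The stopping and patch-size rules in this upgrade note can be sketched as two small functions. Both are illustrations under assumptions: the epsilon that defines a "flat" delta, the thresholds-cleared reason string, the input shapes, and the reading that the loop gives up once the stalled count reaches plateauLimit are not taken from the source.

```javascript
// Defaults quoted from the upgrade note; the property names are hypothetical.
const DEFAULT_LIMITS = { qualityThreshold: 0.9, judgeThreshold: 0.9, plateauLimit: 3 };

// Decide whether to stop after an iteration.
// `epsilon` (smallest delta that counts as progress) is an assumed parameter.
function stopDecision({ quality, judgeAvg, prevQuality, stalled }, limits = DEFAULT_LIMITS, epsilon = 0.01) {
  // Dual threshold: both scores must clear their bars.
  if (quality >= limits.qualityThreshold && judgeAvg >= limits.judgeThreshold) {
    return { stop: true, reason: "thresholds-cleared", stalled: 0 };
  }
  // A below-target stall only increments a counter instead of stopping at once.
  const nextStalled = quality - prevQuality > epsilon ? 0 : stalled + 1;
  if (nextStalled >= limits.plateauLimit) {
    return { stop: true, reason: "plateau-exhausted", stalled: nextStalled };
  }
  return { stop: false, reason: null, stalled: nextStalled };
}

// Patch-size discipline: prefer a tighter runner-up when the winner is sprawling
// and the score gap is small. `ranked` is assumed to be sorted by score, descending.
function applySizeDiscipline(ranked, maxLines = 400, margin = 0.05) {
  const [winner, runnerUp] = ranked;
  const demote =
    runnerUp !== undefined &&
    winner.diffLines > maxLines &&
    runnerUp.diffLines < winner.diffLines &&
    winner.score - runnerUp.score <= margin;
  return demote ? runnerUp : winner;
}
```

Passing limits of 0.75 for both thresholds corresponds to the restore-old-behavior flags mentioned in the first bullet.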
Tests
No framework dependency — every test is a standalone node script.
node tests/unit/state.test.mjs # 55 tests: defaults, round-trip, lock, migration, model/effort
node tests/unit/rank.test.mjs # 15 tests: weighted score, tiebreaker, reward-hacking floor
node tests/unit/convergence.test.mjs # 16 tests: quality score, goal-met, budget, negligible, plateau, divergence
node tests/integration/loop-smoke.test.mjs # 50 tests: end-to-end dry-run with mock codex
The integration smoke test sets up a scratch git repo in /tmp, points
the runtime at tests/fixtures/mock-codex-exec.mjs via
CODEXLOOP_CODEX_BIN, and runs a full dry-run loop that reaches
goal-met. It spends zero Codex budget.
Total: 136 tests passing.
Plugin manifests
CodexLoop ships two platform manifests at the repo root:
- .claude-plugin/plugin.json + .claude-plugin/marketplace.json: Claude Code plugin + marketplace entry
- .codex-plugin/plugin.json: Codex CLI plugin manifest (skills discovered from skills/)
The same Node runtime (scripts/loop-companion.mjs) and the same shell
wrapper (bin/cloop) back both plugins. No code is duplicated between
platforms.
Acknowledgments
CodexLoop is an independent project but its design borrows generously from two upstream projects. Neither is a dependency — every line of CodexLoop is an original reinterpretation — but both deserve explicit credit.
snarktank/ralph
Ralph (after Geoffrey Huntley's "Ralph Wiggum" pattern) is the iterative-loop project that popularized the idea of driving an LLM through a stateless fresh-context loop with file-based state carrying the only memory between iterations. CodexLoop inherits that core principle:
- Every Codex call runs codex exec --ephemeral: no session reuse, no conversational memory across iterations.
- All state lives in files under <repo>/.loop/: state.json is the canonical shape, per-iteration dumps are in iterations/NNNN.json, and an append-only progress.log captures the learning trajectory the way Ralph's progress.txt does.
- One story (one proposal) applied per iteration, scope-disciplined by design.
- An explicit "completion marker" decides when the loop terminates. In CodexLoop the reviewer emits verdict: "goal-met" and the runtime verifies via one more validation pass before accepting it.
Ralph's stateless-loop pattern is the mental model CodexLoop extends to a multi-role critic system.
openai/codex-plugin-cc
The official Codex plugin for Claude Code is the reference architecture
for how a Claude Code plugin should wrap the codex CLI. CodexLoop
studies (but does not import) many of its structural conventions:
- Minimal .claude-plugin/plugin.json manifest with auto-discovery for commands/, skills/, agents/.
- Command file frontmatter with description, argument-hint, disable-model-invocation, allowed-tools, plus an inline !`node ...` body shell form for read-only commands.
- $ARGUMENTS passthrough + the foreground-vs-background UX pattern using AskUserQuestion.
- A Node companion script (scripts/codex-companion.mjs) that verb-dispatches, spawns detached background workers via { detached: true, stdio: "ignore" } + .unref(), and manages workspace-scoped state directories.
- JSON-schema-driven Codex outputs via codex exec --output-schema, with XML-tagged prompt templates (<role>, <task>, <structured_output_contract>, <grounding_rules>, <final_check>).
- The codex-rescue subagent pattern: a thin forwarder that delegates to Codex via a skill-routed one-shot call.
CodexLoop's scripts/loop-companion.mjs, commands/*.md,
prompts/*.md, and schemas/*.json are original code written from
scratch — no file is copied — but they are line-by-line recognizable as
siblings of the openai-codex plugin's counterparts. A reader familiar
with that plugin can navigate CodexLoop with zero cognitive tax.
Thanks to both projects for publishing their work openly.
License
MIT — see LICENSE.
