@jhlee0619/codexloop

v0.1.4

Published 14 days ago

CodexLoop — iterative improvement loop that drives OpenAI Codex as a multi-role critic (evaluate → suggest → rank → apply → validate → record).


Author: jhlee0619

Keywords: codex, claude-code, plugin, loop, iterative, refinement, review, adversarial, convergence, cloop, cli

CodexLoop

Iterative improvement loop plugin that drives OpenAI Codex as a multi-role critic. Ships as a single repo that installs as both a Claude Code plugin (via /plugin marketplace) and a Codex CLI plugin (via the Codex marketplace), plus a standalone cloop shell binary for plain terminals and CI.

Repository: https://github.com/jhlee0619/CodexLoop
npm: @jhlee0619/codexloop — installs the cloop binary globally
Status: MVP v0.1.0 — six slash commands + background workers + deterministic convergence. See docs/future-work.md for the roadmap.

What it does

Every iteration strictly follows six steps:

evaluate → suggest → rank → apply → validate → record

Codex is used as a multi-role critic and improver (reviewer, adversarial critic, solution generator, refactor advisor, test designer). Every iteration is structurally forced to produce at least two proposals. A single "judge" Codex call then ranks them on six dimensions (correctness, requirement satisfaction, simplicity, maintainability, riskInverse, testability); the runtime deterministically re-computes the winner, applies it with git apply + git commit, runs your configured test/lint/type commands, and records the full forensic trace under <target-repo>/.loop/.
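The deterministic re-ranking can be sketched in a few lines. This is a minimal illustration, not CodexLoop's rank.mjs::recomputeWinner: the README names the six dimensions but not their weights or score ranges, so the WEIGHTS values, the requirementSatisfaction key name, the [0, 1] score scale, and the first-wins tiebreak are all hypothetical.

```javascript
// Hypothetical weights: the README names the six dimensions but not their weights.
const WEIGHTS = {
  correctness: 0.25,
  requirementSatisfaction: 0.25,
  simplicity: 0.1,
  maintainability: 0.1,
  riskInverse: 0.15,
  testability: 0.15,
};

// Weighted sum of one proposal's raw dimensional scores (assumed to be in [0, 1]).
function weightedScore(scores) {
  let total = 0;
  for (const [dim, weight] of Object.entries(WEIGHTS)) {
    if (typeof scores[dim] !== "number") throw new Error(`missing score for ${dim}`);
    total += weight * scores[dim];
  }
  return total;
}

// Recompute the winner from raw scores instead of trusting the judge's pick.
// Ties keep the earlier proposal (a hypothetical tiebreak rule).
function recomputeWinner(proposals, judgeWinnerId) {
  if (proposals.length < 2) throw new Error("an iteration must produce at least two proposals");
  let best = null;
  for (const p of proposals) {
    const score = weightedScore(p.scores);
    if (best === null || score > best.score) best = { id: p.id, score };
  }
  return { winnerId: best.id, score: best.score, overruledJudge: best.id !== judgeWinnerId };
}

// Usage: the judge claims "b" won, but the raw scores favor "a".
const uniform = (v) => Object.fromEntries(Object.keys(WEIGHTS).map((k) => [k, v]));
const result = recomputeWinner(
  [
    { id: "a", scores: uniform(0.8) },
    { id: "b", scores: { ...uniform(0.6), correctness: 0.9 } },
  ],
  "b",
);
console.log(result.winnerId, result.overruledJudge); // prints: a true
```

Recomputing from the raw per-dimension scores means an arithmetic slip in the judge's own weighted sum cannot change which patch gets applied.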

Loops terminate on: goal-met, negligible-improvement (convergence), plateau, regression / divergence, or a budget limit (iterations / time / codex calls).
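A minimal sketch of how the six steps and the budget gate might fit together. The step callbacks, the state fields (iterations, codexCalls, startedAt), and the "budget:*" reason strings are hypothetical stand-ins; CodexLoop's real runtime lives in scripts/loop-companion.mjs, and real convergence detection (quality score, plateau, divergence) is folded here into a single record() callback.

```javascript
// Hypothetical budget gate, checked before every iteration; reason strings are illustrative.
function checkBudget(state, limits, now = Date.now()) {
  if (limits.maxIter !== undefined && state.iterations >= limits.maxIter) return "budget:max-iter";
  if (limits.maxTimeMs !== undefined && now - state.startedAt >= limits.maxTimeMs) return "budget:max-time";
  if (limits.maxCalls !== undefined && state.codexCalls >= limits.maxCalls) return "budget:max-calls";
  return null;
}

// Minimal driver for evaluate → suggest → rank → apply → validate → record.
// `steps` holds stand-in callbacks; `record` returns a stop reason such as
// "goal-met" or "plateau", or null to keep going.
function runLoop(steps, limits) {
  const state = { iterations: 0, codexCalls: 0, startedAt: Date.now() };
  for (;;) {
    const budgetStop = checkBudget(state, limits);
    if (budgetStop) return { reason: budgetStop, state };

    const evaluation = steps.evaluate(state);
    const proposals = steps.suggest(state, evaluation);
    const winner = steps.rank(state, proposals);
    steps.apply(winner);
    const validation = steps.validate();
    state.iterations += 1;

    const stop = steps.record(state, { evaluation, winner, validation });
    if (stop) return { reason: stop, state };
  }
}

// Fake steps: evaluate, suggest, and rank each count as one Codex call.
const fakeSteps = (stopAt) => ({
  evaluate: (s) => { s.codexCalls += 1; return {}; },
  suggest: (s) => { s.codexCalls += 1; return ["p1", "p2"]; },
  rank: (s, ps) => { s.codexCalls += 1; return ps[0]; },
  apply: () => {},
  validate: () => 0,
  record: (s) => (s.iterations === stopAt ? "goal-met" : null),
});

console.log(runLoop(fakeSteps(2), { maxIter: 5 }).reason); // goal-met
console.log(runLoop(fakeSteps(99), { maxIter: 5, maxCalls: 9 }).reason); // budget:max-calls
```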

Opinionated defaults

Model: gpt-5.4
Reasoning effort: xhigh

These are forwarded to codex exec via --model gpt-5.4 -c model_reasoning_effort=xhigh and applied regardless of your ~/.codex/config.toml, so loops are reproducible across machines and config changes. Override per loop with /cloop:start --model <m> --effort <level> or permanently for the active loop with /cloop:model <m> --effort <level>.
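For illustration, the argument list those defaults translate into might be built like this. codexExecArgs is a hypothetical helper; only the --model and -c model_reasoning_effort=<level> flags come from the text above, and the prompt, output schema, and any other codex exec flags are deliberately left out.

```javascript
const DEFAULT_MODEL = "gpt-5.4";
const DEFAULT_EFFORT = "xhigh";

// Build the leading argv for `codex exec`. Passing model and effort explicitly on the
// command line keeps them independent of ~/.codex/config.toml.
// Per-loop overrides (from --model / --effort) take precedence over the defaults.
function codexExecArgs({ model, effort } = {}) {
  return [
    "exec",
    "--model", model ?? DEFAULT_MODEL,
    "-c", `model_reasoning_effort=${effort ?? DEFAULT_EFFORT}`,
  ];
}

console.log(codexExecArgs().join(" "));
// exec --model gpt-5.4 -c model_reasoning_effort=xhigh
console.log(codexExecArgs({ effort: "high" }).join(" "));
// exec --model gpt-5.4 -c model_reasoning_effort=high
```

An array like this would be handed to child_process.spawn("codex", args) rather than joined into a shell string, which avoids quoting problems with model names or prompts.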

Interview + approval (Claude Code and Codex CLI)

When you invoke CodexLoop through an LLM orchestrator (Claude Code or Codex CLI), the plugin runs a five-phase interview before launching the runtime:

Pre-flight — git repo check, clean working tree, codex/cloop availability
Goal capture + clarification — rewrite vague input into one concrete immutable sentence
Plan assembly — infer test/lint/type commands from package.json / Makefile / pyproject.toml, confirm budget, choose --wait vs --background
Approval — show the full plan and wait for explicit go/no-go
Invoke — call cloop start … with every confirmed flag
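The Plan assembly phase could infer candidate commands along these lines. This is a hypothetical heuristic, not the plugin's actual logic: the mapping from a package.json script, a Makefile target, or a pyproject.toml [tool.*] table to a concrete command (npm test, make test, pytest, ruff check ., mypy .) is an assumption, and the results are only candidates for the user to confirm during Approval.

```javascript
// `files` maps a filename at the repo root to its contents (absent files are omitted).
// Returns candidate commands to confirm with the user, never to run blindly.
function inferCommands(files) {
  const cmds = {};
  if (files["package.json"]) {
    const scripts = JSON.parse(files["package.json"]).scripts ?? {};
    if (scripts.test) cmds.testCmd = "npm test";
    if (scripts.lint) cmds.lintCmd = "npm run lint";
    if (scripts.typecheck) cmds.typeCmd = "npm run typecheck";
  }
  const makefile = files["Makefile"];
  if (makefile) {
    // A target line at column 0, e.g. "test:" or "test: build".
    if (!cmds.testCmd && /^test:/m.test(makefile)) cmds.testCmd = "make test";
    if (!cmds.lintCmd && /^lint:/m.test(makefile)) cmds.lintCmd = "make lint";
  }
  const pyproject = files["pyproject.toml"];
  if (pyproject) {
    if (!cmds.testCmd && pyproject.includes("[tool.pytest")) cmds.testCmd = "pytest";
    if (!cmds.lintCmd && pyproject.includes("[tool.ruff")) cmds.lintCmd = "ruff check .";
    if (!cmds.typeCmd && pyproject.includes("[tool.mypy")) cmds.typeCmd = "mypy .";
  }
  return cmds;
}

console.log(inferCommands({
  "package.json": JSON.stringify({ scripts: { test: "node --test tests/", lint: "eslint ." } }),
}));
```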

Direct shell callers (cloop start … in a plain terminal or CI job) skip the interview and take flags as-is, so CodexLoop remains deterministic and scriptable.

CodexLoop is a peer of the openai-codex plugin, not a replacement. It uses the same codex CLI binary and expects you to have run /codex:setup once (or npm install -g @openai/codex).

Requirements

Node.js 20 or newer.
git 2.20+ with a clean working tree in the target repository.
codex CLI installed and authenticated.
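The pre-flight phase and the requirements above can be checked roughly as follows. preflight, nodeVersionOk, and isCleanTree are hypothetical names; this sketch treats untracked files as dirty, skips the git 2.20+ version check, and only confirms that codex runs, not that it is authenticated.

```javascript
const { execFileSync } = require("node:child_process");

// True when a `process.version`-style string ("v20.11.1") is at least minMajor.
function nodeVersionOk(version = process.version, minMajor = 20) {
  const match = /^v?(\d+)\./.exec(version);
  return match !== null && Number(match[1]) >= minMajor;
}

// `git status --porcelain` prints nothing for a clean tree. Untracked files
// ("?? path") also count as dirty under this check.
function isCleanTree(porcelainOutput) {
  return porcelainOutput.trim() === "";
}

// Collect human-readable problems instead of failing on the first one.
function preflight(cwd = process.cwd()) {
  const problems = [];
  if (!nodeVersionOk()) problems.push(`Node.js 20+ required, found ${process.version}`);
  try {
    const out = execFileSync("git", ["status", "--porcelain"], {
      cwd,
      encoding: "utf8",
      stdio: ["ignore", "pipe", "ignore"],
    });
    if (!isCleanTree(out)) problems.push("working tree is not clean");
  } catch {
    problems.push("not a git repository, or git is not installed");
  }
  try {
    execFileSync("codex", ["--version"], { stdio: "ignore" });
  } catch {
    problems.push("codex CLI not found on PATH (see /codex:setup)");
  }
  return problems;
}

const problems = preflight();
console.log(problems.length === 0 ? "pre-flight ok" : problems.join("\n"));
```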

Installation

CodexLoop installs on three platforms from the same repo. Pick whichever matches your workflow — they compose freely and do not conflict.

1. One-line install script (universal — recommended)

Handles Claude Code + Codex CLI + plain shell in one go:

curl -fsSL https://raw.githubusercontent.com/jhlee0619/CodexLoop/main/install.sh | sh

The script:

Downloads the latest main tarball from GitHub into ~/.local/share/cloop/
Symlinks ~/.local/bin/cloop to the shell wrapper (plain-shell + Codex CLI + CI)
Symlinks ~/.claude/plugins/cloop/local/0.1.0/ to the install dir if ~/.claude/plugins/ exists (Claude Code plugin)
Symlinks ~/plugins/cloop to the install dir and adds a cloop entry to ~/.agents/plugins/marketplace.json if codex is on $PATH (Codex CLI plugin)

Override paths with env vars (CLOOP_REPO, CLOOP_REF, CLOOP_INSTALL_DIR, CLOOP_BIN_DIR). The script is idempotent and safe to re-run for updates.

2. npm global install (shell binary only)

If you only need the cloop shell wrapper (for CI, Codex CLI, or plain terminals) and do not want the Claude Code plugin:

npm install -g @jhlee0619/codexloop
cloop --help

This installs bin/cloop onto your $PATH and carries the runtime scripts + prompts + schemas along with it. It does NOT register with Claude Code — use path 1 or path 3 for that.

Why @jhlee0619/codexloop and not @jhlee0619/CodexLoop? npm does not allow capital letters in new package names, scoped or not, so the package uses the closest lowercase match to the GitHub repo name.

3. Claude Code plugin marketplace

If you only want /cloop:* slash commands inside Claude Code:

/plugin marketplace add jhlee0619/CodexLoop
/plugin install cloop@cloop

Claude Code clones github.com/jhlee0619/CodexLoop into ~/.claude/plugins/marketplaces/cloop/, reads .claude-plugin/marketplace.json, and installs the single cloop plugin it lists. The six slash commands appear under /help immediately.

Update later with:

/plugin marketplace update cloop
/plugin install cloop@cloop

4. Codex CLI plugin (manual)

If you want the Codex CLI plugin without running the one-line installer:

# Clone or symlink into Codex's home-local plugin dir:
git clone https://github.com/jhlee0619/CodexLoop.git ~/plugins/cloop
# or, if you already have the repo somewhere:
# ln -s /path/to/CodexLoop ~/plugins/cloop

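# Add cloop to your home-local Codex marketplace.
# Note: this overwrites any existing marketplace.json; merge the entry by hand if you already have one.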
mkdir -p ~/.agents/plugins
cat > ~/.agents/plugins/marketplace.json <<'JSON'
{
  "name": "home-local",
  "interface": { "displayName": "Home-local plugins" },
  "plugins": [
    {
      "name": "cloop",
      "source": { "source": "local", "path": "./plugins/cloop" },
      "policy": { "installation": "AVAILABLE", "authentication": "NONE" },
      "category": "Automation"
    }
  ]
}
JSON

Restart any active Codex session. Codex will auto-load skills/cloop/SKILL.md, which activates the skill only when you name cloop by keyword (see "Codex activation gating" below).

5. Manual clone + symlink (for plugin development)

If you are hacking on CodexLoop itself:

git clone https://github.com/jhlee0619/CodexLoop.git /path/to/CodexLoop
mkdir -p ~/.claude/plugins/cloop/local ~/plugins ~/.local/bin
ln -s /path/to/CodexLoop ~/.claude/plugins/cloop/local/0.1.0
ln -s /path/to/CodexLoop ~/plugins/cloop
ln -s /path/to/CodexLoop/bin/cloop ~/.local/bin/cloop

Edits under /path/to/CodexLoop/ take effect immediately because the plugin caches are symlinks.

Quickstart (Claude Code)

/codex:setup                    # one-time Codex install + auth (from openai-codex plugin)

# inside the target repo, with a clean working tree — just say what you want:
/cloop:start fix the failing add tests

Claude Code walks you through:

Goal — reads TASK.md / GOAL.md / PRD.md / AGENTS.md if present, or asks you directly; rewrites your wording into one concrete sentence and confirms.
Plan — infers plausible --test-cmd / --lint-cmd / --type-cmd candidates from package.json / Makefile / pyproject.toml / go.mod; confirms the budget (iterations, time); defaults the model to gpt-5.4 + xhigh.
Approval — shows the full plan and asks Start loop with this plan? before launching anything. You can edit the goal or plan details without re-typing.
Run — records the current HEAD as seedCommit, adds .loop/ to .gitignore, and drives the six-step iteration cycle until the verdict is goal-met AND validation exits 0, OR the loop converges, OR a budget limit fires.

Then check /cloop:result --diff for the final state.
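The Run step's .gitignore update should be idempotent so that re-running a loop never duplicates the entry. A minimal sketch, assuming a hypothetical ensureIgnored helper (recording seedCommit is not shown):

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Idempotently add `entry` to <repoRoot>/.gitignore; returns true if the file changed.
function ensureIgnored(repoRoot, entry = ".loop/") {
  const file = path.join(repoRoot, ".gitignore");
  const existing = fs.existsSync(file) ? fs.readFileSync(file, "utf8") : "";
  const lines = existing.split(/\r?\n/).map((line) => line.trim());
  const bare = entry.replace(/\/$/, "");
  if (lines.includes(entry) || lines.includes(bare)) return false;
  // If the last existing line has no trailing newline, start a new line first.
  const sep = existing === "" || existing.endsWith("\n") ? "" : "\n";
  fs.appendFileSync(file, `${sep}${entry}\n`);
  return true;
}

const repo = fs.mkdtempSync(path.join(os.tmpdir(), "cloop-demo-"));
fs.writeFileSync(path.join(repo, ".gitignore"), "node_modules");
console.log(ensureIgnored(repo), ensureIgnored(repo)); // true false
```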

Power users can bypass the approval dialog by supplying every flag up front and adding --yes:

/cloop:start --wait --yes \
             --goal "fix the failing add test" \
             --max-iter 3 \
             --test-cmd "node --test tests/" \
             --model gpt-5.4 --effort xhigh

Commands

| Command | What it does |
|---|---|
| /cloop:start | Start a new loop. Runs the five-phase interview + approval first (skip with --yes), then launches the runtime. --wait for foreground, --background for OS-detached. |
| /cloop:iterate | Run exactly one iteration synchronously. --model <m> / --effort <e> override the loop's stored values for this iteration only. |
| /cloop:status | Show loop state, iteration history, budget consumption, convergence metrics. |
| /cloop:stop | SIGTERM a running background worker (graceful, waits 60 s). Use --force for SIGKILL. |
| /cloop:result | Dump the full iteration history with ranking breakdowns and validation output. |
| /cloop:model | Show or change the Codex model + reasoning effort used by the active loop. Defaults are gpt-5.4 / xhigh. |

See docs/command-spec.md for every flag on every command.

`--wait` vs `--background`

| Flag | Mode | Use when |
|---|---|---|
| --wait | Foreground: cloop start blocks the tool session until the loop terminates. Claude streams every iteration transcript inline. | Short loops, --dry-run, watching reasoning live, budget sanity checks. |
| --background | OS-detached: the runtime spawns a detached Node worker via spawnDetached({ detached: true, stdio: "ignore" }).unref(), writes .loop/loop.pid, and exits. The worker runs independently of Claude Code; closing the session does NOT kill it. | Real multi-minute loops. Poll progress with /cloop:status, stop with /cloop:stop. |

Never pass both flags. If you pass neither, the interview recommends one based on --max-iter and --dry-run.
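The OS-detached pattern described for --background can be reproduced with Node's child_process.spawn. This sketch is not CodexLoop's spawnDetached: startBackgroundWorker, its arguments, and the throwaway worker script are hypothetical; only the detached: true, stdio: "ignore", .unref(), and .loop/loop.pid details come from the table above.

```javascript
const { spawn } = require("node:child_process");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Launch workerScript as an OS-detached Node process and record its PID in <loopDir>/loop.pid.
function startBackgroundWorker(workerScript, args, loopDir) {
  fs.mkdirSync(loopDir, { recursive: true });
  const child = spawn(process.execPath, [workerScript, ...args], {
    detached: true,  // on POSIX the worker leads its own process group and session
    stdio: "ignore", // no pipes, so the caller never has to drain worker output
  });
  child.unref();     // do not keep the caller's event loop alive for the worker
  fs.writeFileSync(path.join(loopDir, "loop.pid"), `${child.pid}\n`);
  return child.pid;
}

// Demo with a throwaway worker that just sleeps briefly.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "cloop-bg-"));
const worker = path.join(demoDir, "worker.js");
fs.writeFileSync(worker, "setTimeout(() => {}, 200);\n");
const pid = startBackgroundWorker(worker, [], path.join(demoDir, ".loop"));
console.log(`worker ${pid} detached`);
```

Writing the PID file is what lets a later /cloop:status or /cloop:stop find the worker after the launching session has exited.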

Codex activation gating

Codex auto-loads skills/cloop/SKILL.md on session start, but the skill's description field gates activation on an explicit keyword match. The skill only runs when your request contains one of these tokens (case-insensitive):

cloop
codexloop
CodexLoop
CODEXLOOP

Activation examples:

| What you say to Codex | Skill activates? |
|---|---|
| "Use cloop to fix the failing auth tests" | yes |
| "codexloop this failing build" | yes |
| "Start a CodexLoop on src/add.js" | yes |
| "CODEXLOOP — fix the broken adds function" | yes |
| "iteratively fix the auth tests" | no — no cloop keyword |
| "keep trying until the tests pass" | no — no cloop keyword |
| "loop on this until it works" | no — no cloop keyword |
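As a rough illustration of the rule in the table, the keyword check behaves like a case-insensitive whole-word match on cloop or codexloop. The actual matching happens inside Codex against the skill's description field; the regex and shouldActivate below are only a hypothetical approximation.

```javascript
// Hypothetical approximation of the documented activation rule; Codex performs
// the real matching against the skill's description, not this regex.
const ACTIVATION_KEYWORD = /\b(?:cloop|codexloop)\b/i;

function shouldActivate(request) {
  return ACTIVATION_KEYWORD.test(request);
}

for (const request of [
  "Use cloop to fix the failing auth tests",
  "Start a CodexLoop on src/add.js",
  "keep trying until the tests pass",
  "loop on this until it works",
]) {
  console.log(shouldActivate(request) ? "yes" : "no ", request);
}
```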

When the skill activates, it walks through the same five-phase interview (pre-flight → goal → plan → approval → invoke) that /cloop:start does in Claude Code, and only then invokes cloop start … as a shell command. Defaults (gpt-5.4, xhigh) and the immutable goal hash apply identically on both platforms.

Codex calls cloop as a regular shell command. Because CodexLoop spawns its own codex exec children, the running Codex session only orchestrates the wrapper: each loop iteration is an independent, ephemeral Codex invocation, so the loop never consumes the outer Codex session's context.

Using `cloop` directly from a shell (no LLM)

When the cloop wrapper is on $PATH (via any install path above), you can drive it directly from any terminal or script. There is no interview on this path — the runtime takes your flags verbatim, so it is safe for CI, scripted automation, and power-user shell usage.

# Full-flag form:
cloop start --wait --yes \
            --goal "fix the failing add test" \
            --test-cmd "node --test tests/" \
            --max-iter 3

# Positional shortcut — the goal is the free-form text:
cloop start --wait --yes "fix the failing add test" \
            --test-cmd "node --test tests/" --max-iter 3

cloop status
cloop result --diff
cloop stop

The interview only runs when an LLM orchestration layer (Claude via /cloop:start, or Codex via the skill) sits above cloop. In a plain shell, cloop is deterministic and trusts its flags.

Safety by default

Clean working tree required. Every iteration starts from a clean state so git reset --hard is an unambiguous rollback path.
Atomic state writes. Every write to .loop/state.json uses temp-file-plus-rename so a crash mid-write never corrupts the canonical state.
Reward-hacking guards. The runtime rejects patches that delete test files, add .skip / xit / .only in test files, or modify tests without a substantive justification. Both the suggest and rank prompts forbid these patterns, and apply.mjs hard-rejects them if they slip through.
Deterministic ranking. The judge's weighted sum is never trusted blindly — rank.mjs::recomputeWinner re-computes from the raw dimensional scores and uses the runtime's pick if the two disagree.
Budget caps. --max-iter, --max-time, and --max-calls are enforced before every iteration starts (including dry-run).
Goal immutability. state.goal.goalHash = sha256(text + acceptanceCriteria + testCmd + lintCmd + typeCmd) is asserted on every iteration. Any drift fails loudly.
Single loop per repo. An advisory lock at .loop/loop.lock (keyed by worker PID) refuses a second concurrent loop in the same repo.
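The goal-hash and atomic-write guarantees can be sketched with node:crypto and node:fs. goalHash, assertGoalUnchanged, and writeStateAtomic are hypothetical names; the hash input reads the README formula literally as plain string concatenation, which may differ from what the runtime actually hashes.

```javascript
const crypto = require("node:crypto");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// sha256(text + acceptanceCriteria + testCmd + lintCmd + typeCmd), read literally as
// plain string concatenation. Assumption: the real runtime may add separators or normalize.
function goalHash({ text, acceptanceCriteria = "", testCmd = "", lintCmd = "", typeCmd = "" }) {
  return crypto.createHash("sha256")
    .update(text + acceptanceCriteria + testCmd + lintCmd + typeCmd)
    .digest("hex");
}

// Fail loudly if any hashed goal field drifted since the loop started.
function assertGoalUnchanged(state) {
  if (goalHash(state.goal) !== state.goal.goalHash) {
    throw new Error("goal drift detected: goalHash mismatch");
  }
}

// Temp-file-plus-rename: readers see either the old or the new state.json, never
// a half-written one. The temp file must live on the same filesystem as the target.
function writeStateAtomic(file, state) {
  const tmp = `${file}.${process.pid}.tmp`;
  const fd = fs.openSync(tmp, "w");
  try {
    fs.writeFileSync(fd, JSON.stringify(state, null, 2));
    fs.fsyncSync(fd); // flush to disk before the rename makes the new content visible
  } finally {
    fs.closeSync(fd);
  }
  fs.renameSync(tmp, file);
}

const loopDir = fs.mkdtempSync(path.join(os.tmpdir(), "cloop-state-"));
const stateFile = path.join(loopDir, "state.json");
const goal = { text: "fix the failing add test", testCmd: "node --test tests/" };
writeStateAtomic(stateFile, { goal: { ...goal, goalHash: goalHash(goal) } });
assertGoalUnchanged(JSON.parse(fs.readFileSync(stateFile, "utf8"))); // passes
```

One caveat of literal concatenation: it does not encode field boundaries, so moving characters between adjacent fields (for example from the end of text to the start of acceptanceCriteria) yields the same hash. Hashing a JSON encoding of the fields would avoid that.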

Documentation

| File | Purpose |
|---|---|
| docs/architecture.md | Design overview, directory layout, Codex invocation pattern |
| docs/iteration-policy.md | The six-step cycle, role definitions, proposal rules |
| docs/state-schema.md | The .loop/state.json shape + per-iteration dump shape |
| docs/command-spec.md | Every flag on every slash command |
| docs/stopping-criteria.md | Quality score formula + convergence triggers |
| docs/known-limitations.md | What MVP does NOT do (yet) |
| docs/future-work.md | v0.5 / v1 roadmap |
| docs/examples/01-bugfix-loop.md | Step-by-step walk-through of a real bugfix loop |

Updating

How you update an existing CodexLoop installation depends on how you installed it originally.

npm

npm update -g @jhlee0619/codexloop

Claude Code marketplace

/plugin marketplace update cloop
/plugin install cloop@cloop

install.sh

Re-run the same one-liner (the script is idempotent):

curl -fsSL https://raw.githubusercontent.com/jhlee0619/CodexLoop/main/install.sh | sh

clone + symlink

cd /path/to/CodexLoop
git pull origin main

Because the plugin directories are symlinks to the clone, git pull is all you need; changes take effect immediately.

v0.1.4 upgrade note

v0.1.4 aligns CodexLoop with OpenAI's published guidance on iterating on difficult problems. Four behavior changes you should know about:

Dual-threshold stopping. The loop now stops only when BOTH the overall quality score AND the LLM-judge rubric average clear their thresholds (default 0.90 for each). The pre-v0.1.4 implicit single threshold was 0.75. To restore the old behavior: cloop start … --quality-threshold 0.75 --judge-threshold 0.75.
Below-target stalls no longer stop the loop. Previously a flat quality delta below the threshold terminated the loop with plateau. Now the reviewer must report a bottleneck and the loop continues for up to plateauLimit (default 3) stalled iterations before giving up with reason plateau-exhausted.
AGENTS.md is treated as authoritative. If the repo has an AGENTS.md (falling back to CLAUDE.md or .cursorrules), its content is injected into every evaluate/suggest/rank prompt as <custom_instructions>, capped at 16 KB. Findings and proposals that ignore it are penalized.
Patch-size discipline. Proposals from the suggest step that exceed 200 diff lines are penalized at rank time unless the proposal's focusArea justifies the scope. The rank step can also demote a sprawling winner (>400 lines) in favor of a tighter runner-up whose weighted score is within 0.05.

State files from v0.1.3 (version: 2) migrate automatically on first load — no manual action required.
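The stopping and patch-size rules above can be expressed as two small functions. stopDecision, applyPatchSizeDiscipline, the "thresholds-met" reason name, and the stall-counting input are hypothetical; the numeric defaults (0.90 / 0.90, plateauLimit 3, 400 lines, 0.05 margin) are the ones stated in the upgrade note.

```javascript
// Hypothetical stop rule using the v0.1.4 defaults.
// Returns a stop reason, or null to run another iteration.
function stopDecision(
  { quality, judge, stalledIterations },
  { qualityThreshold = 0.9, judgeThreshold = 0.9, plateauLimit = 3 } = {},
) {
  // Dual threshold: BOTH scores must clear their bars ("thresholds-met" is an illustrative name).
  if (quality >= qualityThreshold && judge >= judgeThreshold) return "thresholds-met";
  // Below target, stalls only end the loop once plateauLimit of them accumulate.
  if (stalledIterations >= plateauLimit) return "plateau-exhausted";
  return null;
}

// Patch-size discipline: demote a winner over maxLines diff lines in favor of a smaller
// runner-up whose weighted score is within `margin`. `ranked` is sorted best-first.
function applyPatchSizeDiscipline(ranked, { maxLines = 400, margin = 0.05 } = {}) {
  const [top, runnerUp] = ranked;
  if (
    runnerUp &&
    top.diffLines > maxLines &&
    runnerUp.diffLines < top.diffLines &&
    top.score - runnerUp.score <= margin
  ) {
    return runnerUp;
  }
  return top;
}

console.log(stopDecision({ quality: 0.92, judge: 0.85, stalledIterations: 1 })); // null
console.log(stopDecision(
  { quality: 0.8, judge: 0.8, stalledIterations: 0 },
  { qualityThreshold: 0.75, judgeThreshold: 0.75 }, // pre-v0.1.4 behavior
)); // thresholds-met
```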

Tests

No framework dependency — every test is a standalone node script.

node tests/unit/state.test.mjs         # 55 tests: defaults, round-trip, lock, migration, model/effort
node tests/unit/rank.test.mjs          # 15 tests: weighted score, tiebreaker, reward-hacking floor
node tests/unit/convergence.test.mjs   # 16 tests: quality score, goal-met, budget, negligible, plateau, divergence
node tests/integration/loop-smoke.test.mjs  # 50 tests: end-to-end dry-run with mock codex

The integration smoke test sets up a scratch git repo in /tmp, points the runtime at tests/fixtures/mock-codex-exec.mjs via CODEXLOOP_CODEX_BIN, and runs a full dry-run loop that reaches goal-met. It spends zero Codex budget.

Total: 136 tests passing.
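The mock-codex setup can be approximated as below. codexCommand is a hypothetical resolver, running a .js/.mjs override through the current Node binary is an assumption, and the inline mock is a throwaway stand-in, not the repo's tests/fixtures/mock-codex-exec.mjs.

```javascript
const { execFileSync } = require("node:child_process");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Resolve the command that stands in for `codex`. Assumption: a .js/.mjs/.cjs
// override is run with the current Node binary rather than executed directly.
function codexCommand(args, env = process.env) {
  const bin = env.CODEXLOOP_CODEX_BIN || "codex";
  if (/\.(?:mjs|cjs|js)$/.test(bin)) return { command: process.execPath, args: [bin, ...args] };
  return { command: bin, args };
}

// A throwaway mock that ignores its arguments and prints a canned JSON verdict.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "cloop-mock-"));
const mock = path.join(dir, "mock-codex-exec.mjs");
fs.writeFileSync(mock, 'console.log(JSON.stringify({ verdict: "goal-met" }));\n');

const { command, args } = codexCommand(["exec", "--model", "gpt-5.4"], { CODEXLOOP_CODEX_BIN: mock });
const reply = JSON.parse(execFileSync(command, args, { encoding: "utf8" }));
console.log(reply.verdict); // goal-met
```

Because the override comes from the environment, the runtime's normal code path is exercised end to end while no real Codex call is made.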

Plugin manifests

CodexLoop ships two platform manifests at the repo root:

.claude-plugin/plugin.json + .claude-plugin/marketplace.json — Claude Code plugin + marketplace entry
.codex-plugin/plugin.json — Codex CLI plugin manifest (skills discovered from skills/)

The same Node runtime (scripts/loop-companion.mjs) and the same shell wrapper (bin/cloop) back both plugins. No code is duplicated between platforms.

Acknowledgments

CodexLoop is an independent project but its design borrows generously from two upstream projects. Neither is a dependency — every line of CodexLoop is an original reinterpretation — but both deserve explicit credit.

snarktank/ralph

Ralph (after Geoffrey Huntley's "Ralph Wiggum" pattern) is the iterative-loop project that popularized the idea of driving an LLM through a stateless fresh-context loop with file-based state carrying the only memory between iterations. CodexLoop inherits that core principle:

Every Codex call runs codex exec --ephemeral — no session reuse, no conversational memory across iterations.
All state lives in files under <repo>/.loop/ — state.json is the canonical shape, per-iteration dumps are in iterations/NNNN.json, and an append-only progress.log captures the learning trajectory the way Ralph's progress.txt does.
One story (one proposal) applied per iteration, scope-disciplined by design.
An explicit "completion marker" decides when the loop terminates — in CodexLoop the reviewer emits verdict: "goal-met" and the runtime verifies via one more validation pass before accepting it.

Ralph's stateless-loop pattern is the mental model CodexLoop extends to a multi-role critic system.
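The file-based memory described above might be recorded like this. recordIteration and acceptGoalMet are hypothetical helpers; the zero-padded iterations/NNNN.json names and the append-only progress.log follow the text, while the log-line format, 1-based numbering, and the revalidate callback are assumptions.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Write the per-iteration dump as iterations/NNNN.json and append one line to
// progress.log. Earlier progress lines are never rewritten (append-only).
function recordIteration(loopDir, index, dump, summary) {
  const iterDir = path.join(loopDir, "iterations");
  fs.mkdirSync(iterDir, { recursive: true });
  const name = `${String(index).padStart(4, "0")}.json`;
  fs.writeFileSync(path.join(iterDir, name), JSON.stringify(dump, null, 2));
  fs.appendFileSync(path.join(loopDir, "progress.log"), `iter ${index}: ${summary}\n`);
  return name;
}

// Accept the reviewer's completion marker only if one more validation pass succeeds.
// `revalidate` is a hypothetical callback returning the validation exit code;
// it only runs when the verdict is "goal-met".
function acceptGoalMet(verdict, revalidate) {
  return verdict === "goal-met" && revalidate() === 0;
}

const loopDir = fs.mkdtempSync(path.join(os.tmpdir(), "cloop-iter-"));
console.log(recordIteration(loopDir, 1, { verdict: "continue" }, "fixed off-by-one in add()")); // 0001.json
console.log(acceptGoalMet("goal-met", () => 1)); // false
```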

openai/codex-plugin-cc

The official Codex plugin for Claude Code is the reference architecture for how a Claude Code plugin should wrap the codex CLI. CodexLoop studies (but does not import) many of its structural conventions:

Minimal .claude-plugin/plugin.json manifest with auto-discovery for commands/, skills/, agents/.
Command file frontmatter with description, argument-hint, disable-model-invocation, allowed-tools, plus an inline !`node ...` shell form in the command body for read-only commands.
$ARGUMENTS passthrough + the foreground-vs-background UX pattern using AskUserQuestion.
A Node companion script (scripts/codex-companion.mjs) that verb-dispatches, spawns detached background workers via { detached: true, stdio: "ignore" } + .unref(), and manages workspace-scoped state directories.
JSON-schema-driven Codex outputs via codex exec --output-schema, with XML-tagged prompt templates (<role>, <task>, <structured_output_contract>, <grounding_rules>, <final_check>).
The codex-rescue subagent pattern — a thin forwarder that delegates to Codex via a skill-routed one-shot call.

CodexLoop's scripts/loop-companion.mjs, commands/*.md, prompts/*.md, and schemas/*.json are original code written from scratch — no file is copied — but they are line-by-line recognizable as siblings of the openai-codex plugin's counterparts. A reader familiar with that plugin can navigate CodexLoop with zero cognitive tax.

Thanks to both projects for publishing their work openly.

License

MIT — see LICENSE.