codex-on-claude

v0.5.6

Published

a month ago

Bridge that lets Claude Code call OpenAI Codex CLI as an MCP sub-agent. Installable Skills + optional isolated review Agent, a continuous-improvement loop with optional PostToolUse hook auto-logging, and a persistent thread catalog. Local-only telemetry,

0High
0Medium
0Low

pathcosmos

claude-code codex codex-cli mcp anthropic openai agent skill plugin ai

codex-on-claude

Call OpenAI Codex CLI as a sub-agent from inside Claude Code. MCP transport + curated Skills + an optional isolated review Agent + a continuous-improvement loop with optional PostToolUse hook auto-logging + a persistent thread catalog — installed by one CLI.

🇰🇷 Korean mirror: README.ko.md

TL;DR

# Prerequisites already installed: Codex CLI + Claude Code, both logged in.
codex --version && claude --version

# Canonical: install or update + reconfigure in one shot
npx --yes codex-on-claude@latest

The installer:

Runs preflight — Node 18.17+, codex, claude, codex doctor, claude auth status, and the codex MCP server's Connected status.
Walks you through eight questions with arrow-key navigation (↑/↓, Space to toggle, Enter to confirm). v0.5.0 adds question §7 (usageMode) — Codex invocation policy (none / synergy / auto / max). v0.5.1 adds question §8 (cerberus) — opt-in 3-head planning consensus mode (default off). Existing installs migrate silently (usageMode=synergy, cerberus=off).
Shows a final review screen with Apply / Edit again / Cancel.
Copies the selected Skills (and optional Agent / hooks / PreToolUse gate when usageMode=none) under ~/.claude/.

Run the same command any time to update + reconfigure — if prior state is found, the installer prints a vPREV → vCURR banner and walks you through previous answers as defaults. Or use codex-on-claude reconfigure explicitly.

Why this exists

The bare wiring (Claude Code ↔ Codex CLI via MCP) is a one-line claude mcp add, but day-to-day use needs more:

A Skill so you don't re-type the same options every call.
An optional Agent so a 30KB Codex response doesn't eat the main context.
A continuous-improvement loop that watches your usage and proposes smarter defaults.
A persistent thread catalog so Codex sessions you started yesterday don't vanish into a "Session not found" error.

codex-on-claude installs the whole layered stack, and lets you turn each layer on/off.

Prerequisites

| Item | Minimum | Check | |---|---|---| | Node.js | 18.17 | node --version | | Claude Code (claude) | 2.1.x | claude --version | | Codex CLI (codex) | 0.13.x | codex --version, codex doctor | | codex MCP server | Connected | claude mcp get codex | | zsh / bash | — | the installer uses one of these |

Both CLIs must be logged in.

claude auth status --text
codex doctor --summary
claude mcp get codex          # Status: ✓ Connected
codex-on-claude doctor        # runs all of the above in one shot

Install

A. via npx (recommended)

npx --yes codex-on-claude@latest   # always pulls latest + auto-reconfigures if state exists

The --yes flag skips the "install package" confirmation; @latest cache-busts npx so you never hit a stale _npx/<hash>/ cache. The bare form npx codex-on-claude also works but is more prone to corrupted-cache ENOENT errors (see Troubleshooting).

B. global install

npm install -g codex-on-claude
codex-on-claude

C. from source

git clone https://github.com/pathcosmos/codex-on-claude.git
cd codex-on-claude
node install/install.mjs

D. non-interactive (CI / scripted)

codex-on-claude \
  --patterns=review,followup,fix,routine \
  --context-policy=mixed \
  --improvement-loop=manual \
  --threads=basic \
  --usage-mode=synergy \
  --auto-tier2-llm-probe=on \
  --subscription-claude=max --subscription-codex=pro \
  --codex-model-primary=gpt-5.5 --codex-reasoning-primary=xhigh \
  --reviewer-model-primary=opus --reviewer-reasoning-primary=xhigh \
  --yes

The --subscription-* flags declare what you actually have access to. The --*-model-primary / --*-reasoning-primary flags pick the model + effort level you want to run by default. Fallback is auto-locked to each tier's base — if you exhaust quota on the primary, Skill prose retries once on the fallback (gpt-5/medium for Codex, sonnet/medium for the Claude reviewer agent at default tiers). Pass --codex-model-fallback=... etc. if you really need to override the lock; the installer warns and re-snaps to base.

--share-scope=... is accepted but deprecated since v0.3 and ignored (a one-line notice prints). The plugin-bundle workflow was removed in 0.3.0; if you need team distribution, share this GitHub repo and have everyone run the installer.

Updating codex-on-claude

npx users — one command does both. npx --yes codex-on-claude@latest fetches the newest release and auto-runs reconfigure when prior state is found (you'll see a vPREV → vCURR banner). No need to chain a separate reconfigure call.

Global install — bump via npm, then re-run:

npm install -g codex-on-claude@latest    # `npm update -g` also works
hash -r                                  # zsh/bash: refresh the command hash so the new binary is picked up
codex-on-claude                          # auto-reconfigures because prior state is detected

If codex-on-claude still says "command not found" after npm update -g, that's a stale shell hash (see Troubleshooting).

Re-run reconfigure after updating. Whether via npx or global, your previous answers come pre-selected, you can change any of the four options via arrow keys, and the final review screen shows exactly what's about to be saved.
Roll back a release if needed:
```
npm install -g [email protected]
codex-on-claude reconfigure
```
State files are forward/backward-compatible across patch releases; alias migrations (e.g. on-demand → manual) are one-way, and an older binary will just ignore unknown keys.
What gets touched by npm update itself? Only the CLI binary. Files under ~/.claude/skills/codex-*, ~/.claude/agents/codex-reviewer.md, ~/.claude/settings.json (hook entries), and ~/.claude/codex-on-claude/ are modified only when you run reconfigure, uninstall, or another explicit subcommand.
Deprecated versions on npm carry a one-line Use codex-on-claude@^X.Y.Z pointer that npm install will surface.

The install questions

Each answer is also a CLI flag for non-interactive use. Reconfigure later with codex-on-claude reconfigure — your previous answers come back pre-selected.

v0.4.1 raises the count from 4 to 6 (subscription + primary model / reasoning per side). All new questions live below the original four; the originals still work the same way.
v0.5.0 adds question §7 (usageMode) — Codex invocation policy. Existing installs migrate silently to synergy (no prompt); explicit reconfigure surfaces the new question.

1. patterns (multi-select)

| Choice | Skills installed | |---|---| | One-shot read-only review | codex-review | | Long multi-turn session | codex-followup, codex-resume | | Workspace-write delegation | codex-fix (with strict file allowlist guardrails) | | Templated repeating job | codex-routine |

Flag: --patterns=review,followup,fix,routine

2. contextPolicy (single)

| Choice | Effect | |---|---| | direct | Skills only — Codex responses land in the main context | | summarize | Adds codex-reviewer subagent; Skills route via the agent so only the summary reaches main | | mixed (recommended) | Both — Skills decide per call |

Flag: --context-policy=direct|summarize|mixed

3. improvementLoop (single)

| Choice | Behavior | |---|---| | off | No codex-analyze / codex-improve / codex-log Skills, no hooks | | manual (default) | Skill prose triggers codex-on-claude log; no OS hook (was: on-demand, kept as alias) | | auto-on-skill | Adds two PostToolUse hooks in ~/.claude/settings.json so every mcp__codex__codex / codex-reply call is auto-logged — no Skill-prose dependence | | periodic | auto-on-skill plus cron/loop-friendly periodic analysis guidance |

Flag: --improvement-loop=off|manual|auto-on-skill|periodic (alias: on-demand → manual)

Logging is metadata only — prompt / response bodies are never written to disk. The hooks we install carry a _coc.marker = "codex-on-claude:auto-log" sentinel so uninstall / reconfigure --improvement-loop=manual remove only the entries we added (hook-level filter as of 0.3.4 — user hooks that happen to coexist in the same group are preserved).

Runtime config drift (v0.3.4+): if you manually edit ~/.claude/codex-on-claude/config.json to set improvementLoop to off or manual without re-running reconfigure, the installed hook now respects the new value on the next call (no more silent telemetry from a stale hook). Corrupt or missing config fail-opens so existing installs don't break unexpectedly.

Hook payload extraction (v0.3.4+): threadId is now correctly extracted from Claude Code 2.1.x's PostToolUse payload (where tool_response is a JSON-encoded string), and elapsedMs is populated from duration_ms. Previously every auto-hook log entry had threadId: null, which silently broke analyzer rules that join logs to threads.

4. threads (single, v0.2+)

Persistent thread catalog at ~/.claude/codex-on-claude/threads/<threadId>.json.

| Choice | Captured per thread | |---|---| | off | nothing | | basic (default when improvement loop is on) | threadId + title + tags + originatingSkill + lastUsed + turnCount | | full | above plus goal / outcome / decision / note summaries, incidents (issue/resolution/outcome), and fallbackStrategy (auto-resume / ask / new) |

Flag: --threads=off|basic|full

External transport: none. Bodies of prompts / responses are never stored — only short text you (or a Skill) explicitly write via threads outcome | decision | note | incident.

5. subscription tiers (v0.4.1)

You declare which Claude + Codex subscription tier you actually have. The installer uses this only to constrain the model + reasoning choices in questions 6 below — it never sends anything to Anthropic/OpenAI to verify.

| Side | Tiers offered (highest → lowest) | Default | |---|---|---| | Claude | enterprise / team / max / pro / free | max | | Codex | team / pro / plus / free | pro |

Flags: --subscription-claude=... and --subscription-codex=...

A declared tier that doesn't match reality is fine — but expect the frequent-fallback analyzer rule to fire as your primary tier exhausts quota repeatedly. Reconfigure to a lower tier (and lower primary reasoning) when that happens.

6. model / reasoning — primary + fallback (v0.4.1)

For each side (Codex calls, Claude reviewer subagent) you pick:

Primary: the model + reasoning effort you want by default
Fallback: locked to the subscription tier's base — used automatically when Skill prose detects a rate_limit_exceeded / quota / 429 / usage_limit_reached error

| Side | Primary flags | |---|---| | Codex | --codex-model-primary=gpt-5.5 --codex-reasoning-primary=xhigh | | Reviewer (Claude subagent) | --reviewer-model-primary=opus --reviewer-reasoning-primary=xhigh |

Reasoning vocabulary (shared by Claude --effort and Codex model_reasoning_effort): low / medium / high / xhigh / max.

How fallback fires:

Codex side: each mcp__codex__codex call's Skill prose ends with a "if you see rate_limit/quota, retry once with the locked fallback" block.
Reviewer subagent route: the primary codex-reviewer agent emits the sentinel line CODEX_QUOTA_FALLBACK_NEEDED; /codex-review Skill prose re-launches via codex-reviewer-fallback.
claude -p direct callers (cron / bench): pass --fallback-model <id> — Claude's CLI handles it natively without prose. Note: --effort only applies to primary; Claude doesn't have a --fallback-effort flag.

You can pass --codex-model-fallback=... etc. but values that don't match the matrix base are ignored with a warning — the lock is intentional. To "unlock" the fallback, raise your subscription tier (which raises the base).

Fallback events accrue in the usage log (outcome=fallback, errorKind=quota|rate_limit|...). codex-on-claude analyze surfaces a ruleFrequentFallback candidate at ≥5 events in the window — your hint to upgrade or de-tune the primary.

7. usage-mode — Codex invocation policy (v0.5.0)

How aggressively Claude consults Codex. Pick one of four:

| Mode | One-line definition | Default? | Behavior summary | |---|---|---|---| | none | Codex calls blocked at PreToolUse gate | — | α-only. Privacy / cost-sensitive use. The PreToolUse hook denies mcp__codex__codex with a structured reason. | | synergy | Follow v9 guidance (Quick-Ref 3-Q tree + R1–R6 recipes) | default for new installs + silent migration default | 90% sweet spot. Codex fires only when matched recipes apply (adversarial review, sweet-spot partial-fail, hard reasoning). | | auto | Heuristic signal detection + optional Tier 2 LLM probe | for power users | Tier 1: heuristic regex over prompt. Tier 2 (--auto-tier2-llm-probe=on): $0.01–0.02 Codex meta-classifier on ambiguous tasks. | | max | Quality-first bounded automation | for critical tasks | R1 default ON; R5 always probe; γ hot-swap auto on P5 catastrophe. Hard DO-NOT rules still enforced — max ≠ override. |

Flags: --usage-mode=none|synergy|auto|max and (auto-only) --auto-tier2-llm-probe=on|off.

Hard guardrails (every mode) — surfaced as {{guardrail*}} placeholders in every SKILL.md:

chainJsonTrap: hard-block (avoid Chain-JSON Trap, route to R6 Format-Safe Handoff)
subagentStrict: hard-block (don't ask subagent for strict-JSON when adversarial output is also needed)
turnBurn: 3-turn-stop (stop multi-turn followups at turn 3 unless new context arrived)
ceilingNoUpside: warn-and-skip (when α is ≈100% already, β can only equal it)

Detailed spec + decision tree: docs/usage-mode-config.md. Quick decision card: docs/guidance-quick-ref.md. Recipe details: docs/synergy-playbook.md.

Existing installs (pre-0.5.0) migrate silently to synergy on the next npx --yes codex-on-claude@latest — no prompt, no behavior change. Run codex-on-claude reconfigure if you want to surface the new question.

v0.5.0 hardening (post-L6 review — 14 follow-up fixes)

Beyond the headline 4-mode policy, v0.5.0 ships a comprehensive set of fixes from a Codex peer review pass + adversarial security review. User-visible changes:

PreToolUse gate is race-free. The hook command embeds --enforce-mode=none at install time so the gate decision never depends on a separate config.json read. Mode toggles do not race in-flight calls.
Gate covers Bash CLI bypasses. mode=none blocks not only mcp__codex__codex MCP calls but also any Bash invocation whose command contains codex exec, codex-on-claude threads resume, npx ...codex..., or path-qualified codex binaries. Wildcard regex (mcp__codex__.*) handles current + future MCP tool variants. Case-insensitive matching.
Gate fails CLOSED for Codex-shaped tools on corrupt state. When config.json is unreadable or the hook payload is malformed AND the tool looks Codex-shaped, the gate denies (was: fail-OPEN). Non-Codex tools still fail-open to avoid breaking unrelated calls.
Gate state reconciles automatically. Every reconfigure inspects ~/.claude/settings.json itself (not just config.json's claim) and removes orphan gate entries from manual edits / stale state.
Hook decision JSON emitted in BOTH legacy and new shapes. {decision, reason} + hookSpecificOutput.permissionDecision so the gate works against current Claude Code 2.1.x and any future version that drops legacy support. Hard-denies additionally exit(2) with stderr backstop.
State directories are 0700. Logs (which may capture prompt sizes / thread IDs) + thread catalog + improvements are now restricted to user-only access. Best-effort on filesystems that ignore chmod.
Atomic config writes. config.json, settings.json, and thread catalog files are written via temp+rename; concurrent readers (analyzer, status, gate) never see a torn JSON document.
CLI parsing hardening. --usage-mode max (space-separated, common mistake) now errors with a hint to use --usage-mode=max. Use = to assign flag values; positional args are rejected against an explicit known-subcommand allowlist.
Silent-fill info line. Upgrading users see usageMode: silent default 'synergy' applied for upgrade on auto-detected reconfigures (npx upgrade path). Explicit reconfigure still surfaces the §7 prompt.
auto mode classifier sharper. Tier 1 heuristics now handle: chain step variants (numbered / lettered / first-then-finally / inline "Step N:"); mixed adversarial+style prompts ("find security bugs and naming issues" keeps adversarial signal); unquoted field-list syntax ("Return fields: status, risk, file, line"); "table" only with output-intent context; plural forms ("race conditions", "failing tests"); excluded "edge cases" alone from TDD signal.
Every log row carries usageMode. usage-*.jsonl entries include the active mode at log time, powering the ruleUsageModeDrift analyzer rule.
ruleUsageModeDrift filters by config.updatedAt. Logs from before the last reconfigure no longer trigger false-positive drift right after a mode switch.

Full details: docs/release-notes-0.5.0.md + docs/security-review-0.5.0.md.

Final verification (post 5 Codex peer review cycles): 168 automated test cases (112 unit + 37 integration + 17 installer-flow + 2 regression with 7 internal cases) all PASS. npm tarball 115 kB / 32 files (99% reduction from initial build via explicit files allowlist + .npmignore). 28 total fixes applied pre-ship across 5 review cycles (F1-F8 → G1-G7 → H1-H7 → B1-B6 → H1-H4 Bash precision → M1-M3 → A1-A4 — see CHANGELOG.md for the full breakdown). Run all tests yourself: bash install/fixtures/v05/installer-flow/run-flow.sh && node install/fixtures/v05/unit/run-units.mjs && ....

Cerberus mode (v0.5.5, opt-in)

When the planning step matters more than execution speed, enable Cerberus Head mode:

codex-on-claude reconfigure --cerberus=on --yes
# Then in a Claude Code session:
/cerberus head "<task description>"

How it works. /cerberus head spawns three independent planners in parallel — cerberus-h1-claude-only (no Codex), cerberus-h2-codex-only (delegates fully to Codex CLI), cerberus-h3-synergy (Claude + Codex consultation) — then merges their plans via a deterministic algorithm. Topics are extracted from each plan's markdown sections (Decision / Reasons / Risks / Steps), grouped by Jaccard similarity over Porter-stemmed tokens, and classified into one of four cases: case 1 (3-head merge), case 2 (3-head tournament with h3 tie-break + head-weight ladder), case 3 (2-head agreement + 1 missing), case 4 (single-head, validated / disputed / minority / missing buckets). Output is a single consensus plan plus an agreement_score (0–1) and label (low / moderate / high). Cost: ~2–3× a single planner (typically 20–50k tokens total). Plan-only — execute and verify phases ship later.

Hardening since v0.5.1.

| Version | Change | Why it matters | |---------|--------|----------------| | v0.5.1 | Initial 3-head + deterministic merge | PoC | | v0.5.2 | Porter Stemmer + nonce challenge + case-4 conservative partial credit | Paraphrase absorption (deterministic/deterministically → same stem) and orchestration-layer guard against fake plans (each head plan must end with cerberus-nonce: <value> matching init's nonce; consensus rejects mismatches) | | v0.5.3 | Polarity tracking (detectPolarity blocks use cache (+) merging with do not use cache (-)) + empty-token guard (no false-merge on Korean / single-char bullets) + minority dissent bucket + Porter isV(s, -1) base case fix | Closes 4 critical bugs surfaced by the n=2 self-review | | v0.5.4 | Disputed render fix — *(disputed — opposing polarity)* for case-4 polarity-split entries (no more *(lost to ?)*) | MEDIUM #2 surfaced in Opus 4.7 re-test: 6 user-facing ? outputs across 4 runs | | v0.5.5 | Decision multiplier soft-curve (case-2 → 1.2×, was binary cliff to 1.0×) + contrast conjunction polarity (but/however/although/despite/except with but also/even/too false-positive guard) + cerberusConfig seed/reader (dead path in server fixed) + HU-33 plan-level lock | MEDIUM #1 + #3 + LOW + HU-33 from v0.5.4 plan Phase 2 |

Test coverage. 102 cerberus-specific unit + integration tests (consensus 15 + install 13 + config 12 + stemming 14 + stemming-adversarial 7 + nonce 10 + score-formula 10 + v053-fixes 19 + e2e 4).

Per-machine tuning. When cerberus=on, the installer seeds choices.cerberusConfig: {} in ~/.claude/codex-on-claude/config.json. Override any of headWeights / jaccardGroupThreshold / bodyMergeThreshold / decisionMultiplier / decisionPartialMultiplier / costCapTokens to taste — the server reads these on every consensus call and merges with DEFAULTS. Empty seed keeps shipped defaults.

Known backlog (v0.5.6+). Cerberus Full mode (FU-01~10 + FC-02~05, 14 PENDING-IMPL scenarios: 3-worktree execute + verify + iteration loop) — separate minor release. Embedding-based similarity to bridge Porter Stemmer paraphrase miss (opt-in LLM-judge) — v0.5.6+ candidate.

Disable any time with codex-on-claude reconfigure --cerberus=off --yes. Spec: docs/cerberus-mode-spec.md (Draft 5). PoC walkthrough: docs/cerberus-poc-2026-05-22.md. Re-test results: docs/test-execution-results-cerberus-v0.5.3-opus47.md.

Operational guidance (v0.5.0)

The v5–v9 analysis series in docs/ captures 771 runs across 217 scenarios (~$100–115 in measured external cost). Key references for everyday decisions:

| Read this when | File | |---|---| | 30-second decision card for one task | docs/guidance-quick-ref.md | | Detailed comparison (Claude α / Codex γ / β orchestration) | docs/guidance-comparison.md | | Practical recipes (R1 Adversarial / R2 Self-review / R3 reasoning=high / R4 γ hot-swap / R5 cheap β trial / R6 Format-Safe Handoff) | docs/synergy-playbook.md | | Mode configuration spec (what every mode does, per-skill activation matrix) | docs/usage-mode-config.md | | Cross-domain validation results (post-v8 sweep) | docs/v8-validation-sweep.md | | Why these recommendations changed in v9 | docs/v9-adjustment-spec.md |

What you get

Skills under `~/.claude/skills/codex-*/SKILL.md`

codex-review — one-shot read-only review of the current diff / files
codex-followup — continue the same thread via mcp__codex__codex-reply
codex-resume — CLI fallback when MCP session is lost (auto-detects silent-new-session)
codex-fix — file edits under a strict allowlist (workspace-write)
codex-routine — templated recurring job
codex-analyze — analyze the usage log and surface improvement candidates (improvementLoop ≠ off)
codex-improve — apply / reject specific candidates (improvementLoop ≠ off)
codex-log — append usage log entries (improvementLoop ≠ off; redundant when hooks handle it)
codex-threads — browse / annotate / resume the persistent catalog (threads ≠ off)

Agent under `~/.claude/agents/codex-reviewer.md`

Isolates large Codex responses and returns only a 4-line summary to the main session. Installed when contextPolicy ∈ {summarize, mixed}.

Plugin bundle

Deprecated since 0.3.0. Existing 0.2.x bundles are left in place — uninstall does not remove them; a one-line Manual: rm -rf … hint prints.

Operating flow

┌─────────────────────────────────────────────────────────────────┐
│ Claude Code session                                             │
│                                                                 │
│   User invokes /codex-review (or natural language match)        │
│            │                                                    │
│            ▼                                                    │
│   Skill follows its MUST procedure → mcp__codex__codex          │
│            │                                                    │
│            ▼                                                    │
│   ┌─── MCP transport (codex mcp-server) ──────────────────┐    │
│   │   → Codex model call                                   │    │
│   │   ← threadId + content                                 │    │
│   └────────────────────────────────────────────────────────┘    │
│            │                                                    │
│            ▼                                                    │
│   Response lands in main (or in codex-reviewer subagent)        │
│            │                                                    │
│            ▼                                                    │
│   Skill writes catalog entry + log line                         │
│   (improvementLoop=auto-on-skill: PostToolUse hook does logging)│
└─────────────────────────────────────────────────────────────────┘

Periodically — or on demand:

   codex-on-claude analyze
       │
       ▼
   reads logs/threads, ranks improvement candidates
       │
       ▼
   /codex-improve walks the user through Apply / Reject / Skip
       │
       ▼
   decisions persisted under ~/.claude/codex-on-claude/improvements/

CLI reference

codex-on-claude               Install interactively (preflight included)
codex-on-claude reconfigure   Reconfigure (previous answers as defaults; review/edit/cancel at the end)
codex-on-claude doctor        Run preflight only (Node/codex/claude/MCP checks)
codex-on-claude status        Show install state
codex-on-claude uninstall     Remove installed components (Skills, Agent, hooks)
codex-on-claude analyze       Analyze usage log and surface improvement candidates
codex-on-claude suggest       Record an apply/reject decision for a candidate
codex-on-claude log           Append a usage log entry (use --from-stdin for hook payloads)
codex-on-claude threads ...   Persistent thread catalog (see below)
codex-on-claude help          Print help

Common combinations:

codex-on-claude status

codex-on-claude reconfigure --improvement-loop=auto-on-skill --yes   # turn on hook logging
codex-on-claude reconfigure --threads=full --yes                     # enable full thread context

codex-on-claude analyze --days=7 --format=markdown --save            # write a snapshot report
codex-on-claude suggest --apply=2                                    # adopt candidate #2
codex-on-claude suggest --reject=2 --reason="intentional pattern"

codex-on-claude uninstall

`threads` subcommand reference (v0.2+)

codex-on-claude threads list [--status=active|resolved|archived] [--tag=...] [--since=7d] [--skill=...]
codex-on-claude threads latest [--format=id|json]                       # deterministic latest thread (great for shell automation)
codex-on-claude threads show <threadId>
codex-on-claude threads new  <threadId> --title="..." --tags=a,b --skill=codex-review --cwd="$PWD" --sandbox=read-only
codex-on-claude threads goal     <threadId> "Verify naming consistency"
codex-on-claude threads outcome  <threadId> "Codex flagged 3 issues"
codex-on-claude threads decision <threadId> "Adopt suggestion 2"
codex-on-claude threads note     <threadId> "arbitrary memo"
codex-on-claude threads incident <threadId> --issue=session-not-found --resolution="codex exec resume" --outcome=recovered
codex-on-claude threads tag      <threadId> --add=critical --remove=draft
codex-on-claude threads status   <threadId> resolved
codex-on-claude threads fallback <threadId> auto-resume
codex-on-claude threads search   "react refactor"
codex-on-claude threads resume   <threadId> "follow-up prompt"
codex-on-claude threads remove   <threadId>

threads resume <id> "prompt" reads the stored fallbackStrategy:

auto-resume — calls codex exec resume and automatically detects silent-new-session (when codex returns a different thread_id), filing an incident on the original thread and registering the new one as a bifurcation.
ask — prints the thread metadata + recent summaries so the user picks the next step.
new — guides starting a fresh mcp__codex__codex call with the prior summaries as a preamble.

Verification

1. Preflight + MCP

codex-on-claude doctor

Expected: Node.js …, Codex CLI …, Claude Code …, codex doctor: ok, Claude auth: Login method: …, codex MCP server: ✓ Connected.

2. End-to-end MCP call (a fresh `claude -p`)

claude -p --model haiku --verbose --output-format stream-json \
  --permission-mode dontAsk \
  --allowedTools=mcp__codex__codex \
  "Use the Codex MCP tool to ask Codex: Return exactly OK. Then return Codex's exact answer."

3. Skill in a real session

Start a fresh Claude Code session and type:

/codex-review evaluate this README in two sentences.

Codex should answer; the response ends with a deterministic Thread: <id> line (per the Skill's MUST procedure).

4. Analyze cycle

codex-on-claude log --skill=codex-review --sandbox=read-only --outcome=ok \
  --prompt-chars=120 --response-chars=600 --elapsed-ms=2100
codex-on-claude analyze --days=1

Operational guidance

Default sandbox is read-only. Switch to workspace-write only when files genuinely need edits (/codex-fix enforces an allowlist). danger-full-access is not used in this workflow.
Pass context explicitly. Claude and Codex don't share hidden context — when you want Codex to know about a goal / file / constraint, name it in the prompt.
threadId discipline. The threadId itself is a public-safe identifier. Within the same MCP server process, prefer /codex-followup; on Session not found, switch to /codex-resume or codex-on-claude threads resume.
401 from Claude. If claude auth status looks fine but the model returns 401 Invalid authentication credentials, re-run claude auth login --claudeai --email <you>.
Shrinking install. Running reconfigure with fewer --patterns removes the unselected Skills automatically. Thread catalog data and improvement decisions are kept even when their Skills are removed.

Privacy policy

All logs / catalog / improvement decisions stay on this machine. No outbound network calls beyond the Codex/Claude APIs themselves.
Log entries contain only metadata: timestamp, Skill name, sandbox mode, prompt/response lengths, threadId, outcome code.
Prompt / response bodies are never written to disk.
~/.claude/codex-on-claude/ is created with restrictive permissions.
Opt out at any time: codex-on-claude reconfigure --improvement-loop=off (also removes hook entries) and codex-on-claude reconfigure --threads=off.
Full wipe: codex-on-claude uninstall or rm -rf ~/.claude/codex-on-claude/.

Directory layout

codex-on-claude/
├── README.md            (this file)
├── README.ko.md         Korean mirror
├── CHANGELOG.md
├── LICENSE
├── package.json         npm metadata (bin: codex-on-claude)
├── install/
│   ├── install.mjs      main installer / reconfigurer / analyze CLI
│   ├── analyze.mjs      usage log analyzer + improvement candidate rules
│   ├── threads.mjs      persistent thread catalog CRUD
│   ├── hooks.mjs        ~/.claude/settings.json PostToolUse merger
│   ├── manifest.json    option ↔ component mapping
│   └── components/
│       ├── agents/codex-reviewer.md
│       └── skills/codex-{review,followup,resume,fix,routine,analyze,improve,log,threads}/SKILL.md
└── docs/
    ├── implementation-log.md   English summary
    ├── test-report-2026-05-20.md   English summary
    └── ko/             Korean originals (history)

After install (on a user machine):

~/.claude/
├── skills/codex-*/SKILL.md                       (per selected patterns + improvement loop + threads)
├── agents/codex-reviewer.md                      (contextPolicy ∈ {summarize, mixed})
├── settings.json                                 (PostToolUse hooks merged in when improvementLoop ∈ {auto-on-skill, periodic})
└── codex-on-claude/
    ├── config.json                               selections from last install/reconfigure
    ├── logs/usage-YYYY-MM-DD.jsonl
    ├── reports/                                  analyze --save outputs
    ├── improvements/                             apply/reject decisions
    └── threads/<threadId>.json + index.json      persistent thread catalog

Troubleshooting

`codex-on-claude: command not found` right after `npm update -g` / `npm install -g`

Stale shell command hash — npm replaced the binary file on disk, but your current zsh/bash session is still caching the old path. Fix:

hash -r        # zsh / bash: clear cached command lookups
# or: rehash   # zsh-specific
# or open a new terminal
codex-on-claude --version

codex-on-claude doctor (v0.3.3+) auto-detects this case and prints the same hint.

`npx codex-on-claude` fails with `ENOENT … _npx/<hash>/package.json`

Corrupted npx per-package cache, usually from a previously interrupted run. Force a fresh download:

npx --yes codex-on-claude@latest   # `@latest` cache-busts the per-package directory
# still broken? nuke npx's cache entirely:
rm -rf ~/.npm/_npx

The canonical install command in this README (npx --yes codex-on-claude@latest) avoids this class of failure because @latest forces npx to revalidate the package version on every run.

`claude mcp get codex` reports "Failed to connect"

Often just a sandboxed shell. Re-run in a normal terminal:

claude mcp get codex
claude mcp list

`Session not found for thread_id`

The MCP server restarted. Fallback:

codex exec resume --skip-git-repo-check --json <threadId> "<follow-up prompt>"

Or, inside Claude Code: /codex-resume. The codex-on-claude threads resume <id> "..." subcommand wraps this and auto-detects the silent-new-session case (where Codex returns a different thread_id than requested).

`401 Invalid authentication credentials`

Even when claude auth status looks healthy:

claude auth login --claudeai --email <your-email>

Skill not detected in a new session

Restart Claude Code. Skills are loaded once at session start; install / reconfigure / uninstall only affect future sessions (with the rare exception of hot-load when the harness happens to refresh).

Reconfigure removed something I wanted

Re-run with the full set:

codex-on-claude reconfigure
# step through with arrow keys, the previous answers are pre-selected

The final review screen lets you Edit again before saving.

`codex-on-claude threads latest --format=id` includes the CLI banner / ANSI escape codes

Fixed in v0.3.4 — the top-level codex-on-claude vX.Y.Z banner now goes to stderr, so stdout for --format=id|json data subcommands is clean:

codex-on-claude threads latest --format=id            # stdout: bare UUID (or empty)
codex-on-claude threads latest --format=id 2>/dev/null  # silences banner entirely

If you're stuck on ≤ 0.3.3 and need to scrape stdout, strip ANSI + grep UUIDs:

codex-on-claude threads latest --format=id \
  | sed 's/\x1b\[[0-9;]*m//g' \
  | grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' \
  | head -1

Hook keeps logging after I set `improvementLoop=off` in `config.json` (without reconfigure)

Fixed in v0.3.4 — the auto-on-skill hook (codex-on-claude log --from-stdin) now reads ~/.claude/codex-on-claude/config.json on each call and silently no-ops when improvementLoop is off or manual. No reconfigure step required for the gate to take effect.

If your hook command path is stale (still points at the npm-cached binary from before the fix), reinstall + reconfigure to refresh:

npm install -g codex-on-claude@latest
codex-on-claude reconfigure --yes

Manual /codex-log invocations are unaffected by the guard — explicit logging always works.

Mixed-ownership PostToolUse group lost my user hook on uninstall (≤ 0.3.3)

Fixed in v0.3.4. If you manually nested a _coc.marker entry inside a group also containing your own hook, the previous code removed the whole group via .some() semantics. The new stripOursFromGroups operates at the hook level — your hooks are preserved, only marked entries are removed. The installer never produces mixed groups itself; this protects you only against manual settings.json edits.

Contributing / license

Issues / PRs welcome: https://github.com/pathcosmos/codex-on-claude
License: MIT

History

CHANGELOG.md — release notes
docs/implementation-log.md — English summary of the original 2026-05-19 build session (Korean original)
docs/test-report-2026-05-20.md — English summary of the 2026-05-20 verification run (Korean original)
docs/test-scenarios-codex-calls.md — 0.3.3-era comprehensive test scenario document (~43 scenarios × 8 groups covering MCP transport, 9 Skills, codex-reviewer Agent, 14 threads subcommands, hooks, analyzer rules, full E2E). Each scenario has Given/Setup/Command/Expected/Notes plus deterministic fixtures at install/fixtures/analyze-rules/.
docs/test-execution-results-2026-05-20.md — execution log of the 0.3.4 cycle: Phase A (hooks, 10/10 PASS), Phase B (resume + SILENT_NEW_SESSION, 7/7 PASS after malformed-id correction), Codex-call ledger (27 calls across the session), bug discovery via meta-audit, synergy measurement (findings/$, findings/min, PRE/POST ratio).

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

codex-on-claude

TL;DR

Why this exists

Prerequisites

Install

A. via npx (recommended)

B. global install

C. from source

D. non-interactive (CI / scripted)

Updating codex-on-claude

The install questions

1. patterns (multi-select)

2. contextPolicy (single)

3. improvementLoop (single)

4. threads (single, v0.2+)

5. subscription tiers (v0.4.1)

6. model / reasoning — primary + fallback (v0.4.1)

7. usage-mode — Codex invocation policy (v0.5.0)

v0.5.0 hardening (post-L6 review — 14 follow-up fixes)

Cerberus mode (v0.5.5, opt-in)

Operational guidance (v0.5.0)

What you get

Skills under ~/.claude/skills/codex-*/SKILL.md

Agent under ~/.claude/agents/codex-reviewer.md

Plugin bundle

Operating flow

CLI reference

threads subcommand reference (v0.2+)

Verification

1. Preflight + MCP

2. End-to-end MCP call (a fresh claude -p)

3. Skill in a real session

4. Analyze cycle

Operational guidance

Privacy policy

Directory layout

Troubleshooting

codex-on-claude: command not found right after npm update -g / npm install -g

npx codex-on-claude fails with ENOENT … _npx/<hash>/package.json

claude mcp get codex reports "Failed to connect"

Session not found for thread_id

401 Invalid authentication credentials

Skill not detected in a new session

Reconfigure removed something I wanted

codex-on-claude threads latest --format=id includes the CLI banner / ANSI escape codes

Hook keeps logging after I set improvementLoop=off in config.json (without reconfigure)

Mixed-ownership PostToolUse group lost my user hook on uninstall (≤ 0.3.3)

Contributing / license

History

Skills under `~/.claude/skills/codex-*/SKILL.md`

Agent under `~/.claude/agents/codex-reviewer.md`

`threads` subcommand reference (v0.2+)

2. End-to-end MCP call (a fresh `claude -p`)

`codex-on-claude: command not found` right after `npm update -g` / `npm install -g`

`npx codex-on-claude` fails with `ENOENT … _npx/<hash>/package.json`

`claude mcp get codex` reports "Failed to connect"

`Session not found for thread_id`

`401 Invalid authentication credentials`

`codex-on-claude threads latest --format=id` includes the CLI banner / ANSI escape codes

Hook keeps logging after I set `improvementLoop=off` in `config.json` (without reconfigure)