weave-codex
v0.1.0
W&B Weave observability for OpenAI Codex CLI — reconstructs gen_ai OpenTelemetry spans from Codex rollout sessions and ships them to Weave, fully off the critical path.
W&B Weave observability for the OpenAI Codex CLI. It reconstructs gen_ai
OpenTelemetry spans for every Codex turn (model calls, token usage, tool calls,
reasoning) and ships them to Weave, entirely off Codex's critical path. It is
modeled after W&B's weave-claude-code and is its Codex equivalent.
How it works
Codex emits OTEL today, but not a clean gen_ai span tree. Rather than parse
that, weave-codex reconstructs spans from Codex's own rollout session files
(~/.codex/sessions/**/rollout-*.jsonl), which contain everything: per-turn
model, the full message/tool stream, and per-model-call token usage.
Codex turn ends
└─ Stop hook (fire-and-forget shell shim, ~ms, returns immediately)
   └─ detached worker (off Codex's critical path)
      ├─ read new rollout lines since a per-session cursor
      ├─ reconstruct the turn → spans:
      │    invoke_agent codex          (the turn)
      │    ├─ chat <model>             (one per model call: usage, output)
      │    └─ execute_tool <name>      (one per tool: args, result)
      └─ export via OTLP → trace.wandb.ai/agents/otel/v1/traces

- Non-blocking by construction. The Stop hook only spawns a detached process and exits 0 (printing nothing), so Codex never waits on the network.
- One trace per turn. Turns are stitched into a conversation server-side via gen_ai.conversation.id (the Codex session id) on every span.
- Accurate timelines. Span start/end times are backdated from the rollout timestamps, so durations reflect what actually happened.
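The diagram's "read new rollout lines since a per-session cursor" step can be sketched as follows. This is a minimal illustration, not weave-codex's actual code. It assumes the cursor is a small JSON file holding a byte offset, which is a hypothetical format. It also leaves any trailing line without a newline for the next run, on the assumption that Codex may still be writing it. The names `readNewRolloutLines` and `loadCursor` are hypothetical.

```typescript
import { closeSync, existsSync, openSync, readFileSync, readSync, statSync, writeFileSync } from "node:fs";

// Hypothetical cursor file format: {"offset": <bytes already consumed>}.
interface Cursor {
  offset: number;
}

function loadCursor(cursorPath: string): Cursor {
  if (!existsSync(cursorPath)) return { offset: 0 };
  return JSON.parse(readFileSync(cursorPath, "utf8")) as Cursor;
}

// Returns the JSONL records appended since the last call and advances the cursor.
// A trailing line without "\n" is treated as still being written and left for later.
function readNewRolloutLines(rolloutPath: string, cursorPath: string): unknown[] {
  const { offset } = loadCursor(cursorPath);
  const size = statSync(rolloutPath).size;
  if (size <= offset) return [];

  // Read only the bytes past the cursor.
  const buf = Buffer.alloc(size - offset);
  const fd = openSync(rolloutPath, "r");
  try {
    readSync(fd, buf, 0, buf.length, offset);
  } finally {
    closeSync(fd);
  }

  // Consume up to the last complete line only (cutting at "\n" keeps UTF-8 intact).
  const lastNewline = buf.lastIndexOf(0x0a);
  if (lastNewline === -1) return [];
  const complete = buf.subarray(0, lastNewline + 1).toString("utf8");

  const records = complete
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as unknown);

  // Persist the new offset so the next Stop hook run starts where this one ended.
  writeFileSync(cursorPath, JSON.stringify({ offset: offset + lastNewline + 1 }));
  return records;
}
```

Storing a byte offset rather than a line count lets each run seek straight to the unread tail. This keeps the per-turn cost independent of how long the session has grown.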
Requirements
- Node.js ≥ 20
- A recent OpenAI Codex CLI release that includes the hooks system
- A W&B account (WANDB_API_KEY) and a Weave project
Quick start
npm install -g weave-codex
# Credentials (precedence: env > settings.json > netrc)
wandb login # or: export WANDB_API_KEY=...
export WEAVE_PROJECT="entity/project"
weave-codex install   # merges a Stop hook into ~/.codex/hooks.json

One-time trust step: Codex marks newly added hooks as untrusted and will not run
them until you approve them. On your next codex launch, approve the weave-codex
hook when prompted (or set bypass_hook_trust = true in ~/.codex/config.toml).
Run weave-codex status to confirm everything resolves.
That's it: run Codex normally. Each completed turn appears in Weave within about a second.
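`weave-codex install` merges a Stop hook into ~/.codex/hooks.json, and `uninstall` removes only its own entries. The sketch below shows that idempotent merge. The `HooksFile` shape, the `Stop` key layout, the shim path constant, and the helper names are hypothetical stand-ins, not Codex's documented hooks.json schema. Only the merge and removal logic is the point.

```typescript
// Hypothetical shape of ~/.codex/hooks.json; NOT Codex's documented schema.
interface HookEntry {
  command: string[];
}
interface HooksFile {
  hooks?: Record<string, HookEntry[]>;
}

// Hypothetical marker used to recognize the entries this tool owns.
const SHIM_PATH = "/Users/you/.weave-codex/stop-hook.sh";

function isOurs(entry: HookEntry): boolean {
  return entry.command.includes(SHIM_PATH);
}

// Adds our Stop hook once; running install again leaves the file unchanged.
function mergeStopHook(file: HooksFile): HooksFile {
  const hooks = { ...(file.hooks ?? {}) };
  const stop = hooks.Stop ?? [];
  if (!stop.some(isOurs)) {
    hooks.Stop = [...stop, { command: ["sh", SHIM_PATH] }];
  }
  return { ...file, hooks };
}

// Removes only our entries, preserving hooks installed by anything else.
function removeStopHook(file: HooksFile): HooksFile {
  const hooks = { ...(file.hooks ?? {}) };
  const remaining = (hooks.Stop ?? []).filter((e) => !isOurs(e));
  if (remaining.length > 0) hooks.Stop = remaining;
  else delete hooks.Stop;
  return { ...file, hooks };
}
```

Both functions return new objects instead of mutating their input. This makes it safe to diff the result against the file on disk before writing it back.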
Configuration
| Setting | Env var | settings.json key | Default |
| --------------- | ----------------------------- | ------------------- | ------------------------------ |
| W&B API key | WANDB_API_KEY | wandb_api_key | from wandb login (netrc) |
| Weave project | WEAVE_PROJECT | weave_project | — (required, entity/project) |
| Base URL | WANDB_BASE_URL | wandb_base_url | https://trace.wandb.ai |
| Capture content | WEAVE_CODEX_CAPTURE_CONTENT | capture_content | true |
| Debug logging | WEAVE_CODEX_DEBUG | debug | off (errors always log) |
State lives in ~/.weave-codex/ (settings, the hook shim, per-session cursors,
and logs/collector.log). Set WANDB_BASE_URL to point at a dedicated or
self-hosted W&B instance.
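The credential precedence from Quick start (env > settings.json > netrc) reduces to a first-defined-wins lookup. This sketch makes three assumptions:

- `wandb login` stores the key as the `password` of a netrc `machine api.wandb.ai` entry.
- Empty strings count as unset.
- A deliberately minimal netrc reader (no quoting or `macdef` support) is enough.

The function names are hypothetical.

```typescript
// Subset of ~/.weave-codex/settings.json keys from the Configuration table.
interface Settings {
  wandb_api_key?: string;
  weave_project?: string;
}

// Minimal .netrc reader: returns the password for `host`, or undefined.
// Handles whitespace-separated tokens only (no quoting, no macdef).
function netrcPassword(netrcText: string, host: string): string | undefined {
  const tokens = netrcText.split(/\s+/).filter(Boolean);
  let inHost = false;
  for (let i = 0; i < tokens.length - 1; i++) {
    const tok = tokens[i];
    if (tok === "machine") {
      inHost = tokens[i + 1] === host;
      i++; // skip the machine name
    } else if (tok === "default") {
      inHost = false;
    } else if (inHost && tok === "password") {
      return tokens[i + 1];
    }
  }
  return undefined;
}

// Precedence from the README: env > settings.json > netrc. Empty strings count as unset.
function resolveApiKey(
  env: Record<string, string | undefined>,
  settings: Settings,
  netrcText: string | undefined,
  host = "api.wandb.ai", // assumption: the host entry `wandb login` writes for W&B cloud
): string | undefined {
  return (
    env.WANDB_API_KEY ||
    settings.wandb_api_key ||
    (netrcText ? netrcPassword(netrcText, host) : undefined)
  );
}
```

The same first-defined-wins pattern would apply to the other rows of the table (for example `WEAVE_PROJECT` over `weave_project`), with the documented defaults as the last fallback.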
Data disclosure
By default weave-codex captures span content: your prompts, the model's
responses and reasoning, tool-call arguments, and tool results (which include
shell commands, command output, and file contents). This data is sent to your
W&B/Weave instance. PII scrubbing and redaction are not implemented.
To send only structure, token usage, model, and timing (no prompts, code, or output), disable content capture:
export WEAVE_CODEX_CAPTURE_CONTENT=0
The structural and metric spans are always emitted regardless of this setting.
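One way to honor this setting is to drop content-bearing attributes at emit time and leave structural ones alone. In the sketch below, two things are assumptions:

- The attribute names treated as content.
- The set of values that count as "off" (the README only documents `0`).

The helpers `captureContentEnabled` and `applyCapturePolicy` are hypothetical.

```typescript
type Attributes = Record<string, string | number | boolean>;

// Assumption: which values count as "off". The README only documents `0`.
function captureContentEnabled(raw: string | undefined): boolean {
  if (raw === undefined) return true; // default: capture content
  return !["0", "false", "off", "no"].includes(raw.trim().toLowerCase());
}

// Assumed names of content-bearing attributes; everything else
// (token counts, model, ids, timing) is structural and always kept.
const CONTENT_KEYS = new Set([
  "gen_ai.input.messages",
  "gen_ai.output.messages",
  "gen_ai.tool.call.arguments",
  "gen_ai.tool.call.result",
]);

function applyCapturePolicy(attrs: Attributes, capture: boolean): Attributes {
  if (capture) return attrs;
  return Object.fromEntries(Object.entries(attrs).filter(([key]) => !CONTENT_KEYS.has(key)));
}
```

Filtering at a single choke point before export means no per-span-type code path can accidentally leak content when capture is off.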
What's captured
- invoke_agent codex: turn root; agent name/version, conversation id, model, summed token usage, and (if enabled) the user prompt and final answer.
- chat <model>: one per model call; gen_ai.usage.* (input/output/cached/reasoning) tokens, finish reason, server.address, and the assistant output (text, reasoning, tool-call parts).
- execute_tool <name>: one per tool (shell, apply_patch, web search, MCP, functions); gen_ai.tool.* id/arguments/result; MCP calls also carry mcp.server.name.
All attributes follow the OpenTelemetry GenAI semantic conventions that Weave's Agents ingestion reads, so traces also render in generic OTEL backends.
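The turn-to-span mapping above can be sketched as a pure function that builds a span tree before anything is exported. The `TurnEvent` input is a hypothetical, simplified view of one turn's rollout records, not Codex's real JSONL schema. Only a subset of attributes is shown, and `reconstructTurn` is a hypothetical name.

The sketch shows three behaviors the README describes:

- Usage is summed onto the root span.
- `gen_ai.conversation.id` appears on every span.
- Start/end times come from rollout timestamps, not from the wall-clock time at export.

```typescript
// Hypothetical, simplified view of one turn's rollout events (NOT Codex's schema).
type TurnEvent =
  | { kind: "model_call"; start: string; end: string; inputTokens: number; outputTokens: number }
  | { kind: "tool_call"; start: string; end: string; name: string; callId: string };

interface SpanSpec {
  name: string;
  startTimeMs: number; // backdated from rollout timestamps
  endTimeMs: number;
  attributes: Record<string, string | number>;
  children: SpanSpec[];
}

function reconstructTurn(sessionId: string, model: string, events: TurnEvent[]): SpanSpec {
  if (events.length === 0) throw new Error("turn has no events");
  const ms = (iso: string) => Date.parse(iso);
  // Every span carries the session id so turns can be stitched into a conversation.
  const common = { "gen_ai.conversation.id": sessionId };
  let inputTokens = 0;
  let outputTokens = 0;

  const children: SpanSpec[] = events.map((e) => {
    if (e.kind === "model_call") {
      inputTokens += e.inputTokens;
      outputTokens += e.outputTokens;
      return {
        name: `chat ${model}`,
        startTimeMs: ms(e.start),
        endTimeMs: ms(e.end),
        attributes: {
          ...common,
          "gen_ai.operation.name": "chat",
          "gen_ai.request.model": model,
          "gen_ai.usage.input_tokens": e.inputTokens,
          "gen_ai.usage.output_tokens": e.outputTokens,
        },
        children: [],
      };
    }
    return {
      name: `execute_tool ${e.name}`,
      startTimeMs: ms(e.start),
      endTimeMs: ms(e.end),
      attributes: {
        ...common,
        "gen_ai.operation.name": "execute_tool",
        "gen_ai.tool.name": e.name,
        "gen_ai.tool.call.id": e.callId,
      },
      children: [],
    };
  });

  // Assumption: the root spans from the earliest child start to the latest child end.
  return {
    name: "invoke_agent codex",
    startTimeMs: Math.min(...children.map((c) => c.startTimeMs)),
    endTimeMs: Math.max(...children.map((c) => c.endTimeMs)),
    attributes: {
      ...common,
      "gen_ai.operation.name": "invoke_agent",
      "gen_ai.agent.name": "codex",
      "gen_ai.request.model": model,
      "gen_ai.usage.input_tokens": inputTokens, // summed over all model calls
      "gen_ai.usage.output_tokens": outputTokens,
    },
    children,
  };
}
```

A backdating emitter would then hand each `startTimeMs`/`endTimeMs` pair to its tracer as explicit start and end times instead of letting spans default to "now".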
Scope & limitations (v0.1)
- Modes: interactive codex (TUI) and codex exec. The codex mcp and app-server modes are not covered (they fire no hooks).
- Subagents: a spawned subagent shows up as its spawn_agent tool call; nesting the subagent's own model/tool calls is not yet implemented (Codex writes those to a separate rollout file).
- Aborted turns: the Stop hook does not fire on aborted or errored turns, so those are not captured.
- Hook-locked orgs: if allow_managed_hooks_only is set, use a notify program as the trigger instead (see below).
- Backdating shim: the span emitter currently uses a thin OpenTelemetry pipeline (src/spans/tracer.ts) because the Weave TS SDK's GenAI helpers can't yet set historical timestamps. Once the SDK gains startedAt/endedAt, the emitter moves to weave.startTurn/startLLM/startTool.
notify fallback
In environments where you cannot add hooks, configure Codex's fire-and-forget
notify program to invoke the collector instead (turn-complete only):
# ~/.codex/config.toml
notify = ["sh", "/Users/you/.weave-codex/stop-hook.sh"]
Uninstall
weave-codex uninstall   # removes only our entries from ~/.codex/hooks.json
Development
npm install
npm run build
npm test
npm run lint