weave-codex

v0.1.0

Published

16 days ago

W&B Weave observability for OpenAI Codex CLI — reconstructs gen_ai OpenTelemetry spans from Codex rollout sessions and ships them to Weave, fully off the critical path.


andrewwandb


W&B Weave observability for the OpenAI Codex CLI. It reconstructs gen_ai OpenTelemetry spans for every Codex turn (model calls, token usage, tool calls, reasoning) and ships them to Weave, entirely off Codex's critical path. It is modeled after W&B's weave-claude-code; this is the Codex equivalent.

How it works

Codex emits OTEL today, but not a clean gen_ai span tree. Rather than parse that, weave-codex reconstructs spans from Codex's own rollout session files (~/.codex/sessions/**/rollout-*.jsonl), which contain everything: per-turn model, the full message/tool stream, and per-model-call token usage.

Codex turn ends
  └─ Stop hook (fire-and-forget shell shim, ~ms, returns immediately)
       └─ detached worker (off Codex's critical path)
            ├─ read new rollout lines since a per-session cursor
            ├─ reconstruct the turn → spans:
            │     invoke_agent codex          (the turn)
            │     ├─ chat <model>             (one per model call: usage, output)
            │     └─ execute_tool <name>      (one per tool: args, result)
            └─ export via OTLP → trace.wandb.ai/agents/otel/v1/traces
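
The "read new rollout lines since a per-session cursor" step can be sketched as follows. This is a minimal illustration, not the package's actual reader: the function and type names are hypothetical, the cursor is assumed to be a character offset into the file content, and rollout events are treated as opaque JSON objects because their schema is not described here.

```typescript
// Hypothetical event shape: the real rollout schema is not described in this README.
type RolloutEvent = Record<string, unknown>;

interface ReadResult {
  events: RolloutEvent[]; // parsed complete lines found after the cursor
  cursor: number;         // offset to persist for the next run
}

// Parse only the complete JSONL lines that appear after `cursor`.
// A trailing partial line (no newline yet) is left for the next invocation.
function readNewEvents(content: string, cursor: number): ReadResult {
  const lastNewline = content.lastIndexOf("\n");
  // No complete line beyond the cursor: keep the cursor where it is.
  if (lastNewline < cursor) return { events: [], cursor };

  const events: RolloutEvent[] = [];
  for (const line of content.slice(cursor, lastNewline).split("\n")) {
    if (line.trim() === "") continue;
    try {
      events.push(JSON.parse(line) as RolloutEvent);
    } catch {
      // Skip malformed lines rather than failing the whole turn.
    }
  }
  return { events, cursor: lastNewline + 1 };
}
```

Advancing the cursor only past the last newline means a line that Codex is still writing is never half-parsed; it is picked up on the next turn.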

Non-blocking by construction. The Stop hook only spawns a detached process and exits 0 (printing nothing), so Codex never waits on the network.
One trace per turn. Turns are stitched into a conversation server-side via gen_ai.conversation.id (the Codex session id) on every span.
Accurate timelines. Span start/end times are backdated from the rollout timestamps, so durations reflect what actually happened.
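
The Stop hook is described above as a shell shim. The same detach-and-return idea can be sketched in TypeScript with Node's child_process module. The function name is hypothetical and this is not the package's actual launcher.

```typescript
import { spawn } from "node:child_process";

// Launch `command` so that it outlives the caller and shares no stdio with it.
// Returns the child's pid, or undefined if the process could not be started.
function launchDetached(command: string, args: string[]): number | undefined {
  const child = spawn(command, args, {
    detached: true,  // run the child in its own process group/session
    stdio: "ignore", // no pipes tying the child to the parent
  });
  // Swallow spawn errors so a failed launch never surfaces in the caller.
  child.on("error", () => {});
  // Let the parent's event loop exit without waiting for the child.
  child.unref();
  return child.pid;
}
```

With `detached`, `stdio: "ignore"`, and `unref()` together, the caller can exit immediately while the worker keeps running, which is what keeps the network export off the critical path.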

Requirements

Node.js ≥ 20
OpenAI Codex CLI with the hooks system (recent versions)
A W&B account (WANDB_API_KEY) and a Weave project

Quick start

npm install -g weave-codex

# Credentials (precedence: env > settings.json > netrc)
wandb login                       # or: export WANDB_API_KEY=...
export WEAVE_PROJECT="entity/project"

weave-codex install               # merges a Stop hook into ~/.codex/hooks.json

One-time trust step: Codex marks newly-added hooks as untrusted and will not run them until you approve. On your next codex launch, approve the weave-codex hook when prompted (or set bypass_hook_trust = true in ~/.codex/config.toml). Run weave-codex status to confirm everything resolves.

That's it: run Codex normally. Each completed turn appears in Weave within about a second.

Configuration

| Setting | Env var | settings.json key | Default |
| --- | --- | --- | --- |
| W&B API key | WANDB_API_KEY | wandb_api_key | from wandb login (netrc) |
| Weave project | WEAVE_PROJECT | weave_project | none (required, entity/project) |
| Base URL | WANDB_BASE_URL | wandb_base_url | https://trace.wandb.ai |
| Capture content | WEAVE_CODEX_CAPTURE_CONTENT | capture_content | true |
| Debug logging | WEAVE_CODEX_DEBUG | debug | off (errors always log) |

State lives in ~/.weave-codex/ (settings, the hook shim, per-session cursors, and logs/collector.log). Set WANDB_BASE_URL for a dedicated/self-hosted W&B.
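
The documented precedence (env > settings.json > netrc) and the defaults from the table can be illustrated with a small resolver. This is a sketch under assumptions: the function and type names are hypothetical, and the flag parsing rule ("0" or "false" disables, any other non-empty value enables) is a guess, since only `WEAVE_CODEX_CAPTURE_CONTENT=0` is documented.

```typescript
interface Settings {
  wandb_api_key?: string;
  weave_project?: string;
  wandb_base_url?: string;
  capture_content?: boolean;
  debug?: boolean;
}

interface ResolvedConfig {
  apiKey: string;
  project: string;
  baseUrl: string;
  captureContent: boolean;
  debug: boolean;
}

type Env = Record<string, string | undefined>;

// Assumption: "0" and "false" disable a flag; any other non-empty value enables it.
function parseFlag(raw: string | undefined): boolean | undefined {
  if (raw === undefined || raw === "") return undefined;
  return raw !== "0" && raw.toLowerCase() !== "false";
}

// Resolve each setting in order: environment, then settings.json, then netrc/default.
function resolveConfig(env: Env, settings: Settings, netrcKey?: string): ResolvedConfig {
  const apiKey = env.WANDB_API_KEY ?? settings.wandb_api_key ?? netrcKey;
  const project = env.WEAVE_PROJECT ?? settings.weave_project;
  if (!apiKey) throw new Error("No W&B API key found (env, settings.json, or netrc)");
  if (!project || !/^[^/]+\/[^/]+$/.test(project)) {
    throw new Error("Weave project is required in entity/project form");
  }
  return {
    apiKey,
    project,
    baseUrl: env.WANDB_BASE_URL ?? settings.wandb_base_url ?? "https://trace.wandb.ai",
    captureContent:
      parseFlag(env.WEAVE_CODEX_CAPTURE_CONTENT) ?? settings.capture_content ?? true,
    debug: parseFlag(env.WEAVE_CODEX_DEBUG) ?? settings.debug ?? false,
  };
}
```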

Data disclosure

By default weave-codex captures span content: your prompts, the model's responses and reasoning, tool-call arguments, and tool results (which include shell commands, command output, and file contents). This data is sent to your W&B/Weave instance. PII scrubbing and redaction are not implemented.

To send structure, token usage, model, and timing only — no prompts, code, or output — disable content capture:

export WEAVE_CODEX_CAPTURE_CONTENT=0

The structural/metric spans are always emitted regardless of this setting.
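
One way to picture the effect of disabling content capture is a filter over span attributes that drops content-bearing keys and keeps everything else. This is an illustrative sketch only: the function name is hypothetical, and the listed keys are assumed names in the style of the OpenTelemetry GenAI conventions, since the exact attributes weave-codex treats as content are not listed here.

```typescript
type Attributes = Record<string, string | number | boolean>;

// Assumed content-bearing keys; the exact set used by weave-codex is not documented here.
const CONTENT_KEYS = new Set<string>([
  "gen_ai.input.messages",
  "gen_ai.output.messages",
  "gen_ai.tool.call.arguments",
  "gen_ai.tool.call.result",
]);

// Return a copy of `attrs`, without content keys when capture is disabled.
function applyContentPolicy(attrs: Attributes, captureContent: boolean): Attributes {
  if (captureContent) return { ...attrs };
  const kept: Attributes = {};
  for (const [key, value] of Object.entries(attrs)) {
    if (!CONTENT_KEYS.has(key)) kept[key] = value;
  }
  return kept;
}
```

Structure, model, token usage, and timing survive the filter, which matches the statement that structural spans are always emitted.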

What's captured

invoke_agent codex: turn root; agent name/version, conversation id, model, summed token usage, and (if enabled) the user prompt + final answer.
chat <model>: one per model call; gen_ai.usage.* (input/output/cached/reasoning) tokens, finish reason, server.address, and the assistant output (text, reasoning, tool-call parts).
execute_tool <name>: one per tool (shell, apply_patch, web search, MCP, functions); gen_ai.tool.* id/arguments/result; MCP calls also carry mcp.server.name.

All attributes follow the OpenTelemetry GenAI semantic conventions that Weave's Agents ingestion reads, so traces also render in generic OTEL backends.
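
As a rough illustration of how the turn root could carry summed usage and the conversation id, the sketch below adds up per-call token counts and builds root attributes. The function and type names are hypothetical. The attribute keys shown are standard GenAI convention names; keys for cached and reasoning tokens are deliberately left out because their exact names are not given here.

```typescript
interface Usage {
  input: number;
  output: number;
  cached: number;
  reasoning: number;
}

// Sum per-model-call usage into a single total for the turn.
function sumUsage(calls: Usage[]): Usage {
  return calls.reduce<Usage>(
    (total, u) => ({
      input: total.input + u.input,
      output: total.output + u.output,
      cached: total.cached + u.cached,
      reasoning: total.reasoning + u.reasoning,
    }),
    { input: 0, output: 0, cached: 0, reasoning: 0 },
  );
}

// Build attributes for the `invoke_agent codex` root span of one turn.
function rootAttributes(
  sessionId: string,
  model: string,
  calls: Usage[],
): Record<string, string | number> {
  const total = sumUsage(calls);
  return {
    "gen_ai.operation.name": "invoke_agent",
    "gen_ai.agent.name": "codex",
    "gen_ai.conversation.id": sessionId, // stitches turns into one conversation
    "gen_ai.request.model": model,
    "gen_ai.usage.input_tokens": total.input,
    "gen_ai.usage.output_tokens": total.output,
  };
}
```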

Scope & limitations (v0.1)

Modes: interactive codex (TUI) and codex exec. The codex mcp / app-server modes are not covered (they fire no hooks).
Subagents: a spawned subagent shows up as its spawn_agent tool call; nesting the subagent's own model/tool calls is not yet implemented (Codex writes those to a separate rollout file).
Aborted turns: the Stop hook does not fire on aborted/errored turns, so those are not captured.
Hook-locked orgs: if allow_managed_hooks_only is set, use a notify program as the trigger instead (see below).
Backdating shim: the span emitter currently uses a thin OpenTelemetry pipeline (src/spans/tracer.ts) because the Weave TS SDK's GenAI helpers can't yet set historical timestamps. Once the SDK gains startedAt/endedAt, the emitter moves to weave.startTurn/startLLM/startTool.
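
Backdating comes down to turning rollout timestamps into explicit span times. The OpenTelemetry JS API accepts an explicit `startTime` in `tracer.startSpan(name, { startTime })` and an explicit end time in `span.end(endTime)`, both of which can be `[seconds, nanoseconds]` tuples. The sketch below computes such tuples from ISO 8601 strings. It assumes rollout timestamps are ISO 8601, and the helper names are hypothetical; it is not the code in src/spans/tracer.ts.

```typescript
// [seconds, nanoseconds] since the Unix epoch, the tuple form OpenTelemetry accepts.
type HrTime = [number, number];

interface SpanTiming {
  startTime: HrTime;
  endTime: HrTime;
  durationMs: number;
}

// Parse an ISO 8601 timestamp to epoch milliseconds, rejecting invalid input.
function parseMs(iso: string): number {
  const ms = Date.parse(iso);
  if (Number.isNaN(ms)) throw new Error(`Invalid timestamp: ${iso}`);
  return ms;
}

// Split epoch milliseconds into whole seconds and the remainder in nanoseconds.
function msToHrTime(ms: number): HrTime {
  const seconds = Math.floor(ms / 1000);
  return [seconds, (ms - seconds * 1000) * 1e6];
}

// Compute backdated start/end times; the end is clamped so it never precedes the start.
function spanTiming(startIso: string, endIso: string): SpanTiming {
  const startMs = parseMs(startIso);
  const endMs = Math.max(startMs, parseMs(endIso));
  return {
    startTime: msToHrTime(startMs),
    endTime: msToHrTime(endMs),
    durationMs: endMs - startMs,
  };
}
```

`Date.parse` resolves to milliseconds, so any finer precision in the source timestamps is lost in this sketch.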

`notify` fallback

In environments where you cannot add hooks, configure Codex's fire-and-forget notify program to invoke the collector instead (turn-complete only):

# ~/.codex/config.toml
notify = ["sh", "/Users/you/.weave-codex/stop-hook.sh"]

Uninstall

weave-codex uninstall   # removes only our entries from ~/.codex/hooks.json
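
"Removes only our entries" can be pictured as a marker-based filter over the hooks file, paired with an idempotent merge on install. The sketch below is purely illustrative: the `HookEntry` and `HooksFile` shapes are hypothetical (Codex's real hooks.json schema is not described here), and entries are recognised by the stop-hook.sh shim path under ~/.weave-codex/.

```typescript
// Hypothetical shapes: Codex's real hooks.json schema is not described in this README.
interface HookEntry {
  command: string;
}
type HooksFile = Record<string, HookEntry[]>;

// Entries whose command points at the weave-codex shim are treated as ours.
const OUR_MARKER = ".weave-codex/stop-hook.sh";

function isOurs(entry: HookEntry): boolean {
  return entry.command.includes(OUR_MARKER);
}

// Add a Stop hook entry unless one of ours is already present (idempotent).
function installHook(file: HooksFile, command: string): HooksFile {
  const existing = file["Stop"] ?? [];
  if (existing.some(isOurs)) return file;
  return { ...file, Stop: [...existing, { command }] };
}

// Remove only our entries, leaving every other hook untouched.
function uninstallHook(file: HooksFile): HooksFile {
  const result: HooksFile = {};
  for (const [event, entries] of Object.entries(file)) {
    result[event] = entries.filter((entry) => !isOurs(entry));
  }
  return result;
}
```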

Development

npm install
npm run build
npm test
npm run lint
