
codex2parquet

v1.0.0 · MIT license

A command-line tool to convert Codex session logs to Parquet format for data analysis and AI applications.

Installation

npm install -g codex2parquet

Usage

# Export Codex logs for current directory to codex_logs.parquet
codex2parquet

# Export logs from all projects
codex2parquet --all

# Export to custom filename
codex2parquet --output logs.parquet

# Export logs for a specific project directory
codex2parquet --project ~/code/myapp

# Read from a non-default Codex data directory
codex2parquet --codex-dir ~/.codex

What Gets Exported

Codex stores local data under ~/.codex by default. This tool reads:

  • ~/.codex/sessions/**/*.jsonl: current Codex rollout logs. Each line is a JSON object with timestamp, type, and payload.
  • ~/.codex/sessions/rollout-*.json: legacy rollout logs. Each file contains a session object and an items array.
  • ~/.codex/state_5.sqlite: thread metadata, including cwd, title, model, model provider, CLI version, sandbox policy, approval mode, token totals, git metadata, dynamic tools, and subagent parent/child edges.
  • ~/.codex/history.jsonl: prompt history rows with session_id, Unix timestamp, and text.
  • ~/.codex/logs_2.sqlite: diagnostic/runtime log rows when the current Node.js runtime includes node:sqlite.

The SQLite sources are optional. The exporter reads them through Node's native node:sqlite module and does not require a system sqlite3 command. If the SQLite files are missing or unreadable, the exporter still writes rollout and history rows.
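The current rollout format described above is easy to consume directly. As a minimal sketch, the snippet below parses JSONL lines shaped like the documented format (one JSON object per line with timestamp, type, and payload); the sample lines and their field contents are illustrative, not copied from real Codex logs.

```python
import json

# Two sample lines in the rollout format described above: each line is a
# JSON object with timestamp, type, and payload (contents are illustrative).
sample = "\n".join([
    json.dumps({"timestamp": "2026-01-01T00:00:00Z", "type": "session_meta",
                "payload": {"cwd": "/tmp/demo"}}),
    json.dumps({"timestamp": "2026-01-01T00:00:05Z", "type": "event_msg",
                "payload": {"type": "agent_message", "message": "hi"}}),
])

# Parse each non-blank line into a dict, as the exporter would.
events = [json.loads(line) for line in sample.splitlines() if line.strip()]
types = [e["type"] for e in events]
print(types)  # ['session_meta', 'event_msg']
```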

Output Schema

The generated Parquet file is an event table. It includes one row per rollout event, legacy item, history prompt, or diagnostic log entry.

Important columns:

  • source_kind: rollout, history, or diagnostic_log
  • project: Project name derived from cwd
  • session_id: Codex thread/session identifier
  • item_index: Event index within its source
  • timestamp: ISO timestamp when available
  • rollout_path: Source rollout file path
  • top_level_type: Current JSONL top-level type, such as session_meta, event_msg, response_item, or turn_context
  • event_type: Nested event type for event_msg payloads
  • item_type: Response item type, such as message, reasoning, function_call, or function_call_output
  • role, name, status, call_id, item_id, turn_id: Common message and tool-call identifiers
  • text: The primary readable body for messages, user prompts, tool results, agent messages, and diagnostics
  • tool_input_json, tool_output: Tool/function call inputs and decoded outputs
  • model, model_provider, reasoning_effort, cwd, title, source, cli_version: Thread/session metadata
  • approval_mode, sandbox_policy, tokens_used, git_sha, git_branch, git_origin_url: Execution metadata from state_5.sqlite
  • input_tokens, cached_input_tokens, output_tokens, reasoning_output_tokens, total_tokens: Token usage when present in event payloads
  • rate_limits_json, metadata_json, content_json, payload_json, raw_json: Metadata and raw JSON preservation columns

All Parquet columns are written as strings to keep the schema stable across Codex log format changes. Rare or source-specific details, such as diagnostic log module paths, dynamic tools, and subagent metadata, are preserved in metadata_json instead of becoming mostly-empty top-level columns.
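Because everything lands in one string-typed event table, typical analysis starts with a couple of filters. The sketch below uses pandas (assumed installed) on a tiny in-memory stand-in for the export; a real run would load the file with pd.read_parquet("codex_logs.parquet") instead. Column names follow the schema above; the row values are made up.

```python
import pandas as pd

# Stand-in for the event table; all columns are strings, per the schema above.
df = pd.DataFrame({
    "source_kind": ["rollout", "rollout", "history"],
    "item_type":   ["message", "function_call", None],
    "role":        ["user", "assistant", None],
    "text":        ["fix the bug", None, "fix the bug"],
})

# Count events per source, a typical first query over the export.
counts = df.groupby("source_kind").size().to_dict()
print(counts)  # {'history': 1, 'rollout': 2}

# Pull just the user prompts recorded in rollout events.
prompts = df[(df["source_kind"] == "rollout") & (df["role"] == "user")]["text"].tolist()
print(prompts)  # ['fix the bug']
```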

Options

  • --output <file>, -o <file>: Output Parquet filename (default: codex_logs.parquet)
  • --project <path>: Filter logs to a specific project directory
  • --all: Export logs from all Codex projects
  • --codex-dir <path>: Codex data directory (default: ~/.codex)
  • --no-history: Skip prompt history rows
  • --no-diagnostics: Skip diagnostic log rows
  • --help, -h: Show help message


Requirements

  • Node.js 22.5.0 or newer. SQLite enrichment uses native node:sqlite; no sqlite3 CLI is required.
  • Codex local data in ~/.codex

Use Cases

  • Analyzing Codex usage patterns across projects
  • Building datasets from human-agent coding sessions
  • Auditing tool calls, command outputs, and runtime diagnostics
  • Creating dashboards over models, projects, token usage, and git branches
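For the dataset-building use case, prompt/response pairs can be reconstructed by walking each session in item_index order. This is a sketch under stated assumptions: the rows below are invented stand-ins shaped like the event table, pandas is assumed available, and pairing a user message with the next assistant message is one simple heuristic, not the tool's own behavior.

```python
import pandas as pd

# Illustrative rows shaped like the export's event table (all strings).
df = pd.DataFrame({
    "session_id": ["s1", "s1", "s1", "s1"],
    "item_index": ["0", "1", "2", "3"],
    "item_type":  ["message", "function_call", "function_call_output", "message"],
    "role":       ["user", None, None, "assistant"],
    "text":       ["add tests", None, "ok", "Done: added tests"],
})

# item_index is a string column, so cast before sorting within each session.
df["idx"] = df["item_index"].astype(int)
df = df.sort_values(["session_id", "idx"])

# Pair each user message with the next assistant message in the session.
pairs = []
pending = None
for _, row in df.iterrows():
    if row["role"] == "user":
        pending = row["text"]
    elif row["role"] == "assistant" and pending is not None:
        pairs.append({"prompt": pending, "response": row["text"]})
        pending = None

print(pairs)  # [{'prompt': 'add tests', 'response': 'Done: added tests'}]
```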

Hyperparam

Hyperparam is a tool for exploring and curating AI datasets, such as those produced by codex2parquet.