npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

gitlode

v0.7.1

Published

Extract Git commit history to JSON Lines (JSONL) for data warehouse ingestion

Readme

gitlode

Extract Git commit history as JSON Lines — ready for warehouses, dashboards, and metrics pipelines.

gitlode is an ETL bridge between Git repositories and analytical systems. It reads a local Git repository and emits one commit per line as JSON Lines (.jsonl), so downstream systems can ingest commit history without understanding Git internals.

gitlode is a faithful extractor: it maps Git object data as stored and leaves interpretation, aggregation, and reporting to your downstream tools.

Named after the mining term lode (a vein of valuable ore), with a nod to load — gitlode (not gitload).

Use cases

  • Continuous ingestion into a warehouse — periodically fetch your repository and load only new commits into BigQuery, Snowflake, Redshift, DuckDB, or similar via incremental mode.
  • Developer activity dashboards — measure commit frequency, contributor growth, and team velocity over time.
  • Release and change-velocity metrics — track commit cadence, time-between-releases, and change volume per area of the codebase.
  • Cross-repository aggregation — run gitlode across many repositories and stack the JSONL output into a single unified dataset.
  • Ad-hoc analysis with DuckDB or pandas — extract once and query the .jsonl directly with duckdb.read_json() or pandas.read_json(..., lines=True).

gitlode is not for interactive history inspection — questions like "who changed this line?" or "what commits touched this file?" are better answered by a Git client or IDE. gitlode targets bulk extraction into analytical systems.

Features

  • Reads the local .git directory directly via isomorphic-git — no git CLI required at runtime
  • Outputs one record per line in JSON Lines format (commit-granularity by default)
  • Two extraction modes: snapshot (full extraction each run) and --incremental (differential extraction using a state file)
  • Handles multi-branch extraction with cross-branch deduplication
  • Plugin system for adding custom extensions data to output records via --config

Requirements

  • Node.js ≥ 22.0.0
  • A local Git repository (cloned and fetched via your preferred method — gitlode reads .git data directly and does not require the git CLI)

Installation

npm install -g gitlode

Quick Start

# One-time extraction from a local clone
gitlode -r main ./my-repo

# Continuous extraction — fetch remote changes, then extract new commits
git -C ./my-repo fetch origin
gitlode --incremental -r origin/main -s ./gitlode-state.json --missing-state snapshot ./my-repo

See the User Guide for detailed workflow patterns including incremental setup, release-tag-based extraction, and CI configuration.

CLI Reference

gitlode [options] <repository-path>

| Parameter | Alias | Type | Required | Default | Description | | -------------------------- | ----- | ------------------- | -------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | <repository-path> | | positional | ✅ | — | Local path to the Git repository | | --incremental | | boolean | | false | When set, reads state to extract only new commits. When absent, performs a full snapshot extraction. | | --ref <ref> | -r | string (repeatable) | ✅ | — | Ref to traverse from. Accepts branch name, tag, or raw commit OID. Specify one or more times. | | --output-dir <path> | -o | string | | ./ | Directory for output .jsonl files | | --output-prefix <string> | | string | | derived | Filename prefix (derived from remote origin if omitted) | | --state <path> | -s | string | | — | State file path. Required with --incremental. | | --missing-state | | error \| snapshot | | error | Behavior when state file is absent. Only valid with --incremental. | | --since-ref <ref> | | string | | — | Exclude commits reachable from this ref (tag, branch, or commit object ID). Snapshot mode only. | | --since-date <ISO8601> | | string | | — | Include only commits after this datetime. Snapshot mode only. | | --per-file | | boolean | | false | When set, emits one record per changed file per commit; when absent, emits one record per commit (default). | | --max-diff-size <value> | | string | | — | Skip line-level diff computation for files above this size (K/M/G suffix supported). Outputs null additions/deletions for skipped diffs. Applies only with --per-file. | | --repo-name <string> | | string | | — | Override repository.name in all output records. Does not affect state-file identity or incremental behavior. | | --repo-url <string> | | string | | — | Override repository.url in all output records. Does not affect state-file identity or incremental behavior. | | --rotate-lines <n> | | number | | — | Start new file after n lines | | --rotate-size <bytes> | | string | | — | Start new file after threshold (raw bytes or K/M/G suffix, case-insensitive; range 1M to 64G) | | --quiet | -q | boolean | | false | Suppress progress, summary, and profile output on stderr. Warnings and errors remain visible. | | --profile | | boolean | | false | Print per-stage timing information to stderr after a successful extraction. Suppressed by --quiet. | | --config <path> | -c | string | | — | Path to gitlode configuration file. Enables plugin enrichment pipeline. See the User Guide for details. |

Progress, summary, and profile output are written to stderr; use --quiet to suppress them. Validation errors exit with code 1; runtime errors with code 2. See the User Guide for the full list of mutual exclusion rules.

Output

In the default commit-granularity mode, each line in the output .jsonl file is a JSON object representing one commit. With --per-file, each line represents one changed file within a commit, with full commit metadata denormalized onto each record plus a file object containing path, status, additions, and deletions.

Commit-mode record example:

{
  "oid": "a1b2c3d4...",
  "subject": "Fix null pointer in auth module",
  "body": "",
  "author": {
    "name": "Jane Doe",
    "email": "[email protected]",
    "timestamp": "2024-01-15T09:00:00+09:00"
  },
  "committer": {
    "name": "Jane Doe",
    "email": "[email protected]",
    "timestamp": "2024-01-15T09:05:00+09:00"
  },
  "parents": ["parenthash1"],
  "repository": { "name": "my-repo", "url": "https://github.com/org/my-repo" }
}

| Field | Description | | ------------------------------------------ | ----------------------------------------------------------------------------------------- | | oid | Full commit object ID (OID) | | subject | First line of the commit message | | body | Remainder of the commit message (empty string if none) | | author | Person who originally authored the changes | | committer | Person who committed (may differ from author after rebase/cherry-pick) | | author.timestamp / committer.timestamp | ISO 8601 datetime using the offset embedded in the commit object | | parents | Array of parent commit OIDs (empty for the initial commit; two entries for merge commits) | | repository.name | Repository name derived from remote origin URL (falls back to directory name) | | repository.url | Remote origin URL, or null if no remote is configured |

Current runtime support is limited to repositories using the sha1 object format due to isomorphic-git behavior in gitlode-used operations. Repositories with unsupported object formats fail fast with: Unsupported repository object format: <format>. Supported formats: sha1.

Output files are named <prefix>-<timestamp>-000001.jsonl, <prefix>-<timestamp>-000002.jsonl, and so on. The prefix is derived from the repository's remote origin URL; use --output-prefix to override. The timestamp segment (YYYYMMDDTHHmmssZ) is captured once per session, so all files from a single run share the same timestamp and will not overwrite files produced by earlier runs. Use --rotate-lines or --rotate-size to split output across multiple files.

Note: Output line order is not guaranteed to be chronological. Sort by committer.timestamp in your downstream system.

Documentation

  • User Guide — detailed workflows, mode explanations, and full CLI reference
  • Architecture — layer responsibilities, end-to-end flow, and key design decisions
  • Git Traversal — DAG traversal, differential extraction modes, and deduplication strategy
  • Output Schema — JSONL format, field definitions, timestamp conversion, and file rotation
  • Plugin System — plugin contract, configuration file format, lifecycle, and official plugin package policy
  • Changelog — release history and notable changes by version

Project

gitlode is developed in the gitlode monorepo, which hosts the CLI together with its official plugins. For source code, the issue tracker, and contribution guidelines, see the repository homepage.

License

MIT