npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

gitrail

v0.4.0

Published

Extract Git commit history to JSON Lines (JSONL) for data warehouse ingestion

Readme

gitrail

A CLI tool that extracts commit history from a local Git repository and outputs it as JSON Lines (.jsonl) files, suitable for ingestion into data warehouses and analytical systems.

Features

  • Reads the local .git directory directly via isomorphic-git — no git CLI required at runtime
  • Outputs one record per line in JSON Lines format (commit-granularity by default)
  • Two extraction modes: snapshot (full extraction each run) and --incremental (differential extraction using a state file)
  • Handles multi-branch extraction with cross-branch deduplication

Requirements

  • Node.js ≥ 22.0.0
  • A local Git repository (cloned and fetched via your preferred method — gitrail reads .git data directly and does not require the git CLI)

Installation

npm install -g gitrail

Quick Start

# One-time extraction from a local clone
gitrail -b main ./my-repo

# Continuous extraction — fetch remote changes, then extract new commits
git -C ./my-repo fetch origin
gitrail --incremental -b origin/main -s ./gitrail-state.json --missing-state snapshot ./my-repo

See the User Guide for detailed workflow patterns including incremental setup, release-tag-based extraction, and CI configuration.

CLI Reference

gitrail [options] <repository-path>

| Parameter | Alias | Type | Required | Default | Description | | -------------------------- | ----- | ------------------- | -------- | ------- | ----------------------------------------------------------------------------------------------------------- | | <repository-path> | | positional | ✅ | — | Local path to the Git repository | | --incremental | | boolean | | false | When set, reads state to extract only new commits. When absent, performs a full snapshot extraction. | | --branch <ref> | -b | string (repeatable) | ✅ | — | Ref to traverse from. Specify one or more times. | | --output-dir <path> | -o | string | | ./ | Directory for output .jsonl files | | --output-prefix <string> | | string | | derived | Filename prefix (derived from remote origin if omitted) | | --state <path> | -s | string | | — | State file path. Required with --incremental. | | --missing-state | | error \| snapshot | | error | Behavior when state file is absent. Only valid with --incremental. | | --since-ref <ref> | | string | | — | Exclude commits reachable from this ref (tag, branch, or hash). Snapshot mode only. | | --since-date <ISO8601> | | string | | — | Include only commits after this datetime. Snapshot mode only. | | --per-file | | boolean | | false | When set, emits one record per changed file per commit; when absent, emits one record per commit (default). | | --rotate-lines <n> | | number | | — | Start new file after n lines | | --rotate-size <bytes> | | number | | — | Start new file after n bytes | | --quiet | -q | boolean | | false | Suppress progress, summary, and profile output on stderr. Warnings and errors remain visible. | | --profile | | boolean | | false | Print per-stage timing information to stderr after a successful extraction. Suppressed by --quiet. |

Progress, summary, and profile output are written to stderr; use --quiet to suppress them. Validation errors exit with code 1; runtime errors with code 2. See the User Guide for the full list of mutual exclusion rules.

Output

In the default commit-granularity mode, each line in the output .jsonl file is a JSON object representing one commit. With --per-file, each line represents one changed file within a commit, with full commit metadata denormalized onto each record plus a file object containing path, status, additions, and deletions.

Commit-mode record example:

{
  "oid": "a1b2c3d4...",
  "subject": "Fix null pointer in auth module",
  "body": "",
  "author": {
    "name": "Jane Doe",
    "email": "[email protected]",
    "timestamp": "2024-01-15T09:00:00+09:00"
  },
  "committer": {
    "name": "Jane Doe",
    "email": "[email protected]",
    "timestamp": "2024-01-15T09:05:00+09:00"
  },
  "parents": ["parenthash1"],
  "repository": { "name": "my-repo", "url": "https://github.com/org/my-repo" }
}

| Field | Description | | ------------------------------------------ | ------------------------------------------------------------------------------------------- | | oid | Full SHA-1 commit hash | | subject | First line of the commit message | | body | Remainder of the commit message (empty string if none) | | author | Person who originally authored the changes | | committer | Person who committed (may differ from author after rebase/cherry-pick) | | author.timestamp / committer.timestamp | ISO 8601 datetime using the offset embedded in the commit object | | parents | Array of parent commit hashes (empty for the initial commit; two entries for merge commits) | | repository.name | Repository name derived from remote origin URL (falls back to directory name) | | repository.url | Remote origin URL, or null if no remote is configured |

Output files are named <prefix>-<timestamp>-000001.jsonl, <prefix>-<timestamp>-000002.jsonl, and so on. The prefix is derived from the repository's remote origin URL; use --output-prefix to override. The timestamp segment (YYYYMMDDTHHmmssZ) is captured once per session, so all files from a single run share the same timestamp and will not overwrite files produced by earlier runs. Use --rotate-lines or --rotate-size to split output across multiple files.

Note: Output line order is not guaranteed to be chronological. Sort by committer.timestamp in your downstream system.

Documentation

  • User Guide — detailed workflows, mode explanations, and full CLI reference
  • Changelog — release history and notable changes by version

Developer Guide

  • Contributing Guide — local setup, quality checks, and pull request workflow
  • Architecture — layer responsibilities, end-to-end flow, and key design decisions
  • Git Traversal — DAG traversal, differential extraction modes, and deduplication strategy
  • Output Schema — JSONL format, field definitions, timestamp conversion, and file rotation

License

MIT