git2parquet
v0.1.5
CLI tool to export git commits in Parquet format
A command-line tool to convert git commit history to Parquet format, including unified diffs for data analysis and AI applications.
Installation
npm install -g git2parquet
Usage
Command Line
# Export git history of current repo to gitlog.parquet
git2parquet
# Export to custom filename
git2parquet commits.parquet
# Export and open with hyperparam
git2parquet --open
# Export to custom file and open with hyperparam
git2parquet commits.parquet --open
Output Schema
The generated Parquet file contains the following columns:
- hash (STRING): Git commit hash
- authorName (STRING): Author's name
- authorEmail (STRING): Author's email address
- date (TIMESTAMP): Commit date in ISO format
- subject (STRING): Commit message subject line
- diff (STRING): Unified diff showing file changes
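To make the column layout concrete, the sketch below models one row as a Python dataclass and fills it from a single line of `git log` output. The `CommitRow` class, the `parse_log_line` helper, the unit-separator delimiter, and the choice of the author date (`%aI`) are all illustrative assumptions; this is not necessarily how git2parquet extracts its data. The placeholders `%H`, `%an`, `%ae`, `%aI`, and `%s` are standard `git log --pretty=format:` fields.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical delimiter: the ASCII unit separator is unlikely to appear
# in commit metadata, so it is a convenient field boundary.
FIELD_SEP = "\x1f"

# Could be used as: git log --pretty=format:<GIT_LOG_FORMAT>
# %H = hash, %an = author name, %ae = author email,
# %aI = author date (strict ISO 8601), %s = subject line
GIT_LOG_FORMAT = FIELD_SEP.join(["%H", "%an", "%ae", "%aI", "%s"])


@dataclass
class CommitRow:
    """One row of the output, using the column names listed above."""
    hash: str
    authorName: str
    authorEmail: str
    date: datetime
    subject: str
    diff: str = ""


def parse_log_line(line: str, diff: str = "") -> CommitRow:
    """Split one delimited git log line into the six schema fields."""
    commit_hash, name, email, iso_date, subject = line.split(FIELD_SEP, 4)
    # Some git versions print "Z" for UTC; normalize so that older
    # Python versions of datetime.fromisoformat accept the value.
    parsed_date = datetime.fromisoformat(iso_date.replace("Z", "+00:00"))
    return CommitRow(commit_hash, name, email, parsed_date, subject, diff)


if __name__ == "__main__":
    sample = FIELD_SEP.join(
        ["abc123", "Ada", "ada@example.com", "2024-05-01T12:30:00+02:00", "Fix parser"]
    )
    print(parse_log_line(sample))
```

Splitting with a maximum of four separators keeps the subject intact even if it happens to contain the delimiter.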
Requirements
- Node.js
- Must be run from within a git repository
- Git must be available in PATH
Options
- --help, -h: Show help message
- --open: Open the generated Parquet file with hyperparam after export
Use Cases
- Analyzing code change patterns over time
- Training ML models on code evolution
- Creating datasets for software engineering research
- Building commit history dashboards
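As a rough illustration of the first use case, the sketch below counts commits per month and tallies added and removed lines from each unified diff. It assumes the rows have already been loaded from the Parquet file into plain dictionaries keyed by the column names above, with `date` as a `datetime`; how the file is read is left to whichever Parquet reader you prefer. The helper names `diff_line_counts` and `commits_per_month` are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple


def diff_line_counts(diff: str) -> Tuple[int, int]:
    """Return (added, removed) line counts for a unified diff.

    Heuristic: lines starting with "+++" or "---" are treated as file
    headers and skipped, so a changed line whose own content begins
    with "++" or "--" would be missed.
    """
    added = removed = 0
    for line in diff.splitlines():
        if line.startswith("+++") or line.startswith("---"):
            continue
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return added, removed


def commits_per_month(rows: Iterable[Dict]) -> Counter:
    """Count commits grouped by "YYYY-MM" of the date column."""
    return Counter(row["date"].strftime("%Y-%m") for row in rows)


if __name__ == "__main__":
    # Made-up rows standing in for data read from gitlog.parquet.
    rows = [
        {"date": datetime(2024, 1, 5), "diff": "--- a/x\n+++ b/x\n@@ -1 +1,2 @@\n-old\n+new\n+extra"},
        {"date": datetime(2024, 1, 20), "diff": ""},
        {"date": datetime(2024, 2, 1), "diff": "+only"},
    ]
    print(commits_per_month(rows))
    print([diff_line_counts(r["diff"]) for r in rows])
```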
Hyperparam
Hyperparam is a tool for exploring and curating AI datasets. The Hyperparam CLI (npx hyperparam) is a local viewer for ML datasets: it launches a small HTTP server and opens your browser so you can interactively explore the Parquet file that git2parquet generates.
