git-shrink
v1.2.0
Semantic git history compressor: intelligently squashes bloated commit histories by grouping related commits based on message similarity, file proximity, and directory overlap.
git-shrink v1.0.0 — semantic commit history compressor
✔ Loaded 47 commits
✔ Found 9 squash group(s) — history shrinks 47 → 18 commits (62% reduction)
Group 1/9 — touch the same file(s): src/auth/middleware.js · committed within 14min
Suggested message: "feat: implement OAuth2 PKCE flow with refresh token rotation"
Similarity score: ████████████████░░░░ 80/100
The problem
Every long-running codebase has this:
abc1234 fix
def5678 fix again
fff0001 wip
aaa9999 temp
bbb1111 ok now it works
ccc2222 actually works now
git rebase -i helps, but it requires you to manually identify which commits belong together across dozens or hundreds of entries. That doesn't scale.
git-shrink automates the grouping. It scores every pair of commits across three dimensions (message similarity, file proximity, and directory overlap) and suggests squash groups that you can approve, edit, or skip. It then writes a ready-to-apply rebase script.
Install
npm install -g git-shrink
Requires Node.js ≥ 18.
Usage
Analyze (interactive)
git-shrink analyze
Analyzes the last 50 commits, scores pairs, and walks you through each suggested group interactively.
Analyze (auto mode)
git-shrink analyze --auto --count 100
Skips the interactive prompts and auto-approves all groups above the similarity threshold.
Dry run
git-shrink analyze --dry-run
Shows what would be grouped without writing any files. Safe to run on any repo.
Commit health stats
git-shrink stats
Prints a health report: noisy commit ratio, oversized commits, hot directories, and an overall history health score.
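The exact formulas behind these metrics aren't documented in this README. As a rough, hypothetical sketch: treat a commit as noisy when fewer than three meaningful words remain after stripping its conventional-commit prefix and the noise words named in the scoring section (wip, temp, minor), flag commits touching more than 20 files as oversized, and rank parent directories by how many commits touch them. The commit shape { hash, message, files }, the function healthStats, and both cut-offs are assumptions; the overall health score is left out because its formula isn't given.

```javascript
// Hypothetical sketch of two `git-shrink stats` metrics; not the tool's real code.
// Assumed commit shape: { hash: string, message: string, files: string[] }.
const NOISE_WORDS = new Set(["wip", "temp", "minor"]);

// Words left in the subject line after dropping a conventional-commit
// prefix ("fix:", "feat(auth):", "refactor!:") and the noise words.
function meaningfulWords(message) {
  const subject = message.split("\n")[0].toLowerCase();
  const body = subject.replace(/^[a-z]+(\([^)]*\))?!?:\s*/, "");
  return body.split(/[^a-z0-9]+/).filter((w) => w && !NOISE_WORDS.has(w));
}

const parentDir = (path) => (path.includes("/") ? path.slice(0, path.lastIndexOf("/")) : ".");

// minWords and maxFiles are hypothetical cut-offs.
function healthStats(commits, { minWords = 3, maxFiles = 20, topDirs = 3 } = {}) {
  const noisy = commits.filter((c) => meaningfulWords(c.message).length < minWords);
  const oversized = commits.filter((c) => c.files.length > maxFiles);

  // Count each parent directory at most once per commit.
  const dirCounts = new Map();
  for (const c of commits) {
    for (const dir of new Set(c.files.map(parentDir))) {
      dirCounts.set(dir, (dirCounts.get(dir) ?? 0) + 1);
    }
  }
  // Most-touched directories first (sort is stable, so ties keep first-seen order).
  const hotDirectories = [...dirCounts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topDirs);

  return {
    noisyRatio: commits.length ? noisy.length / commits.length : 0,
    oversized: oversized.map((c) => c.hash),
    hotDirectories,
  };
}
```

With the sample history from "The problem", fix, fix again, wip and temp would all count as noisy under this rule, while ok now it works would not, which shows how crude a pure word-count heuristic is.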
Apply a saved plan
git-shrink apply git-shrink-plan-1706123456789.txt
Executes a previously generated rebase plan after a final confirmation prompt.
# Force apply on a feature/fix branch that has already been pushed
git-shrink apply git-shrink-plan-1706123456789.txt --force
Options
git-shrink analyze
| Flag | Default | Description |
|------|---------|-------------|
| --count <n> | 50 | Number of commits to analyze from HEAD |
| --from <hash> | — | Start of commit range |
| --to <hash> | HEAD | End of commit range |
| --branch <name> | — | Analyze a specific branch |
| --auto | false | Skip interactive prompts, approve all groups |
| --dry-run | false | Preview only, no files written |
| --threshold <0-100> | 50 | Minimum similarity score to form a group |
| --min-group <n> | 2 | Minimum commits required to form a group |
git-shrink apply
| Flag | Default | Description |
|------|---------|-------------|
| --dry-run | false | Validate the plan without executing the rebase |
| -f, --force | false | Skip the pushed-commit guardrail — use on feature/fix branches |
git-shrink stats
| Flag | Default | Description |
|------|---------|-------------|
| --count <n> | 100 | Number of commits to include in the health report |
How the scoring works
Every pair of commits is scored across three dimensions, then combined into a weighted composite:
| Dimension | Weight | Method |
|-----------|--------|--------|
| Message similarity | 55% | Levenshtein distance on normalized messages — strips conventional commit prefixes (fix:, feat:) and noise words (wip, temp, minor) before comparing |
| File proximity | 30% | Jaccard similarity of changed file sets — commits touching the same files score high regardless of message wording |
| Directory overlap | 15% | Jaccard similarity of parent directories — weaker structural signal, useful when file names differ but work is in the same area |
Time proximity and branch origin are intentionally excluded. They group commits that happen to be close in time or on the same branch, not commits that are semantically related — which is the wrong heuristic for history cleanup.
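A minimal sketch of how the three dimensions in the table could combine into one 0 to 100 score. It assumes each commit is { message, files }, converts Levenshtein distance into a similarity as 1 minus distance divided by the length of the longer normalized message, and computes Jaccard as |A ∩ B| / |A ∪ B|. The weights come from the table; the prefix regex and the 0 to 100 scaling are assumptions, and although the function borrows the name scorePair from src/core/grouper.js, it is not that module's code.

```javascript
// Illustrative three-dimension pair score; weights match the README table.
const WEIGHTS = { message: 0.55, files: 0.3, dirs: 0.15 };
const NOISE = new Set(["wip", "temp", "minor"]);

// Lowercase, drop a conventional-commit prefix, drop noise words.
function normalize(message) {
  return message
    .toLowerCase()
    .replace(/^[a-z]+(\([^)]*\))?!?:\s*/, "")
    .split(/\s+/)
    .filter((w) => w && !NOISE.has(w))
    .join(" ");
}

// Dynamic-programming Levenshtein distance, keeping one row at a time.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = cur;
  }
  return prev[b.length];
}

// 1 for identical normalized messages, 0 for completely different ones.
// Two noise-only messages (both empty after normalization) carry no signal.
function messageSimilarity(a, b) {
  const x = normalize(a);
  const y = normalize(b);
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 0 : 1 - levenshtein(x, y) / longest;
}

// |A ∩ B| / |A ∪ B|, defined as 0 when both sets are empty.
function jaccard(setA, setB) {
  if (setA.size === 0 && setB.size === 0) return 0;
  let shared = 0;
  for (const x of setA) if (setB.has(x)) shared++;
  return shared / (setA.size + setB.size - shared);
}

const parentDir = (path) => (path.includes("/") ? path.slice(0, path.lastIndexOf("/")) : ".");

function scorePair(a, b) {
  const msg = messageSimilarity(a.message, b.message);
  const files = jaccard(new Set(a.files), new Set(b.files));
  const dirs = jaccard(new Set(a.files.map(parentDir)), new Set(b.files.map(parentDir)));
  const composite = WEIGHTS.message * msg + WEIGHTS.files * files + WEIGHTS.dirs * dirs;
  return Math.round(composite * 100); // same 0-100 scale as --threshold
}
```

In this sketch, two commits that touch the same single file but have noise-only messages score 45 (30 + 15), just under the default --threshold of 50, so file overlap alone does not form a group.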
Pairs scoring above --threshold are clustered using union-find. The suggested squash message is the most semantically meaningful message within the group (longest after noise removal).
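The clustering step can be sketched with a small union-find (disjoint-set) structure: every pair whose score clears the threshold is unioned, and components smaller than --min-group are discarded. The suggested message is the group's longest message after noise removal, as described above. The score function is injected so the sketch stands alone; treating the threshold as inclusive (>=), the commit shape { hash, message }, and the name groupCommits used here are assumptions rather than the actual grouper.js code.

```javascript
// Disjoint-set forest with path halving and union by size.
class UnionFind {
  constructor(n) {
    this.parent = Array.from({ length: n }, (_, i) => i);
    this.size = new Array(n).fill(1);
  }
  find(x) {
    while (this.parent[x] !== x) {
      this.parent[x] = this.parent[this.parent[x]]; // path halving
      x = this.parent[x];
    }
    return x;
  }
  union(a, b) {
    let ra = this.find(a);
    let rb = this.find(b);
    if (ra === rb) return;
    if (this.size[ra] < this.size[rb]) [ra, rb] = [rb, ra];
    this.parent[rb] = ra;
    this.size[ra] += this.size[rb];
  }
}

const NOISE = new Set(["wip", "temp", "minor"]);
// Message with its conventional-commit prefix and noise words removed.
const stripNoise = (msg) =>
  msg
    .toLowerCase()
    .replace(/^[a-z]+(\([^)]*\))?!?:\s*/, "")
    .split(/\s+/)
    .filter((w) => w && !NOISE.has(w))
    .join(" ");

function groupCommits(commits, scoreFn, { threshold = 50, minGroup = 2 } = {}) {
  const uf = new UnionFind(commits.length);
  // O(n^2): every distinct pair is scored once.
  for (let i = 0; i < commits.length; i++) {
    for (let j = i + 1; j < commits.length; j++) {
      if (scoreFn(commits[i], commits[j]) >= threshold) uf.union(i, j);
    }
  }
  // Collect members per root, keeping input order inside each group.
  const byRoot = new Map();
  commits.forEach((c, i) => {
    const root = uf.find(i);
    if (!byRoot.has(root)) byRoot.set(root, []);
    byRoot.get(root).push(c);
  });
  return [...byRoot.values()]
    .filter((members) => members.length >= minGroup)
    .map((members) => ({
      commits: members,
      // Original wording of the message that is longest once noise is removed.
      message: members.reduce((best, c) =>
        stripNoise(c.message).length > stripNoise(best.message).length ? c : best
      ).message,
    }));
}
```

Union-find makes grouping transitive: if A matches B and B matches C, all three end up in one group even when A and C score low against each other, which is one reason to review groups before approving them.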
Config file
Place a .gitshrinkrc in your project root (or add a gitshrink key to package.json):
{
"threshold": 65,
"timeWindow": 20,
"minGroup": 2,
"count": 75
}
Safety
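This README doesn't spell out how .gitshrinkrc values combine with CLI flags. A common convention, assumed here rather than confirmed for git-shrink, is built-in defaults, overridden by the config file, overridden by command-line flags. The defaults below come from the analyze options table; resolveOptions is a hypothetical name, the validation rules are assumptions, and in the real tool the file itself would be located by cosmiconfig (src/utils/config.js).

```javascript
// Defaults taken from the `git-shrink analyze` options table.
const DEFAULTS = { count: 50, threshold: 50, minGroup: 2 };

// Assumed precedence: defaults < .gitshrinkrc / package.json "gitshrink" key < CLI flags.
function resolveOptions(fileConfig = {}, cliFlags = {}) {
  // Ignore flags the user did not pass so they don't overwrite file values with undefined.
  const passed = Object.fromEntries(
    Object.entries(cliFlags).filter(([, value]) => value !== undefined)
  );
  const merged = { ...DEFAULTS, ...fileConfig, ...passed };

  // Hypothetical sanity checks on the two grouping knobs.
  const { threshold, minGroup } = merged;
  if (!Number.isFinite(threshold) || threshold < 0 || threshold > 100) {
    throw new RangeError(`threshold must be a number from 0 to 100, got ${threshold}`);
  }
  if (!Number.isInteger(minGroup) || minGroup < 2) {
    throw new RangeError(`minGroup must be an integer >= 2, got ${minGroup}`);
  }
  return merged;
}
```

Commander delivers option values as strings unless a parser function is supplied, so numeric flags such as --threshold would need converting before a check like this.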
git-shrink analyze and git-shrink stats are read-only. They never touch your git history.
git-shrink apply rewrites history via interactive rebase. It will:
- Show a full preview of the rebase plan
- Warn you explicitly that history will be rewritten
- Require a confirmation prompt before executing
- Print git rebase --abort instructions if anything goes wrong
Pushed-commit guardrail — apply will refuse to run if any commit in the plan has already been pushed to a remote, and will tell you exactly which ones. To bypass this on a feature or fix branch where force-pushing is acceptable, use --force.
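One way to implement a check like this, assumed rather than taken from git-shrink's source, is to ask git which remote-tracking branches contain each planned commit: git branch -r --contains <hash> prints nothing for a commit that exists only locally. The sketch takes that lookup as a callback so it can be exercised without a repository; findPushedCommits and assertNotPushed are hypothetical names.

```javascript
// Returns the planned hashes that are already reachable from a remote-tracking
// branch. `remoteBranchesContaining(hash)` should return the raw stdout of
// `git branch -r --contains <hash>` (one branch per line, empty if unpushed).
function findPushedCommits(planHashes, remoteBranchesContaining) {
  const pushed = [];
  for (const hash of planHashes) {
    const branches = remoteBranchesContaining(hash)
      .split("\n")
      .map((line) => line.trim())
      .filter(Boolean);
    if (branches.length > 0) pushed.push({ hash, branches });
  }
  return pushed;
}

// Guardrail: refuse unless force is set, and name every offending commit.
function assertNotPushed(planHashes, remoteBranchesContaining, { force = false } = {}) {
  if (force) return;
  const pushed = findPushedCommits(planHashes, remoteBranchesContaining);
  if (pushed.length > 0) {
    const list = pushed.map((p) => `  ${p.hash} (on ${p.branches.join(", ")})`).join("\n");
    throw new Error(
      `Refusing to rewrite commits that are already pushed:\n${list}\n` +
        "Re-run with --force only on a feature/fix branch."
    );
  }
}
```

In Node, the callback could wrap execFileSync from node:child_process, for example (h) => execFileSync("git", ["branch", "-r", "--contains", h], { encoding: "utf8" }). Remote-tracking branches only reflect the last fetch, so running git fetch first makes the check more reliable.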
Empty-commit detection — if a squash group's commits cancel each other out (e.g. "add logs" followed by "remove logs"), analyze warns you at plan-generation time and apply automatically drops the empty result rather than halting mid-rebase.
Never run on shared branches that other developers have pulled without coordinating first. Always use git push --force-with-lease (not --force) after applying on a pushed branch.
Example workflow
# 1. Check your repo health first
git-shrink stats
# 2. Preview what would be grouped (no files written)
git-shrink analyze --dry-run --count 80
# 3. Run interactively and approve/edit/skip groups
git-shrink analyze --count 80
# 4. Review the generated plan
cat git-shrink-plan-*.txt
# 5. Apply it
git-shrink apply git-shrink-plan-*.txt
# 6. Verify
git log --oneline
Limitations
- Analyzes up to ~200 commits efficiently (O(n²) pair scoring). For larger ranges, use --from/--to to target specific ranges.
- Merge commits are automatically excluded from analysis.
- Rebase rewrites history — coordinate with your team before running on shared branches.
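The quadratic cost is easy to quantify: scoring every distinct pair of n commits means n(n - 1)/2 comparisons, each involving a Levenshtein computation, so doubling the range roughly quadruples the work. A quick calculation:

```javascript
// Distinct pairs scored for n commits: n * (n - 1) / 2.
const pairCount = (n) => (n * (n - 1)) / 2;

for (const n of [50, 100, 200, 1000]) {
  console.log(`${n} commits -> ${pairCount(n)} pairs`);
}
// 50 commits -> 1225 pairs
// 100 commits -> 4950 pairs
// 200 commits -> 19900 pairs
// 1000 commits -> 499500 pairs
```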
Local development
Prerequisites
- Node.js ≥ 18
- npm ≥ 9
Setup
git clone https://github.com/santoshkumar-in/git-shrink.git
cd git-shrink
npm install
Run locally without installing globally
# Run directly
node src/index.js analyze
# Or link it so `git-shrink` resolves as a command globally on your machine
npm link
git-shrink analyze
npm link creates a symlink from your global bin directory to src/index.js. Run npm unlink -g git-shrink to remove it.
Project structure
src/
├── index.js # CLI entry point — wires Commander commands
├── commands/
│ ├── analyze.js # `git-shrink analyze` — fetch, score, prompt, write plan
│ ├── apply.js # `git-shrink apply` — validate plan, run rebase
│ └── stats.js # `git-shrink stats` — commit health report
├── core/
│ ├── git.js # simple-git wrapper — getCommits(), generateRebaseScript()
│ └── grouper.js # scoring engine — scorePair(), groupCommits(), union-find
└── utils/
├── config.js # cosmiconfig loader for .gitshrinkrc
└── render.js # terminal rendering — tables, score bars, summary box
Key modules to know
src/core/grouper.js is the heart of the tool. scorePair(commitA, commitB) returns a weighted composite score across message similarity (Levenshtein), file proximity (Jaccard), and directory overlap (Jaccard). groupCommits() runs union-find over all pairs to produce clusters. This is the right place to add new scoring dimensions or tune weights.
src/core/git.js wraps simple-git. getCommits() fetches the log with per-commit file diffs. generateRebaseScript() serializes groups into a git rebase -i todo format.
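The git rebase -i todo format is one command per line, oldest commit first (pick <hash> <subject>, fixup <hash> <subject>, exec <shell command>, and so on). Below is a sketch of how approved groups could be serialized, assuming each group is placed at the position of its oldest member, later members are folded in with fixup (which discards their messages), and the suggested message is set by an exec git commit --amend line. buildRebaseTodo and the input shapes are hypothetical; this is not necessarily what generateRebaseScript() emits.

```javascript
// POSIX single-quote escaping so the message is safe inside an `exec` line.
const shellQuote = (s) => `'${s.replace(/'/g, `'\\''`)}'`;

// commits: [{ hash, subject }] ordered oldest first.
// groups:  [{ hashes: string[], message: string }] from the approved plan.
function buildRebaseTodo(commits, groups) {
  const groupOf = new Map();
  for (const g of groups) for (const h of g.hashes) groupOf.set(h, g);

  const subjectOf = new Map(commits.map((c) => [c.hash, c.subject]));
  const position = new Map(commits.map((c, i) => [c.hash, i]));
  const emitted = new Set();
  const lines = [];

  for (const c of commits) {
    const g = groupOf.get(c.hash);
    if (!g) {
      lines.push(`pick ${c.hash} ${c.subject}`); // ungrouped commits stay as they are
      continue;
    }
    if (emitted.has(g)) continue; // already written out with its oldest member
    emitted.add(g);
    const members = [...g.hashes].sort((a, b) => position.get(a) - position.get(b));
    lines.push(`pick ${members[0]} ${subjectOf.get(members[0])}`);
    for (const h of members.slice(1)) lines.push(`fixup ${h} ${subjectOf.get(h)}`);
    lines.push(`exec git commit --amend -m ${shellQuote(g.message)}`);
  }
  return lines.join("\n") + "\n";
}
```

For commits a1, b2, c3 (oldest first) with one group {a1, c3}, the sketch emits pick a1, fixup c3, the exec line that sets the suggested message, then pick b2. Moving c3 ahead of b2 like this is exactly where rebase conflicts can appear.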
src/commands/analyze.js is the main user-facing flow — it calls both modules above, handles interactive prompts via inquirer, and writes the plan file.
Running tests
npm test
Tests use Jest with --experimental-vm-modules for ESM support. Test files live in __tests__/ (create this directory if you add new tests).
To test a specific file:
node --experimental-vm-modules node_modules/.bin/jest grouper
Linting
npm run lint
Uses ESLint. The config is expected at .eslintrc.json or eslint.config.js in the root; add one if you're setting up a fresh clone for contribution.
Testing against a real repo
The most useful local test is running it against an actual repository with a noisy history:
# Point it at any git repo on your machine
cd /path/to/some/other/repo
node /path/to/git-shrink/src/index.js analyze --dry-run --count 30
--dry-run makes this completely safe: it scores and renders groups but writes nothing.
Config for development
Create a .gitshrinkrc in any test repo to override defaults without flags:
{
"threshold": 50,
"minGroup": 2,
"count": 30,
"cleanupPlan": true
}
Contributing
Bug reports and pull requests are welcome. For significant changes, open an issue first to discuss what you'd like to change.
When contributing:
- Keep new scoring dimensions in src/core/grouper.js and update the weights table in this README
- All write operations must keep supporting --dry-run
- The apply command must always require explicit confirmation; don't remove the prompt or make it default to true
License
MIT © Santosh Kumar
