
git-forensics

v2.0.0

Uncover architectural secrets hidden in your git history

git-forensics

A TypeScript library for providing insights from git commit history.

Features

  • Actionable insights
  • Fast: ~700ms for 100,000 commits (retrieving the git log itself is the slow part)
  • Follows file renames and removals
  • Optimized for CI
  • Percentile-based classification — self-calibrating thresholds that work across any codebase size
  • Composite risk scoring — weighted multi-metric risk scores per file
  • Integrated (very basic) code complexity engine
  • Bring your own code complexity score
  • Add custom metrics using the full temporal history

Motivation

Existing git analysis tools (code-maat, git-of-theseus, Hercules, etc.) are great for reports but feel heavy as a backend for dev-tools. This library is designed to be lightweight, fast, and embeddable.

Tip: Focus on recent history (6-9 months). While the library handles renames and long histories correctly, older data tends to add noise.

Installation

npm install git-forensics

Quick Start

import { simpleGit } from 'simple-git';
import { computeForensics } from 'git-forensics';

const git = simpleGit('/path/to/repo');
const forensics = await computeForensics(git);

forensics.hotspots; // Files changed most often
forensics.churn; // Code volatility (lines added/deleted)
forensics.coupledPairs; // Hidden dependencies
forensics.couplingRankings; // Architectural hubs
forensics.codeAge; // Stale code detection
forensics.ownership; // Knowledge silos
forensics.communication; // Developer coordination needs
forensics.topContributors; // Per-file contributor breakdown
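As a minimal illustration of consuming these fields, here is a sketch that ranks hotspot entries. The { file, revisions, exists } shape matches the Example Output below; the data here is mocked:

```typescript
// Mocked hotspot entries, matching the { file, revisions, exists } shape
// shown in the Example Output section.
type Hotspot = { file: string; revisions: number; exists: boolean };

const hotspots: Hotspot[] = [
  { file: 'src/api/routes.ts', revisions: 87, exists: true },
  { file: 'src/old/legacy.ts', revisions: 70, exists: false },
  { file: 'src/core/engine.ts', revisions: 64, exists: true },
];

// Report the top hotspots that still exist in the working tree.
const topHotspots = hotspots
  .filter((h) => h.exists)
  .sort((a, b) => b.revisions - a.revisions)
  .slice(0, 2);

for (const h of topHotspots) {
  console.log(`${h.file}: ${h.revisions} revisions`);
}
```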

Example Output

Running computeForensics on a repository returns structured data across all metrics:

{
  "analyzedCommits": 842,
  "dateRange": { "from": "2024-03-10", "to": "2025-01-15" },
  "metadata": { "totalFilesAnalyzed": 134, "totalAuthors": 12 },

  "hotspots": [
    { "file": "src/api/routes.ts", "revisions": 87, "exists": true },
    { "file": "src/core/engine.ts", "revisions": 64, "exists": true }
  ],

  "coupledPairs": [
    {
      "file1": "src/api/routes.ts",
      "file2": "src/api/middleware.ts",
      "couplingPercent": 82,
      "coChanges": 34
    }
  ],

  "ownership": [
    {
      "file": "src/core/engine.ts",
      "mainDev": "alice",
      "ownershipPercent": 34,
      "fractalValue": 0.18,
      "authorCount": 7
    }
  ],

  // ... plus churn, codeAge, couplingRankings, communication, topContributors
}

Passing the result to generateInsights produces actionable alerts:

[
  {
    "file": "src/core/engine.ts",
    "type": "hotspot",
    "severity": "critical",
    "data": {
      "type": "hotspot",
      "revisions": 64,
      "rank": 2,
      "percentile": 95
    },
    "fragments": {
      "title": "Hotspot",
      "finding": "64 revisions (P95), ranked #2 in repository",
      "risk": "Top-ranked churn file — prioritize for refactoring or test hardening",
      "suggestion": "Consider breaking into smaller modules or adding test coverage"
    }
  },
  {
    "file": "src/core/engine.ts",
    "type": "ownership-risk",
    "severity": "critical",
    "data": {
      "type": "ownership-risk",
      "fractalValue": 0.18,
      "authorCount": 7,
      "mainDev": "alice",
      "percentile": 92
    },
    "fragments": {
      "title": "Fragmented Ownership",
      "finding": "7 contributors, fragmentation score 0.18 (P92)",
      "risk": "Diffuse ownership slows review cycles and increases merge conflicts",
      "suggestion": "Request review from alice (primary contributor)"
    }
  },
  // ... insights generated for each metric that exceeds thresholds
]

Actionable Insights

generateInsights transforms metrics into alerts with severity (warning, critical) and human-readable fragments (title, finding, risk, suggestion).

Insights use percentile-based thresholds — a file is flagged based on where it ranks relative to other files in the same repository. This makes thresholds self-calibrating across codebases of any size.

Insight thresholds

| Question                            | Metric           | Insight triggers when                          |
| ----------------------------------- | ---------------- | ---------------------------------------------- |
| Where's the riskiest code?          | hotspots         | Revisions in P75+ (warning) or P90+ (critical) |
| What keeps getting rewritten?       | churn            | Churn in P75+ or P90+                          |
| What hidden dependencies exist?     | coupledPairs     | ≥70% co-change rate (absolute, not percentile) |
| What has ripple effects?            | couplingRankings | Coupling score in P75+ or P90+                 |
| What's been forgotten?              | codeAge          | Age in P75+ or P90+                            |
| Who owns what? Any knowledge silos? | ownership        | ≥3 authors, fragmentation in P75+ or P90+      |

All thresholds are overridable — pass a partial thresholds object and only the values you specify will change:

const insights = generateInsights(forensics, {
  thresholds: {
    hotspot: { warning: 80, critical: 95 }, // percentile cutoffs
    churn: { warning: 80 },
    staleCode: { warning: 60, critical: 85 },
    coupling: { minPercent: 80 }, // stays absolute — not percentile-based
    ownershipRisk: { warning: 70, critical: 90, minAuthors: 4 },
    couplingScore: { warning: 80, critical: 95 },
  },
});

Analysis options

The analysis pipeline has its own configurable thresholds that control what data is collected:

const forensics = await computeForensics(git, {
  maxFilesPerCommit: 50, // skip large commits from coupling analysis (default: 50)
  minCoChanges: 3, // minimum co-changes to report a coupled pair (default: 3)
  minCouplingPercent: 30, // minimum coupling % to report a pair (default: 30)
  minSharedEntities: 2, // minimum shared files for communication pairs (default: 2)
});

These options are also available on computeForensicsFromData().

Build your own insights

forensics.stats contains the complete temporal history—every commit, by every author, for every file. Access stats.fileStats[file].byAuthor, authorContributions, nameHistory, etc. to build custom metrics like temporal histograms, expertise scores, or handoff detection.
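For example, a custom metric might measure how concentrated a file's history is in a single author. The sketch below mocks a plausible byAuthor shape (commits per author); the real field layout inside forensics.stats may differ, so treat these types as assumptions:

```typescript
// Sketch of a custom "knowledge concentration" metric. The byAuthor shape
// (commits per author) is an assumption for illustration, not the library's
// documented schema.
type FileStats = { byAuthor: Record<string, { commits: number }> };

const fileStats: Record<string, FileStats> = {
  'src/core/engine.ts': {
    byAuthor: { alice: { commits: 20 }, bob: { commits: 5 }, carol: { commits: 3 } },
  },
  'src/api/routes.ts': {
    byAuthor: { dave: { commits: 40 } },
  },
};

// Share of commits held by the single most active author
// (1.0 = one person wrote everything: a potential knowledge silo).
function knowledgeConcentration(stats: FileStats): number {
  const counts = Object.values(stats.byAuthor).map((a) => a.commits);
  const total = counts.reduce((s, c) => s + c, 0);
  return total === 0 ? 0 : Math.max(...counts) / total;
}

const concentration = Object.fromEntries(
  Object.entries(fileStats).map(([file, s]) => [file, knowledgeConcentration(s)])
);
```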

Composite Risk Score

computeRiskScores produces a single 0-100 risk score per file by combining percentile ranks across all metrics with configurable weights:

import { computeRiskScores } from 'git-forensics';

const scores = computeRiskScores(forensics);
// [
//   { file: 'src/core/engine.ts', riskScore: 87.5, breakdown: { revisions: 22.5, churn: 25, ownershipRisk: 18, age: 12, couplingScore: 10 } },
//   { file: 'src/api/routes.ts', riskScore: 72.0, breakdown: { ... } },
//   ...
// ]

Default weights:

| Metric         | Weight |
| -------------- | ------ |
| Revisions      | 0.25   |
| Churn          | 0.25   |
| Ownership Risk | 0.20   |
| Age            | 0.15   |
| Coupling Score | 0.15   |
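The example output above suggests each breakdown entry is simply weight × percentile, summed into the final score (0.25 × 90 = 22.5, and the entries total 87.5). A quick sketch of that arithmetic, with hypothetical percentile ranks chosen to reproduce the example:

```typescript
// Default weights from the table above.
const weights = { revisions: 0.25, churn: 0.25, ownershipRisk: 0.2, age: 0.15, couplingScore: 0.15 };

// Hypothetical percentile ranks for src/core/engine.ts that reproduce the
// breakdown shown in the earlier example (e.g. 0.25 * 90 = 22.5).
const percentiles = { revisions: 90, churn: 100, ownershipRisk: 90, age: 80, couplingScore: 66.6667 };

// Each breakdown entry is weight * percentile; the risk score is their sum.
const breakdown = Object.fromEntries(
  (Object.keys(weights) as (keyof typeof weights)[]).map((k) => [k, weights[k] * percentiles[k]])
);
const riskScore = Object.values(breakdown).reduce((s, v) => s + v, 0);
// breakdown.revisions === 22.5, riskScore ≈ 87.5
```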

Override weights to match your priorities:

const scores = computeRiskScores(forensics, {
  revisions: 0.4,
  churn: 0.3,
  ownershipRisk: 0.1,
  age: 0.1,
  couplingScore: 0.1,
});

File Metrics with Percentiles

extractFileMetrics flattens forensics into per-file rows for storage. Pass includePercentiles: true to enrich each row with percentile ranks and a composite risk score:

import { extractFileMetrics } from 'git-forensics';

const metrics = extractFileMetrics(forensics, { includePercentiles: true });
// Each entry includes:
// {
//   file, revisions, ageMonths, churn, fractalValue, ...
//   percentiles: { revisions: 90, churn: 75, ownershipRisk: 85, ageMonths: 60, couplingScore: 40 },
//   riskScore: 72.5,
// }
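Flattened rows are convenient for downstream filtering and storage. A small sketch using mocked rows (the full row schema is abbreviated here):

```typescript
// Mocked rows in the flattened per-file shape sketched above; only a few
// fields are shown.
type FileMetricRow = { file: string; revisions: number; riskScore: number };

const rows: FileMetricRow[] = [
  { file: 'src/core/engine.ts', revisions: 64, riskScore: 87.5 },
  { file: 'src/api/routes.ts', revisions: 87, riskScore: 72.0 },
  { file: 'src/utils/strings.ts', revisions: 3, riskScore: 12.0 },
];

// Keep only files above a risk cutoff, ordered worst-first.
const atRisk = rows
  .filter((r) => r.riskScore >= 50)
  .sort((a, b) => b.riskScore - a.riskScore)
  .map((r) => r.file);
// ['src/core/engine.ts', 'src/api/routes.ts']
```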

Percentile Utilities

The underlying percentile functions are exported for building custom scoring:

import {
  percentileRank,
  createPercentileRanker,
  createInvertedPercentileRanker,
} from 'git-forensics';

// One-off calculation
percentileRank(50, [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]); // 45

// Reusable ranker for repeated lookups
const rank = createPercentileRanker([10, 20, 30, 40, 50]);
rank(30); // 50
rank(50); // 90

// Inverted ranker (lower values = higher percentile)
const riskRank = createInvertedPercentileRanker([0.1, 0.3, 0.5, 0.7, 0.9]);
riskRank(0.1); // 90 (lowest value = highest risk)
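The documented outputs are consistent with the midpoint percentile-rank formula: (count below + half the count equal) / n × 100. A reference implementation that reproduces the examples above (an inference about the semantics, not the library's actual source):

```typescript
// Midpoint percentile rank: (count below + half the count equal) / n * 100.
// This reproduces the documented examples; it is an inference about the
// exported functions' semantics.
function midpointPercentileRank(value: number, population: number[]): number {
  const below = population.filter((v) => v < value).length;
  const equal = population.filter((v) => v === value).length;
  return ((below + equal / 2) / population.length) * 100;
}

// Inverted variant: lower values rank higher.
function invertedMidpointRank(value: number, population: number[]): number {
  const above = population.filter((v) => v > value).length;
  const equal = population.filter((v) => v === value).length;
  return ((above + equal / 2) / population.length) * 100;
}

const p = midpointPercentileRank(50, [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]); // 45
```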

Complexity Analysis

git-forensics separates commit analysis from static code analysis. It ships optional complexity helpers (built on indent-complexity) for convenience, but these are intentionally basic. For better results, use language-aware complexity scoring and pass the results to computeForensics.

CI Usage

Building a report

Loop over insights and build a PR comment or CI annotation:

const insights = generateInsights(forensics, { minSeverity: 'warning' });

for (const insight of insights) {
  const prefix = insight.severity === 'critical' ? '[CRITICAL]' : '[WARNING]';
  console.log(`${prefix} ${insight.file} - ${insight.fragments.title}`);
  console.log(`  ${insight.fragments.finding}`);
  console.log(`  ${insight.fragments.suggestion}\n`);
}
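A common follow-up is to fail the build when critical findings exist. A sketch using mocked insights (in CI these would come from generateInsights):

```typescript
// Mock insights in the { file, severity, fragments } shape used above.
type Insight = { file: string; severity: 'warning' | 'critical'; fragments: { title: string } };

const insights: Insight[] = [
  { file: 'src/core/engine.ts', severity: 'critical', fragments: { title: 'Hotspot' } },
  { file: 'src/api/routes.ts', severity: 'warning', fragments: { title: 'Churn' } },
];

// Fail the build only on critical findings; warnings stay advisory.
const criticalCount = insights.filter((i) => i.severity === 'critical').length;
const exitCode = criticalCount > 0 ? 1 : 0;
// e.g. process.exitCode = exitCode;
```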

Optimization: Store & Reuse (large codebases)

For very large repos, store the computeForensics result between runs and rehydrate with generateInsights — no git scan needed:

import { simpleGit } from 'simple-git';
import { generateInsights, getChangedFiles } from 'git-forensics';

const git = simpleGit('/path/to/repo');

// Fetch pre-computed forensics from your server/cache
const forensics = await fetch('https://your-server/api/forensics?repo=org/repo').then((r) =>
  r.json()
);

// Generate insights only for the files changed in the PR
const changedFiles = await getChangedFiles(git, 'origin/main');
const insights = generateInsights(forensics, { files: changedFiles, minSeverity: 'warning' });

Data-Driven API

For environments without direct git access use computeForensicsFromData() with pre-fetched git data:

import { computeForensicsFromData, gitLogDataSchema, validateGitLogData } from 'git-forensics';

// Data must match the following format
const data = {
  log: {
    all: [
      {
        hash: 'abc123',
        date: '2025-01-15T10:00:00Z',
        author_name: 'Alice',
        message: 'Add feature',
        diff: {
          files: [
            { file: 'src/app.ts', insertions: 50, deletions: 10 },
            { file: 'src/utils.ts', insertions: 20, deletions: 5 },
          ],
        },
      },
      // ... more commits
    ],
  },
  trackedFiles: 'src/app.ts\nsrc/utils.ts\nsrc/index.ts', // from git ls-files
};

// Print JSON-schema if needed
console.log(gitLogDataSchema); // JSON Schema object

// Validate before processing
validateGitLogData(data); // throws if invalid

const forensics = computeForensicsFromData(data);
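If you need to produce that data shape without a git binding, one option is to parse the output of git log --numstat --pretty=format:'%H|%aI|%an|%s' yourself. The parser below is a sketch run against a hardcoded sample; the format string and parsing rules are our choice, not part of the library:

```typescript
// Sample output of: git log --numstat --pretty=format:'%H|%aI|%an|%s'
// (hardcoded here; in practice, capture it from a child process).
const sampleLog = [
  'abc123|2025-01-15T10:00:00Z|Alice|Add feature',
  '50\t10\tsrc/app.ts',
  '20\t5\tsrc/utils.ts',
  '',
  'def456|2025-01-16T11:00:00Z|Bob|Fix bug',
  '3\t1\tsrc/app.ts',
].join('\n');

type FileDiff = { file: string; insertions: number; deletions: number };
type Commit = {
  hash: string;
  date: string;
  author_name: string;
  message: string;
  diff: { files: FileDiff[] };
};

// Naive parser: '|'-delimited lines start a commit, tab-delimited numstat
// lines attach file diffs to the current commit. Messages containing '|'
// would need a more robust delimiter.
function parseNumstatLog(text: string): { log: { all: Commit[] } } {
  const all: Commit[] = [];
  let current: Commit | undefined;
  for (const line of text.split('\n')) {
    if (line.includes('|')) {
      const [hash, date, author_name, message] = line.split('|');
      current = { hash, date, author_name, message, diff: { files: [] } };
      all.push(current);
    } else if (line.includes('\t') && current) {
      const [ins, del, file] = line.split('\t');
      current.diff.files.push({ file, insertions: Number(ins), deletions: Number(del) });
    }
  }
  return { log: { all } };
}

const parsed = parseNumstatLog(sampleLog);
// parsed.log.all[0].diff.files[0] → { file: 'src/app.ts', insertions: 50, deletions: 10 }
```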

Migration from v1.x

v2.0.0 replaces absolute thresholds with percentile-based classification. Key changes:

  • InsightThresholds values are now percentile cutoffs (0-100), not raw metric values
  • InsightData variants (except coupling) include a percentile field
  • Stale-code severity changed from info/warning to warning/critical
  • Finding strings now include (Pxx) percentile annotations
  • Generator function signatures added a percentileRank parameter (affects direct generator importers)
  • New exports: computeRiskScores, DEFAULT_RISK_WEIGHTS, percentileRank, createPercentileRanker, createInvertedPercentileRanker
  • New types: PercentileThresholds, RiskWeights, FileRiskScore, ExtractFileMetricsOptions

Attribution

Based on concepts from Adam Tornhill's Your Code as a Crime Scene and Software Design X-Rays.

License

MIT