git-semantic-bun

v0.4.1

Local semantic git commit search CLI built with Bun + TypeScript

Search your git history by meaning, not just keywords.

gsb search "fix race condition in auth token refresh"

Why?

git log --grep matches exact strings. Real questions are fuzzier: "that commit where we fixed the retry backoff" or "the auth token race condition".

gsb embeds every commit message into a vector space and ranks results by semantic similarity, BM25 lexical overlap, and recency — all locally, all offline.
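As a rough illustration of the blend described above, the sketch below combines a cosine similarity, a lexical score, and a recency score into one weighted value. The type names, the 180-day half-life, and the normalisation by total weight are assumptions made for this example; they are not taken from gsb's source.

```typescript
// Hypothetical shapes for illustration; gsb's internal types may differ.
interface ScoreParts {
  semantic: number; // cosine similarity between query and commit vectors
  lexical: number;  // BM25 score, assumed already normalised to [0, 1]
  recency: number;  // 1 for a brand-new commit, decaying towards 0
}

interface Weights {
  semantic: number;
  lexical: number;
  recency: number;
}

// Cosine similarity of two equal-length vectors; returns 0 for a zero vector.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i]!;
    const y = b[i]!;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Assumed recency model: exponential decay with a configurable half-life.
function recencyScore(ageDays: number, halfLifeDays = 180): number {
  return Math.pow(0.5, Math.max(0, ageDays) / halfLifeDays);
}

// Weighted average of the three signals; weights need not sum to 1.
function hybridScore(parts: ScoreParts, weights: Weights): number {
  const total = weights.semantic + weights.lexical + weights.recency;
  if (total === 0) return 0;
  return (
    (weights.semantic * parts.semantic +
      weights.lexical * parts.lexical +
      weights.recency * parts.recency) /
    total
  );
}
```

Setting the recency weight to 0 in this sketch mirrors what a flag such as --no-recency-boost would be expected to do, although the real implementation may handle it differently.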

Highlights

  • Semantic + hybrid ranking: weighted blend of vector similarity, lexical matching, and recency
  • Runs entirely offline: no API keys, no cloud; embeddings via Transformers.js
  • Fast: sub-millisecond lookups with optional ANN (HNSW) backend for 10k+ commit repos
  • Incremental: gsb update indexes only new commits; detects rebases and force-pushes
  • Compact: f16 vector storage halves disk usage; sidecar index keeps .git tidy

Quick start

# 1. Install
bun add -g github:danjdewhurst/git-semantic-bun

# 2. Initialise & index
gsb init
gsb index

# 3. Search
gsb search "fix race condition in auth token refresh"

Prebuilt binaries for macOS, Linux, and Windows are available on the Releases page.


Install

Prebuilt binaries (recommended)

Download the latest release asset for your platform from Releases.

From GitHub with Bun

bun add -g github:danjdewhurst/git-semantic-bun
gsb --help

From source

Requirements: Bun >= 1.3.9, a git repository.

git clone https://github.com/danjdewhurst/git-semantic-bun.git
cd git-semantic-bun
bun install
bun run src/cli.ts --help

Commands

| Command | Description |
|---|---|
| gsb init | Initialise index directories and model metadata |
| gsb index | Build semantic index from git history |
| gsb search | Semantic search over indexed commits |
| gsb update | Incrementally index new commits |
| gsb serve | Warm search daemon (stdin/stdout) |
| gsb stats | Show index and vector stats |
| gsb doctor | Check index health; --fix to repair |
| gsb benchmark | Benchmark ranking performance |

gsb init

Initialises semantic index/cache directories and writes model metadata.

-m, --model <name>    embedding model (default: Xenova/all-MiniLM-L6-v2)

gsb index

Builds the semantic index from git history.

--full                    full re-index
--model <name>            embedding model
--batch-size <n>          max 256
--include <glob>          repeatable
--exclude <glob>          repeatable
--vector-dtype <f32|f16>  f32 default; f16 halves storage
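
To make the f32 versus f16 trade-off concrete, here is a back-of-the-envelope sketch of raw vector storage. It assumes the default model's 384-dimensional embeddings and ignores any file header or metadata overhead; the function name is made up for this example.

```typescript
type VectorDtype = "f32" | "f16";

// Bytes used by one vector component for each storage dtype.
const BYTES_PER_COMPONENT: Record<VectorDtype, number> = { f32: 4, f16: 2 };

// Raw size of the vector payload only (no headers, no index metadata).
function vectorBytes(commitCount: number, dimensions: number, dtype: VectorDtype): number {
  return commitCount * dimensions * BYTES_PER_COMPONENT[dtype];
}

// Example: 10,000 commits with 384-dimensional embeddings.
const f32Bytes = vectorBytes(10_000, 384, "f32"); // 15,360,000 bytes
const f16Bytes = vectorBytes(10_000, 384, "f16"); //  7,680,000 bytes
console.log({ f32Bytes, f16Bytes });
```

The halving applies to the vector file only; the saving over the whole index directory would be somewhat smaller.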

.gsbignore patterns are applied automatically when present.

gsb search <query>

Runs semantic search over indexed commits.

Filters:

--author <name>       filter by commit author
--after <date>        commits after date
--before <date>       commits before date
--file <path>         path substring filter
-m, --model <name>    select model index
--query-file <path>   read long query text from file
-n, --limit <count>   max results (max 200)
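
One plausible reading of the author, date, and file filters above is sketched below as a pure predicate. The record shape, the case-insensitive substring match on author, and the exclusive date bounds are assumptions for illustration and may not match gsb's behaviour exactly.

```typescript
// Hypothetical commit record; gsb's stored fields may differ.
interface CommitRecord {
  hash: string;
  author: string;
  date: string; // ISO 8601 timestamp
  files: string[];
}

interface SearchFilters {
  author?: string;
  after?: Date;
  before?: Date;
  file?: string;
}

// Returns true when the commit passes every filter that is set.
function matchesFilters(commit: CommitRecord, filters: SearchFilters): boolean {
  const author = filters.author;
  if (author !== undefined && !commit.author.toLowerCase().includes(author.toLowerCase())) {
    return false;
  }
  const when = new Date(commit.date).getTime();
  if (filters.after !== undefined && when <= filters.after.getTime()) return false;
  if (filters.before !== undefined && when >= filters.before.getTime()) return false;
  const file = filters.file;
  if (file !== undefined && !commit.files.some((path) => path.includes(file))) {
    return false; // --file is described as a path substring filter
  }
  return true;
}
```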

Ranking & output:

--semantic-weight <0..1>    vector similarity weight
--lexical-weight <0..1>     BM25 lexical weight
--recency-weight <0..1>     recency boost weight
--no-recency-boost          disable recency boosting
--explain                   show score breakdown
--format <text|md|json>     output format
--min-score <score>         minimum score threshold
--snippets                  show diff snippets
--snippet-lines <count>     lines per snippet
--strategy <auto|exact|ann> search strategy

The default --min-score depends on the output format:

  • text / markdown: 0.15
  • json: 0.00
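
The format-dependent default can be pictured as follows. This is a minimal sketch: the function names are invented, and treating the threshold as inclusive is an assumption.

```typescript
type OutputFormat = "text" | "md" | "json";

// Defaults as documented above: 0.15 for human-readable output, 0 for JSON.
function defaultMinScore(format: OutputFormat): number {
  return format === "json" ? 0 : 0.15;
}

// Keeps results at or above the explicit threshold, or the format default.
function applyMinScore<T extends { score: number }>(
  results: readonly T[],
  format: OutputFormat,
  explicitMinScore?: number,
): T[] {
  const threshold = explicitMinScore ?? defaultMinScore(format);
  return results.filter((result) => result.score >= threshold);
}
```

A likely reason for the split is that JSON consumers tend to apply their own filtering, while terminal output benefits from hiding weak matches.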

When no results are found, gsb search prints actionable suggestions (threshold and filter hints).

gsb serve

Runs a warm, in-process search daemon that reads one query per stdin line, avoiding a cold start for every search.

Interactive commands: :reload, :quit (or :exit).

printf "token refresh race\nretry backoff\n:quit\n" | gsb serve --jsonl -n 5

Supports all search filter and ranking flags, plus --jsonl for compact JSON output.
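
When driving the daemon from a script, the stdin payload can be assembled programmatically. The helper below is a hypothetical convenience, not part of gsb; it only relies on the documented one-query-per-line protocol and the :quit command.

```typescript
// Builds the text to pipe into `gsb serve`: one query per line, then :quit.
function buildServeInput(queries: readonly string[]): string {
  const lines: string[] = [];
  for (const query of queries) {
    const trimmed = query.trim();
    if (trimmed.length === 0) continue; // skip empty queries
    if (/[\r\n]/.test(trimmed)) {
      // A line break would be read as two separate queries.
      throw new Error("queries must fit on a single line");
    }
    lines.push(trimmed);
  }
  lines.push(":quit");
  return lines.join("\n") + "\n";
}
```

For the two queries in the printf example above, this produces the same payload, which could then be written to the daemon's stdin by whatever process API the caller uses.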

gsb update

Incrementally indexes commits newer than the latest indexed commit. Supports the same options as gsb index. Includes rewritten-history safety checks (rebase/force-push detection + recovery window). Use --model <name> to update a specific model index.
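
The core of a rewritten-history check can be sketched as a lookup of the last indexed commit in the current history. This is a simplified illustration under assumed names; it does not model the recovery window mentioned above, and gsb's actual checks may work differently.

```typescript
type UpdatePlan =
  | { kind: "up-to-date" }
  | { kind: "incremental"; newCommits: string[] }
  | { kind: "history-rewritten" };

// historyNewestFirst: commit hashes reachable from HEAD, newest first.
function planUpdate(lastIndexed: string, historyNewestFirst: readonly string[]): UpdatePlan {
  const position = historyNewestFirst.indexOf(lastIndexed);
  if (position === -1) {
    // The indexed commit is no longer reachable: likely a rebase or force-push.
    return { kind: "history-rewritten" };
  }
  if (position === 0) return { kind: "up-to-date" };
  // Everything ahead of the last indexed commit still needs indexing.
  return { kind: "incremental", newCommits: historyNewestFirst.slice(0, position) };
}
```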

gsb stats

Shows index and vector stats — size, dtype, timestamps, load time. Use --model <name> to inspect a specific model index.

gsb doctor

Checks index health and metadata/cache readiness. --fix performs safe, non-destructive repairs. Use --model <name> to check or repair one model index.

gsb benchmark [query]

Benchmarks the ranking path, comparing a baseline full sort against heap-based top-k selection.
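
For readers unfamiliar with the two ranking paths, the sketch below shows a generic version of each: sorting every candidate versus keeping only the best k in a min-heap. It illustrates the technique and is not gsb's code; the Scored type is an assumption.

```typescript
interface Scored {
  id: string;
  score: number;
}

// Baseline: sort everything, then take the first k. O(n log n).
function topKBySort(items: readonly Scored[], k: number): Scored[] {
  return [...items].sort((a, b) => b.score - a.score).slice(0, k);
}

// Heap path: keep a min-heap of the k best items seen so far. O(n log k).
function topKByHeap(items: readonly Scored[], k: number): Scored[] {
  if (k <= 0) return [];
  const heap: Scored[] = []; // heap[0] is always the weakest kept item
  const swap = (i: number, j: number): void => {
    const tmp = heap[i]!;
    heap[i] = heap[j]!;
    heap[j] = tmp;
  };
  const siftUp = (start: number): void => {
    let i = start;
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (heap[parent]!.score <= heap[i]!.score) break;
      swap(parent, i);
      i = parent;
    }
  };
  const siftDown = (start: number): void => {
    let i = start;
    for (;;) {
      const left = 2 * i + 1;
      const right = left + 1;
      let smallest = i;
      if (left < heap.length && heap[left]!.score < heap[smallest]!.score) smallest = left;
      if (right < heap.length && heap[right]!.score < heap[smallest]!.score) smallest = right;
      if (smallest === i) return;
      swap(i, smallest);
      i = smallest;
    }
  };
  for (const item of items) {
    if (heap.length < k) {
      heap.push(item);
      siftUp(heap.length - 1);
    } else if (item.score > heap[0]!.score) {
      heap[0] = item; // evict the weakest kept item
      siftDown(0);
    }
  }
  return heap.sort((a, b) => b.score - a.score);
}
```

The heap variant pays off when k is much smaller than the number of candidates, which is the usual case for a search limit of 10 against thousands of commits.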

-i, --iterations <count>   iterations (default: 20)
-n, --limit <count>        max results (default: 10)
--model <name>             benchmark model index
--compare-model <name>     compare with another model (repeatable)
--save                     append to benchmarks.jsonl
--history                  print saved history
--ann                      compare exact vs ANN

Also accepts all search filter and ranking weight flags.


Examples

gsb search "refactor payment retry logic"
gsb search "fix flaky parser" --after 2025-01-01 --author dan
gsb search "optimise caching" --file src/core -n 5
gsb search "token refresh race" --format md --explain
gsb search "error handling in webhook retries" --format json --min-score 0.35

Optional ANN backend

For very large repositories (>10k commits), an approximate nearest-neighbour (HNSW) index brings semantic lookup time below a millisecond:

bun add usearch            # optional dependency
gsb index                  # builds .ann.usearch alongside standard index
gsb search "fix bug" --strategy ann   # force ANN
gsb search "fix bug"                  # auto: ANN when available & >10k commits
gsb benchmark "fix bug" --ann         # compare exact vs ANN recall + speedup

When usearch is not installed, all commands fall back to exact brute-force search transparently.
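
The selection rules described in this section can be summarised in a small decision function. This is a sketch based only on the behaviour stated above; the names and the exact threshold comparison are assumptions.

```typescript
type Strategy = "auto" | "exact" | "ann";

// Assumed cut-off, taken from the ">10k commits" wording above.
const ANN_AUTO_THRESHOLD = 10_000;

// Resolves the requested strategy to the search path that actually runs.
function resolveStrategy(
  requested: Strategy,
  annAvailable: boolean,
  commitCount: number,
): "exact" | "ann" {
  if (requested === "exact") return "exact";
  if (!annAvailable) return "exact"; // documented fallback when usearch is missing
  if (requested === "ann") return "ann";
  return commitCount > ANN_AUTO_THRESHOLD ? "ann" : "exact";
}
```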


Storage layout

All data lives under .git/semantic-index/ — nothing outside your repo:

.git/semantic-index/
├── models/
│   └── <model-key>/
│       ├── index.json
│       ├── index.meta.json
│       ├── index.vec.f32   # or .f16
│       ├── index.ann.usearch
│       └── benchmarks.jsonl
├── cache/                  # embedding cache
└── index.json              # legacy single-index layout (still supported)
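
For tooling that needs to locate these files, the per-model layout above maps to paths like the ones built below. The helper is hypothetical, and how gsb derives the model key from a model name is not covered here, so the key is taken as an input.

```typescript
interface ModelIndexPaths {
  index: string;
  meta: string;
  vectors: string;
  ann: string;
  benchmarks: string;
}

// Builds file paths for one model index under <gitDir>/semantic-index/models/.
function modelIndexPaths(gitDir: string, modelKey: string, dtype: "f32" | "f16"): ModelIndexPaths {
  const base = `${gitDir}/semantic-index/models/${modelKey}`;
  return {
    index: `${base}/index.json`,
    meta: `${base}/index.meta.json`,
    vectors: `${base}/index.vec.${dtype}`,
    ann: `${base}/index.ann.usearch`, // only present when the ANN backend is used
    benchmarks: `${base}/benchmarks.jsonl`,
  };
}
```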

Development

bun install          # install dependencies
bun run lint         # biome check
bun run typecheck    # tsc --noEmit
bun test             # run test suite
bun run perf:ci      # performance regression suite
bun run build        # compile to dist/

Performance CI

  • bun run perf:ci runs cold/warm/index-load suites against a synthetic dataset.
  • .github/perf-baseline.json defines baseline snapshots + allowed regression thresholds.
  • CI uploads perf-artifacts/perf-snapshot.json as a build artifact.
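
A regression gate of this kind usually boils down to comparing a measured value against a baseline plus an allowed margin. The sketch below shows that comparison with invented names; the real schema of perf-baseline.json and the way thresholds are expressed are not documented here and may differ.

```typescript
// Hypothetical check: fails when the current timing exceeds the baseline
// by more than the allowed fraction (0.10 means a 10% regression budget).
function exceedsThreshold(baselineMs: number, currentMs: number, allowedRegression: number): boolean {
  return currentMs > baselineMs * (1 + allowedRegression);
}

console.log(exceedsThreshold(100, 105, 0.1)); // within budget
console.log(exceedsThreshold(100, 115, 0.1)); // over budget
```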

Docs

See the documentation index for guides, architecture, and development references. Project roadmap: ROADMAP.md.


Contributing

PRs welcome. Please use conventional commits and include tests for behavioural changes.

Licence

MIT — see LICENSE