sigsim

v0.1.0

Fast file similarity detection with TLSH. Native Rust speed for near-duplicate detection of re-exported PDFs, re-saved DOCXs, and more.

Detect near-duplicate files in Node.js using TLSH fingerprints. Native Rust, prebuilt binaries.

Install

pnpm add sigsim

Usage

import { sigsim } from "sigsim";

// Fingerprint a file (returns null if too small / no entropy)
const fileFp = await sigsim.file("/path/to/upload.pdf");
// → "T1A12..." (70-char hex string) or null

// Fingerprint a buffer (distinct name: `const fp` cannot be declared twice)
const bufferFp = await sigsim.buffer(data);

// Distance between two fingerprints (0 = identical, lower = more similar)
const d = sigsim.distance(fpA, fpB);

// Boolean similarity check with threshold (default 30)
sigsim.similar(fpA, fpB); // true/false
sigsim.similar(fpA, fpB, { threshold: 60 }); // more lenient
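
Because fingerprinting can return null, calling code usually needs a null check before comparing. The sketch below is one way to wrap that check. `classifyUpload` and its return labels are hypothetical names, the distance function is injected so the example runs without the native module, and treating the threshold as inclusive is an assumption rather than documented behavior.

```javascript
// Hypothetical helper, not part of sigsim: classify a new fingerprint against
// a known one. `distance` is injected; with the real package you would pass
// sigsim.distance here.
function classifyUpload(newFp, knownFp, distance, threshold = 30) {
  // Fingerprinting returns null for tiny or low-entropy input, so there is
  // nothing to compare in that case.
  if (newFp === null || knownFp === null) return "unhashable";
  // Assumption: a distance equal to the threshold still counts as a match.
  return distance(newFp, knownFp) <= threshold ? "duplicate" : "distinct";
}

// Stub distance functions stand in for sigsim.distance in this demo.
console.log(classifyUpload("fp-a", "fp-b", () => 12)); // → duplicate
console.log(classifyUpload("fp-a", "fp-b", () => 85)); // → distinct
console.log(classifyUpload(null, "fp-b", () => 12));   // → unhashable
```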

Batch fingerprinting

Fingerprint many files in a single native call. Rayon distributes the files across cores, so there is no per-file NAPI overhead.

const fps = await sigsim.files([
  "/uploads/a.pdf",
  "/uploads/b.png",
  "/uploads/c.docx",
]);
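
After a batch call you typically want to know which path produced which fingerprint, and which files were skipped. The sketch below assumes the result array is index-aligned with the input paths and uses null for unhashable files, mirroring `sigsim.file()`. That is an assumption about `files()`, and `pairWithPaths` is a hypothetical helper.

```javascript
// Hypothetical helper, not part of sigsim: join input paths with batch
// results and separate out files that could not be fingerprinted.
// Assumes `fingerprints[i]` belongs to `paths[i]` and is null when unhashable.
function pairWithPaths(paths, fingerprints) {
  if (paths.length !== fingerprints.length) {
    throw new RangeError("paths and fingerprints must have the same length");
  }
  const hashed = [];
  const skipped = [];
  paths.forEach((path, i) => {
    const fp = fingerprints[i];
    if (fp === null) skipped.push(path);
    else hashed.push({ path, fp });
  });
  return { hashed, skipped };
}

// "fp-a" and "fp-c" are placeholder strings, not real TLSH fingerprints.
const result = pairWithPaths(
  ["/uploads/a.pdf", "/uploads/b.png", "/uploads/c.docx"],
  ["fp-a", null, "fp-c"],
);
console.log(result.hashed.length); // → 2
console.log(result.skipped);       // → [ '/uploads/b.png' ]
```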

Bulk search

Find fingerprints in an array that are similar to a given one. Results are sorted by distance (ascending).

const matches = sigsim.search(needle, haystack, { threshold: 30 });
// → [{ index: 3, distance: 12 }, { index: 7, distance: 28 }]
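
To make the result shape concrete, here is a plain JavaScript sketch of the same semantics: filter by threshold, then sort by ascending distance. It is not the library's native implementation. `searchSketch` is a hypothetical name, `toyDistance` is a stand-in that simply counts differing characters (it is not the TLSH distance), and the inclusive threshold is an assumption.

```javascript
// Toy stand-in for sigsim.distance so the sketch runs without the native
// module. It counts differing characters and is NOT the TLSH distance.
function toyDistance(a, b) {
  const len = Math.max(a.length, b.length);
  let diff = 0;
  for (let i = 0; i < len; i++) {
    if (a[i] !== b[i]) diff++;
  }
  return diff;
}

// Hypothetical reference version of the search semantics described above.
function searchSketch(needle, haystack, distance, threshold = 30) {
  const matches = [];
  haystack.forEach((fp, index) => {
    const d = distance(needle, fp);
    // Assumption: a distance equal to the threshold still counts as a match.
    if (d <= threshold) matches.push({ index, distance: d });
  });
  // Closest matches first.
  matches.sort((a, b) => a.distance - b.distance);
  return matches;
}

const found = searchSketch("AAAA", ["AABB", "BBBB", "AAAA", "AAAB"], toyDistance, 2);
console.log(found);
// → [ { index: 2, distance: 0 }, { index: 3, distance: 1 }, { index: 0, distance: 2 } ]
```

With the real package, `sigsim.search(needle, haystack, { threshold })` does this in one native call. The sketch is only meant to show how `index` refers back to positions in the haystack.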

Benchmarks

Measured on Apple M3 Pro, Node.js v24. Compared against tlsh (pure JS TLSH) and ssdeep.js (pure JS ssdeep).

Fingerprint throughput

| Size  | sigsim (native) | tlsh (JS) | ssdeep.js | vs tlsh | vs ssdeep |
|-------|-----------------|-----------|-----------|---------|-----------|
| 1 KB  | 0.024ms         | 0.11ms    | 0.15ms    | 4x      | 6x        |
| 64 KB | 0.26ms          | 5.9ms     | 7.6ms     | 23x     | 29x       |
| 1 MB  | 3.8ms           | 94ms      | 256ms     | 24x     | 67x       |

Search at scale

Single-call search across a haystack of pre-computed fingerprints, compared with looping over the same haystack using ssdeep.js:

| Haystack size | sigsim (native) | ssdeep.js | Speedup |
|---------------|-----------------|-----------|---------|
| 1,000         | 0.16ms          | 2.2ms     | 14x     |
| 10,000        | 1.6ms           | 20ms      | 12x     |
| 100,000       | 17ms            | 199ms     | 12x     |

Run benchmarks yourself:

pnpm bench
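
If you want a quick sanity check outside the bundled benchmark, a minimal timing harness might look like the sketch below. It is not the script behind `pnpm bench`. `meanMs` and `speedup` are hypothetical helpers, and `meanMs` only suits synchronous calls such as `sigsim.distance`. The speedup columns in the tables are the ratio of the two timings.

```javascript
// Hypothetical helper: average wall-clock milliseconds per call of a
// synchronous function, after one untimed warm-up call.
function meanMs(fn, iterations = 1000) {
  fn(); // warm-up
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  return (performance.now() - start) / iterations;
}

// Speedup of a candidate over a baseline, given per-call timings in ms.
function speedup(baselineMs, candidateMs) {
  return baselineMs / candidateMs;
}

// Using the 64 KB row from the table above: tlsh (JS) 5.9ms vs sigsim 0.26ms.
console.log(speedup(5.9, 0.26).toFixed(1)); // → 22.7

// Timing an arbitrary synchronous function (here a trivial placeholder).
const perCall = meanMs(() => Math.sqrt(2), 100);
console.log(perCall >= 0); // → true
```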

How it works

  • TLSH: Trend Micro Locality Sensitive Hash. It analyzes byte distribution patterns to produce a 70-char fingerprint that tolerates minor changes (metadata updates, re-exports, re-saves).
  • Distance, not similarity: TLSH's native unit is a distance ranging from 0 to 1000+. A threshold of 30 means near-exact duplicate (0.007% false-positive rate). There is no lossy conversion to a 0-1 score.
  • null for unhashable: TLSH requires ~50+ bytes and sufficient entropy. When the input does not meet that bar, fingerprinting returns null instead of throwing.
  • Sync distance ops: distance(), similar(), and search() are synchronous, since they are pure CPU math on small fixed-size data.
  • Batch API: A single NAPI boundary crossing for N files. Rayon distributes work across cores inside Rust, with no JS event loop involvement.
  • mmap: Files > 1 MB are memory-mapped for zero-copy reads.