npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

shingles

v0.1.1

Published

A WebAssembly library for efficient text deduplication and similarity detection using shingling and MinHash.

Downloads

161

Readme

shingles

Fast, privacy-aware client-side near-duplicate detection in the browser
A lightweight Rust library compiled to WebAssembly (via wasm-bindgen) for computing MinHash signatures and Locality-Sensitive Hashing (LSH) indices to find similar/duplicate texts entirely in the browser, with optional salting for privacy-preserving similarity checks.

No server needed. Runs offline. Perfect for PWAs, note-taking apps, plagiarism checkers, private document comparison, or client-side data cleaning.

npm License: MIT

✨ Features

  • MinHash signatures for estimating Jaccard similarity between texts
  • Locality-Sensitive Hashing (LSH) with banding for fast candidate duplicate detection
  • Privacy-preserving mode: Optional per-document salt for blind similarity comparison (same salt → comparable signatures :-> different salt → no leakage)
  • Efficient shingling (k-grams on bytes, UTF-8 safe)
  • Zero-copy slices where possible for fast WASM ↔ JS interop
  • Small footprint WASM binary stays tiny (~500KB–1MB gzipped)
  • Offline-first works in browsers, PWAs, even low-end devices

Installation

Via npm (recommended for web projects)

npm install shingles
# or
yarn add shingles
# or
pnpm add shingles

Usage (JavaScript / TypeScript)

shingles is designed to be used in a background thread (Web Worker) for large datasets, but the API is simple enough for main-thread use on smaller collections.

import init, { Deduplicator } from 'shingles';

async function main() {
  await init(); 

  // 128 hashes divided into 16 bands (Threshold ~0.7-0.8)
  // use a BigInt for the seed (n suffix required in JS)
  const seed = 42n;
  const dedup = new Deduplicator(128, 16, seed); 

  const privacySalt = "user-secret-key-123";

  // add documents
  const docs = [
    { id: 1, text: "The quick brown fox jumps over the lazy dog." },
    { id: 2, text: "The quick brown fox leaps over the lazy dog!" }, // near dupe
    { id: 3, text: "I love coding in Rust and WebAssembly." }        // different
  ];

  for (const doc of docs) {
    const shingless = Deduplicator.get_shingless(doc.text, 5);
    const signature = dedup.compute_minhash(shingless, privacySalt);
    dedup.add_document(doc.id, signature);
  }

  // query for near-duplicates
  const queryText = "A quick brown fox jumps over a lazy dog.";
  const queryshingless = Deduplicator.get_shingless(queryText, 5);
  const querySig = dedup.compute_minhash(queryshingless, privacySalt);

  // LSH finds candidates in O(1) time instead of O(N)
  const candidates = dedup.query_candidates(querySig);
  console.log("Candidate IDs:", candidates); // Output: [1, 2]

  // fine-grained similarity check
  // compare the query against the most likely candidate (ID: 1)
  const sig1 = dedup.compute_minhash(Deduplicator.get_shingless(docs[0].text, 5), privacySalt);
  const similarity = dedup.estimate_jaccard(querySig, sig1);
  
  console.log(`Estimated Similarity: ${(similarity * 100).toFixed(2)}%`);
}

main();

Why Privacy Salt?

When two parties use the same secret salt, their MinHash signatures become comparable without revealing the original text.
Different salts produce completely incomparable signatures → ideal for secure, private similarity checks (e.g., two users compare notes without sending plaintext).

Performance

  • Tested on 1,000–5,000 short documents: signature computation + indexing < 500ms (browser, mid-range device)
  • Scales to 10k+ docs with Web Workers (recommended for heavy use)
  • Much faster than pure JS alternatives for large texts

Tuning the "Sensitivity" (S-Curve)

The relationship between Bands, Rows, and Similarity follows an S-curve. By adjusting the parameters in new Deduplicator(hashes, bands, seed), you control how "fuzzy" the matching is. | Goal | Hashes | Bands | Threshold (T) | |-------------|--------|-------|---------------| | Strict | 128 | 8 | ≈ 0.84 | | Balanced | 128 | 16 | ≈ 0.70 | | Permissive | 128 | 32 | ≈ 0.55 |

Formula: $$T≈(1/b)1/r$$ where b is bands and r is rows per band

Building from Source

# Install wasm-pack if needed
cargo install wasm-pack

# Build for web target
wasm-pack build --target web

Roadmap

  • Web Worker helpers for non-blocking heavy operations
  • Exact similarity (Jaccard on shingles) only on candidates
  • Streaming/large-text support
  • Advanced PSI extensions for stronger privacy
  • TypeScript types & better docs

Contributing

Contributions welcome! Feel free to open issues or PRs.

  1. Fork the repo
  2. cargo build / wasm-pack build
  3. Make changes
  4. Submit PR

License

MIT © Preetham Pemmasani


If you use this in a project, star the repo and let me know — I'd love to hear about it!
Follow me on X: @ppmpreetham