npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

bloom-filter-numbers

v2.0.0

Published

Lightweight Bloom Filter for integers using splitmix64 and XOR fingerprinting

Readme

bloom-filter-numbers

Know what you haven't seen — in constant time.

bloom-filter-numbers is a production-grade Bloom filter built for numeric data. It tells you with 100% certainty when a number isn't in your set, and with high confidence when it is — using a fraction of the memory a traditional data structure would need.

One million IDs? That's ~1.1 MB. With a false positive rate under 1%.

Built in pure JavaScript, zero dependencies. Works in Node.js (CommonJS & ESM) and browsers. Ships with TypeScript definitions.

const bloom = new BloomFilterNumbers({ expectedItems: 1_000_000 });

bloom.add(userId);

if (!bloom.has(someId)) {
  // This ID has NEVER been added. Guaranteed.
}

Why Use This

Every time your app checks "have I seen this before?", you're probably hitting a database, a Set, or a hash map. That works — until it doesn't. At scale, those lookups become bottlenecks.

A Bloom filter flips the question: instead of asking "is this in my data?", you ask "is this definitely NOT in my data?" — and you get the answer in constant time, with zero I/O.

The tradeoff? A small chance of false positives. If has() returns true, the value is probably there. If it returns false, it is definitely not there.

In practice, this means you can skip 99%+ of unnecessary lookups before they happen.

Real-World Use Cases

  • Deduplication — Skip already-processed user IDs, transaction hashes, or log entries without querying storage
  • Database gatekeeping — Only hit the DB when the Bloom filter says "maybe", saving thousands of queries per second
  • P2P sync — Exchange compact filters between nodes to quickly figure out who's missing what
  • Stream processing — Filter out duplicate events in real-time pipelines at line speed
  • API rate limiting — Track seen request fingerprints without maintaining a full hash set

Installation

npm install bloom-filter-numbers

Quick Start

const BloomFilterNumbers = require('bloom-filter-numbers');

// Create a filter expecting 10,000 items with 1% false positive rate
const bloom = new BloomFilterNumbers({ expectedItems: 10_000, falsePositiveRate: 0.01 });

// Add values (numbers or BigInts)
bloom.add(123);
bloom.add(456);
bloom.addMany([789, 1011, 1213]);

// Check membership
bloom.has(123);  // true  — probably in the set
bloom.has(999);  // false — DEFINITELY not in the set

// Inspect the filter
bloom.getStats();

Importing the Library

Node.js — CommonJS

const BloomFilterNumbers = require('bloom-filter-numbers');
const { optimalParams } = require('bloom-filter-numbers');

Node.js — ESM

import BloomFilterNumbers from 'bloom-filter-numbers';
import { optimalParams } from 'bloom-filter-numbers';

Browser — script tag

<script src="path/to/bloom-filter-numbers/index.js"></script>
<script>
  const bloom = new BloomFilterNumbers({ expectedItems: 1000 });
  bloom.add(42);
</script>

Browser — ESM

<script type="module">
  import BloomFilterNumbers from './index.mjs';
  const bloom = new BloomFilterNumbers({ expectedItems: 1000 });
</script>

TypeScript

Type definitions are included out of the box — no @types/ package needed.

import BloomFilterNumbers, { optimalParams } from 'bloom-filter-numbers';
const bloom = new BloomFilterNumbers({ expectedItems: 5000, falsePositiveRate: 0.001 });

API Reference

Constructor

// Auto mode (recommended) — calculates optimal size and hash count
new BloomFilterNumbers({ expectedItems: 1000, falsePositiveRate: 0.01 })

// Manual mode — specify exact bit count and hash count
new BloomFilterNumbers(1024, 4)

falsePositiveRate defaults to 0.01 (1%) if omitted.

Core Methods

| Method | Returns | Description | |--------|---------|-------------| | add(x) | boolean | Add a number or BigInt. Returns true if probably new, false if probably existed. Always sets bits regardless. | | addMany(iterable) | void | Add multiple values at once. | | has(x) | boolean | Check if a value might be in the set. false = definitely not present. | | clear() | void | Reset the filter to empty, keeping size and hash configuration. | | isSaturated() | boolean | Check if the filter has exceeded its designed capacity. | | fingerprintsMatch(other) | boolean | Compare dual fingerprints with another filter. | | estimatedFalsePositiveRate() | number | Current estimated false positive probability (0–1). | | getStats() | object | Full statistics snapshot. | | merge(other) | void | Merge another filter into this one (bitwise OR). |

Static Methods

| Method | Returns | Description | |--------|---------|-------------| | BloomFilterNumbers.from(state) | BloomFilterNumbers | Restore from binary export. | | BloomFilterNumbers.fromJSON(json) | BloomFilterNumbers | Restore from JSON export. | | BloomFilterNumbers.union(a, b) | BloomFilterNumbers | Create new filter from two existing ones. |

Properties

| Property | Type | Description | |----------|------|-------------| | size | number | Total bits in the filter | | numHashes | number | Number of hash functions | | itemCount | number | Total add() calls (includes duplicates) | | onSaturated | function \| null | Callback fired once when filter exceeds capacity |

Statistics Output

bloom.getStats();
// {
//   sizeBits: 95851,
//   sizeBytes: 11982,
//   numHashes: 7,
//   itemCount: 5,
//   uniqueCount: 5,
//   expectedItems: 10000,
//   fillRatio: 0.00036,
//   estimatedFalsePositiveRate: 3.5e-19,
//   isSaturated: false,
//   fingerprint: '10316936872266494535',
//   additiveFingerprint: '4829176453021839472'
// }

Helper

const { optimalParams } = require('bloom-filter-numbers');

optimalParams(10000, 0.01);   // { size: 95851, numHashes: 7 }
optimalParams(10000, 0.001);  // { size: 143776, numHashes: 10 }

Export & Import

The library supports two serialization formats: binary (for in-memory transfer) and JSON (for storage, network, and caching). Both formats preserve the full filter state including dual fingerprints, uniqueCount, and expectedItems.

Binary Export / Import

Best for passing filters between processes or storing in binary-friendly backends.

// Export
const state = bloom.export();
// {
//   bits: Uint8Array(...),
//   size: 95851,
//   numHashes: 7,
//   itemCount: 100,
//   uniqueCount: 98,
//   fingerprint: '10316936872266494535',
//   additiveFingerprint: '4829176453021839472',
//   expectedItems: 10000
// }

// Import
const restored = BloomFilterNumbers.from(state);
restored.has(123); // true — same as original

JSON Export / Import

Best for saving to files, databases, localStorage, or sending over HTTP. The bit array is encoded as base64.

// Export to JSON-safe object
const json = bloom.toJSON();

// Save to file
const fs = require('fs');
fs.writeFileSync('filter.json', JSON.stringify(json));

// Load from file
const loaded = JSON.parse(fs.readFileSync('filter.json', 'utf8'));
const restored = BloomFilterNumbers.fromJSON(loaded);

Sending Over HTTP

// Server: send filter to client
app.get('/filter', (req, res) => {
  res.json(bloom.toJSON());
});

// Client: receive and restore
const response = await fetch('/filter');
const json = await response.json();
const bloom = BloomFilterNumbers.fromJSON(json);

Saving to localStorage (browser)

// Save
localStorage.setItem('myFilter', JSON.stringify(bloom.toJSON()));

// Restore
const saved = JSON.parse(localStorage.getItem('myFilter'));
const bloom = BloomFilterNumbers.fromJSON(saved);

Dual Fingerprinting

Every inserted value contributes to two internal fingerprints:

  1. XOR fingerprint — each value is hashed and XOR'd into a running value
  2. Additive fingerprint — the same hash is added to a separate accumulator

Both are order-independent — inserting [10, 20, 30] or [30, 10, 20] produces the same fingerprints.

Why two? A single XOR fingerprint can collide — two completely different sets might produce the same XOR value (for example, because XOR is its own inverse: a ^ b ^ a = b). By combining XOR and addition, the collision probability drops drastically. Both would need to collide simultaneously.

Comparing Filters

const a = new BloomFilterNumbers(2048, 4);
a.addMany([10, 20, 30]);

const b = new BloomFilterNumbers(2048, 4);
b.addMany([30, 10, 20]); // different order, same values

a.fingerprintsMatch(b); // true — both fingerprints match

const c = new BloomFilterNumbers(2048, 4);
c.addMany([10, 20, 99]);

a.fingerprintsMatch(c); // false

This is especially useful in distributed systems where you want to check if two nodes have the same data without transferring the full bit array — just compare two numbers.


Unique Count vs Item Count

The filter tracks two counters:

  • itemCount — total number of add() calls, including duplicates
  • uniqueCount — increments only when a value was probably new (i.e., has() returned false before insertion)

This matters for accuracy: estimatedFalsePositiveRate() uses uniqueCount rather than itemCount, giving you a more realistic estimate of the filter's current precision.

const bloom = new BloomFilterNumbers({ expectedItems: 1000 });
bloom.add(42);
bloom.add(42);  // duplicate
bloom.add(43);

bloom.itemCount;               // 3 — every add() counted
bloom.getStats().uniqueCount;  // 2 — only distinct values

Saturation Detection

A Bloom filter becomes useless when it's overfilled — the false positive rate approaches 100% and has() returns true for almost everything.

This library watches for that. If you created the filter with expectedItems, it will flag saturation when uniqueCount exceeds 2× the expected capacity. In manual mode (no expectedItems), it checks if the fill ratio exceeds 50%.

Checking Manually

const bloom = new BloomFilterNumbers({ expectedItems: 100 });

for (let i = 0; i < 500; i++) bloom.add(i);

bloom.isSaturated();           // true
bloom.getStats().isSaturated;  // true

Automatic Callback

You can register a callback that fires once when saturation is detected:

const bloom = new BloomFilterNumbers({ expectedItems: 1000 });

bloom.onSaturated = (stats) => {
  console.warn('Bloom filter is saturated!', stats.estimatedFalsePositiveRate);
  // Maybe: create a new larger filter, alert monitoring, etc.
};

// ... later, after inserting too many items, the callback fires automatically

The callback fires once — it won't spam you on every subsequent add().


Merge & Union

Combine two Bloom filters that share the same size and hash count. Both fingerprints and counters are merged correctly.

Merge (in-place)

const nodeA = new BloomFilterNumbers({ expectedItems: 10000 });
const nodeB = new BloomFilterNumbers({ expectedItems: 10000 });

nodeA.addMany([1, 2, 3]);
nodeB.addMany([4, 5, 6]);

nodeA.merge(nodeB); // modifies nodeA
nodeA.has(4); // true

Union (creates new filter)

const combined = BloomFilterNumbers.union(nodeA, nodeB);
// nodeA and nodeB are unchanged
combined.has(1); // true
combined.has(4); // true

P2P Sync Example

// Node A sends its compact filter to Node B
const filterData = nodeA.toJSON();
sendToNodeB(filterData); // just a small JSON payload

// Node B receives it and finds what A is missing
const remoteFilter = BloomFilterNumbers.fromJSON(filterData);
for (const id of myLocalIds) {
  if (!remoteFilter.has(id)) {
    // Node A definitely doesn't have this — send it over
    sendDataToNodeA(id);
  }
}

Performance

The library is optimized for throughput with several techniques under the hood:

  • Double hashing (Kirsch & Mitzenmacher, 2006) — only 2 hash computations per operation instead of k, with mathematically equivalent distribution
  • Single-pass add — checks and sets bits in one loop instead of two
  • Pre-cached BigInt constantsBigInt(size) and BigInt(i) are allocated once at construction, not on every call
  • Popcount lookup table_bitsSet() uses a 256-entry table instead of bit-by-bit counting, ~92% faster
  • Lazy second hashhas() computes only the first hash before checking the first bit; if it's zero, it returns immediately without computing the second hash

Benchmark Results (Node.js)

Core Operations

| Operation | 10K items | 100K items | 1M items | |-----------|-----------|------------|----------| | add() | 623K ops/sec | 777K ops/sec | 689K ops/sec | | has() positive | 564K ops/sec | 781K ops/sec | 792K ops/sec | | has() negative | 1.53M ops/sec | 1.79M ops/sec | 1.56M ops/sec |

Serialization

| Operation | 10K items | 100K items | |-----------|-----------|------------| | export() | 133K ops/sec | 14.7K ops/sec | | from() | 74K ops/sec | 6.6K ops/sec | | toJSON() | — | 7.5K ops/sec | | fromJSON() | — | 4.1K ops/sec | | Full JSON round-trip | — | 1.5K ops/sec |

Merge

| Operation | 10K items | 100K items | |-----------|-----------|------------| | merge() | 29K ops/sec | 3.5K ops/sec |

False Positive Rate — Empirical

| Target FPR | Actual FPR | Filter Size | Hashes | |------------|------------|-------------|--------| | 10% | 10.15% | 47,926 bits | 3 | | 1% | 1.08% | 95,851 bits | 7 | | 0.1% | 0.12% | 143,776 bits | 10 |

Memory Usage

| Items | Filter Size | Bits/Item | Est. FPR | |-------|-------------|-----------|----------| | 100 | 120 B | 9.6 | 1.00% | | 1,000 | 1.2 KB | 9.6 | 0.98% | | 10,000 | 11.7 KB | 9.6 | 0.99% | | 100,000 | 117 KB | 9.6 | 1.00% | | 1,000,000 | 1.1 MB | 9.6 | 1.00% |

Run the benchmarks yourself:

node benchmark.js

Configuration Tips

| Scenario | Suggested Config | |----------|-----------------| | General use | { expectedItems: n, falsePositiveRate: 0.01 } | | Low memory | { expectedItems: n, falsePositiveRate: 0.1 } | | High precision | { expectedItems: n, falsePositiveRate: 0.001 } | | Manual control | new BloomFilterNumbers(n * 10, 7) |

Rule of thumb: ~9.6 bits per item gives ~1% false positive rate with 7 hash functions.


Running Tests

node test.js
# 110 tests covering: constructor modes, add/has, BigInt support, input
# validation, export/import (binary + JSON), merge/union, saturation
# detection, dual fingerprinting, and empirical false positive verification

Project Links

Author

Created by colocohen

License

MIT