bloom-filter-numbers
v2.0.0
Published
Lightweight Bloom Filter for integers using splitmix64 and XOR fingerprinting
Maintainers
Readme
bloom-filter-numbers
Know what you haven't seen — in constant time.
bloom-filter-numbers is a production-grade Bloom filter built for numeric data. It tells you with 100% certainty when a number isn't in your set, and with high confidence when it is — using a fraction of the memory a traditional data structure would need.
One million IDs? That's ~1.1 MB. With a false positive rate under 1%.
Built in pure JavaScript, zero dependencies. Works in Node.js (CommonJS & ESM) and browsers. Ships with TypeScript definitions.
const bloom = new BloomFilterNumbers({ expectedItems: 1_000_000 });
bloom.add(userId);
if (!bloom.has(someId)) {
// This ID has NEVER been added. Guaranteed.
}Why Use This
Every time your app checks "have I seen this before?", you're probably hitting a database, a Set, or a hash map. That works — until it doesn't. At scale, those lookups become bottlenecks.
A Bloom filter flips the question: instead of asking "is this in my data?", you ask "is this definitely NOT in my data?" — and you get the answer in constant time, with zero I/O.
The tradeoff? A small chance of false positives. If has() returns true, the value is probably there. If it returns false, it is definitely not there.
In practice, this means you can skip 99%+ of unnecessary lookups before they happen.
Real-World Use Cases
- Deduplication — Skip already-processed user IDs, transaction hashes, or log entries without querying storage
- Database gatekeeping — Only hit the DB when the Bloom filter says "maybe", saving thousands of queries per second
- P2P sync — Exchange compact filters between nodes to quickly figure out who's missing what
- Stream processing — Filter out duplicate events in real-time pipelines at line speed
- API rate limiting — Track seen request fingerprints without maintaining a full hash set
Installation
npm install bloom-filter-numbersQuick Start
const BloomFilterNumbers = require('bloom-filter-numbers');
// Create a filter expecting 10,000 items with 1% false positive rate
const bloom = new BloomFilterNumbers({ expectedItems: 10_000, falsePositiveRate: 0.01 });
// Add values (numbers or BigInts)
bloom.add(123);
bloom.add(456);
bloom.addMany([789, 1011, 1213]);
// Check membership
bloom.has(123); // true — probably in the set
bloom.has(999); // false — DEFINITELY not in the set
// Inspect the filter
bloom.getStats();Importing the Library
Node.js — CommonJS
const BloomFilterNumbers = require('bloom-filter-numbers');
const { optimalParams } = require('bloom-filter-numbers');Node.js — ESM
import BloomFilterNumbers from 'bloom-filter-numbers';
import { optimalParams } from 'bloom-filter-numbers';Browser — script tag
<script src="path/to/bloom-filter-numbers/index.js"></script>
<script>
const bloom = new BloomFilterNumbers({ expectedItems: 1000 });
bloom.add(42);
</script>Browser — ESM
<script type="module">
import BloomFilterNumbers from './index.mjs';
const bloom = new BloomFilterNumbers({ expectedItems: 1000 });
</script>TypeScript
Type definitions are included out of the box — no @types/ package needed.
import BloomFilterNumbers, { optimalParams } from 'bloom-filter-numbers';
const bloom = new BloomFilterNumbers({ expectedItems: 5000, falsePositiveRate: 0.001 });API Reference
Constructor
// Auto mode (recommended) — calculates optimal size and hash count
new BloomFilterNumbers({ expectedItems: 1000, falsePositiveRate: 0.01 })
// Manual mode — specify exact bit count and hash count
new BloomFilterNumbers(1024, 4)falsePositiveRate defaults to 0.01 (1%) if omitted.
Core Methods
| Method | Returns | Description |
|--------|---------|-------------|
| add(x) | boolean | Add a number or BigInt. Returns true if probably new, false if probably existed. Always sets bits regardless. |
| addMany(iterable) | void | Add multiple values at once. |
| has(x) | boolean | Check if a value might be in the set. false = definitely not present. |
| clear() | void | Reset the filter to empty, keeping size and hash configuration. |
| isSaturated() | boolean | Check if the filter has exceeded its designed capacity. |
| fingerprintsMatch(other) | boolean | Compare dual fingerprints with another filter. |
| estimatedFalsePositiveRate() | number | Current estimated false positive probability (0–1). |
| getStats() | object | Full statistics snapshot. |
| merge(other) | void | Merge another filter into this one (bitwise OR). |
Static Methods
| Method | Returns | Description |
|--------|---------|-------------|
| BloomFilterNumbers.from(state) | BloomFilterNumbers | Restore from binary export. |
| BloomFilterNumbers.fromJSON(json) | BloomFilterNumbers | Restore from JSON export. |
| BloomFilterNumbers.union(a, b) | BloomFilterNumbers | Create new filter from two existing ones. |
Properties
| Property | Type | Description |
|----------|------|-------------|
| size | number | Total bits in the filter |
| numHashes | number | Number of hash functions |
| itemCount | number | Total add() calls (includes duplicates) |
| onSaturated | function \| null | Callback fired once when filter exceeds capacity |
Statistics Output
bloom.getStats();
// {
// sizeBits: 95851,
// sizeBytes: 11982,
// numHashes: 7,
// itemCount: 5,
// uniqueCount: 5,
// expectedItems: 10000,
// fillRatio: 0.00036,
// estimatedFalsePositiveRate: 3.5e-19,
// isSaturated: false,
// fingerprint: '10316936872266494535',
// additiveFingerprint: '4829176453021839472'
// }Helper
const { optimalParams } = require('bloom-filter-numbers');
optimalParams(10000, 0.01); // { size: 95851, numHashes: 7 }
optimalParams(10000, 0.001); // { size: 143776, numHashes: 10 }Export & Import
The library supports two serialization formats: binary (for in-memory transfer) and JSON (for storage, network, and caching). Both formats preserve the full filter state including dual fingerprints, uniqueCount, and expectedItems.
Binary Export / Import
Best for passing filters between processes or storing in binary-friendly backends.
// Export
const state = bloom.export();
// {
// bits: Uint8Array(...),
// size: 95851,
// numHashes: 7,
// itemCount: 100,
// uniqueCount: 98,
// fingerprint: '10316936872266494535',
// additiveFingerprint: '4829176453021839472',
// expectedItems: 10000
// }
// Import
const restored = BloomFilterNumbers.from(state);
restored.has(123); // true — same as originalJSON Export / Import
Best for saving to files, databases, localStorage, or sending over HTTP. The bit array is encoded as base64.
// Export to JSON-safe object
const json = bloom.toJSON();
// Save to file
const fs = require('fs');
fs.writeFileSync('filter.json', JSON.stringify(json));
// Load from file
const loaded = JSON.parse(fs.readFileSync('filter.json', 'utf8'));
const restored = BloomFilterNumbers.fromJSON(loaded);Sending Over HTTP
// Server: send filter to client
app.get('/filter', (req, res) => {
res.json(bloom.toJSON());
});
// Client: receive and restore
const response = await fetch('/filter');
const json = await response.json();
const bloom = BloomFilterNumbers.fromJSON(json);Saving to localStorage (browser)
// Save
localStorage.setItem('myFilter', JSON.stringify(bloom.toJSON()));
// Restore
const saved = JSON.parse(localStorage.getItem('myFilter'));
const bloom = BloomFilterNumbers.fromJSON(saved);Dual Fingerprinting
Every inserted value contributes to two internal fingerprints:
- XOR fingerprint — each value is hashed and XOR'd into a running value
- Additive fingerprint — the same hash is added to a separate accumulator
Both are order-independent — inserting [10, 20, 30] or [30, 10, 20] produces the same fingerprints.
Why two? A single XOR fingerprint can collide — two completely different sets might produce the same XOR value (for example, because XOR is its own inverse: a ^ b ^ a = b). By combining XOR and addition, the collision probability drops drastically. Both would need to collide simultaneously.
Comparing Filters
const a = new BloomFilterNumbers(2048, 4);
a.addMany([10, 20, 30]);
const b = new BloomFilterNumbers(2048, 4);
b.addMany([30, 10, 20]); // different order, same values
a.fingerprintsMatch(b); // true — both fingerprints match
const c = new BloomFilterNumbers(2048, 4);
c.addMany([10, 20, 99]);
a.fingerprintsMatch(c); // falseThis is especially useful in distributed systems where you want to check if two nodes have the same data without transferring the full bit array — just compare two numbers.
Unique Count vs Item Count
The filter tracks two counters:
itemCount— total number ofadd()calls, including duplicatesuniqueCount— increments only when a value was probably new (i.e.,has()returnedfalsebefore insertion)
This matters for accuracy: estimatedFalsePositiveRate() uses uniqueCount rather than itemCount, giving you a more realistic estimate of the filter's current precision.
const bloom = new BloomFilterNumbers({ expectedItems: 1000 });
bloom.add(42);
bloom.add(42); // duplicate
bloom.add(43);
bloom.itemCount; // 3 — every add() counted
bloom.getStats().uniqueCount; // 2 — only distinct valuesSaturation Detection
A Bloom filter becomes useless when it's overfilled — the false positive rate approaches 100% and has() returns true for almost everything.
This library watches for that. If you created the filter with expectedItems, it will flag saturation when uniqueCount exceeds 2× the expected capacity. In manual mode (no expectedItems), it checks if the fill ratio exceeds 50%.
Checking Manually
const bloom = new BloomFilterNumbers({ expectedItems: 100 });
for (let i = 0; i < 500; i++) bloom.add(i);
bloom.isSaturated(); // true
bloom.getStats().isSaturated; // trueAutomatic Callback
You can register a callback that fires once when saturation is detected:
const bloom = new BloomFilterNumbers({ expectedItems: 1000 });
bloom.onSaturated = (stats) => {
console.warn('Bloom filter is saturated!', stats.estimatedFalsePositiveRate);
// Maybe: create a new larger filter, alert monitoring, etc.
};
// ... later, after inserting too many items, the callback fires automaticallyThe callback fires once — it won't spam you on every subsequent add().
Merge & Union
Combine two Bloom filters that share the same size and hash count. Both fingerprints and counters are merged correctly.
Merge (in-place)
const nodeA = new BloomFilterNumbers({ expectedItems: 10000 });
const nodeB = new BloomFilterNumbers({ expectedItems: 10000 });
nodeA.addMany([1, 2, 3]);
nodeB.addMany([4, 5, 6]);
nodeA.merge(nodeB); // modifies nodeA
nodeA.has(4); // trueUnion (creates new filter)
const combined = BloomFilterNumbers.union(nodeA, nodeB);
// nodeA and nodeB are unchanged
combined.has(1); // true
combined.has(4); // trueP2P Sync Example
// Node A sends its compact filter to Node B
const filterData = nodeA.toJSON();
sendToNodeB(filterData); // just a small JSON payload
// Node B receives it and finds what A is missing
const remoteFilter = BloomFilterNumbers.fromJSON(filterData);
for (const id of myLocalIds) {
if (!remoteFilter.has(id)) {
// Node A definitely doesn't have this — send it over
sendDataToNodeA(id);
}
}Performance
The library is optimized for throughput with several techniques under the hood:
- Double hashing (Kirsch & Mitzenmacher, 2006) — only 2 hash computations per operation instead of k, with mathematically equivalent distribution
- Single-pass add — checks and sets bits in one loop instead of two
- Pre-cached BigInt constants —
BigInt(size)andBigInt(i)are allocated once at construction, not on every call - Popcount lookup table —
_bitsSet()uses a 256-entry table instead of bit-by-bit counting, ~92% faster - Lazy second hash —
has()computes only the first hash before checking the first bit; if it's zero, it returns immediately without computing the second hash
Benchmark Results (Node.js)
Core Operations
| Operation | 10K items | 100K items | 1M items |
|-----------|-----------|------------|----------|
| add() | 623K ops/sec | 777K ops/sec | 689K ops/sec |
| has() positive | 564K ops/sec | 781K ops/sec | 792K ops/sec |
| has() negative | 1.53M ops/sec | 1.79M ops/sec | 1.56M ops/sec |
Serialization
| Operation | 10K items | 100K items |
|-----------|-----------|------------|
| export() | 133K ops/sec | 14.7K ops/sec |
| from() | 74K ops/sec | 6.6K ops/sec |
| toJSON() | — | 7.5K ops/sec |
| fromJSON() | — | 4.1K ops/sec |
| Full JSON round-trip | — | 1.5K ops/sec |
Merge
| Operation | 10K items | 100K items |
|-----------|-----------|------------|
| merge() | 29K ops/sec | 3.5K ops/sec |
False Positive Rate — Empirical
| Target FPR | Actual FPR | Filter Size | Hashes | |------------|------------|-------------|--------| | 10% | 10.15% | 47,926 bits | 3 | | 1% | 1.08% | 95,851 bits | 7 | | 0.1% | 0.12% | 143,776 bits | 10 |
Memory Usage
| Items | Filter Size | Bits/Item | Est. FPR | |-------|-------------|-----------|----------| | 100 | 120 B | 9.6 | 1.00% | | 1,000 | 1.2 KB | 9.6 | 0.98% | | 10,000 | 11.7 KB | 9.6 | 0.99% | | 100,000 | 117 KB | 9.6 | 1.00% | | 1,000,000 | 1.1 MB | 9.6 | 1.00% |
Run the benchmarks yourself:
node benchmark.jsConfiguration Tips
| Scenario | Suggested Config |
|----------|-----------------|
| General use | { expectedItems: n, falsePositiveRate: 0.01 } |
| Low memory | { expectedItems: n, falsePositiveRate: 0.1 } |
| High precision | { expectedItems: n, falsePositiveRate: 0.001 } |
| Manual control | new BloomFilterNumbers(n * 10, 7) |
Rule of thumb: ~9.6 bits per item gives ~1% false positive rate with 7 hash functions.
Running Tests
node test.js
# 110 tests covering: constructor modes, add/has, BigInt support, input
# validation, export/import (binary + JSON), merge/union, saturation
# detection, dual fingerprinting, and empirical false positive verificationProject Links
- NPM: bloom-filter-numbers
- GitHub: colocohen/bloom-filter-numbers
Author
Created by colocohen
License
MIT
