bloom-sift
v1.0.0
Published
Bloom filter with 128-bit MurmurHash3 optimization using Kirsch-Mitzenmacher technique
Maintainers
Readme
bloom-sift
Bloom filter using 128-bit MurmurHash3 with Kirsch-Mitzenmacher double hashing.
Overview
A space-efficient probabilistic data structure for set membership testing. Uses a single 128-bit hash to derive all k hash values via double hashing:
h(i) = h1 + i * h2Features:
- Single hash call for all k values (fast)
- Automatic optimal parameter calculation
- Serialization for storage/transfer
- Works in Node.js and browsers
- Zero dependencies (except murmur-hash)
Installation
npm install bloom-siftQuick Start
import { BloomSift } from 'bloom-sift';
const filter = new BloomSift({ capacity: 1000, errorRate: 0.01 });
filter.add('user:123');
filter.has('user:123'); // true
filter.has('user:456'); // false (probably)API
Constructor
new BloomSift({ capacity: number, errorRate: number })capacity- Expected number of itemserrorRate- Desired false positive rate (0 < p < 1)
Methods
filter.add(item: string | Uint8Array): void
filter.has(item: string | Uint8Array): boolean
filter.clear(): void
filter.serialize(): SerializedBloomSift
BloomSift.deserialize(data: SerializedBloomSift): BloomSiftProperties
filter.size // number of bits
filter.hashCount // number of hash functions (k)
filter.count // items added
filter.fillRatio // saturation (0 to 1)Utility
import { calculateOptimalParams } from 'bloom-sift';
const { size, hashCount } = calculateOptimalParams(1000, 0.01);
// { size: 9586, hashCount: 7 }Usage Examples
Serialization
const data = filter.serialize();
localStorage.setItem('filter', JSON.stringify(data));
// Later...
const restored = BloomSift.deserialize(JSON.parse(localStorage.getItem('filter')));Binary Data
filter.add(new Uint8Array([1, 2, 3]));
filter.has(new Uint8Array([1, 2, 3])); // trueReusing Filters
filter.add('item-1');
filter.clear();
filter.count; // 0How It Works
Given capacity n and error rate p:
- Bit size:
m = -n * ln(p) / (ln(2)²) - Hash count:
k = (m/n) * ln(2)
The 128-bit MurmurHash3 output is split into two 64-bit values (h1, h2). Each of the k bit positions is computed as (h1 + i * h2) % m, providing the same theoretical guarantees as k independent hashes.
Performance
Tested on Apple M1 Pro:
| Operation | Ops/sec | |-----------|---------| | add | ~220K | | has | ~250K |
Run benchmarks:
npm run benchTypeScript
Full TypeScript support with bundled type definitions. Exports:
BloomSift- Main classBloomSiftOptions- Constructor options interfaceSerializedBloomSift- Serialization format interfacecalculateOptimalParams- Utility function
Limitations
- No deletions - Use counting Bloom filters for that
- Fixed size - Capacity must be set upfront
- False positives - May report items as present when they're not (never false negatives)
License
MIT
