probabilistic-toolkit
v1.0.1
Published
A toolkit for probabilistic algorithms
Readme
Probabilistic Toolkit
Small TypeScript helpers for common probabilistic data structures:
- Bloom Filter + time-partitioned
TimeBaseBF - HyperLogLog for cardinality
- Count-Min Sketch +
WindowedCMS - MinHash for rough text similarity
Install
npm install probabilistic-toolkit
# or directly from git
npm install git+https://github.com/your-org/bloomfilter.gitAll examples below use the package entrypoint:
import {
BloomFilter,
TimeBaseBF,
HyperLogLog,
CountMinSketch,
WindowedCMS,
MinHash,
} from "probabilistic-toolkit";Bloom Filter
const bf = new BloomFilter(1_000_000, 0.01); // expected items, false-positive rate
bf.add("alice");
bf.add("bob");
bf.check("alice"); // true (very likely)
bf.check("mallory"); // falseTime-partitioned Bloom Filter
Rotate filters automatically so old entries expire.
const tbf = new TimeBaseBF(1_000_000, 60_000, 0.01); // items, ttl(ms), fp rate
tbf.add("session-1");
// within ttl -> very likely true
console.log(tbf.check("session-1"));HyperLogLog (cardinality)
const hll = new HyperLogLog(14); // precision 4–16; higher = better accuracy, more memory
for (let i = 0; i < 1_000_000; i++) {
hll.add(`user-${i}`);
}
console.log(hll.count()); // ~1,000,000Serialization:
const bytes = hll.serialize();
const restored = HyperLogLog.deserialize(bytes);Count-Min Sketch
Frequency estimation with conservative updates to reduce overestimation.
const cms = new CountMinSketch(10_000, 4); // width, depth
cms.conservativeAdd("apple", 2);
cms.add("banana");
console.log(cms.estimate("apple")); // ~= 2Sliding window counts
const wcms = new WindowedCMS(
60, // buckets in window
1_000, // bucket duration in ms
() => new CountMinSketch(10_000, 4)
);
wcms.add("ip-1");
setTimeout(() => {
console.log(wcms.estimate("ip-1")); // sum across active buckets
}, 500);MinHash (text similarity, educational)
const seeds = MinHash.generateSeeds(128);
const a = new MinHash(128, seeds);
const b = new MinHash(128, seeds);
a.addText("lorem ipsum dolor sit amet", 3); // shingle size = 3 words
b.addText("lorem dolor ipsum amet sit", 3);
console.log(a.similarity(b)); // 0..1 (approx. Jaccard)Serialization:
const buf = a.serialize();
const restored = MinHash.deserialize(buf, seeds);Notes
- Written in TypeScript; works in Node 18+.
- False positives/estimation error are inherent to probabilistic structures—tune parameters (
width/depth,p, false-positive rate) for your workload.
