bloom-filter-numbers

v1.0.0

Published

5 months ago

Lightweight Bloom Filter for integers using splitmix64 and XOR fingerprinting

Vulnerabilities: 0 High, 0 Medium, 0 Low

colo.cohen

bloom filter bloom-filter numbers splitmix64 fingerprint hashing deduplication javascript

bloom-filter-numbers

A compact and efficient Bloom Filter implementation optimized for numbers, written in pure JavaScript.
It uses the high-quality splitmix64 as the hashing function and XOR-based fingerprinting to track inserted values.
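
As a rough illustration of the hashing step, here is a sketch of the publicly documented splitmix64 mixing function written with BigInt arithmetic. The function name and the way the key is fed in are assumptions for illustration; the code inside bloom-filter-numbers may differ.

```javascript
// Sketch of the public splitmix64 mixing steps applied to an integer key.
// Assumption: illustrative only, not the package's actual source.
const MASK64 = (1n << 64n) - 1n; // keeps intermediate values in 64 bits

function splitmix64(value) {
  // Accepts integer numbers or BigInts; non-integer numbers make BigInt() throw.
  let z = (BigInt(value) + 0x9e3779b97f4a7c15n) & MASK64;
  z = ((z ^ (z >> 30n)) * 0xbf58476d1ce4e5b9n) & MASK64;
  z = ((z ^ (z >> 27n)) * 0x94d049bb133111ebn) & MASK64;
  return z ^ (z >> 31n);
}

// Neighbouring keys produce unrelated-looking 64-bit outputs.
console.log(splitmix64(1).toString(16));
console.log(splitmix64(2).toString(16));
```

Because the result is a 64-bit BigInt, a filter would typically reduce it (for example with a modulo) to a bit position inside its bit array.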

🚀 Features

⚡ Fast and lightweight
🔢 Optimized for integer keys (numbers / BigInts)
🧠 Tracks inserted values via a fingerprint
📤 Exportable & importable state
📦 No dependencies, works everywhere

✅ When to Use

This library is ideal for optimizing workflows where quick existence checks on numeric values are needed. Typical use cases include:

Deduplication of numeric IDs such as user IDs, transaction hashes, or log entries
Real-time data filtering to prevent reprocessing of already-seen events
P2P systems and messaging layers to track seen message IDs, peers, or blocks
Efficient sync algorithms, where Bloom filters can be exchanged to quickly detect missing items before deeper comparison
Fingerprint verification: compare filters without transferring all data
Performance-critical environments, where string hashing is too heavy or unnecessary

🔢 This implementation targets numbers (integers or BigInts) specifically. Keys are hashed directly, with no string encoding step, so it avoids the overhead that generic string-based Bloom filters pay per key. That makes it a good fit for low-level protocols and systems that operate on numeric identifiers.

🧠 Why use a Bloom Filter?

Bloom filters are probabilistic data structures that can quickly tell if a value is definitely not present — or maybe present.

✅ If has(x) returns false, you can be 100% sure the value was never added to the filter.
❓ If has(x) returns true, the value might exist — and you can then run a deeper check (like querying a database).

This makes Bloom filters perfect for reducing unnecessary reads from databases or other expensive operations.

💡 Example:

if (!bloom.has(userId)) {
  // Definitely not seen before, so the database lookup can be skipped
  handleNewUser(userId);
  bloom.add(userId); // remember the ID for later checks
} else {
  // Possibly seen (could be a false positive): verify with the database
  db.checkUser(userId).then(user => {
    if (!user) handleNewUser(userId);
  });
}
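
To show why a false result from has(x) is conclusive, here is a toy filter. Every name in it (ToyBloom, mix32, positions) is hypothetical, and the mixer is the 32-bit murmur3 finalizer used as a short stand-in for splitmix64, so treat it as a sketch of the general technique rather than the library's internals.

```javascript
// Toy Bloom filter for 32-bit integer keys (illustrative, not the library code).
function mix32(h) {
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0; // unsigned 32-bit result
}

class ToyBloom {
  constructor(sizeBits, numHashes) {
    this.sizeBits = sizeBits;
    this.numHashes = numHashes;
    this.bits = new Uint8Array(Math.ceil(sizeBits / 8));
  }

  // Derive numHashes bit positions from two base hashes (double hashing).
  positions(value) {
    const h1 = mix32(value);
    const h2 = (mix32(value ^ 0x9e3779b9) | 1) >>> 0; // odd, unsigned step
    const out = [];
    for (let i = 0; i < this.numHashes; i++) {
      out.push((h1 + i * h2) % this.sizeBits);
    }
    return out;
  }

  add(value) {
    for (const p of this.positions(value)) {
      this.bits[p >> 3] |= 1 << (p & 7);
    }
  }

  // True only if every position is set. add() sets all of them,
  // so an added value can never report false.
  has(value) {
    return this.positions(value).every(
      (p) => (this.bits[p >> 3] & (1 << (p & 7))) !== 0
    );
  }
}

const toy = new ToyBloom(1024, 4);
toy.add(123);
console.log(toy.has(123)); // true
```

A value that was never added can still report true when other insertions happen to set all of its positions, which is the false positive case handled by the database check above.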

📦 Installation

npm install bloom-filter-numbers

📚 Usage

const BloomFilterNumbers = require('bloom-filter-numbers');

// Create a Bloom Filter with 1024 bits and 4 hash functions
const bloom = new BloomFilterNumbers(1024, 4);

// Add values
bloom.add(123);
bloom.add(456);

// Check existence
console.log(bloom.has(123)); // true
console.log(bloom.has(999)); // false

// Get internal stats
console.log(bloom.getStats());

♻️ Export & Import

You can save the filter state and restore it later, for example for caching, syncing, or persistence.

🔄 Export:

const exported = bloom.export();

/*
{
  bits: Uint8Array(...),
  size: 1024,
  numHashes: 4,
  itemCount: 2,
  totalFingerprint: 812374123
}
*/

🔁 Import:

const restored = new BloomFilterNumbers(
  exported.bits,
  exported.numHashes,
  exported.itemCount
);

console.log(restored.has(123)); // true

✅ You must provide all three: bits, numHashes, and itemCount when importing.
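
If the exported state has to travel as JSON (to a file, a cache, or over the network), the Uint8Array needs an explicit encoding, because JSON.stringify does not preserve typed arrays. The sketch below uses Node's Buffer for base64; serializeState and deserializeState are hypothetical helpers, and the state object is a hand-written stand-in shaped like the export example above.

```javascript
// Stand-in for the object returned by bloom.export(); values are made up.
const exportedState = {
  bits: new Uint8Array([0b00010010, 0, 255, 7]),
  size: 32,
  numHashes: 4,
  itemCount: 2,
  totalFingerprint: 812374123,
};

// Hypothetical helper: encode the bit array as base64 inside a JSON string.
function serializeState(state) {
  return JSON.stringify({
    ...state,
    bits: Buffer.from(state.bits).toString('base64'),
  });
}

// Hypothetical helper: rebuild the Uint8Array from the JSON string.
function deserializeState(json) {
  const parsed = JSON.parse(json);
  return {
    ...parsed,
    bits: new Uint8Array(Buffer.from(parsed.bits, 'base64')),
  };
}

const json = serializeState(exportedState);
const state = deserializeState(json);
console.log(state.bits.length === exportedState.bits.length); // true
```

The restored bits, numHashes, and itemCount can then be passed to the constructor as in the import example. If a fingerprint is ever a BigInt, it would need converting to a string first, since JSON.stringify rejects BigInt values.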

🧠 Fingerprinting

Every inserted number is XORed into an internal fingerprint, available as totalFingerprint in the exported state:

console.log(bloom.export().totalFingerprint); // Summarizes the inserted values (collisions are possible)
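
A few properties of XOR accumulation are worth keeping in mind when comparing fingerprints. The sketch below assumes the fingerprint is a plain XOR over the raw inserted values; the library may XOR hashed values instead, in which case the order and duplicate behaviour still hold but the specific collision shown would differ. xorFingerprint is a hypothetical helper.

```javascript
// Assumption: fingerprint = XOR of all inserted values (illustrative only).
function xorFingerprint(values) {
  return values.reduce((acc, v) => acc ^ BigInt(v), 0n);
}

// Order does not matter: XOR is commutative and associative.
const inOrder = xorFingerprint([123, 456, 789]);
const shuffled = xorFingerprint([789, 123, 456]);
console.log(inOrder === shuffled); // true

// Inserting the same value twice cancels it out.
const withDuplicate = xorFingerprint([123, 456, 456]);
console.log(withDuplicate === 123n); // true

// Different sets can share a fingerprint: 1 ^ 2 === 3.
console.log(xorFingerprint([1, 2]) === xorFingerprint([3])); // true
```

Matching fingerprints are therefore a strong hint that two filters saw the same values, not proof of it.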

📊 Stats Example

bloom.getStats();
/*
{
  sizeBits: 1024,
  numHashes: 4,
  insertedItems: 2,
  filterSizeBytes: 128,
  rawSizeBytes: 16,
  savedBytes: -112,
  savedPercent: -700,
  fingerprintSafeNumber: 812374123
}
*/
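
The negative savings in this example are expected: a 128-byte filter holding only two values is larger than the values themselves. The sketch below reproduces the size arithmetic, assuming 8 bytes per raw value (inferred from rawSizeBytes: 16 for 2 items). estimateStats is a hypothetical helper, not part of the package.

```javascript
// Assumption: each raw value costs 8 bytes, as the sample output suggests.
function estimateStats(sizeBits, insertedItems, bytesPerItem = 8) {
  const filterSizeBytes = Math.ceil(sizeBits / 8);
  const rawSizeBytes = insertedItems * bytesPerItem;
  const savedBytes = rawSizeBytes - filterSizeBytes;
  const savedPercent =
    rawSizeBytes === 0 ? 0 : (savedBytes * 100) / rawSizeBytes;
  return { filterSizeBytes, rawSizeBytes, savedBytes, savedPercent };
}

console.log(estimateStats(1024, 2));
// { filterSizeBytes: 128, rawSizeBytes: 16, savedBytes: -112, savedPercent: -700 }

console.log(estimateStats(1024, 100));
// { filterSizeBytes: 128, rawSizeBytes: 800, savedBytes: 672, savedPercent: 84 }
```

Under that assumption a 1024-bit filter starts saving space once it holds more than 16 values.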

🛠️ Configuration Tips

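Use size = itemCount * 10 (10 bits per expected item) for a false positive rate of roughly 1%
Increase numHashes (default: 4) for slightly better precision, up to about (size / itemCount) * 0.69; beyond that point the false positive rate rises again
Store bits, numHashes, and itemCount if you want to restore the filter later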
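
These tips follow from the standard Bloom filter estimate p ≈ (1 - e^(-k·n/m))^k, where m is the size in bits, n the number of items, and k the number of hash functions. The helpers below are hypothetical (they are not part of the package) and only evaluate that textbook formula, which assumes well-distributed hashes.

```javascript
// Expected false positive rate for m bits, k hashes, n inserted items.
function falsePositiveRate(sizeBits, numHashes, itemCount) {
  const fillRatio = 1 - Math.exp((-numHashes * itemCount) / sizeBits);
  return Math.pow(fillRatio, numHashes);
}

// Hash count that minimizes the estimate: k = (m / n) * ln 2.
function optimalNumHashes(sizeBits, itemCount) {
  return Math.max(1, Math.round((sizeBits / itemCount) * Math.LN2));
}

const items = 1000;
const size = items * 10; // 10 bits per item

for (const k of [2, 4, 7, 12]) {
  const rate = falsePositiveRate(size, k, items);
  console.log(`numHashes=${k}: ${(rate * 100).toFixed(2)}%`);
}
console.log(optimalNumHashes(size, items)); // 7
```

At 10 bits per item the estimate is about 1.2% with 4 hashes and about 0.8% with 7, then climbs again with more hashes because the bit array fills up faster.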

🔗 Project Links

NPM: bloom-filter-numbers
GitHub: colocohen/bloom-filter-numbers

🧑‍💻 Author

Created with ❤️ by colocohen

📝 License

MIT License

🤝 Contribute or Support

If you find this useful, feel free to star ⭐ the repo or sponsor me.

Pull requests are welcome!
