ribbit-riblt

v1.1.1

Published

2 months ago

Rateless Invertible Bloom Lookup Table (RIBLT) for set reconciliation

xori

iblt bloom-filter set-reconciliation ribbit-riblt rateless invertible-bloom-lookup-table set-difference sync

ribbit-RIBLT 🐸

Rateless Invertible Bloom Lookup Table (RIBLT) for set reconciliation. Zero dependencies.

Two peers each hold a set of elements. This library lets them figure out which elements differ — without exchanging the sets themselves. The sender streams IBLT cells; the receiver compares against their own cells and peels the symmetric difference. If the first batch isn't enough, ask for more — the protocol is rateless, so there's no up-front size estimate needed.

Based on the scheme described by Yang et al., using gap-based cell mapping. Resolving a diff takes roughly 1.5–1.7× as many cells as there are elements in the symmetric difference.
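As a rough sizing aid, the overhead figure above can be turned into a cell budget. The sketch below is not part of the library: `estimateCellRange` is a hypothetical helper that only applies the quoted 1.5–1.7× range, and real cell counts may fall outside it, particularly for very small differences.

```typescript
// Hypothetical helper (not part of ribbit-riblt): applies the ~1.5-1.7x
// overhead quoted above to an expected symmetric-difference size.
function estimateCellRange(diffSize: number): { low: number; high: number } {
  if (!Number.isInteger(diffSize) || diffSize < 0) {
    throw new RangeError("diffSize must be a non-negative integer");
  }
  // Multiply by 15 and 17 before dividing so the products stay exact integers.
  return {
    low: Math.ceil((diffSize * 15) / 10),
    high: Math.ceil((diffSize * 17) / 10),
  };
}

// If about 100 elements are expected to differ:
const budget = estimateCellRange(100);
console.log(budget);
```

Because the protocol is rateless, a wrong guess costs little: an undersized first batch just means requesting more cells.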

Install

npm install ribbit-riblt

Usage

import { BloomStream, BloomDiff } from "ribbit-riblt";

// Sender has ["cat", "dog"]
const sender = new BloomStream(["cat", "dog"]);
const senderCells = [...sender.next(20)];

// Receiver has ["dog", "bird", "fish"]
const receiver = new BloomStream(["dog", "bird", "fish"]);
const receiverCells = [...receiver.next(20)];

// Compute the diff
const diff = new BloomDiff();
diff.addCells(senderCells, receiverCells);

diff.status; // "peelable"
diff.left; // ["cat"]          — only in sender
diff.right; // ["bird", "fish"] — only in receiver

Streaming (rateless)

If the first batch doesn't resolve, feed more cells. The diff and streams remember their position:

const a = new BloomStream(["alpha", "beta", "gamma"]);
const b = new BloomStream(["delta", "epsilon"]);

const diff = new BloomDiff();

diff.addCells([...a.next(2)], [...b.next(2)]);
diff.status; // "needs more data"

// Pull more cells and extend the diff
diff.addCells([...a.next(30)], [...b.next(30)]);
diff.status; // "peelable"
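The two-step pattern above generalizes to a loop that keeps requesting batches until the diff resolves. The following is a minimal sketch, not library code: `reconcile`, `CellSource`, `DiffLike`, `batchSize`, and `maxCells` are hypothetical names, and the interfaces are declared locally on the assumption that `BloomStream` and `BloomDiff` instances fit these shapes.

```typescript
// Structural types mirroring the documented API surface, declared locally so
// the sketch stands alone. ASSUMPTION: BloomStream and BloomDiff fit them.
interface CellSource<C> {
  next(n: number): Iterable<C>;
}

interface DiffLike<C, T> {
  addCells(left: C[], right: C[]): void;
  readonly status: string;
  readonly left: T[];
  readonly right: T[];
}

// Hypothetical helper: feed equal-sized batches until the diff peels,
// giving up once maxCells cells have been consumed from each side.
function reconcile<C, T>(
  a: CellSource<C>,
  b: CellSource<C>,
  diff: DiffLike<C, T>,
  batchSize = 16,
  maxCells = 4096,
): { left: T[]; right: T[]; cellsUsed: number } {
  let cellsUsed = 0;
  while (cellsUsed < maxCells) {
    diff.addCells([...a.next(batchSize)], [...b.next(batchSize)]);
    cellsUsed += batchSize;
    if (diff.status === "peelable") {
      return { left: diff.left, right: diff.right, cellsUsed };
    }
  }
  throw new Error(`diff did not resolve within ${maxCells} cells`);
}
```

With the library this would be called as `reconcile(a, b, new BloomDiff())`. The cap is a design choice to keep a misbehaving peer from streaming indefinitely. Over a network, only the sender's cells travel; the receiver generates its own batches locally.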

Custom types

Pass encode/decode for non-string elements:

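const stream = new BloomStream([42, 99], {
  // Fixed-width 32-bit encoding (native byte order)
  encode: (n) => new Uint8Array(new Uint32Array([n]).buffer),
  // slice() copies into a fresh buffer, so this stays correct if `b` is a view into a larger one
  decode: (b) => new Uint32Array(b.slice().buffer)[0],
});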

This works with any structured type. Here's an example reconciling product catalogs:

import { BloomStream, BloomDiff } from "ribbit-riblt";

interface Product {
  id: number;
  name: string;
  tags: string[];
}

const encoder = new TextEncoder();
const decoder = new TextDecoder();

const codec = {
  encode: (p: Product): Uint8Array =>
    encoder.encode(JSON.stringify([p.id, p.name, p.tags])),
  decode: (b: Uint8Array): Product => {
    const [id, name, tags] = JSON.parse(decoder.decode(b));
    return { id, name, tags };
  },
};

const warehouse = [
  { id: 1, name: "Widget", tags: ["sale"] },
  { id: 2, name: "Gadget", tags: ["new"] },
  { id: 3, name: "Doohickey", tags: ["clearance"] },
];

const store = [
  { id: 1, name: "Widget", tags: ["sale"] },
  { id: 3, name: "Doohickey", tags: ["clearance"] },
  { id: 4, name: "Thingamajig", tags: ["exclusive"] },
];

const a = new BloomStream<Product>(warehouse, codec);
const b = new BloomStream<Product>(store, codec);

const diff = new BloomDiff<Product>(codec);
diff.addCells([...a.next(20)], [...b.next(20)]);

diff.status; // "peelable"
diff.left; // [{ id: 2, name: "Gadget", tags: ["new"] }]
diff.right; // [{ id: 4, name: "Thingamajig", tags: ["exclusive"] }]

Keep the encoded size of each element small. It is far better to reconcile a list of 64-bit IDs than a list of 50 MB byte streams.
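One way to follow this advice is to reconcile record IDs only and fetch the full records separately once the differing IDs are known. Here is a minimal sketch of a fixed-width codec for 64-bit IDs. The name `idCodec` is hypothetical; its shape follows the `encode`/`decode` options described in this README.

```typescript
// Hypothetical codec for 64-bit unsigned IDs: every element encodes to
// exactly 8 bytes, regardless of how large the underlying record is.
const idCodec = {
  encode: (id: bigint): Uint8Array => {
    const bytes = new Uint8Array(8);
    new DataView(bytes.buffer).setBigUint64(0, id, true); // little-endian
    return bytes;
  },
  decode: (bytes: Uint8Array): bigint =>
    // Honor byteOffset in case `bytes` is a view into a larger buffer.
    new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength).getBigUint64(0, true),
};

// Round trip: the decoded value equals the original ID.
const originalId = BigInt("1234567890123456789");
const roundTripped = idCodec.decode(idCodec.encode(originalId));
console.log(roundTripped === originalId);
```

Assuming the constructors accept it like any other codec, this would be passed as `new BloomStream<bigint>(ids, idCodec)` and `new BloomDiff<bigint>(idCodec)`.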

Wire format

Cells can be serialized for transmission using a protobuf-compatible binary format:

import { serializeCellBatch, deserializeCellBatch } from "ribbit-riblt";

// `cells` is a Cell[], for example [...stream.next(20)]
const bytes = serializeCellBatch(cells); // Uint8Array
const { cellOffset, cells: received } = deserializeCellBatch(bytes);

Individual cells can also be serialized with serializeCell / deserializeCell.
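When cells are sent incrementally, each batch can carry its position in the stream. The sketch below is illustrative only: `frameBatches` and `SerializeBatch` are hypothetical names, the serializer is passed in rather than imported so the example stands alone, and it assumes `cellOffset` means the index of the batch's first cell within the stream, which this README does not state explicitly.

```typescript
// Mirrors the documented signature of serializeCellBatch; the real function
// would be imported from "ribbit-riblt" and passed in.
type SerializeBatch<C> = (cells: C[], cellOffset?: number) => Uint8Array;

// Hypothetical helper: split a run of cells into frames of batchSize cells.
// ASSUMPTION: cellOffset is the index of the first cell of each batch.
function frameBatches<C>(
  cells: C[],
  batchSize: number,
  serialize: SerializeBatch<C>,
  startOffset = 0,
): Uint8Array[] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const frames: Uint8Array[] = [];
  for (let i = 0; i < cells.length; i += batchSize) {
    frames.push(serialize(cells.slice(i, i + batchSize), startOffset + i));
  }
  return frames;
}
```

With the library, the call would look like `frameBatches([...stream.next(64)], 16, serializeCellBatch)`, and the receiver would read each frame back with `deserializeCellBatch`.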

API

`CodecOptions<T>`

Both constructors accept an optional CodecOptions<T> object:

| Option   | Type                       | Default               | Description                           |
| -------- | -------------------------- | --------------------- | ------------------------------------- |
| `encode` | `(value: T) => Uint8Array` | UTF-8 string encoding | Serializes an element to bytes        |
| `decode` | `(bytes: Uint8Array) => T` | UTF-8 string decoding | Deserializes bytes back to an element |

When T is string (the default), both can be omitted.

`BloomStream<T>`

new BloomStream(collection, options?)

Creates a cell stream from a collection. options is a CodecOptions<T> — both encode and decode are used.

.next(n): Generator<Cell> — yields the next n cells. Successive calls continue where the last left off.

`BloomDiff<T>`

new BloomDiff(options?)

Creates an empty diff. options is a CodecOptions<T> — only decode is used (to reconstruct elements from peeled cells).

.addCells(leftCells, rightCells) — appends paired cell batches and re-peels. leftCells and rightCells must have equal length. Can be called multiple times to incrementally extend the diff.

.status — "peelable" if the diff resolved, "needs more data" otherwise
.left — elements present only in the left set
.right — elements present only in the right set

`Cell`

interface Cell {
  idSum: Uint8Array;
  hashSum: bigint;
  count: number;
}
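To give a feel for what the three fields do, here is a sketch of conventional IBLT cell arithmetic. It assumes the fields carry their usual meaning (XOR of encoded elements, XOR of element hashes, element count) and is not the library's implementation. The FNV-1a hash and the helpers `addElement`, `subtract`, and `isPure` are stand-ins chosen for illustration. The mapping of elements to cells is not shown, and the elements have equal length to sidestep padding.

```typescript
interface Cell {
  idSum: Uint8Array;
  hashSum: bigint;
  count: number;
}

// Illustrative 64-bit FNV-1a hash; the library's real hash may differ.
function fnv1a64(bytes: Uint8Array): bigint {
  const prime = BigInt("0x100000001b3");
  const mask = BigInt("0xffffffffffffffff");
  let h = BigInt("0xcbf29ce484222325");
  for (let i = 0; i < bytes.length; i++) {
    h = ((h ^ BigInt(bytes[i])) * prime) & mask;
  }
  return h;
}

function xorBytes(a: Uint8Array, b: Uint8Array): Uint8Array {
  const out = new Uint8Array(Math.max(a.length, b.length));
  out.set(a);
  for (let i = 0; i < b.length; i++) out[i] ^= b[i];
  return out;
}

// Fold one encoded element into a cell.
function addElement(cell: Cell, encoded: Uint8Array): Cell {
  return {
    idSum: xorBytes(cell.idSum, encoded),
    hashSum: cell.hashSum ^ fnv1a64(encoded),
    count: cell.count + 1,
  };
}

// Subtract one peer's cell from the other's: shared elements cancel out.
function subtract(a: Cell, b: Cell): Cell {
  return {
    idSum: xorBytes(a.idSum, b.idSum),
    hashSum: a.hashSum ^ b.hashSum,
    count: a.count - b.count,
  };
}

// A cell is "pure" when one element remains and its hash checks out.
function isPure(cell: Cell): boolean {
  return Math.abs(cell.count) === 1 && fnv1a64(cell.idSum) === cell.hashSum;
}

const enc = new TextEncoder();
const empty: Cell = { idSum: new Uint8Array(0), hashSum: BigInt(0), count: 0 };
const senderCell = addElement(addElement(empty, enc.encode("cat")), enc.encode("dog"));
const receiverCell = addElement(empty, enc.encode("dog"));

const diffCell = subtract(senderCell, receiverCell);
const recovered = isPure(diffCell) ? new TextDecoder().decode(diffCell.idSum) : null;
console.log(recovered); // "cat"
```

Peeling repeats this step: each recovered element is removed from the other cells it maps to, which can expose further pure cells.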

Wire serialization

serializeCell(cell: Cell): Uint8Array
deserializeCell(buf: Uint8Array): Cell
serializeCellBatch(cells: Cell[], cellOffset?: number): Uint8Array
deserializeCellBatch(buf: Uint8Array): { cellOffset: number; cells: Cell[] }

License

ISC
