@smol-range/compress

v0.1.0

Published

3 months ago

Library for compressing a list of integer ranges.


valadaptive

compression range unicode

@smol-range/compress

Compress sets of integers (for example, Unicode codepoints) into a compact bitstream.

For information about the encoding strategy, see the repository.

While the corresponding decompression package has less stringent compatibility requirements, this package requires resizable ArrayBuffer support.

Usage

import {compress} from '@smol-range/compress';

// Compress a list of ranges
const ranges: [number, number][] = [[1, 5], [10, 15], [20, 20]];
const compressed = compress(ranges);

// You can also pass single numbers
const mixed: (number | [number, number])[] = [1, 2, 3, [10, 15], 20];
const compressed2 = compress(mixed);

// Adjacent ranges are automatically merged
const adjacent: [number, number][] = [[1, 5], [6, 10]]; // Will be merged to [1, 10]
const compressed3 = compress(adjacent);

// The decompressor accepts `Uint8Array`s as well as base64-encoded data. To produce base64 here, you can use Node's Buffer API:
const encoded = Buffer.from(compressed.buffer, compressed.byteOffset, compressed.byteLength).toString('base64');
// Or, in runtimes that support it, the newer `toBase64` method (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array/toBase64):
const encoded2 = compressed.toBase64();
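Because `toBase64` is not yet available everywhere, one way to combine the two approaches above is to feature-detect it and fall back otherwise. The following is only a sketch: `encodeBase64` is a hypothetical helper, not part of this package, and it falls back to the standard `btoa` global rather than `Buffer` so that it also runs outside Node.

```typescript
// Hypothetical helper (not part of @smol-range/compress).
function encodeBase64(bytes: Uint8Array): string {
  // Use the native method when the runtime provides it.
  const native = (bytes as Uint8Array & {toBase64?: () => string}).toBase64;
  if (typeof native === 'function') {
    return native.call(bytes);
  }
  // Fallback: build a binary string one byte at a time and pass it to btoa.
  // Iterating the view itself respects its byteOffset and byteLength.
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}
```

With this in place, `encodeBase64(compressed)` could stand in for either of the two lines above.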

API

`compress(ranges, opts?)`

Compress a set of integers, provided as a sorted list of numbers or ranges.

Parameters:

- `ranges: Iterable<[number, number] | number>` - List of numbers and/or inclusive ranges. Must be sorted in ascending order.
- `opts?: CompressOptions` - Optional configuration:
  - `maxSize?: number` - Maximum output size in bytes. Defaults to 16 MB.

Returns: Uint8Array - The compressed bitstream.
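Since `ranges` has to arrive sorted, data that starts out as an unordered collection of code points needs a preparation step. A minimal sketch of one is shown here; `toSortedRanges` is a hypothetical helper, not something this package exports. It removes duplicates, sorts, and folds consecutive integers into inclusive ranges.

```typescript
// Hypothetical helper (not part of @smol-range/compress).
function toSortedRanges(values: Iterable<number>): [number, number][] {
  // Deduplicate, then sort numerically (the default sort is lexicographic).
  const sorted = Array.from(new Set(values)).sort((a, b) => a - b);
  const ranges: [number, number][] = [];
  for (const value of sorted) {
    const last = ranges[ranges.length - 1];
    if (last !== undefined && value === last[1] + 1) {
      // Extends the current run by one.
      last[1] = value;
    } else {
      // Starts a new run of length one.
      ranges.push([value, value]);
    }
  }
  return ranges;
}

// Example input: 'A', 'C', 'B', 'a', and a repeated 'A'.
const prepared = toSortedRanges([0x41, 0x43, 0x42, 0x61, 0x41]);
```

The result should then be in the shape the signature above expects, so it could be passed as `compress(prepared)`.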

Input requirements:

- The input must be sorted in ascending order.
- Numbers and ranges can be mixed in any combination.
- Ranges are inclusive on both ends: [1, 5] includes 1, 2, 3, 4, and 5.
- Adjacent ranges are automatically merged: [1, 5], [6, 10] becomes [1, 10].
- Ranges cannot overlap: [1, 5], [5, 8] is invalid (the ranges are inclusive, so 5 is in both).
- All numbers must be non-negative and at most 2**32 - 1 (4,294,967,295).
  - Because of how the encoding works, trying to encode solely the number 4294967295, or the range 0-4294967295, will fail. If there are other numbers in between, or you start at 1, it will work.
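The requirements above can also be checked up front, before calling `compress`. The sketch below is a hypothetical pre-flight helper (`normalizeEntries` is not part of this package), and this README does not say how the library itself reports invalid input. It makes two assumptions of its own: a repeated number is treated as an overlap, and the two documented edge cases are rejected when they are the whole input after merging.

```typescript
// Hypothetical pre-flight check (not part of @smol-range/compress).
type RangeEntry = number | [number, number];
const MAX_VALUE = 0xffffffff; // 2**32 - 1

function normalizeEntries(entries: Iterable<RangeEntry>): [number, number][] {
  const merged: [number, number][] = [];
  for (const entry of Array.from(entries)) {
    const [start, end]: [number, number] =
      typeof entry === 'number' ? [entry, entry] : entry;
    for (const n of [start, end]) {
      if (!Number.isInteger(n) || n < 0 || n > MAX_VALUE) {
        throw new RangeError(`${n} is not an integer in [0, 2**32 - 1]`);
      }
    }
    if (start > end) {
      throw new RangeError(`range [${start}, ${end}] is reversed`);
    }
    const last = merged[merged.length - 1];
    if (last !== undefined) {
      // Assumption: duplicates count as overlap, so require strict increase.
      if (start <= last[1]) {
        throw new RangeError(`[${start}, ${end}] is unsorted or overlaps`);
      }
      if (start === last[1] + 1) {
        last[1] = end; // adjacent: merge into the previous range
        continue;
      }
    }
    merged.push([start, end]);
  }
  // Documented edge cases: only 4294967295, or only 0-4294967295.
  if (merged.length === 1) {
    const [start, end] = merged[0];
    if (end === MAX_VALUE && (start === 0 || start === MAX_VALUE)) {
      throw new RangeError('this input alone cannot be encoded');
    }
  }
  return merged;
}
```

Returning the merged ranges rather than a boolean makes it easy to see what the adjacent-range merging rule would produce for a given input.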

Performance

This library is optimized for "gappy" datasets: those that are very sparse, very dense, or a mix of both. It performs well across a wide range of input patterns:

- Sparse data: Single integers separated by large gaps
- Dense data: Long consecutive runs of integers
- Mixed data: Combination of sparse and dense regions
- Unicode ranges: Typical use case with clustered code points
