lttb-js
v1.0.6
Published
High-performance, memory-efficient LTTB implementation, designed to downsample massive datasets with thousands, millions, or hundreds of millions of points while keeping memory usage to a minimum.
Readme
lttb-js
lttb-js is a high-performance, memory-efficient implementation of the Largest-Triangle-Three-Buckets (LTTB) algorithm, designed to downsample massive datasets with thousands, millions, or hundreds of millions of points while keeping memory usage to a minimum.
Performance tests
| Container | Dataset | Memory limit | |-----------|-------------------------|--------------| | node:slim | 100,000,000 (4.2 GB) | 50 MB | | node:slim | 1,000,000,000 (42 GB) | 150 MB |
Installation
npm install lttb-jsUsage
For small datasets with a few thousand elements or less, use the lttb function.
import { lttb } from 'lttb-js';
const dataset = [
{ x: 1, y: 23.32 },
{ x: 2, y: 98.42 },
{ x: 3, y: 15.45 },
...
// 10,000 items
];
const threshold = 100;
const result = await lttb(dataset, threshold);Streaming
For large datasets with millions or hundreds of millions of elements, you should use lttbStream to keep memory usage to a minimum.
import { lttbStream } from 'lttb-js';
const datasetLength = 1_023_435;
const threshold = 1334;
const batchSize = 20_000;
const result = await lttbStream(
datasetLength,
threshold,
batchSize,
async function* (offset, size) {
// Stream the data from an API, a database, or another source
const stream = await streamDataFromDatabase(offset, size);
for await (const item of stream) {
yield { x: item.x, y: item.y };
}
}
);The lttbStream function loads the data as it processes it, preventing load the whole dataset into memory. Every time it is ready for more data, it calls the callback function to fetch the next batch.
Batch requests
If your API or database doesn't support streaming, you can fetch the data normally and pass it to lttbStream in the following way:
const result = await lttbStream(..., async function* (offset, size) {
// Fetch the data from an API, a database, or another source
const data = await fetchDataFromDatabase(offset, size);
for (const item of data) {
yield { x: item.x, y: item.y };
}
});Notice that here we are using a normal for loop and not for await.
Or, if your data already arrives in the { x: number, y: number } format, you can pass it directly using yield*:
const result = await lttbStream(..., async function* (offset, size) {
// Fetch the data from an API, a database, or another source
const data = await fetchDataFromDatabase(offset, size);
yield* data;
});Reference
lttb
lttb(items: Point[], threshold: number): Promise<Point[]>- items: List of data points to process.
- threshold: The number of points to return.
lttbStream
lttbStream(
length: number,
threshold: number,
batchSize: number,
streamFunction: (offset: number, size: number) => AsyncGenerator<Point, void>
): Promise<Point[]>- length: The total length of the dataset.
- threshold: The number of points to return.
- batchSize: The number of items to load each time.
- streamFunction: Function called to load more data.
Point
interface Point {
x: number;
y: number;
}