ultratab
v1.1.3
Ultra-fast streaming CSV and XLSX parser for Node.js with native C++ backend, non-blocking threads, and low memory usage for huge files
UltraTab
Ultra-fast native CSV and XLSX streaming parser for Node.js. Built as a C++ addon with SIMD acceleration, background-thread parsing, and bounded memory for large files (100MB–10GB+).
Features
- Streaming: Parses from disk in chunks; does not load entire files into memory
- Non-blocking: Parsing runs on a C++ background thread; the Node event loop stays responsive
- Two APIs: Row-based csv() (string[][]) and typed columnar csvColumns() (TypedArrays)
- Typed output: int32, int64, float64, bool → Int32Array, BigInt64Array, Float64Array, Uint8Array
- XLSX support: xlsx() parses .xlsx files in low-memory streaming mode
- SIMD acceleration: AVX2/SSE2 on x86_64 (Linux/Windows); scalar fallback on macOS
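
The typed-output mapping in the list above can be written out as a small lookup. This is an illustrative sketch, not UltraTab's own code: `TYPED_ARRAY_FOR` and `allocateColumn` are hypothetical names, and the block only shows which standard JavaScript typed array corresponds to each documented schema type.

```javascript
// Hypothetical lookup mirroring the documented mapping:
// int32 → Int32Array, int64 → BigInt64Array, float64 → Float64Array, bool → Uint8Array.
const TYPED_ARRAY_FOR = new Map([
  ["int32", Int32Array],
  ["int64", BigInt64Array],
  ["float64", Float64Array],
  ["bool", Uint8Array],
]);

// Allocate a zero-filled column buffer holding `rows` values of the given schema type.
function allocateColumn(type, rows) {
  const Ctor = TYPED_ARRAY_FOR.get(type);
  if (!Ctor) throw new Error(`No typed array for schema type "${type}"`);
  return new Ctor(rows);
}

const idColumn = allocateColumn("int32", 4);
const bigColumn = allocateColumn("int64", 4);
// int64 values are BigInt, so they can exceed Number's safe integer range.
bigColumn[0] = 9007199254740993n;
console.log(idColumn.constructor.name, bigColumn.constructor.name, bigColumn[0]);
```

One practical consequence: values read from an int64 column are BigInt and cannot be mixed with plain numbers in arithmetic without an explicit conversion.
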
Installation
npm install ultratab
Requirements: Node.js >= 18, CMake >= 3.15, C++17 compiler (GCC, Clang, or MSVC)
UltraTab is written in TypeScript and ships with full type definitions. Use import (ESM) or require() (CommonJS).
Quick Start
After npm install ultratab, try the example:
node node_modules/ultratab/examples/csv-demo.js
CSV (row-based)
import { csv } from "ultratab";
for await (const batch of csv("data.csv", { batchSize: 10000, headers: true })) {
  console.log("Rows in batch:", batch.length);
  for (const row of batch) {
    console.log(row); // string[]
  }
}
CSV (typed columnar)
import { csvColumns } from "ultratab";
for await (const batch of csvColumns("data.csv", {
  headers: true,
  schema: { id: "int32", amount: "float64", active: "bool" },
  select: ["id", "amount", "active"],
})) {
  const ids = batch.columns.id; // Int32Array
  const amounts = batch.columns.amount; // Float64Array
  const nulls = batch.nullMask?.amount; // 1 = null
  for (let i = 0; i < batch.rows; i++) {
    if (nulls?.[i]) continue;
    console.log(ids[i], amounts[i]);
  }
}
XLSX
import { xlsx } from "ultratab";
for await (const batch of xlsx("data.xlsx", {
  sheet: 1,
  headers: true,
  batchSize: 5000,
})) {
  console.log("Headers:", batch.headers);
  console.log("Rows:", batch.rowsCount);
}
API Reference
csv(path, options?)
Returns AsyncIterable<string[][]>. Each batch is an array of rows; each row is string[].
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| delimiter | string | "," | Field delimiter (use "\t" for TSV) |
| quote | string | '"' | Quote character |
| headers | boolean | false | Treat the first row as a header and skip it |
| batchSize | number | 10000 | Rows per batch (1–10,000,000) |
| maxQueueBatches | number | 2 | Max batches in queue (backpressure) |
| useMmap | boolean | false | Use memory-mapped I/O |
| readBufferSize | number | 262144 | Read buffer size in bytes |
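
Because `csv()` yields plain `string[]` rows, a small helper can pair each row with column names when keyed access is more convenient. The sketch below is an illustration under stated assumptions: `rowsToObjects` is a hypothetical helper (not part of UltraTab), the column names are supplied by the caller, and the sample batch is hand-written data shaped like one `string[][]` batch.

```javascript
// Hypothetical helper: pair each string[] row with caller-supplied column names.
// Fields beyond the known names are dropped; missing fields become undefined.
function rowsToObjects(columnNames, batch) {
  return batch.map((row) => {
    const record = {};
    columnNames.forEach((name, i) => {
      record[name] = row[i];
    });
    return record;
  });
}

// Hand-written sample shaped like one batch from csv() (string[][]).
const sampleBatch = [
  ["1", "9.50", "true"],
  ["2", "12.00", "false"],
];
const records = rowsToObjects(["id", "amount", "active"], sampleBatch);
console.log(records[0].amount); // prints 9.50
```

In a real run, the helper would be called on each batch inside the `for await` loop. Note that every value stays a string here; for numeric columns the typed columnar API avoids the per-row object allocation entirely.
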
csvColumns(path, options?)
Returns AsyncIterable<ColumnarBatch>. Each batch has headers, columns, nullMask, and rows.
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| delimiter | string | "," | Field delimiter |
| quote | string | '"' | Quote character |
| headers | boolean | true | First row is header |
| batchSize | number | 10000 | Rows per batch |
| select | string[] | (all) | Columns to keep by header name |
| schema | object | (string) | Per-column: "string", "int32", "int64", "float64", "bool" |
| nullValues | string[] | ["","null","NULL"] | Strings treated as null |
| trim | boolean | false | Trim whitespace |
| typedFallback | string | "null" | On parse failure: "null" or "string" |
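
To make the interaction of `nullValues`, `trim`, and `typedFallback: "null"` more concrete, here is a plain-JavaScript sketch of how an int32 column and its null mask could be produced from raw fields. It is not UltraTab's native implementation: `parseInt32Column` and `sumNonNull` are hypothetical functions, and the parsing rules (optional sign, decimal digits, range check, trimming before the null comparison) are assumptions made for illustration. It does follow the documented convention that a null entry is flagged with 1 in the mask.

```javascript
const DEFAULT_NULL_VALUES = ["", "null", "NULL"]; // documented default for nullValues

// Hypothetical sketch: convert raw string fields into an Int32Array plus a null mask.
// Fields listed in nullValues, and fields that fail to parse as int32, are marked
// null (mask = 1), which models typedFallback: "null".
function parseInt32Column(fields, { nullValues = DEFAULT_NULL_VALUES, trim = false } = {}) {
  const values = new Int32Array(fields.length);
  const nullMask = new Uint8Array(fields.length);
  fields.forEach((raw, i) => {
    const field = trim ? raw.trim() : raw;
    const n = /^[+-]?\d+$/.test(field) ? Number(field) : NaN;
    const inRange = n >= -2147483648 && n <= 2147483647; // false for NaN
    if (nullValues.includes(field) || !inRange) {
      nullMask[i] = 1; // the value slot stays 0 and must be ignored by readers
    } else {
      values[i] = n;
    }
  });
  return { values, nullMask };
}

// Sum only the non-null entries, as a consumer of a columnar batch would.
function sumNonNull(values, nullMask) {
  let total = 0;
  for (let i = 0; i < values.length; i++) {
    if (!nullMask[i]) total += values[i];
  }
  return total;
}

const { values, nullMask } = parseInt32Column(["10", "", "abc", " 7 ", "NULL"], { trim: true });
console.log(Array.from(nullMask), sumNonNull(values, nullMask));
```

The point of the sketch is that a 0 in a typed column is ambiguous on its own; only the null mask distinguishes a real zero from a missing or unparseable field.
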
xlsx(path, options?)
Returns AsyncIterable<XlsxBatchResult>. Options: sheet, headers, batchSize, select, schema, nullValues, trim, typedFallback.
Performance
Designed for large files: minimal allocations, SIMD-accelerated scanning (x86_64), and bounded backpressure. Typical throughput: hundreds of thousands to millions of rows per second depending on schema and hardware.
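
Since throughput depends on schema and hardware, it is worth measuring on your own files. The helper below is a generic sketch: `rowsPerSecond` is a hypothetical function, and the inputs in the usage line are made-up numbers, not UltraTab benchmark results.

```javascript
// Hypothetical helper: convert a row count and elapsed wall time into rows per second.
function rowsPerSecond(rows, elapsedMs) {
  if (elapsedMs <= 0) throw new RangeError("elapsedMs must be positive");
  return (rows * 1000) / elapsedMs;
}

// Illustrative inputs only (not measured results): 2,000,000 rows in 1,600 ms.
console.log(rowsPerSecond(2_000_000, 1_600)); // prints 1250000
```

In practice, `rows` would be the sum of `batch.length` across a `for await` loop over `csv()`, and `elapsedMs` the difference between two `performance.now()` readings taken before and after the loop.
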
Platform Compatibility
- Linux (x86_64): Full SIMD (AVX2/SSE2)
- Windows (x64): Full SIMD
- macOS (Intel/ARM): Scalar path (no SIMD flags; still fast)
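
The compatibility list can be turned into a quick runtime check using Node's `process.platform` and `process.arch`. This is a sketch based only on the list above: `expectedScanPath` is a hypothetical helper, UltraTab is not documented here as exposing such a function, and combinations the list does not mention are reported as unknown rather than guessed.

```javascript
// Hypothetical helper: report the scan path the compatibility list implies
// for a given process.platform / process.arch pair.
function expectedScanPath(platform, arch) {
  if ((platform === "linux" || platform === "win32") && arch === "x64") return "simd";
  if (platform === "darwin") return "scalar";
  return "unknown"; // combinations the list does not cover, e.g. Linux on arm64
}

console.log(expectedScanPath(process.platform, process.arch));
```
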
Build from Source
git clone https://github.com/haykghukasyan/ultratab.git
cd ultratab
npm install
npm run build
Compiler requirements:
- Windows: Visual Studio Build Tools
- macOS: Xcode Command Line Tools (xcode-select --install)
- Linux: build-essential (or equivalent)
Troubleshooting
"Native addon not found"
- Run npm run build or npm rebuild ultratab
- Ensure CMake and a C++17 compiler are installed
Build fails on Windows
- Install Visual Studio Build Tools with the "Desktop development with C++" workload
Build fails on Linux
- sudo apt-get install cmake build-essential (Debian/Ubuntu)
License
MIT
