csv-super
v1.0.0
Read a 10GB CSV file without crashing RAM — Streaming CSV parser with fixed 50MB memory footprint
csv-super ⚡
Read a 10GB CSV file without crashing RAM.
Fixed ~50MB memory footprint. RFC 4180 compliant. Zero dependencies.
The Real Problem
// ❌ This CRASHES on a 10GB file: it needs 16+ GB of RAM
import fs from 'node:fs';
import csvParser from 'csv-parser';
const results = [];
fs.createReadStream('data.csv')
  .pipe(csvParser())
  .on('data', row => results.push(row)); // ← ALL rows land in heap
Why it crashes: csv-parser, fast-csv, and papaparse emit rows via EventEmitter 'data' events. A synchronous 'data' handler never slows the reader down, so the stream runs at disk speed (~500 MB/s) and every row piles up in the results array. By the time a 10GB file finishes, the V8 heap would have to hold 15–30GB of objects (parsed row objects take roughly 1.5–3× the space of the raw text).
The Solution
import { csvSuper } from 'csv-super';
// ✅ RAM stays at ~50MB regardless of file size
for await (const { rows, totalSoFar } of csvSuper('data.csv', { batch: 1000 })) {
  await db.insertMany(rows); // process 1000 rows at a time
  console.log(`Processed: ${totalSoFar}`); // cumulative counter built-in
}
Why it works: csv-super uses an async function* (Async Generator). Each yield suspends the generator until the consumer asks for the next batch, and for await...of only asks once the loop body, including its awaits, has finished. While the generator is suspended, the underlying fs.createReadStream stays paused, so no data accumulates in the heap. This is native JavaScript backpressure.
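The pull-based behaviour described above can be illustrated without the library. The following is a minimal sketch, not csv-super's implementation: `inBatches` and `countingSource` are hypothetical names, and the counter object stands in for "bytes read from disk" so that the laziness is visible.

```typescript
// Hypothetical helper: groups any async source into fixed-size batches.
async function* inBatches<T>(source: AsyncIterable<T>, size: number): AsyncGenerator<T[]> {
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === size) {
      yield batch; // suspended here until the consumer requests the next batch
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // flush the final partial batch
}

// Stand-in for a file reader: records how many items have been pulled so far.
async function* countingSource(total: number, counter: { pulled: number }): AsyncGenerator<number> {
  for (let i = 0; i < total; i++) {
    counter.pulled++;
    yield i;
  }
}

async function demo(): Promise<void> {
  const counter = { pulled: 0 };
  for await (const batch of inBatches(countingSource(10, counter), 4)) {
    // The source has only been read as far as the current batch.
    console.log(`batch of ${batch.length}, pulled so far: ${counter.pulled}`);
    await Promise.resolve(); // stand-in for an awaited database insert
  }
}

demo().catch(console.error);
// batch of 4, pulled so far: 4
// batch of 4, pulled so far: 8
// batch of 2, pulled so far: 10
```

The source is never read ahead of the consumer: each batch is pulled only after the previous loop body has finished, which is the property the library relies on to keep memory flat.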
Performance Comparison
| Metric          | csv-parser (traditional) | csv-super       |
|-----------------|--------------------------|-----------------|
| 1GB file        | ~2GB RAM                 | ~50MB RAM       |
| 10GB file       | 💀 OOM crash             | ~50MB RAM       |
| First row delay | After full file read     | Within seconds  |
| TypeScript      | Partial                  | 100% typed      |
| RFC 4180        | Partial                  | Full compliance |
| Dependencies    | 2                        | 0               |
| Node.js ≥ 18    | ✅                       | ✅              |
Installation
npm install csv-super
Requirements: Node.js ≥ 18.0.0 (uses native fs.createReadStream + Async Generators)
Quick Start
import { csvSuper } from 'csv-super';
// Basic: 1000 rows per batch (default)
for await (const batch of csvSuper('sales.csv')) {
  console.log(batch.rows); // CsvRow[] = Record<string, string>[]
  console.log(batch.batchIndex); // 0, 1, 2, ...
  console.log(batch.count); // rows in this batch (≤ 1000)
  console.log(batch.totalSoFar); // cumulative rows processed
}
API Reference
csvSuper(filePath, options?) → AsyncGenerator<BatchResult>
import { csvSuper } from 'csv-super';
import type { CsvSuperOptions, BatchResult } from 'csv-super';
Options
| Option | Type | Default | Description |
|-----------------|-----------------------------------------|--------------|------------------------------------------|
| batch | number | 1000 | Rows per yielded batch (1–100000) |
| delimiter | string | ',' | Field separator (single char) |
| quote | string | '"' | Quote character |
| escape | string | '"' | Escape char (RFC 4180: same as quote) |
| headers | boolean | true | First row = column names |
| skipEmptyLines| boolean | true | Skip blank lines |
| encoding | 'utf8' \| 'utf16le' \| 'latin1' \| 'auto' | 'auto' | File encoding (auto = BOM detection) |
| chunkSize | number | 65536 | Read buffer size in bytes (≥ 1024) |
| onProgress | (info: ProgressInfo) => void | null | Progress callback |
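The `'auto'` encoding setting is described as BOM detection. A minimal sketch of what such a check could look like is shown below. `detectBom` and `BomResult` are hypothetical names, not part of the csv-super API, and the fallback to UTF-8 when no byte-order mark is present is an assumption.

```typescript
type DetectedEncoding = 'utf8' | 'utf16le' | 'utf16be';

interface BomResult {
  encoding: DetectedEncoding;
  bomLength: number; // bytes to skip before parsing starts
}

// Hypothetical helper: inspects the first bytes of a file for a byte-order mark.
function detectBom(head: Uint8Array): BomResult {
  if (head.length >= 3 && head[0] === 0xef && head[1] === 0xbb && head[2] === 0xbf) {
    return { encoding: 'utf8', bomLength: 3 }; // EF BB BF
  }
  if (head.length >= 2 && head[0] === 0xff && head[1] === 0xfe) {
    return { encoding: 'utf16le', bomLength: 2 }; // FF FE
  }
  if (head.length >= 2 && head[0] === 0xfe && head[1] === 0xff) {
    return { encoding: 'utf16be', bomLength: 2 }; // FE FF
  }
  // Assumption: without a BOM, treat the file as UTF-8.
  return { encoding: 'utf8', bomLength: 0 };
}

console.log(detectBom(new Uint8Array([0xff, 0xfe, 0x61, 0x00])));
// { encoding: 'utf16le', bomLength: 2 }
```

A BOM cannot identify latin1, so a file in that encoding would still need the option set explicitly.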
BatchResult
interface BatchResult {
  rows: CsvRow[]; // The parsed rows
  batchIndex: number; // 0-based batch counter
  count: number; // rows.length (convenience)
  totalSoFar: number; // cumulative rows across all batches
}
ProgressInfo
interface ProgressInfo {
  bytesRead: number; // bytes consumed so far
  totalBytes: number; // total file size
  percentage: number; // 0.0–100.0
  speedMBps: number; // current read speed (sliding window)
  estimatedSecondsLeft: number; // ETA in seconds
  rowsProcessed: number; // rows parsed so far
}
Examples
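As a first example, here is a rough sketch of how the ProgressInfo fields relate to one another. `computeProgress` and `ProgressSketch` are hypothetical names. The sketch uses the average speed since the start rather than the sliding window the library describes, and it assumes 1 MB = 1024 × 1024 bytes.

```typescript
interface ProgressSketch {
  percentage: number;
  speedMBps: number;
  estimatedSecondsLeft: number;
}

// Hypothetical calculation, not csv-super's internals.
function computeProgress(bytesRead: number, totalBytes: number, elapsedSeconds: number): ProgressSketch {
  const percentage = totalBytes > 0 ? (bytesRead / totalBytes) * 100 : 100;
  const bytesPerSecond = elapsedSeconds > 0 ? bytesRead / elapsedSeconds : 0;
  const remainingBytes = Math.max(totalBytes - bytesRead, 0);
  return {
    percentage,
    speedMBps: bytesPerSecond / (1024 * 1024),
    // With no measured speed yet, the ETA is unknown, reported here as Infinity.
    estimatedSecondsLeft: bytesPerSecond > 0 ? remainingBytes / bytesPerSecond : Infinity,
  };
}

const MiB = 1024 * 1024;
console.log(computeProgress(256 * MiB, 1024 * MiB, 2));
// { percentage: 25, speedMBps: 128, estimatedSecondsLeft: 6 }
```

Reading 256 MB of a 1024 MB file in 2 seconds gives 128 MB/s, and the remaining 768 MB at that rate takes 6 seconds.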
Insert into Database
import { csvSuper } from 'csv-super';
for await (const batch of csvSuper('customers.csv', { batch: 5000 })) {
  await db.customers.insertMany(batch.rows);
  console.log(`✅ Inserted ${batch.totalSoFar} customers`);
}
Progress Bar
for await (const batch of csvSuper('large.csv', {
  batch: 5000,
  onProgress: ({ percentage, speedMBps, estimatedSecondsLeft }) => {
    const bar = '█'.repeat(Math.floor(percentage / 5)).padEnd(20, '░');
    process.stdout.write(
      `\r[${bar}] ${percentage.toFixed(1)}% @ ${speedMBps.toFixed(1)} MB/s`
    );
  },
})) {
  await processRows(batch.rows);
}
TSV Files
for await (const batch of csvSuper('data.tsv', { delimiter: '\t' })) {
  // ...
}
Early Termination (stream closes automatically, no leak)
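Early exit is safe with async generators in general because for await...of calls the generator's return() method when the loop is left via break, return, or a thrown error, and that runs any finally block inside the generator. A minimal sketch with a hypothetical generator (`numbered` and `firstAtLeast` are illustrative names, not part of csv-super):

```typescript
// Hypothetical resource-owning generator; `log` records lifecycle events.
async function* numbered(log: string[]): AsyncGenerator<number> {
  log.push('open'); // stand-in for opening a file stream
  try {
    for (let i = 0; ; i++) yield i;
  } finally {
    log.push('close'); // runs when the consumer leaves the loop early
  }
}

async function firstAtLeast(limit: number, log: string[]): Promise<number> {
  for await (const n of numbered(log)) {
    if (n >= limit) return n; // leaving the loop triggers the generator's return()
  }
  return -1; // not reached here: the sketch generator never ends on its own
}

const events: string[] = [];
firstAtLeast(3, events).then((n) => console.log(n, events));
// 3 [ 'open', 'close' ]
```

Assuming csv-super releases its read stream in a finally block of this kind, the same pattern applies to the library: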
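// Wrapped in a function so that `return` is legal and exits both loops at once
// (a plain `break` would only leave the inner loop).
async function findFirstCritical() {
  for await (const batch of csvSuper('events.csv')) {
    for (const row of batch.rows) {
      if (row.severity === 'CRITICAL') {
        console.log('Critical event found:', row);
        return row; // stream closes cleanly
      }
    }
  }
  return null; // no critical event in the file
}
No Headers (index-based access)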
for await (const batch of csvSuper('raw.csv', { headers: false })) {
  // rows: { '0': 'value1', '1': 'value2', ... }
  console.log(batch.rows[0]?.['0']);
}
Pro Features ($17/month)
For enterprise workloads that process CSV files daily, csv-super Pro adds:
Multi-Thread Processing (Worker Threads)
import { csvSuperPro } from 'csv-super';
for await (const batch of csvSuperPro('huge-file.csv', {
  licenseKey: process.env.CSV_SUPER_KEY,
  threads: 8, // Use 8 CPU cores in parallel
  batch: 10_000,
})) {
  await db.insertMany(batch.rows);
}
Speed: roughly N× faster on N-core machines. A file that takes 60s on 1 core takes ~8s on 8 cores.
Transform Pipeline
import { csvSuperPro, TransformPipeline } from 'csv-super';
const pipeline = new TransformPipeline()
  .filter(row => row.status === 'active') // Filter rows
  .trim() // Trim all fields
  .select(['id', 'name', 'email', 'salary']) // Select columns
  .rename({ 'id': 'employee_id' }) // Rename columns
  .mapField('salary', v => String(parseInt(v, 10))) // Type coerce
  .pipe(async row => { // Async enrichment
    const dept = await getDept(row.employee_id);
    return { ...row, department: dept };
  });
for await (const batch of csvSuperPro('employees.csv', {
  licenseKey: process.env.CSV_SUPER_KEY,
  transform: pipeline.toFn(),
})) {
  await db.employees.insertMany(batch.rows);
}
Pro Pricing
| Plan       | Price      | Features                                             |
|------------|------------|------------------------------------------------------|
| Free       | $0 forever | Streaming + Batch + TypeScript + RFC 4180 + Progress |
| Pro        | $17/month  | + Multi-thread + Transform + Priority support        |
| Enterprise | Custom     | + SLA + Custom seat count + Dedicated support        |
Architecture
I/O Layer               Parser Layer                Delivery Layer
──────────              ──────────────              ───────────────
fs.createReadStream  →  CsvParser (FSM)          →  BatchController
        ↓                      ↓                           ↓
64KB chunks             RFC 4180 State Machine      Async Generator
(backpressure)          (incremental, chunked)      (yield + await)
Key insight: When the consumer awaits inside for await...of, the Async Generator suspends at the yield. While it is suspended, getNextChunk() is not called, so readStream.resume() is never called and the stream stays paused. This is mechanical, guaranteed backpressure.
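The Parser Layer in the diagram is described as an incremental RFC 4180 state machine. A minimal sketch of that idea follows. `TinyCsvParser` is a hypothetical name and this is not csv-super's actual CsvParser. It assumes a comma delimiter and double-quote quoting, drops carriage returns outside quotes, and skips blank lines. The point is that parser state survives between `write()` calls, so a quoted field or an escaped quote may be split across chunks.

```typescript
type State = 'fieldStart' | 'unquoted' | 'quoted' | 'quoteInQuoted';

class TinyCsvParser {
  private state: State = 'fieldStart';
  private field = '';
  private row: string[] = [];

  // Feed one chunk; returns the rows completed by this chunk.
  write(chunk: string): string[][] {
    const out: string[][] = [];
    for (const ch of chunk) {
      if (this.state === 'quoted') {
        if (ch === '"') this.state = 'quoteInQuoted'; // closing quote or first half of ""
        else this.field += ch; // commas and newlines are literal inside quotes
        continue;
      }
      if (this.state === 'quoteInQuoted' && ch === '"') {
        this.field += '"'; // "" inside a quoted field is one literal quote
        this.state = 'quoted';
        continue;
      }
      // From here on the character is outside any quoted section.
      if (ch === ',') {
        this.endField();
      } else if (ch === '\n') {
        this.endRow(out);
      } else if (ch === '\r') {
        // Dropped, so CRLF and LF line endings behave the same.
      } else if (ch === '"' && this.state === 'fieldStart') {
        this.state = 'quoted';
      } else {
        this.field += ch;
        this.state = 'unquoted';
      }
    }
    return out;
  }

  // Call once after the last chunk to flush a final row without a trailing newline.
  finish(): string[][] {
    const out: string[][] = [];
    this.endRow(out);
    return out;
  }

  private endField(): void {
    this.row.push(this.field);
    this.field = '';
    this.state = 'fieldStart';
  }

  private endRow(out: string[][]): void {
    const blankLine = this.state === 'fieldStart' && this.row.length === 0;
    if (!blankLine) {
      this.endField();
      out.push(this.row);
    }
    this.row = [];
  }
}

const parser = new TinyCsvParser();
const rows = [
  ...parser.write('"Smith, John","He said ""h'), // chunk ends inside a quoted field
  ...parser.write('i"""\r\na,,b'),
  ...parser.finish(),
];
console.log(rows);
// [ [ 'Smith, John', 'He said "hi"' ], [ 'a', '', 'b' ] ]
```

Because only the current field and row are held between chunks, the parser's memory use depends on row width and not on file size.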
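Memory formula:
heap ≈ chunkSize + batch_size × avg_row_size
     ≈ 0.064MB (64KB) + 1000 × 0.05MB
     ≈ 50MB (constant regardless of file size)
The 0.05MB term assumes about 50KB of heap per parsed row, so the footprint scales with batch size and row width, not with file size.
RFC 4180 Compliance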
Full support for all CSV edge cases:
✅ Fields with commas: "Smith, John"
✅ Fields with newlines: "line1\nline2"
✅ Escaped quotes (x2): "He said ""hi"""
✅ Empty fields: a,,b
✅ CRLF line endings: \r\n
✅ No trailing newline (handled by finalize())
✅ Unicode content: UTF-8, UTF-16 LE/BE
✅ Custom delimiters: TSV (\t), PSV (|), SSV (;)
Error Handling
import { csvSuper, CsvSuperError, ParseError } from 'csv-super';
try {
  for await (const batch of csvSuper('data.csv')) {
    await processRows(batch.rows);
  }
} catch (err) {
  if (err instanceof ParseError) {
    console.error(`Parse error at line ${err.lineNumber}: ${err.message}`);
  } else if (err instanceof CsvSuperError) {
    console.error(`csv-super error [${err.code}]: ${err.message}`);
  } else {
    throw err; // re-throw unexpected errors
  }
}
TypeScript
Fully typed out of the box. No @types package needed.
import type {
  CsvRow, // Record<string, string>
  BatchResult, // { rows, batchIndex, count, totalSoFar }
  ProgressInfo, // { percentage, speedMBps, ... }
  CsvSuperOptions, // full options type
  TransformFn, // (row: CsvRow) => CsvRow | null | Promise<CsvRow | null>
} from 'csv-super';
Contributing
Issues and PRs are welcome at github.com/csv-super/csv-super.
git clone https://github.com/Brah-Timo/csv-super
cd csv-super
npm install
npm test
npm run bench
License
Core (free tier): MIT License
Pro (multi-thread + transform): Commercial License — see csv-super.dev/pro
Built with ❤️ for data engineers who have felt the pain of a 10GB CSV on a 16GB server.
