fluxcsv
v2.0.1
Fast streaming CSV parser for Node.js with robust quote and multiline handling
FluxCSV
A streaming CSV parser built on a pure Deterministic Finite Automaton (DFA).
- True DFA: 5 states, linear time and space, no recursion, no backtracking
- Clean architecture: tokenizer emits raw character tokens only; all CSV semantics live in the transition table
- Handles everything: quoted fields, "" escapes, embedded newlines and commas, CRLF/CR/LF, BOM, custom delimiters, trailing commas
- Streaming-first: Node.js Transform stream; also parse() (Promise) and parseSync()
- Zero dependencies
Install
npm install fluxcsv
Quick start
const { parse, parseSync, CSVReader, PureDFAParser } = require('fluxcsv');
API
parseSync(data, options?) → array
Synchronous. Returns all rows immediately. Throws on error.
const { parseSync } = require('fluxcsv');
// Array of arrays (default)
parseSync('Alice,30,engineer\nBob,25,designer');
// → [['Alice','30','engineer'], ['Bob','25','designer']]
// Array of objects with headers:true
parseSync('name,age\nAlice,30\nBob,25', { headers: true });
// → [{ name: 'Alice', age: '30' }, { name: 'Bob', age: '25' }]
parse(data, options?) → Promise<array>
Same as parseSync, but async.
const rows = await parse('name,age\nAlice,30', { headers: true });
new PureDFAParser(options?): Transform stream
A Node.js Transform stream in object mode. Emits one record per data event.
Good for large files — rows are emitted as they arrive, never buffering the whole file.
const fs = require('fs');
const parser = new PureDFAParser({ headers: true });
parser.on('data', row => console.log(row));
parser.on('end', () => console.log('done'));
parser.on('metrics', m => console.log(`${m.rowsProcessed} rows in ${m.duration}ms`));
fs.createReadStream('data.csv').pipe(parser);
new CSVReader(source, options?): async iterator
Wraps a file path or readable stream. Supports on() registration before iteration starts.
const reader = new CSVReader('data.csv', { headers: true });
// Register handlers before iterating — they'll be attached when iteration begins
reader.on('metrics', m => console.log(m));
for await (const row of reader) {
  console.log(row);
}
// Or collect everything
const rows = await reader.toArray();
Options
| Option | Default | Description |
|---|---|---|
| delimiter | ',' | Field separator |
| quote | '"' | Quote character |
| headers | false | First row becomes object keys; records are objects |
| trim | false | Strip whitespace from unquoted fields (quoted whitespace is always preserved) |
| skipEmptyLines | false | Ignore blank rows |
| relaxQuotes | false | Tolerate malformed quoting instead of throwing |
| relaxColumnCount | false | Allow rows with different field counts |
| skipLinesWithError | false | Emit warning and skip invalid rows instead of throwing |
| cast | null | (value, context) => any — transform field values on the way out |
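The cast option takes a function of (value, context). As a minimal sketch, a per-column converter map can be wrapped into such a function. The helper name castBySchema is hypothetical and not part of fluxcsv, and the sketch assumes the context object exposes a column property, as the cast example in this README suggests for headers mode.

```javascript
// Hypothetical helper, not part of fluxcsv: builds a cast function from a
// map of column name -> converter. Assumes `context.column` holds the
// column name (headers mode).
function castBySchema(schema) {
  return (value, context) => {
    const hasConverter = Object.prototype.hasOwnProperty.call(schema, context.column);
    // Leave unlisted columns and empty fields untouched.
    if (!hasConverter || value === '') return value;
    return schema[context.column](value);
  };
}

const cast = castBySchema({
  age: Number,
  active: v => v === 'true',
});

console.log(cast('30', { column: 'age' }));      // 30
console.log(cast('true', { column: 'active' })); // true
console.log(cast('Alice', { column: 'name' }));  // Alice
```

Under those assumptions, the resulting function could be passed as the cast option together with headers: true.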
Examples
Run the included examples file:
node examples.js all # run all examples
node examples.js basic # basic usage
node examples.js headers # headers mode
node examples.js quoted # quoted fields
node examples.js cast # transforming values
node examples.js stream # streaming Transform
node examples.js reader # CSVReader async iterator
node examples.js options # delimiter, trim, skipEmptyLines
node examples.js errors # error handling
node examples.js realworld # BOM, CRLF, multi-line fields
Basic
// Single row
parseSync('Alice,30,engineer');
// → [['Alice', '30', 'engineer']]
// Multiple rows
parseSync('Alice,30\nBob,25\nCarol,35');
// → [['Alice','30'], ['Bob','25'], ['Carol','35']]
// Empty and trailing fields
parseSync('Alice,,engineer'); // → [['Alice', '', 'engineer']]
parseSync('Alice,30,'); // → [['Alice', '30', '']]
parseSync(',,'); // → [['', '', '']]
Headers
const csv = `name,age,role
Alice,30,engineer
Bob,25,designer`;
parseSync(csv, { headers: true });
// → [
// { name: 'Alice', age: '30', role: 'engineer' },
// { name: 'Bob', age: '25', role: 'designer' },
// ]
Quoted fields
// Comma inside a value — must be quoted
parseSync('"Smith, John",30,engineer');
// → [['Smith, John', '30', 'engineer']]
// Newline inside a value
parseSync('"line one\nline two",next_field');
// → [['line one\nline two', 'next_field']]
// Double-quote escape: "" → "
parseSync('"He said ""hello""",done');
// → [['He said "hello"', 'done']]
cast: transform values on the fly
// Auto-cast numbers
// Empty or whitespace-only fields stay strings instead of becoming 0
const autoNumber = v => (v.trim() !== '' && !isNaN(v) ? Number(v) : v);
parseSync('name,age,score\nAlice,30,9.5', {
  headers: true,
  cast: autoNumber,
});
// → [{ name: 'Alice', age: 30, score: 9.5 }]
// Cast by column name
parseSync('date,amount\n2024-01-15,99.99', {
  headers: true,
  cast: (value, { column }) => {
    if (column === 'amount') return parseFloat(value);
    if (column === 'date') return new Date(value);
    return value;
  },
});
// → [{ date: new Date('2024-01-15'), amount: 99.99 }]
Custom delimiter
// TSV
parseSync('name\tage\nAlice\t30', { delimiter: '\t', headers: true });
// → [{ name: 'Alice', age: '30' }]
// Semicolons (common in European CSV exports)
parseSync('name;age\nAlice;30', { delimiter: ';', headers: true });
// → [{ name: 'Alice', age: '30' }]
Large files: streaming
const { PureDFAParser } = require('fluxcsv');
const fs = require('fs');
const parser = new PureDFAParser({ headers: true });
let count = 0;
let total = 0;
parser.on('data', row => {
  count++;
  total += Number(row.amount);
});
parser.on('end', () => {
  console.log(`Processed ${count} rows, total: $${total.toFixed(2)}`);
});
parser.on('metrics', m => {
  console.log(`Took ${m.duration}ms`);
});
fs.createReadStream('transactions.csv').pipe(parser);
Async iteration over a file
const reader = new CSVReader('customers.csv', { headers: true });
reader.on('metrics', m => console.log(m));
for await (const customer of reader) {
  // sendEmail stands in for your own async function
  await sendEmail(customer.email, customer.name);
}
Error handling
// Strict by default — throws on column mismatch
try {
  parseSync('a,b\nc,d,e');
} catch (e) {
  console.error(e.message);
  // → Column count mismatch at row 2: expected 2, got 3
}
// Skip bad rows, keep good ones — emit warnings
const warnings = [];
const parser = new PureDFAParser({
  headers: true,
  skipLinesWithError: true,
});
parser.on('warning', w => warnings.push(w));
parser.on('data', row => console.log(row));
parser.write('name,age\nAlice,30\nBob\nCarol,35');
parser.end();
// Emits Alice and Carol; Bob's row triggers a warning
Events (streaming mode)
| Event | Payload | When |
|---|---|---|
| data | record (array or object) | One record per row |
| warning | { error, row } | Bad row skipped (requires skipLinesWithError: true) |
| recovered | { row } | Row skipped after parse error recovery |
| metrics | { rowsProcessed, rowsSkipped, errors, duration } | Stream end |
| error | Error | Fatal parse error (when not using skipLinesWithError) |
CLI
# Basic usage
fluxcsv data.csv
# With headers — output JSON objects
fluxcsv data.csv --headers
# Semicolon-delimited
fluxcsv data.csv --delimiter=';' --headers
# Trim and skip blank lines
fluxcsv data.csv --trim --skip-empty --headers
# Pretty-print
fluxcsv data.csv --headers --pretty
# Pipe from stdin
cat data.csv | fluxcsv --headers
# All options
fluxcsv --help
Architecture
Input chunks
     │
     ▼
Tokenizer         raw tokens: QUOTE | DELIMITER | NEWLINE | TEXT
     │            (knows nothing about CSV semantics)
     ▼
DFA Transition    5 states: START → FIELD → QUOTED_FIELD → QUOTE_SEEN → SKIP_TO_NEWLINE
     │            (all CSV semantics live here)
     ▼
Actions           build field buffer → push to row buffer → emit records
     │
     ▼
Records           arrays or objects
The tokenizer's only job is to classify characters into raw token types. It has no concept of "escaped quote" or "quoted field"; that meaning is determined entirely by the DFA's current state when it sees a QUOTE token. This separation makes the parser easy to test and reason about: every behaviour follows from the transition table.
Key invariant: each token is processed exactly once. No recursion, no re-processing, no backtracking.
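The tokenizer/DFA split described above can be sketched in a few dozen lines. This is an illustrative sketch, not fluxcsv's source: the names tokenize and parseSketch are made up here, the delimiter and quote characters are fixed, and the SKIP_TO_NEWLINE recovery state is left out, so malformed quoting simply throws.

```javascript
// Illustrative sketch only; names and structure are assumptions.
const DELIM = ',';
const QUOTE_CH = '"';
const isSpecial = ch => ch === QUOTE_CH || ch === DELIM || ch === '\r' || ch === '\n';

// Tokenizer: classifies characters into raw tokens, with no CSV semantics.
function* tokenize(input) {
  let i = 0;
  while (i < input.length) {
    const ch = input[i];
    if (ch === QUOTE_CH) {
      yield { type: 'QUOTE', value: ch };
      i += 1;
    } else if (ch === DELIM) {
      yield { type: 'DELIMITER', value: ch };
      i += 1;
    } else if (ch === '\r' || ch === '\n') {
      // CRLF counts as a single newline token.
      const len = ch === '\r' && input[i + 1] === '\n' ? 2 : 1;
      yield { type: 'NEWLINE', value: input.slice(i, i + len) };
      i += len;
    } else {
      let j = i + 1;
      while (j < input.length && !isSpecial(input[j])) j += 1;
      yield { type: 'TEXT', value: input.slice(i, j) };
      i = j;
    }
  }
}

// DFA: the meaning of each token depends only on the current state.
function parseSketch(input) {
  const rows = [];
  let row = [];
  let field = '';
  let state = 'START';
  const endField = () => { row.push(field); field = ''; state = 'START'; };
  const endRow = () => { endField(); rows.push(row); row = []; };

  for (const { type, value } of tokenize(input)) {
    switch (state) {
      case 'START': // beginning of a field
        if (type === 'QUOTE') state = 'QUOTED_FIELD';
        else if (type === 'DELIMITER') endField();
        else if (type === 'NEWLINE') { if (row.length > 0) endRow(); } // blank line skipped
        else { field = value; state = 'FIELD'; }
        break;
      case 'FIELD': // inside an unquoted field
        if (type === 'TEXT') field += value;
        else if (type === 'DELIMITER') endField();
        else if (type === 'NEWLINE') endRow();
        else throw new Error('Unexpected quote inside unquoted field');
        break;
      case 'QUOTED_FIELD': // delimiters and newlines are literal here
        if (type === 'QUOTE') state = 'QUOTE_SEEN';
        else field += value;
        break;
      case 'QUOTE_SEEN': // closing quote, or first half of an escaped quote
        if (type === 'QUOTE') { field += QUOTE_CH; state = 'QUOTED_FIELD'; }
        else if (type === 'DELIMITER') endField();
        else if (type === 'NEWLINE') endRow();
        else throw new Error('Unexpected text after closing quote');
        break;
    }
  }
  if (state === 'QUOTED_FIELD') throw new Error('Unterminated quoted field');
  if (state !== 'START' || row.length > 0) endRow(); // flush the last row
  return rows;
}

console.log(parseSketch('"He said ""hello""",done'));
// [ [ 'He said "hello"', 'done' ] ]
```

Each token is consumed by exactly one switch branch, which is the invariant stated above; the same QUOTE token opens a field, closes it, or completes an escape depending only on the state in which it arrives.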
Notes
- \n input with nothing before it (e.g. '\n') returns [] (empty row skipped)
- trim: true strips whitespace from unquoted fields only; quoted whitespace is always preserved
- skipLinesWithError uses best-effort recovery: when an error is detected inside a chunk, the parser enters SKIP_TO_NEWLINE state and discards tokens until the next \n. Rows that span the error in the same chunk are discarded; rows in subsequent chunks parse normally.
- Column-count validation applies in both array mode (counted from first row) and headers mode
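To make the column-count note concrete, here is a small sketch of the check in array mode, run over rows that have already been parsed. The function checkColumnCounts is hypothetical and not part of fluxcsv; it borrows the error message shown under Error handling and the { error, row } warning shape from the events table, and it does not model the mid-chunk SKIP_TO_NEWLINE recovery.

```javascript
// Hypothetical helper, not fluxcsv's API: validates rows against the
// width of the first row. Strict by default; optionally skips bad rows.
function checkColumnCounts(rows, { skipLinesWithError = false } = {}) {
  const kept = [];
  const warnings = [];
  const expected = rows.length > 0 ? rows[0].length : 0;

  rows.forEach((row, index) => {
    if (row.length === expected) {
      kept.push(row);
      return;
    }
    const error = new Error(
      `Column count mismatch at row ${index + 1}: expected ${expected}, got ${row.length}`
    );
    if (!skipLinesWithError) throw error; // strict mode: fail on the first bad row
    warnings.push({ error, row });        // lenient mode: record it and move on
  });

  return { kept, warnings };
}

const result = checkColumnCounts(
  [['name', 'age'], ['Alice', '30'], ['Bob'], ['Carol', '35']],
  { skipLinesWithError: true }
);
console.log(result.kept.length);               // 3
console.log(result.warnings[0].error.message);
// Column count mismatch at row 3: expected 2, got 1
```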
License
MIT
