𝌠 μDSV

A faster CSV parser in 5KB (min) (MIT Licensed)


Introduction

uDSV is a fast JS library for parsing well-formed CSV strings, either from memory or incrementally from disk or network. It is mostly RFC 4180 compliant, with support for quoted values containing commas, escaped quotes, and line breaks¹. The aim of this project is to handle the 99.5% use-case without adding complexity and performance trade-offs to support the remaining 0.5%.

¹ Line breaks (\n, \r, \r\n) within quoted values must match the row separator.
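
For instance, all of the following edge cases parse correctly (a quick sketch using inferSchema() and initParser() from the Basic Usage section below; the expected output follows RFC 4180 semantics):

import { inferSchema, initParser } from 'udsv';

// a quoted value containing a comma, an escaped ("") quote, and a line break
let csvStr = 'id,note\n1,"Hello, ""World""\nstill the same value"';

let parser = initParser(inferSchema(csvStr));
let rows = parser.stringArrs(csvStr);
// expected: [ [ '1', 'Hello, "World"\nstill the same value' ] ]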


Features

What does uDSV pack into 5KB?

  • RFC 4180 compliant
  • Incremental or full parsing, with optional accumulation
  • Auto-detection and customization of delimiters (rows, columns, quotes, escapes)
  • Schema inference and value typing: string, number, boolean, date, json
  • Defined handling of '', 'null', 'NaN'
  • Whitespace trimming of values & skipping empty lines
  • Multi-row header skipping and column renaming
  • Multiple outputs: arrays (tuples), objects, nested objects, columnar arrays

Of course, most of these are table stakes for CSV parsers :)


Performance

Is it Lightning Fast™ or Blazing Fast™?

No, those are too slow! uDSV has Ludicrous Speed™; it's faster than the parsers you recognize and faster than those you've never heard of.

On a Ryzen 7 ThinkPad running Linux v6.4.11 and Node.js v20.6.0, a diverse set of benchmarks shows a 1x-5x performance boost relative to Papa Parse. Papa Parse is used as the reference not because it's the fastest, but because of its outsized popularity, battle-testedness, and some external validation of its performance claims.

Most CSV parsers have one happy/fast path -- the one without quoted values, without value typing, and using the default settings & output format. Once you're off that path, you can generally throw their self-promoting benchmarks in the trash. In contrast, uDSV remains fast with all datasets and options; its happy path is every path.

For way too many synthetic and real-world benchmarks, head over to /bench... and don't forget your coffee!

┌───────────────────────────────────────────────────────────────────────────────────────────────┐
│ uszips.csv (6 MB, 18 cols x 34K rows)                                                         │
├────────────────────────┬────────┬─────────────────────────────────────────────────────────────┤
│ Name                   │ Rows/s │ Throughput (MiB/s)                                          │
├────────────────────────┼────────┼─────────────────────────────────────────────────────────────┤
│ uDSV                   │ 782K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 140 │
│ csv-simple-parser      │ 682K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 122        │
│ achilles-csv-parser    │ 469K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 83.8                      │
│ d3-dsv                 │ 433K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 77.4                        │
│ csv-rex                │ 346K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░ 61.9                              │
│ PapaParse              │ 305K   │ ░░░░░░░░░░░░░░░░░░░░░░ 54.5                                 │
│ csv42                  │ 296K   │ ░░░░░░░░░░░░░░░░░░░░░ 52.9                                  │
│ csv-js                 │ 285K   │ ░░░░░░░░░░░░░░░░░░░░░ 50.9                                  │
│ comma-separated-values │ 258K   │ ░░░░░░░░░░░░░░░░░░░ 46.1                                    │
│ dekkai                 │ 248K   │ ░░░░░░░░░░░░░░░░░░ 44.3                                     │
│ CSVtoJSON              │ 245K   │ ░░░░░░░░░░░░░░░░░░ 43.8                                     │
│ csv-parser (neat-csv)  │ 218K   │ ░░░░░░░░░░░░░░░░ 39                                         │
│ ACsv                   │ 218K   │ ░░░░░░░░░░░░░░░░ 39                                         │
│ SheetJS                │ 208K   │ ░░░░░░░░░░░░░░░ 37.1                                        │
│ @vanillaes/csv         │ 200K   │ ░░░░░░░░░░░░░░░ 35.8                                        │
│ node-csvtojson         │ 165K   │ ░░░░░░░░░░░░ 29.4                                           │
│ csv-parse/sync         │ 125K   │ ░░░░░░░░░ 22.4                                              │
│ @fast-csv/parse        │ 78.2K  │ ░░░░░░ 14                                                   │
│ jquery-csv             │ 55.1K  │ ░░░░ 9.85                                                   │
│ but-csv                │ ---    │ Wrong row count! Expected: 33790, Actual: 1                 │
│ @gregoranders/csv      │ ---    │ Invalid CSV at 1:109                                        │
│ utils-dsv-base-parse   │ ---    │ unexpected error. Encountered an invalid record. Field 17 o │
└────────────────────────┴────────┴─────────────────────────────────────────────────────────────┘

Installation

npm i udsv

or

<script src="./dist/uDSV.iife.min.js"></script>
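
When loading the IIFE bundle via a script tag, usage might look like this (a minimal sketch, assuming the bundle exposes its exports on a global uDSV object; check the dist file if your build names it differently):

<script src="./dist/uDSV.iife.min.js"></script>
<script>
  // assumption: inferSchema/initParser hang off a global uDSV object
  const { inferSchema, initParser } = uDSV;

  const csvStr = 'a,b,c\n1,2,3';
  const parser = initParser(inferSchema(csvStr));
  console.log(parser.typedArrs(csvStr)); // expected: [ [1, 2, 3] ]
</script>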

API

The entire API is a 150 LoC TypeScript definition: uDSV.d.ts.


Basic Usage

import { inferSchema, initParser } from 'udsv';

let csvStr = 'a,b,c\n1,2,3\n4,5,6';

let schema = inferSchema(csvStr);
let parser = initParser(schema);

// native format (fastest)
let stringArrs = parser.stringArrs(csvStr); // [ ['1','2','3'], ['4','5','6'] ]

// typed formats (internally converted from native)
let typedArrs  = parser.typedArrs(csvStr);  // [ [1, 2, 3], [4, 5, 6] ]
let typedObjs  = parser.typedObjs(csvStr);  // [ {a: 1, b: 2, c: 3}, {a: 4, b: 5, c: 6} ]
let typedCols  = parser.typedCols(csvStr);  // [ [1, 4], [2, 5], [3, 6] ]

Nested/deep objects can be reconstructed from column naming via .typedDeep():

// deep/nested objects (from column naming)
let csvStr2 = `
_type,name,description,location.city,location.street,location.geo[0],location.geo[1],speed,heading,size[0],size[1],size[2]
item,Item 0,Item 0 description in text,Rotterdam,Main street,51.9280712,4.4207888,5.4,128.3,3.4,5.1,0.9
`.trim();

let schema2 = inferSchema(csvStr2);
let parser2 = initParser(schema2);

let typedDeep = parser2.typedDeep(csvStr2);

/*
[
  {
    _type: 'item',
    name: 'Item 0',
    description: 'Item 0 description in text',
    location: {
      city: 'Rotterdam',
      street: 'Main street',
      geo: [ 51.9280712, 4.4207888 ]
    },
    speed: 5.4,
    heading: 128.3,
    size: [ 3.4, 5.1, 0.9 ],
  }
]
*/

CSP Note:

uDSV uses dynamically-generated functions (via new Function()) for its .typed*() methods. These functions are lazy-generated and use JSON.stringify() code-injection guards, so the risk should be minimal. Nevertheless, if you have strict CSP headers without unsafe-eval, you won't be able to take advantage of the typed methods and will have to do the type conversion from the string tuples yourself.
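
If you're in that situation, the conversion can be done by hand from the string tuples (a sketch; the column count and numeric types here are assumptions for illustration):

// CSP-safe: .stringArrs() returns the native string tuples
// (the .typed*() methods are what use new Function())
let stringArrs = parser.stringArrs(csvStr); // [ ['1','2','3'], ['4','5','6'] ]

let typedArrs = stringArrs.map(([a, b, c]) => [
  Number(a), // assumed numeric columns
  Number(b),
  Number(c),
]);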


Incremental / Streaming

uDSV has no inherent knowledge of streams. Instead, it exposes a generic incremental parsing API to which you can pass sequential chunks. These chunks can come from various sources, such as a Web Stream or Node stream via fetch() or fs, a WebSocket, etc.

Here's what it looks like with Node's fs.createReadStream():

let stream = fs.createReadStream(filePath);

let parser = null;
let result = null;

stream.on('data', (chunk) => {
  // convert from Buffer (assumes chunks don't split multi-byte UTF-8
  // chars; use a TextDecoder with {stream: true} to be fully safe)
  let strChunk = chunk.toString();
  // on first chunk, infer schema and init parser
  parser ??= initParser(inferSchema(strChunk));
  // incremental parse to string arrays
  parser.chunk(strChunk, parser.stringArrs);
});

stream.on('end', () => {
  result = parser.end();
});

...and with Web streams in Node (fetch()'s Response.body works the same way; see the sketch below):

let stream = fs.createReadStream(filePath);

let webStream = Stream.Readable.toWeb(stream);
let textStream = webStream.pipeThrough(new TextDecoderStream());

let parser = null;

for await (const strChunk of textStream) {
  parser ??= initParser(inferSchema(strChunk));
  parser.chunk(strChunk, parser.stringArrs);
}

let result = parser.end();
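
Since fetch()'s Response.body is already a web ReadableStream of bytes, the same loop works without the Node-to-web conversion step (a sketch, assuming a CSV served at a hypothetical URL and a runtime where ReadableStream is async-iterable, e.g. Node 18+):

let res = await fetch('https://example.com/data.csv'); // hypothetical URL

let textStream = res.body.pipeThrough(new TextDecoderStream());

let parser = null;

for await (const strChunk of textStream) {
  parser ??= initParser(inferSchema(strChunk));
  parser.chunk(strChunk, parser.stringArrs);
}

let result = parser.end();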

The above examples show accumulating parsers -- they will buffer the full result into memory. This may not be something you want (or need), for example with huge datasets where you're looking to get the sum of a single column, or want to filter only a small subset of rows. To bypass this auto-accumulation behavior, simply pass your own handler as the third argument to parser.chunk():

// ...same as above

let sum = 0;

let reducer = (rows) => {
  for (let i = 0; i < rows.length; i++) {
    sum += rows[i][3]; // sum fourth column
  }
};

for await (const strChunk of textStream) {
  parser ??= initParser(inferSchema(strChunk));
  parser.chunk(strChunk, parser.typedArrs, reducer); // typedArrs + reducer
}

parser.end();

TODO?

  • handle #comment rows
  • emit empty-row and #comment events?