@danmasta/csv

v1.0.0

CSV Parser

CSV

CSV Parser for Node Apps

Features:

  • Easy to use
  • Simple, lightweight, and fast
  • Parse csv data to js objects
  • Header field mapping
  • Value field mapping
  • Supports the CSV spec (RFC 4180)
  • Memory safe, keeps a maximum of 1 row in memory during streaming
  • Supports streaming, string parsing, and promises
  • Supports multi-byte newlines (\n, \r\n)
  • Custom newline support
  • Works with buffers and strings
  • Only 1 dependency: lodash

About

I was working on a BI system at work, and as our user base kept growing, our jobs got slower and slower. A huge bottleneck for us was csv parsing. We tried many of the popular csv libraries on npm, but were unable to get a consistently decent level of performance from our many jobs. We parse around a billion rows per day, with the average row being around 1.5 KB across 80+ columns (it's not uncommon for a single job to parse over 10 GB of data). Some columns have complex JSON strings, some are empty, some are in languages like Chinese, and some use \r\n for newlines. We needed something fast, chunk-boundary safe, and character-encoding safe; something that worked well on all of our various csv formats. Some of the other parsers failed to accurately parse \r\n newlines across chunk boundaries, some failed altogether, some caused out-of-memory errors, some were just crazy slow, and some had encoding errors with multi-byte characters.

This library is the result of those learnings. It's fast, stable, works on any data, and is highly configurable. Feel free to check out and run the benchmarks. I'm currently able to achieve around 100 MB per second of parsing (on an i7 9700K), depending on the data and CPU.

Usage

Add csv as a dependency for your app and install via npm

npm install @danmasta/csv --save

Require the package in your app

const csv = require('@danmasta/csv');

By default, csv() returns a transform stream interface. You can use the methods promise(), map(), each(), and parse() to consume row objects via promises, or the synchronous Parser.parse() to get them as a plain array.
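For example, map() follows the same pattern as the other instance methods. A minimal sketch (the sample string and column names are just for illustration):

const csv = require('@danmasta/csv');

// Illustrative input with a header row
const str = 'id,name\n1,Alpha\n2,Beta\n';

// map() runs the iterator over each row and resolves with the new array
csv().parse(str).map(row => row.name).then(names => {
    console.log('Names:', names); // e.g. [ 'Alpha', 'Beta' ]
});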

Options

name | type | description
-----|------|------------
headers | boolean\|object\|array\|function | If truthy, reads the first line as header fields. If false, disables header fields and replaces them with integer values of 0-n. If an object, each original header name is replaced with its value in the mapping. If a function, the header field is set to the function's return value. Default is true
values | boolean\|object\|function | Same as headers, but for values. If you want to replace values on the fly, provide an object: {'null': null, 'true': true}. If a function, the value is replaced with the function's return value. Default is null
newline | string | Which character to use for newlines. Default is \n
cb | function | Function to call when ready to flush a complete row. Used only for the Parser class. If you implement a custom parser you will need to include a cb function. Default is this.push
buffer | boolean | If true, uses a string decoder to parse buffers. This is set automatically by the convenience methods, but will need to be set on custom parser instances. Default is false
encoding | string | Which encoding to use when parsing rows in buffer mode. This doesn't matter when using strings or streams not in buffer mode. Default is utf8
stream | object | Options passed to the transform stream constructor. Default is { objectMode: true, encoding: null }
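As an illustration, several of these options can be combined in a single call (a sketch; the sample data and value mapping are hypothetical):

const csv = require('@danmasta/csv');

// Parse CRLF-delimited data and replace common string literals with js values
const opts = {
    newline: '\r\n',
    values: { 'null': null, 'true': true, 'false': false }
};

csv.parse('id,active,note\r\n1,true,null\r\n', opts);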

Methods

Name | Description
-----|------------
Parser(opts, str) | Parser class for generating custom csv parser instances. Accepts an options object and an optional string or buffer to parse
Parser.parse(str, opts) | Synchronous parse function. Accepts a string or buffer and an optional options object. Returns an array of parsed rows
promise() | Returns a promise that resolves with an array of parsed rows
map(fn) | Runs an iterator function over each row. Returns a promise that resolves with the new array
each(fn) | Runs an iterator function over each row. Returns a promise that resolves with undefined
parse(str) | Parses a string or buffer and pushes rows to the stream. Call flush() when finished. Returns the csv parser instance
flush() | Flushes remaining data and pushes final rows to the stream. Returns the csv parser instance
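As a sketch of how parse() and flush() fit together outside of the chained examples below (it assumes str already holds the csv input, and that promise() still resolves with the collected rows after an explicit flush()):

const csv = require('@danmasta/csv');

const parser = csv();

parser.parse(str);  // push parsed rows to the stream
parser.flush();     // flush any remaining buffered data as the final row

parser.promise().then(rows => {
    console.log('Rows:', rows);
});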

Examples

Use CRLF line endings

csv.parse(str, { newline: '\r\n' });

Parse input from a stream and write rows to a destination stream

const fs = require('fs');

let read = fs.createReadStream('./data.csv');

read.pipe(csv()).pipe(myDestinationStream());
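myDestinationStream() above is just a placeholder; any object-mode writable works, since the parser pushes row objects. A minimal sketch using node's stream module:

const { Writable } = require('stream');

// Object-mode writable that logs each parsed row
function myDestinationStream () {
    return new Writable({
        objectMode: true,
        write (row, encoding, callback) {
            console.log('Row:', row);
            callback();
        }
    });
}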

Parse a string and get results in a promise

csv().parse(str).promise().then(res => {
    console.log('Rows:', res);
});

Run an iteration function over each row using each()

csv().parse(str).each(row => {
    console.log('Row:', row);
}).then(() => {
    console.log('Parse complete!');
});

Update header values

let headers = {
    time: 'timestamp',
    latitude: 'lat',
    longitude: 'lng'
};

csv.parse(str, { headers });

Update headers and/or values with functions

function headers (str) {
    return str.toLowerCase();
}

function values (val) {
    if (val === 'false') return false;
    if (val === 'true') return true;
    if (val === 'null' || val === 'undefined') return null;
    return val.toLowerCase();
}

csv.parse(str, { headers, values });

Create a custom parser that pushes to a queue

const Queue = require('queue');
const q = new Queue();

// Pass the queue's push method as cb, so completed rows are handed to the
// queue instead of being pushed to the stream
const parser = csv({ cb: q.push.bind(q) });

parser.parse(str);
parser.flush();

Testing

Testing is currently done with mocha and chai. To execute the tests, run npm run test. To generate unit test coverage reports, run npm run coverage.

Benchmarks

Benchmarks are currently built using gulp. Run gulp bench to test timings and bytes per second.

Filename         Mode    Density  Rows    Bytes       Time (ms)  Rows/Sec      Bytes/Sec
---------------  ------  -------  ------  ----------  ---------  ------------  --------------
Earthquakes      String  0.11     72,689  11,392,064  121.66     597,492.68    93,641,058.06
Earthquakes      Buffer  0.11     72,689  11,392,064  104.8      693,582.44    108,700,567.12
Pop-By-Zip-LA    String  0.18     3,199   120,912     1.88       1,702,582.7   64,352,197.35
Pop-By-Zip-LA    Buffer  0.18     3,199   120,912     1.28       2,490,424.44  94,130,103.07
Stats-By-Zip-NY  String  0.41     2,369   263,168     15.68      151,058.46    16,780,816.02
Stats-By-Zip-NY  Buffer  0.41     2,369   263,168     14.52      163,183.15    18,127,726.59

Speed and throughput are highly dependent on the density (token matches / byte length) of the data, as well as the size of the objects created. The earthquakes file represents a roughly average level of density for common csv datasets. The stats-by-zip-code file is an example of very dense csv data.

Contact

If you have any questions, feel free to get in touch.