
dwca-reader

v1.0.0

A library that reads Darwin Core Archives, with loading functions for MongoDB and Elasticsearch.

dwca-reader Node.js Library
===========================

Install

To install the most recent version, run

	npm install dwca-reader

Introduction

This is a Node.js library that reads Darwin Core Archives and loads them into MongoDB, Elasticsearch, or a custom stream. The zip file must contain a meta.xml file, which describes the data to be read. The following is a simple example of downloading an archive and loading it into MongoDB.

  var DwcaReader = require("dwca-reader");
  var dr = new DwcaReader();

  // Download the archive and save it into the given folder
  dr.getArchive('http://{path_to_archive}/{name}.zip',
    'path',
    null,
    function(error, response, body) {
      if (error) {
        console.log(error, response);
      }
    });

  /*
  Or, load from an already downloaded .zip file:
  dr.setArchive(path + 'test.zip', function(error, msg) {
    if (error) {
      console.log(msg);
    } else {
      console.log('setArchive worked!');
    }
  });
  */

  // Optional hook to clean up records before they are written
  dr.transform = function(data) {
    return data;
  };

  var config = {
    host: "host",
    port: "1000",
    db: "database",
    table: "table"
  };

  dr.import2mongo(config, function(err, res) {
    if (err) {
      console.log("There was an error.");
    } else {
      console.log("Total records read into the mongodb:", res.count);
      console.log("It took", res.read_time, "seconds to read the file(s).");
      console.log("It took a total of:", res.total_time, "seconds");
    }
  });
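
The transform hook can be sketched in isolation. The record shape below is hypothetical, since real records follow whatever fields the archive's meta.xml declares:

```javascript
// Hypothetical transform hook: trims whitespace and drops empty-string fields.
// The field names here are illustrative, not part of the library.
function transform(record) {
  var cleaned = {};
  Object.keys(record).forEach(function (key) {
    var value = typeof record[key] === "string" ? record[key].trim() : record[key];
    if (value !== "") {
      cleaned[key] = value;
    }
  });
  return cleaned;
}

console.log(transform({ genus: "  Puma ", species: "concolor", notes: "" }));
// → { genus: 'Puma', species: 'concolor' }
```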

GitHub information

After cloning the repository, enter its directory and run

	npm install

All of the drivers should install, and the library will be ready to use.

Examples

Inside the example folder, there are multiple examples you can run to check that the code is working, and that it throws errors correctly when it is not. All of the examples require a local pathname, which is not set by default (it can differ between platforms), so the pathname must be set in each example. You can run the examples using

	node testname

The following are all of the examples at this point:

  • archiveTest
  • elasticsearchTest
  • mongoTest

Methods

The following are all of the functions exposed by this library. All options arguments are plain objects with specific required fields.

getArchive(url, destination, options, callback)

This function downloads the file at url into the destination path, creating a new file named after the URL (keeping its extension). If a file with that name already exists in the destination folder, the function assumes you do not want to download the file again and exits. To download the file anyway, set options.overwrite = true. The callback takes only error and msg arguments, e.g.:

  getArchive('http://{path_to_url}/{archive}.zip', 'localPath', {}, function(error, message) {});

setArchive(location, callback)

This function takes the location of an already downloaded archive. It is a simpler version of getArchive. The callback is the same as above, taking error and msg arguments.
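
The error-first callback shape can be sketched with a hypothetical stand-in (a stub for illustration, not the real library code, which actually reads the zip file):

```javascript
// Hypothetical stub mirroring the callback shape of setArchive(location, callback).
function setArchiveStub(location, callback) {
  if (typeof location !== "string" || !location.endsWith(".zip")) {
    return callback(new Error("bad location"), "location must point to a .zip file");
  }
  callback(null, "archive loaded from " + location);
}

setArchiveStub("/tmp/archive.zip", function (error, msg) {
  console.log(error ? "error: " + msg : msg);
});
```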

getSchema(callback)

This function is called internally by the readData function, so it never needs to be called directly. It reads the meta.xml file and passes the result to a callback of the form function(err, schema); err is set if meta.xml is not found.

import2mongo(options, callback)

This function is used to send data from a url or file path into a specified mongo database. The options object must have the following fields:

  • .host
  • .port
  • .db
  • .table

If any of the above fields are not defined, the function will call back with an error and a message, same as above. If the process completes, the callback is invoked as callback(false, results), where results has 3 fields:

  • .count: the number of records read in from the file (rows in a .csv file).
  • .read_time: the time it took to read the data from the file.
  • .total_time: the total amount of time it took to read and write the data.
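
For illustration, a hypothetical results object in this shape can be used to derive throughput (the library itself only reports the raw fields; the numbers below are made up):

```javascript
// Hypothetical results object in the shape import2mongo reports.
var results = { count: 1200, read_time: 3, total_time: 5 };

// Derived throughput over the full read-and-write run.
var recordsPerSecond = results.count / results.total_time;
console.log(recordsPerSecond + " records/second"); // → 240 records/second
```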

import2elasticsearch(options, callback)

This function is used to send data from a url or file path into a specified Elasticsearch index. The options object must have the following fields:

  • .host
  • .port
  • .db: the index name in Elasticsearch
  • .table: the type name in Elasticsearch

If any of the above fields are not defined, the function will call back with an error and a message, same as above. If the process completes, the callback is invoked as callback(false, results), where results has 3 fields:

  • .count: the number of records read in from the file (rows in a .csv file).
  • .read_time: the time it took to read the data from the file.
  • .total_time: the total amount of time it took to read and write the data.
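
As a sketch, an options object for import2elasticsearch might look like the following. The host and port values are placeholders, and the pre-flight check mirrors, but is not, the library's own validation:

```javascript
// Hypothetical options object for import2elasticsearch; values are placeholders,
// not defaults from the library.
var options = {
  host: "localhost",
  port: "9200",
  db: "dwca",      // Elasticsearch index name
  table: "record"  // Elasticsearch type name
};

// Simple pre-flight check that all required fields are present.
var required = ["host", "port", "db", "table"];
var missing = required.filter(function (key) { return !options[key]; });
console.log(missing.length === 0 ? "options complete" : "missing: " + missing.join(", "));
```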