data-sourcer

v1.10.3

Get (and filter) data from multiple different data sources quickly and efficiently.

Installation

Add data-sourcer to your existing Node.js application like this:

npm install data-sourcer --save

This will install data-sourcer and add it to your application's package.json file.

API

Public methods of this module are listed here.

getData

getData([options])

Gets data from all sources.

Usage:

var DataSourcer = require('data-sourcer');

var myDataSourcer = new DataSourcer({
	sourcesDir: 'path-to-your-sources-directory'
});

// getData() returns an event emitter object.
myDataSourcer.getData({
	series: true,
	filter: {
		mode: 'strict',
		include: {
			someField: ['1']
		}
	}
})
	.on('data', function(data) {
		// Received some data.
		console.log(data);
	})
	.on('error', function(error) {
		// Some error has occurred.
		console.error(error);
	})
	.once('end', function() {
		// Done getting data.
		console.log('Done!');
	});
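Because the 'data' and 'error' events can each fire more than once, a caller that wants one combined result has to accumulate it. The following is a minimal sketch of one way to do that. `collectData` is a hypothetical helper, not part of data-sourcer, and it assumes that each 'data' event delivers an array of items (a single item is handled as well). A plain EventEmitter stands in for the emitter returned by getData().

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper (not part of data-sourcer).
// Accumulates everything emitted by an emitter such as the one returned by getData().
function collectData(emitter, done) {
	var items = [];
	var errors = [];
	emitter.on('data', function(data) {
		// Assumption: 'data' delivers an array of items; a single item is appended as-is.
		items = items.concat(data);
	});
	emitter.on('error', function(error) {
		// Errors are collected rather than thrown, since 'error' can fire repeatedly.
		errors.push(error);
	});
	emitter.once('end', function() {
		done(errors, items);
	});
	return emitter;
}

// Demonstration with a stand-in emitter instead of a real getData() call.
var fakeEmitter = new EventEmitter();
var result = null;
collectData(fakeEmitter, function(errors, items) {
	result = { errors: errors, items: items };
});
fakeEmitter.emit('data', [{ someField: '1' }]);
fakeEmitter.emit('error', new Error('One source failed'));
fakeEmitter.emit('data', [{ someField: '2' }]);
fakeEmitter.emit('end');

console.log(result.items.length + ' items, ' + result.errors.length + ' error(s)');
```

With a real instance, the call would look like `collectData(myDataSourcer.getData(), function(errors, items) { ... })`.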

All available options:

var options = {

	/*
		Options to pass to puppeteer when creating a new browser instance.
	*/
	browser: {
		headless: true,
		slowMo: 0,
		timeout: 10000,
	},

	/*
		Default request module options. For example you could pass the 'proxy' option in this way.

		See for more info:
		https://github.com/request/request#requestdefaultsoptions
	*/
	defaultRequestOptions: null,

	filter: {
		/*
			The filter mode determines how some options will be used to exclude data.

			For example when using the following filter option: `someField: ['1', '2']`:
				'strict' mode will only allow data that has the 'someField' property equal to '1' or '2'; ie. data that is missing the 'someField' property will be excluded.
				'loose' mode will allow data that has the 'someField' property of '1' or '2' as well as those that are missing the 'someField' property.
		*/
		mode: 'strict',

		/*
			Include items by their property values. Examples:

			`something: ['1', '2']`:
				Each item's 'something' property must equal '1' or '2'.
		*/
		include: {
		},

		/*
			Exclude items by their property values. Examples:

			`something: ['3']`:
				All items where 'something' equals '3' will be excluded.
		*/
		exclude: {
		}
	},

	/*
		The method name used to get data from a source. Required for each source.
	*/
	getDataMethodName: 'getData',

	/*
		Use a queue to limit the number of simultaneous HTTP requests.
	*/
	requestQueue: {
		/*
			The maximum number of simultaneous requests. Must be greater than 0.
		*/
		concurrency: 10,
		/*
			The time (in milliseconds) between each request. Set to 0 for no delay.
		*/
		delay: 0,
	},

	/*
		Set to TRUE to have all asynchronous operations run in series.
	*/
	series: false,

	/*
		Exclude data sources by name.

		All data sources except 'somewhere-else':
		['somewhere-else']
	*/
	sourcesBlackList: null,

	/*
		Directory from which sources will be loaded.
	*/
	sourcesDir: null,

	/*
		Include data sources by name.

		Only 'somewhere':
		['somewhere']
	*/
	sourcesWhiteList: null,
};
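To make the difference between the 'strict' and 'loose' filter modes concrete, here is a small sketch that mirrors the behaviour described in the comments above. It is not data-sourcer's internal code. `passesInclude` and `passesExclude` are hypothetical names, and applying the mode only to include rules is an assumption.

```javascript
// Illustrative only: follows the documented behaviour, not data-sourcer's internals.
function passesInclude(item, include, mode) {
	return Object.keys(include).every(function(field) {
		var allowed = include[field];
		if (typeof item[field] === 'undefined') {
			// 'strict' rejects items that are missing the field; 'loose' lets them through.
			return mode === 'loose';
		}
		return allowed.indexOf(item[field]) !== -1;
	});
}

function passesExclude(item, exclude) {
	// An item is rejected when any of its fields equals one of the excluded values.
	return Object.keys(exclude).every(function(field) {
		return exclude[field].indexOf(item[field]) === -1;
	});
}

var items = [
	{ name: 'a', someField: '1' },
	{ name: 'b', someField: '3' },
	{ name: 'c' }
];
var include = { someField: ['1', '2'] };

function namesFor(mode) {
	return items.filter(function(item) {
		return passesInclude(item, include, mode);
	}).map(function(item) {
		return item.name;
	});
}

var strictNames = namesFor('strict');
var looseNames = namesFor('loose');

console.log(strictNames); // [ 'a' ]
console.log(looseNames); // [ 'a', 'c' ]
```

Item 'c' has no 'someField' property, so it is the only item whose outcome changes between the two modes.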

listSources

listSources([options])

Gets a list of all data sources.

Usage:

var DataSourcer = require('data-sourcer');

var myDataSourcer = new DataSourcer({
	sourcesDir: 'path-to-your-sources-directory'
});

console.log(myDataSourcer.listSources());

Sample sources:

[
	{
		name: 'somewhere',
		homeUrl: 'http://somewhere.com',
		requiredOptions: {}
	},
	{
		name: 'somewhere-else',
		homeUrl: 'http://www.somewhere-else.com',
		requiredOptions: {}
	}
]

All available options:

var options = {

	/*
		Exclude data sources by name.

		All data sources except 'somewhere-else':
		['somewhere-else']
	*/
	sourcesBlackList: null,

	/*
		Include data sources by name.

		Only 'somewhere':
		['somewhere']
	*/
	sourcesWhiteList: null,
};
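The effect of the two list options can be pictured with the sketch below. It is one reading of the option descriptions, not the module's actual implementation. `filterSourceNames` is a hypothetical function, and applying both lists when both are given is an assumption.

```javascript
// Illustrative only: one reading of sourcesWhiteList and sourcesBlackList.
function filterSourceNames(names, options) {
	options = options || {};
	var whiteList = options.sourcesWhiteList;
	var blackList = options.sourcesBlackList;
	return names.filter(function(name) {
		// With a whitelist, only the listed sources are kept.
		if (whiteList && whiteList.indexOf(name) === -1) return false;
		// With a blacklist, the listed sources are dropped.
		if (blackList && blackList.indexOf(name) !== -1) return false;
		return true;
	});
}

var names = ['somewhere', 'somewhere-else'];

console.log(filterSourceNames(names, { sourcesWhiteList: ['somewhere'] }));
console.log(filterSourceNames(names, { sourcesBlackList: ['somewhere-else'] }));
console.log(filterSourceNames(names, { sourcesWhiteList: null, sourcesBlackList: null }));
```

The first two calls both leave only 'somewhere'; the third keeps every source because both lists are null, which is the default.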

Defining Sources

Each of your data sources should be a separate JavaScript file that can be loaded via Node's require() function. You are only required to define a getData(options) method, which should return an event emitter. See the following sample for more details:

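// `_` is assumed here to be underscore (lodash also provides _.defer()).
var _ = require('underscore');

module.exports = {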
	/*
		The home URL for this source. Used as a reference only.

		[optional]
	*/
	homeUrl: 'https://somewhere.com',

	/*
		Defines the options that are required to use this source.

		This source is skipped and warnings are displayed if any of these required options are missing.

		Example usage with required options:

			var DataSourcer = require('data-sourcer');

			var myDataSourcer = new DataSourcer({
				sourcesDir: 'path-to-your-sources-directory'
			});

			myDataSourcer.getData({
				sourceOptions: {
					somewhere: {
						apiKey: 'some-api-key'
					}
				}
			});

		[optional]
	*/
	requiredOptions: {
		apiKey: 'You can get an API key for this service by creating an account at https://somewhere.com'
	},

	/*
		The method that is called whenever `dataSourcer.getData()` is called.

		[required]
	*/
	getData: function(options) {

		var emitter = options.newEventEmitter();

		// Defer emitting events until the emitter has been returned.
		_.defer(function() {
			// When an error occurs, use the 'error' event.
			// The 'error' event can be emitted more than once.
			emitter.emit('error', new Error('Something bad happened!'));

			// When data is ready, use the 'data' event.
			// The 'data' event can be emitted more than once.
			// Here `data` stands in for whatever your source has fetched.
			emitter.emit('data', data);

			// When done getting data, emit the 'end' event.
			// The 'end' event should be emitted once.
			emitter.emit('end');
		});

		// Must return an event emitter.
		return emitter;
	}
};
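For comparison, here is a stripped-down source that can be run on its own. It is a sketch under a few assumptions: the item shape is made up, setImmediate() is used in place of a deferral helper, and the `newEventEmitter` function passed in the demonstration is a stub that returns a standard Node.js EventEmitter, standing in for the one data-sourcer provides.

```javascript
var EventEmitter = require('events').EventEmitter;

// A hypothetical source. In a real source file this object would be
// assigned to module.exports.
var staticSource = {
	homeUrl: 'https://somewhere.com',
	getData: function(options) {
		var emitter = options.newEventEmitter();
		// setImmediate() defers emitting until after the emitter has been returned,
		// so the caller has a chance to attach its listeners first.
		setImmediate(function() {
			emitter.emit('data', [{ someField: '1' }, { someField: '2' }]);
			emitter.emit('end');
		});
		return emitter;
	}
};

// Demonstration with a stubbed options object.
var received = [];
var ended = false;
staticSource.getData({
	newEventEmitter: function() {
		return new EventEmitter();
	}
})
	.on('data', function(data) {
		received = received.concat(data);
	})
	.once('end', function() {
		ended = true;
		console.log('Received ' + received.length + ' items');
	});
```

Nothing is emitted synchronously: `received` is still empty right after getData() returns and is only filled once the deferred callback runs.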

Options that are passed to your sources:

  • filter - object - Passed through from the options that you provide to the getData function.
  • newPage - function with signature newPage(cb) - Gets a new puppeteer page instance. See the puppeteer docs for more details.
  • request - function - Wrapper function for the request module with the default options you provided via defaultRequestOptions. Requests made via the options.request instance are queued if using the requestQueue option.
  • series - boolean - Passed through from the options that you provide to the getData function.
  • sourceOptions - object - These are custom source options which are passed through to your source by name. You can use the requiredOptions source attribute to define which options are required for your source to run properly. An example of a required option would be an API key or secret for some third-party web API.
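The relationship between requiredOptions and sourceOptions can be sketched as a simple check. This is illustrative only and does not show how data-sourcer performs the check internally. `getMissingOptions` is a hypothetical function, and looking up each source's options by the source name follows the getData() example in the source definition sample.

```javascript
// Illustrative only: detects which required options have not been provided.
function getMissingOptions(source, providedOptions) {
	var requiredKeys = Object.keys(source.requiredOptions || {});
	var provided = providedOptions || {};
	return requiredKeys.filter(function(key) {
		return typeof provided[key] === 'undefined' || provided[key] === null;
	});
}

var source = {
	requiredOptions: {
		apiKey: 'You can get an API key for this service by creating an account at https://somewhere.com'
	}
};

// Custom options keyed by source name, as passed to getData().
var sourceOptions = {
	somewhere: {
		apiKey: 'some-api-key'
	}
};

var missingForSomewhere = getMissingOptions(source, sourceOptions['somewhere']);
var missingForOther = getMissingOptions(source, sourceOptions['somewhere-else']);

console.log(missingForSomewhere); // []
console.log(missingForOther); // [ 'apiKey' ]
```

A source with a non-empty result here corresponds to the case where the source is skipped and a warning is displayed.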

Contributing

There are a number of ways you can contribute:

  • Improve or correct the documentation - All the documentation is in this readme file. If you see a mistake, or think something should be clarified or expanded upon, please submit a pull request.
  • Report a bug - Please review existing issues before submitting a new one, to avoid duplicates. If you can't find an issue that relates to the bug you've found, please create a new one.
  • Request a feature - Again, please review the existing issues before posting a feature request. If you can't find an existing one that covers your feature idea, please create a new one.
  • Fix a bug - Have a look at the existing issues for the project. If there's a bug in there that you'd like to tackle, please feel free to do so. When fixing a bug, please first create a failing test that proves the bug, then fix the bug so that the test passes. This should help ensure that the bug never creeps back into the project. After you've done all that, you can submit a pull request with your changes.

Before you contribute code, please read through at least some of the source code for the project. I would appreciate it if any pull requests for source code changes follow the coding style of the rest of the project.

Now if you're still interested, you'll need to get your local environment configured.

Configure Local Environment

Step 1: Get the Code

First, you'll need to pull down the code from GitHub:

git clone https://github.com/chill117/data-sourcer.git

Step 2: Install Dependencies

Second, you'll need to install the project dependencies as well as the dev dependencies. To do this, simply run the following from the directory you created in step 1:

npm install

Tests

This project includes an automated regression test suite. To run the tests:

npm test

Changelog

See changelog.md

License

This software is MIT licensed:

A short, permissive software license. Basically, you can do whatever you want as long as you include the original copyright and license notice in any copy of the software/source. There are many variations of this license in use.

Funding

This project is free and open-source. If you would like to show your appreciation by helping to fund the project's continued development and maintenance, you can find available options here.