
twitter-harvest

v0.3.4


A simple continuous harvester for Twitter

This application captures tweets posted around the world. Currently it works only with the Twitter Streaming API v1.1.

  • Define or modify cfg/cfg.json and create at least one capture agent in the cfg/agents/ directory (with enable set to true).
  • You can enable mail alerts sent from an SMTP account such as Gmail (see Private configuration and the mail_alert flag in Main configuration).
  • If fs_out is true (default), the captured tweets are written to the file system with the following convention:

    data_dir/year/month/day/hour-min-sec_tweet-id

    e.g.

    data/2015/9/24/16-30-44_647055571951190000

  • If todo_out is true (it should be false by default), a simple queue is created in the 'data/TODO' directory, containing the filenames to be consumed by an external process. This makes it possible to write the tweets to any database.
    • Note that the number of files per directory is limited (depending on the OS), so the external process must consume the filenames regularly to avoid issues.
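The file naming convention for fs_out can be sketched as below. This is a minimal illustration, not the package's own code: `tweetPath` is a hypothetical helper, and it assumes UTC timestamps and no zero-padding, as in the example path (the 0.3.3 change log suggests newer versions pad to 2 digits and add a JSON extension).

```javascript
const path = require('path');

// Hypothetical helper: builds data_dir/year/month/day/hour-min-sec_tweet-id.
// Assumptions: UTC time and no zero-padding, matching the README example.
function tweetPath(dataDir, date, tweetId) {
  const year = String(date.getUTCFullYear());
  const month = String(date.getUTCMonth() + 1); // getUTCMonth() is 0-based
  const day = String(date.getUTCDate());
  const time = [date.getUTCHours(), date.getUTCMinutes(), date.getUTCSeconds()].join('-');
  // Tweet IDs exceed Number.MAX_SAFE_INTEGER, so pass them as strings (id_str).
  return path.posix.join(dataDir, year, month, day, `${time}_${tweetId}`);
}
```

For instance, `tweetPath('./data/', new Date(Date.UTC(2015, 8, 24, 16, 30, 44)), '647055571951190000')` gives `data/2015/9/24/16-30-44_647055571951190000`. Keeping the ID as a string matters: parsing it as a JavaScript number silently rounds the last digits.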

Install

$ npm install --save twitter-harvest

Usage

$ node twitter-harvest.js

Usage with forever

$ npm install -g forever
$ forever start twitter-harvest.js

With forever, the harvester keeps running in the background, so you can close your session.

Main configuration

{
  "agents_dir"    : "cfg/agents/",
  "data_dir"      : "./data/",
  "private_cfg"   : "./cfg/cfg-private.json",

  "mail_alert"    : false,

  "fs_out"        : true,
  "std_out"       : true,
  "todo_out"      : true  
}
  • agents_dir: directory containing the agent files
  • data_dir: directory where the tweets are written on the file system
  • private_cfg: file where private data (such as mail credentials) is stored
  • mail_alert: if true, enable mail alerts in case of failure
  • fs_out: if true, write the Twitter data to the file system
  • std_out: if true, write the Twitter data to the console
  • todo_out: if true, write the JSON filename to the 'data/TODO' directory (to be consumed by another process that stores it in a DB such as MySQL)
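An external process consuming the 'data/TODO' queue could look like the sketch below. It is hypothetical: `drainTodoQueue` is not part of twitter-harvest, and it assumes each entry in the TODO directory stands for one captured tweet file that can be deleted once handled. `handleEntry` is your own code, for example an INSERT into MySQL.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical consumer for the 'data/TODO' queue produced when todo_out is true.
// Assumption: each entry stands for one captured tweet file and may be
// deleted once handled. handleEntry is user code (e.g. a DB insert).
function drainTodoQueue(todoDir, handleEntry) {
  let processed = 0;
  for (const name of fs.readdirSync(todoDir).sort()) {
    const entryPath = path.join(todoDir, name);
    if (!fs.statSync(entryPath).isFile()) continue; // ignore subdirectories
    handleEntry(entryPath); // if this throws, the loop stops and the entry stays queued
    fs.unlinkSync(entryPath); // dequeue only after successful handling
    processed += 1;
  }
  return processed;
}
```

Calling it periodically (for example with `setInterval`) keeps the TODO directory small, which matters because of the per-directory file limit mentioned earlier. Deleting an entry only after the handler succeeds means a failed DB write is retried on the next run.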

Agents configuration

Put all agent definition files in the agents directory (one file per agent).

$ cat cfg/agents/*.json
{
  "type_doc"            : "twitter",
  "enable"              : true,
  "type_filter"         : "track",
  "type_api"            : "stream",
  "name"                : "keywords-geneva",
  "filter"              : {
    "track"             : "genève,geneva,genebra,genevra,genf"
  },
  "stream"              : "filter",
  "consumer_key"        : "...",
  "consumer_secret"     : "...",
  "access_token_key"    : "...",
  "access_token_secret" : "..."  
}

This agent captures all tweets that mention the word Geneva in several languages.
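To see how a track value like the one in this agent is interpreted, here is a rough sketch of the Twitter streaming API rule: commas separate alternative phrases (OR), spaces inside a phrase mean every word must appear (AND), and matching ignores case. `matchesTrack` is a hypothetical helper; it uses plain substring matching, which is looser than Twitter's real tokenization of punctuation, URLs and @mentions.

```javascript
// Hypothetical approximation of "track" matching:
// comma = OR between phrases, space = AND between words, case-insensitive.
// Substring matching is a simplification of Twitter's word-based matching.
function matchesTrack(text, track) {
  const haystack = text.toLowerCase();
  return track
    .split(',')
    .map((phrase) => phrase.trim().toLowerCase())
    .filter((phrase) => phrase.length > 0)
    .some((phrase) => phrase.split(/\s+/).every((word) => haystack.includes(word)));
}
```

With the track value of this agent, a tweet containing "Genève" or "Genf" matches, while one mentioning only Zurich does not.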

{
  "type_doc"            : "twitter",
  "enable"              : true,
  "type_filter"         : "locations",
  "type_api"            : "stream",
  "name"                : "location-geneva",
  "filter"              : {
    "locations"  : "5.77,45.85,7.15,46.80"
  },
  "stream"              : "filter",
  "consumer_key"        : "...",
  "consumer_secret"     : "...",
  "access_token_key"    : "...",
  "access_token_secret" : "..."
}

This agent captures all tweets posted in the Geneva area (Switzerland).
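The locations value is a bounding box. In the Twitter streaming API each box is four numbers: southwest longitude, southwest latitude, northeast longitude, northeast latitude, and several boxes can be chained. The sketch below parses such a value and tests a point against a box; `parseLocations` and `contains` are hypothetical helpers, not part of twitter-harvest.

```javascript
// Hypothetical parser: every group of four numbers is one bounding box
// (SW longitude, SW latitude, NE longitude, NE latitude).
function parseLocations(locations) {
  const nums = locations.split(',').map((s) => Number(s.trim()));
  if (nums.length % 4 !== 0 || nums.some(Number.isNaN)) {
    throw new Error(`invalid locations value: ${locations}`);
  }
  const boxes = [];
  for (let i = 0; i < nums.length; i += 4) {
    const [swLon, swLat, neLon, neLat] = nums.slice(i, i + 4);
    boxes.push({ swLon, swLat, neLon, neLat });
  }
  return boxes;
}

// Hypothetical point-in-box test (longitude first, like the locations value).
function contains(box, lon, lat) {
  return lon >= box.swLon && lon <= box.neLon && lat >= box.swLat && lat <= box.neLat;
}
```

With "5.77,45.85,7.15,46.80", the city of Geneva (about 6.14 E, 46.20 N) falls inside the box and Zurich (about 8.54 E, 47.37 N) does not. Swapping latitude and longitude is an easy mistake here, since most maps quote latitude first.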

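  • type_doc : 'twitter'
  • enable : if true, this agent is launched
  • type_filter : track | locations | follow
  • type_api : 'stream' (only the streaming API is currently supported)
  • name : name of the agent
  • filter : object holding the filter value, keyed by the type_filter value
  • stream : filter | firehose (if you have access to it)
  • consumer_key, consumer_secret, access_token_key, access_token_secret : personal keys issued by Twitter for using its APIs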

More on the Twitter API request parameters: https://dev.twitter.com/streaming/overview/request-parameters
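Before launching the harvester, it can help to check the agent files. The sketch below is hypothetical and is not the package's own JSON Schema validation: `loadEnabledAgents` reads every *.json file in the agents directory, keeps the enabled agents, and `validateAgent` flags obvious mistakes. It assumes type_filter is one of track, locations or follow (the Twitter filter parameters used in the examples) and that the filter object is keyed by that value.

```javascript
const fs = require('fs');
const path = require('path');

// Allowed values assumed from the agent examples and field list.
const FILTER_TYPES = ['track', 'locations', 'follow'];
const STREAMS = ['filter', 'firehose'];
const KEY_FIELDS = ['consumer_key', 'consumer_secret', 'access_token_key', 'access_token_secret'];

// Hypothetical check for one agent object; returns a list of problems (empty if OK).
function validateAgent(agent) {
  const errors = [];
  if (agent.type_doc !== 'twitter') errors.push('type_doc must be "twitter"');
  if (typeof agent.enable !== 'boolean') errors.push('enable must be true or false');
  if (!STREAMS.includes(agent.stream)) errors.push(`stream must be one of ${STREAMS.join(', ')}`);
  if (!FILTER_TYPES.includes(agent.type_filter)) {
    errors.push(`type_filter must be one of ${FILTER_TYPES.join(', ')}`);
  } else if (!agent.filter || typeof agent.filter[agent.type_filter] !== 'string') {
    // the filter object is keyed by the type_filter value, as in the examples
    errors.push(`filter.${agent.type_filter} must be a string`);
  }
  for (const key of KEY_FIELDS) {
    if (typeof agent[key] !== 'string' || agent[key] === '') errors.push(`${key} is missing`);
  }
  return errors;
}

// Hypothetical loader: one agent per *.json file; keeps enabled agents and
// throws if an enabled agent looks misconfigured.
function loadEnabledAgents(agentsDir) {
  return fs
    .readdirSync(agentsDir)
    .filter((name) => name.endsWith('.json'))
    .sort()
    .map((name) => ({ name, agent: JSON.parse(fs.readFileSync(path.join(agentsDir, name), 'utf8')) }))
    .filter(({ agent }) => agent.enable === true)
    .map(({ name, agent }) => {
      const errors = validateAgent(agent);
      if (errors.length > 0) throw new Error(`${name}: ${errors.join('; ')}`);
      return agent;
    });
}
```

Disabled agents are skipped before validation, so a half-finished agent file with enable set to false does not block the others.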

Private configuration

{
  "mail_service"    : "gmail",
  "mail_auth_user"  : "username",
  "mail_auth_path"  : "password",
  "mail_from"       : "alert_twitter_harvest",
  "mail_to"         : "[email protected]"
}
  • mail_service : name of the mail service
  • mail_auth_user : username for the mail service
  • mail_auth_path : password for the mail service
  • mail_from : sender of the alert mail
  • mail_to : recipient who wants to be alerted

An email is also sent when the system starts; if everything is configured correctly, you should receive it in your mailbox.

Note: the supported mail services are those provided by the nodemailer module (see the list of supported services: https://github.com/andris9/nodemailer-wellknown#supported-services), but only Gmail has been tested. With Gmail, you may have to lower the security level of your mail account (so don't use a personal account) and explicitly authorize the application at https://g.co/allowaccess.
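For reference, the private configuration maps naturally onto nodemailer's options: `createTransport({ service, auth: { user, pass } })` and `sendMail({ from, to, subject, text })`. The two helpers below are a hypothetical sketch of that mapping, not twitter-harvest's internal code; note that the password comes from the `mail_auth_path` key, as in the configuration file.

```javascript
// Hypothetical mapping from cfg-private.json to nodemailer transport options.
function toTransportOptions(privateCfg) {
  return {
    service: privateCfg.mail_service, // a nodemailer well-known service name, e.g. "gmail"
    auth: { user: privateCfg.mail_auth_user, pass: privateCfg.mail_auth_path },
  };
}

// Hypothetical mapping to the message object passed to transporter.sendMail().
function toAlertMail(privateCfg, subject, text) {
  return { from: privateCfg.mail_from, to: privateCfg.mail_to, subject, text };
}

// With nodemailer installed (npm install nodemailer), sending would look like:
//   const nodemailer = require('nodemailer');
//   const transporter = nodemailer.createTransport(toTransportOptions(cfg));
//   transporter.sendMail(toAlertMail(cfg, 'twitter-harvest started', 'Capture is running.'));
```

Keeping the mapping in pure functions makes it easy to check the credentials wiring without actually sending a mail.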

Test

$ gulp

Notes

Currently, three error messages appear when twitter-harvest is launched. They are harmless and can be ignored:

{ [Error: Cannot find module './build/Release/DTraceProviderBindings'] code: 'MODULE_NOT_FOUND' }
{ [Error: Cannot find module './build/default/DTraceProviderBindings'] code: 'MODULE_NOT_FOUND' }
{ [Error: Cannot find module './build/Debug/DTraceProviderBindings'] code: 'MODULE_NOT_FOUND' }

To do

  • add more tests
  • add an option to include extra information (from agents) in the output
  • add other API interfaces (not only the streaming API)

License

MIT © Arnaud Gaudinat

Change log

  • 0.3.4:
    • replace the node twitter lib with Twit (for better error handling)
  • 0.3.3:
    • add the TODO option and directory to allow writing to a DB
    • add 2-digit padding to filenames and the JSON extension
  • 0.3.2:
    • add JSONschema validation