hadoop-streaming-utils

Hadoop streaming utils for NodeJS

A set of functions that let you write Hadoop streaming jobs easily.

Synopsis

// mapper.js (count word example)
var hadoopUtils = require('hadoop-streaming-utils');

// the input is plain text, so read raw lines (iterateJsonLines would JSON.parse them and fail)
hadoopUtils.iterateLines(function(line) {
    var words = line.split(/\s+/);

    words.forEach(function(word) {
        // emitJson (instead of emit) preserves the value's type across serialization
        hadoopUtils.emitJson(word, 1);
    });
});

// reducer.js
var hadoopUtils = require('hadoop-streaming-utils');

hadoopUtils.iterateKeysWithGroupedJsonValues(function(word, counts) {
    var totalCount = 0;
    counts.forEach(function(cnt) {
        // no parseInt needed: the mapper emitted JSON, so counts arrive as numbers
        totalCount += cnt;
    });

    hadoopUtils.emitJson(word, totalCount);
});

// Run (emulate hadoop-streaming behaviour) 
cat file | node mapper.js | sort -k1,1 | node reducer.js

See more examples in the "examples" folder.

Description

This module contains a set of utilities for reading and processing data line by line: the next line is read only after processing of the previous one has finished. This is straightforward when your callback is synchronous; when your callback is asynchronous, you should return a promise from it. Moreover, every iterating function returns a promise that resolves after all lines have been processed.

Functions working with json data

iterateJsonLines

Reads input line by line and applies JSON.parse to each line.

hadoopUtils.iterateJsonLines(function(data) {  });
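
For instance (a minimal sketch; the stdin lines shown are hypothetical), each line is parsed before your callback sees it:

// hypothetical stdin:
//   {"user":"alice","clicks":3}
//   {"user":"bob","clicks":5}
hadoopUtils.iterateJsonLines(function(data) {
    // data is already an object, e.g. { user: 'alice', clicks: 3 }
    hadoopUtils.emitJson(data.user, data.clicks);
});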

iterateKeysWithJsonValues

  1. Reads input line by line.
  2. Extracts the key and value from each line.
  3. Applies JSON.parse to the value.

hadoopUtils.iterateKeysWithJsonValues(function(key, value) { });
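
A minimal sketch, assuming the usual hadoop-streaming convention of a tab between key and value:

// hypothetical stdin (tab-separated key and JSON value):
//   alice<TAB>{"clicks":3}
hadoopUtils.iterateKeysWithJsonValues(function(key, value) {
    // key is the raw string 'alice'; value is the parsed object { clicks: 3 }
    hadoopUtils.emitJson(key, value.clicks);
});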

iterateKeysWithGroupedJsonValues

  1. Reads input line by line.
  2. Extracts the key and value from each line.
  3. Applies JSON.parse to the value.
  4. Groups all values by key.

hadoopUtils.iterateKeysWithGroupedJsonValues(function(key, values) { });
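
This is the reducer-side counterpart; a sketch assuming input already sorted by key, as hadoop-streaming delivers it:

// hypothetical sorted stdin:
//   the<TAB>1
//   the<TAB>1
//   word<TAB>1
hadoopUtils.iterateKeysWithGroupedJsonValues(function(key, values) {
    // invoked once per key: ('the', [1, 1]), then ('word', [1])
    hadoopUtils.emitJson(key, values.length);
});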

emitJson

Serializes data to JSON and emits it.

hadoopUtils.emitJson(key, data);
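
The exact output format is not documented here; presumably it follows the hadoop-streaming convention of key and serialized value separated by a tab. An equivalent sketch (not the module's actual source):

function emitJsonSketch(key, data) {
    // key<TAB>value is the hadoop-streaming default line format
    process.stdout.write(key + '\t' + JSON.stringify(data) + '\n');
}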

Functions working with raw data

iterateLines

Reads and processes input line by line.

hadoopUtils.iterateLines(function(data) {  });

iterateKeysWithValues

  1. Reads input line by line.
  2. Extracts the key and value from each line.

hadoopUtils.iterateKeysWithValues(function(key, value) { });
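
Unlike the JSON variants, raw values arrive as strings, so numeric conversion is up to you. A minimal sketch:

hadoopUtils.iterateKeysWithValues(function(word, count) {
    // count is a string here, e.g. '1'; convert it before doing arithmetic
    hadoopUtils.emit(word, parseInt(count, 10));
});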

iterateKeysWithGroupedValues

  1. Reads input line by line.
  2. Extracts the key and value from each line.
  3. Groups all values by key.

hadoopUtils.iterateKeysWithGroupedValues(function(key, values) { });
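
A word-count reducer written with this raw variant therefore needs parseInt, in contrast to the Synopsis reducer:

hadoopUtils.iterateKeysWithGroupedValues(function(word, counts) {
    var totalCount = 0;
    counts.forEach(function(cnt) {
        totalCount += parseInt(cnt, 10); // raw values are strings
    });
    hadoopUtils.emit(word, totalCount);
});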

emit

Emits a key and value as-is (without JSON serialization).

hadoopUtils.emit(key, value);

incrementCounter

Updates a Hadoop counter.

hadoopUtils.incrementCounter(group, counter, amount);
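
Hadoop streaming picks counter updates up from a task's stderr in the form reporter:counter:<group>,<counter>,<amount>, and presumably this helper emits exactly that. For example (the group and counter names are made up):

// track malformed records in a mapper
hadoopUtils.incrementCounter('my-job', 'malformed-records', 1);
// under hadoop streaming this is reported via stderr as:
//   reporter:counter:my-job,malformed-records,1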

Async operations

When your callback is asynchronous, return a promise from it; the iterating function will then wait until that promise is resolved before reading the next line. Every iterating function also returns a promise that resolves once all lines have been processed.

Usage example

var Promise = require('bluebird');

var streamingUtils = require('hadoop-streaming-utils');

streamingUtils.iterateLines(function(line) {
    return new Promise(function(resolve, reject) {
        asyncSplit(line, function(err, words) {
            if (err) return reject(err); // propagate async errors to .catch
            resolve(words);
        });
    }).then(function(words) {
        words.forEach(function(word) {
            streamingUtils.emitJson(word, 1);
        });
    });
}).then(function() {
    process.exit();
}).catch(console.error);

function asyncSplit(line, callback) {
    var words = line.split(/\s+/);
    setTimeout(function() {
        callback(null, words);
    }, 500);
}

Author

koorchik (Viktor Turskyi)