
worker-map-reduce

MapReduce with Web Workers

This is a MapReduce implementation for the browser. It uses Web Workers to run tasks in parallel.

Features (see Release Notes at the bottom):

  • Spawns a limited number of workers.
  • Mapper and reducer tasks are sent to workers. (REDUCER NOT YET)
  • Sends a new task to a worker as soon as it becomes free. (ALMOST)
  • Fault tolerance: if a worker fails with a runtime exception, its task will be resent. (NOT YET)

WorkerMapReduce uses Task from the data.task module and the immutable List from immutable.
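
For orientation, here is a tiny sketch of those two dependencies (the values are made up):

const Task = require('data.task')
const { List } = require('immutable')

// A Task wraps an async computation; nothing runs until fork is called.
new Task((reject, resolve) => resolve(21))
  .map(n => n * 2)
  .fork(err => console.error(err), res => console.log(res)) // logs 42

// An immutable List behaves like an array but never mutates in place.
const list = List([1, 2, 3])
console.log(list.push(4).toArray()) // [1, 2, 3, 4]; `list` itself is unchanged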

For how to use SharedArrayBuffer to pass data to workers more efficiently, see the demo/array-max/ demo as well as the worker-task-runner examples. Multiple workers can work with the same SharedArrayBuffer as long as they operate on different parts of the buffer.
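
As a rough illustration of that pattern (plain Web Workers, independent of this library's API; the chunk-worker.js file name and message shape are assumptions), the main thread shares one buffer and assigns each worker a disjoint range:

// main.js (sketch): share one buffer, give each worker its own slice.
// Note: modern browsers require cross-origin isolation to use SharedArrayBuffer.
const sab = new SharedArrayBuffer(8 * 1024)   // room for 1024 float64 values
const data = new Float64Array(sab)
for (let i = 0; i < data.length; i++) data[i] = Math.random()

const workerCount = 4
const chunkSize = Math.ceil(data.length / workerCount)

for (let w = 0; w < workerCount; w++) {
  const worker = new Worker('chunk-worker.js')
  // The buffer is shared, not copied; only the assigned range differs per worker.
  worker.postMessage({
    sab,
    start: w * chunkSize,
    end: Math.min((w + 1) * chunkSize, data.length)
  })
  worker.onmessage = e => console.log(`worker ${w} max:`, e.data)
}

// chunk-worker.js (sketch, separate file): reads only its assigned range.
onmessage = (e) => {
  const { sab, start, end } = e.data
  const view = new Float64Array(sab)
  let max = -Infinity
  for (let i = start; i < end; i++) max = Math.max(max, view[i])
  postMessage(max)
}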

Intro

Two models of parallelism

  • Data parallelism <<< current use case with parallelized mappers
    • the same task is executed several times in parallel on different elements of the same dataset.
  • Task parallelism
    • different tasks are executed in parallel.

Use cases

  • Small data, expensive calculation <<< main demo
    • Data can be passed via postMessage.
  • Large data, fast calculation
    • No benefit in sending the data via postMessage.
      • Consider having the spawned worker retrieve the data itself (e.g. from an API).
    • Use SharedArrayBuffer.
  • Large data, slow calculation <<< demo/array-max
    • Use SharedArrayBuffer.
Usage

Main demo

The task is:

  • given an array of strings,
  • run an expensive (simulated) calculation on each string with the provided mapper function,
  • and calculate a sum in the provided reducer.

To see the demo, load demo/index.html in a browser and open the console. You will see something like:

[mapReduceWorker] starting: list.size=5, workerPoolSize=3
...
Exec: took 2.243 seconds.
Result: 73

Example

To try it yourself, you need to create two modules that each export a task: a mapper and a reducer. Then create your app:

// Main MapReduce module:
const mapReduce = require('worker-map-reduce/src/map-reduce')

// Your reducer module (given data, it returns a Task):
const reducer = require('./reducer')

// The module name of your mapper (given data, it returns a Task), which will be
// loaded via a module loader (StealJS) into a worker:
const mapperUrl = 'demo/mapper'

// Data array:
const array = [1, 2, 3]

// Main app:
const app = mapReduce(mapperUrl, reducer, array)

app.fork(err => console.log("Error: ", err),
         res => console.log("Result: ", res))
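
For reference, the two modules might look something like this sketch (the file names and calculation bodies are assumptions; only the shape, a function that returns a Task, comes from the library's description):

// mapper.js (sketch): mapper :: a -> Task b
const Task = require('data.task')

module.exports = function mapper (str) {
  return new Task((reject, resolve) => {
    // Stand-in for the simulated expensive calculation on one string:
    resolve(str.length)
  })
}

// reducer.js (sketch, separate file): reducer :: List a -> Task b
const Task = require('data.task')

module.exports = function reducer (list) {
  return new Task((reject, resolve) => {
    // Sum the values produced by the mappers:
    resolve(list.reduce((sum, n) => sum + n, 0))
  })
}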

Parallelizing a task for MapReduce

The simplest example is applying MapReduce to an array of data so that the mapper is executed once per element. But this is often not the case in practice.

What you can do is:

  • Divide your data into chunks.
  • Run your mapper on every chunk of data in parallel.
  • Run your reducer on the results collected from mappers.

See the following demo: demo/array-max/.
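
A rough sketch of that chunking step (the chunk helper below is not part of this library):

// Split an array into fixed-size chunks (hypothetical helper).
const chunk = (array, size) => {
  const chunks = []
  for (let i = 0; i < array.length; i += size) {
    chunks.push(array.slice(i, i + size))
  }
  return chunks
}

// Each chunk becomes one element of the mapReduce input, so a mapper such as
// "find the max of my chunk" runs once per chunk, and the reducer combines
// the per-chunk results (e.g. takes the max of the maxima).
const data = chunk([7, 3, 9, 1, 12, 5, 8, 2], 3)
// => [[7, 3, 9], [1, 12, 5], [8, 2]]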

Types and Signatures

Note: using Haskell-style type notation.

// Types:
//   Reducer a b = List a -> Task b
//   Mapper a b = a -> Task b

// mapReduce :: MapperUrl -> Reducer a b -> List a -> Task b
// mapReduce :: String -> (List a -> Task b) -> ListData a -> Task b
const mapReduce = require("worker-map-reduce")

// mapper :: a -> Task b
const mapper = require('your-mapper-module')

// reducer :: List a -> Task b
const reducer = require('your-reducer-module')

Release Notes

  • 0.0.2 Worker pool:
    • Schedule tasks for all available workers.
    • Once the whole pool of workers is done, schedule the next batch of tasks.
  • 0.0.1 Initial version:
    • Send mapper tasks to workers; the number of workers is the same as the number of elements in the data array.
    • The reducer task is executed inside the main app.
    • Uses data.task for asynchronous actions.