h2o.js

v0.0.2

Published

2 years ago

Node.js bindings to H2O

Downloads

10

Keywords: machine learning, predictive analytics, predictive modeling, data mining, computational statistics, statistics, statistical learning, clustering, classification, regression, deep learning

H2O.js

This Node.js / io.js module provides access to the H2O JVM (and extensions thereof), its objects, its machine-learning algorithms, and its modeling-support capabilities (basic munging and feature generation).

It is designed to bring H2O to a wider audience of data and machine-learning devotees who work exclusively with JavaScript, whether they are building machine-learning applications or munging data, in a fast, scalable environment without any extra mental anguish about threads and parallelism.

H2O also supports R, Python, Scala and Java.

What is H2O?

H2O is a piece of Java software for data modeling and general computing. There are many different views of the H2O software, but the primary view is that of a distributed (many machines), parallel (many CPUs), in-memory (several hundred GB of JVM heap, i.e. Xmx) processing engine.

There are two levels of parallelism:

within a node
across (or between) nodes

The goal, remember, is to "simply" add more processors to a given problem in order to produce a solution faster. The MapReduce paradigm (also known as "divide and conquer and combine"), along with a good concurrent application structure (cf. jsr166y and NonBlockingHashMap), enables this type of scaling in H2O (we’re really cooking with gas now!).
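To make "divide and conquer and combine" concrete, here is a minimal single-process sketch in plain JavaScript. It is not H2O code and does not use this module; the function names (`splitIntoChunks`, `mapChunk`, `reducePartials`, `mean`) are made up for illustration. It only shows the shape of the idea: each chunk is processed independently, and the partial results are then combined.

```javascript
// Toy illustration of "divide and conquer and combine"; not H2O code.

// Divide: split an array into roughly equal chunks.
function splitIntoChunks(values, chunkCount) {
  const size = Math.ceil(values.length / chunkCount);
  const chunks = [];
  for (let i = 0; i < values.length; i += size) {
    chunks.push(values.slice(i, i + size));
  }
  return chunks;
}

// Map: compute a partial result (sum and count) for one chunk.
function mapChunk(chunk) {
  let sum = 0;
  for (const v of chunk) sum += v;
  return { sum, count: chunk.length };
}

// Reduce: combine two partial results into one.
function reducePartials(a, b) {
  return { sum: a.sum + b.sum, count: a.count + b.count };
}

// Because the chunks are independent, the map step is the part that
// could be spread over more cores or machines in a real system.
function mean(values, chunkCount) {
  const total = splitIntoChunks(values, chunkCount)
    .map(mapChunk)
    .reduce(reducePartials, { sum: 0, count: 0 });
  return total.count === 0 ? NaN : total.sum / total.count;
}

console.log(mean([1, 2, 3, 4, 5, 6, 7, 8], 3)); // 4.5
```

The reduce step works on small partial results rather than on the raw data, which is why adding more workers to the map step can shorten the overall run.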

For application developers and data scientists, the gritty details of thread-safety, algorithm parallelism, and node coherence on a network are concealed by simple-to-use REST calls, all of which are documented. In addition, H2O is an open-source project under the Apache v2 license: all of the source code is on GitHub, there is an active Google Group mailing list, our nightly tests are open for perusal, and our JIRA ticketing system is open for public use. Last but not least, we regularly engage the machine learning community all over the nation with a very busy meetup schedule (so if you’re not in The Valley, no sweat, we’re probably coming to you soon!), we host our very own H2O World conference, and we sometimes host hack-a-thons at our campus in Mountain View, CA. Needless to say, there is a lot of support for the application developer.

In order to make the most out of H2O, there are some key conceptual pieces that are helpful to know before getting started. Mainly, it’s helpful to know about the different types of objects that live in H2O and what the rules of engagement are in the context of the REST API (which is what any non-JVM interface is all about).

Let’s get started!

The H2O Object System

H2O sports a distributed key-value store (the "DKV"), which contains pointers to the various objects that make up the H2O ecosystem. The DKV is a kind of biosphere in that it encapsulates all shared objects (though it may not encapsulate all objects). Some shared objects are mutable by the client; some are read-only for the client but mutable by H2O (e.g. a model being constructed will change over time); and actions by one client may have side effects on other clients (multi-tenancy is not a supported model of use, but it is possible for multiple clients to attach to a single H2O cloud).
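The access rules above can be sketched with a toy, in-process store. This is only an illustration under stated assumptions: `ToyStore`, `systemPut`, and `clientPut` are hypothetical names, not part of H2O or of this module, and the real DKV is distributed across JVMs rather than held in a single `Map`.

```javascript
// Toy in-process stand-in for a key-value store of shared objects.
// Hypothetical names throughout; this only illustrates the access rules.
class ToyStore {
  constructor() {
    this.entries = new Map(); // key -> { value, clientWritable }
  }

  // The "system" side may always write, e.g. to update a model in training.
  systemPut(key, value, clientWritable) {
    this.entries.set(key, { value, clientWritable });
  }

  // A client may only overwrite entries that are marked client-writable.
  clientPut(key, value) {
    const entry = this.entries.get(key);
    if (entry && !entry.clientWritable) {
      throw new Error(`"${key}" is read-only for clients`);
    }
    this.entries.set(key, { value, clientWritable: true });
  }

  // Every client reads from the same store, so writes are visible to all.
  get(key) {
    const entry = this.entries.get(key);
    return entry ? entry.value : undefined;
  }
}

const store = new ToyStore();
store.systemPut("model_1", { progress: 0.2 }, false);
store.systemPut("model_1", { progress: 0.9 }, false); // changes over time
store.clientPut("frame_1", { rows: 100 });            // client-owned object
```

Since all clients share one store, a `clientPut` by one client changes what every other client sees under that key, which is the side effect the paragraph above warns about.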

Briefly, these objects are:

Key: A key is an entry in the DKV that maps to an object in H2O.
Frame: A Frame is a collection of Vec objects. It is a 2D array of elements.
Vec: A Vec is a collection of Chunk objects. It is a 1D array of elements.
Chunk: A Chunk holds a fraction of the BigData. It is a 1D array of elements.
ModelMetrics: A collection of metrics for a given category of model.
Model: A model is an immutable object having predict and metrics methods.
Job: A Job is a non-blocking task that performs a finite amount of work.
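As a rough mental model of how Chunk, Vec, and Frame nest, here is a single-process sketch in JavaScript. The class and method names are illustrative assumptions and are not the API of this module; in H2O itself the chunks of a Vec live on different nodes of the cluster.

```javascript
// Toy, single-process model of the Chunk -> Vec -> Frame containment.
// Class and method names are illustrative, not the H2O.js API.

class Chunk {
  constructor(values) {
    this.values = values; // a contiguous slice of one column
  }
  get length() {
    return this.values.length;
  }
}

class Vec {
  constructor(chunks) {
    this.chunks = chunks; // one column, stored as an ordered list of chunks
  }
  get length() {
    return this.chunks.reduce((n, chunk) => n + chunk.length, 0);
  }
  // Find element i by walking the chunks in order.
  at(i) {
    for (const chunk of this.chunks) {
      if (i < chunk.length) return chunk.values[i];
      i -= chunk.length;
    }
    return undefined;
  }
}

class Frame {
  constructor(columns) {
    this.columns = columns; // { columnName: Vec }
  }
  get(row, name) {
    return this.columns[name].at(row);
  }
  // [rows, columns]; assumes every Vec has the same length.
  get shape() {
    const names = Object.keys(this.columns);
    const rows = names.length ? this.columns[names[0]].length : 0;
    return [rows, names.length];
  }
}

const age = new Vec([new Chunk([31, 45]), new Chunk([27])]);
const height = new Vec([new Chunk([170, 182]), new Chunk([165])]);
const frame = new Frame({ age, height });

console.log(frame.shape);         // [ 3, 2 ]
console.log(frame.get(2, "age")); // 27
```

Row 2 of `age` lives in the second chunk, so the lookup crosses a chunk boundary; in a distributed setting that boundary may also be a machine boundary.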

Many of these objects have no meaning to a JavaScript end user, but in order to make sense of the objects available in this module it is helpful to understand how they map to objects in the JVM (because, after all, this module is an interface for manipulating a distributed system).

(To be continued...)