npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

fast-kde

v0.2.2

Published

Fast Gaussian kernel density estimation in 1D or 2D.

Downloads

2,122

Readme

Fast Gaussian Kernel Density Estimation

Fast Gaussian kernel density estimation in 1D or 2D.

This package provides accurate, linear-time O(N + K) estimation using Deriche's approximation and is based on the IEEE VIS 2021 Short Paper Fast & Accurate Gaussian Kernel Density Estimation. (For the benchmarks in that paper, see github.com/uwdata/fast-kde-benchmarks.)

Citation

If you use or build on this package in academic work, please use this citation:

@inproceedings{2021-Heer-FastKDE,
 title = {Fast \& Accurate Gaussian Kernel Density Estimation},
 author = {Jeffrey Heer},
 booktitle = {IEEE VIS Short Papers},
 year = {2021},
 url = {http://idl.cs.washington.edu/papers/fast-kde},
}

Usage Examples

For interactive examples, see our Fast KDE Observable notebook.

1D Density Estimation

import { density1d } from 'fast-kde';

const data = [
  {u: 1, v: 1}, {u: 1, v: 2}, {u: 5, v: 4},
  {u: 5, v: 3}, {u: 6, v: 2}, {u: 8, v: 7}
];

// 1d density estimation, with automatic bandwidth and extent
// resulting estimator d1 is an object and also an iterable
let d1 = density1d([1, 1, 5, 5, 6, 8]);

// 1d density estimation, with accessor function
d1 = density1d(data, { x: d => d.u });

// 1d density estimation, with property key accessor
d1 = density1d(data, { x: 'u' });

// 1d density estimation, with given bandwidth and extent
d1 = density1d(data, { x: 'u', bandwidth: 1, extent: [0, 10] });

// efficiently update bandwidth on estimator (extent remains unchanged)
d1.bandwidth(0.5)

// generate array of {x, y} sample points
let p1 = Array.from(d1);

// generate array of {a, b} sample points (instead of {x, y})
p1 = [...d1.points('a', 'b')]

// retrieve internal sample grid array of density estimates
// these values represent the total probability mass in each bin
// they are not (yet) scaled to probability density function estimates
let g1 = d1.grid();

2D Density Estimation

import { density2d } from 'fast-kde';
import { interpolatePiYG } from 'd3-scale-chromatic';

const data = [
  {u: 1, v: 1}, {u: 1, v: 2}, {u: 5, v: 4},
  {u: 5, v: 3}, {u: 6, v: 2}, {u: 8, v: 7}
];

// 2d density estimation, with automatic bandwidth and extent
let d2 = density2d(data, { x: 'u', y: 'v' });

// 2d density estimation, with bandwidth and extent shared across (x, y)
d2 = density2d(data, { x: 'u', y: 'v', bandwidth: 1, extent: [0, 10] });

// 2d density estimation, with bandwidth and extent that differ across (x, y)
d2 = density2d(data, { x: 'u', y: 'v', bandwidth: [1, 0.5], extent: [[0, 10], [1, 9]] });

// 2d density estimation, with customized x- and y-bin counts
d2 = density2d(data, { x: 'u', y: 'v', bins: [256, 256] });

// generate array of {x, y, z} sample points
let p2 = [...d2];

// generate array of {a, b, v} sample points (instead of {x, y, z})
p2 = [...d2.points('a', 'b', 'v')]

// HTML canvas element with a bins[0] x bins[1] heatmap image
let h2 = d2.heatmap();

// HTML canvas heatmap with custom interpolator from d3-scale-chromatic
h2 = d2.heatmap({ color: interpolatePiYG })

// retrieve internal sample grid array of density estimates
// these values represent the total probability mass in each bin
// they are not (yet) scaled to probability density function estimates
let g2 = d2.grid();

Build Instructions

All code is written as ESM modules, and uses the "type": "module" Node.js setting. To build a bundle (ESM module or minified UMD):

  1. Run yarn to install dependencies.
  2. Run yarn build to build the bundles.

Compiled bundles will be written to the dist directory.

API Documentation

1D Density Estimation

# kde.density1d(data[, options])

Creates a new 1D density estimator for the input data. Returns an estimator object that includes the methods listed below, and also provides an iterator over resulting density points.

  • data: An array of input data values for which to perform density estimation. The array values may be numbers or objects.
  • options: Options for configuring density estimation.
    • x: An accessor function for input values. By default this is the identity function, corresponding to data as an array of numbers. If the x option is not function-valued, it will be treated as a key to look up on entries of the input data.
    • weight: An accessor function for weights. By default all input points are given the same weight. The weight values should sum to one for the output to be a proper probability density estimate. If the weight option is not function-valued, it will be treated as a key to look up on entries of the input data.
    • bandwidth: The kernel bandwidth (standard deviation) to use. If unspecified, the bandwidth is automatically calculated using the nrd heuristic and the adjust option.
    • adjust: A fractional value by which to scale (adjust) an automatically calculated bandwidth. For example, an adjust value of 0.5 will result in half the automatically-determined bandwidth. This option is ignored if the bandwidth option is specified.
    • extent: The extent over which to compute kernel density estimation as a two-element array. Note that input data values outside the extent are ignored, potentially resulting in inaccurate densities relative to the full data. If unspecified, the extent is automatically calculated based on the input data extent and the pad option.
    • pad: The amount (in kernel bandwidths) by which to extend an automatically-calculated extent. The default value is 3, capturing 99% of the density from the most extreme points. Set this value to 0 to trim the density estimate to the minimum and maximum observed data points. This option is ignored if the extent option is provided.
    • bins: The number of bins to use for the internal grid. The default is 512 bins. The returned density estimate will include a total of bins equally-spaced sample points over the extent.

Example

// perform 1D estimation with bandwidth = 1 over domain [0, 10]
// returns an iterator over [ { x, y }, ... ] points
kde.density1d([1, 2, 5, 5, 6, 9], { bandwidth: 1, extent: [0, 10] })

# density1d.grid()

Returns the internal grid array of total accumulated density values per bin. To instead produce an array of objects containing coordinate values and probability density function estimates, use density1d.points().

# density1d.points([x, y])

Returns an iterator over objects containing a sample point (x) and density value (y).

  • x: The property name for the sample point (default "x").
  • y: The property name for the estimated density value (default "y").

# density1d.bandwidth([bandwidth])

Get or set the bandwidth (standard deviation) of the Gaussian kernel. Setting the bandwidth will update the estimator efficiently without re-performing binning. The extent will remain unchanged, even if previously determined automatically.

# density1d.extent()

Get the calculated extent of density estimation as a [min, max] extent array. This method does not support setting the extent to a new value, as this requires re-binning the input data.

2D Density Estimation

# kde.density2d(data[, options])

Creates a new 2D density estimator for the input data. Returns an estimator object that includes the methods listed below, and also provides an iterator over resulting density points.

  • data: An array of input data values for which to perform density estimation.
  • options: Options for configuring density estimation.
    • x: An accessor function for x-dimension input values. The default accessor retrieves index 0. If the x option is not function-valued, it will be treated as a key to look up on entries of the input data.
    • y: An accessor function for y-dimension input values. The default accessor retrieves index 1. If the y option is not function-valued, it will be treated as a key to look up on entries of the input data.
    • weight: An accessor function for weights. By default all input points are given the same weight. The weight values should sum to one for the output to be a proper probability density estimate. If the weight option is not function-valued, it will be treated as a key to look up on entries of the input data.
    • bandwidth: The kernel bandwidths (standard deviations) to use. If array-valued, specifies the x- and y-bandwidths separately. If number-valued, sets both the x- and y-bandwidths to the same value. If unspecified, the bandwidths are automatically calculated per-dimension using the nrd heuristic and the adjust option.
    • adjust: A fractional value by which to scale (adjust) an automatically calculated bandwidth. For example, an adjust value of 0.5 will result in half the automatically-determined bandwidth. This option is ignored if the bandwidth option is specified.
    • extent: The extent over which to compute kernel density estimation along both the x- and y-dimensions. If an array of arrays is provided, specifies the x- and y-extents separately. If a single two-number array is provided, sets both x- and y-extents to the same value. Note that input data values outside the extent are ignored, potentially resulting in inaccurate densities relative to the full data. If unspecified, the extent is automatically calculated based on the input data extent and the pad option.
    • pad: The amount (in kernel bandwidths) by which to extend an automatically-calculated extent. The default value is 3, capturing 99% of the density from the most extreme points. Set this value to 0 to trim the density estimate to the minimum and maximum observed data points. This option is ignored if the extent option is provided.
    • bins: The number of bins to use for the internal grid. The default is [256, 256] bins. If array-valued, specifies the x- and y-bins separately. If number-valued, sets both x- and y-bins to the same value. The returned density estimate will include a total of bins[0] * bins[1] equally-spaced sample points over the extent.

Example

// perform 2D estimation with bandwidths [1, 1] over extent [[0, 10], [0, 10]]
// use default grid size ([256, 256])
// returns an iterator over [ { x, y, z }, ... ] points
const data = [[1, 1], [1, 2], [5, 4], [5, 3], [6, 2], [8, 7]];
kde.density2d(data, { bandwidth: 1, extent: [0, 10] })
// perform 2D estimation with different bandwidths and extent for x and y
// returns an iterator over [ { x, y, z }, ... ] points
const data = [[1, 1], [1, 2], [5, 4], [5, 3], [6, 2], [8, 7]];
kde.density2d(data, { bandwidth: [1, 0.5], extent: [[1, 9], [1, 8]] })

# density2d.grid()

Returns the internal grid array of total accumulated density values per bin. To instead produce an array of objects containing coordinate values and probability density function estimates, use density2d.points().

# density2d.points([x, y, z])

Returns an iterator over objects containing sample points (x, y) and density value (z).

  • x: The property name for the x-dimension sample point (default "x").
  • y: The property name for the y-dimension sample point (default "y").
  • z: The property name for the estimated density value (default "z").

# density2d.bandwidth([bandwidth])

Get or set the bandwidths (standard deviations) of the Gaussian kernel. If array-valued, specifies the x- and y-bandwidths separately. If number-valued, sets both x- and y-bandwidths to the same value. Setting the bandwidth will update the estimator efficiently without re-performing binning. The extent will remain unchanged, even if previously determined automatically.

# density2d.extent()

Get the calculated extent of density estimation. Returns an array containing the x- and y-dimension extents: [[xmin, xmax], [ymin, ymax]]. This method does not support setting the extent to a new value, as this requires re-binning the input data.

# density2d.heatmap([options])

Generate a heatmap image of the 2D density. Returns an HTML canvas element.

  • options: Options for heatmap image generation.
    • color: A color function that maps density values (normalized to the domain [0, 1]) to either valid CSS color strings or to RGB color objects with r, g, b properties (in the range 0-255) and an optional opacity property (a fractional value between 0 and 1). If CSS color strings are used, the d3-color library must also be loaded.
    • clamp: Sets the range of density values to a given [min, max] array. Values below the minimum or above the maximum will be clamped to the provided values. Values within the clamped range are then normalized to the domain [0, 1].
    • canvas: A canvas element to draw into. If unspecified, a new canvas instance is created with dimensions matching the density estimator bins option.
    • maxColors: The maximum number of colors (default 256) to use for heatmap generation. This number determines the size of the backing pre-computed palette, and thus the number of times the color function is invoked.

Utility Methods

# kde.nrd(data, accessor)

Calculates a suggested bandwidth for a set of numeric data values, using Scott's normal reference distribution (NRD) heuristic.

# kde.opacityMap(r, g, b)

Returns a color map function (compatible with the heatmap color option) that ramps the opacity for a fixed set of r, g, b values in the range 0-255.