stoch
Probabilistic programming in JavaScript powered by TensorFlow.js, inspired by TensorFlow Probability and Stan.
40 distributions, 16 bijectors, MCMC (HMC + NUTS), variational inference, Gaussian processes, and convergence diagnostics — browser and Node.js, with GPU acceleration.
Install
npm install stoch @tensorflow/tfjs

| Backend | Package | Best for |
|---|---|---|
| CPU (JS) | @tensorflow/tfjs | Browser, quick prototyping |
| CPU (native) | @tensorflow/tfjs-node | Node.js production |
| GPU (CUDA) | @tensorflow/tfjs-node-gpu | Large models, GPU inference |
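To use a native backend, follow the standard TensorFlow.js pattern and import that backend package in place of @tensorflow/tfjs (a sketch; stoch itself does not need to change):

npm install stoch @tensorflow/tfjs-node

import * as tf from '@tensorflow/tfjs-node' // or '@tensorflow/tfjs-node-gpu'
import stoch from 'stoch'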
Usage
import * as tf from '@tensorflow/tfjs'
import stoch from 'stoch'

All parameters accept scalars, arrays, or tensors. Arrays/tensors create batched distributions that vectorize all operations.
Module overview
| Module | Contents |
|---|---|
| stoch.distributions | 40 probability distributions + KL divergence |
| stoch.bijectors | 16 differentiable invertible transforms |
| stoch.mcmc | HMC, NUTS, Random Walk Metropolis, diagnostics |
| stoch.vi | Variational inference (ELBO, mean-field) |
| stoch.math | Special functions, constants, differentiable linear algebra |
| stoch.stats | HDI, MCSE, ArviZ-style summary |
| stoch.gp | Gaussian processes and kernels |

Runtime argument validation can be toggled globally:

stoch.setValidateArgs(false) // disable runtime argument validation (faster)
stoch.getValidateArgs() // check current setting (default: true)

Distributions
All distributions extend a common base class:
const dist = new stoch.distributions.Normal({ loc: 0, scale: 1 })
dist.sample([1000]) // shape [1000]
dist.logProb(0.5) // scalar tensor
dist.prob(0.5) // exp(logProb(x))
dist.cdf(0.5) // cumulative distribution function
dist.logCdf(0.5) // log CDF (numerically stable)
dist.mean() // distribution mean
dist.variance() // distribution variance
dist.stddev() // sqrt(variance())
dist.entropy() // Shannon entropy
dist.mode() // mode (where implemented)
dist.dispose() // free parameter tensors

Batching:
const dists = new stoch.distributions.Normal({ loc: [0, 1, 2], scale: 1 })
dists.sample([100]) // shape [100, 3]
dists.logProb(0.5) // shape [3]

Continuous
| Distribution | Constructor |
|---|---|
| Normal | { loc, scale } |
| LogNormal | { loc, scale } |
| StudentT | { df, loc, scale } |
| Uniform | { low, high } |
| Beta | { concentration1, concentration0 } |
| Gamma | { concentration, rate } |
| Exponential | { rate } |
| InverseGamma | { concentration, scale } |
| Chi2 | { df } |
| Cauchy | { loc, scale } |
| Laplace | { loc, scale } |
| Logistic | { loc, scale } |
| Gumbel | { loc, scale } |
| HalfNormal | { scale } |
| HalfCauchy | { scale } |
| Pareto | { concentration, scale } |
| Weibull | { concentration, scale } |
| VonMises | { loc, concentration } |
| TruncatedNormal | { loc, scale, low, high } |
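Parameter names follow TensorFlow Probability conventions. For example, using two constructors from the table above:

const gamma = new stoch.distributions.Gamma({ concentration: 2, rate: 1 })
const tnorm = new stoch.distributions.TruncatedNormal({ loc: 0, scale: 1, low: -2, high: 2 })
gamma.mean() // concentration / rate = 2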
Discrete
| Distribution | Constructor |
|---|---|
| Bernoulli | { probs } or { logits } |
| Categorical | { probs } or { logits } |
| Binomial | { totalCount, probs } or { totalCount, logits } |
| Poisson | { rate } |
| Geometric | { probs } or { logits } |
| NegativeBinomial | { totalCount, probs } or { totalCount, logits } |
| Multinomial | { totalCount, probs } or { totalCount, logits } |
| OneHotCategorical | { probs } or { logits } |
| ZeroInflatedPoisson | { rate, gate } |
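Discrete distributions take either probs or logits, as the table shows. A minimal sketch:

const coin = new stoch.distributions.Bernoulli({ probs: 0.7 })
const cat = new stoch.distributions.Categorical({ logits: [0.1, 0.5, 2.0] })
cat.sample([5]) // shape [5], integer class indices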
Relaxed (differentiable approximations)
| Distribution | Constructor |
|---|---|
| RelaxedBernoulli | { temperature, probs } or { temperature, logits } |
| RelaxedOneHotCategorical | { temperature, probs } or { temperature, logits } |
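These are continuous relaxations (Concrete / Gumbel-Softmax distributions): lower temperatures approach the discrete distribution while keeping samples differentiable. A sketch:

const relaxed = new stoch.distributions.RelaxedOneHotCategorical({
  temperature: 0.5,
  logits: [1, 0, -1]
})
relaxed.sample() // point on the probability simplex, differentiable w.r.t. logits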
Multivariate
| Distribution | Constructor |
|---|---|
| MultivariateNormalDiag | { loc, scaleDiag } |
| MultivariateNormalTriL | { loc, scaleTril } |
| Dirichlet | { concentration } |
| Wishart | { df, scaleTril } |
| LKJCholesky | { dimension, concentration } |
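For example, a full-covariance Gaussian parameterized by its lower-triangular Cholesky factor:

const mvn = new stoch.distributions.MultivariateNormalTriL({
  loc: [0, 0],
  scaleTril: [[1, 0], [0.5, 1]]
})
mvn.sample([100]) // shape [100, 2]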
Compound
| Distribution | Constructor |
|---|---|
| Independent | { distribution, reinterpretedBatchNdims } |
| MixtureSameFamily | { mixtureDist, componentDist } |
| TransformedDistribution | { distribution, bijector } |
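A sketch of a two-component Gaussian mixture, assuming the TensorFlow Probability convention that mixture components live in the component distribution's batch dimension:

const mixture = new stoch.distributions.MixtureSameFamily({
  mixtureDist: new stoch.distributions.Categorical({ probs: [0.3, 0.7] }),
  componentDist: new stoch.distributions.Normal({ loc: [-2, 2], scale: [1, 0.5] })
})
mixture.sample([1000]) // shape [1000]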
KL divergence
const p = new stoch.distributions.Normal({ loc: 0, scale: 1 })
const q = new stoch.distributions.Normal({ loc: 1, scale: 2 })
const kl = stoch.distributions.klDivergence(p, q) // KL(p || q)

Built-in same-family pairs: Normal, Bernoulli, Gamma, Beta, Exponential, Dirichlet, Categorical, Laplace.
Register custom:
stoch.distributions.registerKL(DistP, DistQ, (p, q) => { /* return tf.Tensor */ })

Joint models
Named model with explicit deps (safe under minification):
const model = new stoch.distributions.JointDistributionNamed({
mu: { deps: [], fn: () => new stoch.distributions.Normal({ loc: 0, scale: 10 }) },
sigma: { deps: [], fn: () => new stoch.distributions.LogNormal({ loc: 0, scale: 1 }) },
y: { deps: ['mu', 'sigma'], fn: ({ mu, sigma }) =>
new stoch.distributions.Normal({ loc: mu, scale: sigma }) }
})
model.sample() // { mu: Tensor, sigma: Tensor, y: Tensor }
model.sample([100]) // 100 joint draws
model.logProb(values) // scalar — joint log probability
model.logProbParts(values) // per-component log probabilities
model.variableNames // ['mu', 'sigma', 'y'] (topological order)

Shorthand (arg-name parsing, breaks under minification):
const model = new stoch.distributions.JointDistributionNamed({
mu: () => new stoch.distributions.Normal({ loc: 0, scale: 10 }),
y: ({ mu }) => new stoch.distributions.Normal({ loc: mu, scale: 1 })
})

Sequential model (positional deps, most recent first):
const model = new stoch.distributions.JointDistributionSequential([
() => new stoch.distributions.Normal({ loc: 0, scale: 1 }),
(x0) => new stoch.distributions.Normal({ loc: x0, scale: 0.1 })
])
model.sample() // [Tensor, Tensor]
model.logProb([x0, x1]) // scalar

Bijectors
Differentiable invertible transforms for constrained-parameter inference and building transformed distributions.
const bij = new stoch.bijectors.Exp()
bij.forward(tf.scalar(-1)) // exp(-1) ≈ 0.368
bij.inverse(tf.scalar(2)) // log(2) ≈ 0.693
bij.forwardLogDetJacobian(tf.scalar(0)) // log|det(df/dx)|
bij.inverseLogDetJacobian(tf.scalar(2)) // log|det(df⁻¹/dy)|

Available bijectors
| Bijector | Transform | Use case |
|---|---|---|
| Identity | x | No-op |
| Exp | exp(x) | R → R+ |
| Log | log(x) | R+ → R |
| Softplus | log(1 + exp(x)) | Smooth R → R+ |
| Sigmoid | sigmoid(x) | R → (0, 1) |
| Tanh | tanh(x) | R → (-1, 1) |
| Shift({ shift }) | x + shift | Location shift |
| Scale({ scale }) | x × scale | Scaling |
| AffineScalar({ shift, scale }) | shift + scale × x | Affine transform |
| Power({ power }) | x^power | Power transform |
| Invert({ bijector }) | Swaps forward/inverse | Reverse any bijector |
| Chain({ bijectors }) | Compose right-to-left | Build pipelines |
| Ascending | R^d → sorted R^d | Ordered constraints |
| SoftmaxCentered | R^(d-1) → simplex(d) | Probability simplex |
| FillTriangular | R^(n(n+1)/2) → lower triangular | Matrix structure |
| CorrelationCholesky | R^(d(d-1)/2) → correlation Cholesky | Correlation matrices |
Composed transforms
// LogNormal = Normal + Exp
const logNormal = new stoch.distributions.TransformedDistribution({
distribution: new stoch.distributions.Normal({ loc: 0, scale: 1 }),
bijector: new stoch.bijectors.Exp()
})
// Compose multiple bijectors (applied right-to-left)
const chain = new stoch.bijectors.Chain({
bijectors: [new stoch.bijectors.Exp(), new stoch.bijectors.Scale({ scale: 2 })]
})
// chain.forward(x) = exp(2 * x)

MCMC
High-level API — stoch.mcmc.sample()
Auto-configures NUTS with step-size adaptation:
const { samples, diagnostics } = stoch.mcmc.sample({
targetLogProbFn: (x) => tf.mul(-0.5, tf.square(x)),
initialState: tf.scalar(0),
numResults: 1000,
numBurninSteps: 500,
stepSize: 0.1
})

| Parameter | Type | Default | Description |
|---|---|---|---|
| targetLogProbFn | Function | required | (state) => tf.Tensor scalar log-density |
| initialState | Tensor/Object | required | Starting point. Object for multi-parameter models |
| numResults | number | 1000 | Samples to collect per chain |
| numBurninSteps | number | 500 | Warmup steps (discarded) |
| numChains | number | 1 | Independent chains (>=2 enables R-hat) |
| stepSize | number | 0.1 | Initial leapfrog step size |
| kernel | string | 'nuts' | 'nuts' or 'hmc' |
| maxTreeDepth | number | 10 | NUTS max tree depth |
| numLeapfrogSteps | number | 10 | HMC leapfrog steps (ignored for NUTS) |
| bijectors | Object | — | { paramName: Bijector } for constrained params |
| numAdaptationSteps | number | numBurninSteps | Step-size adaptation steps |
| targetAcceptProb | number | 0.8 | Target acceptance rate |
| numStepsBetweenResults | number | 0 | Thinning interval |
| traceFn | Function | — | (state, kernelResults) => any |
Returns { samples, diagnostics, trace }. Diagnostics include ess, rhat, numDivergent, numMaxDepth, meanLeapfrogs.
Multi-parameter with constraints:
const { samples, diagnostics } = stoch.mcmc.sample({
targetLogProbFn: ({ mu, logSigma }) => {
const sigma = tf.exp(logSigma)
return tf.add(
tf.mul(-0.5, tf.square(tf.div(mu, sigma))),
tf.neg(logSigma)
)
},
initialState: { mu: tf.scalar(0), logSigma: tf.scalar(0) },
numResults: 1000,
numBurninSteps: 500,
numChains: 2,
stepSize: 0.1,
targetAcceptProb: 0.8
})

Low-level API
Full control over kernel composition:
const kernel = new stoch.mcmc.DualAveragingStepSizeAdaptation({
innerKernel: new stoch.mcmc.TransformedTransitionKernel({
innerKernel: new stoch.mcmc.NoUTurnSampler({
targetLogProbFn: targetLogProb,
stepSize: 0.1,
maxTreeDepth: 10
}),
bijectors: { sigma: new stoch.bijectors.Exp() }
}),
numAdaptationSteps: 400,
targetAcceptProb: 0.75
})
const { samples, trace } = stoch.mcmc.sampleChain({
numResults: 1000,
numBurninSteps: 500,
currentState: { mu: tf.scalar(0), sigma: tf.scalar(1) },
kernel,
numStepsBetweenResults: 0,
traceFn: (state, kr) => ({ accepted: kr.isAccepted.dataSync()[0] })
})

Kernels
| Kernel | Constructor |
|---|---|
| NoUTurnSampler | { targetLogProbFn, stepSize, maxTreeDepth, maxEnergyDiff } |
| HamiltonianMonteCarlo | { targetLogProbFn, stepSize, numLeapfrogSteps } |
| RandomWalkMetropolis | { targetLogProbFn, newStateProposalFn, proposalScale } |
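RandomWalkMetropolis needs no gradients, which makes it useful for non-differentiable targets. A minimal sketch with sampleChain, using constructor fields from the table above:

const rwm = new stoch.mcmc.RandomWalkMetropolis({
  targetLogProbFn: (x) => tf.mul(-0.5, tf.square(x)),
  proposalScale: 0.5
})
const { samples } = stoch.mcmc.sampleChain({
  numResults: 1000,
  numBurninSteps: 500,
  currentState: tf.scalar(0),
  kernel: rwm
})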
Wrappers
| Wrapper | Constructor |
|---|---|
| TransformedTransitionKernel | { innerKernel, bijectors } |
| DualAveragingStepSizeAdaptation | { innerKernel, numAdaptationSteps, targetAcceptProb } |
Diagnostics
Operate on plain JS arrays (use tensor.dataSync()):
const ess = stoch.mcmc.effectiveSampleSize(chain.dataSync()) // Geyer 1992
const rhat = stoch.mcmc.potentialScaleReduction([chain1, chain2]) // Gelman-Rubin (>=2 chains)

Predictive checks
// Posterior predictive: one prediction per posterior draw
const yPred = stoch.mcmc.posteriorPredictive({
samples: posteriorSamples, // stacked tensor [n, ...] or { param: tensor }
predictFn: ({ slope, intercept }) => tf.add(tf.mul(slope, xNew), intercept),
numSamples: 200 // optional, defaults to all
})
// Prior predictive
const yPrior = stoch.mcmc.priorPredictive({
priorFn: () => ({ slope: tf.randomNormal([]), intercept: tf.randomNormal([]) }),
predictFn: ({ slope, intercept }) => tf.add(tf.mul(slope, xNew), intercept),
numSamples: 100 // default: 100
})

Variational inference
trainableNormal({ loc, scale, name })
Normal distribution with tf.variable() parameters optimized via gradient descent. Scale is parameterized internally via softplus to stay positive.
const q = stoch.vi.trainableNormal({ loc: 0, scale: 1 })
q.sample() // reparameterized: μ + σ * ε
q.sample([10]) // shape [10]
q.logProb(value) // log N(value; μ, σ)
q.getParameters() // { loc: number, scale: number }
q.trainableVariables // [locVar, unconstrainedScaleVar]
q.dispose()

buildMeanFieldPosterior(initialState, { initialScale })
One independent trainableNormal per parameter:
const q = stoch.vi.buildMeanFieldPosterior(
{ mu: 0, sigma: 1 },
{ initialScale: 1.0 }
)
q.sample() // { mu: Tensor, sigma: Tensor }
q.logProb(values) // scalar — sum of independent log-probs
q.getParameters() // { mu: { loc, scale }, sigma: { loc, scale } }
q.trainableVariables // all tf.variables
q.dispose()

computeElbo({ targetLogProbFn, surrogatePosterior, numSamples })
ELBO = E_q[ log p(z) - log q(z) ]. Returns scalar tensor (higher is better).
const elbo = stoch.vi.computeElbo({
targetLogProbFn: (z) => tf.mul(-0.5, tf.square(z)),
surrogatePosterior: q,
numSamples: 10 // default: 1
})

fitSurrogatePosterior({ ... })
Optimization loop minimizing -ELBO:
const { surrogatePosterior, losses } = stoch.vi.fitSurrogatePosterior({
targetLogProbFn: (z) => tf.mul(-0.5, tf.square(z)),
surrogatePosterior: q,
optimizer: tf.train.adam(0.01),
numSteps: 1000,
numElboSamples: 1, // default: 1
convergenceFn: (step, loss) => loss < 0.01, // optional early stop
traceLogProbFn: (step, loss) => { ... } // optional logging
})
// losses: number[] — loss at each step

Stats
Summary statistics for MCMC output. All functions operate on plain JS arrays (use tensor.dataSync()).
const [low, high] = stoch.stats.hdi(samples, 0.94) // Highest Density Interval
const se = stoch.stats.mcse(samples) // Monte Carlo Standard Error
const result = stoch.stats.summary({
mu: [chain1_mu, chain2_mu], // multiple chains → computes R-hat
sigma: chain1_sigma // single chain → R-hat = NaN
}, { hdiProb: 0.94 })
// result.mu = { mean, sd, hdiLow, hdiHigh, ess, rhat, mcse }

Gaussian processes
Kernels
All kernels implement matrix(x1, x2) → kernel matrix [n, m].
| Kernel | Constructor |
|---|---|
| SquaredExponential | { amplitude, lengthScale } |
| Matern | { nu, amplitude, lengthScale } — nu: 0.5, 1.5, or 2.5 |
| Linear | { variance, bias } |
| Periodic | { amplitude, lengthScale, period } |
| White | { variance } |
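For example, evaluating the [n, m] cross-covariance between two sets of 1-D index points:

const k = new stoch.gp.SquaredExponential({ amplitude: 1, lengthScale: 1 })
const K = k.matrix(tf.tensor2d([[0], [1]]), tf.tensor2d([[0], [0.5], [1]])) // shape [2, 3]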
Combinators: Add(k1, k2), Product(k1, k2), Scale(kernel, scale).
const kernel = new stoch.gp.Add(
new stoch.gp.SquaredExponential({ lengthScale: 1 }),
new stoch.gp.White({ variance: 0.1 })
)

GaussianProcess({ kernel, meanFn, observationNoiseVariance })
GP prior over functions:
const gpPrior = new stoch.gp.GaussianProcess({
kernel: new stoch.gp.SquaredExponential({ lengthScale: 1 }),
meanFn: (x) => tf.zeros([x.shape[0]]), // optional, default: zero
observationNoiseVariance: 0.01 // optional, default: 0
})
const x = tf.tensor2d([[0], [1], [2], [3], [4]])
gpPrior.sample(x, [5]) // 5 function draws, shape [5, 5]
gpPrior.logProb(x, observations) // marginal log-likelihood
gpPrior.posterior(x, observations) // { mean, covariance }

GaussianProcessRegressionModel({ ... })
GP conditioned on observed data:
const gprm = new stoch.gp.GaussianProcessRegressionModel({
kernel: new stoch.gp.SquaredExponential({ amplitude: 1, lengthScale: 0.5 }),
indexPoints: xTrain, // [n, d] training inputs
observations: yTrain, // [n] training targets
observationNoiseVariance: 0.01, // optional, default: 1e-6
predictiveNoiseVariance: 0, // optional, adds noise to predictions
predictiveIndexPoints: xTest, // optional default test points
meanFn: null // optional prior mean function
})
const { mean, covariance } = gprm.predict(xTest)
const fSamples = gprm.sample(xTest, [10]) // [10, m] posterior draws
const logML = gprm.logMarginalLikelihood() // model selection

Math
Special functions
All operate on tensors (scalars auto-converted):
| Function | Description |
|---|---|
| logGamma(x) | Log Gamma function (Lanczos) |
| digamma(x) | Psi function d/dx log Gamma |
| logBeta(a, b) | Log Beta function |
| ndtr(x) | Normal CDF Phi(x) |
| logNdtr(x) | Numerically stable log Phi(x) |
| ndtri(p) | Inverse normal CDF Phi⁻¹(p) |
| logChoose(n, k) | Log binomial coefficient |
| incompleteGamma(a, x) | Returns { lower, upper } |
| incompleteBeta(a, b, x) | Regularized incomplete beta I_x(a,b) |
| besselI0(x) | Modified Bessel I₀ |
| besselI1(x) | Modified Bessel I₁ |
| logBesselI0(x) | Stable log I₀ for large x |
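Two quick examples (scalars are auto-converted, per the note above):

stoch.math.logGamma(tf.scalar(5)) // log(4!) ≈ 3.178
stoch.math.ndtr(tf.scalar(0)) // Φ(0) = 0.5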
Numerically stable operations
| Function | Description |
|---|---|
| log1mexp(x) | log(1 - exp(x)) for x < 0 |
| logAddExp(a, b) | log(exp(a) + exp(b)) |
| softplusInverse(x) | log(exp(x) - 1) |
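These avoid overflow and catastrophic cancellation where the naive formulas fail. For example:

stoch.math.logAddExp(tf.scalar(1000), tf.scalar(1000)) // ≈ 1000.693, where exp(1000) would overflow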
Constants
| Constant | Value |
|---|---|
| LOG_PI | log(π) |
| LOG_2 | log(2) |
| LOG_2PI | log(2π) |
| LOG_SQRT_2PI | 0.5 × log(2π) |
| SQRT_2 | √2 |
| SQRT_2_OVER_PI | √(2/π) |
| EULER_MASCHERONI | 0.5772... |
Differentiable linear algebra
// Cholesky decomposition with custom gradient (Murray 2016)
const L = stoch.math.cholesky(A) // L where A = LLᵀ — supports tf.grad
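// Because cholesky ships a custom gradient, tf.grad differentiates through the
// factorization. A sketch, assuming A is a symmetric positive-definite tensor:
const dA = tf.grad((a) => tf.sum(stoch.math.cholesky(a)))(A)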
// Triangular linear system solver
stoch.math.triangularSolve(L, b) // L·X = B (default: lower=true)
stoch.math.triangularSolve(L, b, { adjoint: true }) // Lᵀ·X = B
stoch.math.triangularSolve(U, b, { lower: false }) // U·X = B

Memory management
Distributions allocate parameter tensors. Always dispose when done:
const dist = new stoch.distributions.Normal({ loc: 0, scale: 1 })
// ... use dist ...
dist.dispose()

Or use tf.tidy() for automatic cleanup of intermediates:
const result = tf.tidy(() => {
const dist = new stoch.distributions.Normal({ loc: 0, scale: 1 })
const lp = dist.logProb(0.5)
dist.dispose()
return lp // survives tf.tidy
})

sampleChain manages internal tensor lifecycle automatically. Dispose returned sample tensors when done.
Performance
Benchmarked on Node.js v19.8.1, AMD Ryzen 7 5800HS, RTX 3060, against WebPPL, the closest comparable JS probabilistic programming library.
| Task | tfjs | tfjs-node | tfjs-node-gpu | WebPPL |
|---|---|---|---|---|
| Normal.logProb (100K) | 131 (1.8x) | **3,517 (52x)** | 1,808 (26x) | 71 |
| Gamma.logProb (100K) | 122 (2.0x) | **1,176 (21x)** | 405 (7x) | 60 |
| Beta.logProb (100K) | 101 (3.3x) | **502 (17x)** | 158 (6x) | 31 |
| Normal.sample (100K) | 171 | 300 | 272 | **348** |
| Exponential.sample (100K) | 230 | **1,083** | 924 | 471 |
ops/s, higher is better. Bold = fastest. Speedup vs WebPPL in parentheses.
Log-prob is up to 52x faster with native backend. GPU shines on larger tensors and gradient-heavy workloads.
npm run bench # JS CPU
npm run bench:native # native CPU (tfjs-node)
npm run bench:gpu # GPU (tfjs-node-gpu, requires CUDA)

Examples
Build, then open in browser:
npm run build-dev
# open examples/*.html

| Example | Description |
|---|---|
| linear_regression.html | Bayesian linear regression with HMC |
| nuts_explorer.html | Animated NUTS sampler on 2D distributions |
| visual_tests.html | 10 interactive visual tests with live controls |
Development
npm install # install dependencies
npm run build-dev # fast dev build (no tests, no minification)
npm run build # production build + full test suite
npm run test:unit # 1063 tests across 83 suites
npm run bench # benchmarks vs WebPPL

Reference data for distribution tests:
python3 scripts/generate-reference-data.py # requires scipy, numpy

License
Apache-2.0
