@nadeemlab/sm-rust

v0.0.1

Spatial multiomics primitives (neighbour graphs, Moran's I, nhood enrichment, Ripley, co-occurrence) compiled to WebAssembly.

sm-rust

Spatial multiomics primitives compiled to WebAssembly. Exposes neighbour-graph construction, Moran's I, Geary's C, neighbourhood enrichment, interaction matrix, Ripley's K/L/F/G/J, co-occurrence, per-cluster centrality scores, ligand-receptor permutation tests, sepal diffusion scoring, niche detection (four squidpy.gr.calculate_niche flavours — neighborhood, utag, cellcharter, spatialleiden), nearest-neighbour-distance statistics, and Benjamini–Hochberg FDR correction for use from JavaScript / TypeScript. Result shapes mirror esda / squidpy field-for-field — every analytic call returns the full closed-form result (statistic, both variance flavours, signed z-scores, one- and two-sided p-values); every permutation call returns the full perm-null result (I/C-space null moments, signed z-score, z-based + rank-based pseudo p-values).

Installation

npm install @nadeemlab/sm-rust

Usage

The package ships three builds. Most consumers don't need to choose explicitly — import "@nadeemlab/sm-rust" resolves to the right one based on your environment.

Bundlers (Webpack, Vite, Rollup, Next.js, etc.)

import { computeMoranI, computeNhoodEnrichment } from "@nadeemlab/sm-rust";

const result = computeMoranI(/* ... */);

Your bundler will fetch the .wasm file as an asset automatically.

Browser (no bundler)

<script type="module">
  import init, { computeMoranI } from "@nadeemlab/sm-rust/web";

  await init();
  const result = computeMoranI(/* ... */);
</script>

The init() call is required; it returns a promise that resolves once the wasm module has been instantiated.

Node.js

import { computeMoranI } from "@nadeemlab/sm-rust/node";

const result = computeMoranI(/* ... */);

The Node build is CommonJS internally but works with both require and import syntax (Node ≥ 18).

Tuning the permutation tests

computeMoranI, computeGearyC, and computeNhoodEnrichment run a label-shuffle permutation null (matching esda.Moran(...) / esda.Geary(...) / sq.gr.nhood_enrichment on the serial path). They take an optional trailing { permutations?, seed?, threads? } options object:

computeMoranI(cells, values, neighbors, { permutations: 2000, seed: 42, threads: 0 });

  • permutations — number of label shuffles. Defaults to 1000; values below 2 return undefined (a variance can't be estimated).
  • seed — RNG seed for the label shuffles. Omit to use the squidpy/esda default stream (the serial path then reproduces their reference results bit-for-bit); set it for a different but reproducible shuffle sequence. A number covers seeds up to 2^53; pass a bigint for the full u64 range.
  • threads — thread budget. Omit or 1 = single-threaded, 0 = all logical cores, n = exactly n threads. Honoured only by the native Node build (the optional napi binary); the wasm builds are always single-threaded and ignore it.
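To make the permutations and seed options concrete, here is a minimal, self-contained sketch of a label-shuffle permutation null. It is not the package's implementation: `mulberry32`, `permutationTest` and `PermResult` are hypothetical names, the PRNG is a stand-in that does not reproduce the squidpy/esda stream, and the one-sided upper-tail convention for the rank-based p-value is an assumption.

```typescript
// Stand-in seeded PRNG (mulberry32). Assumption: NOT the package's RNG stream.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface PermResult {
  observed: number;
  permMean: number;
  permVar: number;
  zSim: number;
  pSim: number;
}

function permutationTest(
  values: Float64Array,
  statistic: (v: Float64Array) => number,
  options: { permutations?: number; seed?: number } = {},
): PermResult | undefined {
  const permutations = options.permutations ?? 1000;
  if (permutations < 2) return undefined; // a variance cannot be estimated
  const rand = mulberry32(options.seed ?? 42);
  const observed = statistic(values);
  const shuffled = Float64Array.from(values);
  const sims = new Float64Array(permutations);
  for (let p = 0; p < permutations; p++) {
    // Fisher-Yates shuffle of the values; the graph inside `statistic` stays fixed.
    for (let i = shuffled.length - 1; i > 0; i--) {
      const j = Math.floor(rand() * (i + 1));
      const tmp = shuffled[i];
      shuffled[i] = shuffled[j];
      shuffled[j] = tmp;
    }
    sims[p] = statistic(shuffled);
  }
  const permMean = sims.reduce((s, v) => s + v, 0) / permutations;
  const permVar = sims.reduce((s, v) => s + (v - permMean) ** 2, 0) / permutations;
  const atLeast = sims.filter((v) => v >= observed).length;
  return {
    observed,
    permMean,
    permVar,
    zSim: (observed - permMean) / Math.sqrt(permVar), // signed z-score
    pSim: (atLeast + 1) / (permutations + 1), // rank-based, one-sided (upper tail)
  };
}
```

Calling it twice with the same seed yields the same shuffle sequence and therefore the same result, which is the property the seed option is described as providing.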

Ripley options

The Ripley functions (computeRipley, computeRipleySmprofiler, computeRipleyFSmprofiler, computeRipleyTextbook) take an optional trailing { seed?, strategy?, quadMult?, cellsPerBin?, kExact?, nObservations?, nSimulations? } object:

  • seed — RNG seed for the Monte-Carlo point sampling. Same semantics as the seed field in ComputeOptions. Used by computeRipleyFSmprofiler, all modes of computeRipley / computeRipleySmprofiler, and the F/J modes of computeRipleyTextbook.
  • strategy — pair-counter used for the K/L statistic (no effect for F/G):
    • "exact" (default) — dual-tree, integer counts.
    • "quad" — d²-binned dual-tree, drops the per-pair sqrt for a ~30 % faster hot loop at <0.1 % drift on the resampled curve.
    • "fft" — grid + autocorrelation, sub-quadratic in N. Small-radius bins are approximate.
    • "p3m" — hybrid (exact short-range + FFT long-range).
  • quadMult — internal-bin oversampling for "quad" (default 64; sub-0.1 % resample drift at no measurable runtime cost). Ignored otherwise.
  • cellsPerBin — grid resolution (cells per support bin, clamped to ≥ 2) for "fft" / "p3m". Default 4.
  • kExact — for "p3m", the first kExact bins come from the exact dual-tree. Default 5.
  • nObservations — number of CSR query points (F mode) and simulated points per simulation. Default 1000 (squidpy n_observations). Used by computeRipleyFSmprofiler and all modes of computeRipley / computeRipleySmprofiler; ignored by computeRipleyTextbook.
  • nSimulations — number of Monte-Carlo simulations forming the null. Default 100 (squidpy n_simulations). Same scope as nObservations.

computeRipley(cells, labels, "L", 50, { seed: 42 });
computeRipleyFSmprofiler(cells, labels, { seed: 7n }); // bigint for full u64
computeRipley(cells, labels, "F", 50, { nObservations: 5000, nSimulations: 500 });
computeRipleyTextbook(cells, labels, "K", 100, { strategy: "quad" });
computeRipleyTextbook(cells, labels, "L", 100, { strategy: "fft", cellsPerBin: 4 });

Omit strategy (or the whole options object) to use the default exact dual-tree.

Example: random points

Build cells straight from coordinate arrays with cellsFromCoords (no binary buffer needed), label them, then run a few statistics. (Random coordinates and labels have no spatial structure, so expect non-significant p-values.)

import {
  cellsFromCoords,
  Neighbors,
  countCells,
  computeAnalyticalMoranI,
  computeMoranI,
  computeGearyC,
  computeNhoodEnrichment,
  computeInteractionMatrix,
  computeRipley,
} from "@nadeemlab/sm-rust";

// 1. Random cells — plain f64 (x, y) coordinate arrays, no encoding step.
const N = 2000;
const xs = Float64Array.from({ length: N }, () => Math.random() * 1000);
const ys = Float64Array.from({ length: N }, () => Math.random() * 1000);
const cells = cellsFromCoords(xs, ys);

// 2. One binary label per cell (1 = "in phenotype"). Here ~30% at random.
const labels = Uint8Array.from({ length: N }, () => (Math.random() < 0.3 ? 1 : 0));

// Moran's I / Geary's C take a continuous attribute (Float64Array) — a 0/1
// phenotype is just the special case, so reuse the labels here.
const values = Float64Array.from(labels);

// 3. Pick a neighbour graph and run the statistics.
const neighbors = Neighbors.knn(6); // or Neighbors.radius(50), Neighbors.delaunay()

const { count, percentage } = countCells(cells, labels);

// Analytical: every p-value flavour from one closed-form call.
const moranA = computeAnalyticalMoranI(cells, values, neighbors);
// → { i, e_i, var_norm, var_rand, z_norm, z_rand,
//     p_norm, p_rand, p_norm_two_sided, p_rand_two_sided } | undefined

// Permutation: every perm-null flavour from one shuffled call.
const moranP = computeMoranI(cells, values, neighbors, { seed: 42 });
// → { i, perm_mean, perm_var, z_sim, p_z_sim, p_sim,
//     p_z_sim_two_sided, p_sim_two_sided } | undefined

// Geary's C has the same shape (`c`/`e_c` instead of `i`/`e_i`).
const gearyP = computeGearyC(cells, values, neighbors, { permutations: 2000 });

// Multi-cluster neighbourhood enrichment + interaction matrix. Pass
// `nClusters`; labels must be in `0..nClusters`. Both return row-major k×k
// flat arrays.
const nClusters = 2;
const nhood = computeNhoodEnrichment(cells, labels, neighbors, nClusters, { seed: 42 });
// → { k, count: Float64Array, zscore: Float64Array } | undefined
const interaction = computeInteractionMatrix(cells, labels, neighbors, nClusters);
// → Float64Array (length k * k) | undefined

const ripleyL = computeRipley(cells, labels, "L", 50, { seed: 42 });
// → { bins, background, phenotype } | undefined

console.log({ count, percentage, moranA, moranP, gearyP, nhood, interaction });
console.log("Ripley L support points:", ripleyL?.bins.length);

// Cells and Neighbors hold wasm memory — free them when done (or use `using`).
cells.free();
neighbors.free();

Building from source

The build is dockerized so you don't need a local Rust toolchain.

docker compose run --rm wasm-pack

This produces pkg/ containing all three targets ready for npm publish.

API reference

Full type declarations live in pkg/bundler/sm_rust.d.ts after a build. Functions typed number | undefined return undefined for a degenerate specimen (too few cells, an empty phenotype, an unknown mode, etc.).

Input handles

Get a Cells handle one of two ways:

  • cellsFromCoords(xs: Float64Array, ys: Float64Array): Cells — build directly from (x, y) coordinate arrays (f64). The simplest entry point: pair it with a labels: Uint8Array you build yourself (one byte per cell).
  • parseCells(data: Uint8Array): Cells — parse smprofiler's internal binary cell buffer. Carries a 64-bit phenotype mask per cell that the labelCells* helpers turn into labels. Layout (big-endian, masks little-endian):
    • header (20 B): u32 count, u32 minX, u32 maxX, u32 minY, u32 maxY
    • per cell (20 B): u32 id, u32 x, u32 y, u64 mask

Cells is an opaque handle — call .free() (or use using) to release its wasm memory.
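For testing parseCells without smprofiler, the binary layout described above can be produced by hand. The sketch below follows that description literally (big-endian u32 fields, little-endian u64 mask). `RawCell` and `encodeCellBuffer` are hypothetical names, and filling the header with the exact coordinate bounding box is an assumption.

```typescript
interface RawCell {
  id: number;
  x: number;
  y: number;
  mask: bigint; // 64-bit phenotype mask
}

function encodeCellBuffer(cellList: RawCell[]): Uint8Array {
  const HEADER_BYTES = 20;
  const RECORD_BYTES = 20;
  const view = new DataView(new ArrayBuffer(HEADER_BYTES + RECORD_BYTES * cellList.length));

  // Assumption: the header bounds are the exact min/max of the coordinates.
  let minX = 0, maxX = 0, minY = 0, maxY = 0;
  cellList.forEach((c, i) => {
    if (i === 0 || c.x < minX) minX = c.x;
    if (i === 0 || c.x > maxX) maxX = c.x;
    if (i === 0 || c.y < minY) minY = c.y;
    if (i === 0 || c.y > maxY) maxY = c.y;
  });

  // Header: u32 count, minX, maxX, minY, maxY (big-endian).
  [cellList.length, minX, maxX, minY, maxY].forEach((v, i) => view.setUint32(4 * i, v, false));

  // Records: u32 id, u32 x, u32 y (big-endian), u64 mask (little-endian).
  cellList.forEach((c, i) => {
    const off = HEADER_BYTES + RECORD_BYTES * i;
    view.setUint32(off, c.id, false);
    view.setUint32(off + 4, c.x, false);
    view.setUint32(off + 8, c.y, false);
    view.setBigUint64(off + 12, c.mask, true);
  });
  return new Uint8Array(view.buffer);
}
```

The resulting Uint8Array has length 20 + 20 · count and is the shape of input that parseCells is documented to accept.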

Neighbors — neighbour-graph strategy, built with one of:

  • Neighbors.knn(k) — k nearest neighbours (excludes self; throws if k == 0).
  • Neighbors.radius(r) — all cells within Euclidean distance r.
  • Neighbors.delaunay() — 2D Delaunay edges; co-located cells are orphaned (scipy/squidpy default).
  • Neighbors.delaunayShareCoplanar() — Delaunay, but co-located cells share neighbours.
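As a reference for what the knn strategy means, here is a brute-force sketch in plain TypeScript. `knnBruteForce` is a hypothetical helper, not part of the package; it is quadratic in the number of cells, and breaking distance ties by lower index is an assumption made only to keep the output deterministic.

```typescript
function knnBruteForce(xs: Float64Array, ys: Float64Array, k: number): number[][] {
  if (k === 0) throw new RangeError("k must be at least 1");
  const n = xs.length;
  const result: number[][] = [];
  for (let i = 0; i < n; i++) {
    const others: { j: number; d2: number }[] = [];
    for (let j = 0; j < n; j++) {
      if (j === i) continue; // a cell is never its own neighbour
      const dx = xs[i] - xs[j];
      const dy = ys[i] - ys[j];
      others.push({ j, d2: dx * dx + dy * dy });
    }
    // Sort by squared distance; ties go to the lower index (sketch choice).
    others.sort((a, b) => a.d2 - b.d2 || a.j - b.j);
    result.push(others.slice(0, k).map((o) => o.j));
  }
  return result;
}
```

Note that the resulting graph is directed: cell j can be among the k nearest neighbours of cell i without the reverse holding.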

Labelling

These derive a labels array from the phenotype mask carried by a parseCells buffer — masks are 64-bit (bigint), and a cell matches when it has all the positive bits and none of the negative bits. (Cells made with cellsFromCoords have no mask, so build labels yourself instead.)

  • labelCellsBinary(cells, positiveMask, negativeMask): Uint8Array — 0/1 label per cell (1 = matches). Consumed by the autocorrelation, enrichment, Ripley, co-occurrence and count functions.
  • labelCellsTwoGroup(cells, posA, negA, posB, negB): Uint8Array — 2-bit label per cell: bit 0 = in group A, bit 1 = in group B (a cell in both → 3). Consumed by computeProximityBinary and the NN-distance functions.
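The matching rule (all positive bits set, no negative bits set) and the 2-bit two-group encoding can be written out directly with bigint arithmetic. This is a sketch of the rule as described above, operating on a plain array of masks; `maskMatches`, `binaryLabels` and `twoGroupLabels` are hypothetical names, not the package's functions.

```typescript
function maskMatches(mask: bigint, positive: bigint, negative: bigint): boolean {
  return (mask & positive) === positive && (mask & negative) === 0n;
}

// 0/1 label per cell (1 = matches).
function binaryLabels(masks: bigint[], positive: bigint, negative: bigint): Uint8Array {
  return Uint8Array.from(masks, (m) => (maskMatches(m, positive, negative) ? 1 : 0));
}

// 2-bit label per cell: bit 0 = in group A, bit 1 = in group B (both -> 3).
function twoGroupLabels(
  masks: bigint[],
  posA: bigint,
  negA: bigint,
  posB: bigint,
  negB: bigint,
): Uint8Array {
  return Uint8Array.from(masks, (m) => {
    const inA = maskMatches(m, posA, negA) ? 1 : 0;
    const inB = maskMatches(m, posB, negB) ? 2 : 0;
    return inA | inB;
  });
}
```

For example, with masks 0b101, 0b011 and 0b100, a positive mask of 0b001 and a negative mask of 0b010 select only the first cell.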

Counts & distances

  • countCells(cells, labels): { count, percentage } — number and percent (0–100) of cells with label != 0.
  • computeMeanNNDistance(cells, labels): number / computeMedianNNDistance(cells, labels): number — A→B nearest-neighbour distance summary (two-group labels).
  • computeProximityBinary(cells, labels, neighbors): number — fraction of group-A cells with at least one group-B neighbour (two-group labels).

Spatial autocorrelation

All take a continuous per-cell attribute as values: Float64Array (a 0/1 phenotype is the special case — Float64Array.from(labels)). Each statistic has an analytic call (microseconds, closed-form) and a permutation call (seconds, full Fisher-Yates null) — pick one based on whether the attribute is well-approximated by a normal distribution.

Analytical (esda.Moran(..., permutations=0) / esda.Geary(...)) — returns the full closed-form result. The normal-variance approximation is mis-calibrated for sparse / non-normal attributes (e.g. a 0/1 phenotype indicator) — prefer the permutation versions there.

  • computeAnalyticalMoranI(cells, values, neighbors): MoranAnalytic | undefined

  • computeAnalyticalGearyC(cells, values, neighbors): GearyAnalytic | undefined

    type MoranAnalytic = {
      i: number;                   // observed I (matches esda.Moran.I)
      e_i: number;                 // -1 / (n - 1)
      var_norm: number;            // normality-assumption variance
      var_rand: number;            // randomization-assumption variance
                                   // (Cliff–Ord 1981, kurtosis-corrected)
      z_norm: number; z_rand: number;
      p_norm: number; p_rand: number;                 // one-sided
      p_norm_two_sided: number; p_rand_two_sided: number;
    };
    // GearyAnalytic is the same shape with `c` / `e_c` (always 1.0) in place
    // of `i` / `e_i`.

Permutation (esda.Moran(...) / esda.Geary(...) matching the serial path) — returns the full perm-null result. Accepts the ComputeOptions object.

  • computeMoranI(cells, values, neighbors, options?): MoranPermutation | undefined

  • computeGearyC(cells, values, neighbors, options?): GearyPermutation | undefined

    type MoranPermutation = {
      i: number;                   // observed I
      perm_mean: number; perm_var: number;  // in I-space
      z_sim: number;               // signed z-score (matches esda.Moran.z_sim)
      p_z_sim: number; p_sim: number;        // one-sided; p_sim is rank-based
      p_z_sim_two_sided: number; p_sim_two_sided: number;
    };

Raw statistics (no null model) — the I / C value itself (matching esda.Moran(...).I / esda.Geary(...).C), without z-scoring or a p-value:

  • computeMoranIStatistic(cells, values, neighbors): number | undefined
  • computeGearyCStatistic(cells, values, neighbors): number | undefined
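For orientation, the raw statistics can be reproduced on a small graph with the textbook formulas. The sketch below assumes row-standardised weights (each cell's neighbours share weight 1/degree) and takes the graph as plain adjacency lists; `moranAndGeary` is a hypothetical helper and is not guaranteed to match the package's numerics.

```typescript
function moranAndGeary(
  values: Float64Array,
  adjacency: number[][],
): { i: number; c: number; e_i: number } | undefined {
  const n = values.length;
  if (n < 2) return undefined;
  const mean = values.reduce((s, v) => s + v, 0) / n;
  let sumSq = 0;
  for (let a = 0; a < n; a++) sumSq += (values[a] - mean) ** 2;
  if (sumSq === 0) return undefined; // constant attribute: statistic undefined

  let s0 = 0; // sum of all weights; one per non-isolated cell after row-standardisation
  let cross = 0; // sum_ab w_ab (x_a - mean)(x_b - mean)
  let sqDiff = 0; // sum_ab w_ab (x_a - x_b)^2
  for (let a = 0; a < n; a++) {
    const nbrs = adjacency[a];
    if (nbrs.length === 0) continue;
    const w = 1 / nbrs.length;
    s0 += 1;
    for (const b of nbrs) {
      cross += w * (values[a] - mean) * (values[b] - mean);
      sqDiff += w * (values[a] - values[b]) ** 2;
    }
  }
  if (s0 === 0) return undefined;
  return {
    i: (n / s0) * (cross / sumSq), // Moran's I
    c: ((n - 1) / (2 * s0)) * (sqDiff / sumSq), // Geary's C
    e_i: -1 / (n - 1), // expected I under the null
  };
}
```

On a 4-cycle with alternating values 1, 0, 1, 0 this gives I = -1 and C = 1.5, the signature of perfect negative autocorrelation.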

Batch (many attributes, one graph) — builds the row-standardised adjacency once and evaluates every attribute against it (parallel across cores in the Node build). The fast path for scoring a whole expression matrix: passing nAttrs columns is much cheaper than that many single-attribute calls. Pair this with fdrBh for the "compute N tests, then BH-correct" workflow.

  • computeMoranIStatistics(cells, attrs: Float64Array, nAttrs: number, neighbors): (number | null)[]

  • computeGearyCStatistics(cells, attrs: Float64Array, nAttrs: number, neighbors): (number | null)[]

    attrs is a flat row-major buffer of length nAttrs · nCells; attribute k occupies attrs[k·nCells .. (k+1)·nCells]. Returns one value per attribute (null for a constant or degenerate column). If the buffer length doesn't match nAttrs · nCells, every entry is null.

    const flat = new Float64Array(nAttrs * nCells);
    // ... fill flat[k * nCells + i] = value of attribute k at cell i
    const moranPerAttr = computeMoranIStatistics(cells, flat, nAttrs, neighbors);

  • computeAnalyticalMoranIBatch(cells, attrs: Float64Array, nAttrs: number, neighbors): (MoranAnalytic | null)[]

  • computeAnalyticalGearyCBatch(cells, attrs: Float64Array, nAttrs: number, neighbors): (GearyAnalytic | null)[]

    Closed-form full result per attribute — the squidpy gr.spatial_autocorr(mode="moran" | "geary") workflow across a whole expression matrix. Same flat-row-major layout as the statistic-only batch; pair the resulting p_norm column with fdrBh for the ranked, FDR-corrected table.

    const rows = computeAnalyticalMoranIBatch(cells, flat, nGenes, neighbors);
    const pNorm = rows.map(r => r?.p_norm ?? null);
    const qNorm = fdrBh(pNorm);
    // Sort genes by rows[k].i desc for the squidpy-style ranked table.
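The batch calls expect attribute-major data (attribute k occupies one contiguous run of nCells values), whereas expression matrices are often stored cell-major. A small sketch of both packing steps follows; `packAttributes` and `cellMajorToAttributeMajor` are hypothetical helpers, not part of the package.

```typescript
// Pack one array per attribute into the flat attribute-major buffer.
function packAttributes(
  columns: ArrayLike<number>[],
): { flat: Float64Array; nAttrs: number; nCells: number } {
  const nAttrs = columns.length;
  const nCells = nAttrs > 0 ? columns[0].length : 0;
  const flat = new Float64Array(nAttrs * nCells);
  columns.forEach((col, k) => {
    if (col.length !== nCells) throw new RangeError(`attribute ${k} has the wrong length`);
    flat.set(col, k * nCells); // attribute k -> flat[k*nCells .. (k+1)*nCells]
  });
  return { flat, nAttrs, nCells };
}

// Transpose a row-major (nCells, nGenes) matrix into the attribute-major layout.
function cellMajorToAttributeMajor(
  expression: Float64Array,
  nCells: number,
  nGenes: number,
): Float64Array {
  if (expression.length !== nCells * nGenes) throw new RangeError("shape mismatch");
  const flat = new Float64Array(nGenes * nCells);
  for (let cell = 0; cell < nCells; cell++) {
    for (let g = 0; g < nGenes; g++) {
      flat[g * nCells + cell] = expression[cell * nGenes + g];
    }
  }
  return flat;
}
```

Validating the length up front is worthwhile here because, as noted above, a mismatched buffer does not throw in the batch calls; it silently yields null for every attribute.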

Multiple-testing correction

  • fdrBh(pValues: (number | null)[]): (number | null)[] — Benjamini–Hochberg FDR correction. null entries pass through (treated as missing, dropped from the rank denominator). Equivalent to statsmodels.stats.multitest.multipletests(p, method='fdr_bh')[1].

    const rows = computeAnalyticalMoranIBatch(cells, flat, nAttrs, neighbors);
    const pVals = rows.map(r => r?.p_norm ?? null); // pick one p-value flavour per attribute
    const qVals = fdrBh(pVals);
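For readers who want to check fdrBh output by hand, the Benjamini–Hochberg step-up procedure with null pass-through can be sketched in a few lines. `benjaminiHochberg` is a hypothetical reference helper written from the description above, not the package's code.

```typescript
function benjaminiHochberg(pValues: (number | null)[]): (number | null)[] {
  // Indices of the non-missing p-values, sorted by ascending p.
  const order: number[] = [];
  pValues.forEach((p, i) => {
    if (p !== null) order.push(i);
  });
  order.sort((a, b) => (pValues[a] as number) - (pValues[b] as number));

  const m = order.length; // nulls are dropped from the rank denominator
  const out: (number | null)[] = pValues.map(() => null);
  let running = 1;
  // Walk from the largest p-value down, enforcing monotonicity of the q-values.
  for (let rank = m; rank >= 1; rank--) {
    const idx = order[rank - 1];
    const adjusted = ((pValues[idx] as number) * m) / rank;
    running = Math.min(running, adjusted);
    out[idx] = running;
  }
  return out;
}
```

With p-values 0.01, null, 0.04, 0.03 and 0.005 there are four tests in the denominator, and the adjusted values come out as 0.02, null, 0.04, 0.04 and 0.02.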

Neighbourhood enrichment

Multi-cluster: labels are categorical (cluster id 0..nClusters). Both calls return the full k × k matrix.

  • computeNhoodEnrichment(cells, labels, neighbors, nClusters, options?): { k, count, zscore } | undefined — permutation z-score matrix (squidpy parity on the serial path). count and zscore are row-major Float64Arrays of length k * k; entry a * k + b is the (source=a, target=b) cell. Cells whose null variance collapsed hold NaN in zscore. For the legacy 2-cluster scalar value, read entry 0 * k + 1 and apply Φ on the JS side. Accepts ComputeOptions.
  • computeAnalyticalNhoodEnrichment(cells, labels, neighbors, nClusters): { k, count, zscore } | undefined — same shape, closed-form (no permutations).
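Reading one (source, target) entry and applying Φ on the JS side, as suggested for the legacy 2-cluster value, might look like the sketch below. JavaScript has no built-in error function, so this uses the Abramowitz and Stegun 7.1.26 approximation (absolute error around 1.5e-7); `erfApprox`, `normalCdf` and `matrixEntry` are hypothetical helper names.

```typescript
// Abramowitz & Stegun 7.1.26 approximation of erf(x).
function erfApprox(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard normal CDF, Φ(z). NaN in gives NaN out.
function normalCdf(z: number): number {
  return 0.5 * (1 + erfApprox(z / Math.SQRT2));
}

// Entry (source = a, target = b) of a row-major k × k flat matrix.
function matrixEntry(flat: Float64Array, k: number, a: number, b: number): number {
  return flat[a * k + b];
}

// Example with a made-up z-score matrix for k = 2 (values are illustrative only).
const exampleZ = Float64Array.from([0.3, 1.96, -1.2, 0.1]);
const legacyScalar = normalCdf(matrixEntry(exampleZ, 2, 0, 1)); // Φ(1.96), about 0.975
```

Entries whose null variance collapsed are NaN in zscore, and NaN propagates through normalCdf, so check with Number.isNaN before using the result.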

Interaction matrix

  • computeInteractionMatrix(cells, labels, neighbors, nClusters): Float64Array | undefined — observed k × k directed-edge count matrix (matches sq.gr.interaction_matrix(..., normalized=False)). Row-major; entry a * k + b counts edges from a label-a source to a label-b target.
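The directed-edge counting described above is simple enough to sketch on adjacency lists. `interactionCounts` is a hypothetical helper; returning undefined for an out-of-range label is a choice made for this sketch, not documented package behaviour.

```typescript
function interactionCounts(
  labels: Uint8Array,
  adjacency: number[][], // adjacency[i] lists the targets of edges leaving cell i
  nClusters: number,
): Float64Array | undefined {
  const counts = new Float64Array(nClusters * nClusters);
  for (let source = 0; source < labels.length; source++) {
    const a = labels[source];
    if (a >= nClusters) return undefined; // sketch choice for invalid labels
    for (const target of adjacency[source]) {
      const b = labels[target];
      if (b >= nClusters) return undefined;
      counts[a * nClusters + b] += 1; // one directed edge from label a to label b
    }
  }
  return counts;
}
```

Because edges are counted per direction, the matrix is symmetric only when the underlying graph is (a radius graph, for instance, but not necessarily KNN).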

Co-occurrence

Both calls take nClusters and return the full k × k (single-bin scalar) or (k, k, nBins) (curve) matrix.

  • computeCoOccurrence(cells, labels, neighbors, nClusters): Float64Array | undefined — row-major k × k single-bin enrichment matrix (every neighbour edge counted, no distance binning). Entry a * k + b is the enrichment of cluster a around cluster b.
  • computeCoOccurrenceCurve(cells, labels, neighbors, nClusters, nSteps): { k, interval, occ } | undefined — graph-based enrichment curves over nSteps - 1 cumulative distance bins. interval is linspace(minEdgeDist, maxEdgeDist, nSteps); occ is a flat row-major (k, k, nSteps - 1) array indexed (a * k + b) * nBins + r, where bin r carries the cumulative threshold interval[r + 1]. The last bin equals computeCoOccurrence entry-for-entry. Reproduces squidpy.gr.co_occurrence only for the radius strategy; KNN / Delaunay give a graph-restricted curve.
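The (k, k, nSteps - 1) indexing of occ is easy to get wrong, so here is a small sketch that pulls out the curve for one cluster pair together with its distance thresholds. `linspace` and `curveForPair` are hypothetical helpers that operate on the documented flat layout only.

```typescript
function linspace(start: number, stop: number, n: number): Float64Array {
  const out = new Float64Array(n);
  const step = n > 1 ? (stop - start) / (n - 1) : 0;
  for (let i = 0; i < n; i++) out[i] = start + i * step;
  return out;
}

// Extract the curve of cluster a around cluster b from a flat (k, k, nBins) array.
function curveForPair(
  occ: Float64Array,
  interval: Float64Array,
  k: number,
  a: number,
  b: number,
): { thresholds: Float64Array; values: Float64Array } {
  const nBins = interval.length - 1; // nSteps - 1
  const base = (a * k + b) * nBins;
  return {
    thresholds: interval.slice(1), // bin r carries the threshold interval[r + 1]
    values: occ.slice(base, base + nBins),
  };
}
```

The last element of values is the one documented to equal the corresponding computeCoOccurrence entry.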

Centrality scores

  • computeCentralityScores(cells, labels, neighbors, nClusters): { k, degree_centrality, average_clustering, closeness_centrality } | undefined — per-cluster graph-centrality summary (matches sq.gr.centrality_scores). Each field is a Float64Array of length k; entries for empty clusters are NaN. degree_centrality and closeness_centrality are NetworkX group centralities (fraction of non-cluster nodes adjacent to the cluster / reciprocal mean distance from non-cluster nodes to the cluster); average_clustering is the mean per-cell clustering coefficient over the cluster's cells. The adjacency is symmetrized first so KNN's directed asymmetry doesn't bias the result.

Ligand-receptor

  • computeLigrec(expression, nCells, nGenes, clusterLabels, nClusters, interactions, threshold, options?): { n_lr, k, means, pvalues } | undefined — ligand-receptor permutation test matching squidpy.gr.ligrec (CellPhoneDB convention). For each LR pair (l, r) and each cluster pair (a, b), the observed score is (mu[a, l] + mu[b, r]) / 2, where mu[c, g] is the per-cluster mean of gene g; per-cluster gene means at or below threshold are zeroed before averaging. The p-value is the fraction of cluster-label permutations whose permuted score is >= observed.

    Inputs:

    • expression: Float64Array — row-major (nCells, nGenes).
    • clusterLabels: Uint8Array — cluster id per cell, in 0..nClusters.
    • interactions: Uint32Array — flat [src0, tgt0, src1, tgt1, …] of LR gene indices (src = ligand, tgt = receptor); length must be a multiple of 2.
    • threshold: number — gene-mean cutoff (squidpy default 0).
    • options — ComputeOptions.

    The kernel only aggregates the unique genes referenced by any LR pair, so a wide expression matrix is cheap as long as the pair list touches a modest gene subset.

    Output: both means and pvalues are row-major (nLr, k, k) flat Float64Arrays; entry lr * k * k + a * k + b is the score / p-value for LR pair lr with source cluster a and target cluster b. Reproduces sq.gr.ligrec to f64 epsilon on observed means; permutation p-values agree statistically (different RNG streams) but match exactly when the cluster-gene affinity is sharp enough to drive most outcomes to 0 or 1.

    const interactions = Uint32Array.from([0, 1, 1, 2, 2, 0]); // 3 LR pairs
    const lig = computeLigrec(
      expression, nCells, nGenes, clusterLabels, nClusters,
      interactions, 0.0, { permutations: 1000, seed: 42 },
    );
    // lig.means / lig.pvalues are row-major (n_lr=3, k, k) Float64Arrays.
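The observed-score part of the test (without the permutation null) can be sketched directly from the description above, taken literally: per-cluster gene means, thresholding, then (mu[a, l] + mu[b, r]) / 2 stored at lr * k * k + a * k + b. `ligrecObservedMeans` is a hypothetical helper; the handling of empty clusters and invalid inputs is an assumption of this sketch.

```typescript
function ligrecObservedMeans(
  expression: Float64Array, // row-major (nCells, nGenes)
  nCells: number,
  nGenes: number,
  clusterLabels: Uint8Array,
  nClusters: number,
  interactions: Uint32Array, // flat [ligand0, receptor0, ligand1, receptor1, ...]
  threshold: number,
): Float64Array | undefined {
  if (expression.length !== nCells * nGenes || interactions.length % 2 !== 0) return undefined;

  // Per-cluster gene sums and cluster sizes.
  const sums = new Float64Array(nClusters * nGenes);
  const sizes = new Float64Array(nClusters);
  for (let cell = 0; cell < nCells; cell++) {
    const c = clusterLabels[cell];
    if (c >= nClusters) return undefined; // sketch choice for invalid labels
    sizes[c] += 1;
    for (let g = 0; g < nGenes; g++) sums[c * nGenes + g] += expression[cell * nGenes + g];
  }

  // mu[c, g]: mean expression, zeroed when at or below the threshold.
  const mu = sums.map((s, idx) => {
    const size = sizes[Math.floor(idx / nGenes)];
    const mean = size > 0 ? s / size : 0; // assumption: empty clusters contribute 0
    return mean > threshold ? mean : 0;
  });

  const k = nClusters;
  const nLr = interactions.length / 2;
  const means = new Float64Array(nLr * k * k);
  for (let lr = 0; lr < nLr; lr++) {
    const ligand = interactions[2 * lr];
    const receptor = interactions[2 * lr + 1];
    for (let a = 0; a < k; a++) {
      for (let b = 0; b < k; b++) {
        means[lr * k * k + a * k + b] = (mu[a * nGenes + ligand] + mu[b * nGenes + receptor]) / 2;
      }
    }
  }
  return means;
}
```

The permutation p-value would then repeat this computation on shuffled cluster labels and count how often the permuted score reaches the observed one.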

Sepal

Spatially variable gene ranking by simulated diffusion (squidpy.gr.sepal). For each gene column, the per-cell expression is treated as a concentration on a regular grid; the kernel iterates a discrete Laplacian diffusion step (5-point square stencil for maxNeighs = 4, 7-point hex for maxNeighs = 6) until the per-step entropy delta on the saturated cells drops to or below thresh. The score is dt · iterations_to_converge — the diffusion time needed to wash out the spatial structure. Higher score = more spatial structure.

  • computeSepal(cells, expression, nCells, nGenes, neighbors, maxNeighs, options?): { n_genes, scores } | undefined — expression is row-major (nCells, nGenes) Float64Array. maxNeighs is 4 (square / ST / Dbit-seq) or 6 (hex / Visium); the supplied neighbour graph must have max degree exactly equal to it, otherwise undefined (matches squidpy's ValueError). Returns one score per gene; genes that didn't converge within nIter iterations hold NaN.

    type SepalOptions = {
      nIter?: number;   // default 30000 (squidpy)
      dt?: number;      // default 0.001 (squidpy)
      thresh?: number;  // default 1e-8 (squidpy)
    };

Ripley

All four accept the RipleyOptions trailing object: seed controls the Monte-Carlo sampling, strategy (+ its per-strategy knobs) selects the K/L pair-counter ("exact" | "quad" | "fft" | "p3m").

  • computeRipley(cells, labels, mode, nSteps, options?): { bins, background, phenotype } | undefined — per-cluster p-value curves; mode "F" | "G" | "L", two-tailed (squidpy gr.ripley).
  • computeRipleySmprofiler(cells, labels, mode, nSteps, options?): { bins, background, phenotype } | undefined — same shape, one-tailed (smprofiler ripley_custom).
  • computeRipleyFSmprofiler(cells, labels, options?): number | undefined — smprofiler F scalar summary.
  • computeRipleyTextbook(cells, labels, mode, nSteps, options?): { support, statistic } | undefined — observed statistic curve over the phenotype point set; mode "K" | "L" | "F" | "G" | "J" (pysal/pointpats-faithful). seed affects the "F" and "J" modes only.
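To show how the "K" and "L" modes relate, here is a naive textbook estimator: K(r) is the area-scaled fraction of ordered point pairs within distance r, and L(r) = sqrt(K(r) / π). This is a sketch only. `naiveRipleyKL` is a hypothetical helper with no edge correction, the study area is supplied by the caller, and it makes no claim to match the package's or pointpats' exact estimator.

```typescript
function naiveRipleyKL(
  xs: Float64Array,
  ys: Float64Array,
  area: number, // assumption: caller supplies the study-area size
  radii: number[],
): { k: Float64Array; l: Float64Array } | undefined {
  const n = xs.length;
  if (n < 2 || !(area > 0)) return undefined;
  const kOut = new Float64Array(radii.length);
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      if (i === j) continue;
      const d = Math.hypot(xs[i] - xs[j], ys[i] - ys[j]);
      // Count this ordered pair in every radius bin that contains it.
      radii.forEach((r, bin) => {
        if (d <= r) kOut[bin] += 1;
      });
    }
  }
  const scale = area / (n * (n - 1));
  const k = kOut.map((count) => count * scale);
  const l = k.map((value) => Math.sqrt(value / Math.PI)); // L transform of K
  return { k, l };
}
```

Under complete spatial randomness K(r) is close to πr², so L(r) is close to r; that linearisation is why L is often the easier curve to read.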

Niche detection

Spatial-niche assignment matching the four flavours of squidpy.gr.calculate_niche. Each call builds a symmetric binary spatial adjacency from the supplied cells + neighbors (the same way every other compute* does), feeds it through the flavour-specific pre-clustering pipeline, and emits per-cell cluster ids. The pipelines are pure Rust — no scanpy / igraph / leidenalg / sklearn dependency. Cluster ids 4294967295 (u32::MAX) mark not_a_niche (cells excluded by mask / minNicheSize).

Outputs come in two shapes:

  • NicheLeidenResult { nResolutions, niches } — one Leiden run per resolution. niches is a flat Uint32Array of length nResolutions · nCells (concatenated row-major). Used by computeNicheNeighborhood and computeNicheUtag.
  • NicheResult { niches } — one cluster id per cell. Used by computeNicheCellcharter and computeNicheSpatialleiden.

All four take an optional seed (default 42) replacing squidpy's random_state.
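Working with the two result shapes mostly means slicing the flat array and skipping the not_a_niche sentinel. The sketch below assumes the concatenation is resolution-major, so that resolution r occupies niches[r · nCells .. (r + 1) · nCells]; `NOT_A_NICHE`, `nichesAtResolution` and `nicheSizes` are hypothetical names.

```typescript
const NOT_A_NICHE = 0xffffffff; // u32::MAX sentinel, 4294967295

// Slice out the per-cell assignments of one resolution.
// Assumption: the flat array is laid out resolution after resolution.
function nichesAtResolution(niches: Uint32Array, nResolutions: number, r: number): Uint32Array {
  if (nResolutions <= 0 || niches.length % nResolutions !== 0 || r < 0 || r >= nResolutions) {
    throw new RangeError("invalid resolution index or array length");
  }
  const nCells = niches.length / nResolutions;
  return niches.subarray(r * nCells, (r + 1) * nCells);
}

// Count cells per niche, reporting excluded cells separately.
function nicheSizes(assignments: Uint32Array): { sizes: Map<number, number>; excluded: number } {
  const sizes = new Map<number, number>();
  let excluded = 0;
  assignments.forEach((id) => {
    if (id === NOT_A_NICHE) {
      excluded += 1;
    } else {
      sizes.set(id, (sizes.get(id) ?? 0) + 1);
    }
  });
  return { sizes, excluded };
}
```

For a NicheResult (one id per cell) nicheSizes applies directly, without the slicing step.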

  • computeNicheNeighborhood(cells, labels, neighbors, groups, nGroups, options?): NicheLeidenResult — flavor='neighborhood'. Per-cell category-frequency profile (optionally z-scored, optionally absolute counts, optionally summed over n-hop adjacency with per-hop weights), wrapped in a UMAP fuzzy KNN graph in feature space, then Leiden-clustered once per resolution. groups[i] ∈ [0, nGroups) is the per-cell categorical label; cells with out-of-range labels are masked out as not_a_niche.

    type NicheNeighborhoodOptions = {
      seed?: number | bigint;
      resolutions?: number[];   // default [0.5]
      nNeighbors?: number;      // sc.pp.neighbors n_neighbors. default 15
      scale?: boolean;          // z-score the profile. default true
      absNhood?: boolean;       // raw counts vs relative freqs. default false
      distance?: number;        // n-hop horizon. default 1
      nHopWeights?: number[];   // per-hop weights when distance > 1
      minNicheSize?: number;    // clusters smaller than this → not_a_niche
      mask?: boolean[];         // false cells excluded from clustering
    };

  • computeNicheUtag(cells, labels, neighbors, x, nFeatures, options?): NicheLeidenResult — flavor='utag'. Row-L1-normalises the spatial adjacency, multiplies by the expression matrix (x is row-major (nCells, nFeatures)), PCA-reduces the resulting smoothed feature matrix, builds a fuzzy KNN graph in PCA space, Leiden once per resolution.

    type NicheUtagOptions = {
      seed?: number | bigint;
      resolutions?: number[];   // default [0.5]
      nNeighbors?: number;      // default 15
    };

  • computeNicheCellcharter(cells, labels, neighbors, x, nFeatures, options?): NicheResult — flavor='cellcharter'. For k = 0..distance, builds normalize(adj^k) @ X (mean or variance aggregation matching squidpy's _aggregate), concatenates the blocks along the feature dimension, PCA-reduces, then clusters with a Gaussian Mixture Model into nComponents niches. The per-hop adjacency uses the squidpy _hop "first-visit" semantics (each (i, j) appears in exactly one hop). Optionally accepts a pre-computed embedding via useRep (matches squidpy's use_rep knob — e.g. a scVI embedding) which bypasses the PCA step.

    type NicheCellcharterOptions = {
      seed?: number | bigint;
      distance?: number;        // n-hop horizon. default 3
      aggregation?: "mean" | "variance";  // default "mean"
      nComponents?: number;     // GMM components. default 10
      useRep?: Float64Array;    // (nCells, nRepFeatures) row-major
      nRepFeatures?: number;
    };

  • computeNicheSpatialleiden(cells, labels, neighbors, latentRows, latentCols, latentVals, options?): NicheResult — flavor='spatialleiden'. Multiplex Leiden over two layers: the spatial connectivity (built from cells/neighbors like the other flavours) and a latent connectivity matrix supplied as COO triplets (latentRows, latentCols, latentVals). The latent layer is typically the output of sc.pp.neighbors on a feature embedding; pass obsp['connectivities'].tocoo() in JS / TS via Uint32Array row/col + Float64Array vals. Both layers contribute to a sum of RB-configuration modularities; the spatial layer's weight is scaled by layerRatio (squidpy semantics: higher → spatially homogeneous).

    type NicheSpatialleidenOptions = {
      seed?: number | bigint;
      latentResolution?: number;   // default 1.0
      spatialResolution?: number;  // default 1.0
      layerRatio?: number;         // spatial weight scale. default 1.0
      useWeights?: [boolean, boolean];  // [latent, spatial]. default [true, true]
    };

    const niches = computeNicheSpatialleiden(
      cells, labels, neighbors,
      latentRows, latentCols, latentVals,
      { latentResolution: 0.8, spatialResolution: 0.8, layerRatio: 1.0, seed: 42 },
    );
    // niches.niches is a Uint32Array of length nCells.
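If the latent connectivity matrix is available on the JS side as a dense array rather than as exported COO triplets, it can be converted with a short helper. `denseToCoo` is a hypothetical function; it only illustrates the triplet layout (Uint32Array rows and columns, Float64Array values) described above, and skipping exact zeros is a choice made for this sketch.

```typescript
function denseToCoo(
  dense: number[][],
): { latentRows: Uint32Array; latentCols: Uint32Array; latentVals: Float64Array } {
  const rows: number[] = [];
  const cols: number[] = [];
  const vals: number[] = [];
  dense.forEach((row, i) => {
    row.forEach((value, j) => {
      if (value !== 0) {
        // Keep only non-zero connectivities, one triplet per entry.
        rows.push(i);
        cols.push(j);
        vals.push(value);
      }
    });
  });
  return {
    latentRows: Uint32Array.from(rows),
    latentCols: Uint32Array.from(cols),
    latentVals: Float64Array.from(vals),
  };
}
```

The three arrays always have equal length, one element per stored connection, and can be passed in the latentRows, latentCols, latentVals positions.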

Implementation notes: Leiden / GMM / fuzzy KNN are stochastic, so bit-perfect parity with squidpy is impossible — instead the compare/niche_parity.py harness validates partition agreement via ARI / NMI / V-measure with paired Wilcoxon tests across a difficulty-tiered synthetic suite. On the easy tier (well-separated regions), utag and spatialleiden recover the truth perfectly on both sides; neighborhood and cellcharter agree at ARI ≥ 0.85. On the hard tier (noise=2.0, 15 % phenotype mis-calls), all four flavours hit ARI(Rust, squidpy) > 0 at p < 10⁻⁴ and Rust's cellcharter / spatialleiden recover the truth at statistically higher ARI than squidpy's sklearn-GMM / spatialleiden-package paths.

Graph neural network models

Operate over many specimens at once: allBuffers concatenates per-specimen cell buffers, indexed by offsets; labels (Int32Array) holds the per-specimen outcome. train* returns a flat model vector you pass back to the matching compute* to score one specimen.

  • trainCgGnnModel(allBuffers: Uint8Array, offsets: Uint32Array, labels: Int32Array, neighbors): Float64Array / computeCgGnn(cells, model: Float64Array, labels, neighbors): number | undefined.
  • trainGraphTransformerModel(allBuffers: Uint8Array, offsets: Uint32Array, labels: Int32Array, neighbors): Float64Array / computeGraphTransformer(cells, model: Float64Array, labels, neighbors): number | undefined.

License

Apache License 2.0 with the Commons Clause restriction. See LICENSE for the full text.