@nadeemlab/sm-rust

v0.0.1

Published

22 days ago

Spatial multiomics primitives (neighbour graphs, Moran's I, nhood enrichment, Ripley, co-occurrence) compiled to WebAssembly.


sanadeem

franciscouzo

wasm webassembly spatial multiomics bioinformatics rust

sm-rust

Spatial multiomics primitives compiled to WebAssembly. Exposes neighbour-graph construction, Moran's I, Geary's C, neighbourhood enrichment, interaction matrix, Ripley's K/L/F/G/J, co-occurrence, per-cluster centrality scores, ligand-receptor permutation tests, sepal diffusion scoring, niche detection (four squidpy.gr.calculate_niche flavours — neighborhood, utag, cellcharter, spatialleiden), nearest-neighbour-distance statistics, and Benjamini–Hochberg FDR correction for use from JavaScript / TypeScript. Result shapes mirror esda / squidpy field-for-field — every analytic call returns the full closed-form result (statistic, both variance flavours, signed z-scores, one- and two-sided p-values); every permutation call returns the full perm-null result (I/C-space null moments, signed z-score, z-based + rank-based pseudo p-values).

Installation

npm install @nadeemlab/sm-rust

Usage

The package ships three builds. Most consumers don't need to choose explicitly — import "@nadeemlab/sm-rust" resolves to the right one based on your environment.

Bundlers (Webpack, Vite, Rollup, Next.js, etc.)

```
import { computeMoranI, computeNhoodEnrichment } from "@nadeemlab/sm-rust";

const result = computeMoranI(/* ... */);
```

Your bundler will fetch the .wasm file as an asset automatically.

Browser (no bundler)

```
<script type="module">
  import init, { computeMoranI } from "@nadeemlab/sm-rust/web";

  await init();
  const result = computeMoranI(/* ... */);
</script>
```

The init() call is required: it returns a promise that resolves once the wasm module has been fetched and instantiated, so await it before calling any other export.

Node.js

```
import { computeMoranI } from "@nadeemlab/sm-rust/node";

const result = computeMoranI(/* ... */);
```

The Node build is CommonJS internally but works with both require and import syntax (Node ≥ 18).

Tuning the permutation tests

computeMoranI, computeGearyC, and computeNhoodEnrichment run a label-shuffle permutation null (matching esda.Moran(...) / esda.Geary(...) / sq.gr.nhood_enrichment on the serial path). They take an optional trailing ComputeOptions object, { permutations?, seed?, threads? }:

```
computeMoranI(cells, values, neighbors, { permutations: 2000, seed: 42, threads: 0 });
```

permutations — number of label shuffles. Defaults to 1000; values below 2 return undefined (a variance can't be estimated).
seed — RNG seed for the label shuffles. Omit to use the squidpy/esda default stream (the serial path then reproduces their reference results bit-for-bit); set it for a different but reproducible shuffle sequence. A number covers seeds up to 2^53; pass a bigint for the full u64 range.
threads — thread budget. Omit or 1 = single-threaded, 0 = all logical cores, n = exactly n threads. Honoured only by the native Node build (the optional napi binary); the wasm builds are always single-threaded and ignore it.
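The number-versus-bigint rule for seed follows from JavaScript itself: a number holds integers exactly only up to Number.MAX_SAFE_INTEGER (2^53 - 1). The sketch below assumes you keep seeds as decimal strings; `ComputeOptionsLike` only mirrors the three documented fields (the real declaration is in the generated .d.ts), and `toSeed` is a hypothetical helper, not part of the package.

```typescript
// Mirrors the documented ComputeOptions fields (illustrative only).
type ComputeOptionsLike = {
  permutations?: number;
  seed?: number | bigint;
  threads?: number;
};

const U64_MAX = BigInt("18446744073709551615");

// Hypothetical helper: parse a decimal seed so no precision is silently lost.
function toSeed(text: string): number | bigint {
  const big = BigInt(text);
  if (big < BigInt(0) || big > U64_MAX) {
    throw new RangeError(`seed ${text} is outside the u64 range`);
  }
  // Safe integers stay plain numbers; anything larger must travel as a bigint.
  return big <= BigInt(Number.MAX_SAFE_INTEGER) ? Number(big) : big;
}

const opts: ComputeOptionsLike = { permutations: 2000, seed: toSeed("42"), threads: 0 };
const wideOpts: ComputeOptionsLike = { seed: toSeed("18446744073709551615") };
```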

Ripley options

The Ripley functions (computeRipley, computeRipleySmprofiler, computeRipleyFSmprofiler, computeRipleyTextbook) take an optional trailing { seed?, strategy?, quadMult?, cellsPerBin?, kExact?, nObservations?, nSimulations? } object:

seed — RNG seed for the Monte-Carlo point sampling. Same semantics as the seed field in ComputeOptions. Used by computeRipleyFSmprofiler, all modes of computeRipley / computeRipleySmprofiler, and the F/J modes of computeRipleyTextbook.
strategy — pair-counter used for the K/L statistic (no effect for F/G):
- "exact" (default) — dual-tree, integer counts.
- "quad" — d²-binned dual-tree, drops the per-pair sqrt for a ~30 % faster hot loop at <0.1 % drift on the resampled curve.
- "fft" — grid + autocorrelation, sub-quadratic in N. Small-radius bins are approximate.
- "p3m" — hybrid (exact short-range + FFT long-range).
quadMult — internal-bin oversampling for "quad" (default 64; sub-0.1 % resample drift at no measurable runtime cost). Ignored otherwise.
cellsPerBin — grid resolution (cells per support bin, clamped to ≥ 2) for "fft" / "p3m". Default 4.
kExact — for "p3m", the first kExact bins come from the exact dual-tree. Default 5.
nObservations — number of CSR query points (F mode) and simulated points per simulation. Default 1000 (squidpy n_observations). Used by computeRipleyFSmprofiler and all modes of computeRipley / computeRipleySmprofiler; ignored by computeRipleyTextbook.
nSimulations — number of Monte-Carlo simulations forming the null. Default 100 (squidpy n_simulations). Same scope as nObservations.

```
computeRipley(cells, labels, "L", 50, { seed: 42 });
computeRipleyFSmprofiler(cells, labels, { seed: 7n }); // bigint for full u64
computeRipley(cells, labels, "F", 50, { nObservations: 5000, nSimulations: 500 });
computeRipleyTextbook(cells, labels, "K", 100, { strategy: "quad" });
computeRipleyTextbook(cells, labels, "L", 100, { strategy: "fft", cellsPerBin: 4 });
```

Omit strategy (or the whole options object) to use the default exact dual-tree pair-counter.

Example: random points

Build cells straight from coordinate arrays with cellsFromCoords (no binary buffer needed), label them, then run a few statistics. (Random coordinates and labels have no spatial structure, so expect non-significant p-values.)

```
import {
  cellsFromCoords,
  Neighbors,
  countCells,
  computeAnalyticalMoranI,
  computeMoranI,
  computeGearyC,
  computeNhoodEnrichment,
  computeInteractionMatrix,
  computeRipley,
} from "@nadeemlab/sm-rust";

// 1. Random cells — plain f64 (x, y) coordinate arrays, no encoding step.
const N = 2000;
const xs = Float64Array.from({ length: N }, () => Math.random() * 1000);
const ys = Float64Array.from({ length: N }, () => Math.random() * 1000);
const cells = cellsFromCoords(xs, ys);

// 2. One binary label per cell (1 = "in phenotype"). Here ~30% at random.
const labels = Uint8Array.from({ length: N }, () => (Math.random() < 0.3 ? 1 : 0));

// Moran's I / Geary's C take a continuous attribute (Float64Array) — a 0/1
// phenotype is just the special case, so reuse the labels here.
const values = Float64Array.from(labels);

// 3. Pick a neighbour graph and run the statistics.
const neighbors = Neighbors.knn(6); // or Neighbors.radius(50), Neighbors.delaunay()

const { count, percentage } = countCells(cells, labels);

// Analytical: every p-value flavour from one closed-form call.
const moranA = computeAnalyticalMoranI(cells, values, neighbors);
// → { i, e_i, var_norm, var_rand, z_norm, z_rand,
//     p_norm, p_rand, p_norm_two_sided, p_rand_two_sided } | undefined

// Permutation: every perm-null flavour from one shuffled call.
const moranP = computeMoranI(cells, values, neighbors, { seed: 42 });
// → { i, perm_mean, perm_var, z_sim, p_z_sim, p_sim,
//     p_z_sim_two_sided, p_sim_two_sided } | undefined

// Geary's C has the same shape (`c`/`e_c` instead of `i`/`e_i`).
const gearyP = computeGearyC(cells, values, neighbors, { permutations: 2000 });

// Multi-cluster neighbourhood enrichment + interaction matrix. Pass
// `nClusters`; labels must be in `0..nClusters`. Both return row-major k×k
// flat arrays.
const nClusters = 2;
const nhood = computeNhoodEnrichment(cells, labels, neighbors, nClusters, { seed: 42 });
// → { k, count: Float64Array, zscore: Float64Array } | undefined
const interaction = computeInteractionMatrix(cells, labels, neighbors, nClusters);
// → Float64Array (length k * k) | undefined

const ripleyL = computeRipley(cells, labels, "L", 50, { seed: 42 });
// → { bins, background, phenotype } | undefined

console.log({ count, percentage, moranA, moranP, gearyP, nhood, interaction });
console.log("Ripley L support points:", ripleyL?.bins.length);

// Cells and Neighbors hold wasm memory — free them when done (or use `using`).
cells.free();
neighbors.free();
```

Building from source

The build is dockerized so you don't need a local Rust toolchain.

```
docker compose run --rm wasm-pack
```

This produces pkg/ containing all three targets ready for npm publish.

API reference

Full type declarations live in pkg/bundler/sm_rust.d.ts after a build. Functions whose return type includes undefined (number | undefined, or a result object | undefined) return undefined for a degenerate specimen (too few cells, an empty phenotype, an unknown mode, etc.).

Input handles

Get a Cells handle one of two ways:

cellsFromCoords(xs: Float64Array, ys: Float64Array): Cells — build directly from (x, y) coordinate arrays (f64). The simplest entry point: pair it with a labels: Uint8Array you build yourself (one byte per cell).
parseCells(data: Uint8Array): Cells — parse smprofiler's internal binary cell buffer. Carries a 64-bit phenotype mask per cell that the labelCells* helpers turn into labels. Layout (big-endian, masks little-endian):
- header (20 B): u32 count, u32 minX, u32 maxX, u32 minY, u32 maxY
- per cell (20 B): u32 id, u32 x, u32 y, u64 mask
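If your data does not already come in this binary form, a DataView is enough to produce it. A minimal sketch of the documented layout; `RawCell` and `encodeCellBuffer` are hypothetical names, and filling the header bounds with the coordinate min/max is an assumption about what parseCells expects there.

```typescript
// Hypothetical input record for one cell. The layout stores id, x and y as
// u32, so they must be non-negative integers below 2^32.
interface RawCell {
  id: number;
  x: number;
  y: number;
  mask: bigint; // 64-bit phenotype mask
}

const HEADER_BYTES = 20;
const CELL_BYTES = 20;

// header   = u32 count, minX, maxX, minY, maxY   (big-endian)
// per cell = u32 id, x, y (big-endian), u64 mask  (little-endian)
function encodeCellBuffer(cells: RawCell[]): Uint8Array {
  const buf = new Uint8Array(HEADER_BYTES + CELL_BYTES * cells.length);
  const view = new DataView(buf.buffer);

  // Assumption: the bounds fields hold the min/max of the cell coordinates.
  let minX = Infinity, maxX = -Infinity, minY = Infinity, maxY = -Infinity;
  for (const c of cells) {
    minX = Math.min(minX, c.x); maxX = Math.max(maxX, c.x);
    minY = Math.min(minY, c.y); maxY = Math.max(maxY, c.y);
  }
  if (cells.length === 0) minX = maxX = minY = maxY = 0;

  view.setUint32(0, cells.length, false);
  [minX, maxX, minY, maxY].forEach((v, i) => view.setUint32(4 + 4 * i, v, false));

  cells.forEach((c, i) => {
    const off = HEADER_BYTES + CELL_BYTES * i;
    view.setUint32(off, c.id, false);
    view.setUint32(off + 4, c.x, false);
    view.setUint32(off + 8, c.y, false);
    view.setBigUint64(off + 12, c.mask, true); // masks are little-endian
  });
  return buf;
}
```

The result would then be handed to parseCells(encodeCellBuffer(rawCells)).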

Cells is an opaque handle — call .free() (or use using) to release its wasm memory.

Neighbors — neighbour-graph strategy, built with one of:

Neighbors.knn(k) — k nearest neighbours (excludes self; throws if k == 0).
Neighbors.radius(r) — all cells within Euclidean distance r.
Neighbors.delaunay() — 2D Delaunay edges; co-located cells are orphaned (scipy/squidpy default).
Neighbors.delaunayShareCoplanar() — Delaunay, but co-located cells share neighbours.

Labelling

These derive a labels array from the phenotype mask carried by a parseCells buffer — masks are 64-bit (bigint), and a cell matches when it has all the positive bits and none of the negative bits. (Cells made with cellsFromCoords have no mask, so build labels yourself instead.)

labelCellsBinary(cells, positiveMask, negativeMask): Uint8Array — 0/1 label per cell (1 = matches). Consumed by the autocorrelation, enrichment, Ripley, co-occurrence and count functions.
labelCellsTwoGroup(cells, posA, negA, posB, negB): Uint8Array — 2-bit label per cell: bit 0 = in group A, bit 1 = in group B (a cell in both → 3). Consumed by computeProximityBinary and the NN-distance functions.
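The matching rule is plain bitwise logic, so it can be reproduced on the JS side, for example to sanity-check a mask choice. This is a reference for the rule as documented, not the package's code; `matches`, `binaryLabels` and `twoGroupLabels` are hypothetical names.

```typescript
// A cell matches when it has every positive bit and none of the negative bits.
function matches(mask: bigint, positive: bigint, negative: bigint): boolean {
  return (mask & positive) === positive && (mask & negative) === BigInt(0);
}

// 0/1 label per cell, as described for labelCellsBinary.
function binaryLabels(masks: bigint[], pos: bigint, neg: bigint): Uint8Array {
  return new Uint8Array(masks.map((m) => (matches(m, pos, neg) ? 1 : 0)));
}

// 2-bit label per cell, as described for labelCellsTwoGroup:
// bit 0 = in group A, bit 1 = in group B, so a cell in both gets 3.
function twoGroupLabels(
  masks: bigint[], posA: bigint, negA: bigint, posB: bigint, negB: bigint,
): Uint8Array {
  return new Uint8Array(
    masks.map((m) => (matches(m, posA, negA) ? 1 : 0) | (matches(m, posB, negB) ? 2 : 0)),
  );
}
```

For instance, with a hypothetical marker A on bit 0 and marker B on bit 1, "A+ B-" is positive mask 1 with negative mask 2.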

Counts & distances

countCells(cells, labels): { count, percentage } — number and percent (0–100) of cells with label != 0.
computeMeanNNDistance(cells, labels): number / computeMedianNNDistance(cells, labels): number — A→B nearest-neighbour distance summary (two-group labels).
computeProximityBinary(cells, labels, neighbors): number — fraction of group-A cells with at least one group-B neighbour (two-group labels).
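For small specimens, the A→B nearest-neighbour summary can be cross-checked by brute force. A sketch under stated assumptions: labels use the two-group encoding above, a cell in both groups is not counted as its own B neighbour, and A cells with no B cell at all are skipped. `nnDistancesAtoB`, `mean` and `median` are hypothetical helpers.

```typescript
// O(nA * nB) scan: for every group-A cell, distance to the closest group-B cell.
function nnDistancesAtoB(xs: Float64Array, ys: Float64Array, labels: Uint8Array): number[] {
  const out: number[] = [];
  for (let i = 0; i < labels.length; i++) {
    if ((labels[i] & 1) === 0) continue; // not in group A
    let best = Infinity;
    for (let j = 0; j < labels.length; j++) {
      if (j === i || (labels[j] & 2) === 0) continue; // skip self and non-B cells
      best = Math.min(best, Math.hypot(xs[i] - xs[j], ys[i] - ys[j]));
    }
    if (best < Infinity) out.push(best);
  }
  return out;
}

function mean(v: number[]): number {
  return v.reduce((s, x) => s + x, 0) / v.length;
}

function median(v: number[]): number {
  const s = v.slice().sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}
```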

Spatial autocorrelation

All take a continuous per-cell attribute as values: Float64Array (a 0/1 phenotype is the special case — Float64Array.from(labels)). Each statistic has an analytic call (microseconds, closed-form) and a permutation call (seconds, full Fisher-Yates null) — pick one based on whether the attribute is well-approximated by a normal.

Analytical (esda.Moran(..., permutations=0) / esda.Geary(...)) — returns the full closed-form result. The normal-variance approximation is mis-calibrated for sparse / non-normal attributes (e.g. a 0/1 phenotype indicator) — prefer the permutation versions there.

computeAnalyticalMoranI(cells, values, neighbors): MoranAnalytic | undefined

computeAnalyticalGearyC(cells, values, neighbors): GearyAnalytic | undefined

```
type MoranAnalytic = {
  i: number;                   // observed I (matches esda.Moran.I)
  e_i: number;                 // -1 / (n - 1)
  var_norm: number;            // normality-assumption variance
  var_rand: number;            // randomization-assumption variance
                               // (Cliff–Ord 1981, kurtosis-corrected)
  z_norm: number; z_rand: number;
  p_norm: number; p_rand: number;                 // one-sided
  p_norm_two_sided: number; p_rand_two_sided: number;
};
// GearyAnalytic is the same shape with `c` / `e_c` (always 1.0) in place
// of `i` / `e_i`.
```
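To see what the `i` and `c` fields measure, here is a small reference computation using row-standardised weights (w_ij = 1 / degree(i)), the weighting the batch calls are described as building. It is a sketch for cross-checking tiny cases, not the package's implementation; `adj[i]` is a hypothetical plain neighbour list rather than a Neighbors handle.

```typescript
// Moran's I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean.
function moranI(values: number[], adj: number[][]): number {
  const n = values.length;
  const mean = values.reduce((s, v) => s + v, 0) / n;
  const z = values.map((v) => v - mean);
  let num = 0;
  let wSum = 0;
  adj.forEach((nbrs, i) => {
    for (const j of nbrs) {
      const w = 1 / nbrs.length; // row-standardised weight
      num += w * z[i] * z[j];
      wSum += w;
    }
  });
  const den = z.reduce((s, v) => s + v * v, 0);
  return (n / wSum) * (num / den);
}

// Geary's C = (n - 1) * sum_ij w_ij (x_i - x_j)^2 / (2 W sum_i z_i^2).
function gearyC(values: number[], adj: number[][]): number {
  const n = values.length;
  const mean = values.reduce((s, v) => s + v, 0) / n;
  let num = 0;
  let wSum = 0;
  adj.forEach((nbrs, i) => {
    for (const j of nbrs) {
      const w = 1 / nbrs.length;
      num += w * (values[i] - values[j]) ** 2;
      wSum += w;
    }
  });
  const den = values.reduce((s, v) => s + (v - mean) ** 2, 0);
  return ((n - 1) * num) / (2 * wSum * den);
}

// E[I] under the null, as listed for e_i.
const expectedI = (n: number): number => -1 / (n - 1);
```

On a 4-cycle with alternating values [1, 0, 1, 0], every neighbour pair disagrees, so I = -1 (perfect dispersion) and C = 1.5, above its null expectation of 1.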

Permutation (esda.Moran(...) / esda.Geary(...) matching the serial path) — returns the full perm-null result. Accepts the ComputeOptions object.

computeMoranI(cells, values, neighbors, options?): MoranPermutation | undefined

computeGearyC(cells, values, neighbors, options?): GearyPermutation | undefined

```
type MoranPermutation = {
  i: number;                   // observed I
  perm_mean: number; perm_var: number;  // in I-space
  z_sim: number;               // signed z-score (matches esda.Moran.z_sim)
  p_z_sim: number; p_sim: number;        // one-sided; p_sim is rank-based
  p_z_sim_two_sided: number; p_sim_two_sided: number;
};
```
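The permutation fields can be understood through a small sketch of a label-shuffle null: a Fisher-Yates shuffle of the attribute, a statistic recomputed per shuffle, and a summary modelled on esda's formulas (population variance, signed z-score, folded-rank pseudo p-value (larger + 1) / (permutations + 1)). The mulberry32 generator is a stand-in assumption, so results will not match the package's seeded stream; `stat` can be any statistic of the shuffled attribute, for example a Moran's I closure over a fixed graph. All names here are hypothetical.

```typescript
// Stand-in seeded PRNG (mulberry32); NOT the stream the package uses.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Fisher-Yates: every permutation is equally likely given a uniform rand().
function shuffleInPlace(v: Float64Array, rand: () => number): void {
  for (let i = v.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1)); // 0 <= j <= i
    const tmp = v[i];
    v[i] = v[j];
    v[j] = tmp;
  }
}

interface PermSummary { permMean: number; permVar: number; zSim: number; pSim: number }

function summarizeNull(observed: number, sims: number[]): PermSummary {
  const m = sims.length;
  const permMean = sims.reduce((s, v) => s + v, 0) / m;
  const permVar = sims.reduce((s, v) => s + (v - permMean) ** 2, 0) / m; // ddof = 0
  // Count simulations at least as large as observed, then fold to the smaller tail.
  let larger = sims.filter((v) => v >= observed).length;
  if (m - larger < larger) larger = m - larger;
  return {
    permMean,
    permVar,
    zSim: (observed - permMean) / Math.sqrt(permVar), // signed
    pSim: (larger + 1) / (m + 1), // rank-based pseudo p-value
  };
}

function permutationNull(
  values: number[], stat: (v: Float64Array) => number, permutations: number, seed: number,
): PermSummary {
  const observed = stat(Float64Array.from(values));
  const rand = mulberry32(seed);
  const work = Float64Array.from(values);
  const sims: number[] = [];
  for (let p = 0; p < permutations; p++) {
    shuffleInPlace(work, rand);
    sims.push(stat(work));
  }
  return summarizeNull(observed, sims);
}
```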

Raw statistics (no null model) — the I / C value itself (matching esda.Moran(...).I / esda.Geary(...).C), without z-scoring or a p-value:

computeMoranIStatistic(cells, values, neighbors): number | undefined
computeGearyCStatistic(cells, values, neighbors): number | undefined

Batch (many attributes, one graph) — builds the row-standardised adjacency once and evaluates every attribute against it (parallel across cores in the Node build). The fast path for scoring a whole expression matrix: passing nAttrs columns is much cheaper than that many single-attribute calls. Pair this with fdrBh for the "compute N tests, then BH-correct" workflow.

computeMoranIStatistics(cells, attrs: Float64Array, nAttrs: number, neighbors): (number | null)[]
computeGearyCStatistics(cells, attrs: Float64Array, nAttrs: number, neighbors): (number | null)[]
attrs is a flat row-major buffer of length nAttrs · nCells; attribute k occupies attrs[k·nCells .. (k+1)·nCells]. Returns one value per attribute (null for a constant or degenerate column). If the buffer length doesn't match nAttrs · nCells, every entry is null.
```
const flat = new Float64Array(nAttrs * nCells);
// ... fill flat[k * nCells + i] = value of attribute k at cell i
const moranPerAttr = computeMoranIStatistics(cells, flat, nAttrs, neighbors);
```
computeAnalyticalMoranIBatch(cells, attrs: Float64Array, nAttrs: number, neighbors): (MoranAnalytic | null)[]
computeAnalyticalGearyCBatch(cells, attrs: Float64Array, nAttrs: number, neighbors): (GearyAnalytic | null)[]
Closed-form full result per attribute — the squidpy gr.spatial_autocorr(mode="moran" | "geary") workflow across a whole expression matrix. Same flat-row-major layout as the statistic-only batch; pair the resulting p_norm column with fdrBh for the ranked, FDR-corrected table.
```
const rows = computeAnalyticalMoranIBatch(cells, flat, nGenes, neighbors);
const pNorm = rows.map(r => r?.p_norm ?? null);
const qNorm = fdrBh(pNorm);
// Sort genes by rows[k].i desc for the squidpy-style ranked table.
```
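Note that the batch layout is attribute-major, which is the transpose of the row-major (nCells, nGenes) expression layout that computeLigrec and computeSepal take. A small sketch for converting one into the other; `toAttributeMajor` is a hypothetical helper name.

```typescript
// (nCells, nGenes) row-major -> attribute k at out[k * nCells .. (k + 1) * nCells].
function toAttributeMajor(cellMajor: Float64Array, nCells: number, nGenes: number): Float64Array {
  if (cellMajor.length !== nCells * nGenes) {
    throw new RangeError("expected nCells * nGenes values");
  }
  const out = new Float64Array(nCells * nGenes);
  for (let i = 0; i < nCells; i++) {
    for (let g = 0; g < nGenes; g++) {
      out[g * nCells + i] = cellMajor[i * nGenes + g];
    }
  }
  return out;
}
```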

Multiple-testing correction

fdrBh(pValues: (number | null)[]): (number | null)[] — Benjamini–Hochberg FDR correction. null entries pass through (treated as missing and dropped from the rank denominator). Equivalent to statsmodels.stats.multitest.multipletests(p, method='fdr_bh')[1].
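```
const rows = computeAnalyticalMoranIBatch(cells, flat, nAttrs, neighbors);
// A raw I from computeMoranIStatistics has no null model, so take a p-value
// flavour from the analytic batch instead (null for degenerate columns).
const pVals = rows.map(r => r?.p_rand ?? null);
const qVals = fdrBh(pVals);
```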
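The step-up procedure behind fdrBh is short enough to reproduce, which helps when checking q-values by hand. The sketch follows the contract described above (null entries pass through and do not count towards m); it is a reference, not the package's source, and `bhAdjust` is a hypothetical name.

```typescript
function bhAdjust(pValues: (number | null)[]): (number | null)[] {
  // Keep the non-null p-values with their original positions, sorted ascending.
  const idx = pValues
    .map((p, i) => ({ p, i }))
    .filter((e): e is { p: number; i: number } => e.p !== null)
    .sort((a, b) => a.p - b.p);
  const m = idx.length;
  const out: (number | null)[] = pValues.map(() => null);
  // q at rank r is min over ranks >= r of p * m / rank, capped at 1.
  let running = 1;
  for (let r = m - 1; r >= 0; r--) {
    running = Math.min(running, (idx[r].p * m) / (r + 1));
    out[idx[r].i] = running;
  }
  return out;
}
```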

Neighbourhood enrichment

Multi-cluster: labels are categorical (cluster id in [0, nClusters)). Both calls return the full k × k matrix.

computeNhoodEnrichment(cells, labels, neighbors, nClusters, options?): { k, count, zscore } | undefined — permutation z-score matrix (squidpy parity on the serial path). count and zscore are row-major Float64Arrays of length k * k; entry a * k + b is the (source=a, target=b) pair. Matrix entries whose null variance collapsed hold NaN in zscore. For the legacy 2-cluster scalar value, read entry 0 * k + 1 and apply Φ (the standard normal CDF) on the JS side. Accepts ComputeOptions.
computeAnalyticalNhoodEnrichment(cells, labels, neighbors, nClusters): { k, count, zscore } | undefined — same shape, closed-form (no permutations).
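Reading a pair out of these matrices and applying Φ needs a normal CDF, which JavaScript does not provide. A sketch using the Abramowitz–Stegun 7.1.26 erf approximation (absolute error below about 1.5e-7); `entry` and `phi` are hypothetical helpers.

```typescript
// Entry (source a, target b) of a row-major k × k Float64Array.
function entry(flat: Float64Array, k: number, a: number, b: number): number {
  if (a < 0 || a >= k || b < 0 || b >= k) throw new RangeError("cluster index out of range");
  return flat[a * k + b];
}

// Standard normal CDF via Abramowitz–Stegun 7.1.26 (NaN stays NaN).
function phi(z: number): number {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}
```

With a result `nhood`, the legacy scalar would be phi(entry(nhood.zscore, nhood.k, 0, 1)).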

Interaction matrix

computeInteractionMatrix(cells, labels, neighbors, nClusters): Float64Array | undefined — observed k × k directed-edge count matrix (matches sq.gr.interaction_matrix(..., normalized=False)). Row-major; entry a * k + b counts edges from a label-a source to a label-b target.
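The count semantics can be reproduced on a toy graph, and the counts can be row-normalised on the JS side. Our understanding is that squidpy's normalized=True divides each row by its sum; the sketch below does that and leaves empty rows as NaN, as a plain division would. `adj[i]` is a hypothetical neighbour list and both function names are illustrative.

```typescript
// Count each listed directed edge i -> j into cell (label[i], label[j]).
function interactionCounts(labels: Uint8Array, adj: number[][], k: number): Float64Array {
  const out = new Float64Array(k * k);
  adj.forEach((nbrs, i) => {
    for (const j of nbrs) out[labels[i] * k + labels[j]] += 1;
  });
  return out;
}

// Divide each row of a row-major k × k matrix by its sum.
function rowNormalize(counts: Float64Array, k: number): Float64Array {
  const out = new Float64Array(k * k);
  for (let a = 0; a < k; a++) {
    let rowSum = 0;
    for (let b = 0; b < k; b++) rowSum += counts[a * k + b];
    for (let b = 0; b < k; b++) out[a * k + b] = rowSum > 0 ? counts[a * k + b] / rowSum : NaN;
  }
  return out;
}
```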

Co-occurrence

Both calls take nClusters and return either the full k × k matrix (single bin) or a (k, k, nSteps - 1) array (curve).

computeCoOccurrence(cells, labels, neighbors, nClusters): Float64Array | undefined — row-major k × k single-bin enrichment matrix (every neighbour edge counted, no distance binning). Entry a * k + b is the enrichment of cluster a around cluster b.
computeCoOccurrenceCurve(cells, labels, neighbors, nClusters, nSteps): { k, interval, occ } | undefined — graph-based enrichment curves over nBins = nSteps - 1 cumulative distance bins. interval is linspace(minEdgeDist, maxEdgeDist, nSteps); occ is a flat row-major (k, k, nBins) array indexed (a * k + b) * nBins + r, where bin r carries the cumulative threshold interval[r + 1]. The last bin equals computeCoOccurrence entry-for-entry. Reproduces squidpy.gr.co_occurrence only for the radius strategy; KNN / Delaunay give a graph-restricted curve.
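Extracting one cluster pair's curve is a matter of applying that index formula. A sketch; `CoOccurrenceCurveLike` is a hypothetical structural type standing in for the real return type.

```typescript
interface CoOccurrenceCurveLike {
  k: number;
  interval: ArrayLike<number>; // length nSteps
  occ: ArrayLike<number>;      // length k * k * (nSteps - 1)
}

// Curve of cluster a around cluster b, each bin paired with its threshold.
function pairCurve(res: CoOccurrenceCurveLike, a: number, b: number): { threshold: number; value: number }[] {
  const nBins = res.interval.length - 1;
  const base = (a * res.k + b) * nBins;
  const out: { threshold: number; value: number }[] = [];
  for (let r = 0; r < nBins; r++) {
    out.push({ threshold: res.interval[r + 1], value: res.occ[base + r] });
  }
  return out;
}
```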

Centrality scores

computeCentralityScores(cells, labels, neighbors, nClusters): { k, degree_centrality, average_clustering, closeness_centrality } | undefined — per-cluster graph-centrality summary (matches sq.gr.centrality_scores). Each field is a Float64Array of length k; entries for empty clusters are NaN. degree_centrality and closeness_centrality are NetworkX group centralities (fraction of non-cluster nodes adjacent to the cluster / reciprocal mean distance from non-cluster nodes to the cluster); average_clustering is the mean per-cell clustering coefficient over the cluster's cells. The adjacency is symmetrized first so KNN's directed asymmetry doesn't bias the result.
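As a worked illustration of the degree_centrality field, the sketch below symmetrises a directed neighbour list and applies the NetworkX group degree centrality definition (outside nodes adjacent to the cluster, divided by the number of outside nodes). It is not the package's code; returning NaN when the cluster is empty or covers every node is a choice made here. Both function names are hypothetical.

```typescript
// Add the reverse of every edge so a KNN-style directed graph becomes undirected.
function symmetrize(adj: number[][]): number[][] {
  const sets = adj.map((nbrs) => new Set<number>(nbrs));
  adj.forEach((nbrs, i) => nbrs.forEach((j) => sets[j].add(i)));
  return sets.map((s) => Array.from(s));
}

// |N(S) \ S| / (n - |S|) for the cluster S = { i : labels[i] === cluster }.
function groupDegreeCentrality(labels: Uint8Array, adj: number[][], cluster: number): number {
  const n = labels.length;
  let size = 0;
  const touched = new Set<number>();
  for (let i = 0; i < n; i++) {
    if (labels[i] !== cluster) continue;
    size++;
    for (const j of adj[i]) if (labels[j] !== cluster) touched.add(j);
  }
  return size === 0 || size === n ? NaN : touched.size / (n - size);
}
```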

Ligand-receptor

computeLigrec(expression, nCells, nGenes, clusterLabels, nClusters, interactions, threshold, options?): { n_lr, k, means, pvalues } | undefined — ligand-receptor permutation test matching squidpy.gr.ligrec (CellPhoneDB convention). For each LR pair (l, r) and each cluster pair (a, b), the observed score is (mu[a, l] + mu[b, r]) / 2, where mu[c, g] is the per-cluster mean of gene g; per-cluster gene means at or below threshold are zeroed before averaging. The p-value is the fraction of cluster-label permutations whose permuted score is >= observed.
Inputs:
- expression: Float64Array — row-major (nCells, nGenes).
- clusterLabels: Uint8Array — cluster id per cell, in 0..nClusters.
- interactions: Uint32Array — flat [src0, tgt0, src1, tgt1, …] of LR gene indices (src = ligand, tgt = receptor); length must be a multiple of 2.
- threshold: number — gene-mean cutoff (squidpy default 0).
- options — ComputeOptions. The kernel only aggregates the unique genes referenced by any LR pair, so a wide expression matrix is cheap as long as the pair list touches a modest gene subset.
Output: both means and pvalues are row-major (nLr, k, k) flat Float64Arrays; entry lr * k * k + a * k + b is the score / p-value for LR pair lr with source cluster a and target cluster b. Reproduces sq.gr.ligrec to f64 epsilon on observed means; permutation p-values agree statistically (different RNG streams) but match exactly when the cluster-gene affinity is sharp enough to drive most outcomes to 0 or 1.
```
const interactions = Uint32Array.from([0, 1, 1, 2, 2, 0]); // 3 LR pairs
const lig = computeLigrec(
  expression, nCells, nGenes, clusterLabels, nClusters,
  interactions, 0.0, { permutations: 1000, seed: 42 },
);
// lig.means / lig.pvalues are row-major (n_lr=3, k, k) Float64Arrays.
```
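The observed means follow directly from the description above, so they can be recomputed for a handful of pairs to check indexing. A sketch of the observed scores only (no permutation test); treating an empty cluster's mean as 0 is an assumption, and `ligrecObservedMeans` is a hypothetical name.

```typescript
function ligrecObservedMeans(
  expression: Float64Array, nCells: number, nGenes: number,
  clusterLabels: Uint8Array, k: number, interactions: Uint32Array, threshold: number,
): Float64Array {
  if (interactions.length % 2 !== 0) throw new RangeError("interactions must hold pairs");
  // Per-cluster gene sums and cluster sizes.
  const sums = new Float64Array(k * nGenes);
  const counts = new Float64Array(k);
  for (let i = 0; i < nCells; i++) {
    const c = clusterLabels[i];
    counts[c] += 1;
    for (let g = 0; g < nGenes; g++) sums[c * nGenes + g] += expression[i * nGenes + g];
  }
  // mu[c, g], zeroed when at or below threshold.
  const mu = sums.map((s, idx) => {
    const c = Math.floor(idx / nGenes);
    const m = counts[c] > 0 ? s / counts[c] : 0;
    return m > threshold ? m : 0;
  });
  // Row-major (nLr, k, k): (mu[a, ligand] + mu[b, receptor]) / 2.
  const nLr = interactions.length / 2;
  const out = new Float64Array(nLr * k * k);
  for (let lr = 0; lr < nLr; lr++) {
    const l = interactions[2 * lr];
    const r = interactions[2 * lr + 1];
    for (let a = 0; a < k; a++) {
      for (let b = 0; b < k; b++) {
        out[lr * k * k + a * k + b] = (mu[a * nGenes + l] + mu[b * nGenes + r]) / 2;
      }
    }
  }
  return out;
}
```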

Sepal

Spatially variable gene ranking by simulated diffusion (squidpy.gr.sepal). For each gene column, the per-cell expression is treated as a concentration on a regular grid; the kernel iterates a discrete Laplacian diffusion step (5-point square stencil for maxNeighs = 4, 7-point hex for maxNeighs = 6) until the per-step entropy delta on the saturated cells drops to or below thresh. The score is dt · iterations_to_converge — the diffusion time needed to wash out the spatial structure. Higher score = more spatial structure.

computeSepal(cells, expression, nCells, nGenes, neighbors, maxNeighs, options?): { n_genes, scores } | undefined — expression is row-major (nCells, nGenes) Float64Array. maxNeighs is 4 (square / ST / Dbit-seq) or 6 (hex / Visium); the supplied neighbour graph must have max degree exactly equal to it, otherwise undefined (matches squidpy's ValueError). Returns one score per gene; genes that didn't converge within nIter iterations hold NaN.
```
type SepalOptions = {
  nIter?: number;   // default 30000 (squidpy)
  dt?: number;      // default 0.001 (squidpy)
  thresh?: number;  // default 1e-8 (squidpy)
};
```
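The convergence rule can be illustrated with a toy version on a full square grid: explicit 5-point Laplacian steps with zero-flux edges, stopping once the entropy change per step is at most thresh, and scoring dt · iterations. This is a simplified sketch, not squidpy's or the package's kernel: every grid cell is treated as saturated, and the default dt here is 0.1 rather than 0.001 so the toy converges in few steps (with unit spacing the explicit scheme is stable for dt ≤ 0.25). Function names are hypothetical.

```typescript
// Shannon entropy of the concentration normalised to sum to 1.
function entropy(u: Float64Array): number {
  let total = 0;
  for (let i = 0; i < u.length; i++) total += u[i];
  let h = 0;
  for (let i = 0; i < u.length; i++) {
    const p = u[i] / total;
    if (p > 0) h -= p * Math.log(p);
  }
  return h;
}

function toySepalScore(
  grid: Float64Array, width: number, height: number,
  dt = 0.1, thresh = 1e-8, nIter = 30000,
): number {
  let u: Float64Array = Float64Array.from(grid);
  let prevH = entropy(u);
  for (let it = 1; it <= nIter; it++) {
    const next = new Float64Array(u.length);
    for (let y = 0; y < height; y++) {
      for (let x = 0; x < width; x++) {
        const i = y * width + x;
        let lap = 0; // 5-point stencil; missing neighbours contribute no flux
        if (x > 0) lap += u[i - 1] - u[i];
        if (x < width - 1) lap += u[i + 1] - u[i];
        if (y > 0) lap += u[i - width] - u[i];
        if (y < height - 1) lap += u[i + width] - u[i];
        next[i] = u[i] + dt * lap;
      }
    }
    u = next;
    const h = entropy(u);
    if (Math.abs(h - prevH) <= thresh) return dt * it; // diffusion time to converge
    prevH = h;
  }
  return NaN; // did not converge within nIter
}
```

A flat field is already "washed out" and scores a single step, while a concentrated spike needs many steps, matching "higher score = more spatial structure".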

Ripley

All four accept the RipleyOptions trailing object: seed controls the Monte-Carlo sampling, strategy (+ its per-strategy knobs) selects the K/L pair-counter ("exact" | "quad" | "fft" | "p3m").

computeRipley(cells, labels, mode, nSteps, options?): { bins, background, phenotype } | undefined — per-cluster p-value curves; mode "F" | "G" | "L", two-tailed (squidpy gr.ripley).
computeRipleySmprofiler(cells, labels, mode, nSteps, options?): { bins, background, phenotype } | undefined — same shape, one-tailed (smprofiler ripley_custom).
computeRipleyFSmprofiler(cells, labels, options?): number | undefined — smprofiler F scalar summary.
computeRipleyTextbook(cells, labels, mode, nSteps, options?): { support, statistic } | undefined — observed statistic curve over the phenotype point set; mode "K" | "L" | "F" | "G" | "J" (pysal/pointpats-faithful). seed affects the "F" and "J" modes only.
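For intuition about the K and L modes, here is a brute-force Ripley K without edge correction, K(r) = A · #{ordered pairs i ≠ j with d_ij ≤ r} / (n(n - 1)), and L(r) = sqrt(K(r) / π), which is close to r under complete spatial randomness. pointpats and squidpy use their own estimators and corrections, so treat this only as a sketch. Comparing squared distances with r² avoids a sqrt per pair, the same idea the "quad" strategy is described as exploiting. Names are hypothetical.

```typescript
function ripleyK(xs: number[], ys: number[], area: number, radii: number[]): number[] {
  const n = xs.length;
  return radii.map((r) => {
    const r2 = r * r;
    let pairs = 0;
    for (let i = 0; i < n; i++) {
      for (let j = 0; j < n; j++) {
        if (i === j) continue;
        const dx = xs[i] - xs[j];
        const dy = ys[i] - ys[j];
        if (dx * dx + dy * dy <= r2) pairs++; // no sqrt needed
      }
    }
    return (area * pairs) / (n * (n - 1));
  });
}

// L(r) = sqrt(K(r) / π).
const ripleyL = (k: number[]): number[] => k.map((v) => Math.sqrt(v / Math.PI));
```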

Niche detection

Spatial-niche assignment matching the four flavours of squidpy.gr.calculate_niche. Each call builds a symmetric binary spatial adjacency from the supplied cells + neighbors (the same way every other compute* does), feeds it through the flavour-specific pre-clustering pipeline, and emits per-cell cluster ids. The pipelines are pure Rust — no scanpy / igraph / leidenalg / sklearn dependency. Cluster ids 4294967295 (u32::MAX) mark not_a_niche (cells excluded by mask / minNicheSize).

Outputs come in two shapes:

NicheLeidenResult { nResolutions, niches } — one Leiden run per resolution. niches is a flat Uint32Array of length nResolutions · nCells (concatenated row-major). Used by computeNicheNeighborhood and computeNicheUtag.
NicheResult { niches } — one cluster id per cell. Used by computeNicheCellcharter and computeNicheSpatialleiden.
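Unpacking either shape is straightforward; the sketch below splits the flat NicheLeidenResult array into one row per resolution and maps the u32::MAX sentinel to null. A NicheResult is the nResolutions = 1 case. `splitNiches` is a hypothetical helper.

```typescript
const NOT_A_NICHE = 4294967295; // u32::MAX marks not_a_niche

function splitNiches(niches: Uint32Array, nResolutions: number): (number | null)[][] {
  const nCells = niches.length / nResolutions;
  if (!Number.isInteger(nCells)) throw new RangeError("length is not a multiple of nResolutions");
  const rows: (number | null)[][] = [];
  for (let r = 0; r < nResolutions; r++) {
    const row: (number | null)[] = [];
    for (let i = 0; i < nCells; i++) {
      const id = niches[r * nCells + i];
      row.push(id === NOT_A_NICHE ? null : id);
    }
    rows.push(row);
  }
  return rows;
}
```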

All four take an optional seed (default 42) replacing squidpy's random_state.

computeNicheNeighborhood(cells, labels, neighbors, groups, nGroups, options?): NicheLeidenResult — flavor='neighborhood'. Per-cell category-frequency profile (optionally z-scored, optionally absolute counts, optionally summed over n-hop adjacency with per-hop weights), wrapped in a UMAP fuzzy KNN graph in feature space, then Leiden-clustered once per resolution. groups[i] ∈ [0, nGroups) is the per-cell categorical label; cells with out-of-range labels are masked out as not_a_niche.

```
type NicheNeighborhoodOptions = {
  seed?: number | bigint;
  resolutions?: number[];   // default [0.5]
  nNeighbors?: number;      // sc.pp.neighbors n_neighbors. default 15
  scale?: boolean;          // z-score the profile. default true
  absNhood?: boolean;       // raw counts vs relative freqs. default false
  distance?: number;        // n-hop horizon. default 1
  nHopWeights?: number[];   // per-hop weights when distance > 1
  minNicheSize?: number;    // clusters smaller than this → not_a_niche
  mask?: boolean[];         // false cells excluded from clustering
};
```
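The per-cell profile at the heart of this flavour can be sketched for the default settings (distance 1, relative frequencies, z-scored): the fraction of each group among a cell's neighbours, then each column standardised. Assumptions: unit-weight neighbour lists, population standard deviation (other tools may use n - 1), and constant columns set to 0; the UMAP graph and Leiden steps are not reproduced. `neighborhoodProfile` is a hypothetical name.

```typescript
// Returns a row-major (nCells, nGroups) profile.
function neighborhoodProfile(groups: Uint8Array, nGroups: number, adj: number[][]): Float64Array {
  const n = groups.length;
  const prof = new Float64Array(n * nGroups);
  // Relative frequency of each group among the cell's neighbours.
  adj.forEach((nbrs, i) => {
    for (const j of nbrs) prof[i * nGroups + groups[j]] += 1;
    if (nbrs.length > 0) for (let g = 0; g < nGroups; g++) prof[i * nGroups + g] /= nbrs.length;
  });
  // z-score each column.
  for (let g = 0; g < nGroups; g++) {
    let mean = 0;
    for (let i = 0; i < n; i++) mean += prof[i * nGroups + g];
    mean /= n;
    let v = 0;
    for (let i = 0; i < n; i++) v += (prof[i * nGroups + g] - mean) ** 2;
    const sd = Math.sqrt(v / n);
    for (let i = 0; i < n; i++) {
      prof[i * nGroups + g] = sd > 0 ? (prof[i * nGroups + g] - mean) / sd : 0;
    }
  }
  return prof;
}
```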

computeNicheUtag(cells, labels, neighbors, x, nFeatures, options?): NicheLeidenResult — flavor='utag'. Row-L1-normalises the spatial adjacency, multiplies by the expression matrix (x is row-major (nCells, nFeatures)), PCA-reduces the resulting smoothed feature matrix, builds a fuzzy KNN graph in PCA space, Leiden once per resolution.
```
type NicheUtagOptions = {
  seed?: number | bigint;
  resolutions?: number[];   // default [0.5]
  nNeighbors?: number;      // default 15
};
```
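The utag smoothing step amounts to replacing each cell's features by the mean over its neighbours (a row-L1-normalised adjacency times X). A sketch assuming unit edge weights and the adjacency exactly as given (whether self-loops are added is not stated above); PCA, the fuzzy KNN graph and Leiden are omitted. `utagSmooth` is a hypothetical name.

```typescript
// x is row-major (nCells, nFeatures); adj[i] lists the neighbours of cell i.
function utagSmooth(x: Float64Array, nCells: number, nFeatures: number, adj: number[][]): Float64Array {
  const out = new Float64Array(nCells * nFeatures);
  for (let i = 0; i < nCells; i++) {
    const nbrs = adj[i];
    if (nbrs.length === 0) continue; // an all-zero adjacency row stays zero
    for (const j of nbrs) {
      for (let f = 0; f < nFeatures; f++) {
        out[i * nFeatures + f] += x[j * nFeatures + f] / nbrs.length;
      }
    }
  }
  return out;
}
```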
computeNicheCellcharter(cells, labels, neighbors, x, nFeatures, options?): NicheResult — flavor='cellcharter'. For k = 0..distance, builds normalize(adj^k) @ X (mean or variance aggregation matching squidpy's _aggregate), concatenates the blocks along the feature dimension, PCA-reduces, then clusters with a Gaussian Mixture Model into nComponents niches. The per-hop adjacency uses the squidpy _hop "first-visit" semantics (each (i, j) appears in exactly one hop). Optionally accepts a pre-computed embedding via useRep (matches squidpy's use_rep knob — e.g. a scVI embedding) which bypasses the PCA step.
```
type NicheCellcharterOptions = {
  seed?: number | bigint;
  distance?: number;        // n-hop horizon. default 3
  aggregation?: "mean" | "variance";  // default "mean"
  nComponents?: number;     // GMM components. default 10
  useRep?: Float64Array;    // (nCells, nRepFeatures) row-major
  nRepFeatures?: number;
};
```
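The "first-visit" hop semantics correspond to breadth-first layers: hop h for cell i holds the cells whose shortest-path distance from i is exactly h, so each (i, j) lands in a single hop, and hop 0 is the cell itself (the adj^0 = identity block). A sketch of those semantics, not squidpy's _hop code; `hopLayers` is a hypothetical name.

```typescript
// layers[h][i] = cells first reached from i at hop h.
function hopLayers(adj: number[][], distance: number): number[][][] {
  const n = adj.length;
  const layers: number[][][] = Array.from({ length: distance + 1 }, () =>
    Array.from({ length: n }, () => [] as number[]),
  );
  for (let i = 0; i < n; i++) {
    const seen = new Set<number>([i]);
    let frontier = [i];
    layers[0][i].push(i);
    for (let h = 1; h <= distance && frontier.length > 0; h++) {
      const next: number[] = [];
      for (const u of frontier) {
        for (const v of adj[u]) {
          if (!seen.has(v)) {
            seen.add(v); // first visit fixes the hop for (i, v)
            next.push(v);
          }
        }
      }
      layers[h][i] = next;
      frontier = next;
    }
  }
  return layers;
}
```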
computeNicheSpatialleiden(cells, labels, neighbors, latentRows, latentCols, latentVals, options?): NicheResult — flavor='spatialleiden'. Multiplex Leiden over two layers: the spatial connectivity (built from cells/neighbors like the other flavours) and a latent connectivity matrix supplied as COO triplets (latentRows, latentCols, latentVals). The latent layer is typically the output of sc.pp.neighbors on a feature embedding: export obsp['connectivities'].tocoo() and pass its row and col arrays as Uint32Arrays and its data array as a Float64Array. Both layers contribute to a sum of RB-configuration modularities; the spatial layer's weight is scaled by layerRatio (squidpy semantics: higher → spatially homogeneous).
```
type NicheSpatialleidenOptions = {
  seed?: number | bigint;
  latentResolution?: number;   // default 1.0
  spatialResolution?: number;  // default 1.0
  layerRatio?: number;         // spatial weight scale. default 1.0
  useWeights?: [boolean, boolean];  // [latent, spatial]. default [true, true]
};
```
```
const niches = computeNicheSpatialleiden(
  cells, labels, neighbors,
  latentRows, latentCols, latentVals,
  { latentResolution: 0.8, spatialResolution: 0.8, layerRatio: 1.0, seed: 42 },
);
// niches.niches is a Uint32Array of length nCells.
```
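If the latent graph is built on the JS side rather than exported from scanpy, the triplets can be assembled from an undirected weighted edge list. A sketch assuming each edge is listed once and that the latent layer should be symmetric like sc.pp.neighbors connectivities; `toCoo` is a hypothetical helper.

```typescript
// Emit (i, j, w) and its mirror (j, i, w); self-loops are emitted once.
function toCoo(edges: [number, number, number][]): { rows: Uint32Array; cols: Uint32Array; vals: Float64Array } {
  const rows: number[] = [];
  const cols: number[] = [];
  const vals: number[] = [];
  for (const [i, j, w] of edges) {
    rows.push(i); cols.push(j); vals.push(w);
    if (i !== j) { rows.push(j); cols.push(i); vals.push(w); }
  }
  return { rows: Uint32Array.from(rows), cols: Uint32Array.from(cols), vals: Float64Array.from(vals) };
}
```

The three arrays would then be passed as latentRows, latentCols and latentVals.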

Implementation notes: Leiden / GMM / fuzzy KNN are stochastic, so bit-perfect parity with squidpy is impossible — instead the compare/niche_parity.py harness validates partition agreement via ARI / NMI / V-measure with paired Wilcoxon tests across a difficulty-tiered synthetic suite. On the easy tier (well-separated regions), utag and spatialleiden recover the truth perfectly on both sides; neighborhood and cellcharter agree at ARI ≥ 0.85. On the hard tier (noise=2.0, 15 % phenotype mis-calls), all four flavours hit ARI(Rust, squidpy) > 0 at p < 10⁻⁴ and Rust's cellcharter / spatialleiden recover the truth at statistically higher ARI than squidpy's sklearn-GMM / spatialleiden-package paths.

Graph neural network models

Operate over many specimens at once: allBuffers concatenates per-specimen cell buffers, indexed by offsets; labels (Int32Array) holds the per-specimen outcome. train* returns a flat model vector you pass back to the matching compute* to score one specimen.

trainCgGnnModel(allBuffers: Uint8Array, offsets: Uint32Array, labels: Int32Array, neighbors): Float64Array / computeCgGnn(cells, model: Float64Array, labels, neighbors): number | undefined.
trainGraphTransformerModel(allBuffers: Uint8Array, offsets: Uint32Array, labels: Int32Array, neighbors): Float64Array / computeGraphTransformer(cells, model: Float64Array, labels, neighbors): number | undefined.
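Assembling allBuffers and offsets from per-specimen parseCells-format buffers is a concatenation. The offsets convention here is an ASSUMPTION, not documented above: this sketch stores each specimen's starting byte plus a final end offset (nSpecimens + 1 entries), so check sm_rust.d.ts and drop the last entry if only starts are expected. `concatSpecimens` is a hypothetical helper.

```typescript
function concatSpecimens(buffers: Uint8Array[]): { allBuffers: Uint8Array; offsets: Uint32Array } {
  // offsets[i] = start of specimen i; offsets[buffers.length] = total length.
  const offsets = new Uint32Array(buffers.length + 1);
  buffers.forEach((b, i) => {
    offsets[i + 1] = offsets[i] + b.length;
  });
  const allBuffers = new Uint8Array(offsets[buffers.length]);
  buffers.forEach((b, i) => allBuffers.set(b, offsets[i]));
  return { allBuffers, offsets };
}
```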

License

Apache License 2.0 with the Commons Clause restriction. See LICENSE for the full text.