@nadeemlab/sm-rust
v0.0.1
Spatial multiomics primitives (neighbour graphs, Moran's I, nhood enrichment, Ripley, co-occurrence) compiled to WebAssembly.
sm-rust
Spatial multiomics primitives compiled to WebAssembly. Exposes neighbour-graph
construction, Moran's I, Geary's C, neighbourhood enrichment, interaction
matrix, Ripley's K/L/F/G/J, co-occurrence, per-cluster centrality scores,
ligand-receptor permutation tests, sepal diffusion scoring, niche detection
(four squidpy.gr.calculate_niche flavours — neighborhood, utag,
cellcharter, spatialleiden), nearest-neighbour-distance statistics, and
Benjamini–Hochberg FDR correction for use from JavaScript / TypeScript. Result shapes mirror
esda / squidpy field-for-field: every analytic call returns the full closed-form result
(statistic, both variance flavours, signed z-scores, one- and two-sided
p-values); every permutation call returns the full perm-null result
(I/C-space null moments, signed z-score, z-based + rank-based pseudo
p-values).
Installation
npm install @nadeemlab/sm-rust
Usage
The package ships three builds. Most consumers don't need to choose explicitly
— import "@nadeemlab/sm-rust" resolves to the right one based on your environment.
Bundlers (Webpack, Vite, Rollup, Next.js, etc.)
import { computeMoranI, computeNhoodEnrichment } from "@nadeemlab/sm-rust";
const result = computeMoranI(/* ... */);
Your bundler will fetch the .wasm file as an asset automatically.
Browser (no bundler)
<script type="module">
import init, { computeMoranI } from "@nadeemlab/sm-rust/web";
await init();
const result = computeMoranI(/* ... */);
</script>
The init() call is required; it returns a promise that resolves once the wasm module has
been instantiated.
Node.js
import { computeMoranI } from "@nadeemlab/sm-rust/node";
const result = computeMoranI(/* ... */);
The Node build is CommonJS internally but works with both require and
import syntax (Node ≥ 18).
Tuning the permutation tests
computeMoranI, computeGearyC, and computeNhoodEnrichment run a label-shuffle
permutation null (matching esda.Moran(...) / esda.Geary(...) /
sq.gr.nhood_enrichment on the serial path). They take an optional trailing
{ permutations?, seed?, threads? } options object:
computeMoranI(cells, values, neighbors, { permutations: 2000, seed: 42, threads: 0 });
- permutations: number of label shuffles. Defaults to 1000; values below 2 return undefined (a variance can't be estimated).
- seed: RNG seed for the label shuffles. Omit to use the squidpy/esda default stream (the serial path then reproduces their reference results bit-for-bit); set it for a different but reproducible shuffle sequence. A number covers seeds up to 2^53; pass a bigint for the full u64 range.
- threads: thread budget. Omit or 1 = single-threaded, 0 = all logical cores, n = exactly n threads. Honoured only by the native Node build (the optional napi binary); the wasm builds are always single-threaded and ignore it.
Ripley options
The Ripley functions (computeRipley, computeRipleySmprofiler,
computeRipleyFSmprofiler, computeRipleyTextbook) take an optional trailing
{ seed?, strategy?, quadMult?, cellsPerBin?, kExact?, nObservations?, nSimulations? }
object:
- seed: RNG seed for the Monte-Carlo point sampling. Same semantics as the seed field in ComputeOptions. Used by computeRipleyFSmprofiler, all modes of computeRipley / computeRipleySmprofiler, and the F/J modes of computeRipleyTextbook.
- strategy: pair-counter used for the K/L statistic (no effect for F/G):
  - "exact" (default): dual-tree, integer counts.
  - "quad": d²-binned dual-tree, drops the per-pair sqrt for a ~30 % faster hot loop at <0.1 % drift on the resampled curve.
  - "fft": grid + autocorrelation, sub-quadratic in N. Small-radius bins are approximate.
  - "p3m": hybrid (exact short-range + FFT long-range).
- quadMult: internal-bin oversampling for "quad" (default 64; sub-0.1 % resample drift at no measurable runtime cost). Ignored otherwise.
- cellsPerBin: grid resolution (cells per support bin, clamped to ≥ 2) for "fft" / "p3m". Default 4.
- kExact: for "p3m", the first kExact bins come from the exact dual-tree. Default 5.
- nObservations: number of CSR query points (F mode) and simulated points per simulation. Default 1000 (squidpy n_observations). Used by computeRipleyFSmprofiler and all modes of computeRipley / computeRipleySmprofiler; ignored by computeRipleyTextbook.
- nSimulations: number of Monte-Carlo simulations forming the null. Default 100 (squidpy n_simulations). Same scope as nObservations.
computeRipley(cells, labels, "L", 50, { seed: 42 });
computeRipleyFSmprofiler(cells, labels, { seed: 7n }); // bigint for full u64
computeRipley(cells, labels, "F", 50, { nObservations: 5000, nSimulations: 500 });
computeRipleyTextbook(cells, labels, "K", 100, { strategy: "quad" });
computeRipleyTextbook(cells, labels, "L", 100, { strategy: "fft", cellsPerBin: 4 });
Omit strategy (or the whole options object) for the default exact dual-tree.
Example: random points
Build cells straight from coordinate arrays with cellsFromCoords (no binary
buffer needed), label them, then run a few statistics. (Random coordinates and
labels have no spatial structure, so expect non-significant p-values.)
import {
cellsFromCoords,
Neighbors,
countCells,
computeAnalyticalMoranI,
computeMoranI,
computeGearyC,
computeNhoodEnrichment,
computeInteractionMatrix,
computeRipley,
} from "@nadeemlab/sm-rust";
// 1. Random cells — plain f64 (x, y) coordinate arrays, no encoding step.
const N = 2000;
const xs = Float64Array.from({ length: N }, () => Math.random() * 1000);
const ys = Float64Array.from({ length: N }, () => Math.random() * 1000);
const cells = cellsFromCoords(xs, ys);
// 2. One binary label per cell (1 = "in phenotype"). Here ~30% at random.
const labels = Uint8Array.from({ length: N }, () => (Math.random() < 0.3 ? 1 : 0));
// Moran's I / Geary's C take a continuous attribute (Float64Array) — a 0/1
// phenotype is just the special case, so reuse the labels here.
const values = Float64Array.from(labels);
// 3. Pick a neighbour graph and run the statistics.
const neighbors = Neighbors.knn(6); // or Neighbors.radius(50), Neighbors.delaunay()
const { count, percentage } = countCells(cells, labels);
// Analytical: every p-value flavour from one closed-form call.
const moranA = computeAnalyticalMoranI(cells, values, neighbors);
// → { i, e_i, var_norm, var_rand, z_norm, z_rand,
// p_norm, p_rand, p_norm_two_sided, p_rand_two_sided } | undefined
// Permutation: every perm-null flavour from one shuffled call.
const moranP = computeMoranI(cells, values, neighbors, { seed: 42 });
// → { i, perm_mean, perm_var, z_sim, p_z_sim, p_sim,
// p_z_sim_two_sided, p_sim_two_sided } | undefined
// Geary's C has the same shape (`c`/`e_c` instead of `i`/`e_i`).
const gearyP = computeGearyC(cells, values, neighbors, { permutations: 2000 });
// Multi-cluster neighbourhood enrichment + interaction matrix. Pass
// `nClusters`; labels must be in `0..nClusters`. Both return row-major k×k
// flat arrays.
const nClusters = 2;
const nhood = computeNhoodEnrichment(cells, labels, neighbors, nClusters, { seed: 42 });
// → { k, count: Float64Array, zscore: Float64Array } | undefined
const interaction = computeInteractionMatrix(cells, labels, neighbors, nClusters);
// → Float64Array (length k * k) | undefined
const ripleyL = computeRipley(cells, labels, "L", 50, { seed: 42 });
// → { bins, background, phenotype } | undefined
console.log({ count, percentage, moranA, moranP, gearyP, nhood, interaction });
console.log("Ripley L support points:", ripleyL?.bins.length);
// Cells and Neighbors hold wasm memory — free them when done (or use `using`).
cells.free();
neighbors.free();
Building from source
The build is dockerized so you don't need a local Rust toolchain.
docker compose run --rm wasm-pack
This produces pkg/ containing all three targets ready for npm publish.
API reference
Full type declarations live in
pkg/bundler/sm_rust.d.ts after a build. Functions
typed number | undefined return undefined for a degenerate specimen (too few
cells, an empty phenotype, an unknown mode, etc.).
Input handles
Get a Cells handle one of two ways:
- cellsFromCoords(xs: Float64Array, ys: Float64Array): Cells
  Build directly from (x, y) coordinate arrays (f64). The simplest entry point: pair it with a labels: Uint8Array you build yourself (one byte per cell).
- parseCells(data: Uint8Array): Cells
  Parse smprofiler's internal binary cell buffer. Carries a 64-bit phenotype mask per cell that the labelCells* helpers turn into labels. Layout (big-endian, masks little-endian):
  - header (20 B): u32 count, u32 minX, u32 maxX, u32 minY, u32 maxY
  - per cell (20 B): u32 id, u32 x, u32 y, u64 mask
Cells is an opaque handle — call .free() (or use using) to release its
wasm memory.
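The binary layout above can be exercised by packing a buffer by hand. The following is a minimal sketch, not part of the package: encodeCellBuffer and CellRecord are hypothetical names, and the code assumes the layout note is complete (big-endian u32 header and id/x/y fields, little-endian u64 mask).

```typescript
// Hypothetical record type for illustration; not exported by the package.
type CellRecord = { id: number; x: number; y: number; mask: bigint };

// Pack cells into the 20-byte header + 20-bytes-per-cell layout described above.
function encodeCellBuffer(cells: CellRecord[]): Uint8Array {
  const HEADER = 20;
  const PER_CELL = 20;
  const bytes = new Uint8Array(HEADER + PER_CELL * cells.length);
  const view = new DataView(bytes.buffer);

  const xs = cells.map((c) => c.x);
  const ys = cells.map((c) => c.y);
  const min = (v: number[]) => (v.length ? Math.min(...v) : 0);
  const max = (v: number[]) => (v.length ? Math.max(...v) : 0);

  // Header: count, minX, maxX, minY, maxY (big-endian u32 each).
  view.setUint32(0, cells.length, false);
  view.setUint32(4, min(xs), false);
  view.setUint32(8, max(xs), false);
  view.setUint32(12, min(ys), false);
  view.setUint32(16, max(ys), false);

  cells.forEach((c, i) => {
    const off = HEADER + i * PER_CELL;
    view.setUint32(off, c.id, false); // big-endian id
    view.setUint32(off + 4, c.x, false); // big-endian x
    view.setUint32(off + 8, c.y, false); // big-endian y
    view.setBigUint64(off + 12, c.mask, true); // little-endian phenotype mask
  });
  return bytes;
}

const buffer = encodeCellBuffer([
  { id: 1, x: 10, y: 20, mask: 5n },
  { id: 2, x: 30, y: 5, mask: 2n },
]);
```

Under these assumptions the resulting Uint8Array has the shape parseCells expects; coordinates must already be non-negative integers that fit in a u32.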
Neighbors — neighbour-graph strategy, built with one of:
- Neighbors.knn(k): k nearest neighbours (excludes self; throws if k == 0).
- Neighbors.radius(r): all cells within Euclidean distance r.
- Neighbors.delaunay(): 2D Delaunay edges; co-located cells are orphaned (scipy/squidpy default).
- Neighbors.delaunayShareCoplanar(): Delaunay, but co-located cells share neighbours.
Labelling
These derive a labels array from the phenotype mask carried by a
parseCells buffer — masks are 64-bit (bigint), and a cell matches when it has
all the positive bits and none of the negative bits. (Cells made with
cellsFromCoords have no mask, so build labels yourself instead.)
- labelCellsBinary(cells, positiveMask, negativeMask): Uint8Array
  0/1 label per cell (1 = matches). Consumed by the autocorrelation, enrichment, Ripley, co-occurrence and count functions.
- labelCellsTwoGroup(cells, posA, negA, posB, negB): Uint8Array
  2-bit label per cell: bit 0 = in group A, bit 1 = in group B (a cell in both → 3). Consumed by computeProximityBinary and the NN-distance functions.
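The matching rule (all positive bits set, no negative bits set) and the two label encodings can be written out in a few lines. This is an illustrative sketch operating on a plain bigint[] of masks; matchesMask, labelBinary and labelTwoGroup are hypothetical stand-ins, not the package's functions.

```typescript
// A cell matches when it has every positive bit and none of the negative bits.
function matchesMask(mask: bigint, positive: bigint, negative: bigint): boolean {
  return (mask & positive) === positive && (mask & negative) === 0n;
}

// 0/1 label per cell.
function labelBinary(masks: bigint[], positive: bigint, negative: bigint): Uint8Array {
  return Uint8Array.from(masks, (m) => (matchesMask(m, positive, negative) ? 1 : 0));
}

// 2-bit label per cell: bit 0 = group A, bit 1 = group B (both -> 3).
function labelTwoGroup(
  masks: bigint[],
  posA: bigint,
  negA: bigint,
  posB: bigint,
  negB: bigint,
): Uint8Array {
  return Uint8Array.from(masks, (m) => {
    const a = matchesMask(m, posA, negA) ? 1 : 0;
    const b = matchesMask(m, posB, negB) ? 2 : 0;
    return a | b;
  });
}

// Four toy cells with 3-bit masks.
const masks = [0b011n, 0b001n, 0b110n, 0b000n];
const binary = labelBinary(masks, 0b001n, 0b100n); // bit 0 required, bit 2 forbidden
const twoGroup = labelTwoGroup(masks, 0b001n, 0n, 0b010n, 0n); // A = bit 0, B = bit 1
```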
Counts & distances
- countCells(cells, labels): { count, percentage }
  Number and percent (0–100) of cells with label != 0.
- computeMeanNNDistance(cells, labels): number / computeMedianNNDistance(cells, labels): number
  A→B nearest-neighbour distance summary (two-group labels).
- computeProximityBinary(cells, labels, neighbors): number
  Fraction of group-A cells with at least one group-B neighbour (two-group labels).
Spatial autocorrelation
All take a continuous per-cell attribute as values: Float64Array (a 0/1
phenotype is the special case — Float64Array.from(labels)). Each statistic
has an analytic call (microseconds, closed-form) and a permutation call
(seconds, full Fisher-Yates null) — pick one based on whether the attribute
is well-approximated by a normal.
Analytical (esda.Moran(..., permutations=0) / esda.Geary(...)) —
returns the full closed-form result. The normal-variance approximation is
mis-calibrated for sparse / non-normal attributes (e.g. a 0/1 phenotype
indicator) — prefer the permutation versions there.
- computeAnalyticalMoranI(cells, values, neighbors): MoranAnalytic | undefined
- computeAnalyticalGearyC(cells, values, neighbors): GearyAnalytic | undefined
type MoranAnalytic = {
  i: number;          // observed I (matches esda.Moran.I)
  e_i: number;        // -1 / (n - 1)
  var_norm: number;   // normality-assumption variance
  var_rand: number;   // randomization-assumption variance
                      // (Cliff–Ord 1981, kurtosis-corrected)
  z_norm: number; z_rand: number;
  p_norm: number; p_rand: number;   // one-sided
  p_norm_two_sided: number; p_rand_two_sided: number;
};
// GearyAnalytic is the same shape with `c` / `e_c` (always 1.0) in place
// of `i` / `e_i`.
Permutation (esda.Moran(...) / esda.Geary(...) matching the serial
path) — returns the full perm-null result. Accepts the
ComputeOptions object.
- computeMoranI(cells, values, neighbors, options?): MoranPermutation | undefined
- computeGearyC(cells, values, neighbors, options?): GearyPermutation | undefined
type MoranPermutation = {
  i: number;                          // observed I
  perm_mean: number; perm_var: number; // in I-space
  z_sim: number;                      // signed z-score (matches esda.Moran.z_sim)
  p_z_sim: number; p_sim: number;     // one-sided; p_sim is rank-based
  p_z_sim_two_sided: number; p_sim_two_sided: number;
};
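To make the permutation fields concrete, here is a hedged sketch of a generic label-shuffle test: a seeded Fisher-Yates shuffle, null mean and variance, a signed z-score, and a rank-based pseudo p-value using the common (r + 1) / (m + 1) convention. Everything here is illustrative: makeRng, shuffle, lagProduct and permutationTest are hypothetical, the RNG is not the package's stream, and the toy statistic stands in for Moran's I.

```typescript
// Small seeded PRNG (mulberry32-style) so the shuffle sequence is reproducible.
function makeRng(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// In-place Fisher-Yates shuffle driven by the supplied RNG.
function shuffle(values: Float64Array, rng: () => number): void {
  for (let i = values.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    const tmp = values[i];
    values[i] = values[j];
    values[j] = tmp;
  }
}

// Toy statistic for illustration: sum of products of adjacent values.
function lagProduct(v: Float64Array): number {
  let s = 0;
  for (let i = 0; i + 1 < v.length; i++) s += v[i] * v[i + 1];
  return s;
}

function permutationTest(
  values: Float64Array,
  statistic: (v: Float64Array) => number,
  permutations: number,
  seed: number,
) {
  const observed = statistic(values);
  const rng = makeRng(seed);
  const work = Float64Array.from(values);
  let sum = 0;
  let sumSq = 0;
  let atLeast = 0; // permuted statistics >= observed
  for (let p = 0; p < permutations; p++) {
    shuffle(work, rng);
    const s = statistic(work);
    sum += s;
    sumSq += s * s;
    if (s >= observed) atLeast++;
  }
  const permMean = sum / permutations;
  const permVar = sumSq / permutations - permMean * permMean;
  const zSim = (observed - permMean) / Math.sqrt(permVar);
  const pSim = (atLeast + 1) / (permutations + 1); // rank-based, one-sided
  return { observed, permMean, permVar, zSim, pSim };
}

// Values 1..20 in sorted order: strongly "autocorrelated" under the toy statistic.
const ordered = Float64Array.from({ length: 20 }, (_, i) => i + 1);
const permResult = permutationTest(ordered, lagProduct, 999, 42);
const permRepeat = permutationTest(ordered, lagProduct, 999, 42);
```

Re-running with the same seed reproduces the same null, which is the property the seed option is described as providing.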
Raw statistics (no null model) — the I / C value itself (matching
esda.Moran(...).I / esda.Geary(...).C), without z-scoring or a p-value:
- computeMoranIStatistic(cells, values, neighbors): number | undefined
- computeGearyCStatistic(cells, values, neighbors): number | undefined
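For readers who want to see what the raw I value measures, the textbook formula with row-standardised weights is I = (n / S0) · Σ w_ij z_i z_j / Σ z_i², where z is the mean-centred attribute and S0 the sum of all weights. The sketch below implements that formula directly over a plain adjacency list; moranI and the number[][] neighbour representation are hypothetical stand-ins for the package's kernel and Neighbors handle.

```typescript
// neighbours[i] lists the neighbour indices of cell i.
function moranI(values: Float64Array, neighbours: number[][]): number | undefined {
  const n = values.length;
  if (n < 2) return undefined;

  let mean = 0;
  for (let i = 0; i < n; i++) mean += values[i];
  mean /= n;

  let num = 0; // sum over edges of w_ij * z_i * z_j
  let denom = 0; // sum of z_i^2
  let s0 = 0; // sum of all weights
  for (let i = 0; i < n; i++) {
    const zi = values[i] - mean;
    denom += zi * zi;
    const nbrs = neighbours[i];
    if (nbrs.length === 0) continue;
    const w = 1 / nbrs.length; // row-standardised: each row sums to 1
    for (const j of nbrs) num += w * zi * (values[j] - mean);
    s0 += 1;
  }
  if (denom === 0 || s0 === 0) return undefined; // constant attribute or empty graph
  return (n / s0) * (num / denom);
}

// Four cells on a line with a monotone attribute.
const lineValues = Float64Array.from([1, 2, 3, 4]);
const lineGraph = [[1], [0, 2], [1, 3], [2]];
const lineI = moranI(lineValues, lineGraph); // ≈ 0.4
```

The result is positive because neighbouring cells carry similar values; under the null the expectation is -1 / (n - 1), the e_i field of the analytic result.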
Batch (many attributes, one graph) — builds the row-standardised adjacency
once and evaluates every attribute against it (parallel across cores in the
Node build). The fast path for scoring a whole expression matrix: passing
nAttrs columns is much cheaper than that many single-attribute calls. Pair
this with fdrBh for the "compute N tests,
then BH-correct" workflow.
- computeMoranIStatistics(cells, attrs: Float64Array, nAttrs: number, neighbors): (number | null)[]
- computeGearyCStatistics(cells, attrs: Float64Array, nAttrs: number, neighbors): (number | null)[]
attrs is a flat row-major buffer of length nAttrs · nCells; attribute k occupies attrs[k·nCells .. (k+1)·nCells]. Returns one value per attribute (null for a constant or degenerate column). If the buffer length doesn't match nAttrs · nCells, every entry is null.
const flat = new Float64Array(nAttrs * nCells);
// ... fill flat[k * nCells + i] = value of attribute k at cell i
const moranPerAttr = computeMoranIStatistics(cells, flat, nAttrs, neighbors);
- computeAnalyticalMoranIBatch(cells, attrs: Float64Array, nAttrs: number, neighbors): (MoranAnalytic | null)[]
- computeAnalyticalGearyCBatch(cells, attrs: Float64Array, nAttrs: number, neighbors): (GearyAnalytic | null)[]
Closed-form full result per attribute: the squidpy gr.spatial_autocorr(mode="moran" | "geary") workflow across a whole expression matrix. Same flat-row-major layout as the statistic-only batch; pair the resulting p_norm column with fdrBh for the ranked, FDR-corrected table.
const rows = computeAnalyticalMoranIBatch(cells, flat, nGenes, neighbors);
const pNorm = rows.map(r => r?.p_norm ?? null);
const qNorm = fdrBh(pNorm);
// Sort genes by rows[k].i desc for the squidpy-style ranked table.
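The batch calls want one attribute per contiguous run (attribute-major), whereas computeLigrec and computeSepal describe their expression input as (nCells, nGenes) row-major. If the data starts in the cell-major form, a transpose is needed first. A small sketch, with toAttributeMajor as a hypothetical helper:

```typescript
// Transpose a cell-major (nCells, nGenes) matrix into the attribute-major
// (nGenes, nCells) layout, so that flat[k * nCells + i] is gene k at cell i.
function toAttributeMajor(expression: Float64Array, nCells: number, nGenes: number): Float64Array {
  if (expression.length !== nCells * nGenes) {
    throw new Error("expression length must equal nCells * nGenes");
  }
  const flat = new Float64Array(nGenes * nCells);
  for (let i = 0; i < nCells; i++) {
    for (let k = 0; k < nGenes; k++) {
      flat[k * nCells + i] = expression[i * nGenes + k];
    }
  }
  return flat;
}

// 3 cells x 2 genes: gene 0 = [1, 2, 3], gene 1 = [10, 20, 30].
const cellMajor = Float64Array.from([1, 10, 2, 20, 3, 30]);
const attrMajor = toAttributeMajor(cellMajor, 3, 2);
```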
Multiple-testing correction
- fdrBh(pValues: (number | null)[]): (number | null)[]
  Benjamini–Hochberg FDR correction. null entries pass through (treated as missing, dropped from the rank denominator). Equivalent to statsmodels.stats.multitest.multipletests(p, method='fdr_bh')[1].
const pVals = computeMoranIStatistics(cells, flat, nAttrs, neighbors);
// ... convert observed I to a p-value flavour per the analytic struct ...
const qVals = fdrBh(pVals);
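For reference, the Benjamini–Hochberg step-up adjustment itself is short. The sketch below is a plain TypeScript rendering of the standard procedure with the null pass-through behaviour described above; benjaminiHochberg is a hypothetical name and this is not the package's implementation.

```typescript
function benjaminiHochberg(pValues: (number | null)[]): (number | null)[] {
  // Indices of the non-missing p-values, sorted by ascending p.
  const idx: number[] = [];
  pValues.forEach((p, i) => {
    if (p !== null) idx.push(i);
  });
  idx.sort((a, b) => (pValues[a] as number) - (pValues[b] as number));

  const m = idx.length; // nulls are excluded from the denominator
  const out: (number | null)[] = pValues.map(() => null);

  // Walk from the largest p down, enforcing monotonicity with a running minimum.
  let running = 1;
  for (let rank = m; rank >= 1; rank--) {
    const i = idx[rank - 1];
    const adjusted = ((pValues[i] as number) * m) / rank;
    running = Math.min(running, adjusted);
    out[i] = running;
  }
  return out;
}

const qSketch = benjaminiHochberg([0.01, 0.04, 0.03, null, 0.5]);
// → [0.04, 0.0533…, 0.0533…, null, 0.5]
```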
Neighbourhood enrichment
Multi-cluster: labels are categorical (cluster id 0..nClusters). Both
calls return the full k × k matrix.
- computeNhoodEnrichment(cells, labels, neighbors, nClusters, options?): { k, count, zscore } | undefined
  Permutation z-score matrix (squidpy parity on the serial path). count and zscore are row-major Float64Arrays of length k * k; entry a * k + b is the (source=a, target=b) entry. Matrix entries whose null variance collapsed hold NaN in zscore. For the legacy 2-cluster scalar value, read entry 0 * k + 1 and apply Φ on the JS side. Accepts ComputeOptions.
- computeAnalyticalNhoodEnrichment(cells, labels, neighbors, nClusters): { k, count, zscore } | undefined
  Same shape, closed-form (no permutations).
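JavaScript has no built-in standard-normal CDF, so "apply Φ on the JS side" needs a small helper. The sketch below uses the Abramowitz–Stegun 7.1.26 approximation to erf (absolute error around 1.5e-7); erfApprox, normalCdf and entryAt are hypothetical helper names, and the z-score matrix is made-up data for illustration.

```typescript
// Abramowitz–Stegun 7.1.26 approximation to the error function.
function erfApprox(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) *
    t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard-normal CDF Φ(z); NaN in gives NaN out.
function normalCdf(z: number): number {
  return 0.5 * (1 + erfApprox(z / Math.SQRT2));
}

// Row-major lookup: entry (source = a, target = b) of a k × k matrix.
function entryAt(matrix: Float64Array, k: number, a: number, b: number): number {
  return matrix[a * k + b];
}

// Made-up 2 × 2 z-score matrix for illustration.
const zDemo = Float64Array.from([0, 1.96, -1.96, 0]);
const z01 = entryAt(zDemo, 2, 0, 1);
const phi01 = normalCdf(z01); // ≈ 0.975
```

Which tail to report (Φ(z) or 1 − Φ(z)) depends on whether enrichment or depletion is the quantity of interest.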
Interaction matrix
- computeInteractionMatrix(cells, labels, neighbors, nClusters): Float64Array | undefined
  Observed k × k directed-edge count matrix (matches sq.gr.interaction_matrix(..., normalized=False)). Row-major; entry a * k + b counts edges from a label-a source to a label-b target.
Co-occurrence
Both calls take nClusters and return the full k × k (single-bin scalar)
or (k, k, nBins) (curve) matrix.
- computeCoOccurrence(cells, labels, neighbors, nClusters): Float64Array | undefined
  Row-major k × k single-bin enrichment matrix (every neighbour edge counted, no distance binning). Entry a * k + b is the enrichment of cluster a around cluster b.
- computeCoOccurrenceCurve(cells, labels, neighbors, nClusters, nSteps): { k, interval, occ } | undefined
  Graph-based enrichment curves over nSteps - 1 cumulative distance bins. interval is linspace(minEdgeDist, maxEdgeDist, nSteps); occ is a flat row-major (k, k, nSteps - 1) array indexed (a * k + b) * nBins + r, where bin r carries the cumulative threshold interval[r + 1]. The last bin equals computeCoOccurrence entry-for-entry. Reproduces squidpy.gr.co_occurrence only for the radius strategy; KNN / Delaunay give a graph-restricted curve.
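Pulling one cluster pair's curve out of the flat occ buffer is a matter of applying the index formula above. A sketch with made-up numbers; linspace and curveFor are hypothetical helpers, and the occ contents are placeholders rather than real enrichment values.

```typescript
// Evenly spaced grid with n points from start to stop inclusive (n >= 2).
function linspace(start: number, stop: number, n: number): Float64Array {
  const out = new Float64Array(n);
  const step = (stop - start) / (n - 1);
  for (let i = 0; i < n; i++) out[i] = start + i * step;
  return out;
}

// Slice the curve for (a, b) out of a flat (k, k, nBins) row-major buffer.
function curveFor(occ: Float64Array, k: number, nBins: number, a: number, b: number): Float64Array {
  const start = (a * k + b) * nBins;
  return occ.slice(start, start + nBins);
}

const kDemo = 2;
const nStepsDemo = 4;
const nBinsDemo = nStepsDemo - 1;
const intervalDemo = linspace(0, 30, nStepsDemo); // [0, 10, 20, 30]
// Placeholder buffer: entry value = its own flat index.
const occDemo = Float64Array.from({ length: kDemo * kDemo * nBinsDemo }, (_, i) => i);

const curve10 = curveFor(occDemo, kDemo, nBinsDemo, 1, 0); // flat indices 6, 7, 8
// Bin r is cumulative up to interval[r + 1].
const thresholds = Array.from(curve10, (_, r) => intervalDemo[r + 1]); // [10, 20, 30]
```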
Centrality scores
- computeCentralityScores(cells, labels, neighbors, nClusters): { k, degree_centrality, average_clustering, closeness_centrality } | undefined
  Per-cluster graph-centrality summary (matches sq.gr.centrality_scores). Each field is a Float64Array of length k; entries for empty clusters are NaN. degree_centrality and closeness_centrality are NetworkX group centralities (fraction of non-cluster nodes adjacent to the cluster / reciprocal mean distance from non-cluster nodes to the cluster); average_clustering is the mean per-cell clustering coefficient over the cluster's cells. The adjacency is symmetrized first so KNN's directed asymmetry doesn't bias the result.
Ligand-receptor
- computeLigrec(expression, nCells, nGenes, clusterLabels, nClusters, interactions, threshold, options?): { n_lr, k, means, pvalues } | undefined
  Ligand-receptor permutation test matching squidpy.gr.ligrec (CellPhoneDB convention). For each LR pair (l, r) and each cluster pair (a, b), the observed score is (mu[a, l] + mu[b, r]) / 2, where mu[c, g] is the per-cluster mean of gene g; per-cluster gene means at or below threshold are zeroed before averaging. The p-value is the fraction of cluster-label permutations whose permuted score is >= observed.
  Inputs:
  - expression: Float64Array, row-major (nCells, nGenes).
  - clusterLabels: Uint8Array, cluster id per cell, in 0..nClusters.
  - interactions: Uint32Array, flat [src0, tgt0, src1, tgt1, …] of LR gene indices (src = ligand, tgt = receptor); length must be a multiple of 2.
  - threshold: number, gene-mean cutoff (squidpy default 0).
  - options: ComputeOptions.
  The kernel only aggregates the unique genes referenced by any LR pair, so a wide expression matrix is cheap as long as the pair list touches a modest gene subset.
  Output: both means and pvalues are row-major (nLr, k, k) flat Float64Arrays; entry lr * k * k + a * k + b is the score / p-value for LR pair lr with source cluster a and target cluster b. Reproduces sq.gr.ligrec to f64 epsilon on observed means; permutation p-values agree statistically (different RNG streams) but match exactly when the cluster-gene affinity is sharp enough to drive most outcomes to 0 or 1.
const interactions = Uint32Array.from([0, 1, 1, 2, 2, 0]); // 3 LR pairs
const lig = computeLigrec(
  expression, nCells, nGenes, clusterLabels, nClusters, interactions, 0.0,
  { permutations: 1000, seed: 42 },
);
// lig.means / lig.pvalues are row-major (n_lr=3, k, k) Float64Arrays.
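The observed-score formula and the (nLr, k, k) indexing can be checked by hand on a tiny matrix. The sketch below follows a literal reading of the description above (zero each per-cluster mean at or below threshold, then average the two sides); clusterMeans and observedScores are hypothetical helpers, and whether the package additionally zeroes a pair score when one side was zeroed is not stated here, so treat this as an assumption.

```typescript
// Per-cluster mean of every gene: returns row-major (nClusters, nGenes).
function clusterMeans(
  expression: Float64Array,
  nCells: number,
  nGenes: number,
  clusterLabels: Uint8Array,
  nClusters: number,
): Float64Array {
  const mu = new Float64Array(nClusters * nGenes);
  const sizes = new Float64Array(nClusters);
  for (let i = 0; i < nCells; i++) {
    const c = clusterLabels[i];
    sizes[c] += 1;
    for (let g = 0; g < nGenes; g++) mu[c * nGenes + g] += expression[i * nGenes + g];
  }
  for (let c = 0; c < nClusters; c++) {
    if (sizes[c] === 0) continue; // empty cluster: leave means at 0
    for (let g = 0; g < nGenes; g++) mu[c * nGenes + g] /= sizes[c];
  }
  return mu;
}

// Observed scores, row-major (nLr, k, k): entry lr * k * k + a * k + b.
function observedScores(
  mu: Float64Array,
  nGenes: number,
  k: number,
  interactions: Uint32Array,
  threshold: number,
): Float64Array {
  const nLr = interactions.length / 2;
  const means = new Float64Array(nLr * k * k);
  const gated = (c: number, g: number) => {
    const m = mu[c * nGenes + g];
    return m <= threshold ? 0 : m; // literal reading of the threshold rule
  };
  for (let lr = 0; lr < nLr; lr++) {
    const lig = interactions[2 * lr];
    const rec = interactions[2 * lr + 1];
    for (let a = 0; a < k; a++) {
      for (let b = 0; b < k; b++) {
        means[lr * k * k + a * k + b] = (gated(a, lig) + gated(b, rec)) / 2;
      }
    }
  }
  return means;
}

// 4 cells x 2 genes, two clusters; one LR pair (ligand = gene 0, receptor = gene 1).
const exprDemo = Float64Array.from([2, 0, 4, 0, 0, 6, 0, 2]);
const muDemo = clusterMeans(exprDemo, 4, 2, Uint8Array.from([0, 0, 1, 1]), 2);
const scoreDemo = observedScores(muDemo, 2, 2, Uint32Array.from([0, 1]), 0);
```

Here mu is [[3, 0], [0, 4]], so the source-0 / target-1 entry is (3 + 4) / 2 = 3.5.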
Sepal
Spatially variable gene ranking by simulated diffusion
(squidpy.gr.sepal). For each gene column, the per-cell expression is treated
as a concentration on a regular grid; the kernel iterates a discrete Laplacian
diffusion step (5-point square stencil for maxNeighs = 4, 7-point hex for
maxNeighs = 6) until the per-step entropy delta on the saturated cells drops
to or below thresh. The score is dt · iterations_to_converge — the
diffusion time needed to wash out the spatial structure. Higher score = more
spatial structure.
- computeSepal(cells, expression, nCells, nGenes, neighbors, maxNeighs, options?): { n_genes, scores } | undefined
  expression is row-major (nCells, nGenes) Float64Array. maxNeighs is 4 (square / ST / Dbit-seq) or 6 (hex / Visium); the supplied neighbour graph must have max degree exactly equal to it, otherwise undefined (matches squidpy's ValueError). Returns one score per gene; genes that didn't converge within nIter iterations hold NaN.
type SepalOptions = {
  nIter?: number;   // default 30000 (squidpy)
  dt?: number;      // default 0.001 (squidpy)
  thresh?: number;  // default 1e-8 (squidpy)
};
Ripley
All four accept the RipleyOptions trailing object: seed
controls the Monte-Carlo sampling, strategy (+ its per-strategy knobs)
selects the K/L pair-counter ("exact" | "quad" | "fft" | "p3m").
- computeRipley(cells, labels, mode, nSteps, options?): { bins, background, phenotype } | undefined
  Per-cluster p-value curves; mode is "F" | "G" | "L", two-tailed (squidpy gr.ripley).
- computeRipleySmprofiler(cells, labels, mode, nSteps, options?): { bins, background, phenotype } | undefined
  Same shape, one-tailed (smprofiler ripley_custom).
- computeRipleyFSmprofiler(cells, labels, options?): number | undefined
  smprofiler F scalar summary.
- computeRipleyTextbook(cells, labels, mode, nSteps, options?): { support, statistic } | undefined
  Observed statistic curve over the phenotype point set; mode is "K" | "L" | "F" | "G" | "J" (pysal/pointpats-faithful). seed affects the "F" and "J" modes only.
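As a reference point for what the K and L curves count, here is a naive O(n²) estimator with no edge correction: K(r) = A / (n(n − 1)) · #{ordered pairs i ≠ j with d_ij ≤ r} and L(r) = sqrt(K(r) / π). It is a sketch only: naiveRipley is a hypothetical function, the normalisation is one common convention, and the package's dual-tree, FFT and P3M counters (and any edge correction they apply) are not reproduced.

```typescript
function naiveRipley(
  xs: Float64Array,
  ys: Float64Array,
  radii: number[],
  area: number,
): { k: number[]; l: number[] } {
  const n = xs.length;
  const counts = radii.map(() => 0);
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      if (i === j) continue;
      const dx = xs[i] - xs[j];
      const dy = ys[i] - ys[j];
      const d2 = dx * dx + dy * dy;
      // Compare squared distances so no per-pair sqrt is needed.
      radii.forEach((r, b) => {
        if (d2 <= r * r) counts[b] += 1;
      });
    }
  }
  const k = counts.map((c) => (area * c) / (n * (n - 1)));
  const l = k.map((v) => Math.sqrt(v / Math.PI));
  return { k, l };
}

// Corners of the unit square (area 1).
const cornerXs = Float64Array.from([0, 1, 0, 1]);
const cornerYs = Float64Array.from([0, 0, 1, 1]);
const ripleyDemo = naiveRipley(cornerXs, cornerYs, [1, 1.5], 1);
// K(1) = 8 / 12 (each corner sees its two edge neighbours); K(1.5) = 1 (all pairs).
```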
Niche detection
Spatial-niche assignment matching the four flavours of
squidpy.gr.calculate_niche. Each call builds a symmetric binary spatial
adjacency from the supplied cells + neighbors (the same way every other
compute* does), feeds it through the flavour-specific pre-clustering pipeline,
and emits per-cell cluster ids. The pipelines are pure Rust — no scanpy /
igraph / leidenalg / sklearn dependency. Cluster ids 4294967295 (u32::MAX)
mark not_a_niche (cells excluded by mask / minNicheSize).
Outputs come in two shapes:
- NicheLeidenResult { nResolutions, niches }: one Leiden run per resolution. niches is a flat Uint32Array of length nResolutions · nCells (concatenated row-major). Used by computeNicheNeighborhood and computeNicheUtag.
- NicheResult { niches }: one cluster id per cell. Used by computeNicheCellcharter and computeNicheSpatialleiden.
All four take an optional seed (default 42) replacing squidpy's
random_state.
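A NicheLeidenResult is easier to work with once it is split per resolution and the sentinel is mapped to something explicit. The sketch below assumes "concatenated row-major" means resolution r occupies niches[r · nCells .. (r + 1) · nCells]; splitByResolution is a hypothetical helper and the input array is made-up data.

```typescript
const NOT_A_NICHE = 4294967295; // u32::MAX sentinel described above

// Split the flat buffer into one array per resolution, mapping the sentinel to null.
function splitByResolution(
  niches: Uint32Array,
  nResolutions: number,
  nCells: number,
): (number | null)[][] {
  if (niches.length !== nResolutions * nCells) {
    throw new Error("niches length must equal nResolutions * nCells");
  }
  const out: (number | null)[][] = [];
  for (let r = 0; r < nResolutions; r++) {
    const row: (number | null)[] = [];
    for (let i = 0; i < nCells; i++) {
      const id = niches[r * nCells + i];
      row.push(id === NOT_A_NICHE ? null : id);
    }
    out.push(row);
  }
  return out;
}

// Two resolutions over three cells; the third cell is masked out in the first run.
const flatNiches = Uint32Array.from([0, 1, 4294967295, 2, 2, 0]);
const perResolution = splitByResolution(flatNiches, 2, 3);
// → [[0, 1, null], [2, 2, 0]]
```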
- computeNicheNeighborhood(cells, labels, neighbors, groups, nGroups, options?): NicheLeidenResult
  flavor='neighborhood'. Per-cell category-frequency profile (optionally z-scored, optionally absolute counts, optionally summed over n-hop adjacency with per-hop weights), wrapped in a UMAP fuzzy KNN graph in feature space, then Leiden-clustered once per resolution. groups[i] ∈ [0, nGroups) is the per-cell categorical label; cells with out-of-range labels are masked out as not_a_niche.
type NicheNeighborhoodOptions = {
  seed?: number | bigint;
  resolutions?: number[];   // default [0.5]
  nNeighbors?: number;      // sc.pp.neighbors n_neighbors. default 15
  scale?: boolean;          // z-score the profile. default true
  absNhood?: boolean;       // raw counts vs relative freqs. default false
  distance?: number;        // n-hop horizon. default 1
  nHopWeights?: number[];   // per-hop weights when distance > 1
  minNicheSize?: number;    // clusters smaller than this → not_a_niche
  mask?: boolean[];         // false cells excluded from clustering
};
- computeNicheUtag(cells, labels, neighbors, x, nFeatures, options?): NicheLeidenResult
  flavor='utag'. Row-L1-normalises the spatial adjacency, multiplies by the expression matrix (x is row-major (nCells, nFeatures)), PCA-reduces the resulting smoothed feature matrix, builds a fuzzy KNN graph in PCA space, then runs Leiden once per resolution.
type NicheUtagOptions = {
  seed?: number | bigint;
  resolutions?: number[];   // default [0.5]
  nNeighbors?: number;      // default 15
};
- computeNicheCellcharter(cells, labels, neighbors, x, nFeatures, options?): NicheResult
  flavor='cellcharter'. For k = 0..distance, builds normalize(adj^k) @ X (mean or variance aggregation matching squidpy's _aggregate), concatenates the blocks along the feature dimension, PCA-reduces, then clusters with a Gaussian Mixture Model into nComponents niches. The per-hop adjacency uses the squidpy _hop "first-visit" semantics (each (i, j) appears in exactly one hop). Optionally accepts a pre-computed embedding via useRep (matches squidpy's use_rep knob, e.g. a scVI embedding), which bypasses the PCA step.
type NicheCellcharterOptions = {
  seed?: number | bigint;
  distance?: number;                   // n-hop horizon. default 3
  aggregation?: "mean" | "variance";   // default "mean"
  nComponents?: number;                // GMM components. default 10
  useRep?: Float64Array;               // (nCells, nRepFeatures) row-major
  nRepFeatures?: number;
};
- computeNicheSpatialleiden(cells, labels, neighbors, latentRows, latentCols, latentVals, options?): NicheResult
  flavor='spatialleiden'. Multiplex Leiden over two layers: the spatial connectivity (built from cells / neighbors like the other flavours) and a latent connectivity matrix supplied as COO triplets (latentRows, latentCols, latentVals). The latent layer is typically the output of sc.pp.neighbors on a feature embedding; pass obsp['connectivities'].tocoo() in JS / TS via Uint32Array row/col + Float64Array vals. Both layers contribute to a sum of RB-configuration modularities; the spatial layer's weight is scaled by layerRatio (squidpy semantics: higher → spatially homogeneous).
type NicheSpatialleidenOptions = {
  seed?: number | bigint;
  latentResolution?: number;         // default 1.0
  spatialResolution?: number;        // default 1.0
  layerRatio?: number;               // spatial weight scale. default 1.0
  useWeights?: [boolean, boolean];   // [latent, spatial]. default [true, true]
};
const niches = computeNicheSpatialleiden(
  cells, labels, neighbors, latentRows, latentCols, latentVals,
  { latentResolution: 0.8, spatialResolution: 0.8, layerRatio: 1.0, seed: 42 },
);
// niches.niches is a Uint32Array of length nCells.
Implementation notes: Leiden / GMM / fuzzy KNN are stochastic, so bit-perfect
parity with squidpy is impossible — instead the
compare/niche_parity.py harness validates
partition agreement via ARI / NMI / V-measure with paired Wilcoxon tests
across a difficulty-tiered synthetic suite. On the easy tier (well-separated
regions), utag and spatialleiden recover the truth perfectly on both
sides; neighborhood and cellcharter agree at ARI ≥ 0.85. On the hard
tier (noise=2.0, 15 % phenotype mis-calls), all four flavours hit
ARI(Rust, squidpy) > 0 at p < 10⁻⁴ and Rust's cellcharter /
spatialleiden recover the truth at statistically higher ARI than
squidpy's sklearn-GMM / spatialleiden-package paths.
Graph neural network models
Operate over many specimens at once: allBuffers concatenates per-specimen cell
buffers, indexed by offsets; labels (Int32Array) holds the per-specimen
outcome. train* returns a flat model vector you pass back to the matching
compute* to score one specimen.
- trainCgGnnModel(allBuffers: Uint8Array, offsets: Uint32Array, labels: Int32Array, neighbors): Float64Array / computeCgGnn(cells, model: Float64Array, labels, neighbors): number | undefined.
- trainGraphTransformerModel(allBuffers: Uint8Array, offsets: Uint32Array, labels: Int32Array, neighbors): Float64Array / computeGraphTransformer(cells, model: Float64Array, labels, neighbors): number | undefined.
License
Apache License 2.0 with the Commons Clause restriction. See LICENSE for the full text.
