@c4a/graph-analytics
v0.4.15-alpha.9
Published
> 中文文档见 [README_CN.md](./README_CN.md)
Readme
@c4a/graph-analytics
中文文档见 README_CN.md
Pure TypeScript graph analytics algorithms for knowledge graph analysis. Zero native dependencies, backend-agnostic, runs anywhere Node.js or Bun runs.
This package provides the algorithmic foundation for C4A's post-modeling graph enrichment pipeline and advanced query capabilities. It serves as the backend-agnostic pure TS layer — when the graph database backend (Neo4j, NebulaGraph, DuckPGQ) has a native implementation, the adapter calls it directly; otherwise, the adapter exports the subgraph to an in-memory AdjacencyMap and delegates to this package.
Installation
bun add @c4a/graph-analyticsQuick Start
import { leiden, pagerank, personalizedPageRank } from "@c4a/graph-analytics";
import type { AdjacencyMap } from "@c4a/graph-analytics";
// Build a weighted adjacency map
const graph: AdjacencyMap = new Map();
graph.set("A", [{ target: "B", weight: 1 }, { target: "C", weight: 1 }]);
graph.set("B", [{ target: "A", weight: 1 }, { target: "C", weight: 1 }]);
graph.set("C", [{ target: "A", weight: 1 }, { target: "B", weight: 1 }]);
graph.set("D", [{ target: "E", weight: 1 }, { target: "F", weight: 1 }]);
graph.set("E", [{ target: "D", weight: 1 }, { target: "F", weight: 1 }]);
graph.set("F", [{ target: "D", weight: 1 }, { target: "E", weight: 1 }]);
// Community detection
const communities = leiden(graph, { resolution: 0.5, randomSeed: 42 });
// → [{ entityId: "A", communityId: "0", level: 0 }, ...]
// Global importance ranking
const ranks = pagerank(graph, { damping: 0.85 });
// → [{ entityId: "A", rank: 0.166 }, ...]
// Local relevance from seed nodes
const pprRanks = personalizedPageRank(graph, ["A"], { alpha: 0.85 });
// → [{ entityId: "A", rank: 0.52 }, { entityId: "B", rank: 0.19 }, ...]Algorithms
Leiden Community Detection
Identifies densely connected groups of nodes using the Constant Potts Model (CPM). Guarantees that all discovered communities are internally connected — a key improvement over the classic Louvain algorithm.
leiden(graph: AdjacencyMap, options?: LeidenOptions): CommunityResult[]| Option | Type | Default | Description |
|--------|------|---------|-------------|
| resolution | number | 1 | Controls community granularity. Lower → fewer, larger communities |
| randomness | number | 1e-2 | Randomness in refinement phase |
| nIterations | number | 0 | Fixed iteration count. 0 = iterate until convergence |
| randomSeed | number | undefined | Seed for deterministic results |
Implementation: Three-phase iterative cycle (fast local moving → probabilistic refinement → network aggregation) with recursive coarsening. Ported from the CWTS networkanalysis-ts reference implementation with optimizations for V8 JIT (plain number[] over TypedArrays, CSR sparse representation).
PageRank
Computes global node importance scores via power iteration with dangling node handling.
pagerank(graph: AdjacencyMap, options?: PageRankOptions): RankResult[]| Option | Type | Default | Description |
|--------|------|---------|-------------|
| damping | number | 0.85 | Probability of following an edge vs random teleport |
| maxIterations | number | 100 | Maximum power iterations |
| epsilon | number | 1e-6 | Convergence threshold (L1 norm) |
Personalized PageRank (PPR)
Computes node relevance scores biased toward a set of seed nodes. Enables multi-hop semantic diffusion — starting from query-relevant entities, PPR naturally spreads through CORRESPONDS, IMPLEMENTS, and DEPENDS_ON edges to discover related nodes at arbitrary distance.
personalizedPageRank(
graph: AdjacencyMap,
seedIds: string[],
options?: PPROptions,
): RankResult[]| Option | Type | Default | Description |
|--------|------|---------|-------------|
| alpha | number | 0.85 | Probability of following an edge vs teleporting back to seeds |
| maxIterations | number | 100 | Maximum power iterations |
| epsilon | number | 1e-6 | Convergence threshold (L1 norm) |
If no seed nodes are found in the graph, falls back to uniform teleport (equivalent to standard PageRank).
Data Types
type WeightedEdge = { target: string; weight: number };
type AdjacencyMap = Map<string, WeightedEdge[]>;
type CommunityResult = { entityId: string; communityId: string; level: number };
type RankResult = { entityId: string; rank: number };All algorithms accept a Map<string, WeightedEdge[]> — a straightforward adjacency list with weighted edges. Node IDs are arbitrary strings (entity IDs, file paths, etc.).
Usage in C4A
| Consumer | Algorithm | Purpose |
|----------|-----------|---------|
| Post-Promote pipeline | PageRank | Entity importance ranking → entity.metadata.rank |
| Post-Promote pipeline | Leiden | Global community detection → entity.metadata.community_id |
| Post-Promote pipeline | (Leiden output) | LLM community report generation → Document(subtype: community_report) |
| Advanced query (PPR traversal) | PPR | Multi-hop semantic expansion from query seed entities |
| Advanced query (Global Search) | (Leiden + PageRank) | Map-Reduce community report aggregation |
| Advanced query (Token budget) | (PageRank) | Entity ranking for context window filling |
| Graph adapter fallback | All | NebulaGraph / DuckPGQ backends without native algorithm support |
Roadmap
| Feature | Status | Version | Notes |
|---------|--------|---------|-------|
| Leiden (flat, CPM) | ✅ Implemented | 0.4.12 | Ported from networkanalysis-ts |
| PageRank (power iteration) | ✅ Implemented | 0.4.12 | Weighted, with dangling node handling |
| Personalized PageRank | ✅ Implemented | 0.4.12 | Seed-biased power iteration |
| Hierarchical Leiden | 📋 Planned | 0.4.14+ | Multi-level community hierarchy (run flat Leiden recursively on reduced networks) |
| Cycle detection (BFS) | 📋 Planned | 0.4.14+ | DEPENDS_ON circular dependency detection for cross-validation |
| Graph pruning utilities | 📋 Planned | 0.4.14+ | Isolate removal, super-node filtering, weak-edge percentile cutoff |
| Leiden quality score (CPM) | 📋 Planned | 0.4.20+ | Report partition quality metric for monitoring |
Benchmark vs Reference Implementation
Comparison against the original networkanalysis-ts on Zachary's Karate Club (34 nodes, 78 edges). Both run with resolution=0.1, nIterations=1, 50 seeds, 20 warmup rounds.
| Metric | @c4a/graph-analytics | networkanalysis-ts | |--------|---------------------:|-------------------:| | Avg time | 0.097 ms | 0.116 ms | | Avg CPM quality | 42.94 | 40.41 | | Quality range | 39.70 ~ 43.10 | 32.40 ~ 40.80 | | Avg communities | 4.7 | 2.0 | | Speed ratio | 1.2x faster | baseline | | Quality diff | +6.27% | baseline |
| Dimension | Score | |-----------|------:| | Algorithm correctness | 100/100 | | Determinism & stability | 100/100 | | Speed (vs original) | 100/100 | | Code conciseness (~800 LOC vs ~3,900) | 95/100 | | Zero dependencies | 100/100 | | Overall | 99/100 |
Key differences:
- 1.2x faster — plain
number[]over TypedArrays for better V8 JIT optimization - Zero dependencies — no
java-randomrequired; uses built-in XorShift32 PRNG - ~5x less code — ~800 LOC vs ~3,900 LOC in the original
- +6% better partition quality — finds more granular communities with higher CPM scores
Reproduce: bun run --filter @c4a/graph-analytics bench
Testing
bun testTest suite includes:
- Edge cases: empty graphs, single nodes, disconnected components, self-loops
- Determinism: same seed → identical results
- Structural properties: community connectivity guarantee (Leiden), contiguous IDs, no empty communities
- Resolution sensitivity: lower resolution → fewer communities
- Cross-validation: structural behavior aligned with reference
networkanalysis-tsimplementation (Karate Club benchmark) - Convergence: both fixed-iteration and convergence-until-stable modes
References
- V.A. Traag, L. Waltman, N.J. van Eck. From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9:5233, 2019. DOI: 10.1038/s41598-019-41695-z
- L. Page, S. Brin, R. Motwani, T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Stanford InfoLab, 1999.
- T.H. Haveliwala. Topic-Sensitive PageRank. Proceedings of the 11th International Conference on World Wide Web, 2002.
Acknowledgments
The Leiden algorithm implementation in this package is ported from networkanalysis-ts by Nees Jan van Eck (CWTS, Leiden University), which is itself a TypeScript port of the Java networkanalysis library by the Centre for Science and Technology Studies (CWTS) at Leiden University. The original Java implementation was authored by Vincent Traag, Ludo Waltman, and Nees Jan van Eck.
We thank the CWTS team for publishing their high-quality reference implementation under the MIT license, which made this port possible. The CSR network representation, fast local moving algorithm, local merging refinement, and fastExp utility are directly derived from their work.
License
MIT — see the repository root LICENSE file.
