@cervid/data
v1.0.1
**High-Performance Data Engine for Node.js**
Cervid Data
Process millions of rows in seconds — directly in Node.js.
The Bare-Metal Data Engine
Cervid is a vectorized data engine for Node.js. It keeps data in SharedArrayBuffer-backed memory and coordinates threads with Atomics, so it can process large datasets with low latency and a predictable memory footprint.
Unlike traditional libraries that rely on JavaScript objects, Cervid operates directly on raw memory.
Why Cervid?
- ⚡ Built for Node.js — no Python bindings, no native dependencies
- 🧠 Zero-Copy Architecture — no serialization, no GC pressure
- 🧵 True Multithreading — parallel workers using shared memory
- 📊 Vectorized Execution — optimized for CPU cache efficiency
- 📉 Predictable Memory Usage — no hidden overhead
Performance Beyond the Heap
Traditional Node.js data processing suffers from:
- Heavy object allocation
- Frequent garbage collection (GC)
- Poor CPU cache utilization
Cervid addresses these problems with three primitives: TypedArrays for columnar storage, SharedArrayBuffer for zero-copy memory access across threads, and Atomics for lock-free coordination between workers.
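To make these three building blocks concrete, here is a minimal sketch that uses only standard JavaScript globals. It does not use Cervid's API or reflect its internals; the names `createColumn`, `fares`, and `processedRows` are hypothetical.

```javascript
// Hypothetical helper: allocate one numeric column backed by shared memory.
function createColumn(rowCount) {
  const buffer = new SharedArrayBuffer(rowCount * Float64Array.BYTES_PER_ELEMENT);
  return new Float64Array(buffer);
}

const fares = createColumn(4);
fares.set([12.5, 7.0, 30.25, 4.75]);

// A second view over the same buffer sees the same bytes: nothing is copied.
const sameMemory = new Float64Array(fares.buffer);
sameMemory[1] = 9.0;
console.log(fares[1]); // 9

// Atomics works on integer typed arrays. Here it updates a shared counter
// that several threads could increment without taking a lock.
const processedRows = new Int32Array(
  new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT)
);
Atomics.add(processedRows, 0, fares.length);
console.log(Atomics.load(processedRows, 0)); // 4
```

Because a column is a single typed array rather than one object per row, the garbage collector has a single allocation to track regardless of row count.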
Benchmark: Cervid vs Polars vs Pandas
Dataset: 7.8M rows NYC Taxi Trip Data (~1.5GB CSV)
Environment: Node.js 22.x | Python 3.11 | 8 Workers | Local Machine

| Metric | Cervid | Polars (Rust) | Pandas (Python) |
| :--- | :---: | :---: | :---: |
| Total Execution Time | 2.79s | 5.98s | ~35.0s |
| Ingestion Time | 1.16s | 1.88s | ~18.5s |
| Processing Throughput | ~2.8M rows/s | ~1.6M rows/s | ~0.2M rows/s |
| Peak Memory Usage | ~1.2GB | ~2.5GB+ | ~6.0GB+ |

Cervid achieves these results by operating directly on raw memory inside the Node.js runtime and avoiding cross-language (FFI) overhead.
Installation

    npm install @cervid/data

Quick Start

    import { Cervid } from '@cervid/data';

    async function main() {
      // Load the CSV using 8 workers.
      const ds = await Cervid.read('./large_dataset.csv', { workers: 8 });

      // Add a derived column computed from three existing columns.
      ds.with_columns([
        {
          name: 'profit_per_mile',
          inputs: ['fare_amount', 'tip_amount', 'trip_distance'],
          // Guard against zero-distance trips to avoid division by zero.
          formula: (fare, tip, dist) => dist > 0 ? (fare + tip) / dist : 0
        }
      ]);

      // Group the derived column by pickup location ID.
      const results = ds.groupByID('PULocationID', 'profit_per_mile');
      console.log(results.slice(0, 5));
    }

    main().catch(console.error);

Example Output

    [
      { group: 84, avg: 2165.64 },
      { group: 132, avg: 1987.21 }
    ]

Architecture Overview
Columnar Storage Engine
Data is stored in contiguous memory using TypedArrays, allowing fast sequential access and optimal CPU cache usage. This structure minimizes memory fragmentation and maximizes data locality.
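As an illustration of the columnar layout, the sketch below expresses the Quick Start pipeline (a derived column followed by a grouped average) over plain typed arrays. It is not Cervid's implementation; the sample values and the `groupAverage` helper are hypothetical.

```javascript
// Row-oriented storage would hold one heap object per row:
//   [{ fare: 10, tip: 2, dist: 4, zone: 84 }, ...]
// Column-oriented storage holds one contiguous typed array per field.
const fare = Float64Array.of(10, 20, 8, 5);
const tip = Float64Array.of(2, 4, 0, 1);
const dist = Float64Array.of(4, 8, 0, 2);
const zone = Int32Array.of(84, 132, 84, 132);

// Derived column: the Quick Start formula applied in one sequential pass.
const profitPerMile = new Float64Array(fare.length);
for (let i = 0; i < fare.length; i++) {
  profitPerMile[i] = dist[i] > 0 ? (fare[i] + tip[i]) / dist[i] : 0;
}

// Grouped average: accumulate a sum and a count per key, then divide.
function groupAverage(keys, values) {
  const acc = new Map(); // key -> { sum, count }
  for (let i = 0; i < keys.length; i++) {
    let entry = acc.get(keys[i]);
    if (!entry) {
      entry = { sum: 0, count: 0 };
      acc.set(keys[i], entry);
    }
    entry.sum += values[i];
    entry.count += 1;
  }
  return [...acc].map(([group, { sum, count }]) => ({ group, avg: sum / count }));
}

console.log(groupAverage(zone, profitPerMile));
// [ { group: 84, avg: 1.5 }, { group: 132, avg: 3 } ]
```

Each loop walks its arrays front to back, which is the access pattern that benefits from data locality.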
Parallel Execution Engine
Workloads are split across persistent Workers, achieving near 100% CPU utilization. By avoiding the traditional "main thread bottleneck," Cervid can process millions of rows per second without blocking the event loop.
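One common way to split a workload is to give each worker a contiguous row range. The helper below is a hedged sketch of that idea; `partitionRows` is a hypothetical name, and Cervid's actual scheduling strategy may differ.

```javascript
// Hypothetical helper: split [0, rowCount) into one contiguous range per worker.
function partitionRows(rowCount, workerCount) {
  const ranges = [];
  const base = Math.floor(rowCount / workerCount);
  const remainder = rowCount % workerCount;
  let start = 0;
  for (let w = 0; w < workerCount; w++) {
    // The first `remainder` workers take one extra row,
    // so range sizes differ by at most one.
    const size = base + (w < remainder ? 1 : 0);
    ranges.push({ start, end: start + size }); // `end` is exclusive
    start += size;
  }
  return ranges;
}

console.log(partitionRows(10, 4));
// ranges: 0-3, 3-6, 6-8, 8-10
```

Contiguous ranges keep each worker scanning its own stretch of memory, so workers do not interleave writes within the same region.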
Zero-Copy Memory Model
All workers operate on shared memory via SharedArrayBuffer, eliminating data duplication and reducing memory pressure. This allows for seamless thread communication without the overhead of IPC serialization.
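The shared-memory model can be sketched with Node's built-in `worker_threads` module. This is an illustrative example, not Cervid's code: `sharedSum` and `workerSource` are hypothetical, and for brevity it spawns short-lived workers rather than the persistent pool described above.

```javascript
// Worker source, inlined as a string so the sketch fits in one file.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  const { valuesBuffer, totalBuffer, start, end } = workerData;
  const values = new Int32Array(valuesBuffer); // view over the parent's memory
  const total = new Int32Array(totalBuffer);
  let partial = 0;
  for (let i = start; i < end; i++) partial += values[i];
  Atomics.add(total, 0, partial); // merge the partial sum without a lock
  parentPort.postMessage('done');
`;

async function sharedSum(input, workerCount) {
  const { Worker } = await import('node:worker_threads');
  const valuesBuffer = new SharedArrayBuffer(input.length * Int32Array.BYTES_PER_ELEMENT);
  new Int32Array(valuesBuffer).set(input);
  const totalBuffer = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const chunk = Math.ceil(input.length / workerCount);

  const jobs = [];
  for (let w = 0; w < workerCount; w++) {
    const start = w * chunk;
    const end = Math.min(start + chunk, input.length);
    jobs.push(new Promise((resolve, reject) => {
      // A SharedArrayBuffer passed in workerData is shared, not cloned.
      const worker = new Worker(workerSource, {
        eval: true,
        workerData: { valuesBuffer, totalBuffer, start, end },
      });
      worker.once('message', resolve);
      worker.once('error', reject);
    }));
  }
  await Promise.all(jobs);
  return Atomics.load(new Int32Array(totalBuffer), 0);
}

sharedSum([1, 2, 3, 4, 5, 6, 7, 8], 2).then((sum) => console.log(sum)); // 36
```

Only the buffer handles and the row range cross the thread boundary; the column data itself is never serialized.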
Key Features
- Native Parquet Engine: High-performance, zero-copy binary decoding. Cervid reads Parquet files directly into columnar memory without intermediate object conversion.
- Streaming Engine: Out-of-core processing architecture. Analyze datasets that exceed physical RAM limits by leveraging chunked ingestion and shared memory buffers.
- Columnar Storage: Data is stored in contiguous TypedArrays, optimizing CPU cache hits and minimizing memory fragmentation.
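The chunked ingestion idea behind the Streaming Engine can be sketched as follows. This is an assumption-laden illustration: an in-memory array stands in for a file read in pieces, and `chunks` and `streamingMean` are hypothetical names, not part of Cervid's API.

```javascript
// Hypothetical chunk source: yields fixed-size views instead of the whole column.
function* chunks(column, chunkSize) {
  for (let offset = 0; offset < column.length; offset += chunkSize) {
    // subarray() returns a view over the same memory, not a copy.
    yield column.subarray(offset, offset + chunkSize);
  }
}

// Running mean: the only state kept between chunks is a sum and a count.
function streamingMean(chunkIterator) {
  let sum = 0;
  let count = 0;
  for (const chunk of chunkIterator) {
    for (let i = 0; i < chunk.length; i++) sum += chunk[i];
    count += chunk.length;
  }
  return count === 0 ? 0 : sum / count;
}

const distances = Float64Array.of(1, 2, 3, 4, 5, 6, 7);
console.log(streamingMean(chunks(distances, 3))); // 4
```

Because the aggregate depends only on the running state, peak memory is bounded by the chunk size rather than the dataset size.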
Roadmap
- Binary-Native Ingestion: Implementing direct mapping for binary formats (Parquet/Arrow) to eliminate string-parsing bottlenecks and achieve true zero-copy ingestion.
- Query Planner & Optimization Layer: Automated predicate pushdown and execution graph optimization to skip unnecessary data processing.
- SIMD / WASM Acceleration: Leveraging hardware-level vectorization via WebAssembly for ultra-fast mathematical operations on columnar data.
- Advanced Streaming: Enhancing out-of-core processing for multi-terabyte datasets that far exceed physical RAM.
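Predicate pushdown, as named in the roadmap, generally means using metadata to skip data that cannot satisfy a filter. The sketch below shows one common form of it, based on per-chunk min/max statistics (similar in spirit to Parquet row-group statistics). It is a hypothetical illustration, not a description of Cervid's planned design; `buildChunks` and `countGreaterThan` are invented names.

```javascript
// Hypothetical chunk metadata: each chunk records the min and max of its values.
function buildChunks(column, chunkSize) {
  const result = [];
  for (let offset = 0; offset < column.length; offset += chunkSize) {
    const values = column.subarray(offset, offset + chunkSize);
    let min = Infinity;
    let max = -Infinity;
    for (const v of values) {
      if (v < min) min = v;
      if (v > max) max = v;
    }
    result.push({ values, min, max });
  }
  return result;
}

// Count rows with value > threshold, skipping chunks that cannot match.
function countGreaterThan(chunkList, threshold) {
  let matches = 0;
  let scannedChunks = 0;
  for (const chunk of chunkList) {
    if (chunk.max <= threshold) continue; // pushdown: no row in this chunk can pass
    scannedChunks++;
    for (const v of chunk.values) {
      if (v > threshold) matches++;
    }
  }
  return { matches, scannedChunks };
}

const tripDistance = Float64Array.of(1, 2, 3, 2, 1, 3, 9, 12, 10);
console.log(countGreaterThan(buildChunks(tripDistance, 3), 5));
// { matches: 3, scannedChunks: 1 }
```

In this example two of the three chunks are skipped without reading a single row, which is the saving a query planner aims for.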
License
MIT © 2026 Villager/Github
