npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

jude-map

v0.1.1

Published

_Named after St. Jude, patron of hopeless causes — like getting Node.js to beat Python at ML inference._

Readme

jude-map

Named after St. Jude, patron of hopeless causes — like getting Node.js to beat Python at ML inference.

A native Node.js addon providing mmap-backed shared tensor memory with seqlock consistency, zero-copy reads, and a libuv-integrated async wait mechanism for GPU-ready inference pipelines.


Why

Every high-throughput ML inference pipeline in Node.js eventually hits the same wall: moving tensor data between threads means either serializing through postMessage (structured clone, pays a copy tax on every frame) or fighting the V8 heap ceiling with SharedArrayBuffer (capped at 4GB, misaligned for CUDA DMA).

jude-map sidesteps both. It maps tensor memory directly via mmap/MapViewOfFile, outside the V8 heap, with a seqlock protecting consistency across writer and reader. For GPU workflows, pin() page-locks the mapping so CUDA can DMA directly from host memory without a staging copy. The data pointer is 256-byte aligned at DATA_OFFSET, satisfying cudaMemcpyAsync's alignment requirement out of the box.


Features

  • mmap-backed segments outside the V8 heap — no 4GB ceiling, no GC pressure
  • Seqlock consistency — writer-priority, reader-optimistic, zero kernel involvement in the fast path
  • Zero-copy reads via external ArrayBuffer backed directly by the mapped region
  • Safe copy reads for data that needs to outlive the next write
  • readWait() / readCopyWait() — spin N times, then park on uv_async; the writer's commit wakes all parked readers via a single coalesced uv_async_send
  • pin() / unpin() — page-locks the mapping for CUDA H2D zero-copy paths
  • Cross-platform — Linux, macOS, Windows (MSVC)
  • Full TypeScript API with DType enum, TensorResult interface, and declaration maps

Requirements

  • Node.js >= 18
  • npm >= 11.5.1
  • For building from source:
    • Linux/macOS: GCC or Clang with C++17 support, Python 3
    • Windows: Visual Studio 2019+ with "Desktop development with C++" workload, Python 3

Installation

npm install jude-map

ARM architectures compile locally on install. x64 uses prebuilt binaries when available — no toolchain required.


Quick Start

import { SharedTensorSegment, DType } from "jude-map";

const seg = new SharedTensorSegment(4 * 1024 * 1024); // 4 MB capacity

// Write a [2, 3] float32 tensor
const input = new Float32Array([1, 2, 3, 4, 5, 6]);
seg.write([2, 3], DType.FLOAT32, input);

// Zero-copy read — view directly into mmap memory
// Valid until the next write(). Do not retain across async boundaries.
const r = seg.read();
if (r) {
  console.log(r.shape); // [2, 3]
  console.log(r.dtype); // 0 (DType.FLOAT32)
  console.log(r.version); // seqlock sequence number at time of read
  console.log((r.data as Float32Array)[5]); // 6
}

// Safe copy read — owns its buffer, safe to retain and pass anywhere
const copy = seg.readCopy();

seg.destroy();

Async reads

readWait() is the primary API for consumer threads that need to park until data is ready. Under the hood, it spins up to 16 times on the seqlock before registering a pending read. When the writer calls write(), it calls uv_async_send() after the seqlock commit. libuv coalesces concurrent sends into a single callback, which wakes all parked readers at once.

const seg = new SharedTensorSegment(4 * 1024 * 1024);

// Park until a writer commits
const result = await seg.readWait();
if (result) {
  console.log(result.shape);
  console.log((result.data as Float32Array)[0]);
}

// Or park for a safe copy
const copy = await seg.readCopyWait();

This means there is no polling, no setInterval, and no blocked event loop. Readers pay zero CPU while waiting. The coalescing property of uv_async_send means 256 parked readers are woken by a single writer commit with one callback delivery, not 256.


GPU / CUDA pinning

Page-locking the mapping lets CUDA initiate H2D DMA directly from host memory, skipping the internal staging buffer that cudaMemcpy would otherwise allocate. Call pin() before handing the data pointer to TF_NewTensor.

const seg = new SharedTensorSegment(1024 * 1024 * 64); // 64 MB

const pinned = seg.pin();
if (!pinned) {
  // Pinning failed — insufficient RLIMIT_MEMLOCK (Linux) or working set quota (Windows).
  // The segment still works correctly for CPU-only inference; pin() failing is non-fatal.
  console.warn("Page pinning unavailable, falling back to staged DMA");
}

// Write pre-processed tensor data (e.g. from dspx)
seg.write([1, 224, 224, 3], DType.FLOAT32, preprocessedBuffer);

// Hand data_ptr directly to TF_NewTensor with TF_DONT_DEALLOCATE_CONTENTS.
// The gpu_private host threads can now DMA without a staging copy.
const dataPtr = /* seg.dataPtr — see C++ shim notes below */;

seg.unpin(); // Release when inference is done
seg.destroy();

The tensor data begins at DATA_OFFSET = 256 bytes into the mapped region. This offset is enforced by a static_assert in segment.h and satisfies CUDA's async DMA alignment requirement.

On Linux, if pin() returns false:

ulimit -l unlimited
# or programmatically: setrlimit(RLIMIT_MEMLOCK, { rlim_cur: RLIM_INFINITY, ... })

On Windows, the default per-process VirtualLock quota is 64MB. Raise it via SetProcessWorkingSetSizeEx for large models.


API

new SharedTensorSegment(maxBytes: number)

Allocates an mmap-backed segment with maxBytes bytes of tensor capacity, plus a 256-byte internal header. The total mapped region is rounded up to the nearest page boundary.

write(shape: number[], dtype: DType, buffer: ArrayBuffer | TypedArray): void

Commits a tensor under the seqlock. Shape rank is capped at 8 dimensions. Throws RangeError if buffer.byteLength exceeds byteCapacity. After commit, wakes any parked readWait() / readCopyWait() callers via uv_async_send.

read(): TensorResult | null

Zero-copy read. Returns an external ArrayBuffer view directly into mmap memory. Valid until the next write() or destroy(). Returns null if no data has been written yet.

Do not retain the returned data view across async boundaries without calling readCopy() instead.

readCopy(): TensorResult | null

Copies tensor data into a fresh Buffer before returning. Safe to retain indefinitely and pass across worker boundaries.

readWait(): Promise<TensorResult | null>

Spins up to 16 times, then parks until a writer commit wakes it. Resolves with a zero-copy view. Rejects if destroy() is called while parked.

readCopyWait(): Promise<TensorResult | null>

Same as readWait() but resolves with a copied buffer.

pin(): boolean

Page-locks the entire mapped region. Returns true on success, false if the OS denies the lock (non-fatal). Idempotent — calling pin() on an already-pinned segment returns true immediately.

unpin(): void

Releases the page lock. Safe to call when not pinned or after destroy().

destroy(): void

Unmaps the region and rejects all parked readWait() promises. Subsequent calls to write() or pin() will throw.

byteCapacity: number

The usable tensor capacity in bytes (excludes the internal header).

isPinned: boolean

Whether the mapping is currently page-locked.


DType

enum DType {
  FLOAT32 = 0,
  FLOAT64 = 1,
  INT32 = 2,
  INT64 = 3,
  UINT8 = 4,
  INT8 = 5,
  UINT16 = 6,
  INT16 = 7,
  BOOL = 8,
}

The numeric enum is passed directly over the N-API bridge, avoiding string parsing on the hot path.


TensorResult

interface TensorResult {
  shape: number[]; // dimension sizes
  dtype: DType; // element type
  data: TypedArray; // typed view into the buffer
  version: number; // seqlock sequence number at time of read
}

version increments by 2 on each successful write (seqlock counter is even when stable). You can use it to detect stale reads in a pipeline.


Segment memory layout

Byte offset 0:
  [0..63]    Seqlock          — cache line 0 (sequence counter, alignas(64))
  [64..191]  TensorMeta       — cache lines 1-2 (ndim, dtype, shape[8], byte_length, pad)
  [192..255] _gpu_pad         — 64 bytes, pads header to 256 for CUDA DMA alignment
Byte offset 256:
  [256..]    Tensor data      — your bytes, 256-byte aligned

Each section occupies distinct cache lines, preventing false sharing between the seqlock counter, metadata reads, and tensor data writes.


Seqlock failure modes

The reader spins up to READ_SPIN_LIMIT = 16 times on a torn read (writer mid-commit). In practice, a commit is a memcpy plus two atomic increments — the spin resolves in nanoseconds. If the segment has no committed data yet, read() returns null and readWait() parks until the first write().


Building from source

git clone https://github.com/A-KGeorge/jude
cd jude-map
npm install
npm run build
npm test

Individual steps:

npm run build:ts      # TypeScript → dist/
npm run build:native  # C++ → build/Release/jude-map.node
npm test              # node --import tsx --test src/ts/__tests__/*.test.ts

Generating prebuilds for distribution:

npm run prebuildify

This produces binaries for Node 18, 20, 22, and 24 under prebuilds/. Commit this directory — it is explicitly not gitignored.


License

Apache-2.0