@nxtedition/shared

v5.1.4

A high-performance, thread-safe ring buffer for inter-thread communication in Node.js using `SharedArrayBuffer`.

Why

Passing data between worker threads in Node.js typically involves structured cloning or transferring ArrayBuffer ownership. Structured cloning copies every byte — fine for occasional messages, but a bottleneck when streaming megabytes per second between threads. Transferable objects avoid the copy, but each transfer still requires allocating a new ArrayBuffer, serializing the transfer list, and coordinating ownership between threads — overhead that adds up quickly in high-throughput scenarios.

This ring buffer avoids these problems. A single SharedArrayBuffer is mapped into both threads. The writer appends messages by advancing a write pointer; the reader consumes them by advancing a read pointer. No copies, no ownership transfers, no cloning overhead. Because messages are stored inline in a contiguous buffer, data lives right where the protocol is rather than scattered across separately allocated ArrayBuffers — keeping access patterns cache-friendly. The pointers are coordinated with Atomics operations, and cache-line-aligned to prevent false sharing between CPU cores.

Reads are zero-copy: the reader callback receives a DataView directly into the shared buffer, so parsing can happen in-place without allocating intermediate buffers. Writes are batched — the write pointer is only published to the reader after a high-water mark is reached or the current event loop tick ends, drastically reducing the frequency of expensive atomic stores.
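In miniature, the pointer protocol looks like this. This is an illustrative single-producer/single-consumer sketch, not the library's actual memory layout — the real indices are cache-line-aligned rather than adjacent, and real writes are batched as described above:

```javascript
// Illustrative SPSC protocol — not the library's actual layout
// (no batching, no message headers, no wrap handling).
const sab = new SharedArrayBuffer(128 + 256)
const idx = new Int32Array(sab, 0, 2) // idx[0] = write pos, idx[1] = read pos
const data = new Uint8Array(sab, 128) // payload region (256 bytes)

function push(byte) {
  const w = Atomics.load(idx, 0)
  data[w % data.length] = byte
  Atomics.store(idx, 0, w + 1) // publish: the reader may now see this byte
}

function pop() {
  const r = Atomics.load(idx, 1)
  if (r === Atomics.load(idx, 0)) return undefined // ring is empty
  const byte = data[r % data.length]
  Atomics.store(idx, 1, r + 1) // release the slot back to the writer
  return byte
}

push(42)
console.log(pop()) // 42
```

Because there is exactly one writer and one reader, each index has a single mutator, and the `Atomics.store` that advances the write position is what makes the payload bytes visible to the other thread.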

Native binding

The ring buffer relies on a native C++ addon for two capabilities that are not available from pure JavaScript.

Double-mapped virtual memory. The key trick that eliminates wrap-around copies is mapping the same physical memory region into two consecutive virtual address ranges. From the CPU's perspective the buffer appears twice as large: a message that starts near the end of the physical ring and would normally wrap around is simply read or written as a contiguous span across the boundary. On POSIX systems this uses mmap twice with MAP_FIXED | MAP_SHARED onto the same memfd; on Windows it uses VirtualAlloc2 / MapViewOfFile3. Neither is expressible in JavaScript, which has no way to create an ArrayBuffer backed by a custom virtual memory layout.
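The effect can be made concrete with a plain-JavaScript emulation: double the buffer physically and mirror every write. In the native addon the second half is the same physical memory mapped twice, so the mirror write below is done by the MMU, not by code (sizes here are toy values, not the library's):

```javascript
const SIZE = 8
const ring = new Uint8Array(SIZE * 2) // pretend both halves are one mapping

function write(pos, bytes) {
  for (let i = 0; i < bytes.length; i++) {
    const p = (pos + i) % SIZE
    ring[p] = bytes[i]
    ring[p + SIZE] = bytes[i] // the MMU provides this "mirror" for free
  }
}

// A 4-byte message starting at offset 6 crosses the physical wrap
// (slots 6, 7, 0, 1) yet reads back as one contiguous span:
write(6, [10, 20, 30, 40])
console.log(Array.from(ring.subarray(6, 10))) // [ 10, 20, 30, 40 ]
```

This is also why a reader may observe byte offsets at or beyond the physical ring size: the span past the end is a valid alias of the start of the ring.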

Huge pages (Linux). When the requested buffer size is at least 1 MiB, the native allocator first attempts to back the memory with 2 MiB explicit huge pages (via memfd_create with MFD_HUGETLB | MFD_HUGE_2MB). Huge pages reduce TLB pressure substantially for large ring buffers — the CPU needs far fewer TLB entries to cover the same address range, which means fewer TLB misses on hot read/write paths. If explicit huge pages are unavailable (e.g. the huge-page pool is empty or the kernel does not support it), the allocator falls back to regular 4 KiB pages and advises the kernel to back the region with transparent huge pages (madvise(MADV_HUGEPAGE)).

Platform Assumptions

All messages are aligned on 4-byte boundaries. Message length headers are read and written via Int32Array indexing rather than DataView, avoiding per-access endianness checks on the hot path.
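The trade-off is easy to see directly. This sketch assumes a little-endian platform, which is the implicit assumption behind using `Int32Array` on the hot path (and holds for every platform Node.js ships for):

```javascript
// A 4-byte length header read two ways. Int32Array indexing uses the
// platform's native byte order with no per-access flag; DataView takes
// an explicit endianness argument on every call.
const buf = new ArrayBuffer(8)
const i32 = new Int32Array(buf)
const view = new DataView(buf)

i32[0] = 11 // store an 11-byte message length at offset 0

// Equal only because the platform is little-endian.
console.log(i32[0], view.getInt32(0, true)) // 11 11
```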

Install

npm install @nxtedition/shared

Usage

import { Reader, Writer } from '@nxtedition/shared'

// Create writer — allocates the ring buffer internally
const w = new Writer(1024 * 1024) // 1 MiB ring buffer

const payload = Buffer.from('hello world')
w.writeSync(payload.length, (data) => {
  payload.copy(data.buffer, data.byteOffset)
  return data.byteOffset + payload.length
})

// Create reader from the same handle (pass w.handle to the other thread)
const r = new Reader(w.handle)

r.readSome((data) => {
  const msg = data.buffer.subarray(data.byteOffset, data.byteOffset + data.byteLength).toString()
  console.log(msg) // 'hello world'
})

Batching writes with cork

w.cork(() => {
  for (const item of items) {
    const buf = Buffer.from(JSON.stringify(item))
    w.writeSync(buf.length, (data) => {
      buf.copy(data.buffer, data.byteOffset)
      return data.byteOffset + buf.length
    })
  }
})
// All writes flushed atomically when cork returns

Or manually:

w.cork()
w.writeSync(buf1.length, writeFn, buf1)
w.writeSync(buf2.length, writeFn, buf2)
w.uncork() // publishes all writes to the reader

Non-blocking writes with tryWrite

const buf = Buffer.from('data')
const ok = w.tryWrite(buf.length, (data) => {
  buf.copy(data.buffer, data.byteOffset)
  return data.byteOffset + buf.length
})
if (!ok) {
  // Buffer is full — the reader hasn't caught up yet
}

Cross-thread usage

// main.js
import { Writer } from '@nxtedition/shared'
import { Worker } from 'node:worker_threads'

const w = new Writer(1024 * 1024)
const worker = new Worker('./reader-worker.js', {
  workerData: w.handle,
})
// ... write messages

// reader-worker.js
import { Reader } from '@nxtedition/shared'
import { workerData } from 'node:worker_threads'

const r = new Reader(workerData)

function poll() {
  const count = r.readSome((data) => {
    // process data.buffer at data.byteOffset..data.byteOffset+data.byteLength
  })
  setImmediate(poll)
}
poll()

API

new Reader(handleOrSize)

Creates a reader for the ring buffer.

  • handleOrSize — Either a SharedHandle obtained from writer.handle (to connect to an existing ring buffer), or a positive integer to allocate a new ring buffer of that payload capacity in bytes (max ~2 GB). Overhead is added automatically.

reader.handle

The underlying SharedHandle (a branded SharedArrayBuffer). Pass this to another thread via workerData or to new Writer(handle) / new Reader(handle) to share the buffer.

reader.readSome(next, opaque?)

Reads a batch of messages. Calls next(data, opaque) for each message, where data has:

  • buffer: Buffer — The underlying shared buffer

  • view: DataView — A DataView over the shared buffer

  • byteOffset: number — Start offset of the message payload. Due to the double-mapped virtual memory layout, this value can be greater than or equal to the physical ring size for messages that span the wrap boundary — the bytes are always contiguous and valid within buffer. Do not compare byteOffset against the physical ring size.

  • byteLength: number — Length of the message payload in bytes

  • opaque — Optional user-provided context object passed through to the callback. Useful for avoiding closures on hot paths.

Return false from the callback to stop reading early. Returns the number of messages processed.

Messages are batched: up to 256 KiB of data per call.
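The `opaque` parameter exists so a hot path can reuse one top-level callback plus a caller-supplied context object instead of allocating a fresh closure per call. A generic sketch of the pattern (illustrative, not library code):

```javascript
// The opaque-context pattern: the callback is a reusable top-level
// function, and per-call state travels as the opaque argument.
function readEach(messages, next, opaque) {
  for (const msg of messages) {
    if (next(msg, opaque) === false) break // stop early, like readSome
  }
}

// No closure captures `stats`; the same function object is reused.
function onMessage(msg, stats) {
  stats.bytes += msg.length
}

const stats = { bytes: 0 }
readEach(['hello', 'world'], onMessage, stats)
console.log(stats.bytes) // 10
```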

new Writer(handleOrSize, options?)

Creates a writer for the ring buffer.

  • handleOrSize — Either a SharedHandle obtained from writer.handle (to connect to an existing ring buffer), or a positive integer to allocate a new ring buffer of that payload capacity in bytes (max ~2 GB). Overhead is added automatically.

Options:

  • yield?: () => void — Called when the writer must wait for the reader to catch up. Useful to prevent deadlocks when the writer thread also drives the reader.
  • logger?: { warn(obj, msg): void } — Logger for yield warnings (pino-compatible).

writer.handle

The underlying SharedHandle (a branded SharedArrayBuffer). Pass this to another thread via workerData or to new Reader(handle) to share the buffer.

writer.writeSync(len, fn, opaque?)

Synchronously writes a message. Blocks (via Atomics.wait) until buffer space is available.

  • len — Maximum payload size in bytes. Writing beyond len bytes in the callback is undefined behavior.
  • fn(data, opaque) → number — Write callback. Write payload into data.buffer starting at data.byteOffset. Must return the end position (data.byteOffset + bytesWritten), not the byte count.
  • opaque — Optional user-provided context object passed through to the callback. Useful for avoiding closures on hot paths.

Throws on timeout (default: 60000 ms).

writer.tryWrite(len, fn, opaque?)

Non-blocking write attempt. Returns false if the buffer is full. The fn and opaque parameters follow the same contract as writeSync.

writer.cork(callback?)

Batches multiple writes. The write pointer is only published to the reader when the cork is released, reducing atomic operation overhead.

When called with a callback, uncork is called automatically when the callback returns. When called without a callback, you must call uncork() manually.

writer.uncork()

Decrements the cork counter. When it reaches zero, publishes the pending write position to the reader. Safe to call when not corked (no-op).
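The counter semantics can be modeled in a few lines. This is an illustrative sketch, not library code — the real writer also publishes when a high-water mark is reached and at the end of the event loop tick:

```javascript
// Minimal model of the cork/uncork counter described above.
let corkDepth = 0
let pendingWrites = 0
let publishes = 0

function publishIfPending() {
  if (pendingWrites > 0) { publishes++; pendingWrites = 0 }
}

function write() {
  pendingWrites++
  if (corkDepth === 0) publishIfPending() // uncorked: publish per write
}

function cork(fn) {
  corkDepth++
  if (fn) { try { fn() } finally { uncork() } }
}

function uncork() {
  if (corkDepth > 0 && --corkDepth === 0) publishIfPending() // nested corks coalesce
}

cork(() => { write(); write(); write() })
console.log(publishes) // 1 — three corked writes, a single publish
```

Nesting works because only the uncork that brings the counter back to zero publishes; inner cork/uncork pairs are no-ops.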

writer.flushSync()

Immediately publishes the pending write position to the reader. Unlike uncork, this does not interact with the cork counter — it forces a flush regardless.

Benchmarks

Measured on AMD EPYC 9355P 32-Core Processor, Node.js 25.8.1, 8 MiB ring buffer, Docker (x64-linux).

Each benchmark writes batches of fixed-size messages from the main thread and reads them in a worker thread. The shared ring buffer is compared against Node.js postMessage (structured clone). "shared (string)" uses the latin1 fast path; UTF-8 strings are slower.

Throughput

| Size | shared (buffer) | shared (latin1 str) | postMessage (buffer) | postMessage (string) |
| -----: | --------------: | ------------------: | -------------------: | -------------------: |
| 64 B | 2.00 GiB/s | 549 MiB/s | 22.78 MiB/s | 39.58 MiB/s |
| 256 B | 3.62 GiB/s | 1.73 GiB/s | 89.90 MiB/s | 174.62 MiB/s |
| 1 KiB | 11.23 GiB/s | 6.00 GiB/s | 341.78 MiB/s | 521.39 MiB/s |
| 4 KiB | 24.36 GiB/s | 20.04 GiB/s | 1.13 GiB/s | 1.59 GiB/s |
| 16 KiB | 44.16 GiB/s | 50.76 GiB/s | 3.70 GiB/s | 7.45 GiB/s |
| 64 KiB | 90.02 GiB/s | 78.88 GiB/s | 9.49 GiB/s | 15.09 GiB/s |

Message rate

| Size | shared (buffer) | shared (latin1 str) | postMessage (buffer) | postMessage (string) |
| -----: | --------------: | ------------------: | -------------------: | -------------------: |
| 64 B | 33.61 M/s | 8.99 M/s | 373 K/s | 648 K/s |
| 256 B | 15.16 M/s | 7.27 M/s | 368 K/s | 715 K/s |
| 1 KiB | 11.78 M/s | 6.29 M/s | 350 K/s | 534 K/s |
| 4 KiB | 6.39 M/s | 5.25 M/s | 297 K/s | 417 K/s |
| 16 KiB | 2.89 M/s | 3.33 M/s | 242 K/s | 488 K/s |
| 64 KiB | 1.47 M/s | 1.29 M/s | 155 K/s | 247 K/s |

Key findings

  • Small messages (64–256 B): Buffer.set delivers 33.6–15.2 M msg/s — up to 90x faster than postMessage (buffer) and 52x faster than postMessage (string). Per-message overhead dominates at these sizes, and avoiding structured cloning makes the biggest difference.

  • Medium messages (1 KiB): Buffer.set pulls ahead at 11.23 GiB/s — roughly 1.9x faster than the latin1 string path and ~22x faster than the best postMessage variant.

  • Large messages (4–64 KiB): Both shared paths dominate. At 16 KiB the latin1 string path (50.76 GiB/s) overtakes Buffer.set (44.16 GiB/s) — V8's string-to-buffer fast path becomes more cache-efficient at this size. At 64 KiB Buffer.set reclaims the lead at 90.02 GiB/s, about 6x faster than postMessage (string).

  • Caveat: The string benchmark uses ASCII-only content. Multi-byte UTF-8 strings will not hit V8's vectorized fast path and will be significantly slower.

Running the benchmark

node --allow-natives-syntax packages/shared/src/bench.mjs

License

MIT