@smoothbricks/arrow-builder

v0.1.1

A low-level, high-performance columnar buffer engine for building Apache Arrow tables with explicit memory management and zero-copy data structures.

Overview

Arrow-builder is a lightweight alternative to Arrow JS builders, designed for use cases that require:

  • Explicit allocations: No hidden resizes or memory surprises
  • Zero-copy construction: Direct TypedArray access with no intermediate copies
  • Cache-aligned buffers: 64-byte aligned TypedArrays optimized for CPU cache
  • V8-optimized codegen: Runtime class generation for monomorphic property access
  • Predictable performance: Hot-path operations with minimal overhead

Unlike the official Arrow JS builders, which automatically resize and copy data, arrow-builder gives you complete control over memory allocation and layout.

Use Cases

Arrow-builder is a generic columnar buffer engine suitable for any tabular data collection scenario:

  • Time-series data collection: High-frequency sensor readings, market data ticks
  • Metrics aggregation: System metrics, application performance monitoring
  • Event sourcing buffers: Event streams with structured attributes
  • Database result caching: Efficient in-memory columnar storage
  • Analytics pipelines: Fast data transformation and aggregation
  • Streaming data processing: Low-latency event processing

Key Features

1. Explicit Memory Management

import { createColumnBuffer } from '@smoothbricks/arrow-builder';

// Create buffer with explicit capacity
const buffer = createColumnBuffer(schema, 1000);

// Write data directly (no implicit bounds checks; the caller tracks capacity)
buffer.timestamp[buffer.writeIndex] = timestamp;
buffer.entry_type[buffer.writeIndex] = opType;
buffer.writeIndex++;

// Chain to next buffer when full
if (buffer.writeIndex >= buffer.capacity) {
  buffer.next = createColumnBuffer(schema, 1000);
}

2. Zero-Copy Arrow Tables

Arrow-builder constructs Arrow tables directly from your TypedArrays with no copies:

// Your columnar data (already in memory)
const buffer = createColumnBuffer(schema, capacity);

// Zero-copy conversion to Arrow Table
const table = convertToArrowTable(buffer);

// TypedArrays are reused, not copied
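
To make "reused, not copied" concrete at the TypedArray level, here is a minimal sketch in plain TypeScript. It does not use arrow-builder's API, and the variable names are illustrative only. It contrasts `subarray`, which creates a view over the same ArrayBuffer, with `slice`, which allocates a copy.

```typescript
// A column with capacity 8, of which only 3 rows have been written.
const values = new Float64Array(8);
values.set([1.5, 2.5, 3.5]);
const rowCount = 3;

// subarray() returns a view over the same ArrayBuffer: no bytes are copied.
const view = values.subarray(0, rowCount);

// slice() allocates a new ArrayBuffer and copies the bytes into it.
const copy = values.slice(0, rowCount);

// A later write to the original column is visible through the view only.
values[0] = 99;

const sharesMemory = view.buffer === values.buffer;
const copyIsDetached = copy.buffer !== values.buffer;
```

A zero-copy conversion hands views like `view` to the consumer, so the buffer must not be overwritten while the resulting table is still in use.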

3. Cache-Aligned TypedArrays

All buffers are automatically aligned to 64-byte cache line boundaries for optimal CPU performance:

// Internally allocates cache-aligned ArrayBuffers
const buffer = createColumnBuffer(schema, 64);

// All TypedArrays are 64-byte aligned
buffer.timestamp; // BigInt64Array (aligned)
buffer.entry_type; // Uint8Array (aligned)
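
JavaScript does not expose the memory address of an ArrayBuffer, so the following sketch only shows the offset arithmetic: each column's byte size is rounded up to a multiple of 64, so every column starts at a 64-byte offset within a shared backing buffer. The helpers `alignUp` and `allocateTwoColumns` are hypothetical and are not part of arrow-builder's API.

```typescript
const CACHE_LINE = 64;

// Round n up to the next multiple of `alignment` (must be a power of two).
// Uses 32-bit bitwise arithmetic, so it is only valid for sizes below 2^31.
function alignUp(n: number, alignment: number = CACHE_LINE): number {
  return (n + alignment - 1) & ~(alignment - 1);
}

// Allocate one ArrayBuffer holding two columns, each starting at a 64-byte offset.
function allocateTwoColumns(capacity: number) {
  const timestampBytes = alignUp(capacity * Float64Array.BYTES_PER_ELEMENT);
  const entryTypeBytes = alignUp(capacity * Uint8Array.BYTES_PER_ELEMENT);
  const backing = new ArrayBuffer(timestampBytes + entryTypeBytes);
  return {
    timestamp: new Float64Array(backing, 0, capacity),
    entryType: new Uint8Array(backing, timestampBytes, capacity),
  };
}
```

With a capacity of 10, the timestamp column needs 80 bytes and is padded to 128, so the second column begins at byte offset 128.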

4. V8-Optimized Runtime Codegen

Arrow-builder generates optimized classes at runtime to maximize V8 performance:

// Generated class with direct properties (not lazy getters)
class GeneratedColumnBuffer {
  timestamps: Float64Array; // Direct property
  operations: Uint8Array; // Direct property
  attr_userId_values: Uint32Array; // Direct property
  attr_userId_nulls: Uint8Array; // Direct property
  // ...
}

// V8 optimizations:
// - Hidden class stability
// - Monomorphic inline caching
// - Predictable memory layout
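
As a rough illustration of runtime class generation, the sketch below builds a class from a list of column specs using `new Function`. This is an assumption about how such codegen could work, not arrow-builder's actual implementation; `ColumnSpec`, `GeneratedBuffer`, and `generateBufferClass` are hypothetical names. Every instance assigns the same properties in the same order in the constructor, which is the usual way to keep object shapes stable.

```typescript
type ColumnSpec = { name: string; ctor: 'Float64Array' | 'Uint8Array' | 'Uint32Array' };

interface GeneratedBuffer {
  writeIndex: number;
  capacity: number;
  [column: string]: unknown;
}

function generateBufferClass(columns: ColumnSpec[]): new (capacity: number) => GeneratedBuffer {
  // Column names are spliced into source text, so accept plain identifiers only.
  for (const c of columns) {
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(c.name)) {
      throw new Error(`invalid column name: ${c.name}`);
    }
  }
  // One direct property assignment per column, always in schema order.
  const assignments = columns
    .map((c) => `this.${c.name} = new ${c.ctor}(capacity);`)
    .join('\n');
  const source = `
    return class GeneratedColumnBuffer {
      constructor(capacity) {
        this.writeIndex = 0;
        this.capacity = capacity;
        ${assignments}
      }
    };`;
  return new Function(source)() as new (capacity: number) => GeneratedBuffer;
}
```

Calling `generateBufferClass` once per schema and reusing the returned class keeps all buffers for that schema on the same object shape.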

Architecture

Column Layout

Each attribute column consists of two arrays sharing one ArrayBuffer:

[null bitmap bytes | padding | value bytes]
         ↓                          ↓
   attr_X_nulls              attr_X_values

This design:

  • Maintains cache locality (related data in same buffer)
  • Ensures proper alignment (padding to bytesPerElement boundaries)
  • Minimizes memory allocations (one buffer per column)
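
The layout above can be sketched with plain TypedArrays. This is a minimal illustration, not the library's code. It assumes one validity byte per row (matching the `attr_X_nulls[idx]` writes shown elsewhere in this README), and `createNullableFloat64Column` is a hypothetical helper name.

```typescript
// Lay out [null bytes | padding | value bytes] in a single ArrayBuffer.
function createNullableFloat64Column(capacity: number) {
  const bytesPerElement = Float64Array.BYTES_PER_ELEMENT; // 8
  const nullBytes = capacity; // assumption: one validity byte per row
  // Pad so the values start on a bytesPerElement boundary, as Float64Array requires.
  const valuesOffset = Math.ceil(nullBytes / bytesPerElement) * bytesPerElement;
  const backing = new ArrayBuffer(valuesOffset + capacity * bytesPerElement);
  return {
    nulls: new Uint8Array(backing, 0, capacity),
    values: new Float64Array(backing, valuesOffset, capacity),
  };
}
```

For a capacity of 10, the null bytes occupy offsets 0 to 9, six bytes of padding follow, and the values start at offset 16.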

Schema System

Arrow-builder uses Sury schemas with metadata to determine TypedArray types:

import * as s from '@sury/sury';

const schema = {
  userId: s.number, // → Float64Array
  status: s.enum, // → Uint8/16/32Array (based on enum size)
  category: s.string, // → Uint32Array (string interning)
  isActive: s.boolean, // → Uint8Array
};

// Attach metadata
schema.userId.__schema_type = 'number';
schema.status.__schema_type = 'enum';
schema.status.__enum_values = ['pending', 'active', 'completed'];

TypedArray Mapping

| Schema Type          | TypedArray   | Bytes | Use Case                 |
| -------------------- | ------------ | ----- | ------------------------ |
| number               | Float64Array | 8     | Full-precision numbers   |
| boolean              | Uint8Array   | 1     | Boolean flags (0/1)      |
| enum (≤256 values)   | Uint8Array   | 1     | Small enums              |
| enum (≤65536 values) | Uint16Array  | 2     | Medium enums             |
| enum (>65536 values) | Uint32Array  | 4     | Large enums              |
| category             | Uint32Array  | 4     | String interning indices |
| text                 | Uint32Array  | 4     | Raw string indices       |
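
The enum rows of the mapping can be expressed as a small selection function. The sketch below follows the thresholds in the table; `createEnumColumn` and `STATUS_VALUES` are hypothetical names used for illustration, not part of the package's API.

```typescript
// Pick the narrowest unsigned array that can index every enum value.
function createEnumColumn(
  valueCount: number,
  capacity: number,
): Uint8Array | Uint16Array | Uint32Array {
  if (valueCount <= 256) return new Uint8Array(capacity);
  if (valueCount <= 65536) return new Uint16Array(capacity);
  return new Uint32Array(capacity);
}

// Enum values are stored as their index in the value list.
const STATUS_VALUES = ['pending', 'active', 'completed'];
const statusColumn = createEnumColumn(STATUS_VALUES.length, 4);
statusColumn[0] = STATUS_VALUES.indexOf('active');
```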

Installation

npm install @smoothbricks/arrow-builder
# or
bun add @smoothbricks/arrow-builder

API Reference

Core Functions

createColumnBuffer(schema, capacity?)

Create a columnar buffer with the specified schema and capacity.

import { createColumnBuffer } from '@smoothbricks/arrow-builder';
import type { SchemaFields } from '@smoothbricks/arrow-builder';

const schema: SchemaFields = {
  userId: userIdSchema,
  timestamp: timestampSchema,
};

const buffer = createColumnBuffer(schema, 1000);

Parameters:

  • schema: Schema defining column types
  • capacity: Buffer capacity (default: 64)

Returns: ColumnBuffer with direct TypedArray properties

createAttributeColumns(schema, capacity?)

Create attribute columns as a record of TypedArrays.

import { createAttributeColumns } from '@smoothbricks/arrow-builder';

const columns = createAttributeColumns(schema, 1000);
// Returns: { attr_userId: Float64Array, attr_timestamp: Float64Array, ... }

Type Utilities

Microseconds

Branded type for microsecond-precision timestamps.

import { Microseconds } from '@smoothbricks/arrow-builder';

// Convert from milliseconds
const timestamp = Microseconds.fromMillis(Date.now());

// Convert from nanoseconds (Node.js)
const precise = Microseconds.fromNanos(process.hrtime.bigint());

// Use in buffer
buffer.timestamp[idx] = timestamp;

Benefits:

  • Type-safe time unit handling
  • Prevents mixing milliseconds and microseconds
  • The brand adds no runtime overhead (it exists only at compile time)
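
For readers unfamiliar with branded types, here is a minimal sketch of the pattern. It is not the library's definition of `Microseconds`: the name `Micros`, the number-based representation, and the `recordAt` helper are assumptions made for illustration.

```typescript
// The brand exists only in the type system; at runtime the value is a plain number.
type Micros = number & { readonly __unit: 'microseconds' };

const Micros = {
  // Convert milliseconds to microseconds and attach the brand via a cast.
  fromMillis: (ms: number): Micros => (ms * 1000) as Micros,
};

// Only branded values are accepted, so a raw millisecond count cannot slip in.
function recordAt(timestamps: Float64Array, idx: number, ts: Micros): void {
  timestamps[idx] = ts;
}

const column = new Float64Array(2);
recordAt(column, 0, Micros.fromMillis(1500));
// recordAt(column, 1, 1500); // compile error: number is not assignable to Micros
```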

Performance Characteristics

Hot Path Operations

Arrow-builder is optimized for the hot path (writing data):

// Hot path: Direct property access, no function calls
buffer.timestamp[idx] = timestamp; // ~1-2 CPU cycles
buffer.entry_type[idx] = opType; // ~1-2 CPU cycles
buffer.attr_userId_values[idx] = userId; // ~1-2 CPU cycles
buffer.writeIndex++; // ~1 CPU cycle

Cold Path Operations

Arrow conversion happens in the cold path (background processing):

// Cold path: Run in background, no hot-path impact
const table = convertToArrowTable(buffer);

Memory Layout

  • System columns (timestamps, operations): Eagerly allocated
  • Attribute columns: Lazily allocated on first access
  • All allocations: 64-byte aligned for cache efficiency
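
The idea of deferring attribute allocation until first use can be sketched as follows. This Map-based version is only an illustration of lazy allocation in general. The library itself is described above as using direct properties, so its mechanism is likely different, and `LazyAttributeColumns` is a hypothetical name.

```typescript
class LazyAttributeColumns {
  private readonly capacity: number;
  private readonly allocated = new Map<string, Float64Array>();

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Number of attribute columns that have actually been allocated so far.
  get allocatedCount(): number {
    return this.allocated.size;
  }

  // Allocate the column on first access, then return the same array afterwards.
  column(attr: string): Float64Array {
    let col = this.allocated.get(attr);
    if (col === undefined) {
      col = new Float64Array(this.capacity);
      this.allocated.set(attr, col);
    }
    return col;
  }
}
```

Attributes that are never written cost no memory under this scheme, which matters when a schema is wide but most rows touch only a few columns.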

Comparison with Arrow JS builders

| Feature         | arrow-builder        | Arrow JS builders |
| --------------- | -------------------- | ----------------- |
| Allocations     | Explicit             | Hidden/automatic  |
| Resizing        | Manual chaining      | Automatic grow    |
| Memory control  | Full control         | Opaque            |
| Cache alignment | 64-byte aligned      | Not guaranteed    |
| V8 optimization | Runtime codegen      | Generic builders  |
| Use case        | Performance-critical | General purpose   |

Examples

Basic Time-Series Data Collection

import { createColumnBuffer, Microseconds } from '@smoothbricks/arrow-builder';

// Define schema
const schema = {
  metric: metricSchema, // enum: 'cpu', 'memory', 'disk'
  value: valueSchema, // number
};

// Create buffer
const buffer = createColumnBuffer(schema, 1000);

// Write data
function recordMetric(metric: number, value: number) {
  const idx = buffer.writeIndex;

  buffer.timestamp[idx] = Microseconds.fromMillis(Date.now());
  buffer.entry_type[idx] = 1; // METRIC_SAMPLE operation
  buffer.attr_metric_values[idx] = metric;
  buffer.attr_value_values[idx] = value;

  buffer.writeIndex++;
}

// Use it
recordMetric(0, 45.2); // CPU: 45.2%
recordMetric(1, 8192); // Memory: 8192 MB

Event Sourcing Buffer

import { createColumnBuffer, Microseconds } from '@smoothbricks/arrow-builder';

const schema = {
  eventType: eventTypeSchema, // enum: 'created', 'updated', 'deleted'
  entityId: entityIdSchema, // category (string interning)
  payload: payloadSchema, // text (raw strings)
};

const head = createColumnBuffer(schema, 500);
let current = head;

function appendEvent(eventType: number, entityId: number, payload: number) {
  // Chain to a fresh buffer before writing if the current one is full;
  // otherwise later writes would land past the end of the full buffer
  if (current.writeIndex >= current.capacity) {
    current.next = createColumnBuffer(schema, 500);
    current = current.next;
  }

  const idx = current.writeIndex;

  current.timestamp[idx] = Microseconds.fromNanos(process.hrtime.bigint());
  current.entry_type[idx] = eventType;
  current.attr_entityId_values[idx] = entityId;
  current.attr_payload_values[idx] = payload;

  current.writeIndex++;
}

Database Result Caching

import { createColumnBuffer, Microseconds } from '@smoothbricks/arrow-builder';

const schema = {
  id: idSchema,
  name: nameSchema,
  age: ageSchema,
  active: activeSchema,
};

const buffer = createColumnBuffer(schema, 10000);

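// Cache query results
// (internString is not defined here; it stands for a helper that maps a string to its interned index)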
function cacheResults(rows: Array<{ id: number; name: string; age: number; active: boolean }>) {
  for (const row of rows) {
    const idx = buffer.writeIndex;

    buffer.timestamp[idx] = Microseconds.fromMillis(Date.now());
    buffer.entry_type[idx] = 0; // ROW operation
    buffer.attr_id_values[idx] = row.id;
    buffer.attr_name_values[idx] = internString(row.name);
    buffer.attr_age_values[idx] = row.age;
    buffer.attr_active_values[idx] = row.active ? 1 : 0;

    buffer.writeIndex++;
  }
}

Advanced Topics

Buffer Chaining

When a buffer reaches capacity, chain to the next buffer:

let headBuffer = createColumnBuffer(schema, 1000);
let currentBuffer = headBuffer;

function writeEntry(data: Entry) {
  if (currentBuffer.writeIndex >= currentBuffer.capacity) {
    currentBuffer.next = createColumnBuffer(schema, 1000);
    currentBuffer = currentBuffer.next;
  }

  // Write to current buffer
  const idx = currentBuffer.writeIndex;
  // ... write data ...
  currentBuffer.writeIndex++;
}
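
Once buffers are chained, reading the data back means walking from the head and taking only the written portion of each buffer. The following self-contained sketch models this with a single-column stand-in; `ChainedBuffer`, `makeBuffer`, `append`, and `writtenChunks` are hypothetical names and not part of arrow-builder.

```typescript
interface ChainedBuffer {
  values: Float64Array;
  writeIndex: number;
  capacity: number;
  next?: ChainedBuffer;
}

function makeBuffer(capacity: number): ChainedBuffer {
  return { values: new Float64Array(capacity), writeIndex: 0, capacity };
}

// Append a value, chaining a new buffer when the tail is full. Returns the new tail.
function append(tail: ChainedBuffer, value: number): ChainedBuffer {
  if (tail.writeIndex >= tail.capacity) {
    tail.next = makeBuffer(tail.capacity);
    tail = tail.next;
  }
  tail.values[tail.writeIndex++] = value;
  return tail;
}

// Walk the chain from the head, yielding a view of the written rows of each buffer.
function* writtenChunks(head: ChainedBuffer): Generator<Float64Array> {
  for (let b: ChainedBuffer | undefined = head; b; b = b.next) {
    yield b.values.subarray(0, b.writeIndex);
  }
}
```

Keeping a reference to the head is what makes the later walk possible, which is why the example above tracks both `headBuffer` and `currentBuffer`.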

Null Handling

Each attribute has a null bitmap (Arrow format: 1=valid, 0=null):

// Write valid value
buffer.attr_userId_values[idx] = 12345;
buffer.attr_userId_nulls[idx] = 1; // Mark as valid

// Write null value
buffer.attr_userId_values[idx] = 0; // Value doesn't matter
buffer.attr_userId_nulls[idx] = 0; // Mark as null
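
The example above writes one validity entry per row index. In the Arrow columnar format itself, validity bitmaps are bit-packed with least-significant-bit numbering, so row i lives in byte i >> 3 at bit i & 7. The sketch below shows that packing step in case you need to produce such a bitmap yourself. Whether and where arrow-builder performs this conversion is not stated here, and `packValidity` and `isValid` are hypothetical helpers.

```typescript
// Pack one-byte-per-row validity flags (1 = valid, 0 = null) into an
// Arrow-style bitmap: row i lives in byte i >> 3, at bit i & 7 (LSB first).
function packValidity(flags: Uint8Array, rowCount: number): Uint8Array {
  const bitmap = new Uint8Array(Math.ceil(rowCount / 8));
  for (let i = 0; i < rowCount; i++) {
    if (flags[i] !== 0) bitmap[i >> 3] |= 1 << (i & 7);
  }
  return bitmap;
}

// Read the validity bit for row i back out of a packed bitmap.
function isValid(bitmap: Uint8Array, i: number): boolean {
  return (bitmap[i >> 3] & (1 << (i & 7))) !== 0;
}
```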

Custom Schemas

Extend the schema system for domain-specific types:

import * as s from '@sury/sury';

function createEnumSchema<T extends string>(values: readonly T[]) {
  const schema = s.enum(values);
  schema.__schema_type = 'enum';
  schema.__enum_values = values;
  return schema;
}

const statusSchema = createEnumSchema(['pending', 'active', 'completed']);

License

MIT

Building

Run nx build arrow-builder to build the library.

Testing

Run bun test to execute tests.