arrow-rs-wasm
v0.4.0
High-performance WebAssembly library for Apache Arrow, Feather, and Parquet data with zero-copy semantics, LZ4 compression, and comprehensive model-based testing
High-performance WebAssembly bindings for Apache Arrow that expose zero-copy columnar data—built in Rust, delivered to JavaScript/TypeScript with camelCase APIs.
Features
- JS-friendly surface – Functions and classes are exported with camelCase/PascalCase via #[wasm_bindgen(js_name = ...)].
- Zero-copy buffers – Access Arrow vectors through live TypedArray views backed directly by Wasm linear memory.
- UTF-8 columns – Retrieve values/offsets/validity buffers and lazily decode strings only when needed.
- Compute helpers – Optional table transforms such as filterTable, takeRows, and sortTable return new handles on the Wasm side.
- Dual runtime support – Tested with modern Vite/React browser pipelines and Node.js ESM projects.
Getting Started
Installation (local build)
npm i /Users/ods/Documents/arrow-rs-wasm/pkg
Installation (registry build)
npm i arrow-rs-wasm
If consuming from source, regenerate the package with:
# Browser ESM bundle
wasm-pack build --release --target web
# Node.js bundle
wasm-pack build --release --target nodejs
The package ships as an ESM module. Always call the default export (await init()) once before using any other function.
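Because init() has to finish before any other call, and several entry points (a boot script and a component, for example) may each try to run it, one option is to memoize the initializer. The sketch below is not part of arrow-rs-wasm: once is a hypothetical helper, and the loader passed to it stands in for the package's default export.

```typescript
// Hypothetical helper (not part of arrow-rs-wasm): wraps an async initializer
// so that concurrent and repeated callers share a single in-flight promise.
function once<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      // Forget a failed attempt so that a later call can retry.
      pending = load().catch((err: unknown) => {
        pending = undefined;
        throw err;
      });
    }
    return pending;
  };
}

// Assumed usage with the package's default export:
//   import init from 'arrow-rs-wasm';
//   export const ensureInit = once(() => init());
//   await ensureInit(); // safe to call from several modules
```

Sharing one promise means a second caller waits for the first load instead of starting another one.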
Quick Start (Browser / Vite + React)
Install the local build:
npm i /Users/ods/Documents/arrow-rs-wasm/pkg
Seed a test handle (for example, in src/test-setup.ts):
import init, { createTestTable } from 'arrow-rs-wasm';
void (async () => {
  await init();
  (window as any).TEST_HANDLE = createTestTable();
})();
Render the app.
// src/App.tsx
import { useEffect, useState } from 'react'
import init, { getColumnNames, exportUtf8Buffers, exportPrimitiveBuffers } from 'arrow-rs-wasm'
export default function App() {
  const [ready, setReady] = useState(false)
  const [cols, setCols] = useState<string[]>([])
  useEffect(() => {
    (async () => {
      await init(); // Required wasm-bindgen init
      const handle = (window as any).TEST_HANDLE; // Supply a real handle in your boot script
      const names = await getColumnNames(handle);
      setCols(names);
      setReady(true);
      // Example: peek at buffers
      const primitives = await exportPrimitiveBuffers(handle, 'id');
      console.log('Primitive values buffer', primitives.values);
      const utf = await exportUtf8Buffers(handle, 'name');
      console.log('UTF-8 buffer lengths', utf.values.length, utf.offsets.length);
    })();
  }, []);
  return (
    <main>
      <h1>arrow-rs-wasm (Vite/React)</h1>
      <p>Ready: {String(ready)}</p>
      <pre>{JSON.stringify(cols, null, 2)}</pre>
    </main>
  );
}
// src/main.tsx
import { StrictMode } from 'react'
import { createRoot } from 'react-dom/client'
import './test-setup' // registers TEST_HANDLE, etc.
import App from './App'
createRoot(document.getElementById('root')!).render(
  <StrictMode>
    <App />
  </StrictMode>,
)
Quick Start (Node.js)
Build a node-compatible bundle (wasm-pack build --release --target nodejs) if you are consuming locally.
// node-example.mjs
import init, { createTestTable, getColumnNames } from 'arrow-rs-wasm';
await init(); // Loads the Node.js target Wasm
const handle = createTestTable(); // or hydrate from Arrow IPC/parquet bytes
const columns = await getColumnNames(handle);
console.log('Columns:', columns);
Run with:
node node-example.mjs
API Overview (JS Names)
| Export | Description |
| --- | --- |
| init(options?) | Default async initializer (must be awaited once). |
| initWithOptions(enableConsoleLogs: boolean) | Optional second-stage setup for debugging. |
| setPanicHook() | Routes Rust panics to console (no-op unless enabled). |
| createTestTable() | Returns a demo table handle for quick experiments. |
| readTableFromBytes(data: Uint8Array) | Loads Arrow IPC bytes into Wasm, returns a table handle. |
| writeTableToIpc(handle, enableLz4) | Serializes a table handle back to IPC bytes. |
| getColumnNames(handle) | Resolves string[] of column names. |
| exportPrimitiveBuffers(handle, columnName) | Returns { values: TypedArray; validity?: Uint8Array \| null }. |
| exportUtf8Buffers(handle, columnName) | Returns { values: Uint8Array; offsets: Int32Array \| BigInt64Array; validity?: Uint8Array \| null }. |
| exportBinaryBuffers(handle, columnName) | Returns { values: Uint8Array; offsets?: Int32Array; validity?: Uint8Array \| null }. |
| filterTable(handle, predicateSpec) | Applies predicate, yields new table handle. |
| takeRows(handle, indices) | Selects row subset. |
| sortTable(handle, sortKeys) | Returns sorted table handle. |
| getTableInfo(handle) | Returns a summary of the row and column counts and the schema. |
| freeTable(handle) | Releases Wasm-side resources. |
| getMemoryInfo() | Debug helper describing Wasm memory usage. |
Internally, Rust keeps snake_case identifiers; the exported JS API uses camelCase/PascalCase thanks to #[wasm_bindgen(js_name = ...)].
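The table lists freeTable(handle) as the call that releases Wasm-side resources. One way to make sure it always runs is a small scope helper. This is only a sketch: withTable is a hypothetical name, and the free and use callbacks stand in for freeTable and whichever API call consumes the handle.

```typescript
type TableHandle = number; // same alias as in the TypeScript Hints section

// Hypothetical helper: runs `use` with a handle and always calls `free`
// afterwards, whether `use` returns, throws, or rejects.
async function withTable<T>(
  handle: TableHandle,
  free: (handle: TableHandle) => void, // pass freeTable here
  use: (handle: TableHandle) => T | Promise<T>,
): Promise<T> {
  try {
    return await use(handle);
  } finally {
    free(handle);
  }
}

// Assumed usage:
//   const names = await withTable(readTableFromBytes(bytes), freeTable, getColumnNames);
```

Only copy-out results (such as a string[] of names) should leave the callback; a buffer view obtained inside it presumably should not be read once the handle has been freed.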
Zero-copy Model & UTF-8 Handling
Every buffer exporter returns a view into Wasm linear memory; no copies are made. Treat these objects as live slices:
- exportPrimitiveBuffers → a TypedArray (e.g., Int32Array, Float64Array) plus an optional Uint8Array validity bitmap.
- exportUtf8Buffers → values (Uint8Array of concatenated UTF-8 bytes), offsets (Int32Array for Utf8 columns, BigInt64Array for LargeUtf8), and an optional validity bitmap.
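The validity bitmap is bit-packed rather than one byte per row. Arrow stores it least-significant bit first, so row i lives in byte i >> 3 at bit i & 7. The helpers below are a minimal sketch of reading it; isValid and countNulls are hypothetical names, not exports of this package.

```typescript
// Hypothetical helper: true when `row` holds a value. A missing bitmap means
// the column has no nulls, which matches the optional `validity` field.
function isValid(validity: Uint8Array | null | undefined, row: number): boolean {
  if (!validity) return true;
  return ((validity[row >> 3] >> (row & 7)) & 1) === 1;
}

// Hypothetical helper: counts null rows among the first `length` rows.
function countNulls(validity: Uint8Array | null | undefined, length: number): number {
  if (!validity) return 0;
  let nulls = 0;
  for (let row = 0; row < length; row++) {
    if (!isValid(validity, row)) nulls++;
  }
  return nulls;
}
```

The row count has to come from elsewhere (for example the values length or offsets length minus one), because the bitmap is padded to whole bytes.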
Decode UTF-8 values lazily, only when they are needed:
const utf = await exportUtf8Buffers(handle, 'name');
const decoder = new TextDecoder();
const i = 0;
// Number() covers BigInt64Array offsets (LargeUtf8) as well as Int32Array
const start = Number(utf.offsets[i]);
const end = Number(utf.offsets[i + 1]);
const firstValue = decoder.decode(utf.values.subarray(start, end));
Memory Growth & View Refresh
Wasm memory may grow during allocations, producing a new backing ArrayBuffer. Existing views detach and report length 0. After any heavy operation (e.g., filterTable, sortTable, or bulk append), regenerate views:
let { values } = await exportPrimitiveBuffers(handle, 'score');
// ... after operations that may allocate
({ values } = await exportPrimitiveBuffers(handle, 'score')); // refresh view
Long-lived UIs should re-request buffers whenever a major action completes.
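Instead of refreshing unconditionally, a caller can check whether a view has actually detached. The sketch below relies on the fact that a detached ArrayBuffer reports byteLength 0; isDetached and refreshing are hypothetical helpers, and fetchView stands in for a call such as exportPrimitiveBuffers followed by picking the values field.

```typescript
// Hypothetical helper: true when the view's backing buffer has been detached.
// Assumption: a live view into Wasm memory never sits on a zero-byte buffer.
function isDetached(view: ArrayBufferView): boolean {
  return view.buffer.byteLength === 0;
}

// Hypothetical wrapper: caches a view and fetches it again once it detaches.
function refreshing<V extends ArrayBufferView>(
  fetchView: () => Promise<V>,
): () => Promise<V> {
  let cached: V | undefined;
  return async () => {
    if (cached === undefined || isDetached(cached)) {
      cached = await fetchView();
    }
    return cached;
  };
}

// Assumed usage:
//   const scores = refreshing(async () => (await exportPrimitiveBuffers(handle, 'score')).values);
//   const values = await scores(); // re-exported only after a detach
```

This only covers detachment. If an operation changes the table's contents, requesting the buffers again is still the safer choice.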
TypeScript Hints
export type TableHandle = number;
export interface PrimitiveBuffers {
  values: Int8Array | Int16Array | Int32Array | Float32Array | Float64Array;
  validity?: Uint8Array | null;
}
export interface Utf8Buffers {
  values: Uint8Array;
  offsets: Int32Array | BigInt64Array; // LargeUtf8 uses 64-bit offsets
  validity?: Uint8Array | null;
}
Type definitions ship in pkg/arrow_rs_wasm.d.ts. Consumers need either esModuleInterop enabled or a native ESM pipeline.
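Because offsets is a union of Int32Array and BigInt64Array, indexing it yields number | bigint, which subarray does not accept directly. A possible way to narrow it is sketched below; offsetAt and valueRange are hypothetical helpers, not part of the shipped type definitions.

```typescript
type Offsets = Int32Array | BigInt64Array;

// Hypothetical helper: reads one offset as a plain number, rejecting indexes
// outside the array and 64-bit values that a JS number cannot hold exactly.
function offsetAt(offsets: Offsets, index: number): number {
  if (!Number.isInteger(index) || index < 0 || index >= offsets.length) {
    throw new RangeError(`offset index ${index} is out of bounds`);
  }
  const raw: number | bigint = offsets[index];
  if (typeof raw === 'bigint') {
    if (raw > BigInt(Number.MAX_SAFE_INTEGER)) {
      throw new RangeError(`offset ${raw} exceeds Number.MAX_SAFE_INTEGER`);
    }
    return Number(raw);
  }
  return raw;
}

// Hypothetical helper: [start, end) byte range of `row` inside `values`.
function valueRange(offsets: Offsets, row: number): [number, number] {
  return [offsetAt(offsets, row), offsetAt(offsets, row + 1)];
}
```

With these, values.subarray(...valueRange(offsets, row)) works the same way for both offset widths.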
End-to-End Examples
Primitive column
const { values, validity } = await exportPrimitiveBuffers(handle, 'id');
const dataView = new DataView(values.buffer, values.byteOffset, values.byteLength);
const first = dataView.getInt32(0, true);
const isValid = !validity || (validity[0] & 1) === 1;
UTF-8 column
const utf = await exportUtf8Buffers(handle, 'name');
const decoder = new TextDecoder();
for (let i = 0; i < utf.offsets.length - 1; i++) {
  const isNull = utf.validity && (utf.validity[Math.floor(i / 8)] & (1 << (i % 8))) === 0;
  if (isNull) continue;
  const start = Number(utf.offsets[i]);
  const end = Number(utf.offsets[i + 1]);
  console.log(decoder.decode(utf.values.subarray(start, end)));
}
Performance Notes
- Favor filterTable, takeRows, and sortTable (where available) to keep computations inside Wasm and reduce host copies.
- Avoid eagerly decoding UTF-8 strings; decode on demand at the UI boundary.
- Reuse handles and refresh views after operations that might trigger Wasm memory growth.
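Decoding on demand can be paired with a small cache so that a row rendered repeatedly is decoded only once. The following is a sketch under that assumption; lazyStrings is a hypothetical helper, and the Utf8Buffers shape is copied from the TypeScript Hints section.

```typescript
// Shape taken from the TypeScript Hints section of this README.
interface Utf8Buffers {
  values: Uint8Array;
  offsets: Int32Array | BigInt64Array;
  validity?: Uint8Array | null;
}

// Hypothetical helper: returns an accessor that decodes a row the first time
// it is requested and serves later requests from a cache.
function lazyStrings(buffers: Utf8Buffers): (row: number) => string | null {
  const decoder = new TextDecoder();
  const cache = new Map<number, string | null>();
  const rowCount = buffers.offsets.length - 1;
  return (row) => {
    if (!Number.isInteger(row) || row < 0 || row >= rowCount) {
      throw new RangeError(`row ${row} is outside 0..${rowCount - 1}`);
    }
    if (cache.has(row)) return cache.get(row) ?? null;
    const { values, offsets, validity } = buffers;
    // Validity bits are packed least-significant bit first; 0 marks a null.
    const isNull = !!validity && ((validity[row >> 3] >> (row & 7)) & 1) === 0;
    const text = isNull
      ? null
      : decoder.decode(values.subarray(Number(offsets[row]), Number(offsets[row + 1])));
    cache.set(row, text);
    return text;
  };
}
```

Decoded strings are ordinary JS copies, so they stay usable after the underlying views detach. Rows that have not been decoded yet still read from the captured views, so it is safer to build a new accessor from freshly exported buffers after any operation that may grow memory.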
Project Layout & Build
/Users/ods/Documents/arrow-rs-wasm/pkg contains the ESM wrapper (arrow_rs_wasm.js), the compiled Wasm artifact, type definitions, and package.json.
Rebuild with wasm-pack build --release --target web (browser) or --target nodejs (Node).
Install into client projects with:
npm i /Users/ods/Documents/arrow-rs-wasm/pkg
Testing (E2E)
- Browser (Vite): npm run dev, open http://localhost:5173, ensure await init() runs, verify column names, decode a UTF-8 element, and confirm zero-copy buffers by comparing .buffer to a cached reference.
- Chromium DevTools MCP: Automate via a DevTools protocol session; assert camelCase exports, zero-copy buffer equality, and lazy decode results.
- Memory detach test: Execute an operation that grows memory (e.g., heavy filter), then re-run buffer exporters to rebuild views.
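The zero-copy check in the list above compares the .buffer of exported views. A minimal sketch of that comparison follows; sharesMemory is a hypothetical helper name, and in a real test the two arguments would be views returned by separate exporter calls with no allocation in between.

```typescript
// Hypothetical helper: two views are zero-copy siblings when they are backed
// by the very same ArrayBuffer object (for Wasm exports, the linear memory).
function sharesMemory(a: ArrayBufferView, b: ArrayBufferView): boolean {
  return a.buffer === b.buffer;
}

// Assumed usage in an E2E test:
//   const first = (await exportPrimitiveBuffers(handle, 'id')).values;
//   const second = (await exportPrimitiveBuffers(handle, 'id')).values;
//   console.assert(sharesMemory(first, second), 'exporter returned a copy');
```

A copied column would own its own ArrayBuffer, so the identity comparison fails for it even when the contents are equal.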
Troubleshooting
- Module not found: Use the appropriate bundle (--target web for browsers, --target nodejs for Node).
- Empty buffers: Likely due to Wasm memory growth; call the exporter again.
- Snake_case exports: Ensure the Rust functions use #[wasm_bindgen(js_name = ...)], then rebuild the pkg directory.
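To spot the snake_case case quickly, the module namespace can be scanned for names that contain underscores. This is a rough sketch: snakeCaseExports is a hypothetical helper, and the pattern deliberately skips names that start with an underscore, on the assumption that those are internal glue rather than public API.

```typescript
// Hypothetical helper: lists export names that look like snake_case, i.e.
// lowercase words joined by single underscores (get_column_names).
function snakeCaseExports(moduleNamespace: Record<string, unknown>): string[] {
  const snakeCase = /^[a-z][a-z0-9]*(_[a-z0-9]+)+$/;
  return Object.keys(moduleNamespace).filter((name) => snakeCase.test(name));
}

// Assumed usage:
//   import * as arrow from 'arrow-rs-wasm';
//   console.log(snakeCaseExports(arrow)); // an empty array is the expected result
```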
Versioning & License
- Semantic versioning: MAJOR.MINOR.PATCH.
- Dual-licensed under MIT and Apache-2.0—use either license at your option.
