arrow-to-json
v1.1.0
Convert Arrow IPC bytes to JSON using Rust (napi-rs)
A native Node.js addon that converts Apache Arrow IPC bytes to JSON. It is written in Rust with napi-rs for throughput: typically ~20x faster than decoding with the JavaScript apache-arrow library and then serializing with JSON.stringify.
Install
npm install arrow-to-json
Usage
Row-object format
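import { readFileSync } from 'node:fs'
import { arrowIpcToJson } from 'arrow-to-json'
// 'data.arrow' is a placeholder path to an Arrow IPC file
const arrowBytes: Buffer = readFileSync('data.arrow')
const json: string = arrowIpcToJson(arrowBytes)
const rows: unknown[] = JSON.parse(json)
// [{ id: 1, name: "Alice" }, { id: 2, name: "Bob" }]
Returns a JSON array of row objects, one per row, each keyed by column name.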
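Because null values are omitted from row objects (see Supported Arrow types), a row can be missing some keys. A minimal sketch, assuming the caller already knows the column names; withNulls is a hypothetical helper, not part of arrow-to-json. It restores an explicit null for every missing column:

```typescript
// Hypothetical helper, not part of arrow-to-json: re-adds keys that were
// omitted because their value was null, so every row has the listed columns.
type Row = Record<string, unknown>

function withNulls(rows: Row[], columns: string[]): Row[] {
  return rows.map((row) => {
    const filled: Row = {}
    for (const col of columns) {
      // `in` distinguishes "key absent" from "key present"
      filled[col] = col in row ? row[col] : null
    }
    return filled
  })
}

// Input shaped like arrowIpcToJson output where Bob's name was null
const rows: Row[] = JSON.parse('[{"id":1,"name":"Alice"},{"id":2}]')
const normalized = withNulls(rows, ['id', 'name'])
// normalized[1] is { id: 2, name: null }
```

Keys not in the column list are dropped, which also gives every row a predictable key order.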
Columnar format (recommended)
import { arrowIpcToJsonColumns } from 'arrow-to-json'
// arrowBytes: a Buffer of Arrow IPC bytes (file or stream format)
const json: string = arrowIpcToJsonColumns(arrowBytes)
const cols: Record<string, unknown[]> = JSON.parse(json)
// { id: [1, 2], name: ["Alice", "Bob"] }
Returns a JSON object in which each key is a column name and each value is an array holding that column's value for every row. Because each column name appears once instead of once per row, this output is ~36% smaller than the row-object format (the exact saving depends on how long the column names are relative to the values), and JSON.parse on the JS side is faster roughly in proportion.
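If some code still expects row objects, the columnar output can be transposed after parsing. A minimal sketch, assuming every column array holds exactly one entry per row; columnsToRows is a hypothetical helper, not part of the package. It also compares the serialized size of both shapes for the sample data, which shows where the size difference comes from:

```typescript
// Hypothetical helper, not part of arrow-to-json: turns the columnar shape
// { col: [v0, v1, ...] } into row objects [{ col: v0 }, { col: v1 }, ...].
// Assumes every column array has the same length (one entry per row).
type Columns = Record<string, unknown[]>

function columnsToRows(cols: Columns): Record<string, unknown>[] {
  const names = Object.keys(cols)
  const rowCount = names.length > 0 ? cols[names[0]].length : 0
  const rows: Record<string, unknown>[] = []
  for (let i = 0; i < rowCount; i++) {
    const row: Record<string, unknown> = {}
    for (const name of names) {
      row[name] = cols[name][i]
    }
    rows.push(row)
  }
  return rows
}

const colJson = '{"id":[1,2],"name":["Alice","Bob"]}'
const rowsFromCols = columnsToRows(JSON.parse(colJson))
const rowJson = JSON.stringify(rowsFromCols)
// rowJson: [{"id":1,"name":"Alice"},{"id":2,"name":"Bob"}]
// Key names repeat once per row, so rowJson is longer than colJson
```

Transposing in JavaScript gives back the row layout but also gives back the per-row key cost in memory, so it is best done only where row objects are actually needed.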
Supported Arrow types
| Arrow type | JSON representation |
| ------------------------------------------------------------- | ------------------------------------ |
| Boolean | true / false |
| Int8 .. Int32, UInt8 .. UInt32 | number |
| Int64 / UInt64 | number if ≤ 2^53, string otherwise |
| Float16 / Float32 / Float64 | number (NaN / Infinity → null) |
| Utf8 / LargeUtf8 | string (JSON-escaped) |
| Binary / LargeBinary | array of byte values |
| List / LargeList / FixedSizeList | array (recursive) |
| Struct | object (recursive) |
| Map<Utf8, *> | object ({key: value}) |
| Map<non-Utf8, *> | array of {key, value} objects |
| Dictionary<*, *> | resolved value (recursive) |
| Timestamp, Date32/64, Time32/64, Duration, Interval | string (cast to Utf8) |
| Null values | omitted from output objects |
| Empty Map | omitted from output objects |
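Two rows of the table produce values whose JSON shape varies: Int64/UInt64 values arrive as either a number or a string, and Map columns arrive as either an object (Utf8 keys) or an array of {key, value} entries. A minimal sketch for normalizing them on the JS side; toBigInt and toMap are hypothetical helper names, not part of arrow-to-json:

```typescript
// Hypothetical helpers, not part of arrow-to-json.

// Int64/UInt64 values are emitted as a JSON number when small enough and as
// a string otherwise; BigInt accepts both integer numbers and decimal strings.
function toBigInt(v: number | string): bigint {
  return BigInt(v)
}

// Map<Utf8, *> is emitted as a plain object; Map<non-Utf8, *> as an array of
// { key, value } entries. Both are converted into a JS Map here.
type MapEntry = { key: unknown; value: unknown }

function toMap(v: Record<string, unknown> | MapEntry[]): Map<unknown, unknown> {
  if (Array.isArray(v)) {
    return new Map<unknown, unknown>(v.map((e) => [e.key, e.value] as [unknown, unknown]))
  }
  return new Map<unknown, unknown>(Object.entries(v))
}

const big = toBigInt('9007199254740993') // 2^53 + 1, so it arrives as a string
const small = toBigInt(42)
const byName = toMap({ a: 1, b: 2 })
const byId = toMap([{ key: 7, value: 'x' }])
```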
API
arrowIpcToJson(data: Buffer): string
Converts Arrow IPC bytes to a JSON array of row objects.
- data: Buffer containing Arrow IPC bytes (file or stream format)
- Returns: JSON string of the form [{"col": val, ...}, ...]
- Throws: if the input is not valid Arrow IPC data
arrowIpcToJsonColumns(data: Buffer): string
Converts Arrow IPC bytes to a columnar JSON object.
- data: Buffer containing Arrow IPC bytes (file or stream format)
- Returns: JSON string of the form {"col": [v1, v2, ...], ...}
- Throws: if the input is not valid Arrow IPC data
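Both functions throw on input that is not valid Arrow IPC data. A minimal sketch of a wrapper that turns that exception into a result value and parses the JSON in the same step; tryConvert and ConvertResult are hypothetical names, and the converter is passed in as a parameter so the sketch runs without the native addon installed:

```typescript
// Hypothetical wrapper, not part of arrow-to-json. `convert` would be
// arrowIpcToJson or arrowIpcToJsonColumns; it is a parameter so the sketch
// does not depend on the native addon.
type ConvertResult<T> = { ok: true; value: T } | { ok: false; error: string }

function tryConvert<In, T>(convert: (data: In) => string, data: In): ConvertResult<T> {
  let json: string
  try {
    json = convert(data)
  } catch (err) {
    // The converter throws when the bytes are not valid Arrow IPC data
    return { ok: false, error: err instanceof Error ? err.message : String(err) }
  }
  return { ok: true, value: JSON.parse(json) as T }
}

// Stand-in converters used only to exercise the wrapper
const failing = (_: Uint8Array): string => {
  throw new Error('invalid Arrow IPC data')
}
const working = (_: Uint8Array): string => '{"id":[1,2]}'

const bad = tryConvert(failing, new Uint8Array([1, 2, 3]))
const good = tryConvert<Uint8Array, Record<string, number[]>>(working, new Uint8Array())
```

In real use the first argument would be one of the two exported functions and the second a Buffer; the JSON.parse call sits outside the try block so that only conversion failures are reported as errors.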
Performance
The Rust implementation uses several optimizations for throughput:
- Direct JSON writing: JSON is written directly to a pre-allocated String buffer, bypassing any intermediate value tree.
- Pre-downcast columns: Arrow column types are resolved once per batch into a ColWriter enum, eliminating per-cell dynamic dispatch and downcast_ref in the hot loop.
- Specialized fast paths: common column patterns (Map<Utf8, Utf8>, List<Int64>) have dedicated write functions that skip generic dispatch.
- Fast number formatting: uses itoa and ryu for integer and float serialization.
- Pre-computed column metadata: JSON-escaped column keys and skip-check flags are computed once per batch.
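The Rust source is not reproduced here, but the ideas of pre-computed column metadata and direct JSON writing can be illustrated in TypeScript. A minimal sketch, not the package's code; writeRows and the Column input shape are assumptions for illustration. Each column's JSON-escaped key prefix is built once, then rows are appended straight to an output string without creating intermediate row objects, and null cells are skipped as in the row-object format:

```typescript
// Illustration only: mirrors the idea of writing JSON directly with keys
// escaped once per batch. It is not the Rust implementation.
type Column = { name: string; values: (string | number | boolean | null)[] }

function writeRows(columns: Column[]): string {
  // Pre-computed once per batch: the escaped `"name":` prefix for each column
  const prefixes = columns.map((c) => JSON.stringify(c.name) + ':')
  const rowCount = columns.length > 0 ? columns[0].values.length : 0
  let out = '['
  for (let r = 0; r < rowCount; r++) {
    if (r > 0) out += ','
    out += '{'
    let first = true
    for (let c = 0; c < columns.length; c++) {
      const v = columns[c].values[r]
      if (v === null) continue // null cells are omitted from the row object
      if (!first) out += ','
      out += prefixes[c] + JSON.stringify(v)
      first = false
    }
    out += '}'
  }
  return out + ']'
}

const batchJson = writeRows([
  { name: 'id', values: [1, 2] },
  // Column name containing a quote, to show that keys are escaped once
  { name: 'na"me', values: ['Alice', null] },
])
```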
Development
yarn install
yarn build # native release build
yarn build:debug # native debug build
yarn test # run tests
yarn bench # run benchmarks
Release
Pushing to main triggers CI. The publish job inspects the last commit message:
- Exact semver (e.g. 0.2.0) → publishes to npm as latest
- Semver with a pre-release suffix (e.g. 0.2.0-beta.1) → publishes with the next tag
- Anything else → skips publishing
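The CI script itself is not shown in this README. A minimal TypeScript sketch of the decision rule as described, assuming a SemVer-style pattern; publishTag and the regex are hypothetical, and the real workflow may match commit messages differently:

```typescript
// Hypothetical re-implementation of the publish rule described above;
// the actual CI workflow may use a different pattern.
type PublishTag = 'latest' | 'next' | null

// MAJOR.MINOR.PATCH, optionally followed by a pre-release suffix such as -beta.1
const SEMVER = /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-[0-9A-Za-z.-]+)?$/

function publishTag(commitMessage: string): PublishTag {
  const m = SEMVER.exec(commitMessage.trim())
  if (m === null) return null // anything else: skip publishing
  return m[4] === undefined ? 'latest' : 'next'
}
```

By default npm version uses the bare version number (for example 1.1.1) as its commit message, which is why the npm version then git push sequence below produces a commit that this rule treats as a release.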
To release a new version:
npm version patch # or minor / major
git push && git push --tags
License
MIT
