npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2025 – Pkg Stats / Ryan Hefner

json-to-arrow-ipc

v0.1.0

Published

Convert JSON data to Apache Arrow IPC format

Readme

json-to-arrow-ipc

Convert JSON data to Apache Arrow IPC format (without dictionary encoding, making it compatible with DuckDB's arrow extension).

Installation

npm install json-to-arrow-ipc

Why this package?

When using Apache Arrow's JavaScript library with tableFromJSON(), string columns are automatically dictionary-encoded. DuckDB's arrow extension doesn't support dictionary-encoded columns when reading Arrow IPC streams, resulting in errors like:

Schema message field with DictionaryEncoding not supported

This package provides functions that convert JSON to Arrow IPC format without dictionary encoding, making it fully compatible with DuckDB's arrow extension.

Features

  • No dictionary encoding - Compatible with DuckDB's shellfs extension
  • Automatic type inference - Samples data to determine optimal column types
  • Schema mismatch handling - Configurable behavior for dirty data
  • Nested object support - Flatten or serialize as JSON strings
  • Single-pass processing - Optimized for large datasets
  • Full TypeScript support - Comprehensive type definitions

Usage

Zero-Install Scripts with Bun

Bun features transparent dependency installation - when you run a TypeScript file, Bun automatically installs any missing npm packages on-the-fly. This means you can create standalone .ts scripts that "just work" without running npm install first.

Simply create a script file and run it directly:

#!/usr/bin/env bun

import { jsonToArrowIPC } from 'json-to-arrow-ipc';

const data = [
  { name: 'John', age: 30, city: 'New York' },
  { name: 'Jane', age: 25, city: 'Los Angeles' }
];

const ipc = jsonToArrowIPC(data);
process.stdout.write(ipc);
# No npm install needed - Bun handles it automatically!
./my-script.ts

This is especially powerful when combined with DuckDB's shellfs extension, enabling dynamic data pipelines with zero setup.

With DuckDB

Create a script (fetch-data.ts):

#!/usr/bin/env bun

import { jsonToArrowIPC } from 'json-to-arrow-ipc';

const response = await fetch('https://api.example.com/users');
const users = await response.json();

process.stdout.write(jsonToArrowIPC(users));

Then in DuckDB:

-- First, install and load required extensions
INSTALL shellfs FROM community;
INSTALL arrow FROM community;
LOAD shellfs;
LOAD arrow;
CREATE OR REPLACE MACRO bun(script, args := '') AS TABLE
SELECT * FROM read_arrow('bun ' || script || ' ' || args || ' |');

-- Query the data
SELECT * FROM bun('fetch-data.ts');

Configuration Options

import { jsonToArrowIPC, type JsonToArrowOptions } from 'json-to-arrow-ipc';

const options: JsonToArrowOptions = {
  // Number of rows to sample for schema inference (default: 100)
  schemaSampleSize: 50,

  // How to handle schema mismatches: 'error' | 'skip' | 'coerce' (default: 'coerce')
  onSchemaMismatch: 'skip',

  // Flatten nested objects with dot notation (default: false)
  flattenNestedObjects: true,

  // Serialize arrays as JSON strings (default: true)
  serializeArrays: true
};

const ipc = jsonToArrowIPC(data, options);

Getting Conversion Statistics

import { jsonToArrowTableWithStats } from 'json-to-arrow-ipc';

const { table, skippedRows, totalRows } = jsonToArrowTableWithStats(data, {
  onSchemaMismatch: 'skip'
});

console.log(`Converted ${totalRows - skippedRows}/${totalRows} rows`);

API Reference

jsonToArrowIPC(data, options?)

Converts JSON data directly to Arrow IPC stream format.

  • data: Array of JSON objects or a single object
  • options: Optional configuration (see below)
  • Returns: Uint8Array containing Arrow IPC stream bytes

jsonToArrowTable(data, options?)

Converts JSON data to an Apache Arrow Table.

  • data: Array of JSON objects or a single object
  • options: Optional configuration (see below)
  • Returns: Apache Arrow Table

jsonToArrowTableWithStats(data, options?)

Converts JSON data to an Arrow Table with conversion statistics.

  • Returns: { table: Table, skippedRows: number, totalRows: number }

Options

| Option | Type | Default | Description | |--------|------|---------|-------------| | schemaSampleSize | number | 100 | Number of rows to sample for schema inference | | onSchemaMismatch | 'error' \| 'skip' \| 'coerce' | 'coerce' | How to handle schema mismatches | | flattenNestedObjects | boolean | false | Flatten nested objects with dot notation | | serializeArrays | boolean | true | Serialize arrays as JSON strings |

Type Mapping

| JSON Type | Arrow Type | |-----------|------------| | string | Utf8 | | ISO 8601 date strings | DateMillisecond | | number (integer, 32-bit) | Int32 | | number (integer, 64-bit) | Int64 | | number (float) | Float64 | | boolean | Bool | | null | nullable column | | Nested objects/arrays | Utf8 (JSON serialized) |

License

MIT