parquet-tool

v1.2.1

Parquet file processing tool with C++ native addon — read, write, append, split and parallel process Parquet files

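A Parquet processing toolkit built with TypeScript and a C++ native addon. This project does not depend on any existing npm Parquet package; the core Parquet read/write logic is implemented directly in this repository.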

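Features

  • Create, read, and write Parquet files
  • Append mode (called "apply" in earlier requirements), which adds new row groups to an existing file
  • Merge multiple Parquet files, with a schema compatibility check
  • Validate Parquet structure, metadata, and row groups
  • Convert between CSV and Parquet
  • Convert between Apache Arrow IPC and Parquet
  • Split large Parquet files into smaller files
  • Helpers for parallel reading, processing, and writing
  • Debug mode in both the CLI and the library API
  • CLI commands: info, read, write, append, split, merge, validate, csv-to-parquet, parquet-to-csv, arrow-to-parquet, parquet-to-arrow
  • A Docker Compose viewer for quickly inspecting results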

Quick Start

npm install parquet-tool

Examples

The repository also ships ready-to-run scripts under examples/, organized into one folder per workflow.

1. Basic write, read, and append

import { ParquetReader, ParquetWriter, Schema } from 'parquet-tool';

const schema = Schema.create({
  id: 'INT32',
  name: 'STRING',
  score: { type: 'DOUBLE', optional: true },
});

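// Create the file and write the first batch of rows.
// Bob omits `score`, which the schema declares as optional.
const writer = new ParquetWriter('output.parquet', schema);
writer.write([
  { id: 1, name: 'Alice', score: 98.5 },
  { id: 2, name: 'Bob' },
]);
writer.close();

// Reopen the existing file and append one more row.
const appender = ParquetWriter.openForAppend('output.parquet');
appender.write({ id: 3, name: 'Charlie', score: 75.0 });
appender.close();

// Read everything back and print the row count and columns.
const reader = ParquetReader.open('output.parquet');
const all = reader.readAll();
console.log(all.numRows, all.columns);
reader.close();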
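
The schema above marks `score` as optional, which is why the `Bob` row can omit it. As a rough illustration of that rule (not parquet-tool's actual validation logic), the sketch below checks plain rows against a schema-shaped object before they would be handed to a writer. `FieldSpec`, `SchemaSpec`, and `missingRequiredFields` are hypothetical names introduced only for this example.

```typescript
// Hypothetical schema shape mirroring the object passed to Schema.create above.
type FieldSpec = string | { type: string; optional?: boolean };
type SchemaSpec = Record<string, FieldSpec>;

// Returns the names of required fields that a row leaves undefined or null.
function missingRequiredFields(
  schema: SchemaSpec,
  row: Record<string, unknown>,
): string[] {
  const missing: string[] = [];
  for (const [name, spec] of Object.entries(schema)) {
    const optional = typeof spec === 'object' && spec.optional === true;
    const value = row[name];
    if (!optional && (value === undefined || value === null)) {
      missing.push(name);
    }
  }
  return missing;
}

const exampleSchema: SchemaSpec = {
  id: 'INT32',
  name: 'STRING',
  score: { type: 'DOUBLE', optional: true },
};

// The optional column may be absent; a required one may not.
console.log(missingRequiredFields(exampleSchema, { id: 2, name: 'Bob' })); // []
console.log(missingRequiredFields(exampleSchema, { id: 4, score: 60 })); // [ 'name' ]
```

Running a check like this before writing gives a readable error per row instead of a failure inside the native layer, assuming the writer rejects such rows at all.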

2. Merge and validate

import { mergeParquetFiles, validateParquetFile } from 'parquet-tool';

mergeParquetFiles(['part-1.parquet', 'part-2.parquet'], 'merged.parquet');

const report = validateParquetFile('merged.parquet');
if (!report.valid) {
  console.error(report.issues);
}
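
Merging is described as checking schema compatibility, but the exact rules are not spelled out here. One plausible reading is that every input must declare the same columns with the same types. The sketch below implements that strict rule on plain objects; `ColumnMap` and `findSchemaMismatches` are hypothetical names and not part of parquet-tool.

```typescript
// Hypothetical flat description of a file's columns: column name -> Parquet type.
type ColumnMap = Record<string, string>;

// Compares a candidate schema to a reference and lists every difference found.
function findSchemaMismatches(reference: ColumnMap, candidate: ColumnMap): string[] {
  const problems: string[] = [];
  const has = (map: ColumnMap, key: string): boolean =>
    Object.prototype.hasOwnProperty.call(map, key);

  for (const [name, type] of Object.entries(reference)) {
    if (!has(candidate, name)) {
      problems.push(`missing column "${name}"`);
    } else if (candidate[name] !== type) {
      problems.push(`column "${name}" is ${candidate[name]}, expected ${type}`);
    }
  }
  for (const name of Object.keys(candidate)) {
    if (!has(reference, name)) {
      problems.push(`unexpected column "${name}"`);
    }
  }
  return problems;
}

const part1: ColumnMap = { id: 'INT32', name: 'STRING' };
const part2: ColumnMap = { id: 'INT64', name: 'STRING', score: 'DOUBLE' };

console.log(findSchemaMismatches(part1, part1)); // []
console.log(findSchemaMismatches(part1, part2));
// [ 'column "id" is INT64, expected INT32', 'unexpected column "score"' ]
```

A looser rule, for example allowing extra optional columns, would be an equally reasonable design; the strict version is simply the easiest to reason about.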

3. CSV and Arrow conversion

import {
  arrowToParquet,
  csvToParquet,
  parquetToArrow,
  parquetToCsv,
} from 'parquet-tool';

csvToParquet('input.csv', 'input.parquet');
parquetToCsv('input.parquet', 'roundtrip.csv');

parquetToArrow('input.parquet', 'input.arrow');
arrowToParquet('input.arrow', 'from-arrow.parquet');

4. Splitting and parallel processing

import {
  parallelProcess,
  parallelRead,
  splitParquetFile,
} from 'parquet-tool';

const files = splitParquetFile('large.parquet', {
  maxRowsPerFile: 100_000,
  outputDir: './parts',
  prefix: 'large',
});
console.log(files);

const combined = await parallelRead('large.parquet', { concurrency: 4 });
console.log(combined.numRows);

const names = await parallelProcess(
  'large.parquet',
  (rows) => rows.map((row) => String(row.name ?? '')),
  { concurrency: 4 },
);
console.log(names.length);
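
To make the `maxRowsPerFile` option concrete, the sketch below computes which row ranges would land in each output file for a given total row count. It only illustrates the arithmetic: `planSplit` and `RowRange` are hypothetical names, and parquet-tool may in practice split along row-group boundaries rather than exact row counts.

```typescript
interface RowRange {
  start: number; // index of the first row in the part (inclusive)
  end: number; // index one past the last row in the part (exclusive)
}

// Divides totalRows into consecutive ranges of at most maxRowsPerFile rows each.
function planSplit(totalRows: number, maxRowsPerFile: number): RowRange[] {
  if (!Number.isInteger(maxRowsPerFile) || maxRowsPerFile <= 0) {
    throw new RangeError('maxRowsPerFile must be a positive integer');
  }
  const ranges: RowRange[] = [];
  for (let start = 0; start < totalRows; start += maxRowsPerFile) {
    ranges.push({ start, end: Math.min(start + maxRowsPerFile, totalRows) });
  }
  return ranges;
}

const plan = planSplit(250000, 100000);
console.log(plan.length); // 3
console.log(plan[2]); // { start: 200000, end: 250000 }
```

Under this scheme only the last part can be shorter than `maxRowsPerFile`, so a 250,000-row file split at 100,000 rows yields two full parts and one part of 50,000 rows.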

5. Running the examples bundled with the repository

npx ts-node examples/basic-read-write/index.ts
npx ts-node examples/merge-and-validate/index.ts
npx ts-node examples/conversions/index.ts
npx ts-node examples/split-and-parallel/index.ts
npx ts-node examples/buffer-roundtrip/index.ts

Available example folders:

  • examples/basic-read-write/
  • examples/merge-and-validate/
  • examples/conversions/
  • examples/split-and-parallel/
  • examples/buffer-roundtrip/

The legacy examples/example.ts entry point still works and forwards to the basic example.

CLI Usage

# Metadata
npx parquet-tool info data.parquet

# Read rows
npx parquet-tool read data.parquet --json
npx parquet-tool read data.parquet --limit 50

# Write from JSON
npx parquet-tool write out.parquet -i input.json -s "id:INT32,name:STRING"

# Append rows
npx parquet-tool append out.parquet -i more.json

# Split / merge
npx parquet-tool split large.parquet -n 10000 -o ./output
npx parquet-tool merge merged.parquet part1.parquet part2.parquet

# Validate
npx parquet-tool validate merged.parquet

# CSV <-> Parquet
npx parquet-tool csv-to-parquet input.csv output.parquet
npx parquet-tool parquet-to-csv output.parquet output.csv

# Arrow <-> Parquet
npx parquet-tool arrow-to-parquet input.arrow output.parquet
npx parquet-tool parquet-to-arrow output.parquet output.arrow

# Debug mode
npx parquet-tool --debug validate data.parquet
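
The `-s` flag in the `write` command above takes a compact schema string such as `id:INT32,name:STRING`. How the CLI parses it is not documented here; the sketch below shows one straightforward way to turn that format into an object shaped like the `Schema.create` argument. `parseSchemaString` is a hypothetical helper, and its whitespace trimming and upper-casing are assumptions rather than confirmed CLI behavior.

```typescript
// Parses "id:INT32,name:STRING" into { id: 'INT32', name: 'STRING' }.
function parseSchemaString(spec: string): Record<string, string> {
  const fields: Record<string, string> = {};
  for (const part of spec.split(',')) {
    const [rawName, rawType, ...rest] = part.split(':');
    const name = (rawName ?? '').trim();
    const type = (rawType ?? '').trim().toUpperCase();
    // Each entry must be exactly one name and one type separated by a colon.
    if (name === '' || type === '' || rest.length > 0) {
      throw new Error(`Invalid field spec "${part}", expected name:TYPE`);
    }
    if (Object.prototype.hasOwnProperty.call(fields, name)) {
      throw new Error(`Duplicate field "${name}"`);
    }
    fields[name] = type;
  }
  return fields;
}

console.log(parseSchemaString('id:INT32,name:STRING'));
// { id: 'INT32', name: 'STRING' }
```

The sketch deliberately does not check type names against a fixed list, since the set of names the CLI accepts is not stated in this README.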

Docker Viewer

mkdir -p data
cp your_file.parquet data/
docker-compose up --build

Open http://localhost:8080

Development

npm install
npm run build:native
npm run build:ts
npm test
npm run clean

Publishing

This project uses Commitizen and semantic-release.

npm run cz
npm run release

Configured semantic-release plugins:

  • @semantic-release/commit-analyzer
  • @semantic-release/release-notes-generator
  • @semantic-release/changelog
  • @semantic-release/npm
  • @semantic-release/github
  • @semantic-release/git

Branching strategy:

  • main: stable releases

Supported Types

| Parquet type | TypeScript type | Description |
|---|---|---|
| BOOLEAN | boolean | Boolean value |
| INT32 | number | 32-bit integer |
| INT64 | bigint | 64-bit integer |
| FLOAT | number | 32-bit floating point |
| DOUBLE | number | 64-bit floating point |
| BYTE_ARRAY | string | UTF-8 string |
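
Because the table maps INT64 to `bigint`, rows containing INT64 columns cannot be passed straight to `JSON.stringify`, which throws a `TypeError` on `bigint` values. The sketch below shows one common workaround: serializing `bigint` values as decimal strings through a replacer function. `stringifyRow` is a hypothetical helper, and whether the CLI's `--json` output does anything similar is not stated here.

```typescript
// Serializes a row to JSON, emitting bigint values (e.g. INT64 columns) as decimal strings.
function stringifyRow(row: Record<string, unknown>): string {
  return JSON.stringify(row, (_key, value) =>
    typeof value === 'bigint' ? value.toString() : value,
  );
}

// 9007199254740993 is 2^53 + 1, which a JavaScript number cannot represent exactly.
const sampleRow = { id: BigInt('9007199254740993'), name: 'Alice', active: true };

console.log(stringifyRow(sampleRow));
// {"id":"9007199254740993","name":"Alice","active":true}
```

Emitting strings keeps the full 64-bit precision, at the cost that consumers must convert the field back with `BigInt(...)` themselves.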

License

MIT