geoparquet-extractor

v0.2.2

Extract and convert spatial data from remote GeoParquet files. Supports bbox filtering, multiple output formats, and pluggable metadata providers.

geoparquet-extractor

Extract and convert spatial data from remote GeoParquet files entirely in the browser, using OPFS (Origin Private File System) for temporary storage and spill-over. Supports bbox filtering, multiple output formats, and pluggable metadata providers.

Browser-only — requires Origin Private File System (OPFS), Web Workers, and Web Locks APIs.
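An app can check these requirements before it loads the library. The following is a minimal sketch that feature-detects them through the standard browser globals `navigator.storage.getDirectory` (OPFS), `Worker`, and `navigator.locks`. The function name `missingBrowserFeatures` is hypothetical and is not part of geoparquet-extractor.

```javascript
// Sketch: report which browser APIs needed by geoparquet-extractor are missing.
// `env` defaults to globalThis; it is a parameter so the check can be exercised
// with stand-in objects outside a browser. Hypothetical helper, not a library export.
function missingBrowserFeatures(env = globalThis) {
  const nav = env.navigator ?? {};
  const missing = [];
  // OPFS is reached through StorageManager.getDirectory()
  if (typeof nav.storage?.getDirectory !== 'function') missing.push('OPFS');
  // Dedicated Web Workers
  if (typeof env.Worker !== 'function') missing.push('Web Workers');
  // Web Locks API
  if (!nav.locks || typeof nav.locks.request !== 'function') missing.push('Web Locks');
  return missing;
}

// In the browser:
// const missing = missingBrowserFeatures();
// if (missing.length) console.warn(`Unsupported browser, missing: ${missing.join(', ')}`);
```

`navigator.storage` and `navigator.locks` are only exposed in secure contexts (HTTPS or localhost), so on a plain-HTTP page this check reports them as missing as well.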

How It Works

The library queries remote GeoParquet files using HTTP range requests — only the data matching the requested bounding box is transferred. All processing happens client-side with no backend involved.

Browser
  ├─ DuckDB-WASM
  │   ├─ HTTP range requests → remote GeoParquet
  │   ├─ Spatial filtering by bounding box
  │   ├─ OPFS temp directory for spill-over
  │   └─ COPY TO OPFS (intermediate or final output)
  └─ GeoPackage Worker (for .gpkg only)
      ├─ hyparquet (reads intermediate parquet from OPFS)
      └─ wa-sqlite/sqwab (writes .gpkg with R-tree index to OPFS)
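The next sketch makes the bbox step concrete by building the kind of SQL DuckDB can run directly against remote files. It assumes the files carry a GeoParquet 1.1 bbox covering column named `bbox` with struct fields `xmin`, `ymin`, `xmax`, `ymax`. This README does not document the library's real queries, and `buildBboxQuery` is a hypothetical helper.

```javascript
// Sketch: build a DuckDB query that keeps rows whose bounding box intersects `bbox`.
// ASSUMPTION: each file has a GeoParquet 1.1 "bbox covering" struct column named `bbox`.
// This is an illustration, NOT the library's internal query.
function buildBboxQuery(urls, [minx, miny, maxx, maxy]) {
  for (const v of [minx, miny, maxx, maxy]) {
    if (!Number.isFinite(v)) throw new TypeError(`bbox values must be finite numbers, got ${v}`);
  }
  // Escape single quotes so each URL becomes a valid SQL string literal.
  const list = urls.map((u) => `'${u.replaceAll("'", "''")}'`).join(', ');
  // Two boxes intersect unless one lies entirely to one side of the other.
  return [
    `SELECT * FROM read_parquet([${list}])`,
    `WHERE bbox.xmax >= ${minx} AND bbox.xmin <= ${maxx}`,
    `  AND bbox.ymax >= ${miny} AND bbox.ymin <= ${maxy}`,
  ].join('\n');
}

// buildBboxQuery(['https://example.com/data.parquet'], [77.5, 12.9, 77.7, 13.1])
```

A row passes when its box overlaps the query box on both axes. Where DuckDB's filter pushdown supports it, simple comparisons like these let it use Parquet min/max statistics to skip row groups, so fewer byte ranges need to be fetched.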

Key Internal Dependencies

  • DuckDB-WASM with OPFS temp directory — uses duckdb-wasm-opfs-tempdir which supports SET temp_directory = 'opfs://...' for processing datasets larger than available memory. DuckDB's spatial extension is loaded at runtime for geometry operations (ST_AsWKB, ST_AsGeoJSON, ST_Hilbert, etc.).
  • sqwab — wa-sqlite with R-tree support for GeoPackage output. Runs in a dedicated Web Worker using OPFSAdaptiveVFS for file I/O.
  • hyparquet — pure-JS parquet reader used in the GeoPackage worker to read intermediate files from OPFS.
  • Apache Arrow — columnar data handling.

Installation

npm install geoparquet-extractor

Quick Start

import { GeoParquetExtractor, createDuckDBClient } from 'geoparquet-extractor';

// You initialize DuckDB yourself (see "DuckDB Setup" below). `db` must be an
// instantiated AsyncDuckDB from duckdb-wasm-opfs-tempdir. `initMyDuckDB` is a
// placeholder for your own setup code, not a library export.
const db = await initMyDuckDB();
const client = await createDuckDBClient(db, {
  extensions: ['spatial', 'httpfs'],
});

const extractor = new GeoParquetExtractor({ duckdb: client });

await extractor.extract({
  urls: ['https://example.com/data.parquet'],
  bbox: [77.5, 12.9, 77.7, 13.1],
  format: 'geoparquet',
  baseName: 'my-data',
  onProgress: (pct) => console.log(`${pct}%`),
  onStatus: (msg) => console.log(msg),
});

Features

  • 9 output formats: GeoParquet (v1.1 & v2.0), GeoPackage, Shapefile, CSV, GeoJSON, GeoJSONSeq, KML, DXF
  • Spatial filtering: Bbox intersection with per-partition and per-row-group optimization
  • Pluggable metadata: Override how partition URLs and bboxes are resolved
  • Extent visualization data: Fetch partition/row-group bboxes as GeoJSON for map display
  • DuckDB-powered: Spatial SQL queries via DuckDB WASM (you provide the instance)
  • Self-contained GeoPackage worker: wa-sqlite-rtree bundled into the worker — no CDN needed
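The per-partition part of the bbox optimization can be illustrated with a small sketch. It assumes partition extents in the `{ filename: [minx, miny, maxx, maxy] }` shape that `MetadataProvider.getExtents()` is documented to return. The function names are hypothetical, and the library's own implementation may differ, for example in how it treats boxes that only touch.

```javascript
// Sketch: per-partition pruning. Keep only files whose extent intersects the query bbox,
// so partitions outside the area of interest are never requested.
// Boxes are [minx, miny, maxx, maxy]; touching edges count as intersecting here.
function intersects(a, b) {
  return a[0] <= b[2] && a[2] >= b[0] && a[1] <= b[3] && a[3] >= b[1];
}

function prunePartitions(extents, bbox) {
  if (!extents) return null; // no extents known: the caller has to scan every file
  return Object.entries(extents)
    .filter(([, ext]) => intersects(ext, bbox))
    .map(([file]) => file);
}

// prunePartitions({ 'a.parquet': [0, 0, 10, 10], 'b.parquet': [20, 20, 30, 30] }, [5, 5, 12, 12])
```

The same test applies one level down. A `{ rg_N: bbox }` map of row-group extents can be filtered the same way to decide which row groups inside a file are worth reading.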

Formats

| Format | Value | Extension | Notes |
|--------|-------|-----------|-------|
| GeoPackage | geopackage | .gpkg | Requires GeoPackage worker |
| GeoJSON | geojson | .geojson | FeatureCollection |
| GeoJSONSeq | geojsonseq | .geojsonl | Newline-delimited |
| GeoParquet v1.1 | geoparquet | .parquet | With Hilbert spatial sort |
| GeoParquet v2.0 | geoparquet2 | .parquet | Native geometry encoding |
| CSV | csv | .csv | WKT geometry column |
| Shapefile | shapefile | .shp | 2 GB limit per component |
| KML | kml | .kml | XML format |
| DXF | dxf | .dxf | AutoCAD R14, UTM projection |
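GeoParquet v1.1 output is written "With Hilbert spatial sort", and DuckDB's ST_Hilbert appears among the dependencies. The idea is to order rows along a Hilbert curve so that nearby features land in the same row groups, which keeps per-row-group bboxes tight. The sketch below uses the classic iterative (x, y) → d Hilbert index applied to feature centers. DuckDB's ST_Hilbert may use a different grid resolution or orientation, so keys from this sketch should not be expected to match it. `xy2d` and `hilbertKey` are hypothetical names.

```javascript
// Sketch: classic iterative Hilbert index on an n x n grid (n a power of two).
function xy2d(n, x, y) {
  let d = 0;
  for (let s = n >> 1; s > 0; s >>= 1) {
    const rx = (x & s) > 0 ? 1 : 0;
    const ry = (y & s) > 0 ? 1 : 0;
    d += s * s * ((3 * rx) ^ ry);
    // Rotate/flip so the next, smaller sub-curve has the correct orientation.
    if (ry === 0) {
      if (rx === 1) {
        x = n - 1 - x;
        y = n - 1 - y;
      }
      [x, y] = [y, x];
    }
  }
  return d;
}

// Map a point inside `extent` onto a 2^order grid (order <= 16 here) and return its index.
function hilbertKey([x, y], [minx, miny, maxx, maxy], order = 16) {
  const n = 2 ** order;
  const toCell = (v, lo, hi) => {
    const span = hi - lo || 1; // avoid division by zero for degenerate extents
    return Math.min(n - 1, Math.max(0, Math.floor(((v - lo) / span) * n)));
  };
  return xy2d(n, toCell(x, minx, maxx), toCell(y, miny, maxy));
}

// Sorting feature centers by key groups spatially close features together:
// centers.sort((a, b) => hilbertKey(a, extent) - hilbertKey(b, extent));
```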

DuckDB Setup

The library does NOT bundle DuckDB WASM. You initialize it yourself and pass it in:

import { createDuckDBClient } from 'geoparquet-extractor';
import * as duckdb from 'duckdb-wasm-opfs-tempdir';

// Standard duckdb-wasm-opfs-tempdir init
const MANUAL_BUNDLES = { /* your bundle config */ };
const bundle = await duckdb.selectBundle(MANUAL_BUNDLES);
const worker = new Worker(bundle.mainWorker);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);

// Wrap it for the library
const client = await createDuckDBClient(db, {
  extensions: ['spatial', 'httpfs'],
});

Custom DuckDB Builds

The duckdb-wasm-opfs-tempdir package supports SET temp_directory = 'opfs://...' for large downloads that exceed browser memory limits. The library's createDuckDBClient adapter works with any DuckDB WASM build that provides AsyncDuckDB.

GeoPackage Worker

The GeoPackage format requires a Web Worker for wa-sqlite. The library ships a self-contained worker with wa-sqlite-rtree bundled in:

// Option 1: URL to a hosted copy of the worker
const extractorFromUrl = new GeoParquetExtractor({
  duckdb: client,
  gpkgWorkerUrl: '/workers/gpkg_worker.js',
});

// Option 2: a Worker instance. Your bundler resolves this URL, which should
// point at dist/gpkg_worker.js.
const worker = new Worker(new URL('geoparquet-extractor/gpkg-worker', import.meta.url), { type: 'module' });
const extractorFromWorker = new GeoParquetExtractor({
  duckdb: client,
  gpkgWorker: worker,
});

Note: The worker requires wa-sqlite-async.wasm to be served from the same directory as gpkg_worker.js. Both files are included in the dist/ directory.
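Because the worker loads wa-sqlite-async.wasm from its own directory, a misplaced .wasm file would likely surface only once a GeoPackage export starts. The following sketch shows a preflight check that derives the expected .wasm URL from the worker URL and probes it with a standard Fetch API `HEAD` request. Both function names are hypothetical.

```javascript
// Sketch: derive where wa-sqlite-async.wasm must live, given the worker URL.
// `base` resolves relative worker URLs (e.g. pass location.href in the browser).
function wasmUrlForWorker(workerUrl, base) {
  const resolvedWorker = new URL(workerUrl, base);
  // Same directory as the worker, different file name.
  return new URL('wa-sqlite-async.wasm', resolvedWorker).href;
}

// Probe the derived URL without downloading the body.
async function checkWasmReachable(workerUrl, base) {
  const url = wasmUrlForWorker(workerUrl, base);
  const res = await fetch(url, { method: 'HEAD' });
  return { url, ok: res.ok };
}

// In the browser:
// const { url, ok } = await checkWasmReachable('/workers/gpkg_worker.js', location.href);
// if (!ok) console.warn(`GeoPackage export will fail: ${url} is not being served`);
```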

Custom Metadata Provider

Override how partition URLs and bboxes are resolved:

import { MetadataProvider, GeoParquetExtractor } from 'geoparquet-extractor';

class MyMetadataProvider extends MetadataProvider {
  // Assumes a sidecar file at `${sourceUrl}.meta.json` shaped like
  // { "extents": { "file.parquet": [minx, miny, maxx, maxy] } }
  async #loadMeta(sourceUrl) {
    const res = await fetch(sourceUrl + '.meta.json');
    if (!res.ok) throw new Error(`Metadata request failed: HTTP ${res.status}`);
    return res.json();
  }

  async getParquetUrls(sourceUrl) {
    const meta = await this.#loadMeta(sourceUrl);
    const baseUrl = sourceUrl.replace(/[^/]+$/, ''); // directory containing sourceUrl
    return Object.keys(meta.extents).map(f => baseUrl + f);
  }

  async getExtents(sourceUrl) {
    const meta = await this.#loadMeta(sourceUrl);
    return meta.extents; // { "file.parquet": [minx, miny, maxx, maxy] }
  }

  async getBbox(sourceUrl, duckdb) {
    const boxes = Object.values((await this.getExtents(sourceUrl)) ?? {});
    if (boxes.length === 0) return null; // no extents: nothing to union
    // Overall bbox = union of all partition extents
    const bbox = [Infinity, Infinity, -Infinity, -Infinity];
    for (const ext of boxes) {
      bbox[0] = Math.min(bbox[0], ext[0]);
      bbox[1] = Math.min(bbox[1], ext[1]);
      bbox[2] = Math.max(bbox[2], ext[2]);
      bbox[3] = Math.max(bbox[3], ext[3]);
    }
    return bbox;
  }
}

const extractor = new GeoParquetExtractor({
  duckdb: client,
  metadataProvider: new MyMetadataProvider(),
});

Extent Visualization

Fetch partition and row-group bboxes as GeoJSON for map display:

import { ExtentData, MetadataProvider } from 'geoparquet-extractor';

const extentData = new ExtentData({
  metadataProvider: new MetadataProvider(),
  duckdb: client,
});

const { dataExtents, rgExtents } = await extentData.fetchExtents({
  sourceUrl: 'https://example.com/data.mosaic.json',
  partitioned: true,
});
// dataExtents: { filename: [minx, miny, maxx, maxy] } or null
// rgExtents: { filename: { rg_N: [minx, miny, maxx, maxy] } } or null
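fetchExtents() is documented to return plain bbox objects rather than GeoJSON. For map display, those objects can be turned into rectangles. The sketch below assumes only the two shapes shown in the comments above. The function names and the `kind`/`rowGroup` feature properties are hypothetical.

```javascript
// Sketch: convert `dataExtents` / `rgExtents` into a GeoJSON FeatureCollection of rectangles.
function bboxToPolygon([minx, miny, maxx, maxy]) {
  // Closed, counter-clockwise exterior ring, as RFC 7946 recommends.
  return {
    type: 'Polygon',
    coordinates: [[[minx, miny], [maxx, miny], [maxx, maxy], [minx, maxy], [minx, miny]]],
  };
}

function extentsToGeoJSON(dataExtents, rgExtents) {
  const features = [];
  // Either argument may be null, per the documented return values.
  for (const [file, bbox] of Object.entries(dataExtents ?? {})) {
    features.push({ type: 'Feature', properties: { file, kind: 'partition' }, geometry: bboxToPolygon(bbox) });
  }
  for (const [file, groups] of Object.entries(rgExtents ?? {})) {
    for (const [rowGroup, bbox] of Object.entries(groups)) {
      features.push({ type: 'Feature', properties: { file, rowGroup, kind: 'row-group' }, geometry: bboxToPolygon(bbox) });
    }
  }
  return { type: 'FeatureCollection', features };
}

// const geojson = extentsToGeoJSON(dataExtents, rgExtents); // then add it as a map source
```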

API

GeoParquetExtractor

Main orchestrator class.

  • constructor({ duckdb, metadataProvider?, gpkgWorkerUrl?, gpkgWorker?, memoryLimitMB? })
  • async prepare(options) → Returns format handler for inspection before download
  • async download(handler, { baseName, onProgress?, onStatus? }) → Execute download, returns boolean
  • async extract(options) → Convenience: prepare + download in one call
  • cancel() → Cancel in-flight download
  • static async cleanupOrphanedFiles() → Clean up OPFS files from dead sessions
  • static getDownloadBaseName(sourceName, bbox) → Generate suggested filename
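`prepare()` and `download()` split extraction into two steps so the handler can be inspected first. The sketch below shows one way to use that split, under these assumptions:

- `freeBytes` is an available-space figure you compute yourself, for example quota minus usage from the standard `navigator.storage.estimate()`.
- A `false` return from `download()` means the export did not finish, for example because it was cancelled.

`extractIfItFits` is a hypothetical helper, not a library export.

```javascript
// Sketch: two-step extraction that checks the expected OPFS footprint before downloading.
// `extractor` is a GeoParquetExtractor (or anything with the same prepare/download methods).
async function extractIfItFits(extractor, options, { baseName, freeBytes, onStatus }) {
  const handler = await extractor.prepare(options);

  const warning = handler.getFormatWarning();
  if (warning) onStatus?.(`Warning: ${warning}`);

  const needed = handler.getExpectedBrowserStorageUsage(); // peak OPFS usage in bytes
  if (needed > freeBytes) {
    await handler.cleanup(); // precaution: release any OPFS files this session may hold
    return { ok: false, reason: `needs ~${needed} bytes, only ${freeBytes} available` };
  }

  const ok = await extractor.download(handler, { baseName, onStatus });
  return { ok, reason: ok ? null : 'download did not complete' };
}
```

The `cleanup()` call is defensive. The README does not say whether `prepare()` itself writes anything to OPFS.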

ExtentData

Data-fetching for partition/row-group bboxes.

  • constructor({ metadataProvider, duckdb? })
  • async fetchExtents({ sourceUrl, partitioned?, includeRowGroups?, onStatus? }) → { dataExtents, rgExtents }

MetadataProvider

Base class with working defaults. Override to customize metadata resolution.

  • getParquetUrl(sourceUrl) → string: resolves a source URL to a parquet URL (default: identity)
  • async getParquetUrls(sourceUrl) → string[]
  • async getExtents(sourceUrl) → { filename: [minx, miny, maxx, maxy] } or null
  • async getBbox(parquetUrl, duckdb) → [minx, miny, maxx, maxy] or null
  • async getRowGroupBboxes(parquetUrl, duckdb) → { rg_N: bbox } or null
  • async getRowGroupBboxesMulti(urls, duckdb) → { filename: { rg_N: bbox } } or null

createDuckDBClient(db, options?)

Wraps an AsyncDuckDB instance into the library's DuckDBClient interface.

initDuckDB(duckdbDist, options?)

Creates a DuckDBClient by loading DuckDB-WASM from a distribution URL. Handles bundle selection, worker creation, and WASM instantiation.

proxyUrl(url)

Returns the proxied version of a URL (complement to setProxyUrl).

Format Handlers

Base class and per-format subclasses for advanced usage and subclassing. Normally you don't need these directly — GeoParquetExtractor.prepare() creates them for you.

  • FormatHandler — Base class. Manages OPFS file lifecycle, DuckDB queries, bbox filtering, and progress tracking.
    • getExpectedBrowserStorageUsage() → Expected peak OPFS usage in bytes
    • getTotalExpectedDiskUsage() → Total expected disk usage including downloads
    • getFormatWarning() → Format-specific warning string, or null
    • getDownloadMap(baseName) → List of downloadable files
    • async write(callbacks) → Run the format handler's write pipeline
    • async triggerDownload(baseName, cleanupDelayMs?) → Trigger browser download(s)
    • async cleanup() → Clean up all OPFS files belonging to this session
    • cancel() → Cancel the operation
  • CsvFormatHandler — CSV with WKT geometry column
  • GeoJsonFormatHandler — GeoJSON / GeoJSONSeq ({ commaSeparated? })
  • GeoParquetFormatHandler — GeoParquet v1.1 or v2.0 ({ version? })
  • GeoPackageFormatHandler — GeoPackage via wa-sqlite worker ({ gpkgWorker? })
  • ShapefileFormatHandler — Shapefile (.shp/.dbf/.shx/.prj)
  • KmlFormatHandler — KML (Keyhole Markup Language)
  • DxfFormatHandler — DXF (AutoCAD R14, UTM projection)

Utilities

  • formatSize(bytes) → Human-readable string (e.g., "1.5 MB")
  • async getStorageEstimate() → Browser storage quota and usage via navigator.storage
  • SizeGetter — Fetches and caches file sizes via HEAD requests through the configured proxy
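A `formatSize`-style helper might look like the following sketch. It assumes 1024-based units labelled KB/MB/GB, which yields "1.5 MB" for 1,572,864 bytes. The library's actual base and rounding may differ. `formatBytes` is a hypothetical name.

```javascript
// Sketch of a formatSize-style helper (1024-based units, trailing ".0" trimmed).
function formatBytes(bytes, digits = 1) {
  if (!Number.isFinite(bytes) || bytes < 0) throw new RangeError(`invalid byte count: ${bytes}`);
  const units = ['B', 'KB', 'MB', 'GB', 'TB'];
  let value = bytes;
  let i = 0;
  // Compare the *rounded* value so e.g. 1023.99 KB is shown as "1 MB", not "1024 KB".
  while (i < units.length - 1 && Number(value.toFixed(digits)) >= 1024) {
    value /= 1024;
    i += 1;
  }
  const text = i === 0 ? String(value) : String(Number(value.toFixed(digits)));
  return `${text} ${units[i]}`;
}

// formatBytes(1572864) === '1.5 MB'
```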

CORS Proxy

If your parquet files need a CORS proxy:

import { setProxyUrl } from 'geoparquet-extractor';

// Set a custom proxy URL transformer
setProxyUrl((url) => `/proxy?url=${encodeURIComponent(url)}`);
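Often only cross-origin files need the proxy. The following sketch builds a transformer for `setProxyUrl()` that leaves same-origin URLs untouched. The `/proxy?url=` endpoint mirrors the example above. `makeProxyTransformer` and its parameters are hypothetical and describe your own deployment, not the library.

```javascript
// Sketch: proxy only URLs whose origin differs from the app's own origin.
// `appOrigin` should look like location.origin, e.g. 'https://app.example.com'.
function makeProxyTransformer(proxyPath, appOrigin) {
  return (url) => {
    const target = new URL(url, appOrigin); // also accepts relative URLs
    if (target.origin === appOrigin) return url; // same origin: no proxy needed
    return `${proxyPath}?url=${encodeURIComponent(target.href)}`;
  };
}

// In the browser:
// setProxyUrl(makeProxyTransformer('/proxy', location.origin));
```

Because DuckDB reads the files with HTTP range requests, whatever proxy you run must forward the `Range` request header and return the partial-content response unchanged.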

License

Unlicense — public domain.