npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

stratum-duckdb

v0.1.6

Published

Run analytical SQL queries against Parquet, CSV and JSON files on any static website. Part of the Stratum toolkit.

Downloads

36

Readme

stratum-duckdb

Run full analytical SQL against Parquet, CSV, and JSON files on any static website — no server required. Powered by DuckDB WASM.

Part of the Stratum toolkit for publishing research data on static websites. See STRATUM.md for a comparison of stratum-duckdb and stratum-sqlite and guidance on choosing the right library for your dataset.


How it works

DuckDB WASM runs the full DuckDB analytical engine in the browser using WebAssembly. stratum-duckdb wraps the DuckDB WASM API with a simple open() / query() interface and adds file registration helpers for both local and remote files.

For Parquet files on remote servers, DuckDB uses HTTP range requests to fetch only the row groups and columns each query needs — even for files hundreds of megabytes in size.

Browser                                     Remote server (GitHub / Zenodo / S3)
  │                                               │
  │  SELECT country, AVG(life_exp) …              │
  │  FROM 'https://…/indicators.parquet'          │
  │  WHERE year = 2022                            │
  │                                               │
  ├── GET …indicators.parquet  Range: 0-7         │  ← Parquet footer (metadata)
  │   ← 404 bytes              ────────────────►  │
  │                                               │
  ├── GET …indicators.parquet  Range: 512-8192    │  ← Only year+country columns
  │   ← 7680 bytes             ────────────────►  │
  │                                               │
  │  Result: [{country:"NOR", avg:83.2}, …]       │

Quick start

For the demo site (local preview)

git clone https://github.com/stratum-toolkit/stratum-duckdb
cd stratum-duckdb
node build.mjs   # build the library
bash setup.sh    # download DuckDB binary files into docs/libs/duckdb/
python3 -m http.server 8000 --directory docs
# open http://localhost:8000

setup.sh downloads both the mvp and eh DuckDB WASM bundles from jsDelivr and copies the built library into docs/libs/duckdb/. These binary files are in .gitignore — the CI pipeline downloads them fresh on every deploy.

Option A — Self-hosted (for your own project)

Step 1: Download the DuckDB binary files into your project. Both mvp and eh bundles are needed — selectBundle() picks the right one at runtime based on browser capabilities:

mkdir -p libs/duckdb
BASE="https://cdn.jsdelivr.net/npm/@duckdb/[email protected]/dist"
curl -sSfL "$BASE/duckdb-browser-mvp.worker.js" -o libs/duckdb/duckdb-browser-mvp.worker.js
curl -sSfL "$BASE/duckdb-mvp.wasm"              -o libs/duckdb/duckdb-mvp.wasm
curl -sSfL "$BASE/duckdb-browser-eh.worker.js"  -o libs/duckdb/duckdb-browser-eh.worker.js
curl -sSfL "$BASE/duckdb-eh.wasm"               -o libs/duckdb/duckdb-eh.wasm

Why only binary files? The DuckDB main JS module is an ES module that cannot be served as a plain <script> tag. It is loaded automatically from jsDelivr CDN via dynamic import() at runtime (~100 KB compressed, CDN-cached). Only the binary companion files need to be self-hosted.

Step 2: Download stratum-duckdb.esm.js from the Releases page and place it alongside the DuckDB files:

libs/
└── duckdb/
    ├── duckdb-browser-mvp.worker.js
    ├── duckdb-mvp.wasm
    ├── duckdb-browser-eh.worker.js
    ├── duckdb-eh.wasm
    └── stratum-duckdb.esm.js

Step 3: Use in your HTML:

<!-- All pages need type="module" for dynamic import() to work -->
<script type="module">
  const StratumDuckDB =
    (await import('libs/duckdb/stratum-duckdb.esm.js')).default;

  const db = await StratumDuckDB.open({
    duckdbPath: 'libs/duckdb/',   // folder with worker.js and .wasm files
  });

  // Register local files (served from the same site)
  await db.registerFile('countries',  'data/countries.parquet');
  await db.registerFile('indicators', 'data/indicators.parquet');

  // Full analytical SQL
  const rows = await db.query(`
    SELECT
      c.name,
      i.life_expectancy,
      RANK() OVER (ORDER BY i.life_expectancy DESC) AS rank
    FROM countries c
    JOIN indicators i ON c.iso3 = i.country_iso
    WHERE i.year = 2022
    ORDER BY rank
  `);
  console.log(rows);
</script>

Option B — npm + bundler

npm install stratum-duckdb
import StratumDuckDB from 'stratum-duckdb';

const db = await StratumDuckDB.open({ duckdbPath: '/libs/duckdb/' });
await db.registerFile('countries', 'data/countries.parquet');
const rows = await db.query('SELECT * FROM countries');

API

StratumDuckDB.open(options)Promise<Database>

Initialises DuckDB WASM and returns a Database.

| Option | Type | Default | Description | |--------|------|---------|-------------| | duckdbPath | string | jsDelivr CDN | Path to folder containing the DuckDB .worker.js and .wasm binary files. | | siteRoot | string | location.href directory | Base URL for resolving relative paths in registerFile(). |

db.query(sql)Promise<Array<Object>>

Run any SQL and return rows as plain objects.

await db.query(`
  SELECT subregion, ROUND(AVG(gdp_usd_bn),1) AS avg_gdp
  FROM countries
  GROUP BY subregion
  ORDER BY avg_gdp DESC
`)
// → [{ subregion: "Northern America", avg_gdp: 9200.3 }, …]

db.registerFile(alias, filePath)Promise<void>

Fetch a local file (served from the same site) and register it as a named view. Supported formats: .parquet, .csv, .json.

await db.registerFile('my_data', 'data/my_data.parquet')
await db.query('SELECT * FROM my_data LIMIT 10')

db.registerRemote(alias, url)Promise<void>

Register a remote HTTPS file as a named view. DuckDB fetches only what each query needs via HTTP range requests — the file is never fully downloaded.

await db.registerRemote('indicators',
  'https://github.com/org/repo/releases/download/v1/indicators.parquet')
await db.query('SELECT * FROM indicators WHERE year = 2022')

db.views()Promise<string[]>

List all registered view names.

db.describe(name)Promise<Array<Object>>

Return column types for a view or table (equivalent to DESCRIBE).

db.count(name)Promise<number>

Return total row count.

db.close()Promise<void>

Close the connection and free resources.


Quarto / ObservableJS integration

_duckdb-init.html

Add this file to your Quarto project root and include it via _quarto.yml. It loads stratum-duckdb via import() and exposes a ready Database as a Promise that OJS cells can safely await.

<script type="module">
(async function () {
  // Detect page depth from Quarto's injected site_libs/ script path.
  // Works for pages at any subdirectory depth without hardcoding.
  const siteLibScript = document.querySelector('script[src*="site_libs/"]');
  const root = siteLibScript
    ? siteLibScript.getAttribute('src').replace(/site_libs\/.*$/, '')
    : '';

  window._duckdbBase = root + 'libs/duckdb/';
  window._dataRoot   = root + 'data/';

  // Expose the initialised Database as a Promise. OJS cells start evaluating
  // before scripts finish loading, so a plain global is unreliable.
  window._stratumDuckDB = (async () => {
    const StratumDuckDB =
      (await import(window._duckdbBase + 'stratum-duckdb.esm.js')).default;
    return StratumDuckDB.open({ duckdbPath: window._duckdbBase });
  })();
}());
</script>

_quarto.yml

format:
  html:
    include-in-header: _duckdb-init.html
project:
  resources:
    - libs/duckdb/   # must contain worker.js, .wasm, and stratum-duckdb.esm.js
    - data/

Any .qmd page

// Await the Database Promise before registering files and querying.
// Other OJS cells that reference db receive the resolved value automatically.
db = {
  const database = await window._stratumDuckDB;
  await database.registerFile('indicators', 'indicators.csv');
  return database;
}

rows = db.query("SELECT * FROM indicators WHERE year = 2022 ORDER BY life_expectancy DESC")
Inputs.table(await rows)

Creating Parquet files for your research data

Python (pyarrow)

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.read_csv("my_data.csv")
pq.write_table(pa.Table.from_pandas(df), "docs/data/my_data.parquet",
               compression="snappy")

R (arrow)

library(arrow)
df <- read.csv("my_data.csv")
write_parquet(df, "docs/data/my_data.parquet")

From an existing SQLite database

import sqlite3, pandas as pd, pyarrow.parquet as pq, pyarrow as pa

con = sqlite3.connect("my.sqlite")
for table in ["countries", "indicators"]:
    df = pd.read_sql(f"SELECT * FROM {table}", con)
    pq.write_table(pa.Table.from_pandas(df),
                   f"docs/data/{table}.parquet", compression="snappy")

Hosting your data

| Host | Free | Parquet range requests | |------|------|----------------------| | GitHub Pages (same origin) | ✓ | ✓ | | GitHub Releases | ✓ | ✓ | | Zenodo | ✓ | ✓ | | Hugging Face Datasets | ✓ | ✓ | | AWS Open Data / S3 | varies | ✓ (configure CORS) | | Cloudflare R2 | free tier | ✓ (configure CORS) |


Developing this library

git clone https://github.com/stratum-toolkit/stratum-duckdb
cd stratum-duckdb

# Build dist bundles
node build.mjs

# Create demo data (CSV + Parquet)
python3 scripts/create_demo_data.py
# or
Rscript scripts/create_demo_data.R

# Download DuckDB binary files and copy the built library (one command)
bash setup.sh

# Serve demo site
python3 -m http.server 8000 --directory docs
# open http://localhost:8000

setup.sh downloads both mvp and eh DuckDB WASM bundles from jsDelivr into docs/libs/duckdb/ and copies the built library there. These binary files are in .gitignore — the CI pipeline downloads them fresh on every deploy.

To force a re-download of all binary files: bash setup.sh --force


License

MIT