stratum-duckdb
v0.1.6
Published
Run analytical SQL queries against Parquet, CSV and JSON files on any static website. Part of the Stratum toolkit.
Downloads
36
Maintainers
Readme
stratum-duckdb
Run full analytical SQL against Parquet, CSV, and JSON files on any static website — no server required. Powered by DuckDB WASM.
Part of the Stratum toolkit for publishing research data on static websites. See STRATUM.md for a comparison of stratum-duckdb and stratum-sqlite and guidance on choosing the right library for your dataset.
How it works
DuckDB WASM runs the full DuckDB analytical engine in the browser using
WebAssembly. stratum-duckdb wraps the DuckDB WASM API with a simple
open() / query() interface and adds file registration helpers for both
local and remote files.
For Parquet files on remote servers, DuckDB uses HTTP range requests to fetch only the row groups and columns each query needs — even for files hundreds of megabytes in size.
Browser Remote server (GitHub / Zenodo / S3)
│ │
│ SELECT country, AVG(life_exp) … │
│ FROM 'https://…/indicators.parquet' │
│ WHERE year = 2022 │
│ │
├── GET …indicators.parquet Range: 0-7 │ ← Parquet footer (metadata)
│ ← 404 bytes ────────────────► │
│ │
├── GET …indicators.parquet Range: 512-8192 │ ← Only year+country columns
│ ← 7680 bytes ────────────────► │
│ │
│ Result: [{country:"NOR", avg:83.2}, …] │Quick start
For the demo site (local preview)
git clone https://github.com/stratum-toolkit/stratum-duckdb
cd stratum-duckdb
node build.mjs # build the library
bash setup.sh # download DuckDB binary files into docs/libs/duckdb/
python3 -m http.server 8000 --directory docs
# open http://localhost:8000setup.sh downloads both the mvp and eh DuckDB WASM bundles from jsDelivr
and copies the built library into docs/libs/duckdb/. These binary files are
in .gitignore — the CI pipeline downloads them fresh on every deploy.
Option A — Self-hosted (for your own project)
Step 1: Download the DuckDB binary files into your project.
Both mvp and eh bundles are needed — selectBundle() picks the right one
at runtime based on browser capabilities:
mkdir -p libs/duckdb
BASE="https://cdn.jsdelivr.net/npm/@duckdb/[email protected]/dist"
curl -sSfL "$BASE/duckdb-browser-mvp.worker.js" -o libs/duckdb/duckdb-browser-mvp.worker.js
curl -sSfL "$BASE/duckdb-mvp.wasm" -o libs/duckdb/duckdb-mvp.wasm
curl -sSfL "$BASE/duckdb-browser-eh.worker.js" -o libs/duckdb/duckdb-browser-eh.worker.js
curl -sSfL "$BASE/duckdb-eh.wasm" -o libs/duckdb/duckdb-eh.wasmWhy only binary files? The DuckDB main JS module is an ES module that cannot be served as a plain
<script>tag. It is loaded automatically from jsDelivr CDN via dynamicimport()at runtime (~100 KB compressed, CDN-cached). Only the binary companion files need to be self-hosted.
Step 2: Download stratum-duckdb.esm.js from the
Releases page and
place it alongside the DuckDB files:
libs/
└── duckdb/
├── duckdb-browser-mvp.worker.js
├── duckdb-mvp.wasm
├── duckdb-browser-eh.worker.js
├── duckdb-eh.wasm
└── stratum-duckdb.esm.jsStep 3: Use in your HTML:
<!-- All pages need type="module" for dynamic import() to work -->
<script type="module">
const StratumDuckDB =
(await import('libs/duckdb/stratum-duckdb.esm.js')).default;
const db = await StratumDuckDB.open({
duckdbPath: 'libs/duckdb/', // folder with worker.js and .wasm files
});
// Register local files (served from the same site)
await db.registerFile('countries', 'data/countries.parquet');
await db.registerFile('indicators', 'data/indicators.parquet');
// Full analytical SQL
const rows = await db.query(`
SELECT
c.name,
i.life_expectancy,
RANK() OVER (ORDER BY i.life_expectancy DESC) AS rank
FROM countries c
JOIN indicators i ON c.iso3 = i.country_iso
WHERE i.year = 2022
ORDER BY rank
`);
console.log(rows);
</script>Option B — npm + bundler
npm install stratum-duckdbimport StratumDuckDB from 'stratum-duckdb';
const db = await StratumDuckDB.open({ duckdbPath: '/libs/duckdb/' });
await db.registerFile('countries', 'data/countries.parquet');
const rows = await db.query('SELECT * FROM countries');API
StratumDuckDB.open(options) → Promise<Database>
Initialises DuckDB WASM and returns a Database.
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| duckdbPath | string | jsDelivr CDN | Path to folder containing the DuckDB .worker.js and .wasm binary files. |
| siteRoot | string | location.href directory | Base URL for resolving relative paths in registerFile(). |
db.query(sql) → Promise<Array<Object>>
Run any SQL and return rows as plain objects.
await db.query(`
SELECT subregion, ROUND(AVG(gdp_usd_bn),1) AS avg_gdp
FROM countries
GROUP BY subregion
ORDER BY avg_gdp DESC
`)
// → [{ subregion: "Northern America", avg_gdp: 9200.3 }, …]db.registerFile(alias, filePath) → Promise<void>
Fetch a local file (served from the same site) and register it as a named
view. Supported formats: .parquet, .csv, .json.
await db.registerFile('my_data', 'data/my_data.parquet')
await db.query('SELECT * FROM my_data LIMIT 10')db.registerRemote(alias, url) → Promise<void>
Register a remote HTTPS file as a named view. DuckDB fetches only what each query needs via HTTP range requests — the file is never fully downloaded.
await db.registerRemote('indicators',
'https://github.com/org/repo/releases/download/v1/indicators.parquet')
await db.query('SELECT * FROM indicators WHERE year = 2022')db.views() → Promise<string[]>
List all registered view names.
db.describe(name) → Promise<Array<Object>>
Return column types for a view or table (equivalent to DESCRIBE).
db.count(name) → Promise<number>
Return total row count.
db.close() → Promise<void>
Close the connection and free resources.
Quarto / ObservableJS integration
_duckdb-init.html
Add this file to your Quarto project root and include it via _quarto.yml.
It loads stratum-duckdb via import() and exposes a ready Database as a
Promise that OJS cells can safely await.
<script type="module">
(async function () {
// Detect page depth from Quarto's injected site_libs/ script path.
// Works for pages at any subdirectory depth without hardcoding.
const siteLibScript = document.querySelector('script[src*="site_libs/"]');
const root = siteLibScript
? siteLibScript.getAttribute('src').replace(/site_libs\/.*$/, '')
: '';
window._duckdbBase = root + 'libs/duckdb/';
window._dataRoot = root + 'data/';
// Expose the initialised Database as a Promise. OJS cells start evaluating
// before scripts finish loading, so a plain global is unreliable.
window._stratumDuckDB = (async () => {
const StratumDuckDB =
(await import(window._duckdbBase + 'stratum-duckdb.esm.js')).default;
return StratumDuckDB.open({ duckdbPath: window._duckdbBase });
})();
}());
</script>_quarto.yml
format:
html:
include-in-header: _duckdb-init.html
project:
resources:
- libs/duckdb/ # must contain worker.js, .wasm, and stratum-duckdb.esm.js
- data/Any .qmd page
// Await the Database Promise before registering files and querying.
// Other OJS cells that reference db receive the resolved value automatically.
db = {
const database = await window._stratumDuckDB;
await database.registerFile('indicators', 'indicators.csv');
return database;
}
rows = db.query("SELECT * FROM indicators WHERE year = 2022 ORDER BY life_expectancy DESC")
Inputs.table(await rows)Creating Parquet files for your research data
Python (pyarrow)
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
df = pd.read_csv("my_data.csv")
pq.write_table(pa.Table.from_pandas(df), "docs/data/my_data.parquet",
compression="snappy")R (arrow)
library(arrow)
df <- read.csv("my_data.csv")
write_parquet(df, "docs/data/my_data.parquet")From an existing SQLite database
import sqlite3, pandas as pd, pyarrow.parquet as pq, pyarrow as pa
con = sqlite3.connect("my.sqlite")
for table in ["countries", "indicators"]:
df = pd.read_sql(f"SELECT * FROM {table}", con)
pq.write_table(pa.Table.from_pandas(df),
f"docs/data/{table}.parquet", compression="snappy")Hosting your data
| Host | Free | Parquet range requests | |------|------|----------------------| | GitHub Pages (same origin) | ✓ | ✓ | | GitHub Releases | ✓ | ✓ | | Zenodo | ✓ | ✓ | | Hugging Face Datasets | ✓ | ✓ | | AWS Open Data / S3 | varies | ✓ (configure CORS) | | Cloudflare R2 | free tier | ✓ (configure CORS) |
Developing this library
git clone https://github.com/stratum-toolkit/stratum-duckdb
cd stratum-duckdb
# Build dist bundles
node build.mjs
# Create demo data (CSV + Parquet)
python3 scripts/create_demo_data.py
# or
Rscript scripts/create_demo_data.R
# Download DuckDB binary files and copy the built library (one command)
bash setup.sh
# Serve demo site
python3 -m http.server 8000 --directory docs
# open http://localhost:8000setup.sh downloads both mvp and eh DuckDB WASM bundles from jsDelivr
into docs/libs/duckdb/ and copies the built library there. These binary files
are in .gitignore — the CI pipeline downloads them fresh on every deploy.
To force a re-download of all binary files: bash setup.sh --force
License
MIT
