npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

delta-unity-duckdb

v1.0.7

Published

Read Databricks Delta tables from Azure storage through Unity Catalog using DuckDB

Readme

Delta Scanner

A standalone Delta Table Scanner using DuckDB for querying Delta tables through Unity Catalog from Node.js (designed with Azure Databricks in mind).

Features

  • Query Delta tables using DuckDB through Unity Catalog without a Databricks cluster
  • Support for Unity Catalog tables with automatic credential handling
  • ABFSS path support with SAS token authentication
  • Use as a CLI tool or import as a module in your Node.js applications
  • Custom SQL queries with table placeholder support

Requirements

  • Node.js 14 or higher
  • Internet access to Databricks Unity Catalog API (for UC tables)
  • No Azure credentials needed for UC tables (temporary SAS is requested automatically)
  • For direct ABFSS access only: an Azure Storage account name and a valid SAS token

Installation

Global Installation

npm install -g delta-unity-duckdb

Local Installation

npm install delta-unity-duckdb

Configuration

Delta Scanner can be configured using environment variables or CLI arguments:

Environment Variables

Create a .env file in your project root with the minimal variables for Unity Catalog tables (recommended):

DATABRICKS_WORKSPACE_URL=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=your-token-here

Direct ABFSS access (no Unity Catalog):

AZURE_STORAGE_ACCOUNT_NAME=your-storage-account-name
AZURE_STORAGE_SAS_TOKEN=your-sas-token

Notes for ABFSS:

  • These are only required if you pass an abfss:// table path directly and are NOT using Unity Catalog.
  • Provide the SAS token string itself (usually starts with sv=...), without the leading ?.

CLI Arguments

CLI arguments will override environment variables:

delta-unity-duckdb --table="catalog.schema.table_name" --databricks_host="https://your-workspace.cloud.databricks.com" --databricks_token="your-token"

ABFSS example (requires AZURE_STORAGE_* envs):

export AZURE_STORAGE_ACCOUNT_NAME=your-account
export AZURE_STORAGE_SAS_TOKEN='sv=...&ss=...&srt=...&sp=rl&se=...&st=...&spr=https&sig=...'
delta-unity-duckdb --table="abfss://[email protected]/path/to/delta-table" --limit=5 --format=table

Quickstart

  1. Create a .env file with the minimal required variables for Unity Catalog tables:
DATABRICKS_WORKSPACE_URL=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=your-databricks-personal-access-token

Tip: export your .env into the shell (mac/Linux):

set -a; source .env; set +a
  1. Run a query against a Unity Catalog table:
delta-unity-duckdb --table="$TABLE" --limit=5 --format=table

Example output (table format):

Table row count: 12345

Query Results:
Columns: [ 'id', 'name', 'value' ]
┌─────────┬─────┬───────────┬───────┐
│ (index) │ id  │   name    │ value │
├─────────┼─────┼───────────┼───────┤
│    0    │  1  │ 'alpha'   │  3.14 │
│    1    │  2  │ 'beta'    │  2.71 │
│    2    │  3  │ 'gamma'   │  1.62 │
│    3    │  4  │ 'delta'   │  0.58 │
│    4    │  5  │ 'epsilon' │  4.20 │
└─────────┴─────┴───────────┴───────┘
  1. Inspect schema (optional):
delta-unity-duckdb --table="catalog.schema.table_name" --schema

Usage

As a CLI Tool

After installing globally, you can use the delta-unity-duckdb command directly:

# Basic usage with .env file configuration (uses TABLE variable from .env)
delta-unity-duckdb --table="$TABLE" --limit=10

# Using explicit table name with CLI arguments for Databricks credentials
delta-unity-duckdb --table="catalog.schema.table_name" --databricks_host="https://your-workspace.cloud.databricks.com" --databricks_token="your-token" --limit=5

# Custom SQL query with explicit table name (IMPORTANT: use actual table name in --table parameter)
delta-unity-duckdb --table="catalog.schema.table_name" --query="SELECT column1, COUNT(*) as count FROM \$TABLE GROUP BY column1 ORDER BY count DESC LIMIT 5"

# Display results as a table instead of JSON
delta-unity-duckdb --table="catalog.schema.table_name" --format=table

# IMPORTANT: Table Parameter vs $TABLE Placeholder
# --table="actual.table.name"  <- Specifies which table to query
# \$TABLE in --query           <- Placeholder that gets replaced with delta_scan() function

How it works

Delta Scanner:

  • Loads DuckDB with the httpfs, delta, and azure extensions.
  • Looks up your Unity Catalog table and requests temporary credentials from the Databricks Unity Catalog API.
  • Creates a temporary DuckDB secret with the retrieved SAS token to securely access Azure Storage.
  • Resolves the Delta table location and runs your query via DuckDB (e.g., delta_scan(...)).
  • Closes connections when finished.

As a Module

const { DeltaScanner } = require('delta-unity-duckdb');

async function queryDeltaTable() {
  const scanner = new DeltaScanner();
  
  try {
    // Get table statistics
    const stats = await scanner.getTableStats('catalog.schema.table_name');
    console.log(`Table row count: ${stats.count}`);
    
    // Get schema information
    const schemaInfo = await scanner.getTableSchema('catalog.schema.table_name');
    console.log('Table Schema:', schemaInfo.schema);
    
    // Execute a simple query
    const rows = await scanner.query('catalog.schema.table_name', null, { limit: 5 });
    console.log('Query Results:', rows);
    
    // Execute a custom query
    const customResults = await scanner.query(
      'catalog.schema.table_name',
      'SELECT column1, COUNT(*) as count FROM $TABLE GROUP BY column1 ORDER BY count DESC LIMIT 5'
    );
    console.log('Custom Query Results:', customResults);
  } catch (error) {
    console.error('Error:', error.message);
  } finally {
    await scanner.close();
  }
}

queryDeltaTable();

Next.js / Webpack Integration

Webpack Bundling Issues

When using delta-unity-duckdb in Next.js applications, you must exclude it from client-side bundling since it contains Node.js-specific dependencies that cannot run in browsers.

Add to your next.config.js:

/** @type {import('next').NextConfig} */
const nextConfig = {
  // Exclude server-only packages from client bundles
  serverExternalPackages: [
    'duckdb', 
    '@duckdb/node-api', 
    '@mapbox/node-pre-gyp', 
    'node-gyp', 
    'delta-unity-duckdb'
  ],
  
  // Mark API routes as server-only
  experimental: {
    serverComponentsExternalPackages: ['delta-unity-duckdb']
  }
}

module.exports = nextConfig

In your API routes, add the server runtime directive:

// Mark this as server-only to prevent client bundling
export const runtime = 'nodejs';

import { DeltaScanner } from 'delta-unity-duckdb'; // This will work in API routes

Singleton Pattern for Concurrency

When using multiple DeltaScanner instances simultaneously (e.g., in a web server with concurrent requests), you may encounter DuckDB extension initialization conflicts and query deadlocks. Use a singleton pattern with query queuing to prevent these issues:

// lib/shared-scanner.js
const { DeltaScanner } = require('delta-unity-duckdb');

class DeltaScannerSingleton {
  constructor() {
    if (DeltaScannerSingleton.instance) {
      return DeltaScannerSingleton.instance;
    }
    
    this.scanner = null;
    this.isInitialized = false;
    this.initPromise = null;
    this.queryQueue = [];
    this.isProcessingQueue = false;
    
    DeltaScannerSingleton.instance = this;
  }

  static getInstance() {
    if (!DeltaScannerSingleton.instance) {
      DeltaScannerSingleton.instance = new DeltaScannerSingleton();
    }
    return DeltaScannerSingleton.instance;
  }

  async initialize() {
    if (this.isInitialized) return;
    
    if (this.initPromise) {
      return this.initPromise;
    }
    
    this.initPromise = this._initialize();
    return this.initPromise;
  }

  async _initialize() {
    try {
      this.scanner = new DeltaScanner();
      await this.scanner.initialize();
      this.isInitialized = true;
      console.log('[DELTA SINGLETON] Initialized successfully');
    } catch (error) {
      console.error('[DELTA SINGLETON] Initialization failed:', error);
      this.initPromise = null;
      throw error;
    }
  }

  async query(tableName, query) {
    await this.initialize();
    
    if (!this.scanner) {
      throw new Error('DeltaScanner not initialized');
    }

    // Queue the query to prevent concurrent access issues
    return new Promise((resolve, reject) => {
      this.queryQueue.push(async () => {
        try {
          const result = await this.scanner.query(tableName, query);
          resolve(result);
        } catch (error) {
          reject(error);
        }
      });
      
      // Use setImmediate to avoid blocking the event loop
      setImmediate(() => this.processQueue());
    });
  }

  async processQueue() {
    if (this.isProcessingQueue || this.queryQueue.length === 0) {
      return;
    }

    this.isProcessingQueue = true;
    
    try {
      while (this.queryQueue.length > 0) {
        const queryFn = this.queryQueue.shift();
        if (queryFn) {
          await queryFn();
        }
      }
    } finally {
      this.isProcessingQueue = false;
    }
  }

  async close() {
    if (this.scanner) {
      await this.scanner.close();
      this.scanner = null;
      this.isInitialized = false;
      this.initPromise = null;
    }
  }
}

// Export singleton instance
const sharedScanner = DeltaScannerSingleton.getInstance();

// Cleanup on process exit
process.on('exit', async () => {
  await sharedScanner.close();
});

process.on('SIGINT', async () => {
  await sharedScanner.close();
  process.exit(0);
});

module.exports = { sharedScanner };

Usage in API routes:

// app/api/your-endpoint/route.js
export const runtime = 'nodejs';

import { sharedScanner } from '../../../lib/shared-scanner';

export async function GET() {
  try {
    const data = await sharedScanner.query(
      'catalog.schema.table_name',
      'SELECT * FROM $TABLE LIMIT 10'
    );
    
    return Response.json({ success: true, data });
  } catch (error) {
    return Response.json({ error: error.message }, { status: 500 });
  }
  // Note: Don't call scanner.close() - let singleton manage lifecycle
}

Key Benefits of Singleton Pattern with Query Queue

  • Prevents DuckDB Extension Conflicts: Only one scanner initializes extensions
  • Improves Performance: Reuses connection across requests
  • Handles Concurrency: Query queue prevents DuckDB connection deadlocks
  • Sequential Processing: Queries execute one at a time to avoid conflicts
  • Automatic Cleanup: Properly closes connections on process exit

Important Notes

  • Query Queuing: The singleton processes queries sequentially to prevent DuckDB concurrency issues
  • Multiple Queries: Combine multiple COUNT/aggregate queries into single SQL statements using CASE expressions for better performance
  • Event Loop: Uses setImmediate() to avoid blocking the Node.js event loop during queue processing

API Reference

DeltaScanner Class

Constructor

const scanner = new DeltaScanner(options);

Options:

  • memory: Boolean, whether to use in-memory database (default: true)
  • extensions: Array of extensions to load (default: ['httpfs', 'delta', 'azure'])

Methods

  • async initialize(): Initialize DuckDB connection and load required extensions
  • async query(tablePath, sql = null, options = {}): Execute a SQL query on a Delta table
  • async getTableStats(tablePath): Get table statistics (row count)
  • async getTableSchema(tablePath): Get table schema information
  • async close(): Close the database connection

License

MIT