@dotdo/deltalake

Delta Lake format implementation - the shared foundation for MongoLake and KafkaLake.

Overview

@dotdo/deltalake provides:

  • Delta Lake Transaction Log: JSON-based transaction log for ACID operations
  • Storage Backends: R2, S3, filesystem, and memory implementations
  • Parquet Utilities: Streaming writer, variant encoding, zone maps
  • Query Layer: MongoDB-style operators with Parquet predicate pushdown
  • CDC Primitives: Unified change data capture format

Getting Started

Installation

npm install @dotdo/deltalake

Basic Usage

import { DeltaTable } from '@dotdo/deltalake/delta'
import { createStorage } from '@dotdo/deltalake/storage'

// Create a storage backend
// In Node.js, defaults to FileSystemStorage with ./.deltalake path
// In other environments (like Cloudflare Workers), you must specify the storage type explicitly
const storage = createStorage({ type: 'memory' })  // or { type: 'r2', bucket: env.MY_BUCKET }

// Create a new Delta table
const table = new DeltaTable(storage, 'my-table')

// Write data to the table
await table.write([
  { _id: '1', name: 'Alice', age: 30 },
  { _id: '2', name: 'Bob', age: 25 },
  { _id: '3', name: 'Charlie', age: 35 },
])

// Read data from the table
const records = await table.query()
console.log(records)
// [
//   { _id: '1', name: 'Alice', age: 30 },
//   { _id: '2', name: 'Bob', age: 25 },
//   { _id: '3', name: 'Charlie', age: 35 },
// ]

// Query with filters
const filtered = await table.query({ age: { $gte: 30 } })
console.log(filtered)
// [
//   { _id: '1', name: 'Alice', age: 30 },
//   { _id: '3', name: 'Charlie', age: 35 },
// ]

Architecture

@dotdo/deltalake              ← This package (foundation)
    ↑
    ├── @dotdo/mongolake      ← MongoDB API on Delta Lake
    └── @dotdo/kafkalake      ← Kafka API on Delta Lake

Integration with MongoLake and KafkaLake

This package provides the core Delta Lake foundation that powers both MongoLake and KafkaLake.

MongoLake

MongoLake (coming soon) uses @dotdo/deltalake for MongoDB-compatible document storage. It provides:

  • A MongoDB wire protocol implementation on top of Delta Lake
  • Document CRUD operations stored as Parquet files
  • MongoDB query operators translated to Parquet predicates via the query layer
  • Change streams powered by the CDC primitives

// MongoLake internally uses deltalake for storage
import { DeltaTable } from '@dotdo/deltalake/delta'
import { matchesFilter } from '@dotdo/deltalake/query'

// Documents are written to Delta tables
// ('storage' comes from createStorage(); the table name is illustrative)
const deltaTable = new DeltaTable(storage, 'users')
await deltaTable.write([{ _id: '1', name: 'Alice' }])

KafkaLake

KafkaLake (coming soon) uses @dotdo/deltalake for Kafka-compatible event streaming. It provides:

  • A Kafka protocol implementation backed by Delta Lake
  • Topics stored as partitioned Delta tables
  • Consumer groups with offset tracking via the transaction log
  • Event retention and compaction using Delta Lake's time travel

// KafkaLake internally uses deltalake for event storage
import { DeltaTable } from '@dotdo/deltalake/delta'
import { CDCRecord } from '@dotdo/deltalake/cdc'

// Events are stored as CDC records in Delta tables
// ('storage' comes from createStorage(); the table name is illustrative)
const deltaTable = new DeltaTable(storage, 'orders')
await deltaTable.write([
  { _id: 'evt-1', _op: 'c', _after: { topic: 'orders', value: '...' } }
])

Key Components

Storage Backends

import { createStorage } from '@dotdo/deltalake/storage'

// Explicit configuration (recommended):
const r2Storage = createStorage({ type: 'r2', bucket: env.MY_BUCKET })
const fsStorage = createStorage({ type: 'filesystem', path: './.data' })
const memStorage = createStorage({ type: 'memory' })
const s3Storage = createStorage({ type: 's3', bucket: 'my-bucket', region: 'us-east-1' })

// URL-based configuration:
const storage = createStorage('s3://my-bucket/prefix')
const storage2 = createStorage('/data/lake')  // filesystem
const storage3 = createStorage('memory://')

// Default (Node.js only): uses FileSystemStorage with ./.deltalake path
const defaultStorage = createStorage()

Delta Lake Transaction Log

import { DeltaTable } from '@dotdo/deltalake/delta'

const table = new DeltaTable(storage, 'events')

// Write with ACID guarantees
await table.write([
  { _id: '1', user: 'alice', action: 'click' },
  { _id: '2', user: 'bob', action: 'purchase' },
])

// Time travel
const snapshot = await table.snapshot(5)

CDC Records

// CDCRecord (and CDCSource) are exported from '@dotdo/deltalake/cdc'.
// Unified CDC format for all systems:
interface CDCRecord<T> {
  _id: string          // Entity ID
  _seq: bigint         // Sequence number
  _op: 'c' | 'u' | 'd' | 'r'  // create/update/delete/read
  _before: T | null    // Previous state
  _after: T | null     // New state
  _ts: bigint          // Nanosecond timestamp
  _source: CDCSource   // Origin metadata
}
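To make the shape above concrete, here is a self-contained sketch that constructs an update and a delete record. The local `CDCRecord` and `CDCSource` declarations mirror the interface for illustration and are assumptions, not the package's actual exports; the `CDCSource` fields in particular are hypothetical.

```typescript
// Local re-declarations for illustration only -- the real types live in
// '@dotdo/deltalake/cdc'; the CDCSource fields here are assumptions.
type CDCSource = { system: string; table: string }

interface CDCRecord<T> {
  _id: string
  _seq: bigint
  _op: 'c' | 'u' | 'd' | 'r'
  _before: T | null
  _after: T | null
  _ts: bigint
  _source: CDCSource
}

type User = { name: string; age: number }

// An update: Alice's age changes from 30 to 31.
const update: CDCRecord<User> = {
  _id: '1',
  _seq: 42n,
  _op: 'u',
  _before: { name: 'Alice', age: 30 },
  _after: { name: 'Alice', age: 31 },
  _ts: BigInt(Date.now()) * 1_000_000n, // milliseconds -> nanoseconds
  _source: { system: 'mongolake', table: 'users' },
}

// A delete carries the prior state in _before and a null _after.
const del: CDCRecord<User> = {
  ...update,
  _seq: 43n,
  _op: 'd',
  _before: update._after,
  _after: null,
}
```

The `_before`/`_after` pair is what lets one format cover document CRUD (MongoLake) and event streams (KafkaLake) alike: a pure event is just a create with `_before: null`.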

Query Layer

import { matchesFilter, filterToParquetPredicate } from '@dotdo/deltalake/query'

// MongoDB-style operators
const filter = {
  user: 'alice',
  amount: { $gte: 100 },
  action: { $in: ['purchase', 'refund'] }
}

// Predicate pushdown for efficient Parquet reads
const predicate = filterToParquetPredicate(filter)
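To make the operator semantics concrete, here is a tiny standalone matcher covering just the `$gte`/`$in`/equality behavior used in the filter above. It is an illustrative re-implementation, not the package's `matchesFilter`, which supports many more operators.

```typescript
// Minimal MongoDB-style filter matcher -- illustrative only.
type Filter = Record<string, unknown>

function matches(doc: Record<string, unknown>, filter: Filter): boolean {
  return Object.entries(filter).every(([field, cond]) => {
    const value = doc[field]
    if (cond !== null && typeof cond === 'object' && !Array.isArray(cond)) {
      // Operator object: every operator must hold.
      return Object.entries(cond as Record<string, unknown>).every(([op, arg]) => {
        switch (op) {
          case '$gte': return (value as number) >= (arg as number)
          case '$in': return (arg as unknown[]).includes(value)
          default: throw new Error(`unsupported operator: ${op}`)
        }
      })
    }
    // Plain value: strict equality.
    return value === cond
  })
}

const event = { user: 'alice', amount: 150, action: 'purchase' }
const filter = {
  user: 'alice',
  amount: { $gte: 100 },
  action: { $in: ['purchase', 'refund'] },
}
// matches(event, filter) -> true; dropping amount below 100 fails $gte.
```

The predicate pushdown path applies the same conditions one level earlier: `filterToParquetPredicate` turns them into column constraints checked against Parquet zone maps, so row groups that cannot match are never decoded.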

Local Development

# Zero config in Node.js - just works
npm install @dotdo/deltalake

# Data stored in .deltalake/
.deltalake/
├── events/
│   ├── _delta_log/
│   │   ├── 00000000000000000000.json
│   │   └── 00000000000000000001.json
│   └── part-00000.parquet
└── users/
    ├── _delta_log/
    └── part-00000.parquet
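Each numbered file under `_delta_log/` is one commit. In the open Delta Lake protocol a commit file holds newline-delimited JSON actions; the sketch below shows what a first commit might contain and how it reads back. The exact fields `@dotdo/deltalake` writes are an assumption here; `commitInfo` and `add` are standard protocol action types.

```typescript
// Sketch of a Delta Lake commit file (00000000000000000000.json).
// One JSON action per line; field values are illustrative.
const commitLines = [
  JSON.stringify({ commitInfo: { timestamp: 1700000000000, operation: 'WRITE' } }),
  JSON.stringify({ add: { path: 'part-00000.parquet', size: 1024, dataChange: true } }),
]
const commit = commitLines.join('\n')

// Reading it back: parse line by line, dispatch on action type.
const actions = commit.split('\n').map(line => JSON.parse(line))
const added = actions.filter(a => 'add' in a).map(a => a.add.path)
// added -> ['part-00000.parquet']
```

Replaying these logs in order is what gives the table its current file list, and stopping the replay at an earlier commit is what makes time travel possible.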

Testing

# Unit tests (vitest-pool-workers)
npm run test:unit

# Integration tests (multi-worker miniflare)
npm run test:integration

# E2E tests (against deployed service)
npm run test:e2e

API Documentation

Generate API documentation using TypeDoc:

# Install typedoc (dev dependency)
npm install --save-dev typedoc typedoc-plugin-markdown

# Generate documentation
npm run typedoc

This generates markdown documentation in the docs/ directory. The source code also contains extensive JSDoc comments for inline documentation.

Roadmap to 1.0

Current Status: v0.0.1 (early development)

This package is in early development. The API may change significantly before 1.0.

Criteria for 1.0 Release

  1. Complete Delta Lake read/write with real Parquet files

    • Full Delta Lake protocol support for reads and writes
    • Production-ready Parquet file handling via hyparquet
  2. Published to npm with stable API

    • Stable, documented public API
    • Semantic versioning from 1.0 onward
    • No breaking changes without major version bump
  3. Documentation complete

    • Comprehensive API reference
    • Usage guides for common patterns
    • Integration examples for MongoLake and KafkaLake
  4. Core operations implemented

    • Read: Query tables with filter pushdown
    • Write: Append data with ACID guarantees
    • Time travel: Access historical table versions
    • Vacuum: Clean up old files and optimize storage
  5. Storage backends production-ready

    • Memory: For testing and development
    • R2: Cloudflare R2 object storage
    • S3: AWS S3 compatible storage
    • FileSystem: Local development and edge cases

Cloudflare Workers Compatibility

This package is designed for edge runtime compatibility and works seamlessly with Cloudflare Workers.

Edge Runtime Design

  • Pure JavaScript: All dependencies are pure JavaScript with no native Node.js modules, ensuring compatibility with Workers runtime
  • R2 Storage Backend: First-class support for Cloudflare R2 as the storage backend
  • Streaming APIs: Uses standard Web APIs (ReadableStream, Uint8Array) that work in both Node.js and Workers
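As an illustration of the Web-API-only design, a storage backend can be written entirely against `Uint8Array` and standard data structures. The `SimpleStorage` interface below is a hypothetical shape assumed for this sketch; the package's real storage interface may differ.

```typescript
// Hypothetical minimal storage interface -- the shape is an assumption,
// shown only to illustrate that no Node-specific APIs are required.
interface SimpleStorage {
  put(key: string, data: Uint8Array): Promise<void>
  get(key: string): Promise<Uint8Array | null>
  list(prefix: string): Promise<string[]>
}

// In-memory backend using only standard JS -- runs in Node and in Workers.
class MemoryStorage implements SimpleStorage {
  private objects = new Map<string, Uint8Array>()
  async put(key: string, data: Uint8Array) { this.objects.set(key, data) }
  async get(key: string) { return this.objects.get(key) ?? null }
  async list(prefix: string) {
    return [...this.objects.keys()].filter(k => k.startsWith(prefix)).sort()
  }
}

const store = new MemoryStorage()
await store.put('events/_delta_log/0.json', new TextEncoder().encode('{}'))
```

Because nothing above touches `fs`, `path`, or Buffer, the same code passes unchanged under miniflare's Workers runtime; an R2 backend swaps the `Map` for `env.BUCKET.put`/`get` calls with the same async shape.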

Testing with vitest-pool-workers

We use vitest-pool-workers to ensure Workers compatibility:

# Run unit tests in miniflare environment
npm run test:unit

The test suite runs against the actual Workers runtime via miniflare, catching compatibility issues before deployment.

Using with Cloudflare R2

import { createStorage } from '@dotdo/deltalake/storage'
import { DeltaTable } from '@dotdo/deltalake/delta'

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Create R2 storage backend
    const storage = createStorage({ type: 'r2', bucket: env.MY_R2_BUCKET })

    // Use Delta tables normally
    const table = new DeltaTable(storage, 'events')
    await table.write([{ _id: '1', data: 'example' }])

    return new Response('OK')
  }
}

Pure JavaScript Dependencies

Key dependencies that ensure Workers compatibility:

  • hyparquet: Pure JavaScript Parquet reader/writer (no native bindings)
  • No dependencies on fs, path, or other Node.js-specific modules in the core library

Design Principles

  1. Simple Local Development: In Node.js, works without configuration by defaulting to filesystem storage
  2. Parquet Native: All data is stored as Parquet, using the Variant type for schema-free data
  3. Delta Lake Format: Standard format for ecosystem compatibility
  4. Streaming First: Designed for high-frequency appends (Kafka/CDC use cases)
  5. Query Pushdown: MongoDB operators translate to Parquet predicates