@seed-ship/duckdb-mcp-native v0.11.1
Native Node.js/TypeScript implementation of DuckDB MCP extension
DuckDB MCP Native


Native TypeScript implementation of DuckDB MCP (Model Context Protocol) server with federation capabilities.

Status

🚀 Production Ready (v0.8.1+)

✅ Production-Ready Features

  • Core: Native TypeScript, no C++ dependencies
  • Transports: stdio ✅ | WebSocket ✅ | TCP ✅ | HTTP ⚠️
  • Federation: Distributed queries across multiple MCP servers ✨
  • Tools: 26 MCP tools, including the new federate_query
  • Virtual Tables: JSON/CSV/Parquet with auto-refresh
  • Virtual Filesystem: Direct SQL access via mcp:// URIs
  • Monitoring: Built-in performance metrics and slow query detection
  • Security: Enhanced SQL injection prevention, server authentication, path traversal protection
  • DuckPGQ Documentation: Comprehensive guides with developer insights, syntax validation, and migration examples

🚧 In Progress

  • HTTP transport initialization issues
  • MotherDuck cloud integration (waiting for MotherDuck to support DuckDB v1.4.0)
  • Test coverage improvement (current: ~75%, target: 80%)

Installation

As NPM Package

# Install from npm
npm install @seed-ship/duckdb-mcp-native

As MCP Server for Claude Desktop

  1. Install the package globally:
npm install -g @seed-ship/duckdb-mcp-native
  2. Configure Claude Desktop:

Edit your Claude configuration file:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json

Add the DuckDB MCP server:

{
  "mcpServers": {
    "duckdb": {
      "command": "npx",
      "args": ["@seed-ship/duckdb-mcp-native"],
      "env": {
        "DUCKDB_MEMORY": "4GB",
        "DUCKDB_THREADS": "4",
        "MCP_SECURITY_MODE": "development",
        "MCP_CACHE_DIR": "/tmp/mcp-cache",
        "MCP_CACHE_TTL": "300000",
        "MCP_CACHE_SIZE": "104857600"
      }
    }
  }
}
  3. Optional: Configure S3/MinIO for cloud storage:
{
  "mcpServers": {
    "duckdb": {
      "command": "npx",
      "args": ["@seed-ship/duckdb-mcp-native"],
      "env": {
        "DUCKDB_MEMORY": "4GB",
        "DUCKDB_THREADS": "4",
        "MCP_SECURITY_MODE": "development",
        "MINIO_PUBLIC_ENDPOINT": "https://s3.example.com",
        "MINIO_ACCESS_KEY": "your-access-key",
        "MINIO_SECRET_KEY": "your-secret-key",
        "MINIO_REGION": "us-east-1",
        "MINIO_USE_SSL": "true"
      }
    }
  }
}
  4. Restart Claude Desktop to load the MCP server.

For Development

git clone https://github.com/theseedship/duckdb_mcp_node
cd duckdb_mcp_node
npm install

# Start MCP server
npm run dev:server

# Test with Inspector
npm run inspector

# Run tests
npm test

MCP Tools (25 Available)

Database Operations

  • query_duckdb - Execute SQL queries
  • list_tables - List schema tables
  • describe_table - Get table structure
  • load_csv - Load CSV files
  • load_parquet - Load Parquet files
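
A minimal sketch of calling these database tools through the library-mode handlers described under "Three Usage Modes" below. The query_duckdb parameters (sql, limit) match the embedded-server example there; the empty-argument call to list_tables is an assumption:

// Sketch only: invoke the database tools via the /lib export.
import { nativeToolHandlers } from '@seed-ship/duckdb-mcp-native/lib'

const tables = await nativeToolHandlers['list_tables']({}) // assumed: no required args
const rows = await nativeToolHandlers['query_duckdb']({
  sql: 'SELECT 42 AS answer',
  limit: 10,
})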

Federation Operations

  • attach_mcp - Connect external MCP servers
  • detach_mcp - Disconnect servers
  • list_attached_servers - Show connections
  • list_mcp_resources - List available resources
  • create_virtual_table - Create from MCP resource
  • drop_virtual_table - Remove virtual table
  • list_virtual_tables - Show virtual tables
  • refresh_virtual_table - Update table data
  • query_hybrid - Query across local/remote data
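
A hedged sketch of the typical attach-then-materialize flow: the attach_mcp arguments match the federation example later in this README, while the create_virtual_table argument names are assumptions based on the SQL form CALL create_virtual_table('name', 'uri') shown in the Virtual Filesystem section:

// Sketch only: attach a server, then expose one of its resources as a table.
// handlers obtained as in the usage-mode examples below.
await handlers['attach_mcp']({
  connectionString: 'stdio://github-mcp-server',
  alias: 'github',
})

await handlers['create_virtual_table']({
  tableName: 'github_issues',              // assumed parameter name
  resourceUri: 'mcp://github/issues.json', // assumed parameter name
})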

DuckLake Operations (Advanced)

  • ducklake.attach - Attach or create DuckLake catalog for ACID transactions
  • ducklake.snapshots - List, view, clone or rollback table snapshots
  • ducklake.time_travel - Execute queries on historical data

MotherDuck Cloud Operations

⚠️ Note: MotherDuck currently supports only DuckDB v1.3.2, while this package runs on DuckDB v1.4.0, which MotherDuck does not yet support. These tools will become functional once MotherDuck adds support for DuckDB v1.4.0 stable.

  • motherduck.attach - Connect to MotherDuck cloud with token
  • motherduck.detach - Disconnect from MotherDuck
  • motherduck.status - Check connection status and usage
  • motherduck.list_databases - List available cloud databases
  • motherduck.create_database - Create new cloud database
  • motherduck.query - Execute queries on MotherDuck cloud
  • motherduck.share_table - Share local tables to cloud
  • motherduck.import_table - Import cloud tables to local

📊 DuckPGQ Property Graph Support (v0.7.2+)

SQL:2023 Property Graph queries - Limited support with automatic installation

This package supports DuckPGQ extension version 7705c5c from the community repository for basic graph analytics.

🎯 Quick Start

For DuckDB 1.4.x (Current):

# Enable DuckPGQ (installs automatically from community repository)
ENABLE_DUCKPGQ=true
ALLOW_UNSIGNED_EXTENSIONS=true
DUCKPGQ_SOURCE=community  # Default, no custom URL needed

For DuckDB 1.0-1.2.2 (full DuckPGQ support):

DUCKPGQ_SOURCE=community  # Complete feature set available

📋 Real Compatibility Matrix

| DuckDB Version | DuckPGQ Version | Fixed Paths | Bounded {n,m} | ANY SHORTEST | Kleene (alone) | Status |
| -------------- | --------------- | ----------- | ------------- | ------------ | -------------- | ------ |
| 1.0.0 - 1.2.2 | Stable | ✅ | ✅ | ✅ | ✅ | Production Ready |
| 1.4.1 (current) | 7705c5c | ✅ | ✅ | ✅ | ❌ (safety block) | Functional |
| 1.5.x+ | TBD | ? | ? | ? | ? | Planned |

⚠️ IMPORTANT: DuckPGQ 7705c5c Capabilities (Validated via Automated Testing)

Version 7705c5c (current for DuckDB 1.4.x) - Tested 2025-10-20

What Works:

  • Property graph creation (VERTEX/EDGE TABLES)
  • Basic pattern matching with GRAPH_TABLE
  • Fixed-length paths (explicit hops: 1-hop, 2-hop, 3-hop, etc.)
  • ANY SHORTEST path queries with ->* syntax (star AFTER arrow)
  • Bounded quantifiers {n,m} with ->{n,m} syntax (quantifier AFTER arrow)
  • Direct relationship queries (edge variable required: [e:Label])

⚠️ Design Decisions (Not Bugs):

These are intentional safety features and planned roadmap items:

  1. 🛡️ Safety Feature: Standalone Kleene operators (->*, ->+) blocked without ANY SHORTEST

    • Why: Prevents potentially infinite results on cyclic graphs
    • Error: "ALL unbounded with path mode WALK is not possible"
    • Workaround: Use ANY SHORTEST or bounded quantifiers ->{n,m}
    • Developer insight: System protection against runaway queries
  2. 🚧 Roadmap Item: Edge variable names required (temporary)

    • Current: Must use [e:Label], not [:Label]
    • Future: Anonymous edge syntax will be supported
    • Reason: Internal query translation needs named references
  3. 🚧 Roadmap Item: Explicit label binding sometimes required

    • Status: Label inference implemented but not yet exposed
    • Workaround: Always specify labels explicitly

See detailed findings: DUCKPGQ_FINDINGS.md. To understand the "why" behind each design decision, see DUCKPGQ_FAILURE_ANALYSIS.md (18 test cases + developer commentary).

⚙️ Configuration Options

Minimal configuration (add to .env):

# Required for DuckPGQ 7705c5c
ENABLE_DUCKPGQ=true
ALLOW_UNSIGNED_EXTENSIONS=true

# Optional: Graceful degradation (default)
DUCKPGQ_STRICT_MODE=false  # Continue without graph if load fails

Full configuration options:

DUCKPGQ_SOURCE=community     # community, edge, custom
DUCKPGQ_CUSTOM_REPO=         # URL for custom builds
DUCKPGQ_VERSION=             # Specific version (optional)

See full documentation: docs/DUCKPGQ_INTEGRATION.md

📝 Example Usage (Validated Syntax for 7705c5c)

-- Create property graph from existing tables
CREATE PROPERTY GRAPH social_network
  VERTEX TABLES (Person)
  EDGE TABLES (
    Knows
      SOURCE KEY (from_id) REFERENCES Person (id)
      DESTINATION KEY (to_id) REFERENCES Person (id)
  );

-- ✅ WORKS: Direct connections (1-hop) - edge variable required
FROM GRAPH_TABLE (social_network
  MATCH (p1:Person)-[k:Knows]->(p2:Person)
  COLUMNS (p1.name AS person, p2.name AS friend)
);

-- ✅ WORKS: Friends of friends (fixed 2-hop)
FROM GRAPH_TABLE (social_network
  MATCH (p1:Person)-[k1:Knows]->(p2:Person)-[k2:Knows]->(p3:Person)
  WHERE p1.id != p3.id
  COLUMNS (p1.name AS person, p3.name AS friend_of_friend)
);

-- ✅ WORKS: ANY SHORTEST path query (note: ->* AFTER arrow, WITH path variable)
FROM GRAPH_TABLE (social_network
  MATCH p = ANY SHORTEST (start:Person WHERE start.id = 1)-[k:Knows]->*(end:Person WHERE end.id = 10)
  COLUMNS (start.name AS from_person, end.name AS to_person, path_length(p) AS hops)
);

-- ✅ WORKS: Bounded quantifiers (1 to 3 hops, note: ->{n,m} AFTER arrow)
FROM GRAPH_TABLE (social_network
  MATCH (p1:Person)-[k:Knows]->{1,3}(p2:Person)
  COLUMNS (p1.name AS person, p2.name AS connection)
);

-- ⚠️ SAFETY FEATURE (by design): Standalone Kleene operators blocked
-- FROM GRAPH_TABLE (social_network
--   MATCH (p1:Person)-[k:Knows]->+(p2:Person)
--   -- Error: "ALL unbounded with path mode WALK is not possible"
--   -- Why blocked: Prevents infinite results on cyclic graphs (safety feature)
--   -- Workaround: Use ANY SHORTEST or bounded quantifiers instead
-- );
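
Within this package, DuckPGQ queries run through the standard query_duckdb tool; a brief sketch, assuming DuckPGQ was loaded via ENABLE_DUCKPGQ=true and the social_network graph above exists:

// Sketch only: run a validated DuckPGQ pattern via query_duckdb.
import { nativeToolHandlers } from '@seed-ship/duckdb-mcp-native/lib'

const friends = await nativeToolHandlers['query_duckdb']({
  sql: `
    FROM GRAPH_TABLE (social_network
      MATCH (p1:Person)-[k:Knows]->(p2:Person)
      COLUMNS (p1.name AS person, p2.name AS friend)
    )
  `,
  limit: 100,
})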

Migration notes:

🚨 Current Status (DuckDB 1.4.x)

Tested Configuration (Validated 2025-10-20 with developer insights):

  • ✅ DuckPGQ 7705c5c available from community repository
  • ✅ Installation automatic with ALLOW_UNSIGNED_EXTENSIONS=true
  • ✅ Property graphs and fixed-length queries work
  • ✅ ANY SHORTEST path queries work (with ->* syntax)
  • ✅ Bounded quantifiers work (with ->{n,m} syntax)
  • 🛡️ Standalone Kleene operators blocked (safety feature - prevents infinite results)

Understanding "Limitations":

  • Safety features: Intentional blocks to prevent runaway queries (e.g., ALL unbounded)
  • Roadmap items: Planned features not yet implemented (e.g., anonymous edges, path modes)
  • What works today: See compatibility matrix above

Test Suites:

  • npm run test:duckpgq:syntax - Validate working features (13 tests)
  • npm run test:duckpgq:failures - Understand design decisions (18 tests)

Migration Path: When full DuckPGQ support arrives for DuckDB 1.4.x/1.5.x, your configuration will automatically upgrade. See MIGRATION_GUIDE.md for migration examples.


🔄 Process Mining Tools (v0.9.4+)

NEW: Enhanced process mining capabilities with embeddings standardization and robust composition

🎯 Overview

Three specialized tools for analyzing workflow processes stored in Parquet files:

  • process.describe - List and filter process summaries with confidence scores
  • process.similar - Find similar processes using vector embeddings (FLOAT[384])
  • process.compose - Merge multiple processes into unified workflow with QA checks
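
For orientation, a sketch of calling process.describe; aside from parquet_url (which mirrors the process.similar example below), the filter parameter shown is hypothetical:

// Sketch only: list process summaries. min_confidence is a hypothetical
// filter parameter, not a documented one. handlers obtained as in the
// usage-mode examples below.
const summaries = await handlers['process.describe']({
  parquet_url: 's3://bucket/process_summary.parquet',
  min_confidence: 0.8, // hypothetical
})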

✨ What's New in v0.9.4

P2.8: Embeddings Standardization

Problem Solved: Inconsistent embedding dimensions and no fallback when vector search fails

Features:

  • Dimension Validation: Validates embedding dimensions (configurable via PROCESS_EMBEDDING_DIM, defaults to 384)
  • Fallback Mechanism: Automatic TypeScript L2 distance when DuckDB VSS unavailable
  • Transparency: Results include distance_source field (duckdb_vss or typescript_l2)

Example:

// Search for similar processes
const results = await handlers['process.similar']({
  signature_emb: [0.1, 0.2, ...], // 384-dimensional embedding
  k: 5,
  parquet_url: 's3://bucket/signatures.parquet'
})

// Results include distance_source for observability
// {
//   matches: [
//     { doc_id: 'doc1', distance: 0.45, distance_source: 'duckdb_vss' }
//   ]
// }

Benefits:

  • 🛡️ Type Safety: Prevents dimension mismatch errors at runtime
  • 🔄 Reliability: Graceful degradation when VSS extension unavailable
  • 📊 Observability: Know which distance computation method was used
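
For intuition, the TypeScript fallback amounts to a dimension check plus plain Euclidean (L2) distance; a minimal illustrative sketch (not the package's internal code):

// Illustrative only: dimension validation + L2 distance, the math behind
// the typescript_l2 fallback path.
const EXPECTED_DIM = Number(process.env.PROCESS_EMBEDDING_DIM ?? 384)

function l2Distance(a: number[], b: number[]): number {
  if (a.length !== EXPECTED_DIM || b.length !== EXPECTED_DIM) {
    throw new Error(`Expected ${EXPECTED_DIM}-dimensional embeddings`)
  }
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i]
    sum += d * d
  }
  return Math.sqrt(sum)
}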

P2.9: Process Composition Robustification

Problem Solved: Simple deduplication without conflict resolution or quality checks

Features:

  • Step Normalization: Lowercase + trim step keys (handles "Login" vs "login")
  • Conflict Resolution: Uses median order when multiple processes have same step
  • Edge Remapping: Updates edge references after step deduplication
  • QA Checks: Detects orphan steps, cycles, and duplicate edges

Example:

// Compose workflows from multiple documents
const composed = await handlers['process.compose']({
  doc_ids: ['doc1', 'doc2', 'doc3'],
  steps_url: 's3://bucket/steps.parquet',
  edges_url: 's3://bucket/edges.parquet',
})

// Result includes QA report
// {
//   success: true,
//   steps: [...],           // Deduplicated and normalized steps
//   edges: [...],           // Remapped edges
//   merged_count: 5,        // Number of steps merged
//   qa: {
//     orphan_steps: [],     // Steps with no edges
//     cycles: [],           // Detected cycles
//     duplicate_edges: [],  // Duplicate connections
//     warnings: []          // Human-readable issues
//   }
// }

Benefits:

  • 🔀 Smart Merging: Handles variations in step naming across documents
  • ⚖️ Fair Ordering: Median order prevents bias from any single process
  • 🔗 Edge Consistency: Automatically updates all edge references after merging
  • 🔍 Quality Assurance: Immediate feedback on workflow structural issues
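
For intuition, a sketch of the normalization and median-order ideas (illustrative only, not the package's internal code):

// Illustrative only: collapse step-key variants, then resolve order
// conflicts with the median across processes.
const normalizeKey = (key: string): string => key.trim().toLowerCase()

function medianOrder(orders: number[]): number {
  const sorted = [...orders].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
}

console.log(normalizeKey('Login') === normalizeKey('login ')) // true -> merged
console.log(medianOrder([1, 3, 2]))                           // 2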

📋 Use Cases

1. Process Discovery: Find similar workflows across document corpus

// Generate signature embedding for a new process
const signature = await embedWorkflow(process)

// Find top 10 similar processes
const similar = await handlers['process.similar']({
  signature_emb: signature,
  k: 10,
})

2. Workflow Consolidation: Merge related processes into master workflow

// Compose processes from multiple sources
const master = await handlers['process.compose']({
  doc_ids: ['onboarding_v1', 'onboarding_v2', 'onboarding_v3'],
})

// Check for quality issues
if (master.qa.warnings.length > 0) {
  console.warn('Workflow issues detected:', master.qa.warnings)
}

3. Process Validation: Verify workflow structural integrity

// Compose and validate
const workflow = await handlers['process.compose']({ doc_ids: [...] })

// QA checks automatically run
console.log('Orphan steps:', workflow.qa.orphan_steps)
console.log('Cycles detected:', workflow.qa.cycles)
console.log('Duplicate edges:', workflow.qa.duplicate_edges)

🔧 Configuration

Set Parquet URLs via environment variables:

# Process mining Parquet locations
PROCESS_SUMMARY_URL=s3://bucket/process_summary.parquet
PROCESS_STEPS_URL=s3://bucket/process_steps.parquet
PROCESS_EDGES_URL=s3://bucket/process_edges.parquet
PROCESS_SIGNATURE_URL=s3://bucket/process_signatures.parquet

# Embedding dimension (defaults to 384)
# Set to match your embedding model (e.g., 1024 for text-embedding-3-large)
PROCESS_EMBEDDING_DIM=1024

# Enable DuckDB VSS for fast similarity (optional)
# Falls back to TypeScript L2 distance if unavailable

📊 Data Schema

Process Steps (configurable embedding dimension):

CREATE TABLE process_steps (
  doc_id VARCHAR,
  process_id VARCHAR,
  step_id VARCHAR,
  "order" INTEGER,       -- quoted: ORDER is a reserved SQL keyword
  step_key VARCHAR,      -- Normalized to lowercase+trim in v0.9.4
  label VARCHAR,
  evidence VARCHAR,
  embedding FLOAT[N]     -- Dimension validated via PROCESS_EMBEDDING_DIM (default: 384)
)

Process Edges:

CREATE TABLE process_edges (
  doc_id VARCHAR,
  process_id VARCHAR,
  from_step_id VARCHAR,  -- Automatically remapped in v0.9.4
  to_step_id VARCHAR,    -- Automatically remapped in v0.9.4
  relation VARCHAR,
  evidence VARCHAR
)

🚀 Migration from v0.9.3 → v0.9.4

Breaking Changes: None

New Features (backwards compatible):

  • process.similar now validates embedding dimensions (configurable via PROCESS_EMBEDDING_DIM)
  • process.similar returns distance_source field
  • process.compose returns qa field with quality report

Recommended Actions:

  1. ✅ Set PROCESS_EMBEDDING_DIM to match your embedding model (e.g., 1024 for text-embedding-3-large)
  2. ✅ Review qa.warnings after composition to identify workflow issues
  3. ✅ Monitor distance_source to track VSS availability
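
For example, distance_source can be tallied from results to track VSS availability (grounded in the result shape documented above):

// Count which distance backend served each match; a growing typescript_l2
// share suggests the DuckDB VSS extension is unavailable.
const embedding = new Array(384).fill(0) // replace with a real signature embedding
const results = await handlers['process.similar']({ signature_emb: embedding, k: 10 })

const bySource: Record<string, number> = {}
for (const m of results.matches) {
  bySource[m.distance_source] = (bySource[m.distance_source] ?? 0) + 1
}
console.log(bySource) // e.g. { duckdb_vss: 10 }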

🤖 Mastra AI Integration ⚠️ EXPERIMENTAL

Status: Phase 0 - Preparation (November 2025) | Stability: EXPERIMENTAL - API may change

Note: This is a generic adapter only. It provides tool conversion for Mastra agents but does NOT include pre-built agents or business logic. Community contributions welcome for Phase 1 implementation!

Overview

Mastra AI Framework integration provides an adapter to convert DuckDB tools to Mastra-compatible format, enabling AI agents powered by DuckDB's analytical capabilities.

Why Mastra?

  • TypeScript-Native: Seamless integration with existing codebase
  • MCP First-Class Support: @mastra/mcp provides bidirectional MCP integration
  • Production-Ready: Batteries included (workflows, HITL, observability, state management)
  • Rapid Growth: Y Combinator W25 backed, 7.5K+ GitHub stars

Planned Capabilities

Phase 1 (December 2025 - January 2026): Proof-of-Concept

  • Convert DuckDB MCP tools to Mastra-compatible format
  • Example SQL Analytics Agent (natural language → SQL)
  • Basic agent examples and validation

Phase 2 (February - April 2026): Production-Ready

  • Full MCPServer integration for external Mastra agents
  • Process mining agents (workflow discovery, similarity search, composition)
  • Multi-agent orchestration patterns

Phase 3 (May - September 2026): Advanced Features

  • Small Language Models (SLM) for context-aware NL-to-SQL
  • Human-in-the-Loop (HITL) workflows with suspend/resume
  • Vector store integration for semantic search

Use Case Preview

// Example: SQL Analytics Agent (Phase 1)
import { Agent } from '@mastra/core'
import { convertToMastraTools } from '@seed-ship/duckdb-mcp-native/mastra'

const analyticsAgent = new Agent({
  name: 'Business Intelligence Agent',
  tools: convertToMastraTools({ duckdb: myDuckDBInstance }),
  instructions: `You are a SQL expert. Convert natural language questions into SQL queries...`,
  model: { provider: 'ANTHROPIC', name: 'claude-3-5-sonnet' },
})

await analyticsAgent.generate({
  messages: [{ role: 'user', content: 'What were our top selling products last quarter?' }],
})

Current Status

Phase 0 (November 2025) - Infrastructure preparation:

  • ✅ Export path /mastra in package.json
  • ✅ Adapter skeleton src/adapters/mastra-adapter.ts
  • ✅ Comprehensive integration roadmap

Next Steps: Phase 1 implementation begins December 2025

📚 Complete Roadmap: docs/MASTRA_INTEGRATION.md

💡 Community Interest: Mastra integration contributions welcome! See roadmap for details.


🎯 Three Usage Modes

This package supports three distinct usage modes to fit different integration scenarios:

Mode 1: Standalone Server

Run DuckDB MCP as an independent server that other applications can connect to.

# Configure in .env
DUCKDB_MEMORY=4GB
DUCKDB_THREADS=4
MCP_SECURITY_MODE=development

# Start server
npm run dev:server
# Or with npx
npx @seed-ship/duckdb-mcp-native

Use case: When you need a dedicated DuckDB service that multiple clients can connect to.

Mode 2: Library Mode (/lib)

Import tool handlers directly into your existing MCP server without any auto-initialization.

// Import from /lib for clean library mode (v0.3.0+)
import { nativeToolHandlers, nativeToolDefinitions } from '@seed-ship/duckdb-mcp-native/lib'
import { ListToolsRequestSchema, CallToolRequestSchema } from '@modelcontextprotocol/sdk/types.js'

// Register tools in your MCP server
server.setRequestHandler(ListToolsRequestSchema, async () => {
  return {
    tools: [...yourTools, ...nativeToolDefinitions],
  }
})

// Handle tool calls
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params

  if (name in nativeToolHandlers) {
    const handler = nativeToolHandlers[name]
    return await handler(args)
  }
  // ... handle your other tools
})

Use case: Adding DuckDB capabilities to an existing MCP server (like deposium_MCPs).

Mode 3: Embedded Server

Create a server instance with full control over its lifecycle and configuration.

import { createEmbeddedServer } from '@seed-ship/duckdb-mcp-native/lib'

// Create embedded server with custom config
const duckdbServer = createEmbeddedServer({
  embeddedMode: true, // Prevents stdio transport initialization
  duckdbService: yourDuckDBInstance, // Optional: use your own DuckDB instance
})

// Start when ready
await duckdbServer.start()

// Get handlers bound to this instance
const handlers = duckdbServer.getNativeHandlers()

// Use in your application
const result = await handlers.query_duckdb({
  sql: 'SELECT * FROM users',
  limit: 100,
})

Use case: Advanced integration scenarios where you need full control over the server lifecycle.

Quick Integration Example

For most integrations, library mode is the simplest approach:

// In your existing MCP server (e.g., deposium_MCPs)
import { nativeToolHandlers, nativeToolDefinitions } from '@seed-ship/duckdb-mcp-native/lib'

// That's it! Tools are now available
console.log('Available DuckDB tools:', Object.keys(nativeToolHandlers))
// Output: ['query_duckdb', 'list_tables', 'describe_table', 'load_csv', 'load_parquet', 'export_data']

Federation Example (v0.6.5) ✨

Federation enables distributed queries across multiple MCP servers using the mcp:// protocol:

-- Query GitHub issues directly
SELECT * FROM 'mcp://github/issues.json' WHERE status = 'open'

-- Join local users with remote GitHub data
SELECT u.name, COUNT(g.id) as issue_count
FROM local_users u
JOIN 'mcp://github/issues.json' g ON u.github_username = g.assignee
GROUP BY u.name

-- Aggregate across multiple sources
WITH all_issues AS (
  SELECT 'github' as source, * FROM 'mcp://github/issues.json'
  UNION ALL
  SELECT 'jira' as source, * FROM 'mcp://jira/tickets.json'
)
SELECT source, COUNT(*) as total_issues FROM all_issues GROUP BY source

Using Federation Tool

// Use the new federate_query tool
const result = await handlers['federate_query']({
  sql: `
    SELECT
      g.title as issue,
      s.message as last_commit
    FROM 'mcp://github/issues.json' g
    JOIN 'mcp://slack/messages.json' s ON g.id = s.issue_id
  `,
  explain: false, // Set to true to see query plan
})

// Automatic server registration with attach_mcp
await handlers['attach_mcp']({
  connectionString: 'stdio://github-mcp-server',
  alias: 'github',
})
// Server is now automatically registered with federation!

DuckLake Example (Advanced)

DuckLake provides ACID transactions and time travel capabilities on top of Parquet files:

// Attach a DuckLake catalog
await handlers['ducklake.attach']({
  catalogName: 'analytics',
  catalogLocation: 's3://data-lake/analytics',
  format: 'DELTA',
  enableTimeTravel: true,
  retentionDays: 30,
  compressionType: 'ZSTD',
})

// List table snapshots
const snapshots = await handlers['ducklake.snapshots']({
  catalogName: 'analytics',
  tableName: 'sales',
  action: 'list',
})

// Time travel query - query data as it was yesterday
const historicalData = await handlers['ducklake.time_travel']({
  catalogName: 'analytics',
  tableName: 'sales',
  query: 'SELECT SUM(revenue) as total FROM sales',
  timestamp: '2025-01-20T00:00:00Z',
  limit: 100,
})

// Clone a table at specific version
await handlers['ducklake.snapshots']({
  catalogName: 'analytics',
  tableName: 'sales',
  action: 'clone',
  version: 42,
  targetTableName: 'sales_backup_v42',
})

// Rollback to previous version
await handlers['ducklake.snapshots']({
  catalogName: 'analytics',
  tableName: 'sales',
  action: 'rollback',
  version: 41,
})

DuckLake Features:

  • ACID Transactions: Ensures data consistency across operations
  • Time Travel: Query historical data at any point in time
  • Snapshots: Version control for your data tables
  • Multi-tenant Isolation: Space-aware catalogs for tenant separation
  • Format Support: Delta Lake and Apache Iceberg formats
  • Migration Utilities: Convert existing Parquet/CSV files to DuckLake

MotherDuck Cloud Example

MotherDuck enables hybrid execution with cloud storage and compute:

// Connect to MotherDuck cloud
await handlers['motherduck.attach']({
  token: process.env.MOTHERDUCK_TOKEN,
  database: 'production',
  endpoint: 'app.motherduck.com', // Optional, defaults to main endpoint
})

// List cloud databases
const databases = await handlers['motherduck.list_databases']()

// Share local table to cloud
await handlers['motherduck.share_table']({
  localTable: 'local_sales',
  cloudTable: 'cloud_sales', // Optional, uses local name if not specified
})

// Query cloud data
const cloudResults = await handlers['motherduck.query']({
  sql: "SELECT * FROM cloud_sales WHERE region = 'US'",
  limit: 1000,
})

// Import cloud table to local
await handlers['motherduck.import_table']({
  cloudTable: 'cloud_analytics',
  localTable: 'local_analytics',
})

// Check connection status
const status = await handlers['motherduck.status']()
console.log(`Connected: ${status.connected}, Storage: ${status.bytesUsed}/${status.bytesLimit}`)

// Disconnect when done
await handlers['motherduck.detach']()

MotherDuck Features:

  • Hybrid Execution: Seamlessly query local and cloud data
  • Cloud Storage: Persistent storage in MotherDuck cloud
  • Collaborative: Share tables across team members
  • Zero-Copy Cloning: Efficient table copies in cloud
  • Automatic Scaling: Cloud compute scales with workload

🚀 Virtual Filesystem (v0.6.0)

Query MCP resources directly in SQL with zero configuration:

Environment Variables

  • MCP_CACHE_DIR: Directory for caching MCP resources (default: /tmp/mcp-cache)
  • MCP_CACHE_TTL: Cache time-to-live in milliseconds (default: 300000 - 5 minutes)
  • MCP_CACHE_SIZE: Maximum cache size in bytes (default: 104857600 - 100MB)

Before (Complex)

-- Required 3 steps:
CALL attach_mcp('stdio://weather-server', 'weather');
CALL create_virtual_table('weather_data', 'weather://forecast');
SELECT * FROM weather_data;

Now (Simple)

-- Direct access with mcp:// URIs:
SELECT * FROM 'mcp://weather-server/forecast.csv';

Features

  • Zero Setup: No manual connection or table creation needed
  • Auto-Detection: Automatically detects CSV, JSON, Parquet, Arrow, Excel formats
  • Glob Patterns: Query multiple resources with wildcards
  • Caching: Intelligent local caching for performance
  • Federation: Join data across multiple MCP servers

Examples

-- Query specific resource
SELECT * FROM 'mcp://github-server/issues.json'
WHERE status = 'open';

-- Join across servers
SELECT g.title, j.priority
FROM 'mcp://github/issues.json' g
JOIN 'mcp://jira/tickets.json' j ON g.id = j.github_id;

-- Glob patterns for multiple files
SELECT COUNT(*) as error_count
FROM 'mcp://*/logs/2024-*.csv'
WHERE level = 'ERROR';

-- Automatic format detection
SELECT * FROM 'mcp://data/users.parquet';  -- Parquet
SELECT * FROM 'mcp://api/response.json';    -- JSON
SELECT * FROM 'mcp://reports/sales.csv';    -- CSV

Configuration

Enable Virtual Filesystem in your DuckDB service:

const duckdb = new DuckDBService({
  virtualFilesystem: {
    enabled: true,
    config: {
      cacheDir: '/tmp/mcp-cache',
      defaultTTL: 300000, // 5 minutes
    },
  },
})

// Use VFS-aware query execution
const results = await duckdb.executeQueryWithVFS("SELECT * FROM 'mcp://server/data.csv'")

Architecture

src/
├── duckdb/          # DuckDB service
├── server/          # MCP server implementation
├── client/          # MCP client for federation
├── federation/      # Federation components
│   ├── ResourceRegistry.ts
│   ├── ConnectionPool.ts
│   └── QueryRouter.ts
├── filesystem/      # Virtual Filesystem (v0.6.0)
│   ├── VirtualFilesystem.ts
│   ├── URIParser.ts
│   ├── CacheManager.ts
│   ├── FormatDetector.ts
│   └── QueryPreprocessor.ts
└── protocol/        # Transport implementations

Development

See DEVELOPMENT.md for:

  • Testing guide
  • Troubleshooting
  • Port management
  • Debug logging

Security Modes

  • development: All queries allowed (default)
  • production: Blocks DROP/TRUNCATE, enforces limits

Set via: MCP_SECURITY_MODE=production
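
A small sketch of switching modes from code; whether the embedded server reads this variable at start time is an assumption, so prefer setting it in .env or the Claude Desktop config as shown earlier:

// Assumption: the server consults MCP_SECURITY_MODE from the environment.
import { createEmbeddedServer } from '@seed-ship/duckdb-mcp-native/lib'

process.env.MCP_SECURITY_MODE = 'production' // blocks DROP/TRUNCATE, enforces limits
const server = createEmbeddedServer({ embeddedMode: true })
await server.start()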

Scripts

Development Scripts

npm run dev:server      # Start MCP server
npm run inspector       # MCP Inspector UI
npm test               # Run tests
npm run lint:fix       # Fix linting
npm run build          # Compile TypeScript

Port Management

# Inspector specific
npm run inspector:clean   # Kill Inspector processes on ports 6274/6277
npm run inspector:restart # Clean ports and restart Inspector
npm run inspector:reset   # Force kill and restart (alternative method)

# General port management
npm run port:clean       # Clear stuck ports (5432, 3000, 8080)
npm run port:status     # Check port usage status
npm run port:kill-all   # Force kill all managed ports

Common Issues & Solutions

Inspector Port Blocked

If you see ❌ Proxy Server PORT IS IN USE at port 6277 ❌:

# Quick fix - clean and restart
npm run inspector:restart

# Alternative if the above doesn't work
npm run inspector:reset

# Manual cleanup if needed
npm run inspector:clean
npm run inspector

Requirements

  • Node.js 18+
  • TypeScript 5+

License

MIT