DuckDB MCP Native
Native TypeScript implementation of DuckDB MCP (Model Context Protocol) server with federation capabilities.
Status
🚀 Production Ready (v0.8.1+)
✅ Production-Ready Features
- Core: Native TypeScript, no C++ dependencies
- Transports: stdio ✅ | WebSocket ✅ | TCP ✅ | HTTP ⚠️
- Federation: Distributed queries across multiple MCP servers ✨
- Tools: 26 MCP tools, including the new `federate_query`
- Virtual Tables: JSON/CSV/Parquet with auto-refresh
- Virtual Filesystem: Direct SQL access via mcp:// URIs
- Monitoring: Built-in performance metrics and slow query detection
- Security: Enhanced SQL injection prevention, server authentication, path traversal protection
- DuckPGQ Documentation: Comprehensive guides with developer insights, syntax validation, and migration examples
🚧 In Progress
- HTTP transport initialization issues
- MotherDuck cloud integration (waiting for MotherDuck to support DuckDB v1.4.0)
- Test coverage improvement (current: ~75%, target: 80%)
Installation
As NPM Package
# Install from npm
npm install @seed-ship/duckdb-mcp-native

As MCP Server for Claude Desktop
1. Install the package globally:

npm install -g @seed-ship/duckdb-mcp-native

2. Configure Claude Desktop by editing your Claude configuration file:

- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json
Add the DuckDB MCP server:
{
"mcpServers": {
"duckdb": {
"command": "npx",
"args": ["@seed-ship/duckdb-mcp-native"],
"env": {
"DUCKDB_MEMORY": "4GB",
"DUCKDB_THREADS": "4",
"MCP_SECURITY_MODE": "development",
"MCP_CACHE_DIR": "/tmp/mcp-cache",
"MCP_CACHE_TTL": "300000",
"MCP_CACHE_SIZE": "104857600"
}
}
}
}

3. Optional: Configure S3/MinIO for cloud storage:
{
"mcpServers": {
"duckdb": {
"command": "npx",
"args": ["@seed-ship/duckdb-mcp-native"],
"env": {
"DUCKDB_MEMORY": "4GB",
"DUCKDB_THREADS": "4",
"MCP_SECURITY_MODE": "development",
"MINIO_PUBLIC_ENDPOINT": "https://s3.example.com",
"MINIO_ACCESS_KEY": "your-access-key",
"MINIO_SECRET_KEY": "your-secret-key",
"MINIO_REGION": "us-east-1",
"MINIO_USE_SSL": "true"
}
}
}
}

4. Restart Claude Desktop to load the MCP server.
For Development
git clone https://github.com/theseedship/duckdb_mcp_node
cd duckdb_mcp_node
npm install
# Start MCP server
npm run dev:server
# Test with Inspector
npm run inspector
# Run tests
npm test

MCP Tools (25 Available)
Database Operations
- `query_duckdb` - Execute SQL queries
- `list_tables` - List schema tables
- `describe_table` - Get table structure
- `load_csv` - Load CSV files
- `load_parquet` - Load Parquet files
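In library mode (see "Mode 2: Library Mode" below), these handlers can be called directly. A minimal sketch: the `query_duckdb` argument shape matches the embedded-server example later in this README, while the `load_csv` parameter names (path, tableName) are illustrative assumptions:

import { nativeToolHandlers } from '@seed-ship/duckdb-mcp-native/lib'

// Load a CSV into a table, then aggregate it.
// 'path' and 'tableName' are assumed parameter names for load_csv.
await nativeToolHandlers['load_csv']({ path: './sales.csv', tableName: 'sales' })

const result = await nativeToolHandlers['query_duckdb']({
  sql: 'SELECT region, SUM(revenue) AS total FROM sales GROUP BY region',
  limit: 100,
})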
Federation Operations
- `attach_mcp` - Connect external MCP servers
- `detach_mcp` - Disconnect servers
- `list_attached_servers` - Show connections
- `list_mcp_resources` - List available resources
- `create_virtual_table` - Create from MCP resource
- `drop_virtual_table` - Remove virtual table
- `list_virtual_tables` - Show virtual tables
- `refresh_virtual_table` - Update table data
- `query_hybrid` - Query across local/remote data
DuckLake Operations (Advanced)
- `ducklake.attach` - Attach or create a DuckLake catalog for ACID transactions
- `ducklake.snapshots` - List, view, clone, or rollback table snapshots
- `ducklake.time_travel` - Execute queries on historical data
MotherDuck Cloud Operations
⚠️ Note: MotherDuck integration requires DuckDB v1.3.2, but this package currently uses DuckDB v1.4.0, which MotherDuck does not yet support. These tools will become functional once MotherDuck adds support for DuckDB v1.4.0 stable.
- `motherduck.attach` - Connect to MotherDuck cloud with a token
- `motherduck.detach` - Disconnect from MotherDuck
- `motherduck.status` - Check connection status and usage
- `motherduck.list_databases` - List available cloud databases
- `motherduck.create_database` - Create a new cloud database
- `motherduck.query` - Execute queries on MotherDuck cloud
- `motherduck.share_table` - Share local tables to the cloud
- `motherduck.import_table` - Import cloud tables to local
📊 DuckPGQ Property Graph Support (v0.7.2+)
SQL:2023 Property Graph queries - Limited support with automatic installation
This package supports DuckPGQ extension version 7705c5c from the community repository for basic graph analytics.
🎯 Quick Start
For DuckDB 1.4.x (Current):
# Enable DuckPGQ (installs automatically from community repository)
ENABLE_DUCKPGQ=true
ALLOW_UNSIGNED_EXTENSIONS=true
DUCKPGQ_SOURCE=community # Default, no custom URL needed

For DuckDB 1.0-1.2.2 (full DuckPGQ support):

DUCKPGQ_SOURCE=community # Complete feature set available

📋 Real Compatibility Matrix
| DuckDB Version | DuckPGQ Version | Fixed Paths | Bounded {n,m} | ANY SHORTEST | Kleene (alone) | Status |
| --------------- | --------------- | ----------- | ------------- | ------------ | -------------- | ---------------- |
| 1.0.0 - 1.2.2 | Stable | ✅ | ✅ | ✅ | ✅ | Production Ready |
| 1.4.1 (current) | 7705c5c | ✅ | ✅ | ✅ | ❌ | Functional |
| 1.5.x+ | TBD | ? | ? | ? | ? | Planned |
⚠️ IMPORTANT: DuckPGQ 7705c5c Capabilities (Validated via Automated Testing)
Version 7705c5c (current for DuckDB 1.4.x) - Tested 2025-10-20
✅ What Works:
- Property graph creation (VERTEX/EDGE TABLES)
- Basic pattern matching with `GRAPH_TABLE`
- Fixed-length paths (explicit hops: 1-hop, 2-hop, 3-hop, etc.)
- ANY SHORTEST path queries with `->*` syntax (star AFTER arrow)
- Bounded quantifiers `{n,m}` with `->{n,m}` syntax (quantifier AFTER arrow)
- Direct relationship queries (edge variable required: `[e:Label]`)
⚠️ Design Decisions (Not Bugs):
These are intentional safety features and planned roadmap items:
🛡️ Safety Feature: Standalone Kleene operators (`->*`, `->+`) blocked without ANY SHORTEST
- Why: Prevents potentially infinite results on cyclic graphs
- Error: "ALL unbounded with path mode WALK is not possible"
- Workaround: Use ANY SHORTEST or bounded quantifiers `->{n,m}`
- Developer insight: System protection against runaway queries
🚧 Roadmap Item: Edge variable names required (temporary)
- Current: Must use `[e:Label]`, not `[:Label]`
- Future: Anonymous edge syntax will be supported
- Reason: Internal query translation needs named references
🚧 Roadmap Item: Explicit label binding sometimes required
- Status: Label inference implemented but not yet exposed
- Workaround: Always specify labels explicitly
See detailed findings: DUCKPGQ_FINDINGS.md
Understand "why": DUCKPGQ_FAILURE_ANALYSIS.md (18 test cases + developer commentary)
⚙️ Configuration Options
Minimal configuration (add to .env):
# Required for DuckPGQ 7705c5c
ENABLE_DUCKPGQ=true
ALLOW_UNSIGNED_EXTENSIONS=true
# Optional: Graceful degradation (default)
DUCKPGQ_STRICT_MODE=false # Continue without graph if load fails

Full configuration options:
DUCKPGQ_SOURCE=community # community, edge, custom
DUCKPGQ_CUSTOM_REPO= # URL for custom builds
DUCKPGQ_VERSION= # Specific version (optional)

See full documentation: docs/DUCKPGQ_INTEGRATION.md
📝 Example Usage (Validated Syntax for 7705c5c)
-- Create property graph from existing tables
CREATE PROPERTY GRAPH social_network
VERTEX TABLES (Person)
EDGE TABLES (
Knows
SOURCE KEY (from_id) REFERENCES Person (id)
DESTINATION KEY (to_id) REFERENCES Person (id)
);
-- ✅ WORKS: Direct connections (1-hop) - edge variable required
FROM GRAPH_TABLE (social_network
MATCH (p1:Person)-[k:Knows]->(p2:Person)
COLUMNS (p1.name AS person, p2.name AS friend)
);
-- ✅ WORKS: Friends of friends (fixed 2-hop)
FROM GRAPH_TABLE (social_network
MATCH (p1:Person)-[k1:Knows]->(p2:Person)-[k2:Knows]->(p3:Person)
WHERE p1.id != p3.id
COLUMNS (p1.name AS person, p3.name AS friend_of_friend)
);
-- ✅ WORKS: ANY SHORTEST path query (note: ->* AFTER arrow, WITH path variable)
FROM GRAPH_TABLE (social_network
MATCH p = ANY SHORTEST (start:Person WHERE start.id = 1)-[k:Knows]->*(end:Person WHERE end.id = 10)
COLUMNS (start.name AS from_person, end.name AS to_person, path_length(p) AS hops)
);
-- ✅ WORKS: Bounded quantifiers (1 to 3 hops, note: ->{n,m} AFTER arrow)
FROM GRAPH_TABLE (social_network
MATCH (p1:Person)-[k:Knows]->{1,3}(p2:Person)
COLUMNS (p1.name AS person, p2.name AS connection)
);
-- ⚠️ SAFETY FEATURE (by design): Standalone Kleene operators blocked
-- FROM GRAPH_TABLE (social_network
-- MATCH (p1:Person)-[k:Knows]->+(p2:Person)
-- -- Error: "ALL unbounded with path mode WALK is not possible"
-- -- Why blocked: Prevents infinite results on cyclic graphs (safety feature)
-- -- Workaround: Use ANY SHORTEST or bounded quantifiers instead
-- );

Migration notes:
- ✅ ANY SHORTEST and bounded quantifiers now work in 7705c5c!
- See DUCKPGQ_FINDINGS.md for comprehensive syntax validation results
- See MIGRATION_GUIDE.md for detailed migration examples
🚨 Current Status (DuckDB 1.4.x)
Tested Configuration (Validated 2025-10-20 with developer insights):
- ✅ DuckPGQ 7705c5c available from community repository
- ✅ Installation automatic with ALLOW_UNSIGNED_EXTENSIONS=true
- ✅ Property graphs and fixed-length queries work
- ✅ ANY SHORTEST path queries work (with `->*` syntax)
- ✅ Bounded quantifiers work (with `->{n,m}` syntax)
- 🛡️ Standalone Kleene operators blocked (safety feature - prevents infinite results)
Understanding "Limitations":
- Safety features: Intentional blocks to prevent runaway queries (e.g., ALL unbounded)
- Roadmap items: Planned features not yet implemented (e.g., anonymous edges, path modes)
- What works today: See compatibility matrix above
Test Suites:
- `npm run test:duckpgq:syntax` - Validate working features (13 tests)
- `npm run test:duckpgq:failures` - Understand design decisions (18 tests)
Migration Path:
When full DuckPGQ support arrives for DuckDB 1.4.x/1.5.x, your configuration will automatically upgrade. See MIGRATION_GUIDE.md for migration examples.
📚 Resources
- Complete Guide: docs/DUCKPGQ_INTEGRATION.md - Full integration guide
- What Works: DUCKPGQ_FINDINGS.md - Comprehensive syntax testing results
- Why It Works: DUCKPGQ_FAILURE_ANALYSIS.md - 18 test cases + developer insights
🔄 Process Mining Tools (v0.9.4+)
NEW: Enhanced process mining capabilities with embeddings standardization and robust composition
🎯 Overview
Three specialized tools for analyzing workflow processes stored in Parquet files:
- `process.describe` - List and filter process summaries with confidence scores
- `process.similar` - Find similar processes using vector embeddings (FLOAT[384])
- `process.compose` - Merge multiple processes into a unified workflow with QA checks
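For example, `process.describe` can be pointed at the summaries Parquet file configured below. A sketch only; the filter parameter (min_confidence) is a hypothetical name for illustration, not a documented option:

// Hypothetical call shape for process.describe
const summaries = await handlers['process.describe']({
  parquet_url: process.env.PROCESS_SUMMARY_URL, // see Configuration below
  min_confidence: 0.8, // assumed filter on confidence scores
})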
✨ What's New in v0.9.4
P2.8: Embeddings Standardization
Problem Solved: Inconsistent embedding dimensions and no fallback when vector search fails
Features:
- ✅ Dimension Validation: Validates embedding dimensions (configurable via PROCESS_EMBEDDING_DIM, defaults to 384)
- ✅ Fallback Mechanism: Automatic TypeScript L2 distance when DuckDB VSS is unavailable (see the sketch below)
- ✅ Transparency: Results include a distance_source field (duckdb_vss or typescript_l2)
Example:
// Search for similar processes
const results = await handlers['process.similar']({
signature_emb: [0.1, 0.2, ...], // 384-dimensional embedding
k: 5,
parquet_url: 's3://bucket/signatures.parquet'
})
// Results include distance_source for observability
// {
// matches: [
// { doc_id: 'doc1', distance: 0.45, distance_source: 'duckdb_vss' }
// ]
// }

Benefits:
- 🛡️ Type Safety: Prevents dimension mismatch errors at runtime
- 🔄 Reliability: Graceful degradation when VSS extension unavailable
- 📊 Observability: Know which distance computation method was used
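For intuition, the TypeScript fallback amounts to a plain Euclidean (L2) distance plus a dimension check. An illustrative sketch, not the package's internal implementation:

// L2 (Euclidean) distance with dimension validation - illustration only
function l2Distance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`)
  }
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i]
    sum += d * d
  }
  return Math.sqrt(sum)
}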
P2.9: Process Composition Robustification
Problem Solved: Simple deduplication without conflict resolution or quality checks
Features:
- ✅ Step Normalization: Lowercase + trim step keys (handles "Login" vs "login")
- ✅ Conflict Resolution: Uses median order when multiple processes have same step
- ✅ Edge Remapping: Updates edge references after step deduplication
- ✅ QA Checks: Detects orphan steps, cycles, and duplicate edges
Example:
// Compose workflows from multiple documents
const composed = await handlers['process.compose']({
doc_ids: ['doc1', 'doc2', 'doc3'],
steps_url: 's3://bucket/steps.parquet',
edges_url: 's3://bucket/edges.parquet',
})
// Result includes QA report
// {
// success: true,
// steps: [...], // Deduplicated and normalized steps
// edges: [...], // Remapped edges
// merged_count: 5, // Number of steps merged
// qa: {
// orphan_steps: [], // Steps with no edges
// cycles: [], // Detected cycles
// duplicate_edges: [], // Duplicate connections
// warnings: [] // Human-readable issues
// }
// }

Benefits:
- 🔀 Smart Merging: Handles variations in step naming across documents
- ⚖️ Fair Ordering: Median order prevents bias from any single process
- 🔗 Edge Consistency: Automatically updates all edge references after merging
- 🔍 Quality Assurance: Immediate feedback on workflow structural issues
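To make the merge rules concrete, here is an illustrative sketch of the normalization and median-order strategies described above (not the package's internal code):

// Step keys are normalized so "Login " and "login" collapse to one step
function normalizeStepKey(key: string): string {
  return key.trim().toLowerCase()
}

// Order conflicts across processes are resolved with the median,
// so no single process dominates the final ordering
function medianOrder(orders: number[]): number {
  const sorted = [...orders].sort((x, y) => x - y)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
}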
📋 Use Cases
1. Process Discovery: Find similar workflows across document corpus
// Generate signature embedding for a new process
const signature = await embedWorkflow(process)
// Find top 10 similar processes
const similar = await handlers['process.similar']({
signature_emb: signature,
k: 10,
})

2. Workflow Consolidation: Merge related processes into a master workflow
// Compose processes from multiple sources
const master = await handlers['process.compose']({
doc_ids: ['onboarding_v1', 'onboarding_v2', 'onboarding_v3'],
})
// Check for quality issues
if (master.qa.warnings.length > 0) {
console.warn('Workflow issues detected:', master.qa.warnings)
}

3. Process Validation: Verify workflow structural integrity
// Compose and validate
const workflow = await handlers['process.compose']({ doc_ids: [...] })
// QA checks automatically run
console.log('Orphan steps:', workflow.qa.orphan_steps)
console.log('Cycles detected:', workflow.qa.cycles)
console.log('Duplicate edges:', workflow.qa.duplicate_edges)

🔧 Configuration
Set Parquet URLs via environment variables:
# Process mining Parquet locations
PROCESS_SUMMARY_URL=s3://bucket/process_summary.parquet
PROCESS_STEPS_URL=s3://bucket/process_steps.parquet
PROCESS_EDGES_URL=s3://bucket/process_edges.parquet
PROCESS_SIGNATURE_URL=s3://bucket/process_signatures.parquet
# Embedding dimension (defaults to 384)
# Set to match your embedding model (e.g., 1024 for text-embedding-3-large)
PROCESS_EMBEDDING_DIM=1024
# Enable DuckDB VSS for fast similarity (optional)
# Falls back to TypeScript L2 distance if unavailable

📊 Data Schema
Process Steps (configurable embedding dimension):
CREATE TABLE process_steps (
doc_id VARCHAR,
process_id VARCHAR,
step_id VARCHAR,
"order" INTEGER, -- quoted: ORDER is a reserved SQL keyword
step_key VARCHAR, -- Normalized to lowercase+trim in v0.9.4
label VARCHAR,
evidence VARCHAR,
embedding FLOAT[N] -- Dimension validated via PROCESS_EMBEDDING_DIM (default: 384)
)

Process Edges:
CREATE TABLE process_edges (
doc_id VARCHAR,
process_id VARCHAR,
from_step_id VARCHAR, -- Automatically remapped in v0.9.4
to_step_id VARCHAR, -- Automatically remapped in v0.9.4
relation VARCHAR,
evidence VARCHAR
)

🚀 Migration from v0.9.3 → v0.9.4
Breaking Changes: None
New Features (backwards compatible):
- `process.similar` now validates embedding dimensions (configurable via PROCESS_EMBEDDING_DIM)
- `process.similar` returns a distance_source field
- `process.compose` returns a qa field with a quality report
Recommended Actions:
- ✅ Set PROCESS_EMBEDDING_DIM to match your embedding model (e.g., 1024 for text-embedding-3-large)
- ✅ Review qa.warnings after composition to identify workflow issues
- ✅ Monitor distance_source to track VSS availability (see the sketch below)
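A minimal monitoring sketch, assuming a signature_emb embedding is already in scope:

// Tally distance_source across matches to see how often the VSS path is taken
const { matches } = await handlers['process.similar']({ signature_emb, k: 10 })
const bySource: Record<string, number> = {}
for (const m of matches) {
  bySource[m.distance_source] = (bySource[m.distance_source] ?? 0) + 1
}
console.log(bySource) // e.g. { duckdb_vss: 10 } or { typescript_l2: 10 }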
🤖 Mastra AI Integration ⚠️ EXPERIMENTAL
Status: Phase 0 - Preparation (November 2025) | Stability: EXPERIMENTAL - API may change
Note: This is a generic adapter only. It provides tool conversion for Mastra agents but does NOT include pre-built agents or business logic. Community contributions welcome for Phase 1 implementation!
Overview
Mastra AI Framework integration provides an adapter to convert DuckDB tools to Mastra-compatible format, enabling AI agents powered by DuckDB's analytical capabilities.
Why Mastra?
- TypeScript-Native: Seamless integration with existing codebase
- MCP First-Class Support: @mastra/mcp provides bidirectional MCP integration
- Production-Ready: Batteries included (workflows, HITL, observability, state management)
- Rapid Growth: Y Combinator W25 backed, 7.5K+ GitHub stars
Planned Capabilities
Phase 1 (December 2025 - January 2026): Proof-of-Concept
- Convert DuckDB MCP tools to Mastra-compatible format
- Example SQL Analytics Agent (natural language → SQL)
- Basic agent examples and validation
Phase 2 (February - April 2026): Production-Ready
- Full MCPServer integration for external Mastra agents
- Process mining agents (workflow discovery, similarity search, composition)
- Multi-agent orchestration patterns
Phase 3 (May - September 2026): Advanced Features
- Small Language Models (SLM) for context-aware NL-to-SQL
- Human-in-the-Loop (HITL) workflows with suspend/resume
- Vector store integration for semantic search
Use Case Preview
// Example: SQL Analytics Agent (Phase 1)
import { Agent } from '@mastra/core'
import { convertToMastraTools } from '@seed-ship/duckdb-mcp-native/mastra'
const analyticsAgent = new Agent({
name: 'Business Intelligence Agent',
tools: convertToMastraTools({ duckdb: myDuckDBInstance }),
instructions: `You are a SQL expert. Convert natural language questions into SQL queries...`,
model: { provider: 'ANTHROPIC', name: 'claude-3-5-sonnet' },
})
await analyticsAgent.generate({
messages: [{ role: 'user', content: 'What were our top selling products last quarter?' }],
})

Current Status
Phase 0 (November 2025) - Infrastructure preparation:
- ✅ Export path /mastra in package.json
- ✅ Adapter skeleton src/adapters/mastra-adapter.ts
- ✅ Comprehensive integration roadmap
Next Steps: Phase 1 implementation begins December 2025
📚 Complete Roadmap: docs/MASTRA_INTEGRATION.md
💡 Community Interest: Mastra integration contributions welcome! See roadmap for details.
🎯 Three Usage Modes
This package supports three distinct usage modes to fit different integration scenarios:
Mode 1: Standalone Server
Run DuckDB MCP as an independent server that other applications can connect to.
# Configure in .env
DUCKDB_MEMORY=4GB
DUCKDB_THREADS=4
MCP_SECURITY_MODE=development
# Start server
npm run dev:server
# Or with npx
npx @seed-ship/duckdb-mcp-native

Use case: When you need a dedicated DuckDB service that multiple clients can connect to.
Mode 2: Library Mode (/lib)
Import tool handlers directly into your existing MCP server without any auto-initialization.
// Import from /lib for clean library mode (v0.3.0+)
import { nativeToolHandlers, nativeToolDefinitions } from '@seed-ship/duckdb-mcp-native/lib'
// Register tools in your MCP server
server.setRequestHandler(ListToolsRequestSchema, async () => {
return {
tools: [...yourTools, ...nativeToolDefinitions],
}
})
// Handle tool calls
server.setRequestHandler(CallToolRequestSchema, async (request) => {
const { name, arguments: args } = request.params
if (name in nativeToolHandlers) {
const handler = nativeToolHandlers[name]
return await handler(args)
}
// ... handle your other tools
})

Use case: Adding DuckDB capabilities to an existing MCP server (like deposium_MCPs).
Mode 3: Embedded Server
Create a server instance with full control over its lifecycle and configuration.
import { createEmbeddedServer } from '@seed-ship/duckdb-mcp-native/lib'
// Create embedded server with custom config
const duckdbServer = createEmbeddedServer({
embeddedMode: true, // Prevents stdio transport initialization
duckdbService: yourDuckDBInstance, // Optional: use your own DuckDB instance
})
// Start when ready
await duckdbServer.start()
// Get handlers bound to this instance
const handlers = duckdbServer.getNativeHandlers()
// Use in your application
const result = await handlers.query_duckdb({
sql: 'SELECT * FROM users',
limit: 100,
})

Use case: Advanced integration scenarios where you need full control over the server lifecycle.
Quick Integration Example
For most integrations, library mode is the simplest approach:
// In your existing MCP server (e.g., deposium_MCPs)
import { nativeToolHandlers, nativeToolDefinitions } from '@seed-ship/duckdb-mcp-native/lib'
// That's it! Tools are now available
console.log('Available DuckDB tools:', Object.keys(nativeToolHandlers))
// Output: ['query_duckdb', 'list_tables', 'describe_table', 'load_csv', 'load_parquet', 'export_data']

Federation Example (v0.6.5) ✨
Federation enables distributed queries across multiple MCP servers using the mcp:// protocol:
-- Query GitHub issues directly
SELECT * FROM 'mcp://github/issues.json' WHERE status = 'open'
-- Join local users with remote GitHub data
SELECT u.name, COUNT(g.id) as issue_count
FROM local_users u
JOIN 'mcp://github/issues.json' g ON u.github_username = g.assignee
GROUP BY u.name
-- Aggregate across multiple sources
WITH all_issues AS (
SELECT 'github' as source, * FROM 'mcp://github/issues.json'
UNION ALL
SELECT 'jira' as source, * FROM 'mcp://jira/tickets.json'
)
SELECT source, COUNT(*) as total_issues FROM all_issues GROUP BY source

Using Federation Tool
// Use the new federate_query tool
const result = await handlers['federate_query']({
sql: `
SELECT
g.title as issue,
s.message as last_commit
FROM 'mcp://github/issues.json' g
JOIN 'mcp://slack/messages.json' s ON g.id = s.issue_id
`,
explain: false, // Set to true to see query plan
})
// Automatic server registration with attach_mcp
await handlers['attach_mcp']({
connectionString: 'stdio://github-mcp-server',
alias: 'github',
})
// Server is now automatically registered with federation!

DuckLake Example (Advanced)
DuckLake provides ACID transactions and time travel capabilities on top of Parquet files:
// Attach a DuckLake catalog
await handlers['ducklake.attach']({
catalogName: 'analytics',
catalogLocation: 's3://data-lake/analytics',
format: 'DELTA',
enableTimeTravel: true,
retentionDays: 30,
compressionType: 'ZSTD',
})
// List table snapshots
const snapshots = await handlers['ducklake.snapshots']({
catalogName: 'analytics',
tableName: 'sales',
action: 'list',
})
// Time travel query - query data as it was yesterday
const historicalData = await handlers['ducklake.time_travel']({
catalogName: 'analytics',
tableName: 'sales',
query: 'SELECT SUM(revenue) as total FROM sales',
timestamp: '2025-01-20T00:00:00Z',
limit: 100,
})
// Clone a table at specific version
await handlers['ducklake.snapshots']({
catalogName: 'analytics',
tableName: 'sales',
action: 'clone',
version: 42,
targetTableName: 'sales_backup_v42',
})
// Rollback to previous version
await handlers['ducklake.snapshots']({
catalogName: 'analytics',
tableName: 'sales',
action: 'rollback',
version: 41,
})

DuckLake Features:
- ACID Transactions: Ensures data consistency across operations
- Time Travel: Query historical data at any point in time
- Snapshots: Version control for your data tables
- Multi-tenant Isolation: Space-aware catalogs for tenant separation
- Format Support: Delta Lake and Apache Iceberg formats
- Migration Utilities: Convert existing Parquet/CSV files to DuckLake
MotherDuck Cloud Example
MotherDuck enables hybrid execution with cloud storage and compute:
// Connect to MotherDuck cloud
await handlers['motherduck.attach']({
token: process.env.MOTHERDUCK_TOKEN,
database: 'production',
endpoint: 'app.motherduck.com', // Optional, defaults to main endpoint
})
// List cloud databases
const databases = await handlers['motherduck.list_databases']()
// Share local table to cloud
await handlers['motherduck.share_table']({
localTable: 'local_sales',
cloudTable: 'cloud_sales', // Optional, uses local name if not specified
})
// Query cloud data
const cloudResults = await handlers['motherduck.query']({
sql: "SELECT * FROM cloud_sales WHERE region = 'US'", // single quotes: double quotes denote identifiers in SQL
limit: 1000,
})
// Import cloud table to local
await handlers['motherduck.import_table']({
cloudTable: 'cloud_analytics',
localTable: 'local_analytics',
})
// Check connection status
const status = await handlers['motherduck.status']()
console.log(`Connected: ${status.connected}, Storage: ${status.bytesUsed}/${status.bytesLimit}`)
// Disconnect when done
await handlers['motherduck.detach']()

MotherDuck Features:
- Hybrid Execution: Seamlessly query local and cloud data
- Cloud Storage: Persistent storage in MotherDuck cloud
- Collaborative: Share tables across team members
- Zero-Copy Cloning: Efficient table copies in cloud
- Automatic Scaling: Cloud compute scales with workload
🚀 Virtual Filesystem (v0.6.0)
Query MCP resources directly in SQL with zero configuration:
Environment Variables
- `MCP_CACHE_DIR`: Directory for caching MCP resources (default: /tmp/mcp-cache)
- `MCP_CACHE_TTL`: Cache time-to-live in milliseconds (default: 300000 = 5 minutes)
- `MCP_CACHE_SIZE`: Maximum cache size in bytes (default: 104857600 = 100MB)
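For example, in .env (the values shown are the documented defaults):

MCP_CACHE_DIR=/tmp/mcp-cache
MCP_CACHE_TTL=300000      # 5 minutes, in milliseconds
MCP_CACHE_SIZE=104857600  # 100 MB, in bytes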
Before (Complex)
-- Required 3 steps:
CALL attach_mcp('stdio://weather-server', 'weather');
CALL create_virtual_table('weather_data', 'weather://forecast');
SELECT * FROM weather_data;

Now (Simple)
-- Direct access with mcp:// URIs:
SELECT * FROM 'mcp://weather-server/forecast.csv';

Features
- Zero Setup: No manual connection or table creation needed
- Auto-Detection: Automatically detects CSV, JSON, Parquet, Arrow, Excel formats
- Glob Patterns: Query multiple resources with wildcards
- Caching: Intelligent local caching for performance
- Federation: Join data across multiple MCP servers
Examples
-- Query specific resource
SELECT * FROM 'mcp://github-server/issues.json'
WHERE status = 'open';
-- Join across servers
SELECT g.title, j.priority
FROM 'mcp://github/issues.json' g
JOIN 'mcp://jira/tickets.json' j ON g.id = j.github_id;
-- Glob patterns for multiple files
SELECT COUNT(*) as error_count
FROM 'mcp://*/logs/2024-*.csv'
WHERE level = 'ERROR';
-- Automatic format detection
SELECT * FROM 'mcp://data/users.parquet'; -- Parquet
SELECT * FROM 'mcp://api/response.json'; -- JSON
SELECT * FROM 'mcp://reports/sales.csv'; -- CSV

Configuration
Enable Virtual Filesystem in your DuckDB service:
const duckdb = new DuckDBService({
virtualFilesystem: {
enabled: true,
config: {
cacheDir: '/tmp/mcp-cache',
defaultTTL: 300000, // 5 minutes
},
},
})
// Use VFS-aware query execution
const results = await duckdb.executeQueryWithVFS("SELECT * FROM 'mcp://server/data.csv'")

Architecture
src/
├── duckdb/ # DuckDB service
├── server/ # MCP server implementation
├── client/ # MCP client for federation
├── federation/ # Federation components
│ ├── ResourceRegistry.ts
│ ├── ConnectionPool.ts
│ └── QueryRouter.ts
├── filesystem/ # Virtual Filesystem (v0.6.0)
│ ├── VirtualFilesystem.ts
│ ├── URIParser.ts
│ ├── CacheManager.ts
│ ├── FormatDetector.ts
│ └── QueryPreprocessor.ts
└── protocol/ # Transport implementations

Development
See DEVELOPMENT.md for:
- Testing guide
- Troubleshooting
- Port management
- Debug logging
Security Modes
- development: All queries allowed (default)
- production: Blocks DROP/TRUNCATE, enforces limits
Set via: MCP_SECURITY_MODE=production
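For example, hardening the Claude Desktop configuration shown earlier is a one-line change to the env block:

{
  "mcpServers": {
    "duckdb": {
      "command": "npx",
      "args": ["@seed-ship/duckdb-mcp-native"],
      "env": {
        "MCP_SECURITY_MODE": "production"
      }
    }
  }
}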
Scripts
Development Scripts
npm run dev:server # Start MCP server
npm run inspector # MCP Inspector UI
npm test # Run tests
npm run lint:fix # Fix linting
npm run build # Compile TypeScript

Port Management
# Inspector specific
npm run inspector:clean # Kill Inspector processes on ports 6274/6277
npm run inspector:restart # Clean ports and restart Inspector
npm run inspector:reset # Force kill and restart (alternative method)
# General port management
npm run port:clean # Clear stuck ports (5432, 3000, 8080)
npm run port:status # Check port usage status
npm run port:kill-all # Force kill all managed ports

Common Issues & Solutions
Inspector Port Blocked
If you see ❌ Proxy Server PORT IS IN USE at port 6277 ❌:
# Quick fix - clean and restart
npm run inspector:restart
# Alternative if the above doesn't work
npm run inspector:reset
# Manual cleanup if needed
npm run inspector:clean
npm run inspector

Requirements
- Node.js 18+
- TypeScript 5+
License
MIT
