@seed-ship/duckdb-mcp-native
v1.0.1
Native Node.js/TypeScript implementation of DuckDB MCP extension
DuckDB MCP Native
Native TypeScript implementation of DuckDB MCP (Model Context Protocol) server with federation, graph algorithms, and human-in-the-loop security.
v1.0.0 — 469 tests, 0 failures
Features
- 32+ MCP Tools: SQL queries, schema inspection, CSV/Parquet loading, federation, graph algorithms, process mining, data helpers
- 8 Graph Algorithm Tools: PageRank, eigenvector, community detection, modularity, weighted paths, temporal analysis, period comparison, multi-format export
- HITL Security: Production mode asks the user for confirmation before destructive SQL via the MCP elicitation API
- Federation: Distributed queries across multiple MCP servers with `mcp://` URIs
- Virtual Filesystem: Direct SQL access via `mcp://` URIs with auto-format detection
- Transports: stdio, WebSocket, TCP (HTTP client-side)
- Process Mining: 3 tools for workflow analysis from Parquet files
- DuckLake: ACID transactions and time travel on Parquet files
- DuckPGQ: SQL:2023 property graph queries (fixed paths, ANY SHORTEST, bounded quantifiers)
- MCP SDK 1.26.0: Pinned, with elicitation API and connect() guard
Installation
As MCP Server (Claude Desktop / Claude Code)
```json
{
  "mcpServers": {
    "duckdb": {
      "command": "npx",
      "args": ["@seed-ship/duckdb-mcp-native"],
      "env": {
        "DUCKDB_MEMORY": "4GB",
        "DUCKDB_THREADS": "4",
        "MCP_SECURITY_MODE": "development"
      }
    }
  }
}
```

With S3/MinIO:
```json
{
  "mcpServers": {
    "duckdb": {
      "command": "npx",
      "args": ["@seed-ship/duckdb-mcp-native"],
      "env": {
        "DUCKDB_MEMORY": "4GB",
        "DUCKDB_THREADS": "4",
        "MCP_SECURITY_MODE": "development",
        "MINIO_PUBLIC_ENDPOINT": "https://s3.example.com",
        "MINIO_ACCESS_KEY": "your-access-key",
        "MINIO_SECRET_KEY": "your-secret-key",
        "MINIO_REGION": "us-east-1",
        "MINIO_USE_SSL": "true"
      }
    }
  }
}
```

Config file locations:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
- Linux: `~/.config/Claude/claude_desktop_config.json`
As NPM Package
```shell
npm install @seed-ship/duckdb-mcp-native
```

For Development
```shell
git clone https://github.com/theseedship/duckdb_mcp_node
cd duckdb_mcp_node
npm install
npm run dev:server   # Start MCP server
npm run inspector    # Test with Inspector UI
npm test             # Run tests
```

MCP Tools
Database Operations
| Tool | Description |
| ---------------- | --------------------------------------- |
| query_duckdb | Execute SQL queries with optional LIMIT |
| list_tables | List tables in a schema |
| describe_table | Get table structure (columns, types) |
| load_csv | Load CSV files into DuckDB |
| load_parquet | Load Parquet files into DuckDB |
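A hedged sketch of how these tools might be invoked in library mode (the argument field names used here, such as `sql`, `limit`, `path`, and `table_name`, are assumptions, not taken from the actual tool schemas):

```typescript
// Hypothetical call shapes for the database tools. Field names are
// assumptions — check each tool's input schema for the exact contract.
// In library mode a call would look like:
//   const rows = await nativeToolHandlers['query_duckdb'](queryArgs)
const queryArgs = {
  sql: 'SELECT region, SUM(revenue) AS total FROM sales GROUP BY region',
  limit: 100, // query_duckdb supports an optional LIMIT
}

const loadCsvArgs = {
  path: '/data/sales.csv', // CSV file to load
  table_name: 'sales', // destination table in DuckDB
}
```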
Federation
| Tool | Description |
| ----------------------- | ------------------------------ |
| attach_mcp | Connect to external MCP server |
| detach_mcp | Disconnect server |
| list_attached_servers | Show connections |
| list_mcp_resources | List remote resources |
| create_virtual_table | Create table from MCP resource |
| drop_virtual_table | Remove virtual table |
| list_virtual_tables | Show virtual tables |
| refresh_virtual_table | Update table data |
| query_hybrid | Query across local/remote data |
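A hedged sketch of typical federation arguments (`connectionString` and `alias` match the Federation example later in this README; the `create_virtual_table` fields are assumptions, so check the tool's input schema):

```typescript
// Hypothetical argument shapes for the federation tools.
const attachArgs = {
  connectionString: 'stdio://github-mcp-server',
  alias: 'github',
}

const virtualTableArgs = {
  alias: 'github', // previously attached server
  resource: 'issues.json', // remote MCP resource to materialize
  table_name: 'github_issues', // local virtual table name (assumed field)
}
```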
DuckLake (ACID + Time Travel)
| Tool | Description |
| ---------------------- | ------------------------------------------ |
| ducklake.attach | Attach or create DuckLake catalog |
| ducklake.snapshots | List, view, clone, or rollback snapshots |
| ducklake.time_travel | Query historical data at any point in time |
MotherDuck Cloud
MotherDuck currently requires DuckDB v1.3.2. These tools will activate once MotherDuck supports DuckDB v1.4.x.
| Tool | Description |
| ---------------------------- | --------------------------- |
| motherduck.attach | Connect to MotherDuck cloud |
| motherduck.detach | Disconnect |
| motherduck.status | Check connection and usage |
| motherduck.list_databases | List cloud databases |
| motherduck.create_database | Create cloud database |
| motherduck.query | Execute cloud queries |
| motherduck.share_table | Share local table to cloud |
| motherduck.import_table | Import cloud table locally |
Graph Algorithm Tools
8 MCP tools for graph analysis using iterative SQL with temp tables (no recursive CTEs — DuckPGQ workaround for DuckDB 1.4.x).
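To picture the workaround, here is one hypothetical PageRank iteration expressed with a temp table instead of a recursive CTE (the table and column names `edges`, `ranks`, `src`, `dst`, and `out_degree` are illustrative, not the SQL these tools actually emit):

```typescript
// One illustrative PageRank iteration using a temp table. A driver loop
// would execute this N times, swapping ranks_next into ranks between
// passes — no recursion required.
const damping = 0.85
const iterationSql = `
  CREATE OR REPLACE TEMP TABLE ranks_next AS
  SELECT e.dst AS node_id,
         (1 - ${damping}) + ${damping} * SUM(r.rank / r.out_degree) AS rank
  FROM edges e
  JOIN ranks r ON r.node_id = e.src
  GROUP BY e.dst;
`
```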
Tools
| Tool | Description |
| ------------------------ | -------------------------------------------------------------- |
| graph.pagerank | PageRank centrality with configurable damping/iterations |
| graph.eigenvector | Eigenvector centrality via power iteration |
| graph.community_detect | Label propagation community detection |
| graph.modularity | Modularity score Q for community quality |
| graph.weighted_path | Weighted paths: strongest, cheapest, or combined |
| graph.temporal_filter | Filter graph by time period, return stats |
| graph.compare_periods | Compare two periods (NEW/REMOVED/STRENGTHENED/WEAKENED/STABLE) |
| graph.export | Export as JSON, CSV (Gephi), D3, GraphML, or Parquet |
Usage
All tools share a common base schema:
```typescript
import { graphToolHandlers } from '@seed-ship/duckdb-mcp-native/graph'

const result = await graphToolHandlers['graph.pagerank'](
  {
    node_table: 'vars',
    edge_table: 'drives',
    node_id_column: 'var_id',
    source_column: 'from_var',
    target_column: 'to_var',
    weight_column: 'confidence',
    iterations: 20,
    damping: 0.85,
    top_n: 10,
  },
  duckdb
)
// result.nodes = [{ node_id: 4, rank: 0.21 }, ...]
```

Temporal Analysis
```typescript
// Compare graph evolution across time periods
const changes = await graphToolHandlers['graph.compare_periods'](
  {
    node_table: 'vars',
    edge_table: 'drives',
    source_column: 'from_var',
    target_column: 'to_var',
    weight_column: 'confidence',
    period_column: 'period',
    period_a: '1993-2023',
    period_b: '2020-2023',
  },
  duckdb
)
// changes.summary = { new_edges: 3, removed_edges: 5, strengthened: 2, ... }
```

DuckPGQ Property Graph Support
SQL:2023 property graph queries, with automatic DuckPGQ installation from the community extension repository.
Configuration
```shell
# Required for DuckPGQ
ENABLE_DUCKPGQ=true
ALLOW_UNSIGNED_EXTENSIONS=true
DUCKPGQ_SOURCE=community   # Default, no custom URL needed
```

Compatibility Matrix
| DuckDB Version  | DuckPGQ Version | Fixed Paths | Bounded {n,m} | ANY SHORTEST | Kleene (alone) | Status           |
| --------------- | --------------- | ----------- | ------------- | ------------ | -------------- | ---------------- |
| 1.0.0 - 1.2.2   | Stable          | ✅          | ✅            | ✅           | ✅             | Production Ready |
| 1.4.1 (current) | 7705c5c         | ✅          | ✅            | ✅           | ❌             | Functional       |
| 1.5.x+          | TBD             | ?           | ?             | ?            | ?              | Planned          |
What Works (Validated via Automated Testing)
Version 7705c5c (current for DuckDB 1.4.x)
✅ Working features:
- Property graph creation (VERTEX/EDGE TABLES)
- Basic pattern matching with `GRAPH_TABLE`
- Fixed-length paths (explicit hops: 1-hop, 2-hop, 3-hop, etc.)
- ANY SHORTEST path queries with `->*` syntax (star AFTER arrow)
- Bounded quantifiers `{n,m}` with `->{n,m}` syntax (quantifier AFTER arrow)
- Direct relationship queries (edge variable required: `[e:Label]`)
⚠️ Design Decisions (Not Bugs):
Safety Feature: Standalone Kleene operators (`->*`, `->+`) blocked without ANY SHORTEST
- Why: Prevents potentially infinite results on cyclic graphs
- Error: "ALL unbounded with path mode WALK is not possible"
- Workaround: Use `ANY SHORTEST` or bounded quantifiers `->{n,m}`

Roadmap Item: Edge variable names required (temporary)
- Current: Must use `[e:Label]`, not `[:Label]`
- Future: Anonymous edge syntax will be supported

Roadmap Item: Explicit label binding sometimes required
- Status: Label inference implemented but not yet exposed
- Workaround: Always specify labels explicitly
Example Queries
```sql
-- Create property graph from existing tables
CREATE PROPERTY GRAPH social_network
  VERTEX TABLES (Person)
  EDGE TABLES (
    Knows
      SOURCE KEY (from_id) REFERENCES Person (id)
      DESTINATION KEY (to_id) REFERENCES Person (id)
  );

-- Direct connections (1-hop) - edge variable required
FROM GRAPH_TABLE (social_network
  MATCH (p1:Person)-[k:Knows]->(p2:Person)
  COLUMNS (p1.name AS person, p2.name AS friend)
);

-- Friends of friends (fixed 2-hop)
FROM GRAPH_TABLE (social_network
  MATCH (p1:Person)-[k1:Knows]->(p2:Person)-[k2:Knows]->(p3:Person)
  WHERE p1.id != p3.id
  COLUMNS (p1.name AS person, p3.name AS friend_of_friend)
);

-- ANY SHORTEST path query
FROM GRAPH_TABLE (social_network
  MATCH p = ANY SHORTEST (start:Person WHERE start.id = 1)-[k:Knows]->*(end:Person WHERE end.id = 10)
  COLUMNS (start.name AS from_person, end.name AS to_person, path_length(p) AS hops)
);

-- Bounded quantifiers (1 to 3 hops)
FROM GRAPH_TABLE (social_network
  MATCH (p1:Person)-[k:Knows]->{1,3}(p2:Person)
  COLUMNS (p1.name AS person, p2.name AS connection)
);
```

Detailed findings: docs/duckpgq/FINDINGS.md | Failure analysis: docs/duckpgq/FAILURE_ANALYSIS.md
Process Mining Tools
Three specialized tools for analyzing workflow processes stored in Parquet files.
| Tool | Description |
| ------------------ | ------------------------------------------------------------- |
| process.describe | List and filter process summaries with confidence scores |
| process.similar | Find similar processes using vector embeddings (FLOAT[384]) |
| process.compose | Merge multiple processes into unified workflow with QA checks |
Embeddings & Similarity Search
- Validates embedding dimensions (configurable via `PROCESS_EMBEDDING_DIM`, defaults to 384)
- Automatic fallback to TypeScript L2 distance when DuckDB VSS is unavailable
- Results include a `distance_source` field (`duckdb_vss` or `typescript_l2`) for observability
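The TypeScript fallback amounts to a plain Euclidean (L2) distance over two equal-length embeddings; a minimal sketch of that computation (not the package's exact internal code):

```typescript
// Plain L2 distance — a sketch of the fallback path used when DuckDB VSS
// is unavailable.
function l2Distance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`)
  }
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i]
    sum += d * d
  }
  return Math.sqrt(sum)
}
```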
```typescript
const results = await handlers['process.similar']({
  signature_emb: [0.1, 0.2, ...], // 384-dimensional embedding
  k: 5,
  parquet_url: 's3://bucket/signatures.parquet',
})
// { matches: [{ doc_id: 'doc1', distance: 0.45, distance_source: 'duckdb_vss' }] }
```

Process Composition
- Step normalization (lowercase + trim, handles "Login" vs "login")
- Conflict resolution via median order when multiple processes share a step
- Automatic edge remapping after step deduplication
- QA checks: detects orphan steps, cycles, and duplicate edges
```typescript
const composed = await handlers['process.compose']({
  doc_ids: ['doc1', 'doc2', 'doc3'],
  steps_url: 's3://bucket/steps.parquet',
  edges_url: 's3://bucket/edges.parquet',
})
// {
//   success: true,
//   steps: [...],    // Deduplicated and normalized
//   edges: [...],    // Remapped edges
//   merged_count: 5,
//   qa: { orphan_steps: [], cycles: [], duplicate_edges: [], warnings: [] }
// }
```

Configuration
```shell
PROCESS_SUMMARY_URL=s3://bucket/process_summary.parquet
PROCESS_STEPS_URL=s3://bucket/process_steps.parquet
PROCESS_EDGES_URL=s3://bucket/process_edges.parquet
PROCESS_SIGNATURE_URL=s3://bucket/process_signatures.parquet
PROCESS_EMBEDDING_DIM=384   # Match your embedding model (e.g., 1024 for text-embedding-3-large)
```

Three Usage Modes
Mode 1: Standalone Server
```shell
DUCKDB_MEMORY=4GB DUCKDB_THREADS=4 npm run dev:server
# Or: npx @seed-ship/duckdb-mcp-native
```

Mode 2: Library Mode
Import tool handlers directly into your existing MCP server:
```typescript
import { ListToolsRequestSchema, CallToolRequestSchema } from '@modelcontextprotocol/sdk/types.js'
import { nativeToolHandlers, nativeToolDefinitions } from '@seed-ship/duckdb-mcp-native/lib'

server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [...yourTools, ...nativeToolDefinitions],
}))

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params
  if (name in nativeToolHandlers) {
    return await nativeToolHandlers[name](args)
  }
})
```

Mode 3: Embedded Server
```typescript
import { DuckDBMCPServer } from '@seed-ship/duckdb-mcp-native/server'

const server = new DuckDBMCPServer({
  embeddedMode: true,
  duckdbService: yourDuckDBInstance, // Optional
})

await server.start()
const handlers = server.getNativeHandlers()
```

Virtual Filesystem
Query MCP resources directly in SQL with zero configuration:
```sql
-- Direct access with mcp:// URIs
SELECT * FROM 'mcp://weather-server/forecast.csv';

-- Join across servers
SELECT g.title, j.priority
FROM 'mcp://github/issues.json' g
JOIN 'mcp://jira/tickets.json' j ON g.id = j.github_id;

-- Glob patterns
SELECT COUNT(*) FROM 'mcp://*/logs/2024-*.csv' WHERE level = 'ERROR';
```

Features: auto-format detection (CSV, JSON, Parquet, Arrow, Excel), intelligent caching, glob patterns.
Federation
```typescript
// Attach servers
await handlers['attach_mcp']({
  connectionString: 'stdio://github-mcp-server',
  alias: 'github',
})

// Federated query
const result = await handlers['federate_query']({
  sql: `
    SELECT g.title, s.message
    FROM 'mcp://github/issues.json' g
    JOIN 'mcp://slack/messages.json' s ON g.id = s.issue_id
  `,
})
```

DuckLake (ACID + Time Travel)
```typescript
// Attach catalog
await handlers['ducklake.attach']({
  catalogName: 'analytics',
  catalogLocation: 's3://data-lake/analytics',
  format: 'DELTA',
  enableTimeTravel: true,
})

// Time travel
const historical = await handlers['ducklake.time_travel']({
  catalogName: 'analytics',
  tableName: 'sales',
  query: 'SELECT SUM(revenue) FROM sales',
  timestamp: '2025-01-20T00:00:00Z',
})

// Snapshot management
await handlers['ducklake.snapshots']({
  catalogName: 'analytics',
  tableName: 'sales',
  action: 'rollback',
  version: 41,
})
```

Security
Modes
- development (default): All queries allowed
- production: Destructive SQL triggers HITL elicitation
Set via MCP_SECURITY_MODE=production
HITL Elicitation
In production mode, destructive operations (DROP, DELETE, ALTER, TRUNCATE, INSERT, UPDATE, GRANT, REVOKE) trigger a confirmation prompt via the MCP SDK's elicitation API:
| Scenario                                   | Behavior                     |
| ------------------------------------------ | ---------------------------- |
| Client supports elicitation, user confirms | Query executes               |
| Client supports elicitation, user declines | Query blocked                |
| Client does not support elicitation        | Query blocked (safe default) |
| Elicitation times out or errors            | Query blocked                |
Configure timeout: MCP_ELICIT_TIMEOUT=30000 (ms, default 30s)
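As an illustration of the decision table above, a minimal sketch of the gating logic (simplified: the real server classifies full statements and routes confirmations through the MCP elicitation API, rather than checking only the leading keyword):

```typescript
// Keyword list mirrors the destructive operations named above.
const DESTRUCTIVE = ['DROP', 'DELETE', 'ALTER', 'TRUNCATE', 'INSERT', 'UPDATE', 'GRANT', 'REVOKE']

// Returns true when a query should trigger a HITL confirmation prompt.
function needsConfirmation(sql: string, mode: 'development' | 'production'): boolean {
  if (mode === 'development') return false // development: all queries allowed
  const firstWord = sql.trim().split(/\s+/)[0]?.toUpperCase() ?? ''
  return DESTRUCTIVE.includes(firstWord)
}
```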
Architecture
```text
src/
  duckdb/       # DuckDB service with pooling
  server/       # MCP server (32+ tools, HITL security)
  client/       # MCP client for federation
  federation/   # ResourceRegistry, ConnectionPool, QueryRouter
  filesystem/   # Virtual Filesystem (mcp:// URIs)
  protocol/     # Transport implementations (stdio, WS, TCP, HTTP)
  tools/        # Graph, process, data helper, DuckLake, MotherDuck tools
  context/      # Multi-tenant space isolation
  monitoring/   # Performance metrics
```

Configuration
| Variable | Default | Description |
| --------------------------- | ---------------- | -------------------------------------- |
| DUCKDB_MEMORY | 4GB | DuckDB memory limit |
| DUCKDB_THREADS | 4 | DuckDB thread count |
| MCP_SECURITY_MODE | development | development / production |
| MCP_ELICIT_TIMEOUT | 30000 | HITL elicitation timeout (ms) |
| MCP_MAX_QUERY_SIZE | 1000000 | Max SQL query size (chars) |
| MCP_CACHE_DIR | /tmp/mcp-cache | VFS cache directory |
| MCP_CACHE_TTL | 300000 | VFS cache TTL (ms) |
| ENABLE_DUCKPGQ | false | Enable DuckPGQ extension |
| ALLOW_UNSIGNED_EXTENSIONS | false | Required for DuckPGQ |
| PROCESS_EMBEDDING_DIM | 384 | Embedding dimension for process mining |
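Putting the table together, a sample `.env` (values that differ from the documented defaults, such as `MCP_SECURITY_MODE=production`, are illustrative choices):

```shell
# Core DuckDB settings
DUCKDB_MEMORY=4GB
DUCKDB_THREADS=4

# Security: switch to production to enable HITL confirmation
MCP_SECURITY_MODE=production
MCP_ELICIT_TIMEOUT=30000

# Virtual filesystem cache
MCP_CACHE_DIR=/tmp/mcp-cache
MCP_CACHE_TTL=300000

# DuckPGQ (both flags required together)
ENABLE_DUCKPGQ=true
ALLOW_UNSIGNED_EXTENSIONS=true

# Process mining
PROCESS_EMBEDDING_DIM=384
```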
Scripts
```shell
npm run dev:server        # Start MCP server
npm run inspector         # MCP Inspector UI
npm test                  # Run tests
npm run test:watch        # TDD mode
npm run check:all         # Typecheck + lint + format + tests
npm run lint:fix          # Auto-fix lint issues
npm run format            # Format code
npm run port:clean        # Fix port issues
npm run inspector:reset   # Reset stuck Inspector
```

Documentation
| Document | Description |
| ------------------------------------------------------ | ------------------------------------- |
| docs/CHANGELOG.md | Detailed changelog |
| docs/DEVELOPMENT.md | Development setup guide |
| docs/ARCHITECTURE.md | System architecture |
| docs/TRANSPORTS.md | Transport protocols + HITL flow |
| docs/FEDERATION_GUIDE.md | Federation documentation |
| docs/duckpgq/ | DuckPGQ findings and failure analysis |
| docs/roadmap/ | Roadmap and planning |
Requirements
- Node.js 20+
- TypeScript 5+
License
MIT
