@seed-ship/duckdb-mcp-native v0.11.1
Native Node.js/TypeScript implementation of DuckDB MCP extension
DuckDB MCP Native


Native TypeScript implementation of DuckDB MCP (Model Context Protocol) server with federation capabilities.

Status

🚀 Production Ready (v0.8.1+)

✅ Production-Ready Features

  • Core: Native TypeScript, no C++ dependencies
  • Transports: stdio ✅ | WebSocket ✅ | TCP ✅ | HTTP ⚠️
  • Federation: Distributed queries across multiple MCP servers ✨
  • Tools: 26 MCP tools, including the new federate_query
  • Virtual Tables: JSON/CSV/Parquet with auto-refresh
  • Virtual Filesystem: Direct SQL access via mcp:// URIs
  • Monitoring: Built-in performance metrics and slow query detection
  • Security: Enhanced SQL injection prevention, server authentication, path traversal protection
  • DuckPGQ Documentation: Comprehensive guides with developer insights, syntax validation, and migration examples

🚧 In Progress

  • HTTP transport initialization issues
  • MotherDuck cloud integration (waiting for MotherDuck to support DuckDB v1.4.0)
  • Test coverage improvement (current: ~75%, target: 80%)

Installation

As NPM Package

# Install from npm
npm install @seed-ship/duckdb-mcp-native

As MCP Server for Claude Desktop

  1. Install the package globally:
npm install -g @seed-ship/duckdb-mcp-native
  2. Configure Claude Desktop:

Edit your Claude configuration file:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json

Add the DuckDB MCP server:

{
  "mcpServers": {
    "duckdb": {
      "command": "npx",
      "args": ["@seed-ship/duckdb-mcp-native"],
      "env": {
        "DUCKDB_MEMORY": "4GB",
        "DUCKDB_THREADS": "4",
        "MCP_SECURITY_MODE": "development",
        "MCP_CACHE_DIR": "/tmp/mcp-cache",
        "MCP_CACHE_TTL": "300000",
        "MCP_CACHE_SIZE": "104857600"
      }
    }
  }
}
  3. Optional: Configure S3/MinIO for cloud storage:
{
  "mcpServers": {
    "duckdb": {
      "command": "npx",
      "args": ["@seed-ship/duckdb-mcp-native"],
      "env": {
        "DUCKDB_MEMORY": "4GB",
        "DUCKDB_THREADS": "4",
        "MCP_SECURITY_MODE": "development",
        "MINIO_PUBLIC_ENDPOINT": "https://s3.example.com",
        "MINIO_ACCESS_KEY": "your-access-key",
        "MINIO_SECRET_KEY": "your-secret-key",
        "MINIO_REGION": "us-east-1",
        "MINIO_USE_SSL": "true"
      }
    }
  }
}
  4. Restart Claude Desktop to load the MCP server.

For Development

git clone https://github.com/theseedship/duckdb_mcp_node
cd duckdb_mcp_node
npm install

# Start MCP server
npm run dev:server

# Test with Inspector
npm run inspector

# Run tests
npm test

MCP Tools (25 Available)

Database Operations

  • query_duckdb - Execute SQL queries
  • list_tables - List schema tables
  • describe_table - Get table structure
  • load_csv - Load CSV files
  • load_parquet - Load Parquet files
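
A minimal sketch of calling these database tools through the library-mode handlers described under "Three Usage Modes" below. The query_duckdb parameters (sql, limit) match the embedded-server example there; the empty-argument call to list_tables is an assumption:

// Sketch only: invoke the database tools via the /lib export.
import { nativeToolHandlers } from '@seed-ship/duckdb-mcp-native/lib'

const tables = await nativeToolHandlers['list_tables']({}) // assumed: no required args
const rows = await nativeToolHandlers['query_duckdb']({
  sql: 'SELECT 42 AS answer',
  limit: 10,
})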

Federation Operations

  • attach_mcp - Connect external MCP servers
  • detach_mcp - Disconnect servers
  • list_attached_servers - Show connections
  • list_mcp_resources - List available resources
  • create_virtual_table - Create from MCP resource
  • drop_virtual_table - Remove virtual table
  • list_virtual_tables - Show virtual tables
  • refresh_virtual_table - Update table data
  • query_hybrid - Query across local/remote data
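
A hedged sketch of the typical attach-then-materialize flow: the attach_mcp arguments match the federation example later in this README, while the create_virtual_table argument names are assumptions based on the SQL form CALL create_virtual_table('name', 'uri') shown in the Virtual Filesystem section:

// Sketch only: attach a server, then expose one of its resources as a table.
// handlers obtained as in the usage-mode examples below.
await handlers['attach_mcp']({
  connectionString: 'stdio://github-mcp-server',
  alias: 'github',
})

await handlers['create_virtual_table']({
  tableName: 'github_issues',              // assumed parameter name
  resourceUri: 'mcp://github/issues.json', // assumed parameter name
})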

DuckLake Operations (Advanced)

  • ducklake.attach - Attach or create DuckLake catalog for ACID transactions
  • ducklake.snapshots - List, view, clone or rollback table snapshots
  • ducklake.time_travel - Execute queries on historical data

MotherDuck Cloud Operations

⚠️ Note: MotherDuck currently supports only DuckDB v1.3.2, while this package runs on DuckDB v1.4.0, which MotherDuck does not yet support. These tools will become functional once MotherDuck adds support for DuckDB v1.4.0 stable.

  • motherduck.attach - Connect to MotherDuck cloud with token
  • motherduck.detach - Disconnect from MotherDuck
  • motherduck.status - Check connection status and usage
  • motherduck.list_databases - List available cloud databases
  • motherduck.create_database - Create new cloud database
  • motherduck.query - Execute queries on MotherDuck cloud
  • motherduck.share_table - Share local tables to cloud
  • motherduck.import_table - Import cloud tables to local

📊 DuckPGQ Property Graph Support (v0.7.2+)

SQL:2023 Property Graph queries - Limited support with automatic installation

This package supports DuckPGQ extension version 7705c5c from the community repository for basic graph analytics.

🎯 Quick Start

For DuckDB 1.4.x (Current):

# Enable DuckPGQ (installs automatically from community repository)
ENABLE_DUCKPGQ=true
ALLOW_UNSIGNED_EXTENSIONS=true
DUCKPGQ_SOURCE=community  # Default, no custom URL needed

For DuckDB 1.0-1.2.2 (full DuckPGQ support):

DUCKPGQ_SOURCE=community  # Complete feature set available

📋 Real Compatibility Matrix

| DuckDB Version | DuckPGQ Version | Fixed Paths | Bounded {n,m} | ANY SHORTEST | Kleene (alone) | Status |
| -------------- | --------------- | ----------- | ------------- | ------------ | -------------- | ------ |
| 1.0.0 - 1.2.2 | Stable | ✅ | ✅ | ✅ | ✅ | Production Ready |
| 1.4.1 (current) | 7705c5c | ✅ | ✅ | ✅ | ❌ (safety block) | Functional |
| 1.5.x+ | TBD | ? | ? | ? | ? | Planned |

⚠️ IMPORTANT: DuckPGQ 7705c5c Capabilities (Validated via Automated Testing)

Version 7705c5c (current for DuckDB 1.4.x) - Tested 2025-10-20

What Works:

  • Property graph creation (VERTEX/EDGE TABLES)
  • Basic pattern matching with GRAPH_TABLE
  • Fixed-length paths (explicit hops: 1-hop, 2-hop, 3-hop, etc.)
  • ANY SHORTEST path queries with ->* syntax (star AFTER arrow)
  • Bounded quantifiers {n,m} with ->{n,m} syntax (quantifier AFTER arrow)
  • Direct relationship queries (edge variable required: [e:Label])

⚠️ Design Decisions (Not Bugs):

These are intentional safety features and planned roadmap items:

  1. 🛡️ Safety Feature: Standalone Kleene operators (->*, ->+) blocked without ANY SHORTEST

    • Why: Prevents potentially infinite results on cyclic graphs
    • Error: "ALL unbounded with path mode WALK is not possible"
    • Workaround: Use ANY SHORTEST or bounded quantifiers ->{n,m}
    • Developer insight: System protection against runaway queries
  2. 🚧 Roadmap Item: Edge variable names required (temporary)

    • Current: Must use [e:Label], not [:Label]
    • Future: Anonymous edge syntax will be supported
    • Reason: Internal query translation needs named references
  3. 🚧 Roadmap Item: Explicit label binding sometimes required

    • Status: Label inference implemented but not yet exposed
    • Workaround: Always specify labels explicitly

See detailed findings: DUCKPGQ_FINDINGS.md. To understand the "why" behind each design decision, see DUCKPGQ_FAILURE_ANALYSIS.md (18 test cases + developer commentary).

⚙️ Configuration Options

Minimal configuration (add to .env):

# Required for DuckPGQ 7705c5c
ENABLE_DUCKPGQ=true
ALLOW_UNSIGNED_EXTENSIONS=true

# Optional: Graceful degradation (default)
DUCKPGQ_STRICT_MODE=false  # Continue without graph if load fails

Full configuration options:

DUCKPGQ_SOURCE=community     # community, edge, custom
DUCKPGQ_CUSTOM_REPO=         # URL for custom builds
DUCKPGQ_VERSION=             # Specific version (optional)

See full documentation: docs/DUCKPGQ_INTEGRATION.md

📝 Example Usage (Validated Syntax for 7705c5c)

-- Create property graph from existing tables
CREATE PROPERTY GRAPH social_network
  VERTEX TABLES (Person)
  EDGE TABLES (
    Knows
      SOURCE KEY (from_id) REFERENCES Person (id)
      DESTINATION KEY (to_id) REFERENCES Person (id)
  );

-- ✅ WORKS: Direct connections (1-hop) - edge variable required
FROM GRAPH_TABLE (social_network
  MATCH (p1:Person)-[k:Knows]->(p2:Person)
  COLUMNS (p1.name AS person, p2.name AS friend)
);

-- ✅ WORKS: Friends of friends (fixed 2-hop)
FROM GRAPH_TABLE (social_network
  MATCH (p1:Person)-[k1:Knows]->(p2:Person)-[k2:Knows]->(p3:Person)
  WHERE p1.id != p3.id
  COLUMNS (p1.name AS person, p3.name AS friend_of_friend)
);

-- ✅ WORKS: ANY SHORTEST path query (note: ->* AFTER arrow, WITH path variable)
FROM GRAPH_TABLE (social_network
  MATCH p = ANY SHORTEST (start:Person WHERE start.id = 1)-[k:Knows]->*(end:Person WHERE end.id = 10)
  COLUMNS (start.name AS from_person, end.name AS to_person, path_length(p) AS hops)
);

-- ✅ WORKS: Bounded quantifiers (1 to 3 hops, note: ->{n,m} AFTER arrow)
FROM GRAPH_TABLE (social_network
  MATCH (p1:Person)-[k:Knows]->{1,3}(p2:Person)
  COLUMNS (p1.name AS person, p2.name AS connection)
);

-- ⚠️ SAFETY FEATURE (by design): Standalone Kleene operators blocked
-- FROM GRAPH_TABLE (social_network
--   MATCH (p1:Person)-[k:Knows]->+(p2:Person)
--   -- Error: "ALL unbounded with path mode WALK is not possible"
--   -- Why blocked: Prevents infinite results on cyclic graphs (safety feature)
--   -- Workaround: Use ANY SHORTEST or bounded quantifiers instead
-- );
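
Within this package, DuckPGQ queries run through the standard query_duckdb tool; a brief sketch, assuming DuckPGQ was loaded via ENABLE_DUCKPGQ=true and the social_network graph above exists:

// Sketch only: run a validated DuckPGQ pattern via query_duckdb.
import { nativeToolHandlers } from '@seed-ship/duckdb-mcp-native/lib'

const friends = await nativeToolHandlers['query_duckdb']({
  sql: `
    FROM GRAPH_TABLE (social_network
      MATCH (p1:Person)-[k:Knows]->(p2:Person)
      COLUMNS (p1.name AS person, p2.name AS friend)
    )
  `,
  limit: 100,
})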

Migration notes:

🚨 Current Status (DuckDB 1.4.x)

Tested Configuration (Validated 2025-10-20 with developer insights):

  • ✅ DuckPGQ 7705c5c available from community repository
  • ✅ Installation automatic with ALLOW_UNSIGNED_EXTENSIONS=true
  • ✅ Property graphs and fixed-length queries work
  • ✅ ANY SHORTEST path queries work (with ->* syntax)
  • ✅ Bounded quantifiers work (with ->{n,m} syntax)
  • 🛡️ Standalone Kleene operators blocked (safety feature - prevents infinite results)

Understanding "Limitations":

  • Safety features: Intentional blocks to prevent runaway queries (e.g., ALL unbounded)
  • Roadmap items: Planned features not yet implemented (e.g., anonymous edges, path modes)
  • What works today: See compatibility matrix above

Test Suites:

  • npm run test:duckpgq:syntax - Validate working features (13 tests)
  • npm run test:duckpgq:failures - Understand design decisions (18 tests)

Migration Path: When full DuckPGQ support arrives for DuckDB 1.4.x/1.5.x, your configuration will automatically upgrade. See MIGRATION_GUIDE.md for migration examples.


🔄 Process Mining Tools (v0.9.4+)

NEW: Enhanced process mining capabilities with embeddings standardization and robust composition

🎯 Overview

Three specialized tools for analyzing workflow processes stored in Parquet files:

  • process.describe - List and filter process summaries with confidence scores
  • process.similar - Find similar processes using vector embeddings (FLOAT[384])
  • process.compose - Merge multiple processes into unified workflow with QA checks
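
For orientation, a sketch of calling process.describe; aside from parquet_url (which mirrors the process.similar example below), the filter parameter shown is hypothetical:

// Sketch only: list process summaries. min_confidence is a hypothetical
// filter parameter, not a documented one. handlers obtained as in the
// usage-mode examples below.
const summaries = await handlers['process.describe']({
  parquet_url: 's3://bucket/process_summary.parquet',
  min_confidence: 0.8, // hypothetical
})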

✨ What's New in v0.9.4

P2.8: Embeddings Standardization

Problem Solved: Inconsistent embedding dimensions and no fallback when vector search fails

Features:

  • Dimension Validation: Validates embedding dimensions (configurable via PROCESS_EMBEDDING_DIM, defaults to 384)
  • Fallback Mechanism: Automatic TypeScript L2 distance when DuckDB VSS unavailable
  • Transparency: Results include distance_source field (duckdb_vss or typescript_l2)

Example:

// Search for similar processes
const results = await handlers['process.similar']({
  signature_emb: [0.1, 0.2, ...], // 384-dimensional embedding
  k: 5,
  parquet_url: 's3://bucket/signatures.parquet'
})

// Results include distance_source for observability
// {
//   matches: [
//     { doc_id: 'doc1', distance: 0.45, distance_source: 'duckdb_vss' }
//   ]
// }

Benefits:

  • 🛡️ Type Safety: Prevents dimension mismatch errors at runtime
  • 🔄 Reliability: Graceful degradation when VSS extension unavailable
  • 📊 Observability: Know which distance computation method was used
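
For intuition, the TypeScript fallback amounts to a dimension check plus plain Euclidean (L2) distance; a minimal illustrative sketch (not the package's internal code):

// Illustrative only: dimension validation + L2 distance, the math behind
// the typescript_l2 fallback path.
const EXPECTED_DIM = Number(process.env.PROCESS_EMBEDDING_DIM ?? 384)

function l2Distance(a: number[], b: number[]): number {
  if (a.length !== EXPECTED_DIM || b.length !== EXPECTED_DIM) {
    throw new Error(`Expected ${EXPECTED_DIM}-dimensional embeddings`)
  }
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i]
    sum += d * d
  }
  return Math.sqrt(sum)
}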

P2.9: Process Composition Robustification

Problem Solved: Simple deduplication without conflict resolution or quality checks

Features:

  • Step Normalization: Lowercase + trim step keys (handles "Login" vs "login")
  • Conflict Resolution: Uses median order when multiple processes have same step
  • Edge Remapping: Updates edge references after step deduplication
  • QA Checks: Detects orphan steps, cycles, and duplicate edges

Example:

// Compose workflows from multiple documents
const composed = await handlers['process.compose']({
  doc_ids: ['doc1', 'doc2', 'doc3'],
  steps_url: 's3://bucket/steps.parquet',
  edges_url: 's3://bucket/edges.parquet',
})

// Result includes QA report
// {
//   success: true,
//   steps: [...],           // Deduplicated and normalized steps
//   edges: [...],           // Remapped edges
//   merged_count: 5,        // Number of steps merged
//   qa: {
//     orphan_steps: [],     // Steps with no edges
//     cycles: [],           // Detected cycles
//     duplicate_edges: [],  // Duplicate connections
//     warnings: []          // Human-readable issues
//   }
// }

Benefits:

  • 🔀 Smart Merging: Handles variations in step naming across documents
  • ⚖️ Fair Ordering: Median order prevents bias from any single process
  • 🔗 Edge Consistency: Automatically updates all edge references after merging
  • 🔍 Quality Assurance: Immediate feedback on workflow structural issues
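
For intuition, a sketch of the normalization and median-order ideas (illustrative only, not the package's internal code):

// Illustrative only: collapse step-key variants, then resolve order
// conflicts with the median across processes.
const normalizeKey = (key: string): string => key.trim().toLowerCase()

function medianOrder(orders: number[]): number {
  const sorted = [...orders].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
}

console.log(normalizeKey('Login') === normalizeKey('login ')) // true -> merged
console.log(medianOrder([1, 3, 2]))                           // 2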

📋 Use Cases

1. Process Discovery: Find similar workflows across document corpus

// Generate signature embedding for a new process
const signature = await embedWorkflow(process)

// Find top 10 similar processes
const similar = await handlers['process.similar']({
  signature_emb: signature,
  k: 10,
})

2. Workflow Consolidation: Merge related processes into master workflow

// Compose processes from multiple sources
const master = await handlers['process.compose']({
  doc_ids: ['onboarding_v1', 'onboarding_v2', 'onboarding_v3'],
})

// Check for quality issues
if (master.qa.warnings.length > 0) {
  console.warn('Workflow issues detected:', master.qa.warnings)
}

3. Process Validation: Verify workflow structural integrity

// Compose and validate
const workflow = await handlers['process.compose']({ doc_ids: [...] })

// QA checks automatically run
console.log('Orphan steps:', workflow.qa.orphan_steps)
console.log('Cycles detected:', workflow.qa.cycles)
console.log('Duplicate edges:', workflow.qa.duplicate_edges)

🔧 Configuration

Set Parquet URLs via environment variables:

# Process mining Parquet locations
PROCESS_SUMMARY_URL=s3://bucket/process_summary.parquet
PROCESS_STEPS_URL=s3://bucket/process_steps.parquet
PROCESS_EDGES_URL=s3://bucket/process_edges.parquet
PROCESS_SIGNATURE_URL=s3://bucket/process_signatures.parquet

# Embedding dimension (defaults to 384)
# Set to match your embedding model (e.g., 1024 for text-embedding-3-large)
PROCESS_EMBEDDING_DIM=1024

# Enable DuckDB VSS for fast similarity (optional)
# Falls back to TypeScript L2 distance if unavailable

📊 Data Schema

Process Steps (configurable embedding dimension):

CREATE TABLE process_steps (
  doc_id VARCHAR,
  process_id VARCHAR,
  step_id VARCHAR,
  "order" INTEGER,       -- quoted: ORDER is a reserved SQL keyword
  step_key VARCHAR,      -- Normalized to lowercase+trim in v0.9.4
  label VARCHAR,
  evidence VARCHAR,
  embedding FLOAT[N]     -- Dimension validated via PROCESS_EMBEDDING_DIM (default: 384)
)

Process Edges:

CREATE TABLE process_edges (
  doc_id VARCHAR,
  process_id VARCHAR,
  from_step_id VARCHAR,  -- Automatically remapped in v0.9.4
  to_step_id VARCHAR,    -- Automatically remapped in v0.9.4
  relation VARCHAR,
  evidence VARCHAR
)

🚀 Migration from v0.9.3 → v0.9.4

Breaking Changes: None

New Features (backwards compatible):

  • process.similar now validates embedding dimensions (configurable via PROCESS_EMBEDDING_DIM)
  • process.similar returns distance_source field
  • process.compose returns qa field with quality report

Recommended Actions:

  1. ✅ Set PROCESS_EMBEDDING_DIM to match your embedding model (e.g., 1024 for text-embedding-3-large)
  2. ✅ Review qa.warnings after composition to identify workflow issues
  3. ✅ Monitor distance_source to track VSS availability
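
For example, distance_source can be tallied from results to track VSS availability (grounded in the result shape documented above):

// Count which distance backend served each match; a growing typescript_l2
// share suggests the DuckDB VSS extension is unavailable.
const embedding = new Array(384).fill(0) // replace with a real signature embedding
const results = await handlers['process.similar']({ signature_emb: embedding, k: 10 })

const bySource: Record<string, number> = {}
for (const m of results.matches) {
  bySource[m.distance_source] = (bySource[m.distance_source] ?? 0) + 1
}
console.log(bySource) // e.g. { duckdb_vss: 10 }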

🤖 Mastra AI Integration ⚠️ EXPERIMENTAL

Status: Phase 0 - Preparation (November 2025) | Stability: EXPERIMENTAL - API may change

Note: This is a generic adapter only. It provides tool conversion for Mastra agents but does NOT include pre-built agents or business logic. Community contributions welcome for Phase 1 implementation!

Overview

Mastra AI Framework integration provides an adapter to convert DuckDB tools to Mastra-compatible format, enabling AI agents powered by DuckDB's analytical capabilities.

Why Mastra?

  • TypeScript-Native: Seamless integration with existing codebase
  • MCP First-Class Support: @mastra/mcp provides bidirectional MCP integration
  • Production-Ready: Batteries included (workflows, HITL, observability, state management)
  • Rapid Growth: Y Combinator W25 backed, 7.5K+ GitHub stars

Planned Capabilities

Phase 1 (December 2025 - January 2026): Proof-of-Concept

  • Convert DuckDB MCP tools to Mastra-compatible format
  • Example SQL Analytics Agent (natural language → SQL)
  • Basic agent examples and validation

Phase 2 (February - April 2026): Production-Ready

  • Full MCPServer integration for external Mastra agents
  • Process mining agents (workflow discovery, similarity search, composition)
  • Multi-agent orchestration patterns

Phase 3 (May - September 2026): Advanced Features

  • Small Language Models (SLM) for context-aware NL-to-SQL
  • Human-in-the-Loop (HITL) workflows with suspend/resume
  • Vector store integration for semantic search

Use Case Preview

// Example: SQL Analytics Agent (Phase 1)
import { Agent } from '@mastra/core'
import { convertToMastraTools } from '@seed-ship/duckdb-mcp-native/mastra'

const analyticsAgent = new Agent({
  name: 'Business Intelligence Agent',
  tools: convertToMastraTools({ duckdb: myDuckDBInstance }),
  instructions: `You are a SQL expert. Convert natural language questions into SQL queries...`,
  model: { provider: 'ANTHROPIC', name: 'claude-3-5-sonnet' },
})

await analyticsAgent.generate({
  messages: [{ role: 'user', content: 'What were our top selling products last quarter?' }],
})

Current Status

Phase 0 (November 2025) - Infrastructure preparation:

  • ✅ Export path /mastra in package.json
  • ✅ Adapter skeleton src/adapters/mastra-adapter.ts
  • ✅ Comprehensive integration roadmap

Next Steps: Phase 1 implementation begins December 2025

📚 Complete Roadmap: docs/MASTRA_INTEGRATION.md

💡 Community Interest: Mastra integration contributions welcome! See roadmap for details.


🎯 Three Usage Modes

This package supports three distinct usage modes to fit different integration scenarios:

Mode 1: Standalone Server

Run DuckDB MCP as an independent server that other applications can connect to.

# Configure in .env
DUCKDB_MEMORY=4GB
DUCKDB_THREADS=4
MCP_SECURITY_MODE=development

# Start server
npm run dev:server
# Or with npx
npx @seed-ship/duckdb-mcp-native

Use case: When you need a dedicated DuckDB service that multiple clients can connect to.

Mode 2: Library Mode (/lib)

Import tool handlers directly into your existing MCP server without any auto-initialization.

// Import from /lib for clean library mode (v0.3.0+)
import { nativeToolHandlers, nativeToolDefinitions } from '@seed-ship/duckdb-mcp-native/lib'
import { ListToolsRequestSchema, CallToolRequestSchema } from '@modelcontextprotocol/sdk/types.js'

// Register tools in your MCP server
server.setRequestHandler(ListToolsRequestSchema, async () => {
  return {
    tools: [...yourTools, ...nativeToolDefinitions],
  }
})

// Handle tool calls
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params

  if (name in nativeToolHandlers) {
    const handler = nativeToolHandlers[name]
    return await handler(args)
  }
  // ... handle your other tools
})

Use case: Adding DuckDB capabilities to an existing MCP server (like deposium_MCPs).

Mode 3: Embedded Server

Create a server instance with full control over its lifecycle and configuration.

import { createEmbeddedServer } from '@seed-ship/duckdb-mcp-native/lib'

// Create embedded server with custom config
const duckdbServer = createEmbeddedServer({
  embeddedMode: true, // Prevents stdio transport initialization
  duckdbService: yourDuckDBInstance, // Optional: use your own DuckDB instance
})

// Start when ready
await duckdbServer.start()

// Get handlers bound to this instance
const handlers = duckdbServer.getNativeHandlers()

// Use in your application
const result = await handlers.query_duckdb({
  sql: 'SELECT * FROM users',
  limit: 100,
})

Use case: Advanced integration scenarios where you need full control over the server lifecycle.

Quick Integration Example

For most integrations, library mode is the simplest approach:

// In your existing MCP server (e.g., deposium_MCPs)
import { nativeToolHandlers, nativeToolDefinitions } from '@seed-ship/duckdb-mcp-native/lib'

// That's it! Tools are now available
console.log('Available DuckDB tools:', Object.keys(nativeToolHandlers))
// Output: ['query_duckdb', 'list_tables', 'describe_table', 'load_csv', 'load_parquet', 'export_data']

Federation Example (v0.6.5) ✨

Federation enables distributed queries across multiple MCP servers using the mcp:// protocol:

-- Query GitHub issues directly
SELECT * FROM 'mcp://github/issues.json' WHERE status = 'open'

-- Join local users with remote GitHub data
SELECT u.name, COUNT(g.id) as issue_count
FROM local_users u
JOIN 'mcp://github/issues.json' g ON u.github_username = g.assignee
GROUP BY u.name

-- Aggregate across multiple sources
WITH all_issues AS (
  SELECT 'github' as source, * FROM 'mcp://github/issues.json'
  UNION ALL
  SELECT 'jira' as source, * FROM 'mcp://jira/tickets.json'
)
SELECT source, COUNT(*) as total_issues FROM all_issues GROUP BY source

Using Federation Tool

// Use the new federate_query tool
const result = await handlers['federate_query']({
  sql: `
    SELECT
      g.title as issue,
      s.message as last_commit
    FROM 'mcp://github/issues.json' g
    JOIN 'mcp://slack/messages.json' s ON g.id = s.issue_id
  `,
  explain: false, // Set to true to see query plan
})

// Automatic server registration with attach_mcp
await handlers['attach_mcp']({
  connectionString: 'stdio://github-mcp-server',
  alias: 'github',
})
// Server is now automatically registered with federation!

DuckLake Example (Advanced)

DuckLake provides ACID transactions and time travel capabilities on top of Parquet files:

// Attach a DuckLake catalog
await handlers['ducklake.attach']({
  catalogName: 'analytics',
  catalogLocation: 's3://data-lake/analytics',
  format: 'DELTA',
  enableTimeTravel: true,
  retentionDays: 30,
  compressionType: 'ZSTD',
})

// List table snapshots
const snapshots = await handlers['ducklake.snapshots']({
  catalogName: 'analytics',
  tableName: 'sales',
  action: 'list',
})

// Time travel query - query data as it was yesterday
const historicalData = await handlers['ducklake.time_travel']({
  catalogName: 'analytics',
  tableName: 'sales',
  query: 'SELECT SUM(revenue) as total FROM sales',
  timestamp: '2025-01-20T00:00:00Z',
  limit: 100,
})

// Clone a table at specific version
await handlers['ducklake.snapshots']({
  catalogName: 'analytics',
  tableName: 'sales',
  action: 'clone',
  version: 42,
  targetTableName: 'sales_backup_v42',
})

// Rollback to previous version
await handlers['ducklake.snapshots']({
  catalogName: 'analytics',
  tableName: 'sales',
  action: 'rollback',
  version: 41,
})

DuckLake Features:

  • ACID Transactions: Ensures data consistency across operations
  • Time Travel: Query historical data at any point in time
  • Snapshots: Version control for your data tables
  • Multi-tenant Isolation: Space-aware catalogs for tenant separation
  • Format Support: Delta Lake and Apache Iceberg formats
  • Migration Utilities: Convert existing Parquet/CSV files to DuckLake

MotherDuck Cloud Example

MotherDuck enables hybrid execution with cloud storage and compute:

// Connect to MotherDuck cloud
await handlers['motherduck.attach']({
  token: process.env.MOTHERDUCK_TOKEN,
  database: 'production',
  endpoint: 'app.motherduck.com', // Optional, defaults to main endpoint
})

// List cloud databases
const databases = await handlers['motherduck.list_databases']()

// Share local table to cloud
await handlers['motherduck.share_table']({
  localTable: 'local_sales',
  cloudTable: 'cloud_sales', // Optional, uses local name if not specified
})

// Query cloud data
const cloudResults = await handlers['motherduck.query']({
  sql: "SELECT * FROM cloud_sales WHERE region = 'US'",
  limit: 1000,
})

// Import cloud table to local
await handlers['motherduck.import_table']({
  cloudTable: 'cloud_analytics',
  localTable: 'local_analytics',
})

// Check connection status
const status = await handlers['motherduck.status']()
console.log(`Connected: ${status.connected}, Storage: ${status.bytesUsed}/${status.bytesLimit}`)

// Disconnect when done
await handlers['motherduck.detach']()

MotherDuck Features:

  • Hybrid Execution: Seamlessly query local and cloud data
  • Cloud Storage: Persistent storage in MotherDuck cloud
  • Collaborative: Share tables across team members
  • Zero-Copy Cloning: Efficient table copies in cloud
  • Automatic Scaling: Cloud compute scales with workload

🚀 Virtual Filesystem (v0.6.0)

Query MCP resources directly in SQL with zero configuration:

Environment Variables

  • MCP_CACHE_DIR: Directory for caching MCP resources (default: /tmp/mcp-cache)
  • MCP_CACHE_TTL: Cache time-to-live in milliseconds (default: 300000 - 5 minutes)
  • MCP_CACHE_SIZE: Maximum cache size in bytes (default: 104857600 - 100MB)

Before (Complex)

-- Required 3 steps:
CALL attach_mcp('stdio://weather-server', 'weather');
CALL create_virtual_table('weather_data', 'weather://forecast');
SELECT * FROM weather_data;

Now (Simple)

-- Direct access with mcp:// URIs:
SELECT * FROM 'mcp://weather-server/forecast.csv';

Features

  • Zero Setup: No manual connection or table creation needed
  • Auto-Detection: Automatically detects CSV, JSON, Parquet, Arrow, Excel formats
  • Glob Patterns: Query multiple resources with wildcards
  • Caching: Intelligent local caching for performance
  • Federation: Join data across multiple MCP servers

Examples

-- Query specific resource
SELECT * FROM 'mcp://github-server/issues.json'
WHERE status = 'open';

-- Join across servers
SELECT g.title, j.priority
FROM 'mcp://github/issues.json' g
JOIN 'mcp://jira/tickets.json' j ON g.id = j.github_id;

-- Glob patterns for multiple files
SELECT COUNT(*) as error_count
FROM 'mcp://*/logs/2024-*.csv'
WHERE level = 'ERROR';

-- Automatic format detection
SELECT * FROM 'mcp://data/users.parquet';  -- Parquet
SELECT * FROM 'mcp://api/response.json';    -- JSON
SELECT * FROM 'mcp://reports/sales.csv';    -- CSV

Configuration

Enable Virtual Filesystem in your DuckDB service:

const duckdb = new DuckDBService({
  virtualFilesystem: {
    enabled: true,
    config: {
      cacheDir: '/tmp/mcp-cache',
      defaultTTL: 300000, // 5 minutes
    },
  },
})

// Use VFS-aware query execution
const results = await duckdb.executeQueryWithVFS("SELECT * FROM 'mcp://server/data.csv'")

Architecture

src/
├── duckdb/          # DuckDB service
├── server/          # MCP server implementation
├── client/          # MCP client for federation
├── federation/      # Federation components
│   ├── ResourceRegistry.ts
│   ├── ConnectionPool.ts
│   └── QueryRouter.ts
├── filesystem/      # Virtual Filesystem (v0.6.0)
│   ├── VirtualFilesystem.ts
│   ├── URIParser.ts
│   ├── CacheManager.ts
│   ├── FormatDetector.ts
│   └── QueryPreprocessor.ts
└── protocol/        # Transport implementations

Development

See DEVELOPMENT.md for:

  • Testing guide
  • Troubleshooting
  • Port management
  • Debug logging

Security Modes

  • development: All queries allowed (default)
  • production: Blocks DROP/TRUNCATE, enforces limits

Set via: MCP_SECURITY_MODE=production
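
A small sketch of switching modes from code; whether the embedded server reads this variable at start time is an assumption, so prefer setting it in .env or the Claude Desktop config as shown earlier:

// Assumption: the server consults MCP_SECURITY_MODE from the environment.
import { createEmbeddedServer } from '@seed-ship/duckdb-mcp-native/lib'

process.env.MCP_SECURITY_MODE = 'production' // blocks DROP/TRUNCATE, enforces limits
const server = createEmbeddedServer({ embeddedMode: true })
await server.start()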

Scripts

Development Scripts

npm run dev:server      # Start MCP server
npm run inspector       # MCP Inspector UI
npm test               # Run tests
npm run lint:fix       # Fix linting
npm run build          # Compile TypeScript

Port Management

# Inspector specific
npm run inspector:clean   # Kill Inspector processes on ports 6274/6277
npm run inspector:restart # Clean ports and restart Inspector
npm run inspector:reset   # Force kill and restart (alternative method)

# General port management
npm run port:clean       # Clear stuck ports (5432, 3000, 8080)
npm run port:status     # Check port usage status
npm run port:kill-all   # Force kill all managed ports

Common Issues & Solutions

Inspector Port Blocked

If you see ❌ Proxy Server PORT IS IN USE at port 6277 ❌:

# Quick fix - clean and restart
npm run inspector:restart

# Alternative if the above doesn't work
npm run inspector:reset

# Manual cleanup if needed
npm run inspector:clean
npm run inspector

Requirements

  • Node.js 18+
  • TypeScript 5+

License

MIT