dataform-docs-mcp
v0.2.2
MCP server for Claude Code providing access to Dataform documentation through semantic search, concept explanations, and best practices
Dataform Documentation MCP Server
A Model Context Protocol (MCP) server for Claude Code (VS Code extension) that provides access to comprehensive Dataform documentation. Get instant answers about Dataform's features, syntax, and best practices while coding.
✨ New in v0.2.0: Pre-built documentation database included! Install via npm and start using immediately - no ingestion wait required.
Features
- Semantic Search: Search Dataform documentation using natural language queries
- Concept Retrieval: Get detailed explanations of Dataform concepts with examples
- Syntax Reference: Look up syntax for specific Dataform constructs
- Error Solutions: Find solutions for common Dataform errors
- Best Practices: Retrieve curated best practices for various use cases
Installation
From npm (Recommended)
The easiest way to install with a pre-built documentation database:
npm install -g dataform-docs-mcp
Includes: Pre-built database with all Dataform documentation (~5.5MB download, 12.6MB unpacked)
From Source (Development)
For contributors or those who want to build from source:
# Clone the repository
git clone https://github.com/mhooson/dataform-mcp.git
cd dataform-mcp
# Install dependencies
npm install
# Build the project
npm run build
# Run initial documentation ingestion
npm run ingest
Quick Start
Prerequisites
Before starting, ensure you have:
- Node.js 18+ installed
- ChromaDB installed:
pip install chromadb
- ~13MB disk space for the documentation database
- An OpenAI API key, required for query embeddings
  - Used to convert search queries to embeddings (~$0.0001 per query)
  - Also needed for updating the documentation database
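For a rough sense of what the per-query figure above adds up to, the arithmetic can be sketched as follows. The helper name and the query volume are illustrative assumptions; only the ~$0.0001 per-query estimate comes from this README.

```typescript
// Approximate per-query embedding cost quoted in this README (USD).
const COST_PER_QUERY_USD = 0.0001;

// Hypothetical helper: estimate spend for a given number of searches.
function estimateEmbeddingCost(
  queries: number,
  costPerQuery: number = COST_PER_QUERY_USD,
): number {
  if (!Number.isFinite(queries) || queries < 0) {
    throw new RangeError("queries must be a non-negative number");
  }
  return queries * costPerQuery;
}

// Assumed workload: 50 searches a day for 30 days, i.e. 1,500 queries.
const monthly = estimateEmbeddingCost(50 * 30);
console.log(`~$${monthly.toFixed(2)} per month`);
```

At that assumed volume the estimate comes to roughly $0.15 per month.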
Quick Start (npm installation)
The package includes a pre-built database with all Dataform documentation, so you can start using it immediately!
Step 1: Install the package
npm install -g dataform-docs-mcp
This installs the package, including the pre-built documentation database (~5.5MB download, 12.6MB unpacked).
Step 2: Start ChromaDB server
# The database is included in the npm package installation
# Find the package location:
npm root -g # Example: /usr/local/lib/node_modules
# Start ChromaDB pointing to the installed database
chroma run --path "$(npm root -g)/dataform-docs-mcp/data/vectordb" --port 8000
Keep this terminal running.
Step 3: Start the MCP server
dataform-mcp serve
That's it! The server is ready to use with the pre-built database.
Optional: Update the documentation later
# Only needed if you want the latest Dataform docs
export OPENAI_API_KEY=your-api-key-here
dataform-mcp update
The update command refreshes the database with the latest documentation from Google Cloud.
Step 4: Configure your IDE (see "Configure Claude Code (VS Code Extension)" below)
Alternative: Source Installation Quick Start
For those installing from source (see Installation section above):
1. Start ChromaDB Server
The MCP server requires ChromaDB to be running:
chroma run --path ./data/vectordb --port 8000
Keep this running in a separate terminal.
Note: ChromaDB must be running on port 8000 (default) for the MCP server to connect.
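Before starting the MCP server, it can help to confirm that ChromaDB is actually reachable on port 8000. The sketch below is not part of this package; the function names are hypothetical, and the heartbeat path is an assumption that depends on the ChromaDB version (recent releases serve /api/v2/heartbeat, older ones /api/v1/heartbeat).

```typescript
// Minimal shape of a fetch-like function, so the sketch has no dependencies.
type HttpGet = (url: string) => Promise<{ ok: boolean }>;

// Build the heartbeat URL. The default path is an assumption; pass
// "/api/v1/heartbeat" for older ChromaDB servers.
function heartbeatUrl(
  host: string = "localhost",
  port: number = 8000,
  apiPath: string = "/api/v2/heartbeat",
): string {
  return `http://${host}:${port}${apiPath}`;
}

// Resolve to true if the server answers with a 2xx status, false otherwise.
async function isChromaUp(
  get: HttpGet,
  url: string = heartbeatUrl(),
): Promise<boolean> {
  try {
    const res = await get(url);
    return res.ok;
  } catch {
    return false; // connection refused, DNS failure, and similar errors
  }
}
```

On Node.js 18+, the built-in fetch can be passed as the get argument, for example isChromaUp(fetch).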
2. Configure Claude Code (VS Code Extension)
For npm installation:
Create a .mcp.json file in your project root:
{
  "mcpServers": {
    "dataform-docs": {
      "command": "dataform-mcp",
      "args": ["serve"],
      "env": {
        "OPENAI_API_KEY": "your-api-key-here"
      }
    }
  }
}
Or use the CLI command:
claude mcp add --transport stdio dataform-docs \
  --env OPENAI_API_KEY=your-api-key-here \
  -- dataform-mcp serve
For development (from source):
{
  "mcpServers": {
    "dataform-docs": {
      "command": "node",
      "args": ["/absolute/path/to/dataform-mcp/dist/src/index.js"],
      "env": {
        "OPENAI_API_KEY": "your-api-key-here"
      }
    }
  }
}
Scope Options:
- --scope local (default): Private to current project
- --scope project: Shared with team via version control
- --scope user: Available across all your projects
Why OpenAI API key is required: The server needs to convert your search queries into embeddings to search the vector database. This incurs minimal cost (~$0.0001 per query).
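To illustrate why a query embedding is needed, here is a toy version of embedding-based ranking. This is only a sketch: the real search is delegated to ChromaDB, whose distance metric depends on how the collection is configured, so cosine similarity here is an assumption. The type and function names are hypothetical; the default limit of 5 and threshold of 0.7 mirror the values in config.json.

```typescript
type EmbeddedDoc = { id: string; embedding: number[] };

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors match nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every document against the query embedding, drop weak matches,
// and return the best ones first.
function rankBySimilarity(
  query: number[],
  docs: EmbeddedDoc[],
  maxResults: number = 5,
  threshold: number = 0.7,
): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.embedding) }))
    .filter((r) => r.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, maxResults);
}
```

The query text has to be embedded with the same model as the stored documents (text-embedding-3-small in this project), which is why the API key is needed even when using the pre-built database.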
3. Start the MCP Server
For development (from source):
npm run dev
Or for production (after global install):
dataform-mcp serve
Available Tools
1. search_dataform_docs
Search Dataform documentation semantically.
Parameters:
- query (string, required): The search query
- max_results (number, optional): Maximum results to return (default: 5)
- doc_type (array, optional): Filter by type: ["guide", "reference", "tutorial", "api"]
- include_code (boolean, optional): Prioritize code examples (default: false)
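The parameters above could be modeled in TypeScript roughly as follows. The type and helper names are hypothetical and are not taken from the package source; treating an empty doc_type array as "no filter" is also an assumption.

```typescript
type DocType = "guide" | "reference" | "tutorial" | "api";

// Arguments as a caller would send them.
interface SearchDocsArgs {
  query: string;
  max_results?: number;
  doc_type?: DocType[];
  include_code?: boolean;
}

// Arguments after the documented defaults have been applied.
interface ResolvedSearchDocsArgs {
  query: string;
  max_results: number;
  doc_type: DocType[]; // empty array is assumed to mean "no filter"
  include_code: boolean;
}

// Hypothetical helper: validate the query and fill in defaults.
function resolveSearchArgs(args: SearchDocsArgs): ResolvedSearchDocsArgs {
  if (args.query.trim() === "") {
    throw new Error("query must not be empty");
  }
  return {
    query: args.query,
    max_results: args.max_results ?? 5,
    doc_type: args.doc_type ?? [],
    include_code: args.include_code ?? false,
  };
}
```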
Example:
{
  "query": "How do I create an incremental table?",
  "max_results": 3,
  "include_code": true
}
2. get_dataform_concept
Retrieve detailed explanation of a Dataform concept.
Parameters:
- concept (string, required): The concept name (e.g., "assertions", "dependencies")
Example:
{
  "concept": "incremental tables"
}
3. get_syntax_reference
Get syntax reference for Dataform constructs.
Parameters:
- construct (string, required): The construct name (e.g., "config block", "ref() function")
- language (string, optional): "sqlx" or "javascript"
Example:
{
  "construct": "ref() function",
  "language": "sqlx"
}
4. search_error_solutions
Find solutions for Dataform errors.
Parameters:
- error_message (string, required): The error message
- context (string, optional): Additional context
Example:
{
  "error_message": "Cannot find module '@dataform/core'",
  "context": "Occurs when running dataform compile"
}
5. get_best_practices
Retrieve best practices for specific topics.
Parameters:
- topic (string, required): The topic (e.g., "testing", "performance", "incremental models")
Example:
{
  "topic": "testing"
}
Project Structure
dataform-mcp/
├── src/
│ ├── index.ts # MCP server entry point
│ ├── cli.ts # CLI interface
│ ├── tools/ # Tool implementations
│ ├── ingestion/ # Documentation scraping & processing
│ ├── storage/ # Vector & metadata storage
│ ├── search/ # Search implementations
│ └── utils/ # Utilities (config, logger)
├── data/
│ ├── vectordb/ # Vector database files
│ ├── metadata.db # SQLite metadata
│ └── cache/ # Query cache
├── scripts/
│ ├── ingest.ts # Initial ingestion
│ └── update.ts # Update documentation
├── config.json # Configuration
└── package.json
Configuration
Edit config.json to customize behavior:
{
  "documentation": {
    "base_url": "https://cloud.google.com/dataform/docs",
    "update_frequency": "weekly"
  },
  "embeddings": {
    "model": "text-embedding-3-small",
    "dimensions": 1536
  },
  "vector_store": {
    "type": "chroma",
    "path": "./data/vectordb"
  },
  "search": {
    "default_max_results": 5,
    "similarity_threshold": 0.7
  }
}
Create config.local.json for local overrides (gitignored).
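The README does not say how local overrides are combined with config.json. One plausible approach is a recursive merge in which nested objects are merged key by key and any other value in the override replaces the base value. The sketch below shows that approach under that assumption; the type and function names are hypothetical.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

// True for plain JSON objects (not arrays, not null).
function isJsonObject(v: Json | undefined): v is JsonObject {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Return a new object: nested objects are merged recursively,
// everything else in `override` replaces the value in `base`.
function mergeConfig(base: JsonObject, override: JsonObject): JsonObject {
  const out: JsonObject = { ...base };
  for (const key of Object.keys(override)) {
    const current = out[key];
    const value = override[key];
    out[key] =
      isJsonObject(current) && isJsonObject(value)
        ? mergeConfig(current, value)
        : value;
  }
  return out;
}

// Example: lower the similarity threshold locally, keep everything else.
const baseConfig: JsonObject = {
  search: { default_max_results: 5, similarity_threshold: 0.7 },
};
const localConfig: JsonObject = { search: { similarity_threshold: 0.6 } };
const effectiveConfig = mergeConfig(baseConfig, localConfig);
```

With this merge strategy, config.local.json only needs to contain the keys that differ from config.json.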
Development
Build
npm run build
Run in Development Mode
npm run dev
Run Tests
# Run all tests
npm test
# Run tests in watch mode
npm run test:watch
# Run tests with coverage
npm test -- --coverage
Test Structure:
- tests/unit/: Unit tests for individual components
- tests/integration/: Integration tests for full pipelines
- tests/fixtures/: Mock data and test fixtures
Test Coverage:
- ✓ Document parsing (HTML to Markdown)
- ✓ Document chunking with overlap
- ✓ Query caching
- ✓ Tool handlers (all 5 tools)
- ✓ Metadata storage
- ✓ End-to-end ingestion pipeline
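For readers unfamiliar with "chunking with overlap" from the list above, here is a minimal character-based sketch of the idea. It is not the project's implementation; the function name, the character-based sizing, and the parameter values are all assumptions made for illustration.

```typescript
// Split text into chunks of at most `size` characters, where each chunk
// starts `size - overlap` characters after the previous one, so adjacent
// chunks share `overlap` characters of context.
function chunkWithOverlap(text: string, size: number, overlap: number): string[] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  if (!Number.isInteger(overlap) || overlap < 0 || overlap >= size) {
    throw new RangeError("overlap must be an integer in [0, size)");
  }
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this chunk reached the end
  }
  return chunks;
}
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.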
Update Documentation
npm run update
Maintenance
Scheduled Updates
Set up a cron job for weekly updates:
0 0 * * 0 dataform-mcp update
Monitoring
Logs are written to logs/dataform-mcp.log. Set log level via environment variable:
LOG_LEVEL=0 dataform-mcp serve # DEBUG
LOG_LEVEL=1 dataform-mcp serve # INFO (default)
LOG_LEVEL=2 dataform-mcp serve # WARN
LOG_LEVEL=3 dataform-mcp serve # ERROR
Architecture
The MCP server uses:
- Vector Store: ChromaDB for semantic search
- Embeddings: OpenAI text-embedding-3-small
- Metadata: SQLite for structured data
- Cache: In-memory cache for frequent queries
- Scraping: Playwright + Cheerio for documentation extraction
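The README only states that frequent queries are cached in memory. As one way such a cache might work, the sketch below uses a Map with a time-to-live and a size limit. The class name, the TTL, and the oldest-first eviction policy are assumptions, not details of this package.

```typescript
// A small in-memory cache with expiry and a maximum number of entries.
class QueryCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private maxEntries: number;
  private now: () => number;

  // `now` is injectable so that expiry can be tested without waiting.
  constructor(ttlMs: number, maxEntries: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    // When full, evict the oldest insertion (Map keeps insertion order).
    if (!this.entries.has(key) && this.entries.size >= this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

Keying such a cache on the query text would let a repeated search skip both the embedding call and the vector lookup.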
Environment Variables
# OpenAI API key for embeddings
OPENAI_API_KEY=your_api_key_here
# Log level (0-3)
LOG_LEVEL=1
# Custom config path
CONFIG_PATH=/path/to/config.json
Troubleshooting
Native Module Error (better-sqlite3)
Error: The module 'better_sqlite3.node' was compiled against a different Node.js version
This occurs when the native SQLite module needs to be rebuilt for your Node.js version.
Solution:
# Find the package installation directory
cd "$(npm root -g)/dataform-docs-mcp"
# Rebuild native modules
npm rebuild better-sqlite3
# Or rebuild all native modules
npm rebuild
Why this happens: VS Code or Claude Code may use a different Node.js version than your system, requiring native modules to be recompiled.
Server won't start
- Check logs in logs/dataform-mcp.log
- Ensure all dependencies are installed: npm install
- Verify configuration in config.json
- Try rebuilding native modules (see above)
Search returns no results
- Run ingestion: npm run ingest
- Check if vector database exists in data/vectordb/
- Verify OpenAI API key is set
- Check that ChromaDB is running on port 8000
Documentation is stale
Run update manually:
npm run update
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request
License
MIT
Support
For issues and questions:
- GitHub Issues: Create an issue
- Documentation: Full documentation
Acknowledgments
Built with:
