dataform-docs-mcp

v0.2.2

MCP server for Claude Code providing access to Dataform documentation through semantic search, concept explanations, and best practices

Dataform Documentation MCP Server

A Model Context Protocol (MCP) server for Claude Code (VS Code extension) that provides access to comprehensive Dataform documentation. Get instant answers about Dataform's features, syntax, and best practices while coding.

✨ New in v0.2.0: Pre-built documentation database included! Install via npm and start using immediately - no ingestion wait required.

Features

  • Semantic Search: Search Dataform documentation using natural language queries
  • Concept Retrieval: Get detailed explanations of Dataform concepts with examples
  • Syntax Reference: Look up syntax for specific Dataform constructs
  • Error Solutions: Find solutions for common Dataform errors
  • Best Practices: Retrieve curated best practices for various use cases

Installation

From npm (Recommended)

The easiest way to install. The package ships with a pre-built documentation database:

npm install -g dataform-docs-mcp

Includes: Pre-built database with all Dataform documentation (~5.5MB download, 12.6MB unpacked)

From Source (Development)

For contributors or those who want to build from source:

# Clone the repository
git clone https://github.com/mhooson/dataform-mcp.git
cd dataform-mcp

# Install dependencies
npm install

# Build the project
npm run build

# Run initial documentation ingestion
npm run ingest

Quick Start

Prerequisites

Before starting, ensure you have:

  • Node.js 18+ installed
  • ChromaDB installed: pip install chromadb
  • ~13MB disk space for the documentation database
  • OpenAI API key (required for query embeddings)
    • Used to convert search queries to embeddings (~$0.0001 per query)
    • Also needed for updating the documentation database

Quick Start (npm installation)

The package includes a pre-built database with all Dataform documentation, so you can start using it immediately!

Step 1: Install the package

npm install -g dataform-docs-mcp

This installs the package including the pre-built documentation database (~5.5MB download, 12.6MB unpacked).

Step 2: Start ChromaDB server

# The database is included in the npm package installation
# Find the package location:
npm root -g  # Example: /usr/local/lib/node_modules

# Start ChromaDB pointing to the installed database
chroma run --path "$(npm root -g)/dataform-docs-mcp/data/vectordb" --port 8000

Keep this terminal running.

Step 3: Start the MCP server

dataform-mcp serve

That's it! The server is ready to use with the pre-built database.

Optional: Update the documentation later

# Only needed if you want the latest Dataform docs
export OPENAI_API_KEY=your-api-key-here
dataform-mcp update

The update command refreshes the database with the latest documentation from Google Cloud.

Step 4: Configure Claude Code (see "2. Configure Claude Code (VS Code Extension)" below)


Alternative: Source Installation Quick Start

For those installing from source (see Installation section above):

1. Start ChromaDB Server

The MCP server requires ChromaDB to be running:

chroma run --path ./data/vectordb --port 8000

Keep this running in a separate terminal.

Note: ChromaDB must be running on port 8000 (default) for the MCP server to connect.

2. Configure Claude Code (VS Code Extension)

For npm installation:

Create a .mcp.json file in your project root:

{
  "mcpServers": {
    "dataform-docs": {
      "command": "dataform-mcp",
      "args": ["serve"],
      "env": {
        "OPENAI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Or use the CLI command:

claude mcp add --transport stdio dataform-docs \
  --env OPENAI_API_KEY=your-api-key-here \
  -- dataform-mcp serve

For development (from source):

{
  "mcpServers": {
    "dataform-docs": {
      "command": "node",
      "args": ["/absolute/path/to/dataform-mcp/dist/src/index.js"],
      "env": {
        "OPENAI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Scope Options (passed to claude mcp add):

  • --scope local (default): Private to current project
  • --scope project: Shared with team via version control
  • --scope user: Available across all your projects

Why OpenAI API key is required: The server needs to convert your search queries into embeddings to search the vector database. This incurs minimal cost (~$0.0001 per query).
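To make "search the vector database" concrete, here is a minimal sketch of ranking stored chunks against a query embedding by cosine similarity. The `Chunk` type, the `rankBySimilarity` function, and the tiny 3-dimensional vectors are hypothetical; in the real server this work is delegated to ChromaDB with OpenAI embeddings, and whether it filters by cosine similarity in exactly this way is an assumption.

```typescript
// Hypothetical types and names for illustration; not the package's actual API.
interface Chunk {
  id: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embeddings must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk, drop those below the threshold, and return the best matches first.
function rankBySimilarity(
  queryEmbedding: number[],
  chunks: Chunk[],
  threshold = 0.7,
  maxResults = 5
): { id: string; score: number }[] {
  return chunks
    .map((chunk) => ({
      id: chunk.id,
      score: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .filter((result) => result.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, maxResults);
}
```

The default values mirror `similarity_threshold` and `default_max_results` from the sample config.json; the query embedding itself is what the OpenAI API call produces.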

3. Start the MCP Server

For development (from source):

npm run dev

Or for production (after global install):

dataform-mcp serve

Available Tools

1. search_dataform_docs

Search Dataform documentation semantically.

Parameters:

  • query (string, required): The search query
  • max_results (number, optional): Maximum results to return (default: 5)
  • doc_type (array, optional): Filter by type: ["guide", "reference", "tutorial", "api"]
  • include_code (boolean, optional): Prioritize code examples (default: false)

Example:

{
  "query": "How do I create an incremental table?",
  "max_results": 3,
  "include_code": true
}
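The parameter list above can be read as a typed input with defaults. The following is a sketch only: the type names and the `normalizeSearchInput` helper are hypothetical, and treating an omitted `doc_type` as "no filter" and rejecting a blank `query` are assumptions rather than documented behavior.

```typescript
// Hypothetical types mirroring the documented parameters of search_dataform_docs.
type DocType = "guide" | "reference" | "tutorial" | "api";

interface SearchDocsInput {
  query: string;
  max_results?: number;
  doc_type?: DocType[];
  include_code?: boolean;
}

interface NormalizedSearchDocsInput {
  query: string;
  max_results: number;
  doc_type: DocType[];
  include_code: boolean;
}

// Apply the documented defaults (max_results: 5, include_code: false).
function normalizeSearchInput(input: SearchDocsInput): NormalizedSearchDocsInput {
  const query = input.query.trim();
  if (query.length === 0) {
    throw new Error("query is required");
  }
  const maxResults = input.max_results ?? 5;
  if (!Number.isInteger(maxResults) || maxResults < 1) {
    throw new Error("max_results must be a positive integer");
  }
  return {
    query,
    max_results: maxResults,
    // Assumption: an empty list means "do not filter by document type".
    doc_type: input.doc_type ?? [],
    include_code: input.include_code ?? false,
  };
}
```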

2. get_dataform_concept

Retrieve detailed explanation of a Dataform concept.

Parameters:

  • concept (string, required): The concept name (e.g., "assertions", "dependencies")

Example:

{
  "concept": "incremental tables"
}

3. get_syntax_reference

Get syntax reference for Dataform constructs.

Parameters:

  • construct (string, required): The construct name (e.g., "config block", "ref() function")
  • language (string, optional): "sqlx" or "javascript"

Example:

{
  "construct": "ref() function",
  "language": "sqlx"
}

4. search_error_solutions

Find solutions for Dataform errors.

Parameters:

  • error_message (string, required): The error message
  • context (string, optional): Additional context

Example:

{
  "error_message": "Cannot find module '@dataform/core'",
  "context": "Occurs when running dataform compile"
}

5. get_best_practices

Retrieve best practices for specific topics.

Parameters:

  • topic (string, required): The topic (e.g., "testing", "performance", "incremental models")

Example:

{
  "topic": "testing"
}

Project Structure

dataform-mcp/
├── src/
│   ├── index.ts              # MCP server entry point
│   ├── cli.ts                # CLI interface
│   ├── tools/                # Tool implementations
│   ├── ingestion/            # Documentation scraping & processing
│   ├── storage/              # Vector & metadata storage
│   ├── search/               # Search implementations
│   └── utils/                # Utilities (config, logger)
├── data/
│   ├── vectordb/             # Vector database files
│   ├── metadata.db           # SQLite metadata
│   └── cache/                # Query cache
├── scripts/
│   ├── ingest.ts             # Initial ingestion
│   └── update.ts             # Update documentation
├── config.json               # Configuration
└── package.json

Configuration

Edit config.json to customize behavior:

{
  "documentation": {
    "base_url": "https://cloud.google.com/dataform/docs",
    "update_frequency": "weekly"
  },
  "embeddings": {
    "model": "text-embedding-3-small",
    "dimensions": 1536
  },
  "vector_store": {
    "type": "chroma",
    "path": "./data/vectordb"
  },
  "search": {
    "default_max_results": 5,
    "similarity_threshold": 0.7
  }
}

Create config.local.json for local overrides (gitignored).
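One plausible way local overrides could be layered on top of config.json is a recursive merge, sketched below. The `mergeConfig` function and the JSON types are hypothetical, and the merge semantics (nested objects merged key by key, everything else replaced) are an assumption; the package may resolve overrides differently.

```typescript
// Hypothetical helper; the package's real override logic may differ.
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };
type JsonObject = { [key: string]: JsonValue };

function isJsonObject(value: JsonValue | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Nested objects are merged recursively; scalars and arrays in `overrides` replace the base value.
function mergeConfig(base: JsonObject, overrides: JsonObject): JsonObject {
  const result: JsonObject = { ...base };
  for (const [key, value] of Object.entries(overrides)) {
    const existing = result[key];
    result[key] =
      isJsonObject(existing) && isJsonObject(value)
        ? mergeConfig(existing, value)
        : value;
  }
  return result;
}
```

For example, a config.local.json containing only `{"search": {"default_max_results": 10}}` would, under this assumption, change that one setting while keeping `similarity_threshold` and every other section from config.json.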

Development

Build

npm run build

Run in Development Mode

npm run dev

Run Tests

# Run all tests
npm test

# Run tests in watch mode
npm run test:watch

# Run tests with coverage
npm test -- --coverage

Test Structure:

  • tests/unit/ - Unit tests for individual components
  • tests/integration/ - Integration tests for full pipelines
  • tests/fixtures/ - Mock data and test fixtures

Test Coverage:

  • ✓ Document parsing (HTML to Markdown)
  • ✓ Document chunking with overlap
  • ✓ Query caching
  • ✓ Tool handlers (all 5 tools)
  • ✓ Metadata storage
  • ✓ End-to-end ingestion pipeline
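As an illustration of what "chunking with overlap" means, here is a minimal character-based sketch. The `chunkWithOverlap` function is hypothetical; the actual ingestion code may split on tokens, headings, or sentences instead, so treat the sizes and the splitting unit as assumptions.

```typescript
// Hypothetical sketch: split text into fixed-size chunks where each chunk
// repeats the last `overlap` characters of the previous one.
function chunkWithOverlap(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be non-negative and smaller than chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text, to avoid a trailing
    // chunk that only repeats already-covered characters.
    if (start + chunkSize >= text.length) {
      break;
    }
  }
  return chunks;
}
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.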

Update Documentation

npm run update

Maintenance

Scheduled Updates

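Set up a cron job for weekly updates (this example runs every Sunday at midnight). Because updating the documentation database needs the OpenAI API key, make sure OPENAI_API_KEY is available in the cron environment: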

0 0 * * 0 dataform-mcp update

Monitoring

Logs are written to logs/dataform-mcp.log. Set log level via environment variable:

LOG_LEVEL=0 dataform-mcp serve  # DEBUG
LOG_LEVEL=1 dataform-mcp serve  # INFO (default)
LOG_LEVEL=2 dataform-mcp serve  # WARN
LOG_LEVEL=3 dataform-mcp serve  # ERROR
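The numeric levels above could be interpreted as in the following sketch. The names `LOG_LEVELS`, `parseLogLevel`, and `shouldLog` are hypothetical, and falling back to INFO for a missing or invalid value is an assumption based on INFO being the documented default.

```typescript
// Hypothetical mapping of the documented LOG_LEVEL values.
const LOG_LEVELS = { DEBUG: 0, INFO: 1, WARN: 2, ERROR: 3 } as const;
type LogLevel = (typeof LOG_LEVELS)[keyof typeof LOG_LEVELS];

// Parse the raw environment value; fall back to INFO when it is missing or out of range.
function parseLogLevel(raw: string | undefined): LogLevel {
  if (raw === undefined || raw.trim() === "") {
    return LOG_LEVELS.INFO;
  }
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed < LOG_LEVELS.DEBUG || parsed > LOG_LEVELS.ERROR) {
    return LOG_LEVELS.INFO;
  }
  return parsed as LogLevel;
}

// A message is written only if its level is at or above the configured level.
function shouldLog(messageLevel: LogLevel, configuredLevel: LogLevel): boolean {
  return messageLevel >= configuredLevel;
}
```

In a Node.js process the input would typically come from `process.env.LOG_LEVEL`.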

Architecture

The MCP server uses:

  • Vector Store: ChromaDB for semantic search
  • Embeddings: OpenAI text-embedding-3-small
  • Metadata: SQLite for structured data
  • Cache: In-memory cache for frequent queries
  • Scraping: Playwright + Cheerio for documentation extraction
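To show what an in-memory cache for frequent queries might look like, here is a small sketch of a time-limited cache keyed by query string. The `QueryCache` class, its expiry policy, and the injectable clock are hypothetical; the package's actual cache may use a different eviction strategy or key format.

```typescript
// Hypothetical in-memory cache with per-entry expiry; not the package's actual implementation.
class QueryCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so the expiry logic can be exercised without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns the cached value, or undefined if the key is missing or expired.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) {
      return undefined;
    }
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```

A repeated query that hits the cache avoids both the embedding request and the vector search, which is where the benefit for frequent queries would come from.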

Environment Variables

# OpenAI API key for embeddings
OPENAI_API_KEY=your_api_key_here

# Log level (0-3)
LOG_LEVEL=1

# Custom config path
CONFIG_PATH=/path/to/config.json

Troubleshooting

Native Module Error (better-sqlite3)

Error: The module 'better_sqlite3.node' was compiled against a different Node.js version

This occurs when the native SQLite module needs to be rebuilt for your Node.js version.

Solution:

# Find the package installation directory
cd "$(npm root -g)/dataform-docs-mcp"

# Rebuild native modules
npm rebuild better-sqlite3

# Or rebuild all native modules
npm rebuild

Why this happens: VS Code or Claude Code may use a different Node.js version than your system, requiring native modules to be recompiled.

Server won't start

  1. Check logs in logs/dataform-mcp.log
  2. Ensure all dependencies are installed: npm install
  3. Verify configuration in config.json
  4. Try rebuilding native modules (see above)

Search returns no results

  1. For source installs, run ingestion: npm run ingest (npm installs already include the pre-built database)
  2. Check if vector database exists in data/vectordb/
  3. Verify OpenAI API key is set
  4. Check that ChromaDB is running on port 8000

Documentation is stale

Run an update manually. From a source checkout:

npm run update

After a global npm install:

dataform-mcp update

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Submit a pull request

License

MIT

Support

For issues and questions:

Acknowledgments

Built with: