@cosmocoder/mcp-web-docs

v1.2.0

MCP server for crawling and indexing web documentation - works with any website

MCP Web Docs

Index Any Documentation. Search Locally. Stay Private.

A self-hosted Model Context Protocol (MCP) server that crawls, indexes, and searches documentation from any website. Unlike remote MCP servers limited to GitHub repos or pre-indexed libraries, web-docs gives you full control over what gets indexed — including private documentation behind authentication.

Features • Installation • Quick Start • Tools • Tips • Troubleshooting • Contributing


❌ The Problem

AI assistants struggle with documentation:

  • Remote MCP servers only work with GitHub or pre-indexed libraries
  • Private docs behind authentication can't be accessed
  • Outdated indexes don't reflect your team's latest documentation
  • No control over what gets indexed or when

✅ The Solution

MCP Web Docs crawls and indexes documentation from ANY website locally:

  • Any website - Docusaurus, Storybook, GitBook, custom sites, internal wikis
  • Private docs - Interactive browser login for authenticated sites
  • Always fresh - Re-index anytime with one command
  • Your data, your machine - No API keys, no cloud, full privacy

✨ Features

  • 🌐 Universal Crawler - Works with any documentation site, not just GitHub
  • 🔍 Hybrid Search - Combines full-text search (FTS) with semantic vector search
  • 🏷️ Tags & Categories - Organize docs with tags and filter searches by project, team, or category
  • 🔐 Authentication Support - Crawl private/protected docs with interactive browser login (auto-detects your default browser)
  • 📊 Smart Extraction - Automatically extracts code blocks, props tables, and structured content
  • ⚡ Local Embeddings - Uses FastEmbed for fast, private embedding generation (no API keys)
  • 🗄️ Persistent Storage - LanceDB for vectors, SQLite for metadata
  • 🔄 Real-time Progress - Track indexing status with progress updates
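
The README does not say how the full-text and vector results are combined. As a rough illustration only, here is reciprocal rank fusion (RRF), one common way to merge two ranked lists. The function name, the chunk IDs, and the constant k = 60 are assumptions for this sketch, not the package's actual implementation.

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per item.
// Illustrative assumption; mcp-web-docs may combine results differently.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores: Record<string, number> = {};
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // index is 0-based, so the rank is index + 1
      scores[id] = (scores[id] ?? 0) + 1 / (k + index + 1);
    });
  }
  // Highest combined score first
  return Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
}

// Hypothetical chunk IDs, best match first in each list.
const ftsHits = ["chunk-a", "chunk-b", "chunk-c"];
const vectorHits = ["chunk-b", "chunk-d", "chunk-a"];
const fused = reciprocalRankFusion([ftsHits, vectorHits]);
console.log(fused);
```

Items that appear in both lists (here chunk-a and chunk-b) rise to the top, which is the intuition behind combining keyword and semantic matches.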

🚀 Installation

Prerequisites

  • Node.js >= 22.19.0

Option 1: Install from NPM (Recommended)

npm install -g @cosmocoder/mcp-web-docs

Option 2: Run with npx

No installation required - just configure your MCP client to use npx (see below).

Option 3: Build from Source

# Clone the repository
git clone https://github.com/cosmocoder/mcp-web-docs.git
cd mcp-web-docs

# Install dependencies (automatically installs Playwright browsers)
npm install

# Build
npm run build

Configure Your MCP Client

Add to your Cursor MCP settings (~/.cursor/mcp.json):

Using npx (no install required):

{
  "mcpServers": {
    "web-docs": {
      "command": "npx",
      "args": ["-y", "@cosmocoder/mcp-web-docs"]
    }
  }
}

Using global install:

{
  "mcpServers": {
    "web-docs": {
      "command": "mcp-web-docs"
    }
  }
}

Using local build:

{
  "mcpServers": {
    "web-docs": {
      "command": "node",
      "args": ["/path/to/mcp-web-docs/build/index.js"]
    }
  }
}

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

Using npx:

{
  "mcpServers": {
    "web-docs": {
      "command": "npx",
      "args": ["-y", "@cosmocoder/mcp-web-docs"]
    }
  }
}

Using global install:

{
  "mcpServers": {
    "web-docs": {
      "command": "mcp-web-docs"
    }
  }
}

For VS Code, add to .vscode/mcp.json in your workspace (note the top-level key is "servers", not "mcpServers"):

Using npx:

{
  "servers": {
    "web-docs": {
      "command": "npx",
      "args": ["-y", "@cosmocoder/mcp-web-docs"]
    }
  }
}

Using global install:

{
  "servers": {
    "web-docs": {
      "command": "mcp-web-docs"
    }
  }
}

For Windsurf, add to ~/.codeium/windsurf/mcp_config.json:

Using npx:

{
  "mcpServers": {
    "web-docs": {
      "command": "npx",
      "args": ["-y", "@cosmocoder/mcp-web-docs"]
    }
  }
}

Using global install:

{
  "mcpServers": {
    "web-docs": {
      "command": "mcp-web-docs"
    }
  }
}

For Cline, add to ~/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json (macOS path):

Using npx:

{
  "mcpServers": {
    "web-docs": {
      "command": "npx",
      "args": ["-y", "@cosmocoder/mcp-web-docs"],
      "disabled": false,
      "autoApprove": []
    }
  }
}

Using global install:

{
  "mcpServers": {
    "web-docs": {
      "command": "mcp-web-docs",
      "disabled": false,
      "autoApprove": []
    }
  }
}

For RooCode, global configuration: open RooCode → click the MCP icon → "Edit Global MCP"

For RooCode, project-level configuration: create .roo/mcp.json at your project root

Using npx:

{
  "mcpServers": {
    "web-docs": {
      "command": "npx",
      "args": ["-y", "@cosmocoder/mcp-web-docs"]
    }
  }
}

Using global install:

{
  "mcpServers": {
    "web-docs": {
      "command": "mcp-web-docs"
    }
  }
}
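
If a config file already lists other servers, the web-docs entry has to be merged in rather than pasted over the whole file. A minimal sketch of that merge, assuming the config has already been parsed from JSON. The helper name withWebDocs is hypothetical, and the entry mirrors the npx examples above.

```typescript
type ServerEntry = { command: string; args?: string[] };

// "mcpServers" is the key in most of the examples above; VS Code uses "servers".
function withWebDocs(
  config: Record<string, unknown>,
  rootKey: "mcpServers" | "servers" = "mcpServers"
): Record<string, unknown> {
  const existing = (config[rootKey] ?? {}) as Record<string, ServerEntry>;
  return {
    ...config,
    [rootKey]: {
      ...existing, // keep servers that were already configured
      "web-docs": { command: "npx", args: ["-y", "@cosmocoder/mcp-web-docs"] },
    },
  };
}

// Hypothetical existing config with one unrelated server.
const merged = withWebDocs({ mcpServers: { other: { command: "other-server" } } });
console.log(JSON.stringify(merged, null, 2));
```

The function returns a new object and leaves the input untouched, so the result can be reviewed before it is written back to disk.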

⚡ Quick Start

1. Index public documentation

Index the LanceDB documentation from https://lancedb.com/docs/

The AI assistant will call add_documentation and begin crawling.

2. Search for information

How do I create a table in LanceDB?

The AI will use search_documentation to find relevant content.

3. For private docs, authenticate first

I need to index private documentation at https://internal.company.com/docs/
It requires authentication.

A browser window will open for you to log in. The session is saved for future crawls.


🔨 Available Tools

add_documentation

Add a new documentation site for indexing.

add_documentation({
  url: "https://docs.example.com/",
  title: "Example Docs",              // Optional
  id: "example-docs",                 // Optional custom ID
  tags: ["frontend", "mycompany"],    // Optional tags for categorization
  auth: {                             // Optional authentication
    requiresAuth: true,
    // browser auto-detected from OS settings if omitted
    loginTimeoutSecs: 300
  }
})
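
The call above is shorthand for what the AI assistant sends. On the wire, MCP clients issue tool calls as JSON-RPC 2.0 requests with the method tools/call. The sketch below builds such a request by hand purely for illustration. The toolCall helper is hypothetical, the argument names are taken from the example above, and in normal use your MCP client constructs this message for you.

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments in a tools/call envelope.
function toolCall(id: number, tool: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: tool, arguments: args } };
}

const addDocsRequest = toolCall(1, "add_documentation", {
  url: "https://docs.example.com/",
  title: "Example Docs",
  tags: ["frontend", "mycompany"],
});
console.log(JSON.stringify(addDocsRequest, null, 2));
```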

search_documentation

Search through indexed documentation using hybrid search (FTS + semantic).

search_documentation({
  query: "how to configure authentication",
  url: "https://docs.example.com/",    // Optional: filter to specific site
  tags: ["frontend", "mycompany"],     // Optional: filter by tags
  limit: 10                            // Optional: max results
})

authenticate

Open a browser window for interactive login to protected sites. Your default browser is automatically detected from OS settings.

authenticate({
  url: "https://private-docs.example.com/",
  // browser auto-detected from OS settings - only specify to override
  loginTimeoutSecs: 300         // Optional: timeout in seconds
})

list_documentation

List all indexed documentation sites with their metadata including tags.

set_tags

Set or update tags for a documentation site. Tags help categorize and filter documentation.

set_tags({
  url: "https://docs.example.com/",
  tags: ["frontend", "react", "mycompany"]  // Replaces existing tags
})
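
Because set_tags replaces the existing tags, adding one tag means sending the full list again. A small sketch of computing that list, assuming you already know the site's current tags (for example from list_documentation). The helper name mergeTags is hypothetical.

```typescript
// Returns the existing tags followed by any new ones, without duplicates.
function mergeTags(existing: string[], extra: string[]): string[] {
  const result: string[] = [];
  for (const tag of existing.concat(extra)) {
    if (result.indexOf(tag) === -1) {
      result.push(tag);
    }
  }
  return result;
}

// Current tags plus one new tag; pass the result as the tags argument of set_tags.
const nextTags = mergeTags(["frontend", "mycompany"], ["react", "frontend"]);
console.log(nextTags);
```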

list_tags

List all available tags with usage counts. Useful to see what tags exist across your indexed docs.
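
To make "usage counts" concrete, here is a sketch of how such counts could be derived from a list of indexed sites. The input shape and the name countTags are assumptions for illustration. The actual output format of list_tags is not documented here.

```typescript
// Counts how many sites carry each tag.
function countTags(sites: { tags: string[] }[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const site of sites) {
    for (const tag of site.tags) {
      counts[tag] = (counts[tag] ?? 0) + 1;
    }
  }
  return counts;
}

// Two hypothetical sites that share one tag.
const tagCounts = countTags([
  { tags: ["frontend", "mycompany", "react"] },
  { tags: ["backend", "mycompany", "api"] },
]);
console.log(tagCounts);
```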

reindex_documentation

Re-crawl and re-index a specific documentation site.

get_indexing_status

Get the current status of indexing operations.

delete_documentation

Delete an indexed documentation site and all its data.

clear_auth

Clear saved authentication session for a domain.


💡 Tips

Crafting Better Search Queries

The search uses hybrid full-text and semantic search. For best results:

  1. Be specific - Include unique terms from what you're looking for

    • Instead of: "Button props"
    • Try: "Button props onClick disabled loading"
  2. Use exact phrases - Wrap in quotes for exact matching

    • "authentication middleware" finds that exact phrase
  3. Include context - Add related terms to narrow results

    • API docs: "GET /users endpoint authentication headers"
    • Config: "webpack config entry output plugins"
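
The three tips can be combined when building a query programmatically. A minimal sketch, assuming exact phrases are marked with double quotes inside the query string as described above. The helper name buildQuery is hypothetical.

```typescript
// Quotes each exact phrase, then appends the specific and contextual terms.
function buildQuery(terms: string[], exactPhrases: string[] = []): string {
  const quoted = exactPhrases.map((phrase) => `"${phrase}"`);
  return quoted.concat(terms).join(" ");
}

const searchQuery = buildQuery(["GET", "/users", "headers"], ["authentication middleware"]);
console.log(searchQuery);
```

The resulting string can be passed as the query argument of search_documentation.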

Auto-Invoke with Rules

To avoid typing search instructions in every prompt, add a rule to your MCP client:

Cursor (Cursor Settings > Rules):

When I ask about library documentation or need code examples,
use the web-docs MCP server to search indexed documentation.

Windsurf (.windsurfrules):

Always use web-docs search_documentation when I ask about
API references, configuration, or library usage.

Scoping Searches

If you have multiple sites indexed, filter by URL or tags:

// Filter by specific site URL
search_documentation({
  query: "routing",
  url: "https://nextjs.org/docs/"
})

// Filter by tags (searches all docs with matching tags)
search_documentation({
  query: "Button component",
  tags: ["frontend", "mycompany"]  // Only docs tagged with BOTH tags
})
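
The "BOTH tags" comment implies AND semantics: a site matches only if it carries every requested tag. A sketch of that rule, with an assumed IndexedSite shape and a hypothetical filterByTags helper. The server's real filtering code may differ.

```typescript
interface IndexedSite {
  url: string;
  tags: string[];
}

// Keeps only the sites that have every tag in `required`.
function filterByTags(sites: IndexedSite[], required: string[]): IndexedSite[] {
  return sites.filter((site) => required.every((tag) => site.tags.indexOf(tag) !== -1));
}

// Hypothetical index with two sites.
const indexedSites: IndexedSite[] = [
  { url: "https://docs.mycompany.com/ui-components/", tags: ["frontend", "mycompany", "react"] },
  { url: "https://docs.mycompany.com/api/", tags: ["backend", "mycompany", "api"] },
];

const frontendOnly = filterByTags(indexedSites, ["frontend", "mycompany"]);
const allCompany = filterByTags(indexedSites, ["mycompany"]);
console.log(frontendOnly.length, allCompany.length);
```

Adding more tags to a filter can only narrow the result, so start with one tag and add others if too many sites match.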

Organizing with Tags

Tags help organize documentation when you have multiple related sites. Add tags when indexing:

// Index frontend package docs
add_documentation({
  url: "https://docs.mycompany.com/ui-components/",
  tags: ["frontend", "mycompany", "react"]
})

// Index backend API docs
add_documentation({
  url: "https://docs.mycompany.com/api/",
  tags: ["backend", "mycompany", "api"]
})

Later, search across all frontend docs:

search_documentation({
  query: "authentication",
  tags: ["frontend"]  // Searches all frontend-tagged docs
})

You can also add tags to existing documentation with set_tags.


🚨 Troubleshooting

If the content extractor couldn't process a page, try:

  • Re-indexing the documentation
  • Checking whether the site uses JavaScript rendering (this should work with Playwright)
  • Looking at the crawled data in ~/.mcp-web-docs/crawlee/datasets/

If authentication isn't working:

  • Make sure you call authenticate before add_documentation
  • Keep the browser window open until login is detected
  • For OAuth sites, complete the full flow manually
  • Your default browser is auto-detected; specify a different one if needed, for example browser: "firefox"

If search results aren't relevant:

  • Try more specific queries with unique terms
  • Use quotes for exact phrase matching
  • Filter by URL to search within a specific documentation site
  • Re-index if the documentation has been updated

If browsers aren't installed, run:

npx playwright install

Data Storage

All data is stored locally in ~/.mcp-web-docs/:

~/.mcp-web-docs/
├── docs.db           # SQLite database for document metadata
├── vectors/          # LanceDB vector database
├── sessions/         # Saved authentication sessions
└── crawlee/          # Crawlee datasets (cached crawl data)
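
For backup or cleanup scripts, the layout above can be expressed as a small helper. This is a sketch only: storagePaths is a hypothetical name, the home directory is passed in rather than detected, and forward slashes are assumed as the path separator.

```typescript
// Maps the documented layout of ~/.mcp-web-docs/ onto absolute paths.
function storagePaths(homeDir: string) {
  const root = `${homeDir}/.mcp-web-docs`;
  return {
    root,
    metadataDb: `${root}/docs.db`, // SQLite database for document metadata
    vectors: `${root}/vectors`, // LanceDB vector database
    sessions: `${root}/sessions`, // saved authentication sessions
    crawlee: `${root}/crawlee`, // cached crawl data
  };
}

// "/home/alice" is a placeholder home directory.
const dataPaths = storagePaths("/home/alice");
console.log(dataPaths.metadataDb);
```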

📄 License

MIT License - see LICENSE for details.


🙏 Acknowledgments