
@beejay141/docs-mcp v1.0.9

MCP server that crawls documentation websites and exposes structured tools for AI agents to search and read library docs

docs-mcp

A Model Context Protocol (MCP) server that crawls documentation websites and exposes structured tools enabling AI agents to search and read library documentation for accurate code generation.

Features

  • Register any docs site via URL — static or JS-rendered (Docusaurus, VitePress, Nextra, TypeDoc)
  • Full-text search across all indexed docs with BM25 ranking (SQLite FTS5)
  • Clean Markdown output stripped of nav/sidebar noise, preserving code blocks with language hints
  • Background sync — on every startup, stale libraries re-crawl automatically without blocking tool calls
  • 6 MCP tools visible in Claude Desktop, VS Code Copilot, Cursor, and any MCP client
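The FTS5/BM25 ranking mentioned above can be illustrated in a few lines of SQL. This is a minimal, self-contained sketch of the mechanism, not the server's actual schema or code (the project itself is a Node.js package):

```python
import sqlite3

# In-memory FTS5 index standing in for the server's docs database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE VIRTUAL TABLE pages USING fts5(title, body);
INSERT INTO pages VALUES
  ('useState', 'useState is a React Hook that lets you add state to a component.'),
  ('useEffect', 'useEffect lets you synchronize a component with an external system.');
""")

# bm25() returns a rank (lower is better); snippet() builds the result excerpt.
rows = conn.execute(
    "SELECT title, snippet(pages, 1, '[', ']', '...', 8) "
    "FROM pages WHERE pages MATCH ? ORDER BY bm25(pages)",
    ("useState",),
).fetchall()
print(rows[0])  # best-ranked page first
```

Ranking and snippet extraction both come for free from SQLite's FTS5 extension, which is why no external search service is needed.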

Installation

Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "docs-mcp": {
      "command": "npx",
      "args": ["-y", "docs-mcp@latest"],
      "env": {
        "DOCS_MCP_DB": "/Users/you/.docs-mcp/docs.db"
      }
    }
  }
}

VS Code (GitHub Copilot)

Create .vscode/mcp.json in your project (already included in this repo):

{
  "servers": {
    "docs-mcp": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "docs-mcp@latest"],
      "env": {
        "DOCS_MCP_DB": "${env:HOME}/.docs-mcp/docs.db"
      }
    }
  }
}

Or add it to your VS Code user settings.json under the "mcp" key:

{
  "mcp": {
    "servers": {
      "docs-mcp": {
        "type": "stdio",
        "command": "npx",
        "args": ["-y", "docs-mcp@latest"]
      }
    }
  }
}

Cursor

Edit ~/.cursor/mcp.json (global) or <project>/.cursor/mcp.json (project-scoped):

{
  "mcpServers": {
    "docs-mcp": {
      "command": "npx",
      "args": ["-y", "docs-mcp@latest"],
      "env": {
        "DOCS_MCP_DB": "/Users/you/.docs-mcp/docs.db"
      }
    }
  }
}

Zed

Merge into ~/.config/zed/settings.json:

{
  "context_servers": {
    "docs-mcp": {
      "command": {
        "path": "npx",
        "args": ["-y", "docs-mcp@latest"],
        "env": {
          "DOCS_MCP_DB": "/Users/you/.docs-mcp/docs.db"
        }
      }
    }
  }
}

Windsurf

Edit ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "docs-mcp": {
      "command": "npx",
      "args": ["-y", "docs-mcp@latest"],
      "env": {
        "DOCS_MCP_DB": "/Users/you/.docs-mcp/docs.db"
      }
    }
  }
}

After adding the config, restart the application and ask your AI assistant:

"Add the React docs from https://react.dev/reference/react"

CLI (optional management tool)

npx docs-mcp-cli add https://react.dev/reference/react --name "React"
npx docs-mcp-cli add --preset tailwind
npx docs-mcp-cli list
npx docs-mcp-cli search "useState hook"

MCP Tools

| Tool | Description |
| ---------------- | ------------------------------------------------------------------- |
| add_library | Register a documentation source by URL. Triggers background crawl. |
| list_libraries | List all indexed libraries with page counts and sync status. |
| search_docs | Full-text search across docs. Returns ranked results with snippets. |
| get_page | Retrieve full Markdown content of a specific page. |
| list_sections | Browse pages/sections in a library (useful before searching). |
| sync_status | Check background sync progress for one or all libraries. |

add_library

url          Required. Base documentation URL.
             Use https:// for public sites.
             Use http:// for internal/private hosts only (e.g. http://192.168.1.10/docs).
name         Optional. Human-readable label (auto-derived from the domain).
id           Optional. Library ID slug (auto-derived from the domain).
options:
  dynamic          Force Playwright crawler for JS-rendered sites
  contentSelector  CSS selector for main content area
  excludePatterns  URL patterns to skip (e.g. ["/blog", "/changelog"])
  maxPages         Max pages to crawl (default: 500, max: 5000)
  crawlDelay       Delay between requests in ms (default: 500)
  ttlHours         Re-sync interval in hours (default: 24)
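Put together, an add_library call from an MCP client might carry arguments like the following. This is an illustrative sketch inferred from the parameter list above; the authoritative shape is the tool's JSON schema as reported by the server:

```json
{
  "url": "https://react.dev/reference/react",
  "name": "React",
  "options": {
    "excludePatterns": ["/blog", "/changelog"],
    "maxPages": 1000,
    "ttlHours": 24
  }
}
```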

CLI Reference

docs-mcp-cli add <baseUrl>         Register and crawl a doc site
  --name <label>                   Human-readable name
  --id <slug>                      Library ID slug
  --version <ver>                  Library version label
  --dynamic                        Force Playwright for JS-rendered sites
  --selector <css>                 CSS selector for main content
  --exclude <pattern...>           URL patterns to skip
  --max-pages <n>                  Max pages (default: 500)
  --delay <ms>                     Crawl delay (default: 500ms)
  --ttl <hours>                    Re-sync interval (default: 24h)
  --preset <name>                  Use a preset from libraries.json

docs-mcp-cli list                  List all indexed libraries
docs-mcp-cli remove <id>           Delete a library and its pages
docs-mcp-cli refresh <id>          Force immediate re-crawl
docs-mcp-cli sync-status [id]      Show sync queue and job progress
docs-mcp-cli search <query>        Quick CLI search
  --library <id>                   Limit to a specific library
  --limit <n>                      Number of results (default: 5)

Presets

Popular libraries are pre-configured in libraries.json:

docs-mcp-cli add --preset react
docs-mcp-cli add --preset vue
docs-mcp-cli add --preset tailwind
docs-mcp-cli add --preset nextjs
docs-mcp-cli add --preset typescript
docs-mcp-cli add --preset fastapi
docs-mcp-cli add --preset pydantic
docs-mcp-cli add --preset langchain

How Background Sync Works

On every server startup:

  1. SyncManager.startupSync() queries all libraries where last_scraped_at IS NULL or older than their TTL (default: 24 hours)
  2. Never-synced libraries get priority = "high" and run first
  3. Stale libraries are queued as priority = "normal"
  4. Up to maxConcurrency (default: 2) libraries crawl simultaneously
  5. All MCP tools are available immediately — the sync never blocks tool calls

The sync queue is capped at 50 pending jobs. Rapid add_library calls beyond that return status: "queue_full".
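The staleness test in step 1 can be sketched as follows. This is an illustrative Python sketch of the selection logic, not the project's actual SyncManager code (which is TypeScript); the field names here are assumptions based on the description above:

```python
import time

DEFAULT_TTL_HOURS = 24  # default re-sync interval

def classify(libraries, now=None):
    """Split libraries into high-priority (never synced) and normal (stale) queues."""
    now = time.time() if now is None else now
    high, normal = [], []
    for lib in libraries:
        last = lib.get("last_scraped_at")           # epoch seconds, or None if never synced
        ttl = lib.get("ttl_hours", DEFAULT_TTL_HOURS)
        if last is None:
            high.append(lib["id"])                  # never synced: crawl first
        elif now - last > ttl * 3600:
            normal.append(lib["id"])                # stale: re-crawl at normal priority
    return high, normal

now = time.time()
high, normal = classify([
    {"id": "react", "last_scraped_at": None},
    {"id": "vue", "last_scraped_at": now - 48 * 3600},   # 48h old -> stale
    {"id": "tailwind", "last_scraped_at": now - 3600},   # 1h old -> fresh, skipped
], now=now)
print(high, normal)  # ['react'] ['vue']
```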


Configuration

| Environment Variable | Default | Description |
| -------------------------- | --------------------- | --------------------------- |
| DOCS_MCP_DB | ~/.docs-mcp/docs.db | Path to the SQLite database |
| DOCS_MCP_MAX_CONCURRENCY | 2 | Max simultaneous crawls |


Supported Site Types

| Framework | Crawler | Notes |
| --------------- | -------------------- | -------------------------------------------------------------------------- |
| Plain HTML | Static (Cheerio) | Default |
| Docusaurus | Static (Cheerio) | Usually works without --dynamic |
| VitePress | Auto-detected | Falls back to Playwright if < 10 static pages |
| Nextra | Auto-detected | Falls back to Playwright |
| TypeDoc | Static (Cheerio) | .col-content selector auto-detected; sidebar/toolbar noise stripped |
| GitBook | Static | Use --selector ".page-inner" |
| ReadMe.io | Static | May need --selector "section.content" |
| Any JS-rendered | Dynamic (Playwright) | Use --dynamic flag |
| Internal (http) | Static (Cheerio) | Use http:// only for private/internal hosts |


Troubleshooting

Indexing a TypeDoc-generated site? TypeDoc's .col-content is auto-detected and sidebar/toolbar noise (.col-sidebar, .tsd-navigation, .tsd-toolbar) is stripped automatically. No extra flags needed for most TypeDoc sites:

docs-mcp-cli add https://mylib.github.io/api --name "MyLib API"
# Or for an internal TypeDoc server:
docs-mcp-cli add http://192.168.1.20:8080 --name "Internal API"

JS-rendered site not indexing? Add --dynamic to force Playwright:

docs-mcp-cli add https://example.com/docs --dynamic

Too many irrelevant pages being crawled? Use --exclude to skip sections:

docs-mcp-cli add https://example.com/docs --exclude /blog --exclude /changelog

Getting rate-limited? Increase crawl delay:

docs-mcp-cli add https://example.com/docs --delay 2000

Large site hitting the page limit? Raise --max-pages, or specify a content selector to focus the crawl on relevant pages:

docs-mcp-cli add https://example.com/docs --max-pages 2000 --selector "article"

Check sync status:

docs-mcp-cli sync-status
# or via MCP tool:
# sync_status {}

Development

npm install
npm run dev          # Start MCP server
npm run dev:cli      # Run CLI
npm test             # Run all tests
npm run test:watch   # Watch mode
npm run build        # Build for production

Playwright (for dynamic crawling)

If you plan to crawl JS-rendered sites with --dynamic, Playwright and its browsers are required. After installing project dependencies, install the Playwright browsers:

# install project deps (if not done already)
npm install
# install Playwright browsers needed for dynamic crawling
npx playwright install
# (Linux-only) install required system deps
npx playwright install-deps

Security

  • https:// is accepted for all hosts, public and internal.
  • http:// is only permitted for private/internal hosts (RFC-1918 ranges 10.x, 172.16–31.x, and 192.168.x, plus 127.x loopback, localhost, and ULA IPv6). This prevents unencrypted traffic from being proxied to arbitrary public servers.
  • Hostnames are resolved via DNS and checked against private IP ranges before crawling begins.
  • All DB queries use parameterized statements (no SQL injection).
  • FTS5 query input is sanitized before use.
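The http-only-for-private-hosts rule can be sketched with Python's standard ipaddress module. This is an illustrative sketch of the policy, not the server's actual (TypeScript) implementation:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def http_allowed(url: str) -> bool:
    """Allow http:// only when the host resolves to a private or loopback address."""
    parts = urlparse(url)
    if parts.scheme == "https":
        return True                     # https is always accepted
    if parts.scheme != "http":
        return False                    # e.g. ftp:// is rejected outright
    host = parts.hostname or ""
    try:
        # Resolve the hostname, then classify the resulting address.
        addr = ipaddress.ip_address(socket.gethostbyname(host))
    except (socket.gaierror, ValueError):
        return False
    return addr.is_private or addr.is_loopback

print(http_allowed("http://192.168.1.10/docs"))  # True (RFC-1918)
print(http_allowed("http://react.dev/docs"))     # False (public host, no TLS)
```

Resolving before classifying (rather than pattern-matching the hostname string) is what blocks DNS tricks that point a friendly-looking name at a public address.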

URL policy summary:

| URL | Allowed? |
| ----------------------------------- | ------------------------- |
| https://react.dev/docs | ✅ |
| https://internal.company.com/docs | ✅ |
| http://192.168.1.10/docs | ✅ (private host) |
| http://localhost:8080/docs | ✅ (loopback) |
| http://react.dev/docs | ❌ (public host, no TLS) |
| ftp://example.com | ❌ (unsupported protocol) |


License

ISC