# @florexlabs/docs-to-mcp
Convert any documentation URL into a ready-to-run MCP server.
100% local by default — no API keys needed. Embeddings run locally with Transformers.js.
URL → crawl → clean HTML → markdown → chunks → embeddings → vector store → MCP server

## Prerequisites
- Node.js >= 18
- Docker (for ChromaDB): `docker run -p 8000:8000 chromadb/chroma`
- Playwright browsers: `npx playwright install chromium`
- No API keys needed for default local embeddings
## Quick Start

```bash
# Install Playwright browsers (one-time)
npx playwright install chromium
# Start ChromaDB
docker run -p 8000:8000 chromadb/chroma
# Initialize a project from a docs URL
npx @florexlabs/docs-to-mcp init https://docs.example.com --out ./my-docs-to-mcp
cd my-docs-to-mcp
npm install
# Crawl, build, and start — no API keys needed!
npm run crawl
npm run build
npm run start
```

## Installation

```bash
npm install -g @florexlabs/docs-to-mcp
```

Or use directly with npx:

```bash
npx @florexlabs/docs-to-mcp <command>
```

## Embedding Providers
### Local (default)
Uses Transformers.js with the `Xenova/all-MiniLM-L6-v2` model. Runs 100% on your machine via the ONNX runtime. No API keys, no external services, no cost.
```bash
docs-to-mcp build                                   # uses local by default
docs-to-mcp build --model Xenova/all-MiniLM-L6-v2   # explicit model
```

The model (~80 MB) is downloaded automatically on first use and cached locally.
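Under the hood, the local path amounts to a Transformers.js feature-extraction pipeline. A minimal sketch (not the package's actual code; the query string is illustrative):

```ts
// Minimal sketch of local embedding with Transformers.js.
// Assumption: docs-to-mcp's real provider batches chunks; this embeds one string.
import { pipeline } from '@xenova/transformers';

// Downloads Xenova/all-MiniLM-L6-v2 (~80 MB) on first use, then reads from cache.
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Mean-pool token embeddings and normalize: one 384-dim vector per input.
const output = await extractor('How do I configure the crawler?', {
  pooling: 'mean',
  normalize: true,
});
const vector = Array.from(output.data as Float32Array); // number[] of length 384
```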
### OpenAI (opt-in)
For higher-quality embeddings on large documentation sets, you can use OpenAI:
```bash
export OPENAI_API_KEY=sk-...
docs-to-mcp build --provider openai
docs-to-mcp build --provider openai --model text-embedding-3-large
```
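Behind the flag, each batch of chunk texts goes through the embeddings endpoint. A sketch using the official `openai` package (the provider's actual batching and retry logic are not shown; the input strings are placeholders):

```ts
// Sketch of the OpenAI embedding path.
import OpenAI from 'openai';

const client = new OpenAI(); // picks up OPENAI_API_KEY from the environment

const res = await client.embeddings.create({
  model: 'text-embedding-3-large',
  input: ['first chunk text...', 'second chunk text...'], // batch of chunks
});
const vectors = res.data.map((d) => d.embedding); // number[][] in input order
```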
## Commands

### `docs-to-mcp init <url>`
Generate a new MCP server project from a documentation URL.
```bash
docs-to-mcp init https://docs.example.com --out ./my-docs-to-mcp
```

Options:
- `--out <dir>` — Output directory (default: `./docs-to-mcp-project`)
- `--depth <n>` — Crawl depth (default: `3`)
- `--limit <n>` — Max pages (default: `50`)
- `--provider <name>` — Embedding provider: `local` or `openai` (default: `local`)
- `--model <name>` — Embedding model
- `--collection <name>` — Collection name (default: `docs`)
### `docs-to-mcp crawl <url>`
Crawl a documentation site, parse HTML to markdown, and chunk it.
```bash
docs-to-mcp crawl https://docs.example.com --out ./data --depth 3 --limit 50
```

Options:
- `--out <dir>` — Output directory (default: `./data`)
- `--depth <n>` — Crawl depth (default: `3`)
- `--limit <n>` — Max pages (default: `50`)
- `--verbose` — Verbose output
### `docs-to-mcp build`
Embed the chunks and upsert them into ChromaDB (see the sketch after the options below).
```bash
docs-to-mcp build                    # local embeddings (default)
docs-to-mcp build --provider openai  # use OpenAI instead
```

Options:
- `--collection <name>` — Collection name (default: `docs`)
- `--provider <name>` — `local` or `openai` (default: `local`)
- `--model <name>` — Embedding model
- `--data <dir>` — Data directory (default: `./data`)
- `--force` — Force rebuild
- `--verbose` — Verbose output
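For orientation, the upsert step with the `chromadb` JS client looks roughly like this (a sketch; the id, metadata shape, and single-record batch are illustrative assumptions, and the client constructor options vary by `chromadb` version):

```ts
// Sketch of upserting one precomputed embedding into ChromaDB.
import { ChromaClient } from 'chromadb';

const chroma = new ChromaClient({
  path: process.env.CHROMA_URL ?? 'http://localhost:8000',
});
const collection = await chroma.getOrCreateCollection({ name: 'docs' });

await collection.upsert({
  ids: ['docs.example.com/config#0'],                 // hypothetical chunk id
  embeddings: [[0.021, -0.13 /* ...384 dims... */]],  // from the embedding provider
  documents: ['## Configuration\nThe crawler accepts...'],
  metadatas: [{ url: 'https://docs.example.com/config' }],
});
```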
### `docs-to-mcp start`
Start the MCP server (stdio transport).
```bash
docs-to-mcp start --collection docs
```

### `docs-to-mcp dev`
Start the MCP server in development mode with logging.
```bash
docs-to-mcp dev --collection docs
```

## MCP Tools
The server exposes three tools:
| Tool | Description |
|------|-------------|
| `search_docs(query, topK?)` | Semantic search across indexed documentation |
| `get_source(url)` | Get all chunks from a specific source URL |
| `list_sources()` | List all indexed documentation sources |
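For example, a custom client built on the official TypeScript SDK can spawn the server over stdio and call `search_docs` (a sketch; the client name and query are placeholders):

```ts
// Sketch: call search_docs from an MCP client via @modelcontextprotocol/sdk.
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

const transport = new StdioClientTransport({
  command: 'npx',
  args: ['@florexlabs/docs-to-mcp', 'start', '--collection', 'docs'],
  env: { CHROMA_URL: 'http://localhost:8000' },
});

const client = new Client({ name: 'example-client', version: '0.0.1' });
await client.connect(transport);

const result = await client.callTool({
  name: 'search_docs',
  arguments: { query: 'how do I set crawl depth?', topK: 5 },
});
console.log(result.content); // tool output as MCP content blocks
```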
## Connecting to MCP Clients

### Claude Desktop

Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
"mcpServers": {
"my-docs": {
"command": "npx",
"args": ["@florexlabs/docs-to-mcp", "start", "--collection", "docs"],
"env": {
"CHROMA_URL": "http://localhost:8000"
}
}
}
}
```

### Cursor

Add to `.cursor/mcp.json`:

```json
{
"mcpServers": {
"my-docs": {
"command": "npx",
"args": ["@florexlabs/docs-to-mcp", "start", "--collection", "docs"],
"env": {
"CHROMA_URL": "http://localhost:8000"
}
}
}
}
```

## Environment Variables

```bash
CHROMA_URL=http://localhost:8000
# Only needed with --provider openai:
OPENAI_API_KEY=sk-...
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
```

## Architecture

```text
packages/
  cli/           — CLI commands (init, crawl, build, start, dev)
  crawler/       — Playwright-based same-origin doc crawler
  parser/        — HTML cleanup (Cheerio) + markdown conversion (Turndown)
  chunker/       — Heading-aware markdown chunking
  embeddings/    — Local (Transformers.js) + OpenAI providers
  vector-store/  — ChromaDB adapter
  mcp-server/    — MCP server with search tools
```
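To illustrate what the chunker package's heading-aware strategy means, here is a simplified sketch (the real chunker presumably also enforces chunk-size limits and overlap; those details are assumptions):

```ts
// Simplified sketch of heading-aware markdown chunking: split on headings so
// each chunk keeps its section context.
function chunkByHeadings(markdown: string): { heading: string; text: string }[] {
  const chunks: { heading: string; text: string }[] = [];
  let heading = '';
  let lines: string[] = [];

  const flush = () => {
    const text = lines.join('\n').trim();
    if (text) chunks.push({ heading, text });
    lines = [];
  };

  for (const line of markdown.split('\n')) {
    if (/^#{1,6}\s/.test(line)) {
      flush(); // close the previous section's chunk
      heading = line.replace(/^#{1,6}\s*/, '');
    }
    lines.push(line);
  }
  flush();
  return chunks;
}
```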
## Security Notes

- Only crawls same-origin links by default (see the sketch after this list)
- Never executes scraped content
- URLs are sanitized and normalized
- Local embeddings stay on your machine — nothing leaves your network
- If using OpenAI, embeddings are sent to OpenAI's API
- Do not crawl private documentation unless you understand where data goes
- No shell execution from user-controlled input
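The same-origin rule reduces to resolving each discovered link against the crawl root and comparing normalized origins, roughly:

```ts
// Sketch of the same-origin filter: keep a link only if its resolved,
// normalized origin matches the crawl root's origin.
function isSameOrigin(link: string, rootUrl: string): boolean {
  try {
    return new URL(link, rootUrl).origin === new URL(rootUrl).origin;
  } catch {
    return false; // malformed URLs are dropped
  }
}

console.log(isSameOrigin('/guide/intro', 'https://docs.example.com'));          // true
console.log(isSameOrigin('https://other.example.net/x', 'https://docs.example.com')); // false
```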
## Development

```bash
pnpm install
pnpm test
pnpm build
```

## License
MIT
