# @florexlabs/docs-to-mcp
Convert any documentation URL into a ready-to-run MCP server.
100% local by default — no API keys needed. Embeddings run locally with Transformers.js.
URL → crawl → clean HTML → markdown → chunks → embeddings → vector store → MCP server

## Prerequisites
- Node.js >= 18
- Docker (for ChromaDB): `docker run -p 8000:8000 chromadb/chroma`
- Playwright browsers: `npx playwright install chromium`
- No API keys needed for default local embeddings
## Quick Start

```bash
# Install Playwright browsers (one-time)
npx playwright install chromium
# Start ChromaDB
docker run -p 8000:8000 chromadb/chroma
# Initialize a project from a docs URL
npx @florexlabs/docs-to-mcp init https://docs.example.com --out ./my-docs-to-mcp
cd my-docs-to-mcp
npm install
# Crawl, build, and start — no API keys needed!
npm run crawl
npm run build
npm run start
```

## Installation

```bash
npm install -g @florexlabs/docs-to-mcp
```

Or use directly with npx:

```bash
npx @florexlabs/docs-to-mcp <command>
```

## Embedding Providers
### Local (default)
Uses Transformers.js with the `Xenova/all-MiniLM-L6-v2` model. Runs 100% on your machine via the ONNX runtime. No API keys, no external services, no cost.
```bash
docs-to-mcp build                                   # uses local by default
docs-to-mcp build --model Xenova/all-MiniLM-L6-v2   # explicit model
```

The model (~80 MB) is downloaded automatically on first use and cached locally.
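Under the hood, the local path amounts to a Transformers.js feature-extraction pipeline. A minimal sketch (not the package's actual code; the query string is illustrative):

```ts
// Minimal sketch of local embedding with Transformers.js.
// Assumption: docs-to-mcp's real provider batches chunks; this embeds one string.
import { pipeline } from '@xenova/transformers';

// Downloads Xenova/all-MiniLM-L6-v2 (~80 MB) on first use, then reads from cache.
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Mean-pool token embeddings and normalize: one 384-dim vector per input.
const output = await extractor('How do I configure the crawler?', {
  pooling: 'mean',
  normalize: true,
});
const vector = Array.from(output.data as Float32Array); // number[] of length 384
```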
### OpenAI (opt-in)
For higher-quality embeddings on large documentation sets, you can use OpenAI:
```bash
export OPENAI_API_KEY=sk-...
docs-to-mcp build --provider openai
docs-to-mcp build --provider openai --model text-embedding-3-large
```
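Behind the flag, each batch of chunk texts goes through the embeddings endpoint. A sketch using the official `openai` package (the provider's actual batching and retry logic are not shown; the input strings are placeholders):

```ts
// Sketch of the OpenAI embedding path.
import OpenAI from 'openai';

const client = new OpenAI(); // picks up OPENAI_API_KEY from the environment

const res = await client.embeddings.create({
  model: 'text-embedding-3-large',
  input: ['first chunk text...', 'second chunk text...'], // batch of chunks
});
const vectors = res.data.map((d) => d.embedding); // number[][] in input order
```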
## Commands

### `docs-to-mcp init <url>`
Generate a new MCP server project from a documentation URL.
```bash
docs-to-mcp init https://docs.example.com --out ./my-docs-to-mcp
```

Options:
- `--out <dir>` — Output directory (default: `./docs-to-mcp-project`)
- `--depth <n>` — Crawl depth (default: `3`)
- `--limit <n>` — Max pages (default: `50`)
- `--provider <name>` — Embedding provider: `local` or `openai` (default: `local`)
- `--model <name>` — Embedding model
- `--collection <name>` — Collection name (default: `docs`)
### `docs-to-mcp crawl <url>`
Crawl a documentation site, parse HTML to markdown, and chunk it.
```bash
docs-to-mcp crawl https://docs.example.com --out ./data --depth 3 --limit 50
```

Options:
- `--out <dir>` — Output directory (default: `./data`)
- `--depth <n>` — Crawl depth (default: `3`)
- `--limit <n>` — Max pages (default: `50`)
- `--verbose` — Verbose output
### `docs-to-mcp build`
Embed the chunks and upsert them into ChromaDB (see the sketch after the options below).
```bash
docs-to-mcp build                    # local embeddings (default)
docs-to-mcp build --provider openai  # use OpenAI instead
```

Options:
- `--collection <name>` — Collection name (default: `docs`)
- `--provider <name>` — `local` or `openai` (default: `local`)
- `--model <name>` — Embedding model
- `--data <dir>` — Data directory (default: `./data`)
- `--force` — Force rebuild
- `--verbose` — Verbose output
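For orientation, the upsert step with the `chromadb` JS client looks roughly like this (a sketch; the id, metadata shape, and single-record batch are illustrative assumptions, and the client constructor options vary by `chromadb` version):

```ts
// Sketch of upserting one precomputed embedding into ChromaDB.
import { ChromaClient } from 'chromadb';

const chroma = new ChromaClient({
  path: process.env.CHROMA_URL ?? 'http://localhost:8000',
});
const collection = await chroma.getOrCreateCollection({ name: 'docs' });

await collection.upsert({
  ids: ['docs.example.com/config#0'],                 // hypothetical chunk id
  embeddings: [[0.021, -0.13 /* ...384 dims... */]],  // from the embedding provider
  documents: ['## Configuration\nThe crawler accepts...'],
  metadatas: [{ url: 'https://docs.example.com/config' }],
});
```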
### `docs-to-mcp start`
Start the MCP server (stdio transport).
```bash
docs-to-mcp start --collection docs
```

### `docs-to-mcp dev`
Start the MCP server in development mode with logging.
```bash
docs-to-mcp dev --collection docs
```

## MCP Tools
The server exposes three tools:
| Tool | Description |
|------|-------------|
| `search_docs(query, topK?)` | Semantic search across indexed documentation |
| `get_source(url)` | Get all chunks from a specific source URL |
| `list_sources()` | List all indexed documentation sources |
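For example, a custom client built on the official TypeScript SDK can spawn the server over stdio and call `search_docs` (a sketch; the client name and query are placeholders):

```ts
// Sketch: call search_docs from an MCP client via @modelcontextprotocol/sdk.
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

const transport = new StdioClientTransport({
  command: 'npx',
  args: ['@florexlabs/docs-to-mcp', 'start', '--collection', 'docs'],
  env: { CHROMA_URL: 'http://localhost:8000' },
});

const client = new Client({ name: 'example-client', version: '0.0.1' });
await client.connect(transport);

const result = await client.callTool({
  name: 'search_docs',
  arguments: { query: 'how do I set crawl depth?', topK: 5 },
});
console.log(result.content); // tool output as MCP content blocks
```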
## Connecting to MCP Clients

### Claude Desktop

Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
"mcpServers": {
"my-docs": {
"command": "npx",
"args": ["@florexlabs/docs-to-mcp", "start", "--collection", "docs"],
"env": {
"CHROMA_URL": "http://localhost:8000"
}
}
}
}
```

### Cursor

Add to `.cursor/mcp.json`:

```json
{
"mcpServers": {
"my-docs": {
"command": "npx",
"args": ["@florexlabs/docs-to-mcp", "start", "--collection", "docs"],
"env": {
"CHROMA_URL": "http://localhost:8000"
}
}
}
}
```

## Environment Variables

```bash
CHROMA_URL=http://localhost:8000
# Only needed with --provider openai:
OPENAI_API_KEY=sk-...
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
```

## Architecture

```text
packages/
  cli/           — CLI commands (init, crawl, build, start, dev)
  crawler/       — Playwright-based same-origin doc crawler
  parser/        — HTML cleanup (Cheerio) + markdown conversion (Turndown)
  chunker/       — Heading-aware markdown chunking
  embeddings/    — Local (Transformers.js) + OpenAI providers
  vector-store/  — ChromaDB adapter
  mcp-server/    — MCP server with search tools
```
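To illustrate what the chunker package's heading-aware strategy means, here is a simplified sketch (the real chunker presumably also enforces chunk-size limits and overlap; those details are assumptions):

```ts
// Simplified sketch of heading-aware markdown chunking: split on headings so
// each chunk keeps its section context.
function chunkByHeadings(markdown: string): { heading: string; text: string }[] {
  const chunks: { heading: string; text: string }[] = [];
  let heading = '';
  let lines: string[] = [];

  const flush = () => {
    const text = lines.join('\n').trim();
    if (text) chunks.push({ heading, text });
    lines = [];
  };

  for (const line of markdown.split('\n')) {
    if (/^#{1,6}\s/.test(line)) {
      flush(); // close the previous section's chunk
      heading = line.replace(/^#{1,6}\s*/, '');
    }
    lines.push(line);
  }
  flush();
  return chunks;
}
```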
## Security Notes

- Only crawls same-origin links by default (see the sketch after this list)
- Never executes scraped content
- URLs are sanitized and normalized
- Local embeddings stay on your machine — nothing leaves your network
- If using OpenAI, embeddings are sent to OpenAI's API
- Do not crawl private documentation unless you understand where data goes
- No shell execution from user-controlled input
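The same-origin rule reduces to resolving each discovered link against the crawl root and comparing normalized origins, roughly:

```ts
// Sketch of the same-origin filter: keep a link only if its resolved,
// normalized origin matches the crawl root's origin.
function isSameOrigin(link: string, rootUrl: string): boolean {
  try {
    return new URL(link, rootUrl).origin === new URL(rootUrl).origin;
  } catch {
    return false; // malformed URLs are dropped
  }
}

console.log(isSameOrigin('/guide/intro', 'https://docs.example.com'));          // true
console.log(isSameOrigin('https://other.example.net/x', 'https://docs.example.com')); // false
```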
## Development

```bash
pnpm install
pnpm test
pnpm build
```

## License
MIT
