# @deepsweet/mdn

Offline-first [MDN Web Docs](https://developer.mozilla.org/) RAG-MCP server ready for semantic search with hybrid vector (1024-d) and full‑text (BM25) retrieval.
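Hybrid retrieval here means merging the full-text (BM25) ranking and the vector-similarity ranking into a single result list. This README doesn't document the package's actual fusion logic; the sketch below uses reciprocal rank fusion (RRF), a common technique for this, purely as an illustration (the document ids are made up):

```python
# Illustrative reciprocal rank fusion (RRF) of two ranked result lists:
# one from BM25 full-text search, one from vector similarity search.
# NOTE: a generic sketch of hybrid retrieval, NOT this package's
# actual fusion logic, which is not documented here.

def rrf_merge(bm25_ranked, vector_ranked, k=60, limit=3):
    """Combine two ranked lists of doc ids with reciprocal rank fusion.

    Each doc scores 1 / (k + rank) per list it appears in; docs ranked
    highly by both retrievers float to the top of the merged list.
    """
    scores = {}
    for ranking in (bm25_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:limit]

bm25 = ["Array.prototype.map", "Array.prototype.flatMap", "Map"]
vectors = ["Map", "Array.prototype.map", "WeakMap"]
print(rrf_merge(bm25, vectors))
# → ['Array.prototype.map', 'Map', 'Array.prototype.flatMap']
```

"Array.prototype.map" wins because both retrievers rank it highly, even though neither puts it unambiguously first.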
## Example

## Content
The dataset covers the core MDN documentation sections, including:
- Web API
- JavaScript
- HTML
- CSS
- SVG
- HTTP
See the dataset repo on Hugging Face for more details.
## Usage
### 1. Download the dataset and embedding model

```sh
npx -y @deepsweet/mdn@latest download
```

Both the dataset (~260 MB) and the embedding model GGUF file (~438 MB) will be downloaded directly from Hugging Face and stored in its default cache location (typically `~/.cache/huggingface/`), just like the `hf download` command does.
### 2. Set up the RAG-MCP server

```json
{
  "mcpServers": {
    "mdn": {
      "command": "npx",
      "args": [
        "-y",
        "@deepsweet/mdn@latest",
        "server"
      ],
      "env": {}
    }
  }
}
```

> [!TIP]
> Remove `@latest` for a fully offline experience, but keep in mind that this will cache a fixed version without auto-updating.
The stdio server will spawn llama.cpp under the hood, load the embedding model (~655 MB RAM/VRAM), and query the dataset – all on demand.
## Settings
| Env variable | Default value | Description |
| --- | --- | --- |
| `MDN_DATASET_PATH` | Hugging Face cache | Custom dataset directory path |
| `MDN_MODEL_PATH` | Hugging Face cache | Custom model file path |
| `MDN_MODEL_TTL` | `1800` | How long (in seconds) to keep llama.cpp with the embedding model loaded in memory; `0` to prevent unloading |
| `MDN_QUERY_DESCRIPTION` | "Natural language query for hybrid vector and full-text search" | Custom search query description in case your LLM does a poor job calling the MCP tool |
| `MDN_SEARCH_RESULTS_LIMIT` | `3` | Total search results limit |
| `HF_TOKEN` | | Optional Hugging Face access token; helps with occasional "HTTP 429 Too Many Requests" errors |
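These variables go in the `env` block of the MCP server config from the Usage section. The values below are illustrative (keep the model loaded indefinitely, return up to 5 results), not recommendations:

```json
{
  "mcpServers": {
    "mdn": {
      "command": "npx",
      "args": ["-y", "@deepsweet/mdn@latest", "server"],
      "env": {
        "MDN_MODEL_TTL": "0",
        "MDN_SEARCH_RESULTS_LIMIT": "5"
      }
    }
  }
}
```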
## To do
- [x] automatically update and upload the dataset artifacts monthly with GitHub Actions
- [x] automatically prune old dataset revisions like `hf cache prune`
- [ ] figure out a better query description so that the LLM doesn't over-generate keywords
## License
The RAG-MCP server itself and the processing scripts are available under the MIT license.
