claude-local-docs
v1.0.23
Local-first Context7 alternative — indexes JS/TS dependency docs with a 4-stage RAG pipeline. Uses TEI (Text Embeddings Inference) Docker containers for embeddings and reranking.
claude-local-docs
A local-first alternative to Context7 for Claude Code. Indexes your project's dependency documentation and source code locally with production-grade semantic search. Embeddings and reranking run via TEI (HuggingFace Text Embeddings Inference) Docker containers with auto GPU detection. Supports JS/TS, Vue, Svelte, and Astro with AST-aware chunking, JSDoc extraction, and git-diff incremental indexing.
Benchmark: Semantic Search vs Grep

Tested on a real-world TypeScript monorepo (957 files, 4,484 indexed code chunks) with 187 generic queries across 16 categories (authentication, database, caching, error handling, etc.).
How scores are computed:
- Semantic search: scored 0-10 based on the top relevance score returned by the search pipeline (`top_relevance * 9.5 + result_count_bonus`). Zero results = zero score.
- Grep: scored 0-10 using a log-scale penalty for noise (`7.5 - 1.5 * log10(matching_files)`). Few focused matches score high; hundreds of files score low. Capped at 7.5 since grep always requires manual review.
- Combined: `max(Semantic, Grep)` per query. Represents what Claude Code gets when both tools are available.
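The three scoring rules above can be sketched as small functions. This is an illustration only: the function names are hypothetical, `resultCountBonus` is passed in because the README does not define how it is derived, and the clamping to the 0-10 range (and the zero score for zero grep matches) is an assumption.

```typescript
// Hypothetical helpers; only the two formulas and the 7.5 cap come from the README.
function semanticScore(
  topRelevance: number,
  resultCount: number,
  resultCountBonus: number,
): number {
  if (resultCount === 0) return 0; // zero results = zero score
  const raw = topRelevance * 9.5 + resultCountBonus;
  return Math.min(10, Math.max(0, raw)); // assumption: clamp to the 0-10 scale
}

function grepScore(matchingFiles: number): number {
  if (matchingFiles <= 0) return 0; // assumption: no matching files scores zero
  const raw = 7.5 - 1.5 * Math.log10(matchingFiles);
  return Math.min(7.5, Math.max(0, raw)); // capped at 7.5; the floor at 0 is an assumption
}

function combinedScore(semantic: number, grep: number): number {
  return Math.max(semantic, grep);
}
```

Under these assumptions a single matching file gives the maximum grep score of 7.5, and 100 matching files gives 4.5.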
How the benchmark was run:
- 200 queries were written to cover common code search patterns that apply to any codebase (e.g., "How does error handling work?", "Where are background jobs defined?").
- 8 agents ran in parallel (25 queries each). Each query was run through `search_code` (this plugin's MCP tool) and `rg` (ripgrep, same as Claude Code's built-in Grep). The MCP tool returns a relevance score (0-1) and result count; grep returns a file count.
- Scores were computed automatically from raw metrics, with no manual judgment involved.
Requirements
Hardware (GPU required)
A supported GPU is mandatory for embedding and reranking inference. CPU-only mode is not supported.
| Platform | GPU | Backend | VRAM needed |
|---|---|---|---|
| Windows / Linux | NVIDIA RTX 20x0+ (Turing or newer) | Docker with CUDA | ~5 GB |
| macOS | Apple Silicon (M1/M2/M3/M4) | Native Metal (no Docker) | Uses unified memory |
The three TEI models require approximately:
- nomic-ai/nomic-embed-text-v1.5: ~270 MB
- cross-encoder/ms-marco-MiniLM-L-6-v2: ~90 MB
- Qodo/Qodo-Embed-1-1.5B: ~3 GB (FP16)
First run downloads all models (~3.4 GB total). Subsequent starts use cached models.
Software
| Requirement | NVIDIA path | Apple Silicon path |
|---|---|---|
| Node.js 20+ | Required | Required |
| Docker Desktop | Required | Not needed |
| NVIDIA Container Toolkit | Linux only: required for GPU passthrough. Not needed on Windows (Docker Desktop handles it via WSL2). | N/A |
| Rust | N/A | Required for first build |
Ports
TEI uses three local ports (not exposed to the network):
| Port | Service | Configurable via |
|---|---|---|
| 39281 | Doc embeddings | TEI_EMBED_URL |
| 39282 | Cross-encoder reranker | TEI_RERANK_URL |
| 39283 | Code embeddings | TEI_CODE_EMBED_URL |
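As a rough sketch of how the three environment variables in the table could be resolved, the snippet below falls back to `http://localhost` on the listed ports when a variable is unset. The `TeiEndpoints` type, the `resolveTeiEndpoints` name, and the exact default URLs are assumptions for illustration, not the plugin's actual code.

```typescript
// Hypothetical shape; only the variable names and port numbers come from the table.
interface TeiEndpoints {
  embed: string;
  rerank: string;
  codeEmbed: string;
}

// Assumption: defaults point at localhost on the documented ports.
const DEFAULT_ENDPOINTS: TeiEndpoints = {
  embed: "http://localhost:39281",
  rerank: "http://localhost:39282",
  codeEmbed: "http://localhost:39283",
};

function resolveTeiEndpoints(
  env: Record<string, string | undefined>,
): TeiEndpoints {
  // `||` also treats an empty string as "not set".
  return {
    embed: env.TEI_EMBED_URL || DEFAULT_ENDPOINTS.embed,
    rerank: env.TEI_RERANK_URL || DEFAULT_ENDPOINTS.rerank,
    codeEmbed: env.TEI_CODE_EMBED_URL || DEFAULT_ENDPOINTS.codeEmbed,
  };
}
```

In a Node.js process this would typically be called as `resolveTeiEndpoints(process.env)`.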
Installation
As a Claude Code plugin (recommended)
# Add the marketplace
/plugin marketplace add matteodante/claude-local-docs
# Install the plugin
/plugin install claude-local-docs
The plugin starts TEI containers automatically on session start via a SessionStart hook.
Manual / development setup
git clone https://github.com/matteodante/claude-local-docs.git
cd claude-local-docs
npm install
npm run build
# Start TEI (auto-detects GPU)
./start-tei.sh
Why not Context7?
| | claude-local-docs | Context7 |
|---|---|---|
| Runs where | Your machine (TEI Docker) | Upstash cloud servers |
| Privacy | Docs never leave your machine | Queries sent to cloud API |
| Rate limits | None | API-dependent |
| Offline | Full search works offline | Requires internet |
| GPU accelerated | NVIDIA CUDA / Apple Metal | N/A |
| Search quality | 4-stage RAG (vector + BM25 + RRF + cross-encoder reranking) | Single-stage retrieval |
| Doc sources | Prefers llms.txt, falls back to official docs | Pre-indexed source repos |
| Code search | Semantic AST-level search via Qodo-Embed-1-1.5B | N/A |
| Framework support | JS, TS, Vue, Svelte, Astro (SFC script extraction) | N/A |
| Scope | Your project's actual dependencies + source code | Any library |
| Monorepo | Detects pnpm/npm/yarn workspaces, resolves catalogs | N/A |
| Resilience | Retry with exponential backoff + 30s timeout on TEI calls | N/A |
How it works
Documentation search
/fetch-docs                  search_docs("how to use useState")
    |                                        |
    v                                        v
Detect monorepo              +--- Vector search (LanceDB) ---+
Scan all workspace pkgs      |     nomic-embed-text-v1.5     |
Resolve catalog: versions    |                               |
    |                        |                               +-> RRF Fusion
    v                        |                               |   (k=60)
For each runtime dep:        +-- BM25 search (LanceDB FTS) --+
  - Search for llms.txt      |      keyword + stemming       |
  - Raw fetch (no truncation)|                               v
  - Chunk + embed + store    |            Cross-encoder rerank
                             |            ms-marco-MiniLM-L-6-v2
                             |            (via TEI :39282)
                             +----------------------------------+
                                               |
                                               v
                                         Top-K results
Codebase search
/index-codebase               search_code("RRF fusion logic")
    |                                           |
    v                                           v
Walk project files            +--- Vector search (LanceDB) -------+
Respect .gitignore            |   Qodo-Embed-1-1.5B (1536-dim)    |
Git-diff incremental skip     |                                   |
    |                         |                                   +-> RRF Fusion
    v                         |                                   |   (k=60)
For each JS/TS/Vue/           +-- BM25 search (LanceDB FTS) ------+
Svelte/Astro file:            |   camelCase split + stemming      |
  - Extract <script> (SFC)    |                                   |
  - Parse AST (tree-sitter)   +-- File-path boost (optional) -----+
  - Extract functions/classes |                                   v
  - Extract JSDoc/decorators  |              Cross-encoder rerank
  - Contextual headers        |              ms-marco-MiniLM-L-6-v2
  - Embed with Qodo-Embed     |              (via TEI :39282)
  - Store in LanceDB          +--------------------------------------+
                                                  |
                                                  v
                                       Function-level results
                                       (file, lines, scope, score)
                                       + neighbor chunk expansion
Usage
1. Index your project's docs
/fetch-docs
Claude analyzes your project (including monorepo workspaces), finds all runtime dependencies, searches the web for the best documentation for each one (preferring llms-full.txt > llms.txt > official docs), and indexes everything locally.
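The stated preference order (llms-full.txt, then llms.txt, then official docs) could be expressed as a simple ranking over candidate URLs. The sketch below is hypothetical: the function names and the idea of ranking a pre-collected list of URLs are assumptions, and the `example.com` URLs in the usage note are placeholders.

```typescript
// Lower rank = more preferred. Anything that is not an llms file counts as "official docs".
function preferenceRank(url: string): number {
  // Ignore query string and fragment when inspecting the path.
  const path = (url.split(/[?#]/)[0] ?? "").toLowerCase();
  if (path.endsWith("/llms-full.txt")) return 0;
  if (path.endsWith("/llms.txt")) return 1;
  return 2;
}

// Returns the most preferred candidate, or undefined for an empty list.
function pickBestDocUrl(candidates: string[]): string | undefined {
  const sorted = [...candidates].sort(
    (a, b) => preferenceRank(a) - preferenceRank(b),
  );
  return sorted[0];
}
```

For example, given `https://example.com/docs`, `https://example.com/llms.txt`, and `https://example.com/llms-full.txt`, this picks the `llms-full.txt` URL.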
2. Index your source code
/index-codebase
Parses all JS/TS/Vue/Svelte/Astro files with tree-sitter, extracts JSDoc comments and decorators, generates Qodo-Embed-1-1.5B embeddings for function/class/method-level chunks, and stores them in LanceDB. Incremental via git-diff (falls back to SHA-256 hashing for non-git projects). Only changed files are re-indexed.
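The SHA-256 fallback for non-git projects amounts to comparing each file's current hash with the hash recorded at the last index run. A minimal sketch, assuming file contents are already loaded and that previous hashes are kept as a path-to-hash record (both the function names and that record shape are hypothetical):

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of a file's text content.
function sha256(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Returns the paths whose content hash differs from (or is missing in) the previous run.
function findChangedFiles(
  currentContents: Record<string, string>, // path -> file content
  previousHashes: Record<string, string>, // path -> hash from the last index (assumed shape)
): string[] {
  return Object.keys(currentContents).filter(
    (path) => previousHashes[path] !== sha256(currentContents[path] ?? ""),
  );
}
```

Only the paths returned by `findChangedFiles` would need to be re-parsed and re-embedded.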
3. Search
Ask Claude anything. It will automatically use the right search tool:
# Library documentation (search_docs)
How do I set up middleware in Express?
What are the options for useQuery in TanStack Query?
Show me the API for zod's .refine()
# Your codebase (search_code)
Where is the authentication middleware?
Find the database connection setup
How does the search pipeline work?
4. Other tools
- list_docs: See what's indexed, when it was fetched, chunk counts
- get_doc_section: Retrieve specific sections by heading or chunk ID
- get_codebase_status: Check index status, language breakdown, changed files
- analyze_dependencies: List all deps (monorepo-aware, catalog-resolved, runtime/dev tagged)
- fetch_and_store_doc: Fetch a URL and index it directly (no AI truncation)
- discover_and_fetch_docs: Auto-discover and index docs for any npm package
TEI backend
ML inference runs in TEI (HuggingFace Text Embeddings Inference) containers:
| Container | Port | Model | Purpose |
|---|---|---|---|
| tei-embed | :39281 | nomic-ai/nomic-embed-text-v1.5 | Doc embeddings (384-dim Matryoshka) |
| tei-rerank | :39282 | cross-encoder/ms-marco-MiniLM-L-6-v2 | Cross-encoder reranking (docs + code) |
| tei-code-embed | :39283 | Qodo/Qodo-Embed-1-1.5B | Code embeddings (1536-dim, 68.5 CoIR) |
All TEI communication goes through a shared TeiClient class (src/tei-client.ts) with automatic retry (2 attempts, exponential backoff), 30s timeout, and batch splitting. TEI containers must be running for both indexing and search — there is no fallback mode.
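The retry, timeout, and batch-splitting behaviour described above can be sketched generically. This is not the actual `TeiClient` code: the function names, the 500 ms base delay, and reading "2 attempts" as two total tries are assumptions; only the attempt count and the 30s timeout come from the README.

```typescript
const MAX_ATTEMPTS = 2; // from the README, read here as 2 total tries (assumption)
const TIMEOUT_MS = 30000; // 30s timeout from the README

// Exponential backoff: the delay doubles per attempt. The 500 ms base is an assumption.
function backoffDelayMs(attempt: number, baseMs = 500): number {
  return baseMs * Math.pow(2, attempt);
}

// Split a large input list into batches of at most `batchSize` items.
function splitIntoBatches<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Rejects if `work` has not settled within `ms`. Note: this does not cancel the underlying request.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Runs `call` up to `attempts` times, sleeping with exponential backoff between failures.
async function withRetry<T>(call: () => Promise<T>, attempts = MAX_ATTEMPTS): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await withTimeout(call(), TIMEOUT_MS);
    } catch (err) {
      lastError = err;
      if (attempt < attempts - 1) {
        await new Promise<void>((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
      }
    }
  }
  throw lastError;
}
```

A caller would wrap each batch request, for example `withRetry(() => sendBatch(batch))`, where `sendBatch` stands in for whatever function performs the HTTP call.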
Starting TEI
./start-tei.sh # Auto-detect GPU
./start-tei.sh --metal # Force Apple Metal (native, no Docker)
./start-tei.sh --cpu # Force CPU Docker
./start-tei.sh --stop # Stop all TEI
Auto-detection selects the optimal backend:
| Platform | Backend | Image tag |
|---|---|---|
| NVIDIA RTX 50x0 (Blackwell) | Docker CUDA | 120-1.9 |
| NVIDIA RTX 40x0 (Ada) | Docker CUDA | 89-1.9 |
| NVIDIA RTX 30x0 (Ampere) | Docker CUDA | 86-1.9 |
| NVIDIA RTX 20x0 (Turing) | Docker CUDA | turing-1.9 |
| Apple Silicon | Native Metal | cargo install --features metal |
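One way the NVIDIA rows of this table could be derived is from the GPU's CUDA compute capability (7.5 for Turing, 8.6 for RTX 30x0, 8.9 for Ada, 12.0 for Blackwell). The sketch below is an assumption about how such a mapping might look, not a description of what `start-tei.sh` does; the function name is hypothetical and the tags are copied from the table.

```typescript
// Maps a compute capability string such as "8.9" to an image tag from the table.
// Returns undefined for GPUs the table does not list.
function teiImageTag(computeCapability: string, version = "1.9"): string | undefined {
  const [majorStr, minorStr = "0"] = computeCapability.trim().split(".");
  const cc = Number(majorStr) * 10 + Number(minorStr); // "8.9" -> 89, "12.0" -> 120
  if (Number.isNaN(cc)) return undefined;
  if (cc === 120) return `120-${version}`; // RTX 50x0 (Blackwell)
  if (cc === 89) return `89-${version}`; // RTX 40x0 (Ada)
  if (cc === 86) return `86-${version}`; // RTX 30x0 (Ampere)
  if (cc === 75) return `turing-${version}`; // RTX 20x0 (Turing)
  return undefined;
}
```

The compute capability itself would have to come from the driver tooling; how the script obtains it is not stated in this README.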
GPU override for NVIDIA:
docker compose -f docker-compose.yml -f docker-compose.nvidia.yml up -d
Search pipeline
Doc search uses a 4-stage RAG pipeline. Code search adds a 5th file-path boost signal:
| Stage | Technology | Purpose |
|---|---|---|
| Vector search | LanceDB + nomic-embed / Qodo-Embed via TEI | Semantic similarity (understands meaning) |
| BM25 search | LanceDB native FTS (BM25, stemming, stop words) | Keyword matching (exact terms like useEffect) |
| RRF fusion | Reciprocal Rank Fusion (k=60) | Merges both ranked lists, handles different score scales |
| Cross-encoder rerank | ms-marco-MiniLM-L-6-v2 via TEI | Rescores top 50 candidates with deep relevance model |
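To make the fusion stage concrete, here is a minimal sketch of Reciprocal Rank Fusion with k=60. Each document's fused score is the sum of `1 / (k + rank)` over every ranked list it appears in, with ranks starting at 1. The function name and the use of plain string IDs are assumptions; the project's `rrf.ts` may differ in detail.

```typescript
// Fuses several ranked lists of document IDs into one list sorted by RRF score.
function rrfFuse(
  rankedLists: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Example: a vector ranking and a BM25 ranking over the same three chunks.
const fused = rrfFuse([
  ["a", "b", "c"], // vector search order
  ["b", "c", "a"], // BM25 order
]);
// fused order: b, a, c
```

Because only ranks are used, the two lists can come from scorers with completely different scales. A third list, such as the file-path boost used for code search, can be passed as one more entry in `rankedLists`.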
Code search specifics
- AST chunking: tree-sitter parses JS/TS/Vue/Svelte/Astro into function/class/method/interface/namespace entities
- JSDoc + decorators: Extracted from AST and prepended to chunk text for richer search context
- Metadata flags: `exported`, `async`, `abstract` tracked per entity
- Qodo-Embed-1-1.5B: 1.5B parameter model, 68.5 CoIR score, 32K context window, 1536-dim embeddings
- Contextual headers: file path + scope chain + flags + decorators + JSDoc prepended for BM25
- File-path boost: Queries containing file names (e.g., "rrf.ts") get a third RRF signal boosting matching files
- Neighbor expansion: Adjacent chunks from the same file are merged for fuller context
- Incremental indexing: Git-diff based (fast, ~50-100ms), falls back to SHA-256 hashing for non-git projects
- No fallback: TEI must be running — search errors if containers are down
- SFC support: Vue `<script>`/`<script setup>`, Svelte `<script>`/`<script context="module">`, Astro `---` frontmatter + `<script>` tags
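Two of the points above, the camelCase splitting used for keyword search and the detection of file names in a query for the file-path boost, can be illustrated with small helpers. Both functions are hypothetical sketches; the regular expressions and the extension list are assumptions rather than the plugin's actual rules.

```typescript
// Splits identifiers like "getUserById" or "HTTPServer" into lowercase words for keyword search.
function splitCamelCase(identifier: string): string[] {
  return identifier
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // fooBar -> foo Bar
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2") // HTTPServer -> HTTP Server
    .split(/[^A-Za-z0-9]+/)
    .filter((part) => part.length > 0)
    .map((part) => part.toLowerCase());
}

// Finds tokens in a query that look like source file names (assumed extension list).
function extractFileNames(query: string): string[] {
  const matches = query.match(
    /[\w.-]+\.(?:tsx|jsx|ts|js|mjs|cjs|vue|svelte|astro)\b/gi,
  );
  return matches ? matches.slice() : [];
}
```

A query such as "How does rrf.ts merge results?" would yield `rrf.ts`, which could then be matched against indexed file paths to build the extra ranked list.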
Storage
All data stays in your project directory:
your-project/.claude/docs/
├── lancedb/ # Vector database (docs + code tables)
├── .metadata.json # Doc fetch timestamps, source URLs per library
├── .code-metadata.json # File hashes, language, chunk counts, last index
└── raw/
    ├── react.md # Raw fetched documentation
    ├── next.md
    └── tanstack__query.md
MCP Tools
| Tool | Description |
|---|---|
| analyze_dependencies | Detect and list all npm dependencies (monorepo-aware, runtime/dev tagged) |
| store_and_index_doc | Index documentation content you already have as a string |
| fetch_and_store_doc | Fetch documentation from a URL and index it (raw HTTP, no truncation) |
| discover_and_fetch_docs | Auto-discover and index docs for an npm package |
| search_docs | Semantic search across indexed library documentation |
| list_docs | List indexed libraries with version and fetch date |
| get_doc_section | Retrieve specific doc sections by heading or chunk ID |
| index_codebase | Index project source code for semantic search (incremental, .gitignore-aware) |
| search_code | Semantic search across project source code (function/class-level) |
| get_codebase_status | Check codebase index status, language breakdown, changed files |
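Claude Code normally invokes these tools for you, but for orientation, an MCP tool invocation is a JSON-RPC 2.0 request with the method `tools/call`. The sketch below only builds such a request object; the `buildToolCall` helper is hypothetical, and the `query` argument name for `search_docs` is an assumption, since the tool's input schema is not shown in this README.

```typescript
// Shape of an MCP "tools/call" request as a JSON-RPC 2.0 message.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Assumed argument name `query`; check the tool's schema before relying on it.
const request = buildToolCall(1, "search_docs", { query: "how to use useState" });
```

Serialized with `JSON.stringify(request)`, this is the kind of message an MCP client sends over the stdio transport configured in `.mcp.json`.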
Dependencies
| Package | License | Purpose |
|---|---|---|
| @lancedb/lancedb | Apache 2.0 | Embedded vector database + native FTS |
| @modelcontextprotocol/sdk | MIT | MCP server framework |
| web-tree-sitter | MIT | WASM-based AST parsing for code chunking |
| tree-sitter-wasms | MIT | Pre-built WASM grammars (JS/TS/Vue/Svelte) |
| ignore | MIT | .gitignore pattern matching |
| zod | MIT | Schema validation |
TEI containers (Docker):
| Image | Model | Purpose |
|---|---|---|
| text-embeddings-inference:* | nomic-ai/nomic-embed-text-v1.5 | Doc embeddings |
| text-embeddings-inference:* | cross-encoder/ms-marco-MiniLM-L-6-v2 | Cross-encoder reranking |
| text-embeddings-inference:* | Qodo/Qodo-Embed-1-1.5B | Code embeddings (1536-dim) |
Development
npm run dev # Watch mode — rebuilds on file changes
npm run build # One-time build
npm run test:unit # Unit tests (no TEI needed)
npm run test:docs # Doc search integration tests (requires TEI on :39281, :39282)
npm run test:code # Code search integration tests (requires TEI on :39281, :39282, :39283)
Project structure
claude-local-docs/
├── .claude-plugin/
│ ├── plugin.json # Plugin manifest
│ └── marketplace.json # Marketplace listing
├── .mcp.json # MCP server config (stdio transport)
├── commands/
│ ├── fetch-docs.md # /fetch-docs — Claude as research agent
│ └── index-codebase.md # /index-codebase — index source code
├── hooks/
│ └── hooks.json # SessionStart hook for TEI containers
├── scripts/
│ └── ensure-tei.sh # Idempotent TEI health check + start
├── docker-compose.yml # TEI containers (uses ${TEI_TAG})
├── docker-compose.nvidia.yml # NVIDIA GPU device passthrough
├── start-tei.sh # Auto-detect GPU, start TEI
├── src/
│ ├── index.ts # MCP server entry, 10 tool definitions
│ ├── tei-client.ts # Shared TEI HTTP client (retry, timeout, batching)
│ ├── indexer.ts # Doc chunking + nomic-embed-text embeddings
│ ├── search.ts # Doc search pipeline (vector + BM25 + RRF + rerank)
│ ├── rrf.ts # Shared Reciprocal Rank Fusion utility
│ ├── reranker.ts # TEI cross-encoder reranking
│ ├── store.ts # LanceDB "docs" table + metadata
│ ├── code-indexer.ts # AST chunking (tree-sitter) + Qodo-Embed embeddings
│ ├── code-search.ts # Code search pipeline (5-stage + neighbor expansion)
│ ├── code-store.ts # LanceDB "code" table + file hash tracking + schema migration
│ ├── file-walker.ts # Project file discovery + .gitignore + git-diff
│ ├── sfc-extractor.ts # Vue/Svelte/Astro <script> block extraction
│ ├── fetcher.ts # Raw HTTP fetch (no AI truncation)
│ ├── workspace.ts # Monorepo detection + pnpm catalog
│ ├── discovery.ts # npm registry + URL probing for docs
│ ├── types.ts # Shared TypeScript interfaces
│ ├── unit.test.ts # Unit tests (no TEI needed)
│ ├── docs.test.ts # Doc search integration tests
│ └── code.test.ts # Code search integration tests
├── LICENSE
├── package.json
└── tsconfig.json
Troubleshooting
TEI containers not starting
# Check Docker is running
docker info
# Check container logs
docker compose logs tei-embed
docker compose logs tei-rerank
docker compose logs tei-code-embed
# Restart
./start-tei.sh --stop && ./start-tei.sh
Port conflicts
If 39281/39282/39283 are in use, override via env vars:
TEI_EMBED_URL=http://localhost:49281 TEI_RERANK_URL=http://localhost:49282 TEI_CODE_EMBED_URL=http://localhost:49283 node dist/index.js
Apple Silicon: slow performance
The default Docker CPU image runs via Rosetta 2. Use native Metal instead:
./start-tei.sh --metal
Requires Rust (curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh). First build takes a few minutes.
License
MIT
