claude-local-docs

v1.0.23

Published

5 months ago

Local-first Context7 alternative — indexes JS/TS dependency docs with a 4-stage RAG pipeline. Uses TEI (Text Embeddings Inference) Docker containers for embeddings and reranking.


matteodante

claude-code mcp documentation search rag embeddings context7-alternative local-first semantic-search llms-txt

claude-local-docs

A local-first alternative to Context7 for Claude Code. Indexes your project's dependency documentation and source code locally with production-grade semantic search. Embeddings and reranking run via TEI (HuggingFace Text Embeddings Inference) Docker containers with auto GPU detection. Supports JS/TS, Vue, Svelte, and Astro with AST-aware chunking, JSDoc extraction, and git-diff incremental indexing.

Benchmark: Semantic Search vs Grep

Tested on a real-world TypeScript monorepo (957 files, 4,484 indexed code chunks) with 187 generic queries across 16 categories (authentication, database, caching, error handling, etc.).

How scores are computed:

- Semantic search: scored 0-10 based on the top relevance score returned by the search pipeline (`top_relevance * 9.5 + result_count_bonus`). Zero results = zero score.
- Grep: scored 0-10 using a log-scale penalty for noise (`7.5 - 1.5 * log10(matching_files)`). Few focused matches score high; hundreds of files score low. Capped at 7.5 since grep always requires manual review.
- Combined: `max(Semantic, Grep)` per query. Represents what Claude Code gets when both tools are available.
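
The three formulas above can be written out as a minimal sketch. The README does not say how `result_count_bonus` is derived, so it is passed in as a hypothetical parameter here. Clamping to the 0-10 range and scoring zero grep matches as 0 are also assumptions.

```typescript
// Sketch of the benchmark scoring formulas, not the actual benchmark script.
// `resultCountBonus` is a hypothetical input: its derivation is not documented.
function semanticScore(
  topRelevance: number,
  resultCount: number,
  resultCountBonus = 0,
): number {
  if (resultCount === 0) return 0; // zero results = zero score
  const raw = topRelevance * 9.5 + resultCountBonus;
  return Math.min(10, Math.max(0, raw)); // assumption: clamp to the 0-10 scale
}

function grepScore(matchingFiles: number): number {
  // Assumption: no matching files scores 0, since log10(0) is undefined.
  if (matchingFiles <= 0) return 0;
  const raw = 7.5 - 1.5 * Math.log10(matchingFiles);
  return Math.min(7.5, Math.max(0, raw)); // capped at 7.5, floored at 0
}

function combinedScore(semantic: number, grep: number): number {
  return Math.max(semantic, grep);
}
```

With these definitions a single matching file gives the maximum grep score of 7.5, and every tenfold increase in matching files costs 1.5 points.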

How the benchmark was run:

- 200 queries were written to cover common code search patterns that apply to any codebase (e.g., "How does error handling work?", "Where are background jobs defined?").
- 8 agents ran in parallel (25 queries each). Each query was run through `search_code` (this plugin's MCP tool) and `rg` (ripgrep, same as Claude Code's built-in Grep). The MCP tool returns a relevance score (0-1) and result count; grep returns a file count.
- Scores were computed automatically from raw metrics, with no manual judgment involved.

Requirements

Hardware (GPU required)

A supported GPU is mandatory for embedding and reranking inference. CPU-only mode is not supported.

| Platform | GPU | Backend | VRAM needed |
|---|---|---|---|
| Windows / Linux | NVIDIA RTX 20x0+ (Turing or newer) | Docker with CUDA | ~5 GB |
| macOS | Apple Silicon (M1/M2/M3/M4) | Native Metal (no Docker) | Uses unified memory |

The three TEI models require approximately:

- nomic-ai/nomic-embed-text-v1.5: ~270 MB
- cross-encoder/ms-marco-MiniLM-L-6-v2: ~90 MB
- Qodo/Qodo-Embed-1-1.5B: ~3 GB (FP16)

First run downloads all models (~3.4 GB total). Subsequent starts use cached models.

Software

| Requirement | NVIDIA path | Apple Silicon path |
|---|---|---|
| Node.js 20+ | Required | Required |
| Docker Desktop | Required | Not needed |
| NVIDIA Container Toolkit | Linux only: required for GPU passthrough. Not needed on Windows (Docker Desktop handles it via WSL2). | N/A |
| Rust | N/A | Required for first build |

Ports

TEI uses three local ports (not exposed to the network):

| Port | Service | Configurable via |
|---|---|---|
| 39281 | Doc embeddings | TEI_EMBED_URL |
| 39282 | Cross-encoder reranker | TEI_RERANK_URL |
| 39283 | Code embeddings | TEI_CODE_EMBED_URL |
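
As an illustration of how these variables could be resolved, here is a minimal sketch. The function name, the `TeiEndpoints` shape, and the `http://localhost:<port>` defaults are assumptions for illustration and are not taken from the plugin's source.

```typescript
interface TeiEndpoints {
  embedUrl: string;
  rerankUrl: string;
  codeEmbedUrl: string;
}

// Assumption: defaults point at localhost on the ports listed in the table.
const DEFAULT_ENDPOINTS: TeiEndpoints = {
  embedUrl: "http://localhost:39281",
  rerankUrl: "http://localhost:39282",
  codeEmbedUrl: "http://localhost:39283",
};

// Hypothetical helper: takes an env-like record so it can be tested without Node.
function resolveTeiEndpoints(
  env: Record<string, string | undefined>,
): TeiEndpoints {
  return {
    // `||` treats an empty string like an unset variable.
    embedUrl: env["TEI_EMBED_URL"] || DEFAULT_ENDPOINTS.embedUrl,
    rerankUrl: env["TEI_RERANK_URL"] || DEFAULT_ENDPOINTS.rerankUrl,
    codeEmbedUrl: env["TEI_CODE_EMBED_URL"] || DEFAULT_ENDPOINTS.codeEmbedUrl,
  };
}
```

In a Node process this would typically be called as `resolveTeiEndpoints(process.env)`.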

Installation

As a Claude Code plugin (recommended)

# Add the marketplace
/plugin marketplace add matteodante/claude-local-docs

# Install the plugin
/plugin install claude-local-docs

The plugin starts TEI containers automatically on session start via a SessionStart hook.

Manual / development setup

git clone https://github.com/matteodante/claude-local-docs.git
cd claude-local-docs
npm install
npm run build

# Start TEI (auto-detects GPU)
./start-tei.sh

Why not Context7?

| | claude-local-docs | Context7 |
|---|---|---|
| Runs where | Your machine (TEI Docker) | Upstash cloud servers |
| Privacy | Docs never leave your machine | Queries sent to cloud API |
| Rate limits | None | API-dependent |
| Offline | Full search works offline | Requires internet |
| GPU accelerated | NVIDIA CUDA / Apple Metal | N/A |
| Search quality | 4-stage RAG (vector + BM25 + RRF + cross-encoder reranking) | Single-stage retrieval |
| Doc sources | Prefers llms.txt, falls back to official docs | Pre-indexed source repos |
| Code search | Semantic AST-level search via Qodo-Embed-1-1.5B | N/A |
| Framework support | JS, TS, Vue, Svelte, Astro (SFC script extraction) | N/A |
| Scope | Your project's actual dependencies + source code | Any library |
| Monorepo | Detects pnpm/npm/yarn workspaces, resolves catalogs | N/A |
| Resilience | Retry with exponential backoff + 30s timeout on TEI calls | N/A |

How it works

Documentation search

/fetch-docs                        search_docs("how to use useState")
     |                                       |
     v                                       v
 Detect monorepo              +--- Vector search (LanceDB) ---+
 Scan all workspace pkgs      |    nomic-embed-text-v1.5       |
 Resolve catalog: versions    |                                |
     |                        |                                +-> RRF Fusion
     v                        |                                |    (k=60)
 For each runtime dep:        +-- BM25 search (LanceDB FTS) --+
   - Search for llms.txt      |    keyword + stemming            |
   - Raw fetch (no truncation)|                                  v
   - Chunk + embed + store    |                        Cross-encoder rerank
                              |                        ms-marco-MiniLM-L-6-v2
                              |                          (via TEI :39282)
                              +----------------------------------+
                                                  |
                                                  v
                                          Top-K results
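
The last step of the diagram, reordering fused candidates by cross-encoder score and keeping the top K, can be sketched as follows. This assumes the reranker responds with a list of `{ index, score }` pairs that point back into the submitted texts, which is the shape TEI's rerank route is understood to return. The `applyRerank` helper is hypothetical and not part of the plugin.

```typescript
// Assumed reranker response item: `index` refers to the position of the
// candidate in the list that was sent for reranking.
interface RerankHit {
  index: number;
  score: number;
}

// Hypothetical helper: map rerank hits back onto candidates, best first.
function applyRerank<T>(
  candidates: T[],
  hits: RerankHit[],
  topK: number,
): Array<{ item: T; score: number }> {
  return hits
    .filter((hit) => hit.index >= 0 && hit.index < candidates.length)
    .sort((a, b) => b.score - a.score) // filter() returned a copy, so input is untouched
    .slice(0, topK)
    .map((hit) => ({ item: candidates[hit.index] as T, score: hit.score }));
}
```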

Codebase search

/index-codebase                    search_code("RRF fusion logic")
     |                                       |
     v                                       v
 Walk project files            +--- Vector search (LanceDB) -------+
 Respect .gitignore            |    Qodo-Embed-1-1.5B (1536-dim)   |
 Git-diff incremental skip     |                                    |
     |                         |                                    +-> RRF Fusion
     v                         |                                    |    (k=60)
 For each JS/TS/Vue/           +-- BM25 search (LanceDB FTS) ------+
 Svelte/Astro file:            |    camelCase split + stemming      |
   - Extract <script> (SFC)    |                                    |
   - Parse AST (tree-sitter)   +-- File-path boost (optional) -----+
   - Extract functions/classes |                                      v
   - Extract JSDoc/decorators  |                            Cross-encoder rerank
   - Contextual headers        |                            ms-marco-MiniLM-L-6-v2
   - Embed with Qodo-Embed     |                              (via TEI :39282)
   - Store in LanceDB          +--------------------------------------+
                                                  |
                                                  v
                                      Function-level results
                                   (file, lines, scope, score)
                                   + neighbor chunk expansion
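
One possible reading of the neighbor chunk expansion step is merging results from the same file whose line ranges touch. The sketch below illustrates that idea only. The `CodeChunk` shape and the `maxGap` parameter are hypothetical and may differ from what `code-search.ts` does.

```typescript
// Hypothetical result shape, loosely following "(file, lines, scope, score)".
interface CodeChunk {
  file: string;
  startLine: number;
  endLine: number;
  text: string;
}

// Merge chunks from the same file when the next chunk starts within
// `maxGap` lines of the previous chunk's end.
function mergeNeighbors(chunks: CodeChunk[], maxGap = 1): CodeChunk[] {
  const sorted = [...chunks].sort((a, b) =>
    a.file === b.file ? a.startLine - b.startLine : a.file < b.file ? -1 : 1,
  );
  const merged: CodeChunk[] = [];
  for (const chunk of sorted) {
    const last = merged[merged.length - 1];
    if (last && last.file === chunk.file && chunk.startLine - last.endLine <= maxGap) {
      last.endLine = Math.max(last.endLine, chunk.endLine);
      last.text += "\n" + chunk.text; // simple concatenation; overlaps are not deduplicated
    } else {
      merged.push({ ...chunk }); // copy so the caller's objects are not mutated
    }
  }
  return merged;
}
```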

Usage

1. Index your project's docs

/fetch-docs

Claude analyzes your project (including monorepo workspaces), finds all runtime dependencies, searches the web for the best documentation for each one (preferring llms-full.txt > llms.txt > official docs), and indexes everything locally.
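
The source preference order (llms-full.txt, then llms.txt, then official docs) amounts to a simple priority pick. The sketch below is illustrative only: the `DocCandidate` type and `pickBestSource` function are hypothetical names, not the plugin's API.

```typescript
type DocSourceKind = "llms-full.txt" | "llms.txt" | "official-docs";

// Hypothetical candidate record for a discovered documentation source.
interface DocCandidate {
  kind: DocSourceKind;
  url: string;
}

// Highest-priority kind first, as described above.
const SOURCE_PREFERENCE: DocSourceKind[] = ["llms-full.txt", "llms.txt", "official-docs"];

function pickBestSource(candidates: DocCandidate[]): DocCandidate | undefined {
  for (const kind of SOURCE_PREFERENCE) {
    const match = candidates.find((candidate) => candidate.kind === kind);
    if (match) return match;
  }
  return undefined; // nothing usable was found
}
```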

2. Index your source code

/index-codebase

Parses all JS/TS/Vue/Svelte/Astro files with tree-sitter, extracts JSDoc comments and decorators, generates Qodo-Embed-1-1.5B embeddings for function-, class-, and method-level chunks, and stores them in LanceDB. Indexing is incremental: changed files are detected via git-diff (with a fallback to SHA-256 hashing for non-git projects), and only those files are re-indexed.
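
For the SHA-256 fallback, change detection comes down to comparing the stored hash of each file with its current hash. The sketch below shows that comparison under stated assumptions. The `HashIndex` and `IndexDiff` types are hypothetical, and the digests are taken as given (in Node they could be computed with the built-in `crypto` module).

```typescript
// file path -> SHA-256 hex digest (hypothetical shape)
type HashIndex = Record<string, string>;

interface IndexDiff {
  changed: string[]; // new or modified files that need re-indexing
  removed: string[]; // files whose chunks should be dropped from the index
}

function diffHashes(previous: HashIndex, current: HashIndex): IndexDiff {
  const changed = Object.keys(current)
    .filter((file) => previous[file] !== current[file])
    .sort();
  const removed = Object.keys(previous)
    .filter((file) => !(file in current))
    .sort();
  return { changed, removed };
}
```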

3. Search

Ask Claude anything. It will automatically use the right search tool:

# Library documentation (search_docs)
How do I set up middleware in Express?
What are the options for useQuery in TanStack Query?
Show me the API for zod's .refine()

# Your codebase (search_code)
Where is the authentication middleware?
Find the database connection setup
How does the search pipeline work?

4. Other tools

- `list_docs`: see what's indexed, when it was fetched, and chunk counts
- `get_doc_section`: retrieve specific sections by heading or chunk ID
- `get_codebase_status`: check index status, language breakdown, and changed files
- `analyze_dependencies`: list all deps (monorepo-aware, catalog-resolved, runtime/dev tagged)
- `fetch_and_store_doc`: fetch a URL and index it directly (no AI truncation)
- `discover_and_fetch_docs`: auto-discover and index docs for any npm package

TEI backend

ML inference runs in TEI (HuggingFace Text Embeddings Inference) containers:

| Container | Port | Model | Purpose |
|---|---|---|---|
| tei-embed | :39281 | nomic-ai/nomic-embed-text-v1.5 | Doc embeddings (384-dim Matryoshka) |
| tei-rerank | :39282 | cross-encoder/ms-marco-MiniLM-L-6-v2 | Cross-encoder reranking (docs + code) |
| tei-code-embed | :39283 | Qodo/Qodo-Embed-1-1.5B | Code embeddings (1536-dim, 68.5 CoIR) |

All TEI communication goes through a shared TeiClient class (src/tei-client.ts) with automatic retry (2 attempts, exponential backoff), 30s timeout, and batch splitting. TEI containers must be running for both indexing and search — there is no fallback mode.
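
The retry, timeout, and batching behavior described above can be illustrated with a small sketch. It is not the actual `TeiClient`: the function names, the 500 ms base delay, and the batch size are assumptions, and "2 attempts" is read here as two attempts in total.

```typescript
// Split inputs into fixed-size batches (the batch size is hypothetical).
function splitIntoBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... (base delay is assumed).
function backoffDelayMs(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** attempt;
}

// Run `call` with a per-attempt timeout, retrying failed attempts.
async function withRetry<T>(
  call: (signal: AbortSignal) => Promise<T>,
  attempts = 2,
  timeoutMs = 30_000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      return await call(controller.signal);
    } catch (err) {
      lastError = err;
      if (attempt < attempts - 1) {
        // Wait before the next attempt.
        await new Promise<void>((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
      }
    } finally {
      clearTimeout(timer);
    }
  }
  throw lastError;
}
```

The `call` argument is expected to pass the `AbortSignal` on to its HTTP request so that the timeout actually cancels the in-flight call.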

Starting TEI

./start-tei.sh           # Auto-detect GPU
./start-tei.sh --metal   # Force Apple Metal (native, no Docker)
./start-tei.sh --cpu     # Force CPU Docker
./start-tei.sh --stop    # Stop all TEI

Auto-detection selects the optimal backend:

| Platform | Backend | Image tag |
|---|---|---|
| NVIDIA RTX 50x0 (Blackwell) | Docker CUDA | 120-1.9 |
| NVIDIA RTX 40x0 (Ada) | Docker CUDA | 89-1.9 |
| NVIDIA RTX 30x0 (Ampere) | Docker CUDA | 86-1.9 |
| NVIDIA RTX 20x0 (Turing) | Docker CUDA | turing-1.9 |
| Apple Silicon | Native Metal | cargo install --features metal |
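
The NVIDIA rows of this table map a GPU's CUDA compute capability to an image tag. The sketch below expresses that mapping in TypeScript for illustration only. The real detection lives in `start-tei.sh`, and how that script obtains the capability is not documented here, so the string input and the `teiImageTag` name are assumptions.

```typescript
// Map a CUDA compute capability string (e.g. "8.9") to a TEI image tag.
// Capabilities: 12.0 = RTX 50x0, 8.9 = RTX 40x0, 8.6 = RTX 30x0, 7.5 = RTX 20x0.
function teiImageTag(computeCapability: string, teiVersion = "1.9"): string | undefined {
  switch (computeCapability.trim()) {
    case "12.0":
      return `120-${teiVersion}`;
    case "8.9":
      return `89-${teiVersion}`;
    case "8.6":
      return `86-${teiVersion}`;
    case "7.5":
      return `turing-${teiVersion}`;
    default:
      return undefined; // not covered by the table above
  }
}
```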

GPU override for NVIDIA:

docker compose -f docker-compose.yml -f docker-compose.nvidia.yml up -d

Search pipeline

Doc search uses a 4-stage RAG pipeline. Code search adds a 5th file-path boost signal:

| Stage | Technology | Purpose |
|---|---|---|
| Vector search | LanceDB + nomic-embed / Qodo-Embed via TEI | Semantic similarity (understands meaning) |
| BM25 search | LanceDB native FTS (BM25, stemming, stop words) | Keyword matching (exact terms like useEffect) |
| RRF fusion | Reciprocal Rank Fusion (k=60) | Merges both ranked lists, handles different score scales |
| Cross-encoder rerank | ms-marco-MiniLM-L-6-v2 via TEI | Rescores top 50 candidates with deep relevance model |
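
Reciprocal Rank Fusion scores each result as the sum of `1 / (k + rank)` over every ranked list it appears in, so it needs only ranks, not comparable scores. Below is a minimal sketch of the standard formula with k=60. It is a generic illustration and not a copy of the project's `rrf.ts`.

```typescript
// Fuse several ranked lists of result IDs (best first) into one ranking.
function rrfFuse(
  rankedLists: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, position) => {
      const rank = position + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}
```

A result that appears in both the vector list and the BM25 list accumulates two terms, which is why it tends to outrank a result that is first in only one list.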

Code search specifics

- AST chunking: tree-sitter parses JS/TS/Vue/Svelte/Astro into function/class/method/interface/namespace entities
- JSDoc + decorators: Extracted from AST and prepended to chunk text for richer search context
- Metadata flags: exported, async, abstract tracked per entity
- Qodo-Embed-1-1.5B: 1.5B parameter model, 68.5 CoIR score, 32K context window, 1536-dim embeddings
- Contextual headers: file path + scope chain + flags + decorators + JSDoc prepended for BM25
- File-path boost: Queries containing file names (e.g., "rrf.ts") get a third RRF signal boosting matching files
- Neighbor expansion: Adjacent chunks from the same file are merged for fuller context
- Incremental indexing: Git-diff based (fast, ~50-100ms), falls back to SHA-256 hashing for non-git projects
- No fallback: TEI must be running; search errors if containers are down
- SFC support: Vue `<script>`/`<script setup>`, Svelte `<script>`/`<script context="module">`, Astro `---` frontmatter + `<script>` tags
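
To illustrate the file-path boost, the sketch below detects file names in a query and builds the list of matching paths that could serve as the extra ranked list for fusion. The regular expression, the extension set, and both function names are assumptions for illustration; the plugin's actual matching rules may differ.

```typescript
// Hypothetical pattern: a path-like token ending in a supported extension.
const FILE_NAME_PATTERN = /[\w./-]+\.(?:tsx?|jsx?|mjs|cjs|vue|svelte|astro)\b/gi;

// Pull file-name hints such as "rrf.ts" out of a natural-language query.
function extractFileHints(query: string): string[] {
  return (query.match(FILE_NAME_PATTERN) ?? []).map((hit) => hit.toLowerCase());
}

// Return candidate paths that end with one of the hinted file names.
function filePathMatches(hints: string[], candidatePaths: string[]): string[] {
  if (hints.length === 0) return []; // no file name in the query: no extra signal
  return candidatePaths.filter((path) => {
    const lower = path.toLowerCase();
    return hints.some((hint) => lower === hint || lower.endsWith("/" + hint));
  });
}
```

A query such as "where is rrf.ts used" yields the hint `rrf.ts`, while a query without a file name yields no hints and leaves the ranking to the vector and BM25 signals.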

Storage

All data stays in your project directory:

your-project/.claude/docs/
├── lancedb/                  # Vector database (docs + code tables)
├── .metadata.json            # Doc fetch timestamps, source URLs per library
├── .code-metadata.json       # File hashes, language, chunk counts, last index
└── raw/
    ├── react.md              # Raw fetched documentation
    ├── next.md
    └── tanstack__query.md
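
The `tanstack__query.md` entry suggests that scoped package names are flattened into file names. The sketch below shows one way that mapping could work. The rule (drop the leading `@`, replace `/` with `__`) is inferred from that single example and is an assumption, as is the `rawDocFileName` name.

```typescript
// Assumed naming rule, inferred from "tanstack__query.md" in the tree above:
// strip a leading "@" and replace "/" with "__", then add ".md".
function rawDocFileName(packageName: string): string {
  return packageName.replace(/^@/, "").replace(/\//g, "__") + ".md";
}
```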

MCP Tools

| Tool | Description |
|---|---|
| analyze_dependencies | Detect and list all npm dependencies (monorepo-aware, runtime/dev tagged) |
| store_and_index_doc | Index documentation content you already have as a string |
| fetch_and_store_doc | Fetch documentation from a URL and index it (raw HTTP, no truncation) |
| discover_and_fetch_docs | Auto-discover and index docs for an npm package |
| search_docs | Semantic search across indexed library documentation |
| list_docs | List indexed libraries with version and fetch date |
| get_doc_section | Retrieve specific doc sections by heading or chunk ID |
| index_codebase | Index project source code for semantic search (incremental, .gitignore-aware) |
| search_code | Semantic search across project source code (function/class-level) |
| get_codebase_status | Check codebase index status, language breakdown, changed files |

Dependencies

| Package | License | Purpose |
|---|---|---|
| @lancedb/lancedb | Apache 2.0 | Embedded vector database + native FTS |
| @modelcontextprotocol/sdk | MIT | MCP server framework |
| web-tree-sitter | MIT | WASM-based AST parsing for code chunking |
| tree-sitter-wasms | MIT | Pre-built WASM grammars (JS/TS/Vue/Svelte) |
| ignore | MIT | .gitignore pattern matching |
| zod | MIT | Schema validation |

TEI containers (Docker):

| Image | Model | Purpose |
|---|---|---|
| text-embeddings-inference:* | nomic-ai/nomic-embed-text-v1.5 | Doc embeddings |
| text-embeddings-inference:* | cross-encoder/ms-marco-MiniLM-L-6-v2 | Cross-encoder reranking |
| text-embeddings-inference:* | Qodo/Qodo-Embed-1-1.5B | Code embeddings (1536-dim) |

Development

npm run dev         # Watch mode — rebuilds on file changes
npm run build       # One-time build
npm run test:unit   # Unit tests (no TEI needed)
npm run test:docs   # Doc search integration tests (requires TEI on :39281, :39282)
npm run test:code   # Code search integration tests (requires TEI on :39281, :39282, :39283)

Project structure

claude-local-docs/
├── .claude-plugin/
│   ├── plugin.json           # Plugin manifest
│   └── marketplace.json      # Marketplace listing
├── .mcp.json                 # MCP server config (stdio transport)
├── commands/
│   ├── fetch-docs.md         # /fetch-docs — Claude as research agent
│   └── index-codebase.md     # /index-codebase — index source code
├── hooks/
│   └── hooks.json            # SessionStart hook for TEI containers
├── scripts/
│   └── ensure-tei.sh         # Idempotent TEI health check + start
├── docker-compose.yml        # TEI containers (uses ${TEI_TAG})
├── docker-compose.nvidia.yml # NVIDIA GPU device passthrough
├── start-tei.sh              # Auto-detect GPU, start TEI
├── src/
│   ├── index.ts              # MCP server entry, 10 tool definitions
│   ├── tei-client.ts         # Shared TEI HTTP client (retry, timeout, batching)
│   ├── indexer.ts            # Doc chunking + nomic-embed-text embeddings
│   ├── search.ts             # Doc search pipeline (vector + BM25 + RRF + rerank)
│   ├── rrf.ts                # Shared Reciprocal Rank Fusion utility
│   ├── reranker.ts           # TEI cross-encoder reranking
│   ├── store.ts              # LanceDB "docs" table + metadata
│   ├── code-indexer.ts       # AST chunking (tree-sitter) + Qodo-Embed embeddings
│   ├── code-search.ts        # Code search pipeline (5-stage + neighbor expansion)
│   ├── code-store.ts         # LanceDB "code" table + file hash tracking + schema migration
│   ├── file-walker.ts        # Project file discovery + .gitignore + git-diff
│   ├── sfc-extractor.ts      # Vue/Svelte/Astro <script> block extraction
│   ├── fetcher.ts            # Raw HTTP fetch (no AI truncation)
│   ├── workspace.ts          # Monorepo detection + pnpm catalog
│   ├── discovery.ts          # npm registry + URL probing for docs
│   ├── types.ts              # Shared TypeScript interfaces
│   ├── unit.test.ts          # Unit tests (no TEI needed)
│   ├── docs.test.ts          # Doc search integration tests
│   └── code.test.ts          # Code search integration tests
├── LICENSE
├── package.json
└── tsconfig.json

Troubleshooting

TEI containers not starting

# Check Docker is running
docker info

# Check container logs
docker compose logs tei-embed
docker compose logs tei-rerank
docker compose logs tei-code-embed

# Restart
./start-tei.sh --stop && ./start-tei.sh

Port conflicts

If 39281/39282/39283 are in use, override via env vars:

TEI_EMBED_URL=http://localhost:49281 TEI_RERANK_URL=http://localhost:49282 TEI_CODE_EMBED_URL=http://localhost:49283 node dist/index.js

Apple Silicon — slow performance

The default Docker CPU image runs via Rosetta 2. Use native Metal instead:

./start-tei.sh --metal

Requires Rust (curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh). First build takes a few minutes.

License

MIT
