@abhinav2203/coderag
v1.0.3
Standalone code retrieval and MCP server for multi-language repositories built on @abhinav2203/codeflow-core.
CodeRag
CodeRag is a standalone npm package that gives coding agents targeted retrieval over a codebase. It uses @abhinav2203/codeflow-core with tree-sitter for multi-language repo analysis, stores node documents in LanceDB, traverses graph edges for surrounding context, and can optionally ask a local LLM server to turn the retrieved context into an answer.
Supported languages: TypeScript, JavaScript, Go, Python, C, C++, Rust.
What ships in this repo
- Library API for indexing and querying a repo
- CLI for local setup, indexing, querying, and git hook installation
- MCP server exposing query, lookup, explain, impact, and status
- Easy-to-swap interfaces for graph providers and LLM transports
Install
npm install @abhinav2203/coderag
Quick start
- Create a config file in the target repo:
{
  "repoPath": ".",
  "storageRoot": ".coderag",
  "retrieval": {
    "topK": 6,
    "rerankK": 3
  },
  "traversal": {
    "defaultDepth": 1,
    "maxDepth": 3
  },
  "llm": {
    "enabled": false,
    "transport": "openai-compatible",
    "baseUrl": "http://127.0.0.1:1234/v1",
    "model": "your-local-model"
  }
}
- Initialize the index:
npx coderag init
- Query the repo:
npx coderag query "where is auth handled?"
- Run the MCP server:
npx coderag serve-mcp
Configuration
CodeRag loads configuration in this order:
- Explicit --config path
- coderag.config.json
- .coderag.json
- .env values from the current working directory
- Environment overrides
Supported environment overrides:
- CODERAG_REPO_PATH
- CODERAG_STORAGE_ROOT
- CODERAG_EMBEDDING_PROVIDER
- CODERAG_EMBEDDING_DIMENSIONS
- CODERAG_ONNX_MODEL_DIR
- CODERAG_GEMINI_MODEL
- CODERAG_GEMINI_API_KEY
- CODERAG_GEMINI_AI_KEY
- CODERAG_EMBEDDING_TIMEOUT_MS
- CODERAG_TOP_K
- CODERAG_RERANK_K
- CODERAG_MAX_CONTEXT_CHARS
- CODERAG_DEFAULT_DEPTH
- CODERAG_MAX_DEPTH
- CODERAG_LOCK_TIMEOUT_MS
- CODERAG_LOCK_POLL_MS
- CODERAG_LOCK_STALE_MS
- CODERAG_SERVICE_HOST
- CODERAG_SERVICE_PORT
- CODERAG_SERVICE_API_KEY
- CODERAG_LLM_ENABLED
- CODERAG_LLM_TRANSPORT
- CODERAG_LLM_BASE_URL
- CODERAG_LLM_MODEL
- CODERAG_LLM_API_KEY
- CODERAG_LLM_TIMEOUT_MS
- CODERAG_CUSTOM_HTTP_FORMAT
- CODERAG_LLM_HEADERS
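The layering above can be sketched as a pure function that applies environment values on top of a config loaded from a file. This is an illustration, not CodeRag's loader: the SketchConfig type, the helper names, the mapping from variable names to config keys (for example CODERAG_TOP_K to retrieval.topK), and the accepted boolean spellings are all assumptions inferred from the names in this README. It also assumes environment values win over file values.

```typescript
// Hypothetical shape covering only fields shown in the Quick start config.
interface SketchConfig {
  retrieval: { topK: number; rerankK: number };
  traversal: { defaultDepth: number; maxDepth: number };
  llm: { enabled: boolean; baseUrl: string; model: string };
}

type Env = Record<string, string | undefined>;

// Parse a non-negative integer override; keep the fallback when unset or invalid.
function intOverride(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
}

// Parse a boolean override; the accepted spellings are an assumption.
function boolOverride(raw: string | undefined, fallback: boolean): boolean {
  if (raw === undefined) return fallback;
  const value = raw.trim().toLowerCase();
  if (value === "true" || value === "1") return true;
  if (value === "false" || value === "0") return false;
  return fallback;
}

// Assumed precedence: environment values override values from the config file.
function applyEnvOverrides(fileConfig: SketchConfig, env: Env): SketchConfig {
  return {
    retrieval: {
      topK: intOverride(env["CODERAG_TOP_K"], fileConfig.retrieval.topK),
      rerankK: intOverride(env["CODERAG_RERANK_K"], fileConfig.retrieval.rerankK),
    },
    traversal: {
      defaultDepth: intOverride(env["CODERAG_DEFAULT_DEPTH"], fileConfig.traversal.defaultDepth),
      maxDepth: intOverride(env["CODERAG_MAX_DEPTH"], fileConfig.traversal.maxDepth),
    },
    llm: {
      enabled: boolOverride(env["CODERAG_LLM_ENABLED"], fileConfig.llm.enabled),
      baseUrl: env["CODERAG_LLM_BASE_URL"] ?? fileConfig.llm.baseUrl,
      model: env["CODERAG_LLM_MODEL"] ?? fileConfig.llm.model,
    },
  };
}
```

Passing the environment in as a plain record keeps the function easy to test; in a real process you would pass process.env.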
When embedding.provider is gemini, CodeRag defaults to models/gemini-embedding-001 and requests 768-dimensional vectors explicitly so the stored embedding fingerprint matches the vectors written to LanceDB. It accepts either CODERAG_GEMINI_API_KEY or the compatibility alias CODERAG_GEMINI_AI_KEY.
When embedding.provider is onnx, CodeRag uses Xenova/gte-small (384-dim, ~33MB) running locally via @xenova/transformers. No API key or external server needed. The model must be downloaded to <onnxModelDir>/Xenova/gte-small/ (default .coderag-models/models/Xenova/gte-small/).
# Download the ONNX embedding model (~33MB)
# local_dir must end in Xenova/gte-small so the files land where CodeRag expects them
python3 -c "
from huggingface_hub import snapshot_download
snapshot_download('Xenova/gte-small',
    local_dir='.coderag-models/models/Xenova/gte-small',
    allow_patterns=['onnx/model_quantized.onnx', 'config.json',
                    'tokenizer.json', 'tokenizer_config.json',
                    'special_tokens_map.json'])
"
# Then set embedding.provider to "onnx" in your config and run coderag init
Local LLM integration
CodeRag does not require a hosted model. The documented default is any local or self-hosted model server that exposes an OpenAI-compatible HTTP API.
OpenAI-compatible server
Point CodeRag at a server that exposes /v1/chat/completions and streams tokens over SSE.
{
  "llm": {
    "enabled": true,
    "transport": "openai-compatible",
    "baseUrl": "http://127.0.0.1:1234/v1",
    "model": "qwen2.5-coder-14b-instruct"
  }
}
CodeRag sends:
- the user question
- the assembled CodeRag context package
- a system prompt that tells the model to answer only from retrieved code context
Compatibility notes:
- baseUrl may already include /v1; CodeRag preserves that path when calling /chat/completions.
- If a provider rejects system role messages, CodeRag retries by folding the system prompt into the first user message.
- Prompt assembly is compact and file-aware so small-context local models can still answer from retrieved code without receiving duplicated file bodies.
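The first two compatibility notes can be illustrated with a pair of small helpers. This is a minimal sketch under assumptions, not CodeRag's internal code: joinEndpoint, foldSystemIntoUser, and the ChatMessage type are hypothetical names, and the blank-line separator used when folding is a guess.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Join a base URL and an endpoint path without dropping a path prefix such as /v1.
function joinEndpoint(baseUrl: string, endpoint: string): string {
  const base = baseUrl.replace(/\/+$/, "");
  const path = endpoint.replace(/^\/+/, "");
  return `${base}/${path}`;
}

// Fold all system messages into the first user message, for providers
// that reject the system role. The separator is an assumption.
function foldSystemIntoUser(messages: ChatMessage[]): ChatMessage[] {
  const systemText = messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n\n");
  const rest = messages.filter((m) => m.role !== "system");
  if (systemText === "") return rest;

  const firstUser = rest.findIndex((m) => m.role === "user");
  if (firstUser === -1) {
    // No user message to fold into: send the system text as a user message.
    return [{ role: "user", content: systemText }, ...rest];
  }
  return rest.map((m, i) =>
    i === firstUser ? { ...m, content: `${systemText}\n\n${m.content}` } : m,
  );
}
```

Plain string joining is used on purpose: resolving "/chat/completions" with new URL() against "http://127.0.0.1:1234/v1" would discard the /v1 segment, because a path starting with a slash replaces the whole base path.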
Custom HTTP server
If your local model server is not OpenAI-compatible, use transport: "custom-http".
Request body:
{
  "question": "where is auth handled?",
  "model": "local-model",
  "stream": true,
  "context": {
    "graphSummary": "..."
  },
  "messages": [
    { "role": "system", "content": "..." },
    { "role": "user", "content": "..." }
  ]
}
Supported response formats:
- json: { "answer": "..." }
- ndjson: one JSON object per line with token chunks and an optional final answer
- sse: data: frames with token chunks and an optional final answer
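To make the three response formats concrete, here is a sketch of how a client could turn a fully buffered response body into an answer string. Several details are assumptions rather than CodeRag's documented behavior: each chunk is taken to be an object with a string token or answer field, a final answer is preferred over the concatenated tokens, and an OpenAI-style [DONE] sentinel is skipped if present. The function and type names are hypothetical.

```typescript
type CustomHttpFormat = "json" | "ndjson" | "sse";

// Assumed chunk shape: { "token": "..." } and/or { "answer": "..." }.
interface Chunk {
  token?: unknown;
  answer?: unknown;
}

// Concatenate token chunks; a final answer, when present, takes precedence.
function collectAnswer(chunks: Chunk[]): string {
  let tokens = "";
  let finalAnswer: string | undefined;
  for (const chunk of chunks) {
    if (typeof chunk.token === "string") tokens += chunk.token;
    if (typeof chunk.answer === "string") finalAnswer = chunk.answer;
  }
  return finalAnswer ?? tokens;
}

// Parse a complete response body. A streaming client would feed lines
// incrementally instead of buffering the whole body first.
function parseCustomHttpBody(body: string, format: CustomHttpFormat): string {
  if (format === "json") {
    return collectAnswer([JSON.parse(body) as Chunk]);
  }
  const lines = body
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "");
  if (format === "ndjson") {
    return collectAnswer(lines.map((line) => JSON.parse(line) as Chunk));
  }
  // sse: keep only data: frames and strip the field name.
  const payloads = lines
    .filter((line) => line.startsWith("data:"))
    .map((line) => line.slice("data:".length).trim())
    .filter((payload) => payload !== "" && payload !== "[DONE]");
  return collectAnswer(payloads.map((payload) => JSON.parse(payload) as Chunk));
}
```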
Example:
{
  "llm": {
    "enabled": true,
    "transport": "custom-http",
    "baseUrl": "http://127.0.0.1:8080",
    "model": "local-model",
    "customHttpFormat": "ndjson"
  }
}
Retrieval behavior
- Indexing stores one generated markdown document per blueprint node.
- Search uses deterministic local embeddings, source-span-aware lexical reranking, query expansion for operational terms, and a penalty for oversized catch-all nodes.
- Page index retrieval reads full files from disk and caches them by mtimeMs.
- Graph traversal expands both upstream and downstream neighbors up to the requested depth.
- If no LLM is configured, query returns answerMode: "context-only" with the same context package.
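The traversal rule above (expand upstream and downstream neighbors up to a depth) amounts to a breadth-first search that follows each edge in both directions. The sketch below illustrates that idea only. The Edge type and expandNeighbors function are hypothetical, and CodeRag's actual graph types come from @abhinav2203/codeflow-core and may differ.

```typescript
// Hypothetical directed edge, e.g. "from calls to".
interface Edge {
  from: string;
  to: string;
}

// Breadth-first expansion that treats every edge as walkable in both
// directions, so callers (upstream) and callees (downstream) are both reached.
// Returns each reached node with its hop distance from the seed.
function expandNeighbors(seed: string, edges: Edge[], depth: number): Map<string, number> {
  const neighbors = new Map<string, string[]>();
  const link = (a: string, b: string): void => {
    const list = neighbors.get(a);
    if (list) list.push(b);
    else neighbors.set(a, [b]);
  };
  for (const edge of edges) {
    link(edge.from, edge.to); // downstream
    link(edge.to, edge.from); // upstream
  }

  const distance = new Map<string, number>([[seed, 0]]);
  let frontier: string[] = [seed];
  for (let hop = 1; hop <= depth && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbor of neighbors.get(node) ?? []) {
        if (!distance.has(neighbor)) {
          distance.set(neighbor, hop);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return distance;
}
```

With edges a→b, b→c, and d→b, a seed of b at depth 1 reaches a and d (upstream) as well as c (downstream), which matches the way --depth widens the context around a hit.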
CLI
coderag init [--config path]
coderag index [--config path]
coderag reindex [--config path] [--full]
coderag query "question" [--config path] [--depth 2] [--json]
coderag serve-mcp [--config path]
coderag serve-http [--config path]
coderag doctor [--config path]
Git hook
coderag init installs a post-commit hook that triggers coderag reindex and preserves any pre-existing hook logic.
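One common way to add a command to a hook while keeping what is already there is to append a marker-delimited block and skip the append when the marker is found. The sketch below shows that pattern only. The marker strings and the mergePostCommitHook function are hypothetical, and this README does not say how coderag init actually merges hooks.

```typescript
// Hypothetical markers that delimit the managed block inside the hook file.
const HOOK_BEGIN = "# >>> coderag reindex >>>";
const HOOK_END = "# <<< coderag reindex <<<";

// Return the new contents for .git/hooks/post-commit, given the existing
// contents (or undefined when no hook exists yet). Calling it twice yields
// the same result as calling it once.
function mergePostCommitHook(existing: string | undefined): string {
  const block = [HOOK_BEGIN, "coderag reindex", HOOK_END].join("\n");
  if (existing === undefined || existing.trim() === "") {
    return `#!/bin/sh\n${block}\n`;
  }
  if (existing.includes(HOOK_BEGIN)) {
    return existing; // managed block already installed
  }
  const body = existing.endsWith("\n") ? existing : `${existing}\n`;
  return `${body}${block}\n`;
}
```

A limitation of simple appending: if the existing hook ends with an explicit exit, the appended block never runs, so a real installer has to handle that case separately.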
Production notes
- TypeScript, JavaScript, Go, Python, C, C++, and Rust repos are supported.
- Excluded directories: node_modules, .git, .next, dist, build, target, __pycache__, vendor, .venv, artifacts, coverage.
- Call-site extraction is best effort for dynamic dispatch, reflection, or generated code. Missing call sites are returned as unresolved metadata, not guessed values.
- The built-in local-hash embedding strategy is deterministic and zero-setup. The onnx provider runs Xenova/gte-small locally (384-dim, ~33MB) for semantic-quality embeddings without any API key. If you need cloud-quality embeddings, use the gemini provider.
- serve-http exposes /health, /ready, /metrics, and /v1/* endpoints. /ready only reports ready once the index exists, contains documents, and matches the configured embedding fingerprint.
- If you use Gemini embeddings, set CODERAG_GEMINI_API_KEY or CODERAG_GEMINI_AI_KEY before indexing. Changing CODERAG_GEMINI_MODEL requires a full reindex because the persisted embedding fingerprint includes the model name and dimensions.
- Live E2E runs in this repo were verified against an OpenAI-compatible NVIDIA endpoint and against both the CodeRag and CodeFlow repositories.
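The readiness and reindex rules in these notes can be expressed as two small predicates. This is an illustrative sketch: the EmbeddingFingerprint and IndexState shapes and the function names are hypothetical. The README only states that the fingerprint includes the model name and dimensions, so the provider field here is an added assumption.

```typescript
// Assumed fingerprint shape; the source confirms only model name and dimensions.
interface EmbeddingFingerprint {
  provider: string;
  model: string;
  dimensions: number;
}

// Hypothetical summary of what is persisted on disk.
interface IndexState {
  exists: boolean;
  documentCount: number;
  fingerprint: EmbeddingFingerprint | null;
}

function sameFingerprint(a: EmbeddingFingerprint, b: EmbeddingFingerprint): boolean {
  return a.provider === b.provider && a.model === b.model && a.dimensions === b.dimensions;
}

// Mirrors the /ready rule: index exists, has documents, fingerprint matches.
function isReady(state: IndexState, configured: EmbeddingFingerprint): boolean {
  return (
    state.exists &&
    state.documentCount > 0 &&
    state.fingerprint !== null &&
    sameFingerprint(state.fingerprint, configured)
  );
}

// An existing index written with different embedding settings cannot be
// reused, because old and new vectors are not comparable.
function needsFullReindex(state: IndexState, configured: EmbeddingFingerprint): boolean {
  if (!state.exists || state.fingerprint === null) return false;
  return !sameFingerprint(state.fingerprint, configured);
}
```

For example, an index built with models/gemini-embedding-001 at 768 dimensions would not be ready after switching the config to Xenova/gte-small at 384 dimensions, and would need a full reindex.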
Development
npm install
npm run lint
npm run check
npm test
npm run build