@wb200/mgrep
v0.1.18
Local semantic code search with LanceDB indexing and DeepInfra-powered retrieval
Why mgrep?
- Ask your repo questions in natural language instead of guessing exact symbols.
- Keep a local LanceDB index on disk under ~/.mgrep/lancedb/.
- Combine vector retrieval, full-text search, reranking, and optional answer synthesis.
- Work directly in the CLI or wire it into coding agents.
- Use it as a hybrid semantic complement to rg, grep, and ast-grep, not as a replacement for exact or structural search.
mgrep is for local repository search. It does not do web search in this fork.
# index a project
mgrep watch
# search semantically
mgrep "where do we set up auth?"
# synthesize an answer from retrieved local results
mgrep -a "how does the sync pipeline work?"
Quick Start
Install
npm install -g @wb200/mgrep
Set required API key
export DEEPINFRA_API_KEY=your_deepinfra_key
- DEEPINFRA_API_KEY is used for embeddings, reranking, synthesized answers, and agentic query planning.
- It is required for normal use in this fork.
Validate configuration
mgrep validate
Know where config lives
- Project-local: .mgreprc.yaml or .mgreprc.yml in the directory you are indexing/searching from
- Global: ~/.config/mgrep/config.yaml or ~/.config/mgrep/config.yml
Index a project
cd path/to/repo
mgrep watch
Inspect the effective indexing rules
mgrep rules
Search
mgrep "where do we set up auth?"
mgrep -m 25 "store schema"
mgrep -a "how is rate limiting implemented?"
What It Does
mgrep keeps a local searchable index of your repository.
- Indexing is allowlist-first. A file must match an allowed extension, exact basename, or exact hidden basename before it is eligible for indexing.
- Configured blockedPaths can exclude path prefixes regardless of filename.
- After allowlist admission, .gitignore, .mgrepignore, built-in deny patterns, hidden-directory blocking, and text-file detection can still exclude a file.
- Indexed content is chunked and stored locally in LanceDB.
- Embeddings, reranking, answer synthesis, and agentic planning are done through DeepInfra.
This means the index itself is local, but text chunks are sent to DeepInfra during embedding, reranking, and answer-generation flows.
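The admission order above (allowlist first, exclusions afterwards) can be illustrated with a small TypeScript sketch. It is not mgrep's implementation: the AdmissionRules shape and helper names are hypothetical, the suffix-based deny check is a simplification of glob patterns such as *.lock, and .gitignore, .mgrepignore, ignorePatterns, and text-file detection are left out.

```typescript
// Hypothetical rule shape for illustration; not mgrep's config schema.
interface AdmissionRules {
  allowedExtensions: Set<string>;      // e.g. ".ts", ".md"
  allowedBasenames: Set<string>;       // exact names, e.g. "Dockerfile"
  allowedHiddenBasenames: Set<string>; // exact dotfile names, e.g. ".env.example"
  blockedPaths: string[];              // absolute path prefixes
  denySuffixes: string[];              // simplified stand-in for globs like "*.lock"
}

function extensionOf(name: string): string {
  const dot = name.lastIndexOf(".");
  // A leading dot (".bashrc") marks a hidden file, not an extension.
  return dot > 0 ? name.slice(dot) : "";
}

// root and absPath are normalized POSIX paths without trailing slashes.
function isEligible(root: string, absPath: string, rules: AdmissionRules): boolean {
  const name = absPath.slice(absPath.lastIndexOf("/") + 1);

  // Step 1: allowlist admission.
  const admitted =
    rules.allowedExtensions.has(extensionOf(name)) ||
    rules.allowedBasenames.has(name) ||
    rules.allowedHiddenBasenames.has(name);
  if (!admitted) return false;

  // Step 2: exclusions that still apply after admission.
  if (rules.blockedPaths.some((p) => absPath === p || absPath.startsWith(p + "/"))) {
    return false;
  }
  const rel = absPath.startsWith(root + "/") ? absPath.slice(root.length + 1) : absPath;
  const parentDirs = rel.split("/").slice(0, -1);
  if (parentDirs.some((dir) => dir.startsWith("."))) return false; // hidden directory
  if (rules.denySuffixes.some((suffix) => name.endsWith(suffix))) return false;

  return true; // .gitignore, .mgrepignore, ignorePatterns, text detection omitted
}

// Example rules, made up for illustration.
const exampleRules: AdmissionRules = {
  allowedExtensions: new Set([".ts", ".md"]),
  allowedBasenames: new Set(["Dockerfile", "Cargo.lock"]),
  allowedHiddenBasenames: new Set([".env.example"]),
  blockedPaths: ["/repo/private"],
  denySuffixes: [".lock", ".bin"],
};

console.log(isEligible("/repo", "/repo/src/index.ts", exampleRules)); // true
console.log(isEligible("/repo", "/repo/Cargo.lock", exampleRules));   // false: denied after admission
```

The Cargo.lock case shows why the order matters: a file can pass the allowlist by exact basename and still be rejected by a later deny rule.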
Search Strategy
mgrep works best as the semantic layer in a local-search toolkit:
- Use mgrep for intent-based discovery, architecture questions, and unfamiliar codebases.
- Use rg or grep for exact strings, regexes, and exhaustive audits.
- Use ast-grep for syntax-aware structural matches and refactor prep.
A common workflow is to use mgrep first to find candidate files or concepts, then confirm exact implementation details with rg or ast-grep.
Commands
Top-level commands:
- mgrep or mgrep search <pattern> [path]
- mgrep rules [path]
- mgrep watch
- mgrep validate
- mgrep install-claude-code
- mgrep uninstall-claude-code
- mgrep install-codex
- mgrep uninstall-codex
- mgrep install-opencode
- mgrep uninstall-opencode
- mgrep install-droid
- mgrep uninstall-droid
- mgrep mcp
Global options:
--store <string>: logical store name to use, default mgrep
Understanding Stores
The --store flag controls which logical index mgrep reads from and writes to.
This is one of the most important things to understand when using mgrep across multiple folders.
The default store name
If you do not pass --store and MGREP_STORE is not set, mgrep uses the store named:
mgrep
That means these are equivalent:
mgrep watch
mgrep --store mgrep watch
and:
mgrep "query"
mgrep --store mgrep "query"
You can also change the default for a shell session with:
export MGREP_STORE=my-store
After that, plain mgrep ... commands in that shell use my-store unless you override them with --store.
One command searches exactly one store
mgrep does not search all stores automatically.
Each command uses exactly one logical store, chosen in this order:
- --store some-name, if passed
- otherwise MGREP_STORE, if set
- otherwise the built-in default mgrep
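The single-store resolution order can be sketched in a few lines. This is a minimal illustration under two assumptions: the environment is passed in as a plain object, and an empty string counts as unset. resolveStore is a hypothetical name, not part of mgrep.

```typescript
const DEFAULT_STORE = "mgrep";

// Hypothetical helper: flag first, then MGREP_STORE, then the built-in default.
// Empty strings are treated as "not set" (an assumption).
function resolveStore(
  flagValue: string | undefined,
  env: Record<string, string | undefined>,
): string {
  if (flagValue) return flagValue;
  const fromEnv = env["MGREP_STORE"];
  if (fromEnv) return fromEnv;
  return DEFAULT_STORE;
}

console.log(resolveStore(undefined, {}));                                 // mgrep
console.log(resolveStore(undefined, { MGREP_STORE: "factory-specs" }));   // factory-specs
console.log(resolveStore("project-a", { MGREP_STORE: "factory-specs" })); // project-a
```

Because exactly one value comes back, nothing in this model ever searches two stores at once, which matches the behavior described above.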
So if you indexed a folder with:
mgrep --store factory-specs watch
and later run:
mgrep "query"
you are not searching factory-specs. You are searching the default store mgrep (assuming MGREP_STORE is not set).
To search the store you indexed, you must use:
mgrep --store factory-specs "query"
or:
export MGREP_STORE=factory-specs
mgrep "query"
What watch indexes
mgrep watch indexes:
- the current working directory
- all subdirectories under it
- only files that match the current allowlist and then survive .gitignore, .mgrepignore, built-in deny patterns, hidden-directory blocking, and text-file detection
So this:
cd /path/to/project
mgrep --store my-project watch
indexes /path/to/project and all of its eligible subfolders into store my-project.
Stores are additive across multiple folders
If you run watch in two different folders with the same store name, the index is additive.
Example:
cd /path/project-a
mgrep --store shared watch
cd /path/project-b
mgrep --store shared watch
After that, store shared contains indexed content for both:
- /path/project-a/...
- /path/project-b/...
The second watch does not wipe the first one.
Deletions are scoped to the watched folder
When watch or search --sync removes stale entries, it only deletes index entries for files inside the current folder subtree being synced. Files on disk are never touched.
That means if project-a and project-b both live in store shared:
- syncing project-a can remove stale entries from project-a
- syncing project-a does not remove project-b
This is what makes additive multi-root stores possible.
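The scoping rule can be sketched as a filter: only entries under the synced root that no longer exist on disk are candidates for removal. This is an illustrative TypeScript sketch with hypothetical names (staleEntries, isUnder) and POSIX-style absolute paths, not code from mgrep.

```typescript
// True when `file` is `root` itself or lies inside it. The "/" suffix check
// keeps "/path/project-ab" from matching root "/path/project-a".
function isUnder(file: string, root: string): boolean {
  return file === root || file.startsWith(root + "/");
}

// Hypothetical helper: entries a sync of `root` may remove from a shared store.
// Anything outside `root` is never considered, even if it is missing locally.
function staleEntries(
  indexedPaths: readonly string[],
  filesOnDisk: ReadonlySet<string>,
  root: string,
): string[] {
  return indexedPaths.filter((p) => isUnder(p, root) && !filesOnDisk.has(p));
}

const indexed = [
  "/path/project-a/old.ts",
  "/path/project-a/keep.ts",
  "/path/project-b/main.ts",
  "/path/project-ab/other.ts",
];
const onDisk = new Set(["/path/project-a/keep.ts"]);

console.log(staleEntries(indexed, onDisk, "/path/project-a")); // [ '/path/project-a/old.ts' ]
```

The project-b entry survives even though the project-a sync never sees it on disk, which is what keeps multi-root stores additive.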
Search is store-scoped and path-scoped
Search always works in two layers:
- it selects a single store
- it filters results to the current directory or the path argument you pass
So if you are inside project-a and search against a shared store, mgrep still scopes results to your current path by default.
Examples:
cd /path/project-a
mgrep --store shared "auth middleware"
searches store shared, but only for content under /path/project-a/...
And:
mgrep --store shared "auth middleware" /path/project-b
searches the same shared store, but only under /path/project-b/...
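The two layers can be modeled as two successive filters. In this hypothetical sketch, Hit and scopeResults are illustrative names, and store selection is shown as a filter over one list only for brevity; in practice choosing a store means querying a different index.

```typescript
interface Hit {
  store: string; // logical store the chunk came from
  path: string;  // absolute path of the indexed file
  score: number;
}

// Hypothetical sketch of store-scoping followed by path-scoping.
function scopeResults(hits: readonly Hit[], store: string, scopeRoot: string): Hit[] {
  return hits
    .filter((h) => h.store === store) // layer 1: one store
    .filter((h) => h.path === scopeRoot || h.path.startsWith(scopeRoot + "/")); // layer 2: one subtree
}

const hits: Hit[] = [
  { store: "shared", path: "/path/project-a/auth.ts", score: 0.91 },
  { store: "shared", path: "/path/project-b/auth.ts", score: 0.88 },
  { store: "mgrep", path: "/path/project-a/auth.ts", score: 0.75 },
];

// Roughly models a search of store "shared" scoped to /path/project-a.
console.log(scopeResults(hits, "shared", "/path/project-a").map((h) => h.path)); // [ '/path/project-a/auth.ts' ]
```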
Recommended usage patterns
One store per project is the easiest model to reason about:
cd ~/code/project-a
mgrep --store project-a watch
cd ~/code/project-b
mgrep --store project-b watch
Then search with the matching store name:
mgrep --store project-a "query"
mgrep --store project-b "query"
A shared store across multiple roots is also supported if you want it intentionally:
cd ~/notes
mgrep --store personal watch
cd ~/specs
mgrep --store personal watch
This combines both roots into one logical store named personal.
Practical rule of thumb
If you are unsure, use this rule:
- for one project: the default mgrep store is fine
- for multiple unrelated projects: give each project its own --store
- if you want one combined multi-root index: reuse the same --store deliberately
mgrep search
mgrep search is the default command. It searches the current directory unless you pass a path.
Arguments:
<pattern>: natural-language query
[path]: optional search root or scoped path
Options:
-m, --max-count <max_count>: maximum number of results, default 10
-c, --content: include matched chunk content in output
-a, --answer: synthesize an answer from retrieved local results
-s, --sync: sync files before searching
-d, --dry-run: preview sync work without uploading or deleting
--no-rerank: disable reranking
--max-file-size <bytes>: override upload size limit for sync
--max-file-count <count>: override sync file-count limit
--agentic: enable multi-query planning before retrieval
Examples:
mgrep "Where is the auth middleware configured?"
mgrep "How are chunks defined?" src/lib
mgrep -m 5 "maximum concurrent workers"
mgrep -c "How does caching work?"
mgrep -a "How is rate limiting implemented?"
mgrep --agentic -a "How does authentication work and where is it configured?"
mgrep --sync "Where is the API server started?"
mgrep --sync --dry-run "search query"
mgrep watch
mgrep watch performs an initial sync, then keeps the current project directory in sync via file watching.
Options:
-d, --dry-run: preview what would be uploaded or deleted
--max-file-size <bytes>: override upload size limit
--max-file-count <count>: override sync file-count limit
Examples:
mgrep watch
mgrep watch --dry-run
mgrep watch --max-file-size 1048576
mgrep watch --max-file-count 5000
mgrep rules
mgrep rules shows the effective allow/block indexing logic for the current
directory after merging defaults, global config, and local config.
Arguments:
[path]: optional directory or file path to inspect
Options:
--json: emit the effective rules as JSON
Examples:
mgrep rules
mgrep rules --json
mgrep rules src
Use this when you want to confirm:
- the effective allowlists for extensions, exact names, and dotfiles
- the effective ignorePatterns and blockedPaths
- which local and global config files are currently being applied
mgrep validate
Validates the DeepInfra configuration by exercising embeddings, reranking, and chat completions.
mgrep validate
Agent Integration Commands
mgrep includes helper installers for several agent environments:
- mgrep install-claude-code
- mgrep uninstall-claude-code
- mgrep install-codex
- mgrep uninstall-codex
- mgrep install-opencode
- mgrep uninstall-opencode
- mgrep install-droid
- mgrep uninstall-droid
These integrations are focused on local search plus background indexing. After installation, mgrep warns that background sync will run automatically for supported agent flows.
mgrep mcp
Starts the internal MCP server process used by some integrations.
This command is not needed for normal CLI use.
Configuration
Configuration sources, highest precedence first:
- CLI flags
- Environment variables
- Local config file: .mgreprc.yaml or .mgreprc.yml
- Global config file: ~/.config/mgrep/config.yaml or ~/.config/mgrep/config.yml
- Built-in defaults
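A precedence list like this is commonly implemented as a layered merge in which later layers overwrite earlier ones. The sketch assumes a reduced Settings type with only three numeric keys (the real configuration has more) and treats undefined as "not set"; resolveSettings is a hypothetical name, not mgrep's API.

```typescript
// Reduced, hypothetical settings shape; the real config has more keys.
interface Settings {
  maxFileSize: number; // bytes
  maxFileCount: number;
  syncConcurrency: number;
}

type Layer = Partial<Settings>;

// Values taken from the documented defaults.
const builtInDefaults: Settings = {
  maxFileSize: 4194304,
  maxFileCount: 10000,
  syncConcurrency: 20,
};

// Layers are passed lowest precedence first (global file, local file, env, flags);
// each defined value overwrites whatever came before it.
function resolveSettings(...layers: Layer[]): Settings {
  const result: Settings = { ...builtInDefaults };
  for (const layer of layers) {
    for (const key of Object.keys(layer) as (keyof Settings)[]) {
      const value = layer[key];
      if (value !== undefined) result[key] = value;
    }
  }
  return result;
}

const effective = resolveSettings(
  { syncConcurrency: 10 },                      // global config file
  { maxFileSize: 5242880, syncConcurrency: 5 }, // local .mgreprc.yaml
  {},                                           // environment variables
  { maxFileSize: 1048576 },                     // CLI flags
);
console.log(effective); // { maxFileSize: 1048576, maxFileCount: 10000, syncConcurrency: 5 }
```

Skipping undefined values matters: without that check, an unset flag would silently erase a value from a lower layer.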
Config Locations
- Project-local: .mgreprc.yaml or .mgreprc.yml in the directory you are indexing/searching from
- Global: ~/.config/mgrep/config.yaml or ~/.config/mgrep/config.yml
Use the project-local file when you want rules that only apply to one repo or workspace. Use the global file when you want defaults applied across projects.
Config File
Example override:
This example narrows indexing further than the built-in defaults. For the full
current default allowlist and a full ready-to-paste .mgreprc.yaml, see
guides/README.md.
maxFileSize: 5242880
maxFileCount: 5000
syncConcurrency: 10
blockedPaths:
- private
- ~/scratch/generated-docs
ignorePatterns:
- "*.csv"
- "*.jsonl"
blockedPaths is for path-prefix exclusions that should always be skipped.
Entries may be:
- relative paths, resolved relative to the config file that defines them
- absolute paths
- ~/... paths, expanded against the current home directory
Use ignorePatterns for glob-style filename/path filtering after allowlist
admission. Use blockedPaths when you want to exclude a directory subtree or
specific path prefix regardless of file naming.
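The three entry forms could be turned into absolute prefixes roughly as follows. The sketch is hypothetical: it assumes POSIX paths, does not normalize . or .. segments, and uses illustrative function names. The sample entries come from the example config above.

```typescript
// Hypothetical resolution of one blockedPaths entry into an absolute prefix.
// Assumes POSIX paths; "." and ".." segments are not normalized.
function resolveBlockedPath(entry: string, configDir: string, home: string): string {
  let resolved: string;
  if (entry === "~" || entry.startsWith("~/")) {
    resolved = home + entry.slice(1); // "~/scratch" -> "<home>/scratch"
  } else if (entry.startsWith("/")) {
    resolved = entry; // absolute: used as-is
  } else {
    resolved = `${configDir}/${entry}`; // relative to the defining config file
  }
  return resolved.replace(/\/+$/, ""); // drop trailing slashes
}

// A file is blocked when it equals a prefix or lies beneath one.
function isBlocked(file: string, prefixes: readonly string[]): boolean {
  return prefixes.some((p) => file === p || file.startsWith(p + "/"));
}

const prefixes = ["private", "~/scratch/generated-docs"].map((e) =>
  resolveBlockedPath(e, "/home/me/code/repo", "/home/me"),
);
console.log(isBlocked("/home/me/code/repo/private/plan.md", prefixes));       // true
console.log(isBlocked("/home/me/code/repo/private-notes/plan.md", prefixes)); // false
```

Matching on a whole path segment, rather than a raw string prefix, is what keeps private-notes from being caught by the private entry.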
Defaults:
- maxFileSize: 4194304 bytes (4 MiB)
- maxFileCount: 10000
- syncConcurrency: 20
- lancedbPath: ~/.mgrep/lancedb
- embedModel: Qwen/Qwen3-Embedding-4B
- embedDimensions: 2560
- rerankModel: Qwen/Qwen3-Reranker-4B
- llmModel: MiniMaxAI/MiniMax-M2.5
- blockedPaths: empty by default
Environment Variables
Provider key:
DEEPINFRA_API_KEY
Store:
MGREP_STORE
MGREP_LANCEDB_PATH
Search behavior:
MGREP_MAX_COUNT
MGREP_CONTENT
MGREP_ANSWER
MGREP_AGENTIC
MGREP_AGENT
MGREP_SYNC
MGREP_DRY_RUN
MGREP_RERANK
Sync behavior:
MGREP_MAX_FILE_SIZE
MGREP_MAX_FILE_COUNT
MGREP_SYNC_CONCURRENCY
Model overrides:
MGREP_EMBED_MODEL
MGREP_EMBED_DIMENSIONS
MGREP_RERANK_MODEL
MGREP_LLM_MODEL
Search Behavior and Limits
- mgrep is text-first and allowlist-first.
- A file must match an allowed extension, exact basename, or exact hidden basename before it is eligible for indexing.
- Configured blockedPaths are always excluded from indexing.
- Hidden directories remain excluded by default, even if a file inside them would otherwise match the allowlist.
- Default indexed content spans config and structured text, developer artifacts, docs and notes, exact filenames, exact hidden basenames, markup and templates, Python and related files, queries and infra, shell and automation, and web or mixed-language repo text.
- Built-in deny patterns include *.bin, *.lock, *.pt, *.pyc, *.safetensors, and *.sqlite.
- .gitignore, .mgrepignore, and configured ignorePatterns still apply after allowlist admission.
- Non-text and binary files are skipped even if they match the allowlist.
- For the exhaustive default inventory, see guides/README.md.
- watch and search --sync refuse to run in the home directory or any of its parent directories.
- Sync is bounded by maxFileSize and maxFileCount.
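A sketch of these sync guards, under stated assumptions: planSync, SyncLimits, and Candidate are hypothetical names, oversized files are assumed to be skipped, and exceeding maxFileCount is assumed to abort the sync. mgrep's actual behavior when a limit is hit may differ.

```typescript
interface SyncLimits {
  maxFileSize: number; // bytes
  maxFileCount: number;
}

interface Candidate {
  path: string;
  size: number; // bytes
}

// True when `dir` is the home directory or one of its ancestors (including "/").
function isHomeOrAncestor(dir: string, home: string): boolean {
  const d = dir.replace(/\/+$/, "") || "/";
  const h = home.replace(/\/+$/, "");
  if (d === "/" || d === h) return true;
  return h.startsWith(d + "/");
}

// Hypothetical pre-sync planning step. Skipping oversized files and aborting on
// too many files are assumptions, not documented mgrep semantics.
function planSync(
  dir: string,
  home: string,
  files: readonly Candidate[],
  limits: SyncLimits,
): Candidate[] {
  if (isHomeOrAncestor(dir, home)) {
    throw new Error(`refusing to sync ${dir}: it is the home directory or a parent of it`);
  }
  const withinSize = files.filter((f) => f.size <= limits.maxFileSize);
  if (withinSize.length > limits.maxFileCount) {
    throw new Error(`${withinSize.length} files exceed maxFileCount (${limits.maxFileCount})`);
  }
  return withinSize;
}

const limits: SyncLimits = { maxFileSize: 4194304, maxFileCount: 10000 };
const plan = planSync("/home/me/code/repo", "/home/me", [
  { path: "src/index.ts", size: 2048 },
  { path: "data/dump.json", size: 50000000 },
], limits);
console.log(plan.map((f) => f.path)); // [ 'src/index.ts' ]
```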
Output
Search results are printed as:
./path/to/file:line-start-line-end (score% match)
With --content, chunk text is included below each result.
With --answer, mgrep prints the synthesized answer and the cited local source chunks it used.
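If you wire mgrep output into another tool, a result line can be parsed with a regular expression. This sketch assumes the default output shape shown above, with line-start and line-end as integers and the score as a possibly fractional percentage; parseResultLine is a hypothetical helper. Lines that do not match, such as chunk text printed with --content, return null and can be skipped.

```typescript
interface ParsedResult {
  path: string;
  lineStart: number;
  lineEnd: number;
  scorePercent: number;
}

// Assumed shape: "<path>:<start>-<end> (<score>% match)". The greedy path group
// still works for paths containing ":" because the range is anchored to the end.
const RESULT_LINE = /^(.+):(\d+)-(\d+) \((\d+(?:\.\d+)?)% match\)$/;

// Hypothetical helper; returns null for lines that are not result headers.
function parseResultLine(line: string): ParsedResult | null {
  const m = RESULT_LINE.exec(line.trim());
  if (m === null) return null;
  return {
    path: m[1],
    lineStart: Number(m[2]),
    lineEnd: Number(m[3]),
    scorePercent: Number(m[4]),
  };
}

console.log(parseResultLine("./src/lib/auth.ts:12-40 (87% match)"));
```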
Architecture
- Local storage: LanceDB under ~/.mgrep/lancedb/
- Retrieval: vector similarity + full-text search
- Fusion: reciprocal-rank fusion
- Reranking: DeepInfra
- Answer synthesis and agentic planning: DeepInfra chat completions
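Reciprocal-rank fusion merges ranked lists by summing 1 / (k + rank) for each item across lists, so chunks that rank well in both vector and full-text retrieval rise to the top. The sketch below uses k = 60, the value proposed in the original RRF paper; the constant mgrep uses is not stated here, and the chunk IDs are made up.

```typescript
// Reciprocal-rank fusion: each list contributes 1 / (k + rank) for every item it
// contains, with rank starting at 1. Items are returned best first.
function reciprocalRankFusion(
  rankings: readonly (readonly string[])[],
  k = 60,
): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Array.prototype.sort is stable, so ties keep first-seen order.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Made-up chunk IDs for illustration.
const vectorHits = ["chunk-a", "chunk-b", "chunk-c"];
const fullTextHits = ["chunk-b", "chunk-d", "chunk-a"];

const fused = reciprocalRankFusion([vectorHits, fullTextHits]);
console.log(fused.map(([id]) => id)); // [ 'chunk-b', 'chunk-a', 'chunk-d', 'chunk-c' ]
```

Fusion uses only ranks, not raw scores, which is why it can combine vector similarity and full-text relevance even though the two are on different scales. Reranking then operates on the fused candidate list.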
Development
pnpm install
pnpm build
pnpm test
pnpm format
pnpm typecheck
The built CLI entrypoint is dist/index.js.
Troubleshooting
- Missing API key: run mgrep validate
- Sync blocked at home directory: run from a specific project subdirectory
- Inspect current indexing rules: run mgrep rules or mgrep rules --json
- Why a file may not be indexed: it may be outside the allowlist, inside a hidden directory, excluded by blockedPaths, .gitignore, .mgrepignore, or ignorePatterns, or rejected as binary or non-text
- Store incompatibility after changing embedding settings: delete the affected store under ~/.mgrep/lancedb/<store-name>/ and re-index
- Slow initial indexing: lower syncConcurrency if you are rate-limited, or tune file limits for very large repos
License
Apache-2.0. See LICENSE.
