@wb200/mgrep

v0.1.18

Local semantic code search with LanceDB indexing and DeepInfra-powered retrieval

Why mgrep?

  • Ask your repo questions in natural language instead of guessing exact symbols.
  • Keep a local LanceDB index on disk under ~/.mgrep/lancedb/.
  • Combine vector retrieval, full-text search, reranking, and optional answer synthesis.
  • Work directly in the CLI or wire it into coding agents.
  • Use it as a hybrid semantic complement to rg, grep, and ast-grep, not as a replacement for exact or structural search.

mgrep is for local repository search. This fork does not perform web search.

# index a project
mgrep watch

# search semantically
mgrep "where do we set up auth?"

# synthesize an answer from retrieved local results
mgrep -a "how does the sync pipeline work?"

Quick Start

  1. Install

    npm install -g @wb200/mgrep

  2. Set required API key

    export DEEPINFRA_API_KEY=your_deepinfra_key

    • DEEPINFRA_API_KEY is used for embeddings, reranking, synthesized answers, and agentic query planning.
    • It is required for normal use in this fork.

  3. Validate configuration

    mgrep validate

  4. Know where config lives

    • Project-local: .mgreprc.yaml or .mgreprc.yml in the directory you are indexing/searching from
    • Global: ~/.config/mgrep/config.yaml or ~/.config/mgrep/config.yml

  5. Index a project

    cd path/to/repo
    mgrep watch

  6. Inspect the effective indexing rules

    mgrep rules

  7. Search

    mgrep "where do we set up auth?"
    mgrep -m 25 "store schema"
    mgrep -a "how is rate limiting implemented?"

What It Does

mgrep keeps a local searchable index of your repository.

  • Indexing is allowlist-first. A file must match an allowed extension, exact basename, or exact hidden basename before it is eligible for indexing.
  • Configured blockedPaths can exclude path prefixes regardless of filename.
  • After allowlist admission, .gitignore, .mgrepignore, built-in deny patterns, hidden-directory blocking, and text-file detection can still exclude the file.
  • Indexed content is chunked and stored locally in LanceDB.
  • Embeddings, reranking, answer synthesis, and agentic planning are done through DeepInfra.

This means the index itself is local, but text chunks are sent to DeepInfra during embedding, reranking, and answer-generation flows.
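
The admission order described above can be sketched as a small TypeScript function. This is an illustration, not mgrep's actual implementation: the IndexRules shape, the checkFile name, the suffix-based stand-in for deny patterns, and the order in which the checks run are all assumptions. The .gitignore, .mgrepignore, and text-detection stages are left out for brevity.

```typescript
// Hypothetical rule shape for illustration; real mgrep config keys may differ.
interface IndexRules {
  allowedExtensions: Set<string>;      // e.g. ".ts", ".md"
  allowedBasenames: Set<string>;       // e.g. "Makefile"
  allowedHiddenBasenames: Set<string>; // e.g. ".editorconfig"
  blockedPaths: string[];              // path prefixes that are always skipped
  denySuffixes: string[];              // stand-in for deny patterns such as *.lock
}

type Verdict = { indexed: true } | { indexed: false; reason: string };

function checkFile(relPath: string, rules: IndexRules): Verdict {
  const base = relPath.slice(relPath.lastIndexOf("/") + 1);
  const dirs = relPath.split("/").slice(0, -1);
  const dot = base.lastIndexOf(".");
  const ext = dot > 0 ? base.slice(dot) : ""; // ".editorconfig" has no extension

  // blockedPaths exclude a path prefix regardless of filename.
  const blocked = rules.blockedPaths.some(
    (prefix) => relPath === prefix || relPath.startsWith(prefix + "/"),
  );
  if (blocked) return { indexed: false, reason: "blockedPaths" };

  // Allowlist admission: extension, exact basename, or exact hidden basename.
  const admitted =
    rules.allowedExtensions.has(ext) ||
    rules.allowedBasenames.has(base) ||
    rules.allowedHiddenBasenames.has(base);
  if (!admitted) return { indexed: false, reason: "not on allowlist" };

  // Admitted files can still be excluded by later stages.
  if (dirs.some((dir) => dir.startsWith(".") && dir !== "." && dir !== "..")) {
    return { indexed: false, reason: "hidden directory" };
  }
  if (rules.denySuffixes.some((suffix) => base.endsWith(suffix))) {
    return { indexed: false, reason: "built-in deny pattern" };
  }
  return { indexed: true };
}
```

With an allowlist containing .md, a file such as .github/README.md is still rejected in this sketch because it sits in a hidden directory, which mirrors the rule that admission alone does not guarantee indexing.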

Search Strategy

mgrep works best as the semantic layer in a local-search toolkit:

  • Use mgrep for intent-based discovery, architecture questions, and unfamiliar codebases.
  • Use rg or grep for exact strings, regexes, and exhaustive audits.
  • Use ast-grep for syntax-aware structural matches and refactor prep.

A common workflow is to use mgrep first to find candidate files or concepts, then confirm exact implementation details with rg or ast-grep.

Commands

Top-level commands:

  • mgrep or mgrep search <pattern> [path]
  • mgrep rules [path]
  • mgrep watch
  • mgrep validate
  • mgrep install-claude-code
  • mgrep uninstall-claude-code
  • mgrep install-codex
  • mgrep uninstall-codex
  • mgrep install-opencode
  • mgrep uninstall-opencode
  • mgrep install-droid
  • mgrep uninstall-droid
  • mgrep mcp

Global options:

  • --store <string>: logical store name to use, default mgrep

Understanding Stores

The --store flag controls which logical index mgrep reads from and writes to.

This is one of the most important things to understand when using mgrep across multiple folders.

The default store name

If you do not pass --store and MGREP_STORE is not set, mgrep uses the store named:

mgrep

That means these are equivalent:

mgrep watch
mgrep --store mgrep watch

and:

mgrep "query"
mgrep --store mgrep "query"

You can also change the default for a shell session with:

export MGREP_STORE=my-store

After that, plain mgrep ... commands in that shell use my-store unless you override them with --store.

One command searches exactly one store

mgrep does not search all stores automatically.

Each command uses exactly one logical store:

  • --store some-name
  • or MGREP_STORE
  • or the built-in default mgrep

So if you indexed a folder with:

mgrep --store factory-specs watch

and later run:

mgrep "query"

you are not searching factory-specs. You are searching the default store mgrep.

To search the store you indexed, you must use:

mgrep --store factory-specs "query"

or:

export MGREP_STORE=factory-specs
mgrep "query"
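
The selection order above can be summarized in a few lines of TypeScript. This is a sketch under stated assumptions: resolveStore is a hypothetical helper rather than mgrep's code, the environment is passed in as a plain object, and treating an empty string as "not set" is a guess.

```typescript
type Env = Record<string, string | undefined>;

const DEFAULT_STORE = "mgrep";

// Pick exactly one store: --store first, then MGREP_STORE, then the default.
function resolveStore(cliStore: string | undefined, env: Env): string {
  if (cliStore !== undefined && cliStore !== "") return cliStore;
  const fromEnv = env["MGREP_STORE"];
  if (fromEnv !== undefined && fromEnv !== "") return fromEnv; // assumption: empty means unset
  return DEFAULT_STORE;
}
```

Under this model, resolveStore(undefined, {}) yields mgrep, which is why a plain search after indexing into factory-specs looks in the wrong store.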

What watch indexes

mgrep watch indexes:

  • the current working directory
  • all subdirectories under it
  • only files that match the current allowlist, then survive .gitignore, .mgrepignore, built-in deny patterns, hidden-directory blocking, and text-file detection

So this:

cd /path/to/project
mgrep --store my-project watch

indexes /path/to/project and all of its eligible subfolders into store my-project.

Stores are additive across multiple folders

If you run watch in two different folders with the same store name, the index is additive.

Example:

cd /path/project-a
mgrep --store shared watch

cd /path/project-b
mgrep --store shared watch

After that, store shared contains indexed content for both:

  • /path/project-a/...
  • /path/project-b/...

The second watch does not wipe the first one.

Deletions are scoped to the watched folder

When watch or search --sync removes stale entries, it only deletes index entries for files inside the folder subtree being synced.

That means if project-a and project-b both live in store shared:

  • syncing project-a can remove stale entries from project-a
  • syncing project-a does not remove entries that belong to project-b

This is what makes additive multi-root stores possible.
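
A minimal sketch of that scoping rule follows, assuming a store can be modeled as a list of absolute POSIX-style paths. The names isInside and pruneStale are hypothetical and do not come from mgrep.

```typescript
// True when `path` lies inside the directory `root` (POSIX-style absolute paths).
function isInside(root: string, path: string): boolean {
  const prefix = root.endsWith("/") ? root : root + "/";
  return path.startsWith(prefix);
}

// Returns the store contents after syncing `syncRoot`.
// `indexed` holds the paths currently in the store; `onDisk` holds the eligible
// files found under `syncRoot` during this sync.
function pruneStale(indexed: string[], syncRoot: string, onDisk: Set<string>): string[] {
  // Entries outside the synced subtree are never touched.
  return indexed.filter((path) => !isInside(syncRoot, path) || onDisk.has(path));
}
```

Syncing /path/project-a with this function drops entries for files that no longer exist under project-a and leaves every /path/project-b entry in place.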

Search is store-scoped and path-scoped

Search always works in two layers:

  1. it selects a single store
  2. it filters results to the current directory or the path argument you pass

So if you are inside project-a and search against a shared store, mgrep still scopes results to your current path by default.

Examples:

cd /path/project-a
mgrep --store shared "auth middleware"

searches store shared, but only for content under /path/project-a/....

And:

mgrep --store shared "auth middleware" /path/project-b

searches the same shared store, but only under /path/project-b/....
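
The two layers can be illustrated with an in-memory model. This is only a sketch: the Hit and Stores types and the scopedSearch function are hypothetical, and real retrieval involves embeddings and ranking that are omitted here.

```typescript
interface Hit {
  path: string;  // absolute path of the indexed file
  score: number; // relevance score, higher is better
}

type Stores = Map<string, Hit[]>;

// Layer 1: select a single store. Layer 2: keep only hits under `scope`.
function scopedSearch(stores: Stores, storeName: string, scope: string): Hit[] {
  const hits = stores.get(storeName) ?? [];
  const prefix = scope.endsWith("/") ? scope : scope + "/";
  return hits.filter((hit) => hit.path === scope || hit.path.startsWith(prefix));
}
```

Calling scopedSearch with the store shared and the scope /path/project-b returns only project-b hits, even though project-a content lives in the same store.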

Recommended usage patterns

One store per project is the easiest model to reason about:

cd ~/code/project-a
mgrep --store project-a watch

cd ~/code/project-b
mgrep --store project-b watch

Then search with the matching store name:

mgrep --store project-a "query"
mgrep --store project-b "query"

A shared store across multiple roots is also supported when you want one deliberately:

cd ~/notes
mgrep --store personal watch

cd ~/specs
mgrep --store personal watch

This combines both roots into one logical store named personal.

Practical rule of thumb

If you are unsure, use this rule:

  • for one project: the default store mgrep is fine
  • for multiple unrelated projects: give each project its own --store
  • if you want one combined multi-root index: reuse the same --store deliberately

mgrep search

mgrep search is the default command. It searches the current directory unless you pass a path.

Arguments:

  • <pattern>: natural-language query
  • [path]: optional search root or scoped path

Options:

  • -m, --max-count <max_count>: maximum number of results, default 10
  • -c, --content: include matched chunk content in output
  • -a, --answer: synthesize an answer from retrieved local results
  • -s, --sync: sync files before searching
  • -d, --dry-run: preview sync work without uploading or deleting
  • --no-rerank: disable reranking
  • --max-file-size <bytes>: override upload size limit for sync
  • --max-file-count <count>: override sync file-count limit
  • --agentic: enable multi-query planning before retrieval

Examples:

mgrep "Where is the auth middleware configured?"
mgrep "How are chunks defined?" src/lib
mgrep -m 5 "maximum concurrent workers"
mgrep -c "How does caching work?"
mgrep -a "How is rate limiting implemented?"
mgrep --agentic -a "How does authentication work and where is it configured?"
mgrep --sync "Where is the API server started?"
mgrep --sync --dry-run "search query"

mgrep watch

mgrep watch performs an initial sync, then keeps the current project directory in sync via file watching.

Options:

  • -d, --dry-run: preview what would be uploaded or deleted
  • --max-file-size <bytes>: override upload size limit
  • --max-file-count <count>: override sync file-count limit

Examples:

mgrep watch
mgrep watch --dry-run
mgrep watch --max-file-size 1048576
mgrep watch --max-file-count 5000

mgrep rules

mgrep rules shows the effective allow/block indexing logic for the current directory after merging defaults, global config, and local config.

Arguments:

  • [path]: optional directory or file path to inspect

Options:

  • --json: emit the effective rules as JSON

Examples:

mgrep rules
mgrep rules --json
mgrep rules src

Use this when you want to confirm:

  • the effective allowlists for extensions, exact names, and dotfiles
  • the effective ignorePatterns and blockedPaths
  • which local and global config files are currently being applied

mgrep validate

Validates the DeepInfra configuration by exercising embeddings, rerank, and chat completions.

mgrep validate

Agent Integration Commands

mgrep includes helper installers for several agent environments:

  • mgrep install-claude-code
  • mgrep uninstall-claude-code
  • mgrep install-codex
  • mgrep uninstall-codex
  • mgrep install-opencode
  • mgrep uninstall-opencode
  • mgrep install-droid
  • mgrep uninstall-droid

These integrations are focused on local search plus background indexing. After installation, mgrep warns that background sync will run automatically for supported agent flows.

mgrep mcp

Starts the internal MCP server process used by some integrations.

This command is not needed for normal CLI use.

Configuration

Configuration sources, highest precedence first:

  1. CLI flags
  2. Environment variables
  3. Local config file: .mgreprc.yaml or .mgreprc.yml
  4. Global config file: ~/.config/mgrep/config.yaml or ~/.config/mgrep/config.yml
  5. Built-in defaults
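
The precedence list above can be modeled as layered overrides. The sketch below is an assumption about how such merging typically works, not a description of mgrep's code: the Settings interface covers only three scalar keys, resolveSettings is a hypothetical name, and list-valued keys such as blockedPaths may well be merged differently.

```typescript
interface Settings {
  maxFileSize: number;
  maxFileCount: number;
  syncConcurrency: number;
}

// Built-in defaults as documented in this README.
const DEFAULTS: Settings = { maxFileSize: 4194304, maxFileCount: 10000, syncConcurrency: 20 };

function resolveSettings(
  cli: Partial<Settings>,
  env: Partial<Settings>,
  localFile: Partial<Settings>,
  globalFile: Partial<Settings>,
): Settings {
  const result: Settings = { ...DEFAULTS };
  // Apply from lowest to highest precedence so that later layers win.
  for (const layer of [globalFile, localFile, env, cli]) {
    if (layer.maxFileSize !== undefined) result.maxFileSize = layer.maxFileSize;
    if (layer.maxFileCount !== undefined) result.maxFileCount = layer.maxFileCount;
    if (layer.syncConcurrency !== undefined) result.syncConcurrency = layer.syncConcurrency;
  }
  return result;
}
```

For example, a --max-file-count flag passed on the command line would win over a maxFileCount set in either config file, while keys that no layer sets keep their built-in defaults.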

Config Locations

  • Project-local: .mgreprc.yaml or .mgreprc.yml in the directory you are indexing/searching from
  • Global: ~/.config/mgrep/config.yaml or ~/.config/mgrep/config.yml

Use the project-local file when you want rules that only apply to one repo or workspace. Use the global file when you want defaults applied across projects.

Config File

The following override narrows indexing further than the built-in defaults. For the full current default allowlist and a ready-to-paste .mgreprc.yaml, see guides/README.md.

Example override:

maxFileSize: 5242880
maxFileCount: 5000
syncConcurrency: 10
blockedPaths:
  - private
  - ~/scratch/generated-docs
ignorePatterns:
  - "*.csv"
  - "*.jsonl"

blockedPaths is for path-prefix exclusions that should always be skipped. Entries may be:

  • relative paths, resolved relative to the config file that defines them
  • absolute paths
  • ~/... paths expanded against the current home directory

Use ignorePatterns for glob-style filename/path filtering after allowlist admission. Use blockedPaths when you want to exclude a directory subtree or specific path prefix regardless of file naming.
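
The three accepted forms of a blockedPaths entry could be resolved to absolute paths roughly as follows. This is a sketch that assumes POSIX-style paths and an absolute config directory; resolveBlockedPath and normalizePath are hypothetical helpers, not mgrep functions.

```typescript
// Collapse ".", "..", and duplicate slashes in an absolute POSIX-style path.
function normalizePath(path: string): string {
  const out: string[] = [];
  for (const segment of path.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      out.pop();
      continue;
    }
    out.push(segment);
  }
  return "/" + out.join("/");
}

// `configDir` is the directory of the config file that defines the entry.
function resolveBlockedPath(entry: string, configDir: string, homeDir: string): string {
  if (entry === "~" || entry.startsWith("~/")) {
    return normalizePath(homeDir + entry.slice(1)); // ~/... expands against home
  }
  if (entry.startsWith("/")) {
    return normalizePath(entry); // already absolute
  }
  return normalizePath(configDir + "/" + entry); // relative to the config file
}
```

With a config file in /repo and a home directory of /home/me, the entry private resolves to /repo/private and ~/scratch/generated-docs resolves to /home/me/scratch/generated-docs.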

Defaults:

  • maxFileSize: 4194304 bytes
  • maxFileCount: 10000
  • syncConcurrency: 20
  • lancedbPath: ~/.mgrep/lancedb
  • embedModel: Qwen/Qwen3-Embedding-4B
  • embedDimensions: 2560
  • rerankModel: Qwen/Qwen3-Reranker-4B
  • llmModel: MiniMaxAI/MiniMax-M2.5
  • blockedPaths: empty by default

Environment Variables

Provider key:

  • DEEPINFRA_API_KEY

Store:

  • MGREP_STORE
  • MGREP_LANCEDB_PATH

Search behavior:

  • MGREP_MAX_COUNT
  • MGREP_CONTENT
  • MGREP_ANSWER
  • MGREP_AGENTIC
  • MGREP_AGENT
  • MGREP_SYNC
  • MGREP_DRY_RUN
  • MGREP_RERANK

Sync behavior:

  • MGREP_MAX_FILE_SIZE
  • MGREP_MAX_FILE_COUNT
  • MGREP_SYNC_CONCURRENCY

Model overrides:

  • MGREP_EMBED_MODEL
  • MGREP_EMBED_DIMENSIONS
  • MGREP_RERANK_MODEL
  • MGREP_LLM_MODEL

Search Behavior and Limits

  • mgrep is text-first and allowlist-first.
  • A file must match an allowed extension, exact basename, or exact hidden basename before it is eligible for indexing.
  • Configured blockedPaths are always excluded from indexing.
  • Hidden directories remain excluded by default, even if a file inside them would otherwise match the allowlist.
  • Default indexed content spans config and structured text, developer artifacts, docs and notes, exact filenames, exact hidden basenames, markup and templates, Python and related files, queries and infra, shell and automation, and web or mixed-language repo text.
  • Built-in deny patterns include *.bin, *.lock, *.pt, *.pyc, *.safetensors, and *.sqlite.
  • .gitignore, .mgrepignore, and configured ignorePatterns still apply after allowlist admission.
  • Non-text and binary files are skipped even if they match the allowlist.
  • For the exhaustive default inventory, see guides/README.md.
  • watch and search --sync refuse to operate on the home directory or any of its parent directories.
  • Sync is bounded by maxFileSize and maxFileCount.

Output

Search results are printed as:

./path/to/file:<start-line>-<end-line> (<score>% match)

With --content, chunk text is included below each result.

With --answer, mgrep prints the synthesized answer and the cited local source chunks it used.
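
If you post-process search output in a script, a result header line can be parsed with a regular expression. The sketch below assumes the layout shown above, with a numeric line range and a score that may be an integer or a decimal. Those details are assumptions, as are the ResultLine type and the parseResultLine name, so check them against real output before relying on this.

```typescript
interface ResultLine {
  path: string;
  startLine: number;
  endLine: number;
  score: number; // percentage value as printed
}

// Assumed layout: <path>:<start>-<end> (<score>% match)
const RESULT_LINE = /^(.+):(\d+)-(\d+) \((\d+(?:\.\d+)?)% match\)$/;

// Returns null for lines that are not result headers (for example chunk text).
function parseResultLine(line: string): ResultLine | null {
  const match = RESULT_LINE.exec(line.trim());
  if (match === null) return null;
  const [, path, start, end, score] = match;
  if (path === undefined || start === undefined || end === undefined || score === undefined) {
    return null;
  }
  return { path, startLine: Number(start), endLine: Number(end), score: Number(score) };
}
```

Lines printed by --content or --answer that do not match the header layout simply return null, so a script can skip them.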

Architecture

  • Local storage: LanceDB under ~/.mgrep/lancedb/
  • Retrieval: vector similarity + full-text search
  • Fusion: reciprocal-rank fusion
  • Reranking: DeepInfra
  • Answer synthesis and agentic planning: DeepInfra chat completions
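
Reciprocal-rank fusion, the fusion step listed above, combines several ranked lists by summing 1 / (k + rank) for each item across lists. The following is a generic sketch of the technique, not mgrep's implementation. The function name is hypothetical, and k = 60 is the constant conventionally used with this method; the README does not say which value mgrep uses.

```typescript
// Fuse ranked lists of document ids into one list ordered by RRF score.
// Each inner array is ordered best-first; rank is 1-based.
function reciprocalRankFusion(rankings: string[][], k: number = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}
```

Given a vector ranking of a, b, c and a full-text ranking of b, d, a, the fused order is b, a, d, c: items that appear near the top of both lists outrank items that appear in only one.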

Development

pnpm install
pnpm build
pnpm test
pnpm format
pnpm typecheck

The built CLI entrypoint is dist/index.js.

Troubleshooting

  • Missing API keys: export DEEPINFRA_API_KEY, then run mgrep validate to confirm the configuration
  • Sync blocked at home directory: run from a specific project subdirectory
  • Inspect current indexing rules: run mgrep rules or mgrep rules --json
  • Why a file may not be indexed: it may be outside the allowlist, inside a hidden directory, matched by a built-in deny pattern, excluded by blockedPaths, .gitignore, .mgrepignore, or ignorePatterns, or rejected as binary or non-text
  • Store incompatibility after changing embedding settings: delete the affected store under ~/.mgrep/lancedb/<store-name>/ and re-index
  • Slow initial indexing: lower syncConcurrency if you are rate-limited, or tune file limits for very large repos

License

Apache-2.0. See LICENSE.