lmgrep
v0.1.17
Semantic code search with any AI embedding provider
lmgrep
Semantic code search powered by AI embeddings. Index your codebase with any embedding provider and search it using natural language.
lmgrep uses Tree-sitter to parse source code into meaningful chunks (functions, classes, interfaces, etc.), embeds them with the AI model of your choice, and stores the vectors in a local LanceDB database. Queries are matched by semantic similarity, so you find code by intent rather than exact strings.
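The matching step can be pictured as ranking stored chunk vectors by their similarity to the query vector. The sketch below is illustrative only. It assumes cosine similarity as the metric, because this README does not say which distance lmgrep configures. The `Chunk` shape and the `topK` helper are hypothetical. lmgrep hands the actual vector search to LanceDB and does not scan every chunk like this.

```typescript
// Hypothetical chunk record: a file path plus the embedding of one AST chunk.
interface Chunk {
  file: string;
  vector: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-k: score every chunk, sort by descending score, keep the best k.
function topK(query: number[], chunks: Chunk[], k: number): { file: string; score: number }[] {
  return chunks
    .map((c) => ({ file: c.file, score: cosine(query, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy 3-dimensional vectors; real embedding models produce hundreds of dimensions.
const chunks: Chunk[] = [
  { file: "src/auth.ts", vector: [0.9, 0.1, 0.0] },
  { file: "src/db.ts", vector: [0.0, 0.2, 0.9] },
  { file: "src/login.ts", vector: [0.8, 0.3, 0.1] },
];
console.log(topK([1, 0, 0], chunks, 2).map((r) => r.file)); // → [ 'src/auth.ts', 'src/login.ts' ]
```

Because results are ranked by score rather than filtered by exact text, a query like "how are users authenticated" can surface `src/login.ts` even if the word "authenticated" never appears in it.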
Features
- Any embedding provider: works with Ollama, OpenAI, Google, or any provider supported by the Vercel AI SDK
- Tree-sitter chunking: splits code at AST boundaries so search results are complete, meaningful units
- MCP server: built-in MCP server (lmgrep mcp) for integration with Claude Code, Cursor, and other AI tools
- File watching: lmgrep serve watches for changes and incrementally re-indexes
- P2P sharing: share your index with teammates via direct peer-to-peer transfer
- Cross-project search: search across multiple indexed projects
- Git-aware: respects .gitignore and deduplicates across worktrees sharing the same remote
- Configurable: global or per-project config, custom ignore patterns, extension filtering
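One common way to deduplicate worktrees is to key the index on a normalized form of the git remote URL. With that key, SSH and HTTPS checkouts of the same repository resolve to the same index. The sketch below shows this general technique under that assumption. It is not lmgrep's actual slug scheme, and `remoteKey` is a hypothetical helper.

```typescript
// Hypothetical helper: normalize a git remote URL into a stable key such as
// "github.com/owner/repo", so worktrees sharing a remote could share one index.
function remoteKey(remote: string): string {
  let url = remote.trim();
  // scp-like SSH syntax: git@host:owner/repo.git -> host/owner/repo.git
  const scp = /^[\w.-]+@([^:/]+):(.+)$/.exec(url);
  if (scp) {
    url = `${scp[1]}/${scp[2]}`;
  } else {
    // Strip a scheme (https://, ssh://, git://) and any user@ prefix.
    url = url.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "").replace(/^[^@/]+@/, "");
  }
  // Drop trailing slashes first, then a trailing ".git".
  url = url.replace(/\/+$/, "").replace(/\.git$/, "");
  // Hostnames are case-insensitive; repository paths may not be, so only the host is lowercased.
  const slash = url.indexOf("/");
  return slash === -1 ? url.toLowerCase() : url.slice(0, slash).toLowerCase() + url.slice(slash);
}

const remotes = [
  "git@github.com:Aetherall/lmgrep.git",
  "https://github.com/Aetherall/lmgrep",
  "https://GitHub.com/Aetherall/lmgrep.git/",
];
console.log(new Set(remotes.map(remoteKey)).size); // → 1
```

This sketch keeps ports (for example `ssh://git@host:22/...`) as part of the key. A real implementation would need to decide whether such variants count as the same remote.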
Quick start
1. Install
pnpm install -g lmgrep
2. Set up an embedding model
The fastest way to get started is with Ollama:
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull an embedding model
ollama pull nomic-embed-text
# Auto-detect and write config
lmgrep init
This creates a config file at ~/.config/lmgrep/config.yml (Linux) or ~/Library/Application Support/lmgrep/config.yml (macOS).
3. Index your project
cd /path/to/your/project
lmgrep index
4. Search
lmgrep search "how are users authenticated"
lmgrep search "database connection pooling" --limit 5
lmgrep search "error handling" --file-prefix src/lib --language .ts
CLI commands
| Command | Description |
|---|---|
| lmgrep index | Index the current directory |
| lmgrep search <query> | Search using natural language |
| lmgrep status | Show index stats, embedding connectivity, and running processes |
| lmgrep serve | Watch for changes and re-index automatically |
| lmgrep mcp | Start the MCP server (stdio transport) |
| lmgrep init | Detect embedding setup and create config |
| lmgrep config | Open the global config in your editor |
| lmgrep repair | Detect and fix index inconsistencies |
| lmgrep migrate | Rename existing index directories to match the current slug scheme |
| lmgrep compact | Compact the index to reclaim disk space |
| lmgrep export | Share this project's index with a peer via P2P |
| lmgrep import [source] | Import from a peer (share code) or local database |
| lmgrep prune | Delete the index for the current directory |
| lmgrep completions zsh | Output or install zsh completions |
Search options
--limit <n> Max results (default: 25)
--file-prefix <path> Only search files under this path
--language <exts> Filter by file extension (e.g. .ts,.py)
--type <types> Filter by AST node type (e.g. function_declaration)
--not <query> Exclude results similar to this query
--scores Show relevance scores
--compact Show file paths only
--json Output as JSON
--project <path> Search a different project's index
--across <paths> Search multiple projects (comma-separated)
Index options
--reset Rebuild the entire index from scratch
--since <dur> Only re-index files modified within duration (e.g. 10m, 2h, 1d)
--force Force re-embed even if file hash is unchanged
--dry Show what would be indexed without doing it
--verbose Show file-by-file progress
P2P index sharing
Share your index with a teammate without any server or infrastructure. lmgrep uses Hyperswarm for direct, encrypted peer-to-peer transfer with NAT hole punching.
# On your machine — start sharing
lmgrep export
# → Share code: lmgrepoceantiger7f3a
# → Waiting for peer...
# On their machine — receive the index
lmgrep import lmgrepoceantiger7f3a
# → Connecting to peer...
# → Receiving: 4823/4823 chunks
# → Imported 4823 chunks and 312 file hashes from peer.
Requires hyperswarm to be installed (pnpm add hyperswarm). It is an optional dependency, so lmgrep works fine without it.
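When thousands of chunks cross a raw peer connection, the receiver needs a way to tell where each message ends. A single network read can hold half a message or several messages at once. The sketch below assumes newline-delimited JSON purely for illustration. This README does not document lmgrep's wire format, and `LineDecoder` and `ChunkMessage` are hypothetical names.

```typescript
// Hypothetical decoder for newline-delimited JSON (NDJSON) arriving in arbitrary pieces.
// JSON.stringify escapes newlines inside strings, so "\n" only ever appears as a delimiter.
class LineDecoder<T> {
  private buffer = "";

  // Append a received piece and return every message it completes.
  push(piece: string): T[] {
    this.buffer += piece;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or "" if the piece ended with "\n").
    this.buffer = lines.pop() ?? "";
    return lines.filter((line) => line.length > 0).map((line) => JSON.parse(line) as T);
  }
}

interface ChunkMessage {
  file: string;
  text: string;
}

// Simulate a sender's stream that arrives split at awkward boundaries.
const wire =
  JSON.stringify({ file: "a.ts", text: "fn a" }) + "\n" +
  JSON.stringify({ file: "b.ts", text: "fn b" }) + "\n";
const pieces = [wire.slice(0, 10), wire.slice(10, 35), wire.slice(35)];

const decoder = new LineDecoder<ChunkMessage>();
let received = 0;
for (const piece of pieces) {
  received += decoder.push(piece).length;
  console.log(`Receiving: ${received}/2 chunks`); // prints 0/2, then 1/2, then 2/2
}
```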
MCP server
lmgrep includes an MCP server for use with AI coding assistants. When launched with no arguments over piped stdio (as MCP clients do), it automatically starts in MCP mode. Just add it to your tool's MCP configuration:
{
"mcpServers": {
"lmgrep": {
"command": "lmgrep"
}
}
}
You can also start it explicitly with lmgrep mcp.
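The automatic MCP mode described above reduces to a small decision rule: the server starts when there are no arguments and stdin is not an interactive terminal. The sketch below assumes the rule is implemented by checking the argument list and `process.stdin.isTTY`. `shouldStartMcp` is a hypothetical name, not lmgrep's code.

```typescript
// Hypothetical decision rule: start the MCP server when lmgrep receives no
// subcommand and stdin is piped (MCP clients spawn servers with piped stdio).
// In Node, process.stdin.isTTY is true for an interactive terminal and
// undefined when stdin is a pipe.
function shouldStartMcp(args: readonly string[], stdinIsTTY: boolean | undefined): boolean {
  if (args[0] === "mcp") return true; // explicit `lmgrep mcp`
  return args.length === 0 && stdinIsTTY !== true;
}

// In a real CLI: shouldStartMcp(process.argv.slice(2), process.stdin.isTTY)
console.log(shouldStartMcp([], undefined)); // → true (spawned by an MCP client)
console.log(shouldStartMcp([], true)); // → false (typed in a terminal)
console.log(shouldStartMcp(["search", "auth"], undefined)); // → false
```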
Claude Code
# If lmgrep is installed globally
claude mcp add lmgrep -s user -- lmgrep mcp
# Or without a global install
claude mcp add lmgrep -s user -- npx -y lmgrep mcp
Codex CLI
# If lmgrep is installed globally
codex mcp add lmgrep -- lmgrep mcp
# Or without a global install
codex mcp add lmgrep -- npx -y lmgrep mcp
Gemini CLI
# If lmgrep is installed globally
gemini mcp add lmgrep -- lmgrep mcp
# Or without a global install
gemini mcp add lmgrep -- npx -y lmgrep mcp
Pi coding agent
Pi doesn't speak MCP; it uses TypeScript extensions instead. lmgrep ships one at pi-extension/ that registers two tools: lmgrep_search and lmgrep_list_other_indexed_projects. The extension imports lmgrep directly, runs an in-process file watcher to keep the index fresh, and gates tool visibility on embedder health. If lmgrep isn't configured, or the embedding provider is unreachable, the tools stay hidden. The agent then sees a clean tool surface instead of a broken one. Run lmgrep init to configure lmgrep before relying on it inside Pi.
Install via Pi's package manager:
pi install git:github.com/Aetherall/lmgrep
Update with pi update, remove with pi remove git:github.com/Aetherall/lmgrep, and list installed extensions with pi list.
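The Pi extension's health gating follows a simple rule: expose the tools only when lmgrep is configured and the embedding provider answers. The minimal sketch below illustrates that rule. The `EmbedderHealth` shape and the `visibleTools` function are hypothetical and are not part of lmgrep's or Pi's API. Only the two tool names come from this README.

```typescript
// Hypothetical health snapshot the extension might compute at startup.
interface EmbedderHealth {
  configured: boolean; // a config file was found (e.g. after `lmgrep init`)
  reachable: boolean; // the embedding provider answered a probe request
}

const LMGREP_TOOLS = ["lmgrep_search", "lmgrep_list_other_indexed_projects"] as const;

// Expose the tools only when lmgrep is usable. Otherwise hide them entirely,
// so the agent never sees a tool that would fail on every call.
function visibleTools(health: EmbedderHealth): readonly string[] {
  return health.configured && health.reachable ? LMGREP_TOOLS : [];
}

console.log(visibleTools({ configured: true, reachable: true }).length); // → 2
console.log(visibleTools({ configured: true, reachable: false }).length); // → 0
```

Hiding the tools avoids exposing tools that would always fail. The trade-off is that the agent cannot tell it is missing search capability, which is why the README asks you to run lmgrep init first.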
OpenCode
OpenCode has no one-shot install command, so add an entry to ~/.config/opencode/opencode.json (or a project-level opencode.json):
{
"$schema": "https://opencode.ai/config.json",
"mcp": {
"lmgrep": {
"type": "local",
// If lmgrep is installed globally
"command": ["lmgrep", "mcp"],
// Or without a global install
// "command": ["npx", "-y", "lmgrep", "mcp"],
"enabled": true
}
}
}
The MCP server exposes two tools, search and list_other_indexed_projects. It automatically watches for file changes and keeps the index up to date.
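An MCP client communicates with `lmgrep mcp` using JSON-RPC 2.0 messages over stdio, one JSON object per line. The sketch below builds the `tools/list` and `tools/call` requests that a client sends after the usual `initialize` handshake, which is omitted here. The argument name `query` is an assumption for illustration. The real input schema of the `search` tool is returned by `tools/list`.

```typescript
// JSON-RPC 2.0 request shape used by MCP.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;

// Frame a request for the stdio transport: one JSON object per line, no embedded newlines.
function frame(method: string, params?: Record<string, unknown>): string {
  const request: JsonRpcRequest = { jsonrpc: "2.0", id: nextId++, method };
  if (params !== undefined) request.params = params;
  return JSON.stringify(request) + "\n";
}

// Ask the server which tools it offers (expect `search` and `list_other_indexed_projects`).
const listTools = frame("tools/list");

// Call the search tool. `query` is a hypothetical argument name; check the
// inputSchema in the tools/list response for the real parameters.
const callSearch = frame("tools/call", {
  name: "search",
  arguments: { query: "how are users authenticated" },
});

console.log(listTools.trim());
console.log(callSearch.trim());
```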
Configuration
lmgrep looks for configuration in this order:
1. .lmgrep.yml in the project root (per-project)
2. ~/.config/lmgrep/config.yml (global, Linux) or ~/Library/Application Support/lmgrep/config.yml (global, macOS)
3. ~/.lmgrep.yml (legacy fallback)
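The sketch below shows how that lookup order might resolve to a single file. It rests on three assumptions. First, the first file found wins; the README gives the order but does not say whether files are merged. Second, any platform other than macOS uses the Linux path. Third, `resolveConfigPath` is a hypothetical helper, with file existence injected so the example needs no filesystem access.

```typescript
// Hypothetical resolver mirroring the documented lookup order:
// 1. <project>/.lmgrep.yml, 2. platform global config, 3. ~/.lmgrep.yml (legacy).
function resolveConfigPath(
  projectRoot: string,
  home: string,
  platform: string, // e.g. the value of process.platform: "linux", "darwin", ...
  exists: (path: string) => boolean,
): string | undefined {
  const globalPath =
    platform === "darwin"
      ? `${home}/Library/Application Support/lmgrep/config.yml`
      : `${home}/.config/lmgrep/config.yml`;
  const candidates = [`${projectRoot}/.lmgrep.yml`, globalPath, `${home}/.lmgrep.yml`];
  // Assumption: the first existing file wins and later candidates are ignored.
  return candidates.find(exists);
}

// A Linux machine with both a global and a legacy config, but no per-project file.
const files = new Set(["/home/ana/.config/lmgrep/config.yml", "/home/ana/.lmgrep.yml"]);
console.log(resolveConfigPath("/work/app", "/home/ana", "linux", (p) => files.has(p)));
// → /home/ana/.config/lmgrep/config.yml
```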
Example config
# Embedding model in "provider:model" format
model: ollama:nomic-embed-text
# Base URL for the embedding API
baseURL: http://localhost:11434/v1
# Batch size for embedding API calls
batchSize: 100
# Optional: embedding dimensions (if model supports it)
# dimensions: 384
# Optional: max tokens per chunk (estimated at 4 chars/token)
# maxTokens: 8192
# Optional: prefixes for asymmetric embedding models
# queryPrefix: "search_query: "
# documentPrefix: "search_document: "
# Optional: additional ignore patterns (merged with .gitignore)
# ignore:
# - "*.generated.ts"
# - "fixtures/"
# Optional: file extension control
# extensions:
# include: [".sql", ".graphql", ".proto"]
# exclude: [".json"]
Using other providers
Install the provider package globally and set the model accordingly:
# OpenAI
npm install -g @ai-sdk/openai
# then in config: model: openai:text-embedding-3-small
# Google
npm install -g @ai-sdk/google
# then in config: model: google:text-embedding-004
Development
pnpm install
pnpm build # compile TypeScript
pnpm dev # watch mode
pnpm check # format and lint (Biome)
License
GPL-3.0
