grepmax
v0.16.6
Semantic code search for coding agents. Local embeddings, LLM summaries, call graph tracing.
Natural-language search that works like grep. Fast, local, and built for coding agents.
- Semantic: Finds concepts ("where do transactions get created?"), not just strings.
- Call Graph Tracing: Map dependencies with trace, find tests with test, measure blast radius with impact.
- Role Detection: Distinguishes ORCHESTRATION (high-level logic) from DEFINITION (types/classes).
- Local & Private: 100% local embeddings via ONNX (CPU) or MLX (Apple Silicon GPU).
- Centralized Index: One database at ~/.gmax/. Index once, search from anywhere.
- Agent-Ready: The --agent flag returns compact one-line output, ~90% fewer tokens than default.
Quick Start
npm install -g grepmax # 1. Install
cd my-repo && gmax add # 2. Add + index
gmax "where do we handle auth?" --agent # 3. Search
No setup required: gmax auto-detects your platform (GPU on Apple Silicon, CPU elsewhere) and downloads models on first use.
Setup & Config
gmax setup # Interactive wizard (models, embedding mode, plugins)
gmax config # View current settings
gmax config --embed-mode gpu # Switch to GPU (Apple Silicon)
gmax doctor # Health check
gmax doctor --fix # Auto-repair (compact, prune, remove stale locks)
Core Commands
gmax "where do we handle auth?" --agent # Semantic search (compact output)
gmax extract handleAuth # Full function body with line numbers
gmax peek handleAuth # Signature + callers + callees
gmax trace handleAuth -d 2 # Call graph (2-hop)
gmax skeleton src/lib/search/ # File structure (bodies collapsed)
gmax symbols auth # List indexed symbols
Analysis Commands
gmax log src/lib/auth.ts # Git commit history for a path or symbol
gmax test handleAuth # Find tests via reverse call graph
gmax impact handleAuth # Dependents + affected tests
gmax similar handleAuth # Find similar code patterns
gmax context "auth system" --budget 4000 # Token-budgeted topic summary
Project Commands
gmax project # Languages, structure, key symbols
gmax related src/lib/auth.ts # Dependencies + dependents
gmax status # All indexed projects + chunk counts
In our public benchmarks, grepmax can save about 20% of your LLM tokens and deliver a 30% speedup.
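The trace, test, and impact commands above all describe walks over a call graph: trace follows calls to a fixed depth (-d 2), while test and impact look in the reverse direction for callers and dependents. The following is a minimal sketch of that idea and not grepmax's actual implementation; the CallGraph type, the function names, and the sample symbols are all hypothetical.

```typescript
// Hypothetical in-memory call graph: caller name -> names it calls.
type CallGraph = Map<string, string[]>;

// Invert caller -> callee edges into callee -> caller edges.
function reverseGraph(graph: CallGraph): CallGraph {
  const reversed: CallGraph = new Map();
  for (const [caller, calleeList] of graph) {
    for (const callee of calleeList) {
      const callers = reversed.get(callee) ?? [];
      callers.push(caller);
      reversed.set(callee, callers);
    }
  }
  return reversed;
}

// Breadth-first walk from `start`, following edges for at most `maxDepth` hops.
// Returns each reached symbol with the hop count at which it was first seen.
function walk(graph: CallGraph, start: string, maxDepth: number): Map<string, number> {
  const depthOf = new Map<string, number>();
  const seen = new Set<string>([start]);
  let frontier = [start];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const symbol of frontier) {
      for (const neighbor of graph.get(symbol) ?? []) {
        if (seen.has(neighbor)) continue;
        seen.add(neighbor);
        depthOf.set(neighbor, depth);
        next.push(neighbor);
      }
    }
    frontier = next;
  }
  return depthOf;
}

// Sample data, invented for illustration.
const graph: CallGraph = new Map([
  ["loginRoute", ["handleAuth"]],
  ["handleAuth", ["verifyToken", "loadUser"]],
  ["verifyToken", ["decodeJwt"]],
  ["handleAuth.test", ["handleAuth"]],
]);

const callees = walk(graph, "handleAuth", 2);               // forward, 2 hops
const callers = walk(reverseGraph(graph), "handleAuth", 2); // reverse, 2 hops
```

Walking the reversed graph is what makes "find tests via reverse call graph" cheap: any test symbol that reaches the target shows up as a caller within the hop limit.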
Agent Plugins
gmax integrates with Claude Code, OpenCode, Codex, and Factory Droid. Install all detected clients at once:
gmax plugin add # Install all detected clients
gmax plugin # Show plugin status
gmax plugin remove # Remove all plugins
Or manage individually:
gmax plugin add claude # Claude Code only
gmax plugin add opencode # OpenCode only
gmax plugin add codex # Codex only
gmax plugin add droid # Factory Droid only
gmax plugin remove claude # Remove specific plugin
Plugins auto-update when you run npm install -g grepmax@latest, so there is no need to re-run gmax plugin add.
How it works per client
- Claude Code: Plugin with hooks (SessionStart, CwdChanged, SubagentStart, PreToolUse). Model uses CLI via Bash(gmax ... --agent).
- OpenCode: Tool shim with dynamic SKILL + session plugin for daemon startup. Model calls gmax tool directly.
- Codex: MCP server registration + AGENTS.md skill instructions.
- Factory Droid: Skills + SessionStart/SessionEnd hooks for daemon lifecycle.
MCP Server
gmax mcp starts a stdio-based MCP server for clients that support MCP but can't run shell commands (Cursor, Windsurf, custom agents).
| Tool | Description |
| --- | --- |
| semantic_search | Search by meaning. 16+ params: query, limit, role, language, scope (project/all), project filtering, etc. |
| code_skeleton | File structure with bodies collapsed (~4x fewer tokens). |
| trace_calls | Call graph: importers, callers (multi-hop), callees with file:line. |
| extract_symbol | Complete function/class body by symbol name. |
| peek_symbol | Compact overview: signature + callers + callees. |
| list_symbols | Indexed symbols with role and export status. |
| index_status | Index health: chunks, files, projects, watcher status. |
| summarize_project | Project overview: languages, structure, key symbols, entry points. |
| summarize_directory | Generate LLM summaries for indexed chunks. |
| related_files | Dependencies and dependents by shared symbols. |
| recent_changes | Recently modified indexed files. |
| diff_changes | Search scoped to git changes. |
| find_tests | Find tests via reverse call graph. |
| impact_analysis | Dependents + affected tests for a symbol or file. |
| find_similar | Vector similarity search. |
| build_context | Token-budgeted topic summary. |
| investigate | Agentic codebase Q&A using local LLM + gmax tools. |
| review_commit | Review a git commit for bugs, security issues, and breaking changes. |
| review_report | Get accumulated code review findings for the current project. |
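MCP clients invoke the tools in this table with JSON-RPC 2.0 requests using the tools/call method, and the stdio transport carries one JSON message per line. The sketch below builds such a request for semantic_search. The query and limit argument names come from the table above, but their exact types and the helper names (buildToolCall, frame) are assumptions; a real client must also complete the MCP initialize handshake before calling tools.

```typescript
// JSON-RPC 2.0 request shape that MCP uses for tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: build a tools/call request for a named tool.
function buildToolCall(
  id: number,
  toolName: string,
  toolArgs: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: toolArgs },
  };
}

// The stdio transport is newline-delimited: one JSON message per line.
function frame(message: ToolCallRequest): string {
  return JSON.stringify(message) + "\n";
}

// Argument types here are assumed, not taken from the gmax tool schema.
const request = buildToolCall(1, "semantic_search", {
  query: "where do we handle auth?",
  limit: 5,
});
const line = frame(request); // write this to the stdin of `gmax mcp`
```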
Search Options
gmax "query" [options]
| Flag | Description | Default |
| --- | --- | --- |
| --agent | Compact one-line output for AI agents. | false |
| -m <n> | Max results. | 5 |
| --per-file <n> | Max matches per file. | 3 |
| --role <role> | Filter: ORCHESTRATION, DEFINITION, IMPLEMENTATION. | — |
| --lang <ext> | Filter by extension (e.g. ts, py). | — |
| --file <name> | Filter by filename. | — |
| --exclude <prefix> | Exclude path prefix. | — |
| --symbol | Append call graph after results. | false |
| --imports | Prepend file imports per result. | false |
| --name <regex> | Filter by symbol name. | — |
| --skeleton | Show file skeletons for top matches. | false |
| --context-for-llm | Full function bodies + imports per result. | false |
| --budget <tokens> | Cap output tokens (for --context-for-llm). | 8000 |
| --explain | Show scoring breakdown per result. | false |
| -C <n> | Context lines before/after. | 0 |
| --root <dir> | Search a different project. | cwd |
| --min-score <n> | Minimum relevance score. | 0 |
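When an agent harness shells out to gmax, it can help to build the argument list from a typed options object instead of concatenating strings. The following is a small sketch under that assumption: the flag strings are taken from the table above, while the SearchOptions property names and the buildSearchArgs helper are hypothetical and cover only a subset of the flags.

```typescript
// Subset of the documented search flags. Property names are hypothetical;
// the flag each one maps to is noted on the right.
interface SearchOptions {
  agent?: boolean;     // --agent
  maxResults?: number; // -m <n>
  perFile?: number;    // --per-file <n>
  role?: "ORCHESTRATION" | "DEFINITION" | "IMPLEMENTATION"; // --role <role>
  lang?: string;       // --lang <ext>
  exclude?: string;    // --exclude <prefix>
  minScore?: number;   // --min-score <n>
}

// Build an argv array (query first, then flags) suitable for a process spawner.
function buildSearchArgs(query: string, opts: SearchOptions = {}): string[] {
  const args = [query];
  if (opts.agent) args.push("--agent");
  if (opts.maxResults !== undefined) args.push("-m", String(opts.maxResults));
  if (opts.perFile !== undefined) args.push("--per-file", String(opts.perFile));
  if (opts.role !== undefined) args.push("--role", opts.role);
  if (opts.lang !== undefined) args.push("--lang", opts.lang);
  if (opts.exclude !== undefined) args.push("--exclude", opts.exclude);
  if (opts.minScore !== undefined) args.push("--min-score", String(opts.minScore));
  return args;
}

const args = buildSearchArgs("where do we handle auth?", {
  agent: true,
  maxResults: 10,
  role: "ORCHESTRATION",
  lang: "ts",
});
```

Passing the array to a spawner such as Node's child_process.execFile("gmax", args) avoids shell quoting problems with queries that contain spaces or question marks.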
Background Daemon
A single daemon watches all registered projects via native OS file events (FSEvents/inotify). Changes are detected in under a second and reindexed incrementally. All writes to LanceDB are routed through the daemon via IPC, eliminating lock contention.
gmax watch --daemon -b # Start daemon manually
gmax watch stop # Stop daemon
gmax status # See all projects + watcher status
The daemon auto-starts when you run gmax add, gmax index, gmax remove, or gmax summarize. It shuts down after 30 minutes of inactivity.
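The 30-minute inactivity shutdown described above can be modeled as a timer that is reset on every request. This is only an illustrative sketch, not the daemon's real code: the IdleTimer class and its method names are hypothetical, and timestamps are passed in explicitly so the logic can be checked without waiting.

```typescript
// Hypothetical idle tracker: records the last activity time and reports
// whether the configured timeout has elapsed since then.
class IdleTimer {
  private readonly timeoutMs: number;
  private lastActivityMs: number;

  constructor(timeoutMs: number, nowMs: number) {
    this.timeoutMs = timeoutMs;
    this.lastActivityMs = nowMs;
  }

  // Call on every IPC request or file event to postpone shutdown.
  touch(nowMs: number): void {
    this.lastActivityMs = nowMs;
  }

  // True once the daemon has been idle for at least the timeout.
  isExpired(nowMs: number): boolean {
    return nowMs - this.lastActivityMs >= this.timeoutMs;
  }
}

const MINUTE_MS = 60 * 1000;
const timer = new IdleTimer(30 * MINUTE_MS, 0);
timer.touch(10 * MINUTE_MS);                         // activity at minute 10
const expiredAt35 = timer.isExpired(35 * MINUTE_MS); // only 25 idle minutes
const expiredAt40 = timer.isExpired(40 * MINUTE_MS); // 30 idle minutes
```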
Local LLM (optional)
gmax can use a local LLM (via llama-server) for agentic codebase investigation. This is entirely opt-in and disabled by default — gmax works fine without it.
gmax llm on # Enable LLM features (persists to config)
gmax llm start # Start llama-server (auto-starts daemon too)
gmax llm status # Check server status
gmax llm stop # Stop llama-server
gmax llm off # Disable LLM + stop server
Investigate
Ask questions about your codebase — the LLM autonomously uses gmax tools (search, trace, peek, impact, related) to gather evidence and synthesize an answer.
gmax investigate "how does authentication work?"
gmax investigate "what would break if I changed VectorDB?" -v
gmax investigate "where are API routes defined?" --root ~/project
Review
Automatic code review on git commits. Extracts the diff, gathers codebase context (callers, dependents, related files), and prompts the LLM for structured findings.
gmax review # Review HEAD
gmax review --commit abc1234 # Review specific commit
gmax review --commit HEAD~3 -v # Verbose — shows context gathering + LLM progress
gmax review report # Show accumulated findings
gmax review report --json # Raw JSON output
gmax review clear # Clear report
Post-commit hook
Install a git hook that automatically reviews every commit in the background via the daemon:
gmax review install # Install in current repo
gmax review install ~/other-repo # Install in another repo
The hook sends an IPC message to the daemon and returns instantly, so it never blocks git commit. Findings accumulate in the report.
LLM Configuration
| Variable | Description | Default |
| --- | --- | --- |
| GMAX_LLM_MODEL | Path to GGUF model file | (none) |
| GMAX_LLM_BINARY | llama-server binary | llama-server |
| GMAX_LLM_PORT | Server port | 8079 |
| GMAX_LLM_IDLE_TIMEOUT | Minutes before auto-stop | 30 |
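The variables in this table can be read as a set of overrides on top of the listed defaults. A minimal sketch of that resolution follows; the variable names and default values come from the table, while the LlmConfig shape, the helper names, and the handling of invalid numbers (fall back to the default) are assumptions.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical resolved configuration, one field per table row.
interface LlmConfig {
  modelPath: string | null;   // GMAX_LLM_MODEL (no default)
  binary: string;             // GMAX_LLM_BINARY
  port: number;               // GMAX_LLM_PORT
  idleTimeoutMinutes: number; // GMAX_LLM_IDLE_TIMEOUT
}

// Parse a positive integer, falling back when the value is unset or invalid.
function intOr(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

function resolveLlmConfig(env: Env): LlmConfig {
  return {
    modelPath: env.GMAX_LLM_MODEL || null,
    binary: env.GMAX_LLM_BINARY || "llama-server",
    port: intOr(env.GMAX_LLM_PORT, 8079),
    idleTimeoutMinutes: intOr(env.GMAX_LLM_IDLE_TIMEOUT, 30),
  };
}

const defaults = resolveLlmConfig({});
const custom = resolveLlmConfig({ GMAX_LLM_PORT: "9000", GMAX_LLM_IDLE_TIMEOUT: "5" });
```

In a Node program the env argument would typically be process.env.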
Architecture
All data lives in ~/.gmax/:
- lancedb/: LanceDB vector store (centralized, all projects)
- cache/meta.lmdb: file metadata cache (hashes, mtimes)
- cache/watchers.lmdb: watcher/daemon registry (LMDB, crash-safe)
- daemon.sock: Unix domain socket for daemon IPC
- daemon.pid: PID file for daemon dedup
- logs/: daemon and server logs (5MB rotation)
- config.json: global config (model tier, embed mode)
- models/: embedding models
- grammars/: Tree-sitter grammars
- projects.json: registry of indexed directories
Pipeline: Walk (gitignore-aware) → Chunk (Tree-sitter) → Embed (384-dim Granite via ONNX/MLX) → Store (LanceDB + LMDB) → Search (vector + FTS + RRF fusion + ColBERT rerank)
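The search stage combines vector and full-text (FTS) results with RRF (Reciprocal Rank Fusion) before reranking. As a rough illustration of the fusion step only, the sketch below scores each result as the sum of 1 / (k + rank) over the lists it appears in. The constant k = 60 is the value commonly used in the RRF literature and is not taken from grepmax; the function name and result IDs are hypothetical.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
// where rank is 1-based. Items missing from a list contribute nothing for it.
function rrfFuse(rankings: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Invented result lists: one from vector search, one from full-text search.
const vectorHits = ["auth.ts#handleAuth", "session.ts#createSession", "jwt.ts#verifyToken"];
const ftsHits = ["jwt.ts#verifyToken", "auth.ts#handleAuth", "routes.ts#login"];

const fused = rrfFuse([vectorHits, ftsHits]);
```

Because RRF uses only ranks, it can merge lists whose raw scores are not comparable, such as cosine similarity and a full-text relevance score.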
Supported Languages: TypeScript, JavaScript, Python, Go, Rust, Java, C#, C++, C, Ruby, PHP, Swift, Kotlin, JSON, YAML, Markdown, SQL, Shell.
Configuration
// ~/.gmax/config.json
{
  "modelTier": "small",
  "vectorDim": 384,
  "embedMode": "gpu"
}
Ignoring Files
gmax respects .gitignore and .gmaxignore:
# .gmaxignore
docs/generated/
*.test.ts
fixtures/
Environment Variables
| Variable | Description | Default |
| --- | --- | --- |
| GMAX_EMBED_MODE | Force cpu or gpu | Auto-detect |
| GMAX_WORKER_THREADS | Worker threads for embedding | 50% of cores |
| GMAX_DEBUG | Debug logging | Off |
| GMAX_SUMMARIZER | Enable summarizer auto-start (1) | Off |
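The embed mode can come from three places mentioned in this README: the GMAX_EMBED_MODE variable, the embedMode field in config.json, and platform auto-detection (GPU on Apple Silicon, CPU elsewhere). The sketch below shows one plausible way to combine them. The precedence order (environment, then config, then auto-detect) and the rounding used for "50% of cores" are assumptions, and all type and function names are hypothetical.

```typescript
type EmbedMode = "cpu" | "gpu";
type Env = Record<string, string | undefined>;

// Fields shown in the ~/.gmax/config.json example.
interface GmaxConfig {
  modelTier?: string;
  vectorDim?: number;
  embedMode?: EmbedMode;
}

function isEmbedMode(value: unknown): value is EmbedMode {
  return value === "cpu" || value === "gpu";
}

// Assumed precedence: GMAX_EMBED_MODE, then config.json, then auto-detect.
function resolveEmbedMode(env: Env, config: GmaxConfig, isAppleSilicon: boolean): EmbedMode {
  const fromEnv = env.GMAX_EMBED_MODE;
  if (isEmbedMode(fromEnv)) return fromEnv;
  const fromConfig = config.embedMode;
  if (isEmbedMode(fromConfig)) return fromConfig;
  return isAppleSilicon ? "gpu" : "cpu";
}

// GMAX_WORKER_THREADS if it is a positive integer, else half the cores
// (rounded down, minimum 1; the rounding rule is an assumption).
function resolveWorkerThreads(env: Env, coreCount: number): number {
  const parsed = Number.parseInt(env.GMAX_WORKER_THREADS ?? "", 10);
  if (Number.isInteger(parsed) && parsed > 0) return parsed;
  return Math.max(1, Math.floor(coreCount / 2));
}

const mode = resolveEmbedMode({ GMAX_EMBED_MODE: "cpu" }, { embedMode: "gpu" }, true);
const threads = resolveWorkerThreads({}, 8);
```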
Troubleshooting
gmax doctor # Check health
gmax doctor --fix # Auto-repair (compact, prune, fix locks)
gmax doctor --agent # Machine-readable health output
gmax index # Reindex (auto-detects and repairs cache/vector mismatches)
gmax index --reset # Full reindex from scratch
gmax watch stop && gmax watch --daemon -b # Restart daemon
Contributing
See CLAUDE.md for development setup, commands, and architecture details.
Attribution
grepmax is built upon the foundation of mgrep by MixedBread. See the NOTICE file for details.
License
Licensed under the Apache License, Version 2.0. See LICENSE.
