# memory-search
Semantic search CLI for your personal knowledge base. Hybrid BM25 + vector search with cross-encoder reranking over markdown files, notes, session logs, and docs.
## Features
- Hybrid search — BM25 keyword matching + vector cosine similarity with RRF fusion
- Cross-encoder reranking — bge-reranker-base scores query-document pairs for better relevance
- Chunked indexing — markdown-aware splitting with heading context preserved (see the chunking sketch after this list)
- Contextual retrieval — optional LLM-generated context prefixes per chunk for better search (Groq, Cloudflare AI, or any OpenAI-compatible endpoint)
- Query expansion — optional LLM-powered query rewriting + HyDE for better recall
- Collections — organize sources into named groups for filtered search
- Facts store — key-value pairs for hard facts (preferences, configs, decisions)
- Context builder — generate injectable context blocks for LLM prompts
- Date/path filters — `--after 7d`, `--before 2025-06-01`, `--path src/`
- Export/import — backup and restore your entire database
- SQLite + sqlite-vec — single-file database, no external vector DB needed
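
To make the "heading context preserved" idea concrete, here is a minimal sketch of heading-aware chunking. It is not the package's actual chunker; the `maxChars` threshold and function names are assumptions for illustration.

```ts
interface Chunk {
  headingPath: string[]; // e.g. ["Architecture", "Query pipeline"]
  text: string;
}

// Split markdown at headings, carrying the heading trail into each chunk
// so a search hit always knows which section it came from.
export function chunkMarkdown(markdown: string, maxChars = 1200): Chunk[] {
  const chunks: Chunk[] = [];
  const trail: string[] = [];
  let buf: string[] = [];

  const flush = () => {
    const text = buf.join("\n").trim();
    if (text) chunks.push({ headingPath: [...trail], text });
    buf = [];
  };

  for (const line of markdown.split("\n")) {
    const m = /^(#{1,6})\s+(.+)$/.exec(line);
    if (m) {
      flush();                       // close the chunk under the previous heading
      trail.splice(m[1].length - 1); // drop headings at this level and deeper
      trail.push(m[2].trim());
    } else {
      buf.push(line);
      if (buf.join("\n").length > maxChars) flush(); // split oversize sections
    }
  }
  flush();
  return chunks;
}
```
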
## Quick start
```sh
git clone https://github.com/Finesssee/memory-search.git
cd memory-search
pnpm install && pnpm build

# Deploy the embedding worker (requires a Cloudflare account)
cd workers/embed-api
cp wrangler.toml.example wrangler.toml
# Edit wrangler.toml — add your Cloudflare account_id
pnpm install && npx wrangler deploy

# Configure: create ~/.memory-search/config.json
# { "sources": ["/path/to/your/notes"], "embeddingEndpoint": "https://your-worker.workers.dev/embedding" }

# Index and search
memory index
memory search "how does authentication work"
```
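
For reference, the config from the quick start as a standalone file. The paths and worker URL are placeholders, and only the two fields shown above appear here; see the Configuration doc for the full field list.

```json
{
  "sources": [
    "/path/to/your/notes",
    "/path/to/session-logs"
  ],
  "embeddingEndpoint": "https://your-worker.workers.dev/embedding"
}
```
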
## Usage

```sh
# Search
memory search "deploy steps" --compact      # JSON for LLM consumption
memory search "pricing" --explain           # Score breakdown per result
memory search "hooks" --collection skills   # Filter by collection
memory search "config" --expand             # LLM query expansion
memory search "bug" --after 7d              # Last 7 days only
memory search "auth" --path src/            # Filter by file path
```

```sh
# Index
memory index                  # Index new/changed files
memory index --force          # Re-embed everything
memory index --prune          # Remove deleted files
memory index --contextualize  # Add LLM context prefixes
```

```sh
# Facts
memory facts set "project.stack" "TypeScript"  # Store a fact
memory facts get "project.*"                   # Query facts
```
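
A sketch of how a pattern like `project.*` can be matched against fact keys. This is illustrative only; the CLI's actual matching rules may differ.

```ts
// Translate a simple glob pattern into a RegExp; "*" matches any run
// of characters, everything else is literal.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`);
}

const facts = new Map<string, string>([
  ["project.stack", "TypeScript"],
  ["project.db", "SQLite + sqlite-vec"],
  ["editor.theme", "dark"],
]);

const re = globToRegExp("project.*");
for (const [key, value] of facts) {
  if (re.test(key)) console.log(`${key} = ${value}`);
}
// project.stack = TypeScript
// project.db = SQLite + sqlite-vec
```
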

```sh
# Context (for agents)
memory context build "deploy" --tokens 1000  # Build injectable context block
```
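
One plausible way a token-budgeted context block gets assembled: greedily pack the highest-ranked chunks until the budget is spent. A minimal sketch, assuming ranked search hits and a rough 4-characters-per-token estimate; not the package's actual builder.

```ts
interface Hit {
  source: string; // file path
  text: string;   // chunk text
}

export function buildContext(hits: Hit[], tokenBudget: number): string {
  const approxTokens = (s: string) => Math.ceil(s.length / 4); // crude estimate
  const parts: string[] = [];
  let used = 0;

  // hits arrive best-first, so stopping at the first overflow keeps
  // the most relevant material inside the budget.
  for (const hit of hits) {
    const block = `<!-- ${hit.source} -->\n${hit.text}`;
    const cost = approxTokens(block);
    if (used + cost > tokenBudget) break;
    parts.push(block);
    used += cost;
  }
  return parts.join("\n\n");
}
```
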

```sh
# Other
memory status                 # Index stats
memory doctor                 # Diagnose connectivity
memory export -o backup.json  # Backup
memory import backup.json     # Restore
```

## Docs
- Configuration — config fields, env vars, collections, path contexts
- Embedding API — endpoint contracts, Cloudflare deploy, local server setup
- Agent Integration — Claude Code skill, key commands, facts, privacy tags
- Best Practices — chunking, indexing, searching, reranking tips
- Architecture — pipeline diagram, design decisions, references
- Benchmarking — eval framework, metrics, custom benchmarks
## Architecture

```
Index:  File → Chunker → Contextualizer (optional LLM) → BGE embed → SQLite
Query:  Query → Expander (optional) → BGE embed → BM25 + Vector (parallel)
        → RRF fusion → Cross-encoder reranker → Results
```
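
Reciprocal rank fusion merges the BM25 and vector rankings by summing `1 / (k + rank)` per document across both lists; `k = 60` is the common default from the RRF literature, though the constant used by this package is an assumption here. A minimal sketch:

```ts
// Fuse ranked result lists (best-first, identified by chunk id) with RRF.
export function rrfFuse(lists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, rank) => {
      // rank is 0-based, so rank + 1 gives the 1-based position.
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// rrfFuse([bm25Ids, vectorIds]) → one ranking; documents that place well
// in both lists rise to the top even when neither ranker is confident.
```
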
## Providers

- Cloudflare Workers AI (included) — embeddings, reranking, and chat via the included worker (`workers/embed-api/`). Free tier eligible.
- Groq — fast LLM inference for contextual retrieval (`--contextualize`). Configure as a `contextLlmEndpoints` slot.
- Any OpenAI-compatible endpoint — the contextualizer and query expander work with any chat completions API.
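
To illustrate the "any chat completions API" point, a contextualizer call can be as plain as the following. The endpoint URL, model name, environment variable, and prompt are placeholders; the package's real request shape may differ.

```ts
// Ask an OpenAI-compatible endpoint for a one-sentence context prefix
// situating a chunk within its source document (contextual retrieval).
async function contextualize(docTitle: string, chunk: string): Promise<string> {
  const res = await fetch("https://api.example.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.CONTEXT_LLM_API_KEY}`,
    },
    body: JSON.stringify({
      model: "placeholder-model",
      messages: [
        {
          role: "user",
          content:
            `Document: ${docTitle}\n\nChunk:\n${chunk}\n\n` +
            "Write one sentence situating this chunk within the document.",
        },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content.trim();
}
```
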
See docs/architecture.md for details and design decisions.
## License
MIT
