@masuidrive/bloom-local-rag
v0.3.3
RAG (Retrieval-Augmented Generation) system for local directories - Index and search your documents with AI-powered answers
RAG (Retrieval-Augmented Generation) system for local directories. This tool enables semantic search across your local documents with AI-powered answers, without requiring a daemon process.
What is bloom-local-rag?
bloom-local-rag is a command-line tool that brings the power of RAG to your local files. It creates a vector database from your documents and uses Large Language Models (LLMs) to provide accurate, context-aware answers based on your actual content.
Key Features
- 🔍 Semantic Search: Find information based on meaning, not just keywords
- 🤖 AI-Powered Answers: Get contextual answers generated from your documents
- 📁 Multiple File Types: Supports Markdown, code files (JS/TS), YAML, and more
- 🔄 Smart Indexing: Automatically updates index when files change
- 🚀 No Daemon Required: Runs on-demand without background processes
- 💾 Efficient Storage: Uses LanceDB for fast vector operations
- 🌐 Multi-Provider Support: Works with both Google Gemini and OpenAI
Installation
No installation required! Use directly with npx:
```
npx @masuidrive/bloom-local-rag
```

Or install globally:

```
npm install -g @masuidrive/bloom-local-rag
```

Quick Start
1. Set up your API key
For Google Gemini (recommended):
```
export GOOGLE_API_KEY=your-api-key
# or
export GEMINI_API_KEY=your-api-key
```

For OpenAI:

```
export OPENAI_API_KEY=your-api-key
```

2. Initialize your directory

```
npx @masuidrive/bloom-local-rag --init
```

This creates a .bloom-local-rag directory containing the vector database.
3. Search your documents
```
npx @masuidrive/bloom-local-rag "how do I authenticate users?"

# Search a specific directory
npx @masuidrive/bloom-local-rag "how do I authenticate users?" --dir /path/to/docs

# Or use --directory
npx @masuidrive/bloom-local-rag --directory /path/to/docs "how do I authenticate users?"
```

Detailed Usage
Initialize Command
The --init option scans your directory and creates a searchable index:
```
npx @masuidrive/bloom-local-rag --init [options]
```

Options:

- `-d, --dir, --directory <path>`: Directory to initialize (default: current directory)
- `-e, --extensions <exts...>`: File extensions to index (default: .md, .mdx, .txt, .js, .ts, .jsx, .tsx, .yaml, .yml)
- `--chunk-size <size>`: Text chunk size for indexing (default: 1000)
- `--chunk-overlap <size>`: Overlap between chunks (default: 200)
- `--embedding-provider <provider>`: Choose between 'gemini' or 'openai' (default: gemini)
- `--embedding-model <model>`: Specific embedding model to use
- `--llm-provider <provider>`: LLM provider for answers (default: gemini)
- `--llm-model <model>`: Specific LLM model to use
- `--exclude <patterns...>`: Additional patterns to exclude from indexing
Example:
```
# Initialize current directory
npx @masuidrive/bloom-local-rag --init

# Initialize a specific directory
npx @masuidrive/bloom-local-rag --init --dir ./docs

# Index only Markdown and TypeScript files
npx @masuidrive/bloom-local-rag --init --extensions .md .ts

# Use OpenAI for both embeddings and answers
npx @masuidrive/bloom-local-rag --init --embedding-provider openai --llm-provider openai
```

Query Command (Default Mode)
Search your indexed documents and get AI-powered answers:
```
npx @masuidrive/bloom-local-rag "your question here" [options]
```

Options:

- `-d, --dir, --directory <path>`: Directory to search (default: current directory)
- `-l, --limit <n>`: Number of source documents to retrieve (default: 5)
- `--no-context`: Skip AI answer generation; show only source documents
- `--json`: Output results in JSON format
- `--temperature <value>`: Control creativity of AI answers (0-2, default: 0.7)
- `-v, --verbose`: Show detailed information, including sources
Examples:
```
# Simple query
npx @masuidrive/bloom-local-rag "how to handle errors in async functions"

# Get more source documents
npx @masuidrive/bloom-local-rag "database schema design" --limit 10

# Get only relevant documents, without an AI summary
npx @masuidrive/bloom-local-rag "API endpoints" --no-context

# Get JSON output for integration with other tools
npx @masuidrive/bloom-local-rag "user authentication" --json

# Search in a different directory
npx @masuidrive/bloom-local-rag "deployment process" --dir ../other-project

# The directory option can come before or after the query
npx @masuidrive/bloom-local-rag --dir ../docs "deployment process"
```

Reindex Command
Manually update the index (though this happens automatically during queries):
```
npx @masuidrive/bloom-local-rag --reindex [options]
```

Options:

- `-d, --dir, --directory <path>`: Directory to reindex (default: current directory)
- `--force`: Force reindexing of all files, ignoring the cache
- `-v, --verbose`: Show detailed information
Status Command
Check the status of your indexed documents:
```
npx @masuidrive/bloom-local-rag --status [options]
```

Options:

- `-d, --dir, --directory <path>`: Directory to check (default: current directory)
Shows:
- Configuration details
- Number of indexed files and chunks
- Last index update time
- Storage usage
How It Works
- Indexing: bloom-local-rag scans your directory and splits documents into chunks
- Embedding: Each chunk is converted to a vector embedding using AI models
- Storage: Vectors are stored in a local LanceDB database
- Search: Your query is converted to a vector and compared with stored vectors
- Context: Most relevant chunks are retrieved as context
- Answer: An LLM generates an answer based on the retrieved context
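The retrieval steps above can be sketched in a few lines of TypeScript. This is a conceptual illustration only: it stands in for the real embedding models with a toy bag-of-words vector over a fixed vocabulary, whereas bloom-local-rag uses Gemini or OpenAI embeddings and stores vectors in LanceDB.

```typescript
// Toy embedding: map text to a word-count vector over a fixed vocabulary.
// bloom-local-rag uses real model embeddings; this only shows the mechanics.
function embed(text: string, vocab: string[]): number[] {
  const words = text.toLowerCase().split(/\W+/);
  return vocab.map((v) => words.filter((w) => w === v).length);
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b) || 1);
}

// Retrieve the k chunks most similar to the query (steps 4-5 above).
function retrieve(query: string, chunks: string[], vocab: string[], k: number): string[] {
  const q = embed(query, vocab);
  return chunks
    .map((c) => ({ c, score: cosine(q, embed(c, vocab)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.c);
}

const vocab = ["auth", "users", "deploy", "database"];
const chunks = [
  "how to auth users with tokens",
  "deploy the app to production",
  "database schema for users",
];
console.log(retrieve("auth users", chunks, vocab, 1)[0]);
// → "how to auth users with tokens"
```

In the real tool, the top-scoring chunks are then passed as context to the LLM, which generates the final answer.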
File Type Support
By default, bloom-local-rag indexes:
- Documentation: `.md`, `.mdx`, `.txt`
- Code: `.js`, `.ts`, `.jsx`, `.tsx`
- Configuration: `.yaml`, `.yml`
Special handling:
- Markdown files: Frontmatter is extracted as metadata
- YAML files: Parsed for structured data
- .gitignore: Respected for file exclusion
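To illustrate the Markdown handling, here is a minimal sketch of extracting a leading `---` frontmatter block as metadata. This is not bloom-local-rag's actual parser (which may use a full YAML parser); it only handles simple `key: value` lines.

```typescript
// Split a Markdown document into frontmatter metadata and body.
// Illustrative only: real frontmatter can contain nested YAML,
// which this simple key:value splitter does not handle.
function splitFrontmatter(md: string): { meta: Record<string, string>; body: string } {
  const m = md.match(/^---\n([\s\S]*?)\n---\n?/);
  if (!m) return { meta: {}, body: md };
  const meta: Record<string, string> = {};
  for (const line of m[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, body: md.slice(m[0].length) };
}

const doc = "---\ntitle: Auth Guide\ntags: security\n---\n# Authentication\n...";
const { meta, body } = splitFrontmatter(doc);
console.log(meta.title); // → "Auth Guide"
```

Metadata extracted this way can be stored alongside each chunk, so search results can report which document and section they came from.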
Best Practices
- Choose the Right Files: Focus on documentation, well-commented code, and configuration files
- Chunk Size: Larger chunks (2000) for narrative docs, smaller (500) for code
- Exclusions: Exclude generated files, build outputs, and dependencies
- API Keys: Use environment variables, never commit keys to version control
- Regular Updates: Run queries regularly - the index updates automatically
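To see why chunk size and overlap matter, here is a minimal sketch of fixed-size chunking with overlap, using the documented defaults (1000 and 200 characters). It is an assumption-laden simplification: the tool's actual splitter may also respect sentence or paragraph boundaries, which this sketch does not.

```typescript
// Split text into fixed-size chunks with overlap. With size=1000 and
// overlap=200 (the defaults), each chunk repeats the last 200 characters
// of the previous one, so facts near a boundary stay searchable.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= overlap) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  const step = size - overlap; // advance by size minus overlap each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

const chunks = chunkText("a".repeat(2500), 1000, 200);
console.log(chunks.length); // → 3 (covering 0-1000, 800-1800, 1600-2500)
```

A larger overlap reduces the chance that a relevant sentence is cut in half at a chunk boundary, at the cost of a bigger index and more embedding calls.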
Configuration
The .bloom-local-rag/config.json file stores your settings:
```json
{
  "version": "1.0",
  "directory": "/path/to/your/project",
  "extensions": [".md", ".js", ".ts"],
  "embedding": {
    "provider": "gemini",
    "model": "text-embedding-004",
    "chunkSize": 1000,
    "chunkOverlap": 200
  },
  "llm": {
    "provider": "gemini",
    "model": "gemini-2.0-flash-exp",
    "temperature": 0.7
  }
}
```

Troubleshooting
API Key Issues
```
Error: Gemini API key not found
```

Solution: Set the appropriate environment variable:

- For Gemini: `export GOOGLE_API_KEY=your-key`
- For OpenAI: `export OPENAI_API_KEY=your-key`
Directory Not Initialized
```
Error: Directory not initialized. Run "init" command first.
```

Solution: Run `npx @masuidrive/bloom-local-rag --init` in your project directory.
No Results Found
- Check if files match the configured extensions
- Verify files aren't excluded by .gitignore
- Try broader search terms
- Increase the `--limit` parameter
Privacy & Security
- Local Processing: All data stays on your machine
- No Telemetry: We don't collect any usage data
- API Calls: Only your queries and relevant chunks are sent to the LLM provider
- Gitignore: Sensitive files excluded by default
Requirements
- Node.js v20 or higher
- API key for Google Gemini or OpenAI
License
MIT
Important Note: When modifying this README, please also update the Japanese translation in README.ja.md to maintain consistency across both versions.
