@masuidrive/bloom-local-rag
v0.3.3
RAG (Retrieval-Augmented Generation) system for local directories - Index and search your documents with AI-powered answers
RAG (Retrieval-Augmented Generation) system for local directories. This tool enables semantic search across your local documents with AI-powered answers, without requiring a daemon process.
What is bloom-local-rag?
bloom-local-rag is a command-line tool that brings the power of RAG to your local files. It creates a vector database from your documents and uses Large Language Models (LLMs) to provide accurate, context-aware answers based on your actual content.
Key Features
- 🔍 Semantic Search: Find information based on meaning, not just keywords
- 🤖 AI-Powered Answers: Get contextual answers generated from your documents
- 📁 Multiple File Types: Supports Markdown, code files (JS/TS), YAML, and more
- 🔄 Smart Indexing: Automatically updates index when files change
- 🚀 No Daemon Required: Runs on-demand without background processes
- 💾 Efficient Storage: Uses LanceDB for fast vector operations
- 🌐 Multi-Provider Support: Works with both Google Gemini and OpenAI
Installation
No installation required! Use directly with npx:
```
npx @masuidrive/bloom-local-rag
```

Or install globally:

```
npm install -g @masuidrive/bloom-local-rag
```

Quick Start
1. Set up your API key
For Google Gemini (recommended):
```
export GOOGLE_API_KEY=your-api-key
# or
export GEMINI_API_KEY=your-api-key
```

For OpenAI:

```
export OPENAI_API_KEY=your-api-key
```

2. Initialize your directory

```
npx @masuidrive/bloom-local-rag --init
```

This creates a .bloom-local-rag directory containing the vector database.
3. Search your documents
```
npx @masuidrive/bloom-local-rag "how do I authenticate users?"

# Search a specific directory
npx @masuidrive/bloom-local-rag "how do I authenticate users?" --dir /path/to/docs

# Or use --directory
npx @masuidrive/bloom-local-rag --directory /path/to/docs "how do I authenticate users?"
```

Detailed Usage
Initialize Command
The --init option scans your directory and creates a searchable index:
```
npx @masuidrive/bloom-local-rag --init [options]
```

Options:

- `-d, --dir, --directory <path>`: Directory to initialize (default: current directory)
- `-e, --extensions <exts...>`: File extensions to index (default: .md, .mdx, .txt, .js, .ts, .jsx, .tsx, .yaml, .yml)
- `--chunk-size <size>`: Text chunk size for indexing (default: 1000)
- `--chunk-overlap <size>`: Overlap between chunks (default: 200)
- `--embedding-provider <provider>`: Choose between 'gemini' or 'openai' (default: gemini)
- `--embedding-model <model>`: Specific embedding model to use
- `--llm-provider <provider>`: LLM provider for answers (default: gemini)
- `--llm-model <model>`: Specific LLM model to use
- `--exclude <patterns...>`: Additional patterns to exclude from indexing
Example:
```
# Initialize current directory
npx @masuidrive/bloom-local-rag --init

# Initialize a specific directory
npx @masuidrive/bloom-local-rag --init --dir ./docs

# Index only Markdown and TypeScript files
npx @masuidrive/bloom-local-rag --init --extensions .md .ts

# Use OpenAI for both embeddings and answers
npx @masuidrive/bloom-local-rag --init --embedding-provider openai --llm-provider openai
```

Query Command (Default Mode)
Search your indexed documents and get AI-powered answers:
```
npx @masuidrive/bloom-local-rag "your question here" [options]
```

Options:

- `-d, --dir, --directory <path>`: Directory to search (default: current directory)
- `-l, --limit <n>`: Number of source documents to retrieve (default: 5)
- `--no-context`: Skip AI answer generation; show only source documents
- `--json`: Output results in JSON format
- `--temperature <value>`: Control creativity of AI answers (0-2, default: 0.7)
- `-v, --verbose`: Show detailed information, including sources
Examples:
```
# Simple query
npx @masuidrive/bloom-local-rag "how to handle errors in async functions"

# Get more source documents
npx @masuidrive/bloom-local-rag "database schema design" --limit 10

# Get only relevant documents, without an AI summary
npx @masuidrive/bloom-local-rag "API endpoints" --no-context

# Get JSON output for integration with other tools
npx @masuidrive/bloom-local-rag "user authentication" --json

# Search in a different directory
npx @masuidrive/bloom-local-rag "deployment process" --dir ../other-project

# The directory option can come before or after the query
npx @masuidrive/bloom-local-rag --dir ../docs "deployment process"
```

Reindex Command
Manually update the index (though this happens automatically during queries):
```
npx @masuidrive/bloom-local-rag --reindex [options]
```

Options:

- `-d, --dir, --directory <path>`: Directory to reindex (default: current directory)
- `--force`: Force reindexing of all files, ignoring the cache
- `-v, --verbose`: Show detailed information
Status Command
Check the status of your indexed documents:
```
npx @masuidrive/bloom-local-rag --status [options]
```

Options:

- `-d, --dir, --directory <path>`: Directory to check (default: current directory)
Shows:
- Configuration details
- Number of indexed files and chunks
- Last index update time
- Storage usage
How It Works
- Indexing: bloom-local-rag scans your directory and splits documents into chunks
- Embedding: Each chunk is converted to a vector embedding using AI models
- Storage: Vectors are stored in a local LanceDB database
- Search: Your query is converted to a vector and compared with stored vectors
- Context: Most relevant chunks are retrieved as context
- Answer: An LLM generates an answer based on the retrieved context
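The retrieval steps above can be sketched in a few lines of TypeScript. This is a conceptual illustration only: it stands in for the real embedding models with a toy bag-of-words vector over a fixed vocabulary, whereas bloom-local-rag uses Gemini or OpenAI embeddings and stores vectors in LanceDB.

```typescript
// Toy embedding: map text to a word-count vector over a fixed vocabulary.
// bloom-local-rag uses real model embeddings; this only shows the mechanics.
function embed(text: string, vocab: string[]): number[] {
  const words = text.toLowerCase().split(/\W+/);
  return vocab.map((v) => words.filter((w) => w === v).length);
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b) || 1);
}

// Retrieve the k chunks most similar to the query (steps 4-5 above).
function retrieve(query: string, chunks: string[], vocab: string[], k: number): string[] {
  const q = embed(query, vocab);
  return chunks
    .map((c) => ({ c, score: cosine(q, embed(c, vocab)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.c);
}

const vocab = ["auth", "users", "deploy", "database"];
const chunks = [
  "how to auth users with tokens",
  "deploy the app to production",
  "database schema for users",
];
console.log(retrieve("auth users", chunks, vocab, 1)[0]);
// → "how to auth users with tokens"
```

In the real tool, the top-scoring chunks are then passed as context to the LLM, which generates the final answer.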
File Type Support
By default, bloom-local-rag indexes:
- Documentation: `.md`, `.mdx`, `.txt`
- Code: `.js`, `.ts`, `.jsx`, `.tsx`
- Configuration: `.yaml`, `.yml`
Special handling:
- Markdown files: Frontmatter is extracted as metadata
- YAML files: Parsed for structured data
- .gitignore: Respected for file exclusion
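To illustrate the Markdown handling, here is a minimal sketch of extracting a leading `---` frontmatter block as metadata. This is not bloom-local-rag's actual parser (which may use a full YAML parser); it only handles simple `key: value` lines.

```typescript
// Split a Markdown document into frontmatter metadata and body.
// Illustrative only: real frontmatter can contain nested YAML,
// which this simple key:value splitter does not handle.
function splitFrontmatter(md: string): { meta: Record<string, string>; body: string } {
  const m = md.match(/^---\n([\s\S]*?)\n---\n?/);
  if (!m) return { meta: {}, body: md };
  const meta: Record<string, string> = {};
  for (const line of m[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, body: md.slice(m[0].length) };
}

const doc = "---\ntitle: Auth Guide\ntags: security\n---\n# Authentication\n...";
const { meta, body } = splitFrontmatter(doc);
console.log(meta.title); // → "Auth Guide"
```

Metadata extracted this way can be stored alongside each chunk, so search results can report which document and section they came from.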
Best Practices
- Choose the Right Files: Focus on documentation, well-commented code, and configuration files
- Chunk Size: Larger chunks (2000) for narrative docs, smaller (500) for code
- Exclusions: Exclude generated files, build outputs, and dependencies
- API Keys: Use environment variables, never commit keys to version control
- Regular Updates: Run queries regularly - the index updates automatically
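To see why chunk size and overlap matter, here is a minimal sketch of fixed-size chunking with overlap, using the documented defaults (1000 and 200 characters). It is an assumption-laden simplification: the tool's actual splitter may also respect sentence or paragraph boundaries, which this sketch does not.

```typescript
// Split text into fixed-size chunks with overlap. With size=1000 and
// overlap=200 (the defaults), each chunk repeats the last 200 characters
// of the previous one, so facts near a boundary stay searchable.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= overlap) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  const step = size - overlap; // advance by size minus overlap each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

const chunks = chunkText("a".repeat(2500), 1000, 200);
console.log(chunks.length); // → 3 (covering 0-1000, 800-1800, 1600-2500)
```

A larger overlap reduces the chance that a relevant sentence is cut in half at a chunk boundary, at the cost of a bigger index and more embedding calls.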
Configuration
The .bloom-local-rag/config.json file stores your settings:
```json
{
  "version": "1.0",
  "directory": "/path/to/your/project",
  "extensions": [".md", ".js", ".ts"],
  "embedding": {
    "provider": "gemini",
    "model": "text-embedding-004",
    "chunkSize": 1000,
    "chunkOverlap": 200
  },
  "llm": {
    "provider": "gemini",
    "model": "gemini-2.0-flash-exp",
    "temperature": 0.7
  }
}
```

Troubleshooting
API Key Issues
```
Error: Gemini API key not found
```

Solution: Set the appropriate environment variable:

- For Gemini: `export GOOGLE_API_KEY=your-key`
- For OpenAI: `export OPENAI_API_KEY=your-key`
Directory Not Initialized
```
Error: Directory not initialized. Run "init" command first.
```

Solution: Run `npx @masuidrive/bloom-local-rag --init` in your project directory.
No Results Found
- Check if files match the configured extensions
- Verify files aren't excluded by .gitignore
- Try broader search terms
- Increase the `--limit` parameter
Privacy & Security
- Local Processing: All data stays on your machine
- No Telemetry: We don't collect any usage data
- API Calls: Only your queries and relevant chunks are sent to the LLM provider
- Gitignore: Sensitive files excluded by default
Requirements
- Node.js v20 or higher
- API key for Google Gemini or OpenAI
License
MIT
Important Note: When modifying this README, please also update the Japanese translation in README.ja.md to maintain consistency across both versions.
