bluera-knowledge

v0.33.0

Published

a day ago

CLI tool for managing knowledge stores with semantic search

0High
0Medium
0Low

chris-bluera

knowledge semantic-search vector-database cli

🧠 Bluera Knowledge

NPM Version NPM Downloads

🚀 Build a local knowledge base for your AI coding agent—dependency source code, crawled docs, and your own files, all instantly searchable.

Two ways to use it:

| | npm Package | Claude Code Plugin | |--|-------------|-------------------| | Install | npm install -g bluera-knowledge | /plugin install bluera-knowledge@bluera | | Interface | CLI commands | Slash commands + MCP tools | | Works with | Any AI tool, any editor | Claude Code specifically | | Best for | CI/CD, automation, other editors | Native Claude Code integration |

Both provide the same core functionality: index repos, crawl docs, semantic search.

Bluera Knowledge gives AI coding agents instant local access to authoritative context:

Dependency source code — Clone and search the repos of dependencies you actually use
Documentation — Crawl, index, and search any docs site
Your files — Index and search local folders for project-specific knowledge

All searchable in milliseconds, no rate limits, fully offline.

📑 Table of Contents

📦 Installation

npm Package (CLI)

# Global install (CLI available everywhere)
npm install -g bluera-knowledge

# Or project install
npm install --save-dev bluera-knowledge

Works with any AI coding tool, editor, CI/CD pipeline, or automation.

Claude Code Plugin

# Add the Bluera marketplace (one-time setup)
/plugin marketplace add blueraai/bluera-marketplace

# Install the plugin (or use /plugin to browse the UI)
/plugin install bluera-knowledge@bluera

Adds slash commands, MCP tools, and Skills for optimal Claude Code integration.

[!NOTE] Setup runs automatically on first session start (installs dependencies and Playwright browser). If you see "Setup running in background...", wait ~30s and restart Claude Code.

Prerequisites

| Requirement | Version | Notes | |-------------|---------|-------| | Node.js | v20.x or v22.x | LTS recommended. v24+ not yet supported (native module compatibility) | | Build tools | - | make + C++ compiler (build-essential on Debian/Ubuntu, Xcode CLI on macOS) | | Python 3 | Optional | Enables code-graph features |

✨ Why Bluera Knowledge?

When your AI coding assistant needs to answer "how do I handle errors in Express middleware?", it can:

Guess from training data — might be outdated or wrong
Search the web — slow, rate-limited, often returns blog posts instead of source
Read your local knowledge base — authoritative, complete, instant ✅

Bluera Knowledge enables option 3 by building a searchable knowledge base from three types of sources:

| Source Type | What It Does | Example | |------------|--------------|---------| | 📦 Dependency Source Code | Clone & search library repos you actually use | Express, React, Lodash | | 🌐 Documentation Sites | Crawl & index any docs site | Next.js docs, FastAPI guides | | 📁 Local Files | Index project-specific content | Your docs, standards, API specs |

The result: Your AI agent has local, instant access to authoritative information with zero rate limits:

| Capability | Without | With Bluera Knowledge | |------------|---------|----------------------| | Response time | 2-5 seconds (web) | ~100ms (local) | | Accuracy | Uncertain | Authoritative (source code + docs) | | Completeness | Partial docs | Full implementation + tests + your content | | Rate limits | Yes | None |

🎯 When Claude Code Should Query BK

The simple rule: Query BK for any question about libraries, dependencies, or reference material.

BK is cheap (~100ms, no rate limits), authoritative (actual source code), and complete (includes tests and internal APIs). Claude Code should query it frequently for external code questions.

Always Query BK For:

| Question Type | Examples | |--------------|----------| | Library internals | "How does Express handle middleware errors?", "What does useEffect cleanup do?" | | API signatures | "What parameters does axios.create() accept?", "What options can I pass to Hono?" | | Error handling | "What errors can Zod throw?", "Why might this library return undefined?" | | Version behavior | "What changed in React 18?", "Is this method deprecated?" | | Configuration | "What config options exist for Vite?", "What are the defaults?" | | Testing patterns | "How do the library authors test this?", "How should I mock this?" | | Performance/internals | "Is this cached internally?", "What's the complexity?" | | Security | "How does this library validate input?", "Is this safe against injection?" | | Integration | "How do I integrate X with Y?", "What's the idiomatic way to use this?" |

DO NOT Query BK For:

| Question Type | Use Instead | |--------------|-------------| | Your project code | Grep/Read directly ("Where is OUR auth middleware?") | | General concepts | Training data ("What is a closure?") | | Breaking news | Web search ("Latest React release notes") |

Quick Pattern Matching:

"How does [library] work..."           → Query BK
"What does [library function] do..."   → Query BK
"What options does [library] accept..."→ Query BK
"What errors can [library] throw..."   → Query BK
"Where is [thing] in OUR code..."      → Grep/Read directly
"What is [general concept]..."         → Training data

🚀 Quick Start

Using Claude Code Plugin

[ ] 📦 Add a library: /bluera-knowledge:add-repo https://github.com/lodash/lodash
[ ] 📁 Index your docs: /bluera-knowledge:add-folder ./docs --name=project-docs
[ ] 🔍 Test search: /bluera-knowledge:search "deep clone object"
[ ] 📋 View stores: /bluera-knowledge:stores

[!TIP] Not sure which libraries to index? Use /bluera-knowledge:suggest to analyze your project's dependencies.

Using CLI (npm package)

# Add a library
bluera-knowledge store create lodash --type repo --source https://github.com/lodash/lodash

# Index your docs
bluera-knowledge store create project-docs --type file --source ./docs

# Test search
bluera-knowledge search "deep clone object"

# View stores
bluera-knowledge store list

✨ Features

🎯 Core Features

🔬 Smart Dependency Analysis - Automatically scans your project to identify which libraries are most heavily used by counting import statements across all source files
📊 Usage-Based Suggestions - Ranks dependencies by actual usage frequency, showing you the top 5 most-imported packages with import counts and file counts
🔍 Automatic Repository Discovery - Queries package registries (NPM, PyPI, crates.io, Go modules) to automatically find GitHub repository URLs
📦 Git Repository Indexing - Clones and indexes dependency source code for both semantic search and direct file access
📁 Local Folder Indexing - Indexes any local content - documentation, standards, reference materials, or custom content
🌐 Web Crawling - Crawl and index web pages using Node.js Playwright - convert documentation sites to searchable markdown

🔍 Search Modes

🧠 Vector Search - AI-powered semantic search with relevance ranking
📂 File Access - Direct Grep/Glob operations on cloned source files

🗺️ Code Graph Analysis

📊 Code Graph Analysis - During indexing, builds a graph of code relationships (calls, imports, extends) to provide usage context in search results - shows how many callers/callees each function has
🌐 Multi-Language Support - Full AST parsing for JavaScript, TypeScript, Python, Rust, and Go; indexes code in any language
🔌 MCP Integration - Exposes all functionality as Model Context Protocol tools for AI coding agents

🌍 Language-Specific Features

While bluera-knowledge indexes and searches code in any language, certain advanced features are language-specific:

| Language | Code Graph | Call Analysis | Import Tracking | Method Tracking | |----------|------------|---------------|-----------------|-----------------| | TypeScript/JavaScript | ✅ Full Support | ✅ Functions & Methods | ✅ Full | ✅ Class Methods | | Python | ✅ Full Support | ✅ Functions & Methods | ✅ Full | ✅ Class Methods | | Rust | ✅ Full Support | ✅ Functions & Methods | ✅ Full | ✅ Struct/Trait Methods | | Go | ✅ Full Support | ✅ Functions & Methods | ✅ Full | ✅ Struct/Interface Methods | | ZIL | ✅ Full Support | ✅ Routines | ✅ INSERT-FILE | ✅ Objects/Rooms | | Other Languages | ⚠️ Basic Support | ❌ | ❌ | ❌ |

[!NOTE] Code graph features enhance search results by showing usage context (e.g., "this function is called by 15 other functions"), but all languages benefit from vector search and full-text search capabilities.

🔌 Custom Language Support

Bluera Knowledge provides an extensible adapter system for adding full graph support to any language. The built-in ZIL adapter (for Infocom/Zork-era source code) demonstrates this capability.

What adapters provide:

Smart chunking - Split files by language constructs (functions, classes, objects)
Symbol extraction - Parse definitions with signatures and line numbers
Import tracking - Resolve include/import relationships
Call graph analysis - Track function calls with special form filtering

Built-in adapters: | Language | Extensions | Symbols | Imports | |----------|------------|---------|---------| | ZIL | .zil, .mud | ROUTINE, OBJECT, ROOM, GLOBAL, CONSTANT | INSERT-FILE |

Example - ZIL indexing:

# Index a Zork source repository
bluera-knowledge store create zork1 --type repo --source https://github.com/historicalsource/zork1

# Search for routines
bluera-knowledge search "V-LOOK routine" --stores zork1

🎯 How It Works

The plugin provides AI agents with four complementary search capabilities:

🔍 1. Semantic Vector Search

AI-powered search across all indexed content

Searches by meaning and intent, not just keywords
Uses embeddings to find conceptually similar content
Ideal for discovering patterns and related concepts

📝 2. Full-Text Search (FTS)

Fast keyword and pattern matching

Traditional text search with exact matching
Supports regex patterns and boolean operators
Best for finding specific terms or identifiers

⚡ 3. Hybrid Mode (Recommended)

Combines vector and FTS search

Merges results from both search modes with weighted ranking
Balances semantic understanding with exact matching
Provides best overall results for most queries

📂 4. Direct File Access

Traditional file operations on cloned sources

Provides file paths to cloned repositories
Enables Grep, Glob, and Read operations on source files
Supports precise pattern matching and code navigation
Full access to complete file trees

When you use /bluera-knowledge: commands, here's what happens:

You issue a command - Type /bluera-knowledge:stores or similar in Claude Code
Claude Code receives instructions - The command provides step-by-step instructions for Claude Code
Claude Code executes MCP tools - Behind the scenes, Claude Code uses mcp__bluera-knowledge__* tools
Results are formatted - Claude Code formats and displays the output directly to you

Example Flow:

You: /bluera-knowledge:stores
  ↓
Command file instructs Claude Code to use execute("stores")
  ↓
MCP tool queries LanceDB for store metadata
  ↓
Claude Code formats results as a table
  ↓
You see: Beautiful table of all your knowledge stores

This architecture means commands provide a clean user interface while MCP tools handle the backend operations.

🎨 User Interface

👤 User Commands

You manage knowledge stores through /bluera-knowledge: commands:

🔬 Analyze your project to find important dependencies
📦 Add Git repositories (dependency source code)
📁 Add local folders (documentation, standards, etc.)
🌐 Crawl web pages and documentation
🔍 Search across all indexed content
🔄 Manage and re-index stores

🤖 MCP Tools

AI agents access knowledge through Model Context Protocol (3 tools for minimal context overhead):

| Tool | Purpose | |------|---------| | search | 🔍 Semantic vector search across all stores (includes store paths for direct file access) | | get_full_context | 📖 Retrieve complete code context by result ID or file path | | execute | ⚡ Meta-tool for store/job management commands |

The execute tool consolidates store and job management into a single tool with subcommands:

Store commands: stores, store:info, store:create, store:index, store:delete, stores:pull
Job commands: jobs, job:status, job:cancel
Help: help, commands

⚙️ Background Jobs

[!TIP] Long-running operations (git clone, indexing) run in the background, allowing you to continue working while they complete.

🔄 How It Works

When you add a repository or index content:

⚡ Instant Response - Operation starts immediately and returns a job ID
🔄 Background Processing - Indexing runs in a separate process
📊 Progress Updates - Check status anytime with /bluera-knowledge:check-status
🔔 Auto-Notifications - Active jobs appear automatically in context

📝 Example Workflow

# Add a large repository (returns immediately with job ID)
/bluera-knowledge:add-repo https://github.com/facebook/react

# Output:
# ✓ Created store: react (a1b2c3d4...)
# 🔄 Indexing started in background
#    Job ID: job_abc123def456
#
# Check status with: /bluera-knowledge:check-status job_abc123def456

# Check progress anytime
/bluera-knowledge:check-status job_abc123def456

# Output:
# Job Status: job_abc123def456
# Status:   running
# Progress: ███████████░░░░░░░░░ 45%
# Message:  Indexed 562/1,247 files

# View all active jobs
/bluera-knowledge:check-status

# Cancel if needed
/bluera-knowledge:cancel job_abc123def456

🚀 Performance

Background jobs include significant performance optimizations:

⚡ Parallel Embedding - Batch processes up to 32 chunks simultaneously
📂 Parallel File I/O - Processes multiple files concurrently (configurable, default: 4)
🔓 Non-Blocking - Continue working while indexing completes
📊 Progress Tracking - Real-time updates on files processed and progress percentage
🧹 Auto-Cleanup - Completed/stale jobs are cleaned up automatically

🎯 Use Cases

Dependency Source Code

Provide AI agents with canonical dependency implementation details:

/bluera-knowledge:suggest
/bluera-knowledge:add-repo https://github.com/expressjs/express

# AI agents can now:
# - Semantic search: "middleware error handling"
# - Direct access: Grep/Glob through the cloned express repo

Project Documentation

Make project-specific documentation available:

/bluera-knowledge:add-folder ./docs --name=project-docs
/bluera-knowledge:add-folder ./architecture --name=architecture

# AI agents can search across all documentation or access specific files

Coding Standards

Provide definitive coding standards and best practices:

/bluera-knowledge:add-folder ./company-standards --name=standards
/bluera-knowledge:add-folder ./api-specs --name=api-docs

# AI agents reference actual company standards, not generic advice

Mixed Sources

Combine canonical library code with project-specific patterns:

/bluera-knowledge:add-repo https://github.com/facebook/react --name=react
/bluera-knowledge:add-folder ./docs/react-patterns --name=react-patterns

# Search across both dependency source and team patterns

🔧 Troubleshooting

Quick Diagnostics

Run /bluera-knowledge:doctor to diagnose common issues. This checks:

Build tools (make/gcc) - required for native modules
Node.js installation
Plugin dependencies (node_modules)
MCP wrapper installation
Python 3 (optional, for embeddings)
Playwright browser (optional, for web crawling)

Works even when MCP is broken (uses Bash tool directly).

Ensure the plugin is installed and enabled:

/plugin list
/plugin enable bluera-knowledge

If the plugin isn't listed, install it:

/plugin marketplace add blueraai/bluera-marketplace
/plugin install bluera-knowledge@bluera

If the MCP server shows as failed after installation:

Run /bluera-knowledge:doctor - This diagnoses the most common issues
Restart Claude Code - MCP servers require a restart to initialize
Check logs: Run /logs errors to see recent errors or /logs module bootstrap for startup logs
Check status: Run /mcp to see connection status
Reinstall: Try /plugin uninstall bluera-knowledge then /plugin install bluera-knowledge@bluera

Common MCP failure causes:

| Symptom | Cause | Fix | |---------|-------|-----| | npm ERR! ERESOLVE in logs | Peer dependency conflict | Fixed in v0.22.6+ with npm flag | | Old plugin version running | Multiple cached versions | Fixed in v0.22.7+ (version sorting) | | make not found | Missing build tools | Install build-essential (Linux) | | Compilation errors on Node 24 | V8 API changes | Use Node.js v20.x or v22.x |

If the issue persists, check that Claude Code is v2.0.65 or later (earlier versions had MCP loading bugs).

Check Playwright browser installation:

npx playwright install chromium

The plugin attempts to auto-install Playwright browsers on first use, but manual installation may be needed in some environments.

Verify store exists: /bluera-knowledge:stores
Check store is indexed: /bluera-knowledge:index <store-name>
Try broader search terms
Verify you're searching the correct store with --stores=<name>

This error occurs when a store was indexed with a different embedding model than the current configuration. As of v0.20+, stores are indexed with bge-small-en-v1.5 (previously all-MiniLM-L6-v2).

Solution: Reindex the affected store:

/bluera-knowledge:index <store-name>

Check model status for all stores:

# Via MCP execute command
execute stores:check-models

This lists all stores with their embedding model and flags any that need reindexing.

List all stores to see available names and IDs:

/bluera-knowledge:stores

Use the exact store name or ID shown in the table.

tree-sitter and lancedb require native module compilation. Node.js v24 introduced V8 API changes that break these modules.

Solution: Use Node.js v20.x or v22.x (LTS versions):

# Using nvm
nvm install 22
nvm use 22

Run /bluera-knowledge:doctor to verify your environment.

Native modules require build tools:

# Debian/Ubuntu
sudo apt install build-essential

# Fedora/RHEL
sudo dnf groupinstall "Development Tools"

Then restart Claude Code for auto-setup to complete.

Large repositories (10,000+ files) take longer to index. If indexing fails:

Check available disk space
Ensure the source repository/folder is accessible
For repo stores, verify git is installed: git --version
Check for network connectivity (for repo stores)
Check logs for errors: /logs errors or /logs search "index"

This means intelligent crawling is unavailable. The crawler will automatically use simple BFS mode instead.

To enable intelligent crawling with --crawl and --extract:

Install Claude Code: https://claude.com/code
Ensure claude command is in PATH: which claude

Simple mode still crawls effectively—it just doesn't use AI to select which pages to crawl or extract specific content.

The plugin logs all MCP server operations to .bluera/bluera-knowledge/logs/app.log (relative to project root).

View logs using the /logs command:

/logs              # Watch logs in real-time (tail -f)
/logs view 50      # View last 50 log entries
/logs errors       # Show only error-level logs
/logs module mcp-store  # Filter by module (mcp-server, mcp-store, mcp-execute, mcp-job, mcp-sync, bootstrap)
/logs search "pattern"  # Search for a pattern

Log modules:

bootstrap - MCP server startup and dependency installation
mcp-server - Core MCP server operations
mcp-store - Store create/list/delete operations
mcp-search - Search queries and results
mcp-execute - Meta-tool command execution
mcp-job - Background job status
mcp-sync - Store sync operations

Manual access:

tail -f .bluera/bluera-knowledge/logs/app.log

Logs are JSON formatted (NDJSON) and can be processed with jq for pretty-printing.

🔧 Dependencies

Required for Web Crawling:

🎭 Playwright Chromium - Required for headless browser crawling (auto-installed on first session)

Optional:

🐍 Python 3.8+ - Only needed for Python AST parsing in code-graph features (not required for web crawling)

Setup (automatic):

Setup runs automatically on first session start. This installs:

✅ MCP wrapper script
✅ Node.js dependencies (via bun/npm)
✅ Playwright Chromium browser binaries

If you see "Setup running in background...", wait ~30s and restart Claude Code.

Manual setup (if auto-setup fails):

npx playwright install chromium

[!NOTE] Web crawling uses Node.js Playwright directly—no Python dependencies required. The default mode uses headless browser for maximum compatibility with JavaScript-rendered sites. Use --fast for static sites when speed is critical.

Update Plugin:

/plugin update bluera-knowledge

🎓 Skills for Claude Code

[!NOTE] Skills are a Claude Code-specific feature. They're automatically loaded when using the plugin but aren't available when using the npm package directly.

Bluera Knowledge includes built-in Skills that teach Claude Code how to use the plugin effectively. Skills provide procedural knowledge that complements the MCP tools.

📚 Available Skills

`knowledge-search`

Teaches the two approaches for accessing dependency sources:

Vector search via MCP/slash commands for discovery
Direct Grep/Read access to cloned repos for precision

When to use: Understanding how to query indexed libraries

`when-to-query`

Decision guide for when to query BK stores vs using Grep/Read on current project.

When to use: Deciding whether a question is about libraries or your project code

`advanced-workflows`

Multi-tool orchestration patterns for complex operations.

When to use: Progressive library exploration, adding libraries, handling large results

`search-optimization`

Guide on search parameters and progressive detail strategies.

When to use: Optimizing search results, choosing the right intent and detail level

`store-lifecycle`

Best practices for creating, indexing, and managing stores.

When to use: Adding new stores, understanding when to use repo/folder/crawl

🔄 MCP + Skills Working Together

Skills teach how to use the MCP tools effectively:

MCP provides the capabilities (search, get_full_context, execute commands)
Skills provide procedural knowledge (when to use which tool, best practices, workflows)

This hybrid approach reduces unnecessary tool calls and context usage while maintaining universal MCP compatibility.

Example:

MCP tool: search(query, intent, detail, limit, stores)
Skill teaches: Which intent for your question type, when to use detail='minimal' vs 'full', how to narrow with stores

Result: Fewer tool calls, more accurate results, less context consumed.

🎯 Skill Auto-Activation

Skills can automatically suggest themselves when your prompt matches certain patterns.

Toggle via slash command:

/bluera-knowledge:skill-activation - Show current status
/bluera-knowledge:skill-activation on - Enable (default)
/bluera-knowledge:skill-activation off - Disable
/bluera-knowledge:skill-activation config - Toggle individual skills

How it works: When enabled, a UserPromptSubmit hook analyzes your prompt for patterns like:

"How does [library] work?" → suggests knowledge-search
"Should I grep or search?" → suggests when-to-query
"Too many results" → suggests search-optimization
"Multi-step workflow" → suggests advanced-workflows
"Add/delete store" → suggests store-lifecycle

Claude evaluates each suggestion and invokes relevant skills before answering. Users who already use BK terminology are excluded (they already know the tool).

Configuration stored in: .bluera/bluera-knowledge/skill-activation.json (relative to project root)

💾 Data Storage

Knowledge stores are stored in your project root:

<project-root>/.bluera/bluera-knowledge/
├── data/
│   ├── repos/<store-id>/       # Cloned Git repositories
│   ├── documents_*.lance/      # Vector indices (Lance DB)
│   └── stores.json             # Store registry
├── stores.config.json          # Store definitions (git-committable!)
└── config.json                 # Configuration

📋 Store Definitions (Team Sharing)

Store definitions are automatically saved to .bluera/bluera-knowledge/stores.config.json. This file is designed to be committed to git, allowing teams to share store configurations.

Example stores.config.json:

{
  "version": 1,
  "stores": [
    { "type": "file", "name": "my-docs", "path": "./docs" },
    { "type": "repo", "name": "react", "url": "https://github.com/facebook/react" },
    { "type": "web", "name": "api-docs", "url": "https://api.example.com/docs", "depth": 2 }
  ]
}

When a teammate clones the repo, they can run /bluera-knowledge:sync to recreate all stores locally.

🚫 Recommended `.gitignore` Patterns

When you first create a store, the plugin automatically updates your .gitignore with:

# Bluera Knowledge - data directory (not committed)
.bluera/
!.bluera/bluera-knowledge/
!.bluera/bluera-knowledge/stores.config.json

This ensures:

Vector indices and cloned repos are NOT committed (they're large and can be recreated)
Store definitions ARE committed (small JSON file for team sharing)

🔬 Technologies

🔌 Claude Code Plugin System with MCP server
✅ Runtime Validation - Zod schemas for Python-TypeScript boundary
🌳 AST Parsing - @babel/parser for JS/TS, Python AST module, tree-sitter for Rust and Go
🗺️ Code Graph - Static analysis of function calls, imports, and class relationships
🧠 Semantic Search - AI-powered vector embeddings with LanceDB
📦 Git Operations - Native git clone
💻 CLI - Commander.js
🕷️ Web Crawling - Playwright (Node.js headless browser) with Cheerio (HTML parsing)

📚 Documentation

| Document | Description | |----------|-------------| | CLI Reference | Complete CLI commands, options, and usage examples | | MCP Integration | MCP server configuration and tool documentation | | Commands Reference | All slash commands with parameters and examples | | Crawler Architecture | How the intelligent web crawler works | | Token Efficiency | How BK reduces token consumption vs web search | | CONTRIBUTING | Development setup, testing, and release process |

🤝 Contributing

Contributions welcome! See CONTRIBUTING.md for development setup, testing, and release process.

📄 License

MIT - See LICENSE for details.

🙏 Acknowledgments

This project includes software developed by third parties. See NOTICE for full attribution.

Key dependencies:

Playwright - Headless browser for web crawling (Apache-2.0)
LanceDB - Vector database (Apache-2.0)
Hugging Face Transformers - Embeddings (Apache-2.0)
Cheerio - HTML parsing (MIT)

💬 Support

🐛 Issues: GitHub Issues
📚 Documentation: Claude Code Plugins
📝 Changelog: CHANGELOG.md