@teknologika/mcp-codebase-search
v0.1.9
Local-first semantic search system for codebases using the Model Context Protocol (MCP), with Tree-sitter parsing and LanceDB vector storage
A local-first semantic search system for codebases using the Model Context Protocol (MCP)
📋 Table of Contents
- Overview
- Features
- Installation
- Quick Start
- Usage
- Configuration
- MCP Client Configuration
- Supported Languages
- Architecture
- Troubleshooting
- Development
- Contributing
- License
Overview
The Codebase Memory MCP Server enables LLM coding assistants to reliably discover existing code in a codebase, preventing duplicate implementations and wrong-file edits. It uses local embeddings, Tree-sitter-aware chunking, and LanceDB for vector storage, ensuring all operations run locally without cloud dependencies.
Why Use This?
- Prevent Duplicate Code: AI assistants can find existing implementations before creating new ones
- Accurate Code Navigation: Semantic search understands code meaning, not just keywords
- Privacy-First: All processing happens locally—your code never leaves your machine
- Fast & Efficient: Optimized for quick search responses with intelligent caching
- Multi-Language: Support for C#, Java, JavaScript, TypeScript, and Python
- Smart Filtering: Exclude test files and library code from search results
Features
- 🔒 Local-First: All operations run locally without external API calls
- 🔍 Semantic Search: Find code by meaning, not just keywords
- 🌳 Tree-sitter Parsing: AST-aware code chunking for meaningful results
- 🤖 MCP Integration: Seamless integration with MCP-compatible AI assistants (Claude Desktop, etc.)
- 🌐 Multi-Language Support: C#, Java, JavaScript, TypeScript, Python
- 🖥️ Web Management UI: Manage indexed codebases through a web interface
- ⚡ Performance Optimized: Sub-500ms search responses with intelligent caching
- 🎯 Smart Filtering: Exclude test files and library code from results
- 📊 Detailed Statistics: Track chunk counts, file counts, and language distribution
- 🔄 Gitignore Support: Respects .gitignore patterns during ingestion
Installation
Global Installation (Recommended)
npm install -g @teknologika/mcp-codebase-search
This makes three commands available globally:
- mcp-codebase-search: MCP server for AI assistants
- mcp-codebase-ingest: CLI for indexing codebases
- mcp-codebase-manager: Web UI for management
Local Installation
npm install @teknologika/mcp-codebase-search
Then use with npx:
npx mcp-codebase-ingest --path ./my-project --name my-project
npx mcp-codebase-search
npx mcp-codebase-manager
Requirements
- Node.js: 23.0.0 or higher
- npm: 10.0.0 or higher
- Disk Space: ~500MB for embedding models (downloaded on first use)
Quick Start
1. Index Your First Codebase
mcp-codebase-ingest --path ./my-project --name my-project
Example Output:
Ingesting codebase: my-project
Path: /Users/dev/projects/my-project
Scanning directory: [████████████████████] 100% (1,234/1,234)
Parsing files: [████████████████████] 100% (1,100/1,100)
Generating embeddings: [████████████████████] 100% (5,678/5,678)
Storing chunks: [████████████████████] 100% (5,678/5,678)
✓ Ingestion completed successfully!
Statistics:
Total files scanned: 1,234
Supported files: 1,100
Unsupported files: 134
Chunks created: 5,678
Duration: 45.2s
Languages detected:
typescript: 3,200 chunks (800 files)
python: 1,500 chunks (200 files)
java: 978 chunks (100 files)
2. Configure Your MCP Client
For Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
"mcpServers": {
"codebase-search": {
"command": "mcp-codebase-search",
"args": []
}
}
}
3. Set Up Your AGENTS.md
# AGENTS.md — Codebase Dedupe Protocol
## Goal
Prevent duplicate implementations and “wrong file” edits by making **codebase-search** the *only valid source* for claims about what already exists in this repo during this session.
This project has a strict rule: **you must not create new code, new files, or new implementations unless you have first searched the codebase using the MCP tool and compared against the results.**
## Tools you MUST use for codebase discovery
Use **codebase-search** tools for discovery and evidence:
- `list_codebases`
- `search_codebases`
- `get_chunk_content`
- `get_codebase_stats`
After updates run `update_codebase_scan` to refresh the codebase search results.
These are the only approved discovery tools for “what exists already.”
## Hard rule: No creation without a Dedupe Ticket
Before you do *any* of the following, you must produce a Dedupe Ticket and run the searches it specifies:
Creation triggers include: adding a new file, adding a new module/class/function, introducing a new utility/helper, duplicating a configuration pattern, or proposing a new “approach/framework” that sounds like it could already exist.
A **Dedupe Ticket** is a short structured note you write in your response (keep it compact):
**Dedupe Ticket**
- Intent signature: `<one sentence describing exactly what you are about to add/change>`
- Queries: `<2–4 searches you will run in search_codebases>`
- Top matches: `<up to 5 result identifiers or file paths returned by the tool>`
- Decision: `reuse | extend | new`
- Rationale: `<why reuse/extend is sufficient, or why new is justified>`
You must actually call `search_codebases` before finalizing the ticket. Do not guess.
## Execution protocol
When asked to implement or change code:
1) If the request implies any creation trigger, begin by calling `search_codebases` (and `list_codebases` if you have not yet selected the codebase in this session).
2) Review results and decide `reuse | extend | new`.
3) Only then propose edits, and prefer extending existing implementations over creating new ones.
4) After making significant edits, run `update_codebase_scan` to re-index the codebase.
## What you may not do
You may not claim “there is no existing implementation” or “this doesn’t exist” unless you have run `search_codebases` in this session and the results support that claim. “I didn’t see it” is not acceptable without a tool call.
You may not create “parallel” implementations alongside existing ones unless the Dedupe Ticket explicitly justifies why reuse/extension is not viable.
## Graceful degradation
If the MCP server is unavailable or returning errors:
State **DEGRADED MODE** at the top of your reply and stop before making changes. Ask for the MCP server to be enabled/fixed, or ask for explicit user approval to proceed best-effort without search. Do not proceed silently.
## Tool intent alignment
When you need to know what exists, where it is, or how similar code is structured, you must treat `search_codebases` as authoritative. Do not infer from local context alone.
4. Start Using in Your AI Assistant
Once configured, your AI assistant can use these tools:
- list_codebases: See all indexed codebases
- search_codebases: Search for code semantically
- get_codebase_stats: View detailed statistics
- open_codebase_manager: Launch and open the Manager UI in your browser
5. (Optional) Explore the Manager UI
mcp-codebase-manager
Opens http://localhost:8008 in your default browser with a visual interface for:
- Searching codebases with filters
- Managing indexed codebases
- Viewing statistics and metadata
- Adding new codebases with real-time progress
Usage
Ingestion CLI
The mcp-codebase-ingest command indexes a codebase for semantic search.
Basic Usage
mcp-codebase-ingest --path <directory> --name <codebase-name>
Options
| Option | Description | Required | Example |
|--------|-------------|----------|---------|
| -p, --path | Path to codebase directory | Yes | --path ./my-project |
| -n, --name | Unique name for the codebase | Yes | --name my-project |
| -c, --config | Path to configuration file | No | --config ./config.json |
| --no-gitignore | Disable .gitignore filtering | No | --no-gitignore |
Examples
Index a local project:
mcp-codebase-ingest --path ~/projects/my-app --name my-app
Index with custom config:
mcp-codebase-ingest --path ./backend --name backend-api --config ./custom-config.json
Index without gitignore filtering:
mcp-codebase-ingest --path ./my-project --name my-project --no-gitignore
Re-index an existing codebase:
# Simply run the same command again - old data is automatically replaced
mcp-codebase-ingest --path ~/projects/my-app --name my-app
What Gets Indexed?
- ✅ All files with supported extensions (.cs, .java, .js, .jsx, .ts, .tsx, .py)
- ✅ Files in nested subdirectories (recursive scanning)
- ✅ Semantic code chunks (functions, classes, methods, interfaces)
- ✅ Metadata tags (test files, library files)
- ❌ Files larger than 1MB (configurable via maxFileSize)
- ❌ Files in .gitignore (by default; use --no-gitignore to include them)
- ❌ Binary files and unsupported formats
- ❌ Hidden directories (starting with .)
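The inclusion rules above can be read as a single predicate over each scanned file. The following is a minimal sketch under stated assumptions: `shouldIndex` and `CandidateFile` are hypothetical names, not the package's internals, and the .gitignore and binary-file checks are left out.

```typescript
// Hypothetical sketch of the rules above; not the package's actual implementation.
const SUPPORTED_EXTENSIONS = [".cs", ".java", ".js", ".jsx", ".ts", ".tsx", ".py"];
const DEFAULT_MAX_FILE_SIZE = 1048576; // documented maxFileSize default (1MB)

interface CandidateFile {
  relativePath: string; // relative to the codebase root, "/" separated
  sizeBytes: number;
}

function shouldIndex(file: CandidateFile, maxFileSize: number = DEFAULT_MAX_FILE_SIZE): boolean {
  const segments = file.relativePath.split("/");
  const fileName = segments[segments.length - 1];
  // Skip files that sit under a hidden directory (a segment starting with ".").
  const underHiddenDir = segments
    .slice(0, -1)
    .some((dir) => dir !== "." && dir.charAt(0) === ".");
  if (underHiddenDir) return false;
  // Skip files larger than the configured limit.
  if (file.sizeBytes > maxFileSize) return false;
  // Keep only supported source extensions.
  const dot = fileName.lastIndexOf(".");
  const extension = dot >= 0 ? fileName.slice(dot) : "";
  return SUPPORTED_EXTENSIONS.indexOf(extension) !== -1;
}

console.log(shouldIndex({ relativePath: "src/auth/authenticate.ts", sizeBytes: 2048 }));
console.log(shouldIndex({ relativePath: ".cache/tmp/util.ts", sizeBytes: 2048 }));
```

The first call passes every rule, while the second is rejected because the file lives under a hidden directory.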
MCP Server
The MCP server exposes tools for AI assistants to search and explore codebases.
Starting the Server
mcp-codebase-search
The server runs in stdio mode and communicates with MCP clients via standard input/output.
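To make "stdio mode" concrete: MCP messages are JSON-RPC 2.0, and the stdio transport carries one JSON message per line. The sketch below frames a `tools/call` request for `search_codebases`. The helper name `buildToolCall` is hypothetical, and a real session would begin with the protocol's `initialize` handshake, which MCP clients handle for you.

```typescript
// Illustrative only: frames an MCP "tools/call" request as one line of JSON-RPC 2.0.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(id: number, tool: string, args: Record<string, unknown>): string {
  const request: ToolCallRequest = {
    jsonrpc: "2.0",
    id: id,
    method: "tools/call",
    params: { name: tool, arguments: args },
  };
  // One message per line: JSON.stringify without indentation emits no raw newlines.
  return JSON.stringify(request) + "\n";
}

const searchRequestLine = buildToolCall(1, "search_codebases", {
  query: "authentication function",
  codebaseName: "my-project",
  maxResults: 25,
});
console.log(searchRequestLine);
```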
Available Tools
1. list_codebases
Lists all indexed codebases with metadata.
Input: None
Output:
{
"codebases": [
{
"name": "my-project",
"path": "/path/to/project",
"chunkCount": 5678,
"fileCount": 1100,
"lastIngestion": "2024-01-15T10:30:00Z",
"languages": ["typescript", "python", "java"]
}
]
}
2. search_codebases
Performs semantic search across indexed codebases.
Input:
{
"query": "authentication function",
"codebaseName": "my-project", // Optional
"language": "typescript", // Optional
"maxResults": 25 // Optional (default: 50)
}
Output:
{
"results": [
{
"filePath": "src/auth/authenticate.ts",
"startLine": 15,
"endLine": 45,
"language": "typescript",
"chunkType": "function",
"content": "export async function authenticate(credentials: Credentials) { ... }",
"similarityScore": 0.92,
"codebaseName": "my-project"
}
],
"totalResults": 1,
"queryTime": 45
}
3. get_codebase_stats
Retrieves detailed statistics for a specific codebase.
Input:
{
"name": "my-project"
}
Output:
{
"name": "my-project",
"path": "/path/to/project",
"chunkCount": 5678,
"fileCount": 1100,
"lastIngestion": "2024-01-15T10:30:00Z",
"languages": [
{ "language": "typescript", "fileCount": 800, "chunkCount": 3200 },
{ "language": "python", "fileCount": 200, "chunkCount": 1500 },
{ "language": "java", "fileCount": 100, "chunkCount": 978 }
],
"chunkTypes": [
{ "type": "function", "count": 2500 },
{ "type": "class", "count": 1200 },
{ "type": "method", "count": 1978 }
],
"sizeBytes": 2500000
}
4. open_codebase_manager
Opens the web-based Manager UI in the default browser. Automatically launches the server if it's not already running.
Input: None
Output:
{
"success": true,
"url": "http://localhost:8008",
"serverStarted": true,
"message": "Manager UI opened in browser. Server was started."
}
Note: The tool checks if the Manager server is running on the configured port. If not, it launches the server in the background before opening the browser.
Manager UI
The Manager UI provides a web-based interface for managing indexed codebases.
Starting the Manager
mcp-codebase-manager
This will:
- Start a Fastify server on port 8008 (configurable)
- Automatically open http://localhost:8008 in your default browser
- Display all indexed codebases with statistics
Features
Search Tab:
- Semantic search across all codebases
- Filter by codebase and max results
- Exclude test files checkbox
- Exclude library files checkbox
- Collapsible results with color-coded confidence scores:
- 🟢 Green (0.80-1.00): Excellent match
- 🟡 Yellow (0.60-0.79): Good match
- 🔵 Blue (0.00-0.59): Lower match
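The color bands above amount to a simple threshold function over `similarityScore`. This is a small sketch, assuming scores between the listed two-decimal boundaries (for example 0.795) fall into the lower band; the name `matchBand` is hypothetical.

```typescript
// Hypothetical helper mirroring the documented color bands.
type MatchBand = "excellent" | "good" | "lower";

function matchBand(similarityScore: number): MatchBand {
  if (similarityScore >= 0.8) return "excellent"; // green: 0.80-1.00
  if (similarityScore >= 0.6) return "good"; // yellow: 0.60-0.79
  return "lower"; // blue: 0.00-0.59
}

console.log(matchBand(0.92)); // excellent
console.log(matchBand(0.45)); // lower
```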
Manage Codebases Tab:
- View all indexed codebases
- See chunk counts, file counts, and last indexed date
- Add new codebases with real-time progress tracking
- Rename codebases
- Remove codebases
- Gitignore filtering checkbox (checked by default)
Manager Controls:
- Quit Manager button with confirmation dialog (stops server and closes browser tab)
Configuration
The system can be configured using a JSON configuration file. The default location is ~/.codebase-memory/config.json.
Configuration File Example
{
"lancedb": {
"persistPath": "~/.codebase-memory/lancedb"
},
"embedding": {
"modelName": "Xenova/all-MiniLM-L6-v2",
"cachePath": "~/.codebase-memory/models"
},
"server": {
"port": 8008,
"host": "localhost",
"sessionSecret": "change-me-in-production"
},
"mcp": {
"transport": "stdio"
},
"ingestion": {
"batchSize": 100,
"maxFileSize": 1048576,
"maxChunkTokens": 512,
"chunkOverlapTokens": 50
},
"search": {
"defaultMaxResults": 50,
"cacheTimeoutSeconds": 60
},
"logging": {
"level": "info"
},
"schemaVersion": "1.0.0"
}
Configuration Options
LanceDB Settings
| Option | Description | Default |
|--------|-------------|---------|
| persistPath | Directory for LanceDB storage | ~/.codebase-memory/lancedb |
Embedding Settings
| Option | Description | Default |
|--------|-------------|---------|
| modelName | Hugging Face model for embeddings | Xenova/all-MiniLM-L6-v2 |
| cachePath | Directory for model cache | ~/.codebase-memory/models |
Server Settings
| Option | Description | Default |
|--------|-------------|---------|
| port | Port for Manager UI server | 8008 |
| host | Host for Manager UI server | localhost |
| sessionSecret | Secret for session cookies | Auto-generated |
Ingestion Settings
| Option | Description | Default |
|--------|-------------|---------|
| batchSize | Chunks per batch during ingestion | 100 |
| maxFileSize | Maximum file size in bytes | 1048576 (1MB) |
| maxChunkTokens | Maximum tokens per chunk (optimized for embedding model) | 512 |
| chunkOverlapTokens | Token overlap between split chunks for context preservation | 50 |
Note: The maxChunkTokens setting is optimized for the Xenova/all-MiniLM-L6-v2 model. Adjust based on your embedding model's optimal input size.
Search Settings
| Option | Description | Default |
|--------|-------------|---------|
| defaultMaxResults | Default maximum search results | 50 |
| cacheTimeoutSeconds | Search result cache timeout | 60 |
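To illustrate what `cacheTimeoutSeconds` controls, here is a minimal time-limited cache sketch. It is an assumption about the general technique, not the package's actual cache: the class name `TtlCache` and the injectable clock are hypothetical, and a real cache key would likely combine the query with its filters.

```typescript
// Hypothetical sketch of a time-limited cache; the package's real cache may differ.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlSeconds: number, now: () => number = Date.now) {
    this.ttlMs = ttlSeconds * 1000;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop the entry and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value: value, expiresAt: this.now() + this.ttlMs });
  }
}

// Usage with a fake clock so the expiry is easy to follow.
let fakeNow = 0;
const searchCache = new TtlCache<string[]>(60, () => fakeNow);
searchCache.set("email validation function", ["src/validation/email.ts"]);
fakeNow = 30000;
console.log(searchCache.get("email validation function")); // still cached after 30s
fakeNow = 60000;
console.log(searchCache.get("email validation function")); // undefined once 60s have passed
```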
Logging Settings
| Option | Description | Default | Options |
|--------|-------------|---------|---------|
| level | Log level | info | debug, info, warn, error |
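The option tables above can be summarized as a typed configuration with defaults. The sketch below assumes user overrides are merged section by section over those defaults; that merge behavior, and the names `AppConfig` and `resolveConfig`, are assumptions for illustration. The lancedb, embedding, and mcp sections are omitted for brevity.

```typescript
// Hypothetical typing of part of the documented config; defaults copied from the tables.
interface AppConfig {
  server: { port: number; host: string };
  ingestion: { batchSize: number; maxFileSize: number; maxChunkTokens: number; chunkOverlapTokens: number };
  search: { defaultMaxResults: number; cacheTimeoutSeconds: number };
  logging: { level: "debug" | "info" | "warn" | "error" };
}

type ConfigOverrides = { [Section in keyof AppConfig]?: Partial<AppConfig[Section]> };

const DEFAULT_CONFIG: AppConfig = {
  server: { port: 8008, host: "localhost" },
  ingestion: { batchSize: 100, maxFileSize: 1048576, maxChunkTokens: 512, chunkOverlapTokens: 50 },
  search: { defaultMaxResults: 50, cacheTimeoutSeconds: 60 },
  logging: { level: "info" },
};

// Assumed behavior: each section of the user's file overrides only the keys it sets.
function resolveConfig(overrides: ConfigOverrides): AppConfig {
  return {
    server: { ...DEFAULT_CONFIG.server, ...overrides.server },
    ingestion: { ...DEFAULT_CONFIG.ingestion, ...overrides.ingestion },
    search: { ...DEFAULT_CONFIG.search, ...overrides.search },
    logging: { ...DEFAULT_CONFIG.logging, ...overrides.logging },
  };
}

const resolvedConfig = resolveConfig({ server: { port: 8009 }, logging: { level: "debug" } });
console.log(resolvedConfig.server.port, resolvedConfig.server.host, resolvedConfig.logging.level);
```

Under this assumption a config file only needs to list the values that differ from the defaults, as the port change in the Troubleshooting section does.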
Custom Configuration
To use a custom configuration file:
# For ingestion
mcp-codebase-ingest --config ./my-config.json --path ./code --name my-code
# For MCP server (via environment variable)
CONFIG_PATH=./my-config.json mcp-codebase-search
MCP Client Configuration
Using Codex CLI (Recommended)
The easiest way to configure this MCP server is using the Codex CLI:
codex mcp add codebase-search -- mcp-codebase-search
With custom environment variables:
codex mcp add codebase-search \
--env CONFIG_PATH=~/.codebase-memory/config.json \
--env LOG_LEVEL=info \
-- mcp-codebase-search
The codex mcp add command registers the server in the Codex CLI's own configuration. Other MCP clients, such as Claude Desktop or Cline, read their own configuration files and are set up manually as described below.
Manual Configuration
Claude Desktop
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
{
"mcpServers": {
"codebase-search": {
"command": "mcp-codebase-search",
"args": []
}
}
}
Other MCP Clients
For other MCP-compatible clients, use the stdio transport:
{
"mcpServers": {
"codebase-search": {
"command": "mcp-codebase-search",
"args": [],
"env": {
"CONFIG_PATH": "~/.codebase-memory/config.json",
"LOG_LEVEL": "info"
}
}
}
}
Verifying Configuration
After configuring your MCP client:
- Restart the client application
- Check that the codebase-search server appears in the MCP server list
- Try using the list_codebases tool to verify connectivity
Supported Languages
The system uses Tree-sitter for AST-aware code parsing. Currently supported languages:
| Language | Extensions | Chunk Types |
|----------|-----------|-------------|
| C# | .cs | class, method, property, interface |
| Java | .java | class, method, field, interface |
| JavaScript | .js, .jsx | function, class, method |
| TypeScript | .ts, .tsx | function, class, method, interface |
| Python | .py | function, class, method |
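The table above can also be held as data and looked up by file extension. In this sketch the language identifiers "typescript", "python", and "java" follow the example outputs in this README, while "csharp" and "javascript" are assumed spellings; `profileForFile` is a hypothetical helper.

```typescript
// Data copied from the table above; "csharp" and "javascript" are assumed identifiers.
interface LanguageProfile {
  language: string;
  extensions: string[];
  chunkTypes: string[];
}

const LANGUAGE_PROFILES: LanguageProfile[] = [
  { language: "csharp", extensions: [".cs"], chunkTypes: ["class", "method", "property", "interface"] },
  { language: "java", extensions: [".java"], chunkTypes: ["class", "method", "field", "interface"] },
  { language: "javascript", extensions: [".js", ".jsx"], chunkTypes: ["function", "class", "method"] },
  { language: "typescript", extensions: [".ts", ".tsx"], chunkTypes: ["function", "class", "method", "interface"] },
  { language: "python", extensions: [".py"], chunkTypes: ["function", "class", "method"] },
];

function profileForFile(filePath: string): LanguageProfile | undefined {
  const dot = filePath.lastIndexOf(".");
  if (dot < 0) return undefined;
  const extension = filePath.slice(dot);
  for (const profile of LANGUAGE_PROFILES) {
    if (profile.extensions.indexOf(extension) !== -1) return profile;
  }
  return undefined; // unsupported extension
}

console.log(profileForFile("src/auth/authenticate.ts"));
console.log(profileForFile("docs/guide.md"));
```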
What Gets Extracted?
For each supported language, the system extracts:
- Functions: Top-level and nested functions
- Classes: Class declarations with their context
- Methods: Class methods and instance methods
- Interfaces: Interface definitions (TypeScript, C#, Java)
- Properties: Class properties (C#)
- Fields: Class fields (Java)
File Classification
The system automatically classifies files during ingestion:
Test Files (tagged with isTestFile: true):
- Files ending in .test.ts, .spec.ts, _test.py, etc.
- Files in __tests__/, test/, tests/, spec/ directories
Library Files (tagged with isLibraryFile: true):
- Files in node_modules/, vendor/, dist/, build/, venv/, etc.
These tags enable filtering in search results.
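As a rough illustration of the classification rules, the sketch below tags a path using only the patterns named above. The real pattern list is longer (the "etc." entries are not reproduced here), and `classifyFile` is a hypothetical name.

```typescript
// Hypothetical classifier covering only the patterns named in this README.
const TEST_DIRECTORIES = ["__tests__", "test", "tests", "spec"];
const LIBRARY_DIRECTORIES = ["node_modules", "vendor", "dist", "build", "venv"];
const TEST_FILE_SUFFIX = /(\.test\.ts|\.spec\.ts|_test\.py)$/;

interface FileTags {
  isTestFile: boolean;
  isLibraryFile: boolean;
}

function classifyFile(relativePath: string): FileTags {
  const segments = relativePath.split("/");
  const directories = segments.slice(0, -1);
  const fileName = segments[segments.length - 1];
  const isTestFile =
    TEST_FILE_SUFFIX.test(fileName) ||
    directories.some((dir) => TEST_DIRECTORIES.indexOf(dir) !== -1);
  const isLibraryFile = directories.some((dir) => LIBRARY_DIRECTORIES.indexOf(dir) !== -1);
  return { isTestFile: isTestFile, isLibraryFile: isLibraryFile };
}

console.log(classifyFile("src/auth/login.spec.ts"));
console.log(classifyFile("node_modules/lodash/index.js"));
```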
Architecture
System Overview
┌─────────────────────────────────────────────────────────────┐
│ Entry Points │
├──────────────┬──────────────────┬──────────────────────────┤
│ MCP Server │ Ingestion CLI │ Manager UI │
│ (stdio) │ (command-line) │ (web interface) │
└──────┬───────┴────────┬─────────┴──────────┬───────────────┘
│ │ │
│ │ │
┌──────▼────────────────▼────────────────────▼───────────────┐
│ Core Services │
├─────────────┬──────────────┬──────────────┬────────────────┤
│ Codebase │ Search │ Ingestion │ Embedding │
│ Service │ Service │ Service │ Service │
└──────┬──────┴──────┬───────┴──────┬───────┴────────┬───────┘
│ │ │ │
│ │ │ │
┌──────▼─────────────▼──────────────▼────────────────▼───────┐
│ Storage & External │
├──────────────┬──────────────────┬─────────────────────────┤
│ LanceDB │ Tree-sitter │ Hugging Face │
│ (Vector DB) │ (Code Parsing) │ (Embeddings) │
└──────────────┴──────────────────┴─────────────────────────┘
Component Responsibilities
MCP Server (mcp-codebase-search)
- Exposes tools via Model Context Protocol
- Validates inputs and outputs
- Handles stdio communication
Ingestion CLI (mcp-codebase-ingest)
- Scans directories recursively
- Respects .gitignore patterns
- Parses code with Tree-sitter
- Classifies test and library files
- Generates embeddings
- Stores chunks in LanceDB
Manager UI (mcp-codebase-manager)
- Fastify web server with SSR
- Real-time ingestion progress via SSE
- Search interface with filters
- Codebase management
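For readers unfamiliar with SSE, each progress update travels as a small text frame over a long-lived HTTP response. The sketch below shows the standard frame layout; the event name "progress" and the payload fields are hypothetical and chosen to match the phases in the ingestion output, not taken from the package.

```typescript
// Hypothetical progress payload; the field and event names are illustrative.
interface IngestionProgress {
  phase: "scanning" | "parsing" | "embedding" | "storing";
  completed: number;
  total: number;
}

function formatSseEvent(eventName: string, payload: IngestionProgress): string {
  // An SSE event is a block of "field: value" lines terminated by a blank line.
  return "event: " + eventName + "\n" + "data: " + JSON.stringify(payload) + "\n\n";
}

const sseFrame = formatSseEvent("progress", { phase: "embedding", completed: 2839, total: 5678 });
console.log(sseFrame);
```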
Core Services
- Codebase Service: CRUD operations for codebases
- Search Service: Semantic search with filtering and caching
- Ingestion Service: Orchestrates indexing pipeline
- Embedding Service: Generates vector embeddings locally
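The Search Service's role can be sketched as "filter, score, sort, truncate". In the real system LanceDB performs the vector search; the brute-force cosine similarity below is a stand-in for illustration, and all names (`IndexedChunk`, `rankChunks`, the option fields) are hypothetical.

```typescript
// Stand-in for the vector search: brute-force cosine similarity over in-memory chunks.
interface IndexedChunk {
  filePath: string;
  vector: number[];
  isTestFile: boolean;
  isLibraryFile: boolean;
}

interface RankOptions {
  excludeTests: boolean;
  excludeLibraries: boolean;
  maxResults: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

function rankChunks(queryVector: number[], chunks: IndexedChunk[], options: RankOptions) {
  return chunks
    .filter((chunk) => !(options.excludeTests && chunk.isTestFile))
    .filter((chunk) => !(options.excludeLibraries && chunk.isLibraryFile))
    .map((chunk) => ({
      filePath: chunk.filePath,
      similarityScore: cosineSimilarity(queryVector, chunk.vector),
    }))
    .sort((left, right) => right.similarityScore - left.similarityScore)
    .slice(0, options.maxResults);
}

const demoChunks: IndexedChunk[] = [
  { filePath: "src/auth.ts", vector: [1, 0], isTestFile: false, isLibraryFile: false },
  { filePath: "src/auth.test.ts", vector: [1, 0], isTestFile: true, isLibraryFile: false },
  { filePath: "src/billing.ts", vector: [0, 1], isTestFile: false, isLibraryFile: false },
];
console.log(rankChunks([1, 0], demoChunks, { excludeTests: true, excludeLibraries: true, maxResults: 10 }));
```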
Data Flow
Ingestion Flow
Source Code → File Scanner → Tree-sitter Parser → Semantic Chunks
↓
Token Counter
↓
Split Oversized Chunks
↓
File Classifier
↓
LanceDB ← Embeddings ← Embedding Service ← Tagged Chunks
Chunking Strategy: The system uses a hybrid approach optimized for the Xenova/all-MiniLM-L6-v2 model:
- AST-Based Extraction: Tree-sitter extracts semantic units (functions, classes, methods)
- Token-Aware Splitting: Large chunks exceeding 512 tokens are intelligently split:
- Splits on line boundaries (preferred)
- Falls back to sentence boundaries
- Maintains 50-token overlap for context
- Preserves metadata (file path, language, line numbers)
This ensures optimal embedding quality while maintaining semantic coherence.
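The token-aware splitting step can be sketched as follows. This is a simplified illustration, not the package's implementation: it counts whitespace-separated words as a stand-in for the embedding model's tokenizer, splits only on line boundaries, and keeps a single over-long line whole rather than falling back to sentence boundaries. The names `countTokens` and `splitByLines` are hypothetical.

```typescript
// Approximate token count: whitespace-separated words (the real tokenizer differs).
function countTokens(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Split an oversized chunk on line boundaries, carrying trailing lines forward as overlap.
function splitByLines(content: string, maxTokens: number, overlapTokens: number): string[] {
  if (countTokens(content) <= maxTokens) return [content];
  const pieces: string[] = [];
  let current: string[] = [];
  let currentTokens = 0;
  for (const line of content.split("\n")) {
    const lineTokens = countTokens(line);
    if (current.length > 0 && currentTokens + lineTokens > maxTokens) {
      pieces.push(current.join("\n"));
      // Seed the next piece with trailing lines worth at most overlapTokens tokens.
      let carried: string[] = [];
      let carriedTokens = 0;
      for (let i = current.length - 1; i >= 0; i--) {
        const tokens = countTokens(current[i]);
        if (carriedTokens + tokens > overlapTokens) break;
        carried.unshift(current[i]);
        carriedTokens += tokens;
      }
      // Drop the overlap if it would push the new piece over the limit.
      if (carriedTokens + lineTokens > maxTokens) {
        carried = [];
        carriedTokens = 0;
      }
      current = carried;
      currentTokens = carriedTokens;
    }
    current.push(line);
    currentTokens += lineTokens;
  }
  if (current.length > 0) pieces.push(current.join("\n"));
  return pieces;
}

const demoSource = ["const a = 1;", "const b = 2;", "const c = 3;", "const d = 4;"].join("\n");
console.log(splitByLines(demoSource, 8, 4));
```

With the documented defaults the call would use `maxTokens = 512` and `overlapTokens = 50`; the small numbers here just make the overlap visible.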
Search Flow
Query → Embedding Service → Vector
↓
LanceDB Search
↓
Apply Filters (tests, libraries)
↓
Ranked Results → Format → Response
Storage Schema
LanceDB Tables:
- Table naming: codebase_{name}_{schemaVersion}
- Example: codebase_my-project_1_0_0
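Judging from the example, the dots in the schema version are replaced with underscores when the table name is built. A one-function sketch of that naming rule, with the hypothetical name `tableNameFor` and no sanitization of the codebase name:

```typescript
// Builds a table name following the documented pattern codebase_{name}_{schemaVersion}.
function tableNameFor(codebaseName: string, schemaVersion: string): string {
  // Inferred from the example: "1.0.0" becomes "1_0_0".
  const versionSuffix = schemaVersion.split(".").join("_");
  return "codebase_" + codebaseName + "_" + versionSuffix;
}

console.log(tableNameFor("my-project", "1.0.0")); // codebase_my-project_1_0_0
```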
Row Structure:
{
"id": "my-project_2024-01-15T10:30:00Z_0",
"vector": [0.1, 0.2, ...],
"content": "export async function authenticate(...) { ... }",
"filePath": "src/auth.ts",
"startLine": 15,
"endLine": 45,
"language": "typescript",
"chunkType": "function",
"isTestFile": false,
"isLibraryFile": false,
"ingestionTimestamp": "2024-01-15T10:30:00Z",
"_codebaseName": "my-project",
"_path": "/path/to/project",
"_lastIngestion": "2024-01-15T10:30:00Z"
}
Troubleshooting
Common Issues
1. "Command not found: mcp-codebase-search"
Problem: The package is not installed globally or not in PATH.
Solution:
# Reinstall globally
npm install -g @teknologika/mcp-codebase-search
# Or use npx
npx mcp-codebase-search
2. "Failed to initialize LanceDB"
Problem: LanceDB persistence directory is not writable or corrupted.
Solution:
# Check permissions
ls -la ~/.codebase-memory/lancedb
# Reset LanceDB (WARNING: deletes all data)
rm -rf ~/.codebase-memory/lancedb
# Re-ingest codebases
mcp-codebase-ingest --path ./my-project --name my-project
3. "Embedding model download failed"
Problem: Network issues or insufficient disk space.
Solution:
# Check disk space
df -h ~/.codebase-memory
# Clear model cache and retry
rm -rf ~/.codebase-memory/models
# Run ingestion again (will re-download)
mcp-codebase-ingest --path ./my-project --name my-project
4. "Search returns no results"
Problem: Codebase not indexed or query too specific.
Solution:
# Verify codebase is indexed
mcp-codebase-manager
# Check the UI for your codebase
# Try broader queries
# Instead of: "validateEmailAddress"
# Try: "email validation function"
5. "Manager UI won't open"
Problem: Port 8008 is already in use.
Solution:
# Check what's using port 8008
lsof -i :8008
# Kill the process or use a different port
# Edit ~/.codebase-memory/config.json
{
"server": {
"port": 8009
}
}
6. "MCP client can't connect to server"
Problem: Configuration issue or server not starting.
Solution:
# Test server manually
mcp-codebase-search
# Verify configuration path
cat ~/Library/Application\ Support/Claude/claude_desktop_config.json
# Check logs for errors
Performance Tips
- Increase batch size for faster ingestion (if you have sufficient RAM):
  { "ingestion": { "batchSize": 200 } }
- Adjust cache timeout for frequently repeated queries:
  { "search": { "cacheTimeoutSeconds": 120 } }
- Use SSD storage for the LanceDB persistence directory
- Exclude unnecessary files using .gitignore patterns
Development
Setup
# Clone the repository
git clone https://github.com/teknologika/mcp-codebase-search.git
cd mcp-codebase-search
# Install dependencies
npm install
# Build the project
npm run build
Scripts
# Build TypeScript
npm run build
# Run tests
npm test
# Run tests in watch mode
npm run test:watch
# Run tests with coverage
npm run test:coverage
# Lint code
npm run lint
# Fix linting issues
npm run lint:fix
# Security audit
npm run security
# Clean build artifacts
npm run clean
# Type check without building
npm run typecheck
Project Structure
src/
├── bin/ # Entry points (mcp-server, ingest, manager)
├── domains/ # Domain-specific business logic
│ ├── codebase/ # Codebase CRUD operations
│ ├── search/ # Semantic search functionality
│ ├── ingestion/ # File scanning and indexing
│ ├── embedding/ # Embedding generation
│ └── parsing/ # Tree-sitter code parsing
├── infrastructure/ # External integrations
│ ├── lancedb/ # LanceDB client wrapper
│ ├── mcp/ # MCP server implementation
│ └── fastify/ # Fastify server and routes
├── shared/ # Shared utilities
│ ├── config/ # Configuration management
│ ├── logging/ # Structured logging with Pino
│ ├── types/ # Shared TypeScript types
│ └── utils/ # Utility functions
└── ui/ # Web interface
└── manager/ # Single-page management UI
Testing
The project uses Vitest for testing with both unit tests and property-based tests.
Test Coverage Requirements:
- Minimum 80% statement coverage
- Minimum 80% branch coverage
- 90%+ coverage for critical paths
Run specific tests:
# Test a specific file
npm test -- src/domains/search/search.service.test.ts
# Test with coverage
npm run test:coverage
# Watch mode for TDD
npm run test:watch
Building and Packaging
# Clean and build
npm run clean && npm run build
# Create npm package
npm pack
# Install the packed tarball globally for testing (npm pack names it after the current version)
npm install -g ./teknologika-mcp-codebase-search-<version>.tgz
# Test commands
mcp-codebase-search --version
mcp-codebase-ingest --help
mcp-codebase-manager
Contributing
We welcome contributions! Here's how you can help:
Reporting Issues
- Search existing issues to avoid duplicates
- Provide details:
- Node.js version
- Operating system
- Steps to reproduce
- Expected vs actual behavior
- Error messages and logs
Submitting Pull Requests
- Fork the repository
- Create a feature branch: git checkout -b feature/my-feature
- Make your changes:
- Follow existing code style
- Add tests for new functionality
- Update documentation
- Run tests: npm test
- Run linter: npm run lint
- Commit with clear messages: git commit -m "feat: add new feature"
- Push to your fork: git push origin feature/my-feature
- Open a pull request
Code Style
- TypeScript: Strict mode enabled
- Formatting: Follow existing patterns
- Naming: Use descriptive names (camelCase for variables, PascalCase for classes)
- Comments: Document complex logic and public APIs
- Tests: Write both unit tests and property-based tests
Commit Messages
Follow Conventional Commits:
- feat: New feature
- fix: Bug fix
- docs: Documentation changes
- test: Test changes
- refactor: Code refactoring
- perf: Performance improvements
- chore: Build/tooling changes
Areas for Contribution
- 🌐 Language support: Add more Tree-sitter grammars
- ⚡ Performance: Optimize search and ingestion
- 🎨 UI improvements: Enhance the Manager UI
- 📚 Documentation: Improve guides and examples
- 🧪 Testing: Increase test coverage
- 🐛 Bug fixes: Fix reported issues
- 🔍 Search improvements: Better ranking algorithms
- 🏷️ File classification: More patterns for test/library detection
Security
Local-First Architecture
- ✅ No external API calls: All processing happens locally
- ✅ No telemetry: No usage data is collected or transmitted
- ✅ No cloud dependencies: Embeddings generated locally with Hugging Face Transformers
File System Security
- Path validation: All file paths are validated to prevent directory traversal
- Permission checks: Respects file system permissions
- Gitignore support: Automatically skips files matched by .gitignore
Input Validation
- Schema validation: All inputs validated with Zod schemas
- Type checking: Strict TypeScript types throughout
- Sanitization: User inputs sanitized before processing
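The package validates inputs with Zod schemas. To keep the example dependency-free, the sketch below hand-rolls an equivalent check for the `search_codebases` input documented earlier; it is an illustration of the idea, not the package's schema, and the names `SearchInput` and `validateSearchInput` are hypothetical.

```typescript
// Hand-written stand-in for a schema check on the search_codebases input.
interface SearchInput {
  query: string;
  codebaseName?: string;
  language?: string;
  maxResults?: number;
}

type ValidationResult =
  | { ok: true; value: SearchInput }
  | { ok: false; errors: string[] };

function validateSearchInput(input: unknown): ValidationResult {
  if (typeof input !== "object" || input === null) {
    return { ok: false, errors: ["input must be an object"] };
  }
  const raw = input as Record<string, unknown>;
  const errors: string[] = [];
  const query = raw["query"];
  if (typeof query !== "string" || query.trim() === "") {
    errors.push("query must be a non-empty string");
  }
  for (const key of ["codebaseName", "language"]) {
    if (raw[key] !== undefined && typeof raw[key] !== "string") {
      errors.push(key + " must be a string when provided");
    }
  }
  const maxResults = raw["maxResults"];
  if (
    maxResults !== undefined &&
    (typeof maxResults !== "number" || !Number.isInteger(maxResults) || maxResults < 1)
  ) {
    errors.push("maxResults must be a positive integer when provided");
  }
  if (errors.length > 0) return { ok: false, errors: errors };
  return { ok: true, value: raw as unknown as SearchInput };
}

console.log(validateSearchInput({ query: "authentication function", maxResults: 25 }));
console.log(validateSearchInput({ query: "", maxResults: 0 }));
```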
Resource Limits
- Max file size: 1MB default (configurable)
- Max results: 200 maximum per search
- Batch size limits: Prevents memory exhaustion
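One way to picture the result limit: a requested `maxResults` falls back to the documented default of 50 when absent and is capped at the documented ceiling of 200. Whether the server clamps or rejects out-of-range values is not stated here, so the clamping below is an assumption, as is the name `effectiveMaxResults`.

```typescript
const DEFAULT_MAX_RESULTS = 50; // documented search.defaultMaxResults
const HARD_MAX_RESULTS = 200; // documented per-search ceiling

// Assumed behavior: missing or invalid values use the default; large values are capped.
function effectiveMaxResults(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested) || requested < 1) {
    return DEFAULT_MAX_RESULTS;
  }
  return Math.min(Math.floor(requested), HARD_MAX_RESULTS);
}

console.log(effectiveMaxResults()); // 50
console.log(effectiveMaxResults(25)); // 25
console.log(effectiveMaxResults(1000)); // 200
```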
Network Security
- Localhost only: Manager UI binds to localhost by default
- Security headers: Helmet.js for HTTP security headers
- Session management: Secure session cookies
Recommendations
- Do not expose Manager UI to public networks
- Keep the package updated for security patches
- Run regular security audits: npm audit
- Use strong file system permissions
- Back up data regularly before major updates
License
MIT License - see LICENSE file for details.
Author
Teknologika
Acknowledgments
- Model Context Protocol - MCP specification
- LanceDB - Vector database
- Tree-sitter - Code parsing
- Hugging Face - Embedding models
- Fastify - Web framework
Questions or Issues? Open an issue on GitHub
Need Help? Check the Troubleshooting section above
