Saga MCP Server
Saga is a TypeScript-based Model Context Protocol (MCP) server for local-first document management and semantic search using embeddings. It ships with LanceDB vector storage, web crawling, and optional LLM integration.
Installation
Local Development
You can install from npm or clone and link locally:
```bash
# Install from npm
npm install -g @maxinedotdev/saga

# Or clone and build
git clone https://github.com/maxinedotdev/saga.git
cd saga
npm install
npm run build

# Link globally so it's available in other MCP consumers
npm link
```
After linking, the `saga` command will be available globally across all VSCode windows.
Direct Path Method (Alternative)
If you prefer not to use npm link, you can reference the server directly in your MCP configuration:
```json
{
  "mcpServers": {
    "saga": {
      "command": "node",
      "args": ["/full/path/to/saga/dist/server.js"],
      "env": {
        "MCP_BASE_DIR": "~/.saga",
        "MCP_EMBEDDING_PROVIDER": "openai",
        "MCP_EMBEDDING_BASE_URL": "http://localhost:1234",
        "MCP_EMBEDDING_MODEL": "llama-nemotron-embed-1b-v2"
      }
    }
  }
}
```
Via npm
```bash
npm install -g @maxinedotdev/saga
```
Quick Start
Configure an MCP Client
Add to your MCP client configuration (e.g., Claude Desktop):
```json
{
  "mcpServers": {
    "saga": {
      "command": "saga",
      "env": {
        "MCP_BASE_DIR": "~/.saga",
        "MCP_EMBEDDING_PROVIDER": "openai",
        "MCP_EMBEDDING_BASE_URL": "http://localhost:1234",
        "MCP_EMBEDDING_MODEL": "llama-nemotron-embed-1b-v2"
      }
    }
  }
}
```
Note: If you didn't run `npm link` during installation, use the direct path method shown in the Installation section above.
Basic Usage
- Add documents: Use the `add_document` tool, or place `.txt`/`.md` files in the uploads folder and call `process_uploads`
- Search: Use `query` for semantic document discovery
- Analyze: Use `search_documents_with_ai` for LLM-powered analysis (requires LLM configuration)
Features
- Semantic Search: Vector-based search with LanceDB and HNSW indexing
- Two-Stage Retrieval: Optional cross-encoder reranking for improved result quality
- Query-First Discovery: Find relevant documents quickly with hybrid ranking (vector + keyword fallback)
- Web Crawling: Crawl public documentation with `crawl_documentation`
- LLM Integration: Optional AI-powered analysis via OpenAI-compatible providers (LM Studio, synthetic.new)
- Performance: LRU caching, parallel processing, streaming file reads
- Local-First: All data stored in `~/.saga/` - no external services required
Reranking
Saga supports optional two-stage retrieval that improves search result quality by combining vector search with cross-encoder reranking:
- Stage 1 - Vector Search: Retrieve a larger pool of candidate results (5x the requested limit)
- Stage 2 - Reranking: Use a cross-encoder model to re-rank candidates based on semantic similarity to the query
This approach provides more accurate results, especially for:
- Multilingual queries (Norwegian, English, and mixed-language content)
- Code snippet searches
- Complex technical queries
Note: Reranking is enabled by default but can be disabled via configuration or per-query. The feature gracefully degrades to vector-only search if the reranking service is unavailable.
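For illustration, the two-stage flow above can be sketched as follows. `vectorSearch` is a hypothetical stand-in for Saga's internal LanceDB query, and the request/response shapes follow Cohere's public `/v1/rerank` API rather than Saga's own code:

```typescript
// Sketch of two-stage retrieval: over-fetch candidates from vector search, then re-rank them
// with a cross-encoder. Not Saga's internals; the rerank call mirrors Cohere's /v1/rerank API.
interface Candidate {
  text: string;
  documentId: string;
  score: number;
}

async function searchWithReranking(
  query: string,
  vectorSearch: (q: string, k: number) => Promise<Candidate[]>,
  limit = 10,
): Promise<Candidate[]> {
  // Stage 1: retrieve a larger candidate pool (5x the requested limit)
  const candidates = await vectorSearch(query, limit * 5);

  try {
    // Stage 2: cross-encoder reranking
    const res = await fetch("https://api.cohere.ai/v1/rerank", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.MCP_RERANKING_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "rerank-multilingual-v3.0",
        query,
        documents: candidates.map((c) => c.text),
        top_n: limit,
      }),
      signal: AbortSignal.timeout(Number(process.env.MCP_RERANKING_TIMEOUT ?? 30000)),
    });
    if (!res.ok) throw new Error(`Rerank request failed: ${res.status}`);
    const { results } = (await res.json()) as {
      results: { index: number; relevance_score: number }[];
    };
    return results.map((r) => ({ ...candidates[r.index], score: r.relevance_score }));
  } catch {
    // Graceful degradation: fall back to vector-only ordering
    return candidates.slice(0, limit);
  }
}
```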
Database v1.0.0
Saga now uses a redesigned v1.0.0 database schema with significant improvements in performance, scalability, and data integrity.
Key Improvements
| Area | Improvement | Benefit |
|------|-------------|---------|
| Schema | Flattened metadata, normalized tables | Type safety, better queries |
| Indexes | Dynamic IVF_PQ, scalar indexes | Fast queries, scalable |
| Storage | Single source of truth (LanceDB only) | No duplication, consistency |
| Memory | Optional LRU caches | Scalable, configurable |
| Migration | Migrationless (manual reset) | Clear state, no legacy coupling |
| Performance | <100ms query latency | Better UX |
Quick Start
New Installation
For new installations, the v1.0.0 schema is initialized automatically:
```bash
# The database will be initialized on first run
saga
```
Legacy Data (No Migration)
Saga v1 is migrationless. If you have legacy data, discard it and re-ingest. There is no backward compatibility, and the server will prompt you to manually delete the database when it detects a schema mismatch.
```bash
rm -rf ~/.saga/lancedb
```
Performance Targets
| Metric | Target |
|--------|--------|
| Vector search (top-10) | < 100ms |
| Scalar filter (document_id) | < 10ms |
| Tag filter query | < 50ms |
| Keyword search | < 75ms |
| Combined query | < 150ms |
Storage Layout
```text
~/.saga/lancedb/
├── documents.lance/          # Document metadata
├── document_tags.lance/      # Tag relationships
├── document_languages.lance/ # Language relationships
├── chunks.lance/             # Text chunks with embeddings
├── code_blocks.lance/        # Code blocks with embeddings
├── keywords.lance/           # Keyword inverted index
└── schema_version.lance/     # Schema tracking
```
Documentation
- Schema Reference - Complete schema documentation
- API Reference - LanceDBV1 API documentation
- Design Document - Detailed design rationale
Database Management
Check Database Status
```bash
# View database statistics
tsx scripts/benchmark-db.ts
```
Initialize Fresh Database
```bash
# Initialize a new v1.0.0 database
npm run db:init
```
Drop Database
```bash
# Remove all database data
npm run db:drop
```
Troubleshooting
Schema/Initialization Issues
Symptom: Startup error mentions schema mismatch or missing tables.
Solutions:
- Stop the server
- Delete the database directory: `rm -rf ~/.saga/lancedb`
- Restart the server and re-ingest documents
Performance Issues
Symptom: Slow queries
Solutions:
- Check database metrics: `tsx scripts/benchmark-db.ts`
- Reduce result limit for faster queries
- Monitor with `npm run db:benchmark`
Symptom: High memory usage
Solutions:
- Use pagination for large result sets
- Reduce batch size for inserts
- Close database connections when done
Available Tools
Document Management
- `add_document` - Add a document with title, content, and metadata
- `list_documents` - List documents with pagination
- `get_document` - Retrieve full document by ID
- `delete_document` - Remove a document and its chunks
- `query` - Query-first document discovery with semantic ranking
File Processing
- `process_uploads` - Convert files in uploads folder to documents
- `get_uploads_path` - Get the absolute uploads folder path
- `list_uploads_files` - List files in uploads folder
Search & Analysis
- `search_documents` - Search chunks within a specific document
- `search_documents_with_ai` - LLM-powered analysis (requires provider config)
- `search_code_blocks` - Semantic code block search across documents
- `get_code_blocks` - Return grouped code block variants for a document
- `get_context_window` - Get neighboring chunks for context
- `crawl_documentation` - Crawl public docs from a seed URL
- `delete_crawl_session` - Remove all documents from a crawl session
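If you are writing your own MCP client rather than using Claude Desktop or Kilo, the tools above can be called with the official TypeScript SDK. This is a rough sketch: the argument names (`title`, `content`, `query`) are inferred from the tool descriptions above, so verify them against the schemas reported by `listTools()` before relying on them.

```typescript
// Rough sketch of calling Saga's tools from a custom MCP client using the official SDK.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function main(): Promise<void> {
  // Assumes `saga` is on PATH (npm install -g / npm link) and env vars are set in the shell
  const transport = new StdioClientTransport({ command: "saga" });
  const client = new Client({ name: "saga-example", version: "0.0.1" });
  await client.connect(transport);

  // Inspect the advertised tools and their input schemas
  const { tools } = await client.listTools();
  console.log(tools.map((t) => t.name));

  // Add a document, then run query-first discovery against it
  await client.callTool({
    name: "add_document",
    arguments: { title: "Release notes", content: "Saga 1.1.0 adds reranking." },
  });
  const hits = await client.callTool({
    name: "query",
    arguments: { query: "what changed in 1.1.0?" },
  });
  console.log(JSON.stringify(hits, null, 2));

  await client.close();
}

main().catch(console.error);
```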
Configuration
Configure via environment variables or TOML.
Saga will read a TOML config file from ~/.saga/saga.toml by default. Override the path with MCP_CONFIG_TOML or --config /path/to/saga.toml. Environment variables already set take precedence over TOML.
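Conceptually, the precedence rule behaves like the following sketch (not Saga's actual loader): values from the `[env]` table only fill in variables that are not already set in the process environment.

```typescript
// Conceptual sketch of the precedence rule (not Saga's actual loader).
function applyTomlEnv(tomlEnv: Record<string, string>): void {
  for (const [key, value] of Object.entries(tomlEnv)) {
    if (process.env[key] === undefined) {
      process.env[key] = value; // TOML fills gaps; real environment variables win
    }
  }
}

// Example: MCP_HTTP_PORT=9090 exported in the shell overrides port 8080 from saga.toml
applyTomlEnv({ MCP_BASE_DIR: "~/.saga", MCP_HTTP_PORT: "8080" });
```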
Example TOML:
transport = "stdio"
[http]
host = "127.0.0.1"
port = 8080
endpoint = "/mcp"
public = false
stateless = false
[env]
MCP_BASE_DIR = "~/.saga"
MCP_EMBEDDING_PROVIDER = "openai"
MCP_EMBEDDING_BASE_URL = "http://localhost:1234"
MCP_EMBEDDING_MODEL = "llama-nemotron-embed-1b-v2"Single-instance mode (recommended for multiple MCP clients)
Run one Saga process as a background service and point MCP clients at the same HTTP endpoint. This is supported across all three platforms: macOS (launchd), Linux (systemd user service), and Windows (Windows service).
- Create `~/.saga/saga.toml`:
```toml
[server]
transport = "httpStream"
base_dir = "~/.saga"

[server.http]
host = "127.0.0.1"
port = 8080
endpoint = "/mcp"
public = false
stateless = false

[env]
MCP_EMBEDDING_PROVIDER = "openai"
MCP_EMBEDDING_BASE_URL = "http://127.0.0.1:1234/v1"
MCP_EMBEDDING_MODEL = "llama-nemotron-embed-1b-v2"
MCP_AI_PROVIDER = "openai"
MCP_AI_BASE_URL = "http://127.0.0.1:1234/v1"
MCP_AI_MODEL = "ministral-3-8b-instruct-2512"
```
- Install a background service (choose your OS):
macOS (launchd):
```bash
npm run service:install:mac
```
Linux (systemd user service):
```bash
npm run service:install:linux
```
Windows (PowerShell + Windows service, run terminal as Administrator):
```bash
npm run service:install:windows
```
By default, service scripts run dist/server.js from the current Saga checkout. Override the runtime path when needed:
```bash
SAGA_RUNTIME_DIR=~/Documents/git/saga-staging npm run service:install:mac
SAGA_RUNTIME_DIR=~/Documents/git/saga-staging npm run service:install:linux
```
Windows override example:
```powershell
powershell -ExecutionPolicy Bypass -File scripts/install-windows-service.ps1 -RuntimeDir C:\path\to\saga-staging
```
- Check service status:
macOS:
```bash
npm run service:status:mac
```
Linux:
```bash
npm run service:status:linux
```
Windows:
```bash
npm run service:status:windows
```
- Set MCP clients to URL mode: `http://127.0.0.1:8080/mcp`

This avoids one-process-per-client stdio spawning and keeps Saga as a single managed process.
Example MCP client configs:
Kilo (mcp_settings.json) Saga entry:
```json
{
  "saga": {
    "type": "streamable-http",
    "url": "http://127.0.0.1:8080/mcp",
    "timeout": 600,
    "disabled": false
  }
}
```
Codex (~/.codex/config.toml) Saga entry:
```toml
[mcp_servers.saga]
enabled = true
url = "http://127.0.0.1:8080/mcp"
```
Single-instance diagnostics:
- If you see high CPU from many `Code Helper (Plugin)` processes, check parent commands. Non-Saga MCP servers (for example `synthetic-search`) can still spawn per client/session.
- Confirm Saga is single-instance:
  - macOS: `launchctl list | rg saga-mcp`
  - Linux: `systemctl --user status dev.maxinedot.saga-mcp.service`
  - Windows: `Get-Service SagaMcpService`
- Trace process parents: `ps -axo pid,ppid,pcpu,command | rg -i "dist/server.js|synthetic-search|Code Helper"`
Uninstall service helpers:
```bash
npm run service:uninstall:mac
npm run service:uninstall:linux
npm run service:uninstall:windows
```
Environment variables:
| Variable | Description | Default |
|----------|-------------|---------|
| MCP_CONFIG_TOML | Path to Saga TOML config | ~/.saga/saga.toml |
| MCP_BASE_DIR | Data storage directory | ~/.saga |
| MCP_TRANSPORT | Transport: stdio or httpStream | stdio |
| MCP_HTTP_HOST | Host for HTTP Stream transport | 127.0.0.1 |
| MCP_HTTP_PORT | Port for HTTP Stream transport | 8080 |
| MCP_HTTP_ENDPOINT | Endpoint for HTTP Stream transport | /mcp |
| MCP_HTTP_PUBLIC | Bind HTTP Stream to 0.0.0.0 when true | false |
| MCP_HTTP_STATELESS | Enable stateless HTTP Stream mode | false |
| MCP_EMBEDDING_PROVIDER | openai (OpenAI-compatible API only) | openai |
| MCP_EMBEDDING_MODEL | Embedding model name | llama-nemotron-embed-1b-v2 |
| MCP_EMBEDDING_BASE_URL | OpenAI-compatible base URL (required) | - |
| MCP_AI_BASE_URL | LLM provider URL (LM Studio/synthetic.new) | - |
| MCP_AI_MODEL | LLM model name | Provider default |
| MCP_AI_API_KEY | API key for remote providers | - |
| MCP_TAG_GENERATION_ENABLED | Auto-generate tags with AI | false |
| MCP_SIMILARITY_THRESHOLD | Min similarity score (0.0-1.0) | 0.5 |
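To illustrate what `MCP_SIMILARITY_THRESHOLD` controls, here is a generic cosine-similarity filter. It is not Saga's internal scoring code, just the usual interpretation of a 0.0-1.0 minimum score: chunks scoring below the threshold are dropped from the result set.

```typescript
// Generic cosine-similarity filter illustrating a minimum similarity threshold.
interface Chunk {
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function filterByThreshold(queryEmbedding: number[], chunks: Chunk[]): Chunk[] {
  const threshold = Number(process.env.MCP_SIMILARITY_THRESHOLD ?? 0.5);
  return chunks.filter((c) => cosineSimilarity(queryEmbedding, c.embedding) >= threshold);
}
```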
Reranking Configuration
Reranking improves search result quality by using cross-encoder models to re-rank vector search results.
| Variable | Description | Default |
|----------|-------------|---------|
| MCP_RERANKING_ENABLED | Enable/disable reranking feature | true |
| MCP_RERANKING_PROVIDER | Reranking provider: cohere, jina, openai, custom | cohere |
| MCP_RERANKING_BASE_URL | Base URL for custom provider | (provider default) |
| MCP_RERANKING_API_KEY | API key for reranking provider | - |
| MCP_RERANKING_MODEL | Reranking model name | (provider default) |
| MCP_RERANKING_CANDIDATES | Max candidates to retrieve for reranking | 50 |
| MCP_RERANKING_TOP_K | Number of results to return after reranking | 10 |
| MCP_RERANKING_TIMEOUT | Reranking API timeout (ms) | 30000 |
Provider-Specific Defaults:
- Cohere: `https://api.cohere.ai/v1`, model: `rerank-multilingual-v3.0`
- Jina AI: `https://api.jina.ai/v1`, model: `jina-reranker-v1-base-en`
- OpenAI: `https://api.openai.com/v1`, model: `gpt-4o-mini`
Example Configurations:
```bash
# Cohere (recommended for multilingual)
MCP_RERANKING_ENABLED=true
MCP_RERANKING_PROVIDER=cohere
MCP_RERANKING_API_KEY=your-cohere-api-key

# Jina AI
MCP_RERANKING_ENABLED=true
MCP_RERANKING_PROVIDER=jina
MCP_RERANKING_API_KEY=your-jina-api-key

# Custom endpoint
MCP_RERANKING_ENABLED=true
MCP_RERANKING_PROVIDER=custom
MCP_RERANKING_BASE_URL=https://your-reranker.example.com/v1
MCP_RERANKING_API_KEY=your-api-key
MCP_RERANKING_MODEL=your-model-name
```
Request Timeouts
The server supports configurable HTTP request timeouts to handle slow or unresponsive providers. All timeout values are in milliseconds.
Timeout Hierarchy (from highest to lowest priority):
- Operation-specific timeout (e.g., `MCP_AI_SEARCH_TIMEOUT_MS`)
- Global timeout (`MCP_REQUEST_TIMEOUT_MS`)
- Default (30000ms = 30 seconds)
| Variable | Description | Default |
|----------|-------------|---------|
| MCP_REQUEST_TIMEOUT_MS | Global timeout for all HTTP requests | 30000 |
| MCP_AI_SEARCH_TIMEOUT_MS | Timeout for AI search requests (search_documents_with_ai) | Global timeout |
| MCP_EMBEDDING_TIMEOUT_MS | Timeout for embedding generation requests | Global timeout |
Timeout Error Behavior:
When a request exceeds its timeout, a RequestTimeoutError is thrown with details:
- Error message includes the timeout duration and URL
- The `isTimeout` property is set to `true` for programmatic detection
- Provider health tracking marks the failure and may trigger fallback to other providers (in multi-provider mode)
Example Configurations:
```bash
# Fast local setup (15 second global timeout)
MCP_REQUEST_TIMEOUT_MS=15000

# Slow remote APIs (60 second global timeout)
MCP_REQUEST_TIMEOUT_MS=60000

# Different timeouts per operation
MCP_REQUEST_TIMEOUT_MS=30000      # 30s default
MCP_AI_SEARCH_TIMEOUT_MS=120000   # 2 min for AI search (slow LLMs)
MCP_EMBEDDING_TIMEOUT_MS=45000    # 45s for embeddings
```
Validation:
- Values must be positive integers (e.g., `30000`, not `30s`)
- Non-numeric, zero, or negative values are rejected with a warning
- Invalid values fall back to the next level in the hierarchy
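Putting the hierarchy and validation rules together, the effective timeout for an operation can be thought of as the following sketch (illustrative only; the variable names match the table above, the code does not come from Saga):

```typescript
// Illustrative resolution of the timeout hierarchy with validation and fallback.
function parseTimeout(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    console.warn(`Ignoring invalid timeout value "${raw}"`);
    return undefined; // fall back to the next level in the hierarchy
  }
  return value;
}

function resolveTimeout(operationVar?: string): number {
  return (
    (operationVar ? parseTimeout(process.env[operationVar]) : undefined) ?? // operation-specific
    parseTimeout(process.env.MCP_REQUEST_TIMEOUT_MS) ??                     // global
    30000                                                                   // built-in default
  );
}

// AI search requests: MCP_AI_SEARCH_TIMEOUT_MS -> MCP_REQUEST_TIMEOUT_MS -> 30000ms
const aiSearchTimeoutMs = resolveTimeout("MCP_AI_SEARCH_TIMEOUT_MS");
console.log(aiSearchTimeoutMs);
```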
LLM Provider Examples
LM Studio (local):
```bash
MCP_AI_BASE_URL=http://localhost:1234
MCP_AI_MODEL=ministral-3-8b-instruct-2512
```
synthetic.new (remote):
```bash
MCP_AI_BASE_URL=https://api.synthetic.new/openai/v1
MCP_AI_API_KEY=your-api-key
```
Troubleshooting
MCP Server Keeps Restarting
Symptom: VS Code shows MCP server continuously restarting
Common causes:
- LanceDB data corruption in `~/.saga/lancedb/`
- Embedding provider not running (e.g., LM Studio on port 1234)
- Missing or incorrect environment variables
Solutions:
- Clear LanceDB data: `rm -rf ~/.saga/lancedb/`
- Verify embedding endpoint:
  ```bash
  curl http://localhost:1234/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{"input": ["test"], "model": "llama-nemotron-embed-1b-v2"}'
  ```
- Check VS Code MCP logs: Open the Output panel and inspect the `saga` MCP server log
- Restart VS Code after applying fixes
LM Studio "Unexpected endpoint or method" Errors
Symptom: LM Studio logs show repeated errors like:
```text
Unexpected endpoint or method. (HEAD /). Returning 200 anyway
```
Cause: LM Studio is configured to use HTTP transport for the Saga MCP server, but Saga uses stdio transport by default. LM Studio attempts to ping an HTTP endpoint that doesn't exist.
Solutions:
- Configure LM Studio to use stdio transport: Ensure your LM Studio MCP configuration uses `command` and `args` instead of an HTTP URL
- Example correct configuration:
  ```json
  {
    "mcpServers": {
      "saga": {
        "command": "node",
        "args": ["/path/to/saga/dist/server.js"],
        "env": {
          "MCP_BASE_DIR": "~/.saga",
          "MCP_EMBEDDING_PROVIDER": "openai",
          "MCP_EMBEDDING_BASE_URL": "http://localhost:1234",
          "MCP_EMBEDDING_MODEL": "llama-nemotron-embed-1b-v2"
        }
      }
    }
  }
  ```
- Note: These errors are harmless and don't affect server functionality, but fixing the configuration will clean up the logs
Vector Index Creation Errors
Symptom: Logs show warnings about vector index creation:
```text
Failed to create vector index on chunks: Not enough rows to train PQ. Requires 256 rows but only 33 available
```
Cause: LanceDB's IVF_PQ indexing requires at least 256 vectors for Product Quantization training. Small datasets don't have enough data.
Solutions:
- No action needed: The server gracefully handles this by skipping index creation for small datasets
- Use HNSW indexing: Set `MCP_USE_HNSW=true` (default) - HNSW works with any dataset size
- Add more documents: When your dataset grows beyond 256 vectors, indexes will be created automatically
- Note: Brute force search is efficient for small datasets (< 1000 vectors), so missing indexes won't impact performance
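The warning is expected behaviour for small tables. As a rough sketch with the `@lancedb/lancedb` client (the `chunks` table name comes from the storage layout above; the `embedding` column name and index parameters are assumptions), index creation can simply be skipped until enough rows exist:

```typescript
// Hedged sketch: skip IVF_PQ creation until the table has enough rows to train PQ.
import * as lancedb from "@lancedb/lancedb";
import os from "node:os";
import path from "node:path";

async function maybeCreateIndex(): Promise<void> {
  const db = await lancedb.connect(path.join(os.homedir(), ".saga", "lancedb"));
  const chunks = await db.openTable("chunks");

  const rowCount = await chunks.countRows();
  if (rowCount >= 256) {
    await chunks.createIndex("embedding", {
      config: lancedb.Index.ivfPq({ numPartitions: 16, numSubVectors: 16 }),
    });
  } else {
    // Brute-force search remains fast at this size, so nothing needs to be done.
    console.log(`Only ${rowCount} rows; skipping IVF_PQ index until the dataset grows.`);
  }
}

maybeCreateIndex().catch(console.error);
```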
Graceful Degradation
If the vector database fails to initialize, the server will continue running without vector search capabilities. Document management tools (add, list, delete) remain functional, but semantic search will be unavailable. Check the MCP logs to identify and resolve the underlying issue.
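Conceptually, the degradation path looks like the sketch below (not Saga's actual code): the vector-store failure is recorded once at startup, and search tools answer with an error result instead of bringing down the whole server.

```typescript
// Conceptual sketch of graceful degradation when the vector store cannot be opened.
type ToolResult = { isError?: boolean; content: { type: "text"; text: string }[] };

let vectorStoreAvailable = false;

async function initVectorStore(open: () => Promise<void>): Promise<void> {
  try {
    await open(); // e.g. connect to ~/.saga/lancedb and open the tables
    vectorStoreAvailable = true;
  } catch (err) {
    console.error("Vector store unavailable, continuing without semantic search:", err);
  }
}

function semanticSearchTool(run: (q: string) => ToolResult, query: string): ToolResult {
  if (!vectorStoreAvailable) {
    return {
      isError: true,
      content: [{ type: "text", text: "Semantic search is unavailable - check the MCP logs." }],
    };
  }
  return run(query); // document management tools are unaffected either way
}
```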
Reranking Issues
Symptom: Search results don't use reranking despite being enabled
Common causes:
- Missing or invalid `MCP_RERANKING_API_KEY`
- Reranking API endpoint unreachable
- Timeout value too low for the provider
Solutions:
- Verify API key: Ensure the API key is valid for your reranking provider
- Check endpoint connectivity:
  ```bash
  curl https://api.cohere.ai/v1/rerank \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"query": "test", "documents": ["test"], "model": "rerank-multilingual-v3.0"}'
  ```
- Increase timeout: Set `MCP_RERANKING_TIMEOUT` to a higher value (e.g., `60000` for 60 seconds)
- Check MCP logs: Look for reranking-related errors in the MCP server logs
Symptom: Reranking causes search to be slow
Solutions:
- Reduce candidate pool: Lower `MCP_RERANKING_CANDIDATES` (default: 50)
- Disable for quick searches: Set `MCP_RERANKING_ENABLED=false` for faster results
- Use per-query override: Pass `useReranking: false` in query options for specific queries
Note: Reranking gracefully degrades to vector-only search if the reranking service is unavailable or times out.
LM Studio Embedding Model Loading Error
Symptom: LM Studio shows an error when Saga tries to use embeddings:
```text
Invalid model identifier 'text-embedding-llama-nemotron-embed-1b-v2@q4_k_s'. No matching loaded model found, and just-in-time (JIT) model loading is disabled. Ensure you have this model loaded first. JIT loading can be enabled in LM Studio Server Settings.
```
Cause: LM Studio has Just-In-Time (JIT) model loading disabled, which requires models to be pre-loaded before use. Saga requests the embedding model by name, but LM Studio cannot automatically load it because JIT loading is turned off.
Solutions:
Option 1: Enable JIT Model Loading (Recommended)
Enable JIT loading in LM Studio Server Settings to allow automatic model loading:
- Open LM Studio
- Go to Server Settings:
- Click the "Server" tab in the left sidebar
- Click "Server Settings" (gear icon)
- Enable JIT Loading:
- Find the "JIT Model Loading" or "Just-In-Time Loading" option
- Toggle it to Enabled
- Restart the LM Studio Server:
- Stop the server (if running)
- Start it again
- Verify the fix:
- Try using Saga again
- The model should now load automatically when requested
Option 2: Pre-load the Embedding Model
If you prefer to keep JIT loading disabled, manually load the model first:
- Open LM Studio
- Download the embedding model:
- Search for "text-embedding-llama-nemotron-embed-1b-v2@q4_k_s" in the model marketplace
- Download and install the model
- Load the model:
- Go to the "Local Models" tab
- Find "text-embedding-llama-nemotron-embed-1b-v2@q4_k_s"
- Click "Load" or "Start" to load the model into memory
- Keep the model loaded:
- Ensure the model remains loaded while using Saga
- If LM Studio unloads the model, you'll need to reload it
- Verify the fix:
- Try using Saga again
- The model should now be available
Recommended LM Studio Configuration for Saga
For the best experience with Saga, configure LM Studio with these settings:
```text
# LM Studio Server Settings
- Port: 1234 (or your preferred port)
- JIT Model Loading: Enabled (recommended)
- Host: 127.0.0.1 (localhost)
- CORS: Enabled (if accessing from other applications)
```
Why Enable JIT Loading?
- Flexibility: Automatically loads models as needed
- Convenience: No need to manually pre-load models
- Resource Management: Only loads models when they're actually used
- Better Experience: Seamless integration with Saga and other MCP servers
Troubleshooting Tips:
Check if the model is installed:
- In LM Studio, go to "Local Models"
- Search for "text-embedding-llama-nemotron-embed-1b-v2@q4_k_s"
- If not found, download it from the marketplace
Verify LM Studio server is running:
```bash
curl http://localhost:1234/v1/models
```
You should see a list of available models including the embedding model.
Test the embedding endpoint directly:
```bash
curl http://localhost:1234/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": ["test"], "model": "llama-nemotron-embed-1b-v2"}'
```
Check LM Studio logs:
- Open LM Studio's log panel
- Look for errors related to model loading or JIT settings
- Verify the server is listening on the correct port
Restart LM Studio:
- Sometimes a simple restart resolves configuration issues
- Stop the server, make changes, then start it again
Storage Layout
```text
~/.saga/
├── data/     # Document JSON files
├── lancedb/  # Vector storage
└── uploads/  # Drop files here to import
```
Development
```bash
npm run dev    # Development mode
npm run build  # Build TypeScript
```
Testing
The project uses Vitest for testing. Available test commands:
```bash
npm run test:unit         # Run unit tests only
npm run test:integration  # Run integration tests only
npm run test:benchmark    # Run performance benchmarks
npm run test:all          # Run all tests
npm run test:watch        # Run tests in watch mode
npm run test:coverage     # Run tests with coverage report
```
Coverage Reporting:
- Coverage reports are generated in the `coverage/` directory
- HTML reports can be opened at `coverage/index.html`
- Coverage thresholds are enforced: 80% for statements, branches, functions, and lines
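The enforced thresholds likely correspond to a Vitest coverage block along these lines (illustrative only; see the repository's actual `vitest.config.ts` for the real settings):

```typescript
// Likely shape of the enforced coverage thresholds in a Vitest config.
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    coverage: {
      reporter: ["text", "html"], // HTML report ends up at coverage/index.html
      thresholds: { statements: 80, branches: 80, functions: 80, lines: 80 },
    },
  },
});
```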
CI/CD Integration:
- JUnit XML reports are generated for CI environments
- Reports are saved to `test-results/junit.xml` when running in CI
Test Output Control:
- By default, console output from tests is suppressed to keep results clean and readable
- To enable verbose output for debugging, set the `MCP_VERBOSE_TESTS` environment variable: `MCP_VERBOSE_TESTS=true npm run test:all`
- This is useful when debugging test failures or investigating specific test behavior
Contributing
- Fork the repository
- Create a feature branch: `git checkout -b feature/name`
- Follow Conventional Commits
- Open a pull request
License
MIT - see LICENSE file
Built with FastMCP and TypeScript
