Saga MCP Server
Saga is a TypeScript-based Model Context Protocol (MCP) server for local-first document management and semantic search using embeddings. It ships with LanceDB vector storage, web crawling, and optional LLM integration.
Installation
Local Development
You can install from npm or clone and link locally:
```bash
# Install from npm
npm install -g @maxinedotdev/saga

# Or clone and build
git clone https://github.com/maxinedotdev/saga.git
cd saga
npm install
npm run build

# Link globally so it's available in other MCP consumers
npm link
```
After linking, the `saga` command will be available globally across all VSCode windows.
Direct Path Method (Alternative)
If you prefer not to use npm link, you can reference the server directly in your MCP configuration:
```json
{
  "mcpServers": {
    "saga": {
      "command": "node",
      "args": ["/full/path/to/saga/dist/server.js"],
      "env": {
        "MCP_BASE_DIR": "~/.saga",
        "MCP_EMBEDDING_PROVIDER": "openai",
        "MCP_EMBEDDING_BASE_URL": "http://localhost:1234",
        "MCP_EMBEDDING_MODEL": "llama-nemotron-embed-1b-v2"
      }
    }
  }
}
```
Via npm
```bash
npm install -g @maxinedotdev/saga
```
Quick Start
Configure an MCP Client
Add to your MCP client configuration (e.g., Claude Desktop):
```json
{
  "mcpServers": {
    "saga": {
      "command": "saga",
      "env": {
        "MCP_BASE_DIR": "~/.saga",
        "MCP_EMBEDDING_PROVIDER": "openai",
        "MCP_EMBEDDING_BASE_URL": "http://localhost:1234",
        "MCP_EMBEDDING_MODEL": "llama-nemotron-embed-1b-v2"
      }
    }
  }
}
```
Note: If you didn't run `npm link` during installation, use the direct path method shown in the Installation section above.
Basic Usage
- Add documents: Use the `add_document` tool, or place `.txt`/`.md` files in the uploads folder and call `process_uploads`
- Search: Use `query` for semantic document discovery
- Analyze: Use `search_documents_with_ai` for LLM-powered analysis (requires LLM configuration)
Features
- Semantic Search: Vector-based search with LanceDB and HNSW indexing
- Two-Stage Retrieval: Optional cross-encoder reranking for improved result quality
- Query-First Discovery: Find relevant documents quickly with hybrid ranking (vector + keyword fallback)
- Web Crawling: Crawl public documentation with `crawl_documentation`
- LLM Integration: Optional AI-powered analysis via OpenAI-compatible providers (LM Studio, synthetic.new)
- Performance: LRU caching, parallel processing, streaming file reads
- Local-First: All data stored in `~/.saga/` - no external services required
Reranking
Saga supports optional two-stage retrieval that improves search result quality by combining vector search with cross-encoder reranking:
- Stage 1 - Vector Search: Retrieve a larger pool of candidate results (5x the requested limit)
- Stage 2 - Reranking: Use a cross-encoder model to re-rank candidates based on semantic similarity to the query
This approach provides more accurate results, especially for:
- Multilingual queries (Norwegian, English, and mixed-language content)
- Code snippet searches
- Complex technical queries
Note: Reranking is enabled by default but can be disabled via configuration or per-query. The feature gracefully degrades to vector-only search if the reranking service is unavailable.
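For illustration, the two-stage flow above can be sketched as follows. `vectorSearch` is a hypothetical stand-in for Saga's internal LanceDB query, and the request/response shapes follow Cohere's public `/v1/rerank` API rather than Saga's own code:

```typescript
// Sketch of two-stage retrieval: over-fetch candidates from vector search, then re-rank them
// with a cross-encoder. Not Saga's internals; the rerank call mirrors Cohere's /v1/rerank API.
interface Candidate {
  text: string;
  documentId: string;
  score: number;
}

async function searchWithReranking(
  query: string,
  vectorSearch: (q: string, k: number) => Promise<Candidate[]>,
  limit = 10,
): Promise<Candidate[]> {
  // Stage 1: retrieve a larger candidate pool (5x the requested limit)
  const candidates = await vectorSearch(query, limit * 5);

  try {
    // Stage 2: cross-encoder reranking
    const res = await fetch("https://api.cohere.ai/v1/rerank", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.MCP_RERANKING_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "rerank-multilingual-v3.0",
        query,
        documents: candidates.map((c) => c.text),
        top_n: limit,
      }),
      signal: AbortSignal.timeout(Number(process.env.MCP_RERANKING_TIMEOUT ?? 30000)),
    });
    if (!res.ok) throw new Error(`Rerank request failed: ${res.status}`);
    const { results } = (await res.json()) as {
      results: { index: number; relevance_score: number }[];
    };
    return results.map((r) => ({ ...candidates[r.index], score: r.relevance_score }));
  } catch {
    // Graceful degradation: fall back to vector-only ordering
    return candidates.slice(0, limit);
  }
}
```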
Database v1.0.0
Saga now uses a redesigned v1.0.0 database schema with significant improvements in performance, scalability, and data integrity.
Key Improvements
| Area | Improvement | Benefit |
|------|-------------|---------|
| Schema | Flattened metadata, normalized tables | Type safety, better queries |
| Indexes | Dynamic IVF_PQ, scalar indexes | Fast queries, scalable |
| Storage | Single source of truth (LanceDB only) | No duplication, consistency |
| Memory | Optional LRU caches | Scalable, configurable |
| Migration | Migrationless (manual reset) | Clear state, no legacy coupling |
| Performance | <100ms query latency | Better UX |
Quick Start
New Installation
For new installations, the v1.0.0 schema is initialized automatically:
```bash
# The database will be initialized on first run
saga
```
Legacy Data (No Migration)
Saga v1 is migrationless. If you have legacy data, discard it and re-ingest. There is no backward compatibility, and the server will prompt you to manually delete the database when it detects a schema mismatch.
```bash
rm -rf ~/.saga/lancedb
```
Performance Targets
| Metric | Target |
|--------|--------|
| Vector search (top-10) | < 100ms |
| Scalar filter (document_id) | < 10ms |
| Tag filter query | < 50ms |
| Keyword search | < 75ms |
| Combined query | < 150ms |
Storage Layout
```text
~/.saga/lancedb/
├── documents.lance/          # Document metadata
├── document_tags.lance/      # Tag relationships
├── document_languages.lance/ # Language relationships
├── chunks.lance/             # Text chunks with embeddings
├── code_blocks.lance/        # Code blocks with embeddings
├── keywords.lance/           # Keyword inverted index
└── schema_version.lance/     # Schema tracking
```
Documentation
- Schema Reference - Complete schema documentation
- API Reference - LanceDBV1 API documentation
- Design Document - Detailed design rationale
Database Management
Check Database Status
```bash
# View database statistics
tsx scripts/benchmark-db.ts
```
Initialize Fresh Database
```bash
# Initialize a new v1.0.0 database
npm run db:init
```
Drop Database
```bash
# Remove all database data
npm run db:drop
```
Troubleshooting
Schema/Initialization Issues
Symptom: Startup error mentions schema mismatch or missing tables.
Solutions:
- Stop the server
- Delete the database directory: `rm -rf ~/.saga/lancedb`
- Restart the server and re-ingest documents
Performance Issues
Symptom: Slow queries
Solutions:
- Check database metrics: `tsx scripts/benchmark-db.ts`
- Reduce result limit for faster queries
- Monitor with `npm run db:benchmark`
Symptom: High memory usage
Solutions:
- Use pagination for large result sets
- Reduce batch size for inserts
- Close database connections when done
Available Tools
Document Management
- `add_document` - Add a document with title, content, and metadata
- `list_documents` - List documents with pagination
- `get_document` - Retrieve full document by ID
- `delete_document` - Remove a document and its chunks
- `query` - Query-first document discovery with semantic ranking
File Processing
- `process_uploads` - Convert files in uploads folder to documents
- `get_uploads_path` - Get the absolute uploads folder path
- `list_uploads_files` - List files in uploads folder
Search & Analysis
- `search_documents` - Search chunks within a specific document
- `search_documents_with_ai` - LLM-powered analysis (requires provider config)
- `search_code_blocks` - Semantic code block search across documents
- `get_code_blocks` - Return grouped code block variants for a document
- `get_context_window` - Get neighboring chunks for context
- `crawl_documentation` - Crawl public docs from a seed URL
- `delete_crawl_session` - Remove all documents from a crawl session
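If you are writing your own MCP client rather than using Claude Desktop or Kilo, the tools above can be called with the official TypeScript SDK. This is a rough sketch: the argument names (`title`, `content`, `query`) are inferred from the tool descriptions above, so verify them against the schemas reported by `listTools()` before relying on them.

```typescript
// Rough sketch of calling Saga's tools from a custom MCP client using the official SDK.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function main(): Promise<void> {
  // Assumes `saga` is on PATH (npm install -g / npm link) and env vars are set in the shell
  const transport = new StdioClientTransport({ command: "saga" });
  const client = new Client({ name: "saga-example", version: "0.0.1" });
  await client.connect(transport);

  // Inspect the advertised tools and their input schemas
  const { tools } = await client.listTools();
  console.log(tools.map((t) => t.name));

  // Add a document, then run query-first discovery against it
  await client.callTool({
    name: "add_document",
    arguments: { title: "Release notes", content: "Saga 1.1.0 adds reranking." },
  });
  const hits = await client.callTool({
    name: "query",
    arguments: { query: "what changed in 1.1.0?" },
  });
  console.log(JSON.stringify(hits, null, 2));

  await client.close();
}

main().catch(console.error);
```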
Configuration
Configure via environment variables or TOML.
Saga will read a TOML config file from ~/.saga/saga.toml by default. Override the path with MCP_CONFIG_TOML or --config /path/to/saga.toml. Environment variables already set take precedence over TOML.
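Conceptually, the precedence rule behaves like the following sketch (not Saga's actual loader): values from the `[env]` table only fill in variables that are not already set in the process environment.

```typescript
// Conceptual sketch of the precedence rule (not Saga's actual loader).
function applyTomlEnv(tomlEnv: Record<string, string>): void {
  for (const [key, value] of Object.entries(tomlEnv)) {
    if (process.env[key] === undefined) {
      process.env[key] = value; // TOML fills gaps; real environment variables win
    }
  }
}

// Example: MCP_HTTP_PORT=9090 exported in the shell overrides port 8080 from saga.toml
applyTomlEnv({ MCP_BASE_DIR: "~/.saga", MCP_HTTP_PORT: "8080" });
```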
Example TOML:
transport = "stdio"
[http]
host = "127.0.0.1"
port = 8080
endpoint = "/mcp"
public = false
stateless = false
[env]
MCP_BASE_DIR = "~/.saga"
MCP_EMBEDDING_PROVIDER = "openai"
MCP_EMBEDDING_BASE_URL = "http://localhost:1234"
MCP_EMBEDDING_MODEL = "llama-nemotron-embed-1b-v2"Single-instance mode (recommended for multiple MCP clients)
Run one Saga process as a background service and point MCP clients at the same HTTP endpoint. This is supported across all three platforms: macOS (launchd), Linux (systemd user service), and Windows (Windows service).
- Create `~/.saga/saga.toml`:
```toml
[server]
transport = "httpStream"
base_dir = "~/.saga"

[server.http]
host = "127.0.0.1"
port = 8080
endpoint = "/mcp"
public = false
stateless = false

[env]
MCP_EMBEDDING_PROVIDER = "openai"
MCP_EMBEDDING_BASE_URL = "http://127.0.0.1:1234/v1"
MCP_EMBEDDING_MODEL = "llama-nemotron-embed-1b-v2"
MCP_AI_PROVIDER = "openai"
MCP_AI_BASE_URL = "http://127.0.0.1:1234/v1"
MCP_AI_MODEL = "ministral-3-8b-instruct-2512"
```
- Install a background service (choose your OS):
macOS (launchd):
```bash
npm run service:install:mac
```
Linux (systemd user service):
```bash
npm run service:install:linux
```
Windows (PowerShell + Windows service, run terminal as Administrator):
```bash
npm run service:install:windows
```
By default, service scripts run dist/server.js from the current Saga checkout. Override the runtime path when needed:
```bash
SAGA_RUNTIME_DIR=~/Documents/git/saga-staging npm run service:install:mac
SAGA_RUNTIME_DIR=~/Documents/git/saga-staging npm run service:install:linux
```
Windows override example:
```powershell
powershell -ExecutionPolicy Bypass -File scripts/install-windows-service.ps1 -RuntimeDir C:\path\to\saga-staging
```
- Check service status:
macOS:
```bash
npm run service:status:mac
```
Linux:
```bash
npm run service:status:linux
```
Windows:
```bash
npm run service:status:windows
```
- Set MCP clients to URL mode: `http://127.0.0.1:8080/mcp`

This avoids one-process-per-client stdio spawning and keeps Saga as a single managed process.
Example MCP client configs:
Kilo (mcp_settings.json) Saga entry:
```json
{
  "saga": {
    "type": "streamable-http",
    "url": "http://127.0.0.1:8080/mcp",
    "timeout": 600,
    "disabled": false
  }
}
```
Codex (~/.codex/config.toml) Saga entry:
```toml
[mcp_servers.saga]
enabled = true
url = "http://127.0.0.1:8080/mcp"
```
Single-instance diagnostics:
- If you see high CPU from many `Code Helper (Plugin)` processes, check parent commands. Non-Saga MCP servers (for example `synthetic-search`) can still spawn per client/session.
- Confirm Saga is single-instance:
  - macOS: `launchctl list | rg saga-mcp`
  - Linux: `systemctl --user status dev.maxinedot.saga-mcp.service`
  - Windows: `Get-Service SagaMcpService`
- Trace process parents: `ps -axo pid,ppid,pcpu,command | rg -i "dist/server.js|synthetic-search|Code Helper"`
Uninstall service helpers:
```bash
npm run service:uninstall:mac
npm run service:uninstall:linux
npm run service:uninstall:windows
```
Environment variables:
| Variable | Description | Default |
|----------|-------------|---------|
| MCP_CONFIG_TOML | Path to Saga TOML config | ~/.saga/saga.toml |
| MCP_BASE_DIR | Data storage directory | ~/.saga |
| MCP_TRANSPORT | Transport: stdio or httpStream | stdio |
| MCP_HTTP_HOST | Host for HTTP Stream transport | 127.0.0.1 |
| MCP_HTTP_PORT | Port for HTTP Stream transport | 8080 |
| MCP_HTTP_ENDPOINT | Endpoint for HTTP Stream transport | /mcp |
| MCP_HTTP_PUBLIC | Bind HTTP Stream to 0.0.0.0 when true | false |
| MCP_HTTP_STATELESS | Enable stateless HTTP Stream mode | false |
| MCP_EMBEDDING_PROVIDER | openai (OpenAI-compatible API only) | openai |
| MCP_EMBEDDING_MODEL | Embedding model name | llama-nemotron-embed-1b-v2 |
| MCP_EMBEDDING_BASE_URL | OpenAI-compatible base URL (required) | - |
| MCP_AI_BASE_URL | LLM provider URL (LM Studio/synthetic.new) | - |
| MCP_AI_MODEL | LLM model name | Provider default |
| MCP_AI_API_KEY | API key for remote providers | - |
| MCP_TAG_GENERATION_ENABLED | Auto-generate tags with AI | false |
| MCP_SIMILARITY_THRESHOLD | Min similarity score (0.0-1.0) | 0.5 |
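To illustrate what `MCP_SIMILARITY_THRESHOLD` controls, here is a generic cosine-similarity filter. It is not Saga's internal scoring code, just the usual interpretation of a 0.0-1.0 minimum score: chunks scoring below the threshold are dropped from the result set.

```typescript
// Generic cosine-similarity filter illustrating a minimum similarity threshold.
interface Chunk {
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function filterByThreshold(queryEmbedding: number[], chunks: Chunk[]): Chunk[] {
  const threshold = Number(process.env.MCP_SIMILARITY_THRESHOLD ?? 0.5);
  return chunks.filter((c) => cosineSimilarity(queryEmbedding, c.embedding) >= threshold);
}
```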
Reranking Configuration
Reranking improves search result quality by using cross-encoder models to re-rank vector search results.
| Variable | Description | Default |
|----------|-------------|---------|
| MCP_RERANKING_ENABLED | Enable/disable reranking feature | true |
| MCP_RERANKING_PROVIDER | Reranking provider: cohere, jina, openai, custom | cohere |
| MCP_RERANKING_BASE_URL | Base URL for custom provider | (provider default) |
| MCP_RERANKING_API_KEY | API key for reranking provider | - |
| MCP_RERANKING_MODEL | Reranking model name | (provider default) |
| MCP_RERANKING_CANDIDATES | Max candidates to retrieve for reranking | 50 |
| MCP_RERANKING_TOP_K | Number of results to return after reranking | 10 |
| MCP_RERANKING_TIMEOUT | Reranking API timeout (ms) | 30000 |
Provider-Specific Defaults:
- Cohere: `https://api.cohere.ai/v1`, model: `rerank-multilingual-v3.0`
- Jina AI: `https://api.jina.ai/v1`, model: `jina-reranker-v1-base-en`
- OpenAI: `https://api.openai.com/v1`, model: `gpt-4o-mini`
Example Configurations:
```bash
# Cohere (recommended for multilingual)
MCP_RERANKING_ENABLED=true
MCP_RERANKING_PROVIDER=cohere
MCP_RERANKING_API_KEY=your-cohere-api-key

# Jina AI
MCP_RERANKING_ENABLED=true
MCP_RERANKING_PROVIDER=jina
MCP_RERANKING_API_KEY=your-jina-api-key

# Custom endpoint
MCP_RERANKING_ENABLED=true
MCP_RERANKING_PROVIDER=custom
MCP_RERANKING_BASE_URL=https://your-reranker.example.com/v1
MCP_RERANKING_API_KEY=your-api-key
MCP_RERANKING_MODEL=your-model-name
```
Request Timeouts
The server supports configurable HTTP request timeouts to handle slow or unresponsive providers. All timeout values are in milliseconds.
Timeout Hierarchy (from highest to lowest priority):
- Operation-specific timeout (e.g., `MCP_AI_SEARCH_TIMEOUT_MS`)
- Global timeout (`MCP_REQUEST_TIMEOUT_MS`)
- Default (30000ms = 30 seconds)
| Variable | Description | Default |
|----------|-------------|---------|
| MCP_REQUEST_TIMEOUT_MS | Global timeout for all HTTP requests | 30000 |
| MCP_AI_SEARCH_TIMEOUT_MS | Timeout for AI search requests (search_documents_with_ai) | Global timeout |
| MCP_EMBEDDING_TIMEOUT_MS | Timeout for embedding generation requests | Global timeout |
Timeout Error Behavior:
When a request exceeds its timeout, a RequestTimeoutError is thrown with details:
- Error message includes the timeout duration and URL
- The `isTimeout` property is set to `true` for programmatic detection
- Provider health tracking marks the failure and may trigger fallback to other providers (in multi-provider mode)
Example Configurations:
```bash
# Fast local setup (15 second global timeout)
MCP_REQUEST_TIMEOUT_MS=15000

# Slow remote APIs (60 second global timeout)
MCP_REQUEST_TIMEOUT_MS=60000

# Different timeouts per operation
MCP_REQUEST_TIMEOUT_MS=30000      # 30s default
MCP_AI_SEARCH_TIMEOUT_MS=120000   # 2 min for AI search (slow LLMs)
MCP_EMBEDDING_TIMEOUT_MS=45000    # 45s for embeddings
```
Validation:
- Values must be positive integers (e.g., `30000`, not `30s`)
- Non-numeric, zero, or negative values are rejected with a warning
- Invalid values fall back to the next level in the hierarchy
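Putting the hierarchy and validation rules together, the effective timeout for an operation can be thought of as the following sketch (illustrative only; the variable names match the table above, the code does not come from Saga):

```typescript
// Illustrative resolution of the timeout hierarchy with validation and fallback.
function parseTimeout(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    console.warn(`Ignoring invalid timeout value "${raw}"`);
    return undefined; // fall back to the next level in the hierarchy
  }
  return value;
}

function resolveTimeout(operationVar?: string): number {
  return (
    (operationVar ? parseTimeout(process.env[operationVar]) : undefined) ?? // operation-specific
    parseTimeout(process.env.MCP_REQUEST_TIMEOUT_MS) ??                     // global
    30000                                                                   // built-in default
  );
}

// AI search requests: MCP_AI_SEARCH_TIMEOUT_MS -> MCP_REQUEST_TIMEOUT_MS -> 30000ms
const aiSearchTimeoutMs = resolveTimeout("MCP_AI_SEARCH_TIMEOUT_MS");
console.log(aiSearchTimeoutMs);
```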
LLM Provider Examples
LM Studio (local):
```bash
MCP_AI_BASE_URL=http://localhost:1234
MCP_AI_MODEL=ministral-3-8b-instruct-2512
```
synthetic.new (remote):
```bash
MCP_AI_BASE_URL=https://api.synthetic.new/openai/v1
MCP_AI_API_KEY=your-api-key
```
Troubleshooting
MCP Server Keeps Restarting
Symptom: VS Code shows MCP server continuously restarting
Common causes:
- LanceDB data corruption in `~/.saga/lancedb/`
- Embedding provider not running (e.g., LM Studio on port 1234)
- Missing or incorrect environment variables
Solutions:
- Clear LanceDB data: `rm -rf ~/.saga/lancedb/`
- Verify embedding endpoint:
  ```bash
  curl http://localhost:1234/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{"input": ["test"], "model": "llama-nemotron-embed-1b-v2"}'
  ```
- Check VS Code MCP logs: Open the Output panel and inspect the `saga` MCP server log
- Restart VS Code after applying fixes
LM Studio "Unexpected endpoint or method" Errors
Symptom: LM Studio logs show repeated errors like:
```text
Unexpected endpoint or method. (HEAD /). Returning 200 anyway
```
Cause: LM Studio is configured to use HTTP transport for the Saga MCP server, but Saga uses stdio transport by default. LM Studio attempts to ping an HTTP endpoint that doesn't exist.
Solutions:
- Configure LM Studio to use stdio transport: Ensure your LM Studio MCP configuration uses `command` and `args` instead of an HTTP URL
- Example correct configuration:
  ```json
  {
    "mcpServers": {
      "saga": {
        "command": "node",
        "args": ["/path/to/saga/dist/server.js"],
        "env": {
          "MCP_BASE_DIR": "~/.saga",
          "MCP_EMBEDDING_PROVIDER": "openai",
          "MCP_EMBEDDING_BASE_URL": "http://localhost:1234",
          "MCP_EMBEDDING_MODEL": "llama-nemotron-embed-1b-v2"
        }
      }
    }
  }
  ```
- Note: These errors are harmless and don't affect server functionality, but fixing the configuration will clean up the logs
Vector Index Creation Errors
Symptom: Logs show warnings about vector index creation:
```text
Failed to create vector index on chunks: Not enough rows to train PQ. Requires 256 rows but only 33 available
```
Cause: LanceDB's IVF_PQ indexing requires at least 256 vectors for Product Quantization training. Small datasets don't have enough data.
Solutions:
- No action needed: The server gracefully handles this by skipping index creation for small datasets
- Use HNSW indexing: Set `MCP_USE_HNSW=true` (default) - HNSW works with any dataset size
- Add more documents: When your dataset grows beyond 256 vectors, indexes will be created automatically
- Note: Brute force search is efficient for small datasets (< 1000 vectors), so missing indexes won't impact performance
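The warning is expected behaviour for small tables. As a rough sketch with the `@lancedb/lancedb` client (the `chunks` table name comes from the storage layout above; the `embedding` column name and index parameters are assumptions), index creation can simply be skipped until enough rows exist:

```typescript
// Hedged sketch: skip IVF_PQ creation until the table has enough rows to train PQ.
import * as lancedb from "@lancedb/lancedb";
import os from "node:os";
import path from "node:path";

async function maybeCreateIndex(): Promise<void> {
  const db = await lancedb.connect(path.join(os.homedir(), ".saga", "lancedb"));
  const chunks = await db.openTable("chunks");

  const rowCount = await chunks.countRows();
  if (rowCount >= 256) {
    await chunks.createIndex("embedding", {
      config: lancedb.Index.ivfPq({ numPartitions: 16, numSubVectors: 16 }),
    });
  } else {
    // Brute-force search remains fast at this size, so nothing needs to be done.
    console.log(`Only ${rowCount} rows; skipping IVF_PQ index until the dataset grows.`);
  }
}

maybeCreateIndex().catch(console.error);
```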
Graceful Degradation
If the vector database fails to initialize, the server will continue running without vector search capabilities. Document management tools (add, list, delete) remain functional, but semantic search will be unavailable. Check the MCP logs to identify and resolve the underlying issue.
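Conceptually, the degradation path looks like the sketch below (not Saga's actual code): the vector-store failure is recorded once at startup, and search tools answer with an error result instead of bringing down the whole server.

```typescript
// Conceptual sketch of graceful degradation when the vector store cannot be opened.
type ToolResult = { isError?: boolean; content: { type: "text"; text: string }[] };

let vectorStoreAvailable = false;

async function initVectorStore(open: () => Promise<void>): Promise<void> {
  try {
    await open(); // e.g. connect to ~/.saga/lancedb and open the tables
    vectorStoreAvailable = true;
  } catch (err) {
    console.error("Vector store unavailable, continuing without semantic search:", err);
  }
}

function semanticSearchTool(run: (q: string) => ToolResult, query: string): ToolResult {
  if (!vectorStoreAvailable) {
    return {
      isError: true,
      content: [{ type: "text", text: "Semantic search is unavailable - check the MCP logs." }],
    };
  }
  return run(query); // document management tools are unaffected either way
}
```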
Reranking Issues
Symptom: Search results don't use reranking despite being enabled
Common causes:
- Missing or invalid `MCP_RERANKING_API_KEY`
- Reranking API endpoint unreachable
- Timeout value too low for the provider
Solutions:
- Verify API key: Ensure the API key is valid for your reranking provider
- Check endpoint connectivity:
  ```bash
  curl https://api.cohere.ai/v1/rerank \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"query": "test", "documents": ["test"], "model": "rerank-multilingual-v3.0"}'
  ```
- Increase timeout: Set `MCP_RERANKING_TIMEOUT` to a higher value (e.g., `60000` for 60 seconds)
- Check MCP logs: Look for reranking-related errors in the MCP server logs
Symptom: Reranking causes search to be slow
Solutions:
- Reduce candidate pool: Lower `MCP_RERANKING_CANDIDATES` (default: 50)
- Disable for quick searches: Set `MCP_RERANKING_ENABLED=false` for faster results
- Use per-query override: Pass `useReranking: false` in query options for specific queries
Note: Reranking gracefully degrades to vector-only search if the reranking service is unavailable or times out.
LM Studio Embedding Model Loading Error
Symptom: LM Studio shows an error when Saga tries to use embeddings:
```text
Invalid model identifier 'text-embedding-llama-nemotron-embed-1b-v2@q4_k_s'. No matching loaded model found, and just-in-time (JIT) model loading is disabled. Ensure you have this model loaded first. JIT loading can be enabled in LM Studio Server Settings.
```
Cause: LM Studio has Just-In-Time (JIT) model loading disabled, which requires models to be pre-loaded before use. Saga requests the embedding model by name, but LM Studio cannot automatically load it because JIT loading is turned off.
Solutions:
Option 1: Enable JIT Model Loading (Recommended)
Enable JIT loading in LM Studio Server Settings to allow automatic model loading:
- Open LM Studio
- Go to Server Settings:
- Click the "Server" tab in the left sidebar
- Click "Server Settings" (gear icon)
- Enable JIT Loading:
- Find the "JIT Model Loading" or "Just-In-Time Loading" option
- Toggle it to Enabled
- Restart the LM Studio Server:
- Stop the server (if running)
- Start it again
- Verify the fix:
- Try using Saga again
- The model should now load automatically when requested
Option 2: Pre-load the Embedding Model
If you prefer to keep JIT loading disabled, manually load the model first:
- Open LM Studio
- Download the embedding model:
- Search for "text-embedding-llama-nemotron-embed-1b-v2@q4_k_s" in the model marketplace
- Download and install the model
- Load the model:
- Go to the "Local Models" tab
- Find "text-embedding-llama-nemotron-embed-1b-v2@q4_k_s"
- Click "Load" or "Start" to load the model into memory
- Keep the model loaded:
- Ensure the model remains loaded while using Saga
- If LM Studio unloads the model, you'll need to reload it
- Verify the fix:
- Try using Saga again
- The model should now be available
Recommended LM Studio Configuration for Saga
For the best experience with Saga, configure LM Studio with these settings:
```text
# LM Studio Server Settings
- Port: 1234 (or your preferred port)
- JIT Model Loading: Enabled (recommended)
- Host: 127.0.0.1 (localhost)
- CORS: Enabled (if accessing from other applications)
```
Why Enable JIT Loading?
- Flexibility: Automatically loads models as needed
- Convenience: No need to manually pre-load models
- Resource Management: Only loads models when they're actually used
- Better Experience: Seamless integration with Saga and other MCP servers
Troubleshooting Tips:
Check if the model is installed:
- In LM Studio, go to "Local Models"
- Search for "text-embedding-llama-nemotron-embed-1b-v2@q4_k_s"
- If not found, download it from the marketplace
Verify LM Studio server is running:
```bash
curl http://localhost:1234/v1/models
```
You should see a list of available models including the embedding model.
Test the embedding endpoint directly:
```bash
curl http://localhost:1234/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": ["test"], "model": "llama-nemotron-embed-1b-v2"}'
```
Check LM Studio logs:
- Open LM Studio's log panel
- Look for errors related to model loading or JIT settings
- Verify the server is listening on the correct port
Restart LM Studio:
- Sometimes a simple restart resolves configuration issues
- Stop the server, make changes, then start it again
Storage Layout
```text
~/.saga/
├── data/     # Document JSON files
├── lancedb/  # Vector storage
└── uploads/  # Drop files here to import
```
Development
```bash
npm run dev    # Development mode
npm run build  # Build TypeScript
```
Testing
The project uses Vitest for testing. Available test commands:
```bash
npm run test:unit         # Run unit tests only
npm run test:integration  # Run integration tests only
npm run test:benchmark    # Run performance benchmarks
npm run test:all          # Run all tests
npm run test:watch        # Run tests in watch mode
npm run test:coverage     # Run tests with coverage report
```
Coverage Reporting:
- Coverage reports are generated in the `coverage/` directory
- HTML reports can be opened at `coverage/index.html`
- Coverage thresholds are enforced: 80% for statements, branches, functions, and lines
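The enforced thresholds likely correspond to a Vitest coverage block along these lines (illustrative only; see the repository's actual `vitest.config.ts` for the real settings):

```typescript
// Likely shape of the enforced coverage thresholds in a Vitest config.
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    coverage: {
      reporter: ["text", "html"], // HTML report ends up at coverage/index.html
      thresholds: { statements: 80, branches: 80, functions: 80, lines: 80 },
    },
  },
});
```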
CI/CD Integration:
- JUnit XML reports are generated for CI environments
- Reports are saved to `test-results/junit.xml` when running in CI
Test Output Control:
- By default, console output from tests is suppressed to keep results clean and readable
- To enable verbose output for debugging, set the `MCP_VERBOSE_TESTS` environment variable: `MCP_VERBOSE_TESTS=true npm run test:all`
- This is useful when debugging test failures or investigating specific test behavior
Contributing
- Fork the repository
- Create a feature branch: `git checkout -b feature/name`
- Follow Conventional Commits
- Open a pull request
License
MIT - see LICENSE file
Built with FastMCP and TypeScript
