`@glutamateapp/docsassist` v1.0.2
# Documentation Assistant MCP Server
A Model Context Protocol (MCP) server implementation for scraping, indexing, and searching documentation using Server-Sent Events (SSE).
## Features
- SSE-based communication
- Documentation scraping and indexing with background job support
- Full-text search capabilities with content segment highlighting
- Local caching of documentation and sitemaps
- Intelligent sitemap generation
- Built with TypeScript and Express
- Follows MCP specifications
## Installation

```bash
npm install
```

## Building

```bash
npm run build
```

## Running the Server

```bash
# Start with default port (9031)
npm start

# Start with custom port
node dist/index.js --port=9006

# Start with custom cache directory
node dist/index.js --cache-dir=/path/to/cache
```

The server will start on port 9031 by default.
## API Endpoints

- **SSE Connection:** `GET http://localhost:9031/sse`
- **Message Endpoint:** `POST http://localhost:9031/messages`
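MCP messages are JSON-RPC 2.0 objects, so a tool invocation posted to the message endpoint can be sketched as below. The `buildToolCall` helper and the example arguments are illustrative only, not part of this package; in an SSE session the exact POST URL (typically including a session identifier) is announced by the server on the initial `GET /sse` connection.

```typescript
// Sketch of a JSON-RPC 2.0 "tools/call" request body, as the MCP
// specification defines for tool invocations.
type ToolCallRequest = {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
};

// Illustrative helper (not part of @glutamateapp/docsassist).
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Example: invoke the search_docs tool described below.
const request = buildToolCall(1, "search_docs", {
  url: "https://example.com/docs",
  query: "authentication",
  maxResults: 5,
});
```

Once the SSE session is established, the request would be sent with something like `fetch(messageUrl, { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify(request) })`.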
## Available Tools

### 1. get_cache_info

Get information about the cache location and contents.

```ts
{
  // No parameters required
}
```

### 2. get_sitemap

Creates or retrieves a sitemap by crawling the site for metadata and link structure. Returns a job ID for tracking progress.

```ts
{
  url: string; // The URL of the documentation site
}
```

### 3. get_job_status

Returns the current status, progress, and results of a background job.

```ts
{
  jobId: string; // The ID of the job to check
}
```

### 4. list_jobs

Returns a list of all background jobs with their current status and progress.

```ts
{
  // No parameters required
}
```

### 5. scrape_docs

Scrapes and indexes pages matching a search query. Uses cached data if available unless `forceScrape` is `true`. Returns a job ID for tracking progress.

```ts
{
  url: string; // The URL of the documentation to scrape
  query: string; // The search query to match content
  maxResults?: number; // Maximum number of matching results (default: 5)
  forceScrape?: boolean; // Whether to force a new scrape or use cache (default: false)
}
```

### 6. search_docs

Search through previously scraped documentation across one or more URLs. Returns relevant matches with title, description, URL, and content segments, sorted by relevance.

```ts
{
  url: string | string[]; // Single URL or array of URLs to search
  query: string; // The search query
  maxResults?: number; // Maximum number of results (default: 10)
  maxSegments?: number; // Maximum number of content segments per result (default: 3)
}
```

## Cache Directory Structure
The server uses a local cache directory to store scraped documentation and sitemaps. By default, it is located at:

```
.cache/
|- {base64_url}_sitemap.json  // Sitemap cache
|- {base64_url}.json          // Documentation content cache
```

You can configure the cache directory location using:

- Command line: `--cache-dir=/path/to/cache`
- Environment variable: `DOCS_CACHE_DIR=/path/to/cache`
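As a rough sketch, the `{base64_url}` placeholder can be read as a base64 encoding of the full documentation URL. The helpers below assume plain base64; the actual implementation may use a URL-safe variant (plain base64 can contain `/`, which is awkward in filenames), so treat this as an illustration of the naming scheme rather than the package's real code.

```typescript
// Illustrative derivation of cache filenames from a documentation URL,
// assuming {base64_url} is the standard base64 encoding of the URL.
// NOTE: hypothetical helpers, not part of @glutamateapp/docsassist.
function sitemapCacheFile(url: string): string {
  const key = Buffer.from(url, "utf8").toString("base64");
  return `${key}_sitemap.json`;
}

function contentCacheFile(url: string): string {
  const key = Buffer.from(url, "utf8").toString("base64");
  return `${key}.json`;
}

const file = sitemapCacheFile("https://example.com/docs");
```

Encoding the URL rather than hashing it keeps the mapping reversible: the original URL can be recovered from a cache filename by stripping the suffix and base64-decoding the rest.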
## Background Jobs

The server supports long-running operations through a background job system. A job can be in one of four states:

- **PENDING**: Job created but not yet started
- **RUNNING**: Job is currently executing
- **COMPLETED**: Job finished successfully
- **FAILED**: Job encountered an error

Use the `get_job_status` tool to monitor job progress and retrieve results.
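A client-side polling loop over these states can be sketched as follows. The status strings come from the list above; the helper names and the `callStatus` callback signature are illustrative assumptions, not part of the server's API.

```typescript
// The four job states listed above.
type JobStatus = "PENDING" | "RUNNING" | "COMPLETED" | "FAILED";

// A job is done polling once it reaches a terminal state.
function isTerminal(status: JobStatus): boolean {
  return status === "COMPLETED" || status === "FAILED";
}

// Sketch of a poll loop: callStatus would wrap a get_job_status tool
// call over the MCP connection (hypothetical client-side helper).
async function waitForJob(
  callStatus: (jobId: string) => Promise<JobStatus>,
  jobId: string,
  intervalMs = 1000
): Promise<JobStatus> {
  for (;;) {
    const status = await callStatus(jobId);
    if (isTerminal(status)) return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Both `get_sitemap` and `scrape_docs` return a job ID, so either can feed this loop while the scrape or crawl runs in the background.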
## License
MIT
