`@glutamateapp/docsassist` v1.0.2
# Documentation Assistant MCP Server
A Model Context Protocol (MCP) server implementation for scraping, indexing, and searching documentation using Server-Sent Events (SSE).
## Features
- SSE-based communication
- Documentation scraping and indexing with background job support
- Full-text search capabilities with content segment highlighting
- Local caching of documentation and sitemaps
- Intelligent sitemap generation
- Built with TypeScript and Express
- Follows MCP specifications
## Installation

```bash
npm install
```

## Building

```bash
npm run build
```

## Running the Server

```bash
# Start with default port (9031)
npm start

# Start with custom port
node dist/index.js --port=9006

# Start with custom cache directory
node dist/index.js --cache-dir=/path/to/cache
```

The server will start on port 9031 by default.
## API Endpoints

- **SSE Connection:** `GET http://localhost:9031/sse`
- **Message Endpoint:** `POST http://localhost:9031/messages`
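MCP messages are JSON-RPC 2.0 objects, so a tool invocation posted to the message endpoint can be sketched as below. The `buildToolCall` helper and the example arguments are illustrative only, not part of this package; in an SSE session the exact POST URL (typically including a session identifier) is announced by the server on the initial `GET /sse` connection.

```typescript
// Sketch of a JSON-RPC 2.0 "tools/call" request body, as the MCP
// specification defines for tool invocations.
type ToolCallRequest = {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
};

// Illustrative helper (not part of @glutamateapp/docsassist).
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Example: invoke the search_docs tool described below.
const request = buildToolCall(1, "search_docs", {
  url: "https://example.com/docs",
  query: "authentication",
  maxResults: 5,
});
```

Once the SSE session is established, the request would be sent with something like `fetch(messageUrl, { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify(request) })`.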
## Available Tools

### 1. get_cache_info

Get information about the cache location and contents.

```ts
{
  // No parameters required
}
```

### 2. get_sitemap

Creates or retrieves a sitemap by crawling the site for metadata and link structure. Returns a job ID for tracking progress.

```ts
{
  url: string; // The URL of the documentation site
}
```

### 3. get_job_status

Returns the current status, progress, and results of a background job.

```ts
{
  jobId: string; // The ID of the job to check
}
```

### 4. list_jobs

Returns a list of all background jobs with their current status and progress.

```ts
{
  // No parameters required
}
```

### 5. scrape_docs

Scrapes and indexes pages matching a search query. Uses cached data if available unless `forceScrape` is `true`. Returns a job ID for tracking progress.

```ts
{
  url: string; // The URL of the documentation to scrape
  query: string; // The search query to match content
  maxResults?: number; // Maximum number of matching results (default: 5)
  forceScrape?: boolean; // Whether to force a new scrape or use cache (default: false)
}
```

### 6. search_docs

Search through previously scraped documentation across one or more URLs. Returns relevant matches with title, description, URL, and content segments, sorted by relevance.

```ts
{
  url: string | string[]; // Single URL or array of URLs to search
  query: string; // The search query
  maxResults?: number; // Maximum number of results (default: 10)
  maxSegments?: number; // Maximum number of content segments per result (default: 3)
}
```

## Cache Directory Structure
The server uses a local cache directory to store scraped documentation and sitemaps. By default, it is located at:

```
.cache/
|- {base64_url}_sitemap.json  // Sitemap cache
|- {base64_url}.json          // Documentation content cache
```

You can configure the cache directory location using:

- Command line: `--cache-dir=/path/to/cache`
- Environment variable: `DOCS_CACHE_DIR=/path/to/cache`
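As a rough sketch, the `{base64_url}` placeholder can be read as a base64 encoding of the full documentation URL. The helpers below assume plain base64; the actual implementation may use a URL-safe variant (plain base64 can contain `/`, which is awkward in filenames), so treat this as an illustration of the naming scheme rather than the package's real code.

```typescript
// Illustrative derivation of cache filenames from a documentation URL,
// assuming {base64_url} is the standard base64 encoding of the URL.
// NOTE: hypothetical helpers, not part of @glutamateapp/docsassist.
function sitemapCacheFile(url: string): string {
  const key = Buffer.from(url, "utf8").toString("base64");
  return `${key}_sitemap.json`;
}

function contentCacheFile(url: string): string {
  const key = Buffer.from(url, "utf8").toString("base64");
  return `${key}.json`;
}

const file = sitemapCacheFile("https://example.com/docs");
```

Encoding the URL rather than hashing it keeps the mapping reversible: the original URL can be recovered from a cache filename by stripping the suffix and base64-decoding the rest.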
## Background Jobs

The server supports long-running operations through a background job system. A job can be in one of four states:

- **PENDING**: Job created but not yet started
- **RUNNING**: Job is currently executing
- **COMPLETED**: Job finished successfully
- **FAILED**: Job encountered an error

Use the `get_job_status` tool to monitor job progress and retrieve results.
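A client-side polling loop over these states can be sketched as follows. The status strings come from the list above; the helper names and the `callStatus` callback signature are illustrative assumptions, not part of the server's API.

```typescript
// The four job states listed above.
type JobStatus = "PENDING" | "RUNNING" | "COMPLETED" | "FAILED";

// A job is done polling once it reaches a terminal state.
function isTerminal(status: JobStatus): boolean {
  return status === "COMPLETED" || status === "FAILED";
}

// Sketch of a poll loop: callStatus would wrap a get_job_status tool
// call over the MCP connection (hypothetical client-side helper).
async function waitForJob(
  callStatus: (jobId: string) => Promise<JobStatus>,
  jobId: string,
  intervalMs = 1000
): Promise<JobStatus> {
  for (;;) {
    const status = await callStatus(jobId);
    if (isTerminal(status)) return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Both `get_sitemap` and `scrape_docs` return a job ID, so either can feed this loop while the scrape or crawl runs in the background.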
## License
MIT
