docs-hub-mcp

v1.0.14

Published

18 days ago

MCP server for documentation hub — sync from Wiki, Slack, Google Docs, Notion, Confluence to .md, git versioned, with 3-tier search

ngocdd

doc-hub-mcp

MCP server for documentation hub — sync knowledge from multiple sources into version-controlled .md files with 3-tier search, designed for AI agents.

Overview

AI agents need fast, reliable access to team documentation. But docs are scattered across Wiki, Slack, Google Docs, Notion, and Confluence. doc-hub-mcp solves this by:

Syncing documentation from multiple sources into .md files
Indexing with an auto-generated knowledge graph
Serving via MCP with 3-tier search (Exact → Graph → Semantic)

Market Differentiator

| | codegraph | agentmemory | coral | doc-hub-mcp |
|---|---|---|---|---|
| Target | Code | Agent memory | APIs | Documentation |
| Input | Source code | Agent calls | SQL | Wiki, Slack, Docs, Notion, Confluence |
| Format | SQLite | Internal | Tables | .md + git |
| Auto-sync | File watch | Hook | Manual | Scheduled / incremental |
| Search | Symbol graph | Semantic hybrid | SQL | 3-tier (~5ms → ~20ms → ~25ms) |

Features

5 sync adapters — Git Wiki, Google Docs, Slack, Notion, Confluence → .md
3-tier search — Exact (ripgrep, ~5ms) → Knowledge Graph (YAML, ~20ms) → AI Semantic (TF-IDF, ~25ms)
Incremental sync — Only sync changes since last run, with per-source state tracking
Git versioned — Auto-commit after each sync, full history and audit trail
MCP server — 3 tools: search_knowledge, read_document, get_document_structure
Web UI — Browser dashboard for search, browse, and sync management
LRU cache — Repeat queries return in <1ms
Zero external dependencies — No vector database (Pinecone/Weaviate/Chroma), no GPU required
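
To make the Tier 3 idea concrete, here is a minimal TF-IDF ranking sketch. It is an illustration only, not the package's actual implementation: the `Doc` type, `tokenize`, `rankByTfIdf`, and the sample paths are all hypothetical, and the smoothed idf formula is just one common variant.

```typescript
// Minimal TF-IDF ranking sketch (hypothetical names; not doc-hub-mcp's real code).
type Doc = { path: string; text: string };

// Lowercase the text and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Score each document by the sum of tf * idf over the distinct query terms.
function rankByTfIdf(query: string, docs: Doc[]): { path: string; score: number }[] {
  const tokenized = docs.map((d) => tokenize(d.text));
  const queryTerms = Array.from(new Set(tokenize(query)));

  // Document frequency: how many documents contain each query term.
  const df = new Map<string, number>();
  for (const term of queryTerms) {
    df.set(term, tokenized.filter((tokens) => tokens.includes(term)).length);
  }

  return docs
    .map((doc, i) => {
      const tokens = tokenized[i] ?? [];
      let score = 0;
      for (const term of queryTerms) {
        const count = tokens.filter((t) => t === term).length;
        if (count === 0) continue;
        const tf = count / tokens.length;
        // Smoothed idf, so a term present in every document still counts a little.
        const idf = Math.log((1 + docs.length) / (1 + (df.get(term) ?? 0))) + 1;
        score += tf * idf;
      }
      return { path: doc.path, score };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score);
}

const ranked = rankByTfIdf("deploy production", [
  { path: "ops/deploy.md", text: "How to deploy the service to production" },
  { path: "hr/holidays.md", text: "Holiday calendar and leave policy" },
  { path: "ops/rollback.md", text: "Rollback steps after a failed deploy" },
]);
console.log(ranked.map((r) => r.path)); // ["ops/deploy.md", "ops/rollback.md"]
```

Because scoring is plain arithmetic over token counts, this kind of ranking needs neither a vector database nor a GPU, which fits the feature list above.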

Quick Start

# Install globally
npm install -g doc-hub-mcp

# Initialize a new project
dhm init

# Edit config.json to add your sources, then sync
dhm sync

# Build the search index
dhm index

# Run a search
dhm search "how to deploy to production"

# Start the Web UI
dhm web

# Or start the MCP server for AI agents
dhm serve

CLI Reference

dhm init                  # Create config.json + knowledge-base/
dhm sync [--incremental]  # Sync all sources; --incremental syncs only changes since the last run
dhm index                 # Rebuild .dhm-index.yaml
dhm search "query"        # Run 3-tier search
  --tier 1|2|3            #   Force tier
  --max 10                #   Max results
  --json                  #   JSON output
dhm serve                 # Start MCP server
dhm web                   # Start Web UI (http://localhost:3456)
dhm prewarm               # Pre-warm cache
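
The `--incremental` flag relies on per-source state tracking. The on-disk state format is not documented here, so the following is only a sketch of how such tracking could work: `SyncState`, `RemoteDoc`, `changedSince`, and `markSynced` are hypothetical names, and the state is assumed to be a map from source name to the timestamp of the last sync.

```typescript
// Hypothetical per-source sync state: source name -> ISO timestamp of the last sync.
type SyncState = Record<string, string | undefined>;
type RemoteDoc = { id: string; updatedAt: string };

// Return only the documents modified after the last recorded sync for this source.
function changedSince(state: SyncState, source: string, docs: RemoteDoc[]): RemoteDoc[] {
  const last = state[source];
  if (last === undefined) return docs; // no state yet: behave like a full sync
  const lastMs = Date.parse(last);
  return docs.filter((d) => Date.parse(d.updatedAt) > lastMs);
}

// Record a completed sync without mutating the previous state object.
function markSynced(state: SyncState, source: string, at: Date): SyncState {
  return { ...state, [source]: at.toISOString() };
}

const remoteDocs: RemoteDoc[] = [
  { id: "a", updatedAt: "2024-01-01T00:00:00Z" },
  { id: "b", updatedAt: "2024-03-01T00:00:00Z" },
];

let syncState: SyncState = {};
const firstRun = changedSince(syncState, "company-wiki", remoteDocs); // both documents
syncState = markSynced(syncState, "company-wiki", new Date("2024-02-01T00:00:00Z"));
const secondRun = changedSince(syncState, "company-wiki", remoteDocs); // only "b"
console.log(firstRun.length, secondRun.map((d) => d.id));
```

Keeping one timestamp per source means a failure in one adapter does not force a full re-sync of the others.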

Configuration

{
  "knowledgeBase": "./knowledge-base",
  "maxResults": 10,
  "minResults": 3,
  "cache": { "maxSize": 100, "ttl": 3600 },
  "prewarm": { "queries": ["deploy", "api", "config"] },
  "sources": [
    { "type": "git-wiki", "name": "company-wiki", "enabled": true, "url": "https://github.com/company/wiki.git", "branch": "main" },
    { "type": "google-docs", "name": "team-docs", "enabled": false, "credentialsFile": "./credentials.json", "folderId": "xxx" },
    { "type": "slack", "name": "engineering-slack", "enabled": false },
    { "type": "notion", "name": "product-docs", "enabled": false, "url": "your-database-id" },
    { "type": "confluence", "name": "company-confluence", "enabled": false, "url": "https://company.atlassian.net/wiki", "folderId": "SPACEKEY" }
  ]
}
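
The `cache` block above pairs a size limit with a TTL. As a rough illustration of that combination, here is a small LRU cache with expiry. It is a sketch under assumptions: `LruCache` is a hypothetical class, not the package's cache, and `ttl` is assumed to be in seconds.

```typescript
// Minimal LRU cache with TTL (hypothetical; assumes "ttl" is expressed in seconds).
class LruCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxSize: number;
  private ttlSeconds: number;

  constructor(maxSize: number, ttlSeconds: number) {
    this.maxSize = maxSize;
    this.ttlSeconds = ttlSeconds;
  }

  // "now" is injectable (milliseconds) so expiry can be exercised without waiting.
  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key); // expired entry
      return undefined;
    }
    // Re-insert to mark as most recently used; Map preserves insertion order.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxSize) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: now + this.ttlSeconds * 1000 });
  }
}

const cache = new LruCache<string>(2, 3600);
cache.set("deploy", "result-1", 0);
cache.set("api", "result-2", 0);
cache.get("deploy", 1);             // touch "deploy" so "api" becomes the oldest entry
cache.set("config", "result-3", 2); // evicts "api"
console.log(cache.get("api", 3));    // undefined
console.log(cache.get("deploy", 3)); // "result-1"
```

A `Map` is enough here because JavaScript maps iterate in insertion order, so the oldest key is always the first one returned.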

MCP Integration

Add the server to your MCP client configuration, for example Claude Desktop's claude_desktop_config.json or Cursor's mcp.json:

{
  "mcpServers": {
    "doc-hub-mcp": {
      "command": "npx",
      "args": ["doc-hub-mcp", "serve"],
      "cwd": "/path/to/your/doc-hub"
    }
  }
}

MCP Tools

| Tool | Description |
|---|---|
| search_knowledge | 3-tier search returning ranked results with snippets |
| read_document | Read full content of a document by relative path |
| get_document_structure | Browse directory tree, source list, and sync status |
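
MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such requests for two of the tools listed above. The argument names `query` and `path` are assumptions, since the tools' input schemas are not shown here; a client would normally discover them via `tools/list`. `buildToolCall` is a hypothetical helper.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
type ToolCallRequest = {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
};

// Hypothetical helper that assembles a tools/call request.
function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// "query" and "path" are assumed argument names, not confirmed by the tool schemas.
const searchRequest = buildToolCall(1, "search_knowledge", { query: "how to deploy to production" });
const readRequest = buildToolCall(2, "read_document", { path: "ops/deploy.md" });

console.log(JSON.stringify(searchRequest));
console.log(JSON.stringify(readRequest));
```

In practice an MCP client library handles this framing, so agents rarely build the payload by hand; the sketch only shows what travels over the wire.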

Environment Variables

| Variable | Adapter | Required |
|---|---|---|
| SLACK_BOT_TOKEN | Slack | Yes |
| NOTION_TOKEN | Notion | Yes |
| CONFLUENCE_URL | Confluence | Yes |
| CONFLUENCE_API_TOKEN | Confluence | Yes |
| GOOGLE_ACCESS_TOKEN | Google Docs | Yes |
| DHM_LOG_LEVEL | All | No (default: info) |
| DHM_PORT | Web UI | No (default: 3456) |
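
A quick pre-flight check can catch missing credentials before a sync starts. The sketch below maps the adapter `type` values from the configuration example to the required variables in the table above. `REQUIRED_ENV` and `missingEnv` are hypothetical names, and the empty list for `git-wiki` is an assumption based on the table listing no variable for it.

```typescript
// Required variables per adapter type, taken from the table above.
// "git-wiki" is assumed to need none because the table lists no variable for it.
const REQUIRED_ENV: Record<string, string[]> = {
  "git-wiki": [],
  "google-docs": ["GOOGLE_ACCESS_TOKEN"],
  slack: ["SLACK_BOT_TOKEN"],
  notion: ["NOTION_TOKEN"],
  confluence: ["CONFLUENCE_URL", "CONFLUENCE_API_TOKEN"],
};

type Source = { type: string; name: string; enabled: boolean };

// Return the names of required variables that are unset for the enabled sources.
function missingEnv(sources: Source[], env: Record<string, string | undefined>): string[] {
  const missing = new Set<string>();
  for (const source of sources) {
    if (!source.enabled) continue; // disabled sources need no credentials
    for (const name of REQUIRED_ENV[source.type] ?? []) {
      if (!env[name]) missing.add(name);
    }
  }
  return Array.from(missing);
}

const missing = missingEnv(
  [
    { type: "slack", name: "engineering-slack", enabled: true },
    { type: "notion", name: "product-docs", enabled: false },
    { type: "confluence", name: "company-confluence", enabled: true },
  ],
  { SLACK_BOT_TOKEN: "xoxb-example", CONFLUENCE_URL: "https://company.atlassian.net/wiki" },
);
console.log(missing); // ["CONFLUENCE_API_TOKEN"]
```

In a Node.js script the second argument would typically be `process.env`; a plain object is used here to keep the example self-contained.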

Docker

docker compose up -d

Search Architecture

Agent query
    │
    ├── Tier 1: Exact Match (ripgrep)         ~5ms   ← 80% of queries
    │       ↓ miss
    ├── Tier 2: Knowledge Graph (.dhm-index)  ~20ms  ← 15% of queries
    │       ↓ still insufficient
    └── Tier 3: AI Semantic (TF-IDF)          ~25ms  ←  5% of queries
            │
            ▼
      Deduplicate + Rank (Tier1=1.0, Tier2=0.8, Tier3=0.5)
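
The fall-through in the diagram can be sketched as a loop over tiers that stops once enough results are found, then deduplicates and ranks by tier weight. This is a simplified illustration, not the package's pipeline: `tieredSearch` and the `Hit`, `Ranked`, and `Tier` types are hypothetical, the tiers are stubs, and using `minResults` as the fall-through threshold is an assumption.

```typescript
type Hit = { path: string; snippet: string };
type Ranked = Hit & { tier: number; score: number };
type Tier = (query: string) => Hit[];

// Tier weights from the diagram: Tier1=1.0, Tier2=0.8, Tier3=0.5.
const TIER_WEIGHTS = [1.0, 0.8, 0.5];

// Run tiers in order; stop as soon as enough unique documents were found.
function tieredSearch(query: string, tiers: Tier[], minResults: number, maxResults: number): Ranked[] {
  const best = new Map<string, Ranked>();
  for (const [i, tier] of tiers.entries()) {
    for (const hit of tier(query)) {
      // Deduplicate by path: the earliest (highest-weight) tier wins.
      if (!best.has(hit.path)) {
        best.set(hit.path, { ...hit, tier: i + 1, score: TIER_WEIGHTS[i] ?? 0 });
      }
    }
    if (best.size >= minResults) break; // enough results, skip the slower tiers
  }
  return Array.from(best.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, maxResults);
}

// Stub tiers standing in for ripgrep, the YAML index, and TF-IDF.
let semanticCalls = 0;
const exact: Tier = () => [{ path: "a.md", snippet: "exact match" }];
const graph: Tier = () => [
  { path: "a.md", snippet: "duplicate of the exact match" },
  { path: "b.md", snippet: "related via the graph" },
];
const semantic: Tier = () => {
  semanticCalls += 1;
  return [{ path: "c.md", snippet: "semantically similar" }];
};

const results = tieredSearch("deploy", [exact, graph, semantic], 2, 10);
console.log(results.map((r) => `${r.path} (tier ${r.tier}, score ${r.score})`));
// ["a.md (tier 1, score 1)", "b.md (tier 2, score 0.8)"]
```

In this run Tier 3 is never called, because Tiers 1 and 2 already produced two unique documents.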

Project Structure

src/
├── cli/          # CLI commands + config loader (zod-validated)
├── mcp/          # MCP server + 3 tool implementations
├── search/       # 3-tier engine: exact, graph, semantic, cache, pipeline
├── index/        # YAML index builder + markdown parser
├── sync/         # 5 adapters: git-wiki, google-docs, slack, notion, confluence
├── web/          # Express server + browser UI
└── utils/        # FS, git, logger utilities

Documentation

Quickstart Guide — Set up and first sync in 5 minutes
Architecture Guide — Detailed design and data flow
Contributing Guide — Conventions, testing, PR process

License

MIT