

WSO2 Docs MCP Server


A production-ready Model Context Protocol (MCP) server that provides AI assistants (Claude Desktop, Claude Code, Cursor, VS Code) with semantic search over WSO2 documentation via Retrieval-Augmented Generation (RAG).

Under the hood, it uses a blazing-fast dual-ingestion engine:

  • GitHub Native: Fetches raw Markdown directly from WSO2's public GitHub repositories via the Git Trees API (avoids web-scraping noise and rate limits)
  • Web Crawl Fallback: For products without dedicated GitHub docs repos (like the WSO2 Library)
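
The GitHub-native path above boils down to two HTTP calls per repository: one Git Trees request to list every Markdown file, then raw-content fetches for each file. The TypeScript sketch below is illustrative only — the real logic lives in src/ingestion/githubFetcher.ts, and the branch name is an assumption:

// Simplified sketch of the GitHub-native ingestion path (not the actual GitHubDocFetcher).
// Assumes the docs repo ("wso2/docs-apim" as an example) uses a "main" branch.
interface GitTreeEntry {
  path: string;
  type: "blob" | "tree";
}

async function listMarkdownFiles(repo: string, branch = "main"): Promise<string[]> {
  // A single Git Trees API call returns the whole file tree — no HTML crawling needed.
  const res = await fetch(`https://api.github.com/repos/${repo}/git/trees/${branch}?recursive=1`);
  if (!res.ok) throw new Error(`Git Trees API failed: ${res.status}`);
  const { tree } = (await res.json()) as { tree: GitTreeEntry[] };
  return tree.filter((e) => e.type === "blob" && e.path.endsWith(".md")).map((e) => e.path);
}

async function fetchRawMarkdown(repo: string, path: string, branch = "main"): Promise<string> {
  // Raw Markdown is fetched directly, avoiding rendered-page noise.
  const res = await fetch(`https://raw.githubusercontent.com/${repo}/${branch}/${path}`);
  return res.text();
}

// Example: const files = await listMarkdownFiles("wso2/docs-apim");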

Architecture

flowchart TD
    %% Styling
    classDef github fill:#1f2328,color:#fff,stroke:#e1e4e8
    classDef default fill:#0969da,color:#fff,stroke:#0969da
    classDef db fill:#218bff,color:#fff,stroke:#218bff
    classDef web fill:#0969da,color:#fff,stroke:#0969da

    subgraph Sources ["Information Sources"]
        GH["GitHub Repos\n(wso2/docs-apim, etc.)"]:::github
        Web["WSO2 Websites\n(ballerina.io, lib)"]:::web
    end

    subgraph Ingestion ["Dual-Ingestion Pipeline"]
        A["GitHubDocFetcher\n(Git Trees API)"] 
        B["DocCrawler\n(HTML scraping)"]
        
        C["MarkdownParser\n(Front-matter & Heads)"]
        D["DocParser\n(Cheerio HTML parsing)"]
        
        E["DocChunker\n(Semantic Splitting)"]
    end

    subgraph Embedding ["Vectorization & Storage"]
        F["EmbedderFactory\n(Ollama / HuggingFace)"]
        G[("pgvector\n(PostgreSQL)")]:::db
    end

    %% Flow
    GH --> A
    Web --> B
    
    A -->|"Raw .md"| C
    B -->|"Clean HTML"| D
    
    C -->|"ParsedSection[]"| E
    D -->|"ParsedSection[]"| E
    
    E -->|"Tokens/Chunks"| F
    F -->|"768-dim Vectors"| G

Documentation Sources

| Product | URL |
|---|---|
| API Manager | https://apim.docs.wso2.com |
| Micro Integrator | https://mi.docs.wso2.com/en/4.4.0 |
| Choreo | https://wso2.com/choreo/docs |
| Ballerina | https://ballerina.io/learn |
| Ballerina Integrator | https://bi.docs.wso2.com |
| WSO2 Library | https://wso2.com/library |

Prerequisites

  • Node.js ≥ 20
  • Docker (for pgvector)
  • Embeddings — no API key required by default:
    • Ollama (recommended) — runs locally, model auto-downloaded on first run
    • If Ollama is not running, the server automatically falls back to HuggingFace ONNX (in-process, also downloads automatically)
    • Cloud providers are also supported: OpenAI, Google Gemini, Voyage AI

Quick Start

Choose the setup path that fits your use case:


Install from npm

Install the package globally to get the wso2-docs-mcp-server, wso2-docs-crawl, and wso2-docs-migrate commands available system-wide:

npm install -g wso2-docs-mcp-server

Prefer no global install? You can use npx wso2-docs-mcp-server, npx wso2-docs-crawl, and npx wso2-docs-migrate in every step below — just replace the bare command with its npx equivalent.

1. Start pgvector

Download the docker-compose.yml and start the database:

curl -O https://raw.githubusercontent.com/iamvirul/wso2-docs-mcp-server/main/docker-compose.yml
docker compose up -d

2. Start Ollama (optional but recommended)

Install Ollama and pull the default embedding model:

ollama pull nomic-embed-text
ollama serve

No Ollama? Skip this step. The server automatically falls back to HuggingFace ONNX — model downloads on first use with no extra setup.

3. Run database migration

DATABASE_URL="postgresql://wso2mcp:wso2mcp@localhost:5432/wso2docs" \
  wso2-docs-migrate

Run the migration again whenever you change EMBEDDING_DIMENSIONS (e.g. when switching embedding providers). The script detects and handles dimension changes automatically.

4. Index WSO2 documentation

# Index all products (first run downloads the embedding model automatically)
DATABASE_URL="postgresql://wso2mcp:wso2mcp@localhost:5432/wso2docs" \
  wso2-docs-crawl

# Index a single product (faster, great for testing)
DATABASE_URL="postgresql://wso2mcp:wso2mcp@localhost:5432/wso2docs" \
  wso2-docs-crawl --product ballerina --limit 20

# Force re-index even unchanged pages
DATABASE_URL="postgresql://wso2mcp:wso2mcp@localhost:5432/wso2docs" \
  wso2-docs-crawl --force

Available product IDs: apim, mi, choreo, ballerina, bi, library

5. Configure your AI client

The MCP server is launched on demand by your AI client — no background process needed.

Claude Desktop — edit ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "wso2-docs": {
      "command": "wso2-docs-mcp-server",
      "env": {
        "DATABASE_URL": "postgresql://wso2mcp:wso2mcp@localhost:5432/wso2docs",
        "EMBEDDING_PROVIDER": "ollama"
      }
    }
  }
}

Claude Code — run once in your terminal:

claude mcp add wso2-docs \
  --transport stdio \
  -e DATABASE_URL="postgresql://wso2mcp:wso2mcp@localhost:5432/wso2docs" \
  -e EMBEDDING_PROVIDER="ollama" \
  -- wso2-docs-mcp-server

# Verify
claude mcp list

Cursor — create .cursor/mcp.json in your project root:

{
  "mcpServers": {
    "wso2-docs": {
      "command": "wso2-docs-mcp-server",
      "env": {
        "DATABASE_URL": "postgresql://wso2mcp:wso2mcp@localhost:5432/wso2docs",
        "EMBEDDING_PROVIDER": "ollama"
      }
    }
  }
}

VS Code — create .vscode/mcp.json:

{
  "servers": {
    "wso2-docs": {
      "type": "stdio",
      "command": "wso2-docs-mcp-server",
      "env": {
        "DATABASE_URL": "postgresql://wso2mcp:wso2mcp@localhost:5432/wso2docs",
        "EMBEDDING_PROVIDER": "ollama"
      }
    }
  }
}

Using npx instead of global install? Replace "command": "wso2-docs-mcp-server" with "command": "npx" and add "args": ["-y", "wso2-docs-mcp-server"].

Cloud embedding provider? Add the key to env, e.g. "EMBEDDING_PROVIDER": "openai", "OPENAI_API_KEY": "sk-...".


Clone and build

1. Clone and install

git clone https://github.com/iamvirul/wso2-docs-mcp-server.git
cd wso2-docs-mcp-server
npm install

2. Start Ollama (optional but recommended)

Install Ollama and start it:

ollama serve

No Ollama? Skip this step. The server detects Ollama is not running and automatically falls back to HuggingFace ONNX inference — the model downloads on first use with no extra setup.

3. Configure environment

cp .env.example .env
# Defaults work out of the box with Ollama.
# Only edit if using a cloud provider (OpenAI / Gemini / Voyage).

4. Start pgvector

docker compose up -d
# pgAdmin available at http://localhost:5050 ([email protected] / admin)

5. Run database migration

npm run db:migrate

Note: Run the migration again whenever you change EMBEDDING_DIMENSIONS (e.g. when switching embedding providers). The script detects and handles dimension changes automatically.

6. Index documentation

# Index all products
# On first run the embedding model is downloaded automatically (Ollama or HuggingFace)
npm run crawl

# Index a single product (faster, great for testing)
npm run crawl -- --product ballerina --limit 20

# Force re-index even unchanged pages
npm run crawl -- --force

7. Build and start the MCP server

npm run build
npm start

For development (no build step):

npm run dev

8. Configure your AI client

Replace /ABSOLUTE/PATH/TO/wso2-docs-mcp-server with your actual clone path.

Claude Desktop — edit ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "wso2-docs": {
      "command": "node",
      "args": ["/ABSOLUTE/PATH/TO/wso2-docs-mcp-server/dist/src/index.js"],
      "env": {
        "DATABASE_URL": "postgresql://wso2mcp:wso2mcp@localhost:5432/wso2docs",
        "EMBEDDING_PROVIDER": "ollama"
      }
    }
  }
}

Claude Code:

claude mcp add wso2-docs \
  --transport stdio \
  -e DATABASE_URL="postgresql://wso2mcp:wso2mcp@localhost:5432/wso2docs" \
  -e EMBEDDING_PROVIDER="ollama" \
  -- node "/ABSOLUTE/PATH/TO/wso2-docs-mcp-server/dist/src/index.js"

# Verify
claude mcp list

See config-examples/claude_code.sh for a convenience script.

Cursor — create .cursor/mcp.json — see config-examples/cursor_mcp.json.

VS Code — create .vscode/mcp.json — see config-examples/vscode_mcp.json.


MCP Tools

| Tool | Description |
|---|---|
| search_wso2_docs | Semantic search across all products. Optional product and limit filters. |
| get_wso2_guide | Search within a specific product (apim, mi, choreo, ballerina, bi, library). |
| explain_wso2_concept | Broad concept search across all products, returns the top 8 results. |
| list_wso2_products | Returns all supported products with IDs and base URLs. |
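
Outside of an AI client, the tools can also be exercised directly with the MCP TypeScript SDK. This is only a sketch for manual testing — the query argument name is an assumption; check the input schema the server advertises via tools/list:

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Launch the server over stdio, exactly as an AI client would.
const transport = new StdioClientTransport({
  command: "wso2-docs-mcp-server",
  env: { DATABASE_URL: "postgresql://wso2mcp:wso2mcp@localhost:5432/wso2docs" },
});

const client = new Client({ name: "wso2-docs-smoke-test", version: "0.0.1" });
await client.connect(transport);

// "product" and "limit" are the optional filters listed above; "query" is an assumed name.
const result = await client.callTool({
  name: "search_wso2_docs",
  arguments: { query: "API Manager deployment patterns", product: "apim", limit: 5 },
});
console.log(JSON.stringify(result, null, 2));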

Example response

[
  {
    "title": "Deploying WSO2 API Manager",
    "snippet": "WSO2 API Manager can be deployed in various topologies…",
    "source_url": "https://apim.docs.wso2.com/en/latest/install-and-setup/...",
    "product": "apim",
    "section": "Deployment Patterns",
    "score": 0.8712
  }
]

Local Embeddings

The default EMBEDDING_PROVIDER=ollama runs entirely on your machine with no API key. The startup sequence is:

Is Ollama running?
├── Yes → Is model present?
│         ├── Yes → Ready (instant)
│         └── No  → Pull via Ollama (streamed, runs once)
└── No  → Download ONNX model from HuggingFace Hub (~250 MB, cached after first run)
           and run inference in-process via @huggingface/transformers

Both paths use nomic-embed-text / Xenova/nomic-embed-text-v1 by default and produce identical 768-dim vectors, so you can switch between them without re-indexing.
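
A minimal sketch of that decision, assuming Ollama's /api/embeddings endpoint and the @huggingface/transformers feature-extraction pipeline (the actual EmbedderFactory differs in detail and handles batching, retries, and model pulls):

import { pipeline } from "@huggingface/transformers";

const OLLAMA_URL = process.env.OLLAMA_BASE_URL ?? "http://localhost:11434";

// Cheap liveness probe: /api/tags only answers when the Ollama daemon is running.
async function ollamaIsRunning(): Promise<boolean> {
  try {
    return (await fetch(`${OLLAMA_URL}/api/tags`)).ok;
  } catch {
    return false;
  }
}

async function embed(text: string): Promise<number[]> {
  if (await ollamaIsRunning()) {
    // Ollama path (the real server also pulls the model first if it is missing).
    const res = await fetch(`${OLLAMA_URL}/api/embeddings`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
    });
    const { embedding } = (await res.json()) as { embedding: number[] };
    return embedding;
  }
  // Fallback: in-process ONNX inference; the model is downloaded and cached on first use.
  const extractor = await pipeline("feature-extraction", "Xenova/nomic-embed-text-v1");
  const output = await extractor(text, { pooling: "mean", normalize: true });
  return Array.from(output.data as Float32Array);
}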

Hardware acceleration (HuggingFace ONNX fallback)

When Ollama is not available, the server auto-detects the best compute backend:

| Machine | Detection | ONNX dtype | Batch size | Throughput |
|---|---|---|---|---|
| Apple Silicon (M1/M2/M3/M4) | process.arch === 'arm64' | q8 INT8 | 32 | ~9 ms/chunk |
| NVIDIA GPU | nvidia-smi probe | fp32 | 64 | GPU-dependent |
| All others | fallback | q8 INT8 | 16 | ~10 ms/chunk |

Why q8 on Apple Silicon instead of CoreML/Metal? CoreML compiles Metal shaders on first use (~20 min cold-start). For the typical chunk sizes produced by this server (6–20 chunks per page), the CPU↔GPU transfer overhead eliminates any inference gain. INT8 quantized inference on ARM NEON SIMD is consistently ~100× faster than fp32 CPU with zero cold-start cost.

Benchmark (Apple M-chip, Xenova/nomic-embed-text-v1):

fp32 CPU (before): ~1,000 ms/chunk   (68 chunks ≈ 68 s of embedding)
q8  ARM NEON:          ~9 ms/chunk   (68 chunks ≈  0.6 s of embedding)  ← ~100× speedup

Note: For small crawls (≤ 10 pages) total wall-clock time is dominated by network I/O (HTTPS fetches to docs sites), so the end-to-end improvement is modest. The embedding speedup becomes significant at scale — crawling 500+ pages where embedding previously accounted for hours of runtime. For best crawl performance, run Ollama (ollama serve) which parallelises inference natively and has no per-chunk overhead.
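
For reference, the backend selection from the table above can be approximated as follows (an illustrative sketch, not the project's actual detection code):

import { execSync } from "node:child_process";

type Backend = { dtype: "q8" | "fp32"; batchSize: number };

function detectBackend(): Backend {
  // Apple Silicon: INT8 on ARM NEON avoids the CoreML/Metal cold-start penalty.
  if (process.arch === "arm64") {
    return { dtype: "q8", batchSize: 32 };
  }
  // NVIDIA GPU: probe for nvidia-smi; fall through if it is not installed.
  try {
    execSync("nvidia-smi", { stdio: "ignore" });
    return { dtype: "fp32", batchSize: 64 };
  } catch {
    // No GPU detected.
  }
  // Default: quantized CPU inference.
  return { dtype: "q8", batchSize: 16 };
}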


Environment Variables

Core

| Variable | Default | Description |
|---|---|---|
| DATABASE_URL | — | PostgreSQL connection string (required) |
| EMBEDDING_PROVIDER | ollama | One of ollama / openai / gemini / voyage |
| EMBEDDING_DIMENSIONS | 768 | Must match model output dimensions |
| CRAWL_CONCURRENCY | 5 | Concurrent HTTP requests during crawl |
| CHUNK_SIZE | 800 | Approximate tokens per chunk |
| CHUNK_OVERLAP | 100 | Overlap tokens between chunks |
| CACHE_TTL_SECONDS | 3600 | In-memory query cache TTL |
| TOP_K_RESULTS | 10 | Default search result count |
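
To make CHUNK_SIZE and CHUNK_OVERLAP concrete, here is a rough word-based approximation of overlapping chunking — the real DocChunker splits semantically on headings and token counts, so treat this purely as an illustration:

// Rough illustration of overlapping chunking; word counts stand in for tokens here.
function chunkText(text: string, chunkSize = 800, overlap = 100): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += chunkSize - overlap) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break; // last window reached
  }
  return chunks;
}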

Ollama (default)

| Variable | Default | Description |
|---|---|---|
| OLLAMA_BASE_URL | http://localhost:11434 | Ollama server URL |
| OLLAMA_EMBEDDING_MODEL | nomic-embed-text | Model pulled and used via Ollama |
| HUGGINGFACE_EMBEDDING_MODEL | Xenova/nomic-embed-text-v1 | ONNX fallback when Ollama is not running |

Cloud providers

| Variable | Default | Description |
|---|---|---|
| OPENAI_API_KEY | — | Required if EMBEDDING_PROVIDER=openai |
| OPENAI_EMBEDDING_MODEL | text-embedding-3-small | OpenAI model |
| GEMINI_API_KEY | — | Required if EMBEDDING_PROVIDER=gemini |
| GEMINI_EMBEDDING_MODEL | text-embedding-004 | Gemini model |
| VOYAGE_API_KEY | — | Required if EMBEDDING_PROVIDER=voyage |
| VOYAGE_EMBEDDING_MODEL | voyage-3 | Voyage model |

Embedding dimension reference

| Provider | Model | Dimensions |
|---|---|---|
| Ollama / HuggingFace | nomic-embed-text / Xenova/nomic-embed-text-v1 | 768 (default) |
| Ollama / HuggingFace | mxbai-embed-large / Xenova/mxbai-embed-large-v1 | 1024 |
| Ollama / HuggingFace | all-minilm / Xenova/all-MiniLM-L6-v2 | 384 |
| OpenAI | text-embedding-3-small | 1536 |
| OpenAI | text-embedding-3-large | 3072 |
| Gemini | text-embedding-004 | 768 |
| Voyage | voyage-3 | 1024 |
| Voyage | voyage-3-lite | 512 |


Scheduled Re-indexing

# Run a one-off re-index (checks hashes, skips unchanged pages)
npm run reindex

# Or from the project directory using node-cron (runs daily at 2 AM)
DATABASE_URL=... node -e "
  const { ReindexJob } = require('./dist/jobs/reindexDocs');
  const job = new ReindexJob();
  job.initialize().then(() => job.scheduleDaily());
"

Project Structure

src/
  config/          env.ts · constants.ts
  vectorstore/     pgvector.ts · schema.sql
  ingestion/       crawler.ts · parser.ts · githubFetcher.ts · markdownParser.ts · chunker.ts · embedder.ts
  server/          mcpServer.ts · toolRegistry.ts
  jobs/            reindexDocs.ts
  index.ts
scripts/
  crawl.ts         CLI ingestion pipeline
  migrate.ts       Dynamic schema migration
config-examples/   claude_desktop.json · claude_code.sh · cursor_mcp.json · vscode_mcp.json
docker-compose.yml
.env.example

Development

# Type-check
npx tsc --noEmit

# Run crawl with tsx (no build needed)
npm run crawl -- --product ballerina --limit 5

# Run server in dev mode
npm run dev