mcp-external-expert
v0.2.1
MCP External Expert Server
An MCP (Model Context Protocol) server that allows a primary LLM to delegate sub-tasks to external expert models via APIs (Ollama, OpenAI-compatible).
Think of it like "phone a friend" - when the primary model needs help with planning, critique, or reasoning, it can call external expert models for assistance.
Installation
Install from npmjs.com:
```
npm install -g mcp-external-expert
```
That's it! The package is ready to use.
Purpose
This server enables:
- Primary LLMs (e.g., Qwen3 Coder, Claude, GPT-4) to delegate planning, critique, testing, and explanation tasks to external expert models
- Avoids unloading/cache loss on the primary llama-server
- Supports routing to Ollama or OpenAI-compatible endpoints
- Configurable via environment variables
- Supports both STDIO (for desktop tools) and HTTP (for remote/shared usage)
Configuration
The server can be configured two ways:
- Environment variables (set in your shell or system)
- A `.env` file (recommended for local development; automatically loaded)
Using .env File (Recommended)
Copy the example file:
```
cp .env.example .env
```
Edit `.env` with your settings:
```
DELEGATE_PROVIDER=ollama
DELEGATE_BASE_URL=http://localhost:11434
DELEGATE_MODEL=qwen2.5:14b-instruct
DELEGATE_API_KEY=your-api-key-here
```
The .env file is gitignored and will not be committed to version control.
Environment Variables
Provider Selection
```
# In .env file or as environment variables:
DELEGATE_PROVIDER=ollama | openai_compat
DELEGATE_BASE_URL=http://host:port
DELEGATE_MODEL=model-name
```
OpenAI-compatible Only
```
# In .env file or as environment variables:
DELEGATE_API_KEY=sk-...
DELEGATE_OPENAI_PATH=/v1/chat/completions
```
Behavior
```
# Timeout for API calls in milliseconds (default: 60000 = 60 seconds)
# Increase this if your Ollama server is slow (e.g., 300000 for 5 minutes)
DELEGATE_TIMEOUT_MS=60000
DELEGATE_MAX_TOKENS=800
DELEGATE_TEMPERATURE=0.2
```
Optional Per-Mode System Prompts
```
DELEGATE_SYSTEM_PLAN="..."
DELEGATE_SYSTEM_CRITIC="..."
DELEGATE_SYSTEM_TESTS="..."
DELEGATE_SYSTEM_EXPLAIN="..."
```
MCP Transport Toggles
```
MCP_HTTP=true
MCP_HTTP_PORT=3333
MCP_STDIO=true  # default
```
Usage
Development
For local development, clone the repository and install dependencies:
```
git clone <repository-url>
cd mcp-external-expert-server
npm install
npm run dev
```
Production
```
npm run build
npm start
```
Testing
```
# Run tests
npm test

# Run tests in watch mode
npm run test:watch

# Run tests with coverage
npm run test:coverage
```
Example Runs
Using .env File (Recommended)
Create a `.env` file with your configuration:
```
cp .env.example .env
# Edit .env with your settings
```
Run the server:
```
npm start
```
Using Environment Variables
Ollama Helper (Remote Box)
```
DELEGATE_PROVIDER=ollama \
DELEGATE_BASE_URL=http://ollama-box:11434 \
DELEGATE_MODEL=qwen2.5:14b-instruct \
npm start
```
llama-server OpenAI API
```
DELEGATE_PROVIDER=openai_compat \
DELEGATE_BASE_URL=http://localhost:8080 \
DELEGATE_MODEL=qwen2.5:14b-instruct \
DELEGATE_API_KEY="" \
npm start
```
Enable HTTP MCP
```
MCP_HTTP=true MCP_HTTP_PORT=3333 npm start
```
Note: Environment variables set on the command line will override values in `.env` files.
Exposed MCP Tool
Tool: delegate
Delegates a subtask to an external expert model.
Input Schema:
```json
{
  "mode": "plan | review | challenge | explain | tests",
  "input": "string (required)",
  "context": "string (optional)",
  "maxChars": "number (optional, default 12000)"
}
```
Modes:
- plan → step-by-step plan + assumptions + risks
- review → code review: identify bugs, quality issues, and provide fixes (code-specific)
- challenge → devil's advocate: challenge ideas and find flaws in any concept/proposal (general)
- tests → test checklist + edge cases
- explain → concise explanation
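As a sketch, a client invokes this tool through a standard MCP `tools/call` JSON-RPC request. The argument values below are hypothetical examples, not output from this server:

```typescript
// Hypothetical JSON-RPC 2.0 request a client would send to call the
// `delegate` tool; field names follow the input schema above.
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "delegate",
    arguments: {
      mode: "plan",
      input: "Refactor the auth module to support OAuth2",
      context: "Express app in TypeScript with session-based auth",
      maxChars: 8000,
    },
  },
};

console.log(JSON.stringify(request));
```

The helper's reply comes back as plain text in the tool result, so the primary model can consume it without any extra parsing.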
Supported Providers
1. Ollama (Recommended)
- Keeps a helper model warm on a separate machine
- No auth complexity
- No impact on primary llama.cpp cache
Uses: POST /api/chat
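The body posted to Ollama's `/api/chat` looks roughly like this. A sketch only: the `options` field names come from Ollama's API, while the model name and prompt content are illustrative and the exact mapping from env vars is an assumption:

```typescript
// Illustrative /api/chat request body for Ollama. `stream: false` makes
// the reply arrive as a single JSON object instead of chunked lines.
const ollamaBody = {
  model: "qwen2.5:14b-instruct",
  messages: [
    { role: "system", content: "You are an expert planning assistant." },
    { role: "user", content: "Plan the migration to OAuth2." },
  ],
  stream: false,
  options: {
    temperature: 0.2, // from DELEGATE_TEMPERATURE (assumed mapping)
    num_predict: 800, // Ollama's output-token cap; from DELEGATE_MAX_TOKENS (assumed)
  },
};

console.log(ollamaBody.model);
```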
2. OpenAI-compatible Endpoints
Works with:
- OpenAI
- llama-server (`--api`)
- LiteLLM
- vLLM OpenAI shims
Uses: POST /v1/chat/completions
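Either way, a delegate call ultimately becomes one chat completion. The mapping can be sketched like this; `toChatPayload`, the system-prompt fallback, and the context concatenation are illustrative, not the server's exact implementation:

```typescript
type DelegateArgs = { mode: string; input: string; context?: string };

// Illustrative mapping from delegate arguments to a /v1/chat/completions
// body. The DELEGATE_SYSTEM_* lookup mirrors the per-mode prompts above.
function toChatPayload(args: DelegateArgs, model: string) {
  const system =
    process.env[`DELEGATE_SYSTEM_${args.mode.toUpperCase()}`] ??
    `You are an expert assistant. Mode: ${args.mode}.`;
  const user = args.context
    ? `Context:\n${args.context}\n\nTask:\n${args.input}`
    : args.input;
  return {
    model,
    messages: [
      { role: "system", content: system },
      { role: "user", content: user },
    ],
    max_tokens: Number(process.env.DELEGATE_MAX_TOKENS ?? 800),
    temperature: Number(process.env.DELEGATE_TEMPERATURE ?? 0.2),
  };
}

const payload = toChatPayload(
  { mode: "plan", input: "Add retry logic to the HTTP client" },
  "qwen2.5:14b-instruct"
);
console.log(payload.messages.length); // 2
```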
Transport Modes
STDIO (Default)
Used by:
- Cursor
- Goose Desktop
- Claude Desktop
- Other MCP desktop tools
JSON-RPC over stdin/stdout.
HTTP MCP (Optional)
- Long-running server
- Shared across machines
- Keeps helper model hot
- Supports both regular HTTP POST and SSE (Server-Sent Events) streaming
- CORS enabled for web-based clients (MCP Inspector, etc.)
Endpoints:
- POST /mcp - Main MCP endpoint (JSON-RPC)
- GET/POST /sse - SSE streaming endpoint
- GET/POST /mcp - Also supports SSE streaming
This is MCP over HTTP using the Streamable HTTP transport specification, which supports:
- Regular HTTP POST requests (JSON-RPC)
- SSE (Server-Sent Events) for streaming responses
- CORS headers for browser-based clients
- Compatible with MCP Inspector, Goose Desktop, Cursor, and other MCP clients
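A minimal sketch of a client request against the HTTP endpoint (default port 3333, per the toggles above). The `Accept` header advertises both JSON and SSE, since the Streamable HTTP transport allows either response style:

```typescript
// Hypothetical JSON-RPC request to POST /mcp; values are examples only.
const body = JSON.stringify({
  jsonrpc: "2.0",
  id: 1,
  method: "tools/list",
  params: {},
});

const init = {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
  },
  body,
};

// With the server running:
//   const res = await fetch("http://localhost:3333/mcp", init);
console.log(init.method, "/mcp");
```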
Security Notes
- HTTP mode should be LAN-only or behind auth
- Delegated prompts may contain sensitive code
- STDIO mode is safest by default
- Secrets in input are automatically redacted
Design Notes
- The helper model must not call tools recursively
- The helper model output is returned as plain text
- The main model decides when to delegate (like "phoning a friend" when it needs help)
- Delegation should be used sparingly (planning, critique, validation)
- This avoids KV cache eviction on the primary inference host
- The helper model is completely isolated - it only sees what the primary model explicitly passes to it
