mcp-external-expert
v0.2.1
MCP External Expert Server
An MCP (Model Context Protocol) server that allows a primary LLM to delegate sub-tasks to external expert models via APIs (Ollama, OpenAI-compatible).
Think of it like "phone a friend" - when the primary model needs help with planning, critique, or reasoning, it can call external expert models for assistance.
Installation
Install from npmjs.com:
```
npm install -g mcp-external-expert
```
That's it! The package is ready to use.
Purpose
This server enables:
- Primary LLMs (e.g., Qwen3 Coder, Claude, GPT-4) to delegate planning, critique, testing, and explanation tasks to external expert models
- Avoids unloading/cache loss on the primary llama-server
- Supports routing to Ollama or OpenAI-compatible endpoints
- Configurable via environment variables
- Supports both STDIO (for desktop tools) and HTTP (for remote/shared usage)
Configuration
The server can be configured two ways:
- Environment variables (set in your shell or system)
- A `.env` file (recommended for local development; automatically loaded)
Using .env File (Recommended)
Copy the example file:
```
cp .env.example .env
```
Edit `.env` with your settings:
```
DELEGATE_PROVIDER=ollama
DELEGATE_BASE_URL=http://localhost:11434
DELEGATE_MODEL=qwen2.5:14b-instruct
DELEGATE_API_KEY=your-api-key-here
```
The .env file is gitignored and will not be committed to version control.
Environment Variables
Provider Selection
```
# In .env file or as environment variables:
DELEGATE_PROVIDER=ollama | openai_compat
DELEGATE_BASE_URL=http://host:port
DELEGATE_MODEL=model-name
```
OpenAI-compatible Only
```
# In .env file or as environment variables:
DELEGATE_API_KEY=sk-...
DELEGATE_OPENAI_PATH=/v1/chat/completions
```
Behavior
```
# Timeout for API calls in milliseconds (default: 60000 = 60 seconds)
# Increase this if your Ollama server is slow (e.g., 300000 for 5 minutes)
DELEGATE_TIMEOUT_MS=60000
DELEGATE_MAX_TOKENS=800
DELEGATE_TEMPERATURE=0.2
```
Optional Per-Mode System Prompts
```
DELEGATE_SYSTEM_PLAN="..."
DELEGATE_SYSTEM_CRITIC="..."
DELEGATE_SYSTEM_TESTS="..."
DELEGATE_SYSTEM_EXPLAIN="..."
```
MCP Transport Toggles
```
MCP_HTTP=true
MCP_HTTP_PORT=3333
MCP_STDIO=true  # default
```
Usage
Development
For local development, clone the repository and install dependencies:
```
git clone <repository-url>
cd mcp-external-expert-server
npm install
npm run dev
```
Production
```
npm run build
npm start
```
Testing
```
# Run tests
npm test

# Run tests in watch mode
npm run test:watch

# Run tests with coverage
npm run test:coverage
```
Example Runs
Using .env File (Recommended)
Create a `.env` file with your configuration:
```
cp .env.example .env
# Edit .env with your settings
```
Run the server:
```
npm start
```
Using Environment Variables
Ollama Helper (Remote Box)
```
DELEGATE_PROVIDER=ollama \
DELEGATE_BASE_URL=http://ollama-box:11434 \
DELEGATE_MODEL=qwen2.5:14b-instruct \
npm start
```
llama-server OpenAI API
```
DELEGATE_PROVIDER=openai_compat \
DELEGATE_BASE_URL=http://localhost:8080 \
DELEGATE_MODEL=qwen2.5:14b-instruct \
DELEGATE_API_KEY="" \
npm start
```
Enable HTTP MCP
```
MCP_HTTP=true MCP_HTTP_PORT=3333 npm start
```
Note: Environment variables set on the command line will override values in `.env` files.
Exposed MCP Tool
Tool: delegate
Delegates a subtask to an external expert model.
Input Schema:
```json
{
  "mode": "plan | review | challenge | explain | tests",
  "input": "string (required)",
  "context": "string (optional)",
  "maxChars": "number (optional, default 12000)"
}
```
Modes:
- plan → step-by-step plan + assumptions + risks
- review → code review: identify bugs, quality issues, and provide fixes (code-specific)
- challenge → devil's advocate: challenge ideas and find flaws in any concept/proposal (general)
- tests → test checklist + edge cases
- explain → concise explanation
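As a sketch, a client invokes this tool through a standard MCP `tools/call` JSON-RPC request. The argument values below are hypothetical examples, not output from this server:

```typescript
// Hypothetical JSON-RPC 2.0 request a client would send to call the
// `delegate` tool; field names follow the input schema above.
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "delegate",
    arguments: {
      mode: "plan",
      input: "Refactor the auth module to support OAuth2",
      context: "Express app in TypeScript with session-based auth",
      maxChars: 8000,
    },
  },
};

console.log(JSON.stringify(request));
```

The helper's reply comes back as plain text in the tool result, so the primary model can consume it without any extra parsing.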
Supported Providers
1. Ollama (Recommended)
- Keeps a helper model warm on a separate machine
- No auth complexity
- No impact on primary llama.cpp cache
Uses: POST /api/chat
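The body posted to Ollama's `/api/chat` looks roughly like this. A sketch only: the `options` field names come from Ollama's API, while the model name and prompt content are illustrative and the exact mapping from env vars is an assumption:

```typescript
// Illustrative /api/chat request body for Ollama. `stream: false` makes
// the reply arrive as a single JSON object instead of chunked lines.
const ollamaBody = {
  model: "qwen2.5:14b-instruct",
  messages: [
    { role: "system", content: "You are an expert planning assistant." },
    { role: "user", content: "Plan the migration to OAuth2." },
  ],
  stream: false,
  options: {
    temperature: 0.2, // from DELEGATE_TEMPERATURE (assumed mapping)
    num_predict: 800, // Ollama's output-token cap; from DELEGATE_MAX_TOKENS (assumed)
  },
};

console.log(ollamaBody.model);
```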
2. OpenAI-compatible Endpoints
Works with:
- OpenAI
- llama-server (`--api`)
- LiteLLM
- vLLM OpenAI shims
Uses: POST /v1/chat/completions
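Either way, a delegate call ultimately becomes one chat completion. The mapping can be sketched like this; `toChatPayload`, the system-prompt fallback, and the context concatenation are illustrative, not the server's exact implementation:

```typescript
type DelegateArgs = { mode: string; input: string; context?: string };

// Illustrative mapping from delegate arguments to a /v1/chat/completions
// body. The DELEGATE_SYSTEM_* lookup mirrors the per-mode prompts above.
function toChatPayload(args: DelegateArgs, model: string) {
  const system =
    process.env[`DELEGATE_SYSTEM_${args.mode.toUpperCase()}`] ??
    `You are an expert assistant. Mode: ${args.mode}.`;
  const user = args.context
    ? `Context:\n${args.context}\n\nTask:\n${args.input}`
    : args.input;
  return {
    model,
    messages: [
      { role: "system", content: system },
      { role: "user", content: user },
    ],
    max_tokens: Number(process.env.DELEGATE_MAX_TOKENS ?? 800),
    temperature: Number(process.env.DELEGATE_TEMPERATURE ?? 0.2),
  };
}

const payload = toChatPayload(
  { mode: "plan", input: "Add retry logic to the HTTP client" },
  "qwen2.5:14b-instruct"
);
console.log(payload.messages.length); // 2
```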
Transport Modes
STDIO (Default)
Used by:
- Cursor
- Goose Desktop
- Claude Desktop
- Other MCP desktop tools
JSON-RPC over stdin/stdout.
HTTP MCP (Optional)
- Long-running server
- Shared across machines
- Keeps helper model hot
- Supports both regular HTTP POST and SSE (Server-Sent Events) streaming
- CORS enabled for web-based clients (MCP Inspector, etc.)
Endpoints:
- POST /mcp - Main MCP endpoint (JSON-RPC)
- GET/POST /sse - SSE streaming endpoint
- GET/POST /mcp - Also supports SSE streaming
This is MCP over HTTP using the Streamable HTTP transport specification, which supports:
- Regular HTTP POST requests (JSON-RPC)
- SSE (Server-Sent Events) for streaming responses
- CORS headers for browser-based clients
- Compatible with MCP Inspector, Goose Desktop, Cursor, and other MCP clients
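A minimal sketch of a client request against the HTTP endpoint (default port 3333, per the toggles above). The `Accept` header advertises both JSON and SSE, since the Streamable HTTP transport allows either response style:

```typescript
// Hypothetical JSON-RPC request to POST /mcp; values are examples only.
const body = JSON.stringify({
  jsonrpc: "2.0",
  id: 1,
  method: "tools/list",
  params: {},
});

const init = {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
  },
  body,
};

// With the server running:
//   const res = await fetch("http://localhost:3333/mcp", init);
console.log(init.method, "/mcp");
```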
Security Notes
- HTTP mode should be LAN-only or behind auth
- Delegated prompts may contain sensitive code
- STDIO mode is safest by default
- Secrets in input are automatically redacted
Design Notes
- The helper model must not call tools recursively
- The helper model output is returned as plain text
- The main model decides when to delegate (like "phoning a friend" when it needs help)
- Delegation should be used sparingly (planning, critique, validation)
- This avoids KV cache eviction on the primary inference host
- The helper model is completely isolated - it only sees what the primary model explicitly passes to it
