delphi-mcp
Delphi MCP Server - Multi-model AI consensus for complex questions. Query Claude, GPT-5, Gemini, DeepSeek simultaneously and synthesize diverse perspectives.
The Problem
Complex questions rarely have simple answers. A single AI model gives you one perspective shaped by its training data and architecture. For nuanced topics—technical trade-offs, research questions, multi-faceted decisions—one viewpoint isn't enough.
The insight: Different AI models reason differently. When multiple models independently arrive at the same conclusion, you can trust it. When they disagree, you've found genuine complexity worth exploring.
Requirements
- Node.js 18+
- OpenRouter API key — Get one at openrouter.ai/keys ($5-10 credit is plenty to start)
- Claude Desktop or any MCP-compatible client
Quick Start
```bash
npm install -g delphi-mcp
```
Add to Claude Desktop config:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
```json
{
  "mcpServers": {
    "delphi": {
      "command": "delphi-mcp",
      "env": { "OPENROUTER_API_KEY": "sk-or-v1-your-key" }
    }
  }
}
```
Restart Claude Desktop. Done.
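If you prefer not to install globally, an npx-based config is a common alternative for MCP servers. This is a sketch, assuming the published package exposes the same `delphi-mcp` binary that the global install provides:

```json
{
  "mcpServers": {
    "delphi": {
      "command": "npx",
      "args": ["-y", "delphi-mcp"],
      "env": { "OPENROUTER_API_KEY": "sk-or-v1-your-key" }
    }
  }
}
```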
How It Works
- Independent responses — Each model answers without seeing others
- Revision rounds — Models see the synthesis and can revise or challenge
- Convergence detection — Stops when 85% agreement is reached
- Hallucination flagging — Claims from only one model get flagged
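Concretely, a consensus run starts from a single delphi_query call whose input mirrors the examples later in this README. A minimal sketch, using only the documented `question` field:

```json
{
  "question": "Should we use microservices or a monolith for a new e-commerce platform?"
}
```

The Example Output below shows what the consensus process produces for exactly this question.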
Example Output
Question: Should we use microservices or a monolith for a new e-commerce platform?
Consensus (87% agreement after 3 rounds)
Round 1 — Initial Positions:

| Model | Position |
|-------|----------|
| Claude | Monolith first, extract services later |
| GPT-5 | Microservices for scalability from day one |
| Gemini | Depends on team size and experience |
| DeepSeek | Modular monolith as middle ground |
Round 2 — After seeing each other's reasoning:
- GPT-5 revised: "Agreed that premature microservices add complexity. Team size matters."
- Claude maintained its position but acknowledged: "Microservices make sense if the team is 50+ engineers"
- All models converged on team size as the key factor
Round 3 — Final Synthesis:
| Claim | Strength | Agreement |
|-------|----------|-----------|
| Start with monolith for teams < 20 engineers | unanimous | 5/5 |
| Modular boundaries enable future extraction | unanimous | 5/5 |
| Microservices add 3-5x operational overhead | strong | 4/5 |
| Extract services only when team/traffic demands | strong | 4/5 |
| Kubernetes required for microservices | disputed | 2/5 |
Key Disagreement Surfaced:
"Kubernetes required for microservices" — Claude and DeepSeek disagreed, noting alternatives like ECS, Nomad, or even simple VM deployments. This flags an area where the "conventional wisdom" may be overconfident.
Control Drift: 45% — A single model would have given a more opinionated answer without surfacing the team-size nuance or the Kubernetes debate.
Presets
| Preset | Tier | Rounds | Grounding | Cost | Use Case |
|--------|------|--------|-----------|------|----------|
| quick | fast | 2 | off | ~$0.04 | Quick checks |
| balanced | standard | 4 | off | ~$0.20 | General queries |
| research | premium | 6 | on | ~$0.50 | Deep analysis |
| factcheck | standard | 3 | on | ~$0.25 | Verify claims |
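To pick a preset when calling delphi_query, a request could look like the sketch below. The `preset` field name is an assumption for illustration; this README documents the preset names and behavior but not the exact parameter:

```json
{
  "question": "What are the long-term maintainability trade-offs of adopting GraphQL over REST?",
  "preset": "research"
}
```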
When to Use Delphi
Use Delphi for:
- Complex technical decisions with trade-offs
- Research questions with multiple valid perspectives
- High-dimensional problems (many factors to weigh)
- Topics where experts genuinely disagree
- Validating important conclusions before acting
Skip Delphi for:
- Simple factual lookups → single model is fine
- Creative writing → diversity unhelpful
- Real-time chat → too slow
- Well-defined problems with clear answers
Decision rule: If the question has genuine complexity and the answer matters, use Delphi.
Features
- Multi-Model Consensus — Claude, GPT-5, Gemini, DeepSeek working together
- Dynamic Convergence — Iterates until 85% agreement or surfaces disagreement
- Claim Strength — See which points are unanimous vs genuinely disputed
- Revision Rounds — Models can challenge and refine each other's reasoning
- Expert Personas — Frame panelists as domain experts for deeper analysis
- Diverse Panel Mode — Assign complementary expert roles within a domain
- Web Grounding — Optionally verify claims against live sources
- Budget Controls — Token and cost limits for predictable spend
- Multiple Formats — Markdown, JSON, HTML, plain text
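The budget and format features suggest a call shaped roughly like the sketch below. The `maxCostUsd` and `format` field names are assumptions for illustration, not a documented schema; use delphi_estimate_cost to check real costs before a run.

```json
{
  "question": "What are the trade-offs of event sourcing for an order-management system?",
  "expertise": "architecture",
  "maxCostUsd": 0.50,
  "format": "markdown"
}
```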
Expert Personas
Like a real Delphi study, you can frame panelists as domain experts:
```json
{
  "question": "What are the security implications of storing JWTs in localStorage?",
  "expertise": "security"
}
```
Available domains:
| Domain | Expert Type |
|--------|-------------|
| security | Security Engineer (15+ years, penetration testing, secure development) |
| finance | Financial Analyst (investment banking, risk management) |
| medical | Medical Researcher (clinical medicine, evidence-based medicine) |
| legal | Legal Expert (corporate law, IP, regulatory compliance) |
| engineering | Software Engineer (system design, architecture patterns) |
| data-science | Data Scientist (ML, statistical analysis) |
| economics | Economist (micro/macro economics, policy analysis) |
| architecture | Systems Architect (distributed systems, cloud platforms) |
| devops | DevOps Engineer (CI/CD, infrastructure automation) |
| product | Product Manager (strategy, user research, go-to-market) |
Diverse Panel Mode
Add diversePersonas: true to give each panelist a different complementary role within the domain — just like assembling a real expert panel:
```json
{
  "question": "Should we migrate to microservices?",
  "expertise": "architecture",
  "diversePersonas": true
}
```
For architecture, this creates a panel of:
- Cloud architect (AWS/GCP/Azure best practices)
- Platform architect (internal developer platforms)
- Data architect (data modeling, warehousing)
- Integration architect (APIs, messaging)
- Security architect (zero-trust, identity management)
- Solutions architect (customer requirements)
Auto-Expertise Mode
For the most authentic Delphi experience, let the administrator automatically determine what experts are needed based on your question:
```json
{
  "question": "Should we implement rate limiting at the API gateway or application layer?",
  "autoExpertise": true
}
```
The administrator analyzes your question and dynamically generates an optimal expert panel:
| Expert | Focus | Perspective |
|--------|-------|-------------|
| API Gateway Architect | Rate limiting patterns, edge vs origin | Infrastructure scalability |
| Security Engineer | DDoS protection, abuse prevention | Defensive, assumes adversarial users |
| Backend Developer | Application-level implementation | Developer experience, maintainability |
| SRE/Platform Engineer | Observability, failure modes | Operational reliability |
Why auto-expertise?
- Mimics how real Delphi studies select experts based on the question
- No need to guess which domain fits best
- Gets complementary perspectives without manual configuration
- Shows rationale for why each expert was chosen
API
| Tool | Description |
|------|-------------|
| delphi_query | Multi-model consensus query |
| delphi_factcheck | Fact-check a specific claim |
| delphi_list_models | List available models |
| delphi_estimate_cost | Estimate the cost of a query before running it |
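For example, delphi_factcheck could take the disputed claim surfaced in the earlier example. The `claim` field name is an assumption for illustration, since only the tool names are listed here:

```json
{
  "claim": "Kubernetes is required to run microservices in production"
}
```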
Documentation
The full technical documentation covers:
- All configuration options
- Test results & insights
- Architecture internals
- Cost analysis
- Safety features
License
MIT — Built by Thor Matthiasson
