delphi-mcp
Delphi MCP Server - Multi-model AI consensus for complex questions. Query Claude, GPT-5, Gemini, DeepSeek simultaneously and synthesize diverse perspectives.
The Problem
Complex questions rarely have simple answers. A single AI model gives you one perspective shaped by its training data and architecture. For nuanced topics—technical trade-offs, research questions, multi-faceted decisions—one viewpoint isn't enough.
The insight: Different AI models reason differently. When multiple models independently arrive at the same conclusion, you can trust it. When they disagree, you've found genuine complexity worth exploring.
Requirements
- Node.js 18+
- OpenRouter API key — Get one at openrouter.ai/keys ($5-10 credit is plenty to start)
- Claude Desktop or any MCP-compatible client
Quick Start
```bash
npm install -g delphi-mcp
```
Add to Claude Desktop config:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
```json
{
  "mcpServers": {
    "delphi": {
      "command": "delphi-mcp",
      "env": { "OPENROUTER_API_KEY": "sk-or-v1-your-key" }
    }
  }
}
```
Restart Claude Desktop. Done.
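If you prefer not to install globally, an npx-based config is a common alternative for MCP servers. This is a sketch, assuming the published package exposes the same `delphi-mcp` binary that the global install provides:

```json
{
  "mcpServers": {
    "delphi": {
      "command": "npx",
      "args": ["-y", "delphi-mcp"],
      "env": { "OPENROUTER_API_KEY": "sk-or-v1-your-key" }
    }
  }
}
```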
How It Works
- Independent responses — Each model answers without seeing others
- Revision rounds — Models see the synthesis and can revise or challenge
- Convergence detection — Stops when 85% agreement is reached
- Hallucination flagging — Claims from only one model get flagged
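Concretely, a consensus run starts from a single delphi_query call whose input mirrors the examples later in this README. A minimal sketch, using only the documented `question` field:

```json
{
  "question": "Should we use microservices or a monolith for a new e-commerce platform?"
}
```

The Example Output below shows what the consensus process produces for exactly this question.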
Example Output
Question: Should we use microservices or a monolith for a new e-commerce platform?
Consensus (87% agreement after 3 rounds)
Round 1 — Initial Positions:

| Model | Position |
|-------|----------|
| Claude | Monolith first, extract services later |
| GPT-5 | Microservices for scalability from day one |
| Gemini | Depends on team size and experience |
| DeepSeek | Modular monolith as middle ground |
Round 2 — After seeing each other's reasoning:
- GPT-5 revised: "Agreed that premature microservices add complexity. Team size matters."
- Claude maintained its position but acknowledged: "Microservices make sense if the team is 50+ engineers"
- All models converged on team size as the key factor
Round 3 — Final Synthesis:
| Claim | Strength | Agreement |
|-------|----------|-----------|
| Start with monolith for teams < 20 engineers | unanimous | 5/5 |
| Modular boundaries enable future extraction | unanimous | 5/5 |
| Microservices add 3-5x operational overhead | strong | 4/5 |
| Extract services only when team/traffic demands | strong | 4/5 |
| Kubernetes required for microservices | disputed | 2/5 |
Key Disagreement Surfaced:
"Kubernetes required for microservices" — Claude and DeepSeek disagreed, noting alternatives like ECS, Nomad, or even simple VM deployments. This flags an area where the "conventional wisdom" may be overconfident.
Control Drift: 45% — A single model would have given a more opinionated answer without surfacing the team-size nuance or the Kubernetes debate.
Presets
| Preset | Tier | Rounds | Grounding | Cost | Use Case |
|--------|------|--------|-----------|------|----------|
| quick | fast | 2 | off | ~$0.04 | Quick checks |
| balanced | standard | 4 | off | ~$0.20 | General queries |
| research | premium | 6 | on | ~$0.50 | Deep analysis |
| factcheck | standard | 3 | on | ~$0.25 | Verify claims |
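To pick a preset when calling delphi_query, a request could look like the sketch below. The `preset` field name is an assumption for illustration; this README documents the preset names and behavior but not the exact parameter:

```json
{
  "question": "What are the long-term maintainability trade-offs of adopting GraphQL over REST?",
  "preset": "research"
}
```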
When to Use Delphi
Use Delphi for:
- Complex technical decisions with trade-offs
- Research questions with multiple valid perspectives
- High-dimensional problems (many factors to weigh)
- Topics where experts genuinely disagree
- Validating important conclusions before acting
Skip Delphi for:
- Simple factual lookups → single model is fine
- Creative writing → diversity unhelpful
- Real-time chat → too slow
- Well-defined problems with clear answers
Decision rule: If the question has genuine complexity and the answer matters, use Delphi.
Features
- Multi-Model Consensus — Claude, GPT-5, Gemini, DeepSeek working together
- Dynamic Convergence — Iterates until 85% agreement or surfaces disagreement
- Claim Strength — See which points are unanimous vs genuinely disputed
- Revision Rounds — Models can challenge and refine each other's reasoning
- Expert Personas — Frame panelists as domain experts for deeper analysis
- Diverse Panel Mode — Assign complementary expert roles within a domain
- Web Grounding — Optionally verify claims against live sources
- Budget Controls — Token and cost limits for predictable spend
- Multiple Formats — Markdown, JSON, HTML, plain text
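The budget and format features suggest a call shaped roughly like the sketch below. The `maxCostUsd` and `format` field names are assumptions for illustration, not a documented schema; use delphi_estimate_cost to check real costs before a run.

```json
{
  "question": "What are the trade-offs of event sourcing for an order-management system?",
  "expertise": "architecture",
  "maxCostUsd": 0.50,
  "format": "markdown"
}
```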
Expert Personas
Like a real Delphi study, you can frame panelists as domain experts:
```json
{
  "question": "What are the security implications of storing JWTs in localStorage?",
  "expertise": "security"
}
```
Available domains:
| Domain | Expert Type |
|--------|-------------|
| security | Security Engineer (15+ years, penetration testing, secure development) |
| finance | Financial Analyst (investment banking, risk management) |
| medical | Medical Researcher (clinical medicine, evidence-based medicine) |
| legal | Legal Expert (corporate law, IP, regulatory compliance) |
| engineering | Software Engineer (system design, architecture patterns) |
| data-science | Data Scientist (ML, statistical analysis) |
| economics | Economist (micro/macro economics, policy analysis) |
| architecture | Systems Architect (distributed systems, cloud platforms) |
| devops | DevOps Engineer (CI/CD, infrastructure automation) |
| product | Product Manager (strategy, user research, go-to-market) |
Diverse Panel Mode
Add diversePersonas: true to give each panelist a different complementary role within the domain — just like assembling a real expert panel:
```json
{
  "question": "Should we migrate to microservices?",
  "expertise": "architecture",
  "diversePersonas": true
}
```
For architecture, this creates a panel of:
- Cloud architect (AWS/GCP/Azure best practices)
- Platform architect (internal developer platforms)
- Data architect (data modeling, warehousing)
- Integration architect (APIs, messaging)
- Security architect (zero-trust, identity management)
- Solutions architect (customer requirements)
Auto-Expertise Mode
For the most authentic Delphi experience, let the administrator automatically determine what experts are needed based on your question:
```json
{
  "question": "Should we implement rate limiting at the API gateway or application layer?",
  "autoExpertise": true
}
```
The administrator analyzes your question and dynamically generates an optimal expert panel:
| Expert | Focus | Perspective |
|--------|-------|-------------|
| API Gateway Architect | Rate limiting patterns, edge vs origin | Infrastructure scalability |
| Security Engineer | DDoS protection, abuse prevention | Defensive, assumes adversarial users |
| Backend Developer | Application-level implementation | Developer experience, maintainability |
| SRE/Platform Engineer | Observability, failure modes | Operational reliability |
Why auto-expertise?
- Mimics how real Delphi studies select experts based on the question
- No need to guess which domain fits best
- Gets complementary perspectives without manual configuration
- Shows rationale for why each expert was chosen
API
| Tool | Description |
|------|-------------|
| delphi_query | Multi-model consensus query |
| delphi_factcheck | Fact-check a specific claim |
| delphi_list_models | List available models |
| delphi_estimate_cost | Estimate the cost of a query before running it |
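For example, delphi_factcheck could take the disputed claim surfaced in the earlier example. The `claim` field name is an assumption for illustration, since only the tool names are listed here:

```json
{
  "claim": "Kubernetes is required to run microservices in production"
}
```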
Documentation
The full technical documentation covers:
- All configuration options
- Test results & insights
- Architecture internals
- Cost analysis
- Safety features
License
MIT — Built by Thor Matthiasson
