ai-model-advisor-mcp

v2.3.0

Published

2 months ago

An intelligent Model Context Protocol (MCP) Server that acts as an AI model advisor. Compare LLM and media pricing, features, and quality across 1,000+ models from OpenRouter, fal.ai, Together AI, Replicate, and Fireworks AI. Deeply compatible with Claude

AI Model Advisor MCP Server

The ultimate Model Context Protocol (MCP) server for AI model discovery, cost optimization, and performance benchmarking. Fully compatible with Claude Desktop, Cursor AI, Windsurf, and any MCP client.

Stop guessing which AI model to use. Give your agent the tools to compare pricing, intelligence benchmarks (MMLU, coding), latency (time to first token, TTFT), and throughput across 1,000+ models from 5 AI platform providers (OpenRouter, fal.ai, Together AI, Replicate, and Fireworks AI).

"Where is the cheapest Llama 3.3 endpoint?" "What is the fastest model for image generation under my budget?" "Which Claude 3.7 model has the highest coding benchmark?" Just ask your agent. It shops across all providers instantly with real-time AI model pricing.

Why use this Model Context Protocol Server?

New AI models drop constantly. Your AI coding agent (like Claude Desktop or Cursor) doesn't inherently know what's available, what API inference costs, or how each model performs. This MCP server fixes that by acting as a live model catalog and routing engine, giving your agent up-to-date access to:

🧠 300+ LLMs via OpenRouter API (GPT-4o, Claude 3.7, Gemini 2.5 Pro, Llama, Reasoning Models, DeepSeek V3/R1)
🎨 200+ Media Models via fal.ai API (Flux Pro, Stable Diffusion, Kling Video, Whisper)
⚡ 200+ Open-Source Models via Together AI (Llama 3.3 70B, Qwen, Mistral)
🔁 Community fine-tunes & LoRAs via Replicate (Wan 2.1, Recraft, Custom pipelines)
🔥 Blazing Fast Inference endpoints via Fireworks AI
🏷️ Cross-Provider Price Calculator — find the cheapest API endpoint / lowest cost LLM for any architecture.
⚡ Live Speed & Latency Metrics — powered by Artificial Analysis (TTFT and Tokens/sec throughput).
🏆 Intelligence Leaderboards — baked-in MMLU, Math, and Coding benchmark scores.
💰 Near-Real-Time Pricing from our hosted cloud backend, refreshed every 6 hours (zero config, no API keys needed for pricing).
⭐ Curated Quality Tiers (S/A/B/C) to steer agents away from unreliable or outdated models.
🆕 Discovery Engine — agents can ask "what new AI models dropped this week?"

Quick Start

Zero-config (No API keys required)

The server works entirely out-of-the-box. Just add it to your MCP settings and save:

{
  "mcpServers": {
    "model-advisor": {
      "command": "npx",
      "args": ["-y", "ai-model-advisor-mcp@latest"]
    }
  }
}

That's it! Live pricing for all 1,000+ models across all 5 providers is fetched automatically using our hosted Cloudflare Worker Pricing API. No API keys, no environment variables, no setup.

Tools (9 total)

🧭 `select_model_for_project` ⭐ NEW

One call for agents that already have project context. Returns the best overall model, cheapest acceptable model, and best value option without forcing the agent to manually chain several tools.

select_model_for_project({
  project: "TypeScript MCP server for coding agents",
  task: "coding assistant",
  requirements: ["coding", "reasoning", "tool_use"],
  expected_usage: { input_tokens: 5000000, output_tokens: 1000000 }
})

The server searches the catalog behind the scenes and returns a compact decision with model IDs, provider, price, quality tier, reasons, tradeoffs, and structured output that agents can parse.
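
The three picks described above can be sketched roughly as follows. This is a minimal illustration, not the server's actual logic: the `Candidate` shape, the numeric tier scores, the `minTier` parameter, and the "tier points per dollar" value metric are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; not the server's actual schema.
type Tier = "S" | "A" | "B" | "C";

interface Candidate {
  id: string;
  tier: Tier;
  inputPerM: number;  // USD per 1M input tokens
  outputPerM: number; // USD per 1M output tokens
}

interface ExpectedUsage {
  input_tokens: number;
  output_tokens: number;
}

// Assumed numeric ranking of the S/A/B/C tiers.
const TIER_SCORE: Record<Tier, number> = { S: 4, A: 3, B: 2, C: 1 };

// Projected spend for the expected usage, in USD.
function projectedCost(c: Candidate, u: ExpectedUsage): number {
  return (u.input_tokens / 1_000_000) * c.inputPerM
       + (u.output_tokens / 1_000_000) * c.outputPerM;
}

function selectForProject(candidates: Candidate[], usage: ExpectedUsage, minTier: Tier = "B") {
  // Only models at or above the minimum tier are considered at all.
  const acceptable = candidates.filter((c) => TIER_SCORE[c.tier] >= TIER_SCORE[minTier]);
  if (acceptable.length === 0) throw new Error("no candidate meets the minimum tier");

  const cost = (c: Candidate): number => projectedCost(c, usage);
  // Tier points per dollar; the floor avoids dividing by zero for free models.
  const value = (c: Candidate): number => TIER_SCORE[c.tier] / Math.max(cost(c), 1e-9);
  const pick = (better: (a: Candidate, b: Candidate) => boolean): Candidate =>
    acceptable.reduce((best, c) => (better(c, best) ? c : best));

  return {
    // Highest tier wins; ties are broken by lower projected cost.
    bestOverall: pick((a, b) =>
      TIER_SCORE[a.tier] > TIER_SCORE[b.tier] ||
      (TIER_SCORE[a.tier] === TIER_SCORE[b.tier] && cost(a) < cost(b))),
    cheapestAcceptable: pick((a, b) => cost(a) < cost(b)),
    bestValue: pick((a, b) => value(a) > value(b)),
  };
}
```

Projecting cost from `expected_usage` rather than comparing list prices matters because input-heavy and output-heavy workloads can rank the same models differently.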

🏷️ `find_cheapest_provider` ⭐ NEW

The killer feature. Shop for a model across all 5 providers.

find_cheapest_provider({ model: "llama 3.3 70b" })

Output:

🏷️ Price comparison for "llama 3.3 70b"

| Provider    | Input $/1M | Output $/1M | Model ID                                         |
|-------------|-----------|------------|--------------------------------------------------|
| OpenRouter  | $0.00     | $0.00      | meta-llama/llama-3.3-70b-instruct                |
| Together AI | $0.88     | $0.88      | meta-llama/Llama-3.3-70B-Instruct-Turbo          |
| Fireworks   | $0.90     | $0.90      | accounts/fireworks/models/llama-v3p3-70b-instruct |

💡 Cheapest: OpenRouter — FREE

Uses fuzzy matching — handles version format differences (v3p3 = 3.3) across providers.
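
One way such matching could work is to normalize every name into comparable tokens before comparing. The sketch below is an assumption about the approach, not the package's actual matcher; `normalizeModelName` and `matchesQuery` are hypothetical helper names.

```typescript
// Hypothetical helpers; the package's real matcher may differ.

// Turn a model ID or free-text query into lowercase tokens,
// rewriting provider-specific version spellings along the way.
function normalizeModelName(raw: string): string[] {
  return raw
    .toLowerCase()
    .replace(/(\d)p(\d)/g, "$1.$2") // "3p3"  -> "3.3"
    .replace(/\bv(\d)/g, "$1")      // "v3.3" -> "3.3"
    .split(/[^a-z0-9.]+/)           // split on "/", "-", spaces, etc.
    .filter((t) => t.length > 0);
}

// A model matches when every token of the query appears in its ID.
function matchesQuery(query: string, modelId: string): boolean {
  const idTokens = new Set(normalizeModelName(modelId));
  return normalizeModelName(query).every((t) => idTokens.has(t));
}
```

With these rules, the query "llama 3.3 70b" matches all three model IDs in the table above, including the Fireworks ID that spells the version as `v3p3`.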

📦 `batch_get_pricing` ⭐ NEW

Get pricing for multiple models in a single call. Returns a compact table.

batch_get_pricing({ model_ids: ["openai/gpt-4o", "anthropic/claude-sonnet-4", "fal-ai/flux-pro/v1.1", "meta-llama/Llama-3.3-70B-Instruct-Turbo"] })

🎯 `recommend_model`

"I need X" → ranked models matching your task, requirements, and budget. Searches all 5 providers.

recommend_model({ task: "image generation", requirements: ["photorealistic", "fast"], budget: "low" })

⚖️ `compare_models`

Side-by-side table across providers. Auto-adapts columns for LLMs vs media models.

compare_models({ model_ids: ["openai/gpt-4o", "anthropic/claude-sonnet-4", "google/gemini-2.5-pro-preview"] })
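
The column adaptation could look something like the sketch below. The `CompareRow` shape and the media columns (`Unit`, `$/unit`) are assumptions for illustration; only the `Input $/1M` and `Output $/1M` headers come from the price table shown earlier.

```typescript
// Hypothetical row shapes; the server's real output format is not documented here.
type CompareRow =
  | { kind: "llm"; id: string; inputPerM: number; outputPerM: number }
  | { kind: "media"; id: string; unit: string; pricePerUnit: number };

// Include only the column groups that at least one row can fill.
function columnsFor(rows: CompareRow[]): string[] {
  const cols = ["Model ID"];
  if (rows.some((r) => r.kind === "llm")) cols.push("Input $/1M", "Output $/1M");
  if (rows.some((r) => r.kind === "media")) cols.push("Unit", "$/unit");
  return cols;
}

const usd = (n: number): string => `$${n.toFixed(2)}`;

function cellFor(row: CompareRow, col: string): string {
  if (col === "Model ID") return row.id;
  if (row.kind === "llm") {
    if (col === "Input $/1M") return usd(row.inputPerM);
    if (col === "Output $/1M") return usd(row.outputPerM);
  } else {
    if (col === "Unit") return row.unit;
    if (col === "$/unit") return usd(row.pricePerUnit);
  }
  return "-"; // column does not apply to this kind of model
}

// Render a pipe-delimited table like the ones used in this README.
function renderComparison(rows: CompareRow[]): string {
  const cols = columnsFor(rows);
  const line = (cells: string[]): string => `| ${cells.join(" | ")} |`;
  return [
    line(cols),
    line(cols.map(() => "---")),
    ...rows.map((r) => line(cols.map((c) => cellFor(r, c)))),
  ].join("\n");
}
```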

📋 `list_models`

Browse and filter by category, provider, capability, or price.

list_models({ category: "text-to-image", max_price: 0.05 })
list_models({ provider: "together", category: "llm" })
list_models({ provider: "replicate", category: "text-to-video" })
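
Conceptually, each filter is optional and they combine with AND. A minimal sketch, assuming a flat `CatalogEntry` shape and a single `unitPrice` field (both hypothetical; the unit behind `max_price` is not specified here):

```typescript
// Hypothetical catalog shape for illustration only.
interface CatalogEntry {
  id: string;
  provider: string;
  category: string;
  unitPrice: number; // assumed: USD per image, per 1M tokens, etc., depending on category
}

interface ListFilter {
  provider?: string;
  category?: string;
  max_price?: number;
}

// Apply every filter that was supplied, then sort cheapest first.
function listModels(catalog: CatalogEntry[], f: ListFilter): CatalogEntry[] {
  return catalog
    .filter((m) => f.provider === undefined || m.provider === f.provider)
    .filter((m) => f.category === undefined || m.category === f.category)
    .filter((m) => f.max_price === undefined || m.unitPrice <= f.max_price)
    .sort((a, b) => a.unitPrice - b.unitPrice);
}
```

Omitted filters match everything, so `listModels(catalog, {})` returns the whole catalog sorted by price.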

📖 `get_model_info`

Comprehensive model card with everything you need to decide.

get_model_info({ model_id: "fal-ai/flux-pro/v1.1" })

💰 `estimate_cost`

Cost estimation for any usage scenario.

estimate_cost({ model_id: "openai/gpt-4o", usage: { input_tokens: 1000000, output_tokens: 100000 } })
estimate_cost({ model_id: "fal-ai/flux-pro/v1.1", usage: { images: 500 } })
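
The arithmetic behind a token-based estimate is simple: tokens divided by one million, times the per-million price, summed over input and output. The helper names below are hypothetical, and the sketch ignores anything the real tool may account for beyond list price.

```typescript
// Hypothetical helpers illustrating the arithmetic only.
interface TokenPrice {
  inputPerM: number;  // USD per 1M input tokens
  outputPerM: number; // USD per 1M output tokens
}

interface TokenUsage {
  input_tokens: number;
  output_tokens: number;
}

function estimateTokenCost(price: TokenPrice, usage: TokenUsage): number {
  return (usage.input_tokens / 1_000_000) * price.inputPerM
       + (usage.output_tokens / 1_000_000) * price.outputPerM;
}

// Media models are assumed here to be billed per generated image.
function estimateImageCost(pricePerImage: number, images: number): number {
  return pricePerImage * images;
}

// Together AI's Llama 3.3 70B row from the price table above: $0.88 in, $0.88 out.
const llamaCost = estimateTokenCost(
  { inputPerM: 0.88, outputPerM: 0.88 },
  { input_tokens: 1_000_000, output_tokens: 100_000 },
);
console.log(llamaCost.toFixed(2)); // "0.97"
```

For that usage the estimate is 1.0 × $0.88 + 0.1 × $0.88 = $0.968.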

🆕 `whats_new`

Discover recently added models. Never miss a new release.

whats_new({ since: "7d" })
whats_new({ since: "30d", category: "text-to-video" })
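
A window such as "7d" can be read as a cutoff date relative to now. The sketch below handles only the `<number>d` form shown in the examples; whether the real tool accepts other units is not stated, and the `DatedModel` shape with its `addedAt` field is an assumption.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Parse "7d" or "30d" into a number of days; anything else is rejected.
function parseSinceDays(since: string): number {
  const m = /^(\d+)d$/.exec(since.trim());
  if (m === null) throw new Error(`unsupported window: ${since}`);
  return Number(m[1]);
}

// Hypothetical catalog shape for illustration only.
interface DatedModel {
  id: string;
  category: string;
  addedAt: Date;
}

// Keep models added inside the window, optionally by category, newest first.
function whatsNew(
  catalog: DatedModel[],
  opts: { since: string; category?: string },
  now: Date = new Date(),
): DatedModel[] {
  const cutoff = now.getTime() - parseSinceDays(opts.since) * DAY_MS;
  return catalog
    .filter((m) => m.addedAt.getTime() >= cutoff)
    .filter((m) => opts.category === undefined || m.category === opts.category)
    .sort((a, b) => b.addedAt.getTime() - a.addedAt.getTime());
}
```

Passing `now` explicitly keeps the function deterministic, which makes it easy to test against a fixed date.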

Providers

| Provider     | Models | Type                    | Hosted Pricing Data |
|--------------|--------|-------------------------|---------------------|
| OpenRouter   | 350+   | LLMs                    | ✅ Yes              |
| fal.ai       | 40+    | Image, Video, Audio, 3D | ✅ Yes              |
| Together AI  | 220+   | LLMs, Image             | ✅ Yes              |
| Replicate    | 120+   | Everything              | ✅ Yes              |
| Fireworks AI | 12+    | LLMs                    | ✅ Yes              |

Quality Tiers

Popular models are rated on a curated quality scale:

S — Best in class (Claude Sonnet 4, GPT-4o, Flux Pro Ultra, Kling v2.0)
A — Excellent (Gemini Flash, Llama 3.3, Flux Dev, Recraft v3)
B — Good (Mistral Small, Flux Schnell, SD3.5 Turbo)
C — Adequate

Architecture

Agent → MCP Server → Cloudflare Worker Pricing API (Our Hosted Backend)
                         ├─ Fetches from OpenRouter (350+ models)
                         ├─ Fetches from fal.ai (40+ models)
                         ├─ Fetches from Together AI (220+ models)
                         ├─ Fetches from Replicate (120+ models)
                         └─ Fetches from Fireworks AI (12+ models)

Unified Model Registry
→ select, recommend, compare, list, info, estimate, shop, batch

The MCP server connects to our hosted Cloudflare Worker, which aggregates pricing data from all 5 providers on a recurring 6-hour cron schedule. Your agent therefore works with prices that are typically no more than about six hours old, without requiring you to juggle 5 different API keys.
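
As a rough illustration of the aggregation step, per-provider snapshots can be merged into one registry keyed by provider and model ID, with each entry carrying the age of its data. Everything here is an assumption for the sketch: the `ProviderSnapshot` and `RegistryEntry` shapes, the key format, and the staleness check. Only the 6-hour refresh interval comes from the description above.

```typescript
// Hypothetical shapes; the hosted backend's real data model is not documented here.
interface ProviderSnapshot {
  provider: string;
  fetchedAt: Date;
  models: { id: string; inputPerM: number; outputPerM: number }[];
}

interface RegistryEntry {
  provider: string;
  id: string;
  inputPerM: number;
  outputPerM: number;
  ageHours: number; // how old the pricing data is
}

const REFRESH_HOURS = 6; // refresh interval stated in this README

// Merge all provider snapshots into a single lookup table.
function buildRegistry(
  snapshots: ProviderSnapshot[],
  now: Date = new Date(),
): Map<string, RegistryEntry> {
  const registry = new Map<string, RegistryEntry>();
  for (const snap of snapshots) {
    const ageHours = (now.getTime() - snap.fetchedAt.getTime()) / 3_600_000;
    for (const m of snap.models) {
      // Prefixing with the provider keeps identical IDs from different providers apart.
      registry.set(`${snap.provider}:${m.id}`, { provider: snap.provider, ...m, ageHours });
    }
  }
  return registry;
}

// Entries older than one refresh interval probably missed a scheduled update.
function staleEntries(registry: Map<string, RegistryEntry>): RegistryEntry[] {
  return Array.from(registry.values()).filter((e) => e.ageHours > REFRESH_HOURS);
}
```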

License

MIT
