local-model-suitability-mcp

v1.1.28

Published

18 days ago

AI model router for agents. Checks whether a local model can handle the task before calling cloud inference. LOCAL/CLOUD verdict saves cost on every call.

0High
0Medium
0Low

ojas1

mcp local-model ollama model-routing cost-optimisation inference ai-agents llm edge-ai model-selection cloud-vs-local

Local Model Suitability MCP

Cloud inference is expensive. Everything that can run locally should.

This MCP server tells your agent — before every cloud API call — whether the task can be handled by a local model instead. Route to Ollama, LM Studio, or llama.cpp when you can. Only pay for cloud when you must.

The Tool

`check_local_viability`

Call this BEFORE every cloud inference call. If verdict is LOCAL, skip the cloud call entirely and route to your local model. Only use cloud when this tool returns CLOUD.

Inputs: | Field | Required | Description | |---|---|---| | task | ✅ | The exact task you are about to send to a cloud model | | quality_threshold | Optional | PRODUCTION (default) / PROTOTYPE / BEST_EFFORT | | data_sensitivity | Optional | PUBLIC (default) / INTERNAL / CONFIDENTIAL |

CONFIDENTIAL forces LOCAL regardless of task complexity — data never leaves the machine.

Response:

{
  "verdict": "LOCAL",
  "confidence": "HIGH",
  "reason": "Simple text summarisation — no reasoning depth required. Any 7B+ local model handles this well.",
  "estimated_cost_saving": "$0.002-0.008 saved per call at claude-sonnet pricing",
  "recommended_local_models": ["llama3.2:8b", "mistral-7b", "phi3:medium"],
  "cloud_justified_reason": null,
  "analysis_type": "AI-powered cost routing — NOT a simple lookup"
}

Data Sources

AI reasoning: Anthropic Claude (claude-sonnet) — cost routing analysis
No external data sources — pure AI reasoning

Pricing

| Plan | Calls | Price | |---|---|---| | Free | 20/month | $0 | | Starter | 500-call bundle | $20 | | Pro | 2,000-call bundle | $70 |

Subscribe at kordagencies.com

Setup

{
  "mcpServers": {
    "local-model-suitability": {
      "command": "npx",
      "args": ["-y", "local-model-suitability-mcp"],
      "env": {
        "ANTHROPIC_API_KEY": "your-key",
        "API_KEY": "your-lms-api-key-for-paid-tier"
      }
    }
  }
}

Free tier requires no API key — tracked by IP.

Harness Integration

Claude Code / Claude Desktop (.mcp.json)

{
  "mcpServers": {
    "local-model-suitability": {
      "type": "http",
      "url": "https://local-model-suitability-mcp-production.up.railway.app"
    }
  }
}

LangChain (Python)

from langchain_mcp_adapters.client import MultiServerMCPClient
client = MultiServerMCPClient({
    "local-model-suitability": {
        "url": "https://local-model-suitability-mcp-production.up.railway.app",
        "transport": "http"
    }
})
tools = await client.get_tools()

OpenAI Agents SDK (Python)

from agents import Agent, HostedMCPTool
agent = Agent(
    name="Assistant",
    tools=[HostedMCPTool(tool_config={
        "type": "mcp",
        "server_label": "local-model-suitability",
        "server_url": "https://local-model-suitability-mcp-production.up.railway.app",
        "require_approval": "never"
    })]
)

LangGraph

Same as LangChain above — langchain-mcp-adapters works with LangGraph natively.

Legal

Results are for cost-optimisation guidance only and do not constitute technical advice. Full terms: kordagencies.com/terms.html

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme