@inferwise/mcp

v1.0.0

Published

4 months ago

MCP server for Inferwise — AI agent tools for LLM cost estimation and model selection

0High
0Medium
0Low

@inferwise/mcp

MCP server for Inferwise — gives AI agents tools to estimate LLM costs, suggest cheaper models, and audit codebases for cost optimization.

Works with any AI tool that supports the Model Context Protocol: Claude Code, Cursor, VS Code (1.99+), Windsurf, Cline, and more.

Setup

Claude Code

claude mcp add inferwise -- npx -y @inferwise/mcp

Cursor / VS Code / Windsurf

Add to your MCP settings (.cursor/mcp.json, .vscode/mcp.json, etc.):

{
  "mcpServers": {
    "inferwise": {
      "command": "npx",
      "args": ["-y", "@inferwise/mcp"]
    }
  }
}

Cline

Add to Cline MCP settings:

{
  "mcpServers": {
    "inferwise": {
      "command": "npx",
      "args": ["-y", "@inferwise/mcp"]
    }
  }
}

Tools

Once connected, the AI agent gets four tools:

`suggest_model`

Suggest the cheapest LLM model capable of handling a given task. Analyzes the task description using keyword-based pattern matching to infer required capabilities (code, reasoning, general, creative, vision, search, audio), then finds the cheapest model across all providers that has those capabilities. If no specific capabilities are detected, defaults to general.

Input:

task (string, required) — Description of what you want the LLM to do
provider (string, optional) — Restrict to a specific provider (anthropic, openai, google, xai, perplexity)
maxCostPerMillionTokens (number, optional) — Maximum acceptable cost per million output tokens (USD)

Returns: Recommended model with pricing, up to 3 cheaper alternatives, inferred capabilities, and reasoning.

Example: Agent asks "classify support tickets by category" → Inferwise infers ["general"] capability → returns gpt-4o-mini at $0.60/MTok instead of gpt-4o at $10/MTok.

See the main repo for details on capability inference and cross-provider ranking.

`estimate_cost`

Estimate the cost of an LLM API call given provider, model, and token counts. Optionally projects monthly costs based on daily request volume.

Input:

provider (string, required) — LLM provider
model (string, required) — Model ID (e.g., claude-sonnet-4-6, gpt-4o)
inputTokens (number, required) — Number of input tokens
outputTokens (number, required) — Number of output tokens
requestsPerDay (number, optional) — Daily volume for monthly projection
useBatch (boolean, optional) — Use Batch API pricing
useCache (boolean, optional) — Assume cache-hit pricing

Returns: Cost per call, monthly projection, and pricing breakdown.

`audit`

Scan a directory for LLM API calls, estimate costs, and suggest cheaper capable models for each call site.

Input:

directory (string, required) — Absolute path to the directory to scan
volume (number, optional) — Requests per day for monthly projection (default: 1000)

Returns: Per-call-site cost estimates, total monthly cost, unknown models, and smart model recommendations with savings percentages.

`apply_recommendations`

Apply model swap recommendations to source files. Replaces expensive model IDs with cheaper alternatives in-place. Can auto-detect recommendations via audit, or accept explicit swaps.

Input:

directory (string, required) — Absolute path to the project directory
volume (number, optional) — Requests per day for monthly projection (default: 1000)
dryRun (boolean, optional) — Preview changes without modifying files (default: false)
recommendations (array, optional) — Explicit swaps to apply. Each item has file, line, currentModel, suggestedModel. If omitted, runs audit and applies all recommendations.

Returns: Applied swaps, skipped swaps (with reasons), and estimated monthly savings.

Example flow:

Agent calls audit → gets recommendations
Agent calls apply_recommendations with those recommendations → files are rewritten
Agent commits the changes

Or in one step: call apply_recommendations with just { directory: "." } — runs audit + applies all fixes automatically.

How It Works

The MCP server runs locally as a subprocess — no hosted infrastructure, no API keys needed for basic usage. It communicates via stdio using the MCP protocol (JSON-RPC over stdin/stdout).

The server depends on:

@inferwise/pricing-db — bundled pricing for 35+ models across 5 providers
inferwise — code scanner and cost estimation engine

Non-MCP Alternatives

If your tool doesn't support MCP, use the CLI directly:

# JSON output for programmatic consumption
inferwise audit . --format json
inferwise price openai gpt-4o --format json

# SDK for embedding in pipelines
import { estimate } from "inferwise/sdk";

License

Apache 2.0

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

@inferwise/mcp

Setup

Claude Code

Cursor / VS Code / Windsurf

Cline

Tools

suggest_model

estimate_cost

audit

apply_recommendations

How It Works

Non-MCP Alternatives

License

`suggest_model`

`estimate_cost`

`audit`

`apply_recommendations`