@inferlane/mcp
v0.7.0
Local-first compute fuel gauge for Claude Code, plus 49 MCP tools for model selection, spend tracking, credibility, routing, and the compute exchange. Auto-ingests real usage from Claude Code transcripts (no API key needed). Works with Claude, Cursor, Windsurf, and other MCP-compatible clients.
The compute exchange for AI agents. Every request is routed to the cheapest provider that meets your quality and privacy bar, across 25 cloud providers and five decentralised supply networks. Because InferLane is MCP-native, cost-awareness happens before the call, not after.
Why this exists
HTTP gateways (OpenRouter, Portkey, LiteLLM) only see requests after your agent has decided what to send. By then the expensive model is already on the wire.
InferLane ships as an MCP server — it sits inside the agent's decision
loop. Before a single token is generated, your agent can call
pick_model / estimate_cost / assess_routing and route the request to the
cheapest qualified provider. Cost-awareness becomes a decision, not an audit.
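Each of those consultations is an ordinary MCP `tools/call` JSON-RPC request sent to the server over stdio. The sketch below only builds and frames one such request; the argument name `task` is a hypothetical placeholder, since each tool's real input schema comes from the server's `tools/list` response.

```typescript
// Shape of an MCP JSON-RPC 2.0 "tools/call" request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

let nextId = 1;

// Build a tools/call request. The argument names are hypothetical:
// read each tool's inputSchema from the server's tools/list response.
function toolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Ask for a model recommendation before any expensive call goes out.
const req = toolCall("pick_model", { task: "classify support tickets" });

// The stdio transport frames messages as newline-delimited JSON,
// so the serialized message itself must not contain newlines.
const line = JSON.stringify(req) + "\n";
console.log(line);
```

A real client must complete the MCP `initialize` handshake before sending tool calls; the official MCP SDKs handle that handshake and the framing for you.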
┌──────────────────────────────────────────────────────────────────┐
│  Traditional:                                                    │
│    Agent ──► chooses Opus ──► HTTP gateway ──► bills you $$$     │
│                                                                  │
│  With InferLane MCP:                                             │
│    Agent ──► asks pick_model ──► gets "Haiku is fine" ──► $0.01  │
│      \__ OR il_routing=cheapest ──► spot exchange ──► $0.005     │
└──────────────────────────────────────────────────────────────────┘

Install
npx -y @inferlane/mcp

Or add to your MCP client config:
{
"mcpServers": {
"inferlane": {
"command": "npx",
"args": ["-y", "@inferlane/mcp"],
"env": {
"INFERLANE_API_KEY": "il_your_key_here"
}
}
}
}

Works with Claude Desktop, Claude Code, Cursor, Goose, Windsurf, and any MCP-compatible agent.
Claude Desktop (macOS) reads ~/Library/Application Support/Claude/claude_desktop_config.json
Cursor reads ~/.cursor/mcp.json
Get an API key (optional, unlocks the proxy): inferlane.dev/dashboard/onboarding
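If your client config already lists other servers, add the `inferlane` entry instead of overwriting the file. The sketch below merges the entry into an already-parsed config object; `withInferlane` is a hypothetical helper name, and file reading and writing are left out, so load and save the JSON at whichever path your client uses.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface McpConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Hypothetical helper: return a new config with the inferlane server added,
// leaving every other server and top-level key untouched.
function withInferlane(config: McpConfig, apiKey?: string): McpConfig {
  const entry: McpServerEntry = {
    command: "npx",
    args: ["-y", "@inferlane/mcp"],
  };
  // The API key is optional; without it only the local tools are available.
  if (apiKey) {
    entry.env = { INFERLANE_API_KEY: apiKey };
  }
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), inferlane: entry },
  };
}

const existing: McpConfig = {
  mcpServers: { other: { command: "node", args: ["server.js"] } },
};
const updated = withInferlane(existing, "il_your_key_here");
console.log(JSON.stringify(updated, null, 2));
```

Returning a new object rather than mutating the parsed config keeps the original around, which makes it easy to diff before writing the file back.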
What you get
Works offline — no account needed
These run locally against a bundled pricing database and local SQLite state:
| Tool | What it does |
|---|---|
| pick_model | Recommend the cheapest viable model for a task |
| estimate_cost | Estimate cost before you send the prompt |
| compare_models | Side-by-side quality + price + latency across equivalents |
| session_cost | Track live spend for the current session |
| log_request | Record a call locally for budget + credibility |
| token_tachometer | Real-time tokens/sec gauge |
| agent_status | Traffic-light status (green / amber / red) |
| lifecycle_report | Spend breakdown by coding / testing / CI / deploy phase |
| credibility_profile | Track your agent's decision quality over time |
| assess_routing | Local Ollama vs cloud — which is cheaper right now? |
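Pre-call cost estimation of the kind `estimate_cost` and `compare_models` report comes down to simple arithmetic: token counts times per-million-token prices, summed over input and output. The sketch below shows only that calculation. The model names and prices are made-up placeholders, not values from InferLane's bundled pricing database.

```typescript
// Hypothetical per-million-token prices in USD (placeholders, not real rates).
interface Price {
  inputPerMTok: number;
  outputPerMTok: number;
}

const prices: Record<string, Price> = {
  "large-model": { inputPerMTok: 15, outputPerMTok: 75 },
  "small-model": { inputPerMTok: 1, outputPerMTok: 5 },
};

// cost = (input tokens / 1e6) * input price + (output tokens / 1e6) * output price
function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = prices[model];
  if (!p) throw new Error(`no price for ${model}`);
  return (inputTokens / 1e6) * p.inputPerMTok + (outputTokens / 1e6) * p.outputPerMTok;
}

// Same prompt, two candidate models: 20k tokens in, 2k tokens out.
for (const model of Object.keys(prices)) {
  console.log(model, estimateCost(model, 20_000, 2_000).toFixed(4));
}
```

With these placeholder prices the large model costs $0.45 and the small one $0.03 for the same call, a 15x gap that is only useful to know before the request is sent.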
Add INFERLANE_API_KEY for the platform layer
These route through the hosted platform, which is where rebate-backed spot pricing, decentralised operator supply, and privacy-tier routing live:
| Tool | What it does |
|---|---|
| dispatch | Send a prompt through the platform's routing engine |
| route_via_platform | Route via InferLane instead of direct to provider |
| exchange_spot | Live spot prices across 25 providers + decentralised nets |
| exchange_offers | Browse the full order book |
| exchange_list_capacity | List your own GPU as a seller on the exchange |
| cost_savings | See actual savings vs rack-rate counterfactual |
| check_promotions | Active provider promotions (free credits, rebates) |
| register_webhook | Subscribe to spend / budget / routing events |
Full tool reference: inferlane.dev/developers
Environment variables
| Variable | Required | Description |
|---|---|---|
| INFERLANE_API_KEY | No | Unlocks the platform tools (proxy, exchange, webhooks) |
| OLLAMA_HOST | No | Local Ollama endpoint. Enables free local routing |
| INFERLANE_BUDGET_TOTAL | No | Monthly USD cap — enforced client-side |
| INFERLANE_EVENTS_PORT | No | Expose a local SSE event stream for dashboards |
| INFERLANE_BASE_URL | No | Override platform URL (default: https://inferlane.dev) |
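Every variable is optional, so whatever reads them has to apply defaults. A minimal sketch, assuming `INFERLANE_BUDGET_TOTAL` is a plain USD number; `readConfig` and `wouldExceedBudget` are hypothetical names, not the package's own parsing code. The budget check illustrates what "enforced client-side" means: the comparison runs locally, before a request leaves the machine.

```typescript
interface InferlaneConfig {
  apiKey?: string;
  ollamaHost?: string;
  budgetTotalUsd?: number;
  eventsPort?: number;
  baseUrl: string;
}

// Parse a numeric variable; unset, empty, or non-numeric values become undefined.
function parseNumber(value: string | undefined): number | undefined {
  if (value === undefined || value.trim() === "") return undefined;
  const n = Number(value);
  return Number.isFinite(n) ? n : undefined;
}

// Hypothetical reader for the documented variables; in Node, pass process.env.
function readConfig(env: Record<string, string | undefined>): InferlaneConfig {
  return {
    apiKey: env.INFERLANE_API_KEY || undefined,
    ollamaHost: env.OLLAMA_HOST || undefined,
    budgetTotalUsd: parseNumber(env.INFERLANE_BUDGET_TOTAL),
    eventsPort: parseNumber(env.INFERLANE_EVENTS_PORT),
    baseUrl: env.INFERLANE_BASE_URL || "https://inferlane.dev",
  };
}

// Client-side cap: refuse a call whose estimated cost would push
// month-to-date spend past the budget. No budget set means no cap.
function wouldExceedBudget(cfg: InferlaneConfig, spentUsd: number, nextCostUsd: number): boolean {
  return cfg.budgetTotalUsd !== undefined && spentUsd + nextCostUsd > cfg.budgetTotalUsd;
}

const cfg = readConfig({ INFERLANE_BUDGET_TOTAL: "50" });
console.log(cfg.baseUrl, wouldExceedBudget(cfg, 49.5, 0.75));
```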
The standing instruction
Add this to your global CLAUDE.md (or equivalent agent memory) to make every
session cost-aware by default:
## InferLane — cost-aware routing is active
Before making any non-trivial LLM API call:
1. If the task is repetitive, classification, parsing, or summarisation —
consider routing to a cheaper model via `pick_model` / `assess_routing`.
2. Log API requests via `log_request`.
3. When multiple providers can serve the same task, use `get_cost_comparison`.

This is what makes the MCP integration different from a post-hoc dashboard: the agent consults InferLane before it spends.
What's open source and what isn't
This repo (Apache 2.0):
- MCP server, protocol plumbing, stdio transport
- All local tools (`pick_model`, `estimate_cost`, `compare_models`, tachometer, traffic-light, credibility, SQLite persistence)
- Thin HTTP client to the commercial platform
- Public rack-rate pricing data
- Dashboard HTML, install scripts, config templates
Not open source (hosted at inferlane.dev):
- Routing engine (the actual provider-selection logic under load)
- Rebate-adjusted spot pricing
- Decentralised operator onboarding rails
- Privacy-tier attestation (Shamir fragmentation, TEE verification)
- Wallet / float / KYC / compliance plumbing
- Enterprise features (SSO, audit logs, policy engine, on-prem)
Why split this way? The client is the distribution surface — it belongs in the community. The moat is the network (relationships, volume data, decentralised supply, attestation partnerships) — not the code. Licensing tricks don't protect moats; network effects do.
How the hosted platform works
The MCP server is free and open source forever. The hosted platform at inferlane.dev routes each request to whichever provider best fits the task's quality, price, and privacy requirements, drawing on 25 cloud providers and five decentralised networks (Akash, Bittensor/Chutes, Nosana, Darkbloom, Hyperbolic). Buyers get transparent routing; providers get qualified, high-signal traffic from agent workloads.
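At its simplest, "best fits the task's quality, price, and privacy requirements" is a filter-then-minimise step: drop offers that miss the quality floor or privacy tier, then take the cheapest remaining one. The sketch below shows only that idea. The real routing engine is closed source and works with live, rebate-adjusted prices under load, so the provider names, scores, prices, and the three privacy tiers here are all hypothetical.

```typescript
// Hypothetical privacy tiers, ordered from weakest to strongest guarantee.
const TIERS = ["standard", "no-retention", "attested"] as const;
type PrivacyTier = (typeof TIERS)[number];

interface Offer {
  provider: string;
  quality: number; // hypothetical 0..1 benchmark score
  usdPerMTok: number; // hypothetical blended price
  privacy: PrivacyTier;
}

interface Requirements {
  minQuality: number;
  minPrivacy: PrivacyTier;
}

// Keep offers that clear both bars, then return the cheapest (or undefined).
function route(offers: Offer[], req: Requirements): Offer | undefined {
  const floor = TIERS.indexOf(req.minPrivacy);
  return offers
    .filter((o) => o.quality >= req.minQuality && TIERS.indexOf(o.privacy) >= floor)
    .reduce<Offer | undefined>(
      (best, o) => (best === undefined || o.usdPerMTok < best.usdPerMTok ? o : best),
      undefined,
    );
}

const offers: Offer[] = [
  { provider: "cloud-a", quality: 0.92, usdPerMTok: 3.0, privacy: "no-retention" },
  { provider: "cloud-b", quality: 0.9, usdPerMTok: 1.2, privacy: "standard" },
  { provider: "network-c", quality: 0.88, usdPerMTok: 0.6, privacy: "attested" },
];

console.log(route(offers, { minQuality: 0.85, minPrivacy: "standard" })?.provider);
console.log(route(offers, { minQuality: 0.9, minPrivacy: "no-retention" })?.provider);
```

Tightening the requirements in the second call changes the winner from the cheapest offer to the only one that clears both bars, which is why the constraints are applied before price is compared.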
Full economics and revenue model: inferlane.dev/transparency
Privacy
- The MCP server runs locally as a stdio subprocess spawned by your agent
- Offline tools (`pick_model`, `estimate_cost`, `compare_models`) never touch the network
- Online tools only send model names and token counts, never prompt content, unless you opt into `route_via_platform` / `dispatch`
- API keys live in your OS keychain (or the `env` block of your MCP config), not in this package
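The rule for online tools amounts to building outbound payloads from metadata only. The sketch below makes that explicit in the types: the prompt exists locally, but the outbound type has no field for it. The type and field names are hypothetical, not the platform's wire format.

```typescript
// What the agent holds locally for one call.
interface LocalCall {
  model: string;
  prompt: string;
  inputTokens: number;
  outputTokens: number;
}

// What a metadata-only online tool may send: no content fields at all.
interface UsageMetadata {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

// Copy only the permitted fields; the prompt never enters the payload.
function toUsageMetadata(call: LocalCall): UsageMetadata {
  return {
    model: call.model,
    inputTokens: call.inputTokens,
    outputTokens: call.outputTokens,
  };
}

const body = JSON.stringify(
  toUsageMetadata({ model: "small-model", prompt: "secret text", inputTokens: 1200, outputTokens: 300 }),
);
console.log(body);
```

Copying an allowlist of fields, rather than deleting `prompt` from the object, means a content field added to `LocalCall` later cannot leak by accident.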
Contributing
Issues and PRs welcome. Priority areas:
- New MCP clients (we test against Claude Desktop, Claude Code, Cursor, Goose, Windsurf — others untested)
- Additional local providers in the pricing database
- Benchmark tasks for quality scoring
- Translations of the README
Security disclosures: inferlane.dev/.well-known/security.txt
License
Apache-2.0. Fork it, embed it, relicense your own fork as you wish — but you'll need the platform API to unlock the rebate-adjusted routing, the decentralised supply, and the privacy-tier attestation.
Links
- Landing: inferlane.dev
- Developer docs: inferlane.dev/developers
- Live stats: inferlane.dev/stats
- Pricing: inferlane.dev/pricing
- Become an operator: inferlane.dev/earn
- Transparency: inferlane.dev/transparency
