metrillm-mcp

v0.2.6

Published

2 months ago

MCP server for MetriLLM — benchmark local LLMs from any AI coding assistant

0High
0Medium
0Low

topaz750

mcp llm benchmark ollama claude cursor ai

MetriLLM MCP Server

MCP (Model Context Protocol) server for MetriLLM — benchmark local LLMs directly from Claude Code, Cursor, Windsurf, Continue.dev, or any MCP-compatible client.

Quick Start

Claude Code

claude mcp add metrillm -- npx metrillm-mcp@latest

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "metrillm": {
      "command": "npx",
      "args": ["metrillm-mcp@latest"]
    }
  }
}

Cursor / Windsurf / Continue.dev

Add to your editor's MCP configuration:

{
  "mcpServers": {
    "metrillm": {
      "command": "npx",
      "args": ["metrillm-mcp@latest"]
    }
  }
}

Prerequisites

Node.js >= 20
One local runtime available:
- Ollama (ollama serve)
- LM Studio (Local Server enabled)
At least one model available on the selected runtime

Available Tools

`list_models`

List all locally available LLM models.

| Param | Type | Default | Description | |---|---|---|---| | runtime | "ollama" \| "lm-studio" | "ollama" | Inference runtime |

Example response:

{
  "models": [
    { "name": "llama3.2:3b", "size": 2019393189, "parameterSize": "3.2B", "quantization": "Q4_K_M", "family": "llama" }
  ],
  "count": 1
}

`run_benchmark`

Run a full benchmark (performance + quality) on a local model.

| Param | Type | Default | Description | |---|---|---|---| | model | string | (required) | Model name (e.g. "llama3.2:3b") | | runtime | "ollama" \| "lm-studio" | "ollama" | Inference runtime | | perfOnly | boolean | false | If true, measure performance only (skip quality) |

Example response:

{
  "success": true,
  "model": "llama3.2:3b",
  "verdict": "GOOD",
  "globalScore": 65,
  "performance": {
    "tokensPerSecond": 42.5,
    "ttftMs": 120,
    "memoryUsedGB": 2.1,
    "memoryPercent": 13
  },
  "interpretation": "This model runs well on your hardware."
}

`get_results`

Retrieve previous benchmark results stored locally.

| Param | Type | Default | Description | |---|---|---|---| | model | string | (optional) | Filter by model name (substring match) | | runtime | "ollama" \| "lm-studio" | "ollama" | Inference runtime |

`share_result`

Upload a result to the public MetriLLM leaderboard.

| Param | Type | Description | |---|---|---| | resultFile | string | Absolute path to the result JSON file |

Uses official MetriLLM upload defaults out of the box.

Optional overrides for self-hosted/staging deployments:

METRILLM_SUPABASE_URL
METRILLM_SUPABASE_ANON_KEY
METRILLM_PUBLIC_RESULT_BASE_URL

Architecture

The MCP server is a thin wrapper around the existing MetriLLM CLI logic:

mcp/src/index.ts  → MCP entry point (stdio transport)
mcp/src/tools.ts  → Tool definitions + calls to CLI modules
    ↓
../src/core/      → CLI logic reused directly
../src/commands/  → CLI commands (bench, list)

No code duplication — the MCP server imports CLI modules directly.

Supported Runtimes

| Runtime | Status | |---|---| | Ollama | Supported | | LM Studio | Supported | | MLX | Planned | | llama.cpp | Planned | | vLLM | Planned |

The runtime parameter is present on every tool to prepare for multi-runtime support. Unimplemented runtimes return a clear error.

License

Apache License 2.0

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme