metrillm-mcp
v0.2.2
Published
MCP server for MetriLLM — benchmark local LLMs from any AI coding assistant
Maintainers
Readme
MetriLLM MCP Server
MCP (Model Context Protocol) server for MetriLLM — benchmark local LLMs directly from Claude Code, Cursor, Windsurf, Continue.dev, or any MCP-compatible client.
Quick Start
Claude Code
claude mcp add metrillm -- npx metrillm-mcp@latestClaude Desktop
Add to claude_desktop_config.json:
{
"mcpServers": {
"metrillm": {
"command": "npx",
"args": ["metrillm-mcp@latest"]
}
}
}Cursor / Windsurf / Continue.dev
Add to your editor's MCP configuration:
{
"mcpServers": {
"metrillm": {
"command": "npx",
"args": ["metrillm-mcp@latest"]
}
}
}Prerequisites
- Node.js >= 20
- One local runtime available:
- At least one model available on the selected runtime
Available Tools
list_models
List all locally available LLM models.
| Param | Type | Default | Description |
|---|---|---|---|
| runtime | "ollama" \| "lm-studio" | "ollama" | Inference runtime |
Example response:
{
"models": [
{ "name": "llama3.2:3b", "size": 2019393189, "parameterSize": "3.2B", "quantization": "Q4_K_M", "family": "llama" }
],
"count": 1
}run_benchmark
Run a full benchmark (performance + quality) on a local model.
| Param | Type | Default | Description |
|---|---|---|---|
| model | string | (required) | Model name (e.g. "llama3.2:3b") |
| runtime | "ollama" \| "lm-studio" | "ollama" | Inference runtime |
| perfOnly | boolean | false | If true, measure performance only (skip quality) |
Example response:
{
"success": true,
"model": "llama3.2:3b",
"verdict": "GOOD",
"globalScore": 65,
"performance": {
"tokensPerSecond": 42.5,
"ttftMs": 120,
"memoryUsedGB": 2.1,
"memoryPercent": 13
},
"interpretation": "This model runs well on your hardware."
}get_results
Retrieve previous benchmark results stored locally.
| Param | Type | Default | Description |
|---|---|---|---|
| model | string | (optional) | Filter by model name (substring match) |
| runtime | "ollama" \| "lm-studio" | "ollama" | Inference runtime |
share_result
Upload a result to the public MetriLLM leaderboard.
| Param | Type | Description |
|---|---|---|
| resultFile | string | Absolute path to the result JSON file |
Uses official MetriLLM upload defaults out of the box.
Optional overrides for self-hosted/staging deployments:
METRILLM_SUPABASE_URLMETRILLM_SUPABASE_ANON_KEYMETRILLM_PUBLIC_RESULT_BASE_URL
Architecture
The MCP server is a thin wrapper around the existing MetriLLM CLI logic:
mcp/src/index.ts → MCP entry point (stdio transport)
mcp/src/tools.ts → Tool definitions + calls to CLI modules
↓
../src/core/ → CLI logic reused directly
../src/commands/ → CLI commands (bench, list)No code duplication — the MCP server imports CLI modules directly.
Supported Runtimes
| Runtime | Status | |---|---| | Ollama | Supported | | LM Studio | Supported | | MLX | Planned | | llama.cpp | Planned | | vLLM | Planned |
The runtime parameter is present on every tool to prepare for multi-runtime support. Unimplemented runtimes return a clear error.
