@us-all/mlflow-mcp
v1.11.2
MLflow MCP server - experiments, runs, registered models, model versions, traces, and assessments
MLflow MCP Server
The widest-coverage MLflow MCP — including MLflow 3 traces, attachments, prompt-optimization, and webhooks that no other MCP exposes.
78 tools across experiments, runs, registry, logged models, traces, assessments, webhooks, prompt-optimization. Aggregation tools (`summarize-experiment`, `summarize-run`) fold 3–5 round-trips into one structured response with already-fetched metric stats.
What it does that others don't
- Full coverage: only third-party MLflow MCP shipping prompt-optimization-jobs (5 tools), webhooks (6), MLflow 3 LoggedModel (8), and trace attachments (`list-trace-attachments`, `get-trace-attachment`).
- Aggregation tools: `summarize-experiment` returns experiment + topN runs + metric stats (min/max/mean) in one call from already-fetched data, zero extra round-trips. `summarize-run` dedups `metricHistory.history.*.key` (~100KB savings on 4k-point series).
- MCP Prompts (4): `debug-failed-traces`, `promote-best-run`, `compare-top-runs`, `annotate-trace-quality`. Workflow templates the model invokes directly.
- MCP Resources (6): `mlflow://run/{runId}`, `mlflow://experiment/{expId}`, `mlflow://run/{runId}/artifacts`, `mlflow://experiment/{expId}/runs`, `mlflow://registered-model/{name}/versions`, `mlflow://trace/{traceId}`.
- Token-efficient by design: `extractFields` projection on `search-traces` / `get-trace` / fat reads, `MLFLOW_TOOLS` / `MLFLOW_DISABLE` (8 categories), `search-tools` meta-tool.
- Apps SDK card: `compare-runs` renders as a side-by-side card on ChatGPT clients (run summary + metric/param tables with diff highlight) via `_meta["openai/outputTemplate"]`. Claude clients receive the same JSON content.
- stdio + Streamable HTTP: defaults to stdio. Set `MCP_TRANSPORT=http` for ChatGPT Apps SDK or remote clients (Bearer auth via `MCP_HTTP_TOKEN`).
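The aggregation idea (stats computed from data that is already in memory) can be illustrated with a minimal sketch. The `FetchedRun` shape and the `metricStats` name are hypothetical and are not taken from the server's source; the point is only that min/max/mean need no further requests once the runs have been fetched.

```typescript
// Hypothetical shape: a run with its latest metric values already fetched.
interface FetchedRun {
  runId: string;
  metrics: Record<string, number>;
}

interface MetricStats {
  count: number;
  min: number;
  max: number;
  mean: number;
}

// Compute stats for one metric across runs; returns null if no run logged it.
function metricStats(runs: FetchedRun[], key: string): MetricStats | null {
  const values = runs
    .map((run) => run.metrics[key])
    .filter((v): v is number => typeof v === "number" && Number.isFinite(v));
  if (values.length === 0) return null;
  const sum = values.reduce((acc, v) => acc + v, 0);
  return {
    count: values.length,
    min: Math.min(...values),
    max: Math.max(...values),
    mean: sum / values.length,
  };
}

// Illustrative data only.
const fetchedRuns: FetchedRun[] = [
  { runId: "a", metrics: { val_accuracy: 0.5 } },
  { runId: "b", metrics: { val_accuracy: 0.75 } },
  { runId: "c", metrics: { loss: 0.3 } },
];
const accuracyStats = metricStats(fetchedRuns, "val_accuracy");
console.log(accuracyStats);
```

Runs that never logged the metric are skipped rather than counted as zero, which keeps the mean honest when experiments mix run types.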
Try this — 5 prompts
Connect the server to Claude Desktop or Claude Code, then paste any of these:
- Best run: "In the `customer-churn-v3` experiment, find the run with the highest `val_accuracy`. Show its hyperparameters and metric history."
- Failure mode clustering: "Find traces with `status=ERROR` from the last 24h in experiment 12. Group the failures by exception type and surface the 3 most common."
- Run comparison: "Compare the top 5 runs of experiment 12 by `validation_loss`. Show differing hyperparameters in a table."
- Model promotion: "Get the latest version of `recommendation_v2` registered model with the `champion` alias. Show its training metrics + lineage to the source run."
- Trace deep-dive: "Pull trace `tr-abc123` with all attachments. Highlight slow spans and any failed feedback annotations."
When to use this vs alternatives
| | Official mlflow[mcp] | kkruglik/mlflow-mcp | @us-all/mlflow-mcp (this) |
|--|------------------------|---------------------|------------------------------|
| Tool count | ~9 (trace-only) | ~25 | 78 |
| MLflow 3 LoggedModel | ❌ | ✅ | ✅ |
| Trace attachments | ❌ | ❌ | ✅ |
| Prompt-optimization-jobs | ❌ | ❌ | ✅ |
| Webhooks | ❌ | ❌ | ✅ |
| Aggregation tools | ❌ | ❌ | ✅ summarize-experiment, summarize-run |
| MCP Prompts | ❌ | ✅ | ✅ |
| MCP Resources | ❌ | ❌ | ✅ 6 URIs |
| Auth | Databricks SDK | Bearer / basic | Bearer / basic |
| Transport | stdio | stdio | stdio, Streamable HTTP |
The official mlflow[mcp] is bundled inside MLflow itself and intentionally trace-narrow. Use it for quick managed-MLflow trace inspection. Use this server for end-to-end coverage, especially MLflow 3 entities, prompt-optimization workflows, and aggregation-driven AI debugging.
Install
Claude Desktop

    {
      "mcpServers": {
        "mlflow": {
          "command": "npx",
          "args": ["-y", "@us-all/mlflow-mcp"],
          "env": {
            "MLFLOW_TRACKING_URI": "http://localhost:5000"
          }
        }
      }
    }

Claude Code

    claude mcp add mlflow -s user \
      -e MLFLOW_TRACKING_URI=http://localhost:5000 \
      -- npx -y @us-all/mlflow-mcp

Docker

    docker run --rm -i \
      -e MLFLOW_TRACKING_URI=http://your-host:5000 \
      ghcr.io/us-all/mlflow-mcp-server

Build from source

    git clone https://github.com/us-all/mlflow-mcp-server.git
    cd mlflow-mcp-server && pnpm install && pnpm build
    node dist/index.js

Configuration
| Variable | Required | Default | Description |
|---|---|---|---|
| MLFLOW_TRACKING_URI | ✅ | — | MLflow tracking URL (http://localhost:5000, Databricks workspace URL, etc.) |
| MLFLOW_TRACKING_TOKEN | ❌ | — | Bearer token. Use for Databricks PAT (dapi…) |
| MLFLOW_TRACKING_USERNAME | ❌ | — | Basic-auth username (alternative to token) |
| MLFLOW_TRACKING_PASSWORD | ❌ | — | Basic-auth password |
| MLFLOW_EXPERIMENT_ID | ❌ | — | Default experiment ID for tools that accept it implicitly |
| MLFLOW_ALLOW_WRITE | ❌ | false | Set true to enable mutations (create/update/delete) |
| MLFLOW_TOOLS | ❌ | — | Comma-sep allowlist of categories. Biggest token saver. |
| MLFLOW_DISABLE | ❌ | — | Comma-sep denylist. Ignored when MLFLOW_TOOLS is set. |
| MCP_TRANSPORT | ❌ | stdio | http to enable Streamable HTTP transport |
| MCP_HTTP_TOKEN | conditional | — | Bearer token. Required when MCP_TRANSPORT=http |
| MCP_HTTP_PORT | ❌ | 3000 | HTTP listen port |
| MCP_HTTP_HOST | ❌ | 127.0.0.1 | HTTP bind host (DNS rebinding protection auto-enabled for localhost) |
| MCP_HTTP_SKIP_AUTH | ❌ | false | Skip Bearer auth — e.g. behind a reverse proxy that handles it |
Categories (8): experiments, runs, registry, logged-models, traces, assessments, webhooks, prompts.
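The precedence between `MLFLOW_TOOLS` and `MLFLOW_DISABLE` can be sketched as follows. This is an illustration of the rule stated in the table, not the server's code: `parseList` and `resolveCategories` are hypothetical names, and treating an allowlist that contains no known category as "unset" is an assumption.

```typescript
// The eight categories listed in this README.
const ALL_CATEGORIES = [
  "experiments", "runs", "registry", "logged-models",
  "traces", "assessments", "webhooks", "prompts",
] as const;
type Category = (typeof ALL_CATEGORIES)[number];

// Split a comma-separated value, trim whitespace, and keep known categories.
function parseList(raw: string | undefined): Category[] {
  if (!raw) return [];
  const wanted = raw.split(",").map((s) => s.trim().toLowerCase());
  return ALL_CATEGORIES.filter((c) => wanted.includes(c));
}

// Allowlist wins; the denylist is consulted only when no allowlist is set.
function resolveCategories(tools?: string, disable?: string): Category[] {
  const allow = parseList(tools);
  if (allow.length > 0) return allow;
  const deny = parseList(disable);
  return ALL_CATEGORIES.filter((c) => !deny.includes(c));
}

console.log(resolveCategories("experiments, runs", "runs"));
console.log(resolveCategories(undefined, "webhooks,prompts"));
```

In the first call the denylist entry for `runs` has no effect because an allowlist is present; in the second, six of the eight categories remain enabled.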
When MCP_TRANSPORT=http: POST /mcp (Bearer-auth JSON-RPC) + GET /health (public liveness).
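A request to the HTTP endpoint might be assembled like this. The sketch only builds the request description and performs no network call; `buildMcpCall` and `McpHttpCall` are hypothetical names, the token is a placeholder, and the host and port are the defaults from the table above. An MCP session normally begins with an `initialize` request, and whether this server enforces that before `tools/list` is not stated here, so treat the method name as an example.

```typescript
// Description of an HTTP call; passing it to fetch() is left to the caller.
interface McpHttpCall {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build a JSON-RPC 2.0 request for the server's POST /mcp endpoint.
function buildMcpCall(
  baseUrl: string,
  token: string,
  rpcMethod: string,
  params: Record<string, unknown> = {},
  id = 1,
): McpHttpCall {
  return {
    // Strip trailing slashes so the path is not doubled.
    url: `${baseUrl.replace(/\/+$/, "")}/mcp`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Streamable HTTP clients accept both plain JSON and SSE responses.
      Accept: "application/json, text/event-stream",
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({ jsonrpc: "2.0", id, method: rpcMethod, params }),
  };
}

const listCall = buildMcpCall("http://127.0.0.1:3000/", "example-token", "tools/list");
console.log(listCall.url, listCall.body);
```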
Databricks managed MLflow
For Databricks-hosted MLflow:

    MLFLOW_TRACKING_URI=https://<workspace>.cloud.databricks.com
    MLFLOW_TRACKING_TOKEN=dapi... # PAT or service-principal token

The MLflow REST API path (/api/2.0/mlflow/...) is identical between OSS and Databricks. Bearer auth handles both PAT and service-principal flows.
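Because the path prefix is shared, one small request helper can serve both deployments. The sketch below is a guess at the shape of such a helper, not the server's `MlflowClient`: `TrackingConfig`, `authHeaders`, and `restUrl` are illustrative names, the Databricks host is a placeholder, and giving the token precedence over username/password when both are set is an assumption.

```typescript
// Connection settings mirroring the MLFLOW_TRACKING_* variables.
interface TrackingConfig {
  uri: string;
  token?: string;
  username?: string;
  password?: string;
}

// Prefer Bearer when a token is set; otherwise fall back to basic auth.
function authHeaders(cfg: TrackingConfig): Record<string, string> {
  if (cfg.token) return { Authorization: `Bearer ${cfg.token}` };
  if (cfg.username && cfg.password) {
    // btoa handles Latin-1 input only; non-ASCII credentials need extra encoding.
    return { Authorization: `Basic ${btoa(`${cfg.username}:${cfg.password}`)}` };
  }
  return {};
}

// Join the tracking URI with an MLflow REST endpoint such as "runs/search".
function restUrl(cfg: TrackingConfig, endpoint: string): string {
  return `${cfg.uri.replace(/\/+$/, "")}/api/2.0/mlflow/${endpoint}`;
}

// Illustrative configurations; credentials are placeholders.
const ossConfig: TrackingConfig = { uri: "http://localhost:5000", username: "user", password: "pass" };
const dbxConfig: TrackingConfig = { uri: "https://example.cloud.databricks.com/", token: "example-token" };

console.log(restUrl(ossConfig, "runs/search"), authHeaders(ossConfig));
console.log(restUrl(dbxConfig, "runs/search"), authHeaders(dbxConfig));
```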
Token efficiency
| Scenario | Tools | Schema tokens | vs default |
|----------|------:|--------------:|-----------:|
| default (all categories) | 78 | 9,200 | — |
| typical (MLFLOW_TOOLS=experiments,runs,registry,traces) | 54 | 5,900 | −36% |
| narrow (MLFLOW_TOOLS=experiments,runs) | 27 | 3,200 | −66% |
Plus extractFields on search-traces / get-trace / summarize-experiment — caller can scope response fields per call.
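To make the projection idea concrete, here is a minimal sketch of field scoping over a JSON response. The accepted syntax of the real `extractFields` parameter is not documented in this README; the comma-separated dotted-path format, the `project` helper, and the trace field names below are all assumptions chosen for illustration.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Keep only the keys named by the paths; arrays are projected element by element.
function project(value: Json, paths: string[][]): Json {
  if (Array.isArray(value)) return value.map((item) => project(item, paths));
  if (value === null || typeof value !== "object") return value;
  const out: { [key: string]: Json } = {};
  for (const key of Object.keys(value)) {
    const matching = paths.filter((p) => p[0] === key);
    if (matching.length === 0) continue;
    const child: Json = value[key] ?? null;
    // A path ending at this key keeps the whole subtree; otherwise recurse.
    const keepAll = matching.some((p) => p.length === 1);
    out[key] = keepAll ? child : project(child, matching.map((p) => p.slice(1)));
  }
  return out;
}

// Parse a spec such as "info.state, spans.name" and apply it.
function extractFieldsSketch(value: Json, spec: string): Json {
  const paths = spec
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map((s) => s.split("."));
  return project(value, paths);
}

// Illustrative payload; field names are not guaranteed to match MLflow's.
const sampleTrace: Json = {
  info: { trace_id: "tr-abc123", state: "ERROR", request_time: "2025-01-01T00:00:00Z" },
  spans: [
    { name: "retrieve", duration_ms: 120, attributes: { k: "v" } },
    { name: "generate", duration_ms: 900, attributes: { k: "v" } },
  ],
};
const slimTrace = extractFieldsSketch(sampleTrace, "info.state, spans.name");
console.log(JSON.stringify(slimTrace));
```

Dropping span attributes and unused info fields is where most of the savings come from on large traces, since those subtrees repeat once per span.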
Read-only mode
By default, all writes are blocked. The following require MLFLOW_ALLOW_WRITE=true:
create-experiment, update-experiment, delete-experiment, restore-experiment, set-experiment-tag, delete-experiment-tag, create-run, update-run, delete-run, restore-run, log-metric, log-param, log-batch, log-inputs, set-run-tag, delete-run-tag, create-registered-model, rename-registered-model, update-registered-model, delete-registered-model, plus all model-version, logged-model, trace, assessment, webhook, and prompt-optimization writes.
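A write gate of this kind is typically a check that runs before any request leaves the process. The sketch below assumes that design; `WRITE_TOOLS` holds only a subset of the tools listed above, and `writesEnabled` and `assertAllowed` are hypothetical names rather than the server's own.

```typescript
// A subset of the mutating tools listed above; the full set is longer.
const WRITE_TOOLS = new Set<string>([
  "create-experiment", "delete-experiment", "create-run", "delete-run",
  "log-metric", "log-param", "log-batch", "set-run-tag",
]);

// Interpret MLFLOW_ALLOW_WRITE: "true" (any case, surrounding spaces ignored)
// enables writes; everything else, including unset, keeps read-only mode.
function writesEnabled(raw: string | undefined): boolean {
  return raw?.trim().toLowerCase() === "true";
}

// Throw before any network call if a mutating tool is invoked in read-only mode.
function assertAllowed(tool: string, allowWriteRaw: string | undefined): void {
  if (WRITE_TOOLS.has(tool) && !writesEnabled(allowWriteRaw)) {
    throw new Error(`${tool} requires MLFLOW_ALLOW_WRITE=true`);
  }
}

assertAllowed("get-run", undefined);   // reads always pass
assertAllowed("log-metric", "true");   // writes pass once enabled
try {
  assertAllowed("log-metric", undefined);
} catch (err) {
  console.log(err instanceof Error ? err.message : String(err));
}
```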
MCP Prompts (4)
Workflow templates available via MCP prompts/list:
- `debug-failed-traces`: find failed traces, group failure modes
- `promote-best-run`: find best run, register, set `champion` alias
- `compare-top-runs`: top-N comparison by metric
- `annotate-trace-quality`: guided feedback annotation loop
MCP Resources
URI-based read-only access:
mlflow://run/{runId}, mlflow://experiment/{expId}, mlflow://experiment-by-name/{name}, mlflow://registered-model/{name}, mlflow://model-version/{name}/{version}, mlflow://trace/{traceId}, mlflow://run/{runId}/artifacts, mlflow://experiment/{expId}/runs, mlflow://registered-model/{name}/versions.
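The URI templates share a simple `mlflow://{kind}/{id}/...` layout, which a client or router could split as sketched here. `parseMlflowUri` and `ResourceRef` are hypothetical, and the assumption that names containing special characters are percent-encoded is not confirmed by this README.

```typescript
interface ResourceRef {
  kind: string;   // e.g. "run", "experiment", "model-version"
  id: string;     // first path segment after the kind
  rest: string[]; // remaining segments, e.g. ["artifacts"] or a version number
}

// Split an mlflow:// URI into its kind, primary identifier, and trailing segments.
function parseMlflowUri(uri: string): ResourceRef | null {
  const match = /^mlflow:\/\/([a-z-]+)\/(.+)$/.exec(uri);
  if (!match) return null;
  const [, kind, tail] = match;
  if (!kind || !tail) return null;
  const segments = tail
    .split("/")
    .filter((s) => s.length > 0)
    .map((s) => decodeURIComponent(s));
  const [id, ...rest] = segments;
  if (id === undefined) return null;
  return { kind, id, rest };
}

console.log(parseMlflowUri("mlflow://run/abc123/artifacts"));
console.log(parseMlflowUri("mlflow://model-version/churn/3"));
console.log(parseMlflowUri("https://example.com/run/abc123"));
```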
Tools (82)
8 categories. Use search-tools to discover tools at runtime; the full list follows.
Experiments (9)
create-experiment, search-experiments, get-experiment, get-experiment-by-name, update-experiment, delete-experiment, restore-experiment, set-experiment-tag, delete-experiment-tag
Runs (18)
create-run, get-run, search-runs, update-run, delete-run, restore-run, log-metric, log-param, log-batch, log-inputs, get-metric-history, set-run-tag, delete-run-tag, list-artifacts, get-best-run, compare-runs, search-runs-by-tags, summarize-run (aggregation)
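The key deduplication that `summarize-run` applies to metric history can be pictured as hoisting the repeated `key` out of every point. The following is a sketch of that idea rather than the server's code: `compactHistory` and `CompactHistory` are hypothetical names, and the point shape follows the `key`/`value`/`timestamp`/`step` fields of an MLflow metric.

```typescript
// One point of a metric's history.
interface MetricPoint {
  key: string;
  value: number;
  timestamp: number;
  step: number;
}

// Compact form: the key is stored once, points carry only the varying fields.
interface CompactHistory {
  key: string;
  points: Array<Omit<MetricPoint, "key">>;
}

// Group points by key and drop the repeated key from each point.
function compactHistory(points: MetricPoint[]): CompactHistory[] {
  const byKey = new Map<string, CompactHistory>();
  for (const { key, ...rest } of points) {
    let entry = byKey.get(key);
    if (!entry) {
      entry = { key, points: [] };
      byKey.set(key, entry);
    }
    entry.points.push(rest);
  }
  return Array.from(byKey.values());
}

// Synthetic 4k-point series to compare serialized sizes.
const lossSeries: MetricPoint[] = Array.from({ length: 4000 }, (_, step) => ({
  key: "validation_loss",
  value: 1 / (step + 1),
  timestamp: 1700000000000 + step,
  step,
}));
const sizeBefore = JSON.stringify(lossSeries).length;
const sizeAfter = JSON.stringify(compactHistory(lossSeries)).length;
console.log(sizeBefore - sizeAfter); // characters saved by hoisting the key
```

With a 15-character metric name each point sheds roughly 24 characters of JSON, so a 4k-point series lands in the same range as the ~100KB figure quoted earlier.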
Registered Models (12)
create-registered-model, get-registered-model, search-registered-models, rename-registered-model, update-registered-model, delete-registered-model, get-latest-model-versions, set-registered-model-tag, delete-registered-model-tag, set-registered-model-alias, delete-registered-model-alias, get-model-version-by-alias
Model Versions (9)
create-model-version, get-model-version, search-model-versions, update-model-version, delete-model-version, transition-model-version-stage, get-model-version-download-uri, set-model-version-tag, delete-model-version-tag
Logged Models — MLflow 3 (8)
create-logged-model, search-logged-models, get-logged-model, finalize-logged-model, delete-logged-model, set-logged-model-tags, delete-logged-model-tag, log-logged-model-params
Traces (8)
search-traces, get-trace, get-trace-info, delete-traces, set-trace-tag, delete-trace-tag, list-trace-attachments, get-trace-attachment
search-traces, get-trace, and summarize-experiment accept extractFields for response slicing.
Assessments (5)
log-feedback, log-expectation, get-assessment, update-assessment, delete-assessment
Webhooks (6)
create-webhook, list-webhooks, get-webhook, update-webhook, delete-webhook, test-webhook
Prompt Optimization (5)
create-prompt-optimization-job, get-prompt-optimization-job, search-prompt-optimization-jobs, cancel-prompt-optimization-job, delete-prompt-optimization-job
Aggregations
summarize-experiment, summarize-run — fold 3–5 round-trips into one structured response with caveats array.
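One plausible way to produce a response with a caveats array is to run the underlying fetches concurrently and record failures instead of aborting. The sketch below assumes that behavior; `aggregateSketch` is a hypothetical stand-in and does not reproduce the signature of the toolkit's `aggregate(fetchers, caveats)` helper, and the fetchers are stubs rather than real MLflow calls.

```typescript
// Named fetchers; in a real server these would wrap MLflow REST calls.
type Fetchers = Record<string, () => Promise<unknown>>;

interface Aggregated {
  data: Record<string, unknown>;
  caveats: string[];
}

// Run all fetchers concurrently; failures become caveats instead of aborting.
async function aggregateSketch(fetchers: Fetchers): Promise<Aggregated> {
  const results = await Promise.all(
    Object.entries(fetchers).map(async ([name, fn]) => {
      try {
        return { name, ok: true as const, value: await fn() };
      } catch (err) {
        const reason = err instanceof Error ? err.message : String(err);
        return { name, ok: false as const, reason };
      }
    }),
  );
  const out: Aggregated = { data: {}, caveats: [] };
  for (const r of results) {
    if (r.ok) out.data[r.name] = r.value;
    else out.caveats.push(`${r.name} unavailable: ${r.reason}`);
  }
  return out;
}

// Stub fetchers: two succeed, one fails.
const summaryPromise = aggregateSketch({
  experiment: async () => ({ id: "12", name: "customer-churn-v3" }),
  topRuns: async () => [{ runId: "a" }],
  traces: async () => {
    throw new Error("HTTP 404");
  },
});
summaryPromise.then((summary) => console.log(summary.caveats));
```

The caller still receives the experiment and run data, and the caveats tell the model which part of the picture is missing.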
Meta
search-tools — query other tools by keyword; always enabled.
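Keyword discovery over a tool registry can be as simple as a case-insensitive substring match. This sketch shows one such approach; how the real `search-tools` ranks or matches is not described here, and `ToolInfo`, `searchTools`, and the descriptions in the sample registry are invented for illustration.

```typescript
interface ToolInfo {
  name: string;
  category: string;
  description: string;
}

// Return tools whose name, category, or description contains every query word.
function searchTools(registry: ToolInfo[], query: string): ToolInfo[] {
  const words = query.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  if (words.length === 0) return [];
  return registry.filter((tool) => {
    const haystack = `${tool.name} ${tool.category} ${tool.description}`.toLowerCase();
    return words.every((w) => haystack.includes(w));
  });
}

// Tool names come from this README; descriptions are made up for the example.
const toolRegistry: ToolInfo[] = [
  { name: "search-traces", category: "traces", description: "Search traces in an experiment" },
  { name: "get-trace", category: "traces", description: "Fetch one trace by ID" },
  { name: "log-metric", category: "runs", description: "Log a metric value for a run" },
];
console.log(searchTools(toolRegistry, "trace search").map((t) => t.name));
```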
Local validation with docker compose

    # 1. start MLflow (UI at http://localhost:5050)
    docker compose up -d mlflow
    # 2. seed demo experiment, runs, registered model, traces
    docker compose run --rm seed
    # 3a. probe the MCP server locally against the compose'd MLflow
    MLFLOW_TRACKING_URI=http://localhost:5050 \
    MLFLOW_EXPERIMENT_ID=1 \
    MLFLOW_ALLOW_WRITE=true \
    node dist/index.js
    # 3b. or run inside compose (stdio)
    docker compose run --rm mcp
    # tear down
    docker compose down -v

./dev/seed.py is idempotent: it skips seeding if the demo experiment already has runs.
Architecture

    Claude → MCP stdio → src/index.ts → src/tools/*.ts → MlflowClient (fetch) → MLflow REST API

Built on @us-all/mcp-toolkit:
- `extractFields`: token-efficient response projections
- `aggregate(fetchers, caveats)`: fan-out helper for `summarize-experiment`
- `createWrapToolHandler`: Bearer/basic credential redaction + MlflowError extraction
- `search-tools` meta-tool
Targets MLflow 3.5.1+ (uses v3 traces/assessments REST). Validated end-to-end against MLflow 3.11.1.
Tech stack
Node.js 20+ • TypeScript strict ESM • pnpm • @modelcontextprotocol/sdk • zod • dotenv • vitest.
References
- MLflow MCP overview: https://mlflow.org/docs/latest/genai/mcp/
- MLflow REST API: https://mlflow.org/docs/latest/api_reference/rest-api.html
