# CubeAPM MCP Server
A Model Context Protocol (MCP) server for CubeAPM - enabling AI assistants like Claude to query your observability data including traces, metrics, and logs.
## What is this?
This MCP server connects AI assistants (like Claude) to your CubeAPM instance, allowing you to:
- Query logs using natural language that gets translated to LogsQL
- Analyze metrics with PromQL-compatible queries
- Search and inspect traces to debug distributed systems
- Monitor your services through conversational interfaces
## Installation

### From NPM (Recommended)
```bash
npm install -g cubeapm-mcp
```

### From Source

```bash
git clone https://github.com/TechnicalRhino/cubeapm-mcp.git
cd cubeapm-mcp
npm install
npm run build
```

## Quick Start
### 1. Configure Claude Code

Add the following to your Claude Code settings (`~/.claude/settings.json`):
```json
{
  "mcpServers": {
    "cubeapm": {
      "command": "npx",
      "args": ["-y", "cubeapm-mcp"],
      "env": {
        "CUBEAPM_HOST": "your-cubeapm-server.com"
      }
    }
  }
}
```

### 2. Restart Claude Code
After updating the settings, restart Claude Code to load the MCP server.
### 3. Start Querying
You can now ask Claude questions like:
- "Show me error logs from the payment-service in the last hour"
- "What's the p99 latency for the checkout API?"
- "Find traces where duration > 5s in production"
- "Get the full trace for trace ID abc123def456"
## Configuration

| Environment Variable | Default | Description |
|----------------------|---------|-------------|
| `CUBEAPM_URL` | - | Full URL to CubeAPM (e.g., `https://cube.example.com`). Takes precedence over the HOST/PORT settings. |
| `CUBEAPM_HOST` | `localhost` | CubeAPM server hostname or IP (used if `CUBEAPM_URL` is not set) |
| `CUBEAPM_QUERY_PORT` | `3140` | Port for query APIs (traces, metrics, logs) |
| `CUBEAPM_INGEST_PORT` | `3130` | Port for ingestion APIs |
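To make the precedence rule concrete, here is a minimal sketch of how the settings above could combine into a base URL. `resolveBaseUrl` is a hypothetical helper written for illustration, not part of the cubeapm-mcp source:

```typescript
// Illustrative only: resolveBaseUrl is a hypothetical helper showing the
// documented precedence (CUBEAPM_URL wins over CUBEAPM_HOST/CUBEAPM_QUERY_PORT).
function resolveBaseUrl(env: Record<string, string | undefined>): string {
  if (env.CUBEAPM_URL) {
    // Full URL takes precedence; strip a trailing slash for consistency.
    return env.CUBEAPM_URL.replace(/\/$/, "");
  }
  const host = env.CUBEAPM_HOST ?? "localhost";     // documented default
  const port = env.CUBEAPM_QUERY_PORT ?? "3140";    // documented default
  return `http://${host}:${port}`;
}

console.log(resolveBaseUrl({ CUBEAPM_URL: "https://cube.example.com" }));
console.log(resolveBaseUrl({ CUBEAPM_HOST: "cubeapm.internal.company.com" }));
```

With no variables set at all, this sketch falls back to `http://localhost:3140`, matching the defaults in the table.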
### Example Configurations

**Local Development:**
```json
{
  "mcpServers": {
    "cubeapm": {
      "command": "npx",
      "args": ["-y", "cubeapm-mcp"],
      "env": {
        "CUBEAPM_HOST": "localhost"
      }
    }
  }
}
```

**Production (with full URL):**

```json
{
  "mcpServers": {
    "cubeapm": {
      "command": "npx",
      "args": ["-y", "cubeapm-mcp"],
      "env": {
        "CUBEAPM_URL": "https://cubeapm.internal.company.com"
      }
    }
  }
}
```

**Production (with host/port):**

```json
{
  "mcpServers": {
    "cubeapm": {
      "command": "npx",
      "args": ["-y", "cubeapm-mcp"],
      "env": {
        "CUBEAPM_HOST": "cubeapm.internal.company.com",
        "CUBEAPM_QUERY_PORT": "3140"
      }
    }
  }
}
```

## Available Tools
### Logs

| Tool | Description |
|------|-------------|
| `query_logs` | Query logs using LogsQL syntax, with a time range and result limit |

**Parameters:**

- `query` - LogsQL query string (e.g., `{service="api"} error`)
- `start` - Start time (RFC3339 or Unix timestamp)
- `end` - End time (RFC3339 or Unix timestamp)
- `limit` - Maximum entries to return (default: 100)
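As an illustration of how these parameters fit together, the sketch below assembles a `query_logs` argument object and applies the documented default limit. `buildQueryLogsArgs` is a hypothetical helper, not part of the package's API:

```typescript
// Hypothetical helper illustrating the query_logs parameters above;
// not part of the cubeapm-mcp source.
interface QueryLogsArgs {
  query: string;  // LogsQL query string
  start: string;  // RFC3339 or Unix timestamp
  end: string;    // RFC3339 or Unix timestamp
  limit: number;  // maximum entries to return
}

function buildQueryLogsArgs(
  query: string,
  start: string,
  end: string,
  limit?: number
): QueryLogsArgs {
  // The documented default limit is 100.
  return { query, start, end, limit: limit ?? 100 };
}

const args = buildQueryLogsArgs(
  '{service="api"} error',
  "2024-01-01T00:00:00Z",
  "2024-01-01T01:00:00Z"
);
console.log(args.limit); // 100 (default applied)
```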
### Metrics

| Tool | Description |
|------|-------------|
| `query_metrics_instant` | Execute a PromQL query at a single point in time |
| `query_metrics_range` | Execute a PromQL query over a time range |

**Instant Query Parameters:**

- `query` - PromQL expression
- `time` - Evaluation timestamp
- `step` - Optional time window in seconds

**Range Query Parameters:**

- `query` - PromQL expression
- `start` / `end` - Time range
- `step` - Resolution in seconds
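For range queries, `step` controls how many data points come back, so it is often derived from the window size. A small sketch of one common heuristic (the helper name is mine, not part of the package):

```typescript
// Derive a range-query step (in seconds) from the time window and a
// target number of data points. chooseStep is a hypothetical helper.
function chooseStep(startUnix: number, endUnix: number, targetPoints = 100): number {
  const windowSec = endUnix - startUnix;
  // At least 1-second resolution; round up so the point count never
  // exceeds targetPoints.
  return Math.max(1, Math.ceil(windowSec / targetPoints));
}

// A 1-hour window at ~100 points gives a 36-second step.
console.log(chooseStep(1_700_000_000, 1_700_003_600)); // 36
```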
### Traces

| Tool | Description |
|------|-------------|
| `search_traces` | Search traces by service, environment, or custom query |
| `get_trace` | Fetch complete trace details by trace ID |

**Search Parameters (Required):**

- `query` - Search query (default: `*` for wildcard)
- `env` - Environment filter (default: `UNSET`)
- `service` - Service name filter (required, case-sensitive)
- `start` / `end` - Time range (RFC3339 or Unix timestamp)

**Search Parameters (Optional):**

- `limit` - Maximum results (default: 20)
- `spanKind` - Filter by span type: `server`, `client`, `consumer`, `producer`
- `sortBy` - Sort by `duration` (useful for finding slow traces)

**Get Trace Parameters:**

- `trace_id` - Hex-encoded trace ID
- `start` / `end` - Time range to search within
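The defaults and the required `service` filter can be summarized in code. This is a hedged sketch for illustration only; `buildSearchTracesArgs` is a hypothetical helper, not the package's actual validation logic:

```typescript
// Hypothetical illustration of the search_traces parameters above:
// applies the documented defaults and enforces the required service filter.
interface SearchTracesArgs {
  query: string;
  env: string;
  service: string;
  start: string;
  end: string;
  limit: number;
}

function buildSearchTracesArgs(
  service: string,
  start: string,
  end: string,
  opts: { query?: string; env?: string; limit?: number } = {}
): SearchTracesArgs {
  if (!service) {
    // service is required and case-sensitive per the docs above.
    throw new Error('service is required (case-sensitive, e.g. "Kratos-Prod")');
  }
  return {
    query: opts.query ?? "*",  // documented default: wildcard
    env: opts.env ?? "UNSET",  // documented default
    service,
    start,
    end,
    limit: opts.limit ?? 20,   // documented default
  };
}

console.log(buildSearchTracesArgs("Kratos-Prod", "0", "1").env); // "UNSET"
```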
### Ingestion

| Tool | Description |
|------|-------------|
| `ingest_metrics_prometheus` | Send metrics in Prometheus text exposition format |
### Prompts

Pre-defined templates for common observability tasks:

| Prompt | Description |
|--------|-------------|
| `investigate-service` | Comprehensive service investigation: checks errors, latency, and traces |
| `check-latency` | Get P50, P95, and P99 latency percentiles for a service |
| `find-slow-traces` | Find the slowest traces to identify performance bottlenecks |
**Usage Example:**

```
Use the investigate-service prompt for Kratos-Prod
```

### Resources

Readable resources exposing CubeAPM data and configuration:

| Resource URI | Description |
|--------------|-------------|
| `cubeapm://config` | Current CubeAPM connection configuration |
| `cubeapm://query-patterns` | Query patterns and naming conventions reference |
## CubeAPM Query Patterns

### Metrics (PromQL / MetricsQL)

CubeAPM uses naming conventions that differ from standard OpenTelemetry:

| What | CubeAPM Convention |
|------|--------------------|
| Metric prefix | `cube_apm_*` (e.g., `cube_apm_calls_total`, `cube_apm_latency_bucket`) |
| Service label | `service` (NOT `server` or `service_name`) |
| Common labels | `env`, `service`, `span_kind`, `status_code`, `http_code` |
#### Histogram Queries (P50, P90, P95, P99)

CubeAPM uses VictoriaMetrics-style histograms with `vmrange` labels instead of Prometheus `le` buckets:
```
# ✅ Correct - use histogram_quantiles() with vmrange
histogram_quantiles("phi", 0.95, sum by (vmrange, service) (
  increase(cube_apm_latency_bucket{service="MyService", span_kind="server"}[5m])
))

# ❌ Wrong - standard Prometheus syntax won't work
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_bucket[5m])))
```

**Note:** Latency values are returned in seconds (0.05 = 50ms).
### Logs (LogsQL)

#### Stream Selectors

Log labels vary by source. Run a bare `*` query first to discover the available labels:

| Source | Common Labels |
|--------|---------------|
| Lambda functions | `faas.name`, `faas.arn`, `env`, `aws.lambda_request_id` |
| Services | `service_name`, `level`, `host` |
```
# Discover all labels
*

# Lambda function logs
{faas.name="my-lambda-prod"}

# Regex match
{faas.name=~".*-prod"}

# Text filter with boolean operators
{faas.name=~".*"} AND "error" AND NOT "retry"
```

#### Pipe Operators
Chain pipes after any query using `|`:
| Pipe | Syntax | Description |
|------|--------|-------------|
| copy | \| copy src AS dst | Copy field value |
| drop | \| drop field1, field2 | Remove fields from output |
| extract_regexp | \| extract_regexp "(?P<name>re)" | Extract via named capture groups |
| join | \| join by (field) (...subquery...) | Join with subquery results |
| keep | \| keep field1, field2 | Keep only specified fields |
| limit | \| limit N | Return at most N results |
| math | \| math result = f1 + f2 | Arithmetic (+, -, *, /, %) |
| rename | \| rename src AS dst | Rename a field |
| replace | \| replace (field, "old", "new") | Substring replacement |
| replace_regexp | \| replace_regexp (field, "re", "repl") | Regex replacement |
| sort | \| sort by (field) [asc\|desc] | Sort results |
| stats | \| stats <func> as alias [by (fields)] | Aggregate results |
| unpack_json | \| unpack_json | Extract fields from JSON body |
#### Stats Functions

Used with the `| stats` pipe:

| Function | Description |
|----------|-------------|
| `avg(field)` | Arithmetic mean |
| `count()` | Total matching entries |
| `count_empty(field)` | Entries where the field is empty |
| `count_uniq(field)` | Number of distinct values |
| `max(field)` | Maximum value |
| `median(field)` | Median (50th percentile) |
| `min(field)` | Minimum value |
| `quantile(p, field)` | p-th quantile (e.g., `quantile(0.95, duration)`) |
| `sum(field)` | Sum of values |
#### Example Log Queries

```
# Count errors per Lambda function
{faas.name=~".*"} AND "error" | stats count() as errors by (faas.name)

# Top 10 slowest requests
{service_name="my-service"} | sort by (duration) desc | limit 10

# Extract and aggregate from JSON logs
{service_name="api"} | unpack_json | stats avg(response_time) as avg_rt by (endpoint)
```

### Traces
Trace queries use the same pipe syntax as logs: `{stream_selector} | pipe1 | pipe2`

**Important notes:**

- `query`, `env`, and `service` are REQUIRED parameters
- Duration is in milliseconds (not seconds, as in metrics)
- `p95` is NOT a valid stats function; use `quantile(0.95, duration)` instead
- Service names are case-sensitive (e.g., `"Kratos-Prod"`, not `"kratos"`)
#### Example Trace Queries
```
# P95 latency for a service
{service="Kratos-Prod", span_kind="server"} | stats quantile(0.95, duration) as p95_ms

# Error count by endpoint
{service="Kratos-Prod", status_code="ERROR"} | stats count() as errors by (http_route)

# Slowest spans
{service="Kratos-Prod"} | sort by (duration) desc | limit 20
```

## Example Natural Language Queries
### Logs

```
"Show me logs from webhook-lambda-prod"
"Find all logs containing 'timeout' in the last hour"
"Count errors per Lambda function in the last 24h"
```

### Metrics
```
"What's the P95 latency for the Kratos-Prod service?"
"Show me the error rate for all services"
"List all available services in CubeAPM"
```

### Traces
```
"Find the P95 latency for Kratos-Prod using trace stats"
"Show me traces with errors in the production environment"
"Get the full waterfall for trace ID abc123"
```

## Development
```bash
# Clone the repository
git clone https://github.com/TechnicalRhino/cubeapm-mcp.git
cd cubeapm-mcp

# Install dependencies
npm install

# Run in development mode (with hot reload)
npm run dev

# Build for production
npm run build

# Test the build
npm start
```

## How It Works
```
┌─────────────────┐   MCP Protocol    ┌─────────────────┐     HTTP API     ┌─────────────────┐
│   Claude / AI   │◄─────────────────►│   cubeapm-mcp   │◄────────────────►│     CubeAPM     │
│    Assistant    │ (stdio transport) │    MCP Server   │   (REST calls)   │      Server     │
└─────────────────┘                   └─────────────────┘                  └─────────────────┘
```

The MCP server:

1. Receives tool calls from the AI assistant via stdio
2. Translates them into CubeAPM HTTP API requests
3. Returns formatted results to the assistant
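The translation step can be sketched as a mapping from tool name to HTTP request URL. The endpoint paths below are assumptions for illustration only (Prometheus-style paths; CubeAPM's real API routes may differ), and no request is actually sent:

```typescript
// Sketch of translating an MCP tool call into a CubeAPM HTTP request URL.
// The /api/v1/query and /api/v1/query_range paths are ASSUMED,
// Prometheus-style endpoints used purely for illustration; consult the
// CubeAPM API documentation for the real routes.
function toQueryUrl(
  baseUrl: string,
  toolName: string,
  args: Record<string, string>
): string {
  // Hypothetical mapping from tool name to endpoint path.
  const paths: Record<string, string> = {
    query_metrics_instant: "/api/v1/query",
    query_metrics_range: "/api/v1/query_range",
  };
  const path = paths[toolName];
  if (!path) throw new Error(`unknown tool: ${toolName}`);
  const params = new URLSearchParams(args); // Node 18+ global
  return `${baseUrl}${path}?${params.toString()}`;
}

console.log(toQueryUrl("http://localhost:3140", "query_metrics_instant", { query: "up" }));
// http://localhost:3140/api/v1/query?query=up
```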
## Requirements
- Node.js 18+
- CubeAPM instance (self-hosted or cloud)
- Claude Code or any MCP-compatible client
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
MIT License - see LICENSE file for details.
