@dici1435/observability-mcp

v1.0.12

MCP server for querying logs and traces from Loki/Tempo observability stack

MCP (Model Context Protocol) server that exposes observability tools to Cursor for AI-assisted debugging. Query logs, traces, error codes, service health, and error rates directly in your IDE.

Installation

One-Click Install (Recommended)

Click the button below to install directly to Cursor:

Install to Cursor

Manual Installation

Add to your ~/.cursor/mcp.json:

{
  "mcpServers": {
    "fp-observability": {
      "command": "npx",
      "args": ["-y", "@dici1435/observability-mcp"],
      "env": {
        "LOKI_URL": "http://localhost:3100",
        "TEMPO_URL": "http://localhost:3200",
        "API_GATEWAY_URL": "http://localhost:3000"
      }
    }
  }
}

Then restart Cursor.

Tools

get_trace

Retrieve a distributed trace by traceId from Tempo. Now includes error registry metadata inline.

"Get trace abc123def456"
"Show me what happened in trace 5f8d3a..."

Features:

  • Supports partial trace IDs (minimum 8 characters)
  • Automatically resolves short IDs from console logs
  • Shows error code metadata from the live registry (category, severity, retryable)
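The partial-ID behavior can be pictured as a prefix match over trace IDs that are already known. The following is a minimal sketch, not the package's actual resolution logic: `resolveTraceId` and the `candidates` list (for example, IDs returned by a recent search) are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper; not part of the package's published API.
const MIN_PREFIX_LENGTH = 8; // minimum partial ID length stated above
const FULL_TRACE_ID_LENGTH = 32; // full trace IDs are 32 characters

type Resolution =
  | { kind: "resolved"; traceId: string }
  | { kind: "ambiguous"; matches: string[] }
  | { kind: "not_found" }
  | { kind: "too_short" };

function resolveTraceId(input: string, candidates: string[]): Resolution {
  const id = input.trim().toLowerCase();
  // A full-length ID needs no lookup.
  if (id.length === FULL_TRACE_ID_LENGTH) {
    return { kind: "resolved", traceId: id };
  }
  if (id.length < MIN_PREFIX_LENGTH) {
    return { kind: "too_short" };
  }
  // Keep unique candidates that start with the partial ID.
  const matches = candidates
    .map((c) => c.toLowerCase())
    .filter((c, i, all) => all.indexOf(c) === i)
    .filter((c) => c.startsWith(id));
  if (matches.length === 1) return { kind: "resolved", traceId: matches[0] };
  if (matches.length === 0) return { kind: "not_found" };
  return { kind: "ambiguous", matches };
}
```

Returning an explicit `ambiguous` result, rather than picking the first match, lets the caller ask for more characters when two traces share a prefix.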

get_logs

Query logs from Loki with flexible filtering. Enhanced error/warn display with promoted error attributes.

"Show error logs from api-gateway"
"Get logs for trace abc123"
"Find logs mentioning 'timeout' in the last 30 minutes"

Parameters: service, level, traceId, spanId, flowId, correlationId, search, since, limit
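To make the filtering concrete, here is a rough sketch of how a few of these parameters could translate into a LogQL query. It assumes logs are JSON-formatted, that the stream label is `service_name`, and that `level` and `traceId` are JSON fields; none of this is confirmed by the package, and `buildLogQL` is a hypothetical function. The flow ID field name is passed in because it is configurable via LOKI_FLOW_ID_ATTR.

```typescript
// Hypothetical query builder; label and field names are assumptions.
interface LogQuery {
  service?: string;
  level?: string;
  traceId?: string;
  flowId?: string;
  search?: string;
}

// Wrap a value in double quotes, escaping backslashes and quotes for LogQL.
function quote(value: string): string {
  return `"${value.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
}

function buildLogQL(q: LogQuery, flowIdAttr = "flowId"): string {
  // Loki needs at least one matcher that cannot match the empty string.
  const selector = q.service
    ? `{service_name=${quote(q.service)}}`
    : `{service_name=~".+"}`;

  const stages: string[] = [];
  // Line filter: keep only lines containing the search text.
  if (q.search) stages.push(`|= ${quote(q.search)}`);

  // Field filters apply after the JSON parser stage.
  const fields: string[] = [];
  if (q.level) fields.push(`level=${quote(q.level)}`);
  if (q.traceId) fields.push(`traceId=${quote(q.traceId)}`);
  if (q.flowId) fields.push(`${flowIdAttr}=${quote(q.flowId)}`);
  if (fields.length > 0) {
    stages.push("| json");
    for (const f of fields) stages.push(`| ${f}`);
  }
  return [selector].concat(stages).join(" ");
}
```

For example, `buildLogQL({ service: "api-gateway", level: "error" })` yields `{service_name="api-gateway"} | json | level="error"`.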

analyze_request

Comprehensive analysis combining trace and log data. Now includes codeRef extraction and smart registry-based recommendations.

"Analyze what happened to request with trace abc123"
"Debug the failed request xyz789"

Smart recommendations based on error registry:

  • Retryable errors prompt retry verification
  • External errors point to third-party health
  • Critical errors flag that an on-call page is expected
  • Runbook links included when available
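The rules above amount to a small mapping from registry metadata to advice. A minimal sketch, assuming a registry entry exposes `category`, `severity`, and `retryable` (as listed under get_trace) plus an optional runbook link; the `ErrorMeta` shape, the `runbookUrl` field name, and the wording of each tip are hypothetical.

```typescript
// Hypothetical shape for one error registry entry; field names are assumptions.
interface ErrorMeta {
  code: string;
  category: "user" | "system" | "external";
  severity: "critical" | "high" | "medium" | "low";
  retryable: boolean;
  runbookUrl?: string;
}

function recommend(meta: ErrorMeta): string[] {
  const tips: string[] = [];
  if (meta.retryable) {
    tips.push(`${meta.code} is retryable: verify that a retry was attempted.`);
  }
  if (meta.category === "external") {
    tips.push("External error: check the health of the third-party dependency.");
  }
  if (meta.severity === "critical") {
    tips.push("Critical severity: an on-call page is expected for this error.");
  }
  if (meta.runbookUrl) {
    tips.push(`Runbook: ${meta.runbookUrl}`);
  }
  return tips;
}
```

Each rule is independent, so one error can produce several recommendations, and an entry that matches no rule produces none.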

run_tests

Execute tests and capture trace IDs from the output.

"Run unit tests for packages/testing"
"Run e2e:api tests"

Test Types: unit (default), e2e:api, e2e:browser, traced
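Capturing trace IDs from test output can be as simple as scanning for 32-character hexadecimal tokens. This sketch assumes the test runner prints full trace IDs somewhere in its output; the actual output format and the `extractTraceIds` name are assumptions, not taken from the package.

```typescript
// Hypothetical helper: pull unique 32-hex-character trace IDs out of text.
function extractTraceIds(output: string): string[] {
  // Word boundaries keep longer hex strings (e.g. 40-char commit hashes) from matching.
  const pattern = /\b[0-9a-f]{32}\b/gi;
  const found: string[] = [];
  const matches = output.match(pattern) ?? [];
  for (const m of matches) {
    const id = m.toLowerCase();
    // Preserve first-seen order while dropping duplicates.
    if (found.indexOf(id) === -1) found.push(id);
  }
  return found;
}
```

The extracted IDs can then be passed to get_trace or analyze_request.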

search_traces (NEW)

Search for traces matching criteria. Returns lightweight summaries without fetching each full trace, which avoids N+1 requests.

"Find error traces from api-gateway in the last 30 minutes"
"Search for slow traces over 2 seconds"

Parameters: service, operation, minDuration, maxDuration, status (error/ok), tags, since, limit
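As an illustration of how some of these parameters could map onto Tempo's HTTP search endpoint, the sketch below builds a `/api/search` path using the `tags`, `minDuration`, `maxDuration`, `start`, `end`, and `limit` query parameters. Whether the package calls Tempo this way is an assumption, `buildTempoSearchPath` is a hypothetical name, and `operation`, `status`, and free-form `tags` are left out because their mapping is not described in this README.

```typescript
// Hypothetical request builder; not the package's actual implementation.
interface TraceSearch {
  service?: string;
  minDuration?: string; // duration string such as "2s"
  maxDuration?: string;
  sinceSeconds?: number; // look-back window, e.g. 1800 for 30 minutes
  limit?: number;
}

function buildTempoSearchPath(s: TraceSearch, nowEpochSeconds: number): string {
  const params: Array<[string, string]> = [];
  // Tempo's tag search takes logfmt-style key=value pairs.
  if (s.service) params.push(["tags", `service.name=${s.service}`]);
  if (s.minDuration) params.push(["minDuration", s.minDuration]);
  if (s.maxDuration) params.push(["maxDuration", s.maxDuration]);
  if (s.sinceSeconds !== undefined) {
    // start/end are Unix epoch seconds.
    params.push(["start", String(nowEpochSeconds - s.sinceSeconds)]);
    params.push(["end", String(nowEpochSeconds)]);
  }
  if (s.limit !== undefined) params.push(["limit", String(s.limit)]);
  const query = params
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
  return query ? `/api/search?${query}` : "/api/search";
}
```

The resulting path would be appended to TEMPO_URL. Passing the current time in as an argument keeps the function deterministic and easy to test.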

check_services (NEW)

Check health of all services: api-gateway + downstream gRPC microservices + infrastructure (Loki, Tempo, Prometheus).

"Are all services healthy?"
"Check service health"

Uses the api-gateway deep health endpoint for gRPC fan-out to identity, core-apps-routing, lenders, finance, and edge-ops.

get_error_info (NEW)

Look up FormPiper error code metadata from the live error registry.

"What does FP.LENDERS.SUBMISSION_FAILED mean?"
"Show all system-category errors"
"List all critical severity errors"

Parameters: code, codeRef, category (user/system/external), severity (critical/high/medium/low)

get_error_rate (NEW)

Query error rates per service using Loki log counts.

"What's the error rate across services?"
"Show error rates for the last 5 minutes"

Parameters: service, since (default: "5m"), threshold (default: 1%)
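The arithmetic behind a log-count error rate is straightforward: divide error lines by total lines per service and compare against the threshold. A minimal sketch, assuming the counts have already been fetched from Loki; the `LogCounts` and `ErrorRate` shapes and the `computeErrorRates` name are hypothetical.

```typescript
// Hypothetical types: per-service log line counts over the query window.
interface LogCounts {
  service: string;
  errorLines: number;
  totalLines: number;
}

interface ErrorRate {
  service: string;
  ratePercent: number;
  overThreshold: boolean;
}

// thresholdPercent defaults to 1, matching the documented default of 1%.
function computeErrorRates(counts: LogCounts[], thresholdPercent = 1): ErrorRate[] {
  return counts.map((c) => {
    // A service with no log lines is reported as 0% rather than dividing by zero.
    const ratePercent =
      c.totalLines === 0 ? 0 : (c.errorLines * 100) / c.totalLines;
    return {
      service: c.service,
      ratePercent,
      overThreshold: ratePercent > thresholdPercent,
    };
  });
}
```

For instance, 30 error lines out of 1,000 total is 3%, which exceeds the default threshold, while 5 out of 1,000 is 0.5% and does not.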

compare_traces (NEW)

Compare two traces side-by-side with configurable span matching.

"Compare trace abc123 (passing) with trace def456 (failing)"

Parameters: traceIdA, traceIdB, matchStrategy (default/strict/loose)

Matching strategies:

  • default: operation + service + parent operation with positional tiebreaker
  • strict: adds spanKind + depth for highly uniform traces
  • loose: operation + service only for simple request/response traces
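One way to read the three strategies is as progressively richer matching keys, with spans that share a key paired in order of appearance. The sketch below illustrates that idea only; the `SpanInfo` shape, `spanKey`, and `pairSpans` are hypothetical, and the package's real matcher may work differently.

```typescript
// Hypothetical span summary; field names are assumptions.
interface SpanInfo {
  operation: string;
  service: string;
  parentOperation?: string;
  spanKind?: string;
  depth: number;
}

type MatchStrategy = "default" | "strict" | "loose";

function spanKey(span: SpanInfo, strategy: MatchStrategy): string {
  const parts = [span.operation, span.service];
  if (strategy !== "loose") parts.push(span.parentOperation ?? "");
  if (strategy === "strict") parts.push(span.spanKind ?? "", String(span.depth));
  return parts.join("|");
}

type SpanPair = [SpanInfo | undefined, SpanInfo | undefined];

function pairSpans(a: SpanInfo[], b: SpanInfo[], strategy: MatchStrategy): SpanPair[] {
  // Group trace B's spans by key, preserving their original order.
  const groupsB: Record<string, SpanInfo[]> = {};
  for (const s of b) {
    const k = spanKey(s, strategy);
    (groupsB[k] = groupsB[k] || []).push(s);
  }
  const pairs: SpanPair[] = [];
  for (const s of a) {
    // Positional tiebreaker: the nth span with a key in A takes the nth in B.
    const bucket = groupsB[spanKey(s, strategy)];
    const match = bucket && bucket.length > 0 ? bucket.shift() : undefined;
    pairs.push([s, match]);
  }
  // Whatever remains in B has no counterpart in A.
  for (const k of Object.keys(groupsB)) {
    for (const s of groupsB[k]) pairs.push([undefined, s]);
  }
  return pairs;
}
```

A pair with one side `undefined` marks a span present in only one trace, which is often the most interesting difference between a passing and a failing request.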

Configuration

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| LOKI_URL | http://localhost:3100 | Loki server URL |
| TEMPO_URL | http://localhost:3200 | Tempo server URL |
| API_GATEWAY_URL | http://localhost:3000 | fp-mono api-gateway URL (error registry + deep health) |
| OBSERVABILITY_API_KEY | (none) | API key for ObservabilityGuard (optional in dev, required in prod) |
| PROMETHEUS_URL | http://localhost:9090 | Prometheus URL (for check_services health) |
| SPAN_MATCH_STRATEGY | default | Default span matching for compare_traces |
| DICI_WORKSPACE_ROOT | Current directory | Workspace root for running tests |
| LOKI_FLOW_ID_ATTR | flowId | LogQL field name for flow ID |
| LOKI_CORRELATION_ID_ATTR | correlationId | LogQL field name for correlation ID |
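The defaults above can be summarized as a small config loader. This is a sketch of how such variables are typically read, not the package's own code: the `ObservabilityConfig` shape and `loadConfig` name are hypothetical, and the environment is passed in as a plain object so the function has no dependency on Node globals.

```typescript
// Hypothetical config shape mirroring the documented environment variables.
interface ObservabilityConfig {
  lokiUrl: string;
  tempoUrl: string;
  apiGatewayUrl: string;
  prometheusUrl: string;
  apiKey?: string;
  spanMatchStrategy: "default" | "strict" | "loose";
  workspaceRoot: string;
  flowIdAttr: string;
  correlationIdAttr: string;
}

function loadConfig(
  env: Record<string, string | undefined>,
  currentDir: string,
): ObservabilityConfig {
  const strategy = env.SPAN_MATCH_STRATEGY;
  return {
    lokiUrl: env.LOKI_URL || "http://localhost:3100",
    tempoUrl: env.TEMPO_URL || "http://localhost:3200",
    apiGatewayUrl: env.API_GATEWAY_URL || "http://localhost:3000",
    prometheusUrl: env.PROMETHEUS_URL || "http://localhost:9090",
    // No default: the key is optional in dev and required in prod.
    apiKey: env.OBSERVABILITY_API_KEY || undefined,
    // Unknown values fall back to "default" (an assumption of this sketch).
    spanMatchStrategy:
      strategy === "strict" || strategy === "loose" ? strategy : "default",
    workspaceRoot: env.DICI_WORKSPACE_ROOT || currentDir,
    flowIdAttr: env.LOKI_FLOW_ID_ATTR || "flowId",
    correlationIdAttr: env.LOKI_CORRELATION_ID_ATTR || "correlationId",
  };
}
```

In a Node process this would be called as `loadConfig(process.env, process.cwd())`.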

How It Works

Cursor IDE
    │
    │ MCP protocol (stdio)
    ▼
fp-observability MCP server (9 tools)
    │
    │ HTTP calls
    ▼
┌──────────────────────────────────────┐
│  Observability Stack                 │
│  • Loki  (logs)                      │
│  • Tempo (traces)                    │
│  • Prometheus (health checks)        │
│                                      │
│  fp-mono api-gateway                 │
│  • /api/v1/error-registry (metadata) │
│  • /api/v1/health/deep (gRPC fan-out)│
│    → identity (:50051)               │
│    → core-apps-routing (:50052)      │
│    → lenders (:50053)                │
│    → finance (:50054)                │
│    → edge-ops (:50055)               │
└──────────────────────────────────────┘

Development

Using Local Build

{
  "mcpServers": {
    "fp-observability": {
      "command": "node",
      "args": ["/path/to/dici-new/packages/mcp-observability/dist/index.js"],
      "env": {
        "LOKI_URL": "http://localhost:3100",
        "TEMPO_URL": "http://localhost:3200",
        "API_GATEWAY_URL": "http://localhost:3000"
      }
    }
  }
}

Workflow

  1. Make changes to source files in src/
  2. Rebuild: pnpm build
  3. Reload MCP in Cursor: Cmd+Shift+P → "Developer: Reload Window"

Troubleshooting

"Cannot connect to Loki/Tempo"

  1. Verify your observability stack is running
  2. Check that the configured URLs are correct
  3. Use the check_services tool to diagnose all services at once

"Error registry unavailable"

  1. Ensure fp-mono api-gateway is running
  2. Check that API_GATEWAY_URL is correct
  3. If in production, set the OBSERVABILITY_API_KEY environment variable

Partial trace ID not resolving

  • Use the full 32-character trace ID
  • Or increase the search window with since: "7d"

Known Limitations

  • Error rates are approximate. get_error_rate counts log lines in Loki as a proxy for error rates; it does not measure requests. Until the OTel Collector is configured to export metrics to Prometheus, no request-level error rate data is available.
  • Dev-environment only. The entire stack depends on Docker Compose being up. This is IDE-integrated debugging for local development, not production observability.
  • Only as good as the telemetry. If a service has poor span coverage or doesn't propagate trace context correctly, the trace data will have gaps. The MCP tools can't fix bad instrumentation -- they surface what the services emit.
  • compare_traces matching is inherently fuzzy. Span matching across two different traces relies on heuristics (operation name, service, parent). Structural differences from conditional code paths, retries, or fan-out variations can make comparisons noisy. The three matching strategies (default, strict, loose) help, but aren't perfect.
  • No real-time streaming. All tools are request/response. There is no live tail of logs or traces -- each query is a point-in-time snapshot.

Roadmap

This MCP server currently runs locally against local Loki/Tempo/Prometheus and fp-mono api-gateway. The goal is to deploy it as a production MCP server for live agent-assisted debugging.

Production Deployment

  • [ ] Add authentication layer (API key or OAuth) for production Loki/Tempo/Prometheus access
  • [ ] Set OBSERVABILITY_API_KEY in production for error registry + deep health access
  • [ ] Add TLS support for all client connections
  • [ ] Deploy as a standalone service (Docker container or serverless function)
  • [ ] Add rate limiting to prevent runaway agent queries against the production observability stack
  • [ ] Add read-only query guards (prevent agents from running expensive unbounded queries)
  • [ ] Support remote MCP transport (SSE or HTTP) instead of stdio for production use
  • [ ] Add multi-environment support (staging vs production) via environment selector

Tool Enhancements

  • [ ] Configure OTel Collector to export metrics to Prometheus for real request-level error rates (replacing Loki log-count approximation)
  • [ ] Add TraceQL support to search_traces for advanced trace querying
  • [ ] Add Grafana dashboard links in get_trace and get_error_rate output
  • [ ] Correlate Temporal workflow executions with their traces and logs in a single view
  • [ ] Proactive anomaly surfacing -- detect elevated error rates or degraded services on session start instead of waiting for the user to ask
  • [ ] Implement gRPC Health Checking Protocol (grpc.health.v1) on all microservices for cleaner deep health checks

Integrations

  • [ ] Add alerting integration (query PagerDuty/OpsGenie for active incidents alongside trace data)
  • [ ] Link to Temporal UI for workflow-level debugging when traces span workflow activities

License

MIT