openclaw-grafana-lens
v0.5.0
Published
OpenClaw plugin that gives AI agents full Grafana access — 18 composable tools for PromQL/LogQL/TraceQL queries, dashboard creation, alerting, SRE investigation, security monitoring, data collection pipeline management via Grafana Alloy (29 recipes), and
Maintainers
Keywords
Readme
Grafana Lens
Agent-driven Grafana observability for OpenClaw — query, visualize, alert, trace, and share across 15+ messaging channels.
Note: This is a community-built OpenClaw plugin, not an official Grafana Labs product. Grafana, Loki, Tempo, and Prometheus are trademarks of Grafana Labs.
OpenClaw is an open-source AI agent platform. Grafana Lens extends it with full Grafana integration — 18 composable tools that let your agent query metrics and logs, trace distributed requests, create dashboards, set up alerts, render charts, run security audits, investigate incidents, push custom data, and manage data collection pipelines — all through natural language conversation.
Why Grafana Lens?
| Pain Point | How Grafana Lens Helps |
|---|---|
| "Where did my budget go?" | Cost dashboards with model-level attribution, token tracking, and cost anomaly alerts |
| "Is my agent stuck in a loop?" | Real-time tool loop detection, stuck session monitoring, and SRE operations dashboard |
| "Am I being prompt-injected?" | 12-pattern prompt injection detection with security dashboard and threat-level reporting |
| "I need observability but don't want another SaaS" | Fully self-hosted, open source, OTLP-native — runs on a free local Grafana stack |
| "I can't debug multi-step agent sessions" | Hierarchical traces: session → LLM call → tool execution, with log-to-trace correlation |
| "My alert fired — now what?" | grafana_investigate gathers metrics, logs, traces in parallel and generates hypotheses with specific tool+params for follow-up |
| "I want to track my own data in Grafana" | Push any custom metrics (fitness, calendar, git, finance) from conversation |
| "How do I get my data INTO Grafana?" | alloy_pipeline sets up data collection from databases, Docker, Kubernetes, log files, and more — 29 recipes, just describe what you want to monitor |
Key Features
- 18 Composable Agent Tools — Query PromQL/LogQL/TraceQL, create dashboards, set alerts, share panel images, run security checks, investigate incidents, push custom metrics, manage data collection pipelines, and more
- SRE Investigation — Multi-signal triage (
grafana_investigate), anomaly scoring with z-score against 7-day baselines, seasonality comparison, and alert fatigue detection - Full OTLP Observability — Metrics → Prometheus, Logs → Loki, Traces → Tempo. Push-based with no scraping — data is available immediately
- Security Monitoring — 6-check threat assessment covering prompt injection, cost anomalies, tool loops, session enumeration, webhook errors, and stuck sessions
- 12 Pre-Built Dashboard Templates — From LLM Command Center and Cost Intelligence to Security Overview and SRE Operations
- Custom Data Observatory — Push any external data (calendar events, git commits, fitness stats, financial metrics) into Grafana via conversation
- Works with ANY Datasource — Not limited to OpenClaw metrics. Query any Prometheus or Loki datasource configured in your Grafana instance
- Data Collection Pipeline Management — 29 pre-built Alloy pipeline recipes across 5 categories (metrics, logs, traces, infrastructure, profiling) — describe what you want to monitor and the agent handles the Alloy configuration
Quick Start
# 1. Start the LGTM observability stack (Grafana + Prometheus + Loki + Tempo + OTel Collector)
# See https://github.com/grafana/docker-otel-lgtm for more options
docker pull grafana/otel-lgtm:latest
docker run -d --name lgtm -p 3000:3000 -p 4317:4317 -p 4318:4318 -p 9090:9090 grafana/otel-lgtm:latest
# 2. Install the plugin
openclaw plugins install openclaw-grafana-lens
# 3. Configure credentials (see "Configuration" section below for full options)
export GRAFANA_URL=http://localhost:3000
export GRAFANA_SERVICE_ACCOUNT_TOKEN=glsa_xxxxxxxxxxxx
# 4. Restart the gateway to load the plugin
openclaw gateway restart
# Optional: Enable Alloy pipeline management (for data collection)
# See "Alloy Pipeline Management" section below for full setup
export ALLOY_URL=http://localhost:12345
export ALLOY_CONFIG_DIR=/path/to/alloy/config.dTable of Contents
- Prerequisites
- Setup Guide
- What Can You Do?
- Agent Tools
- Alloy Pipeline Management
- Dashboard Templates
- Security Monitoring
- Custom Metrics
- Architecture
- Observability Deep Dive
- Configuration Reference
- Development
- License
Prerequisites
- OpenClaw installed and running
- Docker (for the recommended LGTM stack) or an existing Grafana instance
- Grafana is an open-source platform for monitoring, visualization, and alerting. It connects to data sources like Prometheus (metrics), Loki (logs), and Tempo (traces) to give you dashboards, alerts, and exploration tools for any data.
- Grafana Service Account Token with Editor role (see setup steps below)
Setup Guide
1. LGTM Stack (Docker)
The recommended local setup uses the grafana/otel-lgtm all-in-one Docker image, maintained by Grafana Labs. It bundles everything you need in a single container — no configuration files, no compose setup:
| Component | Port | Purpose | |-----------|------|---------| | Grafana | 3000 | Dashboards, alerting, visualization | | Prometheus | 9090 | Metrics storage and querying | | Loki | — | Log aggregation | | Tempo | — | Distributed tracing | | OTel Collector | 4317 (gRPC), 4318 (HTTP) | Receives OTLP metrics, logs, and traces |
docker pull grafana/otel-lgtm:latest
docker run -d --name lgtm -p 3000:3000 -p 4317:4317 -p 4318:4318 -p 9090:9090 grafana/otel-lgtm:latestTip: You can also use the
run-lgtm.shscripts from the official repo for additional options like volume persistence and OBI auto-instrumentation.
Default credentials: admin / admin
Create a Service Account Token
- Open Grafana at
http://localhost:3000 - Go to Administration → Service Accounts → Add service account
- Name it (e.g.,
grafana-lens), set role to Editor - Click Add service account token → Generate token
- Copy the token (starts with
glsa_) — you'll need it for configuration
2. Plugin Installation
# Install from npm
openclaw plugins install openclaw-grafana-lens
# Restart the gateway to load the plugin
openclaw gateway restartFor local development:
# Clone and link locally
git clone <repo-url> ~/workspace/grafana-lens
cd ~/workspace/grafana-lens && npm install
openclaw plugins install -l ~/workspace/grafana-lens
openclaw gateway restart2b. Grafana Alloy (Optional -- Data Collection Pipelines)
Grafana Alloy is Grafana Labs' open-source telemetry collector. It collects metrics, logs, traces, and profiles from your infrastructure and applications, then forwards them to backends like Prometheus, Loki, Tempo, and Pyroscope. Think of it as the "data pipeline" -- it gets your data INTO Grafana.
Grafana Lens manages Alloy pipelines through the alloy_pipeline tool. Once Alloy is running, the agent can create, update, delete, and diagnose pipelines via conversation -- just describe what you want to monitor.
Option A: Docker Compose (Full LGTM + Alloy Stack)
The alloy-scenarios repository includes a grafana-lens-test environment with a complete LGTM + Alloy stack:
cd alloy-scenarios/grafana-lens-test
docker compose --env-file ../image-versions.env up -d
# Verify
curl -sf http://localhost:3000/api/health && echo "Grafana OK"
curl -sf http://localhost:12345/-/ready && echo "Alloy OK"
curl -sf http://localhost:9090/-/ready && echo "Prometheus OK"| Service | Port | Purpose | |---------|------|---------| | Alloy | 12345 | Telemetry collector (HTTP API + live debugging UI) | | Grafana | 3000 | Dashboards and visualization | | Prometheus | 9090 | Metrics storage | | Loki | -- | Log aggregation | | Tempo | -- | Distributed tracing |
Option B: Add Alloy to an Existing Setup
Install Alloy standalone (installation guide):
# macOS
brew install grafana/grafana/alloy
# Linux (Debian/Ubuntu)
sudo apt install alloy
# Run in directory mode (loads all .alloy files from the directory)
mkdir -p /path/to/alloy/config.d
alloy run /path/to/alloy/config.d/Alloy runs its HTTP API on port 12345 by default. Grafana Lens communicates with this API to reload configuration after creating or updating pipelines. No restart needed -- Alloy picks up new config files via a hot-reload API call.
Key insight:
alloy.urlin plugin config useslocalhost:12345(your machine → Alloy), butalloy.lgtm.*URLs use Docker service names (e.g.,http://prometheus:9090) because they are embedded in generated.alloyconfigs that run inside Docker. If you're running everything on bare metal, all URLs uselocalhost.
3. Configuration
Minimal Setup (Environment Variables)
The simplest way to configure Grafana Lens — just two environment variables:
export GRAFANA_URL=http://localhost:3000
export GRAFANA_SERVICE_ACCOUNT_TOKEN=glsa_xxxxxxxxxxxxFull Configuration
For more control, add the plugin config to ~/.openclaw/openclaw.json:
{
"plugins": {
"entries": {
"openclaw-grafana-lens": {
"enabled": true,
"config": {
"grafana": {
"url": "http://localhost:3000", // or set GRAFANA_URL env var
"apiKey": "glsa_xxxxxxxxxxxx", // or set GRAFANA_SERVICE_ACCOUNT_TOKEN env var
"orgId": 1 // optional, default 1
},
"metrics": {
"enabled": true // enable OTLP telemetry (default: true)
},
"otlp": {
"endpoint": "http://localhost:4318/v1/metrics", // OTLP collector endpoint
"exportIntervalMs": 15000, // push interval in ms (default: 15000)
"logs": true, // push logs to Loki (default: true)
"traces": true, // push traces to Tempo (default: true)
"captureContent": true, // include prompts/completions in telemetry (default: true)
"contentMaxLength": 2000, // truncation limit for content fields (default: 2000)
"forwardAppLogs": true, // forward OpenClaw app logs to Loki (default: true)
"appLogMinSeverity": "debug", // min severity: trace/debug/info/warn/error/fatal
"redactSecrets": true // auto-strip API keys/tokens before export (default: true)
},
"proactive": {
"enabled": false, // enable Grafana alert webhook handler
"webhookPath": "/grafana-lens/alerts", // HTTP path for webhook endpoint
"costAlertThreshold": 5.0 // daily cost alert threshold in USD
},
"customMetrics": {
"enabled": true, // allow custom metric push (default: true)
"maxMetrics": 100, // max metric definitions
"maxLabelsPerMetric": 5, // max label keys per metric
"maxLabelValues": 50, // max unique label combos per metric
"defaultTtlDays": null // optional auto-expiry in days
},
"alloy": {
"enabled": true, // enable Alloy pipeline management (default: false)
"url": "http://localhost:12345", // Alloy HTTP API (or set ALLOY_URL env var)
"configDir": "/path/to/alloy/config.d", // where pipeline configs are written (or set ALLOY_CONFIG_DIR)
"filePrefix": "lens-", // prefix for managed config files (default: "lens-")
"maxPipelines": 20, // max managed pipelines (default: 20)
"lgtm": { // export target URLs embedded in generated configs
"prometheusRemoteWriteUrl": "http://prometheus:9090/api/v1/write",
"lokiUrl": "http://loki:3100/loki/api/v1/push",
"otlpEndpoint": "http://tempo:4318",
"pyroscopeUrl": "http://localhost:4040" // optional, for profiling recipes
}
}
}
}
}
}
}Environment Variable Fallbacks
| Env Var | Config Key | Description |
|---------|------------|-------------|
| GRAFANA_URL | grafana.url | Grafana instance URL |
| GRAFANA_SERVICE_ACCOUNT_TOKEN | grafana.apiKey | Service account token |
| OTEL_EXPORTER_OTLP_ENDPOINT | otlp.endpoint | OTLP collector base URL (auto-appends /v1/metrics) |
| OTEL_EXPORTER_OTLP_HEADERS | otlp.headers | Custom headers in key=value,key2=value2 format |
| ALLOY_URL | alloy.url | Alloy HTTP API URL (default: http://localhost:12345) |
| ALLOY_CONFIG_DIR | alloy.configDir | Directory where pipeline config files are written |
Config precedence: explicit plugin config > environment variables > defaults.
What Can You Do?
For OpenClaw Users — Bring Grafana Into Your Agent's Toolkit
Just talk to your agent in natural language:
- "Create a cost dashboard" — Creates a pre-built cost intelligence dashboard with model attribution, cache savings, and spending trends
- "Alert me if daily spend exceeds $5" — Sets up a Grafana-native alert rule with PromQL condition
- "Show me a chart of my token usage" — Renders a panel as a PNG image and delivers it inline to your chat
- "Am I being attacked?" — Runs 6 parallel security checks and reports a threat level (green/yellow/red)
- "Track my daily steps" — Pushes custom fitness data into Grafana via OTLP for personal dashboards
- "What metrics are available?" — Discovers all metrics in your Prometheus datasource with descriptions
- "What does openclaw_lens_daily_cost_usd mean? Why did it spike?" — Explains the metric with current value, trend, stats, and drill-down suggestions
- "Find slow traces in the last hour" — Searches Tempo traces by duration, status, or span attributes using TraceQL
- "Show me the trace for session abc123" — Retrieves full distributed trace with hierarchical span details
- "Investigate this alert" — Gathers metrics, logs, traces, and annotations in parallel and suggests hypotheses with specific follow-up tool+params
- "Is this metric anomalous?" — Shows z-score against 7-day baseline, seasonality comparison (vs 1 day ago, vs 7 days ago), and severity rating
- "Which alerts are noisy?" — Detects always-firing, flapping, and error/nodata rules with optimization suggestions
- "Monitor my Postgres database" — Creates an Alloy pipeline using the postgres-exporter recipe with credential handling and sample queries
- "Collect Docker container logs" — Sets up a docker-logs pipeline that forwards container logs to Loki
- "Set up OTLP trace collection" — Deploys an otlp-receiver pipeline to accept traces from your applications
- "What pipelines are running?" — Lists all managed Alloy pipelines with status and signal type
- Create and customize your own unique dashboards — Combine any data from Prometheus, Loki, or custom metrics to build personalized monitoring views
For Grafana Power Users — Let an AI Agent Manage Your Grafana
- Natural language PromQL — Ask questions about your metrics without memorizing syntax
- Automated alert creation — Describe alert conditions in plain English; the agent generates the PromQL and creates the rule
- Dashboard management — Create, update, add/remove panels, or delete dashboards via conversation
- Panel sharing — Render any dashboard panel as an image and deliver it to Telegram, Slack, Discord, and 15+ other channels
- Infrastructure monitoring — Use pre-built templates for node-exporter and HTTP service monitoring
- Metric exploration — Discover, explain, and drill into any metric with
grafana_explain_metric - Dashboard auditing — Health-check all panels in a dashboard, find broken queries, verify datasource connectivity
- Search and discovery — Find existing dashboards by title, tags, or starred status
Agent Tools
All 18 tools are registered automatically when the plugin loads. The agent decides when to use each tool based on your request.
| Tool | Description | Example Use |
|------|-------------|-------------|
| grafana_explore_datasources | Discover datasources configured in Grafana | "What data sources do I have?" |
| grafana_list_metrics | Discover available metrics with optional metadata | "What metrics are available?" |
| grafana_query | Run PromQL instant or range queries | "What's my token usage today?" |
| grafana_query_logs | Run LogQL queries against Loki | "Show me error logs from the last hour" |
| grafana_create_dashboard | Create dashboards from 12 templates or custom JSON | "Create a cost dashboard" |
| grafana_update_dashboard | Add, remove, or update panels; change dashboard metadata | "Add a latency panel to my dashboard" |
| grafana_get_dashboard | Get compact dashboard summary with optional health audit | "What panels are on my overview?" |
| grafana_search | Search dashboards by title, tags, or starred status | "Do I have a cost dashboard?" |
| grafana_share_dashboard | Render panels as PNG images for inline delivery | "Show me a chart of my costs" |
| grafana_create_alert | Create Grafana-native alert rules with PromQL conditions | "Alert me if daily cost > $5" |
| grafana_check_alerts | List, acknowledge, silence alerts; manage rules and webhooks | "Are there any alerts firing?" |
| grafana_annotate | Create or query event annotations on dashboards | "Mark that deployment on my dashboard" |
| grafana_push_metrics | Push custom data (calendar, git, fitness, finance) via OTLP | "Track my daily steps in Grafana" |
| grafana_explain_metric | Get metric context: current value, trend, stats, metadata | "Why did my bill spike?" |
| grafana_security_check | Run 6 parallel security checks → threat level report | "Am I being attacked?" |
| grafana_query_traces | Run TraceQL queries against Tempo; search traces or get full trace by ID | "Find slow traces" / "Show trace for session X" |
| grafana_investigate | Multi-signal investigation triage with hypothesis generation | "Investigate this alert" / "What's wrong?" / "Root cause" |
| alloy_pipeline | Create and manage Alloy data collection pipelines (29 recipes, 7 actions) | "Monitor my Postgres DB" / "Collect Docker logs" / "Pipeline status" |
Alloy Pipeline Management
The alloy_pipeline tool manages Grafana Alloy data collection pipelines. Alloy is Grafana Labs' open-source telemetry collector -- it gets your data INTO Grafana by collecting metrics, logs, traces, and profiles from your infrastructure and forwarding them to the LGTM stack.
How It Works
- Describe what you want to monitor -- "monitor my Postgres database", "collect Docker logs", "set up OTLP trace ingestion"
- The agent selects a recipe -- 29 pre-built templates handle common scenarios with validation, credential management, and sample queries
- Config is generated and hot-reloaded -- An Alloy
.alloyconfig file is written and Alloy picks it up via API reload (no restart needed) - Data flows automatically -- The agent verifies data flow and provides sample queries and suggested next steps (dashboards, alerts)
For custom patterns not covered by recipes, the agent can write raw Alloy River config directly.
7 Actions
| Action | What It Does |
|--------|-------------|
| create | Deploy a new pipeline from a recipe or raw config |
| list | Show all managed pipelines with status |
| update | Change pipeline parameters or replace raw config |
| delete | Remove a pipeline and its config file |
| recipes | Browse the recipe catalog by category |
| status | Check component health and data flow for a pipeline |
| diagnose | Full system check: Alloy connectivity, all pipeline health, config drift, orphan files |
Recipe Categories (29 Recipes)
| Category | Count | Examples | |----------|-------|---------| | Metrics | 11 | scrape-endpoint, node-exporter, postgres-exporter, mysql-exporter, redis-exporter, kubernetes-pods | | Logs | 10 | docker-logs, file-logs, syslog, kubernetes-logs, kafka-logs, faro-frontend | | Traces | 4 | otlp-receiver, application-traces, span-metrics, service-graph | | Infrastructure | 3 | docker-metrics, elasticsearch-exporter, kafka-exporter | | Profiling | 1 | continuous-profiling (Pyroscope) |
Use alloy_pipeline with action recipes to browse the full catalog with required parameters and descriptions.
Credential Handling
Recipes that require credentials (database passwords, API keys) use Alloy's sys.env() function -- secrets are referenced as environment variables, never written to config files. The tool response includes envVarsRequired listing which variables must be set where Alloy runs.
Important: Alloy config reload is atomic. If a credential recipe's env vars aren't set, the reload failure blocks ALL managed pipelines until the env vars are set or the pipeline is deleted. The agent will warn you and ask if you have the credentials ready before creating such pipelines.
Workflow Integration
After creating a pipeline, the tool provides suggestedWorkflow with concrete next-step tool calls:
alloy_pipelineactionstatus-- verify data flow (data takes ~15-20s to appear)grafana_list_metricsorgrafana_query_logs-- discover collected datagrafana_create_dashboard-- visualize the datagrafana_create_alert-- set up monitoring
Tip: For real-world Alloy scenario examples (Postgres monitoring, Docker logs, Kubernetes, Kafka, OTLP tracing, and more), see the alloy-scenarios repository.
Dashboard Templates
Grafana Lens includes 12 pre-built dashboard templates. Create any of them with:
"Create a <template-name> dashboard"
AI Observability (Tier 1-3 Drill-Down Hierarchy)
| Template | Purpose | Key Panels |
|----------|---------|------------|
| llm-command-center | System overview (Tier 1) | Golden signals, live sessions, cost, cache efficiency, token rates |
| session-explorer | Session debugging (Tier 2) | Per-session traces, LLM calls, tool calls, conversation flow |
| cost-intelligence | Cost deep-dive (Tier 3a) | Spending trends, model attribution, cache savings, per-session cost |
| tool-performance | Tool analytics (Tier 3b) | Tool leaderboard, latency ranking, error rates, execution traces |
| sre-operations | SRE operations (Tier 3c) | Queue health, webhook errors, stuck sessions, tool loops, context pressure |
| genai-observability | Industry-standard gen_ai | OpenTelemetry gen_ai metrics: token analytics, LLM performance, traces |
| security-overview | Security monitoring | Webhook errors, injection signals, session anomalies, cost spikes |
Infrastructure & Generic
| Template | Purpose | Key Panels |
|----------|---------|------------|
| node-exporter | System health | CPU, memory, disk, network (Prometheus node-exporter metrics) |
| http-service | Web service monitoring | HTTP request rate, error rate, latency (RED signals) |
| metric-explorer | Single metric deep-dive | Interactive exploration of any metric from a dropdown |
| multi-kpi | KPI overview | 4-metric side-by-side comparison |
| weekly-review | Weekly trends | 2-metric trends with external data table |
All AI templates use Grafana template variables for dropdown selectors ($prometheus, $loki, $tempo, $model, $provider, etc.) and stable UIDs for cross-dashboard drill-down navigation.
Security Monitoring
Prompt injection is the #1 risk in OWASP's Top 10 for LLM Applications. Grafana Lens provides detection-only security monitoring — it observes and reports, but never blocks or terminates operations.
Security Check (grafana_security_check)
Runs 6 PromQL queries in parallel using Promise.allSettled (a single failing check won't break the others):
| Check | What It Detects | Green | Yellow | Red |
|-------|----------------|-------|--------|-----|
| webhook_error_ratio | Failing webhook deliveries | < 20% | 20–50% | ≥ 50% |
| cost_anomaly | Unusual daily spending | < $10 | $10–$50 | ≥ $50 |
| tool_loops | Agent stuck in repetitive tool calls | 0 | 1–2 | ≥ 3 |
| injection_signals | Prompt injection patterns detected | 0 | 1–4 | ≥ 5 |
| session_enumeration | Unusual session volume (brute-force) | < 50/hr | 50–200/hr | ≥ 200/hr |
| stuck_sessions | Sessions not progressing | 0 | 1–2 | ≥ 3 |
Returns an overall threat level (green, yellow, or red) with suggested investigation actions for each finding.
Honest limitations: Authentication failures are invisible to this tool — OpenClaw's auth middleware emits no diagnostic events for failed auth attempts. Monitor gateway access logs separately.
Security Dashboard
Use the security-overview template for a persistent visual dashboard with 15 panels covering all security signals.
What's Being Monitored
Grafana Lens automatically collects security-relevant metrics from OpenClaw's lifecycle:
- Prompt injection detection — 12 regex patterns scanning LLM inputs (configurable via
otlp.captureContent) - Tool error classification — Categorizes tool failures as network, filesystem, timeout, or other
- Session anomalies — Unique session sliding window (1-hour) detects enumeration attempts
- Gateway restarts — Tracks infrastructure availability
- Session resets — Monitors forced context wipes
SRE Investigation
Grafana Lens includes purpose-built investigation capabilities for diagnosing alerts, anomalies, and outages.
Multi-Signal Triage (grafana_investigate)
Use as the first step for any "what's wrong?" question. The tool:
- Auto-discovers Prometheus, Loki, and Tempo datasources (graceful degradation if Loki/Tempo unavailable)
- Gathers signals in parallel using
Promise.allSettled:- Metrics — Focus metric current value + trend, anomaly z-score, RED signals (rate, error rate, p95 latency)
- Logs — Volume, severity breakdown, top error patterns, sample errors
- Traces — Error traces, slow traces (>10s)
- Context — Recent annotations, active alerts
- Generates hypotheses — Each includes an
evidencesummary,confidencelevel, and atestWithfield with the exact tool name and parameters for follow-up
Anomaly Scoring (grafana_explain_metric)
For 24-hour period queries on plain metric names, grafana_explain_metric now includes:
- Anomaly z-score — Current value compared against the 7-day baseline (avg ± stddev). Severity levels:
normal(<1.5σ),mild(1.5–2σ),significant(2–3σ),critical(>3σ) - Seasonality comparison — Value vs 1 day ago and vs 7 days ago, with change percentages
Alert Fatigue Detection (grafana_check_alerts)
Use action: "analyze" to audit alert rule health:
- Always-firing — Rules firing >24 hours, suggesting thresholds need adjustment
- Flapping — Rules in error/nodata state, indicating query or datasource issues
- Overall health —
healthy,moderate_fatigue, orsevere_fatiguewith specific suggestions
Custom Metrics (Data Observatory)
Push any external data into Grafana via the grafana_push_metrics tool — no code required, just ask your agent:
"Track my daily steps in Grafana" "Push my git commit count for today" "Record my morning weight"
How It Works
Register a metric (or let auto-registration handle it):
- Metric names are auto-prefixed with
openclaw_ext_if not already - Types:
counter(ever-increasing) orgauge(current value)
- Metric names are auto-prefixed with
Push values with optional labels and timestamps:
- Data is pushed immediately via OTLP — no scraping delay
- Historical data supported via the
timestampparameter
Visualize — create a dashboard or query the metric directly
Naming Rules
- All custom metrics use the
openclaw_ext_prefix - Counter metrics get a
_totalsuffix in Prometheus (e.g.,openclaw_ext_steps→openclaw_ext_steps_total) - Label names follow Prometheus conventions (letters, digits, underscores only)
Cardinality Limits
| Limit | Default | Purpose | |-------|---------|---------| | Max metrics | 100 | Prevents unbounded metric growth | | Max labels per metric | 5 | Limits label key cardinality | | Max label values per metric | 50 | Limits unique label combinations |
Definitions persist across restarts in ${stateDir}/custom-metrics.json.
Architecture
Grafana Lens is a self-contained OpenClaw plugin — all Grafana interaction is handled by the bundled GrafanaClient. No external MCP servers, no additional infrastructure beyond the LGTM stack.
┌─────────────────────────────────────────────────────┐
│ OpenClaw Agent │
│ (processes messages, invokes tools) │
│ │
│ ┌─────────────────────────────────────────────┐ │
│ │ Grafana Lens Plugin │ │
│ │ • 18 Agent Tools │ │
│ │ • MetricsCollector Service │ │
│ │ • AlertWebhook Service │ │
│ │ • AlloyPipeline Service │ │
│ │ • LifecycleTelemetry (16 hooks) │ │
│ │ • Bundled GrafanaClient (REST API) │ │
│ │ • Bundled AlloyClient (REST API) │ │
│ └────────────┬──────────────┬─────────────────┘ │
│ │ │ │
└───────────────┼──────────────┼──────────────────────┘
│ │
┌───────────┴──────┐ ┌────┴──────────────────┐
│ OTLP HTTP Push │ │ Alloy HTTP API │
│ (:4318) │ │ (:12345) │
│ /v1/metrics │ │ /-/reload │
│ /v1/logs │ │ /api/v0/components │
│ /v1/traces │ └────┬──────────────────┘
└───────┬──────────┘ │
│ ┌──────────┴──────────────────┐
│ │ Grafana Alloy │
│ │ (data collection) │
│ │ Reads config.d/*.alloy │
│ │ Hot-reload via API │
│ │ Scrapes, tails, receives │
│ └──────────┬──────────────────┘
│ │ remote_write / loki.write / otel
┌───────────┴──────────────────┴──────────────────────┐
│ LGTM Stack │
│ ┌──────────────┐ ┌──────┐ ┌───────┐ │
│ │ Prometheus │ │ Loki │ │ Tempo │ │
│ │ (:9090) │ │ │ │ │ │
│ │ Metrics │ │ Logs │ │Traces │ │
│ └──────┬───────┘ └──┬───┘ └──┬────┘ │
│ └──────────────┼────────┬┘ │
│ ┌────┴────────┴───┐ │
│ │ Grafana │ │
│ │ (:3000) │ │
│ │ Dashboards │ │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────┘Key Design Decisions
| Decision | Choice | Why |
|----------|--------|-----|
| Everything is tools | 18 composable tools, no background automation | Agent decides when to act; Grafana handles scheduled work |
| Self-contained | Bundled GrafanaClient (REST API) | No external MCP servers or dependencies |
| OTLP push | Push-based metrics, logs, traces | No scraping delay — data available immediately |
| General-purpose | Works with ANY Grafana datasource | Not limited to openclaw_lens_* metrics |
| Alert provenance | X-Disable-Provenance header | Agent-created alert rules remain editable in Grafana UI |
| Panel rendering | Grafana Image Renderer plugin | Persistent dashboards + three-tier fallback (PNG → snapshot → deep link) |
| Secret redaction | Auto-strips API keys before OTLP export | Prevents token leakage into observability pipelines |
| Alloy hot-reload | Write config file, POST /-/reload | Atomic config updates; failed reloads keep previous config running |
Relationship with diagnostics-otel
OpenClaw ships a built-in diagnostics-otel extension that converts diagnostic events into basic counters and histograms (token usage, cost, webhook durations, message processing). Grafana Lens does not replace it — they run side by side and complement each other:
| Capability | diagnostics-otel | Grafana Lens |
|-----------|-----------------|-------------|
| Token/cost counters | Yes | Replicated (self-sufficient templates) |
| Session-scoped traces (gen_ai conventions) | No | Yes — hierarchical invoke_agent → chat → execute_tool |
| Structured logs → Loki | No | Yes — diagnostic events, LLM I/O, app logs |
| Log-to-trace correlation | No | Yes — trace_id in log records for Loki → Tempo click-through |
| Security monitoring | No | Yes — prompt injection, tool error classification, cost anomalies |
| Operational gauges | No | Yes — active sessions, queue depth, context pressure, stuck sessions |
| Secret redaction | No | Yes — auto-strips tokens before OTLP export |
| Content capture controls | No | Yes — configurable prompt/completion logging with truncation |
Why not just use diagnostics-otel? It only subscribes to diagnostic events and emits counters — it has no lifecycle hooks, no tracing, no log forwarding, and no security signals. Grafana Lens registers 16 lifecycle hooks (session_start, llm_input, after_tool_call, etc.) to build session-scoped traces and rich log records that diagnostics-otel was never designed to provide.
Why not extend diagnostics-otel? It uses NodeSDK with global OTel providers (setGlobalMeterProvider()). Grafana Lens uses local MeterProvider, LoggerProvider, and BasicTracerProvider instances to avoid conflicts — both extensions coexist safely.
Observability Deep Dive
Three Pillars — OTLP Push
All telemetry is pushed via OTLP HTTP to the collector (default localhost:4318). No Prometheus scraping, no /metrics endpoint.
| Signal | Destination | Content | |--------|-------------|---------| | Metrics | Prometheus (via OTel Collector) | Agent gauges, counters, histograms — token usage, cost, session state, security signals | | Logs | Loki | Diagnostic events, app logs, LLM inputs/outputs (with redaction), security events | | Traces | Tempo | Session-scoped hierarchical traces following gen_ai semantic conventions |
Trace Hierarchy
Each agent session produces a trace tree:
invoke_agent openclaw (root span)
├── chat claude-3-opus (LLM call)
├── execute_tool grafana_query (tool execution)
├── chat claude-3-opus (next LLM turn)
├── execute_tool grafana_create_dashboard
├── openclaw.compaction (context compression)
└── openclaw.agent.end (session close)Span names follow the OpenTelemetry gen_ai semantic conventions: {operation} {model/provider}.
Log-to-Trace Correlation
Span-producing events include a trace_id attribute in their log records. In Grafana, this enables click-through from Loki logs directly to the corresponding Tempo trace — useful for debugging specific LLM calls or tool failures.
Distributed Trace Queries
The grafana_query_traces tool queries Tempo directly using TraceQL:
- Search by attributes —
{ span.service.name = "openclaw" && duration > 5s }finds slow spans - Search by status —
{ status = error }finds failed operations - Get trace by ID — Retrieves the full span tree for a specific trace, with hierarchical parent-child relationships
- Log-to-trace click-through — Span-producing events include
trace_idin Loki log records, enabling one-click navigation from a log line to the full Tempo trace in Grafana
The tool handles both OTLP JSON and Tempo's protobuf-JSON response formats (base64-encoded IDs, string enum kinds/status codes) transparently.
Secret Redaction
When otlp.redactSecrets is enabled (default: true), all telemetry data is automatically scanned for sensitive tokens before OTLP export:
- GitHub tokens (
ghp_*,github_pat_*) - Slack tokens (
xoxb-*,xoxp-*) - API keys (
sk-*,sk-ant-*,AIza*) - Grafana tokens (
glsa_*) - Bearer tokens, PEM private keys, and more
Tokens are redacted to ${first6}...${last4} format (e.g., glsa_Ab...x9Zq).
Lifecycle Hooks
Grafana Lens registers 16 lifecycle hooks into OpenClaw for deep observability:
session_start, session_end, llm_input, llm_output, agent_end, message_received, message_sent, before_compaction, after_compaction, subagent_spawned, subagent_ended, before_tool_call, after_tool_call, before_reset, gateway_start, gateway_stop
These hooks power the session-scoped traces, gen_ai standard metrics, security signals, and SRE operational data.
Content Capture Controls
For privacy-sensitive deployments, configure what's included in telemetry:
{
"otlp": {
"captureContent": false, // disable prompt/completion logging
"redactSecrets": true, // auto-strip tokens (default: true)
"contentMaxLength": 500, // truncate content fields
"forwardAppLogs": false // disable app log forwarding to Loki
}
}Configuration Reference
All Config Options
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| grafana.url | string | — | Grafana instance URL (required) |
| grafana.apiKey | string | — | Service account token (required) |
| grafana.orgId | number | 1 | Organization ID |
| metrics.enabled | boolean | true | Enable OTLP telemetry collection |
| otlp.endpoint | string | http://localhost:4318/v1/metrics | OTLP collector endpoint |
| otlp.headers | object | {} | Custom HTTP headers for OTLP export |
| otlp.exportIntervalMs | number | 15000 | Metric export interval (ms) |
| otlp.logs | boolean | true | Push logs to Loki via OTLP |
| otlp.traces | boolean | true | Push traces to Tempo via OTLP |
| otlp.captureContent | boolean | true | Include prompts/completions in telemetry |
| otlp.contentMaxLength | number | 2000 | Max chars for content fields |
| otlp.forwardAppLogs | boolean | true | Forward OpenClaw app logs to Loki |
| otlp.appLogMinSeverity | string | "debug" | Min log severity (trace/debug/info/warn/error/fatal) |
| otlp.redactSecrets | boolean | true | Auto-redact API keys/tokens before OTLP export |
| proactive.enabled | boolean | false | Enable Grafana alert webhook handler |
| proactive.webhookPath | string | "/grafana-lens/alerts" | HTTP path for alert webhooks |
| proactive.costAlertThreshold | number | 5.0 | Daily cost alert threshold (USD) |
| customMetrics.enabled | boolean | true | Allow custom metric push |
| customMetrics.maxMetrics | number | 100 | Max custom metric definitions |
| customMetrics.maxLabelsPerMetric | number | 5 | Max label keys per metric |
| customMetrics.maxLabelValues | number | 50 | Max unique label combos per metric |
| customMetrics.defaultTtlDays | number | null | Optional auto-expiry (days) |
| alloy.enabled | boolean | false | Enable Alloy pipeline management |
| alloy.url | string | "http://localhost:12345" | Alloy HTTP API URL |
| alloy.configDir | string | -- | Directory for pipeline config files (required when enabled) |
| alloy.filePrefix | string | "lens-" | Prefix for managed config file names |
| alloy.maxPipelines | number | 20 | Maximum managed pipelines |
| alloy.lgtm.prometheusRemoteWriteUrl | string | "http://localhost:9009/api/prom/push" | Prometheus remote write endpoint (used in generated configs) |
| alloy.lgtm.lokiUrl | string | "http://localhost:3100/loki/api/v1/push" | Loki push endpoint |
| alloy.lgtm.otlpEndpoint | string | (derived from otlp.endpoint) | OTLP endpoint for trace/metrics export |
| alloy.lgtm.pyroscopeUrl | string | "http://localhost:4040" | Pyroscope endpoint (profiling recipes) |
Development
# Install dependencies
npm install
# Run unit tests
npm test
# TypeScript type-check (no emit)
npm run typecheck
# Link into OpenClaw for local development
openclaw plugins install -l ~/workspace/grafana-lens
# Restart gateway after changes
openclaw gateway restartProject Structure
grafana-lens/
├── index.ts # Plugin entry point
├── package.json # Package metadata
├── openclaw.plugin.json # Plugin manifest with config schema
├── src/
│ ├── config.ts # Config parsing and validation
│ ├── grafana-client.ts # Bundled Grafana REST API client
│ ├── metric-definitions.ts # Shared metric registry
│ ├── tools/ # 18 agent tools
│ │ ├── query.ts # grafana_query
│ │ ├── query-logs.ts # grafana_query_logs
│ │ ├── create-dashboard.ts # grafana_create_dashboard
│ │ ├── update-dashboard.ts # grafana_update_dashboard
│ │ ├── get-dashboard.ts # grafana_get_dashboard
│ │ ├── search.ts # grafana_search
│ │ ├── share-dashboard.ts # grafana_share_dashboard
│ │ ├── create-alert.ts # grafana_create_alert
│ │ ├── check-alerts.ts # grafana_check_alerts
│ │ ├── annotate.ts # grafana_annotate
│ │ ├── explore-datasources.ts # grafana_explore_datasources
│ │ ├── list-metrics.ts # grafana_list_metrics
│ │ ├── push-metrics.ts # grafana_push_metrics
│ │ ├── explain-metric.ts # grafana_explain_metric
│ │ ├── security-check.ts # grafana_security_check
│ │ ├── query-traces.ts # grafana_query_traces
│ │ ├── investigate.ts # grafana_investigate
│ │ └── alloy-pipeline.ts # alloy_pipeline
│ ├── alloy/ # Alloy pipeline management
│ │ ├── alloy-client.ts # Alloy HTTP API client
│ │ ├── pipeline-store.ts # Pipeline state persistence
│ │ ├── pipeline-helpers.ts # Config generation helpers
│ │ ├── types.ts # Shared Alloy types
│ │ └── recipes/ # 29 pipeline recipe definitions
│ │ └── catalog.ts # Recipe registry
│ ├── services/
│ │ ├── metrics-collector.ts # Diagnostic event → OTLP push
│ │ ├── alert-webhook.ts # Grafana alert webhook handler
│ │ ├── alloy-service.ts # Alloy pipeline lifecycle management
│ │ ├── otel-metrics.ts # MeterProvider (local, no global)
│ │ ├── otel-logs.ts # LoggerProvider
│ │ ├── otel-traces.ts # TracerProvider
│ │ ├── lifecycle-telemetry.ts # Session-scoped gen_ai traces + metrics
│ │ ├── custom-metrics-store.ts # Custom metric definitions + state
│ │ └── redact.ts # Secret redaction utility
│ └── templates/ # 12 dashboard JSON templates
│ ├── llm-command-center.json
│ ├── session-explorer.json
│ ├── cost-intelligence.json
│ ├── tool-performance.json
│ ├── sre-operations.json
│ ├── genai-observability.json
│ ├── security-overview.json
│ ├── node-exporter.json
│ ├── http-service.json
│ ├── metric-explorer.json
│ ├── multi-kpi.json
│ └── weekly-review.json
└── skills/
└── SKILL.md # Agent skill fileLicense
MIT
Disclaimer: Grafana Lens is a community-built OpenClaw plugin. It is not developed, maintained, or endorsed by Grafana Labs. Grafana, Prometheus, Loki, Tempo, and Mimir are trademarks of Grafana Labs.
