@marcos_feitoza/grafana
v1.1.1
Grafana (Observability UI)
Grafana
Install
helm pull grafana/grafana --version 8.4.1 --untar
helm template grafana ./grafana -n monitoring > grafana.yaml
SSL Certificates Using cert-manager for Grafana
- Generate the Certificate and Key:
# Generate a private key
openssl genrsa -out key.pem 2048
# Generate a self-signed certificate
openssl req -new -x509 -key key.pem -out cert.pem -days 365 -subj /CN=grafana.local
- Create a Kubernetes secret using these files:
kubectl create secret tls grafana-tls --cert=cert.pem --key=key.pem -n monitoring
k describe secrets grafana-tls-wh6ws -n monitoring
Name: grafana-tls-wh6ws
Namespace: monitoring
Labels: cert-manager.io/next-private-key=true
Annotations: <none>
Type: Opaque
Data
====
tls.key: 1704 bytes
- Create Certificate Resource
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: grafana-cert
  namespace: monitoring
spec:
  secretName: grafana-tls
  issuerRef:
    name: letsencrypt-staging
    kind: ClusterIssuer
  commonName: grafana.local.com
  dnsNames:
    - grafana.local.com
    - '*.grafana.local.com'
- Configure Ingress for Grafana
Set up an Ingress resource for Grafana to use the SSL certificate. This configuration will direct traffic to Grafana and apply the SSL/TLS settings.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-staging
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
spec:
  ingressClassName: nginx
  rules:
    - host: grafana.local.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 80
  tls:
    - hosts:
        - grafana.local.com
      secretName: grafana-tls
- Verify the Setup
Secrets
k get secret -n monitoring
NAME                TYPE    DATA  AGE
grafana             Opaque  3     22h
grafana-cert-n4r8f  Opaque  1     13h
grafana-tls-wh6ws   Opaque  1     13h
k describe secrets grafana-tls-wh6ws -n monitoring
Name: grafana-tls-wh6ws
Namespace: monitoring
Labels: cert-manager.io/next-private-key=true
Annotations: <none>
Type: Opaque
Data
====
tls.key: 1704 bytes
Certificates
kubectl get certificates -n monitoring
NAME          READY  SECRET       AGE
grafana-cert  False  grafana-tls  13h
grafana-tls   False  grafana-tls  13h
k describe certificates.cert-manager.io grafana-cert -n monitoring
Name: grafana-cert
Namespace: monitoring
Labels: <none>
Annotations: <none>
API Version: cert-manager.io/v1
Kind: Certificate
Metadata:
Creation Timestamp: 2024-08-06T01:24:34Z
Generation: 1
Managed Fields:
API Version: cert-manager.io/v1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:kubectl.kubernetes.io/last-applied-configuration:
f:spec:
.:
f:commonName:
f:dnsNames:
f:issuerRef:
.:
f:kind:
f:name:
f:secretName:
Manager: kubectl-client-side-apply
Operation: Update
Time: 2024-08-06T01:24:33Z
API Version: cert-manager.io/v1
Fields Type: FieldsV1
fieldsV1:
f:status:
f:nextPrivateKeySecretName:
Manager: cert-manager-certificates-key-manager
Operation: Update
Subresource: status
Time: 2024-08-06T01:24:34Z
API Version: cert-manager.io/v1
Fields Type: FieldsV1
fieldsV1:
f:status:
.:
f:conditions:
.:
k:{"type":"Issuing"}:
.:
f:lastTransitionTime:
f:message:
f:observedGeneration:
f:reason:
f:status:
f:type:
Manager: cert-manager-certificates-trigger
Operation: Update
Subresource: status
Time: 2024-08-06T01:24:34Z
API Version: cert-manager.io/v1
Fields Type: FieldsV1
fieldsV1:
f:status:
f:conditions:
k:{"type":"Ready"}:
.:
f:lastTransitionTime:
f:message:
f:observedGeneration:
f:reason:
f:status:
f:type:
Manager: cert-manager-certificates-readiness
Operation: Update
Subresource: status
Time: 2024-08-06T01:30:33Z
Resource Version: 805312
UID: 7839ce65-26c8-4bf7-b139-06b27c84c6ff
Spec:
  Common Name: grafana.local.com
  Dns Names:
    grafana.local.com
    *.grafana.local.com
  Issuer Ref:
    Kind: ClusterIssuer
    Name: letsencrypt-staging
  Secret Name: grafana-tls
Status:
  Conditions:
    Last Transition Time: 2024-08-06T01:24:34Z
    Message: Issuing certificate as Secret was previously issued by Issuer.cert-manager.io/
    Observed Generation: 1
    Reason: IncorrectIssuer
    Status: True
    Type: Issuing
    Last Transition Time: 2024-08-06T01:24:34Z
    Message: Issuing certificate as Secret does not exist
    Observed Generation: 1
    Reason: DoesNotExist
    Status: False
    Type: Ready
  Next Private Key Secret Name: grafana-cert-n4r8f
Events: <none>
Pods Overview: pod_regex filter
In the Pods Overview dashboard, the pod_regex text filter selects pods by matching a regex against the pod name.
Examples:
- Grafana only: ^grafana-.*
- Pods with personal in the name: .*personal.*
- Pods with crypto in the name: .*crypto.*
- Monitoring only (grafana/loki/vector): ^(grafana|loki|vector)-.*
- A specific pod: ^grafana-5d4c8dffdd-nrvk6$
For the filter to take effect in panels and variables, apply it as a regex:
- =~"$pod_regex"
- or /$pod_regex/
Verify Ingress: Check that the Ingress resource is correctly routing traffic and applying SSL/TLS.
kubectl logs -n cert-manager deployment/cert-manager
I0806 15:15:50.701719 1 pod.go:59] cert-manager/challenges/http01/selfCheck/http01/ensurePod "msg"="found one existing HTTP01 solver pod" "dnsName"="grafana.local.com" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-wft84" "related_resource_namespace"="monitoring" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="grafana-tls-2fghh-3829399934-1227087645" "resource_namespace"="monitoring" "resource_version"="v1" "type"="HTTP-01"
I0806 15:15:50.701905 1 service.go:43] cert-manager/challenges/http01/selfCheck/http01/ensureService "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="grafana.local.com" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-npc6l" "related_resource_namespace"="monitoring" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="grafana-tls-2fghh-3829399934-1227087645" "resource_namespace"="monitoring" "resource_version"="v1" "type"="HTTP-01"
I0806 15:15:50.702059 1 ingress.go:99] cert-manager/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="grafana.local.com" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-ww52x" "related_resource_namespace"="monitoring" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="grafana-tls-2fghh-3829399934-1227087645" "resource_namespace"="monitoring" "resource_version"="v1" "type"="HTTP-01"
E0806 15:15:50.710047 1 sync.go:186] cert-manager/challenges "msg"="propagation check failed" "error"="failed to perform self check GET request 'http://grafana.local.com/.well-known/acme-challenge/nikrW0u9cZ9I9redAqzjlcJqKpfEfMn8B6KpnNEkcoo': Get \"http://grafana.local.com/.well-known/acme-challenge/nikrW0u9cZ9I9redAqzjlcJqKpfEfMn8B6KpnNEkcoo\": dial tcp: lookup grafana.local.com on 10.96.0.10:53: no such host" "dnsName"="grafana.local.com" "resource_kind"="Challenge" "resource_name"="grafana-tls-2fghh-3829399934-1227087645" "resource_namespace"="monitoring" "resource_version"="v1" "type"="HTTP-01"
Troubleshooting
- Certificate Status Issues: If the certificate is not issued, check the cert-manager logs:
kubectl logs -n cert-manager deployment/cert-manager
- SSL Handshake Errors: Ensure the backend-protocol annotation matches how the Grafana service is actually served. Keep it as HTTP unless Grafana itself is configured to serve HTTPS.
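The `no such host` error in the cert-manager log above means the HTTP-01 self check could not resolve grafana.local.com. A minimal Python sketch for reproducing that lookup before requesting a certificate follows. The function name `resolves` is hypothetical, and the check uses the resolver of whatever machine runs it, which may answer differently from the cluster DNS (10.96.0.10) that cert-manager queries.

```python
import socket


def resolves(hostname: str) -> bool:
    """Return True if the local resolver finds at least one address for hostname."""
    try:
        socket.getaddrinfo(hostname, 80)
        return True
    except socket.gaierror:
        # Same class of failure as "no such host" in the cert-manager log.
        return False
```

Calling `resolves("grafana.local.com")` from a pod inside the cluster gives the closest match to what cert-manager sees; a `False` result points at DNS rather than at the Ingress or the issuer.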
How to reset Grafana's admin password (installed by Helm)
namespace=monitoring
kubectl exec --namespace $namespace -it $(kubectl get pods --namespace $namespace -l "app.kubernetes.io/name=grafana" -o jsonpath="{.items[0].metadata.name}") -- grafana cli admin reset-admin-password yourNewPasswordHere
INFO[01-21|10:24:17] Connecting to DB logger=sqlstore dbtype=sqlite3
INFO[01-21|10:24:17] Starting DB migration logger=migrator
Admin password changed successfully ✔
This repository delivers the Grafana component of the homelab observability stack.
Role in the architecture
- Metrics source: Prometheus
- Log source: Loki
- In-cluster log collection: Vector -> Loki
- Dashboards versioned in Git (provisioning)
Integration flow (metrics + logs)
- Apps and cluster components expose /metrics.
- Prometheus scrapes those endpoints and stores the time series.
- Vector collects pod logs (DaemonSet), normalizes the labels (namespace, pod, container, node, app), and sends them to Loki.
- Grafana queries Prometheus (metrics) and Loki (logs) through provisioned dashboards.
Stack diagram
Visual diagram (Mermaid)
flowchart LR
  U[User] --> G[Grafana]
  subgraph K8s[Kubernetes Cluster]
    A[Application Pods]
    M[Infra Components\nArgoCD, kube-state-metrics, node-exporter]
    V[Vector DaemonSet]
    L[Loki]
    P[Prometheus]
  end
  A -->|/metrics| P
  M -->|/metrics| P
  A -->|stdout/stderr logs| V
  M -->|stdout/stderr logs| V
  V -->|labels + logs| L
  G -->|PromQL| P
  G -->|LogQL| L
python-diagrams code (to generate the image)
Requires python3 plus the diagrams package, with Graphviz installed on the host.
# diagrams/observability_stack.py
from diagrams import Cluster, Diagram
from diagrams.onprem.monitoring import Grafana, Prometheus
from diagrams.onprem.logging import Loki
from diagrams.k8s.compute import Pod

# show=False renders the image file without opening a viewer.
with Diagram("observability-stack", show=False, filename="grafana/docs/observability-stack"):
    user = Pod("user")
    with Cluster("kubernetes"):
        apps = Pod("apps")
        infra = Pod("infra metrics")
        vector = Pod("vector ds")
        loki = Loki("loki")
        prom = Prometheus("prometheus")
    grafana = Grafana("grafana")
    # Metrics path: workloads are scraped by Prometheus.
    apps >> prom
    infra >> prom
    # Logs path: Vector collects pod logs and ships them to Loki.
    apps >> vector >> loki
    infra >> vector
    # Grafana queries both backends on behalf of the user.
    user >> grafana
    grafana >> prom
    grafana >> loki
Command:
python3 grafana/diagrams/observability_stack.py
Relevant structure
- helm/prod-values.yaml: main production configuration.
- helm/dashboards/*.json: versioned custom dashboards.
- Provisioned datasources:
  - Prometheus: http://prometheus-server.monitoring.svc.cluster.local
  - Loki: http://loki.monitoring.svc.cluster.local:3100
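As a rough illustration of how the two datasources are reached, the sketch below builds the HTTP API URLs a client would call: `/api/v1/query` on Prometheus and `/loki/api/v1/query_range` on Loki. The base URLs are the ones listed above. The helper names, the `kube_pod_info` metric, and the `app="grafana"` selector are assumptions made for the example, not something this repository defines.

```python
from urllib.parse import urlencode

PROMETHEUS_URL = "http://prometheus-server.monitoring.svc.cluster.local"
LOKI_URL = "http://loki.monitoring.svc.cluster.local:3100"


def prometheus_query_url(promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{PROMETHEUS_URL}/api/v1/query?{urlencode({'query': promql})}"


def loki_query_range_url(logql: str, limit: int = 100) -> str:
    """Build a range-query URL for the Loki HTTP API."""
    params = urlencode({"query": logql, "limit": limit})
    return f"{LOKI_URL}/loki/api/v1/query_range?{params}"


# Hypothetical queries: the metric and label values are illustrative only.
metrics_url = prometheus_query_url('kube_pod_info{pod=~"grafana-.*"}')
logs_url = loki_query_range_url('{namespace="monitoring", app="grafana"}')
```

Fetching either URL with curl from a pod in the monitoring namespace is a quick way to tell whether an empty panel comes from the datasource or from the dashboard query.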
Main dashboards
- Nodes Overview
- Pods Overview
- Logs Overview
- Personal Finance Logs Overview
- ArgoCD
pod_regex filter in Pods Overview
Use a regex to filter pods by name:
- Grafana only: ^grafana-.*
- Anything containing personal: .*personal.*
- Anything containing crypto: .*crypto.*
- Core monitoring only: ^(grafana|loki|vector)-.*
- Exact pod: ^grafana-5d4c8dffdd-nrvk6$
Queries must use:
pod=~"$pod_regex"
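To sanity-check a pod_regex value before typing it into the dashboard, the patterns above can be tried against a few sample pod names. This is a Python sketch, not how Grafana evaluates the variable: Prometheus anchors `=~` label matchers at both ends, which `re.fullmatch` approximates for these simple patterns. Every pod name except `grafana-5d4c8dffdd-nrvk6` is made up for the example.

```python
import re
from typing import List

SAMPLE_PODS = [
    "grafana-5d4c8dffdd-nrvk6",
    "loki-0",                           # hypothetical name
    "vector-x7k2p",                     # hypothetical name
    "personal-finance-api-6c9f-abcde",  # hypothetical name
]


def filter_pods(pod_regex: str, pods: List[str]) -> List[str]:
    """Return the pods whose full name matches pod_regex."""
    pattern = re.compile(pod_regex)
    return [pod for pod in pods if pattern.fullmatch(pod)]
```

For instance, `filter_pods("^(grafana|loki|vector)-.*", SAMPLE_PODS)` keeps the first three names and drops the personal-finance pod.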
Operations (quick runbook)
- List the Grafana pods:
kubectl -n monitoring get pods -l app.kubernetes.io/name=grafana -o wide
- View logs:
kubectl -n monitoring logs -l app.kubernetes.io/name=grafana --tail=200
- Check the provisioning loaded at startup:
kubectl -n monitoring logs <grafana-pod> --tail=300 | grep -E "provisioning|dashboard|datasource"
Common troubleshooting
- Dashboard shows no data in a sync/activity panel:
  - Check the time window (events can be rare within 30m).
  - Run the same query in Explore.
- New Grafana pod stuck at 0/1 during a rollout:
  - Check the probes and the startup time.
  - Check node/CNI health (NodeNotReady, flannel, etc.).
- Logs in Loki with app=unknown:
  - Adjust the labels in vector/helm/templates/configmap.yaml.
Conventions
- Dashboard and datasource changes always go through Git + PR.
- No persistent manual change in the UI without versioning the corresponding JSON.
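One way to back the Git + PR convention is a small check, run before a merge, that confirms every versioned dashboard is at least well-formed. The sketch below is hypothetical and not part of this repository. It assumes dashboards live in helm/dashboards/*.json as listed earlier, and that each exported dashboard carries the `title` and `uid` fields Grafana normally writes.

```python
import json
from pathlib import Path
from typing import List


def check_dashboards(directory: str) -> List[str]:
    """Return a list of problems found in the dashboard JSON files under directory."""
    problems: List[str] = []
    for path in sorted(Path(directory).glob("*.json")):
        try:
            dashboard = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path.name}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(dashboard, dict):
            problems.append(f"{path.name}: top level is not a JSON object")
            continue
        # Assumed minimum for a provisioned dashboard: a title and a stable uid.
        for field in ("title", "uid"):
            if not dashboard.get(field):
                problems.append(f"{path.name}: missing {field}")
    return problems
```

Running `check_dashboards("helm/dashboards")` in CI and failing the job when the returned list is non-empty would catch a broken export before Grafana tries to provision it.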
