tomograph

v0.4.0

Published

12 days ago

Tomograph — the Observability Compiler. Compiles, scans, and scores ObservabilityPack spec v1.2 manifests. Express server + thin client.

0High
0Medium
0Low

carlosfmontero

tomograph observability-compiler observability otel opentelemetry mcp sre slo prometheus grafana alertmanager spec v1.2

Tomograph

Tomograph is the observability compiler and diagnostic workspace for ObservabilityPack spec v1.2.

It answers one operational question:

Is this service's observability diagnostic-grade?

Tomograph checks that in two parts:

Coverage - are we observing the right signals for the service's observability goals and OLA?
Trust - do the declared signals, rules, dashboards, alerts, and response paths match what is active in production?

The workflow is intentionally simple:

Discover -> Diagnose -> Remediate

Use a repo scan, a live MCP scan, or an uploaded pack to create an ObservabilityPack. Compare the declared repo posture with the live production posture. Then compile and deploy the delta through the platform tools.

In Tomograph, the OLA is represented as an observability contract inside the pack: criticality, SLOs, SLIs, telemetry bindings, rules, dashboards, alerts, runbooks, and validation expectations. A repo-derived pack captures what the service declares. A live MCP-derived pack captures what production verifies. The gap between those two packs is the diagnostic finding.

The canonical specification lives at MoebiusX/otel-observability-pack. A checksummed copy is vendored under vendor/observability-pack-spec/v1.2/.

Why It Exists

Most observability failures are not caused by a missing chart. They come from drift:

the repo declares an SLO, but the recording rule is not in production
Grafana has dashboards that no pack owns
alerts still exist, but their thresholds no longer match the SLO
live telemetry exists, but no OLA or runbook says why it matters
the team cannot explain whether the service is truly diagnosable

Tomograph treats observability as a compiled contract. The pack is the source of truth. Native artifacts are generated from it. Live systems are scanned back into pack shape. The diff between declared and live is the operational truth.

Main Journey

1. Discover - What Do We Have?

Create or load a pack:

scan a service repository
generate a live pack from an OpenTelemetry MCP server
upload a canonical YAML or JSON ObservabilityPack

The Discover view renders the observability tomogram across the layered model:

L1 Contract: SLIs and SLOs
L2 Telemetry: OTel, backends, collectors, pipelines
L3 Insight: recording rules, dashboards, derived views
L4 Action: alerts, routes, remediations
L5 Validation: baselines, synthetics, chaos, release checks
GOV: ownership and governance metadata

Tomograph Discover view showing the layered observability inventory

2. Diagnose - Can We Trust It?

Load the declared repo pack as Pack A and the live production pack as Pack B. Tomograph computes the Diagnostic Grade:

Score: total criteria passed out of 7
Coverage: four checks for "are we observing the right things?"
Trust: three checks for "can we trust what the signals show?"
Operability: one informational check (Actionable — runbooks linked), displayed but never scored
Verified: whether a live MCP signal is present

The score maps onto a metrology-style instrument grade — the rating users actually read; the full ladder renders on the grade card with the current rung highlighted:

| Grade | Class | Score band | |---|---|---| | A++ | Calibration / Reference Grade | — (needs external reference benchmarking) | | A+ | Laboratory / Research Grade | ≥ 95% | | A | Diagnostic / Clinical Grade | > 85% (the audit bar) | | B+ | Inspection Grade | ≥ 75% | | B | Industrial Grade | ≥ 62.5% | | C | Field Grade | ≥ 37.5% | | D | Consumer Grade | < 37.5% |

The machine contract is unchanged: the audit passes when the score is greater than 85% — i.e. exactly when the grade is A or better; the letter and PASS/FAIL can never disagree. Failed criteria remain visible as evidence. A pack can therefore pass the grade while still showing drift that belongs in Remediate.

The checks are (grade schema 2):

| Area | Criteria | Scored | |---|---|---| | Coverage | Multi-modal, Correlated, Calibrated, Comprehensive | yes | | Trust | Chaos-validated, Drift-free, Fresh | yes | | Operability | Actionable | no — informational |

Runbooks measure response readiness of the overall solution, not diagnostic capability — a perfectly diagnostic system tells you what is wrong even when nobody wrote the response script. The runbook gap stays visible on the grade card and in the posture matrix; it just no longer costs diagnostic credit.

The drift drill shows:

aligned artifacts
matched artifacts whose behavior drifted
declared artifacts not confirmed live
live-only shadow signals
out-of-scope live inventory that belongs to the wider platform

Traceability shows requirement chains from SLO to SLI, metrics, recording rules, exporters, scrape evidence, dashboards, alerts, and runbooks.

Tomograph Diagnose view showing Diagnostic Grade and live drift buckets

3. Remediate - Fix The Gaps

Tomograph compiles the pack delta into native backend artifacts:

Prometheus recording and alerting rules
Grafana-managed rules
Grafana dashboards
OTel Collector pipelines
Alertmanager routes

Deployable artifacts can be pushed through an MCP write target. Non-deployable or inferred artifacts remain visible as manual follow-up, not silent production changes.

Tomograph Remediate view showing the Pack A minus Pack B deploy delta

Quickstart

git clone https://github.com/MoebiusX/tomograph.git
cd tomograph
npm install
npm run dev

Open http://127.0.0.1:8000.

Security Posture

One token, three postures:

Local (default). The server binds to 127.0.0.1 and runs with no authentication — a zero-friction local workspace.
Exposed with a token. Set TOMOGRAPH_API_TOKEN=<secret> and bind wherever you need (HOST=0.0.0.0). Mutating /api/* routes (crawl, draft, validate-register, deploy, verify, reset) then require Authorization: Bearer <secret>; read routes stay open. Set TOMOGRAPH_API_TOKEN_LABEL=<team-or-owner> to stamp the deploy audit log with the token's ownership — the secret itself never lands in any log.
Exposed without a token. The server refuses to start with a clear message. TOMOGRAPH_INSECURE_NO_AUTH=1 overrides knowingly (it logs a loud warning) for trusted-network demos only.

MCP write tokens are unrelated to the API token: they pass through per request and are never stored server-side. Registered packs and the deploy audit live in the .tomograph/ workspace (TOMOGRAPH_WORKSPACE relocates it).

Useful local checks:

npm run lint:server
npm run lint:studio
npm run lint:crawler
npm run lint:fetcher
npm run test

Run In Docker Or Kubernetes

The whole app is one Express process, so the container story is one image:

docker build -t tomograph:0.4.0 .
docker run --rm -p 8000:8000 tomograph:0.4.0

Kubernetes manifests (Deployment + Service + Ingress, applied with Kustomize) live in deploy/k8s/:

kubectl apply -k deploy/k8s

Common Operations

Scan A Repo

npm run crawl -- path/to/service-repo --name krystalinex-core --env prod > repo.pack.yaml
npm run validate-pack -- repo.pack.yaml

The crawler reads source files such as:

Prometheus rule files
Grafana dashboard JSON
Alertmanager config
OTel Collector config
Helm and Kubernetes manifests
Docker Compose files

It emits a canonical v1.2 pack plus crawler annotations describing what was scanned and what was inferred.

Fetch Live From MCP

MCP_URL=https://otel-mcp.example.com/mcp \
MCP_AUTH=$MCP_CLIENT_KEY \
npm run fetch-live

The default output is the ignored local file examples/production-live.pack.yaml.

See docs/MCP_INTEGRATION.md for the live fetch and write-back contract.

Validate Or Upload A Pack

npm run validate-pack -- path/to/pack.yaml

The studio also accepts drag-and-drop or file picker upload. Uploaded, crawled, and MCP-drafted packs are registered in memory and become addressable through the same /api/packs/:id/* endpoints as catalog packs.

Compile Artifacts

# Enumerate the compile tree
curl http://127.0.0.1:8000/api/packs/<pack-id>/compile-catalog

# Compile one artifact
curl "http://127.0.0.1:8000/api/packs/<pack-id>/compile-artifact?group=rules&flavor=grafana-managed&artifact=slo:slo_settlement_latency_99"

The UI exposes the same path through Remediate -> Compile & Deploy.

Saved Journeys — Repeatable Drift Checks

Freeze a comparison as a journey file and run it on demand or on a schedule:

# .tomograph/journeys/repo-vs-live.journey.yaml
name: repo-vs-live
packA:
  crawl: { path: ../my-service, name: my-service, env: prod }
packB:
  mcp: { url: https://otel-mcp.example.com/mcp, authEnv: MY_MCP_TOKEN }
gate:
  minAlignmentPct: 85
  requireGradePass: true
  maxLiveAgeHours: 24

node tools/cli.mjs journey run repo-vs-live          # markdown report
node tools/cli.mjs journey run repo-vs-live --json   # automation output
node tools/cli.mjs journey list                      # journeys + last outcome

Exit codes follow the gate contract: 0 verdict passes, 1 gate failed, 2 tooling/config error — so the same command is a cron job, a Windows scheduled task, or a CI gate. Every run appends a JSON record under .tomograph/runs/<journey>/ (the drift-over-time series). Secrets never live in journey files — MCP auth is referenced by env-var name.

API Surface

| Method | Path | Purpose | |---|---|---| | GET | /healthz | Health and vendored spec version | | GET | /api/packs | In-memory and catalog pack registry | | GET | /api/examples | Bundled example packs | | GET | /api/references | Curated catalogue reference packs | | GET | /api/packs/:id | Adapted layered pack | | GET | /api/packs/:id/canonical | Canonical pack with env overlay | | GET | /api/packs/:id/conformance | Maturity-rubric scoring | | GET | /api/diff?a=&b= | Repo/live or pack/pack structural diff | | GET | /api/packs/:id/compile-catalog | Per-artifact compile tree | | GET | /api/packs/:id/compile-artifact | Compile one artifact or group | | POST | /api/validate | Validate and register uploaded YAML/JSON | | POST | /api/crawl | Draft a pack from uploaded repo files | | POST | /api/crawl-github | Draft a pack from a GitHub URL | | POST | /api/draft-from-mcp | Draft a live pack from an MCP endpoint | | POST | /api/packs/:id/deploy-bulk | Deploy selected compiled artifacts | | POST | /api/packs/:id/deploy/:target | Deploy one compiled target | | DELETE | /api/uploads | Clear uploaded/crawled/drafted packs |

Repository Map

server/
  index.mjs                Express API, upload registry, compile/deploy routes
  test-smoke.mjs           End-to-end route smoke tests

studio/
  app.mjs                  Browser app shell and three-step workflow
  compare-view.mjs         Diagnostic Grade, drift, traceability entry points
  compile-view.mjs         Remediate, compile catalog, deploy surfaces
  layers-view.mjs          Discover tomogram and artifact cards

tools/
  crawl-repo.mjs           CLI repo crawler
  fetch-live-pack.mjs      MCP live-pack fetcher
  validate-pack.mjs        Canonical pack validator
  lib/
    adapter.mjs            Canonical pack -> layered UI model
    compile.mjs            packc compiler
    conformance.mjs        Maturity rubric
    diff.mjs               Structural pack diff
    traceability.mjs       Requirement chains

examples/
  production-curated.pack.yaml
  target-advanced.pack.yaml
  demo-skeleton.pack.yaml

vendor/observability-pack-spec/v1.2/examples/
  payment-service.pack.yaml

reference-packs/
  kafka.pack.yaml
  prometheus.pack.yaml
  grafana.pack.yaml

deploy/k8s/
  kustomization.yaml       Kustomize entry point (see deploy/k8s/README.md)

Key Docs

docs/USER_JOURNEY.md - product journey and design invariants
docs/DRY_RUN.md - dry-run script and readiness checklist
docs/RELEASE_READINESS.md - V1 release gate
docs/MCP_INTEGRATION.md - live fetch, verification, deploy writes
docs/MODEL.md - the layered observability model (L1–L5, L2X, GOV)
docs/DIFF.md - structural alignment and drift model
docs/CONFORMANCE.md - maturity rubric scoring
docs/DIAGNOSTIC_GRADE_FRAMEWORK.md - the eight coverage/trust criteria behind the Diagnose grade
docs/PHASE_1_VERDICT_TRUST_RESEARCH.md - draft research/spec for the verdict-trust phase
docs/TRACEABILITY_GRAPH_COMPARISON_SPEC.md - requirement-chain comparison semantics
docs/USER_STORY_CRAWLER_PROVENANCE.md - provenance requirements for deployable artifacts
docs/USER_STORY_REQUIRED_DEPLOYMENT_ENVIRONMENT.md - backlog story for required crawl environment selection
docs/ADVANCED_FEATURE_AUDIT.md - per-view audit of the Advanced tools (References · Conformance · Schema · OTLP · Traceability · Atlas)
docs/VALUE_BACKLOG.md - prioritized product backlog for the next iterations
docs/REFACTORING_PLAN.md - maintainability refactor backlog from the 2026-06 audit
docs/BRANCHING.md - the branching model: lanes, per-commit bar, multi-writer rules, promotion cadence

Superseded planning docs live in docs/archive/.

License

MIT - see LICENSE.