@arcote.tech/arc-otel
v0.7.20
OpenTelemetry instrumentation primitives for the Arc framework — server + browser SDK init, span helpers, PII-safe attribute sanitization, W3C Trace Context propagation
OpenTelemetry instrumentation primitives for the Arc framework. Wraps the
core OTel SDKs (@opentelemetry/*) behind an Arc-friendly API with
PII-safe defaults, dev/prod sampling modes, and W3C Trace Context
propagation.
This package is optional. Arc apps run identically whether you import
it or not — every span call short-circuits when no SDK is attached. Opt
in by setting observability.enabled: true in deploy.arc.json (or by
exporting ARC_OTEL_ENABLED=true for ad-hoc local runs).
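Here is a minimal sketch of what "short-circuits when no SDK is attached" can look like: a facade whose startSpan just runs the callback with a do-nothing span when no backend is present. Everything in it is hypothetical. SpanLike, SpanBackend, isOtelEnabled and createTelemetryFacade are names invented for this example, not exports of this package. It also assumes ARC_OTEL_ENABLED counts as "on" only when set to the exact string "true".

```typescript
// Hypothetical sketch; not the @arcote.tech/arc-otel implementation.
interface SpanLike {
  setAttribute(key: string, value: string | number | boolean): void;
  end(): void;
}

// Stand-in for "an SDK is attached".
interface SpanBackend {
  startSpan(name: string): SpanLike;
}

const NOOP_SPAN: SpanLike = { setAttribute() {}, end() {} };

// Assumption: only the exact string "true" turns telemetry on.
function isOtelEnabled(env: Record<string, string | undefined>): boolean {
  return env.ARC_OTEL_ENABLED === "true";
}

function createTelemetryFacade(backend?: SpanBackend) {
  return {
    async startSpan<T>(name: string, fn: (span: SpanLike) => Promise<T>): Promise<T> {
      // No SDK attached: run the work directly with a span that records nothing.
      if (!backend) return fn(NOOP_SPAN);
      const span = backend.startSpan(name);
      try {
        return await fn(span);
      } finally {
        span.end();
      }
    },
  };
}
```

A caller would pass a real backend only when isOtelEnabled(process.env) is true. Every call site keeps calling startSpan either way, which is what lets the instrumentation stay in place while the package remains optional.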
Architecture
```text
┌─ browser ───────────────────────────────────────────┐
│ start-app.ts                                        │
│  └─ if (window.__ARC_OTEL_CONFIG)                   │
│       await import("@arcote.tech/arc-otel/browser") │
│       initBrowserTelemetry({...})                   │
└─────────────────────────────────────────────────────┘
               │
               │ OTLP/HTTP ─────/otel/v1/traces─────┐
               ▼                                    ▼
┌─ arc-prod container ─────────┐         ┌─ otel-collector ────┐
│ startPlatformServer()        │   OTLP  │ receivers.otlp      │
│  └─ initServerTelemetry({…}) ├────────▶│ processors:         │
│     traces + logs + metrics  │         │   - tail_sampling   │
│ createArcServer({telemetry}) │         │   - attributes      │
│  └─ HTTP span                │         │   - batch           │
│  └─ WS message span          │         │ exporters:          │
│  └─ command.<name> span      │         │   - tempo (traces)  │
│  └─ db.find/set/commit span  │         │   - loki (logs)     │
└──────────────────────────────┘         │   - prom (metrics)  │
                                         └────────┬────────────┘
                                                  │
                                  ┌───────────────┼────────────────┐
                                  ▼               ▼                ▼
                            ┌─ tempo ────┐  ┌─ loki ─────┐  ┌─ prometheus ┐
                            │ 7d traces  │  │ 7d logs    │  │ 30d metrics │
                            └────────────┘  └────────────┘  └─────────────┘
                                  │               │                │
                                  ▼               ▼                ▼
                              ┌─ grafana (HTTPS via Caddy basic-auth) ─┐
                              │ observability.<apex-of-app-domain>     │
                              │ admin / ARC_OBSERVABILITY_PASSWORD     │
                              └────────────────────────────────────────┘
```
Quick start (production deploy)
Opt in via deploy.arc.json:
```jsonc
{
  "target": {...},
  "envs": {
    "prod": { "domain": "app.example.com", "db": { "type": "postgres" } }
  },
  "caddy": { "email": "you@example.com" },
  "registry": { "domain": "registry.example.com", "passwordEnv": "ARC_REGISTRY_PASSWORD" },
  "observability": {
    "enabled": true
    // optional:
    // "subdomain": "observability",
    // "adminPasswordEnv": "ARC_OBSERVABILITY_PASSWORD",
    // "retention": { "traces": "168h", "logs": "168h", "metrics": "30d" }
  }
}
```
Then:
```shell
arc platform deploy prod
```
The first deploy auto-generates ARC_OBSERVABILITY_PASSWORD into
deploy.arc.env (gitignored) and provisions five sidecar containers:
otel-collector, tempo, loki, prometheus, grafana. Grafana is
reachable at https://observability.<apex>/ behind Caddy basic-auth
(admin / generated password).
Existing deploys that don't set observability are unchanged — no new
containers, no env-vars on the app, zero added latency.
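The opt-in rules above can be read as a small config-resolution step. The sketch below is illustrative only: ObservabilityConfig, resolveObservability and grafanaHost are hypothetical names, it assumes the commented-out values in the example are the defaults, and it assumes the "apex" is simply the last two DNS labels of the app domain.

```typescript
// Hypothetical types; deploy.arc.json's real schema may differ.
interface ObservabilityConfig {
  enabled?: boolean;
  subdomain?: string;
  adminPasswordEnv?: string;
  retention?: { traces?: string; logs?: string; metrics?: string };
}

interface ResolvedObservability {
  subdomain: string;
  adminPasswordEnv: string;
  retention: { traces: string; logs: string; metrics: string };
}

// Assumption: the commented-out values in the example are the defaults.
const OBSERVABILITY_DEFAULTS: ResolvedObservability = {
  subdomain: "observability",
  adminPasswordEnv: "ARC_OBSERVABILITY_PASSWORD",
  retention: { traces: "168h", logs: "168h", metrics: "30d" },
};

// null means "observability off": no sidecars, no extra env vars on the app.
function resolveObservability(cfg?: ObservabilityConfig): ResolvedObservability | null {
  if (!cfg || cfg.enabled !== true) return null;
  return {
    subdomain: cfg.subdomain ?? OBSERVABILITY_DEFAULTS.subdomain,
    adminPasswordEnv: cfg.adminPasswordEnv ?? OBSERVABILITY_DEFAULTS.adminPasswordEnv,
    retention: {
      traces: cfg.retention?.traces ?? OBSERVABILITY_DEFAULTS.retention.traces,
      logs: cfg.retention?.logs ?? OBSERVABILITY_DEFAULTS.retention.logs,
      metrics: cfg.retention?.metrics ?? OBSERVABILITY_DEFAULTS.retention.metrics,
    },
  };
}

// Assumption: the apex is the last two DNS labels (fine for app.example.com,
// wrong for multi-label suffixes such as co.uk).
function grafanaHost(appDomain: string, subdomain: string): string {
  const apex = appDomain.split(".").slice(-2).join(".");
  return `${subdomain}.${apex}`;
}
```

Returning null for a missing or disabled block mirrors the "existing deploys are unchanged" guarantee: nothing downstream needs to know observability exists.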
What's instrumented
Server (always when ARC_OTEL_ENABLED=true)
| Span name | Source | Notable attrs |
|---|---|---|
| ${METHOD} ${PATH} | arc-host fetch handler | http.request.method, http.route, http.response.status_code |
| ws.${type} | arc-host websocket dispatch | messaging.message.type, arc.ws.client_id |
| command.${name} | ContextHandler.executeCommand | rpc.system=arc, arc.command.name, arc.command.params_size, payload in dev only |
| db.${op} ${store} | wrapDbAdapter (postgres/sqlite) | db.system, db.operation.name, db.collection.name, db.response.row_count |
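The db.${op} ${store} row can be pictured as a wrapper that times each adapter call and tags it with the attributes listed. This is a hypothetical sketch: the DbAdapter interface, wrapDbAdapterSketch and the in-memory RecordedSpan sink are stand-ins invented here, because this README does not show the real wrapDbAdapter signature.

```typescript
// Hypothetical adapter surface; the real wrapDbAdapter may wrap a different interface.
interface DbAdapter {
  find(store: string, query: Record<string, unknown>): Promise<unknown[]>;
  set(store: string, id: string, doc: Record<string, unknown>): Promise<void>;
}

type SpanAttrs = Record<string, string | number>;

// Stand-in for an exported span, kept in memory for the example.
interface RecordedSpan {
  name: string;
  attributes: SpanAttrs;
  durationMs: number;
}

function wrapDbAdapterSketch(inner: DbAdapter, dbSystem: string, sink: RecordedSpan[]): DbAdapter {
  // Times one adapter call and records a "db.<op> <store>" span, even if the call throws.
  async function traced<T>(op: string, store: string, call: () => Promise<T>): Promise<T> {
    const startedAt = Date.now();
    const attributes: SpanAttrs = {
      "db.system": dbSystem,
      "db.operation.name": op,
      "db.collection.name": store,
    };
    try {
      const result = await call();
      if (Array.isArray(result)) attributes["db.response.row_count"] = result.length;
      return result;
    } finally {
      sink.push({ name: `db.${op} ${store}`, attributes, durationMs: Date.now() - startedAt });
    }
  }

  return {
    find: (store, query) => traced("find", store, () => inner.find(store, query)),
    set: (store, id, doc) => traced("set", store, () => inner.set(store, id, doc)),
  };
}
```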
Server emits metrics automatically:
| Metric | Type | Labels |
|---|---|---|
| arc.commands.total | Counter | arc.command.name |
| arc.command.duration_ms | Histogram | arc.command.name |
| arc.db.find_ms | Histogram | db.system, db.collection.name |
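The two command metrics can be sketched as a wrapper around the command handler. InMemoryMetrics and runInstrumentedCommand are hypothetical names, and the method names only echo the incrementCounter / recordHistogram calls listed under API surface; they are not the package's real signatures. The sketch also assumes failed commands are counted and timed like successful ones.

```typescript
// Hypothetical minimal metrics sink used only for this sketch.
type Labels = Record<string, string>;

class InMemoryMetrics {
  counters = new Map<string, number>();
  histograms = new Map<string, number[]>();

  incrementCounter(name: string, value: number, labels: Labels): void {
    const key = `${name}${JSON.stringify(labels)}`;
    this.counters.set(key, (this.counters.get(key) ?? 0) + value);
  }

  recordHistogram(name: string, value: number, labels: Labels): void {
    const key = `${name}${JSON.stringify(labels)}`;
    const values = this.histograms.get(key) ?? [];
    values.push(value);
    this.histograms.set(key, values);
  }
}

// Assumption: a command is counted and timed whether it succeeds or throws.
async function runInstrumentedCommand<T>(
  metrics: InMemoryMetrics,
  name: string,
  handler: () => Promise<T>,
): Promise<T> {
  const labels = { "arc.command.name": name };
  const startedAt = Date.now();
  try {
    return await handler();
  } finally {
    metrics.incrementCounter("arc.commands.total", 1, labels);
    metrics.recordHistogram("arc.command.duration_ms", Date.now() - startedAt, labels);
  }
}
```

Keeping arc.command.name as the only label keeps cardinality bounded by the number of commands, which is why raw params never appear as metric labels.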
Browser (when window.__ARC_OTEL_CONFIG is present)
- W3C Trace Context propagator registered globally: outbound fetch calls automatically attach traceparent/tracestate headers so the server's HTTP span can pick up the client trace.
- The SDK chunk is import()-ed on demand from start-app.ts, so initial bundle size stays untouched when observability is off.
- An anonymous session id stored in sessionStorage (arc:otel-session-id) groups spans per tab; never use it as a user identifier.
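For reference, the traceparent header attached by the W3C propagator has a fixed shape defined by the W3C Trace Context specification. The helpers below are illustrative only: formatTraceparent and parseTraceparent are not exports of this package, which relies on OTel's propagator, and they handle version 00 only.

```typescript
// W3C Trace Context "traceparent", version 00:
//   00-<32 hex trace-id>-<16 hex parent-id>-<2 hex trace-flags>
interface TraceParent {
  traceId: string;
  parentId: string;
  sampled: boolean;
}

const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function formatTraceparent(tp: TraceParent): string {
  const flags = tp.sampled ? "01" : "00";
  return `00-${tp.traceId}-${tp.parentId}-${flags}`;
}

function parseTraceparent(header: string): TraceParent | null {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return null;
  const [, traceId, parentId, flags] = match;
  // All-zero trace or parent ids are invalid per the spec.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  // Bit 0 of trace-flags is the "sampled" flag.
  return { traceId, parentId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}
```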
Per-hook spans (useQuery, useMutation) and WS-frame trace context injection are deliberately out of scope for v1. The server already wraps every command/query in a span that's parented to the HTTP route, which gives end-to-end visibility for traffic that originates from fetch calls. WS traceparent propagation is a follow-up; when it is added, call injectTraceContext on outbound messages.
Sampling
The SDK is configured with parentbased_always_on on the server and
traceIdRatioBased(0.1) on the browser by default. Final decisions are
made by the collector's tail sampler, which evaluates each trace 10s
after receiving its first span:
- All errors are kept (any span with status_code=ERROR).
- All slow traces are kept (duration > 500 ms).
- 10% of everything else is kept (random).
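Among traces that reach the collector, this keeps every failure and latency outlier, while typical happy-path traffic costs ~10% of total trace bandwidth. A browser trace dropped by the 10% head sampler never reaches the collector, and because parentbased_always_on follows the parent's decision, the server spans it parents are dropped too.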
To tune these rules, edit the policies in the tail_sampling block of
observability-configs.ts:generateOtelCollectorConfig (full spec in the OTel collector docs).
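The three policies combine with OR semantics: a trace is kept as soon as any one of them votes to keep it. The sketch below models that decision purely for illustration. The real decision runs inside the collector's tail_sampling processor, and FinishedSpan and keepTrace are invented names. The random source is injectable so the 10% branch can be tested; the collector's own probabilistic policy may derive its decision differently.

```typescript
// Hypothetical in-process model of the three tail-sampling rules above.
interface FinishedSpan {
  status: "UNSET" | "OK" | "ERROR";
  startMs: number;
  endMs: number;
}

const LATENCY_THRESHOLD_MS = 500;
const BASELINE_RATIO = 0.1;

function keepTrace(spans: FinishedSpan[], random: () => number = Math.random): boolean {
  if (spans.length === 0) return false;
  // Rule 1: any errored span keeps the whole trace.
  if (spans.some((s) => s.status === "ERROR")) return true;
  // Rule 2: trace duration = latest end minus earliest start across all spans.
  const start = Math.min(...spans.map((s) => s.startMs));
  const end = Math.max(...spans.map((s) => s.endMs));
  if (end - start > LATENCY_THRESHOLD_MS) return true;
  // Rule 3: keep a random 10% of everything else.
  return random() < BASELINE_RATIO;
}
```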
PII safety
Span attributes go through sanitizeAttrs() by default, which:
- Drops any key matching (password|token|secret|authorization|jwt|api_key|cookie|email|credit_card|ssn) (case-insensitive, recursive).
- Truncates strings longer than 2 KB.
- Truncates serialized objects longer than 4 KB.
- Catches circular references → "[circular]".
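A rough re-implementation of those four rules makes their interaction concrete. It is a sketch, not the package's sanitizeAttrs. It assumes sizes are counted as JavaScript string length rather than bytes, and that nested values end up JSON-encoded, since OTel attribute values must be primitives or arrays of primitives (arrays are JSON-encoded here too, for simplicity). The "...[truncated]" marker is invented.

```typescript
// Hypothetical re-implementation of the rules above; not the package's code.
const SENSITIVE_KEY =
  /(password|token|secret|authorization|jwt|api_key|cookie|email|credit_card|ssn)/i;
const MAX_STRING_CHARS = 2 * 1024; // assumption: limits counted in UTF-16 code units
const MAX_OBJECT_CHARS = 4 * 1024;

type AttrValue = string | number | boolean;

function truncate(text: string, max: number): string {
  return text.length > max ? `${text.slice(0, max)}...[truncated]` : text;
}

// Recursively drops sensitive keys, truncates strings, and marks cycles.
// `ancestors` holds only the current path, so shared (non-cyclic) references survive.
function scrub(value: unknown, ancestors: WeakSet<object>): unknown {
  if (typeof value === "string") return truncate(value, MAX_STRING_CHARS);
  if (typeof value === "bigint") return value.toString();
  if (value === null || typeof value !== "object") return value;
  if (ancestors.has(value)) return "[circular]";
  ancestors.add(value);
  let out: unknown;
  if (Array.isArray(value)) {
    out = value.map((item) => scrub(item, ancestors));
  } else {
    const copy: Record<string, unknown> = {};
    for (const [key, item] of Object.entries(value)) {
      if (!SENSITIVE_KEY.test(key)) copy[key] = scrub(item, ancestors);
    }
    out = copy;
  }
  ancestors.delete(value);
  return out;
}

function sanitizeAttrsSketch(input: Record<string, unknown>): Record<string, AttrValue> {
  const scrubbed = scrub(input, new WeakSet()) as Record<string, unknown>;
  const result: Record<string, AttrValue> = {};
  for (const [key, value] of Object.entries(scrubbed)) {
    if (typeof value === "string" || typeof value === "number" || typeof value === "boolean") {
      result[key] = value;
    } else if (value !== null && value !== undefined) {
      // Objects and arrays are serialized, then capped at 4 KB.
      const json = JSON.stringify(value);
      if (json !== undefined) result[key] = truncate(json, MAX_OBJECT_CHARS);
    }
  }
  return result;
}
```

Because the key pattern is a substring match, keys such as accessToken or userEmail are dropped too, which errs on the side of over-redaction.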
shouldIncludePayloads() defaults to true in development, false in
production. Instrumentation sites that attach raw mutation params
gate on this flag — so a prod span shows arc.command.params_size,
while a dev span shows the (sanitized) payload itself.
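The gating described above can be sketched as follows. The attribute name "arc.command.params", the NODE_ENV check, and the way params_size is measured are all assumptions made for this example. shouldIncludePayloadsSketch and commandSpanAttributes are hypothetical helpers, not the package's API.

```typescript
// Hypothetical sketch of payload gating on a command span.
type AttrValue = string | number | boolean;

// Assumption: "development" means NODE_ENV === "development"; anything else,
// including an unset NODE_ENV, is treated like production.
function shouldIncludePayloadsSketch(env: Record<string, string | undefined>): boolean {
  return env.NODE_ENV === "development";
}

function commandSpanAttributes(
  name: string,
  params: unknown,
  includePayloads: boolean,
): Record<string, AttrValue> {
  const json: string | undefined = JSON.stringify(params);
  const encoded = json ?? "";
  const attrs: Record<string, AttrValue> = {
    "arc.command.name": name,
    // Assumption: size = length of the JSON encoding, in characters.
    "arc.command.params_size": encoded.length,
  };
  // "arc.command.params" is an invented attribute name; the real package would
  // also run the payload through sanitizeAttrs() before attaching it.
  if (includePayloads) attrs["arc.command.params"] = encoded;
  return attrs;
}
```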
DATABASE_URL and similar credentials passing through error messages
are run through redactConnectionString().
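A connection-string redactor of this kind can be a pair of regular expressions. The sketch below is hypothetical (redactConnectionStringSketch is not the package's redactConnectionString). It assumes URL passwords are percent-encoded, so a literal "@" never appears inside them, and it additionally masks libpq-style password=... pairs.

```typescript
// Hypothetical sketch; not the package's redactConnectionString.
// Matches "scheme://user:password@" and masks the password part.
// Assumes passwords are percent-encoded, so "@" cannot appear inside them.
const URL_CREDENTIALS = /([a-z][a-z0-9+.-]*:\/\/[^:\/\s@]*:)[^@\s]+@/gi;
// Matches libpq-style "password=..." key/value pairs.
const KV_PASSWORD = /(password=)[^\s&;]+/gi;

function redactConnectionStringSketch(text: string): string {
  return text.replace(URL_CREDENTIALS, "$1***@").replace(KV_PASSWORD, "$1***");
}
```

Running it over the whole error message, rather than only over a known DATABASE_URL value, also catches credentials that a driver echoes back in a different position.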
Forbidden (never attach as a span attribute):
- Raw JWT / rawToken
- TokenPayload.params in full (only tokenType + sanitized id-like params)
- Full event payload in prod
- Raw DB error parameters
Local development
For a fast feedback loop without the full sidecar stack:
```shell
# Terminal 1: Jaeger all-in-one (traces only, no logs/metrics).
docker run --rm -d --name jaeger \
  -p 16686:16686 -p 4318:4318 \
  cr.jaegertracing.io/jaegertracing/all-in-one:1.62

# Terminal 2: your app
export ARC_OTEL_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
arc platform dev

# Open http://localhost:16686 and pick the service "arc-<env>"
```
Browser spans need window.__ARC_OTEL_CONFIG, which generateShellHtml
injects when the server has ARC_OTEL_ENABLED=true, so a single env var
enables both sides.
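Exporting OTEL_EXPORTER_OTLP_ENDPOINT in the local setup works because OTLP/HTTP exporters derive per-signal URLs from it. The function below restates that rule from the OpenTelemetry exporter specification. It assumes, without confirmation from this README, that initServerTelemetry leaves endpoint resolution to the stock exporters, and otlpHttpUrl is an illustrative name.

```typescript
type OtlpSignal = "traces" | "metrics" | "logs";

// Default OTLP/HTTP endpoint from the OTel exporter specification.
const DEFAULT_OTLP_HTTP_ENDPOINT = "http://localhost:4318";

// Per-signal variables are used verbatim; the generic base endpoint gets
// "/v1/<signal>" appended, as the spec requires for OTLP/HTTP.
function otlpHttpUrl(env: Record<string, string | undefined>, signal: OtlpSignal): string {
  const perSignal = env[`OTEL_EXPORTER_OTLP_${signal.toUpperCase()}_ENDPOINT`];
  if (perSignal) return perSignal;
  const base = env.OTEL_EXPORTER_OTLP_ENDPOINT || DEFAULT_OTLP_HTTP_ENDPOINT;
  return `${base.replace(/\/+$/, "")}/v1/${signal}`;
}
```

With the export above, traces go to http://localhost:4318/v1/traces, which is Jaeger's OTLP port. Logs and metrics resolve to sibling URLs on the same host, but as the snippet's comment notes, Jaeger only handles traces.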
API surface
```typescript
import {
  ArcTelemetry,
  sanitizeAttrs,
  redactConnectionString,
  injectTraceContext,
  extractTraceContext,
  contextFromHeaders,
  wrapDbAdapter,
} from "@arcote.tech/arc-otel";
import { initServerTelemetry } from "@arcote.tech/arc-otel/server";
import { initBrowserTelemetry } from "@arcote.tech/arc-otel/browser";
```
Key methods on ArcTelemetry:
```typescript
// Wrap an async function in a span; auto-records exceptions + sets status.
await telemetry.startSpan(name, async (span) => {...}, { kind, attributes });

// Counter / histogram / log API mirrors OTel's.
telemetry.incrementCounter("arc.foo.total", 1, { label: "x" });
telemetry.recordHistogram("arc.foo.duration_ms", elapsed, {...});
telemetry.log("info", "message", { attr: "value" });

// Bridge inbound trace context (HTTP headers, WS frame fields) before
// starting the active span.
telemetry.runWithExtractedContext(req.headers, () =>
  telemetry.startSpan("http.request", async () => {...}),
);
```
Out of scope
- Alerting (Grafana alerts + notifier config).
- Long-term storage (S3/GCS) for traces and logs — needs cloud creds.
- Continuous profiling (Pyroscope/Parca).
- Synthetic uptime monitoring.
- Per-hook React span instrumentation (planned follow-up).
- WS frame traceparent injection on the client (helper exists, integration with EventWire/CommandWire pending).
