@nullplatform/tracing
v0.2.0
Published
Producer-side SDK for the nullplatform tracing API (M2 graph-fact + facets wire contract)
Producer-side SDK for the nullplatform tracing API. Wraps the M2 wire contract (EVENTS.md) in a typed TypeScript surface so producers don't have to hand-build envelopes.
Runnable examples (one per scenario, full coverage): examples/ — start with 01-quickstart.cjs.
Using an AI coding assistant? llms.txt is a dense, agent-optimized usage guide (rules, full API surface, and a removed-API anti-patterns table) — ships with the package.
Quick start
import { Tracer, jobRef } from '@nullplatform/tracing';
const tracer = new Tracer({
  baseURL: process.env.TRACING_URL!,
  apiKey: process.env.NULLPLATFORM_API_KEY!,
  producer: '[email protected]',
});
// Callback mode: `started` on entry, `completed` on return, `failed` on throw.
// A spec carries IDENTITY only; labels/facets are chainable setters.
await tracer.run({ trace_id: 'D-9007', run_id: 'D-9007' }, async (run) => {
  run.labels({ release: 'R-127', env: 'prod' });
  run.instanceOf(jobRef('nullplatform', 'deploy', '42'));
  await run.step({ key: 'provision' }, () => provision());
});
// Drain before shutdown.
const stats = await tracer.shutdown(); // { accepted, duplicate, rejected }
Label values may be string, number, or boolean. The SDK stringifies
them on the wire, so you never write String(id). A null/undefined
label value is dropped rather than recorded as "null".
Trace & run ids
run_id defaults to a generated UUIDv7. trace_id defaults to run_id —
a lone root anchors its own trace (every node has a trace). Set trace_id
explicitly to thread related work into one trace: at an ingress, from an
inbound carrier (extractTraceContext), or to a business key so an entity's
operations group together — e.g. trace_id: key('application', id),
run_id: key('application', id, 'create').
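The behavior described for key (join the parts into one stable string, drop null/undefined/empty parts) can be sketched as below. This is a minimal illustration, not the SDK's implementation: the name stableKey is hypothetical, and the ':' separator is an assumption because the real separator is not stated here.

```typescript
// Hypothetical stand-in for the SDK's key(); the ':' separator is an assumption.
type KeyPart = string | number | null | undefined;

function stableKey(...parts: KeyPart[]): string {
  return parts
    // Drop null, undefined, and empty-string parts so a missing value
    // cannot produce a different id for the same entity.
    .filter((p): p is string | number => p !== null && p !== undefined && p !== '')
    .map(String)
    .join(':');
}

const traceId = stableKey('application', 42);
const runId = stableKey('application', 42, 'create');
const sameTrace = stableKey('application', 42, undefined); // equals traceId
```

Because the missing third part is dropped rather than rendered as "undefined", sameTrace and traceId are the same string, which is the property that keeps related runs in one trace.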
Tenancy scope (nrn)
By default the SDK sends no nrn and the API derives it from the caller's
token (the common case). Set nrn on the run spec to scope the run — and all
its steps, edges, and terminals — to a precise resource; a per-call nrn
overrides for a single emit:
const run = tracer.run({ trace_id, run_id, nrn: 'organization=1:application=42' });
run.step('build'); // inherits the run's nrn
run.complete(); // inherits; pass .complete({ nrn }) to override one emit
Run & step lifecycle: callback vs handle
A run (and each step) has two equivalent forms. Both emit started on open
and exactly one terminal (completed/failed/skipped/cancelled/
timed_out).
Callback — pass a function. The SDK emits completed on return,
failed on throw (then rethrows), so there's no bookkeeping:
await tracer.run({ trace_id, run_id }, async (run) => {
  await run.step({ key: 'charge' }, () => charge());
});
Handle: get the object and close it yourself. Use it for flows that span
calls / ticks / processes, or when you need run.run_id before the work
finishes:
const run = tracer.run({ trace_id, run_id });
try {
  const chargeStep = run.step({ key: 'charge' }); // named so it does not shadow charge()
  await charge();
  chargeStep.complete();
  run.complete();
} catch (error) {
  run.fail(error); // cascades: any still-open child step is failed too
  throw error;
}
Which to use: the callback's only cost is one level of indentation, and that cost is proportional to how much code the closure wraps:
- Callback for a short traced region or newly written block — the indentation is negligible and you get auto-terminalize for free.
- Handle when wrapping a long existing method body (a closure would re-indent and balloon the diff), or when the run/step outlives a single synchronous scope.
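The callback form's contract (started on entry, completed on return, failed on throw, then rethrow) can be sketched independently of the SDK. The traced helper and its emit parameter are hypothetical names used only for illustration; emit stands in for whatever the SDK actually sends.

```typescript
// Sketch only: `emit` is a stand-in for the SDK's wire emit.
type Lifecycle = 'started' | 'completed' | 'failed';

async function traced<T>(
  emit: (state: Lifecycle, error?: unknown) => void,
  work: () => Promise<T> | T,
): Promise<T> {
  emit('started');
  let result: T;
  try {
    result = await work();
  } catch (error) {
    emit('failed', error); // a throw closes the node as failed...
    throw error;           // ...and the caller still sees the original error
  }
  emit('completed');       // a normal return closes the node as completed
  return result;
}
```

Keeping the completed emit outside the try block means exactly one terminal is emitted per call, which mirrors the single-terminal rule described above.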
Fail cascade. Calling fail() on a run or step finalizes any of its
still-open handle-mode child steps with the same error — so a catch
collapses to one run.fail(error) regardless of how many steps are open,
with no per-step ?.fail() plumbing. A child you already closed wins (the
single-terminal guard makes the cascade a no-op for it). complete() does
not cascade: auto-completing an open child would back-date its duration
and assert a success the SDK can't vouch for — close steps explicitly on the
happy path. Callback-mode children self-terminalize, so they're already
closed before control returns to the parent.
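The cascade rules above can be modeled with a small tree of handles. This is an illustrative sketch, not the SDK's code: NodeHandle and its fields are hypothetical, and it assumes a node that is already closed does not cascade again.

```typescript
// Illustrative model of the cascade rules; not the SDK's implementation.
type Terminal = 'completed' | 'failed';

class NodeHandle {
  readonly key: string;
  terminal: Terminal | null = null;
  error: unknown;
  private children: NodeHandle[] = [];

  constructor(key: string) {
    this.key = key;
  }

  step(key: string): NodeHandle {
    const child = new NodeHandle(key);
    this.children.push(child);
    return child;
  }

  complete(): void {
    this.close('completed'); // no cascade: open children stay open
  }

  fail(error: unknown): void {
    // Assumption: a node that was already closed does not cascade.
    if (!this.close('failed', error)) return;
    for (const child of this.children) child.fail(error); // fails only still-open children
  }

  // Single-terminal guard: the first terminal wins, later calls are no-ops.
  private close(terminal: Terminal, error?: unknown): boolean {
    if (this.terminal !== null) return false;
    this.terminal = terminal;
    this.error = error;
    return true;
  }
}

const demoRun = new NodeHandle('run');
const build = demoRun.step('build');
const deploy = demoRun.step('deploy');
build.complete();                        // closed explicitly on the happy path
demoRun.fail(new Error('deploy failed')); // one call closes deploy and the run
```

After the last line, build keeps its completed terminal because the guard makes the cascade a no-op for it, while deploy and the run are both failed with the same error.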
Attaching context: labels & core facets
Runs and steps describe what happened with labels (small string
key/values, queryable) and facets (structured bodies). The wire contract
reserves 11 core facets under tracing.*; the SDK exposes one typed,
chainable setter for each, so you never hand-write the namespace string:
const run = tracer.run({ trace_id, run_id })
  .labels({ entity: 'application', action: 'create', 'application.id': id })
  .input('create-params', { name, namespaceId }) // tracing.io 'in' descriptor
  .output('application', { id, slug, status }) // tracing.io 'out' descriptor
  .actor({ kind: 'user', id: userId, source: 'api' });
// `decision`/`retry`/`signal` are STEP-only: the type system won't let you
// call them on a run, so a misplaced core facet can't be written.
await run.step('apply-settings', async (step) => {
  step.decision({ chosen: status, available: ['pending', 'pending_hook'],
    expression: 'a gating hook holds it at pending_hook' });
});
The setters take their §8 body verbatim (decision is
{ chosen, available?, expression? }; actor is { kind, id, source? }) — no
renamed fields. (io is the exception: there is no raw io setter — its
descriptors are set only through the typed builders below.) Each setter
returns the handle (chain freely), is a no-op on an empty/undefined body, and
accumulates onto the terminal event. A setter called synchronously
(before the first await) also lands on the started event, so an in-flight
run/step is visible with its start-time context (who started it, its inputs,
its labels) rather than a bare identity; a setter called after an await
lands on the terminal only. (started is emitted one microtask after
construction to make this work — see Run & step lifecycle.) They're placed
by node kind: a run and step share io/actor/timing/externalLinks/
engineStatus/dropped/error; decision/retry/signal are step-only;
plan is on a run (override) or a job. Use your own namespace via
.facet('myapp.thing', { … }) for non-core data — the tracing.* prefix is
reserved.
const run = tracer.run({ trace_id, run_id })
  .actor(userToken) // a JWT string → actor + default nrn; start-time
  .input('create-params', input) // start-time → on `started` + terminal
  .labels({ entity: 'application', 'application.status': 'pending' });
const application = await persist(input); // ← the work
run.output('application', application) // result → terminal only
  .labels({ 'application.status': 'active' }) // result → terminal only
  .complete();
Timing is automatic. Every run/step carries a tracing.timing facet the SDK
fills from the operation it brackets: started_at is stamped when you open the
node (tracer.run()/run.step()), ended_at when you close it. So every traced
node gets start/end/duration with no effort — you only call .timing({ … })
yourself to override (per field), e.g. backfilling real historical times for
an operation that already finished. Open the node at the top of the operation so
started_at is accurate.
Definition nodes (job/dataset) have no lifecycle — they're a single emit
that fires lazily, on first use. tracer.job(...)/tracer.dataset(...) return
a chainable handle: its spec carries identity only, and you enrich it with the
same typed setters (.labels(), .facet(), .schema(), and a job's typed
.plan() — no 'tracing.plan' string). The node emits exactly once, the first
time you reference it from an edge, await it (resolves to its ref), or
call .emit() — so a handle referenced by several edges emits one node, and a
handle you never use emits nothing. Once emitted it's frozen (a later setter
throws rather than silently lose data). The common path needs no ceremony:
const provisionJob = tracer
  .job({ namespace: 'nullplatform', name: 'application-provision', version: '7.2' })
  .labels({ team: 'platform' })
  .plan({ steps: [{ key: 'build' }, { key: 'deploy', after: 'build' }] });
const image = tracer.dataset('image:sha256:abc'); // nothing on the wire yet
run.instanceOf(provisionJob); // ← emits the job node here, then the edge
build.produces(image); // ← emits the dataset node here, then the edge
deploy.consumes(image); // ← reuses the node; just the consumes edge
await tracer.dataset('snapshot'); // await emits + resolves to its ref
await tracer.job({ namespace, name, version }).emit(); // or emit eagerly, then ignore or await the ref
Ergonomic shorthands (sugar over the same wire shapes):
- tracing.io is set only through six typed builders: one per kind × direction, no raw descriptor array, no kind/direction magic strings. Call them as handle methods to accumulate node io: .input/.output(name, value) (inline), .inputRef/.outputRef(name, source, externalId) (ref), .inputPointer/.outputPointer(name, uri, { size_bytes?, content_type? }) (pointer, for large data referenced by URI). Alternatively, import the same six as standalone functions and hand one to produces/consumes to declare the edge's io once (this records the node descriptor AND derives the edge binding; see below).
- .decision({ chosen }) accepts a bare string for chosen (normalized to an array).
- .plan({ steps: [...] }) takes the wire shape directly: each step is a plain typed object { key, after?, sla?, optional? } (no element builder; a plan step is inert data, so a step() factory would be a redundant second way). A step's after accepts a bare string or an array (after: 'build' ≡ after: ['build']), and sla.after_lifecycle is typed to the Status enum so a misspelled state won't compile. .plan() validates client-side before send: a duplicate step key or an after referencing an undeclared step throws (not a silent ingest drop).
- Every edge method accepts a handle, a ref, or a definition handle (job/dataset, emitted on first use), and produces/consumes also accept a bare dataset id: run.triggeredBy(otherRun), run.instanceOf(deployJob), run.produces('image:sha256:…'), step.compensates(otherStep). To describe the payload flowing over a produces/consumes edge, pass an io builder as the second arg, e.g. step.produces(image, outputPointer('image', ref, { content_type })) or step.consumes(raw, inputPointer('raw-blob', uri, { size_bytes })). This records the step's io descriptor AND derives the edge's tracing.io binding ({ name, content_type?, size_bytes? }) in one call. Direction is enforced by the type: produces takes an output*, consumes an input*.
- run.step('charge', cb?) and tracer.dataset('id') accept a bare single id string (one unambiguous arg). A job takes a named identity spec, tracer.job({ namespace, name, version }), never three positional strings, so the call reads unambiguously.
- key('application', id) builds a stable id string. Use it anywhere one is needed (trace_id/run_id in a spec, or a runRef/stepRef argument), so a key is never hand-interpolated. null/undefined/empty parts are dropped, so a missing part can't fork the trace.
- .actor() accepts a bearer-JWT string or an ActorFacetInput ({ kind, id, source? }). A JWT is decoded internally (no public decoder): the actor id comes from nullplatform's cognito:groups @nullplatform/user (falling back to standard sub), and @nullplatform/organization → the run's default nrn = organization=N (inherited by every step/edge, overridable by an explicit spec nrn); a non-JWT string (an api key) omits the actor. The object form is the §8 actor body and sets the actor only: there is no organization field (it's nullplatform-specific; for raw ids that need an org scope, set the spec nrn).
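The two client-side plan checks named in the list above (duplicate step key, after pointing at an undeclared step) and the bare-string normalization of after can be sketched as follows. validatePlan and PlanStep are hypothetical names; the SDK's real types and error messages may differ.

```typescript
// Sketch of the described client-side checks; names here are illustrative.
interface PlanStep {
  key: string;
  after?: string | string[];
}

function validatePlan(steps: PlanStep[]): Map<string, string[]> {
  const deps = new Map<string, string[]>();
  for (const step of steps) {
    if (deps.has(step.key)) {
      throw new Error(`duplicate plan step key: ${step.key}`);
    }
    // A bare string is shorthand for a one-element array.
    const after =
      step.after === undefined ? [] : Array.isArray(step.after) ? step.after : [step.after];
    deps.set(step.key, after);
  }
  // Second pass, so a step may reference one declared later in the list.
  deps.forEach((after, key) => {
    for (const dep of after) {
      if (!deps.has(dep)) {
        throw new Error(`step ${key} is after undeclared step ${dep}`);
      }
    }
  });
  return deps;
}

const checkedPlan = validatePlan([{ key: 'build' }, { key: 'deploy', after: 'build' }]);
```

Failing at the call site is the point of the design: the alternative is an event the ingest side drops silently, which is much harder to trace back to the offending plan.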
Label values may be string | number | boolean and are coerced for you.
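The label coercion rule (stringify string/number/boolean values, drop null/undefined instead of recording "null") is simple enough to show directly. The function below is a sketch with a hypothetical name, not the SDK's internal code.

```typescript
// Sketch of the documented label coercion; normalizeLabels is a hypothetical name.
type LabelValue = string | number | boolean | null | undefined;

function normalizeLabels(labels: Record<string, LabelValue>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const labelName of Object.keys(labels)) {
    const value = labels[labelName];
    if (value === null || value === undefined) continue; // dropped, never "null"
    out[labelName] = String(value);
  }
  return out;
}

const wireLabels = normalizeLabels({
  release: 'R-127',
  attempt: 3,
  canary: false,
  owner: null,
  region: undefined,
});
// wireLabels is { release: 'R-127', attempt: '3', canary: 'false' }
```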
Authentication
Provide exactly one of apiKey or getToken.
Recommended — apiKey. Pass your nullplatform API key and the SDK
handles the token lifecycle for you: it exchanges the key for a short-lived
JWT at POST {authBaseURL}/token, caches it in memory, and refreshes it
before expiry (and once reactively on a 401). Concurrent emits that all
need a token collapse into a single exchange.
const tracer = new Tracer({
  baseURL: process.env.TRACING_URL!,
  apiKey: process.env.NULLPLATFORM_API_KEY!,
  // authBaseURL defaults to https://api.nullplatform.com
  producer: '[email protected]',
});
The SDK reads no environment variables: pass everything explicitly
(the example above reads process.env in your own code and hands the value
to the config; the SDK never touches the environment itself).
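The token lifecycle described for apiKey (cache in memory, refresh before expiry, collapse concurrent requests into one exchange) follows a common single-flight pattern. The sketch below shows that pattern under stated assumptions: TokenCache, the exchange callback, and the 30-second refresh margin are all hypothetical, and the real exchange is an HTTP call this example does not make.

```typescript
// Sketch of a single-flight token cache; `exchange` stands in for the real
// key-for-JWT request and the default refresh margin is an assumption.
interface Token {
  value: string;
  expiresAt: number; // epoch milliseconds
}

class TokenCache {
  private cached: Token | null = null;
  private inflight: Promise<Token> | null = null;
  private readonly exchange: () => Promise<Token>;
  private readonly refreshMarginMs: number;

  constructor(exchange: () => Promise<Token>, refreshMarginMs = 30000) {
    this.exchange = exchange;
    this.refreshMarginMs = refreshMarginMs;
  }

  async get(now: number = Date.now()): Promise<string> {
    // Serve from cache while the token is comfortably before expiry.
    if (this.cached && this.cached.expiresAt - this.refreshMarginMs > now) {
      return this.cached.value;
    }
    // Concurrent callers share one pending exchange instead of starting their own.
    let pending = this.inflight;
    if (!pending) {
      pending = this.exchange().then(
        (token) => {
          this.cached = token;
          this.inflight = null;
          return token;
        },
        (error) => {
          this.inflight = null; // allow a later retry after a failed exchange
          throw error;
        },
      );
      this.inflight = pending;
    }
    return (await pending).value;
  }
}
```

The same shape also works as a cache in front of a getToken callback, where the SDK does not cache the result for you.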
Escape hatch — getToken. If you already manage tokens externally
(e.g. a pre-fetched JWT injected into a Lambda), supply a callback instead.
It is called per request, so cache the token yourself and only refresh when
it's about to expire — the SDK does not cache the result, and a 401 is
treated as deterministic (no automatic re-exchange).
const tracer = new Tracer({
  baseURL: process.env.TRACING_URL!,
  getToken: () => process.env.TRACING_TOKEN!,
  producer: '[email protected]',
});
A token exchange that fails transiently (network error or 5xx from the
auth endpoint) is retried alongside the event POST; a 4xx (bad/expired
apiKey) is deterministic and surfaces as a TransportError — via the
rejected promise in { sync: true } mode, or the 'drop' event in the
buffered path.
Delivery model
Default — async-buffered. Each emit enqueues and returns immediately
with the assigned id. A background worker flushes the queue every
flushIntervalMs (default 1000ms) or whenever it reaches flushBatchSize
events (default 100).
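The flush-on-size half of this model can be sketched without any networking. EventBuffer and its send callback are hypothetical names; the interval half is only indicated in a comment, since a real worker would call flush() from a timer every flushIntervalMs.

```typescript
// Sketch of size-triggered batching; `send` stands in for the HTTP POST.
class EventBuffer<E> {
  private queue: E[] = [];
  private readonly send: (batch: E[]) => void;
  private readonly flushBatchSize: number;

  constructor(send: (batch: E[]) => void, flushBatchSize = 100) {
    this.send = send;
    this.flushBatchSize = flushBatchSize;
  }

  // Returns immediately; the caller never waits on the network.
  enqueue(item: E): void {
    this.queue.push(item);
    if (this.queue.length >= this.flushBatchSize) this.flush();
  }

  // A background timer would also call this every flushIntervalMs.
  flush(): void {
    if (this.queue.length === 0) return;
    const batch = this.queue;
    this.queue = []; // swap first so new events land in a fresh batch
    this.send(batch);
  }
}

const sentBatches: string[][] = [];
const buffer = new EventBuffer<string>((batch) => { sentBatches.push(batch); }, 2);
buffer.enqueue('run.started');   // buffered
buffer.enqueue('step.started');  // reaches the batch size, so one batch is sent
buffer.enqueue('step.completed'); // buffered until the next flush
buffer.flush();                  // what a timer tick or shutdown would do
```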
Opt-in — synchronous. Pass { sync: true } in the per-call options
to bypass the queue and await the POST directly:
await run.complete({ sync: true });
Errors are thrown to the caller rather than surfaced via the 'drop' event.
Idempotency
Every emit assigns a UUIDv7 id if the producer didn't provide one. The
API deduplicates on id (ON CONFLICT (event_id) DO NOTHING), so
re-emitting the same event after a crash is safe.
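The dedup rule amounts to "insert unless the id already exists". The in-memory model below illustrates why a replay is harmless; IngestModel is a hypothetical stand-in for the API, and the string id shown is purely illustrative (whether the API requires a UUID-shaped id is not stated here).

```typescript
// In-memory model of dedup-on-id; not the real API.
type Outcome = 'accepted' | 'duplicate';

class IngestModel {
  private seen = new Set<string>();

  ingest(envelope: { id: string }): Outcome {
    if (this.seen.has(envelope.id)) return 'duplicate'; // same id: nothing is written
    this.seen.add(envelope.id);
    return 'accepted';
  }
}

const ingestApi = new IngestModel();
const chargeEvent = { id: 'charge:order-17:attempt-1' }; // hypothetical stable id
const firstOutcome = ingestApi.ingest(chargeEvent);  // first delivery is stored
const replayOutcome = ingestApi.ingest(chargeEvent); // the re-emit is recognized and ignored
```

The two outcomes line up with the accepted and duplicate counters returned by flush() and shutdown().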
Producers MAY supply a deterministic id from business data for
idempotent retry-after-crash:
run.step({ key: 'charge' }, () => charge(), { id: makeId('charge', attempt) });
Error handling
tracer.on('drop', (envelope, reason) => {
  // log, alert, persist to disk, etc.
});
Drop events fire for:
- Events that exhaust retries (5xx / network failures).
- Events deterministically rejected by the server (4xx).
- Events dropped due to queue overflow (maxQueueSize).
Per-event reject also propagates to the awaited promise of a { sync: true }
emit (e.g. await run.complete({ sync: true })), so callers can handle failures
inline instead of via the 'drop' event if they prefer.
Listener errors are swallowed — a buggy on('drop') handler cannot
prevent other listeners or the queue from making progress.
400 / 401 / 403 are NOT retried (deterministic). 5xx and network
failures ARE retried with exponential backoff + jitter.
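The retry policy splits into two decisions: whether a failure is retryable at all, and how long to wait before the next attempt. The sketch below shows one common way to do both. The function names, the 200 ms base, the 30 s cap, and the choice of "full jitter" are assumptions; the SDK's actual constants and jitter strategy are not documented here.

```typescript
// Sketch only: base delay, cap, and jitter strategy are assumptions.
function isRetryable(outcome: number | 'network'): boolean {
  if (outcome === 'network') return true;  // connection-level failure
  return outcome >= 500 && outcome <= 599; // 4xx such as 400/401/403 is deterministic
}

function backoffDelayMs(
  attempt: number,                       // 0 for the first retry
  random: () => number = Math.random,    // injectable for deterministic tests
  baseMs = 200,
  capMs = 30000,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling); // full jitter: uniform in [0, ceiling)
}
```

Jitter matters when many producers fail at the same moment: without it they would all retry on the same schedule and hit the API in synchronized waves.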
Queue overflow
The queue is bounded by maxQueueSize (default 10000). When the queue is
at capacity and a new event arrives, the oldest queued event is
dropped (with the 'drop' event fired). Use a 'drop' listener to track these drops.
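The drop-oldest policy, together with the rule that a throwing listener cannot stall the queue, can be sketched as a small bounded queue. BoundedQueue and the 'queue_overflow' reason string are hypothetical; the SDK's real reason values are not listed here. The sketch assumes maxQueueSize is at least 1.

```typescript
// Sketch of a drop-oldest bounded queue; names and the reason string are illustrative.
class BoundedQueue<E> {
  private items: E[] = [];
  private readonly maxQueueSize: number;
  private readonly onDrop: (dropped: E, reason: string) => void;

  constructor(maxQueueSize: number, onDrop: (dropped: E, reason: string) => void) {
    this.maxQueueSize = maxQueueSize;
    this.onDrop = onDrop;
  }

  push(item: E): void {
    if (this.items.length >= this.maxQueueSize) {
      const oldest = this.items.shift() as E; // evict the oldest, keep the newest
      try {
        this.onDrop(oldest, 'queue_overflow');
      } catch {
        // Listener errors are swallowed so the queue keeps making progress.
      }
    }
    this.items.push(item);
  }

  toArray(): E[] {
    return this.items.slice();
  }
}

const droppedEvents: number[] = [];
const boundedQueue = new BoundedQueue<number>(2, (dropped) => { droppedEvents.push(dropped); });
boundedQueue.push(1);
boundedQueue.push(2);
boundedQueue.push(3); // queue is full: event 1 is dropped, 2 and 3 remain
```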
Flush / shutdown
const flushStats = await tracer.flush(); // { accepted, duplicate, rejected }
const finalStats = await tracer.shutdown(); // drain + reject further emits
Both return per-outcome counts. shutdown() is idempotent; after
shutdown, further emits are rejected.
On Node, an enabled tracer auto-drains on SIGTERM/SIGINT/beforeExit
so a process that just emits and exits doesn't lose buffered events. The
hooks are removed again by shutdown(). Opt out with shutdownHooks: false
if your app manages its own exit ordering, and install hooks yourself with
installNodeShutdownHooks(tracer, { signals }) for custom signals. The
hooks are a no-op off Node (browser/edge).
Wire contract version
This SDK targets the EVENTS.md v1 / D21 envelope shape. Adding a new event type to the wire spec requires a major SDK version bump.
