causal-order
v0.3.4
An event integrity library for ordering and analyzing distributed events without false certainty.
An event integrity library for distributed systems that still use clocks, but cannot rely on one globally synchronized clock as the truth model.
causal-order helps developers design and run event processing, replay, and recovery flows without assuming the system has one perfect global time source.
It does not replace clocks or timestamps. It helps when timestamp order alone is not enough to explain what happened.
Website:
It helps you:
- order what can be ordered
- preserve concurrency only when it can be justified honestly
- flag what is suspicious
- keep the distinction between proof, inference, fallback, and unknown visible
Why This Exists
Distributed systems often produce misleading timelines:
- clocks drift across regions
- replayed events can look newer than original events
- offline devices sync late
- ingestion order differs from creation order
- some events are truly concurrent
A timestamp-only sort produces a clean-looking answer. In distributed systems, clean-looking timestamp order is often not the same as causal truth.
causal-order exists to make that uncertainty visible instead of hiding it.
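To make the failure concrete, here is a minimal sketch (not the library's implementation) of how a timestamp-only sort can place an effect before its cause. The TimedEvent shape and the sortByTimestamp and findCausalInversions helpers are hypothetical names introduced for this illustration, and timestamps are plain numbers for brevity.

```typescript
// Hypothetical shape for illustration only; not the causal-order event type.
interface TimedEvent {
  id: string
  timestampMs: number
  parentEventId?: string // explicit causal link to an earlier event
}

// Sort purely by wall-clock timestamp, as a naive timeline would.
function sortByTimestamp(input: TimedEvent[]): TimedEvent[] {
  return [...input].sort((a, b) => a.timestampMs - b.timestampMs)
}

// Report every event that appears before its own parent in a given order.
function findCausalInversions(order: TimedEvent[]): string[] {
  const position = new Map<string, number>()
  order.forEach((e, i) => position.set(e.id, i))
  const inverted: string[] = []
  for (const e of order) {
    if (e.parentEventId === undefined) continue
    const parentPos = position.get(e.parentEventId)
    const ownPos = position.get(e.id)
    if (parentPos !== undefined && ownPos !== undefined && ownPos < parentPos) {
      inverted.push(e.id)
    }
  }
  return inverted
}

// The payment was caused by the order, but its node's clock runs behind,
// so it reports an earlier timestamp than the event that caused it.
const driftedEvents: TimedEvent[] = [
  { id: "order.created", timestampMs: 1000 },
  { id: "payment.captured", timestampMs: 970, parentEventId: "order.created" },
]

const naiveOrder = sortByTimestamp(driftedEvents)
const inversions = findCausalInversions(naiveOrder)
console.log(naiveOrder.map((e) => e.id)) // [ 'payment.captured', 'order.created' ]
console.log(inversions) // [ 'payment.captured' ]
```

The sorted list looks clean, yet it claims the payment happened before the order that caused it. Only the explicit parent link reveals the inversion.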
Mental Model
causal-order is built around a simple rule:
Be easy to use at the surface, but hard to misuse into false certainty.
In practice, that means:
- not every event set should be forced into one total order
- explicit causal evidence outranks clock appearance
- cross-node events without supported causal evidence should usually remain unknown
- shared traceId or partition metadata does not, by itself, imply causality
- streaming finality is operational, not causal truth
Supported causal evidence today is intentionally narrow:
- parentEventId
- dependencyEventIds
- same-node monotonic sequence
This library is not trying to eliminate clocks. It is trying to stop treating wall-clock agreement as the truth model for a distributed system.
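The narrow evidence rule can be sketched as a pairwise check. This is an illustration under stated assumptions, not the library's algorithm: the EvidenceEvent shape and the evidenceThatPrecedes helper are hypothetical, the "dependency" and "same_node_sequence" labels are invented for this sketch (only "parent_event" appears in the example output in this README), and sequence is a plain number rather than a bigint.

```typescript
// Hypothetical shape: field names follow this README, the rest is assumed.
interface EvidenceEvent {
  id: string
  nodeId: string
  sequence?: number // README examples use bigint; number keeps the sketch short
  parentEventId?: string
  dependencyEventIds?: string[]
}

type Evidence = "parent_event" | "dependency" | "same_node_sequence" | "unknown"

// Does anything on record justify saying `a` happened before `b`?
function evidenceThatPrecedes(a: EvidenceEvent, b: EvidenceEvent): Evidence {
  if (b.parentEventId === a.id) return "parent_event"
  if (b.dependencyEventIds !== undefined && b.dependencyEventIds.indexOf(a.id) !== -1) {
    return "dependency"
  }
  if (
    a.nodeId === b.nodeId &&
    a.sequence !== undefined &&
    b.sequence !== undefined &&
    a.sequence < b.sequence
  ) {
    return "same_node_sequence"
  }
  // No explicit link and no shared-node sequence: clocks and trace IDs are not enough.
  return "unknown"
}

const created: EvidenceEvent = { id: "evt-1", nodeId: "orders-api", sequence: 1 }
const captured: EvidenceEvent = { id: "evt-2", nodeId: "payments-worker", parentEventId: "evt-1" }
const shipped: EvidenceEvent = { id: "evt-3", nodeId: "shipping-worker" }

console.log(evidenceThatPrecedes(created, captured)) // "parent_event"
console.log(evidenceThatPrecedes(created, shipped)) // "unknown"
```

Note that the cross-node pair without a link comes back as unknown rather than being ordered by timestamp, which is the behavior the rules above describe.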
What You Get
Given a set of distributed events, the library returns more than a sorted list.
It returns:
- ordered: events with orderIndex, orderBasis, and confidence
- anomalies: invalid, suspicious, or operationally important records
- stats: summary counts for the batch
Confidence is explicit:
- proven: explicit causal evidence exists
- derived: order was inferred from useful but weaker metadata
- fallback: deterministic ordering was imposed for stability
- unknown: the library cannot honestly justify the claim
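One way a consumer might act on these levels is to gate automated decisions on them. The sketch below is an assumption about downstream usage, not part of the library: OrderedEntrySketch is a trimmed-down hypothetical shape, and splitByTrust is a helper invented for this example.

```typescript
// The four levels named above; the entry shape is trimmed down for this sketch.
type Confidence = "proven" | "derived" | "fallback" | "unknown"

interface OrderedEntrySketch {
  eventId: string
  confidence: Confidence
}

// Split entries into those safe to act on automatically and those needing review.
function splitByTrust(
  entries: OrderedEntrySketch[],
  trusted: Confidence[] = ["proven"],
): { act: OrderedEntrySketch[]; review: OrderedEntrySketch[] } {
  const act: OrderedEntrySketch[] = []
  const review: OrderedEntrySketch[] = []
  for (const entry of entries) {
    if (trusted.indexOf(entry.confidence) !== -1) act.push(entry)
    else review.push(entry)
  }
  return { act, review }
}

const sampleEntries: OrderedEntrySketch[] = [
  { eventId: "evt-1", confidence: "derived" },
  { eventId: "evt-2", confidence: "proven" },
  { eventId: "evt-3", confidence: "fallback" },
]

const strictSplit = splitByTrust(sampleEntries)
const relaxedSplit = splitByTrust(sampleEntries, ["proven", "derived"])
console.log(strictSplit.act.map((e) => e.eventId)) // [ 'evt-2' ]
console.log(relaxedSplit.review.map((e) => e.eventId)) // [ 'evt-3' ]
```

Which levels count as trusted is a policy decision for the caller. An audit pipeline might accept only proven, while a debugging view might accept derived as well.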
Install
npm install causal-order
Platform:
- Node.js 20+
- ESM only
Quick Example
import { orderEvents } from "causal-order"

const events = [
  {
    id: "evt-1",
    nodeId: "orders-api",
    clock: {
      physicalTimeMs: 1714971840123n,
      logicalCounter: 0,
      nodeId: "orders-api",
    },
    sequence: 1n,
    payload: { type: "order.created" },
  },
  {
    id: "evt-2",
    nodeId: "payments-worker",
    clock: {
      physicalTimeMs: 1714971840125n,
      logicalCounter: 1,
      nodeId: "payments-worker",
    },
    parentEventId: "evt-1",
    payload: { type: "payment.captured" },
  },
]

const result = orderEvents(events, {
  strict: false,
  detectAnomalies: true,
})

console.log(result.ordered)
console.log(result.anomalies)
Example output shape:
[
  {
    event: events[0],
    orderIndex: 0n,
    orderBasis: "sequence",
    confidence: "derived",
  },
  {
    event: events[1],
    orderIndex: 1n,
    orderBasis: "causal",
    confidence: "proven",
    causalEvidence: [{ type: "parent_event", parentEventId: "evt-1" }],
  },
]
The important part is not just the order. It is the explanation of why that order exists and how trustworthy it is.
Streaming Overview
For large or unbounded event flows, use orderEventStream() instead of assuming everything belongs in one in-memory batch.
That includes both:
- ordinary day-to-day stream processing
- delayed reconnect, offline sync, or recovery flows where late arrivals are part of normal operations
import { orderEventStream } from "causal-order"

// source() is a placeholder for your own event source; it is not defined in this example.
for await (const batch of orderEventStream(source(), {
  batchSize: 100,
  maxLateArrivalMs: 30_000n,
  lateArrivalPolicy: "flag",
  strict: false,
})) {
  console.log(batch.events)
  console.log(batch.anomalies)
  console.log(batch.watermark, batch.isFinal)
}
Keep this mental model in mind:
- the watermark controls operational readiness, not causal truth
- late events are handled by explicit policy rather than being silently hidden
- non-final output may need later reconciliation, especially in reconnect-heavy flows
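The watermark idea can be illustrated with a small standalone sketch. It assumes one simple definition (the watermark is the highest event time seen so far, and an arrival is late when it trails the watermark by more than the allowed lateness); the library's actual watermark contract may differ. Arrival, LatenessVerdict, and flagLateArrivals are hypothetical names, and times are plain numbers in milliseconds.

```typescript
// Hypothetical arrival record; times are plain numbers (ms) in this sketch.
interface Arrival {
  id: string
  eventTimeMs: number
}

interface LatenessVerdict {
  id: string
  late: boolean
  watermarkMs: number // watermark at the moment this arrival was processed
}

// Track the highest event time seen so far as the watermark, and flag any
// arrival that trails it by more than the allowed lateness. Nothing is dropped.
function flagLateArrivals(arrivals: Arrival[], maxLateArrivalMs: number): LatenessVerdict[] {
  let watermarkMs = Number.NEGATIVE_INFINITY
  const verdicts: LatenessVerdict[] = []
  for (const arrival of arrivals) {
    watermarkMs = Math.max(watermarkMs, arrival.eventTimeMs)
    const late = watermarkMs - arrival.eventTimeMs > maxLateArrivalMs
    verdicts.push({ id: arrival.id, late, watermarkMs })
  }
  return verdicts
}

// An offline device syncs "evt-b" after the stream has moved 45 s past it.
const latenessVerdicts = flagLateArrivals(
  [
    { id: "evt-a", eventTimeMs: 10000 },
    { id: "evt-c", eventTimeMs: 60000 },
    { id: "evt-b", eventTimeMs: 15000 },
  ],
  30000,
)
console.log(latenessVerdicts.filter((v) => v.late).map((v) => v.id)) // [ 'evt-b' ]
```

The flag records an operational fact (this event arrived after the stream had moved on) and says nothing about whether evt-b caused or followed the other events.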
For the full stream contract, see:
When To Use It
causal-order is primarily for deployable operational event processing in distributed systems that cannot rely on one perfect global clock.
That includes:
- continuous stream processing with explicit late-arrival and reconciliation behavior
- delayed reconnect and recovery workflows
- offline sync inspection
- replay analysis
Other strong use cases include:
- multi-region debugging
- audit timeline reconstruction
- late-arrival stream handling
- distributed incident analysis
It is especially useful when:
- events come from multiple services, devices, or regions
- timestamps are not enough on their own
- ordering claims need explanation
- concurrency matters
- suspicious metadata should not be silently normalized
It is less useful when:
- you already have authoritative causal ordering elsewhere
- you only need a plain timestamp sort
Documentation
Start here:
Streaming:
Failure modes and case studies:
- Case Studies
- Replay Corruption
- Multi-Region Drift
- False Audit Timelines
- Offline Sync Anomalies
- Causal Inversion
Workloads and hardening:
- Production Gate 0.3.2
- Anomaly Surface Audit 0.3.2
- Fuzz Testing 0.3.2
- Streaming Hardening And Pressure 0.3.3
- Implementation Guide 0.3.3
- Runtime Stability 0.3.4
- Implementation Guide 0.3.4
- Stress Hardening
- After-Hours Batch Processing
- Realistic Workloads
The 0.3.2 hardening story is now explicit:
- production-gate criteria define what the current contract must prove
- anomaly-surface notes explain what the runtime can and cannot currently signal
- seeded fuzz coverage pressure-tests outage, replay, reconnect, duplicate, and clock-noise cases reproducibly
- bounded batch recovery, replay, and audit-style workloads are the stronger current deployment story within the existing contract
- the larger remaining proof bar is on long-running streaming behavior rather than on bounded batch ordering
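For readers unfamiliar with seeded fuzzing, the sketch below shows the general idea of reproducible clock-noise generation. It is not the repository's fuzz harness: FuzzEvent and generateClockNoise are hypothetical names, and mulberry32 is a small, widely used seeded PRNG chosen here only for illustration.

```typescript
// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed | 0
  return () => {
    state = (state + 0x6d2b79f5) | 0
    let t = Math.imul(state ^ (state >>> 15), 1 | state)
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Hypothetical fuzz record for this sketch.
interface FuzzEvent {
  id: string
  trueTimeMs: number
  reportedTimeMs: number // true time plus simulated clock noise
}

// Generate `count` events whose reported clocks are skewed by up to maxSkewMs
// in either direction. The same seed always yields the same dataset.
function generateClockNoise(seed: number, count: number, maxSkewMs: number): FuzzEvent[] {
  const rand = mulberry32(seed)
  const out: FuzzEvent[] = []
  for (let i = 0; i < count; i++) {
    const trueTimeMs = i * 10
    const skew = Math.round((rand() * 2 - 1) * maxSkewMs)
    out.push({ id: `evt-${i}`, trueTimeMs, reportedTimeMs: trueTimeMs + skew })
  }
  return out
}

const fuzzRunA = generateClockNoise(42, 100, 50)
const fuzzRunB = generateClockNoise(42, 100, 50)
console.log(JSON.stringify(fuzzRunA) === JSON.stringify(fuzzRunB)) // true
```

Because the dataset is a pure function of the seed, a failing case can be replayed exactly by rerunning with the same seed.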
Runnable examples:
Status
causal-order is in the public 0.3.x release line.
Current release shape:
- 0.3.2 established the current production-gate hardening baseline
- 0.3.3 broadened the streaming hardening and pressure release story after that production-gate milestone
- 0.3.4 is the current runtime-stability release for prolonged and constrained-runtime streaming proof
The current 0.3.4 release is centered on:
- explicit 0.3.2 production-gate proof
- the broader 0.3.3 streaming pressure profiles and higher-scale visibility bands that established the current pressure surface
- repeated-cycle stream endurance runs
- constrained-heap stream endurance runs
- GC-observed stream runs
- sustained correction-churn and anomaly-heavy reconnect endurance profiles
Current deployment posture:
- bounded batch recovery, replay, reconciliation, and audit-style workloads are the stronger production-credible side of the current contract
- the main remaining operational hardening work is on prolonged and constrained-runtime streaming behavior
That means:
- the package is usable today
- the API is still evolving
- semantics matter more than surface churn at this stage
- 1.0.0 is the point where the semantic contract should feel stable enough to preserve long-term
Repository Development
If you are working in the repository itself:
npm install
npm run check
npm test
npm run bench:check
npm run release:check
Useful local commands:
- npm run demo
- npm run examples
- npm run bench
- npm run bench:stream
- npm run bench:all
- npm run bench:csv
- npm run bench:profile
Current test posture:
- npm test includes the direct release-gate suites plus seeded 0.3.2 fuzz coverage
- the fuzz layer currently covers batch outage/replay noise plus streaming reconnect, fragmented watermark-lag, correction-burst, sustained correction-churn, reconnect-burst, bounded-window lagging-watermark, and bounded-memory cross-window replay pressure
- broader exploratory fuzz campaigns are now part of the shipped 0.3.3 pressure expansion
Current benchmark posture:
- 10k and 100k are the main enforced guardrail bands
- 150k corrupted-dataset profiles are available for stress visibility, but are not currently enforced in npm run bench:check
- 150k remains the enforced sustained watermark-lag stream guard band while 250k remains exploratory stretch visibility rather than a routine guard target
- repeated-cycle, constrained-heap, GC-observed, and sustained correction/reconnect endurance runs are now available as explicit runtime-stability evidence commands
- npm run bench:profile is available when you need CPU profiles for the slowest stress cases
License
MIT. See LICENSE.
Security
See SECURITY.md for supported versions and private vulnerability reporting guidance.
Contributing
See CONTRIBUTING.md for repository workflow, verification expectations, and documentation update guidance.
