service-bridge
v1.8.0-dev.35
ServiceBridge SDK for Node.js — production-ready RPC, durable events, workflows, jobs, and distributed tracing. One Go runtime + PostgreSQL replaces Istio, RabbitMQ, Temporal, and Jaeger.
service-bridge
The Unified Bridge for Microservices Interaction
Node.js SDK for ServiceBridge — production-ready RPC, durable events, workflows, jobs, and distributed tracing in a single SDK. One Go runtime and PostgreSQL.
┌─────────────────────────────────────────────────────────────────┐
│ BEFORE: 10 moving parts │
│ Istio · Envoy · RabbitMQ · Temporal · Jaeger · Consul · │
│ cert-manager · Alertmanager · cron · custom glue │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ AFTER: ServiceBridge + PostgreSQL │
│ RPC · Events · Workflows · Jobs · Tracing · mTLS · Dashboard │
│ One SDK · One runtime · Zero sidecars │
└─────────────────────────────────────────────────────────────────┘

Table of Contents
- Why ServiceBridge
- Use Cases
- Quick Start
- Install
- Runtime Setup
- End-to-End Example
- Platform Features
- How It Compares
- API Reference
- HTTP Plugins
- Configuration
- Environment Variables
- Error Handling
- When to Use / When Not to Use
- FAQ
- Community and Support
- License
Why ServiceBridge
| Problem | Without ServiceBridge | With ServiceBridge |
|---|---|---|
| Service-to-service calls | Istio/Envoy sidecar proxy per pod | Direct SDK-to-worker gRPC, zero proxy hops |
| Async messaging | Kafka/RabbitMQ + retry logic + DLQ setup | Built-in durable events with retry, DLQ, replay |
| Background jobs | Bull/BullMQ + Redis + cron daemon | Built-in cron and delayed jobs |
| Workflow orchestration | Temporal/Conductor cluster + persistence | Built-in DAG workflows |
| Distributed tracing | Jaeger/Tempo + OTEL collector + dashboards | Built-in traces + realtime UI |
| Service discovery | Consul/etcd + DNS glue | Built-in registry + health-aware balancing |
| mTLS | cert-manager + Vault PKI | Auto-provisioned certs from service key |
Result: 10 tools → 1 runtime. One Go binary + PostgreSQL replaces the entire stack.
Use Cases
Microservice communication — Replace sidecar mesh with direct RPC calls. Get sub-millisecond overhead instead of double proxy hop latency.
Event-driven architecture — Publish durable events with fan-out, retries, DLQ, idempotency, and server-side filtering. No broker infrastructure to manage.
Background job scheduling — Cron jobs, delayed execution, and job-triggered workflows in a single API. No Redis, no separate queue workers.
Saga / distributed transactions — DAG workflows with typed steps (rpc, event, event_wait, sleep, child workflow). Compensations and rollbacks via workflow step dependencies.
AI agent orchestration — Stream LLM tokens via realtime trace streams with replay. Orchestrate multi-step AI pipelines as workflows.
Full-stack observability — Every RPC call, event delivery, workflow step, and HTTP request traced automatically. One timeline, one dashboard. Prometheus metrics and Loki-compatible log API included.
Quick Start
1. Install
npm i service-bridge
# or
bun add service-bridge

2. Create a worker (service that handles calls)
import { servicebridge } from "service-bridge";
const sb = servicebridge(
process.env.SERVICEBRIDGE_URL ?? "localhost:14445",
process.env.SERVICEBRIDGE_SERVICE_KEY!,
);
sb.handleRpc("charge", async (payload: { orderId: string; amount: number }) => {
return { ok: true, txId: `tx_${Date.now()}`, orderId: payload.orderId };
});
await sb.serve({ host: "localhost" });

3. Call it from another service
import { servicebridge } from "service-bridge";
const sb = servicebridge(
process.env.SERVICEBRIDGE_URL ?? "localhost:14445",
process.env.SERVICEBRIDGE_SERVICE_KEY!,
);
const result = await sb.rpc<{ ok: boolean; txId: string }>("payments/charge", {
orderId: "ord_42",
amount: 4990,
});
console.log(result.txId); // tx_1711234567890

That's it. No broker, no sidecar, no proxy — direct gRPC call between services.
Runtime Setup
The SDK connects to a ServiceBridge runtime. The fastest way to start:
bash <(curl -fsSL https://servicebridge.dev/install.sh)

This installs ServiceBridge + PostgreSQL via Docker Compose and generates an admin password automatically. After install, the dashboard is at http://localhost:14444 and the gRPC control plane at localhost:14445.
For manual Docker Compose setup, configuration reference, and all runtime environment variables, see the Runtime Setup section in the main SDK README.
End-to-End Example
A complete order flow: HTTP request → RPC → Event → Event handler with streaming.
import { servicebridge } from "service-bridge";
// --- Payments service (worker) ---
const payments = servicebridge("localhost:14445", process.env.SERVICEBRIDGE_SERVICE_KEY!);
payments.handleRpc("charge", async (payload: { orderId: string; amount: number }, ctx) => {
await ctx?.stream.write({ status: "charging", orderId: payload.orderId }, "progress");
// ... charge logic ...
await ctx?.stream.write({ status: "charged" }, "progress");
return { ok: true, txId: `tx_${Date.now()}` };
});
await payments.serve({ host: "localhost" });

// --- Orders service (caller + event publisher) ---
const orders = servicebridge("localhost:14445", process.env.SERVICEBRIDGE_SERVICE_KEY!);
// Call payments, then publish event
const charge = await orders.rpc<{ ok: boolean; txId: string }>("payments/charge", {
orderId: "ord_42",
amount: 4990,
});
await orders.event("orders.completed", {
orderId: "ord_42",
txId: charge.txId,
}, {
idempotencyKey: "order:ord_42:completed",
headers: { source: "checkout" },
});

// --- Notifications service (event consumer) ---
const notifications = servicebridge("localhost:14445", process.env.SERVICEBRIDGE_SERVICE_KEY!);
notifications.handleEvent("orders.*", async (payload, ctx) => {
const body = payload as { orderId: string; txId: string };
await ctx.stream.write({ status: "sending_email", orderId: body.orderId }, "progress");
// ... send email ...
});
await notifications.serve({ host: "localhost" });

// --- Orchestrate as a workflow ---
await orders.workflow("order.fulfillment", [
{ id: "reserve", type: "rpc", ref: "inventory/reserve" },
{ id: "charge", type: "rpc", ref: "payments/charge", deps: ["reserve"] },
{ id: "wait_dlv", type: "event_wait", ref: "shipping.delivered", deps: ["charge"] },
{ id: "notify", type: "event", ref: "orders.fulfilled", deps: ["wait_dlv"] },
]);

Every step above — RPC, event publish, event delivery, workflow execution — appears in a single trace timeline in the built-in dashboard.
Platform Features
Communication
- Direct RPC — zero-hop gRPC calls with retries, deadlines, and mTLS identity
- Durable Events — fan-out delivery, guaranteed delivery (RabbitMQ-style), at-least-once guarantees, retries, DLQ, replay, idempotency. If a consumer is offline, the message waits in the server-side queue and is dispatched the moment the consumer reconnects — no retry budget consumed while waiting.
- Realtime Streams — live chunks with replay for AI/progress/log streaming
- Service Discovery — automatic endpoint resolution and round-robin balancing
- HTTP Middleware — Express and Fastify instrumentation with automatic trace propagation
Orchestration
- Workflows — DAG steps: rpc, event, event_wait, sleep, child workflow
- Jobs — cron, delayed, and workflow-triggered scheduling
Security
- TLS by default — control plane TLS + worker mTLS with gRPC certificate provisioning
- Access Policy — service-level caller/target restrictions and RBAC
Observability
- Unified Tracing — single trace timeline across HTTP, RPC, events, workflows, and jobs
- Metrics — Prometheus-compatible /metrics endpoint (30+ metric families)
- Logs — structured log ingest with Loki-compatible query API
- Alerts — runtime alerts for delivery failures, errors, and service health
- Dashboard — realtime web UI for traces, events, workflows, jobs, DLQ, service map, and service keys
How It Compares
| Concern | Istio + Envoy | Dapr | Temporal + Kafka | ServiceBridge |
|---|---|---|---|---|
| RPC data path | Sidecar proxy hop | Sidecar/daemon hop | N/A | Direct (proxyless) |
| Service discovery | K8s control plane | Sidecar placement | External registry | Built-in registry |
| Durable events + DLQ | External broker | Pub/Sub component | Kafka + consumers | Built-in |
| Workflow orchestration | External engine | External engine | Built-in | Built-in |
| Job scheduling | External cron/queue | External scheduler | External scheduler | Built-in |
| Traces + UI | Jaeger/Tempo + dashboards | OTEL backend + dashboards | Temporal UI | Built-in |
| Logs for Grafana | Loki + Promtail pipeline | Log pipeline | Log pipeline | Built-in Loki API |
| Metrics | App/exporter setup | App/exporter setup | Multiple exporters | Built-in /metrics |
| Security model | Mesh PKI + policy | Deployment-dependent mTLS | Mixed | Service keys + auto mTLS |
| Operational footprint | Multi-component mesh | Runtime + sidecars | Workflow + broker + DB | One binary + PostgreSQL |
API Reference
Cross-SDK parity notes
ServiceBridge keeps the core API shape consistent across Node.js, Go, and Python:
constructor, RPC, events, jobs, workflows, executeWorkflow, streams, serve/stop, and ServiceBridgeError.
Constructor-level defaults for timeout, retries, and retryDelay are available
across all three SDKs. Parity differences are naming-only (language idioms):
- Constructor TLS overrides: workerTLS/caCert (Node), WorkerTLS/CACert (Go), worker_tls/ca_cert (Python)
- Handler hints: timeout/retryable/concurrency/prefetch are advisory in all SDKs
- Shared serve() fields across SDKs: host, max in-flight, instance ID, weight, and per-serve TLS override
servicebridge(url, serviceKey, opts?)
function servicebridge(
url: string,
serviceKey: string,
serviceOrOpts?: string | ServiceBridgeOpts,
maybeGlobalOpts?: ServiceBridgeOpts,
): ServiceBridgeService

Creates an SDK client instance.
Service identity is resolved by the runtime from serviceKey; passing a third service argument is legacy-only.
ServiceBridgeOpts:
| Option | Type | Default | Description |
|---|---|---|---|
| timeout | number | 30000 | Default hard timeout per RPC attempt (ms). |
| retries | number | 3 | Default retry count for rpc(). |
| retryDelay | number | 300 | Base backoff delay (ms) for rpc(). |
| discoveryRefreshMs | number | 10000 | Discovery refresh period for endpoint updates. |
| queueMaxSize | number | 1000 | Max offline queue size for control-plane writes. |
| queueOverflow | "drop-oldest" \| "drop-newest" \| "error" | "drop-oldest" | Overflow strategy for offline queue. |
| heartbeatIntervalMs | number | 10000 | Base heartbeat period for worker registrations. |
| captureLogs | boolean | true | Forward console.* logs to ServiceBridge. |
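The three queueOverflow strategies can be sketched as follows (an illustration of the documented semantics, not the SDK's internal queue):

```typescript
// Minimal sketch of the documented offline-queue overflow strategies.
type Overflow = "drop-oldest" | "drop-newest" | "error";

class OfflineQueue<T> {
  private items: T[] = [];
  constructor(private maxSize: number, private overflow: Overflow) {}

  push(item: T): boolean {
    if (this.items.length < this.maxSize) {
      this.items.push(item);
      return true;
    }
    switch (this.overflow) {
      case "drop-oldest": // evict the oldest queued write, keep the new one
        this.items.shift();
        this.items.push(item);
        return true;
      case "drop-newest": // silently discard the new write
        return false;
      case "error":
        throw new Error("offline queue full");
    }
  }

  get size(): number {
    return this.items.length;
  }
}
```

With the defaults (queueMaxSize: 1000, "drop-oldest"), a long control-plane outage keeps the most recent 1000 writes.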
Advanced TLS overrides
| Option | Type | Default | Description |
|---|---|---|---|
| workerTLS | WorkerTLSOpts | auto | Explicit cert/key/CA for worker mTLS. |
| caCert | string \| Buffer | from serviceKey | Optional control-plane CA override. By default the SDK reads the CA from the sbv2 service key. |
WorkerTLSOpts:
type WorkerTLSOpts = {
caCert?: string | Buffer;
cert?: string | Buffer;
key?: string | Buffer;
serverName?: string;
}

rpc(fn, payload?, opts?)
rpc<T = unknown>(fn: string, payload?: unknown, opts?: RpcOpts): Promise<T>

Calls a registered RPC handler on another service. Direct gRPC path, no proxy.
Function name formats — fn accepts two formats:
| Format | Example | When to use |
|---|---|---|
| Plain name | "charge" | Function name is globally unique across all services. Resolved automatically via service discovery. |
| Canonical name | "payments/charge" | Multiple services expose a function with the same plain name, or you want to be explicit about the target service. |
Both formats are interchangeable when the name is unique globally. Canonical format is recommended in production for clarity and to avoid ambiguity as your service count grows.
RpcOpts:
| Option | Type | Description |
|---|---|---|
| timeout | number | Call timeout in ms. |
| retries | number | Retry count override. |
| retryDelay | number | Base retry delay override. |
| traceId | string | Explicit trace id. |
| parentSpanId | string | Explicit parent span id. |
| mode | "direct" \| "proxy" | Transport mode. "direct" (default) connects directly to the worker. "proxy" routes through the control plane when direct connection is unavailable. |
// plain name — works when "get" is unique across services
const user = await sb.rpc<{ id: string; name: string }>("get", { id: "u_1" });
// canonical name — explicit service target, always unambiguous
const user = await sb.rpc<{ id: string; name: string }>("users/get", { id: "u_1" }, {
timeout: 5000,
retries: 2,
});

rpc() is bounded even when a downstream worker is silent:
each attempt has a hard local timeout, retries are finite (retries + 1 total attempts),
and after the final failed attempt the root RPC span is closed with error.
Retry delay uses exponential backoff: retryDelay * 2^(attempt-1).
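The bounded-retry schedule can be written out as a pure function (illustrative only; with the defaults retries = 3 and retryDelay = 300 it yields delays of 300, 600, and 1200 ms across 4 total attempts):

```typescript
// Illustration of the documented retry schedule: retries + 1 total attempts,
// with delay retryDelay * 2^(attempt-1) between failed attempts.
function backoffDelays(retryDelay: number, retries: number): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= retries; attempt++) {
    delays.push(retryDelay * 2 ** (attempt - 1));
  }
  return delays;
}
```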
event(topic, payload?, opts?)
event(topic: string, payload?: unknown, opts?: EventOpts): Promise<string>

Publishes a durable event. Returns messageId when online.
EventOpts:
| Option | Type | Description |
|---|---|---|
| traceId | string | Explicit trace id. |
| parentSpanId | string | Explicit parent span id. |
| idempotencyKey | string | Idempotency key for dedup-safe publishing. |
| headers | Record<string, string> | Custom metadata headers. |
await sb.event("orders.created", { orderId: "ord_42" }, {
idempotencyKey: "order:ord_42",
headers: { source: "checkout" },
});

publishEvent(topic, payload?, opts?)
publishEvent(topic: string, payload?: unknown, opts?: PublishEventOpts): Promise<string>

Publishes an event via the established worker session stream. Requires an active worker session — call after serve(). Resolves with messageId once the server confirms with publish_ack. Times out after 30 s if no ack. Use event() when not serving (e.g. caller-only services); use publishEvent() from within a worker for lower-latency publishing over the existing session.
job(target, opts)
job(target: string, opts: ScheduleOpts): Promise<string>

Registers a scheduled or delayed job.
ScheduleOpts:
| Option | Type | Description |
|---|---|---|
| cron | string | Cron expression. |
| delay | number | Delay in ms before execution. Backed by int32 in the proto — maximum ~24.8 days (~2,147,483,647 ms). |
| timezone | string | Timezone for cron execution. |
| misfire | "fire_now" \| "skip" | Misfire policy. |
| via | "event" \| "rpc" \| "workflow" | Target type. |
| retryPolicyJson | string | Retry policy JSON string. |
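A hypothetical pre-flight guard for the int32 bound on delay (the assertValidDelay helper is illustrative, not part of the SDK):

```typescript
// The proto backs `delay` with int32, so the maximum is 2,147,483,647 ms (~24.8 days).
const MAX_DELAY_MS = 2_147_483_647;

function assertValidDelay(delayMs: number): void {
  if (!Number.isInteger(delayMs) || delayMs < 0 || delayMs > MAX_DELAY_MS) {
    throw new RangeError(`delay must be an integer in [0, ${MAX_DELAY_MS}] ms`);
  }
}
```

For delays beyond ~24.8 days, use a cron schedule instead.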
await sb.job("billing/collect", {
cron: "0 * * * *",
timezone: "UTC",
via: "rpc",
});

workflow(name, steps, opts?)
workflow(name: string, steps: WorkflowStep[], opts?: WorkflowOpts): Promise<string>

Registers (or updates) a workflow definition as a DAG of typed steps. Returns the workflow name.
WorkflowStep:
| Field | Type | Description |
|---|---|---|
| id | string | Unique step identifier in the DAG. |
| type | "rpc" \| "event" \| "event_wait" \| "sleep" \| "workflow" | Step execution type. |
| ref | string | Required for rpc, event, event_wait, workflow. |
| deps | string[] | Dependencies. Empty/omitted means root step. |
| if | string | Optional filter expression (step is skipped if false). |
| timeoutMs | number | Optional timeout for rpc and event_wait steps. |
| durationMs | number | Required for sleep steps. |
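The deps rules above amount to a directed acyclic graph. A minimal client-side validation sketch (not part of the SDK API — the runtime performs its own validation) could look like:

```typescript
// Illustrative check that workflow steps form a DAG with known dependencies.
type Step = { id: string; deps?: string[] };

function validateDag(steps: Step[]): void {
  const ids = new Set(steps.map((s) => s.id));
  if (ids.size !== steps.length) throw new Error("duplicate step id");
  const byId = new Map(steps.map((s) => [s.id, s]));
  const visiting = new Set<string>();
  const done = new Set<string>();
  const visit = (id: string): void => {
    if (done.has(id)) return;
    if (visiting.has(id)) throw new Error(`cycle through "${id}"`);
    visiting.add(id);
    for (const dep of byId.get(id)?.deps ?? []) {
      if (!ids.has(dep)) throw new Error(`unknown dep "${dep}"`);
      visit(dep); // depth-first: every dependency must resolve without revisiting
    }
    visiting.delete(id);
    done.add(id);
  };
  steps.forEach((s) => visit(s.id));
}
```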
WorkflowOpts:
interface WorkflowOpts {
stateLimitBytes?: number; // default 262144 (256 KB)
stepTimeoutMs?: number; // default 30000 (30 s)
}

| Field | Type | Default | Description |
|---|---|---|---|
| stateLimitBytes | number | 262144 (256 KB) | Maximum serialized state size in bytes. |
| stepTimeoutMs | number | 30000 (30 s) | Default per-step timeout in milliseconds. |
await sb.workflow("order.fulfillment", [
{ id: "reserve", type: "rpc", ref: "inventory/reserve" },
{ id: "charge", type: "rpc", ref: "payments/charge", deps: ["reserve"] },
{ id: "wait_5m", type: "sleep", durationMs: 300_000, deps: ["charge"] },
{ id: "notify", type: "event", ref: "orders.fulfilled", deps: ["wait_5m"] },
]);With explicit limits:
await sb.workflow("checkout.flow", steps, { stepTimeoutMs: 60_000 });

executeWorkflow(name, input?, opts?)
executeWorkflow(name: string, input?: unknown, opts?: ExecuteWorkflowOpts): Promise<{ traceId: string; groupTraceId: string }>

Starts a workflow execution on demand. The workflow must be registered first via workflow().
An alternative to scheduling via job(target, { via: "workflow" }) — triggers the execution immediately.
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | string | required | Name of a previously registered workflow. |
| input | unknown | undefined | Optional JSON-serializable input payload. |
Returns { traceId, groupTraceId }. Use traceId with watchTrace() to observe execution in real time.
ExecuteWorkflowOpts:
| Option | Type | Description |
|---|---|---|
| traceId | string | Override trace ID for this workflow execution. |
const { traceId, groupTraceId } = await sb.executeWorkflow("user.onboarding", { userId: "u_123" });

cancelWorkflow(traceId)
cancelWorkflow(traceId: string): Promise<void>

Cancels a running workflow instance.
await sb.cancelWorkflow("trace_01HQ...XYZ");

handleRpc(fn, handler, opts?)
handleRpc(
fn: string,
handler: (payload: unknown, ctx?: RpcContext) => unknown | Promise<unknown>,
opts?: HandleRpcOpts,
): ServiceBridgeService

Registers an RPC handler. Chainable.
RpcContext:
| Field | Type | Description |
|---|---|---|
| traceId | string | Current trace ID. |
| spanId | string | Current span ID. |
| stream | StreamWriter | Real-time stream writer. |
HandleRpcOpts:
| Option | Type | Description |
|---|---|---|
| timeout | number | Advisory timeout hint (currently metadata-level, not hard-enforced by runtime). |
| retryable | boolean | Advisory retry hint (currently metadata-level, not a strict policy switch). |
| concurrency | number | Advisory concurrency hint (currently not hard-enforced). |
| schema | RpcSchemaOpts | Inline protobuf schema for binary encode/decode. |
| allowedCallers | string[] | Allow-list of caller service names. |
sb.handleRpc("ai/generate", async (payload: { prompt: string }, ctx) => {
await ctx?.stream.write({ token: "Hello" }, "output");
await ctx?.stream.write({ token: " world" }, "output");
return { text: "Hello world" };
});

StreamWriter:
| Method | Signature | Description |
|---|---|---|
| write | write(data: unknown, key?: string): Promise<void> | Append a real-time chunk to the trace stream. |
| end | end(key?: string): Promise<void> | No-op placeholder for API symmetry (lifecycle managed by runtime). |
handleEvent(pattern, handler, opts?)
handleEvent(
pattern: string,
handler: (payload: unknown, ctx: EventContext) => void | Promise<void>,
opts?: HandleEventOpts,
): ServiceBridgeService

Registers an event consumer handler. Chainable.
HandleEventOpts:
| Option | Type | Description |
|---|---|---|
| groupName | string | Consumer group name. Default: <service-key-id>.<pattern>. |
| concurrency | number | Advisory concurrency hint (currently not hard-enforced). |
| prefetch | number | Advisory prefetch hint (currently not hard-enforced). |
| retryPolicyJson | string | Retry policy JSON string. |
| filterExpr | string | Server-side filter expression. |
Duplicate groupName registration throws an error.
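A minimal sketch of wildcard matching for patterns like "orders.*" (the runtime defines the actual matching rules; this single-segment interpretation of "*" is an assumption — e.g. whether "*" can span multiple dot-separated segments is not specified here):

```typescript
// ASSUMPTION: "*" matches exactly one dot-separated segment.
function topicMatches(pattern: string, topic: string): boolean {
  const p = pattern.split(".");
  const t = topic.split(".");
  if (p.length !== t.length) return false; // segment counts must agree
  return p.every((seg, i) => seg === "*" || seg === t[i]);
}
```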
Delivery guarantee: once a message is accepted by the runtime, delivery to each consumer group
is guaranteed. If the consumer is offline, the message waits in the server-side queue and is
dispatched automatically the moment the service reconnects and registers its handlers — no retry
budget is consumed while waiting. After SERVICEBRIDGE_DELIVERY_TTL_DAYS (default 7) days without
a consumer, the delivery moves to DLQ with reason delivery_ttl_exceeded.
EventContext helpers:
- ctx.traceId — current trace ID
- ctx.spanId — current span ID
- ctx.retry(delayMs?) — ask for redelivery with optional delay
- ctx.reject(reason) — move to DLQ immediately, bypassing remaining retries
- ctx.refs — metadata (topic, groupName, messageId, attempt, headers)
- ctx.stream.write(...) — append real-time chunks to trace stream
sb.handleEvent("orders.*", async (payload, ctx) => {
const body = payload as { orderId?: string };
if (!body.orderId) {
ctx.reject("missing_order_id");
return;
}
await ctx.stream.write({ status: "processing", orderId: body.orderId }, "progress");
});

serve(opts?)
serve(opts?: ServeOpts): Promise<void>

Starts the worker gRPC server and registers handlers with the control plane.
The promise resolves once startup/registration is complete (it does not block
the Node.js process). Throws immediately if no handlers are registered (neither handleRpc() nor handleEvent() have been called).
ServeOpts:
| Option | Type | Description |
|---|---|---|
| host | string | Bind host. Default: localhost. Use 0.0.0.0 in Docker/Kubernetes so ServiceBridge can reach the worker. |
| maxInFlight | number | Max in-flight runtime-originated commands over OpenWorkerSession. Default: 128. |
| instanceId | string | Stable worker instance identifier. |
| weight | number | Scheduling/discovery weight hint. |
| tls | WorkerTLSOpts | Per-serve worker TLS override. |
await sb.serve({
host: "localhost",
instanceId: process.env.HOSTNAME,
});

stop()
stop(): void

Gracefully stops the worker gRPC server (graceful shutdown first, then force), heartbeats, channels, and SDK internals.
startHttpSpan(opts)
startHttpSpan(opts: {
method: string;
path: string;
traceId?: string;
parentSpanId?: string;
}): HttpSpan

Manual HTTP tracing primitive.
const span = sb.startHttpSpan({ method: "GET", path: "/health" });
try {
span.end({ statusCode: 200, success: true });
} catch (e) {
span.end({ success: false, error: String(e) });
}

registerHttpEndpoint(opts)
registerHttpEndpoint(opts: {
method: string;
route: string;
instanceId?: string;
endpoint?: string;
allowedCallers?: string[];
requestSchemaJson?: string;
responseSchemaJson?: string;
transport?: string;
}): Promise<void>

Registers HTTP route metadata in the ServiceBridge service catalog. Also starts a periodic heartbeat to keep the HTTP endpoint alive in the registry.
| Option | Type | Description |
|---|---|---|
| method | string | HTTP method: GET, POST, PUT, PATCH, DELETE, etc. |
| route | string | Route pattern with parameter placeholders, e.g. "/users/:id". |
| instanceId | string | Stable identifier for this process instance. |
| endpoint | string | Reachable address, e.g. "http://10.0.0.1:3000". |
| allowedCallers | string[] | Service names allowed to call (RBAC). |
| requestSchemaJson | string | JSON schema for request validation metadata. |
| responseSchemaJson | string | JSON schema for response validation metadata. |
| transport | string | Transport label (e.g. "http", "https"). |
await sb.registerHttpEndpoint({
method: "GET",
route: "/users/:id",
requestSchemaJson: '{"type":"object"}',
transport: "http",
});

watchTrace(traceId, opts?)
watchTrace(traceId: string, opts?: WatchTraceOpts): AsyncIterable<TraceStreamEvent>

Subscribes to a trace stream with replay and live updates. traceId is the stream
identifier used by ctx.stream.write(...).
WatchTraceOpts:
| Option | Type | Default | Description |
|---|---|---|---|
| key | string | "" | Stream key filter ("" = all keys). |
| fromSequence | number | 0 | Replay from sequence cursor. |
TraceStreamEvent:
| Field | Type | Description |
|---|---|---|
| type | "chunk" \| "trace_complete" | Event kind. |
| traceId | string | Trace identifier being watched. |
| key | string | Stream lane key. |
| sequence | number | Monotonic sequence number. |
| data | unknown | JSON-decoded chunk payload. |
| traceStatus | string \| undefined | Final status on trace_complete. |
Behavior:
- Auto-reconnect with exponential backoff (500ms → 5000ms) on retryable stream failures.
- Deduplicates by sequence across reconnects.
- Enforces strict JSON for type="chunk" payloads (a non-JSON chunk terminates the stream with a fatal error).
- Enforces an internal queue limit of 256; overflow is fatal (the consumer must drain promptly).
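The sequence-based deduplication can be sketched as follows (illustrative only — the SDK applies this internally across reconnects):

```typescript
// Drop replayed chunks whose sequence number was already delivered.
function dedupeBySequence<T extends { sequence: number }>(events: T[]): T[] {
  let last = -1;
  const out: T[] = [];
  for (const evt of events) {
    if (evt.sequence > last) {
      out.push(evt);
      last = evt.sequence; // advance the cursor past the delivered chunk
    }
    // events with sequence <= last are duplicates replayed after a reconnect
  }
  return out;
}
```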
for await (const evt of sb.watchTrace(traceId, { key: "output", fromSequence: 0 })) {
if (evt.type === "chunk") {
process.stdout.write(String((evt.data as { token?: string }).token ?? ""));
}
if (evt.type === "trace_complete") break;
}

Trace Utilities
getTraceContext()
getTraceContext(): { traceId: string; spanId: string } | undefined

Returns the current async-local trace context.
import { getTraceContext } from "service-bridge";
const tc = getTraceContext();
if (tc) {
console.log(tc.traceId, tc.spanId);
}

withTraceContext(ctx, fn)
withTraceContext<T>(ctx: { traceId: string; spanId: string }, fn: () => T): T

Runs a function inside an explicit trace context.
import { withTraceContext } from "service-bridge";
withTraceContext({ traceId: "trace-1", spanId: "span-1" }, async () => {
await sb.event("audit.log", { action: "user.login" });
});

HTTP Plugins
Express (service-bridge/express)
npm install express

import express from "express";
import { servicebridge } from "service-bridge";
import { servicebridgeMiddleware, registerExpressRoutes } from "service-bridge/express";
const sb = servicebridge(process.env.SERVICEBRIDGE_URL!, process.env.SERVICEBRIDGE_SERVICE_KEY!);
const app = express();
app.use(servicebridgeMiddleware({
client: sb,
excludePaths: ["/health"],
autoRegister: true,
}));
app.get("/users/:id", async (req, res) => {
const user = await req.servicebridge.rpc("users/get", { id: req.params.id });
res.json(user);
});

servicebridgeMiddleware(options)
servicebridgeMiddleware(options: {
client: ServiceBridgeService;
excludePaths?: string[];
propagateTraceHeader?: boolean;
autoRegister?: boolean;
}): express.RequestHandler

- Attaches req.servicebridge, req.traceId, req.spanId
- Starts/ends HTTP span automatically
- Optionally sets x-trace-id response header
- Optionally auto-registers route pattern in catalog on first hit
registerExpressRoutes(app, client, opts?)
Eager route catalog registration without waiting for first request.
await registerExpressRoutes(app, sb, {
endpoint: "http://10.0.0.5:3000",
allowedCallers: ["api-gateway"],
excludePaths: ["/health"],
});Fastify (service-bridge/fastify)
npm install fastify

import Fastify from "fastify";
import { servicebridge } from "service-bridge";
import { servicebridgePlugin, wrapHandler } from "service-bridge/fastify";
const sb = servicebridge(process.env.SERVICEBRIDGE_URL!, process.env.SERVICEBRIDGE_SERVICE_KEY!);
const app = Fastify();
await app.register(servicebridgePlugin, {
client: sb,
excludePaths: ["/health"],
autoRegister: true,
});
app.get("/users/:id", wrapHandler(async (request, reply) => {
const user = await request.servicebridge.rpc("users/get", {
id: (request.params as any).id,
});
return reply.send(user);
}));

servicebridgePlugin(fastify, options)
servicebridgePlugin(fastify, {
client,
excludePaths?,
propagateTraceHeader?,
autoRegister?,
register?: {
instanceId?,
endpoint?,
allowedCallers?,
excludePaths?,
},
})

- Decorates request.servicebridge, request.traceId, request.spanId
- Traces HTTP lifecycle via hooks
- Auto-registers routes on onRoute before traffic
wrapHandler(handler)
Runs a Fastify handler inside the current trace context so downstream SDK calls inherit the trace.
Trace Utilities (HTTP Plugins)
extractTraceFromHeaders(headers)
import { extractTraceFromHeaders } from "service-bridge/express";
// or
import { extractTraceFromHeaders } from "service-bridge/fastify";
const { traceId, parentSpanId } = extractTraceFromHeaders(req.headers);

Extracts trace context from HTTP headers. Supports W3C traceparent, x-trace-id/x-span-id headers, and generates random IDs as a fallback. Useful for custom HTTP framework integrations (Hono, Koa, etc.).
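For custom integrations, the W3C traceparent wire format is version-traceid-spanid-flags. A minimal illustrative parser (extractTraceFromHeaders already covers this; the sketch is not the library's implementation):

```typescript
// Parse a W3C traceparent header: 2-hex version, 32-hex trace id,
// 16-hex parent span id, 2-hex flags, dash-separated.
function parseTraceparent(header: string): { traceId: string; parentSpanId: string } | undefined {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header.trim());
  if (!m) return undefined; // malformed header: caller falls back to random IDs
  return { traceId: m[2], parentSpanId: m[3] };
}
```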
Configuration
TLS behavior
- Worker transport is TLS-only.
- Control plane is TLS-only. Trust source is embedded into sbv2 service key by default.
- Embedded/explicit CA PEM is validated with strict x509 parsing.
- If workerTLS is not provided, the SDK auto-provisions worker certs via gRPC ProvisionWorkerCertificate.
- workerTLS.cert and workerTLS.key must be provided together.
- serve({ tls }) overrides the global workerTLS for a specific worker instance.
Offline queue behavior
When the control plane is unavailable, SDK queues write operations (event, job, workflow, telemetry writes).
- Queue size: queueMaxSize (default: 1000)
- Overflow policy: queueOverflow (default: "drop-oldest")
- Return values for queued writes may be empty strings until flushed
Environment Variables
The SDK requires values you pass into servicebridge(...). Common setup:
| Variable | Required | Example | Description |
|---|---|---|---|
| SERVICEBRIDGE_URL | yes | localhost:14445 | gRPC control plane URL |
| SERVICEBRIDGE_SERVICE_KEY | yes | sbv2.<id>.<secret>.<ca> | Service authentication key (sbv2 only) |
const sb = servicebridge(
process.env.SERVICEBRIDGE_URL ?? "localhost:14445",
process.env.SERVICEBRIDGE_SERVICE_KEY!,
);

Error Handling
ServiceBridgeError is exported for normalized SDK and runtime errors.
import { servicebridge, ServiceBridgeError } from "service-bridge";
try {
await sb.rpc("payments/charge", { orderId: "ord_1" });
} catch (e) {
if (e instanceof ServiceBridgeError) {
console.error(e.component, e.operation, e.severity, e.retryable, e.code);
}
throw e;
}

| Field | Type | Description |
|---|---|---|
| component | string | SDK subsystem (for example, "rpc" or "event"). |
| operation | string | Operation that failed. |
| severity | "fatal" \| "retriable" \| "ignorable" | Error classification. |
| retryable | boolean | Whether retry is recommended. |
| code | number \| undefined | gRPC status code (if available). |
| cause | unknown | Original underlying error. |
When to Use / When Not to Use
ServiceBridge is a good fit when you:
- Have 3+ microservices that need to communicate via RPC, events, or both
- Want RPC + events + workflows + jobs without managing separate infrastructure for each
- Need end-to-end tracing across all communication patterns in one timeline
- Want to eliminate sidecar proxies and reduce operational overhead
- Need durable event delivery with retry, DLQ, and replay without running a broker
- Are building AI/LLM pipelines and need realtime streaming with replay
Consider alternatives when you:
- Run a single monolith with no service decomposition plans
- Need ultra-high-throughput event streaming (100K+ msg/s sustained) — Kafka is purpose-built for this
- Need a full API gateway with rate limiting, auth plugins, and request transformation — use Kong/Envoy Gateway
- Already have a mature Istio/Linkerd mesh and only need traffic management (no events/workflows/jobs)
- Need multi-region event replication — ServiceBridge currently targets single-region deployments
v2 Session API
session_v2.ts implements the new Enterprise Session Protocol — a channel-based bidirectional stream with an 8-state FSM, adaptive heartbeat, and credit-based flow control. It is symmetric with the Go and Python SDKs.
Session lifecycle (8-state FSM)
connecting → handshaking → ready ↔ active
↘ suspended → (reconnect)
↘ draining → closed
↘ fenced (permanent)

| State | Description |
|---|---|
| connecting | TCP/TLS connection is being established |
| handshaking | Hello sent, waiting for HelloAck |
| ready | HelloAck received, no commands in flight |
| active | Commands are in flight |
| suspended | Heartbeat missed 2+ times |
| draining | Graceful shutdown initiated |
| fenced | Server sent GOAWAY_FENCED — the session is closed permanently |
| closed | Connection closed |
Quick start
import { V2SessionClient, validateV2Config } from 'service-bridge';
const cfg = {
serverAddress: 'localhost:9090',
serviceName: 'my-worker',
instanceId: 'worker-1',
zone: 'us-east-1a',
transportMode: 'direct' as const,
maxInflight: 64,
};
validateV2Config(cfg);
const session = new V2SessionClient(cfg);
// Send Hello on connect
const hello = session.getHelloFields();
// Handle HelloAck from the server
session.onHelloAck({
sessionId: 'sess-abc',
resumeToken: 'token-xyz',
epoch: 1n,
resumed: false,
resumeFromSeq: 0n,
replayedCommands: 0,
reconciledResults: 0,
heartbeatIntervalMs: 10_000,
heartbeatTimeoutMs: 30_000,
initialPermits: 64,
maxPermits: 128,
effectiveTransportMode: 'direct',
});
console.log(session.state); // 'ready'
// Входящая команда
const accepted = session.onCommandReceived(1n, 'cmd-001');
if (!accepted) {
// backpressure — permits = 0
}
// Команда выполнена
session.onCommandCompleted(1n, 'cmd-001');Адаптивный heartbeat (EWMA RTT)
import { AdaptiveHeartbeatV2 } from 'service-bridge';
const hb = new AdaptiveHeartbeatV2(10_000, 30_000);
// Pong received
hb.onPong(25); // rttMs
// Next interval (adapts via EWMA RTT)
const nextMs = hb.nextIntervalMs();
// Miss — speed up pings
const missCount = hb.onMiss();
if (missCount >= 2) {
  // reconnect
}

Algorithm: the base interval is intervalMs / 3; on misses it is divided by 2^miss (minimum 2s); with a stable RTT < 50ms it doubles (maximum 30s).
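The interval rule above can be sketched as a pure function. This is a hypothetical re-implementation for illustration (the `nextHeartbeatIntervalMs` helper and its exact composition of the three rules are assumptions, not the SDK's internal code):

```typescript
// Sketch of the documented rule: base = intervalMs / 3,
// halved per consecutive miss (2s floor), doubled on stable
// sub-50ms RTT (30s ceiling).
function nextHeartbeatIntervalMs(
  intervalMs: number,
  misses: number,
  stableRttMs: number | null,
): number {
  let next = intervalMs / 3;
  if (misses > 0) {
    next = Math.max(2_000, next / 2 ** misses);
  } else if (stableRttMs !== null && stableRttMs < 50) {
    next = Math.min(30_000, next * 2);
  }
  return next;
}

console.log(nextHeartbeatIntervalMs(10_000, 0, null)); // ≈3333 (base)
console.log(nextHeartbeatIntervalMs(10_000, 2, null)); // 2000 (floored at the 2s minimum)
console.log(nextHeartbeatIntervalMs(10_000, 0, 25));   // ≈6667 (doubled for stable fast RTT)
```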
Credit-based flow control
import { FlowControlStateV2 } from 'service-bridge';
const fc = new FlowControlStateV2(64, 1, 128);
if (fc.tryConsume()) {
  // dispatch command
}
// Command completed — return the permit
fc.release(1);
// Server sent FlowControlUpdate
fc.setWindow(32);

Reconnect and resume
BackoffV2 implements exponential backoff with full jitter (base=100ms, max=30s). On reconnect, getHelloFields() automatically includes resumeToken, epoch, lastReceivedSeq, lastSentSeq, and completedCommandIds — the server resumes the session from the correct position.
import { BackoffV2 } from 'service-bridge';
const backoff = new BackoffV2();
while (true) {
  if (backoff.isCircuitOpen()) break; // 10+ consecutive failures
  const delayMs = backoff.next();
  await new Promise(r => setTimeout(r, delayMs));
  try {
    // reconnect...
    backoff.reset();
    break; // connected
  } catch {
    backoff.recordFail();
  }
}

ConfigPush — dynamic transport configuration
At any time, the server can push a ConfigPush with new routing rules:
session.onConfigPush({
  defaultMode: 'direct',
  serviceOverrides: {
    'payment-svc': { mode: 'proxy', fallbackPolicy: 'fallback_to_direct' },
  },
  functionOverrides: {
    'payment-svc/charge': { mode: 'proxy', timeoutMs: 5000 },
  },
});
// Resolve the transport mode for a function
const mode = session.resolveTransportMode('payment-svc/charge'); // 'proxy'

All session events
| Method | Description |
|--------|-------------|
| getHelloFields() | Fields to send in Hello (first connect + resume) |
| onHelloAck(ack) | Handle HelloAck from the server |
| onCommandReceived(seq, id) | Incoming command; returns false on backpressure |
| onCommandCompleted(seq, id) | Command finished; releases a permit |
| onPermitGrant(n) | Server granted n additional permits |
| onFlowControlUpdate(size, reason) | Server changed the window size |
| onPong(rttMs) | Pong received; updates the EWMA |
| onHeartbeatMiss() | Pong timed out; returns true → suspended |
| onDrain(reason, deadlineMs) | Initiate a graceful drain |
| onGoaway(code, reason) | GoawaySignal from the server |
| onConfigPush(config) | Apply a new transport configuration |
| resolveTransportMode(fnName) | Get the transport mode for a function |
| stop() | Close the session immediately |
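A resolveTransportMode lookup can be sketched as a three-level fallback: function override, then service override, then the default mode. The sketch below is a hypothetical standalone re-implementation (the precedence order, the `resolveMode` helper, and the `service/function` naming split are assumptions; the real logic lives in ConfigPushStateV2):

```typescript
// Illustrative precedence: function override → service override → default.
type TransportMode = 'direct' | 'proxy';

interface TransportRules {
  defaultMode: TransportMode;
  serviceOverrides: Record<string, { mode: TransportMode }>;
  functionOverrides: Record<string, { mode: TransportMode }>;
}

function resolveMode(rules: TransportRules, fnName: string): TransportMode {
  const fnOverride = rules.functionOverrides[fnName];
  if (fnOverride) return fnOverride.mode;
  const service = fnName.split('/')[0]; // assumes 'service/function' naming
  const svcOverride = rules.serviceOverrides[service];
  if (svcOverride) return svcOverride.mode;
  return rules.defaultMode;
}

const rules: TransportRules = {
  defaultMode: 'direct',
  serviceOverrides: { 'payment-svc': { mode: 'proxy' } },
  functionOverrides: { 'payment-svc/refund': { mode: 'direct' } },
};

console.log(resolveMode(rules, 'payment-svc/refund')); // 'direct' (function override)
console.log(resolveMode(rules, 'payment-svc/charge')); // 'proxy'  (service override)
console.log(resolveMode(rules, 'user-svc/get'));       // 'direct' (default)
```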
Exported classes and types
| Symbol | Kind | Description |
|--------|------|-------------|
| V2SessionClient | class | Main session client |
| AdaptiveHeartbeatV2 | class | EWMA RTT heartbeat controller |
| FlowControlStateV2 | class | Credit-based flow control |
| BackoffV2 | class | Exponential backoff + circuit breaker |
| PositionTrackerV2 | class | Tracks seq / completed command IDs |
| ConfigPushStateV2 | class | Dynamic configuration manager |
| validateV2Config | function | Validates the config; throws Error |
| V2Config | interface | Session configuration |
| SessionStateV2 | type | Union of the 8 FSM states |
| TransportMode | type | 'direct' \| 'proxy' |
| HelloAckV2 | interface | HelloAck data from the server |
| TransportConfigV2 | interface | ConfigPush payload |
| ReconcileRequestV2 | interface | Declarative worker registration request |
| FunctionDeclarationV2 | interface | Function declaration for Reconcile |
| ConsumerGroupDeclarationV2 | interface | Consumer group declaration |
| HttpRouteDeclarationV2 | interface | HTTP route declaration |
| JobDeclarationV2 | interface | Job declaration |
| WorkflowDeclarationV2 | interface | Workflow declaration |
| SubscribeRequestV2 | interface | Registry subscribe request |
| WorkerEndpointV2 | interface | Worker endpoint info |
| IssueCertificateRequestV2 | interface | Certificate request |
| IssueCertificateResponseV2 | interface | Certificate response |
| CircuitBreakerConfigV2 | interface | Circuit breaker config |
| ZoneConfigV2 | interface | Zone-aware config |
| ServiceTransportOverride | interface | Per-service transport override |
| FunctionTransportOverride | interface | Per-function transport override |
| ResumeState | interface | Reconnect resume state |
Key types available for import:
import type {
  WorkflowStep,
  WorkerTLSOpts,
  RpcContext,
  EventContext,
  StreamWriter,
  TraceCtx,
  RetryPolicy,
  ServiceBridgeErrorSeverity,
} from "service-bridge";

FAQ
How does ServiceBridge handle service failures? RPC calls have configurable retries with exponential backoff and hard per-attempt timeouts, so a silent downstream service cannot keep a call pending forever. Events are durable (PostgreSQL-backed) with at-least-once delivery per consumer group. Failed deliveries are retried according to policy, then moved to DLQ. Workflows track step state and can be resumed.
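The retry behavior described above pairs exponential backoff with jitter; BackoffV2 documents full jitter with base=100ms and max=30s for reconnects. A standalone sketch of that delay calculation — illustrative only, and the `fullJitterDelayMs` helper is hypothetical, not an SDK API:

```typescript
// Full jitter: pick a uniform random delay in [0, min(cap, base * 2^attempt)].
// Base/cap defaults mirror the documented BackoffV2 values (100ms / 30s).
function fullJitterDelayMs(attempt: number, baseMs = 100, capMs = 30_000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * ceiling;
}

// Delays are random but always bounded by the exponential ceiling:
for (let attempt = 0; attempt < 12; attempt++) {
  const ceiling = Math.min(30_000, 100 * 2 ** attempt);
  const d = fullJitterDelayMs(attempt);
  console.log(`attempt ${attempt}: ${d.toFixed(0)}ms (ceiling ${ceiling}ms)`);
}
```

Full jitter spreads retries across the whole window, which avoids thundering-herd spikes when many workers reconnect at once.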
Is there vendor lock-in? ServiceBridge is self-hosted. The runtime is a single Go binary + PostgreSQL. SDK calls map to standard patterns (RPC, pub/sub, cron) — migrating away means replacing SDK calls with equivalent library calls.
How does tracing work without an OTEL collector? The SDK automatically reports trace spans for every RPC call, event publish/delivery, workflow step, and HTTP request. The runtime stores traces in PostgreSQL and serves them via the built-in dashboard and a Loki-compatible API for Grafana integration.
Can I use ServiceBridge alongside existing infrastructure? Yes. You can adopt incrementally — start with RPC between two services, add events later, then workflows. ServiceBridge doesn't require replacing your existing broker or mesh all at once.
What happens when the control plane is down? In-flight direct RPC calls continue working (they go service-to-service, not through the control plane). New discovery lookups, event publishes, and telemetry writes are queued in the SDK offline queue and flushed when the control plane recovers.
What databases does the runtime support? PostgreSQL 16+. The runtime uses PostgreSQL for all persistence: traces, events, workflows, jobs, service registry, and configuration.
Community and Support
- Website: servicebridge.dev
- GitHub: github.com/service-bridge
- SDK monorepo: README.md
License
Free for non-commercial use. Commercial use requires a separate license. See LICENSE.
Copyright (c) 2026 Eugene Surkov.
Keywords
service-bridge · servicebridge · npm install service-bridge · npm i service-bridge · bun add service-bridge · Node.js SDK · TypeScript SDK · JavaScript microservices · RPC · gRPC · event bus · event-driven · distributed tracing · workflow orchestration · background jobs · cron · mTLS · service mesh · service discovery · zero sidecar · Istio alternative · Envoy alternative · RabbitMQ alternative · Temporal alternative · Jaeger alternative · PostgreSQL · Docker · Kubernetes · DLQ · dead letter queue · saga · distributed transactions · AI agent orchestration · Express middleware · Fastify middleware · HTTP middleware · observability · Prometheus · tracing · service catalog · durable events · retries · idempotency · auto mTLS · runtime dashboard · production ready · microservice communication
