# defuss-express
Extremely performant (!), express-compatible, auto-multi-core, QUIC/HTTP/3-enabled, WebSocket-capable, load-balanced, ultimate-express-powered server runtime for defuss.
defuss-express is a thin runtime wrapper around `ultimate-express` (and therefore compatible with the express API) that starts your app in one worker per CPU core and places a tiny TCP load balancer in front of those workers.
The intended developer experience is boring in the best way: change your import from `express` to `defuss-express`, and your app immediately scales across all available cores.
This package handles worker ports, process restarts, request fan-out, and worker telemetry using `defuss-open-telemetry` hooks.
You can even customize load balancing strategies based on request metadata and live worker stats, or just use the built-in round-robin or least-connections balancer algorithms.
## Features
- Node.js based for maximum compatibility with existing Express apps and middleware
- automatic multi-core startup
- TCP-level load balancing proxy in the primary process
- request-aware load balancing hooks
- default round-robin balancing
- worker CPU and memory telemetry over IPC
- worker auto-respawn
- `ultimate-express` re-exported as `express`
- zero app-level cluster boilerplate
- graceful shutdown handling with signal handlers and timeouts
- QUIC/HTTP/3 and WebSocket support out of the box (via `ultimate-express`)
- built-in benchmarks with realistic, real-world payloads
- as few 3rd-party dependencies as possible (`ultimate-express` and `defuss-open-telemetry` for the telemetry features)
## Install
```bash
bun/pnpm/yarn add defuss-express
```

Note: We use bun as a package manager only here. Node.js is the intended runtime for defuss-express apps.
## Basic usage
```ts
import { express, startServer, stopServer } from "defuss-express";

const app = express({ threads: 0 });

app.disable("x-powered-by");

app.get("/", (_req, res) => {
  res.status(200).send("hello");
});

await startServer(app);

process.on("SIGINT", () => {
  stopServer();
});

process.on("SIGTERM", () => {
  stopServer();
});
```

## Custom balancing
`loadBalancer` receives the parsed HTTP request head, the candidate backend list, and the client socket.
```ts
import {
  express,
  setServerConfig,
  startServer,
  type BackendCandidate,
} from "defuss-express";

const chooseLowestCpu = (candidates: BackendCandidate[]) =>
  [...candidates].sort(
    (left, right) => (left.stats?.cpuPercent ?? 0) - (right.stats?.cpuPercent ?? 0),
  )[0]!;

setServerConfig({
  loadBalancer: ({ request, candidates }) => {
    if (request.path?.startsWith("/realtime")) {
      return chooseLowestCpu(candidates);
    }
    return candidates[0]!;
  },
});

const app = express({ threads: 0 });

await startServer(app);
```

## Advanced server config
The full config object is passed to `startServer` or `setServerConfig` before startup. This example shows request-aware routing with a custom load balancer, opt-in telemetry via `defuss-open-telemetry`, and tuned timeouts:
```ts
import { express, startServer, type LoadBalancerContext } from "defuss-express";
import { createOpenTelemetrySink, OtelMeterAdapter } from "defuss-open-telemetry";
import { metrics } from "@opentelemetry/api";

// Custom load balancer: sticky sessions for /api, lowest CPU for everything else
const customBalancer = ({ request, candidates, previousIndex }: LoadBalancerContext) => {
  if (request.path?.startsWith("/api") && request.headers["x-session-id"]) {
    // Hash the session header to a stable backend index
    const hash = [...request.headers["x-session-id"]].reduce(
      (acc, ch) => ((acc << 5) - acc + ch.charCodeAt(0)) | 0,
      0,
    );
    return candidates[Math.abs(hash) % candidates.length]!;
  }
  // Default: pick the worker with the lowest CPU usage
  return [...candidates].sort(
    (a, b) => (a.stats?.cpuPercent ?? 0) - (b.stats?.cpuPercent ?? 0),
  )[0]!;
};

const app = express({ threads: 0 });

await startServer(app, {
  host: "0.0.0.0",
  port: 8080,
  workers: 4,
  loadBalancer: customBalancer,
  // Opt-in OpenTelemetry (omit for silent no-op)
  telemetry: createOpenTelemetrySink({
    meter: new OtelMeterAdapter(metrics.getMeter("my-app")),
    prefix: "defuss.express.",
  }),
  // Tuning
  requestInspectionTimeoutMs: 25, // more time to sniff headers
  maxHeaderBytes: 32 * 1024, // allow larger headers
  workerHeartbeatIntervalMs: 30_000,
  gracefulShutdownTimeoutMs: 15_000,
});
```

The `LoadBalancerContext` provides `candidates` (healthy backends with live CPU/memory stats), `request` (parsed method, path, host, headers), `socket` (raw TCP socket), and `previousIndex` (for round-robin tracking). Return the `BackendCandidate` that should receive the connection.
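For orientation, the context roughly looks like the sketch below. This is inferred from the examples in this README, not copied from the package; any field marked "assumed" may be named differently, so consult the exported type declarations for the authoritative shapes.

```ts
import type { Socket } from "node:net";

// Approximate shapes, inferred from the examples above (not the package's
// actual declarations).
interface BackendCandidateSketch {
  stats?: {
    cpuPercent?: number;    // live CPU usage reported by the worker over IPC
    memoryPercent?: number; // assumed: live memory usage field
  };
}

interface LoadBalancerContextSketch {
  request: {
    method?: string;
    path?: string;
    host?: string;
    headers: Record<string, string | undefined>;
  };
  candidates: BackendCandidateSketch[]; // healthy backends only
  socket: Socket;                       // raw client TCP socket
  previousIndex?: number;               // last selected index (round-robin state)
}
```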
## API
### express
Re-export of `ultimate-express`. Please see the `ultimate-express` documentation for details on compatibility and caveats/limitations. Almost all express features are supported, including WebSockets and HTTP/3. There are small differences and edge cases to be aware of, but they shouldn't be a concern for typical apps.
If you rely on advanced or rarely used express features, implement end-to-end tests for your app to verify compatibility before migrating to defuss-express. We haven't found any issues in our testing, but there are so many express features and combinations that we can't guarantee 100% compatibility in every edge case. That said, if you do find an issue, please report it! This package is actively maintained, and we will prioritize fixes to ensure maximum compatibility with the express API.
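As a minimal starting point, a smoke test like the following can catch gross incompatibilities early. This is a sketch using Node's built-in test runner; it assumes the app from "Basic usage" is already listening on `http://127.0.0.1:3000`, so adjust the URL and assertions to your setup.

```ts
import { test } from "node:test";
import assert from "node:assert/strict";

// Assumed base URL of a locally running defuss-express app (see "Basic usage").
const BASE_URL = "http://127.0.0.1:3000";

test("GET / responds with hello", async () => {
  const res = await fetch(`${BASE_URL}/`);
  assert.equal(res.status, 200);
  assert.equal(await res.text(), "hello");
});
```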
### setServerConfig(config)
Sets the global runtime config; call it before `startServer(app)`.
```ts
setServerConfig({
  host: "0.0.0.0",
  port: 3000,
  workerHost: "127.0.0.1",
  baseWorkerPort: 3001,
  workers: "auto",
  workerHeartbeatIntervalMs: 60_000,
  workerHeartbeatStaleAfterMs: 150_000,
  requestInspectionTimeoutMs: 10,
  maxHeaderBytes: 16 * 1024,
  gracefulShutdownTimeoutMs: 10_000,
  respawnWorkers: true,
  installSignalHandlers: true,
});
```

### startServer(app, config?)
Starts the runtime. In the primary process it forks workers and starts the TCP balancer. In worker processes it binds the app to a worker port.
### stopServer(graceful = true)
Stops the balancer or the worker server, depending on the current process role.
A graceful shutdown (default) first stops accepting new connections, then waits for in-flight requests to finish until `gracefulShutdownTimeoutMs` is reached, at which point it forcefully terminates any remaining connections and exits.
A non-graceful shutdown immediately terminates all connections and exits.
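A minimal sketch of both modes:

```ts
import { stopServer } from "defuss-express";

// Graceful (default): stop accepting connections, drain in-flight requests,
// then force-close anything still open once gracefulShutdownTimeoutMs elapses.
stopServer();

// Non-graceful: terminate all connections and exit immediately.
stopServer(false);
```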
## Built-in load balancer strategies
- `roundRobinLoadBalancer`
- `leastConnectionsLoadBalancer`
- `resourceAwareLoadBalancer`
- `defaultLoadBalancer`
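To opt into one of them explicitly, pass it as the `loadBalancer`. The sketch below assumes the strategies are exported from the package root; adjust the import if your version exposes them elsewhere.

```ts
import { setServerConfig, leastConnectionsLoadBalancer } from "defuss-express";

// Route each new connection to the worker with the fewest open connections.
setServerConfig({ loadBalancer: leastConnectionsLoadBalancer });
```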
## Implementation details and design notes
**Should I store state in memory in my app? Can I simply declare a global variable to share state across requests or per session? Does this handle statefulness?**
General advice: VERY BAD IDEA. If you have a say in the architecture, DO NOT keep any state in process memory if you want your system to be horizontally scalable!
Memory is usually not shared across processes, so when a request hits the load balancer it can be routed to any of the available worker processes. Your app will behave flakily if the request sometimes hits the worker that holds the expected in-memory state and sometimes hits a worker that doesn't. This is a fundamental consequence of the multi-core model and of how load balancers work.
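A tiny illustration of the problem (a hypothetical route, for illustration only):

```ts
import { express } from "defuss-express";

const app = express({ threads: 0 });

// Anti-pattern: this counter lives in each worker's own memory. With several
// workers behind the balancer, clients see the count jump around depending on
// which worker happens to handle their request.
let requestCount = 0; // per-process, NOT shared across workers

app.get("/count", (_req, res) => {
  requestCount += 1;
  res.json({ count: requestCount, pid: process.pid });
});
```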
Therefore, you can:

- Use an external state store. `defuss-sharedmemory`, `defuss-redis`, or `defuss-db` are great candidates for that.
- Use `defuss-redis` or `defuss-db` if you need to share state across multiple machines, or if you want the convenience of a higher-level data model and don't mind the extra latency of a network round trip to your state store.
- Use `defuss-sharedmemory` if you only deploy to a single host (no horizontal scaling across multiple machines) and want the lowest possible latency for state access.
That being said, if you have other specific requirements for statefulness (e.g. you need sticky sessions, or you have Bearer token/API auth with state managed per session in the memory of a single process), you MUST implement a custom load balancer strategy via `setServerConfig` to pin requests to a specific process as soon as the machine you're running on has more than one CPU core/hyperthread. You can use the request metadata (headers, path, etc.) to hash to a specific backend index (see `defuss-hash` for federated hashing).
## Operational notes on multi-core behavior
The same entry module is executed in both the primary process and each worker. That means app construction should be deterministic and free of one-shot side effects that are only safe in a single process.
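For example, you can guard one-shot work so it only runs in the primary process. The sketch below assumes defuss-express forks workers with Node's built-in cluster module; if a different mechanism is used, substitute the appropriate role check.

```ts
import cluster from "node:cluster";
import { express, startServer } from "defuss-express";

const app = express({ threads: 0 });

// Assumption: workers are forked via node:cluster, so cluster.isPrimary
// distinguishes the balancer process from the workers.
if (cluster.isPrimary) {
  // One-shot side effects (e.g. running migrations, scheduling a cron job)
  // belong here so they don't execute once per worker.
  console.log("primary: running one-shot startup work");
}

app.get("/", (_req, res) => res.send("ok"));

await startServer(app);
```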
The primary process does not terminate TLS or parse full HTTP bodies. It only sniffs the request head long enough to let a custom balancer inspect method, path, and headers, then proxies bytes to the selected worker.
WebSocket upgrades continue to work because the proxy is plain TCP after backend selection.
## Commands

```bash
bun run check
bun run test
bun run build
bun run bench         # throughput benchmark (DSON payloads)
bun run bench:server  # start benchmark server standalone
```

## Benchmarks
Measured with autocannon on an Apple MacBook Air M4 (8 cores), 100 concurrent connections, 10 s per scenario plus 3 s warmup. The server uses `resourceAwareLoadBalancer`. Payloads are serialized/deserialized through `defuss-dson` (a typed superset of JSON supporting Date, Map, Set, RegExp, BigInt, Uint8Array, …).
```bash
bun run bench
```

| Scenario | Avg Req/s | p50 | p99 | Throughput |
|---|---|---|---|---|
| GET /dson/generate (complex typed object) | 36,384 | 2 ms | 13 ms | 186 MB/s |
| POST /dson/echo (small, 3 fields) | 52,841 | 1 ms | 4 ms | 18 MB/s |
| POST /dson/echo (medium, 100 users) | 10,168 | 8 ms | 25 ms | 179 MB/s |
| POST /dson/transform (enrich 100 users) | 10,560 | 8 ms | 23 ms | 188 MB/s |
| POST /dson/echo (large, 500 records + binary) | 750 | 120 ms | 299 ms | 228 MB/s |
Zero errors across all scenarios. Results scale linearly with core count.
## License
MIT
