@quicore/expressjs
v3.0.0
Fast-start Express server with production-grade scaling primitives.
Fast-start Express server with sensible defaults, a power-user plugin system, and production-grade scaling primitives built in.
import { QuicoreExpressServer } from '@quicore/expressjs';
const server = new QuicoreExpressServer();
server.setRoutes([{ method: 'get', path: '/', handlers: (req, res) => res.send('ok') }]);
await server.start();
This still works exactly as before. The v3 additions are opt-in unless a section below says otherwise.
What's new in v3
Production HTTP timeouts (on by default)
new QuicoreExpressServer({
timeouts: {
keepAlive: 65_000, // > LB idle timeout (typical: 60s)
headers: 66_000, // must be > keepAlive (Node requirement)
request: 30_000, // total request time cap
},
});
Defaults are tuned for environments behind AWS ALB / GCP LB / Cloudflare. The
keepAlive > LB idle ordering matters: it prevents the LB from sending a
request on a socket the origin is about to close (which surfaces as random
502s on the client side).
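As a rough illustration of the ordering rule above, the sketch below maps a `timeouts` object onto the `keepAliveTimeout`, `headersTimeout`, and `requestTimeout` properties that Node documents on `http.Server`. The names `applyTimeouts` and `outlivesLoadBalancer` are hypothetical, and whether @quicore/expressjs applies the values this way internally is an assumption.

```javascript
// Sketch only: not the package's implementation.
// `server` is any object shaped like a Node http.Server.
function applyTimeouts(server, { keepAlive, headers, request }) {
  if (headers <= keepAlive) {
    throw new RangeError('headers timeout must be greater than keepAlive');
  }
  server.keepAliveTimeout = keepAlive; // how long an idle keep-alive socket is kept (ms)
  server.headersTimeout = headers;     // max time to receive the complete request headers (ms)
  server.requestTimeout = request;     // max time to receive the entire request (ms)
  return server;
}

// Hypothetical check: the origin should hold idle sockets longer than the LB does.
function outlivesLoadBalancer(keepAliveMs, lbIdleTimeoutMs) {
  return keepAliveMs > lbIdleTimeoutMs;
}
```

Running `outlivesLoadBalancer` against your load balancer's configured idle timeout at startup is one cheap way to catch a misordered pair before it shows up as intermittent 502s.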
Load shedding (opt-in)
Three independent signals to reject excess work fast — protect the process instead of letting it collapse:
new QuicoreExpressServer({
loadShed: {
maxInFlight: 1000, // 503 above this many concurrent requests
maxEventLoopLagMs: 200, // 503 when p99 event-loop lag > 200ms
maxConnections: 10_000, // TCP-level cap (server.maxConnections)
retryAfterSeconds: 1, // included in 503 response
},
});
At 5–20k RPS you absolutely want at least maxInFlight set: without it, a
slow downstream causes unbounded queue growth in your process.
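To make the maxInFlight signal concrete, here is a minimal sketch of an in-flight cap written as Express-style middleware. It is not the package's implementation; `createInFlightShedder` and the `state` object are hypothetical names used only for illustration.

```javascript
// Sketch only: per-process concurrency cap with a fast 503.
function createInFlightShedder({ maxInFlight, retryAfterSeconds = 1 }) {
  const state = { inFlight: 0, shed: 0 };

  function middleware(req, res, next) {
    if (state.inFlight >= maxInFlight) {
      // Reject immediately instead of queueing more work in the process.
      state.shed += 1;
      res.statusCode = 503;
      res.setHeader('Retry-After', String(retryAfterSeconds));
      res.end('Service Unavailable');
      return;
    }
    state.inFlight += 1;
    let released = false;
    const release = () => {
      if (released) return; // 'finish' and 'close' can both fire for one response
      released = true;
      state.inFlight -= 1;
    };
    res.on('finish', release);
    res.on('close', release);
    next();
  }

  return { middleware, state };
}
```

The counter is released on whichever of `finish` or `close` fires first, so aborted requests do not leak slots.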
Metrics exposed at server.metrics:
{
inFlight: 234,
maxInFlight: 1000,
eventLoopLagMs: 12.5,
maxEventLoopLagMs: 200,
shed: { inFlight: 0, eventLoop: 0 },
}
/ready (via healthPlugin) reads these and returns 503 when overloaded, so
the LB stops sending traffic before things break.
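A readiness decision over a metrics snapshot of the shape shown above could look roughly like this. `readinessStatus` is a hypothetical helper, and the exact comparison rules healthPlugin uses are an assumption.

```javascript
// Sketch only: map a metrics snapshot (and a draining flag) to a status code.
function readinessStatus(metrics, draining = false) {
  const tooBusy = metrics.inFlight >= metrics.maxInFlight;
  const tooLaggy = metrics.eventLoopLagMs > metrics.maxEventLoopLagMs;
  return draining || tooBusy || tooLaggy ? 503 : 200;
}
```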
HTTP/2 (opt-in)
new QuicoreExpressServer({
http: { port: 8080, http2: true }, // h2c — plaintext HTTP/2
https: { enabled: true, port: 8443, ssl: {...}, http2: true }, // h2 over TLS, with HTTP/1.1 fallback
});
allowHTTP1: true is set so the same server handles both HTTP/1.1 and HTTP/2
clients. All existing Express middleware works unchanged.
Use this when origin-terminating H2 (service mesh, gRPC-Web, server push). For edge-terminated deployments behind CloudFront/Cloudflare, leave it off — the edge handles H2/H3 and talks H1 to your origin.
Two-phase graceful shutdown
The shutdown sequence is now drain-then-close, which is critical in K8s/LB environments:
new QuicoreExpressServer({
shutdown: {
drainDelayMs: 5_000, // pause so LB deregisters us
gracePeriodMs: 30_000, // then close, allow in-flight to finish
},
});
Sequence on SIGTERM:
- _draining = true, draining event fires, /ready starts returning 503
- Wait drainDelayMs for LB to remove us from rotation
- server.close(): refuse new connections
- closeIdleConnections() (Node 18.2+): drop idle keep-alive sockets
- Wait up to gracePeriodMs for in-flight requests to complete
- closeAllConnections(): destroy survivors
- Exit
The infra triggers this (K8s preStop hook + SIGTERM); the class handles the choreography.
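The drain-then-close sequence can be sketched against a plain Node `http.Server` as follows. This is an illustration rather than the class's actual code: `twoPhaseShutdown` and `markDraining` are hypothetical names, and the function leaves the final `process.exit()` to the caller.

```javascript
// Sketch only: two-phase shutdown for anything shaped like http.Server.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function twoPhaseShutdown(server, { drainDelayMs, gracePeriodMs }, markDraining = () => {}) {
  markDraining();            // phase 1: readiness starts failing
  await sleep(drainDelayMs); // give the LB time to take us out of rotation

  // Phase 2: stop accepting, drop idle sockets, wait for in-flight work.
  const closed = new Promise((resolve) => server.close(() => resolve('drained')));
  if (typeof server.closeIdleConnections === 'function') {
    server.closeIdleConnections(); // available from Node 18.2
  }

  let timer;
  const expired = new Promise((resolve) => {
    timer = setTimeout(() => resolve('forced'), gracePeriodMs);
  });
  const outcome = await Promise.race([closed, expired]);
  clearTimeout(timer);

  if (outcome === 'forced' && typeof server.closeAllConnections === 'function') {
    server.closeAllConnections(); // destroy whatever is still open
  }
  return outcome; // 'drained' or 'forced'
}
```

A typical wiring would be `process.once('SIGTERM', () => twoPhaseShutdown(server, opts, flipReadiness).then(() => process.exit(0)))`, where `flipReadiness` is whatever makes /ready return 503.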
Streaming body strategies (the FHIR/HL7 fix)
The old captureRaw buffered every request body in V8 heap. At 10MB payloads
× 100 concurrent requests = 1GB heap, OOM. v3 has three bounded alternatives:
import { rawBodyCapture, streamBodyToFile, streamBodyToHandler } from '@quicore/expressjs';
import { pipeline } from 'node:stream/promises'; // used by example 3
// 1. Bounded raw Buffer — for HMAC signature verification (small payloads)
server.setRoutes([{
method: 'post',
path: '/webhooks/stripe',
handlers: [rawBodyCapture({ limit: '256kb' }), verifyAndHandleWebhook],
}]);
// 2. Stream large bodies to a temp file — for FHIR Bundles / HL7 batches
server.setRoutes([{
method: 'post',
path: '/fhir/Bundle',
handlers: [streamBodyToFile({ limit: '500mb', dir: '/var/quicore/uploads' }), processBundle],
}]);
// In your handler: req.bodyFile = { path, size, contentType }
// File is auto-deleted on response finish.
// 3. Hand the raw stream to your handler — for direct forwarding to S3/Kafka
server.setRoutes([{
method: 'post',
path: '/hl7/batch',
handlers: streamBodyToHandler(async (req, res) => {
await pipeline(req, await createS3UploadStream(req));
res.status(202).json({ received: true });
}, { limit: '5gb' }),
}]);
All three:
- Enforce hard size limits (413 on overflow, socket destroyed — no further bytes read)
- Skip if a body parser already consumed the body upstream
- Mount per-route (NOT globally) — the default JSON parser still handles your normal routes
captureRaw at the server level (bodyParser.json.captureRaw: true) is now off by default. It still works for backward compat, but the streaming middleware is the production answer.
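The hard size limit can be illustrated with a small bounded collector. This is a sketch of the general technique, not the code behind rawBodyCapture; `boundedRawBody` and `req.rawBody` are hypothetical names, and the limit is given in bytes rather than as a string like '256kb'.

```javascript
// Sketch only: buffer a request body up to limitBytes, answer 413 on overflow.
function boundedRawBody(limitBytes) {
  return function middleware(req, res, next) {
    const chunks = [];
    let received = 0;
    let rejected = false;

    req.on('data', (chunk) => {
      if (rejected) return; // ignore anything that arrives after the limit was hit
      received += chunk.length;
      if (received > limitBytes) {
        rejected = true;
        chunks.length = 0; // release what was buffered so far
        res.statusCode = 413;
        res.setHeader('Connection', 'close');
        // Destroy the socket once the 413 has been flushed.
        res.end('Payload Too Large', () => req.destroy());
        return;
      }
      chunks.push(chunk);
    });

    req.on('end', () => {
      if (rejected) return;
      req.rawBody = Buffer.concat(chunks);
      next();
    });
  };
}
```

Memory use per request is capped at roughly `limitBytes`, which is the property that keeps total heap proportional to the limit times concurrency instead of to whatever clients choose to send.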
What's deliberately NOT in scope
These are infra concerns, not server-class concerns:
- Clustering / worker_threads: Use K8s replicas, PM2, or Node's built-in cluster module. One process per container is the modern norm.
- HTTP/3 / QUIC: Edge-terminate at Cloudflare/ALB. Node's QUIC support isn't production-ready.
- Distributed rate limiting: Use a sidecar (Envoy) or external store (Redis). The framework offers per-instance limits via plugins.
- TLS certificate rotation: Use cert-manager / certbot / your secrets manager. Restart the process to pick up new certs (K8s rolling restart handles this).
- Metrics scraping: Expose server.metrics via the optional /metrics endpoint or wire it to your Prometheus client. Not bundled.
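If you wire the metrics up yourself, one possible approach is to render the snapshot in the Prometheus text exposition format. The sketch below assumes the snapshot shape shown earlier and treats the shed values as monotonically increasing counters; the metric names and the `toPrometheusText` helper are hypothetical.

```javascript
// Sketch only: turn a metrics snapshot into Prometheus text format.
function toPrometheusText(metrics) {
  const gauges = {
    quicore_in_flight_requests: metrics.inFlight,
    quicore_event_loop_lag_ms: metrics.eventLoopLagMs,
  };
  const counters = {
    quicore_shed_in_flight_total: metrics.shed.inFlight,
    quicore_shed_event_loop_total: metrics.shed.eventLoop,
  };
  const lines = [];
  for (const [name, value] of Object.entries(gauges)) {
    lines.push(`# TYPE ${name} gauge`, `${name} ${value}`);
  }
  for (const [name, value] of Object.entries(counters)) {
    lines.push(`# TYPE ${name} counter`, `${name} ${value}`);
  }
  return lines.join('\n') + '\n';
}
```

In practice a Prometheus client library would usually handle the formatting; the point here is only that the snapshot maps onto two gauges and two counters.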
Recommended config for 5–20k RPS per instance
new QuicoreExpressServer({
http: { port: 3000 }, // edge-terminate TLS
trustProxy: 1, // honor X-Forwarded-* from one proxy hop
timeouts: {
keepAlive: 65_000,
headers: 66_000,
request: 30_000,
},
loadShed: {
maxInFlight: 2000, // tune from observed p99 concurrency
maxEventLoopLagMs: 200,
maxConnections: 10_000,
},
bodyParser: {
json: { limit: '1mb' }, // captureRaw off; use streaming middleware for large payloads
},
shutdown: {
drainDelayMs: 10_000, // K8s preStop sleep should match
gracePeriodMs: 30_000, // K8s terminationGracePeriodSeconds should be > this
},
logLevel: 'info',
});
K8s deployment companions (rough):
spec:
  terminationGracePeriodSeconds: 45 # > drainDelayMs + gracePeriodMs
  containers:
    - name: app
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "10"] # matches drainDelayMs
      readinessProbe:
        httpGet: { path: /ready, port: 3000 }
        periodSeconds: 2
      livenessProbe:
        httpGet: { path: /health, port: 3000 }
        periodSeconds: 10
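When tuning these numbers it can help to compute the termination budget explicitly. The sketch below assumes the preStop sleep runs to completion before SIGTERM is delivered and that the in-process drain delay then starts afterwards; Kubernetes counts preStop time against terminationGracePeriodSeconds. `shutdownSlackSeconds` is a hypothetical helper.

```javascript
// Sketch only: seconds left over before the kubelet would send SIGKILL.
// A negative result means the shutdown sequence may be cut short.
function shutdownSlackSeconds({ terminationGracePeriodSeconds, preStopSleepSeconds, drainDelayMs, gracePeriodMs }) {
  const neededSeconds = preStopSleepSeconds + (drainDelayMs + gracePeriodMs) / 1000;
  return terminationGracePeriodSeconds - neededSeconds;
}
```

Under that sequential assumption, the values above (45 s grace period, 10 s preStop sleep, 10 s drain delay, 30 s grace) leave a slack of -5 s, so you may want a larger terminationGracePeriodSeconds or a shorter phase. If your setup treats the preStop sleep as a substitute for the drain delay rather than an addition to it, the budget is correspondingly looser.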