@choubo/ordosdk

v0.4.0

Published

9 days ago

ordo task scheduler Node.js worker SDK (zero-dep, ESM)

Downloads

340

0High
0Medium
0Low

choubo

ordo task-scheduler worker sdk

ordo — Node.js SDK

Zero-dependency worker SDK for the ordo task scheduler. Node 18+ (uses global fetch).

Quick start

import { Client, Success, Fail, BUSINESS_FAILURE } from './ordo.mjs';

const client = new Client({
  baseUrl:  'http://localhost:8080',
  token:    'devtoken',
  domain:   'demo',
  workerId: 'node-1',
});

client.register('greet', async (run) => {
  if (run.shouldBail()) return Fail(BUSINESS_FAILURE, 'cancelled by operator');
  return Success({ hello: run.payload.who ?? 'anon' });
});

await client.run();    // blocks; Ctrl+C for clean shutdown

See example_worker.mjs for a runnable demo.

Two ways to use

The SDK is split into a high-level loop and a set of low-level primitives. Pick one — they share the same Client instance, but you should not mix them on the same client (don't call claim() while run() is running).

1. High-level: `Client.run()`

The default. You register() handlers and call run(); the SDK owns the claim loop, heartbeats, the 409 abort signal, and graceful SIGINT/SIGTERM. Suitable for the common "one worker process, one slot" shape and what the quick-start example shows.

2. Low-level: `claim` / `heartbeat` / `report` (+ `createRun`)

For multi-slot pools, custom drain semantics, integration with an external abort bus, or any other case where the high-level loop is too opinionated. Each method maps 1:1 to one HTTP endpoint:

const claimed = await client.claim({ signal: drainSignal });
if (claimed) {
  // …handle claimed.run on your own slot. Heartbeat at <= leaseSeconds/4:
  const t = setInterval(async () => {
    const { alive, cancelRequested } = await client.heartbeat(
      claimed.run.id, claimed.lease_token, claimed.fencing_token);
    if (!alive) abortYourHandler();         // D4: abort + drop /result
  }, (client.leaseSeconds * 1000) / client.heartbeatRatio);
  // …on completion:
  clearInterval(t);
  await client.report(claimed.run.id, claimed.lease_token, claimed.fencing_token,
                      Success({ ok: true }));
}

Caller responsibilities when bypassing run():

Fencing. Every heartbeat() and report() call MUST carry the exact fencing_token from claim(). Never compute or cache it across claims.
Cadence. Heartbeat at <= leaseSeconds / heartbeatRatio (default 30s with 120s lease). One missed cadence + a brief network blip and the sweeper reclaims you.
409 = abort. When heartbeat() returns { alive: false }, abort in-flight handler work immediately and DROP report() — the server would 409 it on fencing anyway, but the side effects you started have already happened.
Drain. claim({ signal }) accepts an external AbortSignal so a long poll can be interrupted on shutdown without waiting for wait_seconds to elapse.

A handler that wants to fan out follow-up runs from inside its own slot:

await client.createRun({
  definition_id: 42,
  payload: { parent_run_id: run.id, /* …work-specific input… */ },
});

This is just POST /api/v1/runs. The scheduler stays domain-agnostic (D7/D8) — parent/child linkage lives in your payload, not in the scheduler's schema.

A handler that wants to leave a breadcrumb in its own audit stream:

client.register('charge', async (run) => {
  await run.progress('step1: fetched invoice', { invoice_id: 42 });
  // ...do work...
  await run.progress('step2: charged');
  return Success({ ok: true });
});

run.progress(message, attributes?) is fire-and-forget — any HTTP error is logged at WARN and swallowed; a failed checkpoint write is never a reason to fail the run. Persisted as a progress row in task_run_events. Server does NOT throttle, so call at milestones (≤ ~1Hz typical), not in tight loops.

Contract (mirrors sdk/go/client Go SDK)

Long-poll claim → POST /api/v1/claim with wait_seconds=20.
Heartbeat every leaseSeconds / 4 in the background (default leaseSeconds=120).
requestTimeoutMs (Config, default 10_000) caps the short calls (heartbeat / report / progress / createRun). Per-call opts.timeoutMs overrides. claim() is independent — its timeout is derived from waitSeconds + 10.
Every /heartbeat + /result carries fencing_token.
/heartbeat 409 → lease lost. The SDK calls AbortController.abort() on run.signal; the handler can await fetch/timers wired against it for instant cancel. /result is not sent.
run.shouldBail() — 0 RTT; true on cancel_requested OR lease loss (the abort-signal aborted). Use run.cancelRequested() / run.leaseLost() if you need to distinguish.
run.signal — a real AbortSignal. Pass to fetch({ signal }) or any AbortSignal-aware API for fully interruptible I/O.
Handler throw → Fail(HANDLER_BUG, msg) reported.
SIGINT / SIGTERM → stop the claim loop after the in-flight run finishes.

`shouldBail()` checkpoint pattern

client.register('charge', async (run) => {
  // run.signal makes I/O interruptible: fetch will reject with AbortError
  // the moment the lease is lost, so the irreversible call below is never
  // entered after a 409.
  const invoice = await fetch(`/invoices/${run.payload.id}`, { signal: run.signal })
                          .then((r) => r.json());

  if (run.shouldBail()) return Fail(BUSINESS_FAILURE, 'cancelled');

  await chargeCard(invoice);                // irreversible
  return Success({ charged: invoice.amount });
});

Fencing is the authoritative defense — if cancel arrives between the check and the charge, /result will 409 and the worker drops the run. shouldBail() is the "bail early, unwind cleanly" lever; run.signal is the "kill in-flight I/O the moment the lease is gone" lever.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme