@anterior/brrr
v0.1.251
Horizontally scalable workflow scheduling with pluggable backends
Brrr: high performance workflow scheduling
Differences between Brrr and other workflow schedulers:
- Queue & database agnostic. Others lock you in to e.g. PostgreSQL, which inevitably becomes an unscalable point of failure.
- The queue & database provide stability & concurrency guarantees. Others tend to reinvent the wheel and reimplement a half-hearted queue on top of a database. Brrr lets your queue and DB do what they do best.
- Looks sequential & blocking. Your Python code looks like a simple linear function.
- Not actually blocking. You don't need to lock up RAM in your fleet equivalent to the entire call graph's execution stack. In other words: a Brrr fleet's memory usage is O(fleet), not O(call graph).
- No logging, monitoring, error handling, or tracing. Brrr does one thing and one thing only: workflow scheduling. Bring Your Own Logging.
- No agent. Every worker connects directly to the underlying queue, jobs are scheduled by directly sending them to the queue. This allows massive parallelism: your only limit is your queue & DB capacity.
- No encoding choices: the only datatype seen by Brrr is "array of bytes". You must Bring Your Own Encoder.
- Dynamic call graph: no need to declare your dependencies in advance.
The lack of an agent means that you can use Brrr with SQS & DynamoDB to scale basically as far as your wallet can stretch, without any further config.
To summarize, these elements are not provided, and you must Bring Your Own:
- queue
- KV store
- logging
- tracing
- encoding
Brrr is a protocol that can be implemented in many languages. "It's just bytes on the wire."
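For illustration, a bring-your-own codec can be as small as the sketch below. The `encode`/`decode` method names are an assumption, since this README doesn't spell out the codec interface; the point is only that Brrr itself never sees anything but bytes:

```python
import json

class JsonCodec:
    """Hypothetical codec: Brrr only ever sees bytes, so the
    application decides how its values map to and from them."""

    def encode(self, value) -> bytes:
        return json.dumps(value).encode("utf-8")

    def decode(self, data: bytes):
        return json.loads(data.decode("utf-8"))
```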
Python
Brrr is a Python package, managed with uv, which you can import and use directly.
See the brrr_demo.py file for a full demo.
Highlights:
```python
import brrr

@brrr.handler
async def fib(app: brrr.ActiveWorker, n: int, salt=None):
    match n:
        case 0: return 0
        case 1: return 1
        case _: return sum(await app.gather(
            app.call(fib)(n - 2),
            app.call(fib)(n - 1),
        ))

@brrr.handler
async def fib_and_print(app: brrr.ActiveWorker, n: str):
    f = await app.call(fib)(int(n))
    print(f"fib({n}) = {f}", flush=True)
    return f

@brrr.handler_no_arg
async def hello(greetee: str):
    greeting = f"Hello, {greetee}!"
    print(greeting, flush=True)
    return greeting

async def amain():
    queue, store, cache, codec = ...
    async with brrr.serve(queue, store, cache) as conn:
        app = brrr.AppWorker(
            handlers=dict(fib=fib, hello=hello, fib_and_print=fib_and_print),
            codec=codec,
            connection=conn,
        )
        await conn.loop("demo", app.handle)
```

Note: the `.call(fib)` calls don't ever actually block for the execution of the underlying logic: the entire parent function is instead aborted and re-executed multiple times until all its inputs are available.
Benefit: your code looks intuitive.
Drawback: the call graph must be idempotent; for the same inputs, a task must always call the same sub-tasks with the same arguments. It is allowed to return a different result each time.
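For instance, reusing the fib handler from the demo above, the first task below violates that rule while the second is fine (both are hypothetical):

```python
import random
import time

import brrr

@brrr.handler
async def broken(app: brrr.ActiveWorker, n: int):
    # Broken: each re-execution may request a different child call,
    # so the call graph is not deterministic in the task's inputs.
    jitter = random.randint(0, 1)
    return await app.call(fib)(n + jitter)

@brrr.handler
async def fine(app: brrr.ActiveWorker, n: int):
    # Fine: the same child is called with the same argument every
    # time; only the returned value differs between executions.
    f = await app.call(fib)(n)
    return {"fib": f, "computed_at": time.time()}
```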
Development
There are two SDK implementations:
- Python, async
- TypeScript
A Nix devshell is provided for both languages which can be used for development and testing:
```
$ nix develop .#python  # or
$ nix develop .#typescript
```

A generic Nix devshell is provided with some tools on the path but without uv2nix, for managing the Python packages or fixing uv / NPM if the lock files break somehow (e.g. git conflicts):

```
$ nix develop
```

Demo
Requires Nix, with flakes enabled.
You can start the full demo without installation:
```
$ nix run github:anteriorai/brrr#demo
```

In the process list, select the worker process so you can see its output. Now in another terminal:
```
$ curl -X POST 'http://localhost:8080/hello?greetee=John'
```

You should see the worker print a greeting.
You can also run a Fibonacci job:
```
$ curl -X POST 'http://localhost:8080/calc_and_print?op=fib&n=11'
```

Implementation and Trade-Offs
Brrr avoids having to "pause" any parent task while waiting on child tasks by instead aborting the parent task entirely and retrying it later.
Fundamentally, that is the modus operandi; almost everything else follows from it (a sketch of the mechanism follows the list below).
- Your tasks must be safe for re-execution (this is a good idea anyway in distributed systems)
- Your tasks must be deterministic in the call graph. Tasks can dynamically specify their dependencies (by just calling them), but those must be the exact same dependencies with the exact same arguments every time the task is run with the same inputs.
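To make that concrete, here is an illustrative sketch of a worker's core loop. None of this is Brrr's actual code; the queue, store, and exception names are hypothetical and only show the shape of the mechanism:

```python
# Hypothetical sketch of the abort-and-retry mechanism, not Brrr's real code.

class MissingResults(Exception):
    """Raised (hypothetically) by a child call whose result is absent."""
    def __init__(self, calls):
        super().__init__()
        self.calls = calls

async def worker_loop(queue, store, topic, handlers):
    while True:
        task = await queue.pop(topic)
        try:
            # A child call whose result is not yet in the store raises,
            # unwinding the parent's stack frame immediately.
            result = await handlers[task.name](task.args)
        except MissingResults as missing:
            for child in missing.calls:
                await queue.push(child.topic, child)  # schedule the children
            # Remember to retry the parent once the children finish;
            # no stack frame or RAM is held in the meantime.
            await store.register_waiter(missing.calls, task)
            continue
        await store.put_result(task.key, result)
        # Re-enqueue any parents waiting on this result; they will
        # re-execute from the top, now finding the result in the store.
        for parent in await store.pop_waiters(task.key):
            await queue.push(parent.topic, parent)
```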
Additionally, there is no agent. Brrr works on the assumption that you can bring two pieces of infrastructure, with corresponding interface implementation:
- A queue with topics (Redis implementation provided)
- A k/v store with CAS (DynamoDB implementation provided)
The guarantees offered by these implementations are surfaced to the application layer. If your k/v store is write-after-write consistent, your application will have a consistent view of the call graph. If your store is only eventually consistent, a re-executed task may see contradictory values for its dependencies' return values. This trade-off is yours to make.
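As an illustration of the compare-and-set requirement, here is a minimal in-memory sketch; the method names are hypothetical and only meant to show the semantics a real backend such as DynamoDB (via conditional writes) provides:

```python
class InMemoryStore:
    """Hypothetical k/v store with compare-and-set (CAS)."""

    def __init__(self):
        self._data: dict[str, bytes] = {}

    def get(self, key: str) -> bytes | None:
        return self._data.get(key)

    def compare_and_set(self, key: str, expected: bytes | None, new: bytes) -> bool:
        # Atomically write `new` only if the current value equals `expected`.
        # Of several workers racing on the same key, exactly one wins.
        if self._data.get(key) != expected:
            return False
        self._data[key] = new
        return True
```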
Finally, Brrr itself offers a zero-or-more delivery guarantee. To get at-least-once delivery, put Brrr behind a job queue which has that capability, e.g. SQS.
Using a queue with at-least-once delivery as the Brrr queue itself is not enough to make the entire system at-least-once: Brrr does not carry around receipt handles for internal messages.
Topics
Brrr has no central scheduling agent. Workers contend for the same jobs and the only differentiation is through queue topics: e.g. you could have
- memory-intensive
- egress-access
- typescript
Workers have to specify on which topic(s) they listen, and once a worker listens on a topic it must be able to handle every incoming job.
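For instance, reusing the conn.loop call from the demo above (and assuming its first argument names the topic to consume), topic-dedicated workers might look like this hypothetical sketch:

```python
# Hypothetical: two workers consuming different topics.
# Assumes conn.loop(topic, handler) as used in the demo above.
async def run_memory_intensive_worker(conn, app):
    # Only receives jobs published to the "memory-intensive" topic,
    # so it can be deployed on large-RAM machines.
    await conn.loop("memory-intensive", app.handle)

async def run_egress_worker(conn, app):
    # Only receives jobs that need outbound network access.
    await conn.loop("egress-access", app.handle)
```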
Copyright & License
Brrr is authored by Anterior, based in NYC, USA.
We’re hiring! If you got this far, e-mail us at [email protected] and mention brrr.
The code is available under the AGPLv3 license (not later).
See the LICENSE file.
