# Load balancing across worker instances
Run multiple workers in a group; Resonate dispatches work to whichever is available, recovers it when one dies.
Run several workers in the same group. Calls to `target: "poll://any@<group>"` get dispatched to whichever worker claims them first. When a worker dies mid-execution, Resonate reassigns the work to a survivor — no service registry, no leader election, no glue.
SDK versions: TypeScript `@resonatehq/sdk` v0.10.1 (current); Python `resonate-sdk` v0.6.x (targets the legacy Resonate Server); Rust v0.4.0 (in active development).
Worker group with random-cost compute jobs dispatched via async RPC (the Rust example uses `spawn()`).
## The problem
A single worker eventually runs out of capacity, and a single worker is also a single point of failure. The textbook fix is to run several — but that opens a can of worms of its own: which worker has spare capacity? How does the caller find one? What happens when the chosen worker dies mid-job? Who takes over the work?
Most teams end up bolting service discovery, load balancing, and recovery onto application code in three different places, each with its own bugs.
## Resonate's solution
Resonate ships service discovery, load balancing, and crash recovery behind one primitive: the target schema. Workers in the same group long-poll the server; the caller dispatches with `target: "poll://any@<group>"` and the server hands the work to whichever worker is ready. If that worker dies before completing, the workflow's durable promise stays open and another worker in the group picks it up.
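The target string itself is just a small URI. As a rough mental model only — the `Target` shape and `parseTarget` helper below are illustrative and not part of any Resonate SDK — it decomposes like this:

```typescript
// Illustrative only: a toy decomposition of a poll target URI.
// Neither Target nor parseTarget exists in the Resonate SDKs.
interface Target {
  scheme: string;   // transport, e.g. "poll" for long-polling workers
  selector: string; // "any" = whichever group member claims the task first
  group: string;    // the worker group name
}

function parseTarget(target: string): Target {
  const match = /^([a-z]+):\/\/([a-z]+)@([\w-]+)$/.exec(target);
  if (!match) throw new Error(`malformed target: ${target}`);
  const [, scheme, selector, group] = match;
  return { scheme, selector, group };
}

const t = parseTarget("poll://any@workers");
console.log(t.scheme, t.selector, t.group); // poll any workers
```

The group half is the only part you vary in this example: every worker that joins `workers` becomes a candidate for any call targeted at `poll://any@workers`.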
## Code walkthrough
Two pieces: a worker that registers a durable function and joins a group, and a client that dispatches work to that group.
### The worker group
Each worker process is identical except for the group it joins. Run as many as you want — they share work automatically.
**TypeScript**

```typescript
import { Resonate } from "@resonatehq/sdk";
import type { Context } from "@resonatehq/sdk";

const resonate = new Resonate({
  url: "http://localhost:8001",
  group: "workers",
});

async function computeSomething(ctx: Context, args: { id: string; computeCost: number }) {
  console.log(`${args.id} starting computation`);
  // Simulate computeCost seconds of work; awaited so the job stays
  // in flight (and killable mid-execution) until it finishes.
  await new Promise((resolve) => setTimeout(resolve, args.computeCost * 1000));
  console.log(`${args.id} computed something that cost ${args.computeCost} seconds`);
}

resonate.register("computeSomething", computeSomething);
console.log("worker is running...");
```

**Python**

```python
from resonate import Resonate
from threading import Event
import time

resonate = Resonate.remote(group="worker-group")

@resonate.register
def compute_something(_, id, compute_cost):
    print(f"starting computation {id}")
    time.sleep(compute_cost)  # simulate compute_cost seconds of work
    print(f"computed something that cost {compute_cost} seconds")

resonate.start()
print("worker running...")
Event().wait()  # keep the process alive
```

**Rust**

```rust
use resonate::prelude::*;
use std::time::Duration;

#[resonate::function]
async fn compute_something(ctx: &Context, id: String, compute_cost: u64) -> Result<()> {
    println!("{id} starting computation");
    ctx.sleep(Duration::from_secs(compute_cost)).await?;
    println!("{id} computed something that cost {compute_cost} seconds");
    Ok(())
}

#[tokio::main]
async fn main() {
    let resonate = Resonate::new(ResonateConfig {
        url: Some("http://localhost:8001".into()),
        group: Some("workers".into()),
        ..Default::default()
    });
    resonate.register(compute_something).unwrap();
    println!("worker is running...");
    tokio::signal::ctrl_c().await.unwrap(); // keep the process alive
}
```

### Dispatching to the group
The caller picks a target with the `poll://any@<group>` schema. `any` means "whichever worker in the group claims it first."
**TypeScript**

```typescript
import { Resonate } from "@resonatehq/sdk";
import { v4 as uuid } from "uuid";

const resonate = new Resonate({
  url: "http://localhost:8001",
  group: "client",
});

const id = uuid();
const computeCost = Math.floor(Math.random() * 10) + 1;

await resonate.beginRpc(
  id,
  "computeSomething",
  { id, computeCost },
  resonate.options({ target: "poll://any@workers" }),
);

resonate.stop();
```

**Python**

```python
from resonate import Resonate
from uuid import uuid4
from random import randint

resonate = Resonate.remote(group="invoke-group")

promise_id = str(uuid4())
compute_cost = randint(1, 10)

_ = resonate.options(target="poll://any@worker-group").begin_rpc(
    promise_id, "compute_something", promise_id, compute_cost,
)
```

**Rust**

```rust
use rand::Rng;
use resonate::prelude::*;
use uuid::Uuid;

#[tokio::main]
async fn main() {
    let resonate = Resonate::new(ResonateConfig {
        url: Some("http://localhost:8001".into()),
        ..Default::default()
    });

    let id = Uuid::new_v4().to_string();
    let cost: u64 = rand::thread_rng().gen_range(1..=10);

    let _: () = resonate
        .rpc(&id, "compute_something", (id.clone(), cost))
        .target("poll://any@workers")
        .spawn()
        .await
        .unwrap();

    resonate.stop().await;
}
```

## Run it locally
Start the server, run several workers, then dispatch repeatedly from the client.
**TypeScript**

```shell
git clone https://github.com/resonatehq-examples/example-load-balancing-ts
cd example-load-balancing-ts
npm install

# terminal 1: start the Resonate Server
brew install resonatehq/tap/resonate
resonate dev

# terminals 2-4: start three workers
npx tsx worker.ts

# terminal 5: dispatch six jobs
for i in 1 2 3 4 5 6; do npx tsx client.ts; done
```

Watch the work spread across the three worker terminals. Now kill one of them mid-execution — Resonate reassigns its in-flight workflow to a survivor.

**Python**

```shell
git clone https://github.com/resonatehq-examples/example-load-balancing-py
cd example-load-balancing-py
uv sync

# terminal 1: start the (legacy) Resonate Server
brew install resonatehq/tap/resonate
resonate serve

# terminals 2-4: start three workers
uv run python worker.py

# terminal 5: dispatch six jobs
for i in 1 2 3 4 5 6; do uv run python invoke.py; done
```

Watch the work spread across the three worker terminals. Kill one of them mid-execution — Resonate reassigns its in-flight workflow to a survivor.

**Rust**

```shell
git clone https://github.com/resonatehq-examples/example-load-balancing-rs
cd example-load-balancing-rs
cargo build

# terminal 1: start the Resonate Server
brew install resonatehq/tap/resonate
resonate dev

# terminals 2-4: start three workers
cargo run --bin worker

# terminal 5: dispatch six jobs
for i in 1 2 3 4 5 6; do cargo run --bin client; done
```

## Try the recovery story
Start three workers and dispatch enough jobs to keep all of them busy. Kill the worker holding a long-running job. The Resonate Server detects the loss, reassigns the workflow's durable promise to one of the survivors, and the work continues. The client never sees an error — it just gets a slightly delayed result.
## Related
- Human-in-the-loop — same group dispatch, with a workflow that suspends on a durable promise.
- Async HTTP API endpoints — long-running HTTP work without holding the connection.