Load balancing across worker instances

Run multiple workers in a group; Resonate dispatches work to whichever is available and recovers it when one dies.

Run several workers in the same group. Calls that set target: "poll://any@<group>" are dispatched to whichever worker claims them first. When a worker dies mid-execution, Resonate reassigns the work to a survivor, with no service registry, no leader election, and no glue code.

SDK versions

TypeScript: @resonatehq/sdk v0.10.2 (current). Python: resonate-sdk v0.6.x against the legacy Resonate Server. Rust: 0.4.0, in active development. Go: pre-release — no semver tag yet, so the example pins a specific commit (see Go SDK).

The problem

A single worker eventually runs out of capacity, and a single worker is also a single point of failure. The textbook fix is to run several, but that raises its own questions: which worker has spare capacity, how does the caller find one, what happens when the chosen worker dies mid-job, and who takes the work over?

Most teams end up bolting service discovery, load balancing, and recovery onto application code in three different places, each with its own bugs.

Resonate's solution

Resonate ships service discovery, load balancing, and crash recovery behind one primitive: the target schema. Workers in the same group long-poll the server; the caller dispatches with target: "poll://any@<group>" and the server hands the work to whichever worker is ready. If that worker dies before completing, the workflow's durable promise stays open and another worker in the group picks it up.
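
The target string follows a fixed shape, so it can be built in one place rather than retyped at each call site. A minimal sketch: anyTarget is a hypothetical helper, not part of @resonatehq/sdk, and the allowed character set for group names is an assumption made for this example rather than a documented Resonate rule.

```typescript
// Hypothetical helper (not part of @resonatehq/sdk): builds the
// "poll://any@<group>" target string described above.
function anyTarget(group: string): string {
  // Assumption for this sketch: group names are non-empty and contain only
  // letters, digits, "-" and "_". Resonate's real rules may differ.
  if (!/^[A-Za-z0-9_-]+$/.test(group)) {
    throw new Error(`invalid group name: ${JSON.stringify(group)}`);
  }
  return `poll://any@${group}`;
}

console.log(anyTarget("workers")); // poll://any@workers
```

The result can be passed wherever the walkthrough writes the literal, for example resonate.options({ target: anyTarget("workers") }).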

Code walkthrough

Two pieces: a worker that registers a durable function and joins a group, and a client that dispatches work to that group.

The worker group

Each worker process is identical except for the group it joins. Run as many as you want — they share work automatically.

worker.ts·typescript
import { Resonate } from "@resonatehq/sdk";
import type { Context } from "@resonatehq/sdk";

const resonate = new Resonate({
  url: "http://localhost:8001",
  group: "workers",
});

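// Simulates a job that takes `computeCost` seconds. The function awaits the
// timer so it only completes once the simulated work has finished; a bare
// setTimeout would let it return immediately.
async function computeSomething(ctx: Context, args: { id: string; computeCost: number }) {
  console.log(`${args.id} starting computation`);
  await new Promise((resolve) => setTimeout(resolve, args.computeCost * 1000));
  console.log(`${args.id} computed something that cost ${args.computeCost} seconds`);
}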

resonate.register("computeSomething", computeSomething);
console.log("worker is running...");

Dispatching to the group

The caller picks a target with the poll://any@<group> schema. any means "whichever worker in the group claims it first."

client.ts·typescript
import { Resonate } from "@resonatehq/sdk";
import { v4 as uuid } from "uuid";

const resonate = new Resonate({
  url: "http://localhost:8001",
  group: "client",
});

const id = uuid();
const computeCost = Math.floor(Math.random() * 10) + 1;
await resonate.beginRpc(
  id,
  "computeSomething",
  { id, computeCost },
  resonate.options({ target: "poll://any@workers" }),
);
await resonate.stop();
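
To send several jobs from one client process instead of starting the client repeatedly, the dispatch call can be wrapped in a small batching helper. The sketch below is illustrative only: dispatchBatch, Job, and Dispatch are hypothetical names, and the helper takes the actual send function as a parameter so it does not depend on the SDK. The ID prefix defaults to a timestamp so repeated runs do not reuse IDs.

```typescript
// Hypothetical types and helper, not part of @resonatehq/sdk.
type Job = { id: string; computeCost: number };
type Dispatch = (job: Job) => Promise<unknown>;

async function dispatchBatch(
  dispatch: Dispatch,
  count: number,
  prefix: string = `batch-${Date.now()}`,
): Promise<Job[]> {
  const jobs: Job[] = [];
  for (let i = 0; i < count; i++) {
    // Same random cost range as client.ts: 1 to 10 seconds.
    jobs.push({ id: `${prefix}-${i}`, computeCost: Math.floor(Math.random() * 10) + 1 });
  }
  // Start every dispatch concurrently and wait for all of them to settle.
  await Promise.all(jobs.map((job) => dispatch(job)));
  return jobs;
}

// Wiring it to the client above, using the same calls as client.ts:
//
// await dispatchBatch(
//   (job) =>
//     resonate.beginRpc(
//       job.id,
//       "computeSomething",
//       job,
//       resonate.options({ target: "poll://any@workers" }),
//     ),
//   6,
// );
```

Passing the send function in keeps the helper easy to exercise with a stub before pointing it at a running server.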

Run it locally

Start the server, run several workers, then dispatch repeatedly from the client.

shell
git clone https://github.com/resonatehq-examples/example-load-balancing-ts
cd example-load-balancing-ts
npm install

Terminal 1: Resonate Server·shell
brew install resonatehq/tap/resonate
resonate dev

Terminals 2–4: three workers·shell
npx tsx worker.ts

Terminal 5: dispatch in a loop·shell
for i in 1 2 3 4 5 6; do npx tsx client.ts; done

Watch the work spread across the three worker terminals. Now kill one of them mid-execution — Resonate reassigns its in-flight workflow to a survivor.

Try the recovery story

Start three workers and dispatch enough jobs to keep all of them busy. Kill the worker holding a long-running job. The Resonate Server detects the loss, reassigns the workflow's durable promise to one of the survivors, and the work continues. The client never sees an error — it just gets a slightly delayed result.
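
The sequence above (first claim wins, a dead worker's unfinished job is released, a survivor claims it) can be pictured with a toy in-memory model. This is a conceptual sketch only: ToyGroup and its methods are hypothetical, and it does not reflect how the Resonate Server detects failures or stores promises.

```typescript
// Toy model of a worker group. NOT Resonate's implementation; it only
// mirrors the claim / die / reassign behavior described in the text.
type Task = { id: string; owner: string | null; done: boolean };

class ToyGroup {
  private workers = new Set<string>();
  private tasks = new Map<string, Task>();

  join(worker: string): void {
    this.workers.add(worker);
  }

  // A dispatched task starts unowned, so any live worker may claim it.
  dispatch(id: string): void {
    if (!this.tasks.has(id)) this.tasks.set(id, { id, owner: null, done: false });
  }

  // First claim wins; claims on owned or finished tasks are rejected.
  claim(worker: string, id: string): boolean {
    const task = this.tasks.get(id);
    if (!task || task.done || task.owner !== null) return false;
    if (!this.workers.has(worker)) return false;
    task.owner = worker;
    return true;
  }

  complete(worker: string, id: string): boolean {
    const task = this.tasks.get(id);
    if (!task || task.done || task.owner !== worker) return false;
    task.done = true;
    return true;
  }

  // Simulated crash: drop the worker and release its unfinished tasks.
  kill(worker: string): string[] {
    this.workers.delete(worker);
    const released: string[] = [];
    this.tasks.forEach((task) => {
      if (task.owner === worker && !task.done) {
        task.owner = null;
        released.push(task.id);
      }
    });
    return released;
  }
}

const group = new ToyGroup();
group.join("w1");
group.join("w2");
group.dispatch("job-1");
console.log(group.claim("w1", "job-1"));    // true: first claim wins
console.log(group.claim("w2", "job-1"));    // false: already owned by w1
console.log(group.kill("w1"));              // [ 'job-1' ]: released on crash
console.log(group.claim("w2", "job-1"));    // true: a survivor takes over
console.log(group.complete("w2", "job-1")); // true
```

The task record outlives the worker that held it, which is the property the durable promise provides in the real system: the job is released and reclaimed, not lost.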

The TypeScript, Python, and Rust examples run each worker as its own process, so you can Ctrl-C a single one to watch this happen. The Go example runs its workers inside one process, so it demonstrates distribution but not single-worker recovery. For the Go crash-recovery story with a separate worker process, see the recursive factorial example.