# Multi-agent orchestration with durable handoffs
Three specialist agents — researcher, writer, reviewer — chained through an orchestrator. Each agent's output is a durable checkpoint, so a flaky LLM call mid-pipeline retries only the failing agent and the prior outputs stay cached.
TypeScript: @resonatehq/sdk v0.10.1 (current). Python: resonate-sdk v0.6.x against the legacy Resonate Server. Rust example repo is forthcoming.
## The problem
Multi-agent pipelines look elegant on a whiteboard and brittle in production. Each LLM call is a roll of the dice — rate limits, transient errors, model regressions. If the third agent fails, naive implementations re-run the whole pipeline, paying for the first two LLM calls again. Add a human-approval step and the pipeline now needs to hold state across hours or days, which is its own coordination problem.
## Resonate's solution
Each agent invocation is a `ctx.run` call. The orchestrator function chains them in sequence; Resonate checkpoints every output. If the writer fails, only the writer retries — the researcher's findings are reused from the durable promise. Add a `ctx.promise()` between the reviewer and the publish step and you get human-in-the-loop suspension that survives process restarts.
## Code walkthrough
The orchestrator is one generator function that calls three plain agent functions through `ctx.run`. The agents themselves are unremarkable — what matters is that Resonate sits between them.
TypeScript:

```typescript
import type { Context } from "@resonatehq/sdk";
import { researcher, writer, reviewer } from "./agents";

export function* orchestrate(
  ctx: Context,
  topic: string,
  crashOnWriter: boolean,
): Generator<any, OrchestrationResult, any> {
  // 1. Researcher gathers findings on the topic.
  const findings = yield* ctx.run(researcher, topic);

  // 2. Writer produces a draft from those findings. Checkpointed —
  //    if the writer crashes, the researcher's findings are NOT re-run.
  const draft = yield* ctx.run(writer, topic, findings, crashOnWriter);

  // 3. Reviewer checks the draft quality.
  const review = yield* ctx.run(reviewer, draft);

  const approved = review.toUpperCase().includes("APPROVED");
  return {
    status: approved ? "published" : "rejected",
    topic, findings, draft, review,
  };
}
```

Python:

```python
from resonate import Context, Resonate

resonate = Resonate()

@resonate.register
def orchestrate(ctx: Context, topic: str, crash_on_writer: bool = False):
    # 1. Researcher gathers findings on the topic.
    findings = yield ctx.run(researcher, topic)

    # 2. Writer produces a draft from those findings. Checkpointed —
    #    if the writer crashes, the researcher's findings are NOT re-run.
    draft = yield ctx.run(writer, topic, findings, crash_on_writer)

    # 3. Reviewer checks the draft quality.
    review = yield ctx.run(reviewer, draft)

    approved = "APPROVED" in review.upper()
    return {
        "status": "published" if approved else "rejected",
        "topic": topic, "findings": findings, "draft": draft, "review": review,
    }
```

The OpenAI client is injected into the workflow via `resonate.set_dependency("openai", OpenAI(...))` and accessed inside each agent with `ctx.get_dependency("openai")` — this keeps the agent functions free of global imports and easy to test.
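As a concrete illustration, one agent might look like the sketch below. The function body is an assumption, not code from the example repo — it relies only on what the walkthrough states: agents call `ctx.get_dependency("openai")`, which implies `ctx.run` passes the context as the first argument. The model name and prompt are placeholders.

```python
# Hypothetical agent sketch — model name and prompt are illustrative,
# and the real repo's implementation may differ.
def researcher(ctx, topic: str) -> str:
    # The client was registered with resonate.set_dependency("openai", ...)
    openai = ctx.get_dependency("openai")
    resp = openai.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[{"role": "user", "content": f"Summarize key findings on: {topic}"}],
    )
    return resp.choices[0].message.content
```

Because the client comes through the context rather than a global import, a unit test can pass a stub context and never touch the network.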
The `crashOnWriter` / `crash_on_writer` argument is a demo lever — set it to `true` and the writer throws on its first attempt. Watch the logs: the researcher runs once, the writer retries until it succeeds, and the reviewer runs once. Without checkpoints, the researcher would re-run on every writer retry.
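The economics of that claim can be shown without the SDK at all. The toy below is not Resonate's API — just the principle it implements: cache each step's output by id, so when a crash forces a retry from the top, completed steps replay from the cache instead of re-executing.

```python
# Toy checkpoint model — illustrative only, not Resonate's API.
calls = {"researcher": 0, "writer": 0}
checkpoints: dict[str, str] = {}

def durable_run(step_id, fn, *args):
    # On replay, a completed step is served from the checkpoint cache.
    if step_id not in checkpoints:
        checkpoints[step_id] = fn(*args)
    return checkpoints[step_id]

def researcher(topic):
    calls["researcher"] += 1
    return f"findings about {topic}"

def writer(findings):
    calls["writer"] += 1
    if calls["writer"] == 1:
        raise RuntimeError("simulated writer crash")  # first attempt fails
    return f"draft from {findings}"

def pipeline(topic):
    findings = durable_run("researcher", researcher, topic)
    draft = durable_run("writer", writer, findings)
    return draft

for attempt in range(2):  # retry loop: re-run the whole pipeline
    try:
        result = pipeline("durable execution")
        break
    except RuntimeError:
        pass  # crash — retry from the top; cached steps are skipped
```

After the retry, the researcher has executed exactly once while the writer executed twice — the same shape you see in the example's logs.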
To extend the pipeline with human approval, replace the inline `approved` calculation with a durable promise:

TypeScript:

```typescript
// Suspend until an external system (HTTP handler, CLI, button click)
// resolves the promise.
const approved = yield* ctx.promise<boolean>({ id: `approval/${topic}` });
```

Python:

```python
# Suspend until an external system resolves the promise.
approval = yield ctx.promise(id=f"approval/{topic}")
approved = yield approval
```

That promise survives process restarts — the orchestrator can be killed and the workflow waits in the server's promise store until a human responds.
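The resolving side can be as small as one call. The sketch below is hypothetical — `client.promises.resolve(...)` is an assumed API shape, so verify the exact promise-resolution method against the resonate-sdk reference before relying on it. Injecting the client keeps the helper testable.

```python
# Hypothetical approval hook. The promises.resolve(...) call is an
# ASSUMED API shape — check the resonate-sdk docs for the real method.
def approve(client, topic: str, decision: bool) -> None:
    # Resolving the durable promise wakes the suspended orchestrator.
    client.promises.resolve(id=f"approval/{topic}", data=decision)
```

An HTTP handler, CLI command, or button click would call `approve` with the human's decision.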
## Run it locally
TypeScript — set up the repo:

```shell
git clone https://github.com/resonatehq-examples/example-multi-agent-orchestration-ts
cd example-multi-agent-orchestration-ts
bun install
```

Run the happy path:

```shell
bun run src/index.ts
```

Run with the writer crash flag — the researcher runs once, the writer retries:

```shell
bun run src/index.ts -- --crash
```

Python — set up the repo and your OpenAI key:

```shell
git clone https://github.com/resonatehq-examples/example-multi-agent-orchestration-py
cd example-multi-agent-orchestration-py
uv sync
cp .env.example .env  # add your OPENAI_API_KEY
```

Install and start the Resonate server in one terminal:

```shell
brew install resonatehq/tap/resonate
resonate serve
```

Start the worker in a second terminal:

```shell
uv run python -m src.agent
```

Invoke the workflow from a third terminal:

```shell
resonate invoke orchestration.1 \
  --func orchestrate \
  --arg '"The future of durable execution in AI applications"' \
  --arg false
```

Set the second `--arg` to `true` to trigger the writer-crash demo — the researcher's "Researching" log only appears once across the retry sequence.
## Try the human-in-the-loop extension
Swap the inline `approved` check for `ctx.promise()` and add a small HTTP server (or use the Resonate CLI) to resolve the promise. Now the orchestrator suspends after the reviewer step until a human decides whether to publish — for hours, days, or weeks. Worker crashes during the suspension don't lose progress; restart and the workflow is still waiting for the same promise.
## Related
- Deep research agent — recursive agent dispatch with parallel subagents.
- Human-in-the-loop — the suspension primitive in detail.