# Multi-agent orchestration with durable handoffs
Three specialist agents — researcher, writer, reviewer — chained through an orchestrator. Each agent's output is a durable checkpoint, so a flaky LLM call mid-pipeline retries only the failing agent and the prior outputs stay cached.
TypeScript: @resonatehq/sdk v0.10.1 (current). Python: resonate-sdk v0.6.x against the legacy Resonate Server. Rust example repo is forthcoming.
## The problem
Multi-agent pipelines look elegant on a whiteboard and brittle in production. Each LLM call is a roll of the dice — rate limits, transient errors, model regressions. If the third agent fails, naive implementations re-run the whole pipeline, paying for the first two LLM calls again. Add a human-approval step and the pipeline now needs to hold state across hours or days, which is its own coordination problem.
## Resonate's solution
Each agent invocation is a `ctx.run` call. The orchestrator function chains them in sequence; Resonate checkpoints every output. If the writer fails, only the writer retries — the researcher's findings are reused from the durable promise. Add a `ctx.promise()` between the reviewer and the publish step and you get human-in-the-loop suspension that survives process restarts.
## Code walkthrough
The orchestrator is one generator function that calls three plain agent functions through `ctx.run`. The agents themselves are unremarkable — what matters is that Resonate sits between them.
TypeScript:

```typescript
import type { Context } from "@resonatehq/sdk";
import { researcher, writer, reviewer } from "./agents";

export function* orchestrate(
  ctx: Context,
  topic: string,
  crashOnWriter: boolean,
): Generator<any, OrchestrationResult, any> {
  // 1. Researcher gathers findings on the topic.
  const findings = yield* ctx.run(researcher, topic);

  // 2. Writer produces a draft from those findings. Checkpointed —
  //    if the writer crashes, the researcher's findings are NOT re-run.
  const draft = yield* ctx.run(writer, topic, findings, crashOnWriter);

  // 3. Reviewer checks the draft quality.
  const review = yield* ctx.run(reviewer, draft);

  const approved = review.toUpperCase().includes("APPROVED");
  return {
    status: approved ? "published" : "rejected",
    topic, findings, draft, review,
  };
}
```

Python:

```python
from resonate import Context, Resonate

resonate = Resonate()

@resonate.register
def orchestrate(ctx: Context, topic: str, crash_on_writer: bool = False):
    # 1. Researcher gathers findings on the topic.
    findings = yield ctx.run(researcher, topic)

    # 2. Writer produces a draft from those findings. Checkpointed —
    #    if the writer crashes, the researcher's findings are NOT re-run.
    draft = yield ctx.run(writer, topic, findings, crash_on_writer)

    # 3. Reviewer checks the draft quality.
    review = yield ctx.run(reviewer, draft)

    approved = "APPROVED" in review.upper()
    return {
        "status": "published" if approved else "rejected",
        "topic": topic, "findings": findings, "draft": draft, "review": review,
    }
```

The OpenAI client is injected into the workflow via `resonate.set_dependency("openai", OpenAI(...))` and accessed inside each agent with `ctx.get_dependency("openai")` — this keeps the agent functions free of global imports and easy to test.
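As a concrete illustration, one agent might look like the sketch below. The function body is an assumption, not code from the example repo — it relies only on what the walkthrough states: agents call `ctx.get_dependency("openai")`, which implies `ctx.run` passes the context as the first argument. The model name and prompt are placeholders.

```python
# Hypothetical agent sketch — model name and prompt are illustrative,
# and the real repo's implementation may differ.
def researcher(ctx, topic: str) -> str:
    # The client was registered with resonate.set_dependency("openai", ...)
    openai = ctx.get_dependency("openai")
    resp = openai.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[{"role": "user", "content": f"Summarize key findings on: {topic}"}],
    )
    return resp.choices[0].message.content
```

Because the client comes through the context rather than a global import, a unit test can pass a stub context and never touch the network.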
The `crashOnWriter` / `crash_on_writer` argument is a demo lever — set it to `true` and the writer throws on its first attempt. Watch the logs: the researcher runs once, the writer retries until it succeeds, and the reviewer runs once. Without checkpoints, the researcher would re-run on every writer retry.
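The economics of that claim can be shown without the SDK at all. The toy below is not Resonate's API — just the principle it implements: cache each step's output by id, so when a crash forces a retry from the top, completed steps replay from the cache instead of re-executing.

```python
# Toy checkpoint model — illustrative only, not Resonate's API.
calls = {"researcher": 0, "writer": 0}
checkpoints: dict[str, str] = {}

def durable_run(step_id, fn, *args):
    # On replay, a completed step is served from the checkpoint cache.
    if step_id not in checkpoints:
        checkpoints[step_id] = fn(*args)
    return checkpoints[step_id]

def researcher(topic):
    calls["researcher"] += 1
    return f"findings about {topic}"

def writer(findings):
    calls["writer"] += 1
    if calls["writer"] == 1:
        raise RuntimeError("simulated writer crash")  # first attempt fails
    return f"draft from {findings}"

def pipeline(topic):
    findings = durable_run("researcher", researcher, topic)
    draft = durable_run("writer", writer, findings)
    return draft

for attempt in range(2):  # retry loop: re-run the whole pipeline
    try:
        result = pipeline("durable execution")
        break
    except RuntimeError:
        pass  # crash — retry from the top; cached steps are skipped
```

After the retry, the researcher has executed exactly once while the writer executed twice — the same shape you see in the example's logs.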
To extend the pipeline with human approval, replace the inline `approved` calculation with a durable promise:

TypeScript:

```typescript
// Suspend until an external system (HTTP handler, CLI, button click)
// resolves the promise.
const approved = yield* ctx.promise<boolean>({ id: `approval/${topic}` });
```

Python:

```python
# Suspend until an external system resolves the promise.
approval = yield ctx.promise(id=f"approval/{topic}")
approved = yield approval
```

That promise survives process restarts — the orchestrator can be killed and the workflow waits in the server's promise store until a human responds.
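The resolving side can be as small as one call. The sketch below is hypothetical — `client.promises.resolve(...)` is an assumed API shape, so verify the exact promise-resolution method against the resonate-sdk reference before relying on it. Injecting the client keeps the helper testable.

```python
# Hypothetical approval hook. The promises.resolve(...) call is an
# ASSUMED API shape — check the resonate-sdk docs for the real method.
def approve(client, topic: str, decision: bool) -> None:
    # Resolving the durable promise wakes the suspended orchestrator.
    client.promises.resolve(id=f"approval/{topic}", data=decision)
```

An HTTP handler, CLI command, or button click would call `approve` with the human's decision.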
## Run it locally
TypeScript — set up the repo:

```shell
git clone https://github.com/resonatehq-examples/example-multi-agent-orchestration-ts
cd example-multi-agent-orchestration-ts
bun install
```

Run the happy path:

```shell
bun run src/index.ts
```

Run with the writer crash flag — the researcher runs once, the writer retries:

```shell
bun run src/index.ts -- --crash
```

Python — set up the repo and your OpenAI key:

```shell
git clone https://github.com/resonatehq-examples/example-multi-agent-orchestration-py
cd example-multi-agent-orchestration-py
uv sync
cp .env.example .env  # add your OPENAI_API_KEY
```

Install and start the Resonate server in one terminal:

```shell
brew install resonatehq/tap/resonate
resonate serve
```

Start the worker in a second terminal:

```shell
uv run python -m src.agent
```

Invoke the workflow from a third terminal:

```shell
resonate invoke orchestration.1 \
  --func orchestrate \
  --arg '"The future of durable execution in AI applications"' \
  --arg false
```

Set the second `--arg` to `true` to trigger the writer-crash demo — the researcher's "Researching" log only appears once across the retry sequence.
## Try the human-in-the-loop extension
Swap the inline `approved` check for `ctx.promise()` and add a small HTTP server (or use the Resonate CLI) to resolve the promise. Now the orchestrator suspends after the reviewer step until a human decides whether to publish — for hours, days, or weeks. Worker crashes during the suspension don't lose progress; restart and the workflow is still waiting for the same promise.
## Related
- Deep research agent — recursive agent dispatch with parallel subagents.
- Human-in-the-loop — the suspension primitive in detail.