Crash failure recovery

Crash failure recovery is the ability of an application to continue its workflows after an Application Node crashes. This is also known as Durable Execution. It is critical in distributed systems, where a crash can halt operations or lead to unintended consequences such as duplicated side effects.

When an Application Node crashes mid-flow, it creates several challenges:

  • The node must be restarted, often requiring a supervisor to detect the crash and handle the recovery process.
  • Flows may have already initiated side effects (e.g., database writes or API calls), which must not be duplicated if they have irreversible consequences.
  • In multi-node environments, reassigning the execution to another node introduces additional complexity.

Resonate addresses these challenges with built-in crash recovery mechanisms. Imagine a flow involving a series of steps:

async function flow() {
  await step1(); // Creates a side effect
  await step2(); // Application Node crashes mid-step
  await step3();
}

  • For single-node setups, Resonate restarts the function execution automatically as soon as the process is back online.
  • For multi-node environments, Resonate can reassign the execution to another Application Node, ensuring the flow resumes seamlessly without duplicating the side effect from step 1.

This process is made reliable by Resonate’s Durable Promises, which guarantee that side effects are not duplicated unless explicitly allowed. Developers can focus on application logic, knowing that Resonate handles recovery and consistency behind the scenes.
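The idea of not repeating a completed step can be illustrated with a minimal sketch. This is not Resonate's API: the `runOnce` helper, the in-memory `Store`, and the step ids are all hypothetical, and the steps are synchronous to keep the example short. The sketch records each step's result under a stable id and replays the recorded result when the flow is re-run after a simulated crash.

```typescript
// Hypothetical in-memory stand-in for durable storage; a real system
// would persist these records outside the crashing process.
type Store = Map<string, unknown>;

// Run a step at most once per id: replay the recorded result if present.
function runOnce<T>(store: Store, id: string, step: () => T): T {
  if (store.has(id)) {
    return store.get(id) as T;
  }
  const result = step();
  store.set(id, result); // record only after the step completes
  return result;
}

let step1Calls = 0;
let crashOnce = true;

function step1(): string {
  step1Calls += 1; // stands in for an irreversible side effect
  return "step1-done";
}

function step2(): string {
  if (crashOnce) {
    crashOnce = false;
    throw new Error("simulated crash mid-step");
  }
  return "step2-done";
}

function step3(): string {
  return "step3-done";
}

// The flow from the text, with each step wrapped in runOnce.
function flow(store: Store): string[] {
  return [
    runOnce(store, "flow/step1", step1),
    runOnce(store, "flow/step2", step2),
    runOnce(store, "flow/step3", step3),
  ];
}

const store: Store = new Map();
let results: string[] = [];
try {
  results = flow(store); // first attempt: fails inside step2
} catch {
  results = flow(store); // "restart": step1 is replayed from the store
}
```

After the second attempt, `step1Calls` is still 1 because the restart reads the result of step 1 from the store instead of running it again. The sketch glosses over one case: a crash after a side effect happens but before its result is recorded would cause that step to run again, which is why such steps are usually also made idempotent.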

When you use Resonate to build applications, each Application Node in the Call Graph benefits from the ability to recover after a process crash.

If the process crashes, the Call Graph can resume on a completely different Application Node, or on the same Application Node once the process restarts.
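One common way to let a different node pick up work after a crash is a time-limited claim (a lease) that the owner must keep renewing. The sketch below illustrates that general technique only; it is an assumption for illustration, not a description of how Resonate reassigns executions. The `Task` shape, `tryClaim`, and `LEASE_MS` are hypothetical names.

```typescript
// Hypothetical task record; field names are illustrative only.
interface Task {
  id: string;
  owner: string | null;
  leaseExpiresAt: number; // ms timestamp after which the claim lapses
}

const LEASE_MS = 5000;

// A node may claim a task if it is unowned, if the current owner's lease
// has expired, or if the node already owns it (which renews the lease).
function tryClaim(task: Task, node: string, now: number): boolean {
  const free = task.owner === null || now >= task.leaseExpiresAt;
  if (!free && task.owner !== node) {
    return false;
  }
  task.owner = node;
  task.leaseExpiresAt = now + LEASE_MS;
  return true;
}

const task: Task = { id: "flow-1", owner: null, leaseExpiresAt: 0 };

const claimedByA = tryClaim(task, "node-a", 0);     // node-a starts the flow
const stolenEarly = tryClaim(task, "node-b", 1000); // rejected: lease still valid
// node-a crashes here and stops renewing its lease.
const takenOver = tryClaim(task, "node-b", 6000);   // accepted: lease expired
```

Here `node-b` cannot take the task while `node-a` holds a valid lease, but it can once the lease lapses. Combined with recorded step results, the new owner can resume the flow without repeating completed steps.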

For examples of how to connect to a Resonate Server from an Application Node to enable Call Graph recovery, see the following feature guidance:

For a deep dive into the different storage modes, see Durable Promises. For a deep dive into the Call Graph, see Call Graph.