
Failure detection and recovery

Before discussing failure detection and recovery, we need to define what a failure is.

Failure

A failure occurs when any condition prevents a function from running to completion.

At the application level, a failure happens when a function throws an exception, returns an error, or rejects a promise.

At the platform level, a failure happens when the host executing the function crashes or becomes unresponsive.

Failure detection

Every Application Node built with a Resonate SDK can react to application-level failures: the Resonate SDK listens for exceptions, errors, and rejected promises.
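The idea of "listening" for application-level failures can be pictured with the sketch below. It is an illustration, not the Resonate SDK's code: the hypothetical `detectFailure` helper runs a function and reports a failure whether the function throws synchronously or returns a promise that rejects.

```typescript
// Hypothetical outcome type; not part of the Resonate SDK.
type Outcome =
  | { ok: true; value: unknown }
  | { ok: false; error: unknown };

// Runs fn and turns both thrown exceptions and rejected promises
// into a failed Outcome.
async function detectFailure(fn: () => unknown): Promise<Outcome> {
  try {
    // `await` converts a rejected promise into a thrown exception,
    // so a single catch block handles both failure modes.
    const value = await fn();
    return { ok: true, value };
  } catch (error) {
    return { ok: false, error };
  }
}

async function demoDetection(): Promise<void> {
  const thrown = await detectFailure(() => {
    throw new Error("boom");
  });
  const rejected = await detectFailure(() => Promise.reject(new Error("nope")));
  const succeeded = await detectFailure(() => 42);
  console.log(thrown.ok, rejected.ok, succeeded.ok); // false false true
}

demoDetection();
```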

When your Application Node is configured with a Resonate Server as its supervisor, the supervisor can also detect and react to platform-level failures.
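A common general technique for detecting a crashed or unresponsive process is a heartbeat with a timeout. The sketch below illustrates that technique only; it is not a description of how the Resonate Server works, and `HeartbeatMonitor`, `beat`, `failedNodes`, and the timeout value are hypothetical. Timestamps are passed in explicitly so the example is deterministic.

```typescript
// Hypothetical heartbeat-based failure detector (illustrative only).
class HeartbeatMonitor {
  private lastSeen = new Map<string, number>();
  private readonly timeoutMs: number;

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  // Called whenever a node reports that it is still alive.
  beat(nodeId: string, now: number): void {
    this.lastSeen.set(nodeId, now);
  }

  // A node is presumed failed if it has not reported within timeoutMs.
  failedNodes(now: number): string[] {
    const failed: string[] = [];
    this.lastSeen.forEach((seen, nodeId) => {
      if (now - seen > this.timeoutMs) failed.push(nodeId);
    });
    return failed;
  }
}

const monitor = new HeartbeatMonitor(3000);
monitor.beat("node-a", 0);
monitor.beat("node-b", 0);
monitor.beat("node-a", 2500); // node-b has stopped reporting
console.log(monitor.failedNodes(4000).join(",")); // node-b
```

Once a node is presumed failed, a supervisor can hand its unfinished work to another node, which is the platform-level recovery described below.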

Recovery

There are two levels of recovery:

  • Application level, where a function throws an error or rejects a promise.
  • Platform level, where the Application Node crashes or becomes unresponsive.

Application level

At the application level, when a failure is detected, the desired recovery is often to retry the function.

Figure: Local in-memory promise storage with a retry

For example, in TypeScript, to have a function retried when it fails, throw an error from within the function:

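async function download(ctx: Context, url: string): Promise<string> {
  // ... attempt the download ...

  // Throwing an error signals a failure; the Resonate SDK
  // responds by retrying the function.
  throw new Error("download failed");
}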

Alternatively, return a rejected promise:

async function download(ctx: Context, url: string): Promise<string> {
  // ... attempt the download ...

  // Returning a rejected promise also signals a failure.
  return new Promise<string>((_resolve, reject) => {
    reject(new Error("download failed"));
  });
}
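The SDK performs the retries for you; the sketch below only illustrates what "retry on failure" means. The `retry` helper, its `maxAttempts` and `baseDelayMs` parameters, and the exponential backoff policy are assumptions for illustration, not Resonate's actual retry policy.

```typescript
// Hypothetical helper: re-runs fn until it succeeds or attempts run out.
// Illustrative only; Resonate's built-in retry policy may differ.
async function retry<T>(
  fn: (attempt: number) => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 10,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // A thrown error or a rejected promise counts as a failed attempt.
      return await fn(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Exponential backoff: baseDelayMs, 2x, 4x, ...
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}

// Usage: a simulated download that fails on its first two attempts.
async function flakyDownload(attempt: number): Promise<string> {
  if (attempt < 3) throw new Error("download failed");
  return "file contents";
}

retry(flakyDownload).then((body) => console.log(body)); // file contents
```

Backoff spaces out the attempts so that a transient problem, such as a briefly unavailable server, has time to clear before the next try.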

Platform level

At the platform level, when a failure is detected, the desired behavior is often to resume the function execution, either when the crashed process comes back up or on a different Application Node entirely. Resonate's remote Durable Promise storage (the Resonate Server) makes both options possible.

Figure: Remote promise storage with retries

Because the results of completed steps are stored in Durable Promises, the new function execution effectively resumes from the point of failure instead of restarting from the beginning.
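The following sketch illustrates why a new execution can resume rather than restart. It is not how the Resonate SDK is implemented: a module-level `Map` stands in for the remote Durable Promise storage, the hypothetical `step` helper stores each step's result under an id, and a thrown error stands in for a process crash.

```typescript
// Stand-in for remote Durable Promise storage (the Resonate Server in the
// real system). Assumption: a Map that outlives each execution attempt.
const store = new Map<string, unknown>();

// Hypothetical helper: runs a step once per id. If an earlier execution
// already stored the step's result, that result is returned instead.
async function step<T>(id: string, fn: () => Promise<T>): Promise<T> {
  if (store.has(id)) {
    return store.get(id) as T;
  }
  const value = await fn();
  store.set(id, value);
  return value;
}

const log: string[] = [];
let simulateCrash = true;

async function workflow(): Promise<string> {
  const data = await step("fetch", async () => {
    log.push("fetch ran");
    return "data";
  });
  if (simulateCrash) {
    simulateCrash = false;
    // Stands in for the process dying after the first step completed.
    throw new Error("process crashed");
  }
  return step("transform", async () => {
    log.push("transform ran");
    return data.toUpperCase();
  });
}

async function demoResume(): Promise<void> {
  await workflow().catch(() => undefined); // first execution fails mid-way
  const result = await workflow(); // new execution resumes
  // "fetch" ran only once; its stored result was reused.
  console.log(result, log.join(" | ")); // DATA fetch ran | transform ran
}

demoResume();
```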

Timeouts

Functions await other functions through Durable Promises.

In Resonate, timeouts are associated with promise resolution.

Resonate attempts to resolve a Durable Promise, retrying as needed, until the specified timeout. If the timeout is reached first, Resonate marks the promise as failed.

const resonate = new Resonate({
  // Configures the default durable promise
  // timeout in ms, used for every function
  // executed by calling resonate.run.
  // Defaults to 1000.
  timeout: 5000,
});
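The retry-until-timeout behavior can be sketched as follows. This is an illustration under stated assumptions, not Resonate's implementation: `resolveWithTimeout` and its `timeoutMs` and `retryDelayMs` parameters are hypothetical, with `timeoutMs` playing the role of the `timeout` option above, and the `"failed"` state mirrors "marks the promise as failed" rather than any real Resonate state name.

```typescript
// Hypothetical result type for the sketch.
type PromiseResult =
  | { state: "resolved"; value: unknown }
  | { state: "failed"; reason: string };

// Retries fn until it succeeds or timeoutMs has elapsed since the first
// attempt. An attempt already in flight is not interrupted.
async function resolveWithTimeout(
  fn: () => Promise<unknown>,
  timeoutMs: number,
  retryDelayMs = 10,
): Promise<PromiseResult> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      const value = await fn();
      return { state: "resolved", value };
    } catch {
      // Failed attempt: wait briefly, then retry if time remains.
      await new Promise((resolve) => setTimeout(resolve, retryDelayMs));
    }
  }
  return { state: "failed", reason: `timed out after ${timeoutMs} ms` };
}

// Usage: an operation that never succeeds is marked failed after ~50 ms.
resolveWithTimeout(async () => {
  throw new Error("download failed");
}, 50).then((result) => console.log(result.state)); // failed
```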

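Because a function may be retried, or re-executed after a platform failure, the operations performed by your durable functions must be idempotent. Otherwise, a retry can repeat a side effect, such as charging a customer twice.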
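One common way to make a side effect idempotent is an idempotency key: the operation records which keys it has already processed and ignores repeats. The sketch below shows that general technique; `charge`, `processedKeys`, and the `"order-42"` key are hypothetical, and the in-memory `Set` stands in for what would need to be durable, atomic storage in a real system.

```typescript
// Hypothetical payment ledger. Recording charges by idempotency key makes
// the operation safe to repeat: a retried call with the same key is a no-op.
const processedKeys = new Set<string>();
let balance = 100;

function charge(idempotencyKey: string, amount: number): number {
  if (!processedKeys.has(idempotencyKey)) {
    balance -= amount;
    processedKeys.add(idempotencyKey);
  }
  return balance;
}

// A retry after a failure re-sends the same request with the same key.
charge("order-42", 30);
charge("order-42", 30); // retried: no second deduction
console.log(balance); // 70
```

The key must stay the same across retries of the same logical operation; generating a fresh key on each attempt would defeat the check.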