Failure detection and recovery
Before discussing failure detection and recovery, we need to define what a failure is.
Failure
A failure occurs when any condition prevents a function from running to completion.
At the application level, a failure happens when a function throws an exception, returns an error, or rejects a promise.
At the platform level, a failure happens when the host executing the function crashes or becomes unresponsive.
Failure detection
Each Application Node written with a Resonate SDK can react to application-level failures. The Resonate SDK listens for exceptions, errors, and rejected promises.
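As an illustration of what listening for these three failure forms can look like, here is a minimal sketch. The `Outcome` type and `observe` function are hypothetical names invented for this example and are not the Resonate SDK's API; the sketch only shows how a wrapper can treat a thrown exception, a returned error, and a rejected promise uniformly.

```typescript
// Hypothetical result type for this sketch; not part of the Resonate SDK.
type Outcome =
  | { ok: true; value: unknown }
  | { ok: false; reason: unknown };

// Runs `fn` and reports a failure if it throws, returns an Error value,
// or returns a promise that rejects.
async function observe(fn: () => unknown): Promise<Outcome> {
  try {
    // Awaiting turns a rejected promise into a thrown exception.
    const value = await fn();
    if (value instanceof Error) {
      return { ok: false, reason: value }; // error returned as a value
    }
    return { ok: true, value };
  } catch (reason) {
    return { ok: false, reason }; // synchronous throw or rejection
  }
}
```

A caller can then branch on `ok` to decide whether to retry, without caring which of the three forms the failure took.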
When your Application Node is configured with a Resonate Server as a supervisor, the supervisor can detect and react to platform-level failures.
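One common way for a supervisor to notice an unresponsive host is to track heartbeats. The sketch below assumes that approach purely for illustration; this page does not describe how the Resonate Server detects platform-level failures, and `HeartbeatMonitor`, `maxSilenceMs`, and the node ids are hypothetical.

```typescript
// Hypothetical supervisor-side bookkeeping; an assumption for illustration,
// not the Resonate Server's actual protocol.
class HeartbeatMonitor {
  private lastSeen = new Map<string, number>();
  private maxSilenceMs: number;

  constructor(maxSilenceMs: number) {
    this.maxSilenceMs = maxSilenceMs;
  }

  // Called whenever a node reports in; `now` is a timestamp in ms.
  heartbeat(nodeId: string, now: number): void {
    this.lastSeen.set(nodeId, now);
  }

  // Nodes silent for longer than maxSilenceMs are presumed failed.
  unresponsive(now: number): string[] {
    const failed: string[] = [];
    this.lastSeen.forEach((seenAt, nodeId) => {
      if (now - seenAt > this.maxSilenceMs) failed.push(nodeId);
    });
    return failed;
  }
}

// Usage: node-a keeps reporting in, node-b goes silent.
const monitor = new HeartbeatMonitor(3000);
monitor.heartbeat("node-a", 0);
monitor.heartbeat("node-b", 0);
monitor.heartbeat("node-a", 4000);
const silentNodes = monitor.unresponsive(5000);
```

Passing timestamps in explicitly keeps the sketch deterministic; a real monitor would read a clock instead.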
Recovery
There are two levels of recovery:
- Application level, where a function throws an error or rejects a promise.
- Platform level, where the Application Node crashes or becomes unresponsive.
Application level
At the application level, when a failure is detected, the desired behavior is often to retry as a means of recovery.
For example, in TypeScript, if you want a function to be retried when it fails, throw an error from within the function:
async function download(ctx: Context, url: string): Promise<string> {
  // ...
  throw new Error("download failed");
  // ...
}
Or, you can reject a promise:
async function download(ctx: Context, url: string): Promise<string> {
  // ...
  return new Promise((resolve, reject) => {
    // Rejecting with an Error object preserves the stack trace.
    reject(new Error("download failed"));
  });
}
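To make the retry behavior concrete, here is a minimal sketch of a retry loop. The `retry` helper, its `maxAttempts` parameter, and the `flakyDownload` stub are hypothetical and are not part of the Resonate SDK; the sketch only illustrates why throwing or rejecting is enough to trigger another attempt.

```typescript
// Hypothetical helper: calls `fn` until it succeeds or attempts run out.
async function retry(
  fn: () => Promise<string>,
  maxAttempts: number,
): Promise<string> {
  let lastError: unknown = new Error("no attempts were made");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn(); // success ends the loop
    } catch (err) {
      lastError = err; // a throw or rejection leads to another attempt
    }
  }
  throw lastError; // every attempt failed
}

// Stub that fails twice before succeeding, standing in for a real download.
let calls = 0;
async function flakyDownload(): Promise<string> {
  calls++;
  if (calls < 3) throw new Error("download failed");
  return "file contents";
}
```

Calling `retry(flakyDownload, 5)` runs the stub three times and resolves with its return value, while `retry(flakyDownload, 2)` on a fresh counter would reject with the last error.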
Platform level
When a failure is detected at the platform level, the desired behavior is often to resume the function execution when the process comes back up, or to resume it on a different Application Node entirely. Because Durable Promises are stored remotely in the Resonate Server, you can choose either.
The new function execution effectively resumes from the point of failure instead of restarting from the beginning.
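The idea of resuming from the point of failure can be sketched by recording each step's result under a stable id, so that a re-execution skips steps that already completed. This is a simplified model under stated assumptions: the in-memory `store` stands in for remote Durable Promise storage, and `durable`, `workflow`, and the step ids are hypothetical names, not the Resonate SDK's API.

```typescript
// Stand-in for remote Durable Promise storage. A real deployment keeps
// this outside the process so that it survives a crash.
const store = new Map<string, string>();

// Runs `step` at most once per id; later calls return the stored result.
function durable(id: string, step: () => string): string {
  const saved = store.get(id);
  if (saved !== undefined) return saved; // completed before the failure
  const result = step();
  store.set(id, result);
  return result;
}

let fetchCount = 0;
let crashOnce = true;

function workflow(): string {
  const data = durable("fetch", () => {
    fetchCount++;
    return "payload";
  });
  if (crashOnce) {
    crashOnce = false;
    throw new Error("simulated crash"); // host fails after the first step
  }
  return durable("process", () => data.toUpperCase());
}

// The first execution fails after "fetch"; the second skips that step.
let output = "";
try {
  workflow();
} catch {
  output = workflow();
}
```

Although `workflow` runs twice, the "fetch" step executes only once, because the second execution finds its result in the store.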
Timeouts
Functions await other functions through Durable Promises.
In Resonate HQ, timeouts are associated with promise resolution.
Resonate HQ attempts to resolve and retry Durable Promises until the specified timeout. If the timeout is reached, Resonate HQ marks the promise as failed.
The default timeout can be set when creating the Resonate instance:
import { Resonate } from "@resonatehq/sdk";

const resonate = new Resonate({
  // Configures the default durable promise
  // timeout in ms, used for every function
  // executed by calling resonate.run.
  // Defaults to 1000.
  timeout: 5000,
});
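The relationship between retries and the timeout can be modeled as a loop bounded by a deadline. The following is a simplified sketch, not the actual implementation: `runUntilTimeout`, the `TimedResult` type, and the fake clock are hypothetical, and the clock is injected so that the example stays deterministic.

```typescript
// Hypothetical outcome of a promise that has a deadline.
type TimedResult =
  | { state: "resolved"; value: string }
  | { state: "failed"; reason: string };

// Retries `work` until it succeeds or the deadline passes.
function runUntilTimeout(
  work: () => string,
  timeoutMs: number,
  now: () => number,
): TimedResult {
  const deadline = now() + timeoutMs;
  while (now() < deadline) {
    try {
      return { state: "resolved", value: work() };
    } catch {
      // The attempt failed; the loop retries while time remains.
    }
  }
  return { state: "failed", reason: "timeout reached" };
}

// Fake clock that advances 2000 ms every time it is read.
let clock = 0;
const tick = () => (clock += 2000);

// Work that never succeeds is marked as failed once 5000 ms have elapsed.
const timedOut = runUntilTimeout(
  () => {
    throw new Error("download failed");
  },
  5000,
  tick,
);
```

A production version would normally wait between attempts instead of looping back immediately.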
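Because a function can be retried or resumed after a failure, its operations may run more than once. It's crucial to ensure that the operations performed by your durable functions are idempotent to prevent undefined behavior.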
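One way to make an operation idempotent is to tag it with a unique key and skip it if that key has already been applied. The sketch below assumes this technique for illustration; `deposit`, the `applied` set, and the key format are hypothetical, and a real system would keep the set of applied keys in durable storage.

```typescript
// Keys of operations that have already taken effect.
const applied = new Set<string>();
let balance = 0;

// Applies the deposit only the first time a given key is seen.
function deposit(key: string, amount: number): void {
  if (applied.has(key)) return; // already applied by an earlier attempt
  applied.add(key);
  balance += amount;
}

deposit("deposit-42", 100);
deposit("deposit-42", 100); // a retry of the same operation has no effect
```

Without the key check, the retried call would credit the amount a second time.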