Failure detection and recovery
What is failure in the context of Resonate?
When we talk about failure detection and recovery, first we need to define what a failure is.
Failureโ
A failure occurs when any condition prevents a function from running to completion.
At the application level, a failure happens when a function throws an exception, returns an error, or rejects a promise.
At the platform level, a failure happens when the host executing the function crashes or becomes unresponsive.
Failure detectionโ
Each Application Node written with a Resonate SDK has the ability to react to application level failures. The Resonate SDK listens for exceptions, errors, and rejected promises.
When your Application Node is configured with a Resonate Server as a supervisor, then the supervisor has the ability to detect and react to platform level failures.
Recoveryโ
What is recovery in the context of Resonate?
There are two levels of recovery:
- Application level, where a function throws an error or rejects a promise.
- Platform level, where the Application Node crashes or becomes unresponsive.
Application levelโ
At the application level, if a failure is detected (a function returns an error or an exception) often the desired behavior is to retry as a means of recovery.
Resonate automatically retries function executions if they fail, per the specified Retry Policy, and continues to do so until the Durable Promise Timeout is reached.
For examples of how to trigger a retry of an execution, see the following feature guidance:
See the section below, Retry Policies, for more information on how to configure retries.
Platform levelโ
At the platform level, if a failure is detected, often the desired behavior is to resume the function execution when the process comes back up, or have it resume on a different Application Node entirely. Thanks to Resonate's remote Durable Promise storage (Resonate Server), you can choose to do either.
The following diagram illustrates how a function execution resumes via restart on the same Application Node after the process crashes.
Resonate replays the function execution, using the results from already completed promises to effectively resume from the point of failure.
Retry Policiesโ
What is a Retry Policy?
A Retry Policy is a configuration that determines how a function should be retried if it fails.
Resonate offers several types of Retry Policies that can be further refined by configuring the number of retries and the delay between retries.
Constantโ
A Constant Retry Policy is one where the delay between retries is constant. This can be further refined to include a maximum number of retries.
Exponentialโ
An Exponential Retry Policy is one where the delay between retries grows exponentially. This can be further configured and refined by defining a base delay, the exponential factor, and a maximum number of retries.
Linearโ
A Linear Retry Policy is one where the delay between retries grows linearly. This can be further configured and refined by defining the linear delay growth and a maximum number of retries.