Building resilient .NET applications with Polly - Part 5

In this post, we will explore what a retry strategy entails and delve into the implementation of such a strategy using Polly.

What are transient failures?

Transient failures are temporary, typically short-lived errors that do not indicate a permanent problem: they tend to resolve themselves after a brief period, or the operation succeeds on a subsequent attempt. Common examples include temporary network issues, intermittent service unavailability, and momentary resource constraints.

In the context of distributed systems, where various components communicate over a network, transient failures can be more prevalent. These failures are often unpredictable and can occur due to factors such as network congestion, temporary server unavailability, or brief spikes in resource utilization.

What is a retry strategy?

A retry strategy is a mechanism employed to automatically reattempt an operation that has initially failed. This approach involves making multiple consecutive attempts to execute the same operation with the expectation that subsequent attempts might succeed, especially in cases where the failure is transient or due to intermittent issues. Retry strategies aim to improve the resilience and reliability of applications by providing a mechanism to recover from transient failures without manual intervention.

In Polly, a retry strategy is configured by specifying the conditions under which retries should occur, the maximum number of retries, and the delay between consecutive attempts. This is particularly useful for handling transient faults, network glitches, and other intermittent issues that cause an operation to fail temporarily.
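To make the idea concrete before introducing Polly, here is a minimal hand-written sketch of a retry loop. It is not how Polly is implemented; the names NaiveRetry, ExecuteAsync, shouldRetry, maxRetryAttempts and delay are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only (this is not Polly's API).
public static class NaiveRetry
{
    // Runs the operation once, then retries up to maxRetryAttempts times
    // while shouldRetry reports that the result is a failure worth retrying.
    public static async Task<T> ExecuteAsync<T>(
        Func<Task<T>> operation,
        Func<T, bool> shouldRetry,
        int maxRetryAttempts,
        TimeSpan delay)
    {
        var result = await operation().ConfigureAwait(false);

        for (var retry = 0; retry < maxRetryAttempts && shouldRetry(result); retry++)
        {
            // Wait before the next attempt; TimeSpan.Zero retries immediately.
            await Task.Delay(delay).ConfigureAwait(false);
            result = await operation().ConfigureAwait(false);
        }

        // Either a successful result, or the last failed one once retries are exhausted.
        return result;
    }
}
```

The three parameters map onto the three ingredients listed above: when to retry, how many times, and how long to wait between attempts. A library such as Polly adds what this sketch leaves out, for example exception handling, backoff policies and cancellation.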

Simulating transient errors

  • Edit the FaultyService class.
using System.Threading;
using System.Web.Http;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

public class FaultyService
{
    // ...

    [FunctionName(nameof(GetWithTransientFailures))]
    public IActionResult GetWithTransientFailures([HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = null)] HttpRequest req, ILogger log)
    {
        var counter = CounterSingleton.Instance.Increment();

        // Fail on calls 1, 4, 7, ...: one call out of every three.
        if (counter % 3 == 1) return new InternalServerErrorResult();
        return new OkResult();
    }
}

public class CounterSingleton
{
    // Static initialization is thread-safe, unlike a lazy null check.
    public static CounterSingleton Instance { get; } = new CounterSingleton();

    private int _index;

    private CounterSingleton() { }

    // Interlocked keeps the count correct when requests arrive concurrently.
    public int Increment() => Interlocked.Increment(ref _index);
}

Here, we simulate transient failures by deliberately returning an HTTP 500 (Internal Server Error) for one call out of every three. A counter, implemented as a singleton, keeps track of the number of calls across requests.

  • Edit the CallingService class in order to call this method.
public class CallingService
{
    // Reuse a single HttpClient; creating one per request can exhaust sockets under load.
    private static readonly HttpClient Client = new HttpClient();

    [FunctionName(nameof(GetAccountById02))]
    public async Task<IActionResult> GetAccountById02([HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = null)] HttpRequest req, ILogger log)
    {
        var response = await Client.GetAsync("http://localhost:7271/api/GetWithTransientFailures").ConfigureAwait(false);

        // Surface any downstream failure as a 500 to our own caller.
        return response.IsSuccessStatusCode ? new OkResult() : new InternalServerErrorResult();
    }
}

This function returns an HTTP 500 when the downstream call fails and an HTTP 200 when everything proceeds as expected.

Upon executing this request via Fiddler, we can indeed observe that an error occurs once out of every three attempts.

Information

In our specific case, the code is deterministic: failures follow a fixed pattern rather than occurring unpredictably. In real-life scenarios, transient errors typically appear at random. We use this simulation purely for illustrative purposes.
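For readers who prefer a simulation closer to real-life behavior, the fixed pattern could be replaced by a probabilistic one. The following is only a sketch: RandomFailureSimulator, failureRate and seed are hypothetical names that do not appear in the series, and the optional seed exists solely to make runs reproducible.

```csharp
using System;

// Hypothetical helper: decides at random whether a call should fail.
public sealed class RandomFailureSimulator
{
    private readonly Random _random;
    private readonly double _failureRate;
    private readonly object _lock = new object();

    // failureRate is the probability of failure, between 0.0 (never) and 1.0 (always).
    public RandomFailureSimulator(double failureRate, int? seed = null)
    {
        if (failureRate < 0.0 || failureRate > 1.0)
            throw new ArgumentOutOfRangeException(nameof(failureRate));

        _failureRate = failureRate;
        _random = seed.HasValue ? new Random(seed.Value) : new Random();
    }

    public bool ShouldFail()
    {
        // A System.Random instance is not thread-safe, so access is serialized.
        lock (_lock)
        {
            // NextDouble returns a value in [0.0, 1.0).
            return _random.NextDouble() < _failureRate;
        }
    }
}
```

With a single shared instance created with a failure rate of 1.0 / 3, the faulty function could return a 500 whenever ShouldFail() is true, which yields roughly one failure in three on average instead of exactly every third call.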

Implementing a retry strategy with Polly

Wait and retry strategies are commonly employed to handle transient errors, and Polly provides a convenient means to implement such strategies effortlessly.

  • Edit the CallingService class to implement a retry strategy.
// Requires: using System.Net; using Polly; using Polly.Retry;

// Shared client, so that retries do not create a new HttpClient for every attempt.
private static readonly HttpClient Client = new HttpClient();

[FunctionName(nameof(GetAccountById02))]
public async Task<IActionResult> GetAccountById02([HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = null)] HttpRequest req, ILogger log)
{
    var options = new RetryStrategyOptions<HttpResponseMessage>
    {
        Delay = TimeSpan.Zero, // retry immediately, without waiting
        MaxRetryAttempts = 3,  // up to 3 retries after the initial call
        ShouldHandle = new PredicateBuilder<HttpResponseMessage>().HandleResult(r => r.StatusCode == HttpStatusCode.InternalServerError),
    };

    var pipeline = new ResiliencePipelineBuilder<HttpResponseMessage>().AddRetry(options).Build();

    // Execute the call through the pipeline; it is repeated while it returns a 500.
    var response = await pipeline.ExecuteAsync(async token =>
    {
        return await Client.GetAsync("http://localhost:7271/api/GetWithTransientFailures", token).ConfigureAwait(false);
    }).ConfigureAwait(false);

    // Still failing after all retries: report a 500, as in the version without Polly.
    return response.IsSuccessStatusCode ? new OkResult() : new InternalServerErrorResult();
}

Here, we implement a retry strategy with a maximum of 3 retries. If a transient failure occurs, the calling service does not return an error right away; instead, it reattempts the request up to 3 times, for at most 4 calls in total counting the initial one. Because Delay is set to TimeSpan.Zero, each retry is issued immediately.

Upon executing this request via Fiddler, we can now observe that these transient failures are effectively handled.

Information

Additional configuration options are available to customize the retry strategy, including settings for the maximum number of attempts and the delay between two retry attempts. For more comprehensive details, please refer to the documentation.
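As an illustration of such options, the sketch below configures a retry with exponential backoff, jitter and a callback invoked before each retry. It assumes Polly v8 (the version that provides RetryStrategyOptions and ResiliencePipelineBuilder); the names RetryPipelines and CreateBackoffPipeline, the 200 ms base delay and the console logging are arbitrary choices made for this example, not recommendations from the documentation.

```csharp
using System;
using System.Net;
using System.Net.Http;
using Polly;
using Polly.Retry;

// Hypothetical factory class, for illustration only.
public static class RetryPipelines
{
    public static ResiliencePipeline<HttpResponseMessage> CreateBackoffPipeline()
    {
        var options = new RetryStrategyOptions<HttpResponseMessage>
        {
            MaxRetryAttempts = 3,
            // Base delay; with exponential backoff the waits grow roughly as
            // 200 ms, 400 ms, 800 ms (before jitter is applied).
            Delay = TimeSpan.FromMilliseconds(200),
            BackoffType = DelayBackoffType.Exponential,
            // Randomizes the delays so that many clients do not retry in lockstep.
            UseJitter = true,
            // Retry on network-level exceptions as well as on HTTP 500 responses.
            ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
                .Handle<HttpRequestException>()
                .HandleResult(r => r.StatusCode == HttpStatusCode.InternalServerError),
            // Called before each retry; AttemptNumber is zero-based.
            OnRetry = args =>
            {
                Console.WriteLine($"Retry {args.AttemptNumber + 1} in {args.RetryDelay.TotalMilliseconds} ms");
                return default;
            },
        };

        return new ResiliencePipelineBuilder<HttpResponseMessage>()
            .AddRetry(options)
            .Build();
    }
}
```

Waiting longer between successive attempts gives a struggling service time to recover, whereas the zero delay used earlier in this post suits only the simulated failure.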

The retry strategy does have a drawback: if the remote service is entirely down, we will still keep trying to reach it, consuming time and resources in the calling function. A resilient strategy must take this into account and implement a circuit breaker. The next part explores how to implement one quickly with Polly.

Building resilient applications with Polly - Part 6