Retry with Exponential Backoff and Jitter
How do I implement retry with exponential backoff and jitter?
TL;DR
- Bottom line: Exponential backoff with full jitter prevents thundering herd problems by spreading retry attempts across time -- use delay = min(cap, base * 2^attempt) * random() for optimal load distribution on failing services.
- Key tool/command: delay = min(cap, base * 2^attempt) * random(0, 1)
- Watch out for: Retrying non-idempotent operations without an idempotency key -- this causes duplicate writes, double charges, and data corruption.
- Works with: Any language, any protocol. Used by AWS SDKs, Google Cloud client libraries, gRPC, Polly (.NET), Tenacity (Python).
Constraints
- NEVER retry non-idempotent operations (POST creating resources, financial transactions) without an idempotency key [src2]
- ALWAYS set a maximum retry count (3-5 attempts) -- infinite retries cause cascading failures and resource exhaustion [src6]
- ALWAYS add jitter to backoff delays -- pure exponential backoff without jitter causes synchronized retry storms (thundering herd) [src1]
- NEVER retry 4xx client errors (400, 401, 403, 404) -- these are permanent failures. Among 4xx codes, only 408 (request timeout) and 429 (rate limit) are retryable, alongside transient 5xx server errors [src3]
- Cap maximum delay at 30-60 seconds -- unbounded exponential growth wastes time without improving success rate [src7]
Quick Reference
| Strategy | Formula | Thundering Herd Risk | Fairness | Complexity | Best For |
|---|---|---|---|---|---|
| No backoff (fixed) | delay = constant | Very High | Equal | Trivial | Never use for retries |
| Linear backoff | delay = base * attempt | High | Equal | Low | Simple rate limiting |
| Exponential (no jitter) | delay = min(cap, base * 2^attempt) | High | Equal | Low | Prototype only |
| Full jitter | delay = random(0, min(cap, base * 2^attempt)) | Very Low | High | Low | Default recommendation |
| Equal jitter | delay = exp/2 + random(0, exp/2) | Low | Medium | Low | Predictable minimum wait |
| Decorrelated jitter | delay = min(cap, random(base, prev * 3)) | Low | Medium | Medium | Stateful clients |
| Fixed delay | delay = constant | Very High | Equal | Trivial | Polling, not retries |
| Exponential + token bucket | Full jitter + token bucket rate limit | Very Low | High | Medium | AWS SDK default |
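The formulas in the table can be written side by side as small Python functions. This is a minimal sketch: the function names and the optional rng parameter (used only to make the output reproducible) are illustrative assumptions, while the formulas themselves are taken from the table.

```python
import random

def linear(attempt, base=1.0):
    # delay = base * attempt
    return base * attempt

def exponential(attempt, base=1.0, cap=30.0):
    # delay = min(cap, base * 2^attempt), no randomness
    return min(cap, base * 2 ** attempt)

def full_jitter(attempt, base=1.0, cap=30.0, rng=random):
    # delay = random(0, min(cap, base * 2^attempt))
    return rng.uniform(0, exponential(attempt, base, cap))

def equal_jitter(attempt, base=1.0, cap=30.0, rng=random):
    # delay = exp/2 + random(0, exp/2): half fixed, half random
    half = exponential(attempt, base, cap) / 2
    return half + rng.uniform(0, half)

def decorrelated_jitter(prev, base=1.0, cap=30.0, rng=random):
    # delay = min(cap, random(base, prev * 3)); depends on the previous
    # delay rather than on the attempt number. Start with prev = base.
    return min(cap, rng.uniform(base, prev * 3))
```

Full jitter can return a delay close to zero, whereas equal jitter never waits less than half of the exponential ceiling, which is the trade-off the "Best For" column describes.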
Decision Tree
START: Is the failed operation retryable?
|
+-- Is the error transient (429, 408, 500, 502, 503, 504, network timeout)?
| +-- NO (400, 401, 403, 404, 422) --> Do NOT retry. Return error immediately.
| +-- YES ↓
|
+-- Is the operation idempotent (safe to repeat)?
| +-- NO --> Add idempotency key or do NOT retry.
| +-- YES ↓
|
+-- How many concurrent clients may retry simultaneously?
| +-- Few (<10) --> Exponential backoff (jitter optional)
| +-- Many (10-1000) --> Full jitter (recommended default)
| +-- Very many (>1000) --> Full jitter + token bucket + circuit breaker
|
+-- Do you need a guaranteed minimum wait time?
| +-- YES --> Equal jitter (half fixed, half random)
| +-- NO --> Full jitter (lowest total load)
|
+-- Is this a long-running background job?
+-- YES --> Decorrelated jitter (independent of attempt count)
+-- NO --> Full jitter with max 3-5 attempts
Step-by-Step Guide
1. Identify retryable errors
Only retry transient failures. Server errors (5xx) and rate limits (429) are retryable. Client errors (4xx except 429, 408) are permanent and must not be retried. [src3]
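RETRYABLE_STATUS_CODES = {408, 429, 500, 502, 503, 504}
# ConnectionError and TimeoutError already subclass OSError; listing OSError
# itself would also retry permanent failures such as FileNotFoundError.
RETRYABLE_EXCEPTIONS = (ConnectionError, TimeoutError)

def is_retryable(error):
    # Network-level failures are treated as transient.
    if isinstance(error, RETRYABLE_EXCEPTIONS):
        return True
    # HTTP errors are retryable only for the transient status codes above.
    if hasattr(error, 'status_code'):
        return error.status_code in RETRYABLE_STATUS_CODES
    return False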
Verify: is_retryable(HTTPError(status_code=503)) returns True; is_retryable(HTTPError(status_code=400)) returns False.
2. Implement the full jitter formula
Full jitter provides the best load distribution across retrying clients. The formula randomizes the delay between 0 and the exponential ceiling. [src1]
import random

def full_jitter_delay(attempt, base=1.0, cap=30.0):
    exp_delay = min(cap, base * (2 ** attempt))
    return random.uniform(0, exp_delay)
Verify: full_jitter_delay(0) returns value in [0, 1.0]; full_jitter_delay(5) returns value in [0, 30.0].
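To make the ceiling concrete, the sketch below lists the upper bound of the delay for the first seven attempts with the same defaults (base=1.0, cap=30.0). The helper name backoff_ceiling is an illustrative assumption; it is just the non-random half of the formula above.

```python
def backoff_ceiling(attempt, base=1.0, cap=30.0):
    # Upper bound of the full-jitter delay for a given attempt.
    return min(cap, base * (2 ** attempt))

ceilings = [backoff_ceiling(a) for a in range(7)]
print(ceilings)  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

Because the delay is drawn uniformly between 0 and the ceiling, the average wait for each attempt is half of the value shown, and the cap takes effect from attempt 5 onward.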
3. Build the retry loop with maximum attempts
Wrap the retryable operation in a loop with configurable max attempts, applying the jitter delay between each attempt. [src2]
import logging
import time

def retry_with_backoff(fn, max_attempts=4, base=1.0, cap=30.0):
    last_exception = None
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as e:
            last_exception = e
            if not is_retryable(e) or attempt == max_attempts - 1:
                raise
            delay = full_jitter_delay(attempt, base, cap)
            logging.warning(f"Attempt {attempt+1}/{max_attempts} failed. Retrying in {delay:.2f}s")
            time.sleep(delay)
    raise last_exception
Verify: Function retries on 503, gives up on 400, raises after max_attempts exhausted.
4. Add retry budget / token bucket (for high-scale systems)
Prevent retry amplification by limiting the total retry rate across all requests. AWS SDKs use a token bucket: 500 initial tokens, 5 tokens per successful call refunded, 5 tokens consumed per retry. [src7]
import threading

class RetryBudget:
    def __init__(self, max_tokens=500, refill_per_success=5, cost_per_retry=5):
        self.tokens = max_tokens
        self.max_tokens = max_tokens
        self.refill = refill_per_success
        self.cost = cost_per_retry
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            if self.tokens >= self.cost:
                self.tokens -= self.cost
                return True
            return False

    def success(self):
        with self._lock:
            self.tokens = min(self.max_tokens, self.tokens + self.refill)
Verify: With the defaults, 100 consecutive acquire() calls succeed (500 tokens consumed); the next acquire() returns False until success() refunds tokens.
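One way the budget might gate a retry loop is sketched below. The class is restated in compact form so the block runs on its own; call_with_budget and its injectable sleep parameter are hypothetical names introduced for illustration, and only ConnectionError is treated as retryable to keep the example short.

```python
import random
import threading
import time

class RetryBudget:
    """Same token accounting as the class above, restated so this runs alone."""
    def __init__(self, max_tokens=500, refill_per_success=5, cost_per_retry=5):
        self.tokens = max_tokens
        self.max_tokens = max_tokens
        self.refill = refill_per_success
        self.cost = cost_per_retry
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            if self.tokens >= self.cost:
                self.tokens -= self.cost
                return True
            return False

    def success(self):
        with self._lock:
            self.tokens = min(self.max_tokens, self.tokens + self.refill)

def call_with_budget(fn, budget, max_attempts=4, base=1.0, cap=30.0, sleep=time.sleep):
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            result = fn()
        except ConnectionError:
            out_of_attempts = attempt == max_attempts - 1
            # A retry needs both a remaining attempt and budget tokens.
            if out_of_attempts or not budget.acquire():
                raise
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
        else:
            budget.success()  # refund tokens on success
            return result
```

The budget object is meant to be shared by all callers in the process, so that once a dependency is failing broadly, later requests fail after their first attempt instead of adding more retries.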
5. Respect Retry-After headers
When the server sends a Retry-After header (common with 429 and 503), use the server-specified delay instead of your calculated backoff. [src3]
def get_retry_delay(response, attempt, base=1.0, cap=30.0):
    retry_after = response.headers.get('Retry-After')
    if retry_after:
        try:
            return min(float(retry_after), cap)
        except ValueError:
            pass
    return full_jitter_delay(attempt, base, cap)
Verify: Response with Retry-After: 5 returns 5.0; without header falls back to jitter calculation.
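Retry-After may carry either a number of seconds or an HTTP date, and the float() conversion above falls back to jitter for the date form. The sketch below handles both using only the standard library; parse_retry_after and its now parameter (added so the result can be checked deterministically) are hypothetical names, not part of any HTTP client.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def parse_retry_after(value, now=None):
    """Return the delay in seconds, or None if the header is missing or unparseable."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        # delay-seconds form, e.g. "Retry-After: 120"
        return float(value)
    try:
        # HTTP-date form, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Treat a date without an offset as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means the client may retry immediately.
    return max(0.0, (when - now).total_seconds())
```

A caller could use the returned value when it is not None and fall back to the jittered delay otherwise, applying whatever cap policy the client already uses.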
Code Examples
Python (tenacity): Decorator-Based Retry
# Input: Any function that may raise transient errors
# Output: Automatic retry with exponential backoff + jitter
from tenacity import (
    retry, stop_after_attempt, wait_exponential_jitter,
    retry_if_exception_type, before_sleep_log
)
import logging
import httpx  # pip install "httpx>=0.27"

logger = logging.getLogger(__name__)

@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential_jitter(initial=1, max=30, jitter=5),
    retry=retry_if_exception_type((httpx.TransportError, httpx.HTTPStatusError)),
    before_sleep=before_sleep_log(logger, logging.WARNING),
    reraise=True,
)
def fetch_with_retry(url: str) -> dict:
    response = httpx.get(url, timeout=10)
    if response.status_code in (429, 500, 502, 503, 504):
        response.raise_for_status()
    return response.json()
Node.js: Async Retry with Full Jitter
// Input: Async function that may throw retryable errors
// Output: Result of successful call, or throws after max attempts
async function retryWithBackoff(fn, {
  maxAttempts = 4,
  baseDelay = 1000,
  maxDelay = 30000,
} = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const isRetryable = error.status >= 500 || error.status === 429
        || error.code === 'ECONNRESET' || error.code === 'ETIMEDOUT';
      if (!isRetryable || attempt === maxAttempts - 1) throw error;
      const expDelay = Math.min(maxDelay, baseDelay * 2 ** attempt);
      const delay = Math.random() * expDelay;
      await new Promise(r => setTimeout(r, delay));
    }
  }
}
Go: Retry with Context and Full Jitter
// Input: Context, retryable function
// Output: Result of successful call, or error after max attempts
package retry

import (
    "context"
    "fmt"
    "math"
    "math/rand"
    "time"
)

type Config struct {
    MaxAttempts int
    BaseDelay   time.Duration
    MaxDelay    time.Duration
}

func Do(ctx context.Context, cfg Config, fn func() error) error {
    var lastErr error
    for attempt := 0; attempt < cfg.MaxAttempts; attempt++ {
        lastErr = fn()
        if lastErr == nil {
            return nil
        }
        if attempt == cfg.MaxAttempts-1 {
            break
        }
        expDelay := math.Min(
            float64(cfg.MaxDelay),
            float64(cfg.BaseDelay)*math.Pow(2, float64(attempt)),
        )
        delay := time.Duration(rand.Float64() * expDelay)
        select {
        case <-ctx.Done():
            return fmt.Errorf("retry cancelled: %w", ctx.Err())
        case <-time.After(delay):
        }
    }
    return fmt.Errorf("all %d attempts failed: %w", cfg.MaxAttempts, lastErr)
}
Java: Retry with Exponential Backoff
// Input: RetryableCall<T> that may throw retryable exceptions
// Output: Result of successful call, or throws after max attempts
// Note: RetryableException is not a JDK class; it stands for an
// application-defined exception that exposes getStatusCode().
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

public class RetryWithBackoff {
    private static final Set<Integer> RETRYABLE = Set.of(429, 500, 502, 503, 504);

    public static <T> T execute(
        RetryableCall<T> fn, int maxAttempts, long baseMs, long capMs
    ) throws Exception {
        Exception lastException = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return fn.call();
            } catch (RetryableException e) {
                lastException = e;
                if (!RETRYABLE.contains(e.getStatusCode())
                        || attempt == maxAttempts - 1) throw e;
                long expDelay = Math.min(capMs, baseMs * (1L << attempt));
                long delay = ThreadLocalRandom.current().nextLong(0, expDelay + 1);
                Thread.sleep(delay);
            }
        }
        throw lastException;
    }

    @FunctionalInterface
    public interface RetryableCall<T> { T call() throws Exception; }
}
Anti-Patterns
Wrong: Retry without any backoff
# BAD -- hammering a failing service makes the outage worse
for attempt in range(5):
    try:
        return call_api()
    except Exception:
        pass  # Retry immediately with zero delay
Correct: Retry with exponential backoff and jitter
# GOOD -- spreading retries over time lets the service recover
for attempt in range(5):
    try:
        return call_api()
    except TransientError:
        if attempt == 4:
            raise  # last attempt: surface the failure instead of swallowing it
        delay = min(30, 1.0 * 2 ** attempt) * random.random()
        time.sleep(delay)
Wrong: Retrying non-retryable errors (400, 401, 404)
# BAD -- 400 Bad Request will fail every time, wasting retry attempts
@retry(stop=stop_after_attempt(4), wait=wait_exponential())
def create_user(data):
    response = httpx.post('/users', json=data)
    response.raise_for_status()  # Retries even on 400/401/404!
Correct: Only retry transient errors
# GOOD -- only retry server errors and rate limits
@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential_jitter(initial=1, max=30),
    retry=retry_if_exception(lambda e: getattr(e, 'response', None)
        and e.response.status_code in (429, 500, 502, 503, 504)),
)
def create_user(data):
    response = httpx.post('/users', json=data)
    response.raise_for_status()
Wrong: Exponential backoff without jitter
# BAD -- all 1000 clients retry at exactly 1s, 2s, 4s, 8s -> thundering herd
delay = base * (2 ** attempt)
time.sleep(delay)
Correct: Always add full jitter
# GOOD -- clients spread retries uniformly, preventing synchronized storms
delay = random.uniform(0, min(cap, base * (2 ** attempt)))
time.sleep(delay)
Wrong: Infinite retries with no maximum
# BAD -- retries forever, consuming threads/connections/memory
attempt = 0
while True:
    try:
        return call_api()
    except Exception:
        time.sleep(2 ** attempt)
        attempt += 1
Correct: Bounded retries with a cap
# GOOD -- give up after max_attempts and let the caller handle the failure
MAX_ATTEMPTS = 4
for attempt in range(MAX_ATTEMPTS):
    try:
        return call_api()
    except TransientError:
        if attempt == MAX_ATTEMPTS - 1:
            raise
        delay = random.uniform(0, min(30.0, 1.0 * 2 ** attempt))
        time.sleep(delay)
Common Pitfalls
- No jitter causes thundering herd: 1000 clients failing at the same time all retry at 1s, 2s, 4s, 8s -- hitting the server in synchronized waves. Fix: delay = random.uniform(0, min(cap, base * 2^attempt)). [src1]
- Retrying non-idempotent operations: Retrying a POST /charge endpoint without an idempotency key creates duplicate charges. Fix: Pass an Idempotency-Key header; the server deduplicates based on the key. [src2]
- Ignoring Retry-After headers: When a server sends Retry-After: 60, ignoring it and retrying in 2s triggers rate limiting or bans. Fix: Parse the Retry-After header and use the server-specified delay as a minimum. [src3]
- Retrying at every layer: If the client retries 3x, the load balancer retries 3x, and the gateway retries 3x, one user request generates 27 backend calls (3 x 3 x 3). Fix: Retry only at the outermost layer, or use a shared retry budget. [src2]
- Unbounded exponential delay: Without a cap, attempt 20 waits 2^20 = 1,048,576 seconds (~12 days). Fix: min(cap, base * 2^attempt) with a cap of 30-60 seconds. [src7]
- Not logging retry attempts: Silent retries mask transient issues until they become chronic. Fix: Log every retry with attempt number, delay, and error. [src6]
- Retrying on circuit breaker open: Continuing to retry when the circuit breaker is open wastes resources. Fix: Check circuit state before attempting retry. [src2]
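The idempotency-key fix from the list above can be illustrated with a toy server that remembers the response for each key. ChargeService and its fields are hypothetical and kept in memory purely for illustration; a real service would persist keys, expire them, and handle concurrent requests.

```python
import uuid

class ChargeService:
    """Hypothetical in-memory server that deduplicates by idempotency key."""
    def __init__(self):
        self._results = {}  # idempotency key -> stored response
        self.charges = []   # side effects actually performed

    def charge(self, idempotency_key, amount_cents):
        if idempotency_key in self._results:
            # Replay the stored response without charging again.
            return self._results[idempotency_key]
        self.charges.append(amount_cents)
        result = {"charge_id": len(self.charges), "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result

service = ChargeService()
key = str(uuid.uuid4())            # generated once per logical operation
first = service.charge(key, 1999)
retry = service.charge(key, 1999)  # e.g. the client timed out and retried
```

The client must reuse the same key on every retry of the same logical operation; generating a fresh key per attempt would defeat the deduplication.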
When to Use / When Not to Use
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Transient network errors (timeouts, DNS failures) | Client-side validation errors (4xx) | Return error immediately |
| Rate-limited APIs returning 429 | Non-idempotent operations without idempotency keys | Idempotency pattern first, then retry |
| Cloud service temporary unavailability (503) | Real-time user-facing requests with tight latency SLAs | Hedged requests / speculative execution |
| Batch processing / background jobs | Service is consistently down (not transient) | Circuit breaker to fail fast |
| Database connection pool exhaustion | Retries at multiple layers (client + LB + gateway) | Single retry point with retry budget |
| Message queue consumer failures | Authentication/authorization errors (401, 403) | Re-authenticate, do not retry |
Important Caveats
- Full jitter has the lowest total server load according to AWS simulations, but takes slightly longer to complete than equal jitter -- choose based on whether you optimize for server health (full jitter) or client latency (equal jitter).
- AWS SDKs default to 3 max attempts with 20-second max backoff. Google Cloud client libraries always enable jitter. Polly (.NET) v8+ uses BackoffType.Exponential with UseJitter = true. Prefer library implementations over hand-rolled retries.
- Retry amplification is the most dangerous failure mode: N clients each retrying M times generates N*M load on an already failing service. Use a global retry budget (token bucket) for services handling >100 requests/second.
- When using gRPC, leverage built-in retry policies in the service config rather than implementing client-side retry.
- In serverless environments (AWS Lambda, Cloud Functions), retries consume invocation time and cost. Set aggressive max_attempts (2-3) for cost-sensitive workloads.
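When choosing max_attempts for a time-limited environment such as a serverless function, it can help to compute the worst-case time spent sleeping. The sketch below assumes the base=1.0 and cap=30.0 defaults used earlier; worst_case_total_delay is an illustrative helper name, and the figure excludes the time taken by the requests themselves.

```python
def worst_case_total_delay(max_attempts, base=1.0, cap=30.0):
    # One sleep happens between consecutive attempts, so there are
    # max_attempts - 1 sleeps, each bounded by its exponential ceiling.
    return sum(min(cap, base * 2 ** a) for a in range(max_attempts - 1))

print(worst_case_total_delay(3))  # 3.0   (1 + 2)
print(worst_case_total_delay(4))  # 7.0   (1 + 2 + 4)
print(worst_case_total_delay(8))  # 91.0  (1 + 2 + 4 + 8 + 16 + 30 + 30)
```

With full jitter the expected total is half of these figures, but a timeout budget should be planned against the worst case.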