Retry with Exponential Backoff and Jitter

How do I implement retry with exponential backoff and jitter?

TL;DR

Bottom line: Exponential backoff with full jitter prevents thundering herd problems by spreading retry attempts across time -- use delay = min(cap, base * 2^attempt) * random() for optimal load distribution on failing services.
Key tool/command: delay = min(cap, base * 2^attempt) * random(0, 1)
Watch out for: Retrying non-idempotent operations without an idempotency key -- this causes duplicate writes, double charges, and data corruption.
Works with: Any language, any protocol. Used by AWS SDKs, Google Cloud client libraries, gRPC, Polly (.NET), Tenacity (Python).

Constraints

NEVER retry non-idempotent operations (POST creating resources, financial transactions) without an idempotency key [src2]
ALWAYS set a maximum retry count (3-5 attempts) -- infinite retries cause cascading failures and resource exhaustion [src6]
ALWAYS add jitter to backoff delays -- pure exponential backoff without jitter causes synchronized retry storms (thundering herd) [src1]
NEVER retry 4xx client errors (400, 401, 403, 404) -- these are permanent failures. Among 4xx codes, only 408 (request timeout) and 429 (rate limit) are retryable, alongside transient 5xx server errors (500, 502, 503, 504) [src3]
Cap maximum delay at 30-60 seconds -- unbounded exponential growth wastes time without improving success rate [src7]

Quick Reference

Strategy	Formula	Thundering Herd Risk	Fairness	Complexity	Best For
No backoff (fixed)	`delay = constant`	Very High	Equal	Trivial	Never use for retries
Linear backoff	`delay = base * attempt`	High	Equal	Low	Simple rate limiting
Exponential (no jitter)	`delay = min(cap, base * 2^attempt)`	High	Equal	Low	Prototype only
Full jitter	`delay = random(0, min(cap, base * 2^attempt))`	Very Low	High	Low	Default recommendation
Equal jitter	`exp = min(cap, base * 2^attempt); delay = exp/2 + random(0, exp/2)`	Low	Medium	Low	Predictable minimum wait
Decorrelated jitter	`delay = min(cap, random(base, prev * 3))`	Low	Medium	Medium	Stateful clients
Fixed delay	`delay = constant`	Very High	Equal	Trivial	Polling, not retries
Exponential + token bucket	Full jitter + token bucket rate limit	Very Low	High	Medium	AWS SDK default
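
The three jitter formulas in the table can be sketched side by side as follows. This is a minimal illustration, not a library API: the function names and the `rng` parameter (added so a seeded generator can be passed in for tests) are assumptions.

```python
import random

def exp_ceiling(attempt, base=1.0, cap=30.0):
    """Capped exponential ceiling shared by the attempt-based strategies."""
    return min(cap, base * (2 ** attempt))

def full_jitter(attempt, base=1.0, cap=30.0, rng=random):
    # Uniform over the whole range [0, ceiling]
    return rng.uniform(0, exp_ceiling(attempt, base, cap))

def equal_jitter(attempt, base=1.0, cap=30.0, rng=random):
    # Half of the ceiling is fixed, the other half is random
    half = exp_ceiling(attempt, base, cap) / 2
    return half + rng.uniform(0, half)

def decorrelated_jitter(prev_delay, base=1.0, cap=30.0, rng=random):
    # Depends on the previous delay rather than the attempt number,
    # so the caller has to carry prev_delay between retries
    return min(cap, rng.uniform(base, prev_delay * 3))
```

Decorrelated jitter is the only one of the three that needs state between retries, which is why the table lists it for stateful clients.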

Decision Tree

START: Is the failed operation retryable?
|
+-- Is the error transient (429, 408, 500, 502, 503, 504, network timeout)?
|   +-- NO (400, 401, 403, 404, 422) --> Do NOT retry. Return error immediately.
|   +-- YES ↓
|
+-- Is the operation idempotent (safe to repeat)?
|   +-- NO --> Add idempotency key or do NOT retry.
|   +-- YES ↓
|
+-- How many concurrent clients may retry simultaneously?
|   +-- Few (<10) --> Exponential backoff (jitter optional)
|   +-- Many (10-1000) --> Full jitter (recommended default)
|   +-- Very many (>1000) --> Full jitter + token bucket + circuit breaker
|
+-- Do you need a guaranteed minimum wait time?
|   +-- YES --> Equal jitter (half fixed, half random)
|   +-- NO --> Full jitter (lowest total load)
|
+-- Is this a long-running background job?
    +-- YES --> Decorrelated jitter (independent of attempt count)
    +-- NO --> Full jitter with max 3-5 attempts
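
The first two gates of the tree (is the error transient, is the operation safe to repeat) can be sketched as a single predicate. The function name and parameters are hypothetical; the idempotent method list follows the HTTP semantics specification, and the status codes are the ones named in the tree.

```python
TRANSIENT_STATUS = {408, 429, 500, 502, 503, 504}
# Methods defined as idempotent by HTTP semantics
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}

def should_retry(status, method, has_idempotency_key=False):
    """Return True only if both gates of the decision tree pass."""
    if status not in TRANSIENT_STATUS:
        return False  # permanent failure: return the error immediately
    # A non-idempotent method is only safe to repeat with an idempotency key
    return method.upper() in IDEMPOTENT_METHODS or has_idempotency_key
```

The remaining branches (client count, minimum wait, background job) choose a delay strategy rather than a yes/no answer, so they are left out of this sketch.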

Step-by-Step Guide

1. Identify retryable errors

Only retry transient failures. Transient server errors (500, 502, 503, 504), rate limits (429), and request timeouts (408) are retryable. All other client errors (4xx) are permanent and must not be retried. [src3]

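RETRYABLE_STATUS_CODES = {408, 429, 500, 502, 503, 504}
# ConnectionError and TimeoutError are already OSError subclasses. Bare OSError is
# left out because it would also match permanent failures such as FileNotFoundError.
RETRYABLE_EXCEPTIONS = (ConnectionError, TimeoutError)

def is_retryable(error):
    # Check the status code first: some HTTP error classes derive from OSError,
    # so an isinstance check alone could misclassify a 400 as retryable.
    status = getattr(error, 'status_code', None)
    if status is not None:
        return status in RETRYABLE_STATUS_CODES
    return isinstance(error, RETRYABLE_EXCEPTIONS)

Verify: is_retryable(HTTPError(status_code=503)) returns True; is_retryable(HTTPError(status_code=400)) returns False. Here HTTPError stands for any exception class that exposes a status_code attribute.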

2. Implement the full jitter formula

Full jitter provides the best load distribution across retrying clients. The formula randomizes the delay between 0 and the exponential ceiling. [src1]

import random

def full_jitter_delay(attempt, base=1.0, cap=30.0):
    exp_delay = min(cap, base * (2 ** attempt))
    return random.uniform(0, exp_delay)

Verify: full_jitter_delay(0) returns value in [0, 1.0]; full_jitter_delay(5) returns value in [0, 30.0].
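
To make the formula concrete, the sketch below lists the ceiling for each attempt and the resulting worst-case and average total wait. The helper names are illustrative; the defaults match the base=1.0 and cap=30.0 used above, and the count of sleeps assumes one sleep between consecutive attempts.

```python
def backoff_ceilings(sleeps, base=1.0, cap=30.0):
    """Upper bound of the jittered delay for each of the first `sleeps` retries."""
    return [min(cap, base * 2 ** attempt) for attempt in range(sleeps)]

def worst_case_wait(max_attempts, base=1.0, cap=30.0):
    # max_attempts calls means max_attempts - 1 sleeps in between
    return sum(backoff_ceilings(max_attempts - 1, base, cap))

def expected_wait(max_attempts, base=1.0, cap=30.0):
    # A uniform draw on [0, ceiling] averages ceiling / 2
    return worst_case_wait(max_attempts, base, cap) / 2

print(backoff_ceilings(7))   # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
print(worst_case_wait(4))    # 7.0
print(expected_wait(4))      # 3.5
```

With 4 attempts, a caller sleeps at most 1 + 2 + 4 = 7 seconds in total and about 3.5 seconds on average; the cap only takes effect from the sixth retry onward.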

3. Build the retry loop with maximum attempts

Wrap the retryable operation in a loop with configurable max attempts, applying the jitter delay between each attempt. [src2]

import time, logging

def retry_with_backoff(fn, max_attempts=4, base=1.0, cap=30.0):
    last_exception = None
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as e:
            last_exception = e
            if not is_retryable(e) or attempt == max_attempts - 1:
                raise
            delay = full_jitter_delay(attempt, base, cap)
            logging.warning(f"Attempt {attempt+1}/{max_attempts} failed. Retrying in {delay:.2f}s")
            time.sleep(delay)
    raise last_exception

Verify: Function retries on 503, gives up on 400, raises after max_attempts exhausted.
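
One way to exercise a loop like this without waiting on real sleeps is to inject the sleep function. The sketch below is a self-contained variant under that assumption: `TransientError`, `retry_call`, and `make_flaky` are hypothetical names, and the `sleep` parameter is an addition that the loop above does not have.

```python
import random
import time

class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""

def retry_call(fn, max_attempts=4, base=1.0, cap=30.0, sleep=time.sleep):
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the failure
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))

def make_flaky(failures):
    """Return a function that raises TransientError `failures` times, then succeeds."""
    state = {"calls": 0}
    def flaky():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError(f"failure {state['calls']}")
        return state["calls"]
    return flaky

# Record the delays instead of sleeping
recorded = []
result = retry_call(make_flaky(2), sleep=recorded.append)
print(result, len(recorded))  # 3 2
```

Passing `recorded.append` as the sleep function makes the chosen delays observable, so a test can check that each one stays under its ceiling.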

4. Add retry budget / token bucket (for high-scale systems)

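Prevent retry amplification by limiting the total retry rate across all requests. AWS SDKs use a token bucket for this: a retry quota of 500 tokens, 5 tokens consumed per retry, and tokens returned to the bucket on successful calls. The class below follows the same shape and refunds 5 tokens per success. [src7]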

import threading

class RetryBudget:
    def __init__(self, max_tokens=500, refill_per_success=5, cost_per_retry=5):
        self.tokens = max_tokens
        self.max_tokens = max_tokens
        self.refill = refill_per_success
        self.cost = cost_per_retry
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            if self.tokens >= self.cost:
                self.tokens -= self.cost
                return True
            return False

    def success(self):
        with self._lock:
            self.tokens = min(self.max_tokens, self.tokens + self.refill)

Verify: After 100 retries with no successful call in between (100 x 5 = 500 tokens consumed), the next acquire() returns False.
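
A budget only helps if the retry loop consults it. The sketch below shows one possible wiring, assuming a compact stand-in class (`TokenBudget`) with the same acquire/success shape as above; `call_with_budget` is a hypothetical helper, and the retryability check is omitted for brevity.

```python
import random
import threading
import time

class TokenBudget:
    """Compact stand-in with the same acquire()/success() shape as RetryBudget."""
    def __init__(self, max_tokens=500, refill=5, cost=5):
        self.max_tokens = max_tokens
        self.tokens = max_tokens
        self.refill = refill
        self.cost = cost
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            if self.tokens >= self.cost:
                self.tokens -= self.cost
                return True
            return False

    def success(self):
        with self._lock:
            self.tokens = min(self.max_tokens, self.tokens + self.refill)

def call_with_budget(fn, budget, max_attempts=4, base=1.0, cap=30.0, sleep=time.sleep):
    for attempt in range(max_attempts):
        try:
            result = fn()
        except Exception:
            # Give up when attempts run out OR the shared budget is exhausted
            if attempt == max_attempts - 1 or not budget.acquire():
                raise
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
        else:
            budget.success()  # successful calls slowly refill the bucket
            return result
```

Because the budget is shared across all calls in the process, a widespread outage drains it quickly and later failures are returned immediately instead of adding retry load.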

5. Respect Retry-After headers

When the server sends a Retry-After header (common with 429 and 503), use the server-specified delay instead of your calculated backoff. [src3]

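def get_retry_delay(response, attempt, base=1.0, cap=30.0):
    retry_after = response.headers.get('Retry-After')
    if retry_after:
        try:
            # Handles the delta-seconds form only; the HTTP-date form raises
            # ValueError here and falls through to the jitter calculation.
            # Clamp to [0, cap] so a negative or very large value cannot break
            # time.sleep() or stall the caller. Note that a Retry-After above
            # cap is then not fully honored; raise cap if the server must be obeyed.
            return max(0.0, min(float(retry_after), cap))
        except ValueError:
            pass
    return full_jitter_delay(attempt, base, cap)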

Verify: Response with Retry-After: 5 returns 5.0; without header falls back to jitter calculation.
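
Retry-After can also carry an HTTP-date instead of a number of seconds. A minimal sketch of parsing both forms with the standard library follows; `parse_retry_after` and its `now` parameter (injected for testability) are hypothetical names, not part of any HTTP client.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def parse_retry_after(value, now=None):
    """Return seconds to wait, or None if the header is missing or unparseable."""
    if not value:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)  # delta-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None  # older Pythons raise TypeError, newer ones ValueError
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # assume UTC if no zone given
    now = now or datetime.now(timezone.utc)
    # A date in the past means the wait is already over
    return max(0.0, (when - now).total_seconds())
```

Returning None for unparseable values lets the caller fall back to the jitter calculation, the same way the function above does on ValueError.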

Code Examples

Python (tenacity): Decorator-Based Retry

# Input:  Any function that may raise transient errors
# Output: Automatic retry with exponential backoff + jitter

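from tenacity import (
    retry, stop_after_attempt, wait_exponential_jitter,
    retry_if_exception, before_sleep_log
)
import logging
import httpx  # pip install httpx>=0.27

logger = logging.getLogger(__name__)

RETRYABLE_STATUS = {429, 500, 502, 503, 504}

def _is_transient(exc: BaseException) -> bool:
    # Retry HTTP errors only for transient status codes; always retry transport failures
    if isinstance(exc, httpx.HTTPStatusError):
        return exc.response.status_code in RETRYABLE_STATUS
    return isinstance(exc, httpx.TransportError)

@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential_jitter(initial=1, max=30, jitter=5),
    retry=retry_if_exception(_is_transient),
    before_sleep=before_sleep_log(logger, logging.WARNING),
    reraise=True,
)
def fetch_with_retry(url: str) -> dict:
    response = httpx.get(url, timeout=10)
    # Raises on error responses; _is_transient decides which of them are retried,
    # so a 400 or 404 surfaces immediately instead of being parsed as JSON
    response.raise_for_status()
    return response.json()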

Node.js: Async Retry with Full Jitter

// Input:  Async function that may throw retryable errors
// Output: Result of successful call, or throws after max attempts

async function retryWithBackoff(fn, {
  maxAttempts = 4,
  baseDelay = 1000,
  maxDelay = 30000,
} = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const isRetryable = error.status >= 500 || error.status === 429
        || error.code === 'ECONNRESET' || error.code === 'ETIMEDOUT';
      if (!isRetryable || attempt === maxAttempts - 1) throw error;
      const expDelay = Math.min(maxDelay, baseDelay * 2 ** attempt);
      const delay = Math.random() * expDelay;
      await new Promise(r => setTimeout(r, delay));
    }
  }
}

Go: Retry with Context and Full Jitter

// Input:  Context, retryable function
// Output: Result of successful call, or error after max attempts

package retry

import (
    "context"
    "math"
    "math/rand"
    "time"
    "fmt"
)

type Config struct {
    MaxAttempts int
    BaseDelay   time.Duration
    MaxDelay    time.Duration
}

func Do(ctx context.Context, cfg Config, fn func() error) error {
    var lastErr error
    for attempt := 0; attempt < cfg.MaxAttempts; attempt++ {
        lastErr = fn()
        if lastErr == nil {
            return nil
        }
        if attempt == cfg.MaxAttempts-1 {
            break
        }
        expDelay := math.Min(
            float64(cfg.MaxDelay),
            float64(cfg.BaseDelay)*math.Pow(2, float64(attempt)),
        )
        delay := time.Duration(rand.Float64() * expDelay)
        select {
        case <-ctx.Done():
            return fmt.Errorf("retry cancelled: %w", ctx.Err())
        case <-time.After(delay):
        }
    }
    return fmt.Errorf("all %d attempts failed: %w", cfg.MaxAttempts, lastErr)
}

Java: Retry with Exponential Backoff

// Input:  Callable<T> that may throw retryable exceptions
// Output: Result of successful call, or throws after max attempts

import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

public class RetryWithBackoff {
    private static final Set<Integer> RETRYABLE = Set.of(429, 500, 502, 503, 504);

    public static <T> T execute(
            RetryableCall<T> fn, int maxAttempts, long baseMs, long capMs
    ) throws Exception {
        Exception lastException = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return fn.call();
            } catch (RetryableException e) {
                lastException = e;
                if (!RETRYABLE.contains(e.getStatusCode())
                        || attempt == maxAttempts - 1) throw e;
                long expDelay = Math.min(capMs, baseMs * (1L << attempt));
                long delay = ThreadLocalRandom.current().nextLong(0, expDelay + 1);
                Thread.sleep(delay);
            }
        }
        throw lastException;
    }

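    @FunctionalInterface
    public interface RetryableCall<T> { T call() throws Exception; }

    // Hypothetical exception type (not from a library): carries the HTTP status of the failed call
    public static class RetryableException extends Exception {
        private final int statusCode;
        public RetryableException(int statusCode, String message) {
            super(message);
            this.statusCode = statusCode;
        }
        public int getStatusCode() { return statusCode; }
    }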
}

Anti-Patterns

Wrong: Retry without any backoff

# BAD -- hammering a failing service makes the outage worse
for attempt in range(5):
    try:
        return call_api()
    except Exception:
        pass  # Retry immediately with zero delay

Correct: Retry with exponential backoff and jitter

# GOOD -- spreading retries over time lets the service recover
for attempt in range(5):
    try:
        return call_api()
    except TransientError:
        if attempt == 4:
            raise  # out of attempts: surface the failure instead of swallowing it
        delay = min(30, 1.0 * 2 ** attempt) * random.random()
        time.sleep(delay)

Wrong: Retrying non-retryable errors (400, 401, 404)

# BAD -- 400 Bad Request will fail every time, wasting retry attempts
@retry(stop=stop_after_attempt(4), wait=wait_exponential())
def create_user(data):
    response = httpx.post('/users', json=data)
    response.raise_for_status()  # Retries even on 400/401/404!

Correct: Only retry transient errors

# GOOD -- only retry server errors and rate limits
@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential_jitter(initial=1, max=30),
    retry=retry_if_exception(
        lambda e: isinstance(e, httpx.HTTPStatusError)
        and e.response.status_code in (429, 500, 502, 503, 504)),
)
def create_user(data):
    # POST is not idempotent: send an idempotency key if the API supports one
    response = httpx.post('/users', json=data)
    response.raise_for_status()

Wrong: Exponential backoff without jitter

# BAD -- all 1000 clients retry at exactly 1s, 2s, 4s, 8s -> thundering herd
delay = base * (2 ** attempt)
time.sleep(delay)

Correct: Always add full jitter

# GOOD -- clients spread retries uniformly, preventing synchronized storms
delay = random.uniform(0, min(cap, base * (2 ** attempt)))
time.sleep(delay)
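
The difference between the two snippets can be illustrated with a small simulation that counts how many retries land in the same 100 ms window. This is a toy model under stated assumptions: 1000 clients that all failed at the same instant, a fixed seed for repeatability, and hypothetical function names.

```python
import random
from collections import Counter

def peak_arrivals(delays, bucket=0.1):
    """Largest number of retries landing in the same time bucket."""
    buckets = Counter(int(d / bucket) for d in delays)
    return max(buckets.values())

def simulate(clients=1000, attempt=3, base=1.0, cap=30.0, seed=42):
    rng = random.Random(seed)
    ceiling = min(cap, base * 2 ** attempt)
    # Without jitter every client waits exactly the same time
    no_jitter = [ceiling for _ in range(clients)]
    # With full jitter each client draws its own delay in [0, ceiling]
    full_jitter = [rng.uniform(0, ceiling) for _ in range(clients)]
    return peak_arrivals(no_jitter), peak_arrivals(full_jitter)

peak_plain, peak_jittered = simulate()
print(peak_plain, peak_jittered)
```

Without jitter the peak equals the number of clients, since all of them arrive in one bucket; with full jitter the same retries are spread over roughly 80 buckets.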

Wrong: Infinite retries with no maximum

# BAD -- retries forever, consuming threads/connections/memory
attempt = 0
while True:
    try:
        return call_api()
    except Exception:
        time.sleep(2 ** attempt)
        attempt += 1

Correct: Bounded retries with a cap

# GOOD -- give up after max_attempts and let the caller handle the failure
MAX_ATTEMPTS = 4
for attempt in range(MAX_ATTEMPTS):
    try:
        return call_api()
    except TransientError:
        if attempt == MAX_ATTEMPTS - 1:
            raise
        delay = random.uniform(0, min(30.0, 1.0 * 2 ** attempt))
        time.sleep(delay)

Common Pitfalls

No jitter causes thundering herd: 1000 clients failing at the same time all retry at 1s, 2s, 4s, 8s -- hitting the server in synchronized waves. Fix: delay = random.uniform(0, min(cap, base * 2^attempt)). [src1]
Retrying non-idempotent operations: Retrying a POST /charge endpoint without an idempotency key creates duplicate charges. Fix: Pass Idempotency-Key header; server deduplicates based on key. [src2]
Ignoring Retry-After headers: When a server sends Retry-After: 60, ignoring it and retrying in 2s triggers rate limiting or bans. Fix: Parse Retry-After header and use server-specified delay as minimum. [src3]
Retrying at every layer: If the client, the load balancer, and the gateway each make 3 attempts, one user request can generate 3 x 3 x 3 = 27 backend calls. Fix: Retry only at the outermost layer, or use a shared retry budget. [src2]
Unbounded exponential delay: Without a cap, 2^20 = 1,048,576 seconds (~12 days). Fix: min(cap, base * 2^attempt) with cap of 30-60 seconds. [src7]
Not logging retry attempts: Silent retries mask transient issues until they become chronic. Fix: Log every retry with attempt number, delay, and error. [src6]
Retrying on circuit breaker open: Continuing to retry when the circuit breaker is open wastes resources. Fix: Check circuit state before attempting retry. [src2]
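
For the non-idempotent pitfall, the key detail is that the idempotency key is generated once and reused on every attempt. A minimal sketch follows, assuming a hypothetical `send(payload, headers)` callable that performs the request and a server that deduplicates on the Idempotency-Key header.

```python
import random
import time
import uuid

def post_with_idempotency_key(send, payload, max_attempts=4,
                              base=1.0, cap=30.0, sleep=time.sleep):
    # Generated ONCE, outside the loop: a fresh key per attempt would
    # defeat server-side deduplication
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(max_attempts):
        try:
            return send(payload, headers)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

If the first attempt actually reached the server before the connection dropped, the repeated key lets the server return the original result instead of creating a second charge.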

When to Use / When Not to Use

Use When	Don't Use When	Use Instead
Transient network errors (timeouts, DNS failures)	Client-side validation errors (4xx)	Return error immediately
Rate-limited APIs returning 429	Non-idempotent operations without idempotency keys	Idempotency pattern first, then retry
Cloud service temporary unavailability (503)	Real-time user-facing requests with tight latency SLAs	Hedged requests / speculative execution
Batch processing / background jobs	Service is consistently down (not transient)	Circuit breaker to fail fast
Database connection pool exhaustion	Retries at multiple layers (client + LB + gateway)	Single retry point with retry budget
Message queue consumer failures	Authentication/authorization errors (401, 403)	Re-authenticate, do not retry
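
The table points to a circuit breaker when a service is consistently down. A minimal single-threaded sketch of the idea is shown below; the class, its thresholds, and the injected `clock` are illustrative assumptions rather than any particular library's API, and a production version would need locking.

```python
import time

class CircuitOpenError(Exception):
    """Raised without calling the service while the circuit is open."""

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: fall through and allow one trial call (half-open)
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count
        self.failures = 0
        self.opened_at = None
        return result
```

A retry loop wrapped around `call` should treat CircuitOpenError as non-retryable, otherwise it keeps spending attempts while the breaker is open.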

Important Caveats

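Full jitter has the lowest total server load according to AWS simulations, but takes slightly longer to complete than decorrelated jitter; equal jitter did slightly more work than full jitter and took longer in the same simulations. Choose full jitter when you optimize for server health, decorrelated jitter when completion time matters more, and equal jitter only when you need a guaranteed minimum wait.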
AWS SDKs default to 3 max attempts with 20-second max backoff. Google Cloud client libraries always enable jitter. In Polly (.NET) v8+, set BackoffType = DelayBackoffType.Exponential and UseJitter = true on the retry options. Prefer library implementations over hand-rolled retries.
Retry amplification is the most dangerous failure mode: N clients each retrying M times generates N*M load on an already failing service. Use a global retry budget (token bucket) for services handling >100 requests/second.
When using gRPC, leverage built-in retry policies in the service config rather than implementing client-side retry.
In serverless environments (AWS Lambda, Cloud Functions), retries consume invocation time and cost. Keep max_attempts low (2-3) for cost-sensitive workloads.
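
The amplification arithmetic is easy to check directly. The sketch below computes the worst-case number of backend calls for one user request when several layers each make their own attempts; the function name is hypothetical.

```python
from math import prod

def worst_case_backend_calls(attempts_per_layer):
    """Each layer multiplies the calls made by the layer above it."""
    return prod(attempts_per_layer)

# Client, load balancer, and gateway each making 3 attempts
print(worst_case_backend_calls([3, 3, 3]))  # 27
# Retrying only at the outermost layer
print(worst_case_backend_calls([3, 1, 1]))  # 3
```

Moving all retries to a single layer turns the product into a single factor, which is the reason for the single-retry-point advice.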