distributed rate limiting

- Bottom line: Use a Redis-backed token bucket or sliding window counter at the API gateway layer; wrap all counter operations in Lua scripts for atomicity; deploy Redis with replication for availability.

API rate limiter design

- Bottom line: Use a Redis-backed token bucket or sliding window counter at the API gateway layer; wrap all counter operations in Lua scripts for atomicity; deploy Redis with replication for availability.

rate limiting system design interview

- Bottom line: Use a Redis-backed token bucket or sliding window counter at the API gateway layer; wrap all counter operations in Lua scripts for atomicity; deploy Redis with replication for availability.

Designing a Distributed Rate Limiter

How do I design a distributed rate limiter?

TL;DR

Bottom line: Use a Redis-backed token bucket or sliding window counter at the API gateway layer; wrap all counter operations in Lua scripts for atomicity; deploy Redis with replication for availability.
Key tool/command: redis-cli EVAL "lua_script" 1 key limit window (atomic rate-limit check via Lua)
Watch out for: Race conditions from non-atomic read-then-write operations — two concurrent requests can both pass the check before either increments the counter.
Works with: Any language with a Redis client; deploy at API gateway (Kong, NGINX, Envoy), application middleware, or edge (Cloudflare, AWS WAF).

Constraints

All counter operations (read-check-increment) must be atomic — use Redis Lua scripts or CAS operations to prevent race conditions
Clock skew across distributed nodes causes inconsistent windowing — use a centralized time source or wall-clock-aligned windows
Redis single-instance is a SPOF — always deploy with master-replica replication and Sentinel or Cluster for production
Never fail-open by default in security-critical contexts (DDoS protection, auth brute-force) — design explicit fail-open vs fail-closed policy per limiter
Memory grows linearly with unique keys — set TTL on all rate limit keys and monitor Redis memory usage

Quick Reference

Component	Role	Technology Options	Scaling Strategy
Rate Limit Algorithm	Core counting logic	Token bucket, sliding window counter, fixed window, leaky bucket	Algorithm choice depends on burst tolerance vs accuracy requirements
Centralized Counter Store	Atomic counters shared across nodes	Redis (primary), Memcached, DynamoDB	Redis Cluster with hash slots; read replicas for high-read workloads
API Gateway / Proxy	Enforcement point before application	NGINX, Kong, Envoy, AWS API Gateway, Cloudflare	Horizontal scaling; each instance queries shared counter store
Lua Script Engine	Atomic multi-step operations	Redis EVAL/EVALSHA	Scripts cached server-side via SHA; no additional scaling needed
Rule Configuration Store	Per-client/tier rate limits	Config file, database, feature flags	Hot-reload without restart; hierarchical rules (global > tenant > endpoint)
Client Identity Resolver	Extracts rate-limit key from request	API key, JWT claims, IP address, combination	Consistent hashing to same Redis shard per client key
Response Header Formatter	Communicates limit status to clients	X-RateLimit-Limit, X-RateLimit-Remaining, Retry-After, RateLimit (draft RFC)	Standard headers; no scaling concern
Async Counter Sync	Reduces latency for non-critical limits	Local counters + periodic sync to Redis	Eventual consistency; configurable sync interval
Monitoring & Alerting	Tracks rejection rates, Redis latency	Prometheus + Grafana, Datadog, CloudWatch	Alert on rejection spike, Redis memory, counter key cardinality
Fallback / Circuit Breaker	Handles Redis outage gracefully	In-memory local limiter, fail-open, fail-closed	Degrade to local approximate limiting; log for reconciliation
Load Shedder	Protects system under extreme load	Priority-based traffic classification	Stripe model: critical > POSTs > GETs > test traffic [src1]
Clock Synchronization	Consistent time across nodes	NTP, wall-clock-aligned windows	Align windows to Unix epoch seconds; tolerate small skew with weighted windows

Decision Tree

START
├── Request volume <1K/sec and single datacenter?
│   ├── YES → In-process rate limiter (Guava RateLimiter, Go rate.Limiter)
│   └── NO ↓
├── Need to allow short traffic bursts above steady rate?
│   ├── YES → Token bucket algorithm (Redis + Lua)
│   │   ├── Stripe-style API? → Add concurrent request limiter + load shedder
│   │   └── Standard API? → Token bucket with per-key buckets
│   └── NO ↓
├── Need precise per-second accuracy (billing, compliance)?
│   ├── YES → Sliding window log (Redis sorted sets) — higher memory cost
│   └── NO ↓
├── Need smooth output rate (queue processing, webhooks)?
│   ├── YES → Leaky bucket (process at fixed rate, queue excess)
│   └── NO ↓
├── Simple implementation, tolerant of boundary burst?
│   ├── YES → Fixed window counter (simplest Redis INCR)
│   └── NO ↓
└── DEFAULT → Sliding window counter (Cloudflare approach) — best accuracy/memory trade-off

Step-by-Step Guide

1. Choose your rate-limiting key

Define what identifies a unique client. Common keys: API key, user ID, IP address, or composite (user + endpoint). [src1]

Rate-limit key examples:
  Per user:     ratelimit:{user_id}:{endpoint}
  Per API key:  ratelimit:{api_key}
  Per IP:       ratelimit:{client_ip}:{endpoint}
  Per tenant:   ratelimit:{tenant_id}:{tier}

Verify: Ensure keys are unique per client scope — duplicate keys cause shared limits across unrelated clients.

2. Implement atomic counter operations with Redis Lua

All rate-limit checks must be atomic. A Lua script ensures the read-check-increment cycle cannot be interrupted by concurrent requests. [src3] [src4]

-- Token bucket rate limiter (Redis Lua script)
-- KEYS[1] = rate limit key
-- ARGV[1] = max tokens (bucket capacity)
-- ARGV[2] = refill rate (tokens per second)
-- ARGV[3] = current timestamp (seconds)
-- ARGV[4] = tokens to consume (usually 1)
-- Returns: {allowed (0/1), remaining_tokens}

local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1])
local last_refill = tonumber(bucket[2])

if tokens == nil then
  tokens = capacity
  last_refill = now
end

local elapsed = math.max(0, now - last_refill)
tokens = math.min(capacity, tokens + (elapsed * refill_rate))

local allowed = 0
local remaining = tokens

if tokens >= requested then
  tokens = tokens - requested
  allowed = 1
  remaining = tokens
end

redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('EXPIRE', key, math.ceil(capacity / refill_rate) * 2)

return {allowed, remaining}

Verify: redis-cli EVAL "$(cat token_bucket.lua)" 1 "ratelimit:user123" 10 1 $(date +%s) 1 → {1, 9}

3. Add response headers

Communicate rate-limit state back to clients using IETF draft standard headers. [src7]

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1708732800
Retry-After: 30
Content-Type: application/json
{"error": "rate_limit_exceeded", "message": "Rate limit exceeded. Retry after 30s."}

Verify: curl -v https://api.example.com/endpoint 2>&1 | grep X-RateLimit

4. Deploy Redis with high availability

Use Redis Sentinel or Cluster for production to avoid single point of failure. [src3]

# Redis Sentinel configuration (sentinel.conf)
sentinel monitor mymaster 10.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 10000
sentinel parallel-syncs mymaster 1

Verify: redis-cli -p 26379 SENTINEL master mymaster → confirms master is up and replicas connected.

5. Implement fallback for Redis outages

Rate limiters must handle Redis failures gracefully. Decide on fail-open or fail-closed based on context. [src1]

Fallback strategy:
  1. Try Redis rate-limit check (Lua script)
  2. If Redis unavailable:
     a. Security-critical (auth, payment): FAIL CLOSED → reject with 503
     b. General API: FAIL OPEN → allow request, log the bypass
     c. Optional: fall back to in-memory approximate counter per node
  3. Circuit breaker: stop retrying Redis for 30s after 3 consecutive failures
  4. Alert on fallback activation

Verify: Kill Redis replica, confirm application logs fallback activation and requests handled per policy.

6. Configure tiered rate limits

Implement hierarchical rules for different client tiers and priority-based load shedding. [src1] [src6]

Rate-limit tiers:
  free_tier:    100 req/min,  1000 req/hour
  pro_tier:     1000 req/min, 50000 req/hour
  enterprise:   10000 req/min, unlimited/hour

Priority classification (Stripe model):
  P0 — Critical (payments):  never shed
  P1 — POST operations:      shed at 90% capacity
  P2 — GET operations:        shed at 80% capacity
  P3 — Test mode/analytics:   shed at 70% capacity

Verify: Send requests with different API keys → confirm each tier receives correct X-RateLimit-Limit.

Code Examples

Python: Redis-Based Sliding Window Counter

Full script: sliding_window_limiter.py (46 lines)

# Input:  client_key (str), max_requests (int), window_seconds (int)
# Output: (allowed: bool, remaining: int, retry_after: int)
# Requires: redis>=5.0.0

import time, redis

SLIDING_WINDOW_LUA = """
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local curr_window = tostring(math.floor(now / window) * window)
local prev_window = tostring(tonumber(curr_window) - window)
local prev_count = tonumber(redis.call('GET', key..':'..prev_window) or '0')
local curr_count = tonumber(redis.call('GET', key..':'..curr_window) or '0')
local elapsed = now - tonumber(curr_window)
local weight = (window - elapsed) / window
local estimated = math.floor(prev_count * weight) + curr_count
if estimated >= limit then return {0, 0, window - elapsed} end
redis.call('INCR', key..':'..curr_window)
redis.call('EXPIRE', key..':'..curr_window, window * 2)
return {1, limit - estimated - 1, 0}
"""

class SlidingWindowRateLimiter:
    def __init__(self, redis_url="redis://localhost:6379"):
        self.r = redis.from_url(redis_url, decode_responses=True)
        self._sha = self.r.script_load(SLIDING_WINDOW_LUA)

    def is_allowed(self, client_key, max_requests=100, window_seconds=60):
        try:
            result = self.r.evalsha(self._sha, 1,
                f"rl:{client_key}", max_requests, window_seconds, time.time())
            return bool(result[0]), int(result[1]), int(result[2])
        except redis.ConnectionError:
            return True, -1, 0  # Fail-open

Node.js: Express Middleware Rate Limiter

Full script: express_rate_limiter.js (65 lines)

// Input:  Express request object
// Output: next() if allowed, 429 response if rate-limited
// Requires: ioredis@^5.0.0, express@^4.18.0

const Redis = require('ioredis');
const TOKEN_BUCKET_LUA = `
local key = KEYS[1]
local cap = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local d = redis.call('HMGET', key, 'tokens', 'ts')
local t = tonumber(d[1]) or cap
local ts = tonumber(d[2]) or now
t = math.min(cap, t + (now - ts) * rate)
if t < 1 then return {0, 0, math.ceil((1-t)/rate)} end
t = t - 1
redis.call('HMSET', key, 'tokens', t, 'ts', now)
redis.call('EXPIRE', key, math.ceil(cap/rate)*2)
return {1, math.floor(t), 0}`;

function createRateLimiter(opts = {}) {
  const { redisUrl='redis://localhost:6379', capacity=100, refillRate=1.67 } = opts;
  const r = new Redis(redisUrl, { maxRetriesPerRequest: 1 });
  return async (req, res, next) => {
    const key = `rl:${req.headers['x-api-key'] || req.ip}`;
    try {
      const [ok, rem, retry] = await r.eval(TOKEN_BUCKET_LUA, 1, key,
        capacity, refillRate, Date.now()/1000);
      res.set({ 'X-RateLimit-Limit': String(capacity),
        'X-RateLimit-Remaining': String(rem) });
      if (!ok) { res.set('Retry-After', String(retry));
        return res.status(429).json({ error: 'rate_limit_exceeded' }); }
      next();
    } catch(e) { console.error('RL error:', e.message); next(); }
  };
}

Anti-Patterns

Wrong: Non-atomic read-then-write counter

# BAD — race condition: two requests both read count=99, both pass, actual=101
count = redis_client.get(f"rl:{client_id}")
if int(count or 0) < rate_limit:
    redis_client.incr(f"rl:{client_id}")
    allow_request()

Correct: Atomic Lua script

# GOOD — entire check runs atomically in Redis
result = redis_client.eval("""
    local count = redis.call('INCR', KEYS[1])
    if count == 1 then redis.call('EXPIRE', KEYS[1], ARGV[2]) end
    if count > tonumber(ARGV[1]) then return 0 else return 1 end
""", 1, f"rl:{client_id}", rate_limit, window_seconds)

Wrong: Fixed window without boundary burst protection

# BAD — 100 req at 0:59 + 100 at 1:00 = 200 in 2 seconds
window = int(time.time() / 60)
key = f"rl:{client_id}:{window}"
count = redis_client.incr(key)

Correct: Sliding window counter

# GOOD — weighted average across adjacent windows prevents boundary burst
# Cloudflare: estimated = prev_count * ((window - elapsed) / window) + curr_count
# At t=0:45 with 60s window: weight=0.25, only 25% of previous window counts

Wrong: In-memory state only (not shared)

# BAD — 4 servers x 100/min = client can send 400/min total
request_counts = {}  # Not shared across instances
def check_rate(client_id):
    request_counts[client_id] = request_counts.get(client_id, 0) + 1
    return request_counts[client_id] <= 100

Correct: Centralized counter store

# GOOD — all instances share the same counter via Redis
result = redis_client.evalsha(script_sha, 1, f"rl:{client_id}",
    max_requests, window_seconds, time.time())

Wrong: No TTL on rate-limit keys

# BAD — keys accumulate forever, Redis memory grows unbounded
redis_client.hmset(f"rl:{client_id}", {"tokens": tokens, "ts": now})
# No EXPIRE — if client disappears, key persists forever

Correct: Always set TTL

# GOOD — TTL = 2x window ensures cleanup even with clock drift
redis_client.hmset(f"rl:{client_id}", {"tokens": tokens, "ts": now})
redis_client.expire(f"rl:{client_id}", window_seconds * 2)

Common Pitfalls

Race conditions from non-atomic operations: Two concurrent requests both read the same counter, both pass, both increment — actual count exceeds limit. Fix: redis.eval(lua_script) for atomic read-check-increment. [src3]
Fixed-window boundary burst: Client sends max requests at end of window N and start of N+1, doubling throughput in a short period. Fix: use sliding window counter with weighted average. [src2]
Redis SPOF causing total outage: Single Redis instance down = all requests unthrottled or blocked. Fix: Redis Sentinel or Cluster + in-memory fallback with circuit breaker. [src1]
Clock skew in distributed nodes: Servers with offset clocks produce inconsistent window boundaries. Fix: wall-clock-aligned windows (floor(ts/window)*window) + NTP sync. [src5]
Missing Retry-After header: Clients without backoff guidance retry immediately, causing thundering herd. Fix: always include Retry-After on 429 responses. [src7]
Over-granular rate-limit keys: Using user+endpoint+method+IP creates millions of unique keys. Fix: use user_id or api_key + optional endpoint grouping. [src1]
No request priority differentiation: Analytics calls consume the same budget as payment operations. Fix: tiered load shedding with priority classification. [src1]
Testing only at low traffic: Rate limiters working at 10 req/sec may fail at 10K req/sec. Fix: load test at production scale; benchmark Redis EVALSHA latency under concurrency. [src4]

When to Use / When Not to Use

Use When	Don't Use When	Use Instead
Protecting APIs from abuse (brute force, scraping)	Single-server app with no distributed requirements	In-process rate limiter (Guava, Go rate, node-rate-limiter-flexible)
Enforcing SLA/tier-based API quotas	Need to block L3/L4 DDoS attacks	Network-level DDoS mitigation (Cloudflare, AWS Shield, Cloud Armor)
Preventing resource exhaustion in microservices	Service is failing (unhealthy, not overloaded)	Circuit breaker pattern (Resilience4j, Polly, Hystrix)
Multi-datacenter API with shared rate limits	Rate limiting per-user with small user base and single server	Simple in-memory counter with mutex
Cost control for expensive downstream calls (LLM APIs, payment processors)	Need request queuing with guaranteed delivery	Message queue (RabbitMQ, SQS) with consumer rate control

Important Caveats

Sliding window counter (Cloudflare approach) is an approximation — 0.003% error rate across 400M requests, acceptable for most use cases but not billing-precise metering
Redis Cluster splits keyspace into 16384 hash slots — ensure rate-limit key hashing distributes evenly; use hash tags {user_id} for related keys on same shard
Token bucket capacity and refill_rate are independent — capacity controls burst size, refill_rate controls sustained throughput; misconfiguring either produces unexpected behavior
Lua scripts block the Redis event loop during execution — keep scripts under 100 microseconds; complex scripts at high concurrency increase p99 latency
Distributed rate limiting with strict consistency adds 1-5ms latency per request; for sub-millisecond latency, use local approximate counting with periodic sync