Designing a Distributed Rate Limiter

Type: Software Reference Confidence: 0.93 Sources: 7 Verified: 2026-02-23 Freshness: 2026-02-23

TL;DR

Enforce limits at the gateway using a centralized Redis counter store with atomic Lua scripts: token bucket when bursts are acceptable, sliding window counter otherwise. Return standard rate-limit headers, and define an explicit fail-open or fail-closed policy for Redis outages.

Constraints

Quick Reference

| Component | Role | Technology Options | Scaling Strategy |
| --- | --- | --- | --- |
| Rate Limit Algorithm | Core counting logic | Token bucket, sliding window counter, fixed window, leaky bucket | Algorithm choice depends on burst tolerance vs accuracy requirements |
| Centralized Counter Store | Atomic counters shared across nodes | Redis (primary), Memcached, DynamoDB | Redis Cluster with hash slots; read replicas for high-read workloads |
| API Gateway / Proxy | Enforcement point before application | NGINX, Kong, Envoy, AWS API Gateway, Cloudflare | Horizontal scaling; each instance queries shared counter store |
| Lua Script Engine | Atomic multi-step operations | Redis EVAL/EVALSHA | Scripts cached server-side via SHA; no additional scaling needed |
| Rule Configuration Store | Per-client/tier rate limits | Config file, database, feature flags | Hot-reload without restart; hierarchical rules (global > tenant > endpoint) |
| Client Identity Resolver | Extracts rate-limit key from request | API key, JWT claims, IP address, combination | Consistent hashing to same Redis shard per client key |
| Response Header Formatter | Communicates limit status to clients | X-RateLimit-Limit, X-RateLimit-Remaining, Retry-After, RateLimit (draft RFC) | Standard headers; no scaling concern |
| Async Counter Sync | Reduces latency for non-critical limits | Local counters + periodic sync to Redis | Eventual consistency; configurable sync interval |
| Monitoring & Alerting | Tracks rejection rates, Redis latency | Prometheus + Grafana, Datadog, CloudWatch | Alert on rejection spike, Redis memory, counter key cardinality |
| Fallback / Circuit Breaker | Handles Redis outage gracefully | In-memory local limiter, fail-open, fail-closed | Degrade to local approximate limiting; log for reconciliation |
| Load Shedder | Protects system under extreme load | Priority-based traffic classification | Stripe model: critical > POSTs > GETs > test traffic [src1] |
| Clock Synchronization | Consistent time across nodes | NTP, wall-clock-aligned windows | Align windows to Unix epoch seconds; tolerate small skew with weighted windows |

Decision Tree

START
├── Request volume <1K/sec and single datacenter?
│   ├── YES → In-process rate limiter (Guava RateLimiter, Go rate.Limiter)
│   └── NO ↓
├── Need to allow short traffic bursts above steady rate?
│   ├── YES → Token bucket algorithm (Redis + Lua)
│   │   ├── Stripe-style API? → Add concurrent request limiter + load shedder
│   │   └── Standard API? → Token bucket with per-key buckets
│   └── NO ↓
├── Need precise per-second accuracy (billing, compliance)?
│   ├── YES → Sliding window log (Redis sorted sets) — higher memory cost
│   └── NO ↓
├── Need smooth output rate (queue processing, webhooks)?
│   ├── YES → Leaky bucket (process at fixed rate, queue excess)
│   └── NO ↓
├── Simple implementation, tolerant of boundary burst?
│   ├── YES → Fixed window counter (simplest Redis INCR)
│   └── NO ↓
└── DEFAULT → Sliding window counter (Cloudflare approach) — best accuracy/memory trade-off
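
The branches above can be sketched as a selector function (illustrative only; the boolean parameter names are shorthand for the questions in the tree, not an official taxonomy):

```python
def choose_algorithm(rps: int, single_dc: bool, bursty: bool,
                     precise: bool, smooth_output: bool, simple_ok: bool) -> str:
    """Walk the decision tree top to bottom; first matching branch wins."""
    if rps < 1000 and single_dc:
        return "in-process"              # Guava RateLimiter, Go rate.Limiter
    if bursty:
        return "token bucket"            # Redis + Lua
    if precise:
        return "sliding window log"      # Redis sorted sets, higher memory
    if smooth_output:
        return "leaky bucket"            # fixed-rate processing, queue excess
    if simple_ok:
        return "fixed window"            # simplest Redis INCR
    return "sliding window counter"      # Cloudflare approach (default)
```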

Step-by-Step Guide

1. Choose your rate-limiting key

Define what identifies a unique client. Common keys: API key, user ID, IP address, or composite (user + endpoint). [src1]

Rate-limit key examples:
  Per user:     ratelimit:{user_id}:{endpoint}
  Per API key:  ratelimit:{api_key}
  Per IP:       ratelimit:{client_ip}:{endpoint}
  Per tenant:   ratelimit:{tenant_id}:{tier}

Verify: Ensure keys are unique per client scope — duplicate keys cause shared limits across unrelated clients.
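
A minimal key-builder sketch (hypothetical helper, not from the source) that composes the namespaced keys above and rejects empty scope components:

```python
def build_key(*parts: str) -> str:
    """Compose a namespaced rate-limit key, e.g. ratelimit:{user_id}:{endpoint}."""
    if not parts or not all(parts):
        raise ValueError("every key component must be non-empty")
    return "ratelimit:" + ":".join(str(p) for p in parts)

# Distinct scopes yield distinct keys; a shared key would merge limits
# across unrelated clients.
```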

2. Implement atomic counter operations with Redis Lua

All rate-limit checks must be atomic. A Lua script ensures the read-check-increment cycle cannot be interrupted by concurrent requests. [src3] [src4]

-- Token bucket rate limiter (Redis Lua script)
-- KEYS[1] = rate limit key
-- ARGV[1] = max tokens (bucket capacity)
-- ARGV[2] = refill rate (tokens per second)
-- ARGV[3] = current timestamp (seconds)
-- ARGV[4] = tokens to consume (usually 1)
-- Returns: {allowed (0/1), remaining_tokens}

local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1])
local last_refill = tonumber(bucket[2])

if tokens == nil then
  tokens = capacity
  last_refill = now
end

local elapsed = math.max(0, now - last_refill)
tokens = math.min(capacity, tokens + (elapsed * refill_rate))

local allowed = 0
local remaining = tokens

if tokens >= requested then
  tokens = tokens - requested
  allowed = 1
  remaining = tokens
end

redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('EXPIRE', key, math.ceil(capacity / refill_rate) * 2)

return {allowed, remaining}

Verify: redis-cli EVAL "$(cat token_bucket.lua)" 1 "ratelimit:user123" 10 1 $(date +%s) 1 → returns {1, 9} (allowed; 9 tokens remaining).
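
The script's refill arithmetic can be mirrored in plain Python for unit testing expected token counts (illustrative sketch, not the production path):

```python
def token_bucket_step(tokens, last_refill, now, capacity, refill_rate, requested=1):
    """One token-bucket check, matching the Lua script's math."""
    if tokens is None:                        # first request: bucket starts full
        tokens, last_refill = capacity, now
    elapsed = max(0, now - last_refill)
    tokens = min(capacity, tokens + elapsed * refill_rate)
    if tokens >= requested:
        return 1, tokens - requested          # allowed, remaining
    return 0, tokens                          # rejected, balance unchanged

# Fresh bucket of capacity 10 at 1 token/s: first request leaves 9 tokens,
# matching the {1, 9} result in the Verify command above.
```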

3. Add response headers

Communicate rate-limit state back to clients using IETF draft standard headers. [src7]

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1708732800
Retry-After: 30
Content-Type: application/json

{"error": "rate_limit_exceeded", "message": "Rate limit exceeded. Retry after 30s."}

Verify: curl -v https://api.example.com/endpoint 2>&1 | grep X-RateLimit
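
A small sketch (hypothetical helper) mapping a limiter result to the response headers shown above; the reset value is assumed to be an epoch timestamp:

```python
def rate_limit_headers(limit, remaining, reset_epoch, retry_after=None):
    """Build rate-limit response headers from a limiter result."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_epoch),
    }
    if retry_after is not None:               # only set on 429 responses
        headers["Retry-After"] = str(retry_after)
    return headers
```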

4. Deploy Redis with high availability

Use Redis Sentinel or Cluster for production to avoid single point of failure. [src3]

# Redis Sentinel configuration (sentinel.conf)
sentinel monitor mymaster 10.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 10000
sentinel parallel-syncs mymaster 1

Verify: redis-cli -p 26379 SENTINEL master mymaster → confirms master is up and replicas connected.

5. Implement fallback for Redis outages

Rate limiters must handle Redis failures gracefully. Decide on fail-open or fail-closed based on context. [src1]

Fallback strategy:
  1. Try Redis rate-limit check (Lua script)
  2. If Redis unavailable:
     a. Security-critical (auth, payment): FAIL CLOSED → reject with 503
     b. General API: FAIL OPEN → allow request, log the bypass
     c. Optional: fall back to in-memory approximate counter per node
  3. Circuit breaker: stop retrying Redis for 30s after 3 consecutive failures
  4. Alert on fallback activation

Verify: Stop the Redis master and all replicas (killing only a replica will not trigger fallback), then confirm the application logs fallback activation and handles requests per policy.
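
The circuit-breaker step of the strategy can be sketched as a small state machine (illustrative; the thresholds mirror the "3 failures / 30s" numbers above, and the class name is an assumption):

```python
import time

class RedisCircuitBreaker:
    """Stop retrying Redis after repeated failures; retry after a cooldown."""

    def __init__(self, failure_threshold=3, cooldown_seconds=30, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None                 # None → circuit closed, Redis in use

    def allow_redis_call(self):
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown_seconds:
            self.opened_at, self.failures = None, 0   # half-open: try Redis again
            return True
        return False                          # still cooling down: use fallback

    def record_success(self):
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()     # open: stop retrying Redis
```

The caller checks allow_redis_call() before the Lua script and routes to the fail-open/fail-closed policy when it returns False.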

6. Configure tiered rate limits

Implement hierarchical rules for different client tiers and priority-based load shedding. [src1] [src6]

Rate-limit tiers:
  free_tier:    100 req/min,    1000 req/hour
  pro_tier:     1000 req/min,   50000 req/hour
  enterprise:   10000 req/min,  unlimited/hour

Priority classification (Stripe model):
  P0 — Critical (payments):   never shed
  P1 — POST operations:       shed at 90% capacity
  P2 — GET operations:        shed at 80% capacity
  P3 — Test mode/analytics:   shed at 70% capacity

Verify: Send requests with different API keys → confirm each tier receives correct X-RateLimit-Limit.
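
A minimal tier-lookup sketch matching the limits above (the table and function names are assumptions; None denotes "unlimited"):

```python
TIER_LIMITS = {
    "free_tier":  {"per_min": 100,   "per_hour": 1000},
    "pro_tier":   {"per_min": 1000,  "per_hour": 50000},
    "enterprise": {"per_min": 10000, "per_hour": None},   # None → unlimited
}

def resolve_limits(tier: str) -> dict:
    """Unknown tiers fall back to the most restrictive limits."""
    return TIER_LIMITS.get(tier, TIER_LIMITS["free_tier"])
```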

Code Examples

Python: Redis-Based Sliding Window Counter

Full script: sliding_window_limiter.py (46 lines)

# Input:  client_key (str), max_requests (int), window_seconds (int)
# Output: (allowed: bool, remaining: int, retry_after: int)
# Requires: redis>=5.0.0

import time, redis

SLIDING_WINDOW_LUA = """
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local curr_window = tostring(math.floor(now / window) * window)
local prev_window = tostring(tonumber(curr_window) - window)
local prev_count = tonumber(redis.call('GET', key..':'..prev_window) or '0')
local curr_count = tonumber(redis.call('GET', key..':'..curr_window) or '0')
local elapsed = now - tonumber(curr_window)
local weight = (window - elapsed) / window
local estimated = math.floor(prev_count * weight) + curr_count
if estimated >= limit then return {0, 0, window - elapsed} end
redis.call('INCR', key..':'..curr_window)
redis.call('EXPIRE', key..':'..curr_window, window * 2)
return {1, limit - estimated - 1, 0}
"""

class SlidingWindowRateLimiter:
    def __init__(self, redis_url="redis://localhost:6379"):
        self.r = redis.from_url(redis_url, decode_responses=True)
        self._sha = self.r.script_load(SLIDING_WINDOW_LUA)

    def is_allowed(self, client_key, max_requests=100, window_seconds=60):
        try:
            result = self.r.evalsha(self._sha, 1,
                f"rl:{client_key}", max_requests, window_seconds, time.time())
            return bool(result[0]), int(result[1]), int(result[2])
        except redis.ConnectionError:
            return True, -1, 0  # Fail-open

Node.js: Express Middleware Rate Limiter

Full script: express_rate_limiter.js (65 lines)

// Input:  Express request object
// Output: next() if allowed, 429 response if rate-limited
// Requires: ioredis@^5.0.0, express@^4.18.0

const Redis = require('ioredis');
const TOKEN_BUCKET_LUA = `
local key = KEYS[1]
local cap = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local d = redis.call('HMGET', key, 'tokens', 'ts')
local t = tonumber(d[1]) or cap
local ts = tonumber(d[2]) or now
t = math.min(cap, t + (now - ts) * rate)
if t < 1 then return {0, 0, math.ceil((1-t)/rate)} end
t = t - 1
redis.call('HMSET', key, 'tokens', t, 'ts', now)
redis.call('EXPIRE', key, math.ceil(cap/rate)*2)
return {1, math.floor(t), 0}`;

function createRateLimiter(opts = {}) {
  const { redisUrl='redis://localhost:6379', capacity=100, refillRate=1.67 } = opts;
  const r = new Redis(redisUrl, { maxRetriesPerRequest: 1 });
  return async (req, res, next) => {
    const key = `rl:${req.headers['x-api-key'] || req.ip}`;
    try {
      const [ok, rem, retry] = await r.eval(TOKEN_BUCKET_LUA, 1, key,
        capacity, refillRate, Date.now()/1000);
      res.set({ 'X-RateLimit-Limit': String(capacity),
        'X-RateLimit-Remaining': String(rem) });
      if (!ok) { res.set('Retry-After', String(retry));
        return res.status(429).json({ error: 'rate_limit_exceeded' }); }
      next();
    } catch(e) { console.error('RL error:', e.message); next(); }
  };
}

Anti-Patterns

Wrong: Non-atomic read-then-write counter

# BAD — race condition: two requests both read count=99, both pass, actual=101
count = redis_client.get(f"rl:{client_id}")
if int(count or 0) < rate_limit:
    redis_client.incr(f"rl:{client_id}")
    allow_request()

Correct: Atomic Lua script

# GOOD — entire check runs atomically in Redis
result = redis_client.eval("""
    local count = redis.call('INCR', KEYS[1])
    if count == 1 then redis.call('EXPIRE', KEYS[1], ARGV[2]) end
    if count > tonumber(ARGV[1]) then return 0 else return 1 end
""", 1, f"rl:{client_id}", rate_limit, window_seconds)

Wrong: Fixed window without boundary burst protection

# BAD — 100 req at 0:59 + 100 at 1:00 = 200 in 2 seconds
window = int(time.time() / 60)
key = f"rl:{client_id}:{window}"
count = redis_client.incr(key)

Correct: Sliding window counter

# GOOD — weighted average across adjacent windows prevents boundary burst
# Cloudflare: estimated = prev_count * ((window - elapsed) / window) + curr_count
# At t=0:45 with 60s window: weight=0.25, only 25% of previous window counts
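
The weighted-count arithmetic in those comments, as a plain-Python sketch: the previous window's count is discounted by the fraction of it that still overlaps the sliding window.

```python
def sliding_window_estimate(prev_count, curr_count, elapsed, window=60):
    """Cloudflare-style estimate: weighted previous window + current window."""
    weight = (window - elapsed) / window      # overlap fraction of previous window
    return int(prev_count * weight) + curr_count

# At elapsed=45s in a 60s window, 100 requests from the previous window
# count as 25 toward the current estimate.
```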

Wrong: In-memory state only (not shared)

# BAD — 4 servers x 100/min = client can send 400/min total
request_counts = {}  # Not shared across instances
def check_rate(client_id):
    request_counts[client_id] = request_counts.get(client_id, 0) + 1
    return request_counts[client_id] <= 100

Correct: Centralized counter store

# GOOD — all instances share the same counter via Redis
result = redis_client.evalsha(script_sha, 1, f"rl:{client_id}",
    max_requests, window_seconds, time.time())

Wrong: No TTL on rate-limit keys

# BAD — keys accumulate forever, Redis memory grows unbounded
redis_client.hset(f"rl:{client_id}", mapping={"tokens": tokens, "ts": now})
# No EXPIRE — if client disappears, key persists forever

Correct: Always set TTL

# GOOD — TTL = 2x window ensures cleanup even with clock drift
redis_client.hset(f"rl:{client_id}", mapping={"tokens": tokens, "ts": now})
redis_client.expire(f"rl:{client_id}", window_seconds * 2)

Common Pitfalls

When to Use / When Not to Use

| Use When | Don't Use When | Use Instead |
| --- | --- | --- |
| Protecting APIs from abuse (brute force, scraping) | Single-server app with no distributed requirements | In-process rate limiter (Guava, Go rate, node-rate-limiter-flexible) |
| Enforcing SLA/tier-based API quotas | Need to block L3/L4 DDoS attacks | Network-level DDoS mitigation (Cloudflare, AWS Shield, Cloud Armor) |
| Preventing resource exhaustion in microservices | Service is failing (unhealthy, not overloaded) | Circuit breaker pattern (Resilience4j, Polly, Hystrix) |
| Multi-datacenter API with shared rate limits | Rate limiting per-user with small user base and single server | Simple in-memory counter with mutex |
| Cost control for expensive downstream calls (LLM APIs, payment processors) | Need request queuing with guaranteed delivery | Message queue (RabbitMQ, SQS) with consumer rate control |

Important Caveats

Related Units