redis-cli EVAL "lua_script" 1 key limit window (atomic rate-limit check via Lua)

| Component | Role | Technology Options | Scaling Strategy |
|---|---|---|---|
| Rate Limit Algorithm | Core counting logic | Token bucket, sliding window counter, fixed window, leaky bucket | Algorithm choice depends on burst tolerance vs accuracy requirements |
| Centralized Counter Store | Atomic counters shared across nodes | Redis (primary), Memcached, DynamoDB | Redis Cluster with hash slots; read replicas for high-read workloads |
| API Gateway / Proxy | Enforcement point before application | NGINX, Kong, Envoy, AWS API Gateway, Cloudflare | Horizontal scaling; each instance queries shared counter store |
| Lua Script Engine | Atomic multi-step operations | Redis EVAL/EVALSHA | Scripts cached server-side via SHA; no additional scaling needed |
| Rule Configuration Store | Per-client/tier rate limits | Config file, database, feature flags | Hot-reload without restart; hierarchical rules (global > tenant > endpoint) |
| Client Identity Resolver | Extracts rate-limit key from request | API key, JWT claims, IP address, combination | Consistent hashing to same Redis shard per client key |
| Response Header Formatter | Communicates limit status to clients | X-RateLimit-Limit, X-RateLimit-Remaining, Retry-After, RateLimit (draft RFC) | Standard headers; no scaling concern |
| Async Counter Sync | Reduces latency for non-critical limits | Local counters + periodic sync to Redis | Eventual consistency; configurable sync interval |
| Monitoring & Alerting | Tracks rejection rates, Redis latency | Prometheus + Grafana, Datadog, CloudWatch | Alert on rejection spike, Redis memory, counter key cardinality |
| Fallback / Circuit Breaker | Handles Redis outage gracefully | In-memory local limiter, fail-open, fail-closed | Degrade to local approximate limiting; log for reconciliation |
| Load Shedder | Protects system under extreme load | Priority-based traffic classification | Stripe model: critical > POSTs > GETs > test traffic [src1] |
| Clock Synchronization | Consistent time across nodes | NTP, wall-clock-aligned windows | Align windows to Unix epoch seconds; tolerate small skew with weighted windows |
START
├── Request volume <1K/sec and single datacenter?
│ ├── YES → In-process rate limiter (Guava RateLimiter, Go rate.Limiter)
│ └── NO ↓
├── Need to allow short traffic bursts above steady rate?
│ ├── YES → Token bucket algorithm (Redis + Lua)
│ │ ├── Stripe-style API? → Add concurrent request limiter + load shedder
│ │ └── Standard API? → Token bucket with per-key buckets
│ └── NO ↓
├── Need precise per-second accuracy (billing, compliance)?
│ ├── YES → Sliding window log (Redis sorted sets) — higher memory cost
│ └── NO ↓
├── Need smooth output rate (queue processing, webhooks)?
│ ├── YES → Leaky bucket (process at fixed rate, queue excess)
│ └── NO ↓
├── Simple implementation, tolerant of boundary burst?
│ ├── YES → Fixed window counter (simplest Redis INCR)
│ └── NO ↓
└── DEFAULT → Sliding window counter (Cloudflare approach) — best accuracy/memory trade-off
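The sliding window log branch above keeps one timestamp per request. A minimal in-memory sketch of the algorithm (the class name `SlidingWindowLog` is illustrative; the distributed version stores the same timestamps in a Redis sorted set via ZADD, ZREMRANGEBYSCORE, and ZCARD):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Exact sliding-window limiter: one timestamp kept per request.

    Memory is O(limit) per client, which is why the decision tree
    flags it as the higher-memory option; accuracy has no boundary
    error, suiting billing or compliance use cases.
    """
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.log = {}  # client_key -> deque of request timestamps

    def is_allowed(self, client_key, now=None):
        now = time.time() if now is None else now
        q = self.log.setdefault(client_key, deque())
        # Evict timestamps that have fallen out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```

The injected `now` parameter makes the limiter deterministic in tests; production callers simply omit it.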
Define what identifies a unique client. Common keys: API key, user ID, IP address, or composite (user + endpoint). [src1]
Rate-limit key examples:
Per user: ratelimit:{user_id}:{endpoint}
Per API key: ratelimit:{api_key}
Per IP: ratelimit:{client_ip}:{endpoint}
Per tenant: ratelimit:{tenant_id}:{tier}
Verify: Ensure keys are unique per client scope — duplicate keys cause shared limits across unrelated clients.
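A small helper can make the scoping rules explicit. The function name and its precedence order (tenant > user > API key > IP) are illustrative choices, not a fixed convention:

```python
def build_rate_limit_key(user_id=None, api_key=None, client_ip=None,
                         endpoint=None, tenant_id=None, tier=None):
    """Build a rate-limit key from the strongest identity available.

    Mirrors the key patterns above; raises rather than silently
    sharing a key across unrelated clients.
    """
    if tenant_id and tier:
        return f"ratelimit:{tenant_id}:{tier}"
    if user_id:
        return (f"ratelimit:{user_id}:{endpoint}" if endpoint
                else f"ratelimit:{user_id}")
    if api_key:
        return f"ratelimit:{api_key}"
    if client_ip:
        return (f"ratelimit:{client_ip}:{endpoint}" if endpoint
                else f"ratelimit:{client_ip}")
    raise ValueError("no client identity available for rate-limit key")
```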
All rate-limit checks must be atomic. A Lua script ensures the read-check-increment cycle cannot be interrupted by concurrent requests. [src3] [src4]
-- Token bucket rate limiter (Redis Lua script)
-- KEYS[1] = rate limit key
-- ARGV[1] = max tokens (bucket capacity)
-- ARGV[2] = refill rate (tokens per second)
-- ARGV[3] = current timestamp (seconds)
-- ARGV[4] = tokens to consume (usually 1)
-- Returns: {allowed (0/1), remaining_tokens}
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])
local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1])
local last_refill = tonumber(bucket[2])
if tokens == nil then
tokens = capacity
last_refill = now
end
local elapsed = math.max(0, now - last_refill)
tokens = math.min(capacity, tokens + (elapsed * refill_rate))
local allowed = 0
local remaining = tokens
if tokens >= requested then
tokens = tokens - requested
allowed = 1
remaining = tokens
end
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('EXPIRE', key, math.ceil(capacity / refill_rate) * 2)
return {allowed, remaining}
Verify: redis-cli EVAL "$(cat token_bucket.lua)" 1 "ratelimit:user123" 10 1 $(date +%s) 1 → {1, 9}
Communicate rate-limit state back to clients via response headers. The X-RateLimit-* names below are the de facto convention; the IETF draft standardizes a combined RateLimit field instead. [src7]
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1708732800
Retry-After: 30
Content-Type: application/json
{"error": "rate_limit_exceeded", "message": "Rate limit exceeded. Retry after 30s."}
Verify: curl -v https://api.example.com/endpoint 2>&1 | grep X-RateLimit
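Building the response above from a limiter result is mechanical; a sketch (the function name is an assumption, and clamping remaining at zero is a deliberate choice so fail-open sentinels like -1 never leak to clients):

```python
def rate_limit_headers(limit, remaining, reset_epoch, retry_after=None):
    """Build the rate-limit response headers shown above."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),  # never negative
        "X-RateLimit-Reset": str(reset_epoch),
    }
    if retry_after is not None:  # only set on 429 responses
        headers["Retry-After"] = str(retry_after)
    return headers
```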
Use Redis Sentinel or Cluster for production to avoid single point of failure. [src3]
# Redis Sentinel configuration (sentinel.conf)
sentinel monitor mymaster 10.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 10000
sentinel parallel-syncs mymaster 1
Verify: redis-cli -p 26379 SENTINEL master mymaster → confirms master is up and replicas connected.
Rate limiters must handle Redis failures gracefully. Decide on fail-open or fail-closed based on context. [src1]
Fallback strategy:
1. Try Redis rate-limit check (Lua script)
2. If Redis unavailable:
a. Security-critical (auth, payment): FAIL CLOSED → reject with 503
b. General API: FAIL OPEN → allow request, log the bypass
c. Optional: fall back to in-memory approximate counter per node
3. Circuit breaker: stop retrying Redis for 30s after 3 consecutive failures
4. Alert on fallback activation
Verify: Stop Redis entirely (or block its port), confirm application logs fallback activation and requests are handled per policy.
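Step 3 of the strategy above can be sketched as a small state machine. Class and parameter names are illustrative, and timestamps are injected so the behavior is testable without waiting out the cooldown:

```python
import time

class RedisCircuitBreaker:
    """After `threshold` consecutive failures, skip Redis entirely
    for `cooldown` seconds and let the caller use its fallback."""

    def __init__(self, threshold=3, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allow_redis_call(self, now=None):
        now = time.monotonic() if now is None else now
        if self.opened_at is None:
            return True
        if now - self.opened_at >= self.cooldown:
            # Half-open: let one attempt through to probe Redis.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_failure(self, now=None):
        now = time.monotonic() if now is None else now
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = now  # trip the breaker

    def record_success(self):
        self.failures = 0
        self.opened_at = None
```

The caller wraps each Lua-script check: if `allow_redis_call()` is false, go straight to the fail-open/fail-closed policy without touching Redis.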
Implement hierarchical rules for different client tiers and priority-based load shedding. [src1] [src6]
Rate-limit tiers:
free_tier: 100 req/min, 1000 req/hour
pro_tier: 1000 req/min, 50000 req/hour
enterprise: 10000 req/min, unlimited/hour
Priority classification (Stripe model):
P0 — Critical (payments): never shed
P1 — POST operations: shed at 90% capacity
P2 — GET operations: shed at 80% capacity
P3 — Test mode/analytics: shed at 70% capacity
Verify: Send requests with different API keys → confirm each tier receives correct X-RateLimit-Limit.
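The tier table above maps naturally onto a lookup structure. This hypothetical `TIERS` dict hard-codes the limits for illustration; a production system would load them from the rule configuration store so they hot-reload:

```python
# Tier limits mirroring the table above; None means unlimited.
TIERS = {
    "free_tier":  {"per_min": 100,   "per_hour": 1000},
    "pro_tier":   {"per_min": 1000,  "per_hour": 50000},
    "enterprise": {"per_min": 10000, "per_hour": None},
}

def limits_for(tier):
    """Resolve a client's limits, defaulting unknown tiers to free."""
    return TIERS.get(tier, TIERS["free_tier"])
```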
Full script: sliding_window_limiter.py (46 lines)
# Input: client_key (str), max_requests (int), window_seconds (int)
# Output: (allowed: bool, remaining: int, retry_after: int)
# Requires: redis>=5.0.0
import time, redis
SLIDING_WINDOW_LUA = """
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local curr_window = tostring(math.floor(now / window) * window)
local prev_window = tostring(tonumber(curr_window) - window)
local prev_count = tonumber(redis.call('GET', key..':'..prev_window) or '0')
local curr_count = tonumber(redis.call('GET', key..':'..curr_window) or '0')
local elapsed = now - tonumber(curr_window)
local weight = (window - elapsed) / window
local estimated = math.floor(prev_count * weight) + curr_count
if estimated >= limit then return {0, 0, window - elapsed} end
redis.call('INCR', key..':'..curr_window)
redis.call('EXPIRE', key..':'..curr_window, window * 2)
return {1, limit - estimated - 1, 0}
"""
class SlidingWindowRateLimiter:
def __init__(self, redis_url="redis://localhost:6379"):
self.r = redis.from_url(redis_url, decode_responses=True)
self._sha = self.r.script_load(SLIDING_WINDOW_LUA)
    def is_allowed(self, client_key, max_requests=100, window_seconds=60):
        args = (f"rl:{client_key}", max_requests, window_seconds, time.time())
        try:
            try:
                result = self.r.evalsha(self._sha, 1, *args)
            except redis.exceptions.NoScriptError:
                # Script cache flushed (e.g. after Redis restart): reload via EVAL
                result = self.r.eval(SLIDING_WINDOW_LUA, 1, *args)
            return bool(result[0]), int(result[1]), int(result[2])
        except redis.ConnectionError:
            return True, -1, 0  # Fail-open; remaining=-1 signals "unknown"
Full script: express_rate_limiter.js (65 lines)
// Input: Express request object
// Output: next() if allowed, 429 response if rate-limited
// Requires: ioredis@^5.0.0, express@^4.18.0
const Redis = require('ioredis');
const TOKEN_BUCKET_LUA = `
local key = KEYS[1]
local cap = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local d = redis.call('HMGET', key, 'tokens', 'ts')
local t = tonumber(d[1]) or cap
local ts = tonumber(d[2]) or now
t = math.min(cap, t + (now - ts) * rate)
if t < 1 then return {0, 0, math.ceil((1-t)/rate)} end
t = t - 1
redis.call('HMSET', key, 'tokens', t, 'ts', now)
redis.call('EXPIRE', key, math.ceil(cap/rate)*2)
return {1, math.floor(t), 0}`;
function createRateLimiter(opts = {}) {
const { redisUrl='redis://localhost:6379', capacity=100, refillRate=1.67 } = opts;
const r = new Redis(redisUrl, { maxRetriesPerRequest: 1 });
return async (req, res, next) => {
const key = `rl:${req.headers['x-api-key'] || req.ip}`;
try {
const [ok, rem, retry] = await r.eval(TOKEN_BUCKET_LUA, 1, key,
capacity, refillRate, Date.now()/1000);
res.set({ 'X-RateLimit-Limit': String(capacity),
'X-RateLimit-Remaining': String(rem) });
if (!ok) { res.set('Retry-After', String(retry));
return res.status(429).json({ error: 'rate_limit_exceeded' }); }
next();
} catch(e) { console.error('RL error:', e.message); next(); }
};
}
# BAD — race condition: two requests both read count=99, both pass, actual=101
count = redis_client.get(f"rl:{client_id}")
if int(count or 0) < rate_limit:
redis_client.incr(f"rl:{client_id}")
allow_request()
# GOOD — entire check runs atomically in Redis
result = redis_client.eval("""
local count = redis.call('INCR', KEYS[1])
if count == 1 then redis.call('EXPIRE', KEYS[1], ARGV[2]) end
if count > tonumber(ARGV[1]) then return 0 else return 1 end
""", 1, f"rl:{client_id}", rate_limit, window_seconds)
# BAD — 100 req at 0:59 + 100 at 1:00 = 200 in 2 seconds
window = int(time.time() / 60)
key = f"rl:{client_id}:{window}"
count = redis_client.incr(key)
# GOOD — weighted average across adjacent windows prevents boundary burst
# Cloudflare: estimated = prev_count * ((window - elapsed) / window) + curr_count
# At t=0:45 with 60s window: weight=0.25, only 25% of previous window counts
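The weighted estimate in the comments above is a one-line formula; a sketch whose numbers match the t=0:45 example:

```python
def sliding_window_estimate(prev_count, curr_count, elapsed, window):
    """Cloudflare-style estimate: weight the previous fixed window by
    the fraction of it still covered by the sliding window."""
    weight = (window - elapsed) / window
    return int(prev_count * weight) + curr_count
```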
# BAD — 4 servers x 100/min = client can send 400/min total
request_counts = {} # Not shared across instances
def check_rate(client_id):
request_counts[client_id] = request_counts.get(client_id, 0) + 1
return request_counts[client_id] <= 100
# GOOD — all instances share the same counter via Redis
result = redis_client.evalsha(script_sha, 1, f"rl:{client_id}",
max_requests, window_seconds, time.time())
# BAD — keys accumulate forever, Redis memory grows unbounded
redis_client.hset(f"rl:{client_id}", mapping={"tokens": tokens, "ts": now})
# No EXPIRE — if a client disappears, its key persists forever
# GOOD — TTL = 2x window ensures cleanup even with clock drift
redis_client.hset(f"rl:{client_id}", mapping={"tokens": tokens, "ts": now})
redis_client.expire(f"rl:{client_id}", window_seconds * 2)
Key takeaways:
Use redis.eval(lua_script) for an atomic read-check-increment. [src3]
Align windows to the wall clock (floor(ts/window)*window) plus NTP sync. [src5]
Return Retry-After on 429 responses. [src7]
Key clients by user_id or api_key plus optional endpoint grouping. [src1]
Monitor EVALSHA latency under concurrency. [src4]

| Use When | Don't Use When | Use Instead |
|---|---|---|
| Protecting APIs from abuse (brute force, scraping) | Single-server app with no distributed requirements | In-process rate limiter (Guava, Go rate, node-rate-limiter-flexible) |
| Enforcing SLA/tier-based API quotas | Need to block L3/L4 DDoS attacks | Network-level DDoS mitigation (Cloudflare, AWS Shield, Cloud Armor) |
| Preventing resource exhaustion in microservices | Service is failing (unhealthy, not overloaded) | Circuit breaker pattern (Resilience4j, Polly, Hystrix) |
| Multi-datacenter API with shared rate limits | Rate limiting per-user with small user base and single server | Simple in-memory counter with mutex |
| Cost control for expensive downstream calls (LLM APIs, payment processors) | Need request queuing with guaranteed delivery | Message queue (RabbitMQ, SQS) with consumer rate control |
Use Redis Cluster hash tags like {user_id} to co-locate related keys on the same shard.
capacity and refill_rate are independent: capacity controls burst size, refill_rate controls sustained throughput; misconfiguring either produces unexpected behavior.
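The capacity/refill split works out to concrete numbers; a sketch (the function name is an assumption, and 1.67 tokens/sec is the pro-tier-style "about 100/min" rate used in the Express example above):

```python
def bucket_profile(capacity, refill_rate):
    """Throughput profile implied by a token bucket configuration.

    capacity sets the largest instantaneous burst a full bucket
    permits; refill_rate alone sets the long-run average rate.
    """
    return {
        "max_burst": capacity,
        "sustained_per_min": refill_rate * 60,
        "refill_to_full_seconds": capacity / refill_rate,
    }
```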