Key techniques: Redis EVAL with Lua scripts for atomic, distributed rate limiting, and the Retry-After header (RFC 6585); omitting it causes retry storms.

| Algorithm | Burst Handling | Memory per Key | Accuracy | Distributed Friendly | Complexity | Best For |
|---|---|---|---|---|---|---|
| Token Bucket | Excellent | O(1) — 2 values | Good | Yes (atomic ops) | Low | General API rate limiting |
| Leaky Bucket | None — constant rate | O(1) — 2 values | Excellent | Yes (atomic ops) | Low | Traffic shaping |
| Fixed Window | Poor — 2x at boundary | O(1) — 1 counter | Low | Yes (INCR + EXPIRE) | Very Low | Simple counters |
| Sliding Window Log | None — strict | O(n) — all timestamps | Exact | Moderate (sorted sets) | High | Billing, compliance |
| Sliding Window Counter | Good — weighted | O(1) — 2 counters | Good (~0.003%) | Yes (two counters) | Low | Production APIs |
| GCRA | Good — configurable | O(1) — 1 timestamp | Excellent | Yes (single CAS) | Moderate | High-scale production |
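To make the Token Bucket row concrete, here is a minimal in-memory sketch (illustrative names, single-process only, not any particular library's API):

```javascript
// Minimal token bucket: state per key is just two values,
// the token count and the last-refill timestamp.
class TokenBucket {
  constructor(capacity, refillRate, now = Date.now() / 1000) {
    this.capacity = capacity;     // max burst size
    this.refillRate = refillRate; // tokens added per second
    this.tokens = capacity;
    this.last = now;              // timestamp of last refill
  }
  allow(cost = 1, now = Date.now() / 1000) {
    // Lazy refill: credit tokens for elapsed time, capped at capacity
    this.tokens = Math.min(this.capacity, this.tokens + (now - this.last) * this.refillRate);
    this.last = now;
    if (this.tokens >= cost) {
      this.tokens -= cost;
      return true;
    }
    return false;
  }
}

const bucket = new TokenBucket(5, 1, 0); // 5-token burst, 1 token/s, t=0
console.log([0, 0, 0, 0, 0, 0].map((t) => bucket.allow(1, t))); // 5 allowed, then denied
```

The entire state is two values per key (token count plus last-refill timestamp), which is why the table lists O(1) memory.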
| Header | Purpose | Example | Required |
|---|---|---|---|
| Retry-After | Seconds until the client should retry | Retry-After: 30 | Yes (with 429) |
| X-RateLimit-Limit | Max requests in window | X-RateLimit-Limit: 100 | Recommended |
| X-RateLimit-Remaining | Requests left in window | X-RateLimit-Remaining: 42 | Recommended |
| X-RateLimit-Reset | Unix timestamp when the window resets | X-RateLimit-Reset: 1708790400 | Recommended |
| RateLimit-Policy | Machine-readable policy (IETF draft) | RateLimit-Policy: 100;w=60 | Optional |
```text
START
├── Single server instance?
│ ├── YES → In-memory rate limiter (Map/dict with TTL cleanup)
│ │ ├── Need burst tolerance? → Token Bucket
│ │ └── Need smooth constant rate? → Leaky Bucket
│ └── NO (distributed) ↓
├── Have Redis available?
│ ├── YES ↓
│ │ ├── Need exact per-second accuracy (billing/compliance)?
│ │ │ ├── YES → Sliding Window Log (Redis Sorted Set)
│ │ │ └── NO ↓
│ │ ├── Need minimal memory at scale (>100K keys)?
│ │ │ ├── YES → GCRA (single timestamp per key)
│ │ │ └── NO ↓
│ │ └── DEFAULT → Sliding Window Counter
│ └── NO Redis ↓
├── Database-backed acceptable (higher latency)?
│ ├── YES → Fixed Window with SQL counter + row lock
│ └── NO ↓
└── DEFAULT → API gateway rate limiting (Kong, NGINX, Envoy, Cloudflare)
```
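The "Map/dict with TTL cleanup" branch above can be sketched as follows (a fixed-window counter for brevity; the per-key state could just as easily hold token-bucket fields, and all names here are illustrative):

```javascript
// In-memory rate-limit bookkeeping: Map of key → { count, expiresAt },
// with lazy reset on access plus a periodic sweep so the map is bounded.
const entries = new Map();

function hit(key, windowSec, now = Date.now() / 1000) {
  const e = entries.get(key);
  if (!e || e.expiresAt <= now) {
    // New or expired window: reset the counter
    entries.set(key, { count: 1, expiresAt: now + windowSec });
    return 1;
  }
  e.count += 1;
  return e.count;
}

function sweep(now = Date.now() / 1000) {
  // Periodic cleanup (e.g., from setInterval) to drop abandoned keys
  for (const [key, e] of entries) {
    if (e.expiresAt <= now) entries.delete(key);
  }
}
```

Without the sweep, abandoned keys accumulate forever; Redis handles the same problem automatically via EXPIRE, which is one reason the distributed branches lean on it.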
Decide how to identify clients. The key determines who gets throttled together. [src1]
| Strategy | Key Pattern | Pros | Cons |
|---|---|---|---|
| Per-IP | ratelimit:{ip} | No auth required | Fails behind NAT |
| Per-API-key | ratelimit:{api_key} | Accurate per-customer | Requires auth |
| Per-user | ratelimit:{user_id} | Precise per-account | Requires session |
| Per-endpoint | ratelimit:{ip}:{method}:{path} | Granular control | More memory |
| Tiered | ratelimit:{tier}:{user_id} | Differentiated limits | Complex config |
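The key patterns above are plain string templates; for example (field names are illustrative):

```javascript
// Key builders for two of the strategies above
function tieredKey(tier, userId) {
  // Tiered: the tier in the key lets the lookup pick a per-tier limit
  return `ratelimit:${tier}:${userId}`;
}

function perEndpointKey(ip, method, path) {
  // Per-endpoint: a separate budget per route, at the cost of more keys
  return `ratelimit:${ip}:${method}:${path}`;
}

console.log(tieredKey('pro', 'user123'));                    // ratelimit:pro:user123
console.log(perEndpointKey('10.0.0.1', 'GET', '/api/data')); // ratelimit:10.0.0.1:GET:/api/data
```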
Use a Lua script for atomicity — this prevents race conditions. [src2] [src5]
```lua
-- sliding_window_counter.lua
-- KEYS[1] = rate limit key, ARGV[1] = window, ARGV[2] = max, ARGV[3] = now
local key = KEYS[1]
local window = tonumber(ARGV[1])
local max_requests = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local current_window = math.floor(now / window)
local current_key = key .. ":" .. current_window
local previous_key = key .. ":" .. (current_window - 1)
local current_count = tonumber(redis.call("GET", current_key) or "0")
local previous_count = tonumber(redis.call("GET", previous_key) or "0")
local elapsed = now % window
local weight = (window - elapsed) / window
local estimated_count = math.floor(previous_count * weight) + current_count
if estimated_count >= max_requests then
  return {0, 0, window - elapsed}
end
redis.call("INCR", current_key)
redis.call("EXPIRE", current_key, window * 2)
return {1, max_requests - estimated_count - 1, 0}
```
Verify: redis-cli EVAL "$(cat sliding_window_counter.lua)" 1 "test:key" 60 100 $(date +%s) → expected: a three-element reply whose first element is 1 (allowed)
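The weighted estimate is easy to check by hand. With a 60 s window, 80 requests in the previous window, 10 in the current one, and 15 s elapsed, the weight is (60 - 15) / 60 = 0.75, so the estimate is floor(80 * 0.75) + 10 = 70:

```javascript
// Same weighted-estimate formula as the Lua script above
function estimatedCount(previousCount, currentCount, window, elapsed) {
  // Fraction of the previous window still inside the sliding window
  const weight = (window - elapsed) / window;
  return Math.floor(previousCount * weight) + currentCount;
}

console.log(estimatedCount(80, 10, 60, 15)); // 70 → allowed under a limit of 100
```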
Include rate limit headers in every response, not just 429s. [src3] [src6]
```javascript
function setRateLimitHeaders(res, limit, remaining, resetTimestamp, retryAfter) {
  res.set('X-RateLimit-Limit', String(limit));
  res.set('X-RateLimit-Remaining', String(Math.max(0, remaining)));
  res.set('X-RateLimit-Reset', String(resetTimestamp));
  if (retryAfter > 0) res.set('Retry-After', String(retryAfter));
}
```
Verify: curl -i https://your-api.com/endpoint → check for X-RateLimit-* headers
Return a structured error body with HTTP 429. [src3]
```json
{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Too many requests. Please retry after 30 seconds.",
    "retry_after": 30,
    "limit": 100,
    "window": "60s"
  }
}
```
If Redis is unreachable, fail open or closed depending on security requirements. [src8]
```javascript
async function checkRateLimit(key, limit, window) {
  try {
    const result = await redis.eval(luaScript, 1, key, window, limit, Date.now() / 1000);
    return { allowed: result[0] === 1, remaining: result[1], retryAfter: result[2] };
  } catch (err) {
    console.error('Rate limiter unavailable, failing open:', err.message);
    return { allowed: true, remaining: -1, retryAfter: 0, degraded: true };
  }
}
```
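For security-sensitive endpoints such as login or password reset, failing closed is usually the safer policy, since failing open would let a brute-force attack ride out a Redis outage. A sketch, with the Redis call injected as `evalFn` so the policy is visible in isolation (the 1 s retryAfter is an arbitrary choice):

```javascript
// Fail-closed variant: a backend failure is treated as "limit exceeded".
// evalFn stands in for redis.eval; any error it throws denies the request.
async function checkRateLimitStrict(evalFn, key, limit, window) {
  try {
    const result = await evalFn(1, key, window, limit, Date.now() / 1000);
    return { allowed: result[0] === 1, remaining: result[1], retryAfter: result[2] };
  } catch (err) {
    // Deny and ask the client to retry shortly; flag degraded mode for metrics
    return { allowed: false, remaining: 0, retryAfter: 1, degraded: true };
  }
}
```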
Full script: token-bucket-express.js (45 lines)
```javascript
// Input: HTTP request to any Express route
// Output: 429 if rate limited, next() if allowed
// Requires: ioredis ^5.0.0
function rateLimiter({ capacity = 100, refillRate = 1.67, keyFn }) {
  return async (req, res, next) => {
    const key = `rl:${keyFn(req)}`;
    const now = Date.now() / 1000;
    try {
      const [allowed, remaining, retryAfter] = await redis.eval(
        TOKEN_BUCKET_SCRIPT, 1, key, capacity, refillRate, now, 1);
      res.set('X-RateLimit-Limit', String(capacity));
      res.set('X-RateLimit-Remaining', String(remaining));
      if (!allowed) {
        res.set('Retry-After', String(retryAfter));
        return res.status(429).json({ error: { code: 'RATE_LIMIT_EXCEEDED' } });
      }
      next();
    } catch (err) { next(); /* fail open */ }
  };
}
```
Full script: sliding-window-fastapi.py (50 lines)
```python
# Input: HTTP request to any FastAPI route
# Output: 429 JSONResponse if rate limited, None if allowed
# Requires: redis>=5.0.0, fastapi>=0.100.0
class RateLimitMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        key = f"rl:{request.headers.get('x-api-key', request.client.host)}"
        result = await self.redis.evalsha(self.sha, 1, key,
                                          self.window, self.limit, int(time.time()))
        allowed, remaining, retry_after = result
        response = (await call_next(request)) if allowed else JSONResponse(
            status_code=429, content={"error": {"code": "RATE_LIMIT_EXCEEDED"}})
        response.headers["X-RateLimit-Remaining"] = str(max(0, remaining))
        if retry_after > 0:
            response.headers["Retry-After"] = str(retry_after)
        return response
```
Full script: gcra-go.go (60 lines)
```go
// Input: HTTP request to any Go HTTP handler
// Output: 429 if rate limited, passes to next handler if allowed
// Requires: go-redis/redis/v9
func (l *Limiter) Middleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        key := fmt.Sprintf("rl:%s", l.keyFn(r))
        now := float64(time.Now().UnixMilli()) / 1000.0
        result, err := gcraScript.Run(ctx, l.rdb, []string{key},
            l.emissionInterval, l.burst, now).Int64Slice()
        if err != nil { // fail open
            next.ServeHTTP(w, r)
            return
        }
        w.Header().Set("X-RateLimit-Remaining", strconv.FormatInt(result[1], 10))
        if result[0] == 0 {
            w.Header().Set("Retry-After", strconv.FormatInt(result[2], 10))
            http.Error(w, `{"error":{"code":"RATE_LIMIT_EXCEEDED"}}`, http.StatusTooManyRequests)
            return
        }
        next.ServeHTTP(w, r)
    })
}
```
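The gcraScript referenced above is not shown; its core logic is small enough to state directly. GCRA stores a single number per key, the theoretical arrival time (TAT). A JavaScript sketch of that logic (the tau = (burst - 1) * emissionInterval convention allows exactly `burst` back-to-back requests; conventions vary between implementations):

```javascript
// GCRA core: one number of state per key (the theoretical arrival time).
// emissionInterval = window / limit; tau is the burst tolerance.
function gcra(storedTat, now, emissionInterval, burst) {
  const tau = (burst - 1) * emissionInterval;
  const tat = Math.max(storedTat, now);
  const allowAt = tat - tau;
  if (now < allowAt) {
    // Too early: keep the stored TAT, tell the client when to retry
    return { allowed: false, tat: storedTat, retryAfter: allowAt - now };
  }
  // Conforming: push the TAT forward by one emission interval
  return { allowed: true, tat: tat + emissionInterval, retryAfter: 0 };
}

// With 1 req/s and burst 3, three immediate requests pass and the 4th is
// deferred for 1 second:
let tat = 0;
for (let i = 0; i < 4; i++) {
  const r = gcra(tat, 100, 1, 3);
  tat = r.tat;
  console.log(i, r.allowed, r.retryAfter);
}
```

In a Redis deployment, the read-modify-write of the stored TAT happens inside the Lua script; that single compare-and-set is what the table's "single CAS" refers to.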
```javascript
// BAD — race condition: two requests read count=99, both pass
async function checkRateLimit(key) {
  const count = await redis.get(key); // read
  if (count >= 100) return false;
  await redis.incr(key); // NOT atomic with the read!
  return true;
}
```
```javascript
// GOOD — atomic: INCR returns the new value in a single round-trip
async function checkRateLimit(key, limit, windowSec) {
  const count = await redis.incr(key);
  // Caveat: if the process dies between INCR and EXPIRE, the key never
  // expires. Redis >= 7.0 supports EXPIRE ... NX, or wrap both calls in
  // a Lua script to close the gap.
  if (count === 1) await redis.expire(key, windowSec);
  return count <= limit;
}
```
```javascript
// BAD — database query runs BEFORE the rate limit check
app.get('/api/search', async (req, res) => {
  const results = await db.query(req.query.q); // expensive!
  if (await isRateLimited(req.ip)) return res.status(429).json({});
  res.json(results);
});

// GOOD — rate limit check happens before any expensive operation
app.get('/api/search',
  rateLimiter({ limit: 100, window: 60 }), // first!
  async (req, res) => { res.json(await db.query(req.query.q)); }
);
```
```javascript
// BAD — no Retry-After header, so clients retry immediately in a tight loop
if (isRateLimited(req)) {
  return res.status(429).json({ error: 'Too many requests' });
}

// GOOD — client knows exactly when to retry
if (!result.allowed) {
  res.set('Retry-After', String(result.retryAfter));
  res.set('X-RateLimit-Limit', '100');
  res.set('X-RateLimit-Remaining', '0');
  return res.status(429).json({ error: 'Too many requests', retry_after: result.retryAfter });
}
```
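On the client side, the Retry-After value should drive the wait. A sketch of a retry loop that prefers the server's hint and falls back to exponential backoff with jitter (fetchFn and sleepFn are injected placeholders, not a real HTTP client API):

```javascript
// Client-side retry policy: honor Retry-After when present, otherwise
// back off exponentially with jitter to avoid synchronized retry storms.
async function fetchWithRetry(fetchFn, sleepFn, maxAttempts = 5) {
  let delay = 1; // seconds, for the exponential fallback
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetchFn();
    if (res.status !== 429) return res;
    const retryAfter = Number(res.headers['retry-after']);
    // Prefer the server's hint; otherwise exponential backoff with jitter
    const wait = Number.isFinite(retryAfter) && retryAfter > 0
      ? retryAfter
      : delay * (1 + Math.random());
    await sleepFn(wait);
    delay *= 2;
  }
  throw new Error('rate limited: retries exhausted');
}
```

The jitter in the fallback matters: without it, every client that was cut off at the same moment retries at the same moment, recreating the spike.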
```bash
# Check current rate limit state for a key
redis-cli GET "rl:user123:/api/data:$(date +%s | awk '{print int($1/60)}')"

# Monitor rate limit operations in real time
redis-cli MONITOR | grep "rl:"

# Test rate limiting with rapid requests
for i in $(seq 1 10); do curl -s -o /dev/null -w "%{http_code} " http://localhost:3000/api/test -H "X-API-Key: test123"; done

# Check Redis memory usage for rate limit keys
redis-cli MEMORY USAGE "rl:user123:/api/data"

# Verify Retry-After header on a 429 response
curl -i http://localhost:3000/api/test 2>&1 | grep -E "(HTTP|Retry-After|X-RateLimit)"
```
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Public-facing API with untrusted clients | Internal microservice-to-microservice calls | Circuit breaker + bulkhead pattern |
| Need to enforce fair usage across consumers | Single-user CLI tool or batch job | Simple concurrency limiter (semaphore) |
| Protecting expensive backend resources | Network-layer DDoS attack (volumetric) | CDN/WAF rate limiting (Cloudflare, AWS Shield) |
| Multi-tenant SaaS with per-customer quotas | Real-time WebSocket persistent connections | Per-connection message throttling |
| Compliance requires audit trail of usage | Static asset serving (images, CSS, JS) | CDN caching + edge rules |
The RateLimit header fields (draft-ietf-httpapi-ratelimit-headers) are still an IETF draft as of Feb 2026; the X-RateLimit-* prefix remains the de facto standard. In Redis Cluster, put the client identifier in a hash tag (e.g., ratelimit:{user123}) for co-location: all keys derived from it hash to the same slot, so multi-key scripts like the sliding window counter stay atomic.