API Rate Limiting: Algorithms, Implementation & Best Practices

Type: Software Reference Confidence: 0.92 Sources: 8 Verified: 2026-02-24 Freshness: 2026-02-24

TL;DR

Constraints

Quick Reference

Algorithm Comparison

| Algorithm | Burst Handling | Memory per Key | Accuracy | Distributed Friendly | Complexity | Best For |
|---|---|---|---|---|---|---|
| Token Bucket | Excellent | O(1) — 2 values | Good | Yes (atomic ops) | Low | General API rate limiting |
| Leaky Bucket | None — constant rate | O(1) — 2 values | Excellent | Yes (atomic ops) | Low | Traffic shaping |
| Fixed Window | Poor — up to 2x at boundary | O(1) — 1 counter | Low | Yes (INCR + EXPIRE) | Very Low | Simple counters |
| Sliding Window Log | None — strict | O(n) — all timestamps | Exact | Moderate (sorted sets) | High | Billing, compliance |
| Sliding Window Counter | Good — weighted | O(1) — 2 counters | Good (~0.003% error) | Yes (two counters) | Low | Production APIs |
| GCRA | Good — configurable | O(1) — 1 timestamp | Excellent | Yes (single CAS) | Moderate | High-scale production |
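To make the first row of the table concrete, here is a minimal single-process token bucket sketch. The class name and parameters are illustrative, not from any specific library; a production limiter would live in Redis, as shown later.

```python
import time

class TokenBucket:
    """Minimal in-memory token bucket (single-process sketch)."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # max tokens, i.e. allowed burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.refill_rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The two stored values per key (token count and last-update time) are what the "O(1) — 2 values" column refers to.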

HTTP Response Headers

| Header | Purpose | Example | Required |
|---|---|---|---|
| Retry-After | Seconds until client should retry | Retry-After: 30 | Yes (with 429) |
| X-RateLimit-Limit | Max requests in window | X-RateLimit-Limit: 100 | Recommended |
| X-RateLimit-Remaining | Requests left | X-RateLimit-Remaining: 42 | Recommended |
| X-RateLimit-Reset | Unix timestamp when window resets | X-RateLimit-Reset: 1708790400 | Recommended |
| RateLimit-Policy | Machine-readable policy (IETF draft) | RateLimit-Policy: 100;w=60 | Optional |
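From the client side, these headers can drive a simple wait calculation. A hedged sketch, assuming the header names and integer formats from the table above:

```python
import time

def backoff_seconds(status_code: int, headers: dict) -> float:
    """Return how long a client should wait before its next request."""
    if status_code == 429:
        # Retry-After is the authoritative signal on a 429.
        return float(headers.get("Retry-After", 1))
    if int(headers.get("X-RateLimit-Remaining", 1)) == 0:
        # Quota exhausted but not yet rejected: wait until the window resets.
        return max(0.0, int(headers["X-RateLimit-Reset"]) - time.time())
    return 0.0
```

For example, a 429 carrying `Retry-After: 30` yields a 30-second wait, while a 200 with remaining quota yields no wait.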

Decision Tree

START
├── Single server instance?
│   ├── YES → In-memory rate limiter (Map/dict with TTL cleanup)
│   │   ├── Need burst tolerance? → Token Bucket
│   │   └── Need smooth constant rate? → Leaky Bucket
│   └── NO (distributed) ↓
├── Have Redis available?
│   ├── YES ↓
│   │   ├── Need exact per-second accuracy (billing/compliance)?
│   │   │   ├── YES → Sliding Window Log (Redis Sorted Set)
│   │   │   └── NO ↓
│   │   ├── Need minimal memory at scale (>100K keys)?
│   │   │   ├── YES → GCRA (single timestamp per key)
│   │   │   └── NO ↓
│   │   └── DEFAULT → Sliding Window Counter
│   └── NO Redis ↓
├── Database-backed acceptable (higher latency)?
│   ├── YES → Fixed Window with SQL counter + row lock
│   └── NO ↓
└── DEFAULT → API gateway rate limiting (Kong, NGINX, Envoy, Cloudflare)
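The database-backed branch of the tree can be sketched as a fixed-window SQL counter. SQLite is used here for portability; on Postgres/MySQL the upsert-then-read would sit behind SELECT ... FOR UPDATE to take the row lock the tree mentions. Table and column names are illustrative.

```python
import sqlite3
import time

def fixed_window_allow(conn, key: str, limit: int, window: int) -> bool:
    """Fixed-window counter backed by a SQL table (sketch).

    Expects a table: ratelimit(key TEXT, bucket INTEGER, count INTEGER,
    PRIMARY KEY(key, bucket)).
    """
    bucket = int(time.time() // window)  # window number, e.g. minute index
    with conn:  # one transaction: upsert the counter, then read it back
        conn.execute(
            "INSERT INTO ratelimit(key, bucket, count) VALUES (?, ?, 1) "
            "ON CONFLICT(key, bucket) DO UPDATE SET count = count + 1",
            (key, bucket))
        (count,) = conn.execute(
            "SELECT count FROM ratelimit WHERE key = ? AND bucket = ?",
            (key, bucket)).fetchone()
    return count <= limit
```

Stale rows (old bucket numbers) need periodic cleanup, since SQL has no TTL equivalent to Redis EXPIRE.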

Step-by-Step Guide

1. Choose your rate limit key strategy

Decide how to identify clients. The key determines who gets throttled together. [src1]

| Strategy | Key Pattern | Pros | Cons |
|---|---|---|---|
| Per-IP | ratelimit:{ip} | No auth required | Fails behind NAT |
| Per-API-key | ratelimit:{api_key} | Accurate per-customer | Requires auth |
| Per-user | ratelimit:{user_id} | Precise per-account | Requires session |
| Per-endpoint | ratelimit:{ip}:{method}:{path} | Granular control | More memory |
| Tiered | ratelimit:{tier}:{user_id} | Differentiated limits | Complex config |
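The key patterns above reduce to simple string templates. A sketch, with a plain dict standing in for a request object (field names are illustrative):

```python
def build_key(strategy: str, req: dict) -> str:
    """Build a rate limit key per the strategies in the table above."""
    patterns = {
        "ip":       "ratelimit:{ip}",
        "api_key":  "ratelimit:{api_key}",
        "user":     "ratelimit:{user_id}",
        "endpoint": "ratelimit:{ip}:{method}:{path}",
        "tiered":   "ratelimit:{tier}:{user_id}",
    }
    # str.format ignores extra fields, so one dict can serve all strategies.
    return patterns[strategy].format(**req)
```

Whichever strategy you choose, keep the key prefix consistent (here `ratelimit:`) so diagnostic scans like `SCAN ... MATCH ratelimit:*` stay possible.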

2. Implement the rate limiter with Redis Lua script

Use a Lua script for atomicity — this prevents race conditions. [src2] [src5]

-- sliding_window_counter.lua
-- KEYS[1] = rate limit key, ARGV[1] = window, ARGV[2] = max, ARGV[3] = now
local key = KEYS[1]
local window = tonumber(ARGV[1])
local max_requests = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local current_key = key .. ":" .. math.floor(now / window)
local previous_key = key .. ":" .. (math.floor(now / window) - 1)
local current_count = tonumber(redis.call("GET", current_key) or "0")
local previous_count = tonumber(redis.call("GET", previous_key) or "0")
local elapsed = now % window
local weight = (window - elapsed) / window
local estimated_count = math.floor(previous_count * weight) + current_count
if estimated_count >= max_requests then
    return {0, 0, window - elapsed}
end
redis.call("INCR", current_key)
redis.call("EXPIRE", current_key, window * 2)
return {1, max_requests - estimated_count - 1, 0}

Verify: redis-cli EVAL "$(cat sliding_window_counter.lua)" 1 "test:key" 60 100 $(date +%s) → expected: three-element reply whose first element is 1 (allowed), followed by the remaining count and a retry-after of 0
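For local testing, the Lua logic above can be mirrored in plain Python, with a dict standing in for Redis (expiry omitted; names are illustrative):

```python
import math

def sliding_window_allow(counts: dict, key: str, window: int,
                         max_requests: int, now: float):
    """Pure-Python mirror of sliding_window_counter.lua for unit tests."""
    cur_bucket = int(now // window)
    cur_key = f"{key}:{cur_bucket}"
    prev_key = f"{key}:{cur_bucket - 1}"
    elapsed = now % window
    # Weight the previous window by how much of it still overlaps
    # the sliding window ending at `now`.
    weight = (window - elapsed) / window
    estimated = math.floor(counts.get(prev_key, 0) * weight) + counts.get(cur_key, 0)
    if estimated >= max_requests:
        return (0, 0, window - elapsed)           # denied, remaining, retry-after
    counts[cur_key] = counts.get(cur_key, 0) + 1
    return (1, max_requests - estimated - 1, 0)   # allowed
```

This makes the weighting math easy to verify before deploying the Lua version.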

3. Add HTTP response headers

Include rate limit headers in every response, not just 429s. [src3] [src6]

function setRateLimitHeaders(res, limit, remaining, resetTimestamp, retryAfter) {
  res.set('X-RateLimit-Limit', String(limit));
  res.set('X-RateLimit-Remaining', String(Math.max(0, remaining)));
  res.set('X-RateLimit-Reset', String(resetTimestamp));
  if (retryAfter > 0) res.set('Retry-After', String(retryAfter));
}

Verify: curl -i https://your-api.com/endpoint → check for X-RateLimit-* headers

4. Handle rate limit exceeded responses

Return a structured error body with HTTP 429. [src3]

{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Too many requests. Please retry after 30 seconds.",
    "retry_after": 30,
    "limit": 100,
    "window": "60s"
  }
}
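On the client side, the `retry_after` field pairs naturally with exponential backoff. A common pattern (full jitter), not prescribed by the sources; the function name and defaults are illustrative:

```python
import random
from typing import Optional

def next_delay(attempt: int, retry_after: Optional[int],
               base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before the next retry attempt, in seconds."""
    if retry_after is not None:
        # The server's retry_after is authoritative; honor it exactly.
        return float(retry_after)
    # Otherwise: exponential backoff with full jitter to avoid
    # synchronized retry stampedes across clients.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

Honoring the server-provided value first prevents clients from retrying sooner than the window allows.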

5. Implement graceful degradation

If Redis is unreachable, fail open or closed depending on security requirements. [src8]

async function checkRateLimit(key, limit, window) {
  try {
    const result = await redis.eval(luaScript, 1, key, window, limit, Date.now() / 1000);
    return { allowed: result[0] === 1, remaining: result[1], retryAfter: result[2] };
  } catch (err) {
    console.error('Rate limiter unavailable, failing open:', err.message);
    return { allowed: true, remaining: -1, retryAfter: 0, degraded: true };
  }
}
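The fail-open/fail-closed choice can be isolated in a small wrapper so the policy is explicit per endpoint. A sketch; `check_fn` stands in for any limiter call that raises on backend failure:

```python
def check_with_degradation(check_fn, key: str, fail_open: bool = True) -> bool:
    """Wrap a limiter call so backend outages degrade per policy."""
    try:
        return check_fn(key)
    except Exception:
        # fail open (allow) for availability-critical endpoints;
        # fail closed (deny) for security-sensitive ones like login.
        return fail_open
```

A login endpoint, for instance, would pass `fail_open=False` so an attacker cannot bypass brute-force protection by degrading Redis.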

Code Examples

Node.js (Express): Token Bucket with Redis

Full script: token-bucket-express.js (45 lines)

// Input:  HTTP request to any Express route
// Output: 429 if rate limited, next() if allowed
// Requires: ioredis ^5.0.0

function rateLimiter({ capacity = 100, refillRate = 1.67, keyFn }) {
  return async (req, res, next) => {
    const key = `rl:${keyFn(req)}`;
    const now = Date.now() / 1000;
    try {
      const [allowed, remaining, retryAfter] = await redis.eval(
        TOKEN_BUCKET_SCRIPT, 1, key, capacity, refillRate, now, 1);
      res.set('X-RateLimit-Limit', String(capacity));
      res.set('X-RateLimit-Remaining', String(remaining));
      if (!allowed) {
        res.set('Retry-After', String(retryAfter));
        return res.status(429).json({ error: { code: 'RATE_LIMIT_EXCEEDED' } });
      }
      next();
    } catch (err) { next(); /* fail open */ }
  };
}

Python (FastAPI): Sliding Window Counter with Redis

Full script: sliding-window-fastapi.py (50 lines)

# Input:  HTTP request to any FastAPI route
# Output: 429 JSONResponse if rate limited, None if allowed
# Requires: redis>=5.0.0, fastapi>=0.100.0

class RateLimitMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        key = f"rl:{request.headers.get('x-api-key', request.client.host)}"
        result = await self.redis.evalsha(self.sha, 1, key,
            self.window, self.limit, int(time.time()))
        allowed, remaining, retry_after = result
        response = (await call_next(request)) if allowed else JSONResponse(
            status_code=429, content={"error": {"code": "RATE_LIMIT_EXCEEDED"}})
        response.headers["X-RateLimit-Remaining"] = str(max(0, remaining))
        if retry_after > 0: response.headers["Retry-After"] = str(retry_after)
        return response

Go (net/http): GCRA with Redis

Full script: gcra-go.go (60 lines)

// Input:  HTTP request to any Go HTTP handler
// Output: 429 if rate limited, passes to next handler if allowed
// Requires: go-redis/redis/v9

func (l *Limiter) Middleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        key := fmt.Sprintf("rl:%s", l.keyFn(r))
        now := float64(time.Now().UnixMilli()) / 1000.0
        result, err := gcraScript.Run(ctx, l.rdb, []string{key},
            l.emissionInterval, l.burst, now).Int64Slice()
        if err != nil { next.ServeHTTP(w, r); return } // fail open
        w.Header().Set("X-RateLimit-Remaining", strconv.FormatInt(result[1], 10))
        if result[0] == 0 {
            w.Header().Set("Retry-After", strconv.FormatInt(result[2], 10))
            w.Header().Set("Content-Type", "application/json")
            w.WriteHeader(http.StatusTooManyRequests)
            w.Write([]byte(`{"error":{"code":"RATE_LIMIT_EXCEEDED"}}`))
            return
        }
        next.ServeHTTP(w, r)
    })
}

Anti-Patterns

Wrong: Non-atomic get-then-set in distributed environment

// BAD — race condition: two requests read count=99, both pass
async function checkRateLimit(key) {
  const count = await redis.get(key);       // read
  if (count >= 100) return false;
  await redis.incr(key);                    // NOT atomic with read!
  return true;
}

Correct: Atomic Lua script or INCR-first pattern

// GOOD — atomic: INCR returns new value, single round-trip
async function checkRateLimit(key, limit, windowSec) {
  const count = await redis.incr(key);
  if (count === 1) await redis.expire(key, windowSec);
  return count <= limit;
}

Wrong: Rate limiting after expensive operations

// BAD — database query runs BEFORE rate limit check
app.get('/api/search', async (req, res) => {
  const results = await db.query(req.query.q);  // expensive!
  if (await isRateLimited(req.ip)) return res.status(429).json({});
  res.json(results);
});

Correct: Rate limit as the first middleware

// GOOD — rate limit check happens before any expensive operation
app.get('/api/search',
  rateLimiter({ limit: 100, window: 60 }),  // first!
  async (req, res) => { res.json(await db.query(req.query.q)); }
);

Wrong: Missing Retry-After header on 429 responses

// BAD — client retries immediately in a tight loop
if (isRateLimited(req)) {
  return res.status(429).json({ error: 'Too many requests' });
}

Correct: Always include Retry-After and rate limit headers

// GOOD — client knows exactly when to retry
if (!result.allowed) {
  res.set('Retry-After', String(result.retryAfter));
  res.set('X-RateLimit-Limit', '100');
  res.set('X-RateLimit-Remaining', '0');
  return res.status(429).json({ error: 'Too many requests', retry_after: result.retryAfter });
}

Common Pitfalls

Diagnostic Commands

# Check current rate limit state for a key
redis-cli GET "rl:user123:/api/data:$(date +%s | awk '{print int($1/60)}')"

# Monitor rate limit operations in real-time
redis-cli MONITOR | grep "rl:"

# Test rate limiting with rapid requests
for i in $(seq 1 10); do curl -s -o /dev/null -w "%{http_code} " http://localhost:3000/api/test -H "X-API-Key: test123"; done

# Check Redis memory usage for rate limit keys
redis-cli MEMORY USAGE "rl:user123:/api/data"

# Verify Retry-After header on 429 response
curl -i http://localhost:3000/api/test 2>&1 | grep -E "(HTTP|Retry-After|X-RateLimit)"

When to Use / When Not to Use

| Use When | Don't Use When | Use Instead |
|---|---|---|
| Public-facing API with untrusted clients | Internal microservice-to-microservice calls | Circuit breaker + bulkhead pattern |
| Need to enforce fair usage across consumers | Single-user CLI tool or batch job | Simple concurrency limiter (semaphore) |
| Protecting expensive backend resources | Network-layer DDoS attack (volumetric) | CDN/WAF rate limiting (Cloudflare, AWS Shield) |
| Multi-tenant SaaS with per-customer quotas | Real-time WebSocket persistent connections | Per-connection message throttling |
| Compliance requires audit trail of usage | Static asset serving (images, CSS, JS) | CDN caching + edge rules |
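The semaphore alternative the table suggests for single-user tools is a few lines. A sketch using a bounded semaphore; the class name is illustrative:

```python
import threading

class ConcurrencyLimiter:
    """Caps in-flight work instead of requests-per-window. Suitable for
    CLI tools and batch jobs where windowed limits add no value."""

    def __init__(self, max_concurrent: int):
        self._sem = threading.BoundedSemaphore(max_concurrent)

    def __enter__(self):
        self._sem.acquire()  # blocks until a slot frees up
        return self

    def __exit__(self, *exc):
        self._sem.release()
        return False  # never swallow exceptions
```

Usage is `with limiter: do_request()`; no Redis, clock, or window bookkeeping is involved.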

Important Caveats

Related Units