API Rate Limiting: Algorithms, Implementation & Best Practices

Type: Software Reference Confidence: 0.92 Sources: 8 Verified: 2026-02-24 Freshness: 2026-02-24

TL;DR

Constraints

Quick Reference

Algorithm Comparison

| Algorithm | Burst Handling | Memory per Key | Accuracy | Distributed Friendly | Complexity | Best For |
|---|---|---|---|---|---|---|
| Token Bucket | Excellent | O(1) — 2 values | Good | Yes (atomic ops) | Low | General API rate limiting |
| Leaky Bucket | None — constant rate | O(1) — 2 values | Excellent | Yes (atomic ops) | Low | Traffic shaping |
| Fixed Window | Poor — up to 2x at boundary | O(1) — 1 counter | Low | Yes (INCR + EXPIRE) | Very Low | Simple counters |
| Sliding Window Log | None — strict | O(n) — all timestamps | Exact | Moderate (sorted sets) | High | Billing, compliance |
| Sliding Window Counter | Good — weighted | O(1) — 2 counters | Good (~0.003% error) | Yes (two counters) | Low | Production APIs |
| GCRA | Good — configurable | O(1) — 1 timestamp | Excellent | Yes (single CAS) | Moderate | High-scale production |
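To make the first row of the table concrete, here is a minimal single-process token bucket sketch. The class name and parameters are illustrative, not from any specific library; a production limiter would live in Redis, as shown later.

```python
import time

class TokenBucket:
    """Minimal in-memory token bucket (single-process sketch)."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # max tokens, i.e. allowed burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.refill_rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The two stored values per key (token count and last-update time) are what the "O(1) — 2 values" column refers to.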

HTTP Response Headers

| Header | Purpose | Example | Required |
|---|---|---|---|
| Retry-After | Seconds until client should retry | Retry-After: 30 | Yes (with 429) |
| X-RateLimit-Limit | Max requests in window | X-RateLimit-Limit: 100 | Recommended |
| X-RateLimit-Remaining | Requests left | X-RateLimit-Remaining: 42 | Recommended |
| X-RateLimit-Reset | Unix timestamp when window resets | X-RateLimit-Reset: 1708790400 | Recommended |
| RateLimit-Policy | Machine-readable policy (IETF draft) | RateLimit-Policy: 100;w=60 | Optional |
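From the client side, these headers can drive a simple wait calculation. A hedged sketch, assuming the header names and integer formats from the table above:

```python
import time

def backoff_seconds(status_code: int, headers: dict) -> float:
    """Return how long a client should wait before its next request."""
    if status_code == 429:
        # Retry-After is the authoritative signal on a 429.
        return float(headers.get("Retry-After", 1))
    if int(headers.get("X-RateLimit-Remaining", 1)) == 0:
        # Quota exhausted but not yet rejected: wait until the window resets.
        return max(0.0, int(headers["X-RateLimit-Reset"]) - time.time())
    return 0.0
```

For example, a 429 carrying `Retry-After: 30` yields a 30-second wait, while a 200 with remaining quota yields no wait.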

Decision Tree

START
├── Single server instance?
│   ├── YES → In-memory rate limiter (Map/dict with TTL cleanup)
│   │   ├── Need burst tolerance? → Token Bucket
│   │   └── Need smooth constant rate? → Leaky Bucket
│   └── NO (distributed) ↓
├── Have Redis available?
│   ├── YES ↓
│   │   ├── Need exact per-second accuracy (billing/compliance)?
│   │   │   ├── YES → Sliding Window Log (Redis Sorted Set)
│   │   │   └── NO ↓
│   │   ├── Need minimal memory at scale (>100K keys)?
│   │   │   ├── YES → GCRA (single timestamp per key)
│   │   │   └── NO ↓
│   │   └── DEFAULT → Sliding Window Counter
│   └── NO Redis ↓
├── Database-backed acceptable (higher latency)?
│   ├── YES → Fixed Window with SQL counter + row lock
│   └── NO ↓
└── DEFAULT → API gateway rate limiting (Kong, NGINX, Envoy, Cloudflare)
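The database-backed branch of the tree can be sketched as a fixed-window SQL counter. SQLite is used here for portability; on Postgres/MySQL the upsert-then-read would sit behind SELECT ... FOR UPDATE to take the row lock the tree mentions. Table and column names are illustrative.

```python
import sqlite3
import time

def fixed_window_allow(conn, key: str, limit: int, window: int) -> bool:
    """Fixed-window counter backed by a SQL table (sketch).

    Expects a table: ratelimit(key TEXT, bucket INTEGER, count INTEGER,
    PRIMARY KEY(key, bucket)).
    """
    bucket = int(time.time() // window)  # window number, e.g. minute index
    with conn:  # one transaction: upsert the counter, then read it back
        conn.execute(
            "INSERT INTO ratelimit(key, bucket, count) VALUES (?, ?, 1) "
            "ON CONFLICT(key, bucket) DO UPDATE SET count = count + 1",
            (key, bucket))
        (count,) = conn.execute(
            "SELECT count FROM ratelimit WHERE key = ? AND bucket = ?",
            (key, bucket)).fetchone()
    return count <= limit
```

Stale rows (old bucket numbers) need periodic cleanup, since SQL has no TTL equivalent to Redis EXPIRE.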

Step-by-Step Guide

1. Choose your rate limit key strategy

Decide how to identify clients. The key determines who gets throttled together. [src1]

| Strategy | Key Pattern | Pros | Cons |
|---|---|---|---|
| Per-IP | ratelimit:{ip} | No auth required | Fails behind NAT |
| Per-API-key | ratelimit:{api_key} | Accurate per-customer | Requires auth |
| Per-user | ratelimit:{user_id} | Precise per-account | Requires session |
| Per-endpoint | ratelimit:{ip}:{method}:{path} | Granular control | More memory |
| Tiered | ratelimit:{tier}:{user_id} | Differentiated limits | Complex config |
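The key patterns above reduce to simple string templates. A sketch, with a plain dict standing in for a request object (field names are illustrative):

```python
def build_key(strategy: str, req: dict) -> str:
    """Build a rate limit key per the strategies in the table above."""
    patterns = {
        "ip":       "ratelimit:{ip}",
        "api_key":  "ratelimit:{api_key}",
        "user":     "ratelimit:{user_id}",
        "endpoint": "ratelimit:{ip}:{method}:{path}",
        "tiered":   "ratelimit:{tier}:{user_id}",
    }
    # str.format ignores extra fields, so one dict can serve all strategies.
    return patterns[strategy].format(**req)
```

Whichever strategy you choose, keep the key prefix consistent (here `ratelimit:`) so diagnostic scans like `SCAN ... MATCH ratelimit:*` stay possible.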

2. Implement the rate limiter with Redis Lua script

Use a Lua script for atomicity — this prevents race conditions. [src2] [src5]

-- sliding_window_counter.lua
-- KEYS[1] = rate limit key, ARGV[1] = window, ARGV[2] = max, ARGV[3] = now
local key = KEYS[1]
local window = tonumber(ARGV[1])
local max_requests = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local current_key = key .. ":" .. math.floor(now / window)
local previous_key = key .. ":" .. (math.floor(now / window) - 1)
local current_count = tonumber(redis.call("GET", current_key) or "0")
local previous_count = tonumber(redis.call("GET", previous_key) or "0")
local elapsed = now % window
local weight = (window - elapsed) / window
local estimated_count = math.floor(previous_count * weight) + current_count
if estimated_count >= max_requests then
    return {0, 0, window - elapsed}
end
redis.call("INCR", current_key)
redis.call("EXPIRE", current_key, window * 2)
return {1, max_requests - estimated_count - 1, 0}

Verify: redis-cli EVAL "$(cat sliding_window_counter.lua)" 1 "test:key" 60 100 $(date +%s) → expected: three-element reply whose first element is 1 (allowed), followed by the remaining count and a retry-after of 0
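For local testing, the Lua logic above can be mirrored in plain Python, with a dict standing in for Redis (expiry omitted; names are illustrative):

```python
import math

def sliding_window_allow(counts: dict, key: str, window: int,
                         max_requests: int, now: float):
    """Pure-Python mirror of sliding_window_counter.lua for unit tests."""
    cur_bucket = int(now // window)
    cur_key = f"{key}:{cur_bucket}"
    prev_key = f"{key}:{cur_bucket - 1}"
    elapsed = now % window
    # Weight the previous window by how much of it still overlaps
    # the sliding window ending at `now`.
    weight = (window - elapsed) / window
    estimated = math.floor(counts.get(prev_key, 0) * weight) + counts.get(cur_key, 0)
    if estimated >= max_requests:
        return (0, 0, window - elapsed)           # denied, remaining, retry-after
    counts[cur_key] = counts.get(cur_key, 0) + 1
    return (1, max_requests - estimated - 1, 0)   # allowed
```

This makes the weighting math easy to verify before deploying the Lua version.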

3. Add HTTP response headers

Include rate limit headers in every response, not just 429s. [src3] [src6]

function setRateLimitHeaders(res, limit, remaining, resetTimestamp, retryAfter) {
  res.set('X-RateLimit-Limit', String(limit));
  res.set('X-RateLimit-Remaining', String(Math.max(0, remaining)));
  res.set('X-RateLimit-Reset', String(resetTimestamp));
  if (retryAfter > 0) res.set('Retry-After', String(retryAfter));
}

Verify: curl -i https://your-api.com/endpoint → check for X-RateLimit-* headers

4. Handle rate limit exceeded responses

Return a structured error body with HTTP 429. [src3]

{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Too many requests. Please retry after 30 seconds.",
    "retry_after": 30,
    "limit": 100,
    "window": "60s"
  }
}
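On the client side, the `retry_after` field pairs naturally with exponential backoff. A common pattern (full jitter), not prescribed by the sources; the function name and defaults are illustrative:

```python
import random
from typing import Optional

def next_delay(attempt: int, retry_after: Optional[int],
               base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before the next retry attempt, in seconds."""
    if retry_after is not None:
        # The server's retry_after is authoritative; honor it exactly.
        return float(retry_after)
    # Otherwise: exponential backoff with full jitter to avoid
    # synchronized retry stampedes across clients.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

Honoring the server-provided value first prevents clients from retrying sooner than the window allows.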

5. Implement graceful degradation

If Redis is unreachable, fail open or closed depending on security requirements. [src8]

async function checkRateLimit(key, limit, window) {
  try {
    const result = await redis.eval(luaScript, 1, key, window, limit, Date.now() / 1000);
    return { allowed: result[0] === 1, remaining: result[1], retryAfter: result[2] };
  } catch (err) {
    console.error('Rate limiter unavailable, failing open:', err.message);
    return { allowed: true, remaining: -1, retryAfter: 0, degraded: true };
  }
}
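The fail-open/fail-closed choice can be isolated in a small wrapper so the policy is explicit per endpoint. A sketch; `check_fn` stands in for any limiter call that raises on backend failure:

```python
def check_with_degradation(check_fn, key: str, fail_open: bool = True) -> bool:
    """Wrap a limiter call so backend outages degrade per policy."""
    try:
        return check_fn(key)
    except Exception:
        # fail open (allow) for availability-critical endpoints;
        # fail closed (deny) for security-sensitive ones like login.
        return fail_open
```

A login endpoint, for instance, would pass `fail_open=False` so an attacker cannot bypass brute-force protection by degrading Redis.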

Code Examples

Node.js (Express): Token Bucket with Redis

Full script: token-bucket-express.js (45 lines)

// Input:  HTTP request to any Express route
// Output: 429 if rate limited, next() if allowed
// Requires: ioredis ^5.0.0

function rateLimiter({ capacity = 100, refillRate = 1.67, keyFn }) {
  return async (req, res, next) => {
    const key = `rl:${keyFn(req)}`;
    const now = Date.now() / 1000;
    try {
      const [allowed, remaining, retryAfter] = await redis.eval(
        TOKEN_BUCKET_SCRIPT, 1, key, capacity, refillRate, now, 1);
      res.set('X-RateLimit-Limit', String(capacity));
      res.set('X-RateLimit-Remaining', String(remaining));
      if (!allowed) {
        res.set('Retry-After', String(retryAfter));
        return res.status(429).json({ error: { code: 'RATE_LIMIT_EXCEEDED' } });
      }
      next();
    } catch (err) { next(); /* fail open */ }
  };
}

Python (FastAPI): Sliding Window Counter with Redis

Full script: sliding-window-fastapi.py (50 lines)

# Input:  HTTP request to any FastAPI route
# Output: 429 JSONResponse if rate limited, None if allowed
# Requires: redis>=5.0.0, fastapi>=0.100.0

class RateLimitMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        key = f"rl:{request.headers.get('x-api-key', request.client.host)}"
        result = await self.redis.evalsha(self.sha, 1, key,
            self.window, self.limit, int(time.time()))
        allowed, remaining, retry_after = result
        response = (await call_next(request)) if allowed else JSONResponse(
            status_code=429, content={"error": {"code": "RATE_LIMIT_EXCEEDED"}})
        response.headers["X-RateLimit-Remaining"] = str(max(0, remaining))
        if retry_after > 0: response.headers["Retry-After"] = str(retry_after)
        return response

Go (net/http): GCRA with Redis

Full script: gcra-go.go (60 lines)

// Input:  HTTP request to any Go HTTP handler
// Output: 429 if rate limited, passes to next handler if allowed
// Requires: go-redis/redis/v9

func (l *Limiter) Middleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        key := fmt.Sprintf("rl:%s", l.keyFn(r))
        now := float64(time.Now().UnixMilli()) / 1000.0
        result, err := gcraScript.Run(ctx, l.rdb, []string{key},
            l.emissionInterval, l.burst, now).Int64Slice()
        if err != nil { next.ServeHTTP(w, r); return } // fail open
        w.Header().Set("X-RateLimit-Remaining", strconv.FormatInt(result[1], 10))
        if result[0] == 0 {
            w.Header().Set("Retry-After", strconv.FormatInt(result[2], 10))
            w.Header().Set("Content-Type", "application/json")
            w.WriteHeader(http.StatusTooManyRequests)
            w.Write([]byte(`{"error":{"code":"RATE_LIMIT_EXCEEDED"}}`))
            return
        }
        next.ServeHTTP(w, r)
    })
}

Anti-Patterns

Wrong: Non-atomic get-then-set in distributed environment

// BAD — race condition: two requests read count=99, both pass
async function checkRateLimit(key) {
  const count = await redis.get(key);       // read
  if (count >= 100) return false;
  await redis.incr(key);                    // NOT atomic with read!
  return true;
}

Correct: Atomic Lua script or INCR-first pattern

// GOOD — atomic: INCR returns new value, single round-trip
async function checkRateLimit(key, limit, windowSec) {
  const count = await redis.incr(key);
  if (count === 1) await redis.expire(key, windowSec);
  return count <= limit;
}

Wrong: Rate limiting after expensive operations

// BAD — database query runs BEFORE rate limit check
app.get('/api/search', async (req, res) => {
  const results = await db.query(req.query.q);  // expensive!
  if (await isRateLimited(req.ip)) return res.status(429).json({});
  res.json(results);
});

Correct: Rate limit as the first middleware

// GOOD — rate limit check happens before any expensive operation
app.get('/api/search',
  rateLimiter({ limit: 100, window: 60 }),  // first!
  async (req, res) => { res.json(await db.query(req.query.q)); }
);

Wrong: Missing Retry-After header on 429 responses

// BAD — client retries immediately in a tight loop
if (isRateLimited(req)) {
  return res.status(429).json({ error: 'Too many requests' });
}

Correct: Always include Retry-After and rate limit headers

// GOOD — client knows exactly when to retry
if (!result.allowed) {
  res.set('Retry-After', String(result.retryAfter));
  res.set('X-RateLimit-Limit', '100');
  res.set('X-RateLimit-Remaining', '0');
  return res.status(429).json({ error: 'Too many requests', retry_after: result.retryAfter });
}

Common Pitfalls

Diagnostic Commands

# Check current rate limit state for a key
redis-cli GET "rl:user123:/api/data:$(date +%s | awk '{print int($1/60)}')"

# Monitor rate limit operations in real-time
redis-cli MONITOR | grep "rl:"

# Test rate limiting with rapid requests
for i in $(seq 1 10); do curl -s -o /dev/null -w "%{http_code} " http://localhost:3000/api/test -H "X-API-Key: test123"; done

# Check Redis memory usage for rate limit keys
redis-cli MEMORY USAGE "rl:user123:/api/data"

# Verify Retry-After header on 429 response
curl -i http://localhost:3000/api/test 2>&1 | grep -E "(HTTP|Retry-After|X-RateLimit)"

When to Use / When Not to Use

| Use When | Don't Use When | Use Instead |
|---|---|---|
| Public-facing API with untrusted clients | Internal microservice-to-microservice calls | Circuit breaker + bulkhead pattern |
| Need to enforce fair usage across consumers | Single-user CLI tool or batch job | Simple concurrency limiter (semaphore) |
| Protecting expensive backend resources | Network-layer DDoS attack (volumetric) | CDN/WAF rate limiting (Cloudflare, AWS Shield) |
| Multi-tenant SaaS with per-customer quotas | Real-time WebSocket persistent connections | Per-connection message throttling |
| Compliance requires audit trail of usage | Static asset serving (images, CSS, JS) | CDN caching + edge rules |
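The semaphore alternative the table suggests for single-user tools is a few lines. A sketch using a bounded semaphore; the class name is illustrative:

```python
import threading

class ConcurrencyLimiter:
    """Caps in-flight work instead of requests-per-window. Suitable for
    CLI tools and batch jobs where windowed limits add no value."""

    def __init__(self, max_concurrent: int):
        self._sem = threading.BoundedSemaphore(max_concurrent)

    def __enter__(self):
        self._sem.acquire()  # blocks until a slot frees up
        return self

    def __exit__(self, *exc):
        self._sem.release()
        return False  # never swallow exceptions
```

Usage is `with limiter: do_request()`; no Redis, clock, or window bookkeeping is involved.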

Important Caveats

Related Units