Use Cache-Control: public, s-maxage=3600, stale-while-revalidate=60 for CDN caching and redis.setex(key, TTL, value) for the distributed cache. Authenticated responses need Vary headers or cache keys that include authentication context.

| Layer | Role | Technology Options | Typical TTL | Latency | Scaling Strategy |
|---|---|---|---|---|---|
| Browser Cache | Stores static assets and API responses locally on user device | HTTP Cache-Control headers, Service Workers | 1 hr – 1 year | 0ms | Per-client; no server cost |
| CDN / Edge Cache | Serves cached content from nearest edge node | Cloudflare, CloudFront, Fastly, Akamai | 5 min – 24 hr | 1-20ms | Automatic global distribution; tiered caching for origin shielding [src4] |
| Reverse Proxy / API Gateway | Caches API responses before reaching app servers | Nginx, Varnish, Kong, AWS API Gateway | 30s – 5 min | 1-5ms | Horizontal scaling; shared cache across app instances |
| Application Cache (L1) | In-process memory cache; fastest but per-instance | Guava (Java), lru-cache (Node), cachetools (Python) | 30s – 5 min | <0.1ms | Per-instance; scales with app replicas; risk of inconsistency |
| Distributed Cache (L2) | Shared cache across all instances; single cached truth | Redis Cluster, Memcached, KeyDB | 5 min – 1 hr | 1-5ms | Hash-based sharding, read replicas; 80-20 rule [src1] |
| Database Query Cache | Query result caching at the database level | Materialized views (identify candidates via pg_stat_statements) | Varies | 5-20ms | Limited; use only for expensive aggregations |
| Write-Behind Buffer | Absorbs writes, flushes to DB asynchronously | Redis + background worker, Kafka | N/A (write path) | <1ms ack | Decouples write latency; risk of data loss on crash |
| Session Cache | Stores user session data for stateless app servers | Redis, Memcached, DynamoDB | 30 min – 24 hr | 1-5ms | Partition by user ID; sticky sessions as fallback |
START
|-- What type of data?
| |-- Static assets (images, CSS, JS, fonts)?
| | |-- Use Browser Cache (Cache-Control: immutable) + CDN
| | +-- Set long TTL (1 year) with content-hash in filename
| |-- API responses (JSON)?
| | |-- Is response user-specific / authenticated?
| | | |-- YES -> Application Cache (L1) + Distributed Cache (L2) only
| | | | +-- Never cache in CDN/shared proxy without Vary
| | | +-- NO (public data) -> CDN + Reverse Proxy + L1 + L2
| | +-- Is strong consistency required?
| | |-- YES -> Short TTL (30s-5min) + event-driven invalidation
| | +-- NO -> Longer TTL (5min-1hr) + stale-while-revalidate
| |-- Database query results?
| | |-- Is the query expensive (>100ms)?
| | | |-- YES -> Distributed Cache (Redis) with cache-aside [src1]
| | | +-- NO -> Application-level L1 cache may suffice
| | +-- Is the data read-heavy (>10:1 read-to-write)?
| | |-- YES -> Multi-layer: L1 (in-process) + L2 (Redis)
| | +-- NO -> Write-through cache or skip caching
| +-- Session / user state?
| +-- Use Redis/Memcached with TTL matching session lifetime
|
|-- Expected QPS on hot keys?
| |-- <100 QPS -> Simple cache-aside, no stampede protection needed
| |-- 100-10K QPS -> Add request coalescing (single-flight) [src6]
| +-- >10K QPS -> Probabilistic early expiration + distributed locking
|
+-- How many app instances?
|-- 1 instance -> L1 in-process cache is sufficient
|-- 2-10 instances -> L1 + L2 (Redis) for consistency
+-- >10 instances -> L1 (short TTL) + L2 (Redis Cluster) + CDN
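The 100-10K QPS branch of the tree calls for request coalescing (single-flight), which the later examples do not show. A minimal in-process sketch, assuming threads within one process share the cache; the `SingleFlight` name is ours, not a library API:

```python
import threading

class SingleFlight:
    """Coalesce concurrent calls for the same key into one execution.

    The first caller for a key becomes the leader and runs fn();
    concurrent callers for the same key wait and reuse its result.
    """
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> Event signalling the leader finished
        self._results = {}

    def do(self, key, fn):
        with self._lock:
            event = self._inflight.get(key)
            if event is None:
                # No flight in progress: this caller leads.
                event = threading.Event()
                self._inflight[key] = event
                leader = True
            else:
                leader = False
        if leader:
            try:
                self._results[key] = fn()
            finally:
                with self._lock:
                    self._inflight.pop(key, None)
                event.set()
            return self._results[key]
        # Followers block until the leader publishes its result.
        event.wait()
        return self._results.get(key)
```

Under load, one slow `fn()` per key reaches the database regardless of how many requests arrive while it runs.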
Analyze your read-to-write ratio, data staleness tolerance, and hotspot distribution. The 80-20 rule applies: 20% of keys typically serve 80% of traffic. [src1]
Data Analysis Checklist:
- Read-to-write ratio per entity type (>10:1 is cache-friendly)
- p50/p95/p99 latency of uncached DB queries
- Hotspot analysis: top 1000 keys by request frequency
- Staleness tolerance: seconds, minutes, or hours?
- Data size per cached object (affects memory budget)
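The checklist numbers can be derived from an access log. A sketch, assuming events arrive as `(op, key)` tuples with `op` being `"read"` or `"write"`; the event shape and function name are illustrative:

```python
from collections import Counter

def analyze_access_log(events):
    """Compute cache-friendliness signals from (op, key) event tuples."""
    reads = Counter(k for op, k in events if op == "read")
    writes = Counter(k for op, k in events if op == "write")
    total_reads, total_writes = sum(reads.values()), sum(writes.values())
    # >10:1 read-to-write is the cache-friendly threshold from the checklist.
    ratio = total_reads / max(total_writes, 1)
    # Hotspot analysis: top 1000 keys by request frequency.
    hot = reads.most_common(1000)
    # Share of reads served by the hottest 20% of those keys (80-20 check).
    top_n = max(1, len(hot) // 5)
    hot_share = sum(c for _, c in hot[:top_n]) / max(total_reads, 1)
    return {"read_write_ratio": ratio, "hot_keys": hot, "top20pct_share": hot_share}
```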
Memory Budget:
cache_size = hot_key_count * avg_object_size * 1.3 (overhead)
Example: 100K hot keys * 2KB avg * 1.3 = ~260MB Redis
Verify: SELECT query, calls, mean_exec_time FROM pg_stat_statements ORDER BY calls DESC LIMIT 20
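The budget formula above as a small helper; the function name and MB conversion are ours, the 1.3x overhead factor comes from the formula:

```python
def cache_memory_budget(hot_key_count: int, avg_object_bytes: int,
                        overhead: float = 1.3) -> float:
    """Estimated cache footprint in MB: count * size * overhead."""
    return hot_key_count * avg_object_bytes * overhead / (1024 * 1024)
```

For 100K hot keys at 2KB each this lands near the ~260MB figure above.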
Add a local in-memory cache with short TTL to each application instance. This eliminates network round-trips for the hottest data. [src3]
# Python: using cachetools (pip install cachetools==5.3.2)
from cachetools import TTLCache
import threading

# L1 cache: 1000 items max, 60-second TTL
l1_cache = TTLCache(maxsize=1000, ttl=60)
l1_lock = threading.Lock()

def get_from_l1(key: str):
    with l1_lock:
        return l1_cache.get(key)

def set_in_l1(key: str, value):
    with l1_lock:
        l1_cache[key] = value
Verify: Monitor L1 hit rate. If <50%, increase maxsize or TTL.
Deploy Redis as the shared cache layer using the cache-aside pattern: check cache first, fall back to DB on miss, then populate cache. [src1]
import redis, json

r = redis.Redis(host='redis-primary', port=6379, decode_responses=True)
L2_TTL = 300  # 5 minutes

def get_with_caching(key: str, db_fetch_fn):
    # 1. Check L1 (in-process)
    value = get_from_l1(key)
    if value is not None:
        return value
    # 2. Check L2 (Redis)
    cached = r.get(f"cache:{key}")
    if cached:
        value = json.loads(cached)
        set_in_l1(key, value)
        return value
    # 3. Cache miss -> query database
    value = db_fetch_fn(key)
    if value is not None:
        r.setex(f"cache:{key}", L2_TTL, json.dumps(value))
        set_in_l1(key, value)
    return value
Verify: redis-cli INFO stats | grep keyspace — target >85% hit rate.
Set Cache-Control headers to leverage CDN edge caching for public data. Use s-maxage for CDN TTL independent of browser TTL. [src4]
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

@app.get("/api/products/{product_id}")
async def get_product(product_id: str):
    product = get_with_caching(f"product:{product_id}", fetch_product_from_db)
    return JSONResponse(content=product, headers={
        "Cache-Control": "public, max-age=60, s-maxage=300",
        "CDN-Cache-Control": "public, s-maxage=300, stale-while-revalidate=60",
        "Vary": "Accept-Encoding",
    })
Verify: curl -I https://yourcdn.com/api/products/123 — check for cf-cache-status: HIT.
Use event-driven invalidation for consistency-critical data: write to DB, delete from L2, broadcast to L1 instances via Pub/Sub. [src1]
def update_product(product_id: str, new_data: dict):
    # 1. Write to database (source of truth)
    db.update("products", product_id, new_data)
    # 2. Invalidate L2 (Redis)
    r.delete(f"cache:product:{product_id}")
    # 3. Broadcast invalidation for L1 on all instances
    r.publish("cache:invalidate", f"product:{product_id}")
    # 4. Optionally purge CDN (Cloudflare API)
Verify: After update, subsequent reads return new value within consistency window.
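The update path publishes invalidation messages, but each app instance also needs a subscriber loop that evicts the key from its local L1. A sketch, assuming redis-py style pubsub messages (dicts with `type` and `data`); in production `messages` would come from `r.pubsub()` subscribed to `cache:invalidate` and iterated via `.listen()`:

```python
def l1_invalidation_listener(messages, l1_cache, lock):
    """Evict keys from this instance's L1 as invalidation messages arrive.

    `messages` is any iterable of redis-py pubsub-style dicts, so the
    loop can be driven by a real subscription or (as in tests) a list.
    """
    for msg in messages:
        if msg.get("type") != "message":
            continue  # skip subscribe confirmations etc.
        key = msg["data"]
        with lock:
            l1_cache.pop(key, None)
```

Run this in a background thread per instance; combined with the publish in `update_product`, every L1 converges within one network round-trip instead of waiting out its TTL.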
Implement distributed locking to prevent thundering herd when hot keys expire. [src6]
import time

def get_with_stampede_protection(key, db_fetch_fn, cache_ttl=300):
    cached = r.get(f"cache:{key}")
    if cached:
        return json.loads(cached)
    # Acquire lock: only one request rebuilds cache
    if r.set(f"lock:{key}", "1", nx=True, ex=5):
        try:
            value = db_fetch_fn(key)
            if value is not None:
                r.setex(f"cache:{key}", cache_ttl, json.dumps(value))
            return value
        finally:
            r.delete(f"lock:{key}")
    else:
        # Wait for lock holder, then retry cache
        time.sleep(0.1)
        return json.loads(r.get(f"cache:{key}") or "null")
Verify: Under load test, only 1 request should reach DB per cache miss event.
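For the >10K QPS tier, the decision tree also lists probabilistic early expiration, which avoids the lock entirely. A sketch of the XFetch idea: a request volunteers to recompute *before* expiry with probability rising as the deadline approaches. Parameter names are ours; `delta` is the observed recompute cost in seconds and `beta` tunes aggressiveness:

```python
import math, random, time

def should_refresh_early(expires_at: float, delta: float, beta: float = 1.0,
                         now: float = None) -> bool:
    """XFetch-style probabilistic early expiration.

    -delta * beta * log(rand) is a positive jitter that is usually
    small but occasionally large, so refreshes spread out over the
    window just before expiry instead of piling up at the deadline.
    """
    now = time.time() if now is None else now
    return now - delta * beta * math.log(random.random()) >= expires_at
```

Callers that get `True` recompute and re-set the key; everyone else keeps serving the still-valid cached value, so no thundering herd forms at expiry.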
# Input: A cache key and a callable that fetches from the database
# Output: Cached value from fastest available layer
import redis, json
from cachetools import TTLCache
from threading import Lock

class MultiLayerCache:
    def __init__(self, redis_url="redis://localhost:6379",
                 l1_maxsize=1000, l1_ttl=60, l2_ttl=300):
        self.l1 = TTLCache(maxsize=l1_maxsize, ttl=l1_ttl)
        self.l1_lock = Lock()
        self.l2 = redis.from_url(redis_url, decode_responses=True)
        self.l2_ttl = l2_ttl
        self.stats = {"l1_hit": 0, "l2_hit": 0, "miss": 0}

    def get(self, key: str, fetch_fn=None):
        with self.l1_lock:
            val = self.l1.get(key)
        if val is not None:
            self.stats["l1_hit"] += 1
            return val
        raw = self.l2.get(f"c:{key}")
        if raw:
            val = json.loads(raw)
            with self.l1_lock:
                self.l1[key] = val
            self.stats["l2_hit"] += 1
            return val
        if fetch_fn:
            val = fetch_fn(key)
            if val is not None:
                self.set(key, val)
            self.stats["miss"] += 1
            return val
        return None

    def set(self, key: str, value):
        with self.l1_lock:
            self.l1[key] = value
        self.l2.setex(f"c:{key}", self.l2_ttl, json.dumps(value))

    def invalidate(self, key: str):
        with self.l1_lock:
            self.l1.pop(key, None)
        self.l2.delete(f"c:{key}")
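A small helper, added here for illustration rather than part of the class, to turn a `MultiLayerCache.stats` dict into the hit-rate percentages the verification steps target:

```python
def cache_hit_rates(stats: dict) -> dict:
    """Summarize {"l1_hit", "l2_hit", "miss"} counters as percentages."""
    total = stats["l1_hit"] + stats["l2_hit"] + stats["miss"]
    if total == 0:
        return {"l1": 0.0, "l2": 0.0, "overall": 0.0}
    return {
        "l1": 100.0 * stats["l1_hit"] / total,
        "l2": 100.0 * stats["l2_hit"] / total,
        # Overall hit rate is what the >85% Redis target refers to.
        "overall": 100.0 * (stats["l1_hit"] + stats["l2_hit"]) / total,
    }
```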
# Static immutable assets (hashed filenames like app.a1b2c3.js)
Cache-Control: public, max-age=31536000, immutable
# Public API responses (CDN + browser caching)
Cache-Control: public, max-age=60, s-maxage=300
CDN-Cache-Control: public, s-maxage=300, stale-while-revalidate=60
# Private user-specific data (no CDN, browser only)
Cache-Control: private, max-age=0, no-store
# Stale-while-revalidate pattern for best UX
Cache-Control: public, max-age=300, stale-while-revalidate=60, stale-if-error=86400
# Vary header to prevent serving wrong cached response
Vary: Accept-Encoding, Authorization
# BAD -- one TTL for everything: stale user data or wasted cache on static config
GLOBAL_TTL = 3600
cache.setex(f"user:{uid}", GLOBAL_TTL, data) # User data stale for 1 hour
cache.setex(f"config:{key}", GLOBAL_TTL, config) # Static config only cached 1 hour
cache.setex(f"feed:{uid}", GLOBAL_TTL, feed) # Social feed stale for 1 hour
# GOOD -- TTL matches data volatility
TTL_CONFIG = {"user_profile": 300, "static_config": 86400, "social_feed": 30, "product_catalog": 3600}
cache.setex(f"user:{uid}", TTL_CONFIG["user_profile"], data) # 5 min
cache.setex(f"config:{key}", TTL_CONFIG["static_config"], config) # 24 hours
cache.setex(f"feed:{uid}", TTL_CONFIG["social_feed"], feed) # 30 seconds
# BAD -- when hot key expires, 10,000 requests all hit DB simultaneously [src6]
def get_product(product_id):
    cached = redis.get(f"product:{product_id}")
    if cached:
        return json.loads(cached)
    product = db.query("SELECT * FROM products WHERE id = %s", [product_id])
    redis.setex(f"product:{product_id}", 300, json.dumps(product))
    return product

# GOOD -- only one request rebuilds cache; others wait or get stale data [src6]
def get_product(product_id):
    cached = redis.get(f"product:{product_id}")
    if cached:
        return json.loads(cached)
    lock = redis.set(f"lock:product:{product_id}", "1", nx=True, ex=5)
    if lock:
        try:
            product = db.query("SELECT * FROM products WHERE id = %s", [product_id])
            redis.setex(f"product:{product_id}", 300, json.dumps(product))
            return product
        finally:
            redis.delete(f"lock:product:{product_id}")
    else:
        time.sleep(0.1)
        return json.loads(redis.get(f"product:{product_id}") or "null")
# BAD -- CDN serves User A's dashboard to User B
@app.get("/api/dashboard")
async def dashboard(user: User):
    data = get_user_dashboard(user.id)
    return JSONResponse(content=data, headers={
        "Cache-Control": "public, s-maxage=300"  # CDN caches for ALL users!
    })

# GOOD -- user-specific data never enters shared CDN cache
@app.get("/api/dashboard")
async def dashboard(user: User):
    data = get_user_dashboard(user.id)
    return JSONResponse(content=data, headers={
        "Cache-Control": "private, max-age=60"  # Browser-only, per-user
    })
# BAD -- L1 caches on other instances still serve stale data
def update_product(product_id, new_data):
    db.update("products", product_id, new_data)
    redis.delete(f"product:{product_id}")  # L2 cleared
    # L1 caches on 10 other instances have old data for up to 60s!

# GOOD -- pub/sub ensures all L1 caches are cleared [src2]
def update_product(product_id, new_data):
    db.update("products", product_id, new_data)
    redis.delete(f"product:{product_id}")  # Clear L2
    redis.publish("cache:invalidate", f"product:{product_id}")  # Notify all L1s
- Cache stampede: when a hot key expires under load, mitigate with a distributed lock (SET key NX EX 5), request coalescing, or probabilistic early expiration. [src6]
- Unbounded memory growth: set maxmemory in Redis config and maxsize in L1 caches. Use LRU or LFU eviction. [src5]
- Data leaks: Cache-Control: public on user-specific endpoints causes data leaks. Fix: use private or no-store for authenticated responses. [src4]
- Hot-key detection: inspect per-key access frequency via OBJECT FREQ with LFU eviction enabled. [src1]
# Check Redis cache hit ratio
redis-cli INFO stats | grep -E "keyspace_hits|keyspace_misses"
# Check Redis memory usage and eviction policy
redis-cli INFO memory | grep -E "used_memory_human|maxmemory_human|maxmemory_policy"
# Monitor Redis commands in real-time (careful in production)
redis-cli MONITOR | head -50
# Check slow Redis commands
redis-cli SLOWLOG GET 10
# Verify CDN cache status (Cloudflare)
curl -sI https://example.com/api/products/123 | grep -i "cf-cache-status"
# Verify CDN cache status (CloudFront)
curl -sI https://example.com/api/products/123 | grep -i "x-cache"
# Check Cache-Control headers
curl -sI https://example.com/api/products/123 | grep -i "cache-control"
# Redis key count per prefix
redis-cli --scan --pattern "cache:product:*" | wc -l
# Application cache stats (if exposed via metrics)
curl -s http://localhost:8080/metrics | grep cache_hit
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Read-to-write ratio >10:1 and data tolerates seconds/minutes of staleness | Data must be real-time consistent (financial transactions, inventory decrements) | Direct DB reads with optimistic locking |
| Database query latency >50ms and query is repeated frequently | Each request has unique parameters (full-text search with user input) | Query optimization, read replicas, or Elasticsearch |
| Multiple app instances need shared cached state | Single instance with low traffic (<100 QPS) | In-process cache only (no Redis overhead) |
| Static/semi-static content (product catalogs, config, reference data) | Data changes per-request (real-time stock tickers, live scores) | WebSockets or Server-Sent Events with no caching |
| Need to reduce cloud costs by offloading DB/API compute | Cache infra cost exceeds DB query cost | Scale the database directly (read replicas, vertical scaling) |
stale-while-revalidate provides the best user experience for most web applications: users always get a fast response (possibly stale), while a background refresh ensures the next request gets fresh data. Requires CDN support (Cloudflare, CloudFront, Fastly). [src4]