How Do I Design a Multi-Layer Caching Strategy?

Type: Software Reference | Confidence: 0.93 | Sources: 7 | Verified: 2026-02-23 | Freshness: stable

TL;DR

Constraints

Quick Reference

Layer | Role | Technology Options | Typical TTL | Latency | Scaling Strategy
Browser Cache | Stores static assets and API responses locally on the user's device | HTTP Cache-Control headers, Service Workers | 1 hr – 1 year | 0ms | Per-client; no server cost
CDN / Edge Cache | Serves cached content from the nearest edge node | Cloudflare, CloudFront, Fastly, Akamai | 5 min – 24 hr | 1-20ms | Automatic global distribution; tiered caching for origin shielding [src4]
Reverse Proxy / API Gateway | Caches API responses before they reach app servers | Nginx, Varnish, Kong, AWS API Gateway | 30s – 5 min | 1-5ms | Horizontal scaling; shared cache across app instances
Application Cache (L1) | In-process memory cache; fastest, but per-instance | Guava (Java), lru-cache (Node), cachetools (Python) | 30s – 5 min | <0.1ms | Per-instance; scales with app replicas; risk of inconsistency
Distributed Cache (L2) | Shared cache across all instances; single cached source of truth | Redis Cluster, Memcached, KeyDB | 5 min – 1 hr | 1-5ms | Hash-based sharding, read replicas; 80-20 rule [src1]
Database Query Cache | Query-result caching at the database level | Materialized views; pg_stat_statements to identify candidate queries (PostgreSQL has no built-in result cache) | Varies | 5-20ms | Limited; use only for expensive aggregations
Write-Behind Buffer | Absorbs writes, flushes to the DB asynchronously | Redis + background worker, Kafka | N/A (write path) | <1ms ack | Decouples write latency; risk of data loss on crash
Session Cache | Stores user session data for stateless app servers | Redis, Memcached, DynamoDB | 30 min – 24 hr | 1-5ms | Partition by user ID; sticky sessions as fallback

Decision Tree

START
|-- What type of data?
|   |-- Static assets (images, CSS, JS, fonts)?
|   |   |-- Use Browser Cache (Cache-Control: immutable) + CDN
|   |   +-- Set long TTL (1 year) with content-hash in filename
|   |-- API responses (JSON)?
|   |   |-- Is response user-specific / authenticated?
|   |   |   |-- YES -> Application Cache (L1) + Distributed Cache (L2) only
|   |   |   |   +-- Never cache in CDN/shared proxy without Vary
|   |   |   +-- NO (public data) -> CDN + Reverse Proxy + L1 + L2
|   |   +-- Is strong consistency required?
|   |       |-- YES -> Short TTL (30s-5min) + event-driven invalidation
|   |       +-- NO -> Longer TTL (5min-1hr) + stale-while-revalidate
|   |-- Database query results?
|   |   |-- Is the query expensive (>100ms)?
|   |   |   |-- YES -> Distributed Cache (Redis) with cache-aside [src1]
|   |   |   +-- NO -> Application-level L1 cache may suffice
|   |   +-- Is the data read-heavy (>10:1 read-to-write)?
|   |       |-- YES -> Multi-layer: L1 (in-process) + L2 (Redis)
|   |       +-- NO -> Write-through cache or skip caching
|   +-- Session / user state?
|       +-- Use Redis/Memcached with TTL matching session lifetime
|
|-- Expected QPS on hot keys?
|   |-- <100 QPS -> Simple cache-aside, no stampede protection needed
|   |-- 100-10K QPS -> Add request coalescing (single-flight) [src6]
|   +-- >10K QPS -> Probabilistic early expiration + distributed locking
|
+-- How many app instances?
    |-- 1 instance -> L1 in-process cache is sufficient
    |-- 2-10 instances -> L1 + L2 (Redis) for consistency
    +-- >10 instances -> L1 (short TTL) + L2 (Redis Cluster) + CDN
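
The "request coalescing (single-flight)" branch above can be sketched in-process. This `SingleFlight` class is an illustrative implementation, not a library API; in this sketch, a fetch that raises leaves waiters with None.

```python
import threading

class SingleFlight:
    """Coalesce concurrent fetches of the same key into one call."""
    def __init__(self):
        self._mu = threading.Lock()
        self._calls = {}  # key -> {"done": Event, "result": value}

    def do(self, key, fn):
        with self._mu:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = {"done": threading.Event(), "result": None}
                self._calls[key] = call
        if not leader:
            call["done"].wait()      # piggyback on the in-flight fetch
            return call["result"]
        try:
            call["result"] = fn()    # only the leader hits the DB
        finally:
            with self._mu:
                self._calls.pop(key, None)
            call["done"].set()
        return call["result"]
```

Note this only coalesces within one process; pair it with the distributed lock in step 6 when multiple instances share a hot key.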

Step-by-Step Guide

1. Identify cache-worthy data and access patterns

Analyze your read-to-write ratio, data staleness tolerance, and hotspot distribution. The 80-20 rule applies: 20% of keys typically serve 80% of traffic. [src1]

Data Analysis Checklist:
- Read-to-write ratio per entity type (>10:1 is cache-friendly)
- p50/p95/p99 latency of uncached DB queries
- Hotspot analysis: top 1000 keys by request frequency
- Staleness tolerance: seconds, minutes, or hours?
- Data size per cached object (affects memory budget)

Memory Budget:
  cache_size = hot_key_count * avg_object_size * 1.3 (overhead)
  Example: 100K hot keys * 2KB avg * 1.3 = ~260MB Redis

Verify: SELECT query, calls, mean_exec_time FROM pg_stat_statements ORDER BY calls DESC LIMIT 20
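
The memory-budget arithmetic above as a checkable helper (function and parameter names are illustrative):

```python
def estimate_cache_bytes(hot_key_count: int, avg_object_bytes: int,
                         overhead: float = 1.3) -> int:
    """Rough Redis budget: hot keys * average object size * overhead factor."""
    return round(hot_key_count * avg_object_bytes * overhead)

# 100K hot keys at ~2 KB each with 30% overhead -> ~260 MB
budget = estimate_cache_bytes(100_000, 2_000)
```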

2. Implement L1 in-process application cache

Add a local in-memory cache with short TTL to each application instance. This eliminates network round-trips for the hottest data. [src3]

# Python: using cachetools (pip install cachetools==5.3.2)
from cachetools import TTLCache
import threading

# L1 cache: 1000 items max, 60-second TTL
l1_cache = TTLCache(maxsize=1000, ttl=60)
l1_lock = threading.Lock()

def get_from_l1(key: str):
    with l1_lock:
        return l1_cache.get(key)

def set_in_l1(key: str, value):
    with l1_lock:
        l1_cache[key] = value

Verify: Monitor L1 hit rate. If <50%, increase maxsize or TTL.
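
TTLCache does not expose hit statistics itself, so one way to measure that hit rate is a small thread-safe counter wrapped around every L1 read; `HitRateTracker` and `get_from_l1_tracked` are this sketch's names, and any dict-like cache works:

```python
import threading

class HitRateTracker:
    """Thread-safe hit/miss counters for estimating L1 effectiveness."""
    def __init__(self):
        self._lock = threading.Lock()
        self.hits = 0
        self.misses = 0

    def record(self, hit: bool):
        with self._lock:
            if hit:
                self.hits += 1
            else:
                self.misses += 1

    @property
    def ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

tracker = HitRateTracker()

def get_from_l1_tracked(cache, key):
    value = cache.get(key)
    tracker.record(value is not None)
    return value
```

Export `tracker.ratio` to your metrics endpoint so the <50% threshold above is observable.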

3. Set up L2 distributed cache with Redis

Deploy Redis as the shared cache layer using the cache-aside pattern: check cache first, fall back to DB on miss, then populate cache. [src1]

import redis, json

r = redis.Redis(host='redis-primary', port=6379, decode_responses=True)
L2_TTL = 300  # 5 minutes

def get_with_caching(key: str, db_fetch_fn):
    # 1. Check L1 (in-process)
    value = get_from_l1(key)
    if value is not None:
        return value

    # 2. Check L2 (Redis)
    cached = r.get(f"cache:{key}")
    if cached:
        value = json.loads(cached)
        set_in_l1(key, value)
        return value

    # 3. Cache miss -> query database
    value = db_fetch_fn(key)
    if value is not None:
        r.setex(f"cache:{key}", L2_TTL, json.dumps(value))
        set_in_l1(key, value)
    return value

Verify: redis-cli INFO stats | grep keyspace — target >85% hit rate.
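
The keyspace_hits/keyspace_misses counters from INFO stats can be turned into that ratio programmatically; a sketch assuming redis-py, where `r.info("stats")` returns the counters as a dict:

```python
def redis_hit_ratio(info: dict) -> float:
    """Hit ratio from the counters returned by INFO stats."""
    hits = info.get("keyspace_hits", 0)
    misses = info.get("keyspace_misses", 0)
    total = hits + misses
    return hits / total if total else 0.0

# Usage against a live server (r is the client from this step):
# ratio = redis_hit_ratio(r.info("stats"))
```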

4. Configure CDN caching with proper headers

Set Cache-Control headers to leverage CDN edge caching for public data. Use s-maxage for CDN TTL independent of browser TTL. [src4]

from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

@app.get("/api/products/{product_id}")
async def get_product(product_id: str):
    product = get_with_caching(f"product:{product_id}", fetch_product_from_db)
    return JSONResponse(content=product, headers={
        "Cache-Control": "public, max-age=60, s-maxage=300",
        "CDN-Cache-Control": "public, s-maxage=300, stale-while-revalidate=60",
        "Vary": "Accept-Encoding",
    })

Verify: curl -I https://yourcdn.com/api/products/123 — check for cf-cache-status: HIT.

5. Implement cache invalidation strategy

Use event-driven invalidation for consistency-critical data: write to DB, delete from L2, broadcast to L1 instances via Pub/Sub. [src1]

def update_product(product_id: str, new_data: dict):
    # 1. Write to database (source of truth)
    db.update("products", product_id, new_data)
    # 2. Invalidate L2 (Redis)
    r.delete(f"cache:product:{product_id}")
    # 3. Broadcast invalidation for L1 on all instances
    r.publish("cache:invalidate", f"product:{product_id}")
    # 4. Optionally purge CDN (Cloudflare API)

Verify: After update, subsequent reads return new value within consistency window.
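
On the subscriber side, each app instance needs a listener that drops invalidated keys from its own L1. A sketch assuming redis-py's pubsub API; the channel name matches the publish call above, and `handle_invalidation` / `start_invalidation_listener` are this sketch's names:

```python
import threading

def handle_invalidation(message: dict, l1_cache) -> None:
    """Drop an invalidated key from this instance's L1 cache."""
    if message.get("type") != "message":
        return  # skip subscribe confirmations
    l1_cache.pop(message["data"], None)

def start_invalidation_listener(redis_client, l1_cache):
    """Background thread applying cache:invalidate messages to the local L1."""
    pubsub = redis_client.pubsub()
    pubsub.subscribe("cache:invalidate")

    def loop():
        for message in pubsub.listen():
            handle_invalidation(message, l1_cache)

    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t
```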

6. Add cache stampede protection

Implement distributed locking to prevent thundering herd when hot keys expire. [src6]

import time

def get_with_stampede_protection(key, db_fetch_fn, cache_ttl=300):
    cached = r.get(f"cache:{key}")
    if cached:
        return json.loads(cached)
    # Acquire lock: only one request rebuilds the cache
    if r.set(f"lock:{key}", "1", nx=True, ex=5):
        try:
            value = db_fetch_fn(key)
            if value is not None:
                r.setex(f"cache:{key}", cache_ttl, json.dumps(value))
            return value
        finally:
            r.delete(f"lock:{key}")
    # Another request holds the lock: poll briefly for the rebuilt value
    for _ in range(10):
        time.sleep(0.1)
        cached = r.get(f"cache:{key}")
        if cached:
            return json.loads(cached)
    return None  # lock holder failed or is slow; caller may fall back to the DB

Verify: Under load test, only 1 request should reach DB per cache miss event.
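
For the >10K QPS tier, the decision tree calls for probabilistic early expiration: each reader occasionally treats a still-valid entry as expired, so rebuilds spread out instead of all landing at the TTL boundary. A sketch of the XFetch-style check (the function name and the 1e-12 clamp are this sketch's choices):

```python
import math
import random

def should_refresh_early(ttl_remaining: float, compute_cost: float,
                         beta: float = 1.0) -> bool:
    """Return True when this reader should rebuild the entry early.

    ttl_remaining: seconds until the cached value expires
    compute_cost:  seconds a rebuild typically takes
    beta:          >1 refreshes earlier and more often
    """
    # -log(U) for U ~ Uniform(0,1) is exponentially distributed, so the
    # refresh probability rises smoothly as the TTL boundary approaches.
    jitter = -compute_cost * beta * math.log(max(random.random(), 1e-12))
    return jitter >= ttl_remaining
```

Callers that get True take the rebuild path (ideally combined with the lock above); everyone else keeps serving the cached value.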

Code Examples

Python: Complete Multi-Layer Cache Client

# Input:  A cache key and a callable that fetches from the database
# Output: Cached value from fastest available layer

import redis, json
from cachetools import TTLCache
from threading import Lock

class MultiLayerCache:
    def __init__(self, redis_url="redis://localhost:6379",
                 l1_maxsize=1000, l1_ttl=60, l2_ttl=300):
        self.l1 = TTLCache(maxsize=l1_maxsize, ttl=l1_ttl)
        self.l1_lock = Lock()
        self.l2 = redis.from_url(redis_url, decode_responses=True)
        self.l2_ttl = l2_ttl
        self.stats = {"l1_hit": 0, "l2_hit": 0, "miss": 0}

    def get(self, key: str, fetch_fn=None):
        with self.l1_lock:
            val = self.l1.get(key)
        if val is not None:
            self.stats["l1_hit"] += 1
            return val
        raw = self.l2.get(f"c:{key}")
        if raw:
            val = json.loads(raw)
            with self.l1_lock:
                self.l1[key] = val
            self.stats["l2_hit"] += 1
            return val
        if fetch_fn:
            val = fetch_fn(key)
            if val is not None:
                self.set(key, val)
            self.stats["miss"] += 1
            return val
        return None

    def set(self, key: str, value):
        with self.l1_lock:
            self.l1[key] = value
        self.l2.setex(f"c:{key}", self.l2_ttl, json.dumps(value))

    def invalidate(self, key: str):
        with self.l1_lock:
            self.l1.pop(key, None)
        self.l2.delete(f"c:{key}")

HTTP: CDN Cache-Control Headers Reference

# Static immutable assets (hashed filenames like app.a1b2c3.js)
Cache-Control: public, max-age=31536000, immutable

# Public API responses (CDN + browser caching)
Cache-Control: public, max-age=60, s-maxage=300
CDN-Cache-Control: public, s-maxage=300, stale-while-revalidate=60

# Private user-specific data (browser may cache briefly; never the CDN)
Cache-Control: private, max-age=60

# Sensitive data that must never be cached anywhere
Cache-Control: no-store

# Stale-while-revalidate pattern for best UX
Cache-Control: public, max-age=300, stale-while-revalidate=60, stale-if-error=86400

# Vary header to prevent serving wrong cached response
Vary: Accept-Encoding, Authorization

Anti-Patterns

Wrong: Single global TTL for all cache entries

# BAD -- one TTL for everything: stale user data or wasted cache on static config
GLOBAL_TTL = 3600
cache.setex(f"user:{uid}", GLOBAL_TTL, data)       # User data stale for 1 hour
cache.setex(f"config:{key}", GLOBAL_TTL, config)    # Static config only cached 1 hour
cache.setex(f"feed:{uid}", GLOBAL_TTL, feed)        # Social feed stale for 1 hour

Correct: TTL per data type based on staleness tolerance

# GOOD -- TTL matches data volatility
TTL_CONFIG = {"user_profile": 300, "static_config": 86400, "social_feed": 30, "product_catalog": 3600}
cache.setex(f"user:{uid}", TTL_CONFIG["user_profile"], data)        # 5 min
cache.setex(f"config:{key}", TTL_CONFIG["static_config"], config)   # 24 hours
cache.setex(f"feed:{uid}", TTL_CONFIG["social_feed"], feed)         # 30 seconds
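
One way to keep these per-type TTLs from drifting apart across call sites is a single prefix-based lookup; the mapping and names below are illustrative:

```python
# Hypothetical prefix-to-TTL mapping; adjust to your key scheme
TTL_BY_PREFIX = {
    "user": 300,       # user profiles: 5 min
    "config": 86400,   # static config: 24 h
    "feed": 30,        # social feeds: 30 s
    "product": 3600,   # catalog: 1 h
}
DEFAULT_TTL = 300

def ttl_for(key: str) -> int:
    """Pick a TTL from the key's prefix (e.g. 'user:42' -> 300)."""
    prefix = key.split(":", 1)[0]
    return TTL_BY_PREFIX.get(prefix, DEFAULT_TTL)
```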

Wrong: No cache stampede protection on hot keys

# BAD -- when a hot key expires, 10,000 requests all hit the DB simultaneously [src6]
def get_product(product_id):
    cached = r.get(f"product:{product_id}")
    if cached:
        return json.loads(cached)
    product = db.query("SELECT * FROM products WHERE id = %s", [product_id])
    r.setex(f"product:{product_id}", 300, json.dumps(product))
    return product

Correct: Request coalescing with distributed lock

# GOOD -- only one request rebuilds the cache; others wait or serve stale data [src6]
def get_product(product_id):
    cached = r.get(f"product:{product_id}")
    if cached:
        return json.loads(cached)
    lock = r.set(f"lock:product:{product_id}", "1", nx=True, ex=5)
    if lock:
        try:
            product = db.query("SELECT * FROM products WHERE id = %s", [product_id])
            r.setex(f"product:{product_id}", 300, json.dumps(product))
            return product
        finally:
            r.delete(f"lock:product:{product_id}")
    else:
        time.sleep(0.1)  # wait for the lock holder, then read its result
        return json.loads(r.get(f"product:{product_id}") or "null")

Wrong: Caching authenticated data in CDN without Vary

# BAD -- CDN serves User A's dashboard to User B
@app.get("/api/dashboard")
async def dashboard(user: User):
    data = get_user_dashboard(user.id)
    return JSONResponse(content=data, headers={
        "Cache-Control": "public, s-maxage=300"  # CDN caches for ALL users!
    })

Correct: Private cache or per-user cache keys

# GOOD -- user-specific data never enters shared CDN cache
@app.get("/api/dashboard")
async def dashboard(user: User):
    data = get_user_dashboard(user.id)
    return JSONResponse(content=data, headers={
        "Cache-Control": "private, max-age=60"  # Browser-only, per-user
    })

Wrong: Invalidating only L2 and forgetting L1

# BAD -- L1 caches on other instances still serve stale data
def update_product(product_id, new_data):
    db.update("products", product_id, new_data)
    r.delete(f"product:{product_id}")  # L2 cleared
    # L1 caches on 10 other instances keep the old data for up to 60s!

Correct: Broadcast invalidation across all layers

# GOOD -- pub/sub ensures every instance clears its L1 entry [src2]
def update_product(product_id, new_data):
    db.update("products", product_id, new_data)
    r.delete(f"product:{product_id}")                           # Clear L2
    r.publish("cache:invalidate", f"product:{product_id}")      # Notify all L1s

Common Pitfalls

Diagnostic Commands

# Check Redis cache hit ratio
redis-cli INFO stats | grep -E "keyspace_hits|keyspace_misses"

# Check Redis memory usage and eviction policy
redis-cli INFO memory | grep -E "used_memory_human|maxmemory_human|maxmemory_policy"

# Monitor Redis commands in real-time (careful in production)
redis-cli MONITOR | head -50

# Check slow Redis commands
redis-cli SLOWLOG GET 10

# Verify CDN cache status (Cloudflare)
curl -sI https://example.com/api/products/123 | grep -i "cf-cache-status"

# Verify CDN cache status (CloudFront)
curl -sI https://example.com/api/products/123 | grep -i "x-cache"

# Check Cache-Control headers
curl -sI https://example.com/api/products/123 | grep -i "cache-control"

# Redis key count per prefix
redis-cli --scan --pattern "cache:product:*" | wc -l

# Application cache stats (if exposed via metrics)
curl -s http://localhost:8080/metrics | grep cache_hit

When to Use / When Not to Use

Use When | Don't Use When | Use Instead
Read-to-write ratio >10:1 and data tolerates seconds/minutes of staleness | Data must be real-time consistent (financial transactions, inventory decrements) | Direct DB reads with optimistic locking
Database query latency >50ms and the query repeats frequently | Each request has unique parameters (full-text search with user input) | Query optimization, read replicas, or Elasticsearch
Multiple app instances need shared cached state | Single instance with low traffic (<100 QPS) | In-process cache only (no Redis overhead)
Static/semi-static content (product catalogs, config, reference data) | Data changes per request (real-time stock tickers, live scores) | WebSockets or Server-Sent Events with no caching
Need to reduce cloud costs by offloading DB/API compute | Cache infra cost exceeds the DB query cost | Scale the database directly (read replicas, vertical scaling)

Important Caveats

Related Units