
Caching Strategies Past Redis-by-Default

Ravinder · 10 min read
Data · Caching · Redis · Performance · Architecture

The default caching strategy for most backend engineers is: "add Redis in front of the database, TTL of 5 minutes, cache miss falls through to the DB." This works until it doesn't. Cache stampedes take down services. Write-back caches lose data. TTL-based invalidation serves stale data at exactly the wrong moment. The engineers who think carefully about caching before they need to are the ones whose services survive traffic spikes.

This post covers the patterns, the failure modes, and the decisions that actually matter.


The Four Cache Patterns

Before diving into failure modes, the taxonomy. There are four fundamental caching patterns, and most production systems use more than one:

Cache-Aside (Lazy Loading). The application checks the cache first. On a miss, it reads from the database and populates the cache. This is the most common pattern and the easiest to implement. The application controls cache population.

Read-Through. The cache sits in front of the database. On a miss, the cache itself fetches from the database and populates itself. The application only talks to the cache. Read-through is common in ORM-level caches and some managed caching solutions.

Write-Through. Every write goes to the cache and the database synchronously. Reads always hit warm cache data. Consistency is strong. The cost: every write takes the latency of both cache and database.
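
At the application level, write-through is usually implemented as a paired write. A minimal sketch, assuming the redis-py client (cache) and database handle (db) used in the examples below; db.update_user is an illustrative helper:

def save_user_write_through(user_id: int, user: dict) -> None:
    # Write-through: persist to the database and update the cache in the
    # same synchronous request path, so subsequent reads hit warm data.
    db.update_user(user_id, user)                            # illustrative DB helper
    cache.setex(f"user:{user_id}", 300, json.dumps(user))    # cache the committed state

Writing the database first means the cache only ever holds committed state; reversing the order risks caching a write that the database later rejects.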

Write-Back (Write-Behind). Writes go to the cache first. The cache asynchronously persists to the database. Write latency is cache-speed (fast). The risk: data in the cache but not yet in the database can be lost if the cache crashes before flushing.

flowchart LR
    subgraph Cache-Aside
        A1[App] --miss--> DB1[(Database)]
        DB1 --> A1
        A1 --> C1[(Cache)]
        A1 --hit--> C1
    end
    subgraph Write-Through
        A2[App] --> C2[(Cache)]
        C2 --> DB2[(Database)]
    end
    subgraph Write-Back
        A3[App] --> C3[(Cache)]
        C3 -.async.-> DB3[(Database)]
    end

Cache-Aside: The Default and Its Traps

Cache-aside is the right default for most read-heavy workloads. The application controls cache population, which means it's explicit, debuggable, and easy to reason about. But it has failure modes worth understanding.

The thundering herd / cache stampede. A popular key expires. Simultaneously, 500 requests check the cache, get misses, and all issue database queries to repopulate. Your database just received 500 concurrent queries for the same data. This is a cache stampede.

The naive fix is to catch it at the application level:

import json
import redis
import time
 
cache = redis.Redis(host='redis.internal', decode_responses=True)
 
def get_with_lock(key: str, fetch_fn, ttl: int = 300) -> str:
    """Cache-aside with mutex lock to prevent stampede."""
    value = cache.get(key)
    if value is not None:
        return value
 
    lock_key = f"lock:{key}"
    lock_acquired = cache.set(lock_key, "1", nx=True, ex=10)  # 10s lock TTL
 
    if lock_acquired:
        try:
            # This thread populates the cache
            value = fetch_fn()
            cache.set(key, value, ex=ttl)
            return value
        finally:
            cache.delete(lock_key)
    else:
        # Another thread holds the lock — wait briefly and retry
        time.sleep(0.05)
        return get_with_lock(key, fetch_fn, ttl)

The mutex lock prevents all 500 requests from hitting the database. One thread repopulates; the others wait briefly and hit the warm cache. The lock TTL (10 seconds) prevents a deadlock if the repopulating thread crashes before releasing the lock.

A better approach: probabilistic early expiration. Instead of letting a hot key expire and racing to repopulate it, refresh it before it expires, and add jitter to TTLs so related keys don't all expire at the same moment. Jitter is the easy part:

import random
 
def set_with_jitter(key: str, value: str, base_ttl: int):
    """Add ±10% jitter to TTL to spread expiration across time."""
    jitter = random.randint(-base_ttl // 10, base_ttl // 10)
    cache.set(key, value, ex=base_ttl + jitter)

For high-traffic keys, use the XFetch algorithm: before returning a cached value, probabilistically decide whether to refresh it early based on how close the key is to expiration. The math: refresh with probability exp((now - expiry) / (beta * fetch_cost)), which climbs toward 1 as expiry approaches; beta controls how aggressively you refresh early. In practice, this prevents stampedes for popular keys at the cost of some extra background fetches.
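
A sketch of that idea, storing the recompute cost (delta) and logical expiry alongside the value; get_with_xfetch and fetch_fn are illustrative names, and cache, json, random, and time come from the earlier snippets:

import math

def get_with_xfetch(key: str, fetch_fn, ttl: int = 300, beta: float = 1.0):
    """Cache-aside read with probabilistic early refresh (XFetch)."""
    raw = cache.get(key)
    if raw is not None:
        entry = json.loads(raw)
        # Refresh early when now - delta*beta*log(rand) >= expiry, which
        # happens with probability exp((now - expiry) / (beta * delta)).
        rand = 1.0 - random.random()  # in (0, 1], avoids log(0)
        if time.time() - entry["delta"] * beta * math.log(rand) < entry["expiry"]:
            return entry["value"]

    start = time.time()
    value = fetch_fn()                # recompute from the source of truth
    delta = time.time() - start       # how long the recompute took
    entry = {"value": value, "delta": delta, "expiry": time.time() + ttl}
    # Keep the Redis TTL longer than the logical expiry so the old value can
    # still be served while an early refresh is in flight.
    cache.set(key, json.dumps(entry), ex=ttl * 2)
    return value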


Write-Back: The Risk No One Talks About

Write-back caching is seductive. Your writes are cache-speed fast. Reads are always warm. What's not to love?

The risk: the cache is not a durable store. Redis without AOF persistence loses all in-memory data on a crash. Redis with AOF persistence can still lose up to the last second of writes (with appendfsync everysec, the default and recommended setting when AOF is enabled). Redis with appendfsync always is durable but slow — defeating the purpose.
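
For reference, the persistence knobs in question (an illustrative redis.conf fragment; tune to your durability requirements):

# Illustrative AOF persistence settings in redis.conf
appendonly yes          # enable the append-only file
appendfsync everysec    # fsync once per second; up to ~1s of writes at risk on crash
# appendfsync always    # fsync every write: durable, but much slower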

A write-back cache that crashes between accepting writes and flushing to the database has silently lost those writes. There is no WAL, no replica protection against this specific failure mode at the application level.

When write-back is acceptable:

  • Counters and metrics that are approximate by design (view counts, click counts). Losing 1 second of increments is acceptable.
  • Session data where loss means the user re-authenticates rather than losing business data.
  • Rate limiter state — losing it means a brief window of unenforced limits, not data corruption.

When write-back is not acceptable:

  • Financial transactions. Any payment or ledger entry must be durable in the database before acknowledging success.
  • Inventory mutations. Overselling is a data consistency problem, not just a UX problem.
  • Any write where "the data exists" is a business commitment.

The decision rule: if the loss of a write would require reconciliation, compensation, or an apology to a customer, do not put it in a write-back cache.
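
For the acceptable cases, a minimal write-back sketch for approximate view counts; the view_counts hash, pages table, and flush cadence are hypothetical, and cache and db are the same handles used elsewhere in this post:

import threading
import time

FLUSH_INTERVAL_SECONDS = 5  # hypothetical flush cadence

def increment_view_count(page_id: int) -> None:
    # Fast path: increment in Redis only; durability is deferred.
    cache.hincrby("view_counts", str(page_id), 1)

def flush_view_counts() -> None:
    # Background writer: periodically drain the hash into the database.
    while True:
        time.sleep(FLUSH_INTERVAL_SECONDS)
        counts = cache.hgetall("view_counts")
        if not counts:
            continue
        # Small race: increments landing between hgetall and delete are lost.
        # That is the write-back trade-off, acceptable only because these
        # counters are approximate by design.
        cache.delete("view_counts")
        for page_id, delta in counts.items():
            db.execute(
                "UPDATE pages SET view_count = view_count + %s WHERE id = %s",
                (int(delta), int(page_id)),
            )

threading.Thread(target=flush_view_counts, daemon=True).start()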


Cache Invalidation: The Hard Problem

Phil Karlton's famous quip — "there are only two hard things in Computer Science: cache invalidation and naming things" — is a joke that describes a real engineering problem.

TTL-based invalidation is the blunt instrument. Simple, predictable, wrong. A 5-minute TTL means you serve data up to 5 minutes stale. For most product data, this is acceptable. For inventory, pricing, and user profile data, it frequently is not.

Event-driven invalidation is precise but operationally complex. When a record changes in the database, publish an event (via CDC or application-level messaging). Consumers invalidate the relevant cache keys.

# Application-level event-driven invalidation
# After updating a product, publish an invalidation event
 
def update_product(product_id: int, updates: dict) -> None:
    with db.transaction():
        db.execute(
            "UPDATE products SET ... WHERE id = %s",
            (product_id,)
        )
        # Publish invalidation event inside the transaction
        # (using transactional outbox pattern)
        db.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (%s, %s)",
            ("cache.invalidate", json.dumps({"key": f"product:{product_id}"}))
        )
 
# Outbox consumer (separate process)
def process_outbox_events():
    for event in poll_outbox():
        if event["event_type"] == "cache.invalidate":
            payload = json.loads(event["payload"])  # payload column holds JSON text
            cache.delete(payload["key"])
        acknowledge(event)

The transactional outbox pattern ensures the invalidation event is published if and only if the database write commits. Without it, you can update the database and fail before publishing the invalidation — leaving stale cache data permanently until TTL expires.

Cache-aside with version tags is a middle ground. Store a version counter per entity in the cache. Each write increments the version. Cache keys include the version: product:42:v7. Old cache entries with stale version numbers are simply never read again (and are garbage collected by TTL). This trades explicit key deletion for a version bump, while ensuring reads never see a superseded entry:

def get_product(product_id: int) -> dict:
    version = cache.get(f"product:{product_id}:version") or "1"
    value = cache.get(f"product:{product_id}:v{version}")
    if value:
        return json.loads(value)
 
    # Cache miss: fetch from DB
    product = db.fetch_product(product_id)
    version = product["version"]
    cache.setex(f"product:{product_id}:version", 3600, str(version))
    cache.setex(f"product:{product_id}:v{version}", 3600, json.dumps(product))
    return product
 
def invalidate_product(product_id: int, new_version: int) -> None:
    cache.set(f"product:{product_id}:version", str(new_version))
    # Old version key expires naturally via TTL

Eviction Policies Matter

Redis supports multiple eviction policies, applied when the cache reaches its memory limit. The choice has significant performance implications:

  • noeviction — reject writes when full. Your write paths start returning errors. Not suitable for caches used by latency-sensitive services.
  • allkeys-lru — evict the least recently used key across all keys. Good general-purpose choice for most workloads.
  • allkeys-lfu — evict the least frequently used key. Better than LRU when a small set of keys is accessed far more often than the rest; frequency protects hot keys that happen not to have been touched very recently.
  • volatile-lru — evict least recently used keys among those with a TTL set. Useful when you store both ephemeral (TTL) and permanent (no TTL) data in the same instance.
  • volatile-ttl — evict keys with the shortest remaining TTL first, among those with a TTL set. Useful when TTLs encode importance: short-lived entries go first, preserving longer-lived ones.

The mistake: leaving the default (noeviction) in place for an application cache. Set allkeys-lru for most caches. Set a memory limit (maxmemory 4gb) so Redis evicts predictably rather than running the OS out of memory.

# Redis config for a typical application cache
maxmemory 4gb
maxmemory-policy allkeys-lru
maxmemory-samples 10  # LRU approximation sample size; higher = more accurate

Local Cache + Distributed Cache: The Two-Level Pattern

Network round trips to Redis add 0.5–2ms per call. For a hot code path making 10 cache lookups per request, that's 5–20ms of cache overhead. At high RPS, this adds up.

The two-level pattern: an in-process local cache (a few hundred MB, LRU) in front of Redis. Local cache hit = zero network cost. Local cache miss = Redis lookup = normal cache cost.

from cachetools import LRUCache
from cachetools.keys import hashkey
import threading
 
local_cache = LRUCache(maxsize=10_000)  # ~10K items in process memory
local_cache_lock = threading.Lock()
 
def get_user(user_id: int) -> dict:
    key = hashkey("user", user_id)
 
    # L1: local in-process cache
    with local_cache_lock:
        if key in local_cache:
            return local_cache[key]
 
    # L2: Redis distributed cache
    raw = cache.get(f"user:{user_id}")
    if raw:
        user = json.loads(raw)
        with local_cache_lock:
            local_cache[key] = user
        return user
 
    # L3: database
    user = db.fetch_user(user_id)
    serialized = json.dumps(user)
    cache.setex(f"user:{user_id}", 300, serialized)
    with local_cache_lock:
        local_cache[key] = user
    return user

The tradeoff: local caches are per-process. In a multi-process deployment, each process has its own L1 cache. An invalidation must propagate to all processes, typically via a Redis pub/sub channel:

# Publish invalidation to all processes
cache.publish("invalidations", f"user:{user_id}")
 
# Subscribe in each process
def listen_for_invalidations():
    pubsub = cache.pubsub()
    pubsub.subscribe("invalidations")
    for message in pubsub.listen():
        if message["type"] == "message":
            key = message["data"]
            # Invalidate from local cache
            user_id = int(key.split(":")[1])
            with local_cache_lock:
                local_cache.pop(hashkey("user", user_id), None)
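
The subscriber has to run in every process; one illustrative way is a daemon thread started at application startup:

# Start the invalidation listener once per process, e.g. at application startup
threading.Thread(target=listen_for_invalidations, daemon=True).start()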

This is operationally more complex but delivers sub-millisecond cache hits for hot keys in high-RPS services.


When Not to Cache

Caching adds complexity. Before adding a cache layer, ask:

  1. Is the bottleneck actually the database? Profile first. If database query time is 5ms and your P99 latency is 200ms, the bottleneck is elsewhere.
  2. Is the data hot enough? A cache is only beneficial if the hit rate is high enough to offset the miss cost plus the cache overhead. A cache with a 20% hit rate is often worse than no cache (a quick worked example follows this list).
  3. Is consistency critical? Some data (balances, inventory, permissions) must be read from the authoritative source. Serving stale cache data for these can cause incorrect business decisions.
  4. Can you fix the query instead? A missing index, a bad query plan, or a denormalized reporting table is often a better solution than a cache for slow queries.
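
To make point 2 concrete with illustrative numbers: suppose a Redis lookup costs 1 ms and the underlying database query costs 10 ms. Expected read latency with the cache is roughly h * 1 ms + (1 - h) * (1 ms + 10 ms) for hit rate h. At a 20% hit rate that works out to 0.2 * 1 + 0.8 * 11 = 9.0 ms, barely better than the 10 ms of hitting the database directly, and you now operate an extra system. At a 90% hit rate it drops to 0.9 * 1 + 0.1 * 11 = 2.0 ms, which comfortably pays for the complexity.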

Key Takeaways

  • Cache stampedes are prevented by mutex locking on cache miss or probabilistic early expiration with TTL jitter — never rely on TTL-based expiration alone for high-traffic keys.
  • Write-back caching is only appropriate when losing the most recent writes is acceptable; financial and inventory data must persist to durable storage before acknowledging writes.
  • Event-driven invalidation with a transactional outbox is more precise than TTL but requires operational investment; version-tagged cache keys are a practical middle ground.
  • Set maxmemory and allkeys-lru on every Redis cache instance — leaving the default noeviction policy in an application cache will eventually cause write failures under memory pressure.
  • Two-level caching (in-process LRU + Redis) eliminates network overhead for hot keys at the cost of per-process staleness; use Redis pub/sub for invalidation propagation across processes.
  • Profile before caching: if the bottleneck is not the database read latency, a cache will not fix your performance problem and will add operational complexity without benefit.