Caching, CDNs, and Load Balancing
Add Redis caching layers, push static assets to a CDN, and distribute traffic across replicas with round-robin and consistent-hashing load balancers.
Why Caching Is Essential at Scale
Caching stores copies of frequently accessed data in a faster storage layer so that future requests can be served without hitting the slower backing store (a database or an external API). At scale, a small number of popular items receive the vast majority of requests. The 80/20 rule (Pareto principle) often applies: roughly 20% of items account for 80% of traffic.
A cache that fits the hot 20% in memory can absorb roughly 80% of database reads. This is why, in read-heavy workloads, adding a Redis cache often reduces database CPU by 70–90% and cuts p99 latency from about 10 ms to under 1 ms for cache hits, without significant changes to the database or application logic.
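The link between hit ratio, latency, and database load can be worked through with simple arithmetic. The sketch below assumes a 1 ms cache read and a 10 ms database read (illustrative figures in line with the paragraph above, not measurements) and ignores the small cost of the failed cache lookup on a miss. The function names are hypothetical.

```python
def effective_latency_ms(hit_ratio, cache_ms, db_ms):
    """Mean read latency when a fraction `hit_ratio` of reads is served from cache."""
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * db_ms


def db_read_fraction(hit_ratio):
    """Fraction of reads that still reach the database."""
    return 1.0 - hit_ratio


# Assumed latencies: 1 ms for a cache hit, 10 ms for a database read.
for h in (0.0, 0.5, 0.8, 0.95):
    latency = effective_latency_ms(h, cache_ms=1.0, db_ms=10.0)
    print(f"hit ratio {h:.0%}: mean latency {latency:.2f} ms, "
          f"database sees {db_read_fraction(h):.0%} of reads")
```

Under these assumptions an 80% hit ratio brings the mean read latency from 10 ms down to 2.8 ms while the database serves only one read in five.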
# Demonstrating the 80/20 caching benefit
import random

# Simulate requests to a set of items with a Zipf-like popularity distribution
def zipf_sample(n_items, n_requests):
    access_counts = {}
    weights = [1.0 / (i + 1) for i in range(n_items)]  # Zipf: item 0 most popular
    total = sum(weights)
    probs = [w / total for w in weights]
    for _ in range(n_requests):
        item = random.choices(range(n_items), weights=probs)[0]
        access_counts[item] = access_counts.get(item, 0) + 1
    return access_counts

random.seed(42)
n_items, n_requests = 100, 10000  # 10,000 requests spread over 100 items
counts = zipf_sample(n_items, n_requests)
top_20_items = sorted(counts, key=counts.get, reverse=True)[:20]
top_20_requests = sum(counts[i] for i in top_20_items)
share = 100 * top_20_requests / n_requests
# With 1/rank weights the expected share is roughly 70%; steeper
# popularity curves concentrate traffic further toward 80% and beyond.
print(f'Top 20% of items (20 of {n_items}) handle {share:.1f}% of requests')
Cache-Aside Pattern (Lazy Loading)
The cache-aside pattern (also called lazy loading) is the most common caching strategy. The application code is responsible for managing the cache: on a read, check the cache first. On a cache hit, return immediately. On a cache miss, fetch from the database, write to cache, then return. On a write, update the database and invalidate (delete) the cache entry so the next read refreshes it.
This pattern ensures the cache only holds data that was actually requested (no unnecessary pre-loading) and keeps it consistent with the database via invalidation. The trade-off: the first read of a key, and the first read after each invalidation or expiry, misses the cache and pays the full database cost (cold start).
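The read and write paths described above can be traced end to end with in-memory stand-ins. This is a minimal sketch, not a production setup: `InMemoryCache`, the `database` dict, and the `db_reads` counter are hypothetical substitutes for a real cache client and a real users table, introduced only so the example runs on its own.

```python
import time


class InMemoryCache:
    """Hypothetical stand-in for a cache client: a dict with per-key expiry."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def delete(self, key):
        self._store.pop(key, None)


database = {1: {"id": 1, "name": "Ada"}}  # stand-in for the users table
cache = InMemoryCache()
db_reads = 0  # counts how often the backing store is hit


def get_user(user_id):
    global db_reads
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                       # cache hit
    db_reads += 1                           # cache miss: read the backing store
    user = dict(database[user_id])
    cache.set(key, user, ttl_seconds=3600)  # populate the cache for later reads
    return user


def update_user(user_id, **fields):
    database[user_id].update(fields)        # 1. write to the database
    cache.delete(f"user:{user_id}")         # 2. invalidate rather than update


get_user(1)                       # cold start: miss, one database read
get_user(1)                       # hit: served from the cache
reads_after_two_gets = db_reads
update_user(1, name="Grace")      # write path deletes the cached entry
user_after_update = get_user(1)   # miss again: reloads the fresh row
print(reads_after_two_gets, db_reads, user_after_update["name"])
```

Two reads of the same key cost one database read, and the read that follows the update costs a second one, which is the cold-start trade-off in miniature.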
# Cache-aside pattern in Python
import json

class CacheAsideService:
    def __init__(self, db, cache):
        self.db = db
        self.cache = cache  # e.g., Redis client

    def get_user(self, user_id):
        cache_key = f'user:{user_id}'
        # 1. Check cache
        cached = self.cache.get(cache_key)
        if cached is not None:
            return json.loads(cached)  # cache hit
        # 2. Cache miss: fetch from DB
        user = self.db.query('SELECT * FROM users WHERE id=%s', user_id)
        # 3. Write to cache with TTL. Redis stores strings/bytes, so the row is
        #    serialized first (assumes it is JSON-serializable, e.g. a dict).
        #    With redis-py the expiry in seconds is passed as ex=.
        self.cache.set(cache_key, json.dumps(user), ex=3600)  # 1 hour TTL
        return user

    def update_user(self, user_id, data):
        # 1. Write to DB
        self.db.execute('UPDATE users SET ... WHERE id=%s', user_id, data)
        # 2. Invalidate cache (delete, not update)
        self.cache.delete(f'user:{user_id}')
        # Next read will re-populate cache from DB

print('Cache-aside: READ from cache, miss? load from DB + write cache')
print('             WRITE to DB, then DELETE from cache (invalidate)')