Design Rate Limiter and Design Twitter Feed
Apply the framework to two canonical design problems: token-bucket/sliding-window rate limiting and a fan-out-on-write vs fan-out-on-read news feed.
Why Rate Limiting Is Essential
Rate limiting controls the number of requests a client can make to an API in a given time window. Without it, a single misbehaving client (or a DDoS attack) can saturate server resources, degrading service for all users. Rate limiting also protects against brute-force attacks, prevents API scraping, and enforces fair use of shared resources.
Common rate-limiting granularities: per user ID, per API key, per IP address, per endpoint, or a combination. Typical limits: 100 requests per minute per user, 1000 per hour per API key. The rate limiter must be fast (adding <1ms overhead) and distributed (consistent across all API server replicas).
# Rate limiting scenarios
use_cases = [
('API authentication endpoint', '5 attempts per 15 min per IP', 'Brute-force protection'),
('Public search API', '100 requests per minute per key', 'Fair use enforcement'),
('Email sending', '50 emails per hour per user', 'Spam prevention'),
('Payment processing', '10 transactions per second per account', 'Fraud prevention'),
('File upload', '5 uploads per minute per user', 'Resource quota'),
('Notification service', '1000 pushes per second globally', 'Cost control'),
]
print(f'{'Endpoint/Feature':35s} {'Limit':40s} {'Reason'}')
print('-'*95)
for endpoint, limit, reason in use_cases:
print(f'{endpoint:35s} {limit:40s} {reason}')Rate Limiting Algorithm 1: Token Bucket
The token bucket algorithm maintains a bucket with a maximum capacity of N tokens. Tokens are added at a fixed rate (e.g., 10 per second). Each request consumes one token. If the bucket is empty, the request is rejected. If below capacity, it is accepted and the token is consumed.
Token bucket allows bursting: if no requests come in for 5 seconds, the bucket fills up to N tokens, and then N requests can come in immediately. This is appropriate for APIs where occasional bursts are acceptable. The two parameters are capacity (burst size) and refill rate.
import time
class TokenBucket:
def __init__(self, capacity, refill_rate):
self.capacity = capacity # max tokens (burst size)
self.refill_rate = refill_rate # tokens added per second
self.tokens = capacity # start full
self.last_refill = time.time()
def allow(self):
now = time.time()
elapsed = now - self.last_refill
# Refill tokens based on elapsed time
self.tokens = min(self.capacity,
self.tokens + elapsed * self.refill_rate)
self.last_refill = now
if self.tokens >= 1:
self.tokens -= 1
return True # request allowed
return False # rate limited
bucket = TokenBucket(capacity=5, refill_rate=2) # 2 tokens/sec, burst=5
for i in range(8):
allowed = bucket.allow()
print(f'Request {i+1}: {"ALLOWED" if allowed else "REJECTED"} (tokens={bucket.tokens:.1f})')
time.sleep(0.1) # 0.1s between requestsAll lessons in this course
- The System Design Interview Framework
- Scalable Data Storage: SQL vs NoSQL
- Caching, CDNs, and Load Balancing
- Design Rate Limiter and Design Twitter Feed