DSA Interview Prep · Lesson

Design Rate Limiter and Design Twitter Feed

Apply the framework to two canonical design problems: token-bucket/sliding-window rate limiting and a fan-out-on-write vs fan-out-on-read news feed.

Why Rate Limiting Is Essential

Rate limiting controls the number of requests a client can make to an API in a given time window. Without it, a single misbehaving client (or a DDoS attack) can saturate server resources, degrading service for all users. Rate limiting also protects against brute-force attacks, prevents API scraping, and enforces fair use of shared resources.

Common rate-limiting granularities: per user ID, per API key, per IP address, per endpoint, or a combination. Typical limits: 100 requests per minute per user, 1000 per hour per API key. The rate limiter must be fast (adding <1ms overhead) and distributed (consistent across all API server replicas).

# Rate limiting scenarios
use_cases = [
    ('API authentication endpoint', '5 attempts per 15 min per IP', 'Brute-force protection'),
    ('Public search API',           '100 requests per minute per key', 'Fair use enforcement'),
    ('Email sending',               '50 emails per hour per user', 'Spam prevention'),
    ('Payment processing',          '10 transactions per second per account', 'Fraud prevention'),
    ('File upload',                 '5 uploads per minute per user', 'Resource quota'),
    ('Notification service',        '1000 pushes per second globally', 'Cost control'),
]
print(f'{'Endpoint/Feature':35s} {'Limit':40s} {'Reason'}')
print('-'*95)
for endpoint, limit, reason in use_cases:
    print(f'{endpoint:35s} {limit:40s} {reason}')

Rate Limiting Algorithm 1: Token Bucket

The token bucket algorithm maintains a bucket with a maximum capacity of N tokens. Tokens are added at a fixed rate (e.g., 10 per second). Each request consumes one token. If the bucket is empty, the request is rejected. If below capacity, it is accepted and the token is consumed.

Token bucket allows bursting: if no requests come in for 5 seconds, the bucket fills up to N tokens, and then N requests can come in immediately. This is appropriate for APIs where occasional bursts are acceptable. The two parameters are capacity (burst size) and refill rate.

import time

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity       # max tokens (burst size)
        self.refill_rate = refill_rate # tokens added per second
        self.tokens = capacity         # start full
        self.last_refill = time.time()

    def allow(self):
        now = time.time()
        elapsed = now - self.last_refill
        # Refill tokens based on elapsed time
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True    # request allowed
        return False       # rate limited

bucket = TokenBucket(capacity=5, refill_rate=2)  # 2 tokens/sec, burst=5
for i in range(8):
    allowed = bucket.allow()
    print(f'Request {i+1}: {"ALLOWED" if allowed else "REJECTED"} (tokens={bucket.tokens:.1f})')
    time.sleep(0.1)  # 0.1s between requests

All lessons in this course

The System Design Interview Framework
Scalable Data Storage: SQL vs NoSQL
Caching, CDNs, and Load Balancing
Design Rate Limiter and Design Twitter Feed

← Back to DSA Interview Prep