Rate Limiting and Retry Logic

Exponential backoff, 429 handling, and respectful API consumption.

What Is Rate Limiting?

Rate limiting is how APIs protect themselves from being overwhelmed. When your agent sends too many requests too quickly, the API responds with HTTP status 429 Too Many Requests. Limits are commonly expressed as requests per second, per minute, or per day.

Ignoring rate limits can get your agent blocked or its API key revoked, and on metered APIs the wasted retries can add to your bill.

import requests

response = requests.get(
    'https://api.example.com/data',
    headers={'Authorization': 'Bearer YOUR_KEY'}
)

if response.status_code == 429:
    print('Rate limit exceeded!')
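    # Many APIs report quota details in X-RateLimit-* headers. The names are
    # not standardized, so check your provider's docs; .get() returns None
    # when a header is absent.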
    limit = response.headers.get('X-RateLimit-Limit')
    remaining = response.headers.get('X-RateLimit-Remaining')
    reset = response.headers.get('X-RateLimit-Reset')
    print(f'Limit: {limit}, Remaining: {remaining}, Reset: {reset}')
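
To avoid reaching the limit in the first place, an agent can pace its own requests. The following is a minimal sketch of a client-side sliding-window limiter, assuming the API documents a limit such as 60 requests per minute. The class name SlidingWindowLimiter and its parameters max_calls, period, clock, and sleep are hypothetical names chosen for illustration; they are not part of requests or of any particular API.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls calls in any window of period seconds.

    Hypothetical helper for illustration; clock and sleep are injectable
    so the behaviour can be checked without real waiting.
    """

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have already left the window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Window is full: sleep until the oldest call expires.
            self.sleep(self.period - (now - self.calls[0]))
```

An agent would create one limiter, for example SlidingWindowLimiter(60, 60.0), and call its wait() method immediately before each requests.get call. This only keeps the client under a limit it already knows about; the server can still answer 429, so the handling shown in this lesson remains necessary.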

The Retry-After Header

When an API returns 429, it often includes a Retry-After header that tells you how long to wait before retrying, usually as a number of seconds (the HTTP specification also allows a date). Always respect this header: retrying immediately will most likely just earn you another 429.

import requests
import time

def request_with_retry_after(url, headers):
    response = requests.get(url, headers=headers)

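    if response.status_code == 429:
        # Retry-After is usually a number of seconds but may be an HTTP date;
        # fall back to 60 seconds if it is missing or not an integer.
        try:
            retry_after = int(response.headers.get('Retry-After', 60))
        except ValueError:
            retry_after = 60
        print(f'Rate limited. Waiting {retry_after} seconds...')
        time.sleep(retry_after)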

        # Retry once after waiting
        response = requests.get(url, headers=headers)

    response.raise_for_status()
    return response.json()
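
A single retry is often not enough when the limit stays exhausted for a while, and not every 429 carries a Retry-After header. A common approach in that case is exponential backoff: the maximum wait doubles after each failed attempt, with random jitter so that many clients do not retry in lockstep. The sketch below is one possible way to do it, not a definitive implementation. The names backoff_delay and call_with_backoff, the parameters base, cap, and max_attempts, and the send callable are all illustrative assumptions; send stands for any zero-argument function that returns a requests-style response.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Return a wait in seconds for a 0-based attempt number (full jitter)."""
    # The ceiling doubles each attempt: base, 2*base, 4*base, ... up to cap.
    ceiling = min(cap, base * (2 ** attempt))
    # Pick a random point below the ceiling so clients spread out.
    return random.uniform(0, ceiling)


def call_with_backoff(send, max_attempts=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out.

    send is a hypothetical zero-argument callable returning an object with
    .status_code and .headers, such as a requests.Response.
    """
    response = send()
    for attempt in range(max_attempts - 1):
        if response.status_code != 429:
            break
        header = response.headers.get('Retry-After')
        if header is not None and header.strip().isdigit():
            # The server said how long to wait, so prefer its value.
            delay = float(header)
        else:
            # No usable Retry-After: fall back to exponential backoff.
            delay = backoff_delay(attempt, base, cap)
        sleep(delay)
        response = send()
    return response
```

With requests, the send argument could be lambda: requests.get(url, headers=headers, timeout=10). The function returns the last response either way, so the caller should still check it, for example with raise_for_status(), in case every attempt was rate limited.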
