Handling Pagination and Dynamic Content

Following next-page links and using Playwright for JavaScript-rendered pages.

The Pagination Problem

Most real websites don't serve all their data on a single page. Results are split across several pages to keep each response small and fast. To collect a complete dataset, an agent must detect the pagination scheme and follow it through to the last page.

There are two common pagination patterns: next-page links, where each page links to its successor, and page-number URL patterns, where the page number appears in the URL itself.
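
The page-number pattern can be sketched as follows. This is a minimal illustration, not a definitive implementation: the `{page}` URL template, the `fetch_items` callable, the `max_pages` cap, and the "stop at the first empty page" rule are all assumptions made for the example, and real sites may signal the last page differently.

```python
from typing import Callable, Iterator


def iter_page_urls(template: str, start: int = 1, max_pages: int = 50) -> Iterator[str]:
    """Yield URLs built from a template such as 'https://example.com/items?page={page}'."""
    for page in range(start, start + max_pages):
        yield template.format(page=page)


def collect_numbered_pages(
    template: str,
    fetch_items: Callable[[str], list[str]],
    max_pages: int = 50,
) -> list[str]:
    """Request page 1, 2, 3, ... and stop at the first page that yields no items."""
    results: list[str] = []
    for url in iter_page_urls(template, max_pages=max_pages):
        # fetch_items is a hypothetical callable that downloads and parses one page
        items = fetch_items(url)
        if not items:
            break  # assumption: an empty page marks the end of the listing
        results.extend(items)
    return results


if __name__ == "__main__":
    # Stand-in for real HTTP requests so the sketch runs offline
    fake_site = {
        "https://example.com/items?page=1": ["a", "b"],
        "https://example.com/items?page=2": ["c"],
    }
    print(collect_numbered_pages(
        "https://example.com/items?page={page}",
        lambda url: fake_site.get(url, []),
    ))
```

Passing the fetch step in as a callable keeps the paging logic separate from the HTTP and parsing code, and the `max_pages` cap prevents an endless crawl if a site never returns an empty page.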

Detecting Next-Page Links

Many sites have a 'Next' button or arrow link. Find the anchor tag with rel='next' or with text like 'Next', extract its href (resolving it against the current page URL if it is relative), and follow it in a loop until no next link remains.

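from urllib.parse import urljoin

import httpx
from bs4 import BeautifulSoup

def scrape_all_pages(start_url: str) -> list:
    results = []
    url = start_url
    seen = set()  # URLs already visited, to guard against pagination loops

    while url and url not in seen:
        seen.add(url)
        response = httpx.get(url, timeout=10.0)
        response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
        soup = BeautifulSoup(response.text, 'html.parser')

        # Collect data from this page
        for item in soup.select('.result-item'):
            results.append(item.text.strip())

        # Find the next page link: prefer rel="next", fall back to the exact text 'Next'
        next_link = soup.find('a', rel='next') or soup.find('a', string='Next')
        href = next_link.get('href') if next_link else None
        # hrefs are often relative, so resolve them against the current URL
        url = urljoin(url, href) if href else None

    return results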
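
The lookup with string='Next' only matches link text that is exactly 'Next'. A more tolerant detector for "text like 'Next'" might look like the sketch below. The helper name `find_next_url` and the `NEXT_TEXT` pattern are assumptions made for illustration; adjust the pattern to the wording the target site actually uses.

```python
import re
from typing import Optional
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumed pattern: link text that starts with "next", ignoring case and leading whitespace
NEXT_TEXT = re.compile(r'^\s*next\b', re.IGNORECASE)


def find_next_url(html: str, page_url: str) -> Optional[str]:
    """Return the absolute URL of the next page, or None if no next link is found."""
    soup = BeautifulSoup(html, 'html.parser')
    # Prefer the explicit rel="next" hint, then fall back to matching the link text
    link = soup.find('a', rel='next') or soup.find('a', string=NEXT_TEXT)
    if link is None or not link.get('href'):
        return None
    # Resolve relative hrefs such as "?page=3" against the current page URL
    return urljoin(page_url, link['href'])


if __name__ == "__main__":
    sample = '<div class="pager"><a href="?page=3">Next page</a></div>'
    print(find_next_url(sample, 'https://example.com/list?page=2'))
```

Because the helper takes HTML as a string, it can be checked against saved pages without making any network requests.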
