Handling Pagination and Dynamic Content
Following next-page links and using Playwright for JavaScript-rendered pages.
The Pagination Problem
Most real websites don't serve all their data on a single page. Results are split across multiple pages to reduce load. An agent must detect and follow pagination to collect complete datasets.
There are two common pagination patterns: next-page links and page number URL patterns.
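The page-number pattern can be sketched as follows. This is a minimal illustration, not a definitive implementation. It assumes the site exposes the page number as a query parameter named page and returns an empty page once the results run out. The names with_page, scrape_numbered_pages, and fetch_items are hypothetical; fetch_items stands in for whatever function downloads one URL and returns that page's items.

```python
from typing import Callable
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url: str, page: int, param: str = "page") -> str:
    """Return `url` with the query parameter `param` set to `page`.

    Assumption: each query parameter appears at most once (duplicates collapse).
    """
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query[param] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))


def scrape_numbered_pages(
    base_url: str,
    fetch_items: Callable[[str], list[str]],  # hypothetical: URL -> items on that page
    max_pages: int = 100,
) -> list[str]:
    """Request page 1, 2, 3, ... and stop at the first page with no items."""
    results: list[str] = []
    for page in range(1, max_pages + 1):  # max_pages caps the loop on odd sites
        items = fetch_items(with_page(base_url, page))
        if not items:
            break
        results.extend(items)
    return results
```

Passing the fetch step in as a callback keeps the paging logic separate from the HTTP and parsing code, so the loop can be exercised without touching the network.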
Detecting Next-Page Links
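Many sites have a 'Next' button or arrow link. Find the anchor tag with rel='next' or with text like 'Next', extract its href, and follow it in a loop until no such link exists. The href is often relative (for example '/results?page=2'), so resolve it against the current page's URL before requesting it.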
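from urllib.parse import urljoin

import httpx
from bs4 import BeautifulSoup


def scrape_all_pages(start_url: str) -> list[str]:
    results = []
    seen = set()  # URLs already visited, to guard against pagination loops
    url = start_url
    while url and url not in seen:
        seen.add(url)
        response = httpx.get(url, timeout=10.0)
        response.raise_for_status()  # fail on HTTP errors instead of parsing an error page
        soup = BeautifulSoup(response.text, 'html.parser')
        # Collect data from this page
        for item in soup.select('.result-item'):
            results.append(item.text.strip())
        # Find the next page link
        next_link = soup.find('a', rel='next') or soup.find('a', string='Next')
        href = next_link.get('href') if next_link else None
        # The href may be relative, so resolve it against the current URL
        url = urljoin(url, href) if href else None
    return results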
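Matching on string='Next' only succeeds when the anchor's text is exactly 'Next', so links such as 'Next »' or a 'Next' label wrapped in a span are missed. The sketch below shows one possible, more tolerant lookup. It is an illustration under assumptions, not a standard recipe: find_next_url and the NEXT_TEXT pattern are hypothetical names, and the text rule (link text starting with the word "next", case-insensitive) is a heuristic that will not suit every site.

```python
import re
from typing import Optional
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Heuristic (assumption): the link text starts with the word "next"
NEXT_TEXT = re.compile(r"^\s*next\b", re.IGNORECASE)


def find_next_url(html: str, current_url: str) -> Optional[str]:
    """Return the absolute URL of the next page, or None if no link is found."""
    soup = BeautifulSoup(html, "html.parser")
    # Prefer an explicit rel="next" on an <a> or <link> element
    link = soup.find(["a", "link"], rel="next", href=True)
    if link is None:
        # Fall back to visible text; get_text() also covers nested tags
        for anchor in soup.find_all("a", href=True):
            if NEXT_TEXT.match(anchor.get_text()):
                link = anchor
                break
    if link is None:
        return None
    # Resolve relative hrefs against the page they came from
    return urljoin(current_url, link["href"])
```

A scraping loop could call find_next_url(response.text, url) in place of the two soup.find calls and stop when it returns None.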