Pandas & NumPy Academy · Lesson

Reading from URLs and StringIO

Load remote CSV files directly from a URL and parse in-memory CSV strings using io.StringIO for testing.

Reading Data Without Downloading Files

You do not always have to save a file to disk before loading it into Pandas. pd.read_csv(url) accepts an HTTP or HTTPS URL directly, downloads the file, and parses it into a DataFrame in one step. This is ideal for public datasets hosted on GitHub, data.gov, or any web server, and it makes notebooks easier to reproduce: anyone can run them without pre-downloading assets, as long as the URL stays reachable.

import pandas as pd

url = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv'
df = pd.read_csv(url)
print(df.shape)   # (244, 7)
print(df.head(2))
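
For tests, or when no network is available, the same read_csv() call can parse an in-memory CSV string wrapped in io.StringIO, as the lesson summary mentions. The following is a minimal sketch; the sample rows are made up for illustration and are not taken from the real tips.csv file.

```python
import io
import pandas as pd

# Hypothetical sample data for illustration; not the real tips.csv contents.
csv_text = """total_bill,tip,day
10.50,2.00,Sun
23.10,3.50,Sat
"""

# StringIO wraps the string in a file-like object that read_csv can consume.
df_mem = pd.read_csv(io.StringIO(csv_text))
print(df_mem.shape)   # (2, 3)
print(df_mem.dtypes)
```

Because nothing is fetched, a snippet like this runs the same way on every machine, which is what makes it convenient for unit tests.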

How URL Reading Works Internally

When you pass a URL to read_csv(), Pandas downloads the raw bytes with Python's standard-library urllib (the requests library is not involved), then parses them exactly as if they came from a local file. Most read functions (read_excel, read_json, read_parquet) also accept URLs. With the default compression='infer', files whose names end in .gz or .bz2 (and other recognized extensions such as .zip or .xz) are decompressed automatically.

import pandas as pd

# Compressed CSV from a URL (placeholder address; substitute a real one)
url = 'https://example.com/data.csv.gz'
# compression='infer' is the default: the .gz suffix selects gzip
df = pd.read_csv(url, compression='infer')
print(df.shape)
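
To make the "download the bytes, then parse them" description concrete, the sketch below performs the parsing step by hand. It is an illustration under stated assumptions, not Pandas' actual implementation: read_csv_from_bytes is a hypothetical helper, and the gzip payload is built in memory as a stand-in for downloaded bytes so that the example runs offline.

```python
import gzip
import io
import pandas as pd

def read_csv_from_bytes(raw: bytes, compression=None) -> pd.DataFrame:
    """Hypothetical helper: parse already-downloaded bytes as CSV."""
    # BytesIO makes the bytes look like an open binary file.
    return pd.read_csv(io.BytesIO(raw), compression=compression)

# Stand-in for bytes fetched over HTTP,
# e.g. urllib.request.urlopen(url).read() in a real script.
raw = gzip.compress(b"a,b\n1,2\n3,4\n")

# A file-like object has no filename, so the compression is named explicitly.
df_bytes = read_csv_from_bytes(raw, compression="gzip")
print(df_bytes.shape)   # (2, 2)
```

One design point worth noting: extension-based inference only works when Pandas can see a path or URL, so with an in-memory buffer the compression format is passed explicitly.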
