Storing Collected Data Efficiently
Saving to CSV/JSON/Parquet, appending safely, deduplication, incremental collection.
From Raw Responses to Stored Data
After collecting data, you need to store it so it can be reloaded, shared, and analyzed. Choosing the right format and following a few habits makes a big difference in speed and disk usage.
This lesson covers saving to CSV, JSON, and Parquet, appending batches safely, deduplication, and incremental collection.
JSON to DataFrame
API responses usually parse to a list of dicts. Passing that list to pd.DataFrame builds a table directly: each dict becomes a row and each key becomes a column, ready for storage.
import pandas as pd
records = [
    {"id": 1, "name": "A", "score": 9.5},
    {"id": 2, "name": "B", "score": 8.0},
]
df = pd.DataFrame(records)
print(df)
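Once the records are in a DataFrame, appending batches and deduplicating can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the helper names append_batch and load_deduplicated, the file name records.csv, and the second batch of records are all hypothetical, and it assumes the "id" column uniquely identifies a record and that the most recently appended row should win.

```python
import os
import tempfile

import pandas as pd


def append_batch(df: pd.DataFrame, path: str) -> None:
    """Append df to a CSV file, writing the header only when the file is new."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    df.to_csv(path, mode="a", header=is_new, index=False)


def load_deduplicated(path: str, key: str = "id") -> pd.DataFrame:
    """Reload the CSV and keep only the most recently appended row per key."""
    df = pd.read_csv(path)
    return df.drop_duplicates(subset=key, keep="last").reset_index(drop=True)


# Hypothetical batches: the second one repeats id 2 with an updated score.
batch_1 = pd.DataFrame([
    {"id": 1, "name": "A", "score": 9.5},
    {"id": 2, "name": "B", "score": 8.0},
])
batch_2 = pd.DataFrame([
    {"id": 2, "name": "B", "score": 8.5},
    {"id": 3, "name": "C", "score": 7.0},
])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.csv")  # illustrative file name
    append_batch(batch_1, path)
    append_batch(batch_2, path)
    stored = load_deduplicated(path)

print(stored)
```

Writing the header only on the first append keeps the file readable as a single table, and deduplicating at load time means a batch that overlaps an earlier one does not produce repeated rows.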