MongoDB Academy · Lesson

Partitioning S3 Data for Query Performance

Learners will define partition attributes on S3 paths so Data Federation can prune irrelevant files and deliver fast analytical queries.

Why Partitioning S3 Data Matters

In Atlas Data Federation, a query against an S3-backed virtual collection can read many files. Without partitioning, even a query for a single day's data might scan an entire year of files. Partitioning organises S3 objects into a directory structure that encodes queryable metadata in the path, which allows the query engine to skip irrelevant files. This technique is called partition pruning.

Partition Pruning: The Core Mechanism

Partition pruning works because Atlas Data Federation parses each S3 object key (path) and extracts the values of the partition attributes defined in the storage configuration. When a query filters on one of these attributes, the query engine reads only the objects whose path values match, and skips all others without issuing S3 GetObject requests for them.
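To make "defined in the storage configuration" more concrete, here is a rough sketch of such a configuration written as a JavaScript object. The store name `s3store`, bucket `my-bucket`, region `us-east-1`, and database name `analytics` are hypothetical placeholders chosen for illustration; only the path template is taken from this lesson.

```javascript
// Sketch of a Data Federation storage configuration.
// Assumption: s3store, my-bucket, us-east-1 and analytics are placeholder names.
const storageConfig = {
  stores: [
    {
      name: "s3store",
      provider: "s3",
      bucket: "my-bucket",
      region: "us-east-1",
      delimiter: "/"
    }
  ],
  databases: [
    {
      name: "analytics",
      collections: [
        {
          name: "events",
          dataSources: [
            {
              storeName: "s3store",
              // Each {name type} segment declares a partition attribute.
              path: "/events/{year int}/{month int}/{day int}/data.parquet"
            }
          ]
        }
      ]
    }
  ]
};
```

The part that matters for pruning is the `path` value: each `{name type}` segment becomes a field that queries can filter on.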

// Path template with partition attributes
// /events/{year int}/{month int}/{day int}/data.parquet

// Query: fetch March 15, 2025 data
db.events.find({ year: 2025, month: 3, day: 15 })

// Data Federation issues S3 ListObjects only for:
// /events/2025/3/15/
// All other years/months/days are skipped
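The pruning idea can be imitated in a few lines of plain JavaScript. This is an in-memory simulation under simplifying assumptions, not how Data Federation is actually implemented: the helper names (`compileTemplate`, `parseKey`, `pruneKeys`) are hypothetical, only `int` attributes are handled, and only equality filters are considered.

```javascript
// Turn a template such as "/events/{year int}/{month int}/{day int}/data.parquet"
// into a regular expression plus the ordered list of attribute names.
function compileTemplate(template) {
  const names = [];
  const pattern = template
    .split(/(\{\w+ int\})/) // keep each {name int} token as its own part
    .map((part) => {
      const m = /^\{(\w+) int\}$/.exec(part);
      if (m) {
        names.push(m[1]);
        return "(\\d+)";
      }
      // Escape literal path text so it matches exactly.
      return part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    })
    .join("");
  return { regex: new RegExp("^" + pattern + "$"), names };
}

// Extract partition values from one object key, or return null if it does not match.
function parseKey(compiled, key) {
  const m = compiled.regex.exec(key);
  if (!m) return null;
  const values = {};
  compiled.names.forEach((name, i) => {
    values[name] = Number(m[i + 1]);
  });
  return values;
}

// Keep only the keys whose partition values equal the filter values.
// Filter fields that are not partition attributes are ignored in this sketch.
function pruneKeys(template, keys, filter) {
  const compiled = compileTemplate(template);
  return keys.filter((key) => {
    const values = parseKey(compiled, key);
    if (values === null) return false;
    return compiled.names.every(
      (name) => !(name in filter) || filter[name] === values[name]
    );
  });
}

const template = "/events/{year int}/{month int}/{day int}/data.parquet";
const keys = [
  "/events/2024/12/31/data.parquet",
  "/events/2025/3/14/data.parquet",
  "/events/2025/3/15/data.parquet",
];

const selected = pruneKeys(template, keys, { year: 2025, month: 3, day: 15 });
console.log(selected); // only the 2025/3/15 key remains
```

A filter that names fewer attributes, such as `{ year: 2025 }`, keeps every key under that year, which mirrors how a coarser filter prunes less.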
