AWS Solutions Architect · Lesson

Building a Data Lake on S3

Design an S3-based data lake with landing, processing, and curated zones, apply bucket policies, and organise data by partition for query efficiency.

What Is a Data Lake?

A data lake is a centralised repository that stores structured, semi-structured, and unstructured data at any scale. Unlike a data warehouse, a data lake stores data in its raw, native format until it is needed for analysis. Amazon S3 is the most common foundation for data lakes on AWS because of its durability, scalability, and integration with analytics services.

Data Lake Zones Architecture

A well-designed S3 data lake typically uses three logical zones: the Landing Zone (raw ingest, stored untouched), the Processing Zone (cleansed and transformed), and the Curated Zone (analytics-ready, business-consumable). Each zone is usually a separate S3 prefix or a separate bucket. This pattern is sometimes called a medallion architecture, with landing, processing, and curated corresponding to bronze, silver, and gold.

# Example zone structure inside one S3 bucket
# s3://my-data-lake/
#   landing/    <- raw ingest from source systems
#   processing/ <- cleansed, validated data
#   curated/    <- aggregated, analytics-ready
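
The zone layout above pairs naturally with the partitioning mentioned in the lesson objective. As a minimal sketch, assuming Hive-style `key=value` partition folders (a convention that Athena and AWS Glue recognise) and a date-based year/month/day layout, object keys could be built as below. The `build_key` helper, the `orders` dataset name, and the file name are hypothetical and only illustrate the idea.

```python
from datetime import date

# The three zone prefixes from the structure above.
ZONES = ("landing", "processing", "curated")


def build_key(zone: str, dataset: str, day: date, filename: str) -> str:
    """Return an S3 object key under a zone, partitioned by year/month/day.

    Hypothetical helper: the partition scheme is an assumption, not
    something prescribed by the lesson.
    """
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    parts = [
        zone,
        dataset,
        f"year={day:%Y}",   # Hive-style partition folders
        f"month={day:%m}",
        f"day={day:%d}",
        filename,
    ]
    return "/".join(parts)


key = build_key("curated", "orders", date(2024, 1, 15), "part-0000.parquet")
print(f"s3://my-data-lake/{key}")
```

With a layout like this, a query engine that filters on `year`, `month`, or `day` only needs to read objects under the matching prefixes rather than scanning the whole dataset, which is where the query efficiency comes from.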
