Amazon Athena: Serverless SQL on S3
Query S3 data directly with standard SQL in Athena, optimise with columnar formats like Parquet and ORC, and partition for cost control.
What Is Amazon Athena?
Amazon Athena is a serverless, interactive query service that lets you run standard SQL directly against data stored in Amazon S3. There are no servers to provision and no clusters to manage. You pay only for the data each query scans: approximately US$5 per TB, rounded up to the nearest megabyte, with a 10 MB minimum per query. Athena's SQL engine is built on Presto (engine version 2) and Trino (engine version 3), and it reads table metadata from the Glue Data Catalogue.
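To see how per-scan pricing adds up, here is a minimal shell sketch that estimates the charge for one query from the number of bytes it scanned (for example, the DataScannedInBytes figure that get-query-execution reports). The function name athena_query_cost is hypothetical. The sketch assumes the US$5-per-TB rate and decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes). Check the Athena pricing page for the actual rate in your region.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate the Athena charge for a single query.
# Assumptions: US$5 per TB, 1 TB = 10^12 bytes, 1 MB = 10^6 bytes,
# usage rounded up to the next whole MB with a 10 MB minimum per query.
athena_query_cost() {
  local bytes=$1
  local mb=1000000
  local billed
  # Round the scanned bytes up to the next whole megabyte.
  billed=$(( (bytes + mb - 1) / mb * mb ))
  # Apply the 10 MB per-query minimum.
  if [ "$billed" -lt $(( 10 * mb )) ]; then
    billed=$(( 10 * mb ))
  fi
  # Shell arithmetic is integer-only, so awk does the dollar calculation.
  awk -v b="$billed" 'BEGIN { printf "%.6f\n", b / 1e12 * 5 }'
}

athena_query_cost 1024            # tiny query billed as 10 MB: prints 0.000050
athena_query_cost 250000000000    # 250 GB scanned: prints 1.250000
```

The 10 MB minimum means many very small queries cost more than their raw scan volume suggests. Large scans dominate the bill, which is why columnar formats and partitioning matter.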
Setting Up Athena: Workgroups and Output Location
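Before running queries, configure an Athena workgroup and specify an S3 path where query results are written. Workgroups let you separate query history and cost tracking between teams, enforce encryption on results, and set a per-query data-scan limit that cancels any query exceeding it, which prevents runaway costs. Setting EnforceWorkGroupConfiguration to true makes the workgroup's settings override any client-side result location or encryption settings. Athena writes the results of each SELECT query as a CSV file under the configured S3 output location.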
# Create a workgroup whose query results are encrypted with SSE-S3 and whose
# queries are cancelled if they scan more than 10 GiB (10737418240 bytes)
aws athena create-work-group \
  --name analytics-team \
  --configuration '{
    "ResultConfiguration": {
      "OutputLocation": "s3://athena-results-123/analytics-team/",
      "EncryptionConfiguration": {"EncryptionOption": "SSE_S3"}
    },
    "EnforceWorkGroupConfiguration": true,
    "PublishCloudWatchMetricsEnabled": true,
    "BytesScannedCutoffPerQuery": 10737418240
  }'
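Once the workgroup exists, queries submitted to it automatically use its output location, encryption and scan cutoff. The shell sketch below submits a query with the AWS CLI, polls until the query finishes, and prints the S3 URI of the result file. The function name run_athena_query and its database argument are hypothetical placeholders. The sketch assumes the AWS CLI is configured with credentials that are allowed to use the analytics-team workgroup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one query in the analytics-team workgroup and
# print the S3 URI of its result file. Arguments: SQL text, database name.
run_athena_query() {
  local sql=$1 database=$2 qid state
  # Submit the query. The workgroup supplies the output location, encryption
  # and per-query scan cutoff (EnforceWorkGroupConfiguration is true).
  qid=$(aws athena start-query-execution \
    --work-group analytics-team \
    --query-execution-context Database="$database" \
    --query-string "$sql" \
    --query QueryExecutionId --output text) || return 1

  # Poll until the query leaves the QUEUED/RUNNING states.
  while :; do
    state=$(aws athena get-query-execution --query-execution-id "$qid" \
      --query QueryExecution.Status.State --output text) || return 1
    case $state in
      QUEUED|RUNNING) sleep 2 ;;
      *) break ;;
    esac
  done

  # FAILED or CANCELLED (e.g. the scan cutoff was hit) is reported as an error.
  if [ "$state" != "SUCCEEDED" ]; then
    echo "query $qid ended in state $state" >&2
    return 1
  fi

  # Print where Athena wrote the result file.
  aws athena get-query-execution --query-execution-id "$qid" \
    --query QueryExecution.ResultConfiguration.OutputLocation --output text
}
```

For example, calling run_athena_query "SELECT COUNT(*) FROM events" my_database (placeholder table and database names) prints the result location on success. A query that exceeds the 10 GiB cutoff is cancelled by Athena, so the function reports its state on stderr and returns a non-zero status.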