Kinesis Streams, Firehose, and Real-Time Analytics
Ingest streaming data with Kinesis Data Streams, deliver it to S3 or Redshift with Firehose, and analyse it in real time with Managed Service for Apache Flink.
The Kinesis Family Overview
Amazon Kinesis is a family of services for collecting, processing, and analysing real-time streaming data. The three core services are Kinesis Data Streams (low-latency ingestion for custom processing), Kinesis Data Firehose (now named Amazon Data Firehose; fully managed delivery to S3, Redshift, OpenSearch, and other destinations), and Managed Service for Apache Flink (formerly Kinesis Data Analytics) for real-time processing with Flink SQL or the Flink APIs. Each service covers a different stage of the pipeline: Data Streams ingests, Firehose delivers, and Flink analyses.
Kinesis Data Streams Architecture
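A Kinesis Data Stream is a durable, append-only log partitioned into shards. Each record's partition key is hashed (MD5) to select its shard, and records are ordered by sequence number within a shard, not across the whole stream. Each shard supports writes of up to 1 MB/s or 1,000 records/s, and reads of up to 2 MB/s. That read capacity is shared by all standard consumers of the shard, while enhanced fan-out consumers each receive a dedicated 2 MB/s per shard. Records are retained for 24 hours by default; retention can be extended up to 7 days, or up to 365 days with long-term retention. Producers write records to a stream; consumers (Lambda, KCL applications, Firehose, or Flink) read from one or more shards in parallel. Records are immutable once written: they cannot be updated or deleted individually and simply expire when the retention period ends.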
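The per-shard limits above turn capacity planning into arithmetic: a provisioned stream needs enough shards to satisfy whichever of the write-bytes, write-records, or shared-read limits binds first. The bash sketch below assumes 1 MB = 1024 KB, that every consumer uses shared (not enhanced fan-out) reads, and that traffic spreads evenly across partition keys; `ceil_div` and `shards_needed` are hypothetical helper names, not part of the AWS CLI.

```shell
#!/usr/bin/env bash
# Sketch: estimate the minimum shard count for a provisioned Kinesis Data Stream.
# Assumptions: 1 MB = 1024 KB, all consumers use shared reads (no enhanced
# fan-out), and load is spread evenly over shards.

# ceil_div A B: print ceil(A / B) for non-negative integer A and positive B
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# shards_needed WRITE_KB_S RECORDS_S READ_KB_S
#   WRITE_KB_S : total producer throughput in KB/s
#   RECORDS_S  : total records written per second
#   READ_KB_S  : combined throughput of all shared consumers in KB/s
shards_needed() {
  local by_write by_records by_read max
  by_write=$(ceil_div "$1" 1024)    # 1 MB/s of writes per shard
  by_records=$(ceil_div "$2" 1000)  # 1,000 records/s of writes per shard
  by_read=$(ceil_div "$3" 2048)     # 2 MB/s of shared reads per shard
  max=$by_write
  if (( by_records > max )); then max=$by_records; fi
  if (( by_read > max )); then max=$by_read; fi
  if (( max < 1 )); then max=1; fi  # a stream always has at least one shard
  echo "$max"
}

shards_needed 1500 4000 4500   # prints 4 (the record-rate limit binds)
```

Here 1,500 KB/s of writes alone would fit in 2 shards and three consumers reading 1,500 KB/s each would fit in 3, but 4,000 records/s needs 4. In practice it is worth adding headroom, because a single hot partition key always maps to one shard and cannot use more than that shard's capacity.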
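# Create a Kinesis Data Stream with 4 shards (4 MB/s of writes, 8 MB/s of reads in total)
aws kinesis create-stream \
--stream-name clickstream \
--shard-count 4
# Put a record into the stream. The partition key selects the shard; with
# AWS CLI v2, --data must be base64 (this value decodes to {"event": "click"})
aws kinesis put-record \
--stream-name clickstream \
--partition-key 'user-123' \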
--data 'eyJldmVudCI6ICJjbGljayJ9'
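Because AWS CLI v2 treats blob parameters such as `--data` as base64 by default, the JSON payload in the `put-record` command has to be encoded before it is sent. A minimal bash sketch, assuming a GNU or BSD `base64` on the PATH; `encode_payload` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: base64-encode a JSON payload for `aws kinesis put-record --data`
# under AWS CLI v2's default binary format.

# encode_payload TEXT: print TEXT as a single line of base64
encode_payload() {
  # printf '%s' avoids the trailing newline that echo would add to the payload;
  # tr removes the line wrapping some base64 implementations add to long output.
  printf '%s' "$1" | base64 | tr -d '\n'
}

payload=$(encode_payload '{"event": "click"}')
echo "$payload"   # prints eyJldmVudCI6ICJjbGljayJ9
```

The result can then be passed as `--data "$payload"`. Alternatively, adding `--cli-binary-format raw-in-base64-out` lets AWS CLI v2 accept the raw JSON directly. With AWS CLI v1, which passes blob arguments as raw text, sending a pre-encoded value would store the base64 string itself in the record rather than the JSON.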