Multi-AZ Patterns for Stateful Services
Apply Multi-AZ to RDS, ElastiCache, EFS, and ELB to eliminate single points of failure within a Region.
Why Stateful Services Need Multi-AZ
Stateful services — databases, caches, file systems — are the hardest components to make highly available because they hold data that must survive failures. If a single-AZ database fails, your entire application loses its data store. AWS's answer is Multi-AZ deployments, where the service maintains a synchronous or near-synchronous replica in a second Availability Zone that can take over rapidly when the primary fails.
RDS Multi-AZ: Synchronous Standby
RDS Multi-AZ maintains a synchronous standby replica in a different AZ. Every write is replicated to the standby before RDS acknowledges it, so no committed data is lost on failover (RPO=0), at the cost of slightly higher write latency. When the primary fails, RDS automatically repoints the instance's DNS endpoint to the standby, typically within 60-120 seconds. The endpoint name does not change, so your application only needs to reconnect to it: no code changes are required, provided the client retries failed connections and does not cache the old DNS answer indefinitely.
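The reconnect behaviour described above can be sketched as a small retry loop. This is a minimal illustration, not an AWS-provided tool: `retry_until_ok` is a hypothetical helper, `DB_ENDPOINT` is an assumed environment variable holding the instance endpoint, and the commented-out drill assumes a PostgreSQL engine (hence `pg_isready`). The attempt count and delay are illustrative values chosen to outlast a 60-120 second failover.

```shell
#!/usr/bin/env bash
# Sketch: retry a command until it succeeds, to ride out a failover window.
# retry_until_ok is a hypothetical helper, not part of the AWS CLI.
# Usage: retry_until_ok MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until_ok() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  while true; do
    # Run the probe command; stop as soon as it succeeds.
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Failover drill (commented out because it acts on a real instance;
# the identifier and DB_ENDPOINT are assumptions):
# aws rds reboot-db-instance --db-instance-identifier mydb --force-failover
# retry_until_ok 30 5 pg_isready -h "$DB_ENDPOINT" -p 5432
```

With 30 attempts spaced 5 seconds apart, the loop keeps probing the same endpoint name for up to about 150 seconds, which is the point of the pattern: the client never needs to learn a new address, only to keep retrying the old one.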
# Enable Multi-AZ on existing RDS instance
aws rds modify-db-instance \
  --db-instance-identifier mydb \
  --multi-az \
  --apply-immediately
# RDS endpoint stays the same after failover
# Application reconnects to same DNS name