RTO, RPO, and DR Tiers
Define Recovery Time Objective and Recovery Point Objective, map them to cost tiers, and understand what SLA commitments each DR strategy supports.
Understanding RTO and RPO
Recovery Time Objective (RTO) is the maximum acceptable time between the onset of a disaster and the restoration of service. If your RTO is 4 hours, your business can tolerate 4 hours of downtime. Recovery Point Objective (RPO) is the maximum acceptable amount of data loss, measured in time: if your RPO is 1 hour, you must be able to recover to a point no more than 1 hour before the disaster, which in turn means backing up or replicating data at least hourly. Both metrics are defined by business requirements, not technical preferences.
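To make the two definitions concrete, here is a minimal Python sketch that checks a single incident against an RTO and an RPO. The function name `meets_objectives` and the incident timestamps are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta


def meets_objectives(last_backup, disaster, restored, rto, rpo):
    """Return (rto_met, rpo_met) for a single incident."""
    downtime = restored - disaster       # how long the system was unavailable
    data_loss = disaster - last_backup   # window of writes that cannot be recovered
    return downtime <= rto, data_loss <= rpo


# Hypothetical incident: last backup 45 minutes before the disaster,
# service restored 5 hours after it.
disaster = datetime(2024, 1, 1, 12, 0)
last_backup = disaster - timedelta(minutes=45)
restored = disaster + timedelta(hours=5)

rto_met, rpo_met = meets_objectives(
    last_backup, disaster, restored,
    rto=timedelta(hours=4), rpo=timedelta(hours=1),
)
print(rto_met, rpo_met)  # False True
```

In this example the RPO is met (45 minutes of data lost against a 1 hour limit) but the RTO is missed (5 hours of downtime against a 4 hour limit), which shows that the two objectives are evaluated independently.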
# RTO and RPO definitions:
# RTO = max time system can be DOWN
# Example: RTO=4h means restore within 4 hours
#
# RPO = max data LOSS acceptable
# Example: RPO=1h means no more than 1 hour of data lost
#
# Lower RTO and RPO = more expensive DR strategy
# Higher RTO and RPO = cheaper but more business impact
The Four DR Tiers
AWS defines four primary Disaster Recovery strategies, ordered from lowest cost and highest RTO to highest cost and lowest RTO:
1) Backup and Restore: cheapest, hours of RTO.
2) Pilot Light: minimal core always running, minutes to hours RTO.
3) Warm Standby: scaled-down but functional, minutes RTO.
4) Multi-Site Active-Active: most expensive, near-zero RTO.
Your choice depends on the business cost of downtime versus the cost of the DR infrastructure.
# DR Strategy comparison:
# Strategy         | RTO              | RPO     | Cost
# Backup & Restore | Hours            | Hours   | Lowest
# Pilot Light      | Minutes to hours | Minutes | Low
# Warm Standby     | Minutes          | Seconds | Medium
# Active-Active    | ~0               | ~0      | Highest
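The comparison above can be turned into a simple selection rule: walk the tiers from cheapest to most expensive and pick the first one whose typical RTO and RPO fit the business requirement. The sketch below is one way to express that in Python. The numeric bounds in `TIERS` are illustrative assumptions (the table only gives orders of magnitude), and `cheapest_tier` is a hypothetical helper, not an AWS API.

```python
from datetime import timedelta

# (name, assumed typical RTO, assumed typical RPO), ordered cheapest first.
# These numbers are placeholders for illustration, not AWS-published figures.
# Active-Active is modeled as exactly zero to stand in for "near zero".
TIERS = [
    ("Backup & Restore", timedelta(hours=24), timedelta(hours=24)),
    ("Pilot Light", timedelta(hours=1), timedelta(minutes=15)),
    ("Warm Standby", timedelta(minutes=10), timedelta(seconds=30)),
    ("Active-Active", timedelta(0), timedelta(0)),
]


def cheapest_tier(required_rto, required_rpo):
    """Return the first (cheapest) tier whose assumed RTO and RPO both fit."""
    for name, rto, rpo in TIERS:
        if rto <= required_rto and rpo <= required_rpo:
            return name
    return None  # only reachable for negative (invalid) requirements


print(cheapest_tier(timedelta(hours=4), timedelta(hours=1)))  # Pilot Light
```

With the assumed bounds, a requirement of RTO=4h and RPO=1h rules out Backup & Restore but is satisfied by Pilot Light, so there is no need to pay for a warmer tier. In a real assessment, replace the placeholder bounds with recovery times measured from your own DR tests.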