HA vs Fault Tolerance: Definitions and Trade-offs
Clarify the distinction between high availability (minimising downtime) and fault tolerance (zero-downtime through redundancy), and see how cost scales with each level.
HA vs Fault Tolerance Overview
High Availability (HA) and Fault Tolerance (FT) are two distinct reliability goals that architects often confuse. High availability means a system keeps downtime to a minimum: it survives component failures, but users may see a brief interruption while it recovers, for example during failover to a standby. Fault tolerance means a system continues operating without any interruption even when components fail, because fully redundant paths are already in place and take over instantly. That extra redundancy is why fault tolerance generally costs more than high availability.
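The difference can be illustrated with a toy model. The sketch below is only an illustration under simple assumptions: a single failure occurs during a one-hour window, and the only thing that differs between the two designs is how long the replacement needs before it serves traffic. The function name `downtime_seconds` and the 60-second failover figure are hypothetical and do not describe any particular AWS service.

```python
def downtime_seconds(window_seconds, failure_at, takeover_seconds):
    """Count the seconds in the window during which no component serves traffic.

    Toy model (assumption): the primary fails once at `failure_at`, and a
    replacement needs `takeover_seconds` before it starts serving requests.
    """
    downtime = 0
    for second in range(window_seconds):
        recovering = failure_at <= second < failure_at + takeover_seconds
        if recovering:
            downtime += 1
    return downtime


# Highly available design: a standby is promoted after a short failover
# (60 seconds is an illustrative number, not a measured value).
ha_downtime = downtime_seconds(window_seconds=3600, failure_at=600, takeover_seconds=60)

# Fault-tolerant design: a redundant path is already serving, so takeover is instant.
ft_downtime = downtime_seconds(window_seconds=3600, failure_at=600, takeover_seconds=0)

print(f"HA downtime: {ha_downtime} s, FT downtime: {ft_downtime} s")
```

In this model the highly available design loses the failover window and the fault-tolerant design loses nothing, which is the distinction the definitions draw.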
Defining Availability Percentages
Availability is measured as the percentage of time a system is up, usually over a year. 99.9% availability (three nines) allows about 8.76 hours of downtime per year, while 99.99% (four nines) allows only about 52.6 minutes and 99.999% (five nines) allows just 5.26 minutes. Each additional nine cuts the permitted downtime by a factor of ten and typically requires more redundancy, more automation, and more cost. The SAA-C03 exam often asks you to identify which architecture meets a given availability target.
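The figures above can be reproduced with a few lines of Python. This is a minimal sketch that assumes a 365-day year (8760 hours); the function name `downtime_hours_per_year` is chosen here for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent):
    """Return the downtime per year, in hours, permitted by an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR


# Print the downtime budget for three, four, and five nines.
for target in (99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(target)
    print(f"{target}% -> {hours:.2f} hours ({hours * 60:.2f} minutes) per year")
```

Passing any other target, such as 99.95, gives the downtime budget for that target in the same units.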
# Availability calculations
# 99.9% → 8.76 hours/year downtime
# 99.99% → 52.6 minutes/year downtime
# 99.999% → 5.26 minutes/year downtime
# Formula: downtime = (1 - availability) * 8760 hours