An Availability Zone (AZ) is a physically separate data center (or cluster of data centers) within a cloud region, connected to the other AZs in the same region by low-latency, high-bandwidth private fiber links. Each AZ has independent power, cooling, and networking infrastructure, so a failure in one zone (a power outage, fire, or flood) is designed not to affect the others. Deploying across multiple AZs is the foundational pattern for building highly available systems: you run application instances in at least two AZs and place them behind a load balancer so that traffic is routed away from a failed zone automatically. AWS regions typically have three to six AZs. Azure also calls them Availability Zones, GCP calls them zones within a region, and OCI calls them Availability Domains.
When would you use multiple Availability Zones? Always, for any production workload that needs to remain available during infrastructure failures. In practice, on AWS, this means deploying your EC2 Auto Scaling groups or ECS tasks across at least two AZs, enabling Multi-AZ on your RDS databases (which keeps a standby replica in a different AZ), distributing subnets across AZs in your VPC, and using services such as Elastic Load Balancing or Route 53 that route around unhealthy AZs. The cost trade-off is worth it: inter-AZ data transfer carries a small fee, but the alternative is accepting downtime when (not if) a single AZ has an incident.
Common mistakes: deploying all EC2 instances or containers in a single AZ to save on data transfer costs (this creates a single point of failure); confusing Availability Zones with Regions (AZs sit within a region, and multi-region is a separate, higher-level resilience pattern); and not testing failover behavior (run chaos engineering experiments to verify that your application actually handles an AZ failure gracefully).
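The placement rule described above (spread instances over at least two AZs, and treat a single-AZ deployment as a single point of failure) can be sketched in a few lines of Python. This is a minimal illustration only: the function names `spread_across_zones` and `single_zone_risk` and the instance IDs are hypothetical, and no cloud API is called. In a real deployment, a service such as an Auto Scaling group would do the balancing for you.

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(instance_ids, zones):
    """Assign instances to zones round-robin (hypothetical helper)."""
    if len(zones) < 2:
        # One zone cannot survive a single-zone failure.
        raise ValueError("need at least two zones for a multi-AZ deployment")
    return {iid: zone for iid, zone in zip(instance_ids, cycle(zones))}


def single_zone_risk(placement):
    """Return True if all instances share one zone (single point of failure)."""
    return len(set(placement.values())) < 2


# Illustrative instance names; the zone names follow the AWS naming pattern.
placement = spread_across_zones(
    ["app-1", "app-2", "app-3", "app-4"],
    ["us-east-1a", "us-east-1b", "us-east-1c"],
)
per_zone = Counter(placement.values())

print(placement)
print(per_zone)
print("single-zone risk:", single_zone_risk(placement))
```

A check like `single_zone_risk` could, for example, run in a deployment pipeline to flag the first common mistake listed above before it reaches production.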
Example: a retail application deploys EC2 instances in us-east-1a, us-east-1b, and us-east-1c. When us-east-1b experiences a networking issue, the Application Load Balancer stops sending requests to the instances that fail health checks and routes all traffic to the healthy instances in the other two zones, with zero manual intervention. This is what lets the application hold to an uptime target such as 99.99%.
Architecture use case: a healthcare SaaS platform uses Multi-AZ RDS for the database (automatic failover to the standby replica, which AWS documents as typically taking one to two minutes for a Multi-AZ DB instance deployment), an Auto Scaling group that spreads EC2 application servers across three AZs, and ElastiCache with Multi-AZ replication. Together these support the 99.99% SLA required by the platform's compliance framework.
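The routing behavior in the retail example, and the downtime budget that a 99.99% target implies, can be sketched as follows. This is a simplified model under stated assumptions: a real load balancer decides health per target through health checks rather than a per-zone flag, and the names `healthy_targets`, `route`, and `allowed_downtime_minutes` as well as the instance IDs are hypothetical.

```python
from itertools import cycle


def healthy_targets(targets, unhealthy_zones):
    """Keep only targets whose zone is not marked unhealthy."""
    return [t for t in targets if t["zone"] not in unhealthy_zones]


def route(requests, targets, unhealthy_zones=frozenset()):
    """Round-robin requests over healthy targets; maps request -> instance id."""
    pool = healthy_targets(targets, unhealthy_zones)
    if not pool:
        raise RuntimeError("no healthy targets in any zone")
    return {req: t["id"] for req, t in zip(requests, cycle(pool))}


def allowed_downtime_minutes(availability, days=365):
    """Downtime budget, in minutes, implied by an availability target."""
    return (1 - availability) * days * 24 * 60


# Illustrative targets, one per zone from the example above.
targets = [
    {"id": "i-a", "zone": "us-east-1a"},
    {"id": "i-b", "zone": "us-east-1b"},
    {"id": "i-c", "zone": "us-east-1c"},
]

# Simulate the us-east-1b incident: no request should reach "i-b".
assignments = route(range(4), targets, unhealthy_zones={"us-east-1b"})
print(assignments)

# A 99.99% target leaves roughly 52.6 minutes of downtime per 365-day year.
print(round(allowed_downtime_minutes(0.9999), 2))
```

The downtime arithmetic shows why automatic failover matters: with a budget of under an hour per year, there is little room for an outage that waits on a manual response.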
The terms used by the four major providers all refer to the same idea: physically separate locations within a larger geographic area (a region), used to design for high availability. AWS and Azure call them Availability Zones, Google Cloud calls them zones within a region, and OCI calls them Availability Domains (OCI also offers Fault Domains for additional separation within an Availability Domain).