Disaster Recovery
Definition
The plans, processes, and tooling used to restore technology systems and data after catastrophic events such as fires, floods, or cyberattacks, so that the business can resume operations within agreed recovery targets.
Use Cases
- Netflix: Maintains streaming availability during regional outages by failing over services to other AWS regions. The company built a multi-region architecture on AWS with automated failover and validates its recovery procedures through regular resilience testing such as chaos engineering. Result: a single-region failure is less likely to cause a prolonged outage for customers.
- GitHub: Restored core developer services after a major database incident that affected availability. The team used backups, replication, and a coordinated incident response process to restore database consistency and bring services back online. Result: services were restored, and the incident drove improvements to backup and restore procedures, replication strategy, and operational runbooks.
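The regional failover idea in the use cases above can be illustrated with a minimal Python sketch. It is not how Netflix or GitHub implement failover; the function name pick_active_region, the region list, and the stubbed health check are all hypothetical and exist only to show the basic pattern of routing to the first healthy region in priority order.

```python
from typing import Callable, Optional, Sequence


def pick_active_region(
    regions: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first healthy region in priority order, or None if all are down."""
    for region in regions:
        if is_healthy(region):
            return region
    return None


# Usage: simulate a regional outage with a stubbed health check (assumption:
# a real system would probe endpoints or read monitoring data instead).
outage = {"us-east-1"}
active = pick_active_region(
    ["us-east-1", "us-west-2", "eu-west-1"],
    lambda region: region not in outage,
)
print(active)  # us-west-2
```

In practice the health check is the hard part: it has to distinguish a real regional failure from a transient blip, which is one reason regular failover testing matters.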
Provider Equivalents
- AWS: AWS Elastic Disaster Recovery
- Azure: Azure Site Recovery
- GCP: Backup and DR Service
- OCI: OCI Full Stack Disaster Recovery
Frequently Asked Questions
- What's the difference between Disaster Recovery and High Availability?
- High Availability (HA) is designed to keep an application running during common failures (like a server or zone outage) with minimal interruption. Disaster Recovery (DR) is the plan and tooling to restore systems after a major event (like a region-wide outage, ransomware, or a destroyed data center). HA reduces downtime day-to-day; DR is your safety net for worst-case scenarios.
- When should I use Disaster Recovery?
- Use DR when downtime or data loss would seriously harm your business (lost revenue, safety risk, legal/regulatory impact, or reputational damage). It’s especially important for customer-facing apps, payment systems, healthcare/financial workloads, and any system with strict recovery targets. A common starting point is to define RTO (how fast you must recover) and RPO (how much data you can afford to lose), then choose a DR approach that meets them.
- How much does Disaster Recovery cost?
- Cost depends on your recovery targets and architecture. Main factors include: duplicate infrastructure (hot/warm standby vs. cold), data replication and storage (snapshots, backup retention, cross-region copies), network egress and inter-region transfer, licensing (OS/database), and testing/operations time. Faster recovery (lower RTO/RPO) usually costs more because it requires more always-on capacity and continuous replication.
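The relationship between recovery targets and cost described in the answers above can be sketched in Python. This is an illustrative model only: the three tiers mirror the cold, warm, and hot standby options mentioned earlier, but the RTO, RPO, and cost figures attached to them are made-up assumptions, not provider guarantees or benchmarks. The names DrStrategy, STRATEGIES, and cheapest_strategy are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class DrStrategy:
    name: str
    typical_rto: timedelta  # assumed time to recover; illustrative only
    typical_rpo: timedelta  # assumed worst-case data loss window; illustrative only
    relative_cost: int      # 1 = cheapest; higher means more always-on capacity


# Hypothetical tiers, ordered from cheapest to most expensive.
STRATEGIES = [
    DrStrategy("cold (backup and restore)", timedelta(hours=24), timedelta(hours=24), 1),
    DrStrategy("warm standby", timedelta(minutes=30), timedelta(minutes=5), 2),
    DrStrategy("hot standby", timedelta(minutes=1), timedelta(0), 3),
]


def cheapest_strategy(rto: timedelta, rpo: timedelta) -> Optional[DrStrategy]:
    """Return the lowest-cost tier whose assumed RTO and RPO both meet the targets."""
    for strategy in STRATEGIES:  # list is already sorted by cost
        if strategy.typical_rto <= rto and strategy.typical_rpo <= rpo:
            return strategy
    return None  # no tier in this model meets the targets


# Usage: a system that must recover within 1 hour and lose at most 10 minutes of data.
choice = cheapest_strategy(rto=timedelta(hours=1), rpo=timedelta(minutes=10))
print(choice.name if choice else "no tier meets the targets")
```

Tightening either target pushes the selection toward a more expensive tier, which is the cost trade-off in miniature. Real planning would replace the assumed figures with values measured during recovery tests.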
Category: software
Difficulty: basic