Cold Standby
Definition
A backup system that is not running and requires manual setup (provisioning infrastructure, restoring data, starting services) before it can take over. It is like spare equipment kept in storage that must be installed before use.
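Because nothing is running in advance, activation is an ordered sequence in which each step depends on the ones before it. The minimal Python sketch below shows that "runbook as code" idea. It assumes each manual step can be wrapped as a function that reports success. `Step`, `activate_cold_standby`, and the step names are hypothetical and do not belong to any particular tool's API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One runbook step (hypothetical), expressed as a callable that reports success."""
    name: str
    action: Callable[[], bool]


def activate_cold_standby(steps: List[Step]) -> List[str]:
    """Run activation steps strictly in order and stop at the first failure.

    Nothing is running beforehand, so later steps depend on earlier ones
    (for example, data cannot be restored until compute has been provisioned).
    """
    completed: List[str] = []
    for step in steps:
        if not step.action():
            # Halt immediately: continuing would act on a half-built environment.
            raise RuntimeError(f"activation halted at step: {step.name}")
        completed.append(step.name)
    return completed


if __name__ == "__main__":
    # Placeholder actions; real ones would call your provisioning,
    # restore, and traffic-switching tooling.
    runbook = [
        Step("provision infrastructure from stored definitions", lambda: True),
        Step("restore data from latest backup", lambda: True),
        Step("run smoke tests", lambda: True),
        Step("point traffic (e.g. DNS) at the standby", lambda: True),
    ]
    print(activate_cold_standby(runbook))
```

Stopping at the first failure is deliberate. With a cold standby, a partially provisioned environment is not a usable fallback, so an operator should fix the failed step before moving on.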
Use Cases
- Netflix: Disaster recovery preparedness and regional resilience testing for cloud-based services. Netflix is known for engineering resilience on AWS and has publicly discussed multi-region considerations and failure testing. A cold-standby-style approach can be used for non-critical internal systems: backups and infrastructure definitions are kept ready, but no duplicate capacity runs until it is needed. (Benefit: lower steady-state cost than always-on DR for systems that can tolerate longer recovery times, while still keeping a path to restore service after a major incident.)
- GitHub: Service continuity planning for platform components that can be restored from backups after major outages. GitHub has publicly documented its incident response and recovery practices. For components like these, a cold-standby approach is common across the industry: maintain offsite backups and documented runbooks to rebuild and restore systems when required, rather than running a full parallel environment continuously. (Benefit: lower ongoing infrastructure spend for workloads where a longer recovery time objective (RTO) is acceptable. Recovery speed depends on restore time, provisioning time, and operational readiness.)
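The recovery factors named in these use cases (provisioning, restore, operational readiness) happen one after another, so a first-order RTO estimate is simply their sum. The minimal sketch below assumes restore time scales linearly with backup size at a constant throughput, which real restores may not do. The function names and figures are hypothetical and for illustration only.

```python
def estimate_rto_hours(
    provisioning_hours: float,
    backup_size_gb: float,
    restore_throughput_gb_per_hour: float,
    readiness_hours: float,
) -> float:
    """Rough cold-standby recovery time: the steps run sequentially, so durations add."""
    if restore_throughput_gb_per_hour <= 0:
        raise ValueError("restore throughput must be positive")
    # Assumption: restore time is linear in backup size.
    restore_hours = backup_size_gb / restore_throughput_gb_per_hour
    return provisioning_hours + restore_hours + readiness_hours


def cold_standby_fits(estimated_rto_hours: float, tolerated_rto_hours: float) -> bool:
    """Cold standby is only appropriate if the estimate stays within the accepted RTO."""
    return estimated_rto_hours <= tolerated_rto_hours


if __name__ == "__main__":
    # Illustrative numbers: 500 GB restored at 250 GB/h takes 2 hours.
    rto = estimate_rto_hours(
        provisioning_hours=1.0,
        backup_size_gb=500,
        restore_throughput_gb_per_hour=250,
        readiness_hours=0.5,
    )
    print(rto)  # 3.5
    print(cold_standby_fits(rto, tolerated_rto_hours=4.0))  # True
```

Measuring the real inputs during periodic restore tests, rather than guessing them, is what makes an estimate like this trustworthy.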
Frequently Asked Questions
- What's the difference between cold standby and warm standby?
- Cold standby means the backup environment is not running. You typically have backups, images, and deployment scripts stored, and you manually start infrastructure and restore data during an outage. Warm standby keeps a smaller version of the environment running (or partially running), so failover is faster but costs more.
- When should I use cold standby for disaster recovery?
- Use cold standby when you need a low-cost DR option and your business can tolerate longer recovery times (higher RTO). It’s common for small businesses, internal tools, dev/test systems, or applications where a few hours (or more) of downtime is acceptable, and where you can rely on documented runbooks and periodic restore tests.
- How much does cold standby cost?
- Costs are usually dominated by storage (backups, snapshots, archives), plus any minimal supporting services you keep (like DNS hosting or key management). You typically avoid paying for always-on compute in the standby site. During a disaster or DR test, you’ll incur temporary costs for provisioning compute, networking, and data restore/egress (if applicable). Pricing depends on backup size, retention period, storage tier (standard vs archive), restore frequency, and how quickly you need to bring systems online.
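The cost structure described above can be turned into a back-of-the-envelope model. The model has three parts: steady-state backup storage, any supporting services you keep, and the temporary compute and restore costs of DR drills, amortized per month. The sketch also computes the always-on compute a warm standby would add, which cold standby avoids. Every price and parameter name here is a hypothetical placeholder, not a real provider's rate.

```python
HOURS_PER_MONTH = 730  # average: 8760 hours per year / 12


def cold_standby_monthly_cost(
    backup_gb: float,
    storage_price_per_gb_month: float,
    supporting_services_per_month: float,
    drills_per_year: int,
    drill_compute_hours: float,
    compute_price_per_hour: float,
    restore_gb_per_drill: float,
    restore_price_per_gb: float,
) -> float:
    """Storage + kept supporting services + DR drill costs spread over 12 months."""
    storage = backup_gb * storage_price_per_gb_month
    # Temporary costs incurred only while a drill (or real disaster) is running.
    per_drill = (drill_compute_hours * compute_price_per_hour
                 + restore_gb_per_drill * restore_price_per_gb)
    drills_per_month = drills_per_year * per_drill / 12
    return storage + supporting_services_per_month + drills_per_month


def warm_standby_extra_compute(always_on_instances: int, compute_price_per_hour: float) -> float:
    """Compute that a warm standby keeps running all month and a cold standby does not."""
    return always_on_instances * compute_price_per_hour * HOURS_PER_MONTH


if __name__ == "__main__":
    # Placeholder prices only; substitute your provider's actual rates.
    cold = cold_standby_monthly_cost(
        backup_gb=2000,
        storage_price_per_gb_month=0.01,
        supporting_services_per_month=5.0,
        drills_per_year=4,
        drill_compute_hours=8,
        compute_price_per_hour=0.5,
        restore_gb_per_drill=2000,
        restore_price_per_gb=0.01,
    )
    extra_warm = warm_standby_extra_compute(always_on_instances=2, compute_price_per_hour=0.5)
    print(f"cold standby: {cold:.2f} per month")
    print(f"warm standby adds at least: {extra_warm:.2f} per month in compute")
```

A real disaster adds a one-off cost similar to a drill. It does not change the steady-state figure, and that steady-state figure is where cold standby saves money.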
Category: cloud
Difficulty: intermediate
Related Terms
See Also