Data Redundancy
Definition
Data redundancy refers to storing multiple copies of data to protect against loss, ensuring data availability and reliability in cloud environments.
Use Cases
- Netflix: Needs highly available storage for media assets and application data to reduce the risk of data loss and service disruption. It runs on AWS and designs for failure by distributing systems across multiple Availability Zones, stores data in managed services that replicate for durability and availability (for example, object storage and replicated databases), and adds backups and snapshots for further protection. Result: improved resilience to infrastructure failures and a lower risk of data loss, which supports continuous streaming availability at global scale.
- Dropbox: Needs to protect user files from disk and server failures while keeping those files available. It uses redundancy techniques such as storing multiple replicas of file blocks across different storage nodes and keeping redundant metadata so it can recover from hardware failures. Result: higher durability and availability of user data, with the ability to recover from component failures without losing files.
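The block-replication idea in the Dropbox use case can be sketched in a few lines of Python. This is a minimal, in-memory illustration and not a description of how Dropbox or any other provider implements storage. The class name `ReplicatedBlockStore`, its methods, and the placement rule (a hash-derived starting node followed by its neighbors) are all hypothetical choices made for the example.

```python
import hashlib


class ReplicatedBlockStore:
    """Toy block store that keeps each block on several in-memory 'nodes'.

    Hypothetical sketch: real systems add networking, persistence,
    re-replication after failures, and much more.
    """

    def __init__(self, node_count: int = 5, replicas: int = 3):
        if not 1 <= replicas <= node_count:
            raise ValueError("replicas must be between 1 and node_count")
        # Each node is a dict of block_id -> bytes; None marks a failed node.
        self.nodes = [dict() for _ in range(node_count)]
        self.replicas = replicas

    def placement(self, block_id: str) -> list:
        """Return the distinct node indexes that should hold this block."""
        start = int(block_id, 16) % len(self.nodes)
        return [(start + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, data: bytes) -> str:
        """Write the block to every healthy node in its placement set."""
        block_id = hashlib.sha256(data).hexdigest()
        written = 0
        for index in self.placement(block_id):
            node = self.nodes[index]
            if node is not None:
                node[block_id] = data
                written += 1
        if written == 0:
            raise RuntimeError("no healthy node available for this block")
        return block_id

    def fail_node(self, index: int) -> None:
        """Simulate a disk or server failure by discarding the node's data."""
        self.nodes[index] = None

    def get(self, block_id: str) -> bytes:
        """Read from the first healthy replica whose checksum still matches."""
        for index in self.placement(block_id):
            node = self.nodes[index]
            if node is not None and block_id in node:
                data = node[block_id]
                if hashlib.sha256(data).hexdigest() == block_id:
                    return data
        raise KeyError(f"all replicas of block {block_id} are unavailable")
```

With three replicas, a block written through `put` stays readable through `get` after any two of its nodes are passed to `fail_node`. Once the third replica is lost, `get` raises `KeyError`, which is the situation that backups kept separately are meant to cover.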
Provider Equivalents
- AWS: Amazon S3 (most storage classes store data redundantly across multiple Availability Zones; S3 One Zone-IA keeps data in a single AZ) and Amazon EBS (volumes such as gp3 and io2 are replicated automatically within one AZ)
- Azure: Azure Storage (LRS/ZRS/GRS/RA-GRS redundancy options) and Azure Managed Disks
- GCP: Google Cloud Storage (regional, dual-region, and multi-region bucket locations) and Persistent Disk (zonal or regional)
- OCI: OCI Object Storage (replicated within a region) and OCI Block Volumes
Frequently Asked Questions
- What's the difference between data redundancy and backups?
- Data redundancy keeps multiple copies of the same data available at the same time (often automatically) so a failure doesn’t interrupt access. Backups are point-in-time copies kept separately so you can restore data after accidental deletion, corruption, ransomware, or a bad update. Redundancy helps with availability; backups help with recovery.
- When should I use data redundancy?
- Use data redundancy when you need high availability and durability, for example for customer-facing apps, critical databases, shared file storage, and compliance-sensitive data. Choose multi-zone redundancy when downtime is expensive and you need to survive the loss of a single data center, and cross-region redundancy when you need disaster recovery protection from a full regional outage.
- How much does data redundancy cost?
- Cost depends on how many copies are kept and where they are stored. Multi-zone or cross-region redundancy typically costs more than single-zone because it uses more storage capacity and may add replication and data transfer charges. Object storage often prices redundancy through storage classes (for example, locally redundant vs geo-redundant options), while databases and block storage may charge for replicas, additional nodes, snapshots, and cross-region replication.
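The cost relationship described in the last answer can be made concrete with a small calculation. The sketch below uses a deliberately simplified billing model: storage is charged per raw GB (logical size times number of copies), plus an optional charge for replicating newly written data. Every name and price here is a hypothetical placeholder rather than any provider's actual rate, and real providers often fold the copy count into a per-class price instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedundancyOption:
    """One redundancy choice in a simplified, hypothetical billing model."""
    name: str
    copies: int                          # billed copies of each logical GB
    price_per_gb_month: float            # hypothetical price per raw GB per month
    transfer_price_per_gb: float = 0.0   # hypothetical replication transfer charge


def monthly_cost(option: RedundancyOption, logical_gb: float,
                 new_data_gb: float = 0.0) -> float:
    """Estimate monthly cost as raw storage plus replication transfer."""
    if option.copies < 1:
        raise ValueError("an option must keep at least one copy")
    storage = logical_gb * option.copies * option.price_per_gb_month
    transfer = new_data_gb * option.transfer_price_per_gb
    return storage + transfer


# Placeholder options and prices, chosen only to show the shape of the comparison.
SINGLE_ZONE = RedundancyOption("single-zone", copies=1, price_per_gb_month=0.02)
CROSS_REGION = RedundancyOption("cross-region", copies=2, price_per_gb_month=0.02,
                                transfer_price_per_gb=0.01)

if __name__ == "__main__":
    for option in (SINGLE_ZONE, CROSS_REGION):
        cost = monthly_cost(option, logical_gb=1000, new_data_gb=100)
        print(f"{option.name}: {cost:.2f} per month")
```

Under these placeholder numbers the cross-region option costs roughly twice as much as the single-zone one, plus the transfer charge. For a real estimate, replace the values with figures from the provider's current pricing pages.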
Category: data
Difficulty: intermediate