Data Replication
Definition
Creating and maintaining copies of the same data on multiple servers or in multiple locations to improve reliability, availability, and read performance.
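As a rough illustration of the definition, the sketch below models a primary that forwards writes to one replica, either synchronously or asynchronously. The class names (`Primary`, `Replica`), the `eu-west` label, and the in-memory dictionaries are all assumptions made for this example; real systems ship changes over a network and handle failures, ordering, and conflicts, none of which is modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A copy of the data held at another location (hypothetical class)."""
    name: str
    data: dict = field(default_factory=dict)


@dataclass
class Primary:
    """Accepts writes and forwards them to its replicas (hypothetical class)."""
    replicas: list
    data: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # changes not yet shipped

    def write_sync(self, key, value):
        # Synchronous: every replica applies the change before we return.
        self.data[key] = value
        for replica in self.replicas:
            replica.data[key] = value

    def write_async(self, key, value):
        # Asynchronous: acknowledge immediately and ship the change later.
        self.data[key] = value
        self.pending.append((key, value))

    def flush(self):
        # Ship queued changes in order; until this runs, replicas lag behind.
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


eu = Replica("eu-west")
primary = Primary(replicas=[eu])

primary.write_sync("plan", "free")
sync_read = eu.data.get("plan")    # replica already holds the new value

primary.write_async("plan", "premium")
stale_read = eu.data.get("plan")   # replica still serves the old value
primary.flush()
fresh_read = eu.data.get("plan")   # replica has caught up
```

The gap between `stale_read` and `fresh_read` is replication lag: the window during which an asynchronous replica can return older data than the primary.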
Use Cases
- Netflix: Highly available, low-latency access to application data across regions — Netflix has publicly described using Amazon DynamoDB global tables to replicate data across multiple AWS Regions so services can read/write locally and continue operating during regional issues. (Improved resilience to regional failures and reduced user-facing latency by keeping data closer to services and customers.)
- Spotify: Global, consistent data access for user and application data — Spotify has publicly discussed using Google Cloud Spanner, which supports multi-region configurations that replicate data across regions with strong consistency for many workloads. (Simplified operations for globally distributed data while maintaining high availability and consistent reads/writes across regions.)
Provider Equivalents
- AWS: Amazon S3 Replication (CRR/SRR) and Amazon Aurora Replicas
- Azure: Azure Storage replication (LRS/ZRS/GRS/GZRS) and Azure SQL geo-replication
- GCP: Cloud Storage dual-region/multi-region and Cloud Spanner multi-region configurations
- OCI: OCI Object Storage Replication and Oracle Data Guard (including Autonomous Data Guard for Autonomous Database)
Frequently Asked Questions
- What's the difference between data replication and backup?
- Replication keeps one or more up-to-date copies of data in other locations for high availability and fast access. Backups are point-in-time copies kept mainly for recovery from accidental deletion, corruption, or ransomware. Because replication propagates every change, it also copies bad changes (such as an accidental delete) to the replicas; a backup lets you restore an earlier, known-good version. Replication complements backups rather than replacing them.
- When should I use data replication?
- Use replication when you need higher availability (fail over if a zone/region fails), lower latency for global users (serve data closer to them), or better read scalability (read replicas). It’s common for customer-facing apps, multi-region disaster recovery, and databases that need to handle many reads.
- How much does data replication cost?
- Costs usually come from (1) extra storage for the replicas, (2) data transfer/egress between zones or regions, and (3) replication operations or additional database instances (for read replicas). Cross-region replication is typically more expensive than same-region replication because of inter-region transfer charges. Synchronous replication can require more resources and adds write latency, because each write waits for a replica to acknowledge it.
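The difference between replication and backup described in the first answer above can be sketched in a few lines. This is a toy model under stated assumptions: the stores are plain dictionaries, `replicate` is a hypothetical helper that makes the replica match the primary, and the order records are invented sample data.

```python
import copy


def replicate(source, target):
    """Make target match source exactly, including deletions (hypothetical helper)."""
    target.clear()
    target.update(copy.deepcopy(source))


# Invented sample data for illustration only.
primary = {"order-1": "paid", "order-2": "shipped"}
replica = {}
replicate(primary, replica)

# A backup is a point-in-time copy; later changes do not reach it.
backup = copy.deepcopy(primary)

# An accidental deletion on the primary is replicated like any other change.
del primary["order-2"]
replicate(primary, replica)

replica_has_order = "order-2" in replica   # the replica lost the record too
backup_has_order = "order-2" in backup     # the backup still holds it

# Recovery: restore the primary from the backup, then replicate again.
primary = copy.deepcopy(backup)
replicate(primary, replica)
restored_on_replica = "order-2" in replica
```

The replica faithfully mirrors the mistake, so only the point-in-time copy can bring the deleted record back.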
Category: data
Difficulty: intermediate