Replication
Definition
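Replication is the practice of maintaining copies of the same data on multiple database servers and keeping those copies in sync. It improves availability, because another copy can take over when a server fails, and read performance, because reads can be served from more than one copy.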
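To make the definition concrete, here is a minimal in-memory sketch of log-based, asynchronous replication from one primary to one replica. It is an illustration under simplifying assumptions, not how any particular database implements it: the `Primary` and `Replica` classes and the `catch_up` method are hypothetical names, and the "log" is a plain Python list.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A read-only copy that replays the primary's log in order."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries already replayed

    def catch_up(self, log: list) -> None:
        # Replay only the entries this replica has not seen yet.
        for op, key, value in log[self.applied:]:
            if op == "set":
                self.data[key] = value
            elif op == "delete":
                self.data.pop(key, None)
        self.applied = len(log)


@dataclass
class Primary:
    """The single writable copy; every change is appended to a log."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def set(self, key, value) -> None:
        self.data[key] = value
        self.log.append(("set", key, value))

    def delete(self, key) -> None:
        self.data.pop(key, None)
        self.log.append(("delete", key, None))


primary = Primary()
replica = Replica()

primary.set("user:1", "Ada")
replica.catch_up(primary.log)        # replica now matches the primary

primary.set("user:1", "Ada L.")      # not yet replicated: the replica is stale
stale_value = replica.data["user:1"]

replica.catch_up(primary.log)        # the lag closes once the log is replayed
fresh_value = replica.data["user:1"]
```

Between the second write and the second `catch_up`, the replica still returns the old value. That window is replication lag, and it is the main trade-off of asynchronous replication compared with synchronous replication, where a write is not acknowledged until the copies have it.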
Use Cases
- Netflix: Serve user-facing application data with low latency across multiple AWS regions and remain available during regional issues. Uses Amazon DynamoDB with global replication (DynamoDB Global Tables) to keep data synchronized across regions so applications can read/write locally. (Improved regional resilience and reduced latency for globally distributed users by keeping data closer to where requests originate.)
- GitHub: Scale read-heavy traffic for core application data while maintaining a primary source of truth. Uses MySQL replication to create read replicas that offload read queries from the primary database and support high read throughput. (Higher read capacity and better performance under load by distributing read traffic across replicas.)
- Spotify: Provide reliable, low-latency access to data for users across geographies with strong consistency requirements. Uses Google Cloud Spanner, which replicates data across zones/regions as part of its architecture to support high availability and global scale. (High availability and scalable performance with replicated data across multiple locations to reduce downtime risk.)
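The read-replica pattern in the use cases above can be sketched as a small router that sends writes to the primary and spreads reads across replicas in round-robin order. This is a simplified illustration, not the routing logic of any company or driver named here: the `ReplicatedRouter` class and the server names are hypothetical, and classifying a statement by its leading `SELECT` keyword is a deliberately naive assumption.

```python
import itertools


class ReplicatedRouter:
    """Routes writes to the primary and spreads reads across replicas."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        # Fall back to the primary if no replicas are configured.
        self._read_targets = itertools.cycle(replicas or [primary])

    def target_for(self, sql: str) -> str:
        # Naive classification: only statements starting with SELECT are reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_targets) if is_read else self.primary


router = ReplicatedRouter("primary-db", ["replica-1", "replica-2"])
targets = [
    router.target_for("SELECT * FROM repos WHERE id = 1"),
    router.target_for("UPDATE repos SET stars = stars + 1 WHERE id = 1"),
    router.target_for("SELECT name FROM users"),
]
# targets is ["replica-1", "primary-db", "replica-2"]
```

A production router would need more care than this. Locking reads such as `SELECT ... FOR UPDATE` belong on the primary, and a read that must see a write the same user just made may also need to go to the primary, because replicas can lag behind.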
Provider Equivalents
- AWS: Amazon RDS (read replicas, Multi-AZ), Amazon Aurora (replicas, Global Database), Amazon DynamoDB Global Tables
- Azure: Azure SQL Database (geo-replication, failover groups), Azure Cosmos DB (multi-region replication), Azure Database for PostgreSQL (read replicas)
- GCP: Cloud SQL (read replicas, HA), AlloyDB (read pools), Cloud Spanner (multi-region replication)
- OCI: Oracle Autonomous Database (Data Guard/standby), OCI MySQL Database Service (read replicas, HA), Oracle Database on OCI with Data Guard
Frequently Asked Questions
- What's the difference between replication and backup?
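- Replication keeps one or more live copies of data in sync, so applications can keep running if a server fails and reads can be spread across copies. Backups are point-in-time snapshots meant for recovery from accidental deletion, corruption, or ransomware. Because replicas apply every change made on the primary, an accidental delete or a corrupted write is copied to them as well, which is why replication does not replace backups. Replication helps with availability and performance; backups let you restore to an earlier point in time.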
- When should I use replication?
- Use replication when you need higher availability (failover to another copy), better read performance (send read traffic to replicas), or lower latency for global users (place replicas closer to them). It’s common for production databases, read-heavy workloads, and multi-region applications. If your main goal is long-term retention or restoring old versions, rely on backups for that and add replication for availability.
- How much does replication cost?
- Costs usually include: (1) extra database instances or nodes for replicas/standbys, (2) additional storage for each copy, (3) network data transfer, especially for cross-region replication, and (4) higher I/O or write overhead, depending on the synchronization method (synchronous replication adds latency to each write). Managed services may also charge for features like global databases or multi-region writes. The biggest drivers are the number of replicas, the instance size, and cross-region traffic.
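The cost components listed in the last answer can be turned into a rough back-of-the-envelope estimate. The sketch below is only an illustration: the function `monthly_replication_cost` is hypothetical, every price is a made-up placeholder rather than any provider's real rate, and it covers items (1) to (3) only, since write overhead and feature surcharges depend on the service.

```python
def monthly_replication_cost(
    replicas: int,
    instance_price: float,     # per replica instance, per month
    storage_gb: float,         # data size held by each copy
    storage_price_gb: float,   # per GB, per month
    cross_region_gb: float,    # replicated change volume leaving the region
    transfer_price_gb: float,  # per GB transferred
) -> float:
    """Rough monthly cost added by replication (placeholder prices)."""
    compute = replicas * instance_price
    storage = replicas * storage_gb * storage_price_gb
    transfer = cross_region_gb * transfer_price_gb
    return compute + storage + transfer


# Hypothetical scenario: two replicas of a 500 GB database, with 1,000 GB
# of changes replicated across regions each month.
estimate = monthly_replication_cost(
    replicas=2,
    instance_price=200.0,
    storage_gb=500,
    storage_price_gb=0.10,
    cross_region_gb=1000,
    transfer_price_gb=0.02,
)
```

With these placeholder numbers the estimate comes to about 520 per month: 400 for compute, 100 for storage, and 20 for transfer. Compute dominates here, which matches the observation that replica count and instance size are usually the largest drivers.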
Category: data
Difficulty: intermediate
Related Terms
See Also