MapReduce

advanced
data
Enhanced Content

Definition

A programming model for processing large datasets in parallel across many machines. It works in two phases: 'Map' breaks the problem into smaller chunks and processes each independently, then 'Reduce' combines the results — like having thousands of workers each count words on one page of a book, then tallying everyone's counts into a final total.

Real-World Example

A web company uses MapReduce to count how many times each URL appears across billions of server log entries, a job that would take days on one machine but finishes in minutes across a cluster.

Cloud Provider Equivalencies

MapReduce is a programming model rather than a single cloud product. AWS EMR, Azure HDInsight, Google Cloud Dataproc, and OCI Big Data Service can run Apache Hadoop MapReduce jobs on managed clusters. These services reduce the operational work of provisioning, scaling, and maintaining Hadoop infrastructure.

AWS
Amazon EMR
AZ
Azure HDInsight
GCP
Google Cloud Dataproc
OCI
Oracle Cloud Infrastructure Big Data Service

Explore More Cloud Computing Terms