Hadoop is an open-source framework for distributed storage and parallel processing of very large datasets across clusters of commodity servers. Its core components are HDFS for distributed storage, YARN for cluster resource management, and MapReduce for batch computation; together they enable scalable data processing in both on-premises and cloud-based big data environments. In cloud architectures, Hadoop is often used for data lakes, ETL pipelines, and large-scale analytics workloads.
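As a minimal sketch of the storage layer, the snippet below writes a file to HDFS and reads it back through the Java client API. The NameNode address (`hdfs://namenode:9000`) and the file path are placeholders for illustration, not values from any real cluster:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the cluster's NameNode (placeholder address).
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        Path path = new Path("/data/logs/events.txt");

        // Write a small file; HDFS transparently splits large files into
        // blocks and replicates them across DataNodes.
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("user1,click,2024-01-01\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read the file back.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        fs.close();
    }
}
```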
A retail company runs Hadoop on a managed cloud cluster to store clickstream and transaction logs in HDFS, then executes batch jobs to aggregate customer behavior data for reporting and recommendation models.
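The aggregation step in such a pipeline could be written as a MapReduce job. The sketch below assumes a hypothetical log format of `customerId,eventType,timestamp` per line and counts events per customer; the class names, log format, and paths are illustrative, not taken from the company's actual system:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EventCountPerCustomer {

    // Map each log line (assumed format: customerId,eventType,timestamp)
    // to the pair (customerId, 1).
    public static class EventMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text customerId = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length >= 1 && !fields[0].isEmpty()) {
                customerId.set(fields[0]);
                context.write(customerId, ONE);
            }
        }
    }

    // Sum the event counts emitted for each customer.
    public static class SumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "event count per customer");
        job.setJarByClass(EventCountPerCustomer.class);
        job.setMapperClass(EventMapper.class);
        job.setCombinerClass(SumReducer.class); // pre-aggregate on the map side
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist yet
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

A job like this would typically be packaged into a JAR and submitted to YARN with `hadoop jar`, for example `hadoop jar eventcount.jar EventCountPerCustomer /logs/clickstream /reports/event-counts` (paths hypothetical).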