HDFS

advanced
data
Enhanced Content

Definition

Hadoop Distributed File System — the storage layer that Hadoop uses to spread large files across many machines in a cluster. It breaks files into blocks, stores multiple copies of each block on different nodes, and automatically recovers from hardware failures. Think of it like splitting a huge book into chapters and keeping two backup copies of each chapter on different shelves so the book is never lost even if some shelves collapse.

Real-World Example

A media company stores 500 TB of raw video footage in HDFS across a 200-node cluster, so Hadoop jobs can process any video file without waiting for a single disk to serve the whole thing.

Cloud Provider Equivalencies

These services provide scalable, distributed storage solutions similar to HDFS, allowing for large-scale data storage and processing.

AWS
Amazon S3
AZ
Azure Data Lake Storage
GCP
Google Cloud Storage
OCI
OCI Object Storage

Explore More Cloud Computing Terms