Azure Data Lake Storage

advanced
data

Definition

Azure's hyperscale storage service purpose-built for big data analytics workloads. The current generation, Azure Data Lake Storage Gen2 (ADLS Gen2), is built on top of Azure Blob Storage. It adds a hierarchical namespace, which organizes files into real directories the way a traditional filesystem does, and exposes a Hadoop-compatible filesystem interface through the Azure Blob File System (ABFS) driver, so big data tools like Apache Spark and Hive can read and write data using the same filesystem APIs they use with HDFS on-premises. ADLS Gen2 is the recommended storage foundation for both Azure Synapse Analytics and Databricks on Azure: data lands here first, pipelines transform it in place, and analytics engines query it directly without copying data to a separate store. Fine-grained access control via POSIX-style ACLs lets security teams set permissions on individual files and directories without moving data.
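Tools that use the ABFS interface address data through URIs of the form abfss://<container>@<account>.dfs.core.windows.net/<path>. The following is a minimal sketch of building such a URI. The helper name abfss_uri, the account name contosolake, and the container name analytics are hypothetical placeholders chosen for illustration and are not part of any Azure SDK.

```python
def abfss_uri(account: str, container: str, path: str = "") -> str:
    """Build a TLS-secured ABFS URI for a path in an ADLS Gen2 container.

    Shape: abfss://<container>@<account>.dfs.core.windows.net/<path>
    """
    # Normalize the path so the URI has exactly one slash after the host
    # and no trailing slash.
    clean_path = path.strip("/")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{clean_path}"


# Hypothetical account and container names, for illustration only.
uri = abfss_uri("contosolake", "analytics", "/raw/2024/01/15/")
print(uri)
```

A URI like this is what would typically be handed to a Spark or Hive reader in place of an hdfs:// path, assuming the cluster is already configured with credentials for the storage account.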

Real-World Example

A retail company stores petabytes of raw clickstream logs in ADLS Gen2, organized under a hierarchical path like /raw/year/month/day/. Azure Synapse serverless SQL queries the raw layer directly for ad-hoc reports, while a Databricks Spark job reads the same files to build Delta Lake tables in the /curated/ layer. Neither engine needs its own copy of the raw files.
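The date-partitioned layout in this example can be sketched with a small helper that maps a calendar date to its directory. The function names raw_partition_path and partition_paths are hypothetical, and zero-padded month and day segments are an assumption; the example above only specifies the /raw/year/month/day/ shape.

```python
from datetime import date, timedelta


def raw_partition_path(day: date, root: str = "/raw") -> str:
    """Return the directory for one day of logs, e.g. /raw/2024/02/28/."""
    # Zero-padded segments keep directories in chronological order
    # when they are listed lexicographically.
    return f"{root}/{day:%Y}/{day:%m}/{day:%d}/"


def partition_paths(start: date, days: int) -> list[str]:
    """Return the directories covering `days` consecutive days from `start`."""
    return [raw_partition_path(start + timedelta(days=i)) for i in range(days)]


# Three consecutive days spanning a leap day and a month boundary.
for path in partition_paths(date(2024, 2, 28), 3):
    print(path)
```

Because the hierarchical namespace treats each segment as a real directory, a query engine can limit a scan to the directories for the dates it needs rather than listing the whole raw layer.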

Cloud Provider Equivalencies

Each major cloud provider offers scalable storage that can serve as the foundation of a data lake for big data analytics. The offerings differ in features such as hierarchical namespace support and in how they integrate with each provider's analytics tools.

AWS: Amazon S3 with AWS Lake Formation
Azure: Azure Data Lake Storage
GCP: Google Cloud Storage with BigQuery
OCI: Oracle Cloud Infrastructure Data Lake
