Data Lakehouse

Definition

A modern data architecture that combines the flexibility of data lakes with the structured querying and reliability of data warehouses.

Use Cases

Provider Equivalents

Frequently Asked Questions

What’s the difference between a data lakehouse and a data warehouse?
A data warehouse stores curated, structured data optimized for SQL analytics and strong governance. A data lakehouse keeps data in low-cost object storage, as a data lake does (including raw and semi-structured data), but adds warehouse-like features such as ACID transactions, schema enforcement, and performance optimizations, so you can run reliable SQL analytics and BI directly on the lake data.
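
To make "schema enforcement" and "ACID transactions" on plain files more concrete, here is a minimal sketch in Python using only the standard library. It is a toy model, not how Delta Lake or Apache Iceberg are implemented: the `ToyTable` class, the `SCHEMA` dictionary, and the `_log` directory layout are all hypothetical names chosen for illustration. The idea it shows is that writers validate rows before writing, and readers only trust data files that a commit log entry points to.

```python
import json
import os
import tempfile

# Hypothetical schema for an illustrative "events" table.
SCHEMA = {"user_id": int, "event": str, "amount": float}


def validate(rows, schema):
    """Reject any row whose columns or types differ from the table schema."""
    for row in rows:
        if set(row) != set(schema):
            raise ValueError(f"column mismatch: {sorted(row)}")
        for col, typ in schema.items():
            if not isinstance(row[col], typ):
                raise TypeError(f"{col} expects {typ.__name__}")


class ToyTable:
    """Data files plus an ordered commit log; readers trust only the log."""

    def __init__(self, root, schema):
        self.root = root
        self.schema = schema
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def append(self, rows):
        validate(rows, self.schema)  # schema enforcement before any write
        version = len(os.listdir(self.log_dir))
        data_name = f"part-{version:05d}.json"
        with open(os.path.join(self.root, data_name), "w") as f:
            json.dump(rows, f)
        # Mode "x" fails if the commit file already exists, so two writers
        # cannot both claim the same version number.
        with open(os.path.join(self.log_dir, f"{version:05d}.json"), "x") as f:
            json.dump({"add": data_name}, f)
        return version

    def read(self):
        """Return rows from committed data files only, in commit order."""
        rows = []
        for entry in sorted(os.listdir(self.log_dir)):
            with open(os.path.join(self.log_dir, entry)) as f:
                data_name = json.load(f)["add"]
            with open(os.path.join(self.root, data_name)) as f:
                rows.extend(json.load(f))
        return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        table = ToyTable(root, SCHEMA)
        table.append([{"user_id": 1, "event": "click", "amount": 0.5}])
        print(table.read())
```

A data file that was written but never committed stays invisible to `read()`, which is the property that lets queries run safely while ingestion is in progress. Real table formats add much more (snapshots, deletes, statistics, conflict resolution), so treat this only as an intuition aid.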
When should I use a data lakehouse?
Use a lakehouse when you need one platform for both BI/SQL analytics and data science/ML, especially if you have large volumes of semi-structured data (logs, events, IoT) and want to avoid copying data between a lake and a warehouse. It’s also a good fit when you want open table formats (e.g., Delta Lake or Apache Iceberg) and centralized governance over many data types.
How much does a data lakehouse cost?
Cost depends on (1) storage (object storage for raw and curated data), (2) compute for ingestion, transformation, and queries (often billed per vCPU/hour, DBU, or capacity units), (3) concurrency and workload patterns (interactive BI vs batch ETL), and (4) data governance and networking. Costs are typically reduced by separating storage and compute so that each scales independently, using autoscaling so idle compute is not billed, choosing efficient file/table layouts, and minimizing unnecessary data copies.
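
The cost drivers above can be turned into a rough back-of-the-envelope estimate. The sketch below is illustrative only: the `Workload` fields and all three rate constants are placeholder assumptions, not any provider's real pricing, and governance or licensing fees are left out. It compares an always-on cluster with one that autoscales down when idle.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    storage_tb: float        # data kept in object storage
    compute_hours: float     # cluster hours per month for ETL and queries
    units_per_hour: float    # billing units (vCPU, DBU, ...) used per hour
    egress_tb: float = 0.0   # data transferred out per month


# Placeholder rates for illustration only; substitute your provider's prices.
STORAGE_PER_TB = 23.0
PRICE_PER_UNIT_HOUR = 0.50
EGRESS_PER_TB = 90.0


def monthly_cost(w: Workload) -> dict:
    """Split a monthly estimate into storage, compute, and network parts."""
    storage = w.storage_tb * STORAGE_PER_TB
    compute = w.compute_hours * w.units_per_hour * PRICE_PER_UNIT_HOUR
    network = w.egress_tb * EGRESS_PER_TB
    return {
        "storage": storage,
        "compute": compute,
        "network": network,
        "total": storage + compute + network,
    }


if __name__ == "__main__":
    always_on = Workload(storage_tb=100, compute_hours=730, units_per_hour=8)
    autoscaled = Workload(storage_tb=100, compute_hours=200, units_per_hour=8)
    print(monthly_cost(always_on)["total"])   # 5220.0 with the placeholder rates
    print(monthly_cost(autoscaled)["total"])  # 3100.0 with the placeholder rates
```

With these made-up rates, storage is the same in both cases and the entire difference comes from compute hours, which is why separating storage from compute and autoscaling are usually the first levers to look at.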

Category: data

Difficulty: advanced

Related Terms

See Also