Data Warehouse
Definition
A centralized, structured storage system optimized for analytical queries and reporting over curated business data, supporting business intelligence and decision-making.
Use Cases
- The Home Depot: Enterprise analytics to support merchandising, supply chain, and store operations reporting at scale. Built a cloud-based analytics platform on Google Cloud with BigQuery as the central data warehouse, integrating data from operational systems and enabling BI and analytics workloads. Result: large-scale analytics and reporting with managed scaling, giving business teams faster insights.
- Spotify: Analytics on user listening behavior and product metrics to support reporting and experimentation. Uses Google BigQuery for large-scale analytics and SQL-based reporting across event and business datasets. Result: teams can query and analyze very large datasets efficiently for product and business decision-making.
- Airbnb: Company-wide reporting and analytics on marketplace activity, performance metrics, and business operations. Adopted a centralized data warehouse approach (including cloud data warehousing) to make curated datasets available for analytics and BI across the organization. Result: more consistent metrics and better access to trusted analytical datasets for decision-making across teams.
Provider Equivalents
- AWS: Amazon Redshift
- Azure: Azure Synapse Analytics (Dedicated SQL pool)
- GCP: BigQuery
- OCI: Oracle Autonomous Data Warehouse
Frequently Asked Questions
- What's the difference between a data warehouse and a data lake?
- A data warehouse stores curated, structured data optimized for SQL analytics and reporting (cleaned, modeled, and governed). A data lake stores raw or semi-structured data (files, logs, JSON, images) more flexibly, often used for data science, exploration, and later transformation. Many organizations use both: land data in a lake, then transform and load trusted datasets into a warehouse for BI.
- When should I use a data warehouse?
- Use a data warehouse when you need reliable reporting and dashboards, consistent business metrics (like revenue, churn, inventory turns), fast SQL queries over large historical datasets, and strong governance (access controls, auditing, data quality). It’s especially useful when multiple teams need a shared source of truth for analytics.
- How much does a data warehouse cost?
- Cost depends on (1) compute model (serverless per-query vs provisioned capacity), (2) data storage volume, (3) query frequency and complexity, (4) concurrency (how many users/tools query at once), (5) data ingestion/ETL costs, and (6) data egress/networking. For example, serverless warehouses often charge for data scanned per query plus storage, while provisioned warehouses charge for allocated compute (hourly) plus storage. Optimizing partitioning, clustering/sort keys, materialized views, and workload management can significantly reduce cost.
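The two pricing models described in the cost answer can be compared with a rough back-of-the-envelope sketch. Everything here is an assumption for illustration: the `Workload` fields, the function names, and especially the rates, which are placeholders and not any provider's actual prices. Ingestion, egress, and concurrency costs are left out to keep the sketch small.

```python
from dataclasses import dataclass, replace


@dataclass
class Workload:
    """Hypothetical monthly workload profile (all fields are illustrative)."""
    tb_scanned_per_query: float     # average data scanned by one query, in TB
    queries_per_month: int          # number of queries run per month
    storage_tb: float               # data stored in the warehouse, in TB
    compute_hours_per_month: float  # hours a provisioned cluster is running


def serverless_monthly_cost(w, price_per_tb_scanned, storage_price_per_tb):
    """Serverless model: pay for data scanned per query, plus storage."""
    scan_cost = w.tb_scanned_per_query * w.queries_per_month * price_per_tb_scanned
    return scan_cost + w.storage_tb * storage_price_per_tb


def provisioned_monthly_cost(w, price_per_compute_hour, storage_price_per_tb):
    """Provisioned model: pay for allocated compute by the hour, plus storage."""
    compute_cost = w.compute_hours_per_month * price_per_compute_hour
    return compute_cost + w.storage_tb * storage_price_per_tb


if __name__ == "__main__":
    # Placeholder rates; substitute the provider's published prices.
    PRICE_PER_TB_SCANNED = 5.0
    PRICE_PER_COMPUTE_HOUR = 3.0
    STORAGE_PRICE_PER_TB = 20.0

    baseline = Workload(
        tb_scanned_per_query=0.5,
        queries_per_month=1000,
        storage_tb=10.0,
        compute_hours_per_month=720.0,  # cluster left on all month
    )
    # Same workload after partitioning/clustering cuts the data each query scans.
    pruned = replace(baseline, tb_scanned_per_query=0.125)

    print(serverless_monthly_cost(baseline, PRICE_PER_TB_SCANNED, STORAGE_PRICE_PER_TB))     # 2700.0
    print(provisioned_monthly_cost(baseline, PRICE_PER_COMPUTE_HOUR, STORAGE_PRICE_PER_TB))  # 2360.0
    print(serverless_monthly_cost(pruned, PRICE_PER_TB_SCANNED, STORAGE_PRICE_PER_TB))       # 825.0
```

Under these placeholder numbers, reducing the data scanned per query changes the serverless bill but would leave the provisioned bill unchanged, which is why scan-reducing techniques such as partitioning tend to matter most under per-query pricing.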
Category: data
Difficulty: advanced