
DP-900 Study Guide

Domain 1: Core Data Concepts

  • Your First Look at Data Free
  • Data File Formats: CSV, JSON, Parquet & More Free
  • Databases: Relational vs Non-Relational Free
  • Transactional Workloads: Keeping Data Consistent Free
  • Analytical Workloads: Finding the Insights Free
  • Data Roles: DBA, Engineer & Analyst Free
  • The Azure Data Landscape Free

Domain 2: Relational Data on Azure

  • Relational Data: Tables, Keys & Relationships
  • Normalization: Why Duplicate Data is Bad
  • SQL Basics: SELECT, INSERT, UPDATE, DELETE
  • Database Objects: Views, Indexes & More
  • Azure SQL: Your Database in the Cloud
  • Open-Source Databases on Azure
  • Choosing the Right Azure Database

Domain 3: Non-Relational Data on Azure

  • Azure Blob Storage: Files in the Cloud
  • Azure Files & Table Storage
  • Azure Cosmos DB: The Global Database
  • Cosmos DB APIs: SQL, MongoDB & More
  • Choosing Non-Relational Storage

Domain 4: Analytics on Azure

  • Data Ingestion & Processing
  • Analytical Data Stores: Data Lakes, Warehouses & Lakehouses
  • Microsoft Fabric & Azure Databricks
  • Batch vs Streaming: Two Speeds of Data
  • Real-Time Analytics on Azure
  • Power BI: See Your Data
  • Data Models in Power BI
  • Choosing the Right Visualization

Domain 4: Analytics on Azure (Premium, ~12 min read)

Analytical Data Stores: Data Lakes, Warehouses & Lakehouses

Data lakes store everything raw. Data warehouses structure it for queries. Lakehouses combine both. Learn the three approaches to storing analytical data.

Where does analytical data live?

☕ Simple explanation

Think of three ways to organise a library.

A data lake is a giant storage room where every book, magazine, photo, and scrap of paper goes in — completely unsorted, but nothing is thrown away. A data warehouse is a curated bookshop — every book is catalogued, shelved by topic, and easy to find. A lakehouse combines both ideas — it keeps everything in the storage room, but adds a catalogue system so you can search and browse it like a bookshop.

The key choice: do you want to organise data before storing it (warehouse), keep it raw and organise later (lake), or do both at once (lakehouse)?

Analytical data stores are purpose-built for read-heavy workloads across large datasets. The three main architectures are:

Data lakes store raw data in its original format. You decide how to interpret the data when you query it (called “schema-on-read”). In Azure, this is Azure Data Lake Storage Gen2 (ADLS Gen2).

Data warehouses store clean, structured data. You define the format before loading (called “schema-on-write”). Optimised for SQL queries. In Azure, this is the Fabric Data Warehouse.

Lakehouses combine both — raw files with a transaction layer (Delta Lake format) that adds structure, versioning, and SQL query support. Microsoft Fabric uses this approach with OneLake.

Data lakes

A data lake stores raw data in its original format — structured, semi-structured, and unstructured — with no upfront schema.

How it works:

  • Data arrives as-is (CSV, JSON, Parquet, images, logs)
  • Stored in a distributed file system (Azure Data Lake Storage Gen2)
  • Schema is applied at read time (“schema-on-read”), not at write time
  • Processing engines (Spark, Fabric) query the data directly

Strengths: Cheap storage, any data type, preserves raw data, flexible
Weaknesses: No built-in query optimisation, can become a “data swamp” without governance

In Azure: Azure Data Lake Storage Gen2 (ADLS Gen2) — Blob Storage with hierarchical namespace enabled.
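To make “schema-on-read” concrete, here is a minimal sketch in plain Python (the file contents and field names are invented for illustration): raw JSON lines land in the lake exactly as they arrive, and a schema is only imposed when the data is read back for a query.

```python
import json

# Hypothetical raw lake files: records arrive as-is, with inconsistent
# fields and types, and nothing is validated at write time.
raw_lines = [
    '{"store": "S1", "amount": "19.99", "ts": "2024-01-05"}',
    '{"store": "S2", "amount": 4.50}',              # missing ts, numeric amount
    '{"store": "S1", "amount": "7.25", "extra": 1}',  # unexpected extra field
]

def read_with_schema(lines):
    """Schema-on-read: coerce each raw record into a shape at query time."""
    for line in lines:
        rec = json.loads(line)
        yield {
            "store": str(rec.get("store", "unknown")),
            "amount": float(rec.get("amount", 0.0)),
            "ts": rec.get("ts"),  # may be None; a lake tolerates that
        }

# The "query" applies the schema on the way out, not on the way in.
total = sum(r["amount"] for r in read_with_schema(raw_lines))
print(round(total, 2))  # 31.74
```

In a real lake the same idea plays out at scale: Spark or Fabric reads Parquet/CSV/JSON files directly from ADLS Gen2 and projects a schema onto them per query.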

Data warehouses

A data warehouse stores clean, structured, historical data optimised for analytical queries.

How it works:

  • Data is transformed and validated BEFORE loading (schema-on-write)
  • Organised in a star schema or snowflake schema (fact tables + dimension tables)
  • Optimised for SQL aggregate queries (SUM, COUNT, AVG across millions of rows)
  • Serves as the “single source of truth” for business reporting

Strengths: Fast queries, strong data quality, familiar SQL interface
Weaknesses: Rigid schema (changes require migration), only structured data, higher cost

In Azure: Fabric Data Warehouse, or Azure Synapse Analytics dedicated SQL pools.
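The contrast with schema-on-write can be sketched in a few lines, using SQLite as a stand-in for a cloud warehouse (the table and column names are invented): the table definition rejects non-conforming rows before they are ever stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema-on-write: structure and constraints are declared up front.
conn.execute("""
    CREATE TABLE sales (
        sale_id  INTEGER PRIMARY KEY,
        store    TEXT NOT NULL,
        revenue  REAL NOT NULL CHECK (revenue >= 0)
    )
""")

conn.execute("INSERT INTO sales (store, revenue) VALUES ('S1', 120.0)")

rejected = False
try:
    # A negative revenue violates the CHECK constraint and is rejected
    # at write time -- bad data never reaches the table.
    conn.execute("INSERT INTO sales (store, revenue) VALUES ('S2', -5.0)")
except sqlite3.IntegrityError:
    rejected = True

total = conn.execute("SELECT SUM(revenue) FROM sales").fetchone()[0]
print(rejected, total)  # True 120.0
```

This is why warehouse data quality is high: validation happens once, at load time, so every downstream SQL query can trust the rows it reads.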

Lakehouses

A lakehouse combines the flexibility of a data lake with the query performance of a data warehouse.

How it works:

  • Raw data stored in open formats (Parquet, Delta Lake) in a data lake
  • A metadata/transaction layer (Delta Lake format) adds ACID transactions, versioning, and schema enforcement
  • SQL query engine sits on top for warehouse-style queries
  • Same storage serves both data engineering (raw processing) and data analytics (SQL queries)

Strengths: Best of both worlds — raw data preservation plus fast structured queries
Weaknesses: Newer technology, requires understanding of both paradigms

In Azure: Microsoft Fabric Lakehouse (built on OneLake + Delta Lake format).
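The transaction-layer idea can be sketched conceptually (this toy code is loosely inspired by how Delta Lake layers versioned commits over raw files; it is not the real Delta Lake API, and it stores full snapshots rather than deltas for simplicity):

```python
import json

# Toy append-only transaction log: each committed entry is one table version.
log = []

def commit(rows):
    """An atomic append to the log creates a new version of the table."""
    log.append(json.dumps(rows))

def snapshot(version=-1):
    """Read the table as of a given version ("time travel")."""
    return json.loads(log[version])

commit([{"store": "S1", "revenue": 10.0}])
commit([{"store": "S1", "revenue": 10.0}, {"store": "S2", "revenue": 5.0}])

# Version 0 had one row; the latest version has two.
print(len(snapshot(0)), len(snapshot(-1)))  # 1 2
```

The real format records commits as JSON log files next to Parquet data files, which is what lets a SQL engine and a Spark job safely share the same storage.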

Data lake vs data warehouse vs lakehouse
Feature        | Data Lake                       | Data Warehouse                     | Lakehouse
Data format    | Any (raw files)                 | Structured (tables)                | Both (raw + structured)
Schema         | On read                         | On write                           | Flexible (supports both)
Query engine   | Spark, external tools           | Built-in SQL engine                | SQL + Spark
Data quality   | Varies (raw data)               | High (pre-validated)               | Managed by Delta Lake
Best for       | Raw data storage, data science  | Business reporting, SQL analytics  | Modern analytics (all-in-one)
Azure service  | ADLS Gen2                       | Fabric Warehouse                   | Fabric Lakehouse
ℹ️ Star schema — the warehouse pattern

Data warehouses organise data in a star schema:

  • Fact table (centre): Stores measurable events — sales transactions, deliveries, page views. Each row has numeric measures (revenue, quantity) and foreign keys to dimension tables.
  • Dimension tables (points of the star): Store descriptive context — dates, products, stores, customers.

Priya’s star schema at FreshMart:

  • Fact: Sales (date_key, store_key, product_key, quantity, revenue)
  • Dimensions: Date, Store, Product, Customer

Why star? Because queries join the fact table (centre) to dimension tables (points), forming a star shape. It’s optimised for “total sales by store by month” queries.
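A “total sales by store by month” query over a star schema looks like this in practice. The sketch below uses SQLite and invented sample rows, with table names (`fact_sales`, `dim_store`, `dim_date`) mirroring the FreshMart example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One central fact table plus two dimension tables -- a minimal star.
conn.executescript("""
    CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, store_name TEXT);
    CREATE TABLE dim_date  (date_key  INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE fact_sales (
        date_key  INTEGER REFERENCES dim_date(date_key),
        store_key INTEGER REFERENCES dim_store(store_key),
        quantity  INTEGER,
        revenue   REAL
    );
    INSERT INTO dim_store VALUES (1, 'Downtown'), (2, 'Airport');
    INSERT INTO dim_date  VALUES (10, '2024-01'), (11, '2024-02');
    INSERT INTO fact_sales VALUES (10, 1, 3, 30.0), (10, 2, 1, 15.0), (11, 1, 2, 20.0);
""")

# The classic star query: join the fact to its dimensions, then aggregate.
rows = conn.execute("""
    SELECT s.store_name, d.month, SUM(f.revenue) AS total
    FROM fact_sales f
    JOIN dim_store s ON s.store_key = f.store_key
    JOIN dim_date  d ON d.date_key  = f.date_key
    GROUP BY s.store_name, d.month
    ORDER BY s.store_name, d.month
""").fetchall()

print(rows)
# [('Airport', '2024-01', 15.0), ('Downtown', '2024-01', 30.0), ('Downtown', '2024-02', 20.0)]
```

Notice the query shape: the fact table sits in the middle of the joins, with each dimension hanging off one foreign key, exactly the star the pattern is named for.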

💡 Exam tip: choosing the analytical store
  • “Store raw data from many sources, any format” → Data lake
  • “Structured data for SQL reporting and dashboards” → Data warehouse
  • “Need both raw storage and SQL queries in one platform” → Lakehouse
  • “Star schema with fact and dimension tables” → Data warehouse (or lakehouse)
  • “Schema-on-read” → Data lake
  • “Schema-on-write” → Data warehouse

Flashcards

Question

What is a data lake?

Answer

A storage repository that holds raw data in its original format (any file type). Uses schema-on-read — structure is applied when data is queried, not when it's stored. In Azure: ADLS Gen2.

Question

What is a data warehouse?

Answer

A structured, optimised store for historical analytical data. Uses schema-on-write (data must conform to a defined structure before loading). Organised in star/snowflake schemas for fast SQL queries.

Question

What is a lakehouse?

Answer

A hybrid combining data lake storage (raw files, any format) with data warehouse capabilities (SQL queries, ACID transactions). Uses Delta Lake format for structure. Microsoft Fabric uses this approach.

Knowledge check

FreshMart receives data from 50 store POS systems (structured), supplier JSON feeds (semi-structured), and customer feedback emails (unstructured). They want to store all of it in one place before processing. What should they use?

Priya needs fast SQL queries over cleaned, validated sales data organised in a star schema with fact and dimension tables. Which analytical store pattern is this?

Next up: Microsoft Fabric & Azure Databricks — the cloud platforms that bring analytics to life.

© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.