
DP-600 Study Guide

Domain 1: Maintain a Data Analytics Solution

  • Workspace Access Controls
  • Row-Level & Object-Level Security
  • Sensitivity Labels & Endorsement
  • Git Version Control in Fabric
  • Deployment Pipelines: Dev β†’ Test β†’ Prod
  • Impact Analysis & Dependencies
  • XMLA Endpoint & Reusable Assets

Domain 2: Prepare Data

  • Microsoft Fabric: The Big Picture Free
  • Lakehouses: Your Data Foundation Free
  • Warehouses in Fabric Free
  • Choosing the Right Data Store Free
  • Data Connections & OneLake Catalog
  • Shortcuts & OneLake Integration
  • Ingesting Data: Dataflows Gen2 & Pipelines
  • Star Schema Design Free
  • SQL Objects: Views, Functions & Stored Procedures
  • Transforming Data: Reshape & Enrich
  • Data Quality & Cleansing
  • Querying with SQL
  • Querying with KQL
  • Querying with DAX

Domain 3: Implement and Manage Semantic Models

  • Semantic Models: Storage Modes
  • Relationships & Advanced Modeling
  • DAX Essentials: Variables & Functions
  • Calculation Groups & Field Parameters
  • Large Models & Composite Models
  • Direct Lake Mode
  • DAX Performance Optimization
  • Incremental Refresh

Domain 2: Prepare Data · Free · ⏱ ~14 min read

Lakehouses: Your Data Foundation

The lakehouse combines the flexibility of a data lake with the structure of a warehouse. Learn how Delta tables, SQL endpoints, and Spark notebooks work together.

What is a lakehouse?

β˜• Simple explanation

Think of a lakehouse as a smart filing cabinet.

A traditional data lake is like dumping all your papers into a big box β€” flexible, cheap, but impossible to search. A data warehouse is like a perfectly organised filing cabinet β€” structured, fast to query, but rigid and expensive to change.

A lakehouse is a filing cabinet that organises itself. You can dump raw files in (like a lake), and the system automatically catalogues them into structured tables (like a warehouse). You get the best of both worlds: flexible storage with structured querying.

In Fabric, the lakehouse stores your data as Delta tables in OneLake. You can query them with both Spark (for big transformations) and SQL (for quick analysis).

A lakehouse in Microsoft Fabric is a data storage item that combines the scalability and flexibility of a data lake with the query performance and structure of a data warehouse. It uses Delta Lake as its table format β€” an open-source storage layer that adds ACID transactions, schema enforcement, and time-travel capabilities to Parquet files.

Every Fabric lakehouse automatically includes two endpoints: a default Spark endpoint (for PySpark/Scala/R notebooks and Spark jobs) and a SQL analytics endpoint (a read-only, auto-generated SQL view of your Delta tables). The SQL analytics endpoint lets Power BI semantic models and SQL queries access lakehouse data without needing a separate warehouse.

Lakehouses are the primary landing zone in Fabric’s medallion architecture: raw data lands in Bronze tables, cleaned data moves to Silver, and business-ready aggregates live in Gold tables β€” all within the same lakehouse (or across lakehouses in the same workspace).

Why lakehouses matter in Fabric

The lakehouse is the most common starting point for DP-600 scenarios. When data enters Fabric, it almost always lands in a lakehouse first.

Two endpoints, one storage

Every lakehouse gives you two ways to access the same data:

| Endpoint | Engine | Read/Write | Best For |
|---|---|---|---|
| Spark endpoint | Apache Spark | Read + Write | Heavy transformations, Python/PySpark notebooks, Spark jobs |
| SQL analytics endpoint | Fabric SQL | Read only | Quick SQL queries, Power BI connections, ad-hoc analysis |

The SQL analytics endpoint is automatically generated β€” you do not need to create it. When you add a Delta table to your lakehouse, a corresponding SQL view appears in the SQL endpoint within minutes.

πŸ’‘ Scenario: Anita's daily sales pipeline

Anita at FreshCart receives daily CSV files from 2,000 stores β€” about 15 million rows per day. She loads them into a lakehouse using a Fabric pipeline, transforms them with a PySpark notebook (cleaning nulls, standardising product codes), and writes the results as Delta tables.

Her Power BI report connects to the SQL analytics endpoint β€” it does not need Spark to run. The finance team queries the same data using SQL. Both read the same Delta files in OneLake.
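The transform step in a scenario like Anita's could be sketched in Spark SQL inside the notebook. All table and column names below are illustrative, not from the lesson:

```sql
-- Hypothetical Silver-layer transform: names and schema are assumptions.
-- Writes a Delta table that the SQL analytics endpoint exposes automatically.
CREATE TABLE IF NOT EXISTS clean_transactions USING DELTA AS
SELECT DISTINCT                                   -- deduplicate repeated rows
    store_id,
    UPPER(TRIM(product_code)) AS product_code,    -- standardise product codes
    CAST(amount AS DECIMAL(10, 2)) AS amount,
    TO_DATE(sale_ts) AS sale_date                 -- parse dates
FROM raw_pos_transactions
WHERE product_code IS NOT NULL;                   -- drop rows with missing codes
```

Once this runs, the finance team's SQL queries and the Power BI report read the same Delta files in OneLake; no copy is made.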

Delta tables: The heart of the lakehouse

Delta tables are not just Parquet files. They add critical capabilities:

ACID transactions

Every write to a Delta table is atomic β€” either the entire write succeeds or nothing changes. No more half-loaded tables after a pipeline failure.

Schema enforcement

Delta tables reject writes that do not match the table’s schema. If your pipeline tries to insert a string into an integer column, the write fails β€” protecting data quality.

Schema evolution

You can add new columns to a Delta table without rewriting existing data. Old rows get null for the new column; new rows include the value.
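As a sketch (assumed table and column names), schema evolution is a single DDL statement in Spark SQL:

```sql
-- Metadata-only change: no existing data files are rewritten.
ALTER TABLE clean_transactions ADD COLUMNS (loyalty_tier STRING);
-- Old rows now read back NULL for loyalty_tier; new writes may supply a value.
```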

Time travel

Delta tables keep a transaction log of every change. You can query a table as it looked at a specific point in time:

```sql
-- What did the sales table look like yesterday?
SELECT * FROM sales TIMESTAMP AS OF '2026-04-20T08:00:00Z'
```

Optimisation commands

| Command | What It Does |
|---|---|
| OPTIMIZE | Compacts small files into larger ones for faster reads |
| VACUUM | Removes old files that are no longer referenced by the transaction log |
| Z-ORDER BY | Co-locates related data on disk for faster filtered queries |

πŸ’‘ Exam tip: OPTIMIZE vs VACUUM

The exam tests whether you know the difference. OPTIMIZE reorganises files for performance β€” it creates new, larger files but keeps the old ones for time travel. VACUUM deletes old files that are outside the retention period (default 7 days). Running VACUUM too aggressively breaks time travel for the deleted versions.

Common trap: β€œWhich command improves query performance?” β†’ OPTIMIZE (not VACUUM). VACUUM saves storage space.
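In Spark SQL the two commands look like this (table and column names are illustrative); note that only the first improves query performance:

```sql
-- Compact small Parquet files and co-locate rows sharing a store_id,
-- so filtered reads scan fewer files. Old files are kept for time travel.
OPTIMIZE clean_transactions ZORDER BY (store_id);

-- Delete files outside the retention window (168 hours = the 7-day default).
-- This saves storage but removes the ability to time-travel past that point.
VACUUM clean_transactions RETAIN 168 HOURS;
```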

The medallion architecture

The medallion architecture (Bronze β†’ Silver β†’ Gold) is the recommended pattern for organising lakehouse data:

| Layer | Purpose | Data Quality | Example (FreshCart) |
|---|---|---|---|
| Bronze | Raw ingestion β€” land data exactly as received | Raw, untransformed | Raw CSV from POS systems, supplier feeds |
| Silver | Cleaned and standardised β€” deduplicated, typed, validated | Cleaned, conformed | Deduplicated transactions, standardised product codes |
| Gold | Business-ready β€” aggregated, modelled for reporting | Curated, trusted | Daily sales by store, product category summaries |

You can implement all three layers in a single lakehouse (using naming conventions like bronze_sales, silver_sales, gold_sales) or across multiple lakehouses in the same workspace.

πŸ’‘ Scenario: Anita's medallion layers

Anita’s FreshCart lakehouse uses three schemas:

  • Bronze: raw_pos_transactions (15M rows/day, raw CSV loaded as-is)
  • Silver: clean_transactions (deduplicated, nulls handled, product codes standardised, dates parsed)
  • Gold: daily_store_sales (aggregated by store and date), product_performance (aggregated by product and week)

Her Power BI semantic model connects to Gold tables only. Analysts who need raw data can query Silver. Nobody touches Bronze directly β€” it exists for auditability and reprocessing.
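The Silver β†’ Gold step in a setup like this might be a scheduled Spark SQL aggregation (table names and schema details are assumptions):

```sql
-- Rebuild the Gold aggregate from the Silver table each day.
CREATE OR REPLACE TABLE daily_store_sales USING DELTA AS
SELECT
    store_id,
    sale_date,
    SUM(amount) AS total_sales,
    COUNT(*)    AS transaction_count
FROM clean_transactions
GROUP BY store_id, sale_date;
```

Because the Gold table is rebuilt from Silver, a bad aggregation can always be reproduced or corrected without touching the raw Bronze data.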

Lakehouse vs the SQL analytics endpoint

A common point of confusion: the lakehouse’s SQL analytics endpoint is NOT the same as a Fabric Warehouse.

The SQL analytics endpoint is a read-only SQL view of lakehouse data β€” not a warehouse.

| Feature | Lakehouse SQL Endpoint | Fabric Warehouse |
|---|---|---|
| Write access | Read-only β€” you cannot INSERT/UPDATE via SQL | Full read-write β€” INSERT, UPDATE, DELETE, MERGE |
| How tables are created | Auto-generated from Delta tables in the lakehouse | Created via SQL DDL (CREATE TABLE) |
| Primary use case | Quick SQL queries, Power BI connections | Complex SQL workloads, stored procedures, full DML |
| Engine | Fabric SQL engine (read-only mode) | Fabric SQL engine (full mode) |
| Spark access | Yes β€” the underlying Delta tables are accessible via Spark | No Spark access β€” data is in the warehouse format |
| Best for | Data engineers who transform with Spark, then serve via SQL | SQL-first teams who prefer writing SQL for everything |

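The write-access difference is easy to demonstrate. A statement like the following (illustrative table name) succeeds in a Fabric Warehouse but is rejected by a lakehouse SQL analytics endpoint, which is read-only:

```sql
-- Full DML: allowed in a Warehouse, fails on a SQL analytics endpoint.
INSERT INTO dbo.daily_store_sales (store_id, sale_date, total_sales)
VALUES (101, '2026-04-20', 15230.50);
```

To change lakehouse data, write through Spark, Dataflows Gen2, or a pipeline instead.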
Question

What two endpoints does every Fabric lakehouse include?

Answer

1. Spark endpoint β€” read/write access using PySpark, Scala, or R notebooks.
2. SQL analytics endpoint β€” read-only SQL access to Delta tables.

Both endpoints read the same underlying Delta files in OneLake.

Question

What is the medallion architecture?

Answer

A data organisation pattern with three layers: Bronze (raw ingestion), Silver (cleaned/standardised), and Gold (business-ready aggregates). Each layer improves data quality. All layers live in lakehouses as Delta tables.

Question

What does the OPTIMIZE command do in a Delta lakehouse?

Answer

OPTIMIZE compacts many small Parquet files into fewer, larger files. This improves read performance because queries scan fewer files. It does NOT delete old files β€” that is VACUUM's job.

Question

Can you run INSERT statements through the lakehouse SQL analytics endpoint?

Answer

No. The SQL analytics endpoint is read-only. To write data, use Spark notebooks, Spark jobs, Dataflows Gen2, or pipelines. For full SQL DML (INSERT, UPDATE, DELETE, MERGE), use a Fabric Warehouse instead.

Knowledge Check

Anita at FreshCart loads 15 million rows of daily POS data into a lakehouse using a PySpark notebook. Her finance team wants to query this data using SQL. What should she tell them?

Knowledge Check

After running daily ingestion for a month, Anita notices her lakehouse queries are slowing down. Investigation shows thousands of small Parquet files (1-5 MB each) in her Delta tables. Which command should she run first?

Knowledge Check

Dr. Sarah at Pacific Health Network discovers that a bug in her pipeline corrupted yesterday's patient data load. She needs to restore the Delta table to its state before the bad load. Which Delta feature enables this?

Next up: Warehouses in Fabric β€” when SQL-first teams need full read-write access with T-SQL stored procedures.

Guided

I learn, I simplify, I share.

© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.