
DP-700 Study Guide

Domain 1: Implement and Manage an Analytics Solution

  • Workspace Settings: Your Fabric Foundation
  • Version Control: Git in Fabric
  • Deployment Pipelines: Dev to Production
  • Access Controls: Who Gets In
  • Data Security: Control Who Sees What
  • Governance: Labels, Endorsement & Audit
  • Orchestration: Pick the Right Tool
  • Pipeline Patterns: Parameters & Expressions

Domain 2: Ingest and Transform Data

  • Delta Lake: The Heart of Fabric Free
  • Loading Patterns: Full, Incremental & Streaming Free
  • Dimensional Modeling: Prep for Analytics Free
  • Data Stores & Tools: Make the Right Choice Free
  • OneLake Shortcuts: Data Without Duplication
  • Mirroring: Real-Time Database Replication
  • PySpark Transformations: Code Your Pipeline
  • Transform Data with SQL & KQL
  • Eventstreams & Spark Streaming: Real-Time Ingestion
  • Real-Time Intelligence: KQL & Windowing

Domain 3: Monitor and Optimize an Analytics Solution

  • Monitoring & Alerts: Catch Problems Early
  • Troubleshoot Pipelines & Dataflows
  • Troubleshoot Notebooks & SQL
  • Troubleshoot Streaming & Shortcuts
  • Optimize Lakehouse Tables: Delta Tuning
  • Optimize Spark: Speed Up Your Code
  • Optimize Pipelines & Warehouses
  • Optimize Streaming: Real-Time Performance

Domain 2: Ingest and Transform Data (~14 min read)

Data Stores & Tools: Make the Right Choice

Choose between lakehouses, warehouses, and KQL databases. Pick the right transformation tool — Dataflows Gen2, notebooks, KQL, or T-SQL — for every scenario.

How do you choose?

☕ Simple explanation

Think of three types of restaurant kitchens.

A lakehouse is a large open kitchen — you can cook anything (structured, semi-structured, unstructured), use any tools (PySpark, SQL, Python), and store ingredients in any container. Flexible but requires a skilled chef.

A warehouse is a precision kitchen — everything is structured, SQL-based, optimised for fast queries and BI reporting. Less flexible but blazingly fast for analytics.

A KQL database (Eventhouse) is a sushi bar — specialised for fast, real-time queries on streaming data. It excels at time-series analytics and log exploration.

The exam tests whether you pick the right kitchen for the meal you’re cooking.

Fabric offers three primary data stores, each designed for different workloads:

  • Lakehouses: Delta Lake tables + unstructured files, accessed via Spark or the SQL analytics endpoint.
  • Warehouses: a T-SQL-native relational store optimised for BI workloads.
  • KQL Databases (Eventhouses): the Real-Time Intelligence store, optimised for streaming and time-series data queried with Kusto Query Language (KQL).

Similarly, four transformation tools each suit different scenarios: Dataflows Gen2 (visual Power Query), Notebooks (PySpark/Scala code), KQL (Kusto queries in Eventhouses), and T-SQL (SQL in warehouses).

Choosing a data store

Lakehouse for engineering, warehouse for BI, KQL database for real-time
| Feature | Lakehouse | Warehouse | KQL Database (Eventhouse) |
| --- | --- | --- | --- |
| Data format | Delta Lake tables + files (CSV, Parquet, JSON, images) | Relational tables (T-SQL DDL) | KQL tables (optimised columnar) |
| Query language | Spark SQL, PySpark, T-SQL (via SQL endpoint — read-only) | T-SQL (full DML — INSERT, UPDATE, DELETE) | KQL (Kusto Query Language) |
| Write method | Spark notebooks, pipelines, Dataflows Gen2 | T-SQL statements, pipelines, Dataflows Gen2 | Eventstreams, KQL ingestion, pipelines |
| Best for | Data engineering — large-scale ETL, ML, semi-structured data | BI analytics — structured data, complex SQL queries, reporting | Real-time analytics — streaming, time-series, log exploration |
| Unstructured data? | Yes — Files/ section stores any file type | No — structured tables only | No — structured time-series data |
| SQL endpoint? | Yes (read-only — SELECT, views) | Full T-SQL (read + write) | No — KQL only |
| Power BI integration | Via SQL endpoint or Direct Lake | Direct SQL connection | Via KQL query sets or dashboards |

Decision patterns (exam-tested)

| Scenario | Best Store | Why |
| --- | --- | --- |
| Store 500M rows of sales data for PySpark transformations and ML training | Lakehouse | Need Spark for ML, large-scale transforms |
| Serve a Power BI dashboard with complex SQL aggregations and stored procedures | Warehouse | T-SQL DML, stored procs, optimised for BI |
| Ingest and query 1M events/second from IoT sensors | KQL Database | Real-time ingestion, time-series KQL |
| Store raw JSON files from an API alongside structured Delta tables | Lakehouse | Files/ section handles unstructured; Tables/ handles structured |
| Build a dimensional model with fact and dimension tables for analysts | Warehouse (or Lakehouse with SQL endpoint) | Both work; warehouse preferred when analysts need full T-SQL |
| Real-time clickstream analysis with 10-second query latency | KQL Database | Sub-second KQL queries on streaming data |
💡 Scenario: Anika's dual-store design

ShopStream uses both a lakehouse and a warehouse:

  • Lakehouse: Raw order JSON files land in Files/. PySpark notebooks parse, validate, and transform them into Delta tables in Tables/. ML models train on these tables.
  • Warehouse: The cleaned, modeled data (star schema) is loaded into the warehouse for the BI team. They write stored procedures for complex revenue calculations and build Power BI reports via DirectQuery or Import mode.

The lakehouse is the engineering workbench. The warehouse is the analytics serving layer.
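The Files/-to-Tables/ flow in Anika's lakehouse can be sketched in plain Python. A real notebook would use PySpark and write Delta tables, but the parse-and-flatten step is the same idea; the payloads and field names below are hypothetical, not ShopStream's actual schema.

```python
import json

# Hypothetical raw order payloads, as they might land in the lakehouse Files/ section.
raw_orders = [
    '{"order_id": "A1", "customer": "zoe", "items": [{"sku": "TEA", "qty": 2}]}',
    '{"order_id": "A2", "customer": "ben", "items": [{"sku": "MUG", "qty": 1}, {"sku": "TEA", "qty": 3}]}',
]

def flatten_order(payload: str) -> list[dict]:
    """Parse one JSON order and emit one flat row per line item —
    the tabular shape you would write to a Delta table in Tables/."""
    order = json.loads(payload)
    return [
        {"order_id": order["order_id"], "customer": order["customer"],
         "sku": item["sku"], "qty": item["qty"]}
        for item in order["items"]
    ]

rows = [row for payload in raw_orders for row in flatten_order(payload)]
print(len(rows))  # 2 nested orders flatten into 3 rows
```

In PySpark the same shape falls out of `spark.read.json` plus an `explode` on the items array; the point is that semi-structured input enters as files and leaves as structured tables, which only the lakehouse supports.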

Choosing a transformation tool

Match the tool to the team's skills and the workload's needs
| Factor | Dataflows Gen2 | Notebooks | KQL | T-SQL |
| --- | --- | --- | --- | --- |
| Interface | Visual (Power Query) | Code (PySpark, SQL, R) | Code (KQL) | Code (SQL) |
| Skill required | Low — drag-and-drop | High — PySpark/Python | Medium — KQL syntax | Medium — SQL |
| Scale | Small to medium | Large (distributed Spark) | Large (columnar engine) | Large (MPP engine) |
| Best for | Simple cleaning from 150+ connectors | Complex transforms, ML, custom logic | Time-series transforms, aggregations | Relational transforms, stored procs |
| Output to | Lakehouse, warehouse, KQL DB | Lakehouse (Delta tables) | KQL Database | Warehouse tables |
| Scheduling | Built-in refresh | Via pipeline notebook activity | Via pipeline or Eventhouse | Via pipeline stored proc activity |

Decision patterns

| Scenario | Best Tool | Why |
| --- | --- | --- |
| Extract from Salesforce, clean, load to lakehouse (no code) | Dataflows Gen2 | 150+ connectors, visual, no code |
| Join 3 Delta tables (1B rows), calculate running averages, write to warehouse | Notebook (PySpark) | Scale + complex windowed calculations |
| Calculate hourly aggregates on streaming sensor data | KQL | Time-series optimised, bin() function |
| Run a stored procedure to update dimension tables in a warehouse | T-SQL | Direct SQL on warehouse, existing proc logic |
| Load CSVs from SharePoint, standardise dates, filter blanks | Dataflows Gen2 | Simple, visual, perfect for SharePoint connector |
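The bin() function mentioned above is KQL's core time-series primitive: it snaps each timestamp to a fixed-width bucket so `summarize` can aggregate per bucket. Here is a minimal plain-Python sketch of the same idea, using hypothetical playback events rather than real Eventhouse data:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical playback events: (timestamp, watch_seconds).
events = [
    (datetime(2026, 1, 5, 9, 12), 40.0),
    (datetime(2026, 1, 5, 9, 47), 80.0),
    (datetime(2026, 1, 5, 10, 3), 30.0),
]

def hourly_average(events):
    """Snap each timestamp to its hour — what KQL's bin(timestamp, 1h) does —
    then average the values in each bucket (KQL: summarize avg(x) by bin(...))."""
    buckets = defaultdict(list)
    for ts, value in events:
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in buckets.items()}

averages = hourly_average(events)
print(averages)  # 09:00 bucket averages 60.0; 10:00 bucket averages 30.0
```

The equivalent KQL would be roughly `Events | summarize avg(watch_seconds) by bin(timestamp, 1h)` — one line, executed on the columnar engine, which is why KQL wins these scenarios at streaming scale.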
💡 Exam tip: 'Choose between' questions

When the exam asks “which tool should you use for data transformation,” look for these clues:

  • “No coding” or “visual” → Dataflows Gen2
  • “Large scale” or “ML” or “PySpark” → Notebook
  • “Time-series” or “streaming” or “KQL” → KQL
  • “Stored procedure” or “T-SQL” or “warehouse” → T-SQL
  • “150+ connectors” or “Power Query” → Dataflows Gen2

If multiple tools could work, choose the simplest one that meets the requirement.
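The clue list above is essentially a keyword lookup, and writing it as one can make the mapping stick. This is a study aid, not exam advice — real questions need judgement, and naive substring matching (e.g. "ml") misfires easily — and the keyword tuples are my own paraphrase of the bullets:

```python
# Clue keywords (paraphrased from the exam-tip bullets) mapped to tools,
# checked in order; first match wins.
CLUES = [
    (("no coding", "visual", "power query", "connectors"), "Dataflows Gen2"),
    (("large scale", "ml", "pyspark"), "Notebook"),
    (("time-series", "streaming", "kql"), "KQL"),
    (("stored procedure", "t-sql", "warehouse"), "T-SQL"),
]

def suggest_tool(scenario: str) -> str:
    text = scenario.lower()
    for keywords, tool in CLUES:
        if any(k in text for k in keywords):
            return tool
    # No clue stood out: pick the simplest tool that could meet the requirement.
    return "Dataflows Gen2"

print(suggest_tool("Join 1B rows with PySpark and train an ML model"))  # → Notebook
```

Note the fallback mirrors the rule above: when several tools could work, prefer the simplest one.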

When to combine stores and tools

Most real-world solutions use multiple stores and tools together:

| Pattern | Flow |
| --- | --- |
| Lakehouse + Warehouse | Raw data → Lakehouse (PySpark transforms) → Warehouse (star schema for BI) |
| Lakehouse + KQL Database | Batch data → Lakehouse; streaming data → KQL Database; both feed dashboards |
| Dataflow + Notebook | Dataflow Gen2 (extract/clean from source) → Notebook (complex transforms at scale) |

Question

What is the key difference between a lakehouse and a warehouse in Fabric?


Answer

Lakehouse: Delta Lake tables + unstructured files, accessed via Spark or SQL endpoint (read-only SQL). Warehouse: T-SQL relational store with full DML (INSERT, UPDATE, DELETE, stored procedures). Lakehouse for engineering; warehouse for BI analytics.


Question

When should you choose a KQL Database (Eventhouse) over a lakehouse?


Answer

When you need real-time ingestion (millions of events/second), sub-second query latency on time-series data, and KQL's time-series functions (bin, summarize, render). For IoT, clickstream, logs, and telemetry.


Question

Can you write to a lakehouse using T-SQL?


Answer

No. The lakehouse's SQL analytics endpoint is read-only (SELECT, views). To write data to a lakehouse, use Spark notebooks, pipelines (Copy activity), or Dataflows Gen2. For full T-SQL DML (INSERT, UPDATE, DELETE), use a warehouse.



Knowledge Check

ShopStream needs to store raw JSON order files from payment gateways alongside structured Delta tables for transformed data. Which Fabric data store should Anika use?

Knowledge Check

A team of business analysts (no Python skills) needs to pull data from Salesforce, rename columns, filter out blank emails, and load the result into a lakehouse. Which transformation tool is most appropriate?

Knowledge Check

Zoe needs to calculate the average video watch time per hour from a stream of 500,000 playback events per minute. The results must be queryable within 10 seconds of the events occurring. Which data store and query language should she use?


Next up: OneLake Shortcuts: Data Without Duplication — access external data as if it were local, without copying a single byte.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.