
DP-700 Study Guide

Domain 1: Implement and Manage an Analytics Solution

  • Workspace Settings: Your Fabric Foundation
  • Version Control: Git in Fabric
  • Deployment Pipelines: Dev to Production
  • Access Controls: Who Gets In
  • Data Security: Control Who Sees What
  • Governance: Labels, Endorsement & Audit
  • Orchestration: Pick the Right Tool
  • Pipeline Patterns: Parameters & Expressions

Domain 2: Ingest and Transform Data

  • Delta Lake: The Heart of Fabric (Free)
  • Loading Patterns: Full, Incremental & Streaming (Free)
  • Dimensional Modeling: Prep for Analytics (Free)
  • Data Stores & Tools: Make the Right Choice (Free)
  • OneLake Shortcuts: Data Without Duplication
  • Mirroring: Real-Time Database Replication
  • PySpark Transformations: Code Your Pipeline
  • Transform Data with SQL & KQL
  • Eventstreams & Spark Streaming: Real-Time Ingestion
  • Real-Time Intelligence: KQL & Windowing

Domain 3: Monitor and Optimize an Analytics Solution

  • Monitoring & Alerts: Catch Problems Early
  • Troubleshoot Pipelines & Dataflows
  • Troubleshoot Notebooks & SQL
  • Troubleshoot Streaming & Shortcuts
  • Optimize Lakehouse Tables: Delta Tuning
  • Optimize Spark: Speed Up Your Code
  • Optimize Pipelines & Warehouses
  • Optimize Streaming: Real-Time Performance

Domain 2: Ingest and Transform Data · Free · ⏱ ~13 min read

Dimensional Modeling: Prep for Analytics

Design star schemas for Fabric lakehouses. Build fact tables, dimension tables, and handle slowly changing dimensions (SCD Type 1 and Type 2).

What is dimensional modeling?

☕ Simple explanation

Think of a receipt from a shop.

The receipt has facts: what you bought, how many, the price. But it also references things that don't change often: the store (its address, region, manager), the product (its category, brand, weight), and the date (which quarter, which financial year).

A dimensional model separates these two things: the fact table (the receipt: big, event-driven, grows every day) and the dimension tables (the reference data: store, product, date; smaller, and slower to change).

This design makes analytics fast because Power BI and SQL can slice facts by dimensions: "Show me revenue by region, by product category, by quarter."

Dimensional modeling is a data warehouse design technique that organises data into fact tables (quantitative event records with measures and foreign keys) and dimension tables (descriptive reference data with attributes for filtering and grouping). The most common layout is the star schema, where a central fact table connects to multiple dimension tables via surrogate keys.

In Fabric, dimensional models power both the SQL analytics endpoint and Power BI semantic models. Proper dimensional design enables efficient queries, clean report filters, and manageable incremental loads.

Star schema design

A star schema has one fact table at the centre surrounded by dimension tables.

                ┌──────────────┐
                │  DimProduct  │
                └──────┬───────┘
                       │
┌──────────┐    ┌──────┴──────┐    ┌──────────────┐
│ DimStore ├────┤  FactSales  ├────┤ DimCustomer  │
└──────────┘    └──────┬──────┘    └──────────────┘
                       │
                ┌──────┴───────┐
                │   DimDate    │
                └──────────────┘

Fact tables

Characteristic  | Detail
Contains        | Measures (revenue, quantity, cost) + foreign keys to dimensions
Grain           | One row per event (one sale, one production run, one order line)
Size            | Large: millions to billions of rows
Growth          | Appended daily/hourly (new events)
Typical columns | date_key, product_key, store_key, customer_key, quantity, revenue, cost

Dimension tables

Characteristic  | Detail
Contains        | Descriptive attributes for filtering and grouping
Size            | Small to medium: thousands to low millions of rows
Growth          | Slow: new products, new stores, address changes
Typical columns | product_key, product_name, category, brand, weight, is_active

💡 Scenario: Carlos designs a production star schema

Carlos models Precision Manufacturing’s production data:

FactProduction (grain: one row per production batch)

  • date_key, factory_key, product_key, machine_key
  • Measures: units_produced, units_defective, runtime_minutes, energy_kwh

Dimensions: DimDate, DimFactory (name, region, capacity), DimProduct (name, category, weight), DimMachine (type, installation_date, maintenance_due)

Power BI can now slice defect rates by factory, by product category, by quarter, without complex joins at query time.
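To make the fact/dimension split concrete, here is a minimal sketch in plain Python. The table and column names follow Carlos's scenario above; the data values are invented, and in Fabric these would be Delta tables queried with SQL or PySpark rather than dicts.

```python
from collections import defaultdict

# Dimension: small reference data, keyed by surrogate key (invented sample rows)
dim_product = {
    1: {"product_name": "Widget A", "category": "Standard"},
    2: {"product_name": "Widget B", "category": "Premium"},
}

# Fact: grain is one row per production batch; foreign keys + measures
fact_production = [
    {"product_key": 1, "units_produced": 500, "units_defective": 5},
    {"product_key": 2, "units_produced": 300, "units_defective": 9},
    {"product_key": 1, "units_produced": 200, "units_defective": 1},
]

# "Slice defect rate by product category": join each fact row to its
# dimension attributes, then aggregate by the dimension attribute.
produced, defective = defaultdict(int), defaultdict(int)
for row in fact_production:
    category = dim_product[row["product_key"]]["category"]
    produced[category] += row["units_produced"]
    defective[category] += row["units_defective"]

defect_rate = {cat: defective[cat] / produced[cat] for cat in produced}
```

The point of the pattern: the fact table stays narrow (keys + measures), and every descriptive attribute you might group by lives once in a dimension.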

Slowly Changing Dimensions (SCD)

Dimension data changes over time. A customer moves to a new city. A product changes category. How you handle these changes is called the Slowly Changing Dimension strategy.

Type 1 overwrites; Type 2 preserves history
Feature                | SCD Type 1                                               | SCD Type 2
What happens on change | Overwrite the old value with the new value               | Keep the old row, add a new row with the new value
History preserved?     | No: only the current value exists                        | Yes: both old and new values exist with date ranges
Extra columns needed   | None                                                     | effective_date, end_date, is_current flag
Table size impact      | No growth from changes                                   | Grows with each change (one new row per change)
Use when               | History doesn't matter (fix a typo, update a phone number) | History matters (customer moved: past sales must stay attributed to the old region)
Implementation         | MERGE with WHEN MATCHED THEN UPDATE                      | MERGE with UPDATE (expire old row) + INSERT (new row)

SCD Type 1 with Delta MERGE

-- Overwrite: update existing rows, insert new ones
MERGE INTO DimCustomer AS target
USING staging_customers AS source
ON target.customer_id = source.customer_id
WHEN MATCHED THEN
    UPDATE SET target.city = source.city,
               target.phone = source.phone
WHEN NOT MATCHED THEN
    INSERT (customer_id, name, city, phone)
    VALUES (source.customer_id, source.name, source.city, source.phone)
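For intuition, the same Type 1 upsert logic can be sketched in plain Python, with a dict keyed by customer_id standing in for the Delta table (sample data invented):

```python
# Current dimension state (invented data)
dim_customer = {
    101: {"name": "Ada", "city": "Leeds", "phone": "111"},
}
# Incoming staging rows
staging = [
    {"customer_id": 101, "name": "Ada", "city": "York", "phone": "222"},  # matched
    {"customer_id": 102, "name": "Bo", "city": "Hull", "phone": "333"},   # not matched
]

for src in staging:
    key = src["customer_id"]
    if key in dim_customer:   # WHEN MATCHED THEN UPDATE: old value is lost
        dim_customer[key]["city"] = src["city"]
        dim_customer[key]["phone"] = src["phone"]
    else:                     # WHEN NOT MATCHED THEN INSERT
        dim_customer[key] = {k: src[k] for k in ("name", "city", "phone")}
```

After the run, customer 101's city is simply "York"; nothing records that it was ever "Leeds". That is exactly the trade-off Type 1 makes.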

SCD Type 2 with Delta MERGE

-- Step 1: Expire existing rows that have changes
MERGE INTO DimCustomer AS target
USING staging_customers AS source
ON target.customer_id = source.customer_id
   AND target.is_current = true
WHEN MATCHED AND (target.city != source.city OR target.phone != source.phone) THEN
    UPDATE SET target.is_current = false,
               target.end_date = current_date()

-- Step 2: Insert new current rows for the records expired in Step 1
INSERT INTO DimCustomer (customer_id, name, city, phone, effective_date, end_date, is_current)
SELECT customer_id, name, city, phone, current_date(), NULL, true
FROM staging_customers s
WHERE EXISTS (
    SELECT 1 FROM DimCustomer d
    WHERE d.customer_id = s.customer_id
      AND d.is_current = false
      AND d.end_date = current_date()
)
-- Note: brand-new customers (no existing dimension row) are not covered by
-- either step and still need their own insert
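The expire-and-insert flow is easier to see in a plain Python sketch (a list of dicts stands in for the Delta table; column names mirror the SQL above, data invented):

```python
from datetime import date

# Current dimension state: one current row for customer 101
dim_customer = [
    {"customer_id": 101, "city": "Leeds", "effective_date": date(2024, 1, 1),
     "end_date": None, "is_current": True},
]
staging = [{"customer_id": 101, "city": "York"}]
today = date(2026, 1, 15)

for src in staging:
    current = next((r for r in dim_customer
                    if r["customer_id"] == src["customer_id"] and r["is_current"]),
                   None)
    if current and current["city"] != src["city"]:
        # Step 1: expire the existing current row (keep it for history)
        current["is_current"] = False
        current["end_date"] = today
        # Step 2: insert a new current row carrying the new value
        dim_customer.append({
            "customer_id": src["customer_id"], "city": src["city"],
            "effective_date": today, "end_date": None, "is_current": True,
        })
```

The table now holds two rows for customer 101: the expired "Leeds" row with a closed date range, and a new current "York" row. Historical facts join to whichever row was current at the time.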
💡 Exam tip: Choosing SCD type

The exam often describes a scenario and asks which SCD type to use:

  • "A customer changes their email address; we don't need to track old emails" → Type 1 (overwrite)
  • "A customer moves to a new city; past sales should be attributed to the old city" → Type 2 (preserve history)
  • "Fix a misspelled product name" → Type 1 (correction, not a meaningful change)
  • "A product changes category; reports should show which category it was in at the time of each sale" → Type 2

The deciding question: "Does the old value need to be associated with historical facts?" If yes → Type 2.

Denormalization

In operational databases, data is normalised (split across many tables to avoid duplication). For analytics, you denormalise: flatten joins into fewer, wider tables for faster queries.

Normalised (source)                         | Denormalised (lakehouse)
Orders → OrderLines → Products → Categories | FactSales with product_name, category_name already joined
Employees → Departments → Regions           | DimEmployee with department_name, region_name included

Denormalization happens during transformation (PySpark joins or SQL views): you take normalised source data and produce star schema tables.
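As a minimal sketch of that flattening step, here is the Orders → Products → Categories example in plain Python (names and sample data are illustrative; in Fabric this would be a PySpark join or SQL view):

```python
# Normalised source tables (invented sample data)
categories = {1: "Tools"}                                        # category_id -> name
products = {10: {"product_name": "Hammer", "category_id": 1}}    # product_id -> row
order_lines = [{"order_id": 500, "product_id": 10, "quantity": 3, "revenue": 29.97}]

# Denormalise: walk each order line and join in the product and
# category attributes so the fact table is wide and join-free.
fact_sales = []
for line in order_lines:
    product = products[line["product_id"]]
    fact_sales.append({
        "order_id": line["order_id"],
        "quantity": line["quantity"],
        "revenue": line["revenue"],
        # joined-in attributes: reports need no extra joins at query time
        "product_name": product["product_name"],
        "category_name": categories[product["category_id"]],
    })
```

Two lookups at load time replace two joins at every query, which is the whole point of denormalising for analytics.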


Question

What is the difference between a fact table and a dimension table?


Answer

Fact tables contain quantitative measures (revenue, quantity) and foreign keys to dimensions. They're large and grow with events. Dimension tables contain descriptive attributes (product name, region) for filtering and grouping. They're smaller and change slowly.


Question

When should you use SCD Type 2 instead of Type 1?


Answer

Use Type 2 when history matters: when you need to associate old dimension values with historical facts. Example: a customer moves cities, and past sales should still be attributed to the old city. Type 1 overwrites (no history).


Question

What is denormalization in data engineering?


Answer

Flattening joins from normalised source tables into fewer, wider tables optimised for analytics. Example: joining Orders + Products + Categories into a single FactSales table with product_name and category_name included.



Knowledge Check

A product in Precision Manufacturing changes from the 'Standard' category to 'Premium'. Historical reports must show the product in the 'Standard' category for past production data and 'Premium' for current data. Which SCD type should Carlos use?

Knowledge Check

An analyst asks Carlos: 'Why is the DimCustomer table in the lakehouse wider (more columns) than the Customer table in the source SAP system?' What is the most accurate answer?

🎬 Video coming soon

Next up: Data Stores & Tools: Make the Right Choice. Decide between lakehouses, warehouses, and KQL databases, and pick the best transformation tool.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.