
DP-600 Study Guide

Domain 1: Maintain a Data Analytics Solution

  • Workspace Access Controls
  • Row-Level & Object-Level Security
  • Sensitivity Labels & Endorsement
  • Git Version Control in Fabric
  • Deployment Pipelines: Dev → Test → Prod
  • Impact Analysis & Dependencies
  • XMLA Endpoint & Reusable Assets

Domain 2: Prepare Data

  • Microsoft Fabric: The Big Picture Free
  • Lakehouses: Your Data Foundation Free
  • Warehouses in Fabric Free
  • Choosing the Right Data Store Free
  • Data Connections & OneLake Catalog
  • Shortcuts & OneLake Integration
  • Ingesting Data: Dataflows Gen2 & Pipelines
  • Star Schema Design Free
  • SQL Objects: Views, Functions & Stored Procedures
  • Transforming Data: Reshape & Enrich
  • Data Quality & Cleansing
  • Querying with SQL
  • Querying with KQL
  • Querying with DAX

Domain 3: Implement and Manage Semantic Models

  • Semantic Models: Storage Modes
  • Relationships & Advanced Modeling
  • DAX Essentials: Variables & Functions
  • Calculation Groups & Field Parameters
  • Large Models & Composite Models
  • Direct Lake Mode
  • DAX Performance Optimization
  • Incremental Refresh

Domain 2: Prepare Data · Premium · ⏱ ~14 min read

Transforming Data: Reshape & Enrich

Shape raw data into analytical gold. Add columns, merge tables, denormalize hierarchies, and build aggregate layers, with SQL and Spark examples.

What does "transform" mean?

☕ Simple explanation

Think of transforming data like preparing ingredients for a meal.

Raw ingredients arrive from the farm: unpeeled, unsorted, some bruised. Before you can cook (analyse), you need to wash, peel, chop, and organise. That is data transformation: taking raw data and reshaping it into something ready for analysis.

Common transformations: adding a "full name" column from first + last name (enriching), combining two tables into one (merging), flattening hierarchies into a single table (denormalizing), and pre-computing totals (aggregating).

Data transformation in Fabric converts raw ingested data into analysis-ready structures. The exam tests four specific transformation skills: enriching (adding computed columns or lookup tables), denormalizing (flattening normalized structures for query performance), aggregating (pre-computing summaries), and merging/joining (combining data from multiple sources).

Transformations can be executed in SQL (warehouse stored procedures, CTAS, INSERT…SELECT), PySpark (notebooks with DataFrame operations), or Dataflows Gen2 (Power Query visual transformations).

Enriching data: Adding columns and tables

Enrichment adds business context to raw data.

Adding computed columns (SQL)

-- Add a profit margin column to the product summary
ALTER TABLE dbo.product_summary
ADD margin_pct AS (revenue - cost) / NULLIF(revenue, 0);

-- Add a fiscal quarter based on date (July-start fiscal year: July = Q1, April = Q4)
SELECT *,
    CASE
        WHEN MONTH(order_date) >= 7 THEN 'FY' + CAST(YEAR(order_date)+1 AS VARCHAR) + '-Q' + CAST((MONTH(order_date)-7)/3 + 1 AS VARCHAR)
        ELSE 'FY' + CAST(YEAR(order_date) AS VARCHAR) + '-Q' + CAST((MONTH(order_date)+5)/3 + 1 AS VARCHAR)
    END AS fiscal_quarter
FROM dbo.fact_sales;
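If you want to sanity-check that CASE expression outside the warehouse, the same logic can be mirrored in a few lines of plain Python (a hypothetical helper, not part of the SQL above):

```python
from datetime import date

def fiscal_quarter(d: date) -> str:
    """Label a date with its fiscal quarter for a July-start fiscal year.

    Mirrors the SQL CASE expression: July-December belong to the next
    calendar year's fiscal label (FY = year + 1, Q1 = Jul-Sep), while
    January-June belong to the current one (Q3 = Jan-Mar, Q4 = Apr-Jun).
    """
    if d.month >= 7:
        return f"FY{d.year + 1}-Q{(d.month - 7) // 3 + 1}"
    return f"FY{d.year}-Q{(d.month + 5) // 3 + 1}"

print(fiscal_quarter(date(2024, 7, 15)))  # July 2024 -> FY2025-Q1
print(fiscal_quarter(date(2025, 4, 30)))  # April 2025 -> FY2025-Q4
```

Tracing a couple of dates like this is a quick way to confirm the integer-division arithmetic before shipping it in a warehouse view.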

Adding computed columns (PySpark)

from pyspark.sql.functions import col, when

# Add a loyalty segment based on purchase count
df_enriched = df_customers.withColumn(
    "loyalty_segment",
    when(col("total_purchases") >= 100, "Platinum")
    .when(col("total_purchases") >= 50, "Gold")
    .when(col("total_purchases") >= 10, "Silver")
    .otherwise("Bronze")
)

Adding lookup tables

Sometimes enrichment means adding an entirely new table that provides context:

-- Create a holiday calendar dimension for seasonal analysis
CREATE TABLE dbo.dim_holidays (
    date_key INT,
    holiday_name NVARCHAR(100),
    is_public_holiday BIT
);
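Conceptually, lookup enrichment is just a keyed join: each fact row picks up extra attributes from the dimension. A minimal in-memory sketch of the idea (hypothetical `holidays` data, plain Python rather than SQL):

```python
# Hypothetical holiday lookup keyed by date_key (YYYYMMDD integers).
holidays = {20251225: "Christmas Day", 20260101: "New Year's Day"}

sales = [
    {"date_key": 20251225, "revenue": 900.0},
    {"date_key": 20251226, "revenue": 400.0},
]

# Enrich each sale with holiday context; non-holiday rows get None / False.
enriched = [
    {**row,
     "holiday_name": holidays.get(row["date_key"]),
     "is_holiday": row["date_key"] in holidays}
    for row in sales
]
```

In the warehouse, the equivalent is a LEFT JOIN from the fact table to dim_holidays so non-holiday dates are preserved.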

Denormalizing data

Denormalization flattens normalized tables into fewer, wider tables: the standard pattern for moving from the Silver to the Gold layer.

Example: Flattening a product hierarchy

Before (normalised: 3 tables):

product → subcategory → category

After (denormalised: 1 table):

CREATE TABLE dbo.dim_product_denorm AS
SELECT
    p.product_id,
    p.product_name,
    p.unit_price,
    sc.subcategory_name,
    c.category_name,
    c.department
FROM dbo.products p
JOIN dbo.subcategories sc ON p.subcategory_id = sc.subcategory_id
JOIN dbo.categories c ON sc.category_id = c.category_id;

Now queries need only one join instead of three.

💡 Exam tip: Denormalization trade-offs

The exam tests whether you understand the trade-off:

  • Pro: Faster reads (fewer joins, simpler queries, better Power BI performance)
  • Pro: Simpler semantic models (relationships are cleaner in a star schema)
  • Con: Data redundancy (the category name is repeated on every product row)
  • Con: Update complexity (if a category name changes, every product row needs updating)

For analytical workloads (DP-600's focus), the pros almost always outweigh the cons. Denormalize your dimensions.
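The flattening logic behind that CTAS can also be sketched in plain Python to make the trade-off concrete. This hypothetical `denormalize` helper (in-memory stand-ins for the three tables) repeats category attributes on every product row in exchange for join-free reads:

```python
# Hypothetical in-memory versions of the three normalised tables.
products = [
    {"product_id": 1, "product_name": "Oat Milk", "subcategory_id": 10},
    {"product_id": 2, "product_name": "Espresso Beans", "subcategory_id": 11},
]
subcategories = {
    10: {"subcategory_name": "Dairy Alternatives", "category_id": 100},
    11: {"subcategory_name": "Coffee", "category_id": 101},
}
categories = {
    100: {"category_name": "Beverages", "department": "Grocery"},
    101: {"category_name": "Beverages", "department": "Grocery"},
}

def denormalize(products, subcategories, categories):
    """Flatten product -> subcategory -> category into one wide row per product."""
    rows = []
    for p in products:
        sc = subcategories[p["subcategory_id"]]
        c = categories[sc["category_id"]]
        rows.append({
            "product_id": p["product_id"],
            "product_name": p["product_name"],
            "subcategory_name": sc["subcategory_name"],
            # Repeated on every product row: the redundancy trade-off.
            "category_name": c["category_name"],
            "department": c["department"],
        })
    return rows

flat = denormalize(products, subcategories, categories)
```

Every query against `flat` now reads a single structure, which is exactly what the one-join dim_product_denorm table buys you in the warehouse.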

Aggregating data

Aggregation pre-computes summaries to speed up common queries.

SQL aggregation

-- Create a daily aggregate from transaction-level data
CREATE TABLE dbo.agg_daily_sales AS
SELECT
    date_key,
    store_key,
    product_key,
    SUM(quantity) AS total_quantity,
    SUM(total_amount) AS total_revenue,
    COUNT(*) AS transaction_count,
    AVG(total_amount) AS avg_transaction_value
FROM dbo.fact_sales
GROUP BY date_key, store_key, product_key;

PySpark aggregation

from pyspark.sql import functions as F

# Daily aggregate by store
df_daily = df_sales.groupBy("date_key", "store_key") \
    .agg(
        F.sum("total_amount").alias("total_revenue"),
        F.count("sale_id").alias("transaction_count"),
        F.avg("total_amount").alias("avg_transaction")
    )

# Write to a Gold-layer Delta table
df_daily.write.format("delta").mode("overwrite").saveAsTable("gold_daily_store_sales")

Merging and joining data

Combining data from multiple sources is one of the most common transformation tasks.

Join types

Join            | What It Returns                                             | Use Case
----------------|-------------------------------------------------------------|----------------------------------------------
INNER JOIN      | Only rows with matches in both tables                       | Match sales to products
LEFT JOIN       | All rows from left + matching from right (NULL if no match) | All customers, even those with no sales
FULL OUTER JOIN | All rows from both tables                                   | Reconciliation: find unmatched records
CROSS JOIN      | Every combination of rows                                   | Generate a date-store matrix for gap analysis
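The INNER vs LEFT distinction is the one the exam leans on most, and it is easy to demonstrate with a tiny in-memory sketch (hypothetical `customers` and `sales` rows, plain Python rather than SQL):

```python
customers = [{"cust_id": 1, "name": "Ada"}, {"cust_id": 2, "name": "Raj"}]
sales = [{"cust_id": 1, "amount": 50.0}]

# Index the right-hand side by the join key, as a hash join would.
sales_by_cust = {s["cust_id"]: s["amount"] for s in sales}

# INNER JOIN: only customers with at least one matching sale survive.
inner = [{**c, "amount": sales_by_cust[c["cust_id"]]}
         for c in customers if c["cust_id"] in sales_by_cust]

# LEFT JOIN: all customers survive; amount is None (NULL) where no sale matches.
left = [{**c, "amount": sales_by_cust.get(c["cust_id"])}
        for c in customers]
```

Raj has no sales, so he disappears from the INNER result but appears in the LEFT result with a NULL amount, which is exactly the "all customers, even those with no sales" use case above.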

MERGE (upsert pattern)

MERGE dbo.dim_product AS target
USING staging.new_products AS source
ON target.product_id = source.product_id
WHEN MATCHED THEN
    UPDATE SET target.product_name = source.product_name,
               target.unit_price = source.unit_price
WHEN NOT MATCHED THEN
    INSERT (product_id, product_name, category, unit_price)
    VALUES (source.product_id, source.product_name, source.category, source.unit_price);

💡 Scenario: Anita merges supplier updates

Anita at FreshCart receives weekly product updates from suppliers: some are new products, some are price changes. She uses MERGE to handle both in one operation:

  • WHEN MATCHED: Update the price and description
  • WHEN NOT MATCHED: Insert the new product
  • WHEN NOT MATCHED BY SOURCE: (Optional) Flag discontinued products

This single MERGE statement replaces what would otherwise be a three-step process: check if exists, update or insert, flag missing.
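The semantics of that three-branch MERGE can be sketched in plain Python over dicts keyed by product_id (a hypothetical `merge_upsert` helper, not Fabric code):

```python
def merge_upsert(target: dict, source: dict) -> dict:
    """Sketch of SQL MERGE semantics over dicts keyed by product_id.

    WHEN MATCHED: overwrite target attributes with the source row.
    WHEN NOT MATCHED: insert the source row.
    WHEN NOT MATCHED BY SOURCE: flag the target row as discontinued.
    """
    result = {}
    for pid, row in target.items():
        if pid in source:
            result[pid] = {**row, **source[pid]}         # matched: update
        else:
            result[pid] = {**row, "discontinued": True}  # not matched by source: flag
    for pid, row in source.items():
        if pid not in target:
            result[pid] = dict(row)                      # not matched: insert
    return result

target = {1: {"name": "Tea", "price": 2.0}, 2: {"name": "Juice", "price": 3.0}}
source = {1: {"name": "Tea", "price": 2.5}, 3: {"name": "Kombucha", "price": 4.0}}
merged = merge_upsert(target, source)
```

One pass over each side covers all three branches, which is why MERGE can replace the check-then-update-or-insert loop.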

Question

What is the difference between enriching and denormalizing?


Answer

Enriching adds NEW information (computed columns, lookup tables). Denormalizing flattens EXISTING information (merging related tables into fewer, wider tables). Both improve analytical usability, but they solve different problems.


Question

What does the MERGE statement do in SQL?


Answer

MERGE combines INSERT, UPDATE, and DELETE into one atomic operation. It compares a source dataset against a target table and applies different actions based on matches: update existing rows, insert new rows, optionally delete/flag missing rows.


Question

Why create aggregate tables?


Answer

Aggregate tables pre-compute common summaries (daily totals, monthly averages) so queries do not need to scan billions of transaction-level rows. They dramatically improve dashboard performance while the detail table remains available for drill-down.


Knowledge Check

Anita at FreshCart has a normalised product hierarchy: product → subcategory → category → department. A Power BI report that shows 'revenue by department' requires three joins. What should Anita do to improve performance?

Knowledge Check

Raj at Atlas Capital receives daily position updates from a trading platform. Some records are new positions; others are updates to existing positions. He needs a single operation that inserts new records and updates existing ones. Which SQL pattern should he use?


Next up: Data Quality & Cleansing: find and fix duplicates, missing values, type mismatches, and other data quality issues.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.