
DP-600 Study Guide

Domain 1: Maintain a Data Analytics Solution

  • Workspace Access Controls
  • Row-Level & Object-Level Security
  • Sensitivity Labels & Endorsement
  • Git Version Control in Fabric
  • Deployment Pipelines: Dev β†’ Test β†’ Prod
  • Impact Analysis & Dependencies
  • XMLA Endpoint & Reusable Assets

Domain 2: Prepare Data

  • Microsoft Fabric: The Big Picture (Free)
  • Lakehouses: Your Data Foundation (Free)
  • Warehouses in Fabric (Free)
  • Choosing the Right Data Store (Free)
  • Data Connections & OneLake Catalog
  • Shortcuts & OneLake Integration
  • Ingesting Data: Dataflows Gen2 & Pipelines
  • Star Schema Design (Free)
  • SQL Objects: Views, Functions & Stored Procedures
  • Transforming Data: Reshape & Enrich
  • Data Quality & Cleansing
  • Querying with SQL
  • Querying with KQL
  • Querying with DAX

Domain 3: Implement and Manage Semantic Models

  • Semantic Models: Storage Modes
  • Relationships & Advanced Modeling
  • DAX Essentials: Variables & Functions
  • Calculation Groups & Field Parameters
  • Large Models & Composite Models
  • Direct Lake Mode
  • DAX Performance Optimization
  • Incremental Refresh

Domain 2: Prepare Data · Free · ⏱ ~15 min read

Star Schema Design

The star schema is the most important data modelling pattern in DP-600. Fact tables, dimension tables, grain, and denormalization β€” with a worked retail example.

What is a star schema?

β˜• Simple explanation

Think of a receipt and a set of reference cards.

When you buy groceries, your receipt lists every item: product code, quantity, price, store, date, time. That receipt is a fact table β€” it records what happened (the transaction).

But the receipt alone is not very useful for analysis. To answer β€œwhich category sold the most last month?”, you need a product card (name, category, brand), a store card (location, region, size), and a date card (month, quarter, year). These are dimension tables.

A star schema puts the fact table in the centre and surrounds it with dimension tables β€” like a star. It is the foundation of fast analytics and Power BI performance.

A star schema is a dimensional modelling pattern where a central fact table (containing measurable business events) is surrounded by dimension tables (containing descriptive attributes for filtering, grouping, and labelling). The schema is called β€œstar” because the diagram looks like a star β€” the fact table at the centre with dimension tables radiating outward.

Star schemas are the standard for analytical workloads in Microsoft Fabric. Both lakehouses and warehouses benefit from star schemas because they enable: (1) simpler queries with fewer joins, (2) better compression in columnar storage (Parquet/Delta), (3) faster aggregation for Power BI, and (4) cleaner relationships in semantic models.

In DP-600, star schema design appears in two domains: here in Domain 2 (physical implementation in lakehouse/warehouse) and in Domain 3 (relationship implementation in semantic models).

Fact tables vs dimension tables

Facts measure what happened; dimensions describe context
| Feature | Fact Table | Dimension Table |
| --- | --- | --- |
| What it records | Business events: transactions, measurements, metrics | Descriptive attributes: who, what, where, when |
| Rows | Many (millions to billions) | Few (hundreds to millions) |
| Columns | Keys (foreign keys to dimensions) + measures (numbers) | Attributes (text, categories, hierarchies) |
| Changes | Append-mostly: new events added over time | Slowly changing: attributes updated occasionally |
| Examples | Sales transactions, website clicks, sensor readings | Products, customers, stores, dates, employees |
| Grain | The level of detail (one row per transaction, per day, per hour) | One row per entity (one row per product, per customer) |

A worked example: FreshCart’s star schema

Anita at FreshCart designs a star schema for sales analytics. Here is the complete model:

Fact table: fact_sales

| Column | Type | Description |
| --- | --- | --- |
| sale_id | INT | Unique transaction line ID |
| date_key | INT | FK β†’ dim_date |
| store_key | INT | FK β†’ dim_store |
| product_key | INT | FK β†’ dim_product |
| customer_key | INT | FK β†’ dim_customer |
| quantity | INT | Number of items sold (measure) |
| unit_price | DECIMAL | Price per item (measure) |
| total_amount | DECIMAL | quantity Γ— unit_price (measure) |
| discount_amount | DECIMAL | Discount applied (measure) |

Grain: One row per product per transaction (the most detailed level).
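
The column list above maps directly onto DDL. A minimal sketch for a Fabric warehouse (the DECIMAL precisions are illustrative assumptions; key relationships are noted in comments rather than declared as constraints, since Fabric accepts primary and foreign keys only as NOT ENFORCED metadata):

```sql
-- Sketch of the FreshCart fact table: keys + measures only,
-- no descriptive text columns (those belong in dimensions).
CREATE TABLE fact_sales (
    sale_id         INT,            -- unique transaction line ID
    date_key        INT,            -- FK to dim_date
    store_key       INT,            -- FK to dim_store
    product_key     INT,            -- FK to dim_product
    customer_key    INT,            -- FK to dim_customer
    quantity        INT,            -- measure
    unit_price      DECIMAL(10, 2), -- measure
    total_amount    DECIMAL(12, 2), -- measure
    discount_amount DECIMAL(10, 2)  -- measure
);
```

Note the shape: four surrogate keys pointing outward to the dimensions, and every remaining column numeric.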

Dimension tables

dim_date β€” when the sale happened

| Column | Example Values |
| --- | --- |
| date_key | 20260421 |
| full_date | 2026-04-21 |
| day_name | Tuesday |
| month_name | April |
| quarter | Q2 |
| year | 2026 |
| is_weekend | No |
| is_holiday | No |

dim_store β€” where the sale happened

| Column | Example Values |
| --- | --- |
| store_key | 1042 |
| store_name | FreshCart Auckland CBD |
| city | Auckland |
| region | North Island |
| store_type | Metro |
| opening_date | 2019-03-15 |

dim_product β€” what was sold

| Column | Example Values |
| --- | --- |
| product_key | 5678 |
| product_name | Organic Avocado |
| category | Fresh Produce |
| subcategory | Fruits |
| brand | FreshGreen |
| unit_of_measure | Each |

dim_customer β€” who bought it

| Column | Example Values |
| --- | --- |
| customer_key | 90123 |
| customer_name | Jane Smith |
| loyalty_tier | Gold |
| signup_date | 2022-01-10 |
| city | Wellington |

The star shape

  dim_date β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                     β”‚
  dim_store ──── fact_sales ──── dim_product
                     β”‚
  dim_customer β”€β”€β”€β”€β”€β”€β”˜

Every query follows the same pattern: start from the fact table, join to dimensions for context.

-- Total sales by region and month
SELECT
    s.region,
    d.month_name,
    SUM(f.total_amount) AS total_revenue
FROM fact_sales f
JOIN dim_store s ON f.store_key = s.store_key
JOIN dim_date d ON f.date_key = d.date_key
WHERE d.year = 2026
GROUP BY s.region, d.month_name
ORDER BY total_revenue DESC;

Why denormalize?

In transactional databases (OLTP), you normalise data to reduce redundancy β€” separate tables linked by foreign keys. In analytical databases (OLAP), you denormalize β€” merge related attributes into fewer, wider tables.

Normalization vs denormalization

| Approach | Normalised (OLTP) | Denormalised (OLAP / Star Schema) |
| --- | --- | --- |
| Goal | Eliminate redundancy | Optimise query performance |
| Tables | Many narrow tables with joins | Fewer wide tables with redundant attributes |
| Joins needed | Many (complex queries) | Few (simple queries) |
| Write performance | Fast (each fact stored once) | Slower (redundant data updated in multiple places) |
| Read performance | Slower (many joins) | Faster (fewer joins, columnar storage) |
| Best for | Transaction processing | Reporting and analytics |

In the FreshCart example, dim_store already includes city and region β€” even though a normalised design would have separate city and region tables. This redundancy makes queries faster because you join fact_sales to dim_store once instead of joining to store, then city, then region.
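
The difference shows up directly in query shape. A sketch comparing the two designs (the separate store, city, and region tables in the first query are hypothetical, invented here to illustrate what a fully normalised layout would require):

```sql
-- Normalised (hypothetical): a three-table chain just to reach the region
SELECT r.region_name, SUM(f.total_amount) AS total_revenue
FROM fact_sales f
JOIN store s  ON f.store_key = s.store_key
JOIN city c   ON s.city_id   = c.city_id
JOIN region r ON c.region_id = r.region_id
GROUP BY r.region_name;

-- Denormalised star schema: one join, same answer
SELECT s.region, SUM(f.total_amount) AS total_revenue
FROM fact_sales f
JOIN dim_store s ON f.store_key = s.store_key
GROUP BY s.region;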

πŸ’‘ Exam tip: When to denormalize

The exam tests your judgment on denormalization. General rules:

  • Denormalize dimension tables β€” flatten hierarchies into the dimension (city, region, country all in dim_store)
  • Keep fact tables normalised β€” fact tables should contain only keys and measures
  • Create aggregate tables for common high-level queries (daily totals instead of per-transaction)
  • Do NOT denormalize when dimension attributes change frequently β€” updates ripple across all redundant copies

Aggregate tables

For very large fact tables, pre-computing common aggregations dramatically improves performance:

-- Create an aggregate table: daily sales by store
CREATE TABLE agg_daily_store_sales AS
SELECT
    date_key,
    store_key,
    COUNT(*) AS transaction_count,
    SUM(quantity) AS total_units,
    SUM(total_amount) AS total_revenue,
    SUM(discount_amount) AS total_discount
FROM fact_sales
GROUP BY date_key, store_key;

The pay-off is the number of rows each query level has to scan:

| Query Level | Source | Rows (FreshCart) |
| --- | --- | --- |
| Per transaction | fact_sales | 5.4 billion/year |
| Per day per store | agg_daily_store_sales | 730,000/year |
| Per month per region | Custom aggregate | 48/year |

Power BI semantic models can automatically redirect queries to aggregate tables when they are available (via user-defined aggregations).
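
For hand-written SQL, the redirection is manual: point the query at the aggregate instead of the base fact table. A sketch of the same daily-revenue question at both grains:

```sql
-- Against the base fact table: scans billions of transaction rows
SELECT date_key, store_key, SUM(total_amount) AS total_revenue
FROM fact_sales
GROUP BY date_key, store_key;

-- Against the aggregate: same result read from pre-computed daily rows
SELECT date_key, store_key, total_revenue
FROM agg_daily_store_sales;
```

The trade-off is freshness: the aggregate answers only questions at or above its grain, and it must be rebuilt or incrementally maintained as new facts arrive.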

Question

What is a fact table?


Answer

A fact table records measurable business events β€” sales transactions, website clicks, sensor readings. Rows are numerous (millions to billions). Columns contain foreign keys to dimension tables plus numeric measures (quantity, amount, duration).


Question

What is a dimension table?


Answer

A dimension table contains descriptive attributes used for filtering, grouping, and labelling β€” products, customers, stores, dates. Rows are relatively few. Attributes are denormalised (city, region, country in one table) for query performance.


Question

What is the grain of a fact table?


Answer

The grain defines what one row represents β€” the most detailed level of measurement. For FreshCart: one row per product per transaction. Defining grain correctly is the most important design decision because it determines what questions the model can answer.


Question

Why do we denormalize dimension tables in a star schema?


Answer

Denormalization reduces the number of joins needed for analytical queries. Instead of joining store β†’ city β†’ region β†’ country, all attributes live in one dim_store table. This is slower for writes but much faster for reads β€” exactly the trade-off analytics requires.


Knowledge Check

Anita at FreshCart is designing a star schema for sales analytics. She has transaction-level data (one row per item per sale) plus reference data for products, stores, customers, and dates. How should she structure the schema?

Knowledge Check

Raj at Atlas Capital notices that Power BI dashboards showing 'total portfolio value by region' take 30+ seconds to load. The fact table has 2 billion rows at transaction-level grain. The query aggregates by region and month. What is the best optimisation?



Next up: SQL Objects: Views, Functions & Stored Procedures β€” programmable SQL in Fabric warehouses and lakehouse SQL endpoints.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.