Fact Tables, Dimension Tables & Keys

Facts and dimensions: the building blocks

Simple explanation

Think of a receipt from a grocery store.

The receipt shows what happened: you bought 3 avocados for $6.00 at 2:30 PM. That’s a fact — a measurable event.

But to understand the receipt, you need context: which product (avocado), which store (Queen Street), when (Tuesday 15 April), who (Customer #4521). Those are dimensions — the “who, what, where, when” that describe the facts.

In Power BI, you organise your data into fact tables (the events with numbers) and dimension tables (the context that describes them). This pattern is called a star schema — and it’s the foundation of every well-designed Power BI model.

Fact tables vs dimension tables

Fact tables hold 'what happened'; dimension tables hold 'who, what, where, when'
Feature	Fact Table	Dimension Table
Contains	Measurable events — numbers you aggregate	Descriptive attributes — text, categories, hierarchies
Typical columns	Foreign keys + numeric measures (Revenue, Quantity, Cost)	Primary key + descriptive columns (Name, Category, Address)
Row count	Very high (millions of transactions)	Low to moderate (hundreds to thousands)
Changes	New rows added frequently (new sales, new orders)	Changes slowly (products updated, customers moved)
Examples	Sales, Orders, Appointments, Production, Web Visits	Products, Customers, Dates, Stores, Employees

Building tables in Power Query

Riley at Coastal Fresh (🛒) receives a single flat table from her point-of-sale system with 50 columns and 2 million rows. She needs to split it into a star schema.

Step 1: Identify the fact table

The fact table is the event — each row is a sale. She keeps:

OrderID (unique identifier)
ProductID (foreign key → Products)
StoreID (foreign key → Stores)
OrderDate (foreign key → Date)
Quantity (measure)
Revenue (measure)
Cost (measure)

Step 2: Extract dimension tables

For each descriptive group, she creates a new query:

Products dimension: Reference the original query → select ProductID, ProductName, Category, Supplier → Remove Duplicates on ProductID

Stores dimension: Reference the original query → select StoreID, StoreName, Region, Address → Remove Duplicates on StoreID

Date dimension: She’ll create this separately (covered in the Date Tables module) — a dedicated calendar table covering all dates in her data range.

Step 3: Remove descriptive columns from the fact table

Now that products and stores have their own tables, Riley removes ProductName, Category, Supplier, StoreName, Region, and Address from the fact table. She keeps only the foreign keys (ProductID, StoreID) and the measures (Quantity, Revenue, Cost).

Why not just use the flat table?

A flat table with 50 columns and 2 million rows works — Power BI can handle it. But it has problems:

Larger file size: ProductName “Organic Avocado” stored 50,000 times instead of once
Slower performance: VertiPaq compresses dimension columns better when they’re in separate tables
Harder DAX: Filter context doesn’t flow as intuitively in a flat table
Maintenance headaches: Updating a product category means updating every row that references it

Star schema solves all of these. The exam strongly expects you to recognise and build star schemas.

Creating appropriate keys

Every relationship in Power BI needs a key — a column that uniquely identifies rows on one side of the relationship.

Key types

Key Type	What It Is	Example
Primary key	A column with unique values in a dimension table	ProductID in the Products table
Foreign key	A column in the fact table that references a primary key	ProductID in the Sales table
Surrogate key	An artificial integer key (1, 2, 3…) created when no natural key exists	CustomerKey added via Index Column
Composite key	A key made from combining two or more columns	Region + StoreNumber → unique StoreID

Creating a surrogate key in Power Query

Dr. Ethan Moss at Bayview Medical Centre (🏥) has a patient referrals table where no single column uniquely identifies a record. He creates a surrogate key:

Add Column → Index Column → From 1
Rename the column to ReferralKey
This gives each row a unique identifier — useful for deduplication and operational tracking. For relationships, the key columns will typically be the foreign keys linking to dimension tables (DoctorID, PatientID, etc.), not the surrogate key itself.

Creating a composite key

Kenji at Apex Manufacturing (🏭) has factory data where FactoryCode + LineNumber together form a unique identifier, but neither alone is unique.

Select both columns → Add Column → Merge Columns
Choose a separator (e.g., hyphen)
Result: “TKY-01”, “TKY-02”, “SHA-01” — now each row has a unique key

Exam tip: key validation

Before creating a relationship, verify your key is truly unique in the dimension table. Use Column Distribution (from the profiling module) — the distinct count should equal the row count.

If it doesn’t, you have duplicates. Common causes:

Multiple entries for the same entity (fix: Remove Duplicates)
The wrong column was chosen as the key
You need a composite key instead of a single column

Knowledge check

Question

What is a star schema?

Click or press Enter to reveal answer

Answer

A data model pattern where a central fact table (measurable events) connects to surrounding dimension tables (descriptive context). Called 'star' because the diagram looks like a star with the fact table in the centre.

Click to flip back

Question

What's the difference between a primary key and a foreign key?

Click or press Enter to reveal answer

Answer

A primary key uniquely identifies each row in a dimension table (e.g., ProductID in Products). A foreign key is the matching column in the fact table that references the primary key (e.g., ProductID in Sales).

Click to flip back

Question

When do you need a surrogate key?

Click or press Enter to reveal answer

Answer

When no natural column uniquely identifies rows in your data. Create one using Add Column → Index Column in Power Query. Common with data from flat files or APIs.

Click to flip back

Knowledge Check

Riley has a flat sales table with 2 million rows and 50 columns including ProductName, Category, StoreName, and Region. What's the best approach for building a Power BI model?

Knowledge Check

Dr. Ethan has a referrals table where no single column uniquely identifies a row. What should he create?

Next up: Configure Data Loading — control how Power Query loads data into your model.