Star Schema Design
The star schema is the most important data modelling pattern in DP-600: fact tables, dimension tables, grain, and denormalisation, with a worked retail example.
What is a star schema?
Think of a receipt and a set of reference cards.
When you buy groceries, your receipt lists every item: product code, quantity, price, store, date, time. That receipt is a fact table – it records what happened (the transaction).
But the receipt alone is not very useful for analysis. To answer "which category sold the most last month?", you need a product card (name, category, brand), a store card (location, region, size), and a date card (month, quarter, year). These are dimension tables.
A star schema puts the fact table in the centre and surrounds it with dimension tables – like a star. It is the foundation of fast analytics and Power BI performance.
Fact tables vs dimension tables
| Feature | Fact Table | Dimension Table |
|---|---|---|
| What it records | Business events – transactions, measurements, metrics | Descriptive attributes – who, what, where, when |
| Rows | Many (millions to billions) | Few (hundreds to millions) |
| Columns | Keys (foreign keys to dimensions) + measures (numbers) | Attributes (text, categories, hierarchies) |
| Changes | Append-mostly – new events added over time | Slowly changing – attributes updated occasionally |
| Examples | Sales transactions, website clicks, sensor readings | Products, customers, stores, dates, employees |
| Grain | The level of detail (one row per transaction, per day, per hour) | One row per entity (one row per product, per customer) |
A worked example: FreshCartβs star schema
Anita at FreshCart designs a star schema for sales analytics. Here is the complete model:
Fact table: fact_sales
| Column | Type | Description |
|---|---|---|
| sale_id | INT | Unique transaction line ID |
| date_key | INT | FK → dim_date |
| store_key | INT | FK → dim_store |
| product_key | INT | FK → dim_product |
| customer_key | INT | FK → dim_customer |
| quantity | INT | Number of items sold (measure) |
| unit_price | DECIMAL | Price per item (measure) |
| total_amount | DECIMAL | quantity × unit_price (measure) |
| discount_amount | DECIMAL | Discount applied (measure) |
Grain: One row per product per transaction (the most detailed level).
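The columns above can be sketched as DDL. This is an illustrative sketch only: the DECIMAL precisions are assumptions, and foreign-key syntax varies by engine (Fabric warehouses, for instance, only accept constraints declared as NOT ENFORCED, so keys are shown here as comments):

```sql
-- Illustrative DDL for fact_sales; types and precisions are assumed
CREATE TABLE fact_sales (
    sale_id         INT,            -- unique transaction line ID
    date_key        INT,            -- FK -> dim_date
    store_key       INT,            -- FK -> dim_store
    product_key     INT,            -- FK -> dim_product
    customer_key    INT,            -- FK -> dim_customer
    quantity        INT,            -- measure
    unit_price      DECIMAL(10, 2), -- measure
    total_amount    DECIMAL(12, 2), -- measure: quantity * unit_price
    discount_amount DECIMAL(10, 2)  -- measure
);
```

Note the shape: nothing but keys and measures, which is exactly what keeps the fact table narrow and fast to scan.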
Dimension tables
dim_date – when the sale happened
| Column | Example Values |
|---|---|
| date_key | 20260421 |
| full_date | 2026-04-21 |
| day_name | Tuesday |
| month_name | April |
| quarter | Q2 |
| year | 2026 |
| is_weekend | No |
| is_holiday | No |
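A date dimension is usually generated rather than loaded from a source system. A minimal T-SQL-style sketch, assuming a hypothetical staging table stg_dates(full_date) that holds one row per calendar date (the staging table name is an assumption, not part of FreshCart's model; is_holiday would need a separate holiday lookup):

```sql
-- Derive dim_date attributes from a calendar date spine (stg_dates is assumed)
SELECT
    YEAR(full_date) * 10000
      + MONTH(full_date) * 100
      + DAY(full_date)                         AS date_key,   -- e.g. 20260421
    full_date,
    DATENAME(weekday, full_date)               AS day_name,
    DATENAME(month, full_date)                 AS month_name,
    'Q' + CAST(DATEPART(quarter, full_date) AS VARCHAR(1)) AS quarter,
    YEAR(full_date)                            AS year,
    CASE WHEN DATENAME(weekday, full_date) IN ('Saturday', 'Sunday')
         THEN 'Yes' ELSE 'No' END              AS is_weekend
FROM stg_dates;
```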
dim_store – where the sale happened
| Column | Example Values |
|---|---|
| store_key | 1042 |
| store_name | FreshCart Auckland CBD |
| city | Auckland |
| region | North Island |
| store_type | Metro |
| opening_date | 2019-03-15 |
dim_product – what was sold
| Column | Example Values |
|---|---|
| product_key | 5678 |
| product_name | Organic Avocado |
| category | Fresh Produce |
| subcategory | Fruits |
| brand | FreshGreen |
| unit_of_measure | Each |
dim_customer – who bought it
| Column | Example Values |
|---|---|
| customer_key | 90123 |
| customer_name | Jane Smith |
| loyalty_tier | Gold |
| signup_date | 2022-01-10 |
| city | Wellington |
The star shape
```
dim_date ───────┐
                │
dim_store ─── fact_sales ─── dim_product
                │
dim_customer ───┘
```
Every query follows the same pattern: start from the fact table, join to dimensions for context.
```sql
-- Total sales by region and month
SELECT
    s.region,
    d.month_name,
    SUM(f.total_amount) AS total_revenue
FROM fact_sales f
JOIN dim_store s ON f.store_key = s.store_key
JOIN dim_date d ON f.date_key = d.date_key
WHERE d.year = 2026
GROUP BY s.region, d.month_name
ORDER BY total_revenue DESC
```
Why denormalise?
In transactional databases (OLTP), you normalise data to reduce redundancy – separate tables linked by foreign keys. In analytical databases (OLAP), you denormalise – merge related attributes into fewer, wider tables.
Normalisation vs denormalisation
| Approach | Normalised (OLTP) | Denormalised (OLAP / Star Schema) |
|---|---|---|
| Goal | Eliminate redundancy | Optimise query performance |
| Tables | Many narrow tables with joins | Fewer wide tables with redundant attributes |
| Joins needed | Many (complex queries) | Few (simple queries) |
| Write performance | Fast (each fact stored once) | Slower (redundant data updated in multiple places) |
| Read performance | Slower (many joins) | Faster (fewer joins, columnar storage) |
| Best for | Transaction processing | Reporting and analytics |
In the FreshCart example, dim_store already includes city and region – even though a normalised design would have separate city and region tables. This redundancy makes queries faster because you join fact_sales to dim_store once instead of joining to store, then city, then region.
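To make the contrast concrete, here is the same question written both ways. The store, city, and region tables in the second query are a hypothetical normalised design, not part of FreshCart's actual model:

```sql
-- Star schema: one join gives both city and region
SELECT s.region, SUM(f.total_amount) AS total_revenue
FROM fact_sales f
JOIN dim_store s ON f.store_key = s.store_key
GROUP BY s.region;

-- Hypothetical normalised equivalent: the hierarchy must be walked join by join
SELECT r.region_name, SUM(f.total_amount) AS total_revenue
FROM fact_sales f
JOIN store st ON f.store_key = st.store_key
JOIN city c   ON st.city_id  = c.city_id
JOIN region r ON c.region_id = r.region_id
GROUP BY r.region_name;
```

Each extra join is work the engine repeats across billions of fact rows, which is why the flattened dimension wins for analytics.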
Exam tip: When to denormalise
The exam tests your judgment on denormalisation. General rules:
- Denormalise dimension tables – flatten hierarchies into the dimension (city, region, country all in dim_store)
- Keep fact tables normalised – fact tables should contain only keys and measures
- Create aggregate tables for common high-level queries (daily totals instead of per-transaction)
- Do NOT denormalise when dimension attributes change frequently – updates ripple across all redundant copies
Aggregate tables
For very large fact tables, pre-computing common aggregations dramatically improves performance:
```sql
-- Create an aggregate table: daily sales by store
CREATE TABLE agg_daily_store_sales AS
SELECT
    date_key,
    store_key,
    COUNT(*) AS transaction_count,
    SUM(quantity) AS total_units,
    SUM(total_amount) AS total_revenue,
    SUM(discount_amount) AS total_discount
FROM fact_sales
GROUP BY date_key, store_key
```
| Query Level | Source | Rows (FreshCart) |
|---|---|---|
| Per transaction | fact_sales | 5.4 billion/year |
| Per day per store | agg_daily_store_sales | 730,000/year |
| Per month per region | Custom aggregate | 48/year |
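Once the aggregate exists, the region-by-month query from earlier can be answered by scanning roughly 730,000 rows instead of billions. A sketch reusing the tables above (the aggregate keeps date_key and store_key, so it still joins cleanly to the dimensions):

```sql
-- Same question as before, answered from the aggregate table
SELECT
    s.region,
    d.month_name,
    SUM(a.total_revenue) AS total_revenue
FROM agg_daily_store_sales a
JOIN dim_store s ON a.store_key = s.store_key
JOIN dim_date  d ON a.date_key  = d.date_key
WHERE d.year = 2026
GROUP BY s.region, d.month_name
ORDER BY total_revenue DESC;
```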
Power BI and Direct Lake semantic models can automatically use aggregate tables when available (via user-defined aggregations).
Anita at FreshCart is designing a star schema for sales analytics. She has transaction-level data (one row per item per sale) plus reference data for products, stores, customers, and dates. How should she structure the schema?
Raj at Atlas Capital notices that Power BI dashboards showing 'total portfolio value by region' take 30+ seconds to load. The fact table has 2 billion rows at transaction-level grain. The query aggregates by region and month. What is the best optimisation?
Next up: SQL Objects: Views, Functions & Stored Procedures – programmable SQL in Fabric warehouses and lakehouse SQL endpoints.