SCD, Granularity & Temporal Tables

Slowly changing dimensions

Simple explanation

A customer moves house. What happens to their old address in your database?

That’s the SCD problem. Dimension data (customers, products, stores) changes slowly over time. You have three choices:

Type 1: Overwrite the old value. Simple, but you lose history. “What was their address last year?” — can’t answer.
Type 2: Keep both versions. Add a new row with the new address, mark the old row as expired. You can see both past and present.
Type 3: Add a column for the old value. Quick to query but limited — you can only track one previous value.

Feature	SCD Type 1	SCD Type 2	SCD Type 3
History preserved?	No	Yes (all versions)	Partial (one previous)
Storage growth	Constant	Grows with changes	Constant
Implementation	UPDATE in place	INSERT new row + expire old	UPDATE + shift columns
Query complexity	Simple	Need WHERE is_current = TRUE	Moderate
Best for	Data where history doesn't matter	Customer addresses, pricing, org changes	Rarely used in practice
Delta Lake approach	MERGE with UPDATE	MERGE with INSERT + UPDATE	MERGE with UPDATE

SCD Type 2 in Delta Lake

-- SCD Type 2: Merge new customer data preserving history
MERGE INTO dim_customer AS target
USING staging_customer AS source
ON target.customer_id = source.customer_id AND target.is_current = TRUE

-- When existing record changed: expire old, insert new
WHEN MATCHED AND (target.address != source.address OR target.city != source.city) THEN
  UPDATE SET
    target.is_current = FALSE,
    target.effective_end = CURRENT_DATE()

-- When no match: insert new record
WHEN NOT MATCHED THEN
  INSERT (customer_id, name, address, city, is_current, effective_start, effective_end)
  VALUES (source.customer_id, source.name, source.address, source.city,
          TRUE, CURRENT_DATE(), DATE('9999-12-31'));

Mei Lin implements SCD Type 2 for Freshmart’s store dimension — when a store changes its category (from “Express” to “Supermarket”), both the old and new versions are preserved for historical reporting.

Granularity decisions

Granularity is the level of detail in your table — the “grain” of each row.

Granularity	Row Represents	Example
Transaction-level	One event/transaction	Each POS scan at Freshmart
Daily aggregate	One day’s summary	Total sales per store per day
Monthly aggregate	One month’s summary	Monthly revenue per region

Choosing granularity:

Finer grain (transaction) → more flexibility for analysis, more storage, slower aggregation queries
Coarser grain (monthly) → less storage, faster queries, but can’t drill down to individual transactions

Exam tip: Choose the finest granularity that your business requirements demand. You can always aggregate up from transactions to daily/monthly, but you can’t disaggregate monthly back to transactions.

Temporal (history) tables

A temporal table records every version of a row with timestamps:

CREATE TABLE prod_sales.curated.product_history (
  product_id INT,
  product_name STRING,
  price DECIMAL(10,2),
  category STRING,
  valid_from TIMESTAMP,
  valid_to TIMESTAMP,
  is_current BOOLEAN
);

This enables point-in-time queries: “What was the price of product X on March 1st?”

-- Point-in-time query
SELECT product_name, price
FROM product_history
WHERE product_id = 42
  AND valid_from <= '2026-03-01'
  AND valid_to > '2026-03-01';

Temporal tables vs Delta time travel

Don’t confuse these:

Temporal table — a table YOU design with valid_from/valid_to columns to track business history. Permanent. Queryable forever.
Delta time travel — Delta Lake’s built-in version history. Governed by retention settings. Intended for data recovery and auditing, NOT for business history.

Exam pattern: “Track how customer data changes over time for business reporting” → temporal table (SCD Type 2). “Recover data accidentally deleted yesterday” → Delta time travel.

Question

What are the three SCD types and when should you use each?

Click or press Enter to reveal answer

Answer

Type 1: Overwrite (no history needed). Type 2: Add rows with effective dates (full history for reporting). Type 3: Add 'previous' columns (rarely used, limited history).

Click to flip back

Question

How do you choose the right granularity for a table?

Click or press Enter to reveal answer

Answer

Choose the finest granularity your business requires. You can aggregate up (transactions → daily → monthly) but can't disaggregate down. Finer grain = more flexibility but more storage.

Click to flip back

Question

What is the difference between a temporal table and Delta time travel?

Click or press Enter to reveal answer

Answer

Temporal table: business-designed history with valid_from/valid_to columns, permanent. Delta time travel: system version history governed by retention, for recovery/auditing.

Click to flip back

Knowledge check

Knowledge Check

Freshmart changes its store classification system. Mei Lin needs to track how each store's category changed over time so historical reports show the correct category at the time of each sale. Which approach should she use?

Knowledge Check

Ravi is designing a sales fact table for DataPulse clients. Marketing wants to know which promotions drove individual purchases. Finance only needs monthly revenue by region. What granularity should Ravi choose?

Next up: Partitioning, Clustering & Table Optimization — partitioning schemes, liquid clustering, Z-ordering, and deletion vectors.