SCD, Granularity & Temporal Tables
Slowly changing dimensions, choosing the right granularity, and designing temporal history tables — the data modeling patterns that exam scenarios are built on.
Slowly changing dimensions
A customer moves house. What happens to their old address in your database?
That’s the SCD problem. Dimension data (customers, products, stores) changes slowly over time. You have three choices:
- Type 1: Overwrite the old value. Simple, but you lose history. “What was their address last year?” — can’t answer.
- Type 2: Keep both versions. Add a new row with the new address, mark the old row as expired. You can see both past and present.
- Type 3: Add a column for the old value. Quick to query but limited — you can only track one previous value.
| Feature | SCD Type 1 | SCD Type 2 | SCD Type 3 |
|---|---|---|---|
| History preserved? | No | Yes (all versions) | Partial (one previous) |
| Storage growth | Constant | Grows with changes | Constant |
| Implementation | UPDATE in place | INSERT new row + expire old | UPDATE + shift columns |
| Query complexity | Simple | Need WHERE is_current = TRUE | Moderate |
| Best for | Data where history doesn't matter | Customer addresses, pricing, org changes | Rarely used in practice |
| Delta Lake approach | MERGE with UPDATE | MERGE with INSERT + UPDATE | MERGE with UPDATE |
SCD Type 2 in Delta Lake
-- SCD Type 2: Merge new customer data preserving history
MERGE INTO dim_customer AS target
USING staging_customer AS source
ON target.customer_id = source.customer_id AND target.is_current = TRUE
-- When existing record changed: expire old, insert new
WHEN MATCHED AND (target.address != source.address OR target.city != source.city) THEN
UPDATE SET
target.is_current = FALSE,
target.effective_end = CURRENT_DATE()
-- When no match: insert new record
WHEN NOT MATCHED THEN
INSERT (customer_id, name, address, city, is_current, effective_start, effective_end)
VALUES (source.customer_id, source.name, source.address, source.city,
TRUE, CURRENT_DATE(), DATE('9999-12-31'));
Mei Lin implements SCD Type 2 for Freshmart’s store dimension — when a store changes its category (from “Express” to “Supermarket”), both the old and new versions are preserved for historical reporting.
Granularity decisions
Granularity is the level of detail in your table — the “grain” of each row.
| Granularity | Row Represents | Example |
|---|---|---|
| Transaction-level | One event/transaction | Each POS scan at Freshmart |
| Daily aggregate | One day’s summary | Total sales per store per day |
| Monthly aggregate | One month’s summary | Monthly revenue per region |
Choosing granularity:
- Finer grain (transaction) → more flexibility for analysis, more storage, slower aggregation queries
- Coarser grain (monthly) → less storage, faster queries, but can’t drill down to individual transactions
Exam tip: Choose the finest granularity that your business requirements demand. You can always aggregate up from transactions to daily/monthly, but you can’t disaggregate monthly back to transactions.
Temporal (history) tables
A temporal table records every version of a row with timestamps:
CREATE TABLE prod_sales.curated.product_history (
product_id INT,
product_name STRING,
price DECIMAL(10,2),
category STRING,
valid_from TIMESTAMP,
valid_to TIMESTAMP,
is_current BOOLEAN
);
This enables point-in-time queries: “What was the price of product X on March 1st?”
-- Point-in-time query
SELECT product_name, price
FROM product_history
WHERE product_id = 42
AND valid_from <= '2026-03-01'
AND valid_to > '2026-03-01';
Temporal tables vs Delta time travel
Don’t confuse these:
- Temporal table — a table YOU design with valid_from/valid_to columns to track business history. Permanent. Queryable forever.
- Delta time travel — Delta Lake’s built-in version history. Governed by retention settings. Intended for data recovery and auditing, NOT for business history.
Exam pattern: “Track how customer data changes over time for business reporting” → temporal table (SCD Type 2). “Recover data accidentally deleted yesterday” → Delta time travel.
🎬 Video coming soon
Knowledge check
Freshmart changes its store classification system. Mei Lin needs to track how each store's category changed over time so historical reports show the correct category at the time of each sale. Which approach should she use?
Ravi is designing a sales fact table for DataPulse clients. Marketing wants to know which promotions drove individual purchases. Finance only needs monthly revenue by region. What granularity should Ravi choose?
Next up: Partitioning, Clustering & Table Optimization — partitioning schemes, liquid clustering, Z-ordering, and deletion vectors.