
DP-900 Study Guide

Domain 1: Core Data Concepts

  • Your First Look at Data Free
  • Data File Formats: CSV, JSON, Parquet & More Free
  • Databases: Relational vs Non-Relational Free
  • Transactional Workloads: Keeping Data Consistent Free
  • Analytical Workloads: Finding the Insights Free
  • Data Roles: DBA, Engineer & Analyst Free
  • The Azure Data Landscape Free

Domain 2: Relational Data on Azure

  • Relational Data: Tables, Keys & Relationships
  • Normalization: Why Duplicate Data is Bad
  • SQL Basics: SELECT, INSERT, UPDATE, DELETE
  • Database Objects: Views, Indexes & More
  • Azure SQL: Your Database in the Cloud
  • Open-Source Databases on Azure
  • Choosing the Right Azure Database

Domain 3: Non-Relational Data on Azure

  • Azure Blob Storage: Files in the Cloud
  • Azure Files & Table Storage
  • Azure Cosmos DB: The Global Database
  • Cosmos DB APIs: SQL, MongoDB & More
  • Choosing Non-Relational Storage

Domain 4: Analytics on Azure

  • Data Ingestion & Processing
  • Analytical Data Stores: Data Lakes, Warehouses & Lakehouses
  • Microsoft Fabric & Azure Databricks
  • Batch vs Streaming: Two Speeds of Data
  • Real-Time Analytics on Azure
  • Power BI: See Your Data
  • Data Models in Power BI
  • Choosing the Right Visualization

Domain 2: Relational Data on Azure (~10 min read)

Normalization: Why Duplicate Data is Bad

Normalization is the art of organising data to eliminate redundancy. It's fundamental to good database design — and a guaranteed exam topic.

What is normalization?

☕ Simple explanation

Normalization means “don’t write the same thing twice.”

Imagine Tom writes every driver’s full address inside every delivery record. If Sarah Park moves house, Tom has to update hundreds of delivery records — miss one and the data is wrong.

Normalization says: store Sarah’s address ONCE in the Drivers table, and just reference her DriverID in the Deliveries table. Now you update one place and everything stays consistent.

Normalization is the process of organising data in a relational database to reduce redundancy and improve data integrity. It involves decomposing tables into smaller, well-structured tables and defining relationships between them using keys.

The primary goals are: eliminate duplicate data, ensure data dependencies make sense (each fact stored in one place), and prevent update anomalies (inconsistencies caused by modifying data stored in multiple locations).

The problem normalization solves

Before normalization (bad design)

Jake’s early CloudPulse database stored everything in one table:

| OrderID | CustomerName | CustomerEmail | CustomerCity | Product | Price |
|---------|--------------|---------------|--------------|---------|-------|
| 5001 | Meridian Health | info@meridian.co.nz | Auckland | Pro Plan | 49.00 |
| 5002 | Meridian Health | info@meridian.co.nz | Auckland | API Add-on | 19.00 |
| 5003 | Kiwi Fitness | hello@kiwi.fit | Wellington | Pro Plan | 49.00 |

Problems:

  • Redundancy: “Meridian Health” and their email/city are stored in every order row
  • Update anomaly: If Meridian Health changes their email, Jake must update EVERY order row. Miss one? Inconsistent data.
  • Delete anomaly: If Jake deletes all orders for Kiwi Fitness, he loses the customer’s contact info entirely
  • Insert anomaly: Jake can’t add a new customer until they place an order
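The update anomaly is easy to reproduce. The sketch below uses an in-memory SQLite database as a stand-in for any relational store (table and column names follow the example above; the "careless" UPDATE is deliberately scoped to one row to show the failure mode):

```python
import sqlite3

# Single wide table — the "before normalization" design from the example above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Orders (
        OrderID       INTEGER PRIMARY KEY,
        CustomerName  TEXT,
        CustomerEmail TEXT,
        Product       TEXT,
        Price         REAL
    )
""")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?, ?)",
    [
        (5001, "Meridian Health", "info@meridian.co.nz", "Pro Plan", 49.00),
        (5002, "Meridian Health", "info@meridian.co.nz", "API Add-on", 19.00),
    ],
)

# An update that misses a row leaves the same customer with two emails:
conn.execute(
    "UPDATE Orders SET CustomerEmail = 'billing@meridian.co.nz' WHERE OrderID = 5001"
)
emails = {row[0] for row in conn.execute(
    "SELECT CustomerEmail FROM Orders WHERE CustomerName = 'Meridian Health'"
)}
print(emails)  # two different emails for one customer — an update anomaly
```

Because the email lives in every order row, correctness depends on remembering to update all of them; the schema itself offers no protection.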

After normalization (good design)

Split into separate tables:

Customers table:

| CustomerID | Name | Email | City |
|------------|------|-------|------|
| C001 | Meridian Health | info@meridian.co.nz | Auckland |
| C002 | Kiwi Fitness | hello@kiwi.fit | Wellington |

Orders table:

| OrderID | CustomerID (FK) | Product | Price |
|---------|-----------------|---------|-------|
| 5001 | C001 | Pro Plan | 49.00 |
| 5002 | C001 | API Add-on | 19.00 |
| 5003 | C002 | Pro Plan | 49.00 |

Now each fact lives in ONE place. Update Meridian’s email once, and it’s correct everywhere.
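The same scenario with the normalized design, again sketched with in-memory SQLite: the email is stored once in Customers, orders reference it via a foreign key, and a JOIN reassembles the wide view at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.execute("""
    CREATE TABLE Customers (
        CustomerID TEXT PRIMARY KEY,
        Name  TEXT,
        Email TEXT,
        City  TEXT
    )
""")
conn.execute("""
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID TEXT REFERENCES Customers(CustomerID),
        Product    TEXT,
        Price      REAL
    )
""")
conn.execute(
    "INSERT INTO Customers VALUES ('C001', 'Meridian Health', 'info@meridian.co.nz', 'Auckland')"
)
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", [
    (5001, "C001", "Pro Plan", 49.00),
    (5002, "C001", "API Add-on", 19.00),
])

# One UPDATE in one place — every order now sees the new email:
conn.execute(
    "UPDATE Customers SET Email = 'billing@meridian.co.nz' WHERE CustomerID = 'C001'"
)

rows = conn.execute("""
    SELECT o.OrderID, c.Name, c.Email, o.Product
    FROM Orders o
    JOIN Customers c ON o.CustomerID = c.CustomerID
""").fetchall()
```

Every row returned by the JOIN carries the updated email, because there is only one copy of it to read.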

Normal forms (simplified)

Normalization is done in stages called normal forms. For DP-900, you need to understand the concept, not memorise every rule:

| Normal Form | Key Rule | Plain English |
|-------------|----------|---------------|
| 1NF | No repeating groups; each cell has one value | Every column stores a single value, not a list |
| 2NF | Every non-key column depends on the WHOLE primary key | No partial dependencies in tables with composite keys |
| 3NF | Every non-key column depends ONLY on the primary key | No column depends on another non-key column |
ℹ️ Normal forms by example

1NF violation — a “Products” column storing “Pro Plan, API Add-on” in one cell. Fix: one row per product.

2NF violation — a table with composite key (OrderID + ProductID) where CustomerName depends only on OrderID, not on the full key. Fix: move CustomerName to a separate table.

3NF violation — a table where City depends on PostalCode, which depends on CustomerID. City doesn’t depend directly on the primary key. Fix: move City and PostalCode to a separate location table.

In practice, most well-designed databases achieve 3NF. That’s the sweet spot between data integrity and query simplicity.
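The 1NF fix above ("one row per product") is a mechanical transformation. A minimal sketch, assuming a hypothetical flat record whose Products cell holds a comma-separated list:

```python
# Illustrative flat record violating 1NF: two products crammed into one cell.
flat = {"OrderID": 5001, "Products": "Pro Plan, API Add-on"}

# 1NF fix: split the list so each row holds exactly one value per column.
rows = [
    {"OrderID": flat["OrderID"], "Product": p.strip()}
    for p in flat["Products"].split(",")
]
print(rows)
# [{'OrderID': 5001, 'Product': 'Pro Plan'}, {'OrderID': 5001, 'Product': 'API Add-on'}]
```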

Normalization vs denormalization

Normalization minimises duplication — great for transactional databases. But analytical databases (data warehouses) often denormalize on purpose — combining tables to speed up read-heavy queries.

Normalized vs denormalized data
| Feature | Normalized | Denormalized |
|---------|------------|--------------|
| Duplication | Minimal — each fact stored once | Intentional — data repeated for speed |
| Updates | Fast — update one place | Slow — update many places |
| Read queries | Require joins (slower for big queries) | Pre-joined data (faster reads) |
| Best for | Transactional (OLTP) databases | Analytical (OLAP) data warehouses |
| Integrity risk | Low — constraints enforced | Higher — duplicates can get out of sync |
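Denormalization in miniature: the sketch below (in-memory SQLite, illustrative table names) pre-joins the normalized Customers and Orders tables into one wide reporting table, so analytical queries read it without any joins, at the cost of duplicating customer fields in every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, Name TEXT, City TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT,
                         Product TEXT, Price REAL);
    INSERT INTO Customers VALUES ('C001', 'Meridian Health', 'Auckland');
    INSERT INTO Orders VALUES (5001, 'C001', 'Pro Plan', 49.00),
                              (5002, 'C001', 'API Add-on', 19.00);

    -- Deliberately copy customer fields into a wide, read-optimised table:
    CREATE TABLE OrdersWide AS
    SELECT o.OrderID, c.Name, c.City, o.Product, o.Price
    FROM Orders o JOIN Customers c ON o.CustomerID = c.CustomerID;
""")

# Reporting query needs no join — the data is already pre-joined:
total = conn.execute(
    "SELECT SUM(Price) FROM OrdersWide WHERE Name = 'Meridian Health'"
).fetchone()[0]
print(total)  # 68.0
```

The trade-off from the table above is visible here: a customer rename would now have to touch every OrdersWide row, which is acceptable in a warehouse that is rebuilt on a schedule.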
💡 Exam tip: normalization questions

The exam asks about normalization at a conceptual level:

  • “Reduce data redundancy” → Normalization
  • “Prevent update anomalies” → Normalization
  • “Split one large table into multiple related tables” → Normalization
  • “Combine tables for faster reporting” → Denormalization
  • “Optimise for analytical queries” → Denormalization

You don’t need to identify specific normal forms (1NF, 2NF, 3NF) on the exam — just understand WHY normalization exists and when denormalization is appropriate.

Flashcards

Question

What is normalization in database design?


Answer

The process of organising data to reduce redundancy by splitting data into multiple related tables. Each fact is stored in one place, connected through key relationships.


Question

What is an update anomaly?


Answer

An inconsistency that occurs when the same data is stored in multiple places and one copy is updated while others are not. Normalization prevents this by storing each fact once.


Question

When would you denormalize data on purpose?


Answer

In analytical databases (OLAP/data warehouses) where read performance is more important than write efficiency. Pre-joining data into fewer tables speeds up aggregation queries.


Knowledge check

1. Jake notices that when he updates a customer's email address, he has to change it in 47 different order records. What database design problem is this?

2. Priya's data warehouse combines customer, product, and sales data into a single wide table for faster reporting. This is an example of:


Next up: SQL Basics: SELECT, INSERT, UPDATE, DELETE — learn the language that relational databases speak.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.