
DP-700 Study Guide

Domain 1: Implement and Manage an Analytics Solution

  • Workspace Settings: Your Fabric Foundation
  • Version Control: Git in Fabric
  • Deployment Pipelines: Dev to Production
  • Access Controls: Who Gets In
  • Data Security: Control Who Sees What
  • Governance: Labels, Endorsement & Audit
  • Orchestration: Pick the Right Tool
  • Pipeline Patterns: Parameters & Expressions

Domain 2: Ingest and Transform Data

  • Delta Lake: The Heart of Fabric Free
  • Loading Patterns: Full, Incremental & Streaming Free
  • Dimensional Modeling: Prep for Analytics Free
  • Data Stores & Tools: Make the Right Choice Free
  • OneLake Shortcuts: Data Without Duplication
  • Mirroring: Real-Time Database Replication
  • PySpark Transformations: Code Your Pipeline
  • Transform Data with SQL & KQL
  • Eventstreams & Spark Streaming: Real-Time Ingestion
  • Real-Time Intelligence: KQL & Windowing

Domain 3: Monitor and Optimize an Analytics Solution

  • Monitoring & Alerts: Catch Problems Early
  • Troubleshoot Pipelines & Dataflows
  • Troubleshoot Notebooks & SQL
  • Troubleshoot Streaming & Shortcuts
  • Optimize Lakehouse Tables: Delta Tuning
  • Optimize Spark: Speed Up Your Code
  • Optimize Pipelines & Warehouses
  • Optimize Streaming: Real-Time Performance

Domain 2: Ingest and Transform Data ⏱ ~15 min read

Delta Lake: The Heart of Fabric

Understand Delta Lake — the open-source storage layer that powers every lakehouse table in Fabric. ACID transactions, time travel, schema enforcement, and the transaction log explained.

What is Delta Lake?

☕ Simple explanation

Think of a notebook with a built-in undo button, a table of contents, and a lock on the cover.

Old data lakes were just folders full of files — like loose papers on a desk. If two people edited the same file, you got corruption. If you needed last Tuesday’s version, tough luck. If someone wrote bad data, it mixed in with the good.

Delta Lake fixes all of this. It’s a storage format that wraps Parquet files with a transaction log. That log acts as the undo button (time travel), the table of contents (what files make up the current version), and the lock (only one writer at a time).

In Microsoft Fabric, every lakehouse table is a Delta table. You don’t choose Delta — it’s the default. Understanding Delta is understanding how your data is stored, versioned, and protected.

Delta Lake is an open-source storage layer that brings ACID transactions, scalable metadata handling, and time travel to data lakes. It stores data as Parquet files with an additional transaction log (_delta_log/) that records every change as a JSON commit.

In Microsoft Fabric, Delta Lake (specifically the Delta Lake protocol with Parquet as the columnar file format) is the default and only table format for lakehouse tables. OneLake stores all structured data as Delta tables, and both Spark and the SQL analytics endpoint read from the same Delta files.

Key capabilities: ACID transactions (atomicity, consistency, isolation, durability), time travel (query any previous version), schema enforcement (reject writes that don’t match the schema), schema evolution (add columns without breaking existing readers), and MERGE (upsert pattern for incremental loads).

The transaction log

The transaction log is what makes Delta Lake different from plain Parquet. It’s a folder called _delta_log/ inside every Delta table, containing JSON files numbered sequentially.

How it works

Version | Log Entry | Effect
0 | 00000000000000000000.json — Add part-0001.parquet, part-0002.parquet | Initial table creation with 2 data files
1 | 00000000000000000001.json — Add part-0003.parquet | New data appended
2 | 00000000000000000002.json — Remove part-0001.parquet, Add part-0004.parquet | Update operation: old file replaced with new

Every 10 commits, Delta creates a checkpoint file (Parquet format) that summarises the current state — so readers don’t need to replay all JSON logs from the beginning.
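
The add/remove mechanics above can be sketched in plain Python, no Spark needed. The commit list below is a hypothetical in-memory stand-in for the JSON files in _delta_log/; replaying it in order yields the table's current file set:

```python
# Hypothetical stand-in for a _delta_log/ folder: each commit is a list of
# "add"/"remove" actions, mirroring the JSON files in the table above.
commits = [
    [{"add": "part-0001.parquet"}, {"add": "part-0002.parquet"}],     # version 0
    [{"add": "part-0003.parquet"}],                                   # version 1
    [{"remove": "part-0001.parquet"}, {"add": "part-0004.parquet"}],  # version 2
]

def current_files(commits):
    """Replay every commit in order to compute the live file set."""
    files = set()
    for actions in commits:
        for action in actions:
            if "add" in action:
                files.add(action["add"])
            if "remove" in action:
                files.discard(action["remove"])  # logically removed, not deleted
    return files

print(sorted(current_files(commits)))
# ['part-0002.parquet', 'part-0003.parquet', 'part-0004.parquet']
```

A checkpoint is just a materialised `current_files` result at a given version, so readers can start the replay from there instead of from commit zero.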

💡 Scenario: Carlos discovers the transaction log

Carlos at Precision Manufacturing investigates why a lakehouse table is using more storage than expected. He browses the OneLake file structure:

Tables/FactProduction/
├── _delta_log/
│   ├── 00000000000000000000.json
│   ├── 00000000000000000001.json
│   ├── ...
│   ├── 00000000000000000042.json
│   └── 00000000000000000040.checkpoint.parquet
├── part-00000-...snappy.parquet
├── part-00001-...snappy.parquet
└── ... (hundreds of Parquet files)

He realises that old versions of data files still exist on disk — Delta marks them as “removed” in the log but doesn’t physically delete them (that’s what VACUUM does — covered in the optimisation module).

ACID transactions

Delta Lake provides full ACID guarantees — something traditional data lakes couldn’t offer.

Property | What It Means | Example
Atomicity | A write either fully succeeds or fully fails — no partial writes | A notebook crashes mid-write → the table stays in its previous valid state
Consistency | Schema enforcement prevents bad data from entering | A CSV with a missing column is rejected before any rows are written
Isolation | Concurrent readers see a consistent snapshot while writers are active | A BI report queries the table while an ETL job is writing — the report sees the previous complete version
Durability | Once committed, data survives system failures | A power outage after commit → data is safe in OneLake

ℹ️ Why ACID matters for data engineering

Without ACID, a failed write can leave your table in a half-updated state. Imagine loading 10 million rows — if the job crashes at row 5 million, you have a table with incomplete data. Readers might see inconsistent results. You’d need to manually clean up and retry.

With Delta Lake’s ACID transactions, the partially written files exist on disk but are NOT registered in the transaction log. The table stays at its previous version. When you retry the load, it starts fresh.
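
A minimal sketch of that guarantee, using hypothetical file names: because a reader derives visibility from the transaction log alone, orphan files left behind by a crashed write are simply never seen.

```python
# Hypothetical sketch: a crashed write leaves data files on disk, but since
# no commit was appended to the log, readers never see them.
files_on_disk = {
    "part-0001.parquet", "part-0002.parquet",  # committed in version 0
    "part-0005.parquet",                       # orphan from a crashed write
}
committed = [{"version": 0, "adds": ["part-0001.parquet", "part-0002.parquet"]}]

def visible_files(log):
    """A reader trusts only the transaction log, never the raw directory listing."""
    visible = set()
    for commit in log:
        visible.update(commit["adds"])
    return visible

orphans = files_on_disk - visible_files(committed)
print(orphans)  # prints {'part-0005.parquet'}: on disk but invisible to queries
```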

Time travel

Every commit creates a new version. You can query any previous version by number or timestamp.

Query by version

-- Spark SQL
SELECT * FROM lakehouse.FactSales VERSION AS OF 5

SELECT * FROM delta.`/Tables/FactSales` VERSION AS OF 5

Note: VERSION AS OF is supported in Spark SQL. The lakehouse SQL analytics endpoint does not support time travel queries.

Query by timestamp

# PySpark
df = spark.read.format("delta") \
    .option("timestampAsOf", "2026-04-20T10:00:00") \
    .load("/Tables/FactSales")

Common time travel use cases

Use Case | How
Audit — “What did the table look like last Tuesday?” | Query by timestamp
Debug — “What changed in the last load?” | Compare version N vs N-1
Rollback — “The last load was bad, restore the previous version” | RESTORE TABLE FactSales TO VERSION AS OF 41
Reproducibility — “Rerun yesterday’s report with yesterday’s data” | Query by timestamp matching the report date

💡 Scenario: Anika rolls back a bad load

At ShopStream, Anika’s overnight pipeline loads order data. On Wednesday morning, the analytics team reports that revenue numbers are doubled. Anika investigates:

  1. She queries the table’s history: DESCRIBE HISTORY lakehouse.FactOrders — version 87 was the bad load
  2. She checks what version 86 looked like: SELECT COUNT(*) FROM lakehouse.FactOrders VERSION AS OF 86 — correct count
  3. She restores: RESTORE TABLE lakehouse.FactOrders TO VERSION AS OF 86
  4. Revenue numbers return to normal immediately

Total downtime: 4 minutes. Without Delta time travel, she’d need to rebuild the table from raw files — potentially hours.
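
Anika's rollback can be sketched the same way. This is an illustrative Python model, not Delta's actual RESTORE implementation: VERSION AS OF n replays commits 0..n, and a restore appends a new commit whose net effect recreates the old snapshot, so history is never rewritten.

```python
# Hypothetical commit list: each commit records file adds and removes.
commits = [
    {"adds": {"orders_v1.parquet"}, "removes": set()},                  # version 0
    {"adds": {"orders_v2.parquet"}, "removes": {"orders_v1.parquet"}},  # version 1: good load
    {"adds": {"orders_v2_dup.parquet"}, "removes": set()},              # version 2: bad load
]

def snapshot(commits, version):
    """VERSION AS OF n: replay commits 0..n to rebuild that file set."""
    files = set()
    for commit in commits[: version + 1]:
        files -= commit["removes"]
        files |= commit["adds"]
    return files

# RESTORE TO VERSION AS OF 1: append a commit that reinstates the old snapshot
good = snapshot(commits, 1)
latest = snapshot(commits, len(commits) - 1)
commits.append({"adds": good - latest, "removes": latest - good})  # version 3

assert snapshot(commits, 3) == good  # table rolled back, history preserved
```

Note that version 2 (the bad load) is still queryable afterwards, which is exactly why a restore is auditable.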

Schema enforcement and evolution

Schema enforcement (write-side)

Delta Lake rejects writes that don’t match the existing table schema. If your table has columns (id INT, name STRING, amount DECIMAL) and you try to write data with an extra discount column, the write fails.

This prevents schema drift — the silent corruption where unexpected columns sneak into your data.

Schema evolution (controlled changes)

When you intentionally want to add columns, you enable mergeSchema:

df.write.format("delta") \
    .mode("append") \
    .option("mergeSchema", "true") \
    .save("/Tables/FactOrders")

Existing rows get NULL for the new column. Future reads include it.
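
Both behaviours can be sketched with a toy table, a plain dict rather than the Delta implementation. The hypothetical write() helper below rejects unknown columns by default (enforcement) and widens the schema only when merge_schema is passed (evolution):

```python
# Hypothetical sketch of write-side schema checks: enforcement rejects extra
# columns unless merge_schema is set; evolution backfills None (NULL).
def write(table, rows, merge_schema=False):
    schema = set(table["schema"])
    incoming = set().union(*(row.keys() for row in rows))
    extra = incoming - schema
    if extra and not merge_schema:
        raise ValueError(f"schema mismatch: unexpected columns {sorted(extra)}")
    table["schema"] = sorted(schema | incoming)  # evolution: widen the schema
    for old in table["rows"]:
        for col in extra:
            old.setdefault(col, None)            # existing rows get NULL
    table["rows"].extend(rows)

table = {"schema": ["id", "amount"], "rows": [{"id": 1, "amount": 10}]}
try:
    write(table, [{"id": 2, "amount": 5, "discount": 0.1}])
except ValueError as e:
    print(e)  # rejected: enforcement is on by default
write(table, [{"id": 2, "amount": 5, "discount": 0.1}], merge_schema=True)
print(table["rows"][0])  # {'id': 1, 'amount': 10, 'discount': None}
```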

Enforcement protects you; evolution lets you grow

Feature | Schema Enforcement | Schema Evolution
Default behaviour | ON — rejects mismatched schemas | OFF — must explicitly enable
Purpose | Prevent accidental schema changes | Allow intentional schema changes
How to override | Cannot disable (by design) | Set mergeSchema=true on the write operation
When it applies | Every write operation | Only when mergeSchema is enabled
Exam pattern | 'Write fails with schema mismatch' — expected behaviour | 'New column needs to be added to existing table' — use mergeSchema

Delta MERGE (upsert)

MERGE is one of the most important Delta Lake operations for data engineering. It combines INSERT, UPDATE, and DELETE in a single atomic operation.

MERGE INTO target_table AS target
USING source_data AS source
ON target.id = source.id
WHEN MATCHED THEN UPDATE SET target.amount = source.amount
WHEN NOT MATCHED THEN INSERT (id, name, amount) VALUES (source.id, source.name, source.amount)

Use cases: incremental data loads, slowly changing dimensions (SCD), deduplication.
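
The semantics can be sketched with plain dicts keyed on id; this is an illustrative model, not Spark's implementation. Matched keys are updated, unmatched keys inserted, all in one pass:

```python
# Hypothetical upsert sketch mirroring the MERGE statement above.
def merge(target, source, key="id"):
    by_key = {row[key]: row for row in target}
    for row in source:
        if row[key] in by_key:
            by_key[row[key]].update(row)  # WHEN MATCHED THEN UPDATE
        else:
            by_key[row[key]] = row        # WHEN NOT MATCHED THEN INSERT
    return list(by_key.values())

target = [{"id": 1, "amount": 100}, {"id": 2, "amount": 200}]
source = [{"id": 2, "amount": 250},   # update existing
          {"id": 3, "amount": 300}]   # insert new
result = merge(target, source)
print(result)
# [{'id': 1, 'amount': 100}, {'id': 2, 'amount': 250}, {'id': 3, 'amount': 300}]
```

The key point MERGE adds on top of this logic is atomicity: readers see either the full result of the upsert or none of it.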

💡 Exam tip: MERGE vs INSERT OVERWRITE

The exam frequently tests the choice between MERGE and other write patterns:

  • Full load (replace everything): INSERT OVERWRITE or .mode("overwrite")
  • Append new rows only: .mode("append")
  • Update existing + insert new (upsert): MERGE
  • Delete + insert (SCD Type 1): MERGE with WHEN MATCHED THEN UPDATE
  • Insert new version + expire old (SCD Type 2): MERGE with additional logic

If the scenario mentions “update existing records and add new ones,” the answer is almost always MERGE.


Question

What is the Delta Lake transaction log?


Answer

A folder (_delta_log/) containing numbered JSON files that record every change to the table. It enables ACID transactions, time travel, and concurrent reads/writes. Checkpoint files (every 10 commits) summarise the current state for fast reads.


Question

How do you query a previous version of a Delta table?


Answer

By version number: SELECT * FROM table VERSION AS OF 5. By timestamp: read with option('timestampAsOf', '2026-04-20'). To restore: RESTORE TABLE table TO VERSION AS OF 5.


Question

What is the difference between schema enforcement and schema evolution?


Answer

Schema enforcement (always on) REJECTS writes that don't match the existing schema. Schema evolution (opt-in via mergeSchema=true) ALLOWS intentional schema changes like adding new columns. Existing rows get NULL for new columns.


Question

What does Delta MERGE do?


Answer

MERGE combines INSERT, UPDATE, and DELETE in one atomic operation. It matches source rows to target rows on a key, then applies different actions for matched (update/delete) and unmatched (insert) rows. Essential for incremental loads and upserts.


Question

In Fabric, can you choose a table format other than Delta Lake for lakehouse tables?


Answer

No. Delta Lake is the only table format for Fabric lakehouse tables. All tables in the Tables/ section are stored as Delta (Parquet files + transaction log). The Files/ section can hold any file format.



Knowledge Check

A PySpark notebook crashes halfway through writing 10 million rows to a Delta table. What happens to the table?

Knowledge Check

Anika needs to add a 'discount_percentage' column to an existing Delta table that has 50 million rows. She writes a DataFrame with the new column. The write fails with a schema mismatch error. What should she do?

Knowledge Check

Carlos receives daily production data. Some records are updates to existing rows (changed status), others are new rows. He needs to update existing records and insert new ones in a single operation. Which Delta operation should he use?


Next up: Loading Patterns: Full, Incremental & Streaming — design the right data loading strategy for every scenario.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.