
DP-700 Study Guide

Domain 1: Implement and Manage an Analytics Solution

  • Workspace Settings: Your Fabric Foundation
  • Version Control: Git in Fabric
  • Deployment Pipelines: Dev to Production
  • Access Controls: Who Gets In
  • Data Security: Control Who Sees What
  • Governance: Labels, Endorsement & Audit
  • Orchestration: Pick the Right Tool
  • Pipeline Patterns: Parameters & Expressions

Domain 2: Ingest and Transform Data

  • Delta Lake: The Heart of Fabric
  • Loading Patterns: Full, Incremental & Streaming
  • Dimensional Modeling: Prep for Analytics
  • Data Stores & Tools: Make the Right Choice
  • OneLake Shortcuts: Data Without Duplication
  • Mirroring: Real-Time Database Replication
  • PySpark Transformations: Code Your Pipeline
  • Transform Data with SQL & KQL
  • Eventstreams & Spark Streaming: Real-Time Ingestion
  • Real-Time Intelligence: KQL & Windowing

Domain 3: Monitor and Optimize an Analytics Solution

  • Monitoring & Alerts: Catch Problems Early
  • Troubleshoot Pipelines & Dataflows
  • Troubleshoot Notebooks & SQL
  • Troubleshoot Streaming & Shortcuts
  • Optimize Lakehouse Tables: Delta Tuning
  • Optimize Spark: Speed Up Your Code
  • Optimize Pipelines & Warehouses
  • Optimize Streaming: Real-Time Performance

Domain 3: Monitor and Optimize an Analytics Solution (~14 min read)

Optimize Lakehouse Tables: Delta Tuning

Use OPTIMIZE, VACUUM, Z-ordering, and V-ordering to make your Delta Lake tables fast, compact, and storage-efficient.

Why optimize Delta tables?

☕ Simple explanation

Think of a filing cabinet that nobody organises.

After months of use, you have thousands of tiny folders (small files), old documents that should have been shredded (orphaned files), and no logical order to anything (unsorted data). Finding something takes forever.

Delta table optimization is like hiring a professional organiser: OPTIMIZE combines tiny files into bigger ones (compaction). VACUUM removes old files that are no longer needed. Z-ordering sorts data so queries on common columns skip irrelevant files. V-ordering is Fabric’s special compression for even faster reads.

Delta Lake tables accumulate small files over time through append operations, streaming writes, and MERGE operations. This small file problem degrades query performance because Spark must open and read thousands of tiny files instead of fewer large ones. Fabric provides several optimization commands: OPTIMIZE (bin-packing/compaction), VACUUM (remove old files), Z-ORDER BY (co-locate related data), and V-ORDER (Fabric-native compression optimization).

OPTIMIZE (compaction)

Combines many small files into fewer larger files. This is the single most impactful optimization.

-- Compact all files in the table
OPTIMIZE lakehouse.FactOrders

-- Compact with Z-ordering on a specific column
OPTIMIZE lakehouse.FactOrders ZORDER BY (order_date)

-- Compact a specific partition
OPTIMIZE lakehouse.FactOrders WHERE year = 2026

Before and after

Metric            | Before OPTIMIZE    | After OPTIMIZE
Number of files   | 12,847 small files | 156 optimally sized files
Average file size | 2 MB               | 128 MB
Query scan time   | 45 seconds         | 6 seconds

ℹ️ Why small files are a problem

Each Parquet file has overhead: file open, metadata read, footer parsing. With 12,847 files, Spark spends more time on overhead than actual data reading.

OPTIMIZE reads all the small files and rewrites them as fewer, optimally-sized files (target: ~128 MB each). The data is identical — only the physical file layout changes.

Best practice: Run OPTIMIZE after batch loads, especially after many small appends or streaming micro-batches.
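The before/after numbers above can be sanity-checked with simple arithmetic. This is an illustrative sketch, not Fabric code: it just ceiling-divides total data volume by the ~128 MB target file size. The real post-compaction count can be lower (here the table shows 156 files) because larger files compress better.

```python
import math

# Illustrative arithmetic only (not a Fabric API): estimate how many
# optimally sized files remain after compacting to the ~128 MB target.
def files_after_optimize(total_mb: float, target_mb: float = 128.0) -> int:
    """Ceiling-divide total data volume by the target file size."""
    return max(1, math.ceil(total_mb / target_mb))

before_files = 12_847
avg_file_mb = 2.0
total_mb = before_files * avg_file_mb         # ~25,694 MB of actual data

after_files = files_after_optimize(total_mb)  # ~201 files at ~128 MB each
```

The same data volume goes from ~12,800 file-open operations per scan to a couple of hundred, which is where the 45-second-to-6-second improvement comes from.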

VACUUM (cleanup)

Removes files that are no longer referenced by the transaction log. These files are left over from UPDATE, DELETE, MERGE, or OPTIMIZE operations.

-- Remove files older than 7 days (default retention)
VACUUM lakehouse.FactOrders

-- Remove files older than 24 hours (careful — reduces time travel window)
VACUUM lakehouse.FactOrders RETAIN 24 HOURS

What VACUUM does NOT delete

  • Current data files (referenced by the latest transaction log)
  • Files within the retention period (default: 7 days)
  • The transaction log itself (_delta_log/)
💡 Exam tip: VACUUM and time travel

Critical trade-off: VACUUM removes old file versions. If you VACUUM with a 24-hour retention, you can only time travel back 24 hours. Files needed for older versions are gone.

Exam pattern: “After running VACUUM, time travel to version 5 fails.” Why? VACUUM deleted the files that version 5 referenced.

Safe default: Keep the 7-day retention period unless storage cost is critical.
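The retention trade-off can be modelled in a few lines. This is a toy model, not Delta internals: it treats a historical version as queryable only while it is newer than the retention cutoff, which is exactly why the exam scenario above fails.

```python
from datetime import datetime, timedelta

# Toy model of the VACUUM / time-travel trade-off (not Delta internals):
# a historical version stays queryable only while the data files it
# references survive, i.e. while it is newer than the retention cutoff.
def queryable_versions(version_times: dict, now: datetime,
                       retain: timedelta) -> set:
    """Return version numbers still inside the retention window."""
    cutoff = now - retain
    return {v for v, ts in version_times.items() if ts >= cutoff}

now = datetime(2026, 4, 1, 12, 0)
versions = {
    5: now - timedelta(hours=30),   # written 30 h ago
    6: now - timedelta(hours=10),   # written 10 h ago
    7: now - timedelta(hours=1),    # current version
}

# After VACUUM ... RETAIN 24 HOURS, version 5 (30 h old) is gone:
live = queryable_versions(versions, now, timedelta(hours=24))  # {6, 7}
```

With the default 7-day retention, all three versions would remain reachable; shrinking the window trades history for storage.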

Z-ordering

Z-ordering physically sorts data within files by specified columns. This enables data skipping — Spark reads file metadata (min/max values per column) and skips files that can’t contain matching rows.

-- Z-order by the most commonly filtered column
OPTIMIZE lakehouse.FactOrders ZORDER BY (customer_id)

-- Z-order by multiple columns (first column gets priority)
OPTIMIZE lakehouse.FactOrders ZORDER BY (order_date, region)
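The data-skipping mechanism itself can be sketched in Python. The file names and min/max statistics below are invented for illustration: each Parquet file carries min/max values per column, and the engine skips any file whose range cannot contain the filter value. Z-ordering is what makes these ranges narrow and non-overlapping.

```python
# Sketch of file-level data skipping (the mechanism Z-ordering makes
# effective). File names and stats are made up for illustration.
files = [
    {"name": "part-0001", "min_cust": 1,    "max_cust": 2500},
    {"name": "part-0002", "min_cust": 2501, "max_cust": 5000},
    {"name": "part-0003", "min_cust": 5001, "max_cust": 7500},
]

def files_to_scan(files, customer_id):
    """Keep only files whose min/max range could contain the id."""
    return [f["name"] for f in files
            if f["min_cust"] <= customer_id <= f["max_cust"]]

# WHERE customer_id = 3100 touches a single file after Z-ordering:
scanned = files_to_scan(files, 3100)   # ["part-0002"]
```

Without Z-ordering, customer IDs are interleaved across files, so every file's min/max range spans almost the whole domain and nothing can be skipped.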

When to Z-order

Scenario                   | Z-order column | Why
Queries filter by date     | order_date     | Date-range queries skip non-matching files
Queries filter by customer | customer_id    | Customer lookups read fewer files
Queries join on product_id | product_id     | Join scans only relevant files

💡 Scenario: Carlos Z-orders production data

Carlos’s FactProduction table has 500M rows. Most queries filter by factory_id and production_date. Without Z-ordering, every query scans all files.

After running OPTIMIZE lakehouse.FactProduction ZORDER BY (production_date, factory_id):

  • A query for “Factory F01, April 2026” scans 12 files instead of 1,500
  • Query time drops from 35 seconds to 3 seconds

V-ordering

V-ordering is Fabric’s proprietary write-time optimization that applies special sorting and compression to Parquet files. It’s enabled by default for Fabric lakehouse tables.

Z-ordering is manual and column-specific; V-ordering is automatic and engine-wide:

Feature       | Z-ordering                                       | V-ordering
When applied  | Manually, with the OPTIMIZE command              | Automatically on write (default in Fabric)
What it does  | Sorts data by specific columns for data skipping | Applies optimized compression and read-path optimization
Configured by | You choose the columns                           | On by default; controlled via table properties, session config, or the OPTIMIZE VORDER option
Works with    | Spark engine (data skipping)                     | All Fabric engines (Spark, SQL endpoint, Power BI Direct Lake)
Relationship  | Can be combined with V-ordering                  | Complements Z-ordering

Table maintenance strategy

Task               | Frequency                                 | Command
OPTIMIZE           | After batch loads; daily for heavy tables | OPTIMIZE table_name
OPTIMIZE + Z-ORDER | Weekly, or after major loads              | OPTIMIZE table_name ZORDER BY (column)
VACUUM             | Weekly                                    | VACUUM table_name
Analyze table      | After major schema or data changes        | ANALYZE TABLE table_name COMPUTE STATISTICS
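A maintenance schedule like the one above is easy to mechanise. This is a hypothetical helper, not a Fabric API: it encodes the cadences from the table and reports which tasks are overdue for a given table.

```python
from datetime import date, timedelta

# Hypothetical helper (not a Fabric API): decide which maintenance
# tasks are due, using the cadences from the table above.
CADENCE_DAYS = {"OPTIMIZE": 1, "OPTIMIZE+ZORDER": 7, "VACUUM": 7}

def tasks_due(last_run: dict, today: date) -> list:
    """Return the tasks whose cadence has elapsed since their last run."""
    return sorted(task for task, days in CADENCE_DAYS.items()
                  if today - last_run[task] >= timedelta(days=days))

today = date(2026, 4, 8)
last_run = {
    "OPTIMIZE": date(2026, 4, 7),         # ran yesterday -> due again
    "OPTIMIZE+ZORDER": date(2026, 4, 3),  # 5 days ago -> not yet due
    "VACUUM": date(2026, 4, 1),           # 7 days ago -> due
}
due = tasks_due(last_run, today)   # ["OPTIMIZE", "VACUUM"]
```

In practice you would run the due commands from a scheduled notebook or pipeline after your batch loads complete.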

Question

What does OPTIMIZE do to a Delta table?

Answer

Combines many small files into fewer, optimally-sized files (~128 MB each). This reduces file open overhead and dramatically improves query scan time. Data content is unchanged — only the physical layout changes.

Question

What is the trade-off of running VACUUM with a short retention period?

Answer

VACUUM removes old file versions. Short retention (e.g., 24 hours) frees storage but limits time travel — you can only query versions within the retention window. Default 7-day retention is the safe balance.

Question

What is Z-ordering and when should you use it?

Answer

Z-ordering sorts data within files by specified columns, enabling data skipping (Spark reads min/max metadata and skips irrelevant files). Use it on columns that appear frequently in WHERE filters or JOIN conditions.

Question

What is V-ordering in Fabric?

Answer

An automatic, proprietary write-time optimization that applies special compression and read-path optimization to Parquet files. Enabled by default in Fabric. Works with all engines (Spark, SQL endpoint, Direct Lake). No configuration needed.


Knowledge Check

Carlos runs VACUUM on a Delta table with RETAIN 2 HOURS. Later, he tries to time travel to a version from 3 hours ago. The query fails. Why?

Knowledge Check

A lakehouse table has 15,000 small files (average 1.5 MB each) after months of streaming micro-batches. Queries take 40+ seconds. What is the single most impactful optimization?

Next up: Optimize Spark: Speed Up Your Code — partition strategies, caching, broadcast joins, and Spark configuration tuning.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.