# Partitioning, Clustering & Table Optimization
Physical data layout strategies — partitioning schemes, liquid clustering, Z-ordering, and deletion vectors — that make or break query performance.
## Why physical layout matters
Imagine a library. Partitioning is sorting books by genre onto separate shelves. Clustering is sorting within each shelf by author.
When someone asks for “Science Fiction by Asimov,” you go straight to the Sci-Fi shelf (partition pruning) and then to the A section (data skipping). Without this, you’d have to check every book in the library.
The same applies to data. A well-partitioned, well-clustered table lets Spark skip irrelevant files entirely — reading 10 files instead of 10,000.
## Partitioning
Partitioning divides a table into physical directories based on column values:
```sql
-- Create a partitioned table
CREATE TABLE sales.bronze.transactions (
  txn_id BIGINT,
  store_id INT,
  amount DECIMAL(10,2),
  txn_date DATE
)
PARTITIONED BY (txn_date);
```
This creates a directory per date (`txn_date=2026-04-01/`, `txn_date=2026-04-02/`, and so on). When queries filter on `txn_date`, Spark reads only the relevant directories.
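For example, a query like the following (against the table defined above) prunes down to a single directory:

```sql
-- Only the txn_date=2026-04-01 directory is opened;
-- every other date directory is skipped entirely
SELECT store_id, SUM(amount) AS total
FROM sales.bronze.transactions
WHERE txn_date = DATE '2026-04-01'
GROUP BY store_id;
```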
### Partitioning rules
| Rule | Why |
|---|---|
| Partition on low-cardinality columns | Too many partitions = too many small files |
| Each partition should have at least 1 GB of data | Small files hurt read performance |
| Partition on columns used in WHERE filters | Pruning only works if queries filter on the partition column |
| Don’t over-partition | 10,000+ partitions = metadata overhead and small file problem |
### The small file problem
If you partition by a high-cardinality column (e.g., `customer_id` with 1 million distinct values), you get 1 million tiny files spread across 1 million directories. Each file carries overhead (metadata, open/close operations), so Spark spends more time managing files than reading data.
**Rule of thumb:** if each partition would hold less than 128 MB, don't partition on that column; use Z-ordering or liquid clustering instead.
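The check is simple arithmetic: divide total table size by the candidate column's cardinality and compare against a size floor. A quick sketch (a hypothetical helper, not a Databricks API):

```python
def should_partition(table_size_bytes: int, column_cardinality: int,
                     min_partition_bytes: int = 1 * 1024**3) -> bool:
    """Return True only if the average partition would meet the size floor (default 1 GB)."""
    avg_partition_size = table_size_bytes / column_cardinality
    return avg_partition_size >= min_partition_bytes

# A 500 GB table with a 365-value date column: ~1.4 GB per partition -> partition
print(should_partition(500 * 1024**3, 365))        # True
# The same table with a 1,000,000-value customer_id: ~0.5 MB each -> don't
print(should_partition(500 * 1024**3, 1_000_000))  # False
```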
## Z-ordering
Z-ordering co-locates related data within files based on one or more columns — without physical partitioning:
```sql
-- Z-order the table by region and product_category
OPTIMIZE sales.curated.daily_summary
ZORDER BY (region, product_category);
```
Z-ordering sorts data across files so that rows with similar `region` and `product_category` values are stored together. This enables data skipping: Spark reads file-level min/max statistics and skips files that can't contain matching values.
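The skipping logic itself is simple. This standalone sketch (an illustration, not Spark code) shows how file-level min/max statistics rule out files before any data is read:

```python
from dataclasses import dataclass

@dataclass
class FileStats:
    """Per-file min/max statistics, as recorded in the Delta transaction log."""
    path: str
    min_region: str
    max_region: str

def files_to_read(files: list[FileStats], wanted: str) -> list[str]:
    """Keep only files whose [min, max] range could contain the wanted value."""
    return [f.path for f in files if f.min_region <= wanted <= f.max_region]

files = [
    FileStats("part-0001", "APAC", "EMEA"),   # range covers "EMEA" -> must read
    FileStats("part-0002", "LATAM", "NA"),    # cannot contain "EMEA" -> skipped
]
print(files_to_read(files, "EMEA"))  # ['part-0001']
```

Z-ordering makes these ranges narrow and non-overlapping across files, which is what makes the skip rate high.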
Z-ordering vs partitioning:
| Aspect | Partitioning | Z-ordering |
|---|---|---|
| Physical layout | Separate directories | Co-located within files |
| Best for | Low-cardinality columns (date, region) | Medium/high-cardinality columns |
| Maintenance | Automatic (write-time) | Manual (run OPTIMIZE ZORDER) |
| Combinable | Yes — partition + Z-order | Z-order within partitions |
## Liquid clustering (the modern approach)
Liquid clustering is Databricks’ newest layout strategy — it replaces both partitioning and Z-ordering for most use cases:
```sql
-- Create a table with liquid clustering
CREATE TABLE sales.curated.transactions (
  txn_id BIGINT,
  region STRING,
  txn_date DATE,
  amount DECIMAL(10,2)
)
CLUSTER BY (region, txn_date);

-- Trigger clustering (predictive optimization can also run this automatically)
OPTIMIZE sales.curated.transactions;
```
| Feature | Liquid Clustering | Traditional Partitioning | Z-Ordering |
|---|---|---|---|
| Incremental | Yes (only re-clusters new data) | N/A | No (re-processes entire table) |
| Change cluster keys | Yes (ALTER TABLE) | No (must rewrite table) | Yes |
| Write performance | Good (minimal overhead) | Good | Overhead at OPTIMIZE time |
| Flexibility | High (change keys anytime) | Low (locked at creation) | Medium |
| Recommended for | Most new tables | Legacy compatibility | Legacy tables |
Exam default: For new tables, liquid clustering is the recommended approach. It’s more flexible than partitioning and more efficient than Z-ordering.
```sql
-- Change clustering keys without rewriting the table
ALTER TABLE sales.curated.transactions CLUSTER BY (store_id, txn_date);
```
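After the `ALTER`, new writes follow the new keys; existing files are re-clustered incrementally the next time `OPTIMIZE` runs:

```sql
-- Incrementally re-cluster existing data under the new (store_id, txn_date) keys
OPTIMIZE sales.curated.transactions;
```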
### Exam tip: When to use which
- New table with predictable filter patterns → Liquid clustering
- Legacy table already partitioned → Keep partitioning + add Z-ordering
- High-cardinality filter column → Z-ordering or liquid clustering (never partition)
- Query filters change often → Liquid clustering (can change keys with ALTER TABLE)
- Very large table partitioned by date → Keep date partitioning + Z-order within partitions
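The last pattern, date partitioning plus Z-ordering within partitions, looks like this in practice (table and column names follow the earlier examples; the `WHERE` clause restricting `OPTIMIZE` to recent partitions is optional):

```sql
-- Keep the existing txn_date partitioning; co-locate rows by store_id
-- inside each partition, touching only the last week of partitions
OPTIMIZE sales.bronze.transactions
WHERE txn_date >= current_date() - INTERVAL 7 DAYS
ZORDER BY (store_id);
```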
## Deletion vectors
Deletion vectors improve DELETE and MERGE performance by marking rows as deleted without rewriting entire files:
```sql
-- Enable deletion vectors
ALTER TABLE sales.curated.transactions
SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);
```
- **Without deletion vectors:** `DELETE` rewrites every affected file (expensive for large files).
- **With deletion vectors:** a small metadata file marks which rows are deleted; files are only rewritten during the next `OPTIMIZE`.
This is especially important for SCD Type 2 MERGE operations where you expire old rows — deletion vectors make the expire step nearly instant.
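Conceptually, a deletion vector is a small per-file bitmap that readers consult at scan time. This standalone sketch (an illustration of the idea, not the Delta implementation) shows why a delete becomes cheap:

```python
# Rows in an immutable data file (never rewritten on delete)
rows = ["r0", "r1", "r2", "r3", "r4"]

# Deletion vector: a small set of deleted row positions stored alongside the file
deletion_vector = {1, 3}

def read_file(rows, deletion_vector):
    """Readers skip positions flagged in the vector; the data file stays untouched."""
    return [r for i, r in enumerate(rows) if i not in deletion_vector]

print(read_file(rows, deletion_vector))  # ['r0', 'r2', 'r4']
```

Deleting a row only mutates the small vector, not the (potentially gigabyte-scale) data file, which is why expiring SCD Type 2 rows becomes near-instant.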
🎬 Video coming soon
## Knowledge check
Mei Lin's transactions table at Freshmart has 500 million rows. Analysts query it by region (5 values) and product_category (200 values). She's building a new table. Which layout strategy should she use?
Next up: Ingesting Data: Lakeflow Connect & Notebooks — batch and streaming ingestion with Lakeflow Connect and notebook code.