# Partitioning, Clustering & Table Optimization
Physical data layout strategies — partitioning schemes, liquid clustering, Z-ordering, and deletion vectors — that make or break query performance.
## Why physical layout matters
Imagine a library. Partitioning is sorting books by genre onto separate shelves. Clustering is sorting within each shelf by author.
When someone asks for “Science Fiction by Asimov,” you go straight to the Sci-Fi shelf (partition pruning) and then to the A section (data skipping). Without this, you’d have to check every book in the library.
The same applies to data. A well-partitioned, well-clustered table lets Spark skip irrelevant files entirely — reading 10 files instead of 10,000.
## Partitioning
Partitioning divides a table into physical directories based on column values:
```sql
-- Create a partitioned table
CREATE TABLE sales.bronze.transactions (
  txn_id BIGINT,
  store_id INT,
  amount DECIMAL(10,2),
  txn_date DATE
)
PARTITIONED BY (txn_date);
```
This creates a directory per date (`txn_date=2026-04-01/`, `txn_date=2026-04-02/`, and so on). When queries filter on `txn_date`, Spark reads only the relevant directories.
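For example, a query like the following (against the table defined above) prunes down to a single directory:

```sql
-- Only the txn_date=2026-04-01 directory is opened;
-- every other date directory is skipped entirely
SELECT store_id, SUM(amount) AS total
FROM sales.bronze.transactions
WHERE txn_date = DATE '2026-04-01'
GROUP BY store_id;
```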
### Partitioning rules
| Rule | Why |
|---|---|
| Partition on low-cardinality columns | Too many partitions = too many small files |
| Each partition should have at least 1 GB of data | Small files hurt read performance |
| Partition on columns used in WHERE filters | Pruning only works if queries filter on the partition column |
| Don’t over-partition | 10,000+ partitions = metadata overhead and small file problem |
### The small file problem
If you partition by a high-cardinality column (e.g., `customer_id` with 1 million distinct values), you get 1 million tiny files spread across 1 million directories. Each file carries overhead (metadata, open/close operations), so Spark spends more time managing files than reading data.
**Rule of thumb:** if each partition would hold less than 128 MB, don't partition on that column; use Z-ordering or liquid clustering instead.
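The check is simple arithmetic: divide total table size by the candidate column's cardinality and compare against a size floor. A quick sketch (a hypothetical helper, not a Databricks API):

```python
def should_partition(table_size_bytes: int, column_cardinality: int,
                     min_partition_bytes: int = 1 * 1024**3) -> bool:
    """Return True only if the average partition would meet the size floor (default 1 GB)."""
    avg_partition_size = table_size_bytes / column_cardinality
    return avg_partition_size >= min_partition_bytes

# A 500 GB table with a 365-value date column: ~1.4 GB per partition -> partition
print(should_partition(500 * 1024**3, 365))        # True
# The same table with a 1,000,000-value customer_id: ~0.5 MB each -> don't
print(should_partition(500 * 1024**3, 1_000_000))  # False
```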
## Z-ordering
Z-ordering co-locates related data within files based on one or more columns — without physical partitioning:
```sql
-- Z-order the table by region and product_category
OPTIMIZE sales.curated.daily_summary
ZORDER BY (region, product_category);
```
Z-ordering sorts data across files so that rows with similar `region` and `product_category` values are stored together. This enables data skipping: Spark reads file-level min/max statistics and skips files that can't contain matching values.
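The skipping logic itself is simple. This standalone sketch (an illustration, not Spark code) shows how file-level min/max statistics rule out files before any data is read:

```python
from dataclasses import dataclass

@dataclass
class FileStats:
    """Per-file min/max statistics, as recorded in the Delta transaction log."""
    path: str
    min_region: str
    max_region: str

def files_to_read(files: list[FileStats], wanted: str) -> list[str]:
    """Keep only files whose [min, max] range could contain the wanted value."""
    return [f.path for f in files if f.min_region <= wanted <= f.max_region]

files = [
    FileStats("part-0001", "APAC", "EMEA"),   # range covers "EMEA" -> must read
    FileStats("part-0002", "LATAM", "NA"),    # cannot contain "EMEA" -> skipped
]
print(files_to_read(files, "EMEA"))  # ['part-0001']
```

Z-ordering makes these ranges narrow and non-overlapping across files, which is what makes the skip rate high.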
Z-ordering vs partitioning:
| Aspect | Partitioning | Z-ordering |
|---|---|---|
| Physical layout | Separate directories | Co-located within files |
| Best for | Low-cardinality columns (date, region) | Medium/high-cardinality columns |
| Maintenance | Automatic (write-time) | Manual (run OPTIMIZE ZORDER) |
| Combinable | Yes — partition + Z-order | Z-order within partitions |
## Liquid clustering (the modern approach)
Liquid clustering is Databricks’ newest layout strategy — it replaces both partitioning and Z-ordering for most use cases:
```sql
-- Create a table with liquid clustering
CREATE TABLE sales.curated.transactions (
  txn_id BIGINT,
  region STRING,
  txn_date DATE,
  amount DECIMAL(10,2)
)
CLUSTER BY (region, txn_date);

-- Trigger clustering (predictive optimization can also run this automatically)
OPTIMIZE sales.curated.transactions;
```
| Feature | Liquid Clustering | Traditional Partitioning | Z-Ordering |
|---|---|---|---|
| Incremental | Yes (only re-clusters new data) | N/A | No (re-processes entire table) |
| Change cluster keys | Yes (ALTER TABLE) | No (must rewrite table) | Yes |
| Write performance | Good (minimal overhead) | Good | Overhead at OPTIMIZE time |
| Flexibility | High (change keys anytime) | Low (locked at creation) | Medium |
| Recommended for | Most new tables | Legacy compatibility | Legacy tables |
Exam default: For new tables, liquid clustering is the recommended approach. It’s more flexible than partitioning and more efficient than Z-ordering.
```sql
-- Change clustering keys without rewriting the table
ALTER TABLE sales.curated.transactions CLUSTER BY (store_id, txn_date);
```
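After the `ALTER`, new writes follow the new keys; existing files are re-clustered incrementally the next time `OPTIMIZE` runs:

```sql
-- Incrementally re-cluster existing data under the new (store_id, txn_date) keys
OPTIMIZE sales.curated.transactions;
```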
### Exam tip: When to use which
- New table with predictable filter patterns → Liquid clustering
- Legacy table already partitioned → Keep partitioning + add Z-ordering
- High-cardinality filter column → Z-ordering or liquid clustering (never partition)
- Query filters change often → Liquid clustering (can change keys with ALTER TABLE)
- Very large table partitioned by date → Keep date partitioning + Z-order within partitions
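The last pattern, date partitioning plus Z-ordering within partitions, looks like this in practice (table and column names follow the earlier examples; the `WHERE` clause restricting `OPTIMIZE` to recent partitions is optional):

```sql
-- Keep the existing txn_date partitioning; co-locate rows by store_id
-- inside each partition, touching only the last week of partitions
OPTIMIZE sales.bronze.transactions
WHERE txn_date >= current_date() - INTERVAL 7 DAYS
ZORDER BY (store_id);
```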
## Deletion vectors
Deletion vectors improve DELETE and MERGE performance by marking rows as deleted without rewriting entire files:
```sql
-- Enable deletion vectors
ALTER TABLE sales.curated.transactions
SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);
```
- **Without deletion vectors:** `DELETE` rewrites every affected file (expensive for large files).
- **With deletion vectors:** a small metadata file marks which rows are deleted; files are only rewritten during the next `OPTIMIZE`.
This is especially important for SCD Type 2 MERGE operations where you expire old rows — deletion vectors make the expire step nearly instant.
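Conceptually, a deletion vector is a small per-file bitmap that readers consult at scan time. This standalone sketch (an illustration of the idea, not the Delta implementation) shows why a delete becomes cheap:

```python
# Rows in an immutable data file (never rewritten on delete)
rows = ["r0", "r1", "r2", "r3", "r4"]

# Deletion vector: a small set of deleted row positions stored alongside the file
deletion_vector = {1, 3}

def read_file(rows, deletion_vector):
    """Readers skip positions flagged in the vector; the data file stays untouched."""
    return [r for i, r in enumerate(rows) if i not in deletion_vector]

print(read_file(rows, deletion_vector))  # ['r0', 'r2', 'r4']
```

Deleting a row only mutates the small vector, not the (potentially gigabyte-scale) data file, which is why expiring SCD Type 2 rows becomes near-instant.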
🎬 Video coming soon
## Knowledge check
Mei Lin's transactions table at Freshmart has 500 million rows. Analysts query it by region (5 values) and product_category (200 values). She's building a new table. Which layout strategy should she use?
Next up: Ingesting Data: Lakeflow Connect & Notebooks — batch and streaming ingestion with Lakeflow Connect and notebook code.