
DP-700 Study Guide

Domain 1: Implement and Manage an Analytics Solution

  • Workspace Settings: Your Fabric Foundation
  • Version Control: Git in Fabric
  • Deployment Pipelines: Dev to Production
  • Access Controls: Who Gets In
  • Data Security: Control Who Sees What
  • Governance: Labels, Endorsement & Audit
  • Orchestration: Pick the Right Tool
  • Pipeline Patterns: Parameters & Expressions

Domain 2: Ingest and Transform Data

  • Delta Lake: The Heart of Fabric Free
  • Loading Patterns: Full, Incremental & Streaming Free
  • Dimensional Modeling: Prep for Analytics Free
  • Data Stores & Tools: Make the Right Choice Free
  • OneLake Shortcuts: Data Without Duplication
  • Mirroring: Real-Time Database Replication
  • PySpark Transformations: Code Your Pipeline
  • Transform Data with SQL & KQL
  • Eventstreams & Spark Streaming: Real-Time Ingestion
  • Real-Time Intelligence: KQL & Windowing

Domain 3: Monitor and Optimize an Analytics Solution

  • Monitoring & Alerts: Catch Problems Early
  • Troubleshoot Pipelines & Dataflows
  • Troubleshoot Notebooks & SQL
  • Troubleshoot Streaming & Shortcuts
  • Optimize Lakehouse Tables: Delta Tuning
  • Optimize Spark: Speed Up Your Code
  • Optimize Pipelines & Warehouses
  • Optimize Streaming: Real-Time Performance

Domain 3: Monitor and Optimize an Analytics Solution (~12 min read)

Optimize Pipelines & Warehouses

Tune pipeline performance with parallelism and resource management. Optimize warehouse queries with statistics, result caching, and efficient T-SQL patterns.

Pipeline optimization

β˜• Simple explanation

Think of an assembly line in a factory.

A slow assembly line usually has one of three problems: bottlenecks (one station takes 10x longer than others), sequencing (stations wait for each other when they could work in parallel), or wasted effort (doing work that isn’t needed).

Pipeline optimization follows the same principles: identify the slowest activity, run independent activities in parallel, and skip unnecessary work.

Pipeline optimization in Fabric focuses on three areas: parallelism (ForEach batch count, concurrent activities), activity-level tuning (Copy activity DIU settings, notebook pool configuration), and data movement reduction (incremental instead of full loads, filtering at source). Warehouse optimization covers query performance (statistics, result-set caching, materialized views) and data loading patterns.

Pipeline performance techniques

The fastest pipeline is one that does less work
Technique | How | Impact
Parallel ForEach | Increase batch count (up to 50) in the ForEach activity | Process multiple items simultaneously instead of sequentially
Copy activity DIU | Increase Data Integration Units (DIU) for large data transfers | More parallelism within the copy operation itself
Incremental over full | Use a watermark or CDC to load only changed data | 90%+ reduction in data volume for large tables
Source-side filtering | Push filters into the source query (not post-copy) | Less data transferred over the network
Avoid unnecessary activities | Remove logging, temp-table creation, or intermediate copies that aren't needed | Fewer activities, faster pipeline
Activity dependencies | Chain only activities that truly depend on each other; let independent ones run in parallel | Reduces total pipeline duration
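The "incremental over full" technique above can be sketched in plain Python rather than a pipeline activity. This is a hedged illustration of the watermark pattern only; the `rows` schema and the `load_incremental` helper are hypothetical, not a Fabric API.

```python
from datetime import datetime

def load_incremental(rows, watermark):
    """Return only rows modified after the watermark, plus the new watermark.
    Each row is a dict with a 'modified_at' timestamp (hypothetical schema)."""
    new_rows = [r for r in rows if r["modified_at"] > watermark]
    # Advance the watermark to the newest row we copied; keep it if nothing changed
    new_watermark = max((r["modified_at"] for r in new_rows), default=watermark)
    return new_rows, new_watermark

# Source table with 4 rows; the last run's stored watermark was Jan 2
rows = [
    {"id": 1, "modified_at": datetime(2026, 1, 1)},
    {"id": 2, "modified_at": datetime(2026, 1, 2)},
    {"id": 3, "modified_at": datetime(2026, 1, 3)},
    {"id": 4, "modified_at": datetime(2026, 1, 4)},
]
changed, wm = load_incremental(rows, datetime(2026, 1, 2))
print([r["id"] for r in changed])  # only rows 3 and 4 are copied
```

In a real pipeline the watermark is typically stored in a control table and passed to the Copy activity's source query as a parameter, so the filter runs at the source (combining this technique with source-side filtering).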
πŸ’‘ Scenario: Carlos optimizes his nightly pipeline

Carlos’s pipeline takes 3 hours. He profiles each activity:

Activity | Duration | Depends On
Copy SAP data | 45 min | (none)
Copy IoT data | 30 min | (none)
Transform notebook | 90 min | Both copies
Semantic model refresh | 15 min | Transform

Before: all activities run sequentially (45 + 30 + 90 + 15 = 180 min = 3 hours). After: Copy SAP and Copy IoT run in parallel, so the critical path is Copy SAP (45 min), then Transform (90 min), then Refresh (15 min) = 2.5 hours. He also switches SAP to an incremental load, dropping Copy SAP to 8 min; Copy IoT (30 min) now gates the transform. Total: 30 + 90 + 15 = 135 min = 2.25 hours, a 25% reduction.

Warehouse optimization

Statistics

Fabric warehouses use statistics to create optimal query execution plans. Statistics describe data distribution (row counts, value histograms, distinct counts per column).

-- Update statistics on a table
UPDATE STATISTICS dbo.FactOrders;

-- Create statistics on specific columns
CREATE STATISTICS stat_order_date ON dbo.FactOrders (order_date);

When to update: After large data loads, after significant DELETE operations, or when query plans look suboptimal.
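Conceptually, statistics are just summary metadata per column. A toy sketch (not Fabric's internal format) shows the kind of values they capture and how an optimizer might use them, assuming a simple uniform-distribution selectivity estimate:

```python
def column_stats(values):
    """The kind of metadata column statistics capture; the optimizer uses
    these to estimate how many rows a filter or join will return."""
    return {
        "row_count": len(values),
        "distinct_count": len(set(values)),
        "min": min(values),
        "max": max(values),
    }

# order_date years for six fact rows (illustrative data)
stats = column_stats([2024, 2025, 2025, 2026, 2026, 2026])
# Rough selectivity estimate for "WHERE year = 2026", assuming uniform
# distribution: row_count / distinct_count
estimated_rows = stats["row_count"] / stats["distinct_count"]
print(estimated_rows)  # 2.0
```

Stale statistics produce bad estimates like this after a large load: the optimizer plans for the old row counts, which is why updating statistics after big inserts or deletes matters.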

Result-set caching

Fabric warehouses have a result-set caching feature, though availability may vary β€” check the latest Fabric documentation for current status. When available, cached results return instantly for repeated identical queries when underlying data hasn’t changed.

Current best practice: use statistics updates and efficient query patterns (filter early, select only needed columns) to optimize warehouse query performance.
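The caching idea can be illustrated with a toy cache keyed on the query text plus a data version, so cached results are invalidated when the underlying data changes. This is purely illustrative of the concept, not Fabric's implementation:

```python
class ResultCache:
    """Toy result-set cache: an identical query text against an unchanged
    data version returns the stored result instead of re-executing."""
    def __init__(self):
        self.cache = {}
        self.hits = 0

    def run(self, query, data_version, execute):
        key = (query, data_version)
        if key in self.cache:
            self.hits += 1          # identical query, data unchanged: instant
            return self.cache[key]
        result = execute(query)     # expensive scan on a miss
        self.cache[key] = result
        return result

cache = ResultCache()
execute = lambda q: sum(range(1_000_000))   # stands in for a warehouse scan
r1 = cache.run("SELECT SUM(x) FROM t", data_version=1, execute=execute)
r2 = cache.run("SELECT SUM(x) FROM t", data_version=1, execute=execute)  # hit
r3 = cache.run("SELECT SUM(x) FROM t", data_version=2, execute=execute)  # data changed: miss
print(cache.hits)  # 1
```

The key design point is invalidation: caching on query text alone would serve stale results after a load, so the data version (in a real engine, something like a table's modification timestamp) must be part of the cache key.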

Query optimization patterns

Pattern | Poor | Optimized
Filter early | SELECT * FROM FactOrders JOIN DimCustomer ON ... WHERE year = 2026 | SELECT * FROM (SELECT * FROM FactOrders WHERE year = 2026) f JOIN DimCustomer ON ...
Select only needed columns | SELECT * FROM FactOrders | SELECT order_id, revenue, order_date FROM FactOrders
Avoid SELECT DISTINCT on large tables | SELECT DISTINCT customer_id FROM FactOrders | Use GROUP BY customer_id or get distinct values from the dimension table
Use appropriate data types | VARCHAR(MAX) for a 10-character code | VARCHAR(10): saves memory and improves performance
πŸ’‘ Exam tip: Warehouse vs lakehouse query performance

The exam may compare query performance between the lakehouse SQL endpoint and the warehouse:

  • Lakehouse SQL endpoint: Read-only, no statistics management, best for ad-hoc exploration
  • Warehouse: Full T-SQL, statistics, result-set caching, stored procedures β€” best for production BI workloads

If a scenario describes slow BI dashboard queries, look for warehouse optimizations (statistics, caching, materialized views) β€” not lakehouse tuning.


Question

What is the most impactful pipeline optimization for large data loads?


Answer

Switch from full load to incremental load (using watermark or CDC). This reduces data volume by 90%+ for large tables, dramatically cutting Copy activity duration and network transfer.


Question

What are statistics in a Fabric warehouse?


Answer

Metadata about data distribution (row counts, value histograms, distinct counts) used by the query optimizer to create efficient execution plans. Update statistics after large data loads.


Question

How can you speed up repeated warehouse queries?


Answer

Use materialized views to pre-compute common aggregation patterns. Also ensure statistics are updated after large loads, use efficient query patterns (filter early, select only needed columns), and consider pre-aggregated summary tables for dashboard queries.



Knowledge Check

Carlos's pipeline has 3 Copy activities that take 30, 25, and 20 minutes respectively. They currently run sequentially (75 min total). None depends on the others. What is the best optimization?

Knowledge Check

A Power BI dashboard runs the same warehouse query 200 times per day. The query takes 8 seconds. What warehouse feature would most improve the user experience?


Next up: Optimize Streaming: Real-Time Performance β€” tune Eventstreams, Eventhouses, and streaming queries for maximum throughput.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.