Orchestration: Pick the Right Tool
Choose between Dataflows Gen2, pipelines, and notebooks for data orchestration. Design schedules and event-based triggers to automate your workflows.
Three tools, three use cases
Think of three ways to get to work.
Walking (Dataflows Gen2) — simple, visual, no special skills needed. Perfect for short trips. You see every step clearly.
Driving (Pipelines) — more powerful, handles complex routes, can carry passengers (other activities). But you need to know the roads.
Flying (Notebooks) — maximum power and flexibility. Go anywhere, do anything. But you need a pilot's licence (coding skills).
The exam tests whether you know which one to pick for a given scenario. The answer is usually the simplest tool that gets the job done.
The decision framework
| Factor | Dataflows Gen2 | Pipelines | Notebooks |
|---|---|---|---|
| Interface | Visual (Power Query drag-and-drop) | Visual (canvas with activities) + JSON | Code (PySpark, SQL, Scala, R) |
| Best for | Simple data cleaning and shaping from 150+ connectors | Orchestrating multiple activities (copy, dataflow, notebook, stored proc) | Complex transformations, ML, custom logic, large-scale processing |
| Coding required? | No — M language generated automatically | Minimal — expressions and parameters | Yes — PySpark, SQL, or Scala |
| Scale | Small to medium datasets (Power Query engine) | Orchestrates at any scale (delegates to other engines) | Large datasets (distributed Spark processing) |
| Error handling | Basic retry on refresh failure | Rich — retry, conditional paths, failure activities, alerts | Custom — try/except in code, widget notifications |
| Scheduling | Built-in refresh schedule | Triggers: schedule, tumbling window, event-based | Built-in schedule, via pipeline (notebook activity), or manual run |
| Output destinations | Lakehouse, warehouse, KQL database | No output itself — orchestrates other tools that produce output | Lakehouse (Delta tables), warehouse, files |
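The decision framework above can be condensed into a tiny helper. This is an illustrative sketch, not any Fabric API: the function name and the three boolean factors are assumptions made for the example.

```python
# Hypothetical helper condensing the decision framework above.
# The factor names are illustrative assumptions, not part of Fabric.
def pick_tool(needs_code: bool, multi_step: bool, large_scale: bool) -> str:
    """Return the simplest Fabric tool that satisfies the requirements."""
    if needs_code or large_scale:
        return "Notebook"        # custom logic or distributed Spark scale
    if multi_step:
        return "Pipeline"        # orchestrate copy/dataflow/notebook steps
    return "Dataflows Gen2"      # simple, visual ETL

print(pick_tool(needs_code=False, multi_step=False, large_scale=False))
```

The order of the checks mirrors the exam heuristic: reach for code only when scale or custom logic demands it, and default to the simplest tool otherwise.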
When to use what β exam decision patterns
| Scenario | Best Tool | Why |
|---|---|---|
| Load CSV from blob storage, clean column names, filter rows, write to lakehouse | Dataflows Gen2 | Simple ETL, no code needed, Power Query handles it |
| Run a dataflow, then a notebook, then refresh a semantic model — with retry on failure | Pipeline | Multi-step orchestration with error handling |
| Join 500M rows across three Delta tables, calculate running averages, write to warehouse | Notebook | Scale + complex logic needs distributed Spark |
| Copy data from Azure SQL to lakehouse (no transformation) | Pipeline (Copy activity) | Pure data movement — no transformation needed |
| Transform data using stored procedures in a warehouse | Pipeline (Stored Procedure activity) | Calls existing SQL logic without a notebook |
| Apply machine learning model to incoming data | Notebook | ML libraries (scikit-learn, MLflow) only available in code |
Scenario: Carlos's orchestration design
Carlos at Precision Manufacturing needs to load daily production data:
1. Copy raw CSV files from an SFTP server to the lakehouse → Pipeline Copy activity
2. Clean column names, filter invalid records, standardise date formats → Dataflows Gen2 (visual, quick)
3. Transform — join with dimension tables, calculate defect rates, build fact table → Notebook (500M rows, complex joins)
4. Refresh the Power BI semantic model → Pipeline (semantic model refresh activity)
Carlos wraps steps 1-4 in a single Pipeline that orchestrates all the activities in sequence, with retry logic on the copy activity and an email alert if the notebook fails.
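Carlos's design can be outlined as plain data. This is a sketch only: the field names below echo Fabric pipeline concepts, but they are assumptions made for readability, not the real pipeline JSON schema.

```python
# Illustrative outline of Carlos's pipeline as a plain data structure.
# Field and activity names are assumptions, not Fabric's JSON schema.
pipeline = {
    "name": "DailyProductionLoad",
    "activities": [
        {"type": "Copy", "name": "CopyFromSftp", "retry": 3},
        {"type": "Dataflow", "name": "CleanProductionData",
         "depends_on": "CopyFromSftp"},
        {"type": "Notebook", "name": "BuildFactTable",
         "depends_on": "CleanProductionData", "on_failure": "SendEmailAlert"},
        {"type": "SemanticModelRefresh", "name": "RefreshModel",
         "depends_on": "BuildFactTable"},
    ],
}

# Each activity depends on its predecessor, so the chain runs in sequence.
run_order = [a["name"] for a in pipeline["activities"]]
print(run_order)
```

The key design point survives even in this toy form: one pipeline owns the whole sequence, so retry and alerting live in one place instead of being scattered across four separately scheduled items.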
Schedules and triggers
Once you've built your orchestration, you need to make it run automatically.
Trigger types
| Trigger Type | How It Works | Best For |
|---|---|---|
| Schedule | Runs at fixed intervals (every 6 hours, daily at 3 AM, every Monday) | Regular batch processing on predictable cadence |
| Tumbling window | Like schedule, but windows don't overlap and catch up on missed runs | Time-partitioned data loads (process yesterday's data) |
| Event-based | Fires when something happens — new file in storage, message in Event Hub | Real-time or near-real-time ingestion |
| On-demand | Manual trigger or API call | Testing, ad-hoc runs, CI/CD-triggered deployments |
Side-by-side comparison:

| Feature | Schedule Trigger | Tumbling Window | Event-Based |
|---|---|---|---|
| Runs on | Fixed clock times | Fixed intervals, catches up on missed | External event (file arrival, message) |
| Overlap possible? | Yes — if previous run hasn't finished | No — windows don't overlap | N/A — each event triggers one run |
| Backfill? | No — missed runs are skipped | Yes — runs for each missed window | No — only fires on new events |
| Typical use | Daily refresh at 3 AM | Process data for each hour, catching up after downtime | New file in ADLS triggers ingestion immediately |
Exam tip: Tumbling window vs schedule
The exam often presents a scenario where a pipeline missed runs during a capacity outage. The question: "How do you ensure all missed time windows are processed?"
Answer: Tumbling window trigger. Unlike a schedule trigger (which skips missed runs), a tumbling window trigger keeps track of each window and catches up on any that were missed.
Pattern: "Guaranteed processing of every time window" → tumbling window.
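The backfill behaviour is easy to see in miniature. The sketch below is plain Python, not anything Fabric exposes; the dates and function name are made up to enumerate the windows a tumbling window trigger would catch up on after an outage.

```python
from datetime import datetime, timedelta

# Illustration of tumbling-window backfill: each fixed-size window is
# identified by its start time, so the missed windows are exactly those
# that fully elapsed between the last successful run and now.
def missed_windows(last_success: datetime, now: datetime,
                   size: timedelta) -> list:
    windows = []
    start = last_success + size
    while start + size <= now:      # only fully elapsed windows qualify
        windows.append(start)
        start += size
    return windows

# Capacity paused over the weekend; the trigger catches up on Monday.
friday = datetime(2024, 6, 7)       # last window processed
monday = datetime(2024, 6, 10, 3)   # Monday 3 AM
for w in missed_windows(friday, monday, timedelta(days=1)):
    print(w.date())                 # Saturday and Sunday windows re-run
```

A schedule trigger, by contrast, would simply fire once on Monday and never revisit the weekend windows.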
Scenario: Anika's event-driven pipeline
ShopStream receives order data as JSON files dropped into Azure Blob Storage by the payment gateway. Anika configures an event-based trigger:
- Event: New blob created in the orders/incoming/ container
- Action: Pipeline starts → Copy activity moves the file to the lakehouse → Notebook parses JSON, validates, and appends to the orders Delta table
Orders appear in the analytics dashboard within 5 minutes of payment. No scheduled polling — the pipeline runs only when there's work to do.
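The filtering behind such a trigger can be mimicked in a few lines. A minimal sketch, assuming a prefix-and-suffix filter of the kind blob-created events support; the paths and function name are hypothetical.

```python
# Minimal sketch of an event filter: only blobs under the configured
# prefix (and with the expected extension) should start the pipeline.
# The default paths here are illustrative assumptions.
def should_trigger(blob_path: str,
                   prefix: str = "orders/incoming/",
                   suffix: str = ".json") -> bool:
    return blob_path.startswith(prefix) and blob_path.endswith(suffix)

print(should_trigger("orders/incoming/order-1042.json"))   # True: pipeline runs
print(should_trigger("orders/archive/order-0999.json"))    # False: ignored
```

The filter is what keeps the design efficient: files landing anywhere else in the storage account never wake the pipeline at all.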
Check your understanding

A data engineer needs to load 800 million rows from three Delta tables, calculate rolling 7-day averages, and write results to a warehouse. Which tool should they use?
Carlos's pipeline runs on a daily schedule at 3 AM. Over the weekend, the Fabric capacity was paused for maintenance, and Saturday and Sunday runs were missed. On Monday, the pipeline runs once. How many days of data were processed?
🎬 Video coming soon
Next up: Pipeline Patterns: Parameters & Expressions — make your orchestration reusable with dynamic expressions and parameterised pipelines.