
DP-600 Study Guide

Domain 1: Maintain a Data Analytics Solution

  • Workspace Access Controls
  • Row-Level & Object-Level Security
  • Sensitivity Labels & Endorsement
  • Git Version Control in Fabric
  • Deployment Pipelines: Dev → Test → Prod
  • Impact Analysis & Dependencies
  • XMLA Endpoint & Reusable Assets

Domain 2: Prepare Data

  • Microsoft Fabric: The Big Picture Free
  • Lakehouses: Your Data Foundation Free
  • Warehouses in Fabric Free
  • Choosing the Right Data Store Free
  • Data Connections & OneLake Catalog
  • Shortcuts & OneLake Integration
  • Ingesting Data: Dataflows Gen2 & Pipelines
  • Star Schema Design Free
  • SQL Objects: Views, Functions & Stored Procedures
  • Transforming Data: Reshape & Enrich
  • Data Quality & Cleansing
  • Querying with SQL
  • Querying with KQL
  • Querying with DAX

Domain 3: Implement and Manage Semantic Models

  • Semantic Models: Storage Modes
  • Relationships & Advanced Modeling
  • DAX Essentials: Variables & Functions
  • Calculation Groups & Field Parameters
  • Large Models & Composite Models
  • Direct Lake Mode
  • DAX Performance Optimization
  • Incremental Refresh

Domain 2: Prepare Data · Premium · ⏱ ~14 min read

Ingesting Data: Dataflows Gen2 & Pipelines

Get data into Fabric — no-code Dataflows Gen2, orchestration pipelines, COPY INTO, and notebooks. Match the right ingestion tool to the right scenario.

How does data get into Fabric?

☕ Simple explanation

Think of a postal service with different delivery options.

Need to send a postcard? Drop it in the letterbox (COPY INTO — fast, simple, direct). Need to send 50 different parcels to 50 addresses on a schedule? Use a courier service (Pipeline — orchestrates multiple steps). Need to sort, repackage, and relabel items before delivery? Use a fulfilment centre (Dataflow Gen2 — transforms along the way). Need a custom, one-off delivery with complex routing? Hire a specialist (Notebook — full code control).

Fabric gives you all four options. The exam tests your ability to pick the right one for each scenario.

Microsoft Fabric provides multiple ingestion paths, each optimised for different scenarios:

  • Dataflows Gen2 — low-code/no-code data transformation and ingestion using Power Query Online. Best for business users and analysts who need to connect, clean, and load data without writing code.
  • Pipelines — orchestration workflows that coordinate multiple activities (copy data, run notebooks, trigger dataflows). Based on Azure Data Factory. Best for complex, multi-step ETL/ELT workflows.
  • Notebooks — PySpark/Scala/R code for data ingestion with full programmatic control. Best for complex transformations, API calls, and custom logic.
  • COPY INTO — a T-SQL command that bulk-loads files from OneLake or external storage into a warehouse table. Best for high-performance batch loading of structured data.
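
This decision logic can be sketched as a tiny rule-of-thumb helper. This is purely illustrative (not a Fabric API, and the scenario flags are hypothetical simplifications), but it mirrors how the exam expects you to reason:

```python
# Hypothetical decision helper mirroring the four ingestion paths above.
# The inputs and rules are simplified for illustration, not an official tree.

def pick_ingestion_tool(team_writes_code: bool,
                        multi_step_workflow: bool,
                        sql_only_bulk_load: bool) -> str:
    """Suggest a Fabric ingestion tool for a simplified scenario."""
    if sql_only_bulk_load:
        return "COPY INTO"        # T-SQL bulk load straight into a warehouse
    if multi_step_workflow:
        return "Pipeline"         # orchestrate copy/notebook/dataflow steps
    if team_writes_code:
        return "Notebook"         # full PySpark/Scala/R control
    return "Dataflow Gen2"        # no-code Power Query Online

# Scenarios from this module:
print(pick_ingestion_tool(False, False, False))  # Dataflow Gen2 (no-code team)
print(pick_ingestion_tool(True, True, False))    # Pipeline (multi-step nightly job)
```

In real exam questions the signal words map the same way: "no code" points to Dataflows Gen2, "orchestrate/schedule multiple steps" to Pipelines, "custom logic/API" to Notebooks, and "T-SQL bulk load into a warehouse" to COPY INTO.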

Comparing ingestion tools

Different tools for different teams and scenarios
| Tool | Code Required | Best For | Target |
|---|---|---|---|
| Dataflows Gen2 | No code (Power Query GUI) | Connect → clean → load with visual transformations | Lakehouse, Warehouse, or other Fabric items |
| Pipeline | Low code (drag-and-drop activities) | Orchestrate multiple steps: copy, transform, load, schedule | Any Fabric item (coordinates other tools) |
| Notebook | Full code (PySpark/Scala/R) | Complex transformations, API calls, custom logic, ML prep | Lakehouse (Delta tables) |
| COPY INTO | T-SQL command | Bulk-load Parquet/CSV files into warehouse tables | Warehouse only |

Dataflows Gen2

Dataflows Gen2 is the no-code ingestion tool in Fabric, built on Power Query Online (the same engine as Power BI Desktop's Get Data experience).

Key capabilities

  • 350+ connectors — databases, SaaS apps, files, APIs
  • Visual transformations — filter, merge, pivot, unpivot, split columns, add custom columns
  • Scheduling — run on a schedule or trigger from a pipeline
  • Staging — data is staged in OneLake before loading to the destination
  • Destinations — load directly into lakehouses, warehouses, or KQL databases

When to choose Dataflows Gen2

  • Your team prefers no-code/low-code tools
  • The transformation is simple to moderate (cleaning, type conversion, merge/join)
  • You need to connect to a SaaS source (Salesforce, Dynamics, Google Sheets) that does not have a native Fabric connector
  • You want a repeatable, scheduled data load without writing pipelines
💡 Scenario: Dr. Sarah's patient data feed

Dr. Sarah at Pacific Health receives daily patient survey results from a third-party SaaS tool. She creates a Dataflow Gen2 that:

  1. Connects to the SaaS API (using a web connector)
  2. Flattens the nested JSON responses into a tabular format
  3. Renames columns to match her lakehouse schema
  4. Filters out test records
  5. Loads the clean data into the silver_patient_surveys table in her lakehouse

Total setup time: 30 minutes. No code written. Runs daily at 6 AM.
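
The trickiest step in that list is usually flattening nested JSON. In a Dataflow Gen2 this is done visually in Power Query with no code; the sketch below shows the same idea in plain Python so you can see what "flattening" means. All field names and sample records here are hypothetical:

```python
# Illustrative only: flatten nested JSON survey responses into flat rows
# and drop test records. A Dataflow Gen2 does this with Power Query steps
# (Expand Record, Filter Rows), not code; field names are made up.
import json

raw = json.loads("""
[
  {"id": 101, "patient": {"mrn": "P-001"}, "answers": {"q1": 4, "q2": 5}, "is_test": false},
  {"id": 102, "patient": {"mrn": "TEST"},  "answers": {"q1": 1, "q2": 1}, "is_test": true}
]
""")

rows = [
    {"survey_id": r["id"], "mrn": r["patient"]["mrn"], **r["answers"]}
    for r in raw
    if not r["is_test"]          # filter out test records
]
print(rows)  # [{'survey_id': 101, 'mrn': 'P-001', 'q1': 4, 'q2': 5}]
```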

Pipelines

Pipelines are the orchestration backbone of Fabric β€” based on Azure Data Factory. They coordinate multiple activities into a single workflow.

Common pipeline activities

| Activity | What It Does |
|---|---|
| Copy activity | Moves data from source to destination (the most common activity) |
| Dataflow activity | Runs a Dataflow Gen2 as a step in the pipeline |
| Notebook activity | Runs a Spark notebook |
| Stored procedure | Executes a warehouse stored procedure |
| For Each / If Condition | Control flow — loops and branching |
| Web activity | Calls a REST API |
| Wait | Pauses for a specified duration |

When to choose Pipelines

  • You need to orchestrate multiple steps (copy → transform → load → notify)
  • You need scheduling with retry logic and error handling
  • You need to parameterise workflows (same pipeline, different source/target per run)
  • You need to coordinate notebooks, dataflows, and stored procedures in sequence
💡 Scenario: Anita's nightly ingestion pipeline

Anita at FreshCart builds a pipeline that runs every night at midnight:

  1. Copy activity: Copy CSV files from Azure Blob Storage (2,000 stores) into lakehouse Bronze tables
  2. Notebook activity: Run a PySpark notebook that deduplicates, validates, and loads Silver tables
  3. Stored procedure activity: Call a warehouse stored procedure to rebuild Gold-layer aggregates
  4. Web activity: Send a Slack notification when the pipeline completes

If Step 2 fails, the pipeline retries 3 times. If all retries fail, it sends an alert to the on-call team.
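
In Fabric, that retry behaviour is configured on the activity's retry settings rather than written by hand. Purely to make the logic concrete, here is a minimal Python sketch of "retry up to 3 times, then raise so an alert can fire" (the step function is hypothetical):

```python
# Illustrative retry wrapper: attempt a step up to 3 times; if every
# attempt fails, raise so the caller can alert the on-call team.
# Fabric pipelines provide this via activity retry settings, not code.
def run_with_retries(step, retries=3):
    last_err = None
    for attempt in range(1, retries + 1):
        try:
            return step()
        except Exception as err:
            last_err = err
            print(f"Attempt {attempt} failed: {err}")
    raise RuntimeError("All retries failed; alerting on-call") from last_err

# A made-up step that fails twice, then succeeds on the third attempt:
calls = []
def flaky_step():
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("transient failure")
    return "silver tables loaded"

result = run_with_retries(flaky_step)
print(result)  # silver tables loaded
```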

Notebooks

Spark notebooks give you full code control over data ingestion and transformation.

When to choose Notebooks

  • Complex transformations — data quality rules, custom parsing, API pagination
  • Semi-structured data — JSON, XML, nested structures that need flattening
  • Machine learning prep — feature engineering, data sampling
  • Exploratory work — ad-hoc investigation before building production pipelines

Common ingestion patterns in notebooks

# Read CSV files from OneLake into a Spark DataFrame
df = spark.read.format("csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("Files/raw/pos_transactions/*.csv")

# Basic transformations
from pyspark.sql.functions import col, to_date, when

df_clean = df \
    .filter(col("amount") > 0) \
    .withColumn("transaction_date", to_date(col("date_string"), "yyyy-MM-dd")) \
    .withColumn("category", when(col("dept_code") == "GR", "Grocery")
                            .when(col("dept_code") == "HW", "Hardware")
                            .otherwise("Other"))

# Write to a Delta table in the lakehouse
df_clean.write.format("delta") \
    .mode("append") \
    .saveAsTable("silver_transactions")

COPY INTO

The fastest way to bulk-load data into a Fabric Warehouse:

COPY INTO dbo.daily_positions
FROM 'https://onelake.dfs.fabric.microsoft.com/workspace/lakehouse.Lakehouse/Files/exports/positions.parquet'
WITH (
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<sas-token>')
)

When to choose COPY INTO

  • Loading Parquet or CSV files directly into warehouse tables
  • You need high throughput for large batch loads
  • Your workflow is SQL-centric (no Spark, no Power Query)
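
Because COPY INTO is plain T-SQL, a SQL-centric workflow can template the statement per file. The helper below is a hypothetical sketch (table and URL are illustrative); in production you would validate inputs rather than string-format them into SQL:

```python
# Illustrative only: compose a COPY INTO statement from parameters so the
# same load can run against many files. Names/URLs are made up, and real
# code should sanitise inputs instead of naive string formatting.
def build_copy_into(table: str, file_url: str, file_type: str = "PARQUET") -> str:
    return (
        f"COPY INTO {table}\n"
        f"FROM '{file_url}'\n"
        f"WITH (FILE_TYPE = '{file_type}')"
    )

sql = build_copy_into(
    "dbo.daily_positions",
    "https://onelake.dfs.fabric.microsoft.com/ws/lh.Lakehouse/Files/exports/positions.parquet",
)
print(sql)
```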
Question

What is a Dataflow Gen2 in Fabric?


Answer

A no-code/low-code data transformation and ingestion tool built on Power Query Online. It connects to 350+ sources, applies visual transformations (filter, merge, pivot), and loads data into lakehouses, warehouses, or KQL databases. Best for business users who prefer a GUI over code.


Question

What is the primary purpose of a Fabric Pipeline?


Answer

Orchestration — coordinating multiple data activities (copy, notebook, dataflow, stored procedure) into a single, scheduled workflow with retry logic and error handling. Pipelines are based on Azure Data Factory.


Question

What does the COPY INTO command do?


Answer

COPY INTO is a T-SQL command that bulk-loads data from files (Parquet, CSV) in OneLake or external storage into a Fabric Warehouse table. It is the fastest method for high-volume batch loading into a warehouse.


Knowledge Check

Dr. Sarah at Pacific Health needs to ingest daily patient survey data from a third-party SaaS tool. The data needs light cleaning (rename columns, filter test records, convert types). Her team has no coding skills. Which tool should she use?

Knowledge Check

Anita at FreshCart needs to orchestrate a nightly workflow: (1) copy CSV files from Azure Blob Storage, (2) run a PySpark notebook for transformations, (3) call a warehouse stored procedure for aggregations, (4) send a notification. Which tool coordinates these four steps?



Next up: Star Schema Design — the data modelling pattern that underpins every high-performance lakehouse and warehouse.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.