
DP-750 Study Guide

Domain 1: Set Up and Configure an Azure Databricks Environment

  • Azure Databricks: Your Lakehouse Platform Free
  • Choosing the Right Compute Free
  • Configuring Compute for Performance Free
  • Unity Catalog: The Three-Level Namespace Free
  • Tables, Views & External Catalogs Free

Domain 2: Secure and Govern Unity Catalog Objects

  • Securing Unity Catalog: Who Gets What
  • Secrets & Authentication
  • Data Discovery & Attribute-Based Access
  • Row Filters, Column Masks & Retention
  • Lineage, Audit Logs & Delta Sharing

Domain 3: Prepare and Process Data

  • Data Modeling: Ingestion Design Free
  • SCD, Granularity & Temporal Tables
  • Partitioning, Clustering & Table Optimization
  • Ingesting Data: Lakeflow Connect & Notebooks
  • Ingesting Data: SQL Methods & CDC
  • Streaming Ingestion: Structured Streaming & Event Hubs
  • Auto Loader & Declarative Pipelines
  • Cleansing & Profiling Data Free
  • Transforming & Loading Data
  • Data Quality & Schema Enforcement

Domain 4: Deploy and Maintain Data Pipelines and Workloads

  • Building Data Pipelines Free
  • Lakeflow Jobs: Create & Configure
  • Lakeflow Jobs: Schedule, Alerts & Recovery
  • Git & Version Control
  • Testing & Databricks Asset Bundles
  • Monitoring Clusters & Troubleshooting
  • Spark Performance: DAG & Query Profile
  • Optimizing Delta Tables & Azure Monitor

Domain 3: Prepare and Process Data ⏱ ~13 min read

Ingesting Data: Lakeflow Connect & Notebooks

Ingest data using Lakeflow Connect's pre-built connectors and custom notebook code β€” batch and streaming patterns for getting data into your lakehouse.

Two paths to ingestion

β˜• Simple explanation

Lakeflow Connect is like a pre-built plumbing kit. Notebooks are like custom plumbing you build yourself.

Lakeflow Connect: pick your source (Salesforce, SAP, databases), configure connection details, and data flows automatically. No coding β€” just configuration.

Notebooks: write Python or SQL code to read data, transform it, and write it to your lakehouse. Full control, but you build and maintain everything.

Lakeflow Connect provides managed, low-code connectors for SaaS platforms and databases. It handles schema discovery, incremental extraction, CDC, and both batch and streaming modes. Notebooks give you full programmatic control using PySpark, Spark SQL, or Scala for custom ingestion logic.

Lakeflow Connect

Batch ingestion with Lakeflow Connect

Ravi uses Lakeflow Connect to ingest CRM data from Salesforce:

  1. Create a connection β€” specify the source system and credentials
  2. Configure ingestion β€” select tables, choose full or incremental sync
  3. Schedule β€” set the refresh cadence (hourly, daily)
  4. Monitor β€” track ingestion status in the Lakeflow dashboard

Lakeflow Connect automatically handles schema mapping, type conversion, and incremental extraction using watermark columns or change tracking.
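The watermark pattern described above can be sketched in plain Python. This is a conceptual model only — the `rows`, `updated_at` column name, and in-memory data are hypothetical stand-ins for what a real connector would query from the source system:

```python
from datetime import datetime

def incremental_extract(rows, last_watermark):
    """Conceptual sketch of watermark-based incremental extraction.

    Only rows whose watermark column ("updated_at") is newer than the
    previous high-water mark are pulled on each run; the watermark then
    advances so the next run skips already-extracted rows.
    """
    new_rows = [r for r in rows if r["updated_at"] > last_watermark]
    new_watermark = max((r["updated_at"] for r in new_rows), default=last_watermark)
    return new_rows, new_watermark

# Hypothetical source data: id=1 was already synced, id=2 changed since.
source = [
    {"id": 1, "updated_at": datetime(2026, 1, 1)},
    {"id": 2, "updated_at": datetime(2026, 1, 3)},
]
batch, wm = incremental_extract(source, last_watermark=datetime(2026, 1, 2))
# Only id=2 is extracted, and the watermark advances to 2026-01-03.
```

The same idea underpins change tracking: the connector persists a position (watermark or change-log offset) so each run is incremental rather than a full re-read.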

Streaming ingestion with Lakeflow Connect

For sources that support change streams (databases with CDC enabled), Lakeflow Connect can stream changes continuously:

  • Source sends changes β†’ Lakeflow Connect reads the change stream
  • Writes to Delta table in near-real-time
  • Handles schema evolution β€” new columns in the source are automatically added
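The change-stream behaviour above can be illustrated with a toy CDC applier. The event format here (an `op` plus a row keyed by `id`) is invented for illustration — real change feeds vary by source:

```python
def apply_cdc(table, events):
    """Apply a stream of CDC events to an in-memory stand-in for a table.

    Conceptual sketch: inserts and updates upsert the row by key; deletes
    remove it. Unknown columns on an event are simply kept, mimicking
    automatic schema evolution in the target.
    """
    for e in events:
        op, row = e["op"], e["row"]
        if op in ("insert", "update"):
            table[row["id"]] = row      # upsert; new columns survive
        elif op == "delete":
            table.pop(row["id"], None)
    return table

tbl = apply_cdc({}, [
    {"op": "insert", "row": {"id": 1, "name": "A"}},
    {"op": "update", "row": {"id": 1, "name": "A", "tier": "gold"}},  # new column
    {"op": "insert", "row": {"id": 2, "name": "B"}},
    {"op": "delete", "row": {"id": 2}},
])
# tbl keeps only id=1, now including the new "tier" column.
```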

Notebook-based ingestion

Batch ingestion with notebooks

# Read CSV files from ADLS landing zone
raw_df = (spark.read
    .format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("abfss://landing@storage.dfs.core.windows.net/sales/*.csv"))

# Write to Delta table
(raw_df.write
    .format("delta")
    .mode("append")
    .saveAsTable("bronze.raw_sales"))

Streaming ingestion with notebooks

# Read streaming data from a source
stream_df = (spark.readStream
    .format("delta")
    .table("bronze.raw_transactions"))

# Transform and write as a streaming query
(stream_df
    .filter("amount > 0")
    .writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/checkpoints/silver_txn")
    .toTable("silver.valid_transactions"))

Key concept: Streaming uses readStream and writeStream instead of read and write. The checkpoint location tracks what data has been processed.
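The role of the checkpoint can be mimicked with a small offset file. This is a deliberate simplification — real Structured Streaming checkpoints also store schema, state, and commit logs — but it shows why a restarted query resumes instead of reprocessing:

```python
import json
import os
import tempfile

def process_micro_batch(records, checkpoint_path):
    """Process only records past the last checkpointed offset.

    Conceptual sketch: the checkpoint persists the position of the last
    record processed, so a restarted query picks up where it left off
    rather than reprocessing (or silently skipping) data.
    """
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            offset = json.load(f)["offset"]
    new = records[offset:]
    with open(checkpoint_path, "w") as f:
        json.dump({"offset": len(records)}, f)
    return new

ckpt = os.path.join(tempfile.mkdtemp(), "offsets.json")
first = process_micro_batch([10, 20, 30], ckpt)       # first run: all three
second = process_micro_batch([10, 20, 30, 40], ckpt)  # restart: only the new record
```

This is also why each streaming query needs its own checkpoint location: two queries sharing one would corrupt each other's progress tracking.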

| Feature          | Lakeflow Connect              | Notebooks                           |
|------------------|-------------------------------|-------------------------------------|
| Setup effort     | Low (configuration)           | High (code + testing)               |
| Custom logic     | Limited                       | Unlimited                           |
| Error handling   | Built-in retries              | You implement                       |
| Schema evolution | Automatic                     | Manual or with mergeSchema          |
| Monitoring       | Lakeflow dashboard            | Spark UI + custom logging           |
| Best for         | Standard sources, quick setup | Complex transforms, custom sources  |
πŸ’‘ Exam tip: Schema evolution in notebook ingestion

When source schemas change (new columns added), notebook ingestion can fail. Enable schema evolution:

# Allow new columns to be added automatically
df.write.option("mergeSchema", "true").mode("append").saveAsTable("my_table")

Or enable automatic schema merging for the whole Spark session:

-- Applies schema evolution to Delta writes in this session
SET spark.databricks.delta.schema.autoMerge.enabled = true;

If the question mentions β€œsource schema changes” or β€œnew columns added” β€” mergeSchema is the answer.
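What mergeSchema does can be pictured as a union of column sets. This toy model is illustrative only β€” Delta also reconciles types and column ordering:

```python
def merge_schema(table_cols, incoming_cols):
    """Union of existing and incoming columns, preserving order."""
    merged = list(table_cols)
    for c in incoming_cols:
        if c not in merged:
            merged.append(c)
    return merged

def write_with_merge(rows, table_cols, incoming_cols):
    """Pad each incoming row to the merged schema with None, the way
    rows show null for columns they never had."""
    cols = merge_schema(table_cols, incoming_cols)
    return cols, [{c: r.get(c) for c in cols} for r in rows]

# A supplier adds a "region" column the target table has never seen.
cols, padded = write_with_merge(
    [{"id": 1, "region": "EU"}],
    table_cols=["id", "amount"],
    incoming_cols=["id", "region"],
)
# The table grows a "region" column; the row's missing "amount" is null.
```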

Question

What is the difference between Lakeflow Connect and notebook-based ingestion?


Answer

Lakeflow Connect: low-code, pre-built connectors, automatic schema handling. Notebooks: full code control, unlimited custom logic, but you build and maintain everything.


Question

What makes streaming different from batch in notebook code?


Answer

Streaming uses readStream/writeStream (not read/write), requires a checkpoint location for tracking processed data, and runs continuously or with trigger intervals.


Question

How do you handle schema evolution during notebook ingestion?


Answer

Use .option('mergeSchema', 'true') on the write operation. This allows new columns from the source to be automatically added to the target Delta table.



Knowledge check

Mei Lin is ingesting data from 15 different Freshmart suppliers. Each supplier sends daily CSV files to an ADLS landing zone. Some suppliers occasionally add new columns. What ingestion approach handles this best?


Next up: Ingesting Data: SQL Methods & CDC β€” CTAS, CREATE OR REPLACE, COPY INTO, and change data capture feeds.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.