
DP-750 Study Guide

Domain 1: Set Up and Configure an Azure Databricks Environment

  • Azure Databricks: Your Lakehouse Platform Free
  • Choosing the Right Compute Free
  • Configuring Compute for Performance Free
  • Unity Catalog: The Three-Level Namespace Free
  • Tables, Views & External Catalogs Free

Domain 2: Secure and Govern Unity Catalog Objects

  • Securing Unity Catalog: Who Gets What
  • Secrets & Authentication
  • Data Discovery & Attribute-Based Access
  • Row Filters, Column Masks & Retention
  • Lineage, Audit Logs & Delta Sharing

Domain 3: Prepare and Process Data

  • Data Modeling: Ingestion Design Free
  • SCD, Granularity & Temporal Tables
  • Partitioning, Clustering & Table Optimization
  • Ingesting Data: Lakeflow Connect & Notebooks
  • Ingesting Data: SQL Methods & CDC
  • Streaming Ingestion: Structured Streaming & Event Hubs
  • Auto Loader & Declarative Pipelines
  • Cleansing & Profiling Data Free
  • Transforming & Loading Data
  • Data Quality & Schema Enforcement

Domain 4: Deploy and Maintain Data Pipelines and Workloads

  • Building Data Pipelines Free
  • Lakeflow Jobs: Create & Configure
  • Lakeflow Jobs: Schedule, Alerts & Recovery
  • Git & Version Control
  • Testing & Databricks Asset Bundles
  • Monitoring Clusters & Troubleshooting
  • Spark Performance: DAG & Query Profile
  • Optimizing Delta Tables & Azure Monitor

Domain 4: Deploy and Maintain Data Pipelines and Workloads (~14 min read)

Building Data Pipelines

Design the order of operations, choose between notebooks and Declarative Pipelines, implement task logic and error handling, and build production-grade pipelines.

Designing pipelines

☕ Simple explanation

A data pipeline is an assembly line for data.

Raw materials arrive (ingestion), get cleaned (bronze → silver), assembled into products (silver → gold), and shipped (to dashboards). You design the order of stations, decide which machines to use (notebooks or Declarative Pipelines), and plan what happens when something breaks (error handling).

Pipeline design involves task ordering (dependency graphs), tool selection (notebooks vs Declarative Pipelines), task logic (what each step does), and error handling (retry, alert, fail-fast). The exam tests your ability to design production-grade pipelines that are reliable, maintainable, and cost-efficient.
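The error-handling strategies named above (retry, fail-fast) can be sketched in plain Python. This is a generic illustration, not a Databricks API; `load_bronze` is a hypothetical pipeline step:

```python
import time

def run_with_retry(task, max_attempts=3, backoff_seconds=0.1):
    """Run one pipeline step; retry transient failures, then fail fast."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == max_attempts:
                raise  # fail fast: surface the error so the job task fails
            print(f"Attempt {attempt} failed ({exc}); retrying...")
            time.sleep(backoff_seconds * attempt)  # simple linear backoff

# Hypothetical flaky step: fails twice, then succeeds.
calls = {"count": 0}

def load_bronze():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient connection error")
    return "bronze loaded"

print(run_with_retry(load_bronze))  # prints "bronze loaded" after two retries
```

In a real job you would let the final `raise` propagate so the task is marked failed and any alerting configured on the job fires.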

Notebook vs Declarative Pipelines

| Aspect | Notebook pipeline | Declarative Pipeline |
| --- | --- | --- |
| Approach | Imperative: you code each step | Declarative: you define the desired state |
| Dependency management | Manual (Lakeflow Jobs task graph) | Automatic (inferred from LIVE references) |
| Error handling | try/except in code | Built-in retry + expectations |
| Data quality | Custom validation code | Built-in expectations (EXPECT) |
| Compute | Any cluster type | Dedicated pipeline compute (serverless by default) |
| Monitoring | Spark UI + custom logging | Pipeline event log + metrics dashboard |
| Best for | Complex logic, ML integration, custom APIs | Standard medallion ETL |
| Exam preference | When "custom logic" is mentioned | When "managed" or "automated quality" is mentioned |

Notebook pipeline with Lakeflow Jobs

A notebook pipeline chains multiple notebooks as tasks in a Lakeflow Job:

Task 1: ingest_raw       →  Task 2: clean_validate  →  Task 3: build_gold
(bronze layer)                (silver layer)              (gold layer)
                                                            ↘
                                                   Task 4: update_dashboard
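A task graph like the one above maps onto a Lakeflow Jobs definition, where `depends_on` encodes the precedence constraints. This is a sketch; the job name and notebook paths are assumptions:

```json
{
  "name": "orders_medallion_pipeline",
  "tasks": [
    { "task_key": "ingest_raw",
      "notebook_task": { "notebook_path": "/Pipelines/ingest_raw" } },
    { "task_key": "clean_validate",
      "depends_on": [ { "task_key": "ingest_raw" } ],
      "notebook_task": { "notebook_path": "/Pipelines/clean_validate" } },
    { "task_key": "build_gold",
      "depends_on": [ { "task_key": "clean_validate" } ],
      "notebook_task": { "notebook_path": "/Pipelines/build_gold" } },
    { "task_key": "update_dashboard",
      "depends_on": [ { "task_key": "build_gold" } ],
      "notebook_task": { "notebook_path": "/Pipelines/update_dashboard" } }
  ]
}
```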

Precedence constraints

Tasks can have dependencies: Task 3 only runs if Tasks 1 and 2 succeed.

# In a notebook: signal success or failure for downstream tasks
dbutils.notebook.exit("SUCCESS")  # signals success to the job

# Or raise an error to fail the task
if error_count > threshold:
    raise Exception(f"Data quality failed: {error_count} errors exceeded threshold")

Error handling in notebooks

try:
    # Main processing logic
    df = spark.read.table("bronze.raw_orders")
    clean_df = transform_orders(df)
    clean_df.write.mode("append").saveAsTable("silver.orders")

except Exception as e:
    # Log the error
    print(f"Pipeline failed: {str(e)}")
    # Optionally write to an error table; escape single quotes so the
    # interpolated message cannot break the SQL statement
    safe_message = str(e).replace("'", "''")
    spark.sql(f"INSERT INTO pipeline_errors VALUES (CURRENT_TIMESTAMP(), '{safe_message}')")
    # Re-raise to fail the job task
    raise

Declarative Pipeline implementation

-- Complete medallion pipeline in Declarative Pipelines

-- Bronze: Auto Loader ingestion
CREATE OR REFRESH STREAMING TABLE bronze_orders
AS SELECT * FROM cloud_files(
  'abfss://landing@storage.dfs.core.windows.net/orders/',
  'json'
);

-- Silver: cleaned with quality expectations
CREATE OR REFRESH STREAMING TABLE silver_orders (
  CONSTRAINT valid_id EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW,
  CONSTRAINT positive_amount EXPECT (amount > 0) ON VIOLATION DROP ROW
)
AS SELECT
  CAST(order_id AS BIGINT) AS order_id,
  customer_id,
  CAST(amount AS DECIMAL(10,2)) AS amount,
  TO_DATE(order_date) AS order_date
FROM STREAM(LIVE.bronze_orders);

-- Gold: materialized view for dashboards
CREATE OR REFRESH MATERIALIZED VIEW gold_daily_summary
AS SELECT
  order_date,
  COUNT(*) AS order_count,
  SUM(amount) AS total_revenue
FROM LIVE.silver_orders
GROUP BY order_date;

The pipeline engine automatically:

  • Determines execution order from LIVE. references
  • Handles incremental processing (streaming tables)
  • Enforces expectations and logs quality metrics
  • Retries on transient failures
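Determining execution order from `LIVE.` references amounts to building a dependency graph and sorting it topologically. A toy illustration in plain Python (not the actual engine), mirroring the three tables in the SQL above:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each table maps to the LIVE. tables it reads from (its upstream dependencies).
reads_from = {
    "bronze_orders": [],                      # reads only from cloud storage
    "silver_orders": ["bronze_orders"],       # FROM STREAM(LIVE.bronze_orders)
    "gold_daily_summary": ["silver_orders"],  # FROM LIVE.silver_orders
}

# static_order() yields each table only after all of its dependencies.
order = list(TopologicalSorter(reads_from).static_order())
print(order)  # ['bronze_orders', 'silver_orders', 'gold_daily_summary']
```

This is why no manual task graph is needed: the references in the table definitions already contain the ordering information.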
💡 Exam decision tree: notebook or declarative?
  1. Standard bronze → silver → gold ETL? → Declarative Pipeline
  2. Need custom Python ML models in the pipeline? → Notebook
  3. Need built-in data quality metrics? → Declarative Pipeline
  4. Need to call external APIs during processing? → Notebook
  5. Want automatic dependency management? → Declarative Pipeline
  6. Need complex control flow (if/else branching)? → Notebook
Question

When should you use a notebook pipeline vs a Declarative Pipeline?

Answer

Notebook: complex logic, ML integration, external APIs, branching control flow. Declarative: standard medallion ETL, automatic dependencies, built-in quality expectations, managed compute.

Question

How do Declarative Pipelines handle dependency ordering?

Answer

Automatically — the engine infers dependencies from LIVE. references. If silver_orders reads from LIVE.bronze_orders, the engine runs bronze first. No manual dependency graph needed.

Question

How should you handle errors in notebook pipelines?

Answer

Use try/except blocks, log errors to an error table, and re-raise the exception to fail the job task. Use dbutils.notebook.exit('SUCCESS') to signal success for downstream task dependencies.

Knowledge check

Dr. Sarah Okafor needs to build a standard bronze → silver → gold ETL pipeline for Athena Group. The pipeline should have built-in data quality checks and automatic dependency management. Which approach should she use?


Next up: Lakeflow Jobs: Create & Configure — creating, configuring, and triggering Lakeflow Jobs.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.