
DP-750 Study Guide

Domain 1: Set Up and Configure an Azure Databricks Environment

  • Azure Databricks: Your Lakehouse Platform Free
  • Choosing the Right Compute Free
  • Configuring Compute for Performance Free
  • Unity Catalog: The Three-Level Namespace Free
  • Tables, Views & External Catalogs Free

Domain 2: Secure and Govern Unity Catalog Objects

  • Securing Unity Catalog: Who Gets What
  • Secrets & Authentication
  • Data Discovery & Attribute-Based Access
  • Row Filters, Column Masks & Retention
  • Lineage, Audit Logs & Delta Sharing

Domain 3: Prepare and Process Data

  • Data Modeling: Ingestion Design Free
  • SCD, Granularity & Temporal Tables
  • Partitioning, Clustering & Table Optimization
  • Ingesting Data: Lakeflow Connect & Notebooks
  • Ingesting Data: SQL Methods & CDC
  • Streaming Ingestion: Structured Streaming & Event Hubs
  • Auto Loader & Declarative Pipelines
  • Cleansing & Profiling Data Free
  • Transforming & Loading Data
  • Data Quality & Schema Enforcement

Domain 4: Deploy and Maintain Data Pipelines and Workloads

  • Building Data Pipelines Free
  • Lakeflow Jobs: Create & Configure
  • Lakeflow Jobs: Schedule, Alerts & Recovery
  • Git & Version Control
  • Testing & Databricks Asset Bundles
  • Monitoring Clusters & Troubleshooting
  • Spark Performance: DAG & Query Profile
  • Optimizing Delta Tables & Azure Monitor

Domain 4: Deploy and Maintain Data Pipelines and Workloads (~12 min read)

Lakeflow Jobs: Create & Configure

Create Lakeflow Jobs, configure task graphs, set up triggers, and automate your data pipelines β€” the operational backbone of production Databricks.

What are Lakeflow Jobs?

β˜• Simple explanation

A Lakeflow Job is like an autopilot for your data pipeline.

Instead of manually running notebooks, you configure a job: which notebooks to run, in what order, when to trigger, what compute to use, and what to do if something fails. Then you set it and forget it.

Lakeflow Jobs (formerly Databricks Workflows/Jobs) orchestrate tasks β€” notebooks, Declarative Pipelines, Python scripts, SQL queries, or JAR files β€” in a directed acyclic graph (DAG). Jobs support scheduling, event triggers, parameterisation, automatic retry, alerting, and cluster management.

Creating a job

A job consists of tasks arranged in a dependency graph:

Task Component   Description
Task name        Descriptive identifier
Task type        Notebook, Pipeline, Python script, SQL, JAR, dbt
Compute          Job cluster (recommended), existing cluster, or serverless
Dependencies     Which tasks must complete first
Parameters       Key-value pairs passed to the task
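
As a rough sketch of how these components map onto a job definition, here is a hypothetical payload shaped like a Databricks Jobs API 2.1 request. The notebook paths, cluster key, and cluster sizes are invented for illustration; check the Jobs API reference for the authoritative field names:

```python
# Hypothetical nightly job spec, shaped like a Jobs API 2.1 payload.
# Paths, cluster key, and sizes are made up for illustration.
job_spec = {
    "name": "nightly_etl",
    "tasks": [
        {
            "task_key": "ingest_crm",                      # task name
            "notebook_task": {                             # task type: notebook
                "notebook_path": "/Repos/etl/ingest_crm",
                "base_parameters": {"source": "crm"},      # parameters
            },
            "job_cluster_key": "etl_cluster",              # compute
        },
        {
            "task_key": "clean_data",
            "depends_on": [{"task_key": "ingest_crm"}],    # dependency
            "notebook_task": {"notebook_path": "/Repos/etl/clean_data"},
            "job_cluster_key": "etl_cluster",
        },
    ],
    "job_clusters": [                                      # job cluster: created per run
        {
            "job_cluster_key": "etl_cluster",
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 2,
            },
        }
    ],
}

# Every row of the table above maps onto a field in the spec.
assert job_spec["tasks"][1]["depends_on"][0]["task_key"] == "ingest_crm"
```

Note how the job cluster is declared once under `job_clusters` and referenced by key from each task, so both tasks reuse the same per-run cluster definition.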

Task types

Type                 Use Case
Notebook task        Run a Databricks notebook
Pipeline task        Trigger a Declarative Pipeline update
Python script task   Run a standalone Python file
SQL task             Execute a SQL query or stored procedure
If/else condition    Branching logic based on task results
For each             Loop over a list of items
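
The control-flow types (If/else and For each) appear in a job spec as tasks of their own. A hedged sketch, with field names following the Jobs API's `condition_task` and `for_each_task` shapes and every value invented for illustration:

```python
# Hypothetical control-flow tasks in a Jobs API-style spec.
control_tasks = [
    {
        "task_key": "check_row_count",
        "condition_task": {                 # If/else: branch on a task value
            "op": "GREATER_THAN",
            "left": "{{tasks.ingest_crm.values.row_count}}",
            "right": "0",
        },
    },
    {
        "task_key": "per_region_load",
        # Only runs down the "true" branch of the condition task.
        "depends_on": [{"task_key": "check_row_count", "outcome": "true"}],
        "for_each_task": {                  # For each: loop over a list of inputs
            "inputs": '["emea", "amer", "apac"]',
            "concurrency": 2,               # up to 2 iterations in parallel
            "task": {
                "task_key": "per_region_load_iteration",
                "notebook_task": {
                    "notebook_path": "/Repos/etl/load_region",
                    "base_parameters": {"region": "{{input}}"},
                },
            },
        },
    },
]

assert control_tasks[1]["for_each_task"]["concurrency"] == 2
```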

Multi-task job example

Ravi’s nightly ETL at DataPulse:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ ingest_crm  │────▢│ clean_data  │────▢│ build_reports β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚                                       β–²
       β”‚            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β”‚
       └───────────▢│ ingest_pos  β”‚β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Both ingestion tasks run in parallel. clean_data waits for ingest_crm. build_reports waits for both clean_data and ingest_pos.
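The dependency semantics above can be simulated in a few lines. This is not how Lakeflow schedules internally, just an illustration: a task starts once all of its upstream tasks have completed, and tasks with no unmet dependencies start together.

```python
# Dependencies mirror Ravi's DAG above: task -> list of upstream tasks.
deps = {
    "ingest_crm": [],
    "ingest_pos": [],
    "clean_data": ["ingest_crm"],
    "build_reports": ["clean_data", "ingest_pos"],
}

done, waves = set(), []
while len(done) < len(deps):
    # Everything whose dependencies are all satisfied starts in the same wave.
    ready = sorted(t for t, d in deps.items()
                   if t not in done and all(u in done for u in d))
    waves.append(ready)
    done.update(ready)

print(waves)
# β†’ [['ingest_crm', 'ingest_pos'], ['clean_data'], ['build_reports']]
```

The first wave holds both ingestion tasks (parallel), and build_reports lands in the final wave because it must wait for clean_data.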

Job triggers

Trigger Type   When It Fires                                        Use Case
Scheduled      Cron expression (e.g., "0 3 * * *" = 3 AM daily)     Regular ETL runs
File arrival   New files appear in a storage path                   Event-driven ingestion
Table update   A Delta table receives new data                      Downstream pipeline chaining
Continuous     Runs perpetually, restarting after each completion   Always-on streaming
Manual         Triggered by API call or UI click                    Ad-hoc runs, testing

# Cron examples
0 3 * * *     β†’ Every day at 3 AM UTC
0 */2 * * *   β†’ Every 2 hours
0 8 * * 1-5   β†’ Weekdays at 8 AM
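
To sanity-check what a cron expression matches, here is a minimal, deliberately incomplete matcher for the five-field form shown above. (Databricks schedules actually use Quartz cron syntax, which adds a seconds field, so treat this purely as a way to reason about the shorthand.)

```python
def field_matches(field: str, value: int) -> bool:
    """Match one cron field. Supports '*', '*/n', 'a-b' ranges, and lists."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if value % int(part[2:]) == 0:
                return True
        elif "-" in part:
            lo, hi = map(int, part.split("-"))
            if lo <= value <= hi:
                return True
        elif int(part) == value:
            return True
    return False

def cron_matches(expr: str, minute: int, hour: int, dow: int) -> bool:
    """Check the minute, hour, and day-of-week fields of a 5-field expression."""
    m, h, _dom, _mon, d = expr.split()
    return (field_matches(m, minute) and field_matches(h, hour)
            and field_matches(d, dow))

# "0 3 * * *" fires at 03:00 on any day; "0 8 * * 1-5" only on weekdays (Mon=1).
print(cron_matches("0 3 * * *", minute=0, hour=3, dow=6))    # β†’ True
print(cron_matches("0 8 * * 1-5", minute=0, hour=8, dow=6))  # β†’ False
```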
ℹ️ File arrival triggers

File arrival triggers fire when new files appear in a specified storage path:

  • Monitors an ADLS Gen2 path for new files
  • Triggers within minutes of file creation
  • Can filter by file pattern (e.g., *.csv)
  • Ideal for event-driven architectures

Mei Lin uses file arrival triggers at Freshmart β€” when suppliers upload CSV files to the landing zone, the ingestion job starts automatically.
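
A trigger like Mei Lin's might be expressed in a job spec as follows. This is a sketch shaped like the Jobs API's `file_arrival` trigger settings; the storage account and container names are invented:

```python
# Hypothetical file-arrival trigger settings for the ingestion job.
trigger_settings = {
    "trigger": {
        "pause_status": "UNPAUSED",
        "file_arrival": {
            # Invented ADLS Gen2 path for the supplier landing zone.
            "url": "abfss://landing@freshmartlake.dfs.core.windows.net/suppliers/",
            "min_time_between_triggers_seconds": 60,  # debounce rapid uploads
            "wait_after_last_change_seconds": 60,     # let multi-file drops settle
        },
    },
}

assert "file_arrival" in trigger_settings["trigger"]
```

The two timing knobs matter when suppliers drop several files at once: the job waits for the uploads to quiesce instead of firing once per file.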

Question

What are the five trigger types for Lakeflow Jobs?


Answer

Scheduled (cron), File arrival (new files in storage), Table update (Delta table change), Continuous (perpetual restart), Manual (API/UI). Choose based on the event that should start the pipeline.


Question

What compute should you use for job tasks?


Answer

Job clusters (recommended) β€” created per run, auto-terminate on completion, cost-efficient. Avoid using all-purpose clusters for jobs (wastes money between runs).


Question

How do task dependencies work in multi-task jobs?


Answer

Tasks form a DAG (directed acyclic graph). Dependent tasks wait for their upstream tasks to complete. Multiple tasks with no dependencies between them run in parallel.



Knowledge Check

Mei Lin wants Freshmart's ingestion pipeline to start automatically whenever suppliers upload new CSV files to the ADLS landing zone. Which trigger type should she configure?


Next up: Lakeflow Jobs: Schedule, Alerts & Recovery β€” scheduling, alerting, and automatic restart configuration.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.