
DP-750 Study Guide

Domain 1: Set Up and Configure an Azure Databricks Environment

  • Azure Databricks: Your Lakehouse Platform Free
  • Choosing the Right Compute Free
  • Configuring Compute for Performance Free
  • Unity Catalog: The Three-Level Namespace Free
  • Tables, Views & External Catalogs Free

Domain 2: Secure and Govern Unity Catalog Objects

  • Securing Unity Catalog: Who Gets What
  • Secrets & Authentication
  • Data Discovery & Attribute-Based Access
  • Row Filters, Column Masks & Retention
  • Lineage, Audit Logs & Delta Sharing

Domain 3: Prepare and Process Data

  • Data Modeling: Ingestion Design Free
  • SCD, Granularity & Temporal Tables
  • Partitioning, Clustering & Table Optimization
  • Ingesting Data: Lakeflow Connect & Notebooks
  • Ingesting Data: SQL Methods & CDC
  • Streaming Ingestion: Structured Streaming & Event Hubs
  • Auto Loader & Declarative Pipelines
  • Cleansing & Profiling Data Free
  • Transforming & Loading Data
  • Data Quality & Schema Enforcement

Domain 4: Deploy and Maintain Data Pipelines and Workloads

  • Building Data Pipelines Free
  • Lakeflow Jobs: Create & Configure
  • Lakeflow Jobs: Schedule, Alerts & Recovery
  • Git & Version Control
  • Testing & Databricks Asset Bundles
  • Monitoring Clusters & Troubleshooting
  • Spark Performance: DAG & Query Profile
  • Optimizing Delta Tables & Azure Monitor

Domain 4: Deploy and Maintain Data Pipelines and Workloads (~12 min read)

Lakeflow Jobs: Create & Configure

Create Lakeflow Jobs, configure task graphs, set up triggers, and automate your data pipelines β€” the operational backbone of production Databricks.

What are Lakeflow Jobs?

β˜• Simple explanation

A Lakeflow Job is like an autopilot for your data pipeline.

Instead of manually running notebooks, you configure a job: which notebooks to run, in what order, when to trigger, what compute to use, and what to do if something fails. Then you set it and forget it.

Lakeflow Jobs (formerly Databricks Workflows/Jobs) orchestrate tasks β€” notebooks, Declarative Pipelines, Python scripts, SQL queries, or JAR files β€” in a directed acyclic graph (DAG). Jobs support scheduling, event triggers, parameterisation, automatic retry, alerting, and cluster management.

Creating a job

A job consists of tasks arranged in a dependency graph:

Task Component   Description
Task name        Descriptive identifier
Task type        Notebook, Pipeline, Python script, SQL, JAR, dbt
Compute          Job cluster (recommended), existing cluster, or serverless
Dependencies     Which tasks must complete first
Parameters       Key-value pairs passed to the task
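
As a rough sketch of how these components map onto a job definition, here is a hypothetical payload shaped like a Databricks Jobs API 2.1 request. The notebook paths, cluster key, and cluster sizes are invented for illustration; check the Jobs API reference for the authoritative field names:

```python
# Hypothetical nightly job spec, shaped like a Jobs API 2.1 payload.
# Paths, cluster key, and sizes are made up for illustration.
job_spec = {
    "name": "nightly_etl",
    "tasks": [
        {
            "task_key": "ingest_crm",                      # task name
            "notebook_task": {                             # task type: notebook
                "notebook_path": "/Repos/etl/ingest_crm",
                "base_parameters": {"source": "crm"},      # parameters
            },
            "job_cluster_key": "etl_cluster",              # compute
        },
        {
            "task_key": "clean_data",
            "depends_on": [{"task_key": "ingest_crm"}],    # dependency
            "notebook_task": {"notebook_path": "/Repos/etl/clean_data"},
            "job_cluster_key": "etl_cluster",
        },
    ],
    "job_clusters": [                                      # job cluster: created per run
        {
            "job_cluster_key": "etl_cluster",
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 2,
            },
        }
    ],
}

# Every row of the table above maps onto a field in the spec.
assert job_spec["tasks"][1]["depends_on"][0]["task_key"] == "ingest_crm"
```

Note how the job cluster is declared once under `job_clusters` and referenced by key from each task, so both tasks reuse the same per-run cluster definition.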

Task types

Type                 Use Case
Notebook task        Run a Databricks notebook
Pipeline task        Trigger a Declarative Pipeline update
Python script task   Run a standalone Python file
SQL task             Execute a SQL query or stored procedure
If/else condition    Branching logic based on task results
For each             Loop over a list of items
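
The control-flow types (If/else and For each) appear in a job spec as tasks of their own. A hedged sketch, with field names following the Jobs API's `condition_task` and `for_each_task` shapes and every value invented for illustration:

```python
# Hypothetical control-flow tasks in a Jobs API-style spec.
control_tasks = [
    {
        "task_key": "check_row_count",
        "condition_task": {                 # If/else: branch on a task value
            "op": "GREATER_THAN",
            "left": "{{tasks.ingest_crm.values.row_count}}",
            "right": "0",
        },
    },
    {
        "task_key": "per_region_load",
        # Only runs down the "true" branch of the condition task.
        "depends_on": [{"task_key": "check_row_count", "outcome": "true"}],
        "for_each_task": {                  # For each: loop over a list of inputs
            "inputs": '["emea", "amer", "apac"]',
            "concurrency": 2,               # up to 2 iterations in parallel
            "task": {
                "task_key": "per_region_load_iteration",
                "notebook_task": {
                    "notebook_path": "/Repos/etl/load_region",
                    "base_parameters": {"region": "{{input}}"},
                },
            },
        },
    },
]

assert control_tasks[1]["for_each_task"]["concurrency"] == 2
```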

Multi-task job example

Ravi’s nightly ETL at DataPulse:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ ingest_crm  │────▢│ clean_data  │────▢│ build_reports β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚                                       β–²
       β”‚            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β”‚
       └───────────▢│ ingest_pos  β”‚β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Both ingestion tasks run in parallel. clean_data waits for ingest_crm. build_reports waits for both clean_data and ingest_pos.
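The dependency semantics above can be simulated in a few lines. This is not how Lakeflow schedules internally, just an illustration: a task starts once all of its upstream tasks have completed, and tasks with no unmet dependencies start together.

```python
# Dependencies mirror Ravi's DAG above: task -> list of upstream tasks.
deps = {
    "ingest_crm": [],
    "ingest_pos": [],
    "clean_data": ["ingest_crm"],
    "build_reports": ["clean_data", "ingest_pos"],
}

done, waves = set(), []
while len(done) < len(deps):
    # Everything whose dependencies are all satisfied starts in the same wave.
    ready = sorted(t for t, d in deps.items()
                   if t not in done and all(u in done for u in d))
    waves.append(ready)
    done.update(ready)

print(waves)
# β†’ [['ingest_crm', 'ingest_pos'], ['clean_data'], ['build_reports']]
```

The first wave holds both ingestion tasks (parallel), and build_reports lands in the final wave because it must wait for clean_data.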

Job triggers

Trigger Type   When It Fires                                        Use Case
Scheduled      Cron expression (e.g., "0 3 * * *" = 3 AM daily)     Regular ETL runs
File arrival   New files appear in a storage path                   Event-driven ingestion
Table update   A Delta table receives new data                      Downstream pipeline chaining
Continuous     Runs perpetually, restarting after each completion   Always-on streaming
Manual         Triggered by API call or UI click                    Ad-hoc runs, testing

# Cron examples
0 3 * * *     β†’ Every day at 3 AM UTC
0 */2 * * *   β†’ Every 2 hours
0 8 * * 1-5   β†’ Weekdays at 8 AM
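
To sanity-check what a cron expression matches, here is a minimal, deliberately incomplete matcher for the five-field form shown above. (Databricks schedules actually use Quartz cron syntax, which adds a seconds field, so treat this purely as a way to reason about the shorthand.)

```python
def field_matches(field: str, value: int) -> bool:
    """Match one cron field. Supports '*', '*/n', 'a-b' ranges, and lists."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if value % int(part[2:]) == 0:
                return True
        elif "-" in part:
            lo, hi = map(int, part.split("-"))
            if lo <= value <= hi:
                return True
        elif int(part) == value:
            return True
    return False

def cron_matches(expr: str, minute: int, hour: int, dow: int) -> bool:
    """Check the minute, hour, and day-of-week fields of a 5-field expression."""
    m, h, _dom, _mon, d = expr.split()
    return (field_matches(m, minute) and field_matches(h, hour)
            and field_matches(d, dow))

# "0 3 * * *" fires at 03:00 on any day; "0 8 * * 1-5" only on weekdays (Mon=1).
print(cron_matches("0 3 * * *", minute=0, hour=3, dow=6))    # β†’ True
print(cron_matches("0 8 * * 1-5", minute=0, hour=8, dow=6))  # β†’ False
```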
ℹ️ File arrival triggers

File arrival triggers fire when new files appear in a specified storage path:

  • Monitors an ADLS Gen2 path for new files
  • Triggers within minutes of file creation
  • Can filter by file pattern (e.g., *.csv)
  • Ideal for event-driven architectures

Mei Lin uses file arrival triggers at Freshmart β€” when suppliers upload CSV files to the landing zone, the ingestion job starts automatically.
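
A trigger like Mei Lin's might be expressed in a job spec as follows. This is a sketch shaped like the Jobs API's `file_arrival` trigger settings; the storage account and container names are invented:

```python
# Hypothetical file-arrival trigger settings for the ingestion job.
trigger_settings = {
    "trigger": {
        "pause_status": "UNPAUSED",
        "file_arrival": {
            # Invented ADLS Gen2 path for the supplier landing zone.
            "url": "abfss://landing@freshmartlake.dfs.core.windows.net/suppliers/",
            "min_time_between_triggers_seconds": 60,  # debounce rapid uploads
            "wait_after_last_change_seconds": 60,     # let multi-file drops settle
        },
    },
}

assert "file_arrival" in trigger_settings["trigger"]
```

The two timing knobs matter when suppliers drop several files at once: the job waits for the uploads to quiesce instead of firing once per file.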

Question

What are the five trigger types for Lakeflow Jobs?


Answer

Scheduled (cron), File arrival (new files in storage), Table update (Delta table change), Continuous (perpetual restart), Manual (API/UI). Choose based on the event that should start the pipeline.


Question

What compute should you use for job tasks?


Answer

Job clusters (recommended) β€” created per run, auto-terminate on completion, cost-efficient. Avoid using all-purpose clusters for jobs (wastes money between runs).


Question

How do task dependencies work in multi-task jobs?


Answer

Tasks form a DAG (directed acyclic graph). Dependent tasks wait for their upstream tasks to complete. Multiple tasks with no dependencies between them run in parallel.



Knowledge Check

Mei Lin wants Freshmart's ingestion pipeline to start automatically whenever suppliers upload new CSV files to the ADLS landing zone. Which trigger type should she configure?


Next up: Lakeflow Jobs: Schedule, Alerts & Recovery β€” scheduling, alerting, and automatic restart configuration.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.