Lakeflow Jobs: Create & Configure
Create Lakeflow Jobs, configure task graphs, set up triggers, and automate your data pipelines: the operational backbone of production Databricks.
What are Lakeflow Jobs?
A Lakeflow Job is an autopilot for your data pipeline.
Instead of manually running notebooks, you configure a job: which notebooks to run, in what order, when to trigger, what compute to use, and what to do if something fails. Then you set it and forget it.
Creating a job
A job consists of tasks arranged in a dependency graph:
| Task Component | Description |
|---|---|
| Task name | Descriptive identifier |
| Task type | Notebook, Pipeline, Python script, SQL, JAR, dbt |
| Compute | Job cluster (recommended), existing cluster, or serverless |
| Dependencies | Which tasks must complete first |
| Parameters | Key-value pairs passed to the task |
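To make the components above concrete, here is a minimal single-task job definition in the shape of a Jobs API payload. This is a sketch, not a verbatim API call; the job name, notebook path, cluster spec values, and parameter are hypothetical placeholders:

```python
# Minimal single-task job definition, shaped like a Databricks Jobs API
# payload. All names, paths, and cluster values are hypothetical.
job_spec = {
    "name": "nightly_ingest",                       # descriptive job name
    "tasks": [
        {
            "task_key": "ingest_crm",               # task name / identifier
            "notebook_task": {                      # task type: notebook
                "notebook_path": "/Repos/etl/ingest_crm",
                "base_parameters": {"env": "prod"}, # key-value parameters
            },
            "job_cluster_key": "etl_cluster",       # compute: job cluster
        }
    ],
    "job_clusters": [
        {
            "job_cluster_key": "etl_cluster",
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 2,
            },
        }
    ],
}
```

Each table row maps to one field: the task key, the task type block, the compute reference, and the parameters dict. Dependencies would be added per task (shown in the multi-task example below).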
Task types
| Type | Use Case |
|---|---|
| Notebook task | Run a Databricks notebook |
| Pipeline task | Trigger a Declarative Pipeline update |
| Python script task | Run a standalone Python file |
| SQL task | Execute a SQL query or stored procedure |
| If/else condition | Branching logic based on task results |
| For each | Loop over a list of items |
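As a conceptual sketch of the "For each" type: one inner task definition is fanned out over a list of inputs, once per item. The source names and helper below are hypothetical, standing in for a parameterized inner task:

```python
# Conceptual "for each" sketch: the inner task (here a plain function,
# standing in for e.g. a parameterized notebook run) executes once per
# input item. Source names are hypothetical.
inputs = ["crm", "pos", "web"]

def run_ingest(source: str) -> str:
    # Stand-in for the inner task body.
    return f"ingested:{source}"

results = [run_ingest(s) for s in inputs]
```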
Multi-task job example
Ravi's nightly ETL at DataPulse:
ingest_crm --> clean_data --> build_reports
                                    ^
ingest_pos -------------------------+
Both ingestion tasks run in parallel. clean_data waits for ingest_crm. build_reports waits for both clean_data and ingest_pos.
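The ordering above can be expressed with depends_on entries per task, in the shape of a Jobs API payload (task keys match the names above; other task fields are omitted for brevity):

```python
# Task list sketching the DAG above: depends_on controls ordering.
# Tasks with no depends_on (the two ingests) start in parallel.
tasks = [
    {"task_key": "ingest_crm"},
    {"task_key": "ingest_pos"},
    {"task_key": "clean_data",
     "depends_on": [{"task_key": "ingest_crm"}]},
    {"task_key": "build_reports",
     "depends_on": [{"task_key": "clean_data"},
                    {"task_key": "ingest_pos"}]},
]

def ready(task, finished):
    """A task may start once all of its dependencies have finished."""
    return all(d["task_key"] in finished for d in task.get("depends_on", []))
```

At job start nothing is finished, so only the two ingest tasks are ready; build_reports becomes ready only once both clean_data and ingest_pos are done.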
Job triggers
| Trigger Type | When It Fires | Use Case |
|---|---|---|
| Scheduled | Cron expression (e.g., "0 3 * * *" = 3 AM daily) | Regular ETL runs |
| File arrival | New files appear in a storage path | Event-driven ingestion |
| Table update | A Delta table receives new data | Downstream pipeline chaining |
| Continuous | Runs perpetually, restarting after each completion | Always-on streaming |
| Manual | Triggered by API call or UI click | Ad-hoc runs, testing |
# Cron examples
0 3 * * *    = Every day at 3 AM UTC
0 */2 * * *  = Every 2 hours
0 8 * * 1-5  = Weekdays at 8 AM
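To make the cron semantics concrete, here is a small hypothetical matcher for the 5-field expressions above. It supports only `*`, `*/n`, and ranges; real schedulers handle many more forms (lists, names, and step offsets):

```python
from datetime import datetime

def field_matches(field: str, value: int) -> bool:
    """Check one cron field (*, */n, a-b, or a literal) against a value."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return value % int(field[2:]) == 0
    if "-" in field:
        lo, hi = map(int, field.split("-"))
        return lo <= value <= hi
    return int(field) == value

def cron_matches(expr: str, dt: datetime) -> bool:
    """5-field cron: minute hour day-of-month month day-of-week (0=Sunday)."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (dt.weekday() + 1) % 7  # Python: Mon=0 -> cron: Sun=0
    return (field_matches(minute, dt.minute)
            and field_matches(hour, dt.hour)
            and field_matches(dom, dt.day)
            and field_matches(month, dt.month)
            and field_matches(dow, cron_dow))
```

For example, `cron_matches("0 8 * * 1-5", dt)` is true at 8:00 AM on a Monday and false on a Sunday.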
File arrival triggers
File arrival triggers fire when new files appear in a specified storage path:
- Monitors an ADLS Gen2 path for new files
- Triggers within minutes of file creation
- Can filter by file pattern (e.g., *.csv)
- Ideal for event-driven architectures
Mei Lin uses file arrival triggers at Freshmart: when suppliers upload CSV files to the landing zone, the ingestion job starts automatically.
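A trigger like Mei Lin's could be sketched as a job-level trigger block. Field names follow the general shape of the Jobs API file-arrival trigger, but treat them as assumptions; the storage URL is a hypothetical placeholder:

```python
# Hypothetical file-arrival trigger configuration, shaped like a
# Jobs API "trigger" block. The ADLS Gen2 URL is a placeholder.
trigger_spec = {
    "trigger": {
        "pause_status": "UNPAUSED",
        "file_arrival": {
            "url": "abfss://landing@freshmart.dfs.core.windows.net/suppliers/",
            # Debounce: wait until uploads settle before firing.
            "wait_after_last_change_seconds": 60,
        },
    }
}
```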
🎬 Video coming soon
Knowledge check
Mei Lin wants Freshmart's ingestion pipeline to start automatically whenever suppliers upload new CSV files to the ADLS landing zone. Which trigger type should she configure?
Next up: Lakeflow Jobs: Schedule, Alerts & Recovery, covering scheduling, alerting, and automatic restart configuration.