
DP-750 Study Guide

Domain 1: Set Up and Configure an Azure Databricks Environment

  • Azure Databricks: Your Lakehouse Platform Free
  • Choosing the Right Compute Free
  • Configuring Compute for Performance Free
  • Unity Catalog: The Three-Level Namespace Free
  • Tables, Views & External Catalogs Free

Domain 2: Secure and Govern Unity Catalog Objects

  • Securing Unity Catalog: Who Gets What
  • Secrets & Authentication
  • Data Discovery & Attribute-Based Access
  • Row Filters, Column Masks & Retention
  • Lineage, Audit Logs & Delta Sharing

Domain 3: Prepare and Process Data

  • Data Modeling: Ingestion Design Free
  • SCD, Granularity & Temporal Tables
  • Partitioning, Clustering & Table Optimization
  • Ingesting Data: Lakeflow Connect & Notebooks
  • Ingesting Data: SQL Methods & CDC
  • Streaming Ingestion: Structured Streaming & Event Hubs
  • Auto Loader & Declarative Pipelines
  • Cleansing & Profiling Data Free
  • Transforming & Loading Data
  • Data Quality & Schema Enforcement

Domain 4: Deploy and Maintain Data Pipelines and Workloads

  • Building Data Pipelines Free
  • Lakeflow Jobs: Create & Configure
  • Lakeflow Jobs: Schedule, Alerts & Recovery
  • Git & Version Control
  • Testing & Databricks Asset Bundles
  • Monitoring Clusters & Troubleshooting
  • Spark Performance: DAG & Query Profile
  • Optimizing Delta Tables & Azure Monitor

Domain 4: Deploy and Maintain Data Pipelines and Workloads

Testing & Databricks Asset Bundles

Implement unit tests, integration tests, and end-to-end testing strategies. Package and deploy with Databricks Asset Bundles via CLI and REST APIs.

Testing strategy

☕ Simple explanation

Testing is like taste-testing your food at every stage of cooking.

Unit test: taste each ingredient individually. Integration test: taste the combined sauce. End-to-end test: taste the full dish. UAT: have a customer taste it before putting it on the menu.

Without testing, you serve bad data and only find out when the CEO’s dashboard is wrong.

A data engineering testing strategy includes unit tests (individual functions), integration tests (connected components), end-to-end tests (full pipeline), and UAT (business stakeholder validation). Databricks Asset Bundles (DABs) package notebooks, jobs, pipelines, and configuration into deployable units with CI/CD support.

Testing layers

| Test Type | What It Tests | How | When |
| --- | --- | --- | --- |
| Unit test | Individual functions/transforms | pytest with mock data | Every commit |
| Integration test | Components working together | Test tables in a dev workspace | Every PR merge |
| End-to-end test | Full pipeline, bronze → gold | Run the pipeline on test data in staging | Before production deploy |
| UAT | Business rules and output quality | Stakeholders validate sample output | Before production release |

Unit testing example

# test_transforms.py
from transforms import clean_amount, validate_date

def test_clean_amount_removes_negatives():
    assert clean_amount(-50) is None
    assert clean_amount(100) == 100.0

def test_validate_date_rejects_future():
    assert validate_date("2099-01-01") is False
    assert validate_date("2026-04-01") is True
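The guide doesn't show the `transforms` module itself. A minimal implementation that would satisfy these tests might look like the following sketch — `clean_amount`, `validate_date`, and the fixed "current date" cutoff are assumptions for illustration:

```python
# transforms.py — hypothetical implementations matching the tests above
from datetime import date, datetime


def clean_amount(amount):
    """Return the amount as a float, or None for missing/negative values."""
    if amount is None or amount < 0:
        return None
    return float(amount)


def validate_date(date_str, today=date(2026, 4, 15)):
    """Reject dates after the reference date (assumed here for the example)."""
    parsed = datetime.strptime(date_str, "%Y-%m-%d").date()
    return parsed <= today
```

In a real project the reference date would come from the runtime clock or job parameters; pinning it in tests keeps them deterministic.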

Integration testing

# Run in a dev workspace with test data
test_df = spark.createDataFrame([
    (1, "Alice", 100.0, "2026-04-01"),
    (2, None, -50.0, "2099-01-01"),  # should be filtered out
], ["id", "name", "amount", "date"])

result = run_silver_pipeline(test_df)
assert result.count() == 1  # only valid row
assert result.filter("name = 'Alice'").count() == 1
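`run_silver_pipeline` is not shown in this guide. As a rough, framework-free illustration of the filtering rules the integration test expects (non-null name, non-negative amount, non-future date) — a real implementation would use the Spark DataFrame API instead of plain tuples:

```python
# Hypothetical silver-layer filter, sketched without Spark for clarity.
from datetime import date, datetime


def run_silver_pipeline(rows, today=date(2026, 4, 15)):
    """Keep rows with a name, a non-negative amount, and a non-future date."""
    valid = []
    for row in rows:
        _id, name, amount, date_str = row
        if name is None or amount is None or amount < 0:
            continue  # drop rows failing the basic quality rules
        if datetime.strptime(date_str, "%Y-%m-%d").date() > today:
            continue  # drop future-dated rows
        valid.append(row)
    return valid


rows = [
    (1, "Alice", 100.0, "2026-04-01"),
    (2, None, -50.0, "2099-01-01"),  # should be filtered out
]
result = run_silver_pipeline(rows)
assert len(result) == 1 and result[0][1] == "Alice"
```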

Databricks Asset Bundles (DABs)

Asset Bundles package your entire project into a deployable unit:

# databricks.yml — bundle configuration
bundle:
  name: freshmart-etl

workspace:
  host: https://adb-1234567890.1.azuredatabricks.net

resources:
  jobs:
    nightly_etl:
      name: "Freshmart Nightly ETL"
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ./notebooks/01_ingest.py
          job_cluster_key: etl_cluster
        - task_key: transform
          depends_on:
            - task_key: ingest
          notebook_task:
            notebook_path: ./notebooks/02_transform.py
          job_cluster_key: etl_cluster

  pipelines:
    quality_pipeline:
      name: "Freshmart Quality Pipeline"
      target: freshmart_silver
      libraries:
        - notebook:
            path: ./pipelines/quality_checks.sql

targets:
  dev:
    workspace:
      host: https://adb-dev.azuredatabricks.net
  prod:
    workspace:
      host: https://adb-prod.azuredatabricks.net

Deploy via CLI

# Validate the bundle
databricks bundle validate

# Deploy to dev environment
databricks bundle deploy --target dev

# Run a specific job
databricks bundle run nightly_etl --target dev

# Deploy to production
databricks bundle deploy --target prod

Deploy via REST API

import requests

# Create a job using the Databricks Jobs REST API.
# workspace_url, token, and job_config are placeholders you supply:
# the workspace URL, a personal access token, and the job definition JSON.
response = requests.post(
    f"{workspace_url}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_config,
)
response.raise_for_status()
job_id = response.json()["job_id"]
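Beyond creating jobs, the same Jobs 2.1 API can trigger runs via `POST /api/2.1/jobs/run-now`. A small helper sketch — the injectable `session` parameter is an assumption added so the function can be exercised without a live workspace:

```python
import requests


def trigger_job(workspace_url, token, job_id, session=requests):
    """Trigger an existing job via /api/2.1/jobs/run-now and return the run_id."""
    resp = session.post(
        f"{workspace_url}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {token}"},
        json={"job_id": job_id},
    )
    resp.raise_for_status()  # surface HTTP errors instead of silent failures
    return resp.json()["run_id"]
```

You would then poll `/api/2.1/jobs/runs/get` with the returned `run_id` to track the run's state.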
ℹ️ CI/CD with Asset Bundles

A typical CI/CD pipeline:

  1. Developer pushes code to feature branch
  2. CI pipeline (GitHub Actions/Azure DevOps) runs:
    • databricks bundle validate — check config syntax
    • pytest — run unit tests
    • databricks bundle deploy --target dev — deploy to dev
    • Integration tests in dev workspace
  3. PR merged → deploy to staging, run E2E tests
  4. Release → databricks bundle deploy --target prod
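As a sketch, the PR stage of that flow might look like this in GitHub Actions — the workflow name, secret name, and step layout are illustrative assumptions, not an official template:

```yaml
# .github/workflows/ci.yml — illustrative sketch
name: bundle-ci

on:
  pull_request:

jobs:
  validate-and-deploy-dev:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main        # installs the Databricks CLI
      - run: databricks bundle validate        # check config syntax
      - run: pip install pytest && pytest tests/
      - run: databricks bundle deploy --target dev
        env:
          DATABRICKS_TOKEN: ${{ secrets.DEV_DATABRICKS_TOKEN }}
```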
Question

What are the four testing levels for data engineering?


Answer

Unit tests (individual functions, every commit), integration tests (connected components, every PR), end-to-end tests (full pipeline, before deploy), UAT (business validation, before release).


Question

What are Databricks Asset Bundles?


Answer

DABs package notebooks, jobs, pipelines, and configuration into a deployable unit defined in databricks.yml. Deploy via CLI (databricks bundle deploy) or REST API. Supports multiple environments (dev/staging/prod).


Question

How do you deploy a bundle to different environments?


Answer

Define environments in databricks.yml with different workspace hosts. Deploy with: databricks bundle deploy --target dev (or staging/prod). Each target has its own workspace configuration.



Knowledge Check

Dr. Sarah Okafor needs to deploy Athena Group's ETL pipeline to three environments (dev, staging, prod) with the same code but different workspace URLs. Which tool should she use?


Next up: Monitoring Clusters & Troubleshooting — cluster monitoring, job repair, and Spark troubleshooting.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.