
DP-750 Study Guide

Domain 1: Set Up and Configure an Azure Databricks Environment

  • Azure Databricks: Your Lakehouse Platform Free
  • Choosing the Right Compute Free
  • Configuring Compute for Performance Free
  • Unity Catalog: The Three-Level Namespace Free
  • Tables, Views & External Catalogs Free

Domain 2: Secure and Govern Unity Catalog Objects

  • Securing Unity Catalog: Who Gets What
  • Secrets & Authentication
  • Data Discovery & Attribute-Based Access
  • Row Filters, Column Masks & Retention
  • Lineage, Audit Logs & Delta Sharing

Domain 3: Prepare and Process Data

  • Data Modeling: Ingestion Design Free
  • SCD, Granularity & Temporal Tables
  • Partitioning, Clustering & Table Optimization
  • Ingesting Data: Lakeflow Connect & Notebooks
  • Ingesting Data: SQL Methods & CDC
  • Streaming Ingestion: Structured Streaming & Event Hubs
  • Auto Loader & Declarative Pipelines
  • Cleansing & Profiling Data Free
  • Transforming & Loading Data
  • Data Quality & Schema Enforcement

Domain 4: Deploy and Maintain Data Pipelines and Workloads

  • Building Data Pipelines Free
  • Lakeflow Jobs: Create & Configure
  • Lakeflow Jobs: Schedule, Alerts & Recovery
  • Git & Version Control
  • Testing & Databricks Asset Bundles
  • Monitoring Clusters & Troubleshooting
  • Spark Performance: DAG & Query Profile
  • Optimizing Delta Tables & Azure Monitor

Domain 4: Deploy and Maintain Data Pipelines and Workloads ⏱ ~12 min read

Git & Version Control

Apply Git best practices in Databricks: branching strategies, pull requests, conflict resolution, and notebook version control.

Git in Databricks

☕ Simple explanation

Git is the “save game” system for your code.

Every change is tracked. You can go back to any previous version. Multiple people can work on different features without stepping on each other's work. When ready, changes are reviewed (pull request) and merged into the main version.

Databricks supports Git integration through Repos (Git folders). Notebooks, Python files, SQL files, and configuration can be version-controlled with any Git provider (GitHub, Azure DevOps, GitLab, Bitbucket). The exam tests branching strategies, PR workflows, and conflict resolution patterns.

Branching strategy

Branch       | Purpose                         | Who Uses It
main         | Production-ready code           | Deployments read from here
develop      | Integration branch for features | Team merges features here
feature/xxx  | Individual feature work         | One developer per branch
hotfix/xxx   | Emergency production fixes      | Urgent patches
main ─────────────────────────────▶
  ↑                    ↑
  │ merge PR           │ merge PR
  │                    │
develop ──────────────────────────▶
  ↑          ↑
  │ merge    │ merge
  │          │
feature/a  feature/b
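The flow above can be sketched with plain git commands in a throwaway repository (branch names, file names, and commit messages are illustrative; in a real team the merges happen through pull requests in your Git provider, and `git init -b` needs git 2.28+):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# Identity flags so the sketch runs without global git config
G="git -c user.email=dev@example.com -c user.name=Dev"

git init -q -b main
$G commit -q --allow-empty -m "initial commit"

# develop branches off main; features branch off develop
git checkout -q -b develop
git checkout -q -b feature/a

echo "df = spark.read.table('raw.events')" > pipeline.py
git add pipeline.py
$G commit -q -m "feature/a: add ingestion step"

# Merge the feature back into develop (via a PR in real workflows),
# then delete the short-lived feature branch
git checkout -q develop
$G merge -q --no-ff feature/a -m "Merge feature/a"
git branch -d feature/a
```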

Best practices for Databricks

  • One branch per feature: never develop directly on main
  • Use Git folders (Repos) in the workspace: each developer works in their own branch
  • Never commit credentials: use Key Vault secret scopes instead
  • Commit frequently with descriptive messages
  • Review code via pull requests before merging to develop/main
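As a sketch of the “never commit credentials” rule, a minimal pre-commit-style scan can flag obvious hardcoded secrets before they reach Git. The patterns below are illustrative only, not exhaustive (including the assumed shape of a Databricks token); dedicated scanners such as gitleaks or trufflehog do this properly:

```python
import re

# Illustrative patterns only; real scanners ship far larger rule sets
SECRET_PATTERNS = [
    # key = "value" style assignments for suspicious names
    re.compile(r"(?i)(password|client_secret|api[_-]?key)\s*=\s*['\"][^'\"]+['\"]"),
    # rough shape of a Databricks personal access token (assumption)
    re.compile(r"dapi[0-9a-f]{32}"),
]

def find_secrets(text: str) -> list[str]:
    """Return the lines of `text` that look like hardcoded credentials."""
    return [
        line
        for line in text.splitlines()
        if any(p.search(line) for p in SECRET_PATTERNS)
    ]

# A literal secret is flagged; reading from a secret scope is clean
print(find_secrets('client_secret = "abc123"'))             # flagged
print(find_secrets("s = dbutils.secrets.get('kv', 'sp')"))  # clean: []
```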

Pull requests and code review

A pull request (PR) is a request to merge your branch into another:

  1. Developer pushes changes to feature/new-pipeline
  2. Creates a PR to merge into develop
  3. Team reviews the code (logic, data quality, naming)
  4. Reviewer approves → merge completes
  5. Feature branch is deleted
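Step 1 can be sketched locally; here a bare repository stands in for GitHub or Azure DevOps (all names illustrative), and steps 2 to 5 then happen in the provider's UI or CLI:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
G="git -c user.email=dev@example.com -c user.name=Dev"

# A local bare repo stands in for the Git provider
git init -q --bare remote.git
git clone -q remote.git work
cd work

$G commit -q --allow-empty -m "initial commit"
git push -q origin HEAD:develop

# Step 1: push the feature branch; a PR from it into develop
# is then opened in the provider
git checkout -q -b feature/new-pipeline
echo "print('new pipeline')" > pipeline.py
git add pipeline.py
$G commit -q -m "add new pipeline"
git push -q -u origin feature/new-pipeline
```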

What to review in data engineering PRs

Review Area  | What to Check
Logic        | Does the transformation produce correct results?
Data quality | Are there expectations/checks for bad data?
Schema       | Are column types appropriate?
Performance  | Will this scale with production data volumes?
Security     | No hardcoded secrets? Proper permissions?

Conflict resolution

Conflicts occur when two developers change the same lines of the same file:

Developer A: changes line 15 of pipeline.py
Developer B: also changes line 15 of pipeline.py

Resolution steps:

  1. Pull the latest changes from the target branch
  2. Git marks conflicting sections with <<<<<<< and >>>>>>>
  3. Manually choose which changes to keep
  4. Commit the resolved file
  5. Push and update the PR

Prevention: keep feature branches short-lived and merge frequently.

Question

What branching strategy should you use for Databricks projects?


Answer

main (production), develop (integration), feature/xxx (individual features), hotfix/xxx (urgent fixes). One branch per feature, merge via pull requests, never commit directly to main.


Question

What should you check during a data engineering code review?


Answer

Logic correctness, data quality expectations, column types/schema, performance at scale, and security (no hardcoded secrets, proper permissions).


Question

How do you prevent Git conflicts in a team?


Answer

Keep feature branches short-lived, merge frequently, communicate about shared files, and use clear file ownership. Pull latest changes before starting new work.



Knowledge Check

Tomás accidentally committed a service principal client secret to a notebook in NovaPay's Git repo. What should he do FIRST?


Next up: Testing & Databricks Asset Bundles, covering testing strategies and modern deployment with Asset Bundles.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.