Git & Version Control
Apply Git best practices in Databricks: branching strategies, pull requests, conflict resolution, and notebook version control.
Git in Databricks
Git is the "save game" system for your code.
Every change is recorded as a commit, so you can go back to any previous version. Multiple people can work on different features in separate branches without stepping on each other's work. When a feature is ready, its changes are reviewed in a pull request and merged into the main version.
Branching strategy
| Branch | Purpose | Who Uses It |
|---|---|---|
| main | Production-ready code | Deployments read from here |
| develop | Integration branch for features | Team merges features here |
| feature/xxx | Individual feature work | One developer per branch |
| hotfix/xxx | Emergency production fixes | Urgent patches |
main     ───────────────────────────────▶
              ▲                 ▲
              │ merge PR        │ merge PR
              │                 │
develop  ───────────────────────────────▶
              ▲                 ▲
              │ merge           │ merge
              │                 │
          feature/a         feature/b
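The branching table can also be enforced mechanically, for example as a CI check that runs before a PR is accepted. The sketch below is a hypothetical helper, merge_targets, that validates a branch name against the convention above and returns the branch(es) its PR should target. The lowercase naming pattern and the back-merge of hotfixes into develop (a common Git Flow convention) are assumptions, not rules Databricks enforces.

```python
import re

# Hypothetical naming rules mirroring the branching table.
# The slug pattern and the hotfix -> develop back-merge are assumptions.
_SLUG = r"[a-z0-9][a-z0-9._-]*"
BRANCH_RULES = [
    (re.compile(r"main"), ()),                               # production: nothing to merge into
    (re.compile(r"develop"), ("main",)),                     # integration branch, released to main via PR
    (re.compile(rf"feature/{_SLUG}"), ("develop",)),         # one developer's feature work
    (re.compile(rf"hotfix/{_SLUG}"), ("main", "develop")),   # urgent fix, then back-merge
]


def merge_targets(branch: str) -> tuple[str, ...]:
    """Return the branches a PR from `branch` should target.

    Raises ValueError when the name does not follow the convention.
    """
    for pattern, targets in BRANCH_RULES:
        if pattern.fullmatch(branch):
            return targets
    raise ValueError(f"branch {branch!r} does not follow the naming convention")


if __name__ == "__main__":
    print(merge_targets("feature/new-pipeline"))  # ('develop',)
```

Using fullmatch rather than search means a name like "my-feature/x" is rejected instead of partially matching.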
Best practices for Databricks
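- One branch per feature; never develop directly on main
- Use Databricks Git folders (formerly Repos) in the workspace, with each developer working on their own branch in their own Git folder
- Never commit credentials; store them in secret scopes (for example, Azure Key Vault-backed scopes) and read them at runtime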
- Commit frequently with descriptive messages
- Review code via pull requests before merging to develop/main
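A lightweight guard for the "never commit credentials" rule is a scan that runs before commit or in CI. The sketch below is a hypothetical find_hardcoded_secrets helper with two illustrative rules (both assumptions; dedicated secret scanners ship far larger rule sets). The compliant pattern it lets through is reading the value at runtime with dbutils.secrets.get(scope=..., key=...); the scope and key names in the example are hypothetical.

```python
import re

# Illustrative rules only (assumptions); each rule flags a single line of source text.
SECRET_RULES = {
    # A secret-looking name assigned a quoted literal of 8+ characters.
    "assigned secret": re.compile(
        r"""\b(client_secret|password|api_key|access_key|token)\s*=\s*["'][^"']{8,}["']""",
        re.IGNORECASE,
    ),
    # The header line of a PEM-encoded private key.
    "private key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_hardcoded_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line that matches a rule."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule_name, pattern in SECRET_RULES.items():
            if pattern.search(line):
                hits.append((lineno, rule_name))
    return hits


# Notebook source as text; the scope/key names below are hypothetical.
notebook_source = '''
client_secret = "s3cr3t-value-123"
client_secret = dbutils.secrets.get(scope="novapay-kv", key="sp-client-secret")
'''
print(find_hardcoded_secrets(notebook_source))  # [(2, 'assigned secret')]
```

Only the hardcoded literal on line 2 is flagged; the secret-scope lookup on line 3 passes because no literal value is assigned.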
Pull requests and code review
A pull request (PR) is a request to merge your branch into another:
- Developer pushes changes to feature/new-pipeline
- Developer creates a PR to merge into develop
- Team reviews the code (logic, data quality, naming)
- Reviewer approves, and the merge completes
- Feature branch is deleted
What to review in data engineering PRs
| Review Area | What to Check |
|---|---|
| Logic | Does the transformation produce correct results? |
| Data quality | Are there expectations/checks for bad data? |
| Schema | Are column types appropriate? |
| Performance | Will this scale with production data volumes? |
| Security | No hardcoded secrets? Proper permissions? |
Conflict resolution
Conflicts occur when two developers change the same lines of the same file on different branches, so Git cannot combine the changes automatically:
Developer A: changes line 15 of pipeline.py
Developer B: also changes line 15 of pipeline.py
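To see why A and B collide only because both touched line 15, here is a toy three-way merge. It is a simplification, not Git's algorithm: it assumes every version has the same number of lines, so edits only replace lines. Each line is compared against the common ancestor (base): if only one side changed it, that change wins; if both changed it differently, the line is a conflict. The "ours"/"theirs" labels are placeholders; Git writes a branch or commit name (such as HEAD) after its markers.

```python
def three_way_merge(base, ours, theirs):
    """Toy line-by-line three-way merge.

    Simplifying assumption: every version has the same number of lines, so
    edits only replace lines (no insertions or deletions). Git's real merge
    compares each side against the common ancestor in diff hunks instead.
    Returns (merged_lines, conflicting_line_numbers).
    """
    if not len(base) == len(ours) == len(theirs):
        raise ValueError("this toy merge only handles replaced lines")
    merged, conflicts = [], []
    for lineno, (b, o, t) in enumerate(zip(base, ours, theirs), start=1):
        if o == t:        # same on both sides (changed identically or untouched)
            merged.append(o)
        elif o == b:      # only their side changed this line
            merged.append(t)
        elif t == b:      # only our side changed this line
            merged.append(o)
        else:             # both sides changed the same line differently
            merged += ["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"]
            conflicts.append(lineno)
    return merged, conflicts


base = ["threshold = 100", "limit = 10"]
dev_a = ["threshold = 200", "limit = 10"]  # Developer A edits line 1
dev_b = ["threshold = 150", "limit = 10"]  # Developer B edits the same line
dev_c = ["threshold = 100", "limit = 20"]  # Developer C edits a different line

print(three_way_merge(base, dev_a, dev_c))     # clean merge, no conflicts
print(three_way_merge(base, dev_a, dev_b)[1])  # [1]
```

Merging A with C succeeds because they touched different lines; merging A with B produces a marked conflict on line 1.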
Resolution steps:
- Pull the latest changes from the target branch into your feature branch
- Git marks each conflicting section with <<<<<<<, =======, and >>>>>>> markers
- Manually choose which changes to keep and delete the markers
- Commit the resolved file
- Push and update the PR
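A common mistake in the "commit the resolved file" step is leaving a marker behind. The sketch below is a hypothetical leftover_conflict_markers helper that could run in a pre-commit hook or CI step. It assumes Git's default marker length of seven characters; a line of exactly seven "=" characters can also be legitimate text (for example a Markdown underline), so treat hits as "please check" rather than proof of a bad merge.

```python
import re

# A leftover marker: a line starting with exactly seven '<', '=' or '>'
# characters, followed by whitespace or end of line (Git's default marker size).
CONFLICT_MARKER = re.compile(r"^(<{7}|={7}|>{7})(\s|$)")


def leftover_conflict_markers(text: str) -> list[int]:
    """Return the 1-based line numbers that still look like conflict markers."""
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if CONFLICT_MARKER.match(line)
    ]


resolved_badly = """x = 1
<<<<<<< HEAD
y = 2
=======
y = 3
>>>>>>> feature/b
"""
print(leftover_conflict_markers(resolved_badly))  # [2, 4, 6]
```

Git itself also offers git diff --check, which warns when changes introduce conflict markers; a script like this is mainly useful where you want the check to block a commit or PR.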
Prevention: keep feature branches short-lived and merge frequently.
Knowledge check
Tomás accidentally committed a service principal client secret to a notebook in NovaPay's Git repo. What should he do FIRST?
Next up: Testing & Databricks Asset Bundles, covering testing strategies and modern deployment with Asset Bundles.