Testing & Databricks Asset Bundles
Implement unit tests, integration tests, and end-to-end testing strategies. Package and deploy with Databricks Asset Bundles via CLI and REST APIs.
Testing strategy
Testing is taste-testing your food at every stage of cooking.
Unit test: taste each ingredient individually. Integration test: taste the combined sauce. End-to-end test: taste the full dish. UAT: have a customer taste it before putting it on the menu.
Without testing, you serve bad data and only find out when the CEO’s dashboard is wrong.
Testing layers
| Test Type | What It Tests | How | When |
|---|---|---|---|
| Unit test | Individual functions/transforms | pytest with mock data | Every commit |
| Integration test | Components working together | Test tables in dev workspace | Every PR merge |
| End-to-end test | Full pipeline bronze → gold | Run pipeline on test data in staging | Before production deploy |
| UAT | Business rules and output quality | Stakeholders validate sample output | Before production release |
Unit testing example
```python
# test_transforms.py
from transforms import clean_amount, validate_date

def test_clean_amount_removes_negatives():
    assert clean_amount(-50) is None
    assert clean_amount(100) == 100.0

def test_validate_date_rejects_future():
    assert validate_date("2099-01-01") is False
    assert validate_date("2026-04-01") is True
```
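The test file imports `clean_amount` and `validate_date` from a `transforms` module that isn't shown. A minimal sketch of what those functions might look like — the exact rules and the fixed cutoff date are assumptions for illustration:

```python
# transforms.py — hypothetical implementations matching the tests above
from datetime import date, datetime

def clean_amount(amount):
    """Return the amount as a float, or None for missing/negative values."""
    if amount is None or amount < 0:
        return None
    return float(amount)

def validate_date(date_str, today=date(2026, 4, 15)):
    """Reject dates in the future; `today` is pinned here only for illustration."""
    parsed = datetime.strptime(date_str, "%Y-%m-%d").date()
    return parsed <= today
```

Keeping these transforms as plain Python functions (no Spark dependency) is what makes them unit-testable with pytest on every commit.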
Integration testing
```python
# Run in a dev workspace with test data
test_df = spark.createDataFrame([
    (1, "Alice", 100.0, "2026-04-01"),
    (2, None, -50.0, "2099-01-01"),  # should be filtered out
], ["id", "name", "amount", "date"])

result = run_silver_pipeline(test_df)
assert result.count() == 1  # only the valid row survives
assert result.filter("name = 'Alice'").count() == 1
```
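`run_silver_pipeline` itself isn't shown above. The row-level rules it might apply can be sketched on plain tuples — shown without Spark so the logic is easy to run and inspect; the field order and the pinned "today" are assumptions:

```python
# Hypothetical validation rules behind run_silver_pipeline,
# applied to plain tuples instead of a Spark DataFrame.
from datetime import date, datetime

def is_valid_row(row, today=date(2026, 4, 15)):  # "today" pinned for illustration
    _id, name, amount, date_str = row
    if name is None:                  # drop rows missing required fields
        return False
    if amount is None or amount < 0:  # drop negative amounts
        return False
    parsed = datetime.strptime(date_str, "%Y-%m-%d").date()
    return parsed <= today            # drop future-dated rows

rows = [
    (1, "Alice", 100.0, "2026-04-01"),
    (2, None, -50.0, "2099-01-01"),
]
valid = [r for r in rows if is_valid_row(r)]  # only Alice's row survives
```

In the real pipeline the same conditions would be expressed as DataFrame filters, but keeping the rules testable in isolation mirrors the unit-test layer above.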
Databricks Asset Bundles (DABs)
Asset Bundles package your entire project into a deployable unit:
```yaml
# databricks.yml — bundle configuration
bundle:
  name: freshmart-etl

workspace:
  host: https://adb-1234567890.1.azuredatabricks.net

resources:
  jobs:
    nightly_etl:
      name: "Freshmart Nightly ETL"
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ./notebooks/01_ingest.py
          job_cluster_key: etl_cluster
        - task_key: transform
          depends_on:
            - task_key: ingest
          notebook_task:
            notebook_path: ./notebooks/02_transform.py
          job_cluster_key: etl_cluster
  pipelines:
    quality_pipeline:
      name: "Freshmart Quality Pipeline"
      target: freshmart_silver
      libraries:
        - notebook:
            path: ./pipelines/quality_checks.sql

targets:
  dev:
    workspace:
      host: https://adb-dev.azuredatabricks.net
  prod:
    workspace:
      host: https://adb-prod.azuredatabricks.net
```
Deploy via CLI
```shell
# Validate the bundle configuration
databricks bundle validate

# Deploy to the dev target
databricks bundle deploy --target dev

# Run a specific job
databricks bundle run nightly_etl --target dev

# Deploy to production
databricks bundle deploy --target prod
```
Deploy via REST API
```python
import os
import requests

# Create a job using the Databricks Jobs REST API (2.1)
workspace_url = os.environ["DATABRICKS_HOST"]  # e.g. https://adb-....azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]         # personal access token
job_config = {...}                             # job settings dict: name, tasks, clusters

response = requests.post(
    f"{workspace_url}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_config,
)
response.raise_for_status()
```
CI/CD with Asset Bundles
A typical CI/CD pipeline:
- Developer pushes code to a feature branch
- CI pipeline (GitHub Actions/Azure DevOps) runs:
  - `databricks bundle validate` — check config syntax
  - `pytest` — run unit tests
  - `databricks bundle deploy --target dev` — deploy to dev
  - Integration tests in the dev workspace
- PR merged → deploy to staging, run E2E tests
- Release → `databricks bundle deploy --target prod`
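The CI stage of that flow could be wired up in GitHub Actions roughly as follows — the workflow name, trigger, secret name, and Python version are all assumptions, not prescribed by the bundle itself:

```yaml
# .github/workflows/ci.yml — hypothetical CI workflow for the bundle
name: freshmart-etl-ci
on:
  pull_request:

jobs:
  validate-test-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main       # installs the Databricks CLI
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install pytest
      - run: databricks bundle validate       # check config syntax
      - run: pytest                           # run unit tests
      - run: databricks bundle deploy --target dev
        env:
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
```

Because the bundle carries its own target definitions, the staging and production stages reuse the exact same deploy command with a different `--target` value.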
Knowledge check
Dr. Sarah Okafor needs to deploy Athena Group's ETL pipeline to three environments (dev, staging, prod) with the same code but different workspace URLs. Which tool should she use?
Next up: Monitoring Clusters & Troubleshooting — cluster monitoring, job repair, and Spark troubleshooting.