Git & CI/CD for ML Projects
ML code deserves the same discipline as app code. Learn to manage ML projects with Git, automate training with GitHub Actions, and build CI/CD pipelines that ship models safely.
Git for machine learning
Git is version control for code – like "Track Changes" in Word, but for everything.
Without Git, your ML project looks like: `model_final.py`, `model_final_v2.py`, `model_ACTUALLY_final.py`. With Git, you have a clean history: who changed what, when, and why. You can go back to any point in time.
But ML projects are special – you're not just tracking code. You're tracking experiments (which code + data + parameters produced which model). Git tracks the code; MLflow (Module 6) tracks the experiments. Together, they give you full reproducibility.
What goes in Git vs what doesn't
| Artifact | In Git? | Where Instead? |
|---|---|---|
| Training scripts (.py) | Yes | – |
| Pipeline definitions (.yaml) | Yes | – |
| Environment specs (conda.yaml) | Yes | – |
| Component definitions | Yes | – |
| Bicep/IaC templates | Yes | – |
| GitHub Actions workflows | Yes | – |
| Hyperparameter configs | Yes | – |
| Trained model weights (.pkl, .pt) | No | Azure ML model registry |
| Datasets (CSV, parquet, images) | No | Azure ML data assets / datastores |
| Experiment metrics and logs | No | MLflow tracking |
| Secrets and API keys | No | Azure Key Vault |
Exam tip: Never commit models or data to Git
This sounds obvious, but exam questions may present scenarios where someone wants to "version the model by committing it to Git." The correct answer is always: register the model in the Azure ML model registry (or a shared registry for cross-workspace access).
Git is for code and configuration. MLflow and Azure ML are for experiment artifacts.
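A repository-level `.gitignore` is the simplest way to enforce this split. The following is a sketch – the patterns are illustrative and should be adjusted to your project:

```
# .gitignore – sketch enforcing "code in Git, artifacts elsewhere"

# Trained models – register these in the Azure ML model registry instead
*.pkl
*.pt
*.onnx

# Datasets – use Azure ML data assets / datastores instead
data/
*.csv
*.parquet

# Local experiment output – MLflow tracking holds the real record
mlruns/
outputs/

# Secrets – use Azure Key Vault / OIDC instead of files
.env
*.pem
```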
Repository structure for ML projects
A common structure that works well with Azure ML:
```
ml-project/
  .github/
    workflows/
      train-on-push.yml       # CI: train on feature branch push
      deploy-on-merge.yml     # CD: deploy model on merge to main
  infra/
    main.bicep                # Workspace + compute IaC
    params-dev.json
    params-prod.json
  src/
    train.py                  # Training script
    score.py                  # Inference script
    prepare.py                # Data preparation
  components/
    prepare/component.yaml    # Pipeline component definitions
    train/component.yaml
    evaluate/component.yaml
  pipelines/
    training-pipeline.yaml    # Full pipeline definition
  environments/
    training-env.yaml         # Conda environment for training
    scoring-env.yaml          # Conda environment for inference
  tests/
    test_prepare.py           # Unit tests for data prep
    test_score.py             # Unit tests for scoring
```
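To make the layout concrete, `pipelines/training-pipeline.yaml` could be an Azure ML pipeline job that wires the three components together. This is a minimal sketch – the compute target, data asset, and input/output names are assumptions:

```yaml
# pipelines/training-pipeline.yaml – minimal sketch (names are assumed)
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: churn-training-pipeline
settings:
  default_compute: azureml:cpu-cluster      # assumed compute target
jobs:
  prepare:
    type: command
    component: ../components/prepare/component.yaml
    inputs:
      raw_data:
        type: uri_file
        path: azureml:churn-data@latest     # assumed registered data asset
  train:
    type: command
    component: ../components/train/component.yaml
    inputs:
      training_data: ${{parent.jobs.prepare.outputs.prepared_data}}
  evaluate:
    type: command
    component: ../components/evaluate/component.yaml
    inputs:
      model: ${{parent.jobs.train.outputs.model_output}}
```

Each step references a component definition by relative path, and the `${{parent.jobs.*}}` bindings chain one step's output into the next step's input.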
GitHub integration with Azure ML
Azure ML can connect directly to GitHub repositories, enabling:
- Code tracking – each experiment records the Git commit it ran from
- Automated training – GitHub Actions trigger training jobs on push or PR
- Secure access – GitHub connects to Azure ML via OIDC (no stored secrets)
Scenario: Kai builds NeuralSpark's ML CI/CD
Kai designs the following Git workflow for NeuralSpark:
Feature branch workflow:
- Data scientist creates a `feature/improve-churn-model` branch
- Pushes code changes → GitHub Actions runs unit tests
- If tests pass → GitHub Actions submits a training job to Azure ML (dev workspace)
- Training job logs metrics to MLflow
- Data scientist reviews metrics, creates a Pull Request
- Team reviews code + experiment results
- Merge to `main` → triggers the deployment workflow
Deployment workflow (on merge to main):
- Registers the model in the Azure ML registry
- Deploys to staging endpoint
- Runs smoke tests against staging
- If tests pass → deploys to the production endpoint
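The deployment workflow's first two steps could be sketched as follows. The workflow name, model/endpoint names, and the job-output path used for registration are assumptions, not a canonical recipe:

```yaml
# .github/workflows/deploy-on-merge.yml – sketch of "register, then stage"
# (model name, job reference, and deployment file are assumed)
name: Deploy Model
on:
  push:
    branches: [main]
jobs:
  register-and-stage:
    runs-on: ubuntu-latest
    environment: staging
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - name: Register model from the approved training job
        run: |
          az ml model create \
            --name churn-model \
            --type mlflow_model \
            --path azureml://jobs/${{ vars.TRAINING_JOB_NAME }}/outputs/artifacts/model \
            --resource-group rg-ml-dev \
            --workspace-name neuralspark-dev
      - name: Deploy to staging endpoint
        run: |
          az ml online-deployment update \
            --file deployments/staging.yaml \
            --resource-group rg-ml-dev \
            --workspace-name neuralspark-dev
```

Registering from the job output (rather than re-uploading a file) keeps the model's lineage tied to the exact training run.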
Priya (CTO) loves it: "No more 'works on my machine' – everything is automated and auditable."
GitHub Actions for ML workflows
Trigger training on code push
```yaml
# .github/workflows/train-on-push.yml
name: Train Model
on:
  push:
    branches: ['feature/**']
    paths: ['src/**', 'components/**', 'pipelines/**']

jobs:
  train:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4

      - name: Azure Login (OIDC)
        uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: Install Azure ML CLI
        run: az extension add --name ml

      - name: Submit Training Job
        run: |
          az ml job create \
            --file pipelines/training-pipeline.yaml \
            --resource-group rg-ml-dev \
            --workspace-name neuralspark-dev \
            --set display_name="train-${{ github.sha }}"
```
What's happening:
- Lines 5-6: Only triggers on feature branches, and only when ML code changes
- Lines 17-22: OIDC authentication – no stored secrets
- Lines 28-33: Submits the training pipeline to the dev workspace, tagged with the Git commit SHA for traceability
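Note that this workflow jumps straight to training; the unit-test gate from Kai's scenario could be added as a first job that the training job depends on via `needs`. A sketch (the `pytest` invocation and `requirements.txt` are assumptions):

```yaml
# Sketch: run unit tests first, and only train if they pass
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install -r requirements.txt pytest   # requirements.txt assumed
      - run: pytest tests/
  train:
    needs: test        # training is skipped if any unit test fails
    runs-on: ubuntu-latest
    # ...same OIDC login and `az ml job create` steps as the train job
```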
Feature-based development
Feature branches isolate experiments. Each branch gets its own training runs, tracked by Git commit:
```yaml
# Tag the training job with branch info
- name: Submit Training Job
  run: |
    az ml job create \
      --file pipelines/training-pipeline.yaml \
      --workspace-name neuralspark-dev \
      --set display_name="${{ github.ref_name }}-${{ github.run_number }}" \
      --set tags.branch="${{ github.ref_name }}" \
      --set tags.commit="${{ github.sha }}"
```
This means every experiment in MLflow can be traced back to the exact code that produced it.
Environment-based promotion
GitHub Actions environments let you add approval gates and secrets per stage:
```yaml
# .github/workflows/deploy-model.yml
jobs:
  deploy-staging:
    environment: staging
    runs-on: ubuntu-latest
    steps:
      # Deploy to staging endpoint...

  deploy-production:
    needs: deploy-staging
    environment: production  # Requires manual approval
    runs-on: ubuntu-latest
    steps:
      # Deploy to production endpoint...
```
What's happening:
- Line 4: the `staging` environment – automatic, no approvals
- Lines 10-11: the `production` environment – requires a team member to click "Approve" before deployment proceeds
- This is how you prevent untested models from reaching production
Exam tip: GitHub environments and protection rules
The exam tests knowledge of GitHub environments for ML deployment:
- Environments scope secrets and protection rules to specific stages
- Required reviewers add human approval gates before production deployment
- Wait timers add a mandatory delay between stages (e.g., a 30-minute soak in staging)
If a question asks "how to require approval before deploying a model to production," the answer is GitHub environments with required reviewers.
Knowledge check
A data scientist at NeuralSpark commits a trained model file (.pkl) to the Git repository. What is the correct approach?
Kai wants to ensure that models cannot reach production without team review. What should he configure in the GitHub Actions deployment workflow?
Next up: MLflow – tracking every experiment so you never lose a good result.