Git & CI/CD for ML Projects
ML code deserves the same discipline as app code. Learn to manage ML projects with Git, automate training with GitHub Actions, and build CI/CD pipelines that ship models safely.
Git for machine learning
Git is version control for code – like "Track Changes" in Word, but for everything.
Without Git, your ML project looks like: `model_final.py`, `model_final_v2.py`, `model_ACTUALLY_final.py`. With Git, you have a clean history: who changed what, when, and why. You can go back to any point in time.
But ML projects are special – you're not just tracking code. You're tracking experiments (which code + data + parameters produced which model). Git tracks the code; MLflow (Module 6) tracks the experiments. Together, they give you full reproducibility.
What goes in Git vs what doesn't
| Artifact | In Git? | Where Instead? |
|---|---|---|
| Training scripts (.py) | Yes | – |
| Pipeline definitions (.yaml) | Yes | – |
| Environment specs (conda.yaml) | Yes | – |
| Component definitions | Yes | – |
| Bicep/IaC templates | Yes | – |
| GitHub Actions workflows | Yes | – |
| Hyperparameter configs | Yes | – |
| Trained model weights (.pkl, .pt) | No | Azure ML model registry |
| Datasets (CSV, parquet, images) | No | Azure ML data assets / datastores |
| Experiment metrics and logs | No | MLflow tracking |
| Secrets and API keys | No | Azure Key Vault |
Exam tip: Never commit models or data to Git
This sounds obvious, but exam questions may present scenarios where someone wants to "version the model by committing it to Git." The correct answer is always: register the model in the Azure ML model registry (or a shared registry for cross-workspace access).
Git is for code and configuration. MLflow and Azure ML are for experiment artifacts.
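A repository-level `.gitignore` is the simplest way to enforce this split. The following is a sketch – the patterns are illustrative and should be adjusted to your project:

```
# .gitignore – sketch enforcing "code in Git, artifacts elsewhere"

# Trained models – register these in the Azure ML model registry instead
*.pkl
*.pt
*.onnx

# Datasets – use Azure ML data assets / datastores instead
data/
*.csv
*.parquet

# Local experiment output – MLflow tracking holds the real record
mlruns/
outputs/

# Secrets – use Azure Key Vault / OIDC instead of files
.env
*.pem
```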
Repository structure for ML projects
A common structure that works well with Azure ML:
```
ml-project/
  .github/
    workflows/
      train-on-push.yml       # CI: train on feature branch push
      deploy-on-merge.yml     # CD: deploy model on merge to main
  infra/
    main.bicep                # Workspace + compute IaC
    params-dev.json
    params-prod.json
  src/
    train.py                  # Training script
    score.py                  # Inference script
    prepare.py                # Data preparation
  components/
    prepare/component.yaml    # Pipeline component definitions
    train/component.yaml
    evaluate/component.yaml
  pipelines/
    training-pipeline.yaml    # Full pipeline definition
  environments/
    training-env.yaml         # Conda environment for training
    scoring-env.yaml          # Conda environment for inference
  tests/
    test_prepare.py           # Unit tests for data prep
    test_score.py             # Unit tests for scoring
```
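To make the layout concrete, `pipelines/training-pipeline.yaml` could be an Azure ML pipeline job that wires the three components together. This is a minimal sketch – the compute target, data asset, and input/output names are assumptions:

```yaml
# pipelines/training-pipeline.yaml – minimal sketch (names are assumed)
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: churn-training-pipeline
settings:
  default_compute: azureml:cpu-cluster      # assumed compute target
jobs:
  prepare:
    type: command
    component: ../components/prepare/component.yaml
    inputs:
      raw_data:
        type: uri_file
        path: azureml:churn-data@latest     # assumed registered data asset
  train:
    type: command
    component: ../components/train/component.yaml
    inputs:
      training_data: ${{parent.jobs.prepare.outputs.prepared_data}}
  evaluate:
    type: command
    component: ../components/evaluate/component.yaml
    inputs:
      model: ${{parent.jobs.train.outputs.model_output}}
```

Each step references a component definition by relative path, and the `${{parent.jobs.*}}` bindings chain one step's output into the next step's input.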
GitHub integration with Azure ML
Azure ML can connect directly to GitHub repositories, enabling:
- Code tracking – each experiment records the Git commit it ran from
- Automated training – GitHub Actions trigger training jobs on push or PR
- Secure access – GitHub connects to Azure ML via OIDC (no stored secrets)
Scenario: Kai builds NeuralSpark's ML CI/CD
Kai designs the following Git workflow for NeuralSpark:
Feature branch workflow:
- Data scientist creates a `feature/improve-churn-model` branch
- Pushes code changes → GitHub Actions runs unit tests
- If tests pass → GitHub Actions submits a training job to Azure ML (dev workspace)
- Training job logs metrics to MLflow
- Data scientist reviews metrics, creates a Pull Request
- Team reviews code + experiment results
- Merge to `main` → triggers the deployment workflow
Deployment workflow (on merge to main):
- Registers the model in the Azure ML registry
- Deploys to staging endpoint
- Runs smoke tests against staging
- If tests pass → deploys to the production endpoint
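The deployment workflow's first two steps could be sketched as follows. The workflow name, model/endpoint names, and the job-output path used for registration are assumptions, not a canonical recipe:

```yaml
# .github/workflows/deploy-on-merge.yml – sketch of "register, then stage"
# (model name, job reference, and deployment file are assumed)
name: Deploy Model
on:
  push:
    branches: [main]
jobs:
  register-and-stage:
    runs-on: ubuntu-latest
    environment: staging
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - name: Register model from the approved training job
        run: |
          az ml model create \
            --name churn-model \
            --type mlflow_model \
            --path azureml://jobs/${{ vars.TRAINING_JOB_NAME }}/outputs/artifacts/model \
            --resource-group rg-ml-dev \
            --workspace-name neuralspark-dev
      - name: Deploy to staging endpoint
        run: |
          az ml online-deployment update \
            --file deployments/staging.yaml \
            --resource-group rg-ml-dev \
            --workspace-name neuralspark-dev
```

Registering from the job output (rather than re-uploading a file) keeps the model's lineage tied to the exact training run.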
Priya (CTO) loves it: "No more 'works on my machine' – everything is automated and auditable."
GitHub Actions for ML workflows
Trigger training on code push
```yaml
# .github/workflows/train-on-push.yml
name: Train Model
on:
  push:
    branches: ['feature/**']
    paths: ['src/**', 'components/**', 'pipelines/**']

jobs:
  train:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4

      - name: Azure Login (OIDC)
        uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: Install Azure ML CLI
        run: az extension add --name ml

      - name: Submit Training Job
        run: |
          az ml job create \
            --file pipelines/training-pipeline.yaml \
            --resource-group rg-ml-dev \
            --workspace-name neuralspark-dev \
            --set display_name="train-${{ github.sha }}"
```
What's happening:
- Lines 5-6: Only triggers on feature branches, and only when ML code changes
- Lines 17-22: OIDC authentication – no stored secrets
- Lines 28-33: Submits the training pipeline to the dev workspace, tagged with the Git commit SHA for traceability
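Note that this workflow jumps straight to training; the unit-test gate from Kai's scenario could be added as a first job that the training job depends on via `needs`. A sketch (the `pytest` invocation and `requirements.txt` are assumptions):

```yaml
# Sketch: run unit tests first, and only train if they pass
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install -r requirements.txt pytest   # requirements.txt assumed
      - run: pytest tests/
  train:
    needs: test        # training is skipped if any unit test fails
    runs-on: ubuntu-latest
    # ...same OIDC login and `az ml job create` steps as the train job
```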
Feature-based development
Feature branches isolate experiments. Each branch gets its own training runs, tracked by Git commit:
```yaml
# Tag the training job with branch info
- name: Submit Training Job
  run: |
    az ml job create \
      --file pipelines/training-pipeline.yaml \
      --workspace-name neuralspark-dev \
      --set display_name="${{ github.ref_name }}-${{ github.run_number }}" \
      --set tags.branch="${{ github.ref_name }}" \
      --set tags.commit="${{ github.sha }}"
```
This means every experiment in MLflow can be traced back to the exact code that produced it.
Environment-based promotion
GitHub Actions environments let you add approval gates and secrets per stage:
```yaml
# .github/workflows/deploy-model.yml
jobs:
  deploy-staging:
    environment: staging
    runs-on: ubuntu-latest
    steps:
      # Deploy to staging endpoint...

  deploy-production:
    needs: deploy-staging
    environment: production  # Requires manual approval
    runs-on: ubuntu-latest
    steps:
      # Deploy to production endpoint...
```
What's happening:
- Line 4: the `staging` environment – automatic, no approvals
- Lines 10-11: the `production` environment – requires a team member to click "Approve" before deployment proceeds
- This is how you prevent untested models from reaching production
Exam tip: GitHub environments and protection rules
The exam tests knowledge of GitHub environments for ML deployment:
- Environments scope secrets and protection rules to specific stages
- Required reviewers add human approval gates before production deployment
- Wait timers add a mandatory delay between stages (e.g., a 30-minute soak in staging)
If a question asks "how to require approval before deploying a model to production," the answer is GitHub environments with required reviewers.
Knowledge check
A data scientist at NeuralSpark commits a trained model file (.pkl) to the Git repository. What is the correct approach?
Kai wants to ensure that models cannot reach production without team review. What should he configure in the GitHub Actions deployment workflow?
Next up: MLflow – tracking every experiment so you never lose a good result.