AI-300 Study Guide

Domain 1: Design and Implement an MLOps Infrastructure

  • ML Workspace: Your AI Control Room Free
  • Data, Environments & Components
  • Compute Targets: Choosing the Right Engine
  • Infrastructure as Code: Provisioning at Scale
  • Git & CI/CD for ML Projects

Domain 2: Implement Machine Learning Model Lifecycle and Operations

  • MLflow: Track Every Experiment Free
  • AutoML & Hyperparameter Tuning
  • Training Pipelines: Automate Everything
  • Distributed Training: Scale to Big Data
  • Model Registration & Versioning
  • Model Approval & Responsible AI Gates
  • Deploying Models: Endpoints in Production
  • Drift, Monitoring & Retraining

Domain 3: Design and Implement a GenAIOps Infrastructure

  • Foundry: Hubs, Projects & Platform Setup Free
  • Network Security & IaC for Foundry
  • Deploying Foundation Models
  • Model Versioning & Production Strategies
  • PromptOps: Design, Compare, Version & Ship

Domain 4: Implement Generative AI Quality Assurance and Observability

  • Evaluation: Datasets, Metrics & Quality Gates Free
  • Safety Evaluations & Custom Metrics
  • Monitoring GenAI in Production
  • Cost Tracking, Logging & Debugging

Domain 5: Optimize Generative AI Systems and Model Performance

  • RAG Optimization: Better Retrieval, Better Answers Free
  • Embeddings & Hybrid Search
  • Fine-Tuning: Methods, Data & Production

Domain 2: Implement Machine Learning Model Lifecycle and Operations

Drift, Monitoring & Retraining

Models degrade over time. Learn to detect data drift, monitor production performance, set up alert triggers, and automate retraining to keep your models accurate.

Why models degrade

☕ Simple explanation

A weather forecast gets worse the further out you look.

A model trained on last year’s data makes predictions about today. But the world changes: customers behave differently, new products launch, economic conditions shift. The model’s “map” no longer matches the “territory.”

This is drift — the data your model sees in production slowly diverges from the data it was trained on. Without monitoring, you won’t know your model is wrong until customers complain.

Model degradation occurs through two mechanisms:

  • Data drift — the statistical distribution of input features changes over time. Example: average customer age shifts from 35 to 42 due to a new product targeting older demographics.
  • Concept drift — the relationship between features and the target changes. Example: “monthly charges” used to predict churn, but after a pricing restructure, it no longer does.

Azure ML provides tools to detect both types, alert teams, and trigger automated retraining.

Data drift vs concept drift

Two types of model degradation
| Feature | What Changes | How to Detect | How to Fix |
| --- | --- | --- | --- |
| Data drift | Input feature distributions shift | Compare production data statistics to the training-data baseline | Retrain on recent data, or adjust feature engineering |
| Concept drift | The relationship between features and the target changes | Monitor prediction accuracy against ground-truth labels | Retrain with new labels that reflect the changed relationship |
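To make "compare production statistics to the baseline" concrete, here is a minimal, framework-free sketch of the Population Stability Index (PSI), one of the drift metrics covered in this module. The per-bin fractions are made-up illustration data, and the 0.1 / 0.25 cut-offs follow the common rule of thumb:

```python
import math

def psi(expected: list[float], actual: list[float]) -> float:
    """Population Stability Index between two binned distributions.

    `expected` and `actual` are per-bin fractions that each sum to 1.
    PSI = sum over bins of (actual - expected) * ln(actual / expected).
    """
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))

# Hypothetical per-bin fractions for a feature such as monthly_charges
baseline  = [0.25, 0.35, 0.25, 0.15]   # training data
today     = [0.24, 0.36, 0.25, 0.15]   # production, business as usual
post_hike = [0.10, 0.20, 0.30, 0.40]   # production after a pricing change

print(f"stable:  {psi(baseline, today):.3f}")      # well under 0.1 -> stable
print(f"drifted: {psi(baseline, post_hike):.3f}")  # over 0.25 -> significant drift
```

By the usual rule of thumb, a PSI under 0.1 is stable and over 0.25 is significant, so the post-pricing distribution would fire an alert while the business-as-usual day would not.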

Configuring data drift monitoring

Azure ML compares production data against a baseline (training data) to detect distribution changes:

```python
from azure.ai.ml import Input
from azure.ai.ml.entities import (
    MonitorSchedule,
    MonitorDefinition,
    DataDriftSignal,
    ProductionData,
    ReferenceData,
    AlertNotification,
    RecurrenceTrigger,
)

# Define the monitoring schedule
monitor = MonitorSchedule(
    name="churn-drift-monitor",
    trigger=RecurrenceTrigger(frequency="day", interval=1),
    create_monitor=MonitorDefinition(
        signals={
            "feature_drift": DataDriftSignal(
                production_data=ProductionData(
                    input_data=Input(
                        type="uri_folder",
                        path="azureml:production-inputs:latest"
                    ),
                ),
                reference_data=ReferenceData(
                    input_data=Input(
                        type="mltable",
                        path="azureml:churn-train:2"
                    ),
                ),
                features=["tenure", "monthly_charges",
                          "support_tickets", "contract_type"],
                metric_thresholds={
                    "normalized_wasserstein_distance": 0.1,
                    "jensen_shannon_distance": 0.05,
                },
            )
        },
        alert_notification=AlertNotification(
            emails=["mlops-team@neuralspark.ai"]
        ),
    ),
)

# ml_client is an authenticated MLClient created earlier
ml_client.schedules.begin_create_or_update(monitor)
```

What’s happening:

  • trigger: RecurrenceTrigger(frequency="day", interval=1) runs the monitor daily, comparing each day’s production data against the training baseline
  • production_data: the data the model is seeing now
  • reference_data: the training dataset, the “expected” distribution
  • features: monitors specific features, not all of them; focus on the most important
  • metric_thresholds: an alert fires if the normalized Wasserstein distance exceeds 0.1 or the Jensen-Shannon distance exceeds 0.05
  • alert_notification: emails the MLOps team when drift is detected

Drift detection metrics

| Metric | What It Measures | Range |
| --- | --- | --- |
| Normalized Wasserstein distance | How much a distribution has shifted (works for numerical features) | 0 (identical) to 1+ (very different) |
| Jensen-Shannon distance | Symmetric divergence between two distributions | 0 (identical) to 1 (completely different) |
| Population Stability Index (PSI) | Overall shift magnitude | < 0.1 stable, 0.1–0.25 moderate, > 0.25 significant |
| Chi-squared test | Whether categorical distributions differ significantly | p-value < 0.05 = drift detected |

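The Jensen-Shannon distance from the table can likewise be computed by hand from two binned distributions. This is a plain-Python sketch using base-2 logs, which bounds the result to [0, 1]; a production system would more likely use a library implementation such as scipy.spatial.distance.jensenshannon, and the bin fractions here are made up for illustration:

```python
import math

def js_distance(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon distance between two discrete distributions.

    Square root of the JS divergence; with base-2 logs it lies in [0, 1]:
    0 = identical distributions, 1 = completely disjoint ones.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x, y):  # Kullback-Leibler divergence, skipping empty bins
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return math.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))

baseline = [0.25, 0.35, 0.25, 0.15]
drifted  = [0.10, 0.20, 0.30, 0.40]

print(f"{js_distance(baseline, baseline):.3f}")  # 0.000 -> identical
print(f"{js_distance(baseline, drifted):.3f}")   # above the 0.05 alert threshold
```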
Scenario: Kai detects drift after a pricing change

NeuralSpark changed their subscription pricing in March. Two weeks later, the drift monitor fires:

  • monthly_charges Wasserstein distance: 0.34 (threshold: 0.1) — way over
  • contract_type Jensen-Shannon: 0.12 (threshold: 0.05) — significant

The pricing change shifted the distribution of both features. The churn model, trained on old pricing data, is now making predictions based on outdated patterns.

Kai’s response:

  1. Acknowledge the alert
  2. Collect 2 weeks of post-pricing data
  3. Retrain the model on the updated dataset
  4. Use blue-green deployment (Module 12) to roll out the retrained model safely

Performance monitoring

Beyond data drift, monitor the model’s actual prediction quality:

| Metric | What to Track | Alert When |
| --- | --- | --- |
| Accuracy / F1 / AUC | Prediction quality (requires ground-truth labels) | Drops below baseline by X% |
| Latency | Response time per prediction | P95 latency exceeds the SLA (e.g., 200 ms) |
| Throughput | Requests per second | Drops below expected load |
| Error rate | Failed predictions (5xx, timeouts) | Exceeds 1% |

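As an illustration of the latency row, the nearest-rank method is a simple way to compute a P95 over a window of request timings. The latencies and the 200 ms SLA below are just example figures:

```python
import math

def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least
    pct percent of observations at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

# Hypothetical latencies (ms) for the last monitoring window;
# one slow request is enough to blow the tail percentile
latencies = [42, 48, 51, 55, 60, 63, 70, 88, 120, 450]

p95 = percentile(latencies, 95)
print(f"P95 = {p95} ms")                          # P95 = 450 ms
print("alert!" if p95 > 200 else "within SLA")    # alert!
```

This is also why tail percentiles, not averages, are what SLAs track: the mean of that window is well under 200 ms even though 1 in 20 requests is painfully slow.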
💡 Exam tip: Ground truth delay

Data drift detection is immediate — you can compare feature distributions daily. But performance monitoring (accuracy, F1) requires ground truth labels, which may arrive with a delay.

Example: a churn model predicts “this customer will churn.” You don’t know if that’s correct until the customer actually churns (or doesn’t) — which might take 30-90 days.

The exam tests this distinction:

  • Data drift = detect immediately, act quickly
  • Performance degradation = detect after ground truth arrives, may lag weeks/months
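The label delay can be handled by joining predictions with ground truth as it trickles in, and computing accuracy only over the matched subset. A minimal sketch with made-up records (prediction_log and label_feed are hypothetical names, not an Azure ML API):

```python
# Predictions logged at serving time, keyed by customer ID
prediction_log = {
    "c001": "churn",
    "c002": "stay",
    "c003": "churn",
    "c004": "stay",
}

# Ground truth arriving 30-90 days later; c004's outcome is still unknown
label_feed = {
    "c001": "churn",
    "c002": "churn",
    "c003": "churn",
}

# Join: only predictions whose label has arrived can be scored
matched = {cid: prediction_log[cid] for cid in prediction_log if cid in label_feed}
correct = sum(1 for cid, pred in matched.items() if pred == label_feed[cid])

coverage = len(matched) / len(prediction_log)
accuracy = correct / len(matched)
print(f"label coverage: {coverage:.0%}, accuracy so far: {accuracy:.1%}")
# label coverage: 75%, accuracy so far: 66.7%
```

Accuracy computed this way always lags reality by the label delay, which is exactly why the exam treats drift signals (available immediately) and performance metrics (available late) as different tools.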

Automated retraining triggers

Set up automated responses to drift or performance degradation:

| Trigger | Action | When |
| --- | --- | --- |
| Data drift above threshold | Alert team + queue retraining pipeline | Daily check |
| Performance below baseline | Alert team + compare with a retrained model | When ground-truth labels arrive |
| Scheduled | Retrain on fresh data regardless | Monthly (most common) |
| Data volume | Retrain when enough new data accumulates | After N new records |

```yaml
# GitHub Actions: scheduled retraining on the 1st of each month
on:
  schedule:
    - cron: '0 2 1 * *'  # 2 AM on the 1st of every month

permissions:
  id-token: write   # required for OIDC-based azure/login
  contents: read

jobs:
  retrain:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Azure Login
        uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - name: Submit Retraining Pipeline
        run: |
          az extension add --name ml
          az ml job create \
            --file pipelines/retraining-pipeline.yaml \
            --workspace-name ml-workspace-prod \
            --resource-group rg-ml-prod
```
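The drift and data-volume rows of the trigger table amount to plain decision logic that a daily monitoring job could run before submitting the pipeline. This is an illustrative sketch, not an Azure ML API; the thresholds and function name are assumptions:

```python
def retraining_action(drift_score: float, new_records: int,
                      drift_threshold: float = 0.1,
                      min_new_records: int = 10_000) -> str:
    """Decide what the daily monitoring job should do.

    Mirrors the trigger table: drift above threshold alerts and queues
    retraining immediately; otherwise, enough accumulated new data
    also justifies a retrain.
    """
    if drift_score > drift_threshold:
        return "alert + queue retraining pipeline"
    if new_records >= min_new_records:
        return "queue retraining pipeline"
    return "no action"

print(retraining_action(drift_score=0.34, new_records=2_000))
# alert + queue retraining pipeline
print(retraining_action(drift_score=0.03, new_records=12_000))
# queue retraining pipeline
print(retraining_action(drift_score=0.03, new_records=2_000))
# no action
```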

Key terms flashcards

Question: Data drift vs concept drift?

Answer: Data drift: input feature distributions change (e.g., average age shifts). Concept drift: the relationship between features and target changes (e.g., a pricing restructure changes what predicts churn). Both degrade model performance.

Question: Why is there a delay in detecting performance degradation?

Answer: Performance metrics (accuracy, F1) require ground-truth labels. These may arrive weeks or months after predictions are made (e.g., knowing whether a customer actually churned). Data drift can be detected immediately.

Question: What is the Wasserstein distance?

Answer: A metric measuring how much a numerical feature's distribution has shifted from the baseline. 0 = identical; higher = more drift. Used in Azure ML data drift monitoring with configurable thresholds.

Knowledge check

  1. NeuralSpark changed their subscription pricing. Two weeks later, the churn model's data drift monitor shows a monthly_charges Wasserstein distance of 0.34 (threshold: 0.1). What should Kai do?

  2. Dr. Fatima wants to detect model degradation as early as possible. She can track data drift daily, but ground-truth labels take 60 days to arrive. What monitoring strategy should she use?



Next up: Foundry — setting up the GenAI platform with hubs, projects, and access control.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.