Drift, Monitoring & Retraining
Models degrade over time. Learn to detect data drift, monitor production performance, set up alert triggers, and automate retraining to keep your models accurate.
Why models degrade
A weather forecast gets worse the further out you look.
A model works the same way: trained on last year's data, it makes predictions about today. But the world changes: customers behave differently, new products launch, economic conditions shift. The model's "map" no longer matches the "territory."
This is drift — the data your model sees in production slowly diverges from the data it was trained on. Without monitoring, you won’t know your model is wrong until customers complain.
Data drift vs concept drift
| Drift Type | What Changes | How to Detect | How to Fix |
|---|---|---|---|
| Data Drift | Input feature distributions shift | Compare production data statistics to training data baseline | Retrain on recent data, or adjust feature engineering |
| Concept Drift | The relationship between features and target changes | Monitor prediction accuracy against ground truth labels | Retrain with new labels that reflect the changed relationship |
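Data drift is visible from the inputs alone, before any labels arrive. As a toy illustration, simulate a shifted input distribution and compare a summary statistic against the training baseline (the numbers are made up, and this crude check is not one of Azure ML's drift metrics):

```python
import random
import statistics

random.seed(0)

# Training baseline: monthly charges under the old pricing (illustrative)
train = [random.gauss(60, 10) for _ in range(5000)]

# Data drift: the input distribution itself shifts (e.g., a price increase)
prod = [random.gauss(75, 12) for _ in range(5000)]

def summary_shift(baseline, current):
    """Absolute shift in mean, expressed in baseline standard deviations."""
    return abs(statistics.mean(current) - statistics.mean(baseline)) / statistics.stdev(baseline)

print(f"mean shift: {summary_shift(train, prod):.2f} baseline std devs")
```

A shift above one baseline standard deviation, as here, would be hard to miss; real monitors use distribution-level metrics because subtler shifts do not show up in the mean.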
Configuring data drift monitoring
Azure ML compares production data against a baseline (training data) to detect distribution changes:
```python
from azure.ai.ml import Input
from azure.ai.ml.entities import (
    MonitorSchedule,
    MonitorDefinition,
    DataDriftSignal,
    ProductionData,
    ReferenceData,
    AlertNotification,
    RecurrenceTrigger,
)

# Define the monitoring schedule
monitor = MonitorSchedule(
    name="churn-drift-monitor",
    trigger=RecurrenceTrigger(frequency="day", interval=1),
    create_monitor=MonitorDefinition(
        signals={
            "feature_drift": DataDriftSignal(
                production_data=ProductionData(
                    input_data=Input(
                        type="uri_folder",
                        path="azureml:production-inputs:latest",
                    ),
                ),
                reference_data=ReferenceData(
                    input_data=Input(
                        type="mltable",
                        path="azureml:churn-train:2",
                    ),
                ),
                features=["tenure", "monthly_charges",
                          "support_tickets", "contract_type"],
                metric_thresholds={
                    "normalized_wasserstein_distance": 0.1,
                    "jensen_shannon_distance": 0.05,
                },
            )
        },
        alert_notification=AlertNotification(
            emails=["mlops-team@neuralspark.ai"]
        ),
    ),
)

# ml_client is an authenticated MLClient for the workspace
ml_client.schedules.begin_create_or_update(monitor)
```
What’s happening:
- `trigger=RecurrenceTrigger(frequency="day", interval=1)`: runs daily, comparing today's production data against the training baseline
- `production_data`: what the model is seeing now
- `reference_data`: the training dataset, the "expected" distribution
- `features`: monitors specific features (not all; focus on the most important)
- `metric_thresholds`: if the normalized Wasserstein distance exceeds 0.1 or Jensen-Shannon exceeds 0.05, an alert fires
- `alert_notification`: emails the team when drift is detected
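Conceptually, the threshold check reduces to comparing two dicts. A hypothetical helper (not part of the Azure ML SDK) that mirrors the alerting logic:

```python
def breached_thresholds(metrics, thresholds):
    """Return the metrics whose computed value exceeds its configured threshold.

    `metrics` and `thresholds` are plain dicts keyed by metric name; this
    mirrors the alert logic conceptually, it is not Azure ML internals.
    """
    return {
        name: (value, thresholds[name])
        for name, value in metrics.items()
        if name in thresholds and value > thresholds[name]
    }

# Values from a hypothetical daily run
computed = {
    "normalized_wasserstein_distance": 0.34,
    "jensen_shannon_distance": 0.03,
}
limits = {
    "normalized_wasserstein_distance": 0.1,
    "jensen_shannon_distance": 0.05,
}

print(breached_thresholds(computed, limits))
# → {'normalized_wasserstein_distance': (0.34, 0.1)}
```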
Drift detection metrics
| Metric | What It Measures | Range |
|---|---|---|
| Normalized Wasserstein distance | How much a distribution has shifted (works for numerical features) | 0 (identical) to 1+ (very different) |
| Jensen-Shannon distance | Symmetric divergence between two distributions | 0 (identical) to 1 (completely different) |
| Population Stability Index (PSI) | Overall shift magnitude | less than 0.1 = stable, 0.1-0.25 = moderate, over 0.25 = significant |
| Chi-squared test | Whether categorical distributions differ significantly | p-value below 0.05 = drift detected |
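PSI is simple enough to compute by hand. A minimal pure-Python sketch of the textbook formula (the binning choices are illustrative; this is not Azure ML's implementation):

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two numeric samples.

    Bins come from the expected (training) sample's range; a small epsilon
    avoids log(0) for empty bins. Textbook formula, illustrative only.
    """
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[0] = float("-inf")   # catch actual values below the training min
    edges[-1] = float("inf")   # ...and above the training max

    def frac(sample):
        counts = [0] * bins
        for x in sample:
            for i in range(bins):
                if edges[i] <= x < edges[i + 1]:
                    counts[i] += 1
                    break
        eps = 1e-6
        return [max(c / len(sample), eps) for c in counts]

    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

base = [x / 100 for x in range(1000)]
print(f"no shift:  {psi(base, base):.3f}")               # ~0: stable
print(f"big shift: {psi(base, [x + 5 for x in base]):.3f}")  # well past 0.25
```

Reading the result against the table: below 0.1 is stable, 0.1 to 0.25 is moderate, above 0.25 signals significant drift.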
Scenario: Kai detects drift after a pricing change
NeuralSpark changed their subscription pricing in March. Two weeks later, the drift monitor fires:
- `monthly_charges`: Wasserstein distance 0.34 (threshold: 0.1), way over
- `contract_type`: Jensen-Shannon distance 0.12 (threshold: 0.05), significant
The pricing change shifted the distribution of both features. The churn model, trained on old pricing data, is now making predictions based on outdated patterns.
Kai’s response:
- Acknowledge the alert
- Collect 2 weeks of post-pricing data
- Retrain the model on the updated dataset
- Use blue-green deployment (Module 12) to roll out the retrained model safely
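An alert like Kai's can be reproduced with synthetic data. For equal-sized samples, the empirical 1-D Wasserstein distance is the mean absolute difference between the sorted samples; dividing by the baseline range gives a rough normalization. The numbers below are made up, and this approximates (rather than reproduces) the monitor's metric:

```python
import random

random.seed(42)

def normalized_wasserstein(baseline, current):
    """Empirical 1-D Wasserstein distance between equal-sized samples,
    normalized by the baseline's range. A rough analogue of the monitor's
    metric, not Azure ML's exact computation."""
    b, c = sorted(baseline), sorted(current)
    w1 = sum(abs(x - y) for x, y in zip(b, c)) / len(b)
    return w1 / (max(baseline) - min(baseline))

# Illustrative monthly_charges: before and after a price increase
before = [random.gauss(60, 10) for _ in range(2000)]
after = [random.gauss(78, 10) for _ in range(2000)]

print(f"drift score: {normalized_wasserstein(before, after):.2f}")  # well above a 0.1 threshold
```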
Performance monitoring
Beyond data drift, monitor the model’s actual prediction quality:
| Metric | What to Track | Alert When |
|---|---|---|
| Accuracy / F1 / AUC | Prediction quality (requires ground truth labels) | Drops below baseline by X% |
| Latency | Response time per prediction | P95 latency exceeds SLA (e.g., 200ms) |
| Throughput | Requests per second | Drops below expected load |
| Error rate | Failed predictions (5xx, timeout) | Exceeds 1% |
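The operational metrics in the table can be computed straight from a request log. A toy sketch using a nearest-rank percentile and hypothetical log entries:

```python
# Hypothetical request log: (latency_ms, http_status) per prediction call
log = [(35, 200)] * 930 + [(180, 200)] * 50 + [(950, 500)] * 20

def p95_latency(entries):
    """Nearest-rank 95th-percentile latency."""
    latencies = sorted(ms for ms, _ in entries)
    return latencies[int(0.95 * len(latencies)) - 1]  # 1-based rank 950 of 1000

def error_rate(entries):
    """Fraction of calls that failed with a 5xx status."""
    return sum(1 for _, status in entries if status >= 500) / len(entries)

print(f"P95 latency: {p95_latency(log)} ms")   # 180 ms: within a 200ms SLA
print(f"error rate:  {error_rate(log):.1%}")   # 2.0%: above a 1% alert line
```

Note that P95 can sit comfortably inside the SLA while the error rate still breaches its alert threshold; the two signals are monitored independently.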
Exam tip: Ground truth delay
Data drift detection is immediate — you can compare feature distributions daily. But performance monitoring (accuracy, F1) requires ground truth labels, which may arrive with a delay.
Example: a churn model predicts “this customer will churn.” You don’t know if that’s correct until the customer actually churns (or doesn’t) — which might take 30-90 days.
The exam tests this distinction:
- Data drift = detect immediately, act quickly
- Performance degradation = detect after ground truth arrives, may lag weeks/months
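One way to handle the lag is to score only "matured" predictions, those old enough for their labels to have arrived. A small sketch with hypothetical dates and a 60-day label delay:

```python
from datetime import date, timedelta

LABEL_DELAY = timedelta(days=60)
today = date(2025, 6, 1)

# Hypothetical prediction log: (prediction_date, predicted_churn, actual_churn)
# actual_churn stays None until the outcome is known, ~60 days later
log = [
    (date(2025, 1, 10), True,  True),
    (date(2025, 2, 1),  True,  False),
    (date(2025, 3, 15), False, False),
    (date(2025, 5, 20), True,  None),   # too recent: no ground truth yet
]

def matured_accuracy(entries, as_of, delay):
    """Accuracy over predictions old enough for ground truth to have arrived."""
    matured = [(p, a) for d, p, a in entries
               if as_of - d >= delay and a is not None]
    if not matured:
        return None, 0
    correct = sum(1 for p, a in matured if p == a)
    return correct / len(matured), len(matured)

acc, n = matured_accuracy(log, today, LABEL_DELAY)
print(f"accuracy on {n} matured predictions: {acc:.0%}")  # 2 of 3 correct
```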
Automated retraining triggers
Set up automated responses to drift or performance degradation:
| Trigger | Action | When |
|---|---|---|
| Data drift above threshold | Alert team + queue retraining pipeline | Daily check |
| Performance below baseline | Alert team + compare with retrained model | When ground truth labels arrive |
| Scheduled | Retrain on fresh data regardless | Monthly (most common) |
| Data volume | Retrain when enough new data accumulates | After N new records |
```yaml
# GitHub Actions: scheduled retraining on the 1st of each month
on:
  schedule:
    - cron: '0 2 1 * *'  # 2 AM on the 1st of every month

permissions:
  id-token: write   # required for OIDC login with azure/login
  contents: read

jobs:
  retrain:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Azure Login
        uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - name: Submit Retraining Pipeline
        run: |
          az extension add --name ml
          az ml job create \
            --file pipelines/retraining-pipeline.yaml \
            --workspace-name ml-workspace-prod \
            --resource-group rg-ml-prod
```
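Whatever the CI mechanism, the decision logic behind the trigger table can be expressed as one function. A sketch with illustrative thresholds (none of these defaults come from Azure ML):

```python
def should_retrain(drift_breached, perf_drop_pct, new_records,
                   days_since_last_train,
                   perf_tolerance=5.0, min_new_records=10_000, max_age_days=30):
    """Combine the trigger conditions from the table into one decision.

    Thresholds are illustrative defaults. Returns the list of reasons
    that fired, so the alert can say why retraining was queued.
    """
    reasons = []
    if drift_breached:
        reasons.append("data drift above threshold")
    if perf_drop_pct is not None and perf_drop_pct > perf_tolerance:
        reasons.append(f"performance down {perf_drop_pct:.1f}% vs baseline")
    if new_records >= min_new_records:
        reasons.append(f"{new_records} new labeled records accumulated")
    if days_since_last_train >= max_age_days:
        reasons.append("scheduled monthly retrain due")
    return reasons

# Example: drift fired and the model is 35 days old; no ground truth yet
print(should_retrain(drift_breached=True, perf_drop_pct=None,
                     new_records=4_200, days_since_last_train=35))
# → ['data drift above threshold', 'scheduled monthly retrain due']
```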
Knowledge check
NeuralSpark changed their subscription pricing. Two weeks later, the churn model's data drift monitor shows monthly_charges Wasserstein distance at 0.34 (threshold: 0.1). What should Kai do?
Dr. Fatima wants to detect model degradation as early as possible. She can track data drift daily, but ground truth labels take 60 days to arrive. What monitoring strategy should she use?
Next up: Foundry — setting up the GenAI platform with hubs, projects, and access control.