Drift, Monitoring & Retraining
Models degrade over time. Learn to detect data drift, monitor production performance, set up alert triggers, and automate retraining to keep your models accurate.
Why models degrade
A weather forecast gets worse the further out you look.
A model works the same way: trained on last year's data, it makes predictions about today. But the world changes: customers behave differently, new products launch, economic conditions shift. The model's "map" no longer matches the "territory."
This is drift — the data your model sees in production slowly diverges from the data it was trained on. Without monitoring, you won’t know your model is wrong until customers complain.
Data drift vs concept drift
| Drift Type | What Changes | How to Detect | How to Fix |
|---|---|---|---|
| Data Drift | Input feature distributions shift | Compare production data statistics to training data baseline | Retrain on recent data, or adjust feature engineering |
| Concept Drift | The relationship between features and target changes | Monitor prediction accuracy against ground truth labels | Retrain with new labels that reflect the changed relationship |
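Data drift is visible from the inputs alone, before any labels arrive. As a toy illustration, simulate a shifted input distribution and compare a summary statistic against the training baseline (the numbers are made up, and this crude check is not one of Azure ML's drift metrics):

```python
import random
import statistics

random.seed(0)

# Training baseline: monthly charges under the old pricing (illustrative)
train = [random.gauss(60, 10) for _ in range(5000)]

# Data drift: the input distribution itself shifts (e.g., a price increase)
prod = [random.gauss(75, 12) for _ in range(5000)]

def summary_shift(baseline, current):
    """Absolute shift in mean, expressed in baseline standard deviations."""
    return abs(statistics.mean(current) - statistics.mean(baseline)) / statistics.stdev(baseline)

print(f"mean shift: {summary_shift(train, prod):.2f} baseline std devs")
```

A shift above one baseline standard deviation, as here, would be hard to miss; real monitors use distribution-level metrics because subtler shifts do not show up in the mean.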
Configuring data drift monitoring
Azure ML compares production data against a baseline (training data) to detect distribution changes:
```python
from azure.ai.ml import Input
from azure.ai.ml.entities import (
    MonitorSchedule,
    MonitorDefinition,
    DataDriftSignal,
    ProductionData,
    ReferenceData,
    AlertNotification,
    RecurrenceTrigger,
)

# Define the monitoring schedule
monitor = MonitorSchedule(
    name="churn-drift-monitor",
    trigger=RecurrenceTrigger(frequency="day", interval=1),
    create_monitor=MonitorDefinition(
        signals={
            "feature_drift": DataDriftSignal(
                production_data=ProductionData(
                    input_data=Input(
                        type="uri_folder",
                        path="azureml:production-inputs:latest",
                    ),
                ),
                reference_data=ReferenceData(
                    input_data=Input(
                        type="mltable",
                        path="azureml:churn-train:2",
                    ),
                ),
                features=["tenure", "monthly_charges",
                          "support_tickets", "contract_type"],
                metric_thresholds={
                    "normalized_wasserstein_distance": 0.1,
                    "jensen_shannon_distance": 0.05,
                },
            )
        },
        alert_notification=AlertNotification(
            emails=["mlops-team@neuralspark.ai"]
        ),
    ),
)

# ml_client is an authenticated MLClient for the workspace
ml_client.schedules.begin_create_or_update(monitor)
```
What’s happening:
- `trigger=RecurrenceTrigger(frequency="day", interval=1)`: runs daily, comparing today's production data against the training baseline
- `production_data`: what the model is seeing now
- `reference_data`: the training dataset, the "expected" distribution
- `features`: monitors specific features (not all; focus on the most important)
- `metric_thresholds`: if the normalized Wasserstein distance exceeds 0.1 or Jensen-Shannon exceeds 0.05, an alert fires
- `alert_notification`: emails the team when drift is detected
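Conceptually, the threshold check reduces to comparing two dicts. A hypothetical helper (not part of the Azure ML SDK) that mirrors the alerting logic:

```python
def breached_thresholds(metrics, thresholds):
    """Return the metrics whose computed value exceeds its configured threshold.

    `metrics` and `thresholds` are plain dicts keyed by metric name; this
    mirrors the alert logic conceptually, it is not Azure ML internals.
    """
    return {
        name: (value, thresholds[name])
        for name, value in metrics.items()
        if name in thresholds and value > thresholds[name]
    }

# Values from a hypothetical daily run
computed = {
    "normalized_wasserstein_distance": 0.34,
    "jensen_shannon_distance": 0.03,
}
limits = {
    "normalized_wasserstein_distance": 0.1,
    "jensen_shannon_distance": 0.05,
}

print(breached_thresholds(computed, limits))
# → {'normalized_wasserstein_distance': (0.34, 0.1)}
```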
Drift detection metrics
| Metric | What It Measures | Range |
|---|---|---|
| Normalized Wasserstein distance | How much a distribution has shifted (works for numerical features) | 0 (identical) to 1+ (very different) |
| Jensen-Shannon distance | Symmetric divergence between two distributions | 0 (identical) to 1 (completely different) |
| Population Stability Index (PSI) | Overall shift magnitude | less than 0.1 = stable, 0.1-0.25 = moderate, over 0.25 = significant |
| Chi-squared test | Whether categorical distributions differ significantly | p-value below 0.05 = drift detected |
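PSI is simple enough to compute by hand. A minimal pure-Python sketch of the textbook formula (the binning choices are illustrative; this is not Azure ML's implementation):

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two numeric samples.

    Bins come from the expected (training) sample's range; a small epsilon
    avoids log(0) for empty bins. Textbook formula, illustrative only.
    """
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[0] = float("-inf")   # catch actual values below the training min
    edges[-1] = float("inf")   # ...and above the training max

    def frac(sample):
        counts = [0] * bins
        for x in sample:
            for i in range(bins):
                if edges[i] <= x < edges[i + 1]:
                    counts[i] += 1
                    break
        eps = 1e-6
        return [max(c / len(sample), eps) for c in counts]

    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

base = [x / 100 for x in range(1000)]
print(f"no shift:  {psi(base, base):.3f}")               # ~0: stable
print(f"big shift: {psi(base, [x + 5 for x in base]):.3f}")  # well past 0.25
```

Reading the result against the table: below 0.1 is stable, 0.1 to 0.25 is moderate, above 0.25 signals significant drift.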
Scenario: Kai detects drift after a pricing change
NeuralSpark changed their subscription pricing in March. Two weeks later, the drift monitor fires:
- `monthly_charges`: Wasserstein distance 0.34 (threshold: 0.1), way over
- `contract_type`: Jensen-Shannon distance 0.12 (threshold: 0.05), significant
The pricing change shifted the distribution of both features. The churn model, trained on old pricing data, is now making predictions based on outdated patterns.
Kai’s response:
- Acknowledge the alert
- Collect 2 weeks of post-pricing data
- Retrain the model on the updated dataset
- Use blue-green deployment (Module 12) to roll out the retrained model safely
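An alert like Kai's can be reproduced with synthetic data. For equal-sized samples, the empirical 1-D Wasserstein distance is the mean absolute difference between the sorted samples; dividing by the baseline range gives a rough normalization. The numbers below are made up, and this approximates (rather than reproduces) the monitor's metric:

```python
import random

random.seed(42)

def normalized_wasserstein(baseline, current):
    """Empirical 1-D Wasserstein distance between equal-sized samples,
    normalized by the baseline's range. A rough analogue of the monitor's
    metric, not Azure ML's exact computation."""
    b, c = sorted(baseline), sorted(current)
    w1 = sum(abs(x - y) for x, y in zip(b, c)) / len(b)
    return w1 / (max(baseline) - min(baseline))

# Illustrative monthly_charges: before and after a price increase
before = [random.gauss(60, 10) for _ in range(2000)]
after = [random.gauss(78, 10) for _ in range(2000)]

print(f"drift score: {normalized_wasserstein(before, after):.2f}")  # well above a 0.1 threshold
```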
Performance monitoring
Beyond data drift, monitor the model’s actual prediction quality:
| Metric | What to Track | Alert When |
|---|---|---|
| Accuracy / F1 / AUC | Prediction quality (requires ground truth labels) | Drops below baseline by X% |
| Latency | Response time per prediction | P95 latency exceeds SLA (e.g., 200ms) |
| Throughput | Requests per second | Drops below expected load |
| Error rate | Failed predictions (5xx, timeout) | Exceeds 1% |
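The operational metrics in the table can be computed straight from a request log. A toy sketch using a nearest-rank percentile and hypothetical log entries:

```python
# Hypothetical request log: (latency_ms, http_status) per prediction call
log = [(35, 200)] * 930 + [(180, 200)] * 50 + [(950, 500)] * 20

def p95_latency(entries):
    """Nearest-rank 95th-percentile latency."""
    latencies = sorted(ms for ms, _ in entries)
    return latencies[int(0.95 * len(latencies)) - 1]  # 1-based rank 950 of 1000

def error_rate(entries):
    """Fraction of calls that failed with a 5xx status."""
    return sum(1 for _, status in entries if status >= 500) / len(entries)

print(f"P95 latency: {p95_latency(log)} ms")   # 180 ms: within a 200ms SLA
print(f"error rate:  {error_rate(log):.1%}")   # 2.0%: above a 1% alert line
```

Note that P95 can sit comfortably inside the SLA while the error rate still breaches its alert threshold; the two signals are monitored independently.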
Exam tip: Ground truth delay
Data drift detection is immediate — you can compare feature distributions daily. But performance monitoring (accuracy, F1) requires ground truth labels, which may arrive with a delay.
Example: a churn model predicts “this customer will churn.” You don’t know if that’s correct until the customer actually churns (or doesn’t) — which might take 30-90 days.
The exam tests this distinction:
- Data drift = detect immediately, act quickly
- Performance degradation = detect after ground truth arrives, may lag weeks/months
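One way to handle the lag is to score only "matured" predictions, those old enough for their labels to have arrived. A small sketch with hypothetical dates and a 60-day label delay:

```python
from datetime import date, timedelta

LABEL_DELAY = timedelta(days=60)
today = date(2025, 6, 1)

# Hypothetical prediction log: (prediction_date, predicted_churn, actual_churn)
# actual_churn stays None until the outcome is known, ~60 days later
log = [
    (date(2025, 1, 10), True,  True),
    (date(2025, 2, 1),  True,  False),
    (date(2025, 3, 15), False, False),
    (date(2025, 5, 20), True,  None),   # too recent: no ground truth yet
]

def matured_accuracy(entries, as_of, delay):
    """Accuracy over predictions old enough for ground truth to have arrived."""
    matured = [(p, a) for d, p, a in entries
               if as_of - d >= delay and a is not None]
    if not matured:
        return None, 0
    correct = sum(1 for p, a in matured if p == a)
    return correct / len(matured), len(matured)

acc, n = matured_accuracy(log, today, LABEL_DELAY)
print(f"accuracy on {n} matured predictions: {acc:.0%}")  # 2 of 3 correct
```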
Automated retraining triggers
Set up automated responses to drift or performance degradation:
| Trigger | Action | When |
|---|---|---|
| Data drift above threshold | Alert team + queue retraining pipeline | Daily check |
| Performance below baseline | Alert team + compare with retrained model | When ground truth labels arrive |
| Scheduled | Retrain on fresh data regardless | Monthly (most common) |
| Data volume | Retrain when enough new data accumulates | After N new records |
```yaml
# GitHub Actions: scheduled retraining on the 1st of each month
on:
  schedule:
    - cron: '0 2 1 * *'  # 2 AM on the 1st of every month

permissions:
  id-token: write   # required for OIDC login with azure/login
  contents: read

jobs:
  retrain:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Azure Login
        uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - name: Submit Retraining Pipeline
        run: |
          az extension add --name ml
          az ml job create \
            --file pipelines/retraining-pipeline.yaml \
            --workspace-name ml-workspace-prod \
            --resource-group rg-ml-prod
```
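Whatever the CI mechanism, the decision logic behind the trigger table can be expressed as one function. A sketch with illustrative thresholds (none of these defaults come from Azure ML):

```python
def should_retrain(drift_breached, perf_drop_pct, new_records,
                   days_since_last_train,
                   perf_tolerance=5.0, min_new_records=10_000, max_age_days=30):
    """Combine the trigger conditions from the table into one decision.

    Thresholds are illustrative defaults. Returns the list of reasons
    that fired, so the alert can say why retraining was queued.
    """
    reasons = []
    if drift_breached:
        reasons.append("data drift above threshold")
    if perf_drop_pct is not None and perf_drop_pct > perf_tolerance:
        reasons.append(f"performance down {perf_drop_pct:.1f}% vs baseline")
    if new_records >= min_new_records:
        reasons.append(f"{new_records} new labeled records accumulated")
    if days_since_last_train >= max_age_days:
        reasons.append("scheduled monthly retrain due")
    return reasons

# Example: drift fired and the model is 35 days old; no ground truth yet
print(should_retrain(drift_breached=True, perf_drop_pct=None,
                     new_records=4_200, days_since_last_train=35))
# → ['data drift above threshold', 'scheduled monthly retrain due']
```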
Knowledge check
NeuralSpark changed their subscription pricing. Two weeks later, the churn model's data drift monitor shows monthly_charges Wasserstein distance at 0.34 (threshold: 0.1). What should Kai do?
Dr. Fatima wants to detect model degradation as early as possible. She can track data drift daily, but ground truth labels take 60 days to arrive. What monitoring strategy should she use?
Next up: Foundry — setting up the GenAI platform with hubs, projects, and access control.