AI-300 Study Guide

Domain 1: Design and Implement an MLOps Infrastructure

  • ML Workspace: Your AI Control Room Free
  • Data, Environments & Components
  • Compute Targets: Choosing the Right Engine
  • Infrastructure as Code: Provisioning at Scale
  • Git & CI/CD for ML Projects

Domain 2: Implement Machine Learning Model Lifecycle and Operations

  • MLflow: Track Every Experiment Free
  • AutoML & Hyperparameter Tuning
  • Training Pipelines: Automate Everything
  • Distributed Training: Scale to Big Data
  • Model Registration & Versioning
  • Model Approval & Responsible AI Gates
  • Deploying Models: Endpoints in Production
  • Drift, Monitoring & Retraining

Domain 3: Design and Implement a GenAIOps Infrastructure

  • Foundry: Hubs, Projects & Platform Setup Free
  • Network Security & IaC for Foundry
  • Deploying Foundation Models
  • Model Versioning & Production Strategies
  • PromptOps: Design, Compare, Version & Ship

Domain 4: Implement Generative AI Quality Assurance and Observability

  • Evaluation: Datasets, Metrics & Quality Gates Free
  • Safety Evaluations & Custom Metrics
  • Monitoring GenAI in Production
  • Cost Tracking, Logging & Debugging

Domain 5: Optimize Generative AI Systems and Model Performance

  • RAG Optimization: Better Retrieval, Better Answers Free
  • Embeddings & Hybrid Search
  • Fine-Tuning: Methods, Data & Production

Domain 3: Design and Implement a GenAIOps Infrastructure

Model Versioning & Production Strategies

Foundation models get updated. Learn to version deployments, implement rollback strategies, and manage the transition from one model version to another in production.

Why versioning matters for GenAI

☕ Simple explanation

Versioning is like keeping the old menu while testing new recipes.

When a restaurant introduces a new dish, they don’t throw away the old menu. They test the new recipe with a few tables first. If customers love it, they add it to the main menu. If they hate it, the old dish is still there — no disruption.

Rollback is switching back to the old menu if the new recipe fails. You can only do this if you kept the old menu around.

GenAI models work the same way. When GPT-4o gets updated, your prompts might behave differently. If you pinned your deployment to a specific version, you control when to upgrade — not Microsoft.

Foundation models are updated by their providers (OpenAI, Meta, Microsoft). Each update can change:

  • Output quality — newer versions may produce better or different responses
  • Behaviour — safety filters, formatting, and instruction following may change
  • Performance — speed and token costs may differ
  • Deprecation — older versions are retired on a published schedule

Without version pinning, an automatic model update could silently change your application’s behaviour. Production GenAI systems must treat model versions like any other software dependency — pin, test, promote explicitly.
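
The "pin, test, promote" rule can be enforced in code by treating model versions as a lockfile, the same way you would pin package dependencies. A minimal sketch, assuming a hypothetical MODEL_LOCK config (the deployment names are illustrative):

```python
# Hypothetical "lockfile" for model deployments. Like a dependency pin,
# every production deployment must name an explicit model version.
MODEL_LOCK = {
    "gpt4o-prod": {"model": "gpt-4o", "version": "2024-11-20"},
    "gpt4omini-prod": {"model": "gpt-4o-mini", "version": "2024-07-18"},
}

def validate_lock(lock: dict) -> None:
    """Fail fast if any deployment is missing an explicit version pin."""
    for name, spec in lock.items():
        if not spec.get("version"):
            raise ValueError(f"Deployment {name!r} is not pinned to a model version")

validate_lock(MODEL_LOCK)  # raises ValueError if any pin is missing
```

Running a check like this in CI means an unpinned deployment config fails the build before it ever reaches production.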

Versioning strategies

Strategies for managing model versions
| Strategy | How It Works | Risk Level | Best For |
| --- | --- | --- | --- |
| Version pinning | Specify the exact model version in the deployment (e.g., gpt-4o-2024-11-20) | Low — no surprise changes | Production workloads where behaviour must be predictable |
| Auto-update (default) | Azure updates to the latest stable version automatically | High — behaviour may change without warning | Non-critical workloads, prototyping only |
| Deployment naming | Name deployments by version (gpt4o-v1120, gpt4o-v0513) | Low — clear which version is in use | Managing multiple versions side by side |
💡 Exam tip: Always pin versions in production

If an exam question describes a production workload and asks about model versioning, the answer is almost always pin to a specific version.

  • “Which approach ensures consistent behaviour?” → Version pinning
  • “The application’s responses changed after a model update. How to prevent this?” → Pin the model version
  • “How to test a new model version before production?” → Deploy the new version alongside the old one, split traffic

Auto-update is only acceptable for non-production, experimental workloads.

Pinning model versions in deployments

# Deploy GPT-4o pinned to a specific version
az cognitiveservices account deployment create \
  --name aoai-genai-prod \
  --resource-group rg-genai-prod \
  --deployment-name gpt4o-prod-v1120 \
  --model-name gpt-4o \
  --model-version "2024-11-20" \
  --model-format OpenAI \
  --sku-capacity 50 \
  --sku-name Standard

What’s happening:

  • --deployment-name gpt4o-prod-v1120 includes the version date — makes it obvious which version is running
  • --model-version "2024-11-20" pins the deployment to this exact version; Azure will NOT auto-update it
  • --sku-name Standard is the pay-per-token SKU; change it to ProvisionedManaged for PTUs

Blue-green deployment for model updates

When a new model version is available, deploy it alongside the current one and shift traffic gradually:

# Step 1: Deploy the new model version alongside the current one
az cognitiveservices account deployment create \
  --name aoai-genai-prod \
  --resource-group rg-genai-prod \
  --deployment-name gpt4o-prod-v0513 \
  --model-name gpt-4o \
  --model-version "2025-05-13" \
  --model-format OpenAI \
  --sku-capacity 50 \
  --sku-name Standard
# Step 2: Route traffic between versions in your application
import random
from openai import AzureOpenAI

# Configuration for traffic splitting
DEPLOYMENTS = {
    "gpt4o-prod-v1120": 90,  # 90% to current version
    "gpt4o-prod-v0513": 10,  # 10% to new version (canary)
}

def get_deployment():
    """Route requests based on traffic weights."""
    roll = random.randint(1, 100)
    cumulative = 0
    for deployment, weight in DEPLOYMENTS.items():
        cumulative += weight
        if roll <= cumulative:
            return deployment
    return list(DEPLOYMENTS.keys())[0]

client = AzureOpenAI(
    azure_endpoint="https://aoai-genai-prod.openai.azure.com",
    api_version="2024-10-21",
)

# Each request goes to the selected deployment
response = client.chat.completions.create(
    model=get_deployment(),
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise this loan document..."}
    ]
)

What’s happening:

  • DEPLOYMENTS defines the traffic weights — 90% of requests go to the current version, 10% to the new one
  • get_deployment() randomly selects which deployment handles each request, in proportion to those weights
  • The selected deployment name is passed as the model parameter of the chat completion call
  • To shift traffic, change the weights (e.g., 50/50, then 0/100) and redeploy your application code
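
Purely random routing can send the same user to a different model version on each request, which makes quality feedback hard to attribute. A common refinement is deterministic (sticky) routing keyed on a stable identifier; the sketch below is illustrative, not part of any SDK:

```python
import hashlib

# Same weights as above: 90% current version, 10% canary
DEPLOYMENTS = {
    "gpt4o-prod-v1120": 90,
    "gpt4o-prod-v0513": 10,
}

def get_deployment_for_user(user_id: str) -> str:
    """Hash the user id into a 1-100 bucket, then walk the cumulative weights.

    The same user always lands in the same bucket, so their experience
    never flips between model versions mid-session.
    """
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100 + 1
    cumulative = 0
    for deployment, weight in DEPLOYMENTS.items():
        cumulative += weight
        if bucket <= cumulative:
            return deployment
    return next(iter(DEPLOYMENTS))  # unreachable when weights sum to 100

# A given user id always maps to the same deployment
assert get_deployment_for_user("user-42") == get_deployment_for_user("user-42")
```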

Rollback procedures

When the new version causes issues, roll back immediately:

# Option 1: Application-level rollback — change traffic weights
# Update your deployment config to send 100% to the old version
# DEPLOYMENTS = {"gpt4o-prod-v1120": 100, "gpt4o-prod-v0513": 0}

# Option 2: Delete the problematic deployment entirely
az cognitiveservices account deployment delete \
  --name aoai-genai-prod \
  --resource-group rg-genai-prod \
  --deployment-name gpt4o-prod-v0513

What’s happening:

  • Option 1: Fastest rollback — just change the traffic weights in your application config to send 100% to the old version. No infrastructure changes needed
  • Option 2: Clean up the failed deployment after shifting traffic away. This frees up capacity
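
Option 1 can be wrapped in a small helper so a rollback is a single validated operation rather than a hand edit; a sketch, with hypothetical names:

```python
def rollback(deployments: dict[str, int], stable: str) -> dict[str, int]:
    """Return a new weight map sending 100% of traffic to the stable deployment."""
    if stable not in deployments:
        raise KeyError(f"Unknown deployment {stable!r}")
    weights = {name: 0 for name in deployments}
    weights[stable] = 100
    assert sum(weights.values()) == 100  # traffic weights must always total 100
    return weights

DEPLOYMENTS = {"gpt4o-prod-v1120": 90, "gpt4o-prod-v0513": 10}
DEPLOYMENTS = rollback(DEPLOYMENTS, "gpt4o-prod-v1120")
# → {"gpt4o-prod-v1120": 100, "gpt4o-prod-v0513": 0}
```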

A/B testing across model versions

Compare old vs new systematically:

| What to Compare | How to Measure | Decision Criteria |
| --- | --- | --- |
| Response quality | Human evaluation scores, LLM-as-judge scores | New version scores equal or better |
| Latency | P50, P95, P99 response times | New version within 10% of old |
| Cost | Tokens per request (input + output) | New version not significantly more expensive |
| Safety | Content filter trigger rate, harmful response rate | New version same or lower trigger rate |
| Task accuracy | Ground-truth comparison on a test dataset | New version matches or beats accuracy |
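
The latency criterion ("within 10% of old") is straightforward to automate once you log per-request timings for both deployments. A sketch using Python's statistics module with illustrative sample data (the timings are made up, not real measurements):

```python
import statistics

def p95(samples: list[float]) -> float:
    """95th-percentile latency, using the inclusive quantile method."""
    return statistics.quantiles(samples, n=100, method="inclusive")[94]

def latency_within_budget(old_ms: list[float], new_ms: list[float],
                          tolerance: float = 0.10) -> bool:
    """Pass if the new deployment's P95 is within `tolerance` of the old one's."""
    return p95(new_ms) <= p95(old_ms) * (1 + tolerance)

# Illustrative per-request latencies in milliseconds
old = [820, 790, 850, 900, 780, 810, 830, 870, 795, 1200]
new = [800, 815, 840, 880, 790, 805, 825, 860, 810, 1250]
print(latency_within_budget(old, new))  # → True
```

The same pattern extends to the other rows of the table: each metric becomes a small pass/fail function, and the new version is promoted only when every gate passes.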
Scenario: Zara manages GPT-4o to GPT-4o-mini migration at Atlas

Zara wants to cut costs at Atlas Consulting. GPT-4o-mini costs roughly 1/10th of GPT-4o. Marcus Webb, her lead, approves testing it on lower-complexity tasks.

Zara’s migration plan:

  1. Identify candidate tasks: Client email summarisation (lower complexity) vs contract analysis (high complexity)
  2. Deploy GPT-4o-mini as a new deployment: gpt4omini-prod-v0718
  3. A/B test on email summarisation: 10% traffic to mini, 90% to GPT-4o. Compare quality scores from the evaluation pipeline
  4. Results after 1 week: Mini scores 4.2/5 (vs GPT-4o at 4.4/5) on summarisation quality — acceptable
  5. Progressive rollout: 50%, then 100% of email summarisation traffic to mini
  6. Contract analysis stays on GPT-4o: Quality drop was unacceptable (3.1/5 vs 4.6/5)

Result: 60% cost reduction on email summarisation, zero quality compromise on contract analysis. Both versions pinned and running in parallel.

Model deprecation planning

Azure OpenAI publishes a deprecation schedule. Plan ahead:

| Phase | What Happens | Action Required |
| --- | --- | --- |
| Generally available | Model is fully supported | Deploy and use normally |
| Deprecation announced | Microsoft publishes the retirement date (usually 6-12 months' notice) | Start testing the replacement model |
| Legacy period | Model still works but no longer receives updates | Complete migration to the replacement |
| Retired | Model is no longer available — API calls return errors | Must be off this version before this date |
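
Building the schedule into your runbook can be as simple as a scheduled job that checks each pinned version against its published retirement date. A sketch with an illustrative date (always confirm real dates against the Azure OpenAI deprecation schedule):

```python
from datetime import date

# Illustrative retirement date, NOT an actual Azure schedule entry
RETIREMENTS = {
    "gpt-4o-2024-05-13": date(2025, 11, 1),
}

def migration_alerts(today: date, warn_days: int = 90) -> list[str]:
    """List versions retiring within `warn_days`, or already retired."""
    alerts = []
    for version, retires_on in RETIREMENTS.items():
        days_left = (retires_on - today).days
        if days_left < 0:
            alerts.append(f"{version} retired {-days_left} days ago; migrate immediately")
        elif days_left <= warn_days:
            alerts.append(f"{version} retires in {days_left} days; start migration")
    return alerts

print(migration_alerts(date(2025, 9, 1)))
# → ['gpt-4o-2024-05-13 retires in 61 days; start migration']
```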
💡 Exam tip: Deprecation readiness

The exam may test deprecation scenarios:

  • “GPT-4o version 2024-05-13 is being retired in 3 months. What should you do?” → Deploy the replacement version alongside, A/B test, migrate traffic progressively
  • “A deployed model version was retired and the API is returning errors” → You should have pinned the version AND monitored deprecation announcements. Immediate action: deploy the latest supported version

Key practice: subscribe to Azure OpenAI model deprecation notifications and build version migration into your operational runbook.

Key terms flashcards

Question

Why should you pin model versions in production?


Answer

Auto-updates can silently change model behaviour — responses, quality, safety filters, and costs may all shift. Pinning to a specific version (e.g., gpt-4o-2024-11-20) ensures predictable behaviour. You control when to upgrade by testing the new version first.


Question

How does blue-green deployment work for foundation models?


Answer

Deploy the new model version as a separate deployment alongside the current one. Split traffic (e.g., 90/10) to test the new version with real requests. Monitor quality metrics. If good, shift more traffic. If bad, send 100% back to the old version (rollback).


Question

What is the fastest way to roll back a model update?


Answer

Application-level rollback: change traffic weights to send 100% to the old deployment. No infrastructure changes needed — just update the routing configuration. The old deployment is still running and handling requests.


Knowledge check


Zara's client-facing summarisation tool at Atlas Consulting suddenly starts producing lower-quality outputs. The team discovers Azure auto-updated the GPT-4o deployment to a newer version overnight. How should Zara prevent this in future?


Dr. Fatima is migrating Meridian's document processing from GPT-4o version 2024-05-13 (being deprecated) to 2024-11-20. She processes 50,000 documents daily and cannot tolerate quality regressions. What is the safest migration approach?



Next up: PromptOps — treating prompts as code with versioning, testing, and CI/CD.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.