
AI-300 Study Guide

Domain 1: Design and Implement an MLOps Infrastructure

  • ML Workspace: Your AI Control Room Free
  • Data, Environments & Components
  • Compute Targets: Choosing the Right Engine
  • Infrastructure as Code: Provisioning at Scale
  • Git & CI/CD for ML Projects

Domain 2: Implement Machine Learning Model Lifecycle and Operations

  • MLflow: Track Every Experiment Free
  • AutoML & Hyperparameter Tuning
  • Training Pipelines: Automate Everything
  • Distributed Training: Scale to Big Data
  • Model Registration & Versioning
  • Model Approval & Responsible AI Gates
  • Deploying Models: Endpoints in Production
  • Drift, Monitoring & Retraining

Domain 3: Design and Implement a GenAIOps Infrastructure

  • Foundry: Hubs, Projects & Platform Setup Free
  • Network Security & IaC for Foundry
  • Deploying Foundation Models
  • Model Versioning & Production Strategies
  • PromptOps: Design, Compare, Version & Ship

Domain 4: Implement Generative AI Quality Assurance and Observability

  • Evaluation: Datasets, Metrics & Quality Gates Free
  • Safety Evaluations & Custom Metrics
  • Monitoring GenAI in Production
  • Cost Tracking, Logging & Debugging

Domain 5: Optimize Generative AI Systems and Model Performance

  • RAG Optimization: Better Retrieval, Better Answers Free
  • Embeddings & Hybrid Search
  • Fine-Tuning: Methods, Data & Production

Domain 2: Implement Machine Learning Model Lifecycle and Operations
Free · ⏱ ~14 min read

MLflow: Track Every Experiment

If you can't track it, you can't reproduce it. Master MLflow experiment tracking — log metrics, parameters, and artifacts so every experiment is fully traceable.

What is MLflow?

☕ Simple explanation

MLflow is like a lab notebook that writes itself.

In a science lab, you record every experiment: what you mixed, how much, what temperature, and what happened. Without notes, you can’t repeat a success or understand a failure.

MLflow does this automatically for ML experiments. Every time you train a model, it records: which configuration you used (parameters), how well it performed (metrics), and the actual model file (artifact). Weeks later, you can look up “which run got 94% accuracy?” and trace back to the exact code and data.

MLflow is an open-source platform for managing the ML lifecycle. Azure Machine Learning integrates MLflow natively — every workspace includes an MLflow tracking server. Key capabilities:

  • Experiment tracking — log parameters, metrics, and artifacts for every run
  • Model registry — version and stage models (dev → staging → production)
  • Model packaging — standard format for deployment across platforms
  • Run comparison — compare metrics across multiple training runs

In Azure ML, MLflow tracking is built-in — no separate server to manage. Your workspace IS the MLflow tracking server.
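
Inside an Azure ML job, tracking is preconfigured and the snippets below just work. From a local machine you can point MLflow at the workspace yourself. A minimal sketch using the azure-ai-ml SDK, where the subscription, resource group, and workspace names are placeholders you would fill in:

```python
# Sketch: point a LOCAL MLflow session at an Azure ML workspace.
# Inside an Azure ML job this step is unnecessary: tracking is preconfigured.
# Subscription, resource group, and workspace names below are placeholders.
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
import mlflow

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# The workspace exposes the URI of its built-in MLflow tracking server
tracking_uri = ml_client.workspaces.get("<workspace-name>").mlflow_tracking_uri
mlflow.set_tracking_uri(tracking_uri)
```

After this, every `mlflow.start_run()` on your laptop logs into the workspace, so local and cloud experiments land in the same place.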

MLflow concepts

| Concept | What It Is | Example |
|---|---|---|
| Experiment | A named group of related runs | "churn-prediction-v2" |
| Run | A single execution of a training script | One training job with specific hyperparameters |
| Parameter | An input configuration value | learning_rate=0.01, n_estimators=100 |
| Metric | A measured output value | accuracy=0.94, loss=0.12 |
| Artifact | A file produced by the run | model.pkl, feature_importance.png, confusion_matrix.json |
| Tag | A metadata label | "team=nlp", "sprint=q2", "git_commit=abc123" |
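
To see how these concepts fit together, here is a plain-Python sketch of the record MLflow keeps for one run. The field names are descriptive only, not the MLflow API; the values are made-up sample data:

```python
# Illustrative only: a plain-Python sketch of the record MLflow keeps
# for one run. Field names are descriptive, not the MLflow API.
run_record = {
    "experiment": "churn-prediction-v2",                       # named group of runs
    "run_id": "abc123",                                        # one execution
    "params": {"learning_rate": 0.01, "n_estimators": 100},    # inputs
    "metrics": {"accuracy": 0.94, "loss": 0.12},               # outputs
    "artifacts": ["model.pkl", "feature_importance.png"],      # files produced
    "tags": {"team": "nlp", "git_commit": "abc123"},           # metadata labels
}

# Reproducing a run starts from its params; judging it uses its metrics.
print(run_record["params"]["learning_rate"])   # 0.01
print(run_record["metrics"]["accuracy"])       # 0.94
```

The key distinction the exam leans on: parameters go *in*, metrics come *out*, and artifacts are the *files* the run leaves behind.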

Logging with MLflow in Azure ML

When you run a training script in Azure ML, MLflow tracking is automatic. Here’s how to use it:

import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

# MLflow auto-connects to your Azure ML workspace
# No manual server configuration needed
# (assumes X_train, X_test, y_train, y_test are already defined)

# Start a run
with mlflow.start_run(run_name="rf-baseline"):
    # Log parameters (inputs)
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 10)
    mlflow.log_param("dataset_version", "v2")

    # Train the model
    model = RandomForestClassifier(n_estimators=100, max_depth=10)
    model.fit(X_train, y_train)

    # Log metrics (outputs)
    predictions = model.predict(X_test)
    mlflow.log_metric("accuracy", accuracy_score(y_test, predictions))
    mlflow.log_metric("f1_score", f1_score(y_test, predictions))

    # Log the model as an artifact
    mlflow.sklearn.log_model(model, "churn-model")

    # Log additional artifacts
    mlflow.log_artifact("feature_importance.png")

What’s happening:

  • mlflow.start_run(run_name="rf-baseline") opens a named run; everything logged inside the with block is grouped together
  • The mlflow.log_param() calls record the input configuration so you can reproduce this exact setup
  • The mlflow.log_metric() calls record how well the model performed on the test set
  • mlflow.sklearn.log_model() saves the model in MLflow’s standard format, deployable to any MLflow-compatible platform
  • mlflow.log_artifact() saves additional files (charts, reports) alongside the model

Scenario: Dr. Luca's reproducibility rescue

Dr. Luca Bianchi at GenomeVault ran 47 experiments over three weeks. His colleague asks: “Which run produced the best F1 score, and can we reproduce it?”

Without MLflow: “Um, I think it was the one on Tuesday… let me check my notebooks…”

With MLflow:

# Find the best run across all experiments
runs = mlflow.search_runs(
    experiment_names=["genomics-variant-calling"],
    order_by=["metrics.f1_score DESC"],
    max_results=1
)
print(runs[["run_id", "params.model_type", "metrics.f1_score"]])

Result: Run abc123, model_type=gradient_boost, F1=0.967. Every parameter, the exact code commit (via Git tag), and the trained model are all traceable.

Prof. Sarah Lin: “This is exactly the kind of rigour we need for our publications.”

Autologging

MLflow can automatically log parameters and metrics for popular frameworks — no manual log_param calls needed:

# Enable autologging for scikit-learn
mlflow.sklearn.autolog()

# Just train the model — MLflow captures everything
model = RandomForestClassifier(n_estimators=100, max_depth=10)
model.fit(X_train, y_train)

Supported frameworks for autologging:

| Framework | What’s Auto-Logged |
|---|---|
| scikit-learn | All hyperparameters, metrics (accuracy, F1, etc.), model artifact |
| PyTorch / PyTorch Lightning | Loss per epoch, learning rate, model weights |
| TensorFlow / Keras | Epoch metrics, optimizer config, model architecture |
| XGBoost / LightGBM | Boosting params, feature importance, eval metrics |
| Spark ML | Pipeline stages, evaluator metrics |

💡 Exam tip: Autologging vs manual logging

Autologging is convenient but logs EVERYTHING. For production pipelines, manual logging gives you control over exactly what’s tracked.

The exam may ask when to use each:

  • Autologging: exploration, prototyping, when you want comprehensive tracking with no code changes
  • Manual logging: production pipelines, when you need specific metrics or custom artifacts

Comparing runs

One of MLflow’s most powerful features is comparing runs side by side:

# Search and compare runs
import mlflow

runs = mlflow.search_runs(
    experiment_names=["churn-prediction-v2"],
    filter_string="metrics.accuracy > 0.90",
    order_by=["metrics.f1_score DESC"]
)

# View top runs
print(runs[["run_id", "params.n_estimators", "params.max_depth",
            "metrics.accuracy", "metrics.f1_score"]].head(5))

What’s happening:

  • filter_string="metrics.accuracy > 0.90" keeps only runs with accuracy above 90%
  • order_by=["metrics.f1_score DESC"] sorts by F1 score in descending order, so the best runs come first
  • The print shows the key parameters and metrics of the top 5 runs side by side for comparison
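
Conceptually, a search like this is just a filter-and-sort over the stored run records. A pure-Python sketch of what filter_string and order_by do (the run dicts below are made-up sample data, not real MLflow output):

```python
# Pure-Python sketch of the filter-and-sort behind a search_runs query.
# These run dicts are made-up sample data, not real MLflow output.
runs = [
    {"run_id": "a1", "metrics.accuracy": 0.93, "metrics.f1_score": 0.91},
    {"run_id": "b2", "metrics.accuracy": 0.89, "metrics.f1_score": 0.95},
    {"run_id": "c3", "metrics.accuracy": 0.95, "metrics.f1_score": 0.94},
]

# Equivalent of filter_string="metrics.accuracy > 0.90":
kept = [r for r in runs if r["metrics.accuracy"] > 0.90]

# Equivalent of order_by=["metrics.f1_score DESC"]:
kept.sort(key=lambda r: r["metrics.f1_score"], reverse=True)

print([r["run_id"] for r in kept])  # ['c3', 'a1']
```

Note that run b2 has the best F1 score but is dropped by the accuracy filter first; filters apply before ordering.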

In the Azure ML Studio UI, you can also visually compare runs — select multiple runs and view metrics in parallel charts, scatter plots, or tables.

Scenario: Kai compares 200 sweep runs

Kai just ran a hyperparameter sweep with 200 trials (covered in Module 7). Now he needs to find the best model.

# Find the top 5 runs from the sweep
best_runs = mlflow.search_runs(
    experiment_names=["churn-sweep-apr-2026"],
    order_by=["metrics.f1_score DESC"],
    max_results=5
)

# Log the winner for the team
winner = best_runs.iloc[0]
print(f"Best run: {winner.run_id}")
print(f"  F1: {winner['metrics.f1_score']:.4f}")
print(f"  Learning rate: {winner['params.learning_rate']}")
print(f"  Max depth: {winner['params.max_depth']}")

Priya (CTO): “Which model do we ship?” Kai: “Run 7f3a2b1 — F1 of 0.9612 with learning_rate=0.03 and max_depth=8.”

Key terms flashcards

Q: What are the three things MLflow tracks for every run?
A: Parameters (inputs like hyperparameters), metrics (outputs like accuracy/loss), and artifacts (files like model weights, charts, reports).

Q: Do you need a separate MLflow server with Azure ML?
A: No. Azure ML workspaces include a built-in MLflow tracking server; your workspace IS the tracking server, with no additional setup needed.

Q: What is MLflow autologging?
A: A feature that automatically logs parameters, metrics, and models for popular frameworks (scikit-learn, PyTorch, TensorFlow), with no manual log_param/log_metric calls needed.

Q: How do you find the best run across many experiments?
A: mlflow.search_runs() with filter_string and order_by. Example: filter accuracy > 0.90, order by F1 score descending.

Knowledge check

1. Dr. Luca ran 47 experiments over three weeks. His colleague asks which run produced the best F1 score. What tool should Luca use?

2. Kai wants comprehensive experiment tracking with minimal code changes during early prototyping. What should he enable?


Next up: AutoML & Hyperparameter Tuning — letting Azure find the best model for you.


© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.