AutoML & Hyperparameter Tuning
Don't guess hyperparameters — sweep them. Learn AutoML for automated model selection and hyperparameter tuning with sweep jobs to find the optimal configuration.
Finding the best model automatically
Imagine you’re buying a car but there are 500 models.
You could test-drive every single one — that would take years. Or you could tell a smart assistant: “I need a sedan, under $40K, good fuel economy” and let them narrow it down to 5 finalists for you to try.
AutoML does this for machine learning. Instead of manually trying Random Forest, then XGBoost, then Neural Net… AutoML tries dozens of algorithms and configurations automatically, then tells you which one performed best.
Hyperparameter tuning is the fine-tuning step: once you’ve chosen your car model, you adjust the seat, mirrors, and steering to get the perfect fit.
AutoML: automated model selection
AutoML in Azure ML automatically:
- Tries multiple algorithms (Random Forest, XGBoost, LightGBM, Neural Nets…)
- Applies feature engineering (encoding, scaling, imputation)
- Selects the best model based on your chosen metric
- Logs everything to MLflow
```python
from azure.ai.ml import Input, automl

# Define an AutoML classification job
classification_job = automl.classification(
    training_data=Input(type="mltable", path="azureml:churn-data:2"),
    target_column_name="churned",
    primary_metric="AUC_weighted",
    compute="gpu-training-cluster",
    experiment_name="churn-automl-baseline",
)

# Configure limits
classification_job.set_limits(
    max_trials=50,                  # Try up to 50 model configurations
    max_concurrent_trials=4,        # Run 4 trials in parallel
    timeout_minutes=120,            # Stop after 2 hours
    enable_early_termination=True,  # Stop bad trials early
)

# Submit the job (ml_client is an authenticated MLClient)
returned_job = ml_client.jobs.create_or_update(classification_job)
```
What’s happening:
- `automl.classification(...)` defines the task — AutoML needs the training data, the target column, and which metric to optimise
- `primary_metric="AUC_weighted"` is the metric AutoML maximises — it tries different algorithms to get the highest score
- `set_limits(...)` prevents runaway costs — max 50 trials, 4 at a time, 2-hour cap
- `enable_early_termination=True` stops trials that are clearly performing poorly
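The interplay between those limits is worth sanity-checking with a little arithmetic. This sketch is not Azure ML code — the average trial time is an illustrative assumption:

```python
import math

def worst_case_minutes(max_trials: int, max_concurrent: int,
                       avg_trial_minutes: float, timeout_minutes: float) -> float:
    """Rough upper bound on wall-clock time for a job with these limits."""
    # Trials run in waves of `max_concurrent`; the timeout caps everything.
    waves = math.ceil(max_trials / max_concurrent)
    return min(waves * avg_trial_minutes, timeout_minutes)

# 50 trials, 4 at a time, ~15 min each: 13 waves is about 195 minutes,
# but the 120-minute timeout wins, so the cost cap holds.
print(worst_case_minutes(50, 4, 15, 120))  # -> 120
```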
AutoML task types
| Task | Use Case | Example Metric |
|---|---|---|
| Classification | Predict a category | AUC_weighted, accuracy, F1 |
| Regression | Predict a number | RMSE, R2, MAE |
| Time-series forecasting | Predict future values | MAPE, RMSE |
| Image classification | Classify images | Accuracy |
| Object detection | Find objects in images | mAP |
| NLP text classification | Classify text documents | Accuracy, F1 |
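Why the metric column matters: on imbalanced data like churn, plain accuracy can reward a useless model, which is why metrics like F1 or AUC_weighted are the usual picks. A small illustration with made-up numbers:

```python
# A churn dataset where only 10% of customers churn,
# and a "model" that always predicts "no churn".
y_true = [1] * 10 + [0] * 90
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(accuracy)  # 0.9, which looks strong
print(f1)        # 0.0, which reveals the model finds no churners at all
```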
Scenario: Kai establishes a baseline fast
Kai has a new customer churn dataset and needs a baseline model by Friday. Instead of spending days trying different algorithms:
- Runs AutoML with 50 trials and a 2-hour timeout
- AutoML tries 12 algorithms with various feature engineering
- Best model: LightGBM with AUC of 0.943
- Kai logs the winner and uses it as the benchmark
Now the data science team knows: “Beat 0.943 AUC or we ship the AutoML model.”
Priya (CTO): “We have a production-ready baseline in 2 hours? I love this.”
Sweep jobs: hyperparameter tuning
Once you’ve chosen an algorithm, sweep jobs search for the best hyperparameters:
```python
from azure.ai.ml import command
from azure.ai.ml.sweep import BanditPolicy, Choice, Uniform

# Define the training command
train_command = command(
    code="./src",
    command="python train.py "
            "--learning-rate ${{search_space.learning_rate}} "
            "--n-estimators ${{search_space.n_estimators}} "
            "--max-depth ${{search_space.max_depth}}",
    environment="azureml:churn-training:3",
    compute="gpu-training-cluster",
)

# Turn the command into a sweep job
# (random sampling here, because Bayesian sampling
#  doesn't support early termination policies)
sweep_job = train_command.sweep(
    sampling_algorithm="random",
    primary_metric="f1_score",
    goal="maximize",
)

# Define the search space
sweep_job.search_space = {
    "learning_rate": Uniform(min_value=0.001, max_value=0.1),
    "n_estimators": Choice(values=[50, 100, 200, 500]),
    "max_depth": Choice(values=[5, 8, 10, 15, 20]),
}

# Early termination — stop bad runs
sweep_job.early_termination = BanditPolicy(
    slack_factor=0.1,
    evaluation_interval=2,
)

sweep_job.set_limits(max_total_trials=200, max_concurrent_trials=8)

# Submit (ml_client is an authenticated MLClient)
returned_job = ml_client.jobs.create_or_update(sweep_job)
```
What’s happening:
- The training script accepts hyperparameters as command-line arguments, filled in from the `${{search_space.*}}` placeholders
- `sampling_algorithm="random"` picks configurations at random; switch to `"bayesian"` to learn from previous trials, but then drop the early termination policy — Bayesian sampling doesn't support one
- The search space defines the ranges to explore — MLflow logs each combination tried
- The Bandit policy cancels runs that fall behind the best run by more than 10% (the slack factor)
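To build intuition for what random sampling draws from that search space, the `Uniform`/`Choice` distributions can be mimicked with the standard library. This is a sketch — the real distribution classes live in `azure.ai.ml.sweep`:

```python
import random

# Mirror of the sweep search space above, as plain specs
search_space = {
    "learning_rate": ("uniform", 0.001, 0.1),
    "n_estimators": ("choice", [50, 100, 200, 500]),
    "max_depth": ("choice", [5, 8, 10, 15, 20]),
}

def sample(space, rng=random):
    """Draw one trial configuration, as random sampling would."""
    config = {}
    for name, spec in space.items():
        if spec[0] == "uniform":
            config[name] = rng.uniform(spec[1], spec[2])
        else:
            config[name] = rng.choice(spec[1])
    return config

trial = sample(search_space)
print(trial)  # e.g. {'learning_rate': 0.042, 'n_estimators': 200, 'max_depth': 8}
```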
Sampling algorithms
| Feature | Intelligence | Speed | Best For |
|---|---|---|---|
| Grid | None — tries every combination | Slow (exhaustive) | Small search spaces, need all results |
| Random | None — picks randomly | Fast start, good coverage | Large spaces, initial exploration |
| Bayesian | Learns from previous trials | Slower per trial, fewer needed | When trials are expensive, want optimal result |
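To see why grid search only suits small spaces, count the combinations for the sweep above once the continuous learning rate is discretised (the five learning-rate values here are a hypothetical discretisation):

```python
from itertools import product

learning_rates = [0.001, 0.005, 0.01, 0.05, 0.1]  # 5 discretised values
n_estimators = [50, 100, 200, 500]                # 4 values
max_depths = [5, 8, 10, 15, 20]                   # 5 values

# Grid search runs every combination exactly once
grid = list(product(learning_rates, n_estimators, max_depths))
print(len(grid))  # -> 100, far past the point where exhaustive search is practical
```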
Early termination policies
| Policy | How It Works | When to Use |
|---|---|---|
| Bandit | Stops runs that lag behind the best by a slack factor | Most common — good balance of exploration and cost |
| Median stopping | Stops runs below the median of all runs at same point | When you want to keep more diverse trials |
| Truncation selection | Cancels bottom X% of runs at each interval | Aggressive pruning for large sweeps |
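The Bandit rule from the table can be sketched in a few lines. This is a simplified, assumed formulation of the `slack_factor` variant for a maximised metric, ignoring evaluation intervals:

```python
def bandit_should_stop(run_metric: float, best_metric: float,
                       slack_factor: float = 0.1) -> bool:
    """Stop a run whose metric falls outside the slack
    allowance of the current best run (maximisation)."""
    return run_metric < best_metric / (1 + slack_factor)

best = 0.92
print(bandit_should_stop(0.85, best))  # False: within the 10% slack, keep running
print(bandit_should_stop(0.80, best))  # True: lagging too far behind, cancel
```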
Exam tip: Bayesian vs random sampling
The exam often tests when to use each sampling algorithm:
- Random: best when the search space is large and you want broad coverage quickly. Also useful when you can afford many trials.
- Bayesian: best when each trial is expensive (GPU hours) and you want to converge on the optimum with fewer trials. NOT compatible with early termination policies — Bayesian sampling needs every trial to run to completion.
- Grid: only practical for very small search spaces (under 20 combinations).
If the question mentions “limited compute budget” and “find the optimal configuration,” the answer is usually Bayesian.
Knowledge check
Kai has a new dataset and needs a baseline model by Friday. He doesn't know which algorithm will work best. What should he use?
Dr. Luca is running a hyperparameter sweep for a genomics model. Each trial uses an A100 GPU and takes 45 minutes. He has budget for about 30 trials. Which sampling algorithm should he choose?
Next up: Training Pipelines — automating the entire training workflow end to end.