AutoML & Hyperparameter Tuning
Don't guess hyperparameters — sweep them. Learn AutoML for automated model selection and hyperparameter tuning with sweep jobs to find the optimal configuration.
Finding the best model automatically
Imagine you’re buying a car but there are 500 models.
You could test-drive every single one — that would take years. Or you could tell a smart assistant: “I need a sedan, under $40K, good fuel economy” and let them narrow it down to 5 finalists for you to try.
AutoML does this for machine learning. Instead of manually trying Random Forest, then XGBoost, then Neural Net… AutoML tries dozens of algorithms and configurations automatically, then tells you which one performed best.
Hyperparameter tuning is the fine-tuning step: once you’ve chosen your car model, you adjust the seat, mirrors, and steering to get the perfect fit.
AutoML: automated model selection
AutoML in Azure ML automatically:
- Tries multiple algorithms (Random Forest, XGBoost, LightGBM, Neural Nets…)
- Applies feature engineering (encoding, scaling, imputation)
- Selects the best model based on your chosen metric
- Logs everything to MLflow
```python
from azure.ai.ml import Input, automl

# Define an AutoML classification job
classification_job = automl.classification(
    training_data=Input(type="mltable", path="azureml:churn-data:2"),
    target_column_name="churned",
    primary_metric="AUC_weighted",
    compute="gpu-training-cluster",
    experiment_name="churn-automl-baseline",
)

# Configure limits
classification_job.set_limits(
    max_trials=50,                  # Try up to 50 model configurations
    max_concurrent_trials=4,        # Run 4 trials in parallel
    timeout_minutes=120,            # Stop after 2 hours
    enable_early_termination=True,  # Stop bad trials early
)

# Submit the job (ml_client is an authenticated MLClient)
returned_job = ml_client.jobs.create_or_update(classification_job)
```
What’s happening:
- `automl.classification(...)` defines the task — AutoML needs the training data, the target column, and which metric to optimise
- `primary_metric="AUC_weighted"` is the metric AutoML maximises — it tries different algorithms to get the highest score
- `set_limits(...)` prevents runaway costs — max 50 trials, 4 at a time, 2-hour cap
- `enable_early_termination=True` stops trials that are clearly performing poorly
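The interplay between those limits is worth sanity-checking with a little arithmetic. This sketch is not Azure ML code — the average trial time is an illustrative assumption:

```python
import math

def worst_case_minutes(max_trials: int, max_concurrent: int,
                       avg_trial_minutes: float, timeout_minutes: float) -> float:
    """Rough upper bound on wall-clock time for a job with these limits."""
    # Trials run in waves of `max_concurrent`; the timeout caps everything.
    waves = math.ceil(max_trials / max_concurrent)
    return min(waves * avg_trial_minutes, timeout_minutes)

# 50 trials, 4 at a time, ~15 min each: 13 waves is about 195 minutes,
# but the 120-minute timeout wins, so the cost cap holds.
print(worst_case_minutes(50, 4, 15, 120))  # -> 120
```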
AutoML task types
| Task | Use Case | Example Metric |
|---|---|---|
| Classification | Predict a category | AUC_weighted, accuracy, F1 |
| Regression | Predict a number | RMSE, R2, MAE |
| Time-series forecasting | Predict future values | MAPE, RMSE |
| Image classification | Classify images | Accuracy |
| Object detection | Find objects in images | mAP |
| NLP text classification | Classify text documents | Accuracy, F1 |
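Why the metric column matters: on imbalanced data like churn, plain accuracy can reward a useless model, which is why metrics like F1 or AUC_weighted are the usual picks. A small illustration with made-up numbers:

```python
# A churn dataset where only 10% of customers churn,
# and a "model" that always predicts "no churn".
y_true = [1] * 10 + [0] * 90
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(accuracy)  # 0.9, which looks strong
print(f1)        # 0.0, which reveals the model finds no churners at all
```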
Scenario: Kai establishes a baseline fast
Kai has a new customer churn dataset and needs a baseline model by Friday. Instead of spending days trying different algorithms:
- Runs AutoML with 50 trials and a 2-hour timeout
- AutoML tries 12 algorithms with various feature engineering
- Best model: LightGBM with AUC of 0.943
- Kai logs the winner and uses it as the benchmark
Now the data science team knows: “Beat 0.943 AUC or we ship the AutoML model.”
Priya (CTO): “We have a production-ready baseline in 2 hours? I love this.”
Sweep jobs: hyperparameter tuning
Once you’ve chosen an algorithm, sweep jobs search for the best hyperparameters:
```python
from azure.ai.ml import command
from azure.ai.ml.sweep import BanditPolicy, Choice, Uniform

# Define the training command
train_command = command(
    code="./src",
    command="python train.py "
            "--learning-rate ${{search_space.learning_rate}} "
            "--n-estimators ${{search_space.n_estimators}} "
            "--max-depth ${{search_space.max_depth}}",
    environment="azureml:churn-training:3",
    compute="gpu-training-cluster",
)

# Turn the command into a sweep job
# (random sampling here, because Bayesian sampling
#  doesn't support early termination policies)
sweep_job = train_command.sweep(
    sampling_algorithm="random",
    primary_metric="f1_score",
    goal="maximize",
)

# Define the search space
sweep_job.search_space = {
    "learning_rate": Uniform(min_value=0.001, max_value=0.1),
    "n_estimators": Choice(values=[50, 100, 200, 500]),
    "max_depth": Choice(values=[5, 8, 10, 15, 20]),
}

# Early termination — stop bad runs
sweep_job.early_termination = BanditPolicy(
    slack_factor=0.1,
    evaluation_interval=2,
)

sweep_job.set_limits(max_total_trials=200, max_concurrent_trials=8)

# Submit (ml_client is an authenticated MLClient)
returned_job = ml_client.jobs.create_or_update(sweep_job)
```
What’s happening:
- The training script accepts hyperparameters as command-line arguments, filled in from the `${{search_space.*}}` placeholders
- `sampling_algorithm="random"` picks configurations at random; switch to `"bayesian"` to learn from previous trials, but then drop the early termination policy — Bayesian sampling doesn't support one
- The search space defines the ranges to explore — MLflow logs each combination tried
- The Bandit policy cancels runs that fall behind the best run by more than 10% (the slack factor)
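To build intuition for what random sampling draws from that search space, the `Uniform`/`Choice` distributions can be mimicked with the standard library. This is a sketch — the real distribution classes live in `azure.ai.ml.sweep`:

```python
import random

# Mirror of the sweep search space above, as plain specs
search_space = {
    "learning_rate": ("uniform", 0.001, 0.1),
    "n_estimators": ("choice", [50, 100, 200, 500]),
    "max_depth": ("choice", [5, 8, 10, 15, 20]),
}

def sample(space, rng=random):
    """Draw one trial configuration, as random sampling would."""
    config = {}
    for name, spec in space.items():
        if spec[0] == "uniform":
            config[name] = rng.uniform(spec[1], spec[2])
        else:
            config[name] = rng.choice(spec[1])
    return config

trial = sample(search_space)
print(trial)  # e.g. {'learning_rate': 0.042, 'n_estimators': 200, 'max_depth': 8}
```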
Sampling algorithms
| Feature | Intelligence | Speed | Best For |
|---|---|---|---|
| Grid | None — tries every combination | Slow (exhaustive) | Small search spaces, need all results |
| Random | None — picks randomly | Fast start, good coverage | Large spaces, initial exploration |
| Bayesian | Learns from previous trials | Slower per trial, fewer needed | When trials are expensive, want optimal result |
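To see why grid search only suits small spaces, count the combinations for the sweep above once the continuous learning rate is discretised (the five learning-rate values here are a hypothetical discretisation):

```python
from itertools import product

learning_rates = [0.001, 0.005, 0.01, 0.05, 0.1]  # 5 discretised values
n_estimators = [50, 100, 200, 500]                # 4 values
max_depths = [5, 8, 10, 15, 20]                   # 5 values

# Grid search runs every combination exactly once
grid = list(product(learning_rates, n_estimators, max_depths))
print(len(grid))  # -> 100, far past the point where exhaustive search is practical
```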
Early termination policies
| Policy | How It Works | When to Use |
|---|---|---|
| Bandit | Stops runs that lag behind the best by a slack factor | Most common — good balance of exploration and cost |
| Median stopping | Stops runs below the median of all runs at same point | When you want to keep more diverse trials |
| Truncation selection | Cancels bottom X% of runs at each interval | Aggressive pruning for large sweeps |
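The Bandit rule from the table can be sketched in a few lines. This is a simplified, assumed formulation of the `slack_factor` variant for a maximised metric, ignoring evaluation intervals:

```python
def bandit_should_stop(run_metric: float, best_metric: float,
                       slack_factor: float = 0.1) -> bool:
    """Stop a run whose metric falls outside the slack
    allowance of the current best run (maximisation)."""
    return run_metric < best_metric / (1 + slack_factor)

best = 0.92
print(bandit_should_stop(0.85, best))  # False: within the 10% slack, keep running
print(bandit_should_stop(0.80, best))  # True: lagging too far behind, cancel
```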
Exam tip: Bayesian vs random sampling
The exam often tests when to use each sampling algorithm:
- Random: best when the search space is large and you want broad coverage quickly. Also useful when you can afford many trials.
- Bayesian: best when each trial is expensive (GPU hours) and you want to converge on the optimum with fewer trials. NOT compatible with early termination policies — Bayesian sampling needs every trial to run to completion.
- Grid: only practical for very small search spaces (under 20 combinations).
If the question mentions “limited compute budget” and “find the optimal configuration,” the answer is usually Bayesian.
Knowledge check
Kai has a new dataset and needs a baseline model by Friday. He doesn't know which algorithm will work best. What should he use?
Dr. Luca is running a hyperparameter sweep for a genomics model. Each trial uses an A100 GPU and takes 45 minutes. He has budget for about 30 trials. Which sampling algorithm should he choose?
Next up: Training Pipelines — automating the entire training workflow end to end.