
AI-300 Study Guide

Domain 1: Design and Implement an MLOps Infrastructure

  • ML Workspace: Your AI Control Room Free
  • Data, Environments & Components
  • Compute Targets: Choosing the Right Engine
  • Infrastructure as Code: Provisioning at Scale
  • Git & CI/CD for ML Projects

Domain 2: Implement Machine Learning Model Lifecycle and Operations

  • MLflow: Track Every Experiment Free
  • AutoML & Hyperparameter Tuning
  • Training Pipelines: Automate Everything
  • Distributed Training: Scale to Big Data
  • Model Registration & Versioning
  • Model Approval & Responsible AI Gates
  • Deploying Models: Endpoints in Production
  • Drift, Monitoring & Retraining

Domain 3: Design and Implement a GenAIOps Infrastructure

  • Foundry: Hubs, Projects & Platform Setup Free
  • Network Security & IaC for Foundry
  • Deploying Foundation Models
  • Model Versioning & Production Strategies
  • PromptOps: Design, Compare, Version & Ship

Domain 4: Implement Generative AI Quality Assurance and Observability

  • Evaluation: Datasets, Metrics & Quality Gates Free
  • Safety Evaluations & Custom Metrics
  • Monitoring GenAI in Production
  • Cost Tracking, Logging & Debugging

Domain 5: Optimize Generative AI Systems and Model Performance

  • RAG Optimization: Better Retrieval, Better Answers Free
  • Embeddings & Hybrid Search
  • Fine-Tuning: Methods, Data & Production

Domain 1: Design and Implement an MLOps Infrastructure

Infrastructure as Code: Provisioning at Scale

Stop clicking buttons. Learn to deploy Azure ML workspaces, compute, and networking with Bicep, Azure CLI, and GitHub Actions β€” repeatable, auditable, and version-controlled.

Why Infrastructure as Code for ML?

β˜• Simple explanation

Imagine building IKEA furniture without the instructions.

You could probably figure it out β€” but it would take longer, you’d make mistakes, and if you needed to build the same bookshelf for 10 offices, you’d go insane. Now imagine the instructions were code: run it once, get a perfect bookshelf every time.

Infrastructure as Code (IaC) is the same idea for Azure resources. Instead of clicking through the portal, you write a file that describes what you want β€” workspaces, compute clusters, networking β€” and a tool builds it identically every time.

Infrastructure as Code (IaC) means defining your Azure resources in declarative templates that are:

  • Version-controlled β€” track who changed what, when, and why
  • Repeatable β€” deploy identical environments across dev, staging, and production
  • Auditable β€” compliance teams can review infrastructure changes in pull requests
  • Automatable β€” GitHub Actions can deploy infrastructure on merge to main

For AI-300, the exam focuses on Bicep (Azure’s native IaC language) and Azure CLI for deployments, triggered by GitHub Actions workflows.

Bicep: Azure’s IaC language

Bicep is a domain-specific language for deploying Azure resources. It compiles to ARM templates but is much more readable.

// main.bicep β€” Deploy an ML workspace with all dependencies
param location string = resourceGroup().location
param workspaceName string = 'ml-workspace-prod'

// Storage account for workspace artifacts
resource storage 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: 'mlstorage${uniqueString(resourceGroup().id)}'
  location: location
  sku: { name: 'Standard_LRS' }
  kind: 'StorageV2'
}

// Key Vault for secrets
resource keyVault 'Microsoft.KeyVault/vaults@2023-07-01' = {
  name: 'mlkv-${uniqueString(resourceGroup().id)}'
  location: location
  properties: {
    tenantId: subscription().tenantId
    sku: { family: 'A', name: 'standard' }
    accessPolicies: []
  }
}

// Application Insights
resource appInsights 'Microsoft.Insights/components@2020-02-02' = {
  name: 'ml-insights-${uniqueString(resourceGroup().id)}'
  location: location
  kind: 'web'
  properties: { Application_Type: 'web' }
}

// The ML workspace itself
resource workspace 'Microsoft.MachineLearningServices/workspaces@2024-04-01' = {
  name: workspaceName
  location: location
  identity: { type: 'SystemAssigned' }
  properties: {
    storageAccount: storage.id
    keyVault: keyVault.id
    applicationInsights: appInsights.id
  }
}

What’s happening:

  • Lines 2-3: Parameters make the template reusable (different names per environment)
  • Lines 6-11: Creates a storage account with a unique name
  • Lines 14-22: Creates a Key Vault for workspace secrets
  • Lines 25-30: Creates Application Insights for monitoring
  • Lines 33-42: Creates the ML workspace and links it to all three supporting resources
  • Line 36: SystemAssigned managed identity β€” the workspace authenticates without passwords
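One extension worth knowing (not part of the template above β€” an illustrative addition): Bicep outputs surface deployed resource IDs to whatever runs the deployment, which a CI pipeline can then consume.

```bicep
// Illustrative additions to the end of main.bicep β€” outputs expose
// deployed resource IDs so a pipeline needn't query Azure separately
output workspaceId string = workspace.id
output storageAccountName string = storage.name
```

After deployment, az deployment group create returns these under properties.outputs, so a later pipeline step can read them with, for example, --query properties.outputs.workspaceId.value.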

Deploy with Azure CLI:

# Deploy the Bicep template
az deployment group create \
  --resource-group rg-ml-prod \
  --template-file main.bicep \
  --parameters workspaceName=ml-workspace-prod
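Before committing to a deployment, the Azure CLI can also validate the template and preview the changes. A sketch using the same resource group and parameters as above:

```shell
# Check the template and parameters are well-formed (no resources touched)
az deployment group validate \
  --resource-group rg-ml-prod \
  --template-file main.bicep \
  --parameters workspaceName=ml-workspace-prod

# Show what would be created, changed, or deleted β€” a dry run
az deployment group what-if \
  --resource-group rg-ml-prod \
  --template-file main.bicep \
  --parameters workspaceName=ml-workspace-prod
```

what-if is a common pull-request step: posting its diff as a PR comment lets reviewers see the infrastructure impact before merge.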

Scenario: Dr. Fatima’s IaC strategy at Meridian

Meridian Financial needs identical ML environments across three regions (East US, UK South, Australia East) for data residency compliance. James Chen (CISO) requires that every infrastructure change go through a pull request review.

Dr. Fatima’s approach:

  1. Bicep templates in a Git repository (infra/ folder)
  2. Parameter files per environment: params-dev.json, params-prod-eastus.json, params-prod-uksouth.json
  3. GitHub Actions deploys automatically when changes merge to main
  4. Pull request reviews β€” compliance team approves infra changes before merge

Result: 3 identical production workspaces deployed from one template. Audit trail in Git.
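The per-environment parameter files from step 2 are plain ARM parameter JSON. A sketch of what params-prod-eastus.json could contain β€” the values are illustrative:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "workspaceName": { "value": "ml-workspace-prod-eastus" },
    "location": { "value": "eastus" }
  }
}
```

Each region is then deployed by pointing the same template at a different file, e.g. az deployment group create --resource-group rg-ml-prod-eastus --template-file infra/main.bicep --parameters @infra/params-prod-eastus.json (the resource group name here is assumed).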

GitHub Actions for infrastructure deployment

A GitHub Actions workflow can deploy your Bicep templates automatically:

# .github/workflows/deploy-ml-infra.yml
name: Deploy ML Infrastructure
on:
  push:
    branches: [main]
    paths: ['infra/**']  # Only trigger when infra files change

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # Required for OIDC auth
      contents: read
    steps:
      - uses: actions/checkout@v4

      - name: Azure Login (OIDC β€” no secrets)
        uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: Deploy Bicep
        uses: azure/arm-deploy@v2
        with:
          resourceGroupName: rg-ml-prod
          template: infra/main.bicep
          parameters: infra/params-prod.json

What’s happening:

  • Lines 5-6: Only triggers when files in the infra/ folder change β€” not every push
  • Lines 11-13: OIDC (OpenID Connect) permissions β€” federated credentials, no stored secrets
  • Lines 17-22: Uses workload identity federation to authenticate to Azure β€” the recommended approach
  • Lines 24-29: Deploys the Bicep template with environment-specific parameters

πŸ’‘ Exam tip: OIDC vs stored credentials in GitHub Actions

The exam favours workload identity federation (OIDC) over stored service principal secrets for GitHub Actions.

Why OIDC is better:

  • No secrets to rotate
  • Short-lived tokens (issued per run)
  • Scoped to specific branches and environments

If a question asks β€œwhat is the most secure way to authenticate GitHub Actions to Azure,” the answer is federated credentials with OIDC.
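The GitHub side of OIDC is just the permissions block shown earlier; the Azure side is a federated credential on an app registration. A rough sketch with the Azure CLI β€” the display name, repo, and branch are placeholders:

```shell
# Create an app registration and its service principal (placeholder name)
APP_ID=$(az ad app create --display-name "gh-ml-infra-deploy" --query appId -o tsv)
az ad sp create --id "$APP_ID"

# Trust GitHub-issued tokens for pushes to main in one specific repo
az ad app federated-credential create --id "$APP_ID" --parameters '{
  "name": "gh-main",
  "issuer": "https://token.actions.githubusercontent.com",
  "subject": "repo:my-org/my-repo:ref:refs/heads/main",
  "audiences": ["api://AzureADTokenExchange"]
}'
```

The service principal still needs an RBAC role assignment (for example, Contributor on rg-ml-prod) before the workflow can deploy anything.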

Network restrictions for workspaces

Enterprise workspaces often need network isolation:

Protection layer  | How                                                   | When
Private endpoint  | Workspace accessible only via VNet                    | Regulated industries, data sovereignty
Managed VNet      | Azure auto-creates an isolated VNet for the workspace | Simpler setup, still secure
Service endpoints | Restrict storage/KV access to specific VNets          | Defence in depth with private endpoints
No public access  | Disable public network access entirely                | Maximum isolation

// Restrict workspace networking β€” disable public access, allow only approved outbound
resource workspace 'Microsoft.MachineLearningServices/workspaces@2024-04-01' = {
  name: workspaceName
  location: location
  properties: {
    publicNetworkAccess: 'Disabled'     // No public access
    managedNetwork: {
      isolationMode: 'AllowOnlyApprovedOutbound'
    }
  }
}
πŸ’‘ Exam tip: Managed VNet vs bring-your-own VNet

Azure ML offers two networking models:

  • Managed VNet: Azure creates and manages the network. Simpler to configure. Use isolationMode to control outbound traffic.
  • Bring-your-own VNet: You create the VNet and subnets. More control. Required when integrating with existing enterprise networking.

The exam tests when to use each:

  • Managed VNet when: you need isolation but don’t have complex existing networking
  • BYO VNet when: you need to integrate with on-premises networks, existing firewall rules, or hub-spoke topologies
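Under AllowOnlyApprovedOutbound, any extra outbound destination has to be declared as a rule. A sketch of the managedNetwork block with one approved FQDN β€” the destination is illustrative, and note that FQDN rules are served through an Azure-provisioned firewall, which has cost implications:

```bicep
// Illustrative managedNetwork block with an approved outbound FQDN rule
properties: {
  publicNetworkAccess: 'Disabled'
  managedNetwork: {
    isolationMode: 'AllowOnlyApprovedOutbound'
    outboundRules: {
      'allow-pypi': {           // rule name is arbitrary
        type: 'FQDN'
        destination: 'pypi.org'
      }
    }
  }
}
```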

Key terms flashcards

Question

What is Bicep?


Answer

Azure's native Infrastructure as Code language. Compiles to ARM templates but is more readable. Used to deploy ML workspaces, compute, networking, and all supporting resources.


Question

Why use OIDC (federated credentials) in GitHub Actions?


Answer

No secrets to rotate, short-lived tokens per run, scoped to branches/environments. More secure than stored service principal secrets.


Question

Managed VNet vs bring-your-own VNet for ML workspaces?


Answer

Managed: Azure creates and manages the network (simpler). BYO: you control subnets and peering (required for existing enterprise networking or hub-spoke topologies).


Knowledge check

Dr. Fatima needs to deploy identical ML workspaces across three Azure regions for data residency compliance. Every change must be auditable. What approach should she use?

Kai is setting up a GitHub Actions workflow to deploy ML infrastructure. James (the CISO) says β€œno long-lived secrets in GitHub.” What authentication method should Kai use?


Next up: Git & CI/CD for ML Projects β€” managing ML code with source control and automated workflows.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.