Designing AI Infrastructure
Before you write a line of code, you need the right Azure infrastructure. Learn how to design the foundation for AI apps and agents: regions, networking, resource topology, and deployment options.
Planning your AI infrastructure
Building AI infrastructure is like setting up a restaurant kitchen before opening night.
You need to decide:
- Where will the kitchen be? (region)
- How many ovens do you need? (compute)
- Should the kitchen be open to walk-ins or reservation-only? (networking)
- Can you share equipment between branches? (resource topology)
Get these decisions wrong and you'll spend more, go slower, or fail compliance audits. Get them right and everything else flows smoothly.
Region selection
Not all Azure regions offer the same AI services. Your region choice affects:
| Factor | Impact | Example |
|---|---|---|
| Model availability | Not all models are in all regions | GPT-4o may be available in East US but not Australia East |
| Data residency | Regulated industries require data to stay in specific geographies | EU healthcare data must stay in EU regions |
| Latency | Closer regions = faster responses | An app serving users in Asia should use an Asia-Pacific region |
| Capacity | Popular regions may have longer queue times | East US 2 may have shorter wait times than East US |
Exam tip: Region + model availability
The exam may present a scenario where the correct answer depends on model availability in a specific region. Key rule: always check model availability before choosing a region. A region that meets data residency requirements but doesn't offer your required model is not a valid choice.
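
The key rule above can be sketched as a simple filter: a region qualifies only if it passes both the data residency check and the model availability check. This is a minimal illustration, not an Azure API. The `Region` type, the `CATALOG` contents, and the `valid_regions` helper are all hypothetical, and the availability data is made up; real availability changes over time and should be checked against current Azure documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    geography: str          # coarse residency boundary, e.g. "EU" or "US"
    models: frozenset[str]  # models assumed deployable in this region


# Hypothetical catalog for illustration only. These entries are NOT a
# statement of real Azure model availability.
CATALOG = [
    Region("eastus2", "US", frozenset({"gpt-4o", "gpt-4o-mini"})),
    Region("swedencentral", "EU", frozenset({"gpt-4o"})),
    Region("germanywestcentral", "EU", frozenset({"gpt-4o-mini"})),
]


def valid_regions(catalog, required_model, allowed_geographies):
    """Return names of regions that satisfy BOTH residency and model availability."""
    return [
        region.name
        for region in catalog
        if region.geography in allowed_geographies
        and required_model in region.models
    ]


if __name__ == "__main__":
    # With the made-up catalog above, only one EU region offers gpt-4o.
    print(valid_regions(CATALOG, "gpt-4o", {"EU"}))  # ['swedencentral']
```

In this sketch, `germanywestcentral` meets the residency requirement but is rejected because it lacks the required model, which is exactly the trap the exam scenario describes.
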
Deployment options
| Feature | Serverless (Pay-per-token) | Provisioned Throughput |
|---|---|---|
| How it works | Pay only for tokens consumed | Reserve fixed model capacity, sized in provisioned throughput units (PTUs) |
| Cost model | Variable, scales with usage | Fixed, predictable monthly cost |
| Best for | Development, variable workloads, prototyping | Production with predictable, high-volume traffic |
| Latency | May queue during peak times | Guaranteed capacity, consistent latency |
| Rate limits | Shared pool, may be throttled | Dedicated capacity, higher limits |
| Setup | Deploy model, start calling | Reserve capacity, then deploy |
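
The cost-model row above implies a break-even point: below a certain monthly token volume, pay-per-token is cheaper, and above it a fixed reservation wins. The sketch below works through that arithmetic. All prices are hypothetical placeholders rather than real Azure rates, and the function names are invented for this example.

```python
def monthly_serverless_cost(tokens_per_month: int, price_per_million: float) -> float:
    """Variable cost: scales linearly with the number of tokens consumed."""
    return tokens_per_month / 1_000_000 * price_per_million


def break_even_tokens(price_per_million: float, provisioned_monthly_cost: float) -> float:
    """Monthly token volume at which both options cost the same."""
    return provisioned_monthly_cost / price_per_million * 1_000_000


def cheaper_option(tokens_per_month: int, price_per_million: float,
                   provisioned_monthly_cost: float) -> str:
    """Compare on cost alone; latency and rate limits are not modeled here."""
    serverless = monthly_serverless_cost(tokens_per_month, price_per_million)
    return "serverless" if serverless <= provisioned_monthly_cost else "provisioned"


if __name__ == "__main__":
    # Hypothetical prices: 10.0 per million tokens, 5000.0 per month reserved.
    print(break_even_tokens(10.0, 5000.0))
    print(cheaper_option(100_000_000, 10.0, 5000.0))
    print(cheaper_option(900_000_000, 10.0, 5000.0))
```

Cost is only one input. A workload with strict latency requirements may justify provisioned throughput even below the break-even volume, because the table's latency and rate-limit rows favor dedicated capacity.
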
Other deployment patterns
| Pattern | When to Use |
|---|---|
| Managed compute | Default for most scenarios; Foundry manages the infrastructure |
| Connected compute (self-hosted) | When you need models on your own VMs or Kubernetes |
| Edge deployment | Small language models (SLMs) on IoT devices or local servers (for example, Phi-4-mini running on ONNX Runtime) |
| Global deployment | Route requests across regions for availability and latency |
Resource topology
A typical AI solution connects multiple Azure resources:
| Resource | Role | Connects To |
|---|---|---|
| Foundry Project | Central workspace for AI development | All other resources |
| Azure AI Search | Retrieval and grounding index | Foundry Project (data connection) |
| Azure Storage | Raw document storage, training data | Search (indexer source), Foundry |
| Azure Key Vault | Secrets and API keys | All services via managed identity |
| Azure Container Apps | Host custom agent code and orchestrators | Foundry Project (via SDK) |
| Azure Monitor / App Insights | Observability and tracing | All services |
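
One way to reason about the table above is to treat it as a directed graph of "connects to" relationships. The sketch below encodes the rows as a plain dictionary and adds two small checks. The resource keys and helper names are hypothetical, chosen for this illustration, and the graph says nothing about how the connections are actually provisioned.

```python
# Short, hypothetical identifiers for the six resources in the table.
ALL_RESOURCES = {
    "foundry_project", "ai_search", "storage",
    "key_vault", "container_apps", "monitor",
}


def all_others(name):
    """Every resource except the named one ("All other resources" in the table)."""
    return ALL_RESOURCES - {name}


# Directed edges taken from the "Connects To" column.
CONNECTS_TO = {
    "foundry_project": all_others("foundry_project"),
    "ai_search": {"foundry_project"},
    "storage": {"ai_search", "foundry_project"},
    "key_vault": all_others("key_vault"),
    "container_apps": {"foundry_project"},
    "monitor": all_others("monitor"),
}


def dangling_references(graph):
    """Return edge targets that are not declared as resources in the graph."""
    targets = {t for edges in graph.values() for t in edges}
    return sorted(targets - set(graph))


def inbound(graph, resource):
    """Return the resources that connect to the given resource."""
    return sorted(src for src, edges in graph.items() if resource in edges)


if __name__ == "__main__":
    print(dangling_references(CONNECTS_TO))
    print(inbound(CONNECTS_TO, "ai_search"))
```

A check like `dangling_references` is useful when the topology is later turned into deployment templates, since a connection to an undeclared resource is a common source of failed deployments.
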
Real-world example: Kai's infrastructure design
Kai is designing the infrastructure for the logistics platform's AI features:
- Region: East US 2 (GPT-4o available, closest to main user base)
- Foundry Project: One project per environment (dev, staging, prod)
- Model deployment: Serverless for dev (low cost), provisioned for prod (predictable latency)
- Search: Azure AI Search Standard tier (handles 10,000 shipping documents)
- Storage: Azure Blob Storage for raw shipment documents
- Networking: Private endpoints for prod, public for dev
- Identity: Managed identity everywhere, with no API keys in code
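
Kai's decisions can be captured as per-environment configuration plus a small policy check, so that a misconfigured environment is caught before deployment. This is a sketch under stated assumptions: `EnvironmentConfig`, `ENVIRONMENTS`, and `policy_violations` are hypothetical names, the region string `eastus2` is an assumed identifier for East US 2, and staging is omitted because the design above does not say how it is deployed or networked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentConfig:
    name: str
    region: str
    deployment_type: str   # "serverless" or "provisioned"
    private_endpoints: bool
    use_managed_identity: bool = True


# Only dev and prod are specified in the design; staging is left out on purpose.
ENVIRONMENTS = {
    "dev": EnvironmentConfig("dev", "eastus2", "serverless", private_endpoints=False),
    "prod": EnvironmentConfig("prod", "eastus2", "provisioned", private_endpoints=True),
}


def policy_violations(env: EnvironmentConfig) -> list[str]:
    """Return human-readable problems; an empty list means the config passes."""
    problems = []
    if not env.use_managed_identity:
        problems.append("managed identity is required in every environment")
    if env.name == "prod" and not env.private_endpoints:
        problems.append("prod must use private endpoints")
    if env.name == "prod" and env.deployment_type != "provisioned":
        problems.append("prod must use provisioned throughput")
    return problems


if __name__ == "__main__":
    for env in ENVIRONMENTS.values():
        print(env.name, policy_violations(env))
```

Keeping one config object per environment mirrors the "one project per environment" decision and makes the dev and prod differences explicit instead of implicit.
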
Knowledge check
Atlas Financial is deploying a compliance review agent that processes 100,000 loan applications per month with strict SLA requirements. The workload is predictable and steady. Which deployment option should they choose?
NeuralMed must keep all patient data within the European Union due to GDPR requirements. They need GPT-4o for their diagnostic assistant. What should they verify FIRST when choosing an Azure region?