Choosing the Right AI Model
Not all AI models are created equal. Learn how to pick the right model for each task — LLMs, SLMs, multimodal models, and Foundry Tools — so you don't overspend or underperform.
Why model selection matters
Picking an AI model is like hiring for a job — you wouldn’t hire a brain surgeon to stack shelves, and you wouldn’t hire a shelf-stacker to do surgery.
Large language models (LLMs) like GPT-4o are powerful but expensive. Small language models (SLMs) like Phi-4 are cheaper and faster but less capable. Multimodal models handle images, audio, and video — not just text. And Foundry Tools are pre-built AI services you don’t need to train at all.
The exam tests whether you can match the right model to the right task — balancing cost, speed, accuracy, and capability.
The four model categories
| Feature | LLMs | SLMs | Multimodal | Foundry Tools |
|---|---|---|---|---|
| What they do | Complex reasoning, generation, analysis | Simpler tasks, fast inference | Process text + images + audio + video | Pre-built AI capabilities (search, OCR, speech) |
| Examples | GPT-4o, GPT-4.1, Llama 3.3 | Phi-4, Phi-4-mini, Mistral Small | GPT-4o (vision), Llama 4 | Azure AI Search, Content Understanding, Translator |
| Cost | Higher (pricier per token, more compute) | Lower (smaller, faster) | Medium-high (depends on modalities) | Per-transaction or tier-based pricing (no model hosting) |
| Best for | Agents, RAG, complex workflows | Edge devices, high-volume simple tasks | Apps that need to see, hear, and read | Structured tasks: search, OCR, translation |
| Deployment | Cloud (Foundry hosted or serverless) | Cloud or edge | Cloud | Managed service (no model deployment needed) |
When to use what — decision framework
The exam loves “which model should you use?” questions. Here’s a quick scenario guide:
| Scenario | Best Choice | Why |
|---|---|---|
| Complex multi-step reasoning with tools | LLM (GPT-4o, GPT-4.1) | Needs strong reasoning and function-calling |
| Summarising thousands of support tickets | SLM (Phi-4) | Simple task at high volume — cost matters |
| Analysing medical images alongside patient notes | Multimodal (GPT-4o vision) | Needs to process both text and images |
| Extracting invoice fields from scanned PDFs | Foundry Tool (Content Understanding) | Purpose-built for document extraction |
| Real-time speech transcription in a call centre | Foundry Tool (Azure Speech) | Dedicated speech service, optimised for streaming |
| Building a chatbot that searches company docs | LLM + Foundry Tool (GPT-4o + Azure AI Search) | Combine reasoning with retrieval |
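The scenario table can be read as a rough decision order: check for a purpose-built Foundry Tool first, then for non-text input, then for reasoning depth, and add retrieval when answers must be grounded in your own documents. The sketch below encodes that order in Python. The `Requirements` flags and the `choose` function are hypothetical names for illustration, not part of any Azure SDK, and a real decision should still be validated by testing candidates on your own data.

```python
from dataclasses import dataclass
from enum import Enum


class Choice(Enum):
    """The four categories from the comparison table."""
    LLM = "LLM"
    SLM = "SLM"
    MULTIMODAL = "Multimodal model"
    FOUNDRY_TOOL = "Foundry Tool"


@dataclass
class Requirements:
    # Hypothetical flags describing a scenario (illustrative, not an Azure API).
    prebuilt_service_exists: bool = False  # e.g. OCR, speech, translation
    needs_retrieval: bool = False          # answers grounded in company documents
    non_text_input: bool = False           # images, audio, or video alongside text
    complex_reasoning: bool = False        # multi-step reasoning, tool calling


def choose(req: Requirements) -> list:
    """Return the components a scenario needs, checking the cheapest options first."""
    if req.prebuilt_service_exists and not req.complex_reasoning:
        return [Choice.FOUNDRY_TOOL]  # purpose-built service, no model to host
    components: list[Choice] = []
    if req.non_text_input:
        components.append(Choice.MULTIMODAL)
    elif req.complex_reasoning:
        components.append(Choice.LLM)
    else:
        components.append(Choice.SLM)  # simple text task: smallest adequate model
    if req.needs_retrieval:
        components.append(Choice.FOUNDRY_TOOL)  # e.g. a search service for grounding
    return components
```

For the chatbot row, `choose(Requirements(complex_reasoning=True, needs_retrieval=True))` returns `[Choice.LLM, Choice.FOUNDRY_TOOL]`, matching the GPT-4o + Azure AI Search pairing.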
Exam tip: The 'cheapest correct option' trap
The exam often presents scenarios where multiple models could work. The correct answer is usually the one that meets the requirements at the lowest cost and complexity.
For example: “A company needs to classify customer emails as positive, negative, or neutral.” You might think GPT-4o — but Phi-4 or even a Foundry sentiment analysis tool would be cheaper and sufficient. The exam rewards right-sizing, not over-engineering.
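Right-sizing comes down to two steps: discard every option that cannot meet the requirement, then take the cheapest of what remains. Here is a minimal Python sketch of that rule, assuming a made-up 1 to 3 capability score and placeholder prices per 1,000 requests. These numbers are illustrative only and are not real Azure pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    capability: int              # hypothetical scale: 1 = simple classification, 3 = complex reasoning
    cost_per_1k_requests: float  # placeholder price, NOT real Azure pricing


def cheapest_correct(options: list, required_capability: int) -> Option:
    """Drop options that can't do the job, then pick the lowest-cost survivor."""
    adequate = [o for o in options if o.capability >= required_capability]
    if not adequate:
        raise ValueError("no option meets the requirement")
    return min(adequate, key=lambda o: o.cost_per_1k_requests)


def daily_cost(option: Option, requests_per_day: int) -> float:
    """Estimated spend per day at a given request volume."""
    return option.cost_per_1k_requests * requests_per_day / 1000


# Illustrative candidates; look up current pricing before making a real decision.
candidates = [
    Option("GPT-4o", capability=3, cost_per_1k_requests=5.00),
    Option("Phi-4", capability=2, cost_per_1k_requests=0.50),
    Option("Prebuilt sentiment analysis", capability=1, cost_per_1k_requests=0.25),
]
```

With these placeholder numbers, the email-classification task (capability 1) selects the prebuilt sentiment option, and at 50,000 requests a day `daily_cost` gives 12.5 for it versus 250.0 for GPT-4o.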
Meet the characters
Throughout this course, you’ll follow four teams building AI solutions:
| Character | Who They Are | AI Use Cases |
|---|---|---|
| 🏥 NeuralMed | Health-tech startup, 25 engineers | AI diagnostic assistants, medical record extraction, patient chatbots |
| 🏦 Atlas Financial | Enterprise bank, 3000 employees | Compliance agents, fraud detection, customer service bots |
| 🚀 MediaForge | Content operations platform, 40 developers | Image/video generation, marketing content pipelines, prompt optimisation |
| 👨‍💻 Kai | AI engineer at a logistics company | Infrastructure decisions, CI/CD for AI, deployment troubleshooting |
Real-world example: Kai's model selection
Kai needs to build three features for the logistics platform:
- Package label OCR — reads shipping labels from photos → Content Understanding (Foundry Tool — purpose-built, no model hosting)
- Route optimisation chatbot — answers complex questions about delivery routes → GPT-4o (LLM — needs reasoning over structured data)
- Automated status updates — generates short “your package is on its way” messages → Phi-4-mini (SLM — simple generation, high volume, low cost)
Three features, three different model choices. That’s model selection in practice.
Foundry Tools vs models
A common exam confusion: Foundry Tools are not models you deploy — they’re managed services you call.
| Foundry Tool | What It Does | When to Use Instead of a Model |
|---|---|---|
| Azure AI Search | Semantic, vector, and hybrid search | When you need retrieval/grounding for RAG |
| Content Understanding | OCR, layout analysis, field extraction from documents | When extracting structured data from PDFs, forms, images |
| Azure Speech | Speech-to-text, text-to-speech | When you need dedicated speech processing |
| Azure Translator | Text and document translation | When you need reliable multilingual translation |
Exam tip: Foundry Tools vs prompting an LLM
The exam tests whether you know when to use a dedicated Foundry Tool versus prompting an LLM to do the same task. Key rule: if a Foundry Tool exists for the task, it’s usually the correct answer — it’s cheaper, more reliable, and purpose-built.
Example: “Translate a 500-page legal document from English to Japanese.” Answer: Azure Translator (Foundry Tool), using its document translation feature, NOT “prompt GPT-4o to translate.”
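Calling a Foundry Tool means sending a request to a service endpoint; there is no model to deploy. The sketch below builds a request for Azure Translator's v3 text-translation REST API using only the Python standard library. The key and region are placeholders for values from your own Translator resource, and `build_translate_request` and `translate` are illustrative helper names. The text endpoint suits short strings; for a whole file such as the 500-page legal document, Translator's asynchronous document translation feature is the better fit.

```python
import json
import urllib.parse
import urllib.request

# Global endpoint for the Translator v3 text translation API.
TRANSLATOR_ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def build_translate_request(texts: list, to_lang: str, key: str,
                            region: str, from_lang: str = "en") -> urllib.request.Request:
    """Build (but don't send) a Translator text-translation request."""
    query = urllib.parse.urlencode({"api-version": "3.0", "from": from_lang, "to": to_lang})
    body = json.dumps([{"text": t} for t in texts]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,        # placeholder: key from your Translator resource
        "Ocp-Apim-Subscription-Region": region,  # placeholder: your resource's region
        "Content-Type": "application/json",
    }
    return urllib.request.Request(f"{TRANSLATOR_ENDPOINT}/translate?{query}",
                                  data=body, headers=headers, method="POST")


def translate(texts: list, to_lang: str, key: str, region: str) -> list:
    """Send the request and return the translated strings (needs a real key)."""
    req = build_translate_request(texts, to_lang, key, region)
    with urllib.request.urlopen(req) as resp:
        results = json.load(resp)
    # One result per input text; "translations" holds one entry per target language.
    return [r["translations"][0]["text"] for r in results]
```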
The model catalog and Model Router
Microsoft Foundry’s model catalog gives you access to 11,000+ models from OpenAI, Meta, Mistral, Anthropic, and more. You don’t have to use only Microsoft models.
Model Router is a deployable model in the Foundry catalog — you deploy it like any other model and call it via the Chat Completions API. It automatically selects the best underlying model for each request based on cost-performance trade-offs. Think of it as “auto-scaling for model intelligence” — simple requests get routed to cheaper models, complex ones to more capable models.
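Model Router's actual selection logic runs inside the service; you do not write it yourself. Purely to illustrate the idea of sending easy prompts to a cheap model and hard ones to a capable model, here is a toy rule-based router in Python. The deployment names, keyword pattern, and 30-word threshold are all hypothetical and do not reflect how Model Router decides.

```python
import re

# Hypothetical deployment names, for illustration only.
CHEAP_MODEL = "phi-4-mini"
CAPABLE_MODEL = "gpt-4o"

# Crude, illustrative whole-word signals that a prompt needs multi-step reasoning.
COMPLEX_HINTS = re.compile(r"\b(step by step|compare|analy[sz]e|plan|why)\b")


def route(prompt: str, max_simple_words: int = 30) -> str:
    """Return the deployment this toy router would send the prompt to."""
    text = prompt.lower()
    too_long = len(text.split()) > max_simple_words
    if too_long or COMPLEX_HINTS.search(text):
        return CAPABLE_MODEL  # harder-looking request: pay for the stronger model
    return CHEAP_MODEL        # short, simple request: the small model is enough
```

A rule like this only sees keywords and length, so it is brittle; that brittleness is one argument for delegating the decision to a managed router.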
Knowledge check
NeuralMed needs to extract patient names, dates of birth, and medication lists from scanned handwritten prescriptions. Which approach should they use?
Atlas Financial processes 50,000 customer emails daily and needs to classify each as 'complaint', 'enquiry', or 'compliment'. Which model type is most cost-effective?
MediaForge is building a content review tool that analyses both marketing images and their accompanying ad copy together. Which model type should they choose?