Choosing the Right AI Model
Not all AI models are the same. Some are great at text, others at images, others at code. This module teaches you how to pick the right model for the job, a key exam skill.
Why model choice matters
Picking an AI model is like choosing the right tool from a toolbox.
You wouldn't use a hammer to cut wood, and you wouldn't use a saw to drive a nail. AI models work the same way: each one is designed for specific tasks. A text model excels at writing, a vision model excels at understanding images, and a speech model excels at converting voice to text.
The exam tests your ability to match the right model to the right scenario.
Model categories
| Category | What They Do | Examples | Best For |
|---|---|---|---|
| Large Language Models (LLMs) | Generate and understand text | GPT-4o, GPT-4 | Chat, summarisation, translation, code |
| Small Language Models (SLMs) | Text tasks with lower cost/latency | Phi-4, Phi-4-mini, Phi-3-small | Simple tasks, edge devices, cost-sensitive apps |
| Image generation models | Create images from text descriptions | GPT-image-1.5 | Marketing visuals, concept art, design |
| Vision models | Analyse and understand images | GPT-4o (vision), Florence | Image classification, object detection, OCR |
| Speech models | Convert speech ↔ text | Azure Speech Service | Transcription, voice assistants, TTS |
| Embedding models | Convert text to numerical vectors | text-embedding-ada-002 | Search, similarity, RAG retrieval |
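The embedding row is the one that powers search: text becomes a vector of numbers, and texts with similar meanings end up as nearby vectors. A minimal sketch of that idea, using made-up 3-dimensional vectors (a real model such as text-embedding-ada-002 returns vectors with around 1,500 dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: ~1.0 = same direction (similar meaning), ~0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for three stored phrases (values invented for illustration).
docs = {
    "reset my password": [0.9, 0.1, 0.0],
    "billing question":  [0.1, 0.9, 0.1],
    "login help":        [0.7, 0.3, 0.2],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "I can't sign in"

# Retrieval = pick the stored document whose vector points most like the query's.
best = max(docs, key=lambda d: cosine_similarity(query, docs[d]))
print(best)  # → reset my password
```

This nearest-vector lookup is exactly the retrieval step in RAG: embed the query, find the closest document vectors, then hand those documents to an LLM.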
How to choose: the decision framework
When the exam gives you a scenario, use this framework:
Step 1: What type of input/output do you need?
- Text in, text out → LLM
- Text in, image out → Image generation
- Image in, text out → Vision model
- Audio in, text out → Speech model
- Multiple types → Multimodal model (GPT-4o)
Step 2: What's the complexity?
- Simple task (classify, extract) → Smaller, cheaper model
- Complex task (reason, create) → Larger, more capable model
Step 3: What are your constraints?
- Low budget → SLM (Phi-4-mini)
- Low latency → SLM or smaller LLM
- Highest quality → GPT-4o or GPT-4
- Privacy-sensitive → On-device or edge model
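The three steps above can be sketched as a small routing function. The rules and model names mirror the tables in this module; this is an illustrative decision tree, not an official Microsoft API.

```python
def pick_model(input_type, output_type, complex_task=False,
               low_cost=False, low_latency=False):
    """Route a scenario to a model category using the 3-step framework."""
    # Step 1: match input/output modality to a model category.
    if input_type == "text" and output_type == "image":
        return "image generation (GPT-image-1.5)"
    if input_type == "image" and output_type == "text":
        return "vision model (GPT-4o)"
    if input_type == "audio" and output_type == "text":
        return "speech model (Azure Speech Service)"
    if input_type == "text" and output_type == "text":
        # Step 2: complexity decides how capable a model you need.
        if complex_task:
            return "large LLM (GPT-4o)"
        # Step 3: constraints (cost, latency) push toward smaller models.
        if low_cost or low_latency:
            return "SLM (Phi-4-mini)"
        return "smaller LLM (GPT-4o-mini)"
    # Mixed input types fall through to a multimodal model.
    return "multimodal model (GPT-4o)"

print(pick_model("text", "text", low_cost=True))  # → SLM (Phi-4-mini)
print(pick_model("audio", "text"))                # → speech model (Azure Speech Service)
```

Walking a scenario through this function in order (modality, then complexity, then constraints) is the same habit the exam rewards.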
Common scenarios
| Scenario | Key Requirement | Model Choice |
|---|---|---|
| Chat assistant for customers | Need natural conversation, reasoning | GPT-4o or GPT-4 |
| Summarise meeting notes | Text in, text out, moderate complexity | GPT-4o-mini or Phi-4 |
| Generate product images | Text description → new image | GPT-image-1.5 |
| Classify support tickets | Simple text classification | Phi-4-mini (cost-efficient) |
| Transcribe phone calls | Audio → text | Azure Speech Service |
| Analyse medical X-rays | Image understanding + reasoning | GPT-4o (multimodal) |
| Search company documents | Need to find relevant passages | Embedding model + RAG |
Large vs small models
| Feature | Large Models (GPT-4o) | Small Models (Phi-4-mini) |
|---|---|---|
| Parameters | Hundreds of billions | Billions (10x-100x smaller) |
| Capability | Broad, complex reasoning | Focused, specific tasks |
| Cost | Higher per-token pricing | Significantly cheaper |
| Latency | Slower (more computation) | Faster responses |
| Best for | Complex tasks, multimodal, creative | Classification, extraction, simple chat |
| Can run on edge? | No (cloud only) | Yes (can run on devices) |
Microsoft's Phi family: small but mighty
Microsoft developed the Phi family of small language models specifically for scenarios where cost, latency, or deployment location matters more than maximum capability.
- Phi-4: latest, most capable small model
- Phi-4-mini: even smaller, great for classification and extraction
- Phi-3: previous generation, still widely deployed
The key insight: for many business tasks (email classification, FAQ answers, data extraction), a small model performs nearly as well as GPT-4o at a fraction of the cost.
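A back-of-envelope calculation shows why this matters at scale. The per-token prices below are hypothetical placeholders (check the current Azure price sheet for real numbers); the point is the ratio, not the dollar figures.

```python
# Assumed workload: high-volume email classification.
emails_per_day = 50_000
tokens_per_email = 500  # assumed average prompt + completion size

# HYPOTHETICAL prices in $ per 1,000 tokens — placeholders, not real pricing.
price_per_1k = {"large LLM (GPT-4o)": 0.01, "SLM (Phi-4-mini)": 0.0005}

monthly_cost = {}
for model, price in price_per_1k.items():
    tokens_per_month = emails_per_day * 30 * tokens_per_email
    monthly_cost[model] = tokens_per_month / 1000 * price
    print(f"{model}: ${monthly_cost[model]:,.0f}/month")
# → large LLM (GPT-4o): $7,500/month
# → SLM (Phi-4-mini): $375/month
```

Under these assumed prices the SLM is 20x cheaper, which is why simple high-volume tasks route to Phi-class models.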
Exam relevance: when a scenario mentions "cost-effective", "edge deployment", or "low latency", think Phi or other SLMs.
The Foundry model catalog
Microsoft Foundry includes a model catalog, a library of models from multiple providers that you can deploy directly:
| Provider | Models | Strengths |
|---|---|---|
| OpenAI | GPT-4o, GPT-4, GPT-image-1.5 | Best general-purpose, multimodal |
| Microsoft | Phi-4, Phi-4-mini | Cost-efficient, edge-friendly |
| Meta | Llama 3 | Open-source, customisable |
| Mistral | Mistral Large, Mistral Small | European alternative, efficient |
| Cohere | Command R | Strong at RAG and retrieval |
Key exam concept: You donβt need to memorise every model. You need to understand the categories (LLM, SLM, vision, speech, embedding) and know how to choose based on task requirements.
🎬 Video walkthrough
Video coming soon: Choosing the Right AI Model (AI-901 Module 4, ~12 min)
Knowledge Check
GreenLeaf wants to automatically classify incoming support emails into categories: billing, technical, general inquiry. They process 50,000 emails per day and need to keep costs low. Which model approach is most appropriate?
MediSpark needs an AI model that can accept both a medical image (X-ray) and a text question ('What abnormalities are visible?') and return a text response. Which type of model do they need?
Priya needs to build a search feature that finds the most relevant company documents when a user types a question. Which combination of models should she use?
Next up: Deploying AI Models, covering configuration parameters like temperature, top-p, and max tokens that control how your model behaves.