Matching the Right AI Model to Your Business Need
Learn how to choose the right AI model — large or small, commercial or open-source — based on capability, cost, latency, and compliance requirements.
Not all AI models are equal
Choosing an AI model is like choosing a vehicle. A sports car (large model) is fast and powerful but expensive. A motorbike (small model) is cheaper and nimble but carries less. You pick the one that fits the job.
Large models like GPT-4o are powerful — they handle complex reasoning, long documents, and nuanced tasks. But they cost more and respond more slowly.
Small language models (SLMs) like Phi are lighter and cheaper. They handle simpler tasks brilliantly — classification, summarisation, FAQ answers — at a fraction of the cost. Some even run on phones and edge devices.
Smart organisations use BOTH: big models for hard tasks, small models for easy ones.
The model landscape in Foundry
Microsoft Foundry provides access to 1,800+ models across multiple providers. For the exam, you need to understand the key categories:
| Model Category | Examples | Strengths | Best For |
|---|---|---|---|
| OpenAI large models | GPT-4o, GPT-4.1 | Complex reasoning, long context, multimodal | Document analysis, strategic planning, complex Q&A |
| OpenAI reasoning models | o3, o4-mini | Deep multi-step reasoning, math, logic | Financial modelling, scientific analysis, complex problem-solving |
| OpenAI efficient models | GPT-4o mini | Good quality at lower cost | High-volume tasks: classification, summarisation, FAQ |
| Microsoft SLMs | Phi-4, Phi-4-mini | Small, fast, deployable on edge devices | On-device AI, mobile apps, low-latency scenarios |
| Meta open-source | Llama 3.1, Llama 4 | Open weights, customisable, no vendor lock-in | Organisations wanting full control and transparency |
| Mistral | Mistral Large, Mistral Small | European-headquartered, strong multilingual | EU data sovereignty requirements, multilingual tasks |
| Embedding models | text-embedding-ada-002 | Convert text to vectors for search | RAG retrieval, semantic search, similarity matching |
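What embedding models do for search can be sketched with toy numbers. The vectors below are hand-made four-dimensional stand-ins (a real model such as text-embedding-ada-002 returns 1,536 dimensions), and the document names and query vector are invented for illustration:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = same direction, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional vectors standing in for real embeddings.
docs = {
    "refund policy":  [0.9, 0.1, 0.0, 0.1],
    "shipping times": [0.1, 0.8, 0.2, 0.0],
    "password reset": [0.0, 0.1, 0.9, 0.2],
}
query = [0.85, 0.15, 0.05, 0.1]  # e.g. "how do I get my money back?"

# Semantic search = find the document whose vector points the same way.
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)  # refund policy
```

This is the retrieval half of RAG: the embedding model converts text to vectors once, and search is then just similarity maths — no generation involved.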
Exam tip: Know the model categories, not specific version numbers
The exam won’t ask you to name the latest GPT version. It tests whether you understand:
- Large vs small models — when to use each
- Commercial vs open-source — trade-offs
- Reasoning models — when multi-step logic is needed
- SLMs for edge — when the model needs to run on a device, not in the cloud
- Embedding models — used for search, not generation
Focus on the categories and decision criteria, not specific model names.
Model selection criteria
The five decision factors
| Factor | Question to Ask | Impact |
|---|---|---|
| Capability | Can this model handle the task? | Eliminates models that can’t do the job |
| Cost | How much per token at our scale? | A 10x cost difference can mean millions at enterprise volume |
| Latency | How fast does it need to respond? | Real-time apps need fast models; batch processing can wait |
| Data sensitivity | Where does the data go? | Some models require cloud API calls; SLMs can run locally |
| Compliance | What certifications are required? | Regulated industries need specific deployment options |
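The cost factor is easy to make concrete. The sketch below uses purely hypothetical per-1K-token prices (check your provider's current price list) to show how a 10x per-token difference scales at volume:

```python
def monthly_cost(requests_per_day: int,
                 tokens_per_request: int,
                 price_per_1k_tokens: float) -> float:
    """Rough monthly token spend, assuming a 30-day month."""
    tokens_per_month = requests_per_day * tokens_per_request * 30
    return tokens_per_month / 1000 * price_per_1k_tokens

# Illustrative prices only -- not real published rates.
LARGE_PRICE = 0.01    # $ per 1K tokens (hypothetical large model)
SMALL_PRICE = 0.001   # $ per 1K tokens (hypothetical small model, 10x cheaper)

# 50,000 requests/day at ~1,000 tokens each:
large = monthly_cost(50_000, 1_000, LARGE_PRICE)
small = monthly_cost(50_000, 1_000, SMALL_PRICE)
print(f"Large model: ${large:,.0f}/month")  # Large model: $15,000/month
print(f"Small model: ${small:,.0f}/month")  # Small model: $1,500/month
```

At this volume the 10x per-token gap is $13,500 a month for a single workload — which is why "use the cheapest model that meets quality requirements" is the guiding rule.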
Large models vs small models
| Aspect | Large Models (GPT-4o, Llama 3.1 405B) | Small Models (GPT-4o mini, Phi-4) |
|---|---|---|
| Reasoning ability | Excellent — handles complex, multi-step tasks | Good for simple to moderate tasks |
| Cost per token | Higher — premium pricing | Lower — often 10x cheaper |
| Response speed | Slower — more computation needed | Faster — less computation |
| Context window | Large (128K+ tokens) | Moderate (varies by model) |
| Edge deployment | No — requires cloud infrastructure | Yes — some run on phones and devices |
| Best for | Complex analysis, long documents, nuanced reasoning | Classification, FAQ, summarisation, high-volume tasks |
The right model for common tasks
| Business Task | Recommended Model Size | Why |
|---|---|---|
| Answer FAQ questions from a knowledge base | Small (GPT-4o mini, Phi) | Simple retrieval + response, high volume, latency matters |
| Analyse a 50-page contract for legal risks | Large (GPT-4o, o3) | Needs a long context window and nuanced reasoning |
| Classify customer support tickets by urgency | Small (GPT-4o mini, Phi) | Pattern matching task, high volume, speed important |
| Generate a strategic market analysis report | Large (GPT-4o) | Complex synthesis, reasoning, and writing quality |
| Real-time translation in a customer chat | Small to medium (Mistral Small) | Speed critical, straightforward language task |
| Financial forecasting with complex variables | Reasoning (o3, o4-mini) | Multi-step mathematical reasoning |
Open-source vs commercial models
| Factor | Commercial (OpenAI GPT) | Open-Source (Llama, Phi, Mistral) |
|---|---|---|
| Access | API-based, managed by provider | Downloadable, self-hostable |
| Customisation | Limited to fine-tuning via API | Full weight access, deeper customisation |
| Vendor lock-in | Tied to provider’s API and pricing | Run anywhere, switch freely |
| Support | Enterprise support from provider | Community support, or commercial support from hosting providers |
| Compliance | Provider manages compliance | YOU manage compliance for self-hosted models |
| Cost | Pay-per-token (predictable per-call) | Infrastructure cost (predictable per-month) |
When open-source models make sense
Open-source models like Llama and Phi are not just “free alternatives.” They offer genuine strategic advantages:
- Data sovereignty: Self-host the model so data never leaves your environment
- Customisation: Modify the model weights for specialised tasks
- No vendor lock-in: Switch hosting providers without changing the model
- Transparency: Inspect model architecture and behaviour
The trade-off: you take on responsibility for hosting, scaling, monitoring, and compliance. Foundry simplifies this by hosting open-source models as managed endpoints.
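One way to weigh pay-per-token against self-hosted infrastructure is a simple break-even calculation. The figures below are hypothetical, not real pricing:

```python
def break_even_tokens(monthly_infra_cost: float,
                      price_per_1k_tokens: float) -> float:
    """Monthly token volume above which a fixed-cost self-hosted
    deployment becomes cheaper than pay-per-token API calls."""
    return monthly_infra_cost / price_per_1k_tokens * 1000

# Hypothetical: $4,000/month for a GPU endpoint vs $0.002 per 1K API tokens.
threshold = break_even_tokens(4_000, 0.002)
print(f"Break-even at {threshold:,.0f} tokens/month")  # 2,000,000,000
```

Below the threshold, pay-per-token is cheaper; above it, the fixed infrastructure cost wins — which is why high-volume, continuous workloads are the classic self-hosting case.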
SLMs for edge and on-device AI
Small language models (SLMs) like Phi-4 can run directly on devices — laptops, phones, edge hardware — without cloud connectivity.
| Edge Use Case | Why SLMs Work | Example |
|---|---|---|
| Offline scenarios | No internet required — model runs locally | Field workers in remote areas with no connectivity |
| Low latency | No network round-trip — instant response | Real-time translation during face-to-face conversations |
| Data privacy | Data never leaves the device | Healthcare notes processed locally, nothing sent to the cloud |
| Cost efficiency | No per-token API charges | High-volume local processing at fixed infrastructure cost |
| IoT and manufacturing | Lightweight models run on industrial hardware | Quality inspection on production line edge devices |
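The edge-versus-cloud trade-offs in the table above can be condensed into a toy decision helper. The rules and model names are illustrative, not an official selection algorithm:

```python
def choose_deployment(offline_required: bool,
                      data_must_stay_on_device: bool,
                      needs_complex_reasoning: bool) -> str:
    """Illustrative decision helper mirroring the edge use-case table."""
    if offline_required or data_must_stay_on_device:
        return "edge SLM"           # e.g. Phi-4-mini running on the device
    if needs_complex_reasoning:
        return "cloud large model"  # e.g. GPT-4o behind an API
    return "cloud small model"      # e.g. GPT-4o mini for cheap, fast calls

# Field worker with no connectivity: the hard constraint decides first.
print(choose_deployment(offline_required=True,
                        data_must_stay_on_device=False,
                        needs_complex_reasoning=True))  # edge SLM
```

Note the ordering: hard constraints (connectivity, privacy) are checked before capability, because no amount of reasoning quality helps if the data cannot leave the device.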
📊 Dr. Patel evaluates models for a financial services client
Dr. Anisha Patel, Board Advisor, is helping a financial services firm choose AI models for three use cases.
Use case 1: Customer service chatbot
- Volume: 50,000 queries per day
- Requirement: Fast response, simple question answering
- Dr. Patel’s recommendation: GPT-4o mini
- Reasoning: High volume makes cost critical. Questions are straightforward. Fast response time improves customer experience. Using GPT-4o would cost 10x more with minimal quality improvement.
Use case 2: Regulatory compliance analysis
- Volume: 200 documents per week
- Requirement: Analyse complex regulations, identify compliance gaps
- Dr. Patel’s recommendation: GPT-4o or o3
- Reasoning: Complex, nuanced reasoning required. Long documents need a large context window. Accuracy is more important than speed or cost. Regulatory mistakes have serious consequences.
Use case 3: Fraud detection pattern recognition
- Volume: Continuous real-time analysis
- Requirement: Low latency, data sovereignty, no cloud dependency
- Dr. Patel’s recommendation: Phi-4 on edge infrastructure
- Reasoning: Real-time processing needs minimal latency. Transaction data must stay on-premises. SLM handles pattern matching efficiently. No per-token costs for continuous analysis.
The multi-model strategy
Notice that Dr. Patel recommended three different models for three different use cases within the same company. This is the multi-model strategy the exam expects you to understand:
- Use the cheapest model that meets quality requirements
- Match model size to task complexity
- Consider deployment constraints (cloud, edge, on-premise)
- Use Foundry to deploy and manage multiple models from a single platform
The exam rewards answers that choose the right-sized model, not the biggest or most impressive one.
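In practice, a multi-model strategy often boils down to a small routing layer in front of your deployments. The sketch below is a minimal illustration built on the three use cases from Dr. Patel's scenario; the task fields and model names are invented for the example, not a real deployment catalogue:

```python
def route(task: dict) -> str:
    """Pick the cheapest model that satisfies the task's constraints."""
    if task.get("requires_on_prem"):
        return "phi-4-edge"   # data sovereignty trumps everything else
    if task.get("complexity") == "high":
        return "gpt-4o"       # or a reasoning model such as o3 for maths-heavy work
    return "gpt-4o-mini"      # cheapest model that meets quality requirements

tasks = [
    {"name": "FAQ chatbot", "complexity": "low"},
    {"name": "Regulatory analysis", "complexity": "high"},
    {"name": "Fraud detection", "requires_on_prem": True},
]
for t in tasks:
    print(t["name"], "->", route(t))
# FAQ chatbot -> gpt-4o-mini
# Regulatory analysis -> gpt-4o
# Fraud detection -> phi-4-edge
```

One platform, three models, each sized to its task — the pattern the exam rewards.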
Tomás's customer support team at PacificSteel processes 100,000 emails daily and needs AI to classify each email by topic and urgency. Which model approach is most cost-effective?
Dr. Patel recommends Phi-4 on edge hardware for a fraud detection system. What is the PRIMARY reason for choosing an edge-deployed SLM?
🎬 Video coming soon
Congratulations! You’ve completed Domain 2: Identify Benefits, Capabilities, and Opportunities for Microsoft AI Apps and Services. You now understand how to map business needs to AI solutions, compare Copilot versions, and choose the right AI models and platforms.
Next up: Responsible AI and Governance — start Domain 3 by learning the principles that keep your AI deployments safe and ethical.