Matching the Right AI Model to Your Business Need
Learn how to choose the right AI model — large or small, commercial or open-source — based on capability, cost, latency, and compliance requirements.
Not all AI models are equal
Choosing an AI model is like choosing a vehicle. A sports car (large model) is fast and powerful but expensive. A motorbike (small model) is cheaper and nimble but carries less. You pick the one that fits the job.
Large models like GPT-4o are powerful — they handle complex reasoning, long documents, and nuanced tasks. But they cost more and respond more slowly.
Small language models (SLMs) like Phi are lighter and cheaper. They handle simpler tasks brilliantly — classification, summarisation, FAQ answers — at a fraction of the cost. Some even run on phones and edge devices.
Smart organisations use BOTH: big models for hard tasks, small models for easy ones.
The model landscape in Foundry
Microsoft Foundry provides access to 1,800+ models across multiple providers. For the exam, you need to understand the key categories:
| Model Category | Examples | Strengths | Best For |
|---|---|---|---|
| OpenAI large models | GPT-4o, GPT-4.1 | Complex reasoning, long context, multimodal | Document analysis, strategic planning, complex Q&A |
| OpenAI reasoning models | o3, o4-mini | Deep multi-step reasoning, math, logic | Financial modelling, scientific analysis, complex problem-solving |
| OpenAI efficient models | GPT-4o mini | Good quality at lower cost | High-volume tasks: classification, summarisation, FAQ |
| Microsoft SLMs | Phi-4, Phi-4-mini | Small, fast, deployable on edge devices | On-device AI, mobile apps, low-latency scenarios |
| Meta open-source | Llama 3.1, Llama 4 | Open weights, customisable, no vendor lock-in | Organisations wanting full control and transparency |
| Mistral | Mistral Large, Mistral Small | European-headquartered, strong multilingual | EU data sovereignty requirements, multilingual tasks |
| Embedding models | text-embedding-ada-002 | Convert text to vectors for search | RAG retrieval, semantic search, similarity matching |
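What embedding models do for search can be sketched with toy numbers. The vectors below are hand-made four-dimensional stand-ins (a real model such as text-embedding-ada-002 returns 1,536 dimensions), and the document names and query vector are invented for illustration:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = same direction, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional vectors standing in for real embeddings.
docs = {
    "refund policy":  [0.9, 0.1, 0.0, 0.1],
    "shipping times": [0.1, 0.8, 0.2, 0.0],
    "password reset": [0.0, 0.1, 0.9, 0.2],
}
query = [0.85, 0.15, 0.05, 0.1]  # e.g. "how do I get my money back?"

# Semantic search = find the document whose vector points the same way.
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)  # refund policy
```

This is the retrieval half of RAG: the embedding model converts text to vectors once, and search is then just similarity maths — no generation involved.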
Exam tip: Know the model categories, not specific version numbers
The exam won’t ask you to name the latest GPT version. It tests whether you understand:
- Large vs small models — when to use each
- Commercial vs open-source — trade-offs
- Reasoning models — when multi-step logic is needed
- SLMs for edge — when the model needs to run on a device, not in the cloud
- Embedding models — used for search, not generation
Focus on the categories and decision criteria, not specific model names.
Model selection criteria
The five decision factors
| Factor | Question to Ask | Impact |
|---|---|---|
| Capability | Can this model handle the task? | Eliminates models that can’t do the job |
| Cost | How much per token at our scale? | A 10x cost difference can mean millions at enterprise volume |
| Latency | How fast does it need to respond? | Real-time apps need fast models; batch processing can wait |
| Data sensitivity | Where does the data go? | Some models require cloud API calls; SLMs can run locally |
| Compliance | What certifications are required? | Regulated industries need specific deployment options |
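The cost factor is easy to make concrete. The sketch below uses purely hypothetical per-1K-token prices (check your provider's current price list) to show how a 10x per-token difference scales at volume:

```python
def monthly_cost(requests_per_day: int,
                 tokens_per_request: int,
                 price_per_1k_tokens: float) -> float:
    """Rough monthly token spend, assuming a 30-day month."""
    tokens_per_month = requests_per_day * tokens_per_request * 30
    return tokens_per_month / 1000 * price_per_1k_tokens

# Illustrative prices only -- not real published rates.
LARGE_PRICE = 0.01    # $ per 1K tokens (hypothetical large model)
SMALL_PRICE = 0.001   # $ per 1K tokens (hypothetical small model, 10x cheaper)

# 50,000 requests/day at ~1,000 tokens each:
large = monthly_cost(50_000, 1_000, LARGE_PRICE)
small = monthly_cost(50_000, 1_000, SMALL_PRICE)
print(f"Large model: ${large:,.0f}/month")  # Large model: $15,000/month
print(f"Small model: ${small:,.0f}/month")  # Small model: $1,500/month
```

At this volume the 10x per-token gap is $13,500 a month for a single workload — which is why "use the cheapest model that meets quality requirements" is the guiding rule.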
Large models vs small models
| Aspect | Large Models (GPT-4o, Llama 3.1 405B) | Small Models (GPT-4o mini, Phi-4) |
|---|---|---|
| Reasoning ability | Excellent — handles complex, multi-step tasks | Good for simple to moderate tasks |
| Cost per token | Higher — premium pricing | Lower — often 10x cheaper |
| Response speed | Slower — more computation needed | Faster — less computation |
| Context window | Large (128K+ tokens) | Moderate (varies by model) |
| Edge deployment | No — requires cloud infrastructure | Yes — some run on phones and devices |
| Best for | Complex analysis, long documents, nuanced reasoning | Classification, FAQ, summarisation, high-volume tasks |
The right model for common tasks
| Business Task | Recommended Model Size | Why |
|---|---|---|
| Answer FAQ questions from a knowledge base | Small (GPT-4o mini, Phi) | Simple retrieval + response, high volume, latency matters |
| Analyse a 50-page contract for legal risks | Large (GPT-4o, o3) | Needs a long context window and nuanced reasoning |
| Classify customer support tickets by urgency | Small (GPT-4o mini, Phi) | Pattern matching task, high volume, speed important |
| Generate a strategic market analysis report | Large (GPT-4o) | Complex synthesis, reasoning, and writing quality |
| Real-time translation in a customer chat | Small to medium (Mistral Small) | Speed critical, straightforward language task |
| Financial forecasting with complex variables | Reasoning (o3, o4-mini) | Multi-step mathematical reasoning |
Open-source vs commercial models
| Factor | Commercial (OpenAI GPT) | Open-Source (Llama, Phi, Mistral) |
|---|---|---|
| Access | API-based, managed by provider | Downloadable, self-hostable |
| Customisation | Limited to fine-tuning via API | Full weight access, deeper customisation |
| Vendor lock-in | Tied to provider’s API and pricing | Run anywhere, switch freely |
| Support | Enterprise support from provider | Community support, or commercial support from hosting providers |
| Compliance | Provider manages compliance | YOU manage compliance for self-hosted models |
| Cost | Pay-per-token (predictable per-call) | Infrastructure cost (predictable per-month) |
When open-source models make sense
Open-source models like Llama and Phi are not just “free alternatives.” They offer genuine strategic advantages:
- Data sovereignty: Self-host the model so data never leaves your environment
- Customisation: Modify the model weights for specialised tasks
- No vendor lock-in: Switch hosting providers without changing the model
- Transparency: Inspect model architecture and behaviour
The trade-off: you take on responsibility for hosting, scaling, monitoring, and compliance. Foundry simplifies this by hosting open-source models as managed endpoints.
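One way to weigh pay-per-token against self-hosted infrastructure is a simple break-even calculation. The figures below are hypothetical, not real pricing:

```python
def break_even_tokens(monthly_infra_cost: float,
                      price_per_1k_tokens: float) -> float:
    """Monthly token volume above which a fixed-cost self-hosted
    deployment becomes cheaper than pay-per-token API calls."""
    return monthly_infra_cost / price_per_1k_tokens * 1000

# Hypothetical: $4,000/month for a GPU endpoint vs $0.002 per 1K API tokens.
threshold = break_even_tokens(4_000, 0.002)
print(f"Break-even at {threshold:,.0f} tokens/month")  # 2,000,000,000
```

Below the threshold, pay-per-token is cheaper; above it, the fixed infrastructure cost wins — which is why high-volume, continuous workloads are the classic self-hosting case.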
SLMs for edge and on-device AI
Small language models (SLMs) like Phi-4 can run directly on devices — laptops, phones, edge hardware — without cloud connectivity.
| Edge Use Case | Why SLMs Work | Example |
|---|---|---|
| Offline scenarios | No internet required — model runs locally | Field workers in remote areas with no connectivity |
| Low latency | No network round-trip — instant response | Real-time translation during face-to-face conversations |
| Data privacy | Data never leaves the device | Healthcare notes processed locally, nothing sent to the cloud |
| Cost efficiency | No per-token API charges | High-volume local processing at fixed infrastructure cost |
| IoT and manufacturing | Lightweight models run on industrial hardware | Quality inspection on production line edge devices |
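The edge-versus-cloud trade-offs in the table above can be condensed into a toy decision helper. The rules and model names are illustrative, not an official selection algorithm:

```python
def choose_deployment(offline_required: bool,
                      data_must_stay_on_device: bool,
                      needs_complex_reasoning: bool) -> str:
    """Illustrative decision helper mirroring the edge use-case table."""
    if offline_required or data_must_stay_on_device:
        return "edge SLM"           # e.g. Phi-4-mini running on the device
    if needs_complex_reasoning:
        return "cloud large model"  # e.g. GPT-4o behind an API
    return "cloud small model"      # e.g. GPT-4o mini for cheap, fast calls

# Field worker with no connectivity: the hard constraint decides first.
print(choose_deployment(offline_required=True,
                        data_must_stay_on_device=False,
                        needs_complex_reasoning=True))  # edge SLM
```

Note the ordering: hard constraints (connectivity, privacy) are checked before capability, because no amount of reasoning quality helps if the data cannot leave the device.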
📊 Dr. Patel evaluates models for a financial services client
Dr. Anisha Patel, Board Advisor, is helping a financial services firm choose AI models for three use cases.
Use case 1: Customer service chatbot
- Volume: 50,000 queries per day
- Requirement: Fast response, simple question answering
- Dr. Patel’s recommendation: GPT-4o mini
- Reasoning: High volume makes cost critical. Questions are straightforward. Fast response time improves customer experience. Using GPT-4o would cost 10x more with minimal quality improvement.
Use case 2: Regulatory compliance analysis
- Volume: 200 documents per week
- Requirement: Analyse complex regulations, identify compliance gaps
- Dr. Patel’s recommendation: GPT-4o or o3
- Reasoning: Complex, nuanced reasoning required. Long documents need a large context window. Accuracy is more important than speed or cost. Regulatory mistakes have serious consequences.
Use case 3: Fraud detection pattern recognition
- Volume: Continuous real-time analysis
- Requirement: Low latency, data sovereignty, no cloud dependency
- Dr. Patel’s recommendation: Phi-4 on edge infrastructure
- Reasoning: Real-time processing needs minimal latency. Transaction data must stay on-premises. SLM handles pattern matching efficiently. No per-token costs for continuous analysis.
The multi-model strategy
Notice that Dr. Patel recommended three different models for three different use cases within the same company. This is the multi-model strategy the exam expects you to understand:
- Use the cheapest model that meets quality requirements
- Match model size to task complexity
- Consider deployment constraints (cloud, edge, on-premise)
- Use Foundry to deploy and manage multiple models from a single platform
The exam rewards answers that choose the right-sized model, not the biggest or most impressive one.
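In practice, a multi-model strategy often boils down to a small routing layer in front of your deployments. The sketch below is a minimal illustration built on the three use cases from Dr. Patel's scenario; the task fields and model names are invented for the example, not a real deployment catalogue:

```python
def route(task: dict) -> str:
    """Pick the cheapest model that satisfies the task's constraints."""
    if task.get("requires_on_prem"):
        return "phi-4-edge"   # data sovereignty trumps everything else
    if task.get("complexity") == "high":
        return "gpt-4o"       # or a reasoning model such as o3 for maths-heavy work
    return "gpt-4o-mini"      # cheapest model that meets quality requirements

tasks = [
    {"name": "FAQ chatbot", "complexity": "low"},
    {"name": "Regulatory analysis", "complexity": "high"},
    {"name": "Fraud detection", "requires_on_prem": True},
]
for t in tasks:
    print(t["name"], "->", route(t))
# FAQ chatbot -> gpt-4o-mini
# Regulatory analysis -> gpt-4o
# Fraud detection -> phi-4-edge
```

One platform, three models, each sized to its task — the pattern the exam rewards.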
Tomás's customer support team at PacificSteel processes 100,000 emails daily and needs AI to classify each email by topic and urgency. Which model approach is most cost-effective?
Dr. Patel recommends Phi-4 on edge hardware for a fraud detection system. What is the PRIMARY reason for choosing an edge-deployed SLM?
🎬 Video coming soon
Congratulations! You’ve completed Domain 2: Identify Benefits, Capabilities, and Opportunities for Microsoft AI Apps and Services. You now understand how to map business needs to AI solutions, compare Copilot versions, and choose the right AI models and platforms.
Next up: Responsible AI and Governance — start Domain 3 by learning the principles that keep your AI deployments safe and ethical.