
AB-100 Study Guide

Domain 1: Plan AI-Powered Business Solutions

  • Agent Requirements & Data Readiness
  • AI Strategy & the Cloud Adoption Framework
  • Multi-Agent Solution Design
  • Build, Buy, or Extend
  • Generative AI, Knowledge Sources & Prompt Engineering
  • Small Language Models & Model Selection
  • ROI, TCO & Business Case Analysis

Domain 2: Design AI-Powered Business Solutions

  • Copilot in D365 Customer Experience & Service
  • Agent Types: Task, Autonomous & Prompt/Response
  • Foundry Tools & Code-First Solutions
  • Copilot Studio: Topics, Flows & Prompt Actions
  • Power Apps, WAF & Data Processing
  • Extensibility: Custom Models, M365 Agents & Copilot Studio
  • MCP, Computer Use & Agent Behaviours
  • M365 Agents: Teams, SharePoint & Sales/Service in M365 Copilot
  • D365 AI Orchestration: Finance, SCM & Customer Experience

Domain 3: Deploy AI-Powered Business Solutions

  • Agent Monitoring: Tools, Metrics, and Processes
  • Telemetry Interpretation and Agent Tuning
  • Testing Strategy for AI Agents
  • Custom Model Validation and Prompt Best Practices
  • End-to-End Testing for Multi-App AI Solutions
  • ALM Foundations & Data Lifecycle for AI
  • ALM for Copilot Studio Agents
  • ALM for Microsoft Foundry Agents
  • ALM for D365 AI Features
  • Agent Security
  • Governance for AI Agents
  • Prompt Security & AI Vulnerabilities
  • Responsible AI & Audit Trails

Domain 1: Plan AI-Powered Business Solutions

Small Language Models & Model Selection

Not every AI task needs GPT-4. Learn when small language models (SLMs) are the right choice, how model routers intelligently select the best model for each request, and how to design a model selection strategy for enterprise AI solutions.

Why bigger isn’t always better

☕ Simple explanation

Imagine you need someone to sort your mail. You wouldn’t hire a brain surgeon — you’d hire someone who’s fast, efficient, and cheap for that specific task.

Small language models (SLMs) are the efficient mail sorters of the AI world. They’re trained for specific tasks — classifying text, extracting data, answering domain-specific questions — and they do those tasks faster and cheaper than massive general-purpose models like GPT-4.

A model router is like a smart receptionist who looks at each incoming request and decides: “This is a simple question — send it to the small model. This needs deep reasoning — send it to the big model.” It optimises cost without sacrificing quality.

Small language models (SLMs) are AI models with fewer parameters than large language models (LLMs), typically ranging from 1B to 14B parameters. Microsoft’s Phi family is a prominent example. SLMs offer lower latency, lower cost, and the ability to run on edge devices — making them ideal for specific, well-defined tasks in business solutions.

Model routers in Microsoft Foundry are deployable AI models trained to analyse incoming prompts and route them to the most suitable underlying LLM in real time. They evaluate query complexity, cost, and performance to make routing decisions. Available in Balanced, Quality, and Cost modes, model routers deliver high performance while optimising compute spend.

When to use small language models

SLMs shine in specific scenarios. The exam tests whether you can identify when an SLM is the right architectural choice.

Use cases for small language models in business solutions
Feature | Best For | Example SLM | Why Not an LLM?
Edge/on-device inference | Manufacturing sensor log analysis, retail POS text recommendations | Phi-3-mini, Phi-3.5-mini | LLMs require cloud connectivity and have higher latency
High-volume, simple tasks | Email classification, sentiment analysis, intent detection | Phi-3-small, fine-tuned Phi models | Cost of running GPT-4 on millions of simple classifications is prohibitive
Domain-specific reasoning | Legal document analysis, medical coding, financial report parsing | Fine-tuned Phi or custom-trained models | After fine-tuning, SLMs match LLM quality on narrow domains at lower cost
Low-latency requirements | Real-time customer service routing, chatbot intent detection | Phi-3-mini, ONNX-optimised models | LLMs take 2-5 seconds; SLMs respond in milliseconds
Data sovereignty | Government or regulated industries where data cannot leave the premises | Self-hosted Phi models on Azure or on-premises | Cloud LLM APIs may not meet data residency requirements
💡 Exam tip: SLM decision signals

Look for these keywords in exam scenarios:

  • “Edge,” “on-premises,” “limited connectivity” — SLM (can run locally)
  • “Millions of requests,” “high volume,” “cost-sensitive” — SLM (cheaper per inference)
  • “Under 1 second response time,” “real-time” — SLM (lower latency)
  • “Data cannot leave the environment” — SLM (self-hosted)
  • “Complex reasoning,” “multi-step analysis,” “creative generation” — LLM (SLMs lack depth for these)
  • “General-purpose assistant across many topics” — LLM (SLMs are narrow specialists)
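
These decision signals can be sketched as a naive keyword lookup. The keyword lists and the `pick_model_class` helper below are purely illustrative, not part of any Microsoft tooling:

```python
# Naive sketch of the SLM-vs-LLM decision signals above. The keyword
# lists and this helper are illustrative only; a real architecture
# decision weighs cost, latency, and accuracy together.

SLM_SIGNALS = [
    "edge", "on-premises", "limited connectivity",
    "millions of requests", "high volume", "cost-sensitive",
    "under 1 second", "real-time",
    "data cannot leave",
]

LLM_SIGNALS = [
    "complex reasoning", "multi-step analysis", "creative generation",
    "general-purpose assistant",
]

def pick_model_class(scenario: str) -> str:
    """Return 'SLM' or 'LLM' using naive substring matching."""
    text = scenario.lower()
    if any(signal in text for signal in SLM_SIGNALS):
        return "SLM"
    if any(signal in text for signal in LLM_SIGNALS):
        return "LLM"
    return "LLM"  # safe default: a general-purpose model

print(pick_model_class("Classify logs on-premises with limited connectivity"))
# SLM
```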

Model router: intelligent model selection

A model router in Microsoft Foundry is a deployed model that analyses each incoming prompt and routes it to the most suitable underlying LLM. You deploy it like any other model — one endpoint, one deployment — but behind the scenes it selects from multiple models.

How model router works:

  1. You deploy a model router from the Foundry model catalogue
  2. You send requests to a single endpoint (just like calling GPT-4)
  3. The router analyses each prompt — complexity, task type, reasoning requirements
  4. It routes to the best model based on your selected routing mode
  5. The response includes a model field revealing which underlying model was selected
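
Step 5 is visible in the response payload. Below is a fabricated example of the OpenAI-style chat-completions response a router deployment returns (the IDs and values are invented); the model field reveals the router's choice:

```python
import json

# Fabricated chat-completions response from a model-router deployment.
# Field names follow the standard OpenAI response shape; the values,
# including the selected model, are invented for illustration.
raw = """{
  "id": "chatcmpl-example",
  "object": "chat.completion",
  "model": "gpt-4o-mini",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello!"},
     "finish_reason": "stop"}
  ]
}"""

resp = json.loads(raw)
print(resp["model"])  # which underlying model the router selected
```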

Routing modes:

Mode | Behaviour | Best For
Balanced (default) | Considers all models within a small quality range (1-2% of best) and picks the most cost-effective | Most enterprise workloads
Quality | Always picks the highest-quality model regardless of cost | Legal review, medical summaries, complex reasoning
Cost | Considers a larger quality band (5-6% of best) and picks the cheapest | High-volume classification, simple Q&A, content tagging
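
The band logic behind the three modes can be simulated in a few lines. The candidate models, quality scores, and costs below are invented, and the real router's scoring is internal to Foundry; this only illustrates the idea:

```python
# Toy simulation of the three routing modes. Candidate names, quality
# scores, and per-call costs are invented; the real router's scoring
# is internal to Microsoft Foundry.

CANDIDATES = [
    # (name, quality score out of 100, relative cost per call)
    ("large-model", 95.0, 10.0),
    ("mid-model",   93.5,  3.0),
    ("small-model", 90.0,  1.0),
]

# Width of the acceptable quality band below the best score, per mode
# (roughly mirroring the narrow and wide bands described above).
BAND = {"quality": 0.0, "balanced": 2.0, "cost": 6.0}

def route(mode: str) -> str:
    """Pick the cheapest model whose quality is within the mode's band."""
    best = max(quality for _, quality, _ in CANDIDATES)
    eligible = [c for c in CANDIDATES if c[1] >= best - BAND[mode]]
    return min(eligible, key=lambda c: c[2])[0]

print(route("quality"))   # large-model: band is zero, only the best qualifies
print(route("balanced"))  # mid-model: nearly as good, much cheaper
print(route("cost"))      # small-model: cheapest within the wider band
```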
💡 Scenario: Kai implements model routing for Apex Industries

Kai designs a model routing strategy for Apex’s AI platform:

Agent 1 — Customer FAQ bot: Handles thousands of simple product questions daily. Routing mode: Cost — most questions are straightforward; smaller models handle them fine.

Agent 2 — Quality inspection analyser: Reviews complex inspection reports and identifies potential compliance issues. Routing mode: Quality — accuracy is critical; regulatory compliance can’t tolerate errors.

Agent 3 — General supply chain assistant: A mix of simple lookups and complex analysis. Routing mode: Balanced — the router automatically sends simple queries to cheap models and complex ones to powerful models.

Cost impact: By using model routing instead of sending everything to GPT-4, Kai estimates a 40% reduction in inference costs with minimal quality degradation.
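
A figure like Kai's 40% can be sanity-checked with back-of-envelope arithmetic. Every number below (request volume, per-request prices, traffic split) is invented for illustration:

```python
# Back-of-envelope check of a routing cost saving. All numbers are
# invented; real token prices and traffic mixes will differ.

requests_per_month = 1_000_000
cost_large_per_req = 0.020  # hypothetical per-request cost, large model
cost_small_per_req = 0.002  # hypothetical per-request cost, small model

# Baseline: send everything to the large model.
baseline = requests_per_month * cost_large_per_req

# Routed: suppose the router sends 45% of traffic to the small model.
routed = requests_per_month * (0.55 * cost_large_per_req
                               + 0.45 * cost_small_per_req)

saving = 1 - routed / baseline  # roughly 0.40 with these numbers
print(f"baseline ${baseline:,.0f}, routed ${routed:,.0f}, saving {saving:.0%}")
```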

💡 Deep dive: model router architecture details

Key architectural facts for the exam:

  • Single deployment: You deploy model router once. Don’t deploy the underlying models separately (except Claude models, which need their own deployment)
  • Content filters: Applied at the router level — one filter covers all underlying models
  • Rate limits: Applied at the router level — one quota for all traffic
  • Model subset: You can restrict which underlying models the router uses (useful if you need specific context window sizes or want to exclude certain models)
  • Auto-update: Router versions can auto-update, which changes the underlying model set
  • Automatic failover: If a routed model has issues, the router transparently redirects to the next best model
  • Monitoring: Use Azure Monitor to track which underlying models are being selected and at what cost

Designing a model selection strategy

As an architect, you need a strategy that covers the full spectrum of AI tasks:

Task Complexity | Recommended Approach | Cost
Simple classification, intent detection | SLM (Phi-3-mini) or model router in Cost mode | Very low
Standard Q&A, summarisation, content generation | Model router in Balanced mode | Low to medium
Complex reasoning, multi-step analysis | Model router in Quality mode or direct LLM (GPT-4) | Medium to high
Domain-specific with strict accuracy | Fine-tuned SLM or fine-tuned LLM with RAG | Variable (training cost upfront, low inference)
Edge/offline scenarios | Deployed SLM on edge device (ONNX runtime) | One-time deployment cost
💡 Exam tip: model selection hierarchy

The exam rewards architects who follow this cost-optimisation hierarchy:

  1. Can a model router handle it? — Use model router first (it automatically optimises)
  2. Is it a narrow, high-volume task? — Consider an SLM
  3. Does it need domain expertise? — Fine-tune an SLM on your data
  4. Does it need deep reasoning across broad topics? — Use a direct LLM
  5. Does it need to run offline? — Deploy an SLM to edge

The wrong answer is almost always “use GPT-4 for everything.” The right answer considers cost, latency, and accuracy requirements for each specific task.
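
The hierarchy reads naturally as a decision function. The boolean requirement flags below are hypothetical, invented just to encode the five questions:

```python
# Sketch of the model-selection hierarchy as a decision function.
# The requirement flags are hypothetical, invented to encode the
# five questions above; a real assessment is rarely this binary.

def select_approach(*, offline: bool = False,
                    narrow_high_volume: bool = False,
                    needs_domain_expertise: bool = False,
                    deep_broad_reasoning: bool = False) -> str:
    if offline:
        return "SLM deployed to edge"
    if deep_broad_reasoning:
        return "direct LLM"
    if needs_domain_expertise:
        return "fine-tuned SLM"
    if narrow_high_volume:
        return "SLM"
    return "model router"  # first choice: let the router optimise

print(select_approach())                          # model router
print(select_approach(offline=True))              # SLM deployed to edge
print(select_approach(deep_broad_reasoning=True)) # direct LLM
```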

Flashcards

Question

What is a model router in Microsoft Foundry?

Answer

A deployable AI model that analyses incoming prompts and routes them to the most suitable underlying LLM in real time. It optimises cost while maintaining quality, and is deployed as a single endpoint that selects from multiple models behind the scenes.

Question

Name the three routing modes for model router.

Answer

Balanced (default — optimises cost within a small quality range), Quality (always picks the best model regardless of cost), and Cost (picks the cheapest model within a larger quality band).

Question

When should you use a small language model instead of a large language model?

Answer

For edge/on-device inference, high-volume simple tasks, domain-specific reasoning after fine-tuning, low-latency requirements, and data sovereignty scenarios where data cannot leave the environment.

Question

What happens when a model in a model router deployment experiences issues?

Answer

Automatic failover — the router transparently redirects the request to the next most appropriate model. This is built-in and requires no additional configuration.

Knowledge check

  1. Kai's manufacturing client needs an AI system that classifies equipment maintenance logs from text sensor outputs on the production floor. The factory has intermittent internet connectivity, and the classification must happen in under 500 milliseconds. Which approach should Kai recommend?

  2. Adrienne's financial services company processes 2 million customer emails per month for intent classification (complaint, inquiry, request, compliment). The classification is straightforward — most emails clearly fall into one category. Which model strategy minimises cost while maintaining accuracy?

  3. Which of the following is NOT a benefit of using a model router compared to deploying a single large language model?

Next up: ROI, TCO & Business Case Analysis — building the financial case for AI investments, understanding total cost of ownership, and proving value to leadership.

© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.