🔒 Guided

Pre-launch preview. Authorised access only.

Incorrect code

Guided by A Guide to Cloud
Explore AB-900 AI-901
Guided AI-103 Domain 2
Domain 2 — Module 6 of 11 55%
14 of 27 overall

AI-103 Study Guide

Domain 1: Plan and Manage an Azure AI Solution

  • Choosing the Right AI Model Free
  • Foundry Services: Your AI Toolkit Free
  • Retrieval, Indexing & Agent Memory
  • Designing AI Infrastructure
  • Deploying Models & CI/CD
  • Quotas, Scaling & Cost
  • Monitoring & Security
  • Responsible AI: Filters, Auditing & Governance

Domain 2: Implement Generative AI and Agentic Solutions

  • Connecting Your App to Foundry Free
  • Building RAG Applications
  • Workflows & Reasoning Pipelines
  • Evaluating AI Models & Apps
  • Agent Fundamentals: Roles, Goals & Tools Free
  • Building Agents with Retrieval & Memory
  • Agent Tools & Knowledge Integration
  • Multi-Agent Orchestration & Safeguards
  • Agent Monitoring & Error Analysis
  • Prompt Engineering & Model Tuning
  • Observability & Production Operations

Domain 3: Implement Computer Vision Solutions

  • Image & Video Generation
  • Multimodal Visual Understanding
  • Responsible AI for Visual Content

Domain 4: Implement Text Analysis Solutions

  • Text Analysis with Language Models
  • Speech, Translation & Voice Agents

Domain 5: Implement Information Extraction Solutions

  • Ingestion, Indexing & Grounding Pipelines
  • Extracting Content with Content Understanding
  • Exam Prep: Putting It All Together

AI-103 Study Guide

Domain 1: Plan and Manage an Azure AI Solution

  • Choosing the Right AI Model Free
  • Foundry Services: Your AI Toolkit Free
  • Retrieval, Indexing & Agent Memory
  • Designing AI Infrastructure
  • Deploying Models & CI/CD
  • Quotas, Scaling & Cost
  • Monitoring & Security
  • Responsible AI: Filters, Auditing & Governance

Domain 2: Implement Generative AI and Agentic Solutions

  • Connecting Your App to Foundry Free
  • Building RAG Applications
  • Workflows & Reasoning Pipelines
  • Evaluating AI Models & Apps
  • Agent Fundamentals: Roles, Goals & Tools Free
  • Building Agents with Retrieval & Memory
  • Agent Tools & Knowledge Integration
  • Multi-Agent Orchestration & Safeguards
  • Agent Monitoring & Error Analysis
  • Prompt Engineering & Model Tuning
  • Observability & Production Operations

Domain 3: Implement Computer Vision Solutions

  • Image & Video Generation
  • Multimodal Visual Understanding
  • Responsible AI for Visual Content

Domain 4: Implement Text Analysis Solutions

  • Text Analysis with Language Models
  • Speech, Translation & Voice Agents

Domain 5: Implement Information Extraction Solutions

  • Ingestion, Indexing & Grounding Pipelines
  • Extracting Content with Content Understanding
  • Exam Prep: Putting It All Together
Domain 2: Implement Generative AI and Agentic Solutions Premium ⏱ ~14 min read

Building Agents with Retrieval & Memory

Agents need to remember conversations and search for information. Learn how to build agents that integrate retrieval (searching your data), function calling (taking actions), and conversation memory.

Agents that know, remember, and act

☕ Simple explanation

An agent with retrieval is like an employee with access to the company library. An agent with memory is like an employee who remembers your last conversation. An agent with function calling is like an employee who can actually do things — not just talk about them.

The best agents combine all three: they search for information, remember context, and take action. This module shows you how to build them.

Production agents in Microsoft Foundry typically integrate three capabilities:

  • Retrieval — searching knowledge bases (Foundry IQ, Azure AI Search) to ground responses in data
  • Function calling — executing external functions (APIs, databases, services) to take actions
  • Conversation memory — maintaining context across turns in a conversation thread

The Foundry Agent Service manages these capabilities through the Responses API, handling the orchestration loop: model reasons → calls tool → receives result → reasons again → responds to user.

The agent orchestration loop

StepWhat HappensExample
1. User messageUser sends a request”What’s the late payment fee for commercial accounts?“
2. Agent reasonsModel analyses the question and decides what to doDetermines it needs to search the fee schedule
3. RetrievalAgent searches knowledge baseFoundry IQ returns the fee schedule document
4. Function call (if needed)Agent calls a function for additional dataget_account_type(account_id) returns “commercial”
5. SynthesiseModel combines retrieved data + function resultsGenerates answer with specific fee amount and policy reference
6. Memory updateConversation history is updatedThread now includes user question + agent response
7. Ready for next turnAgent remembers context for follow-upUser can ask “What about residential accounts?” — agent remembers context

Retrieval integration

MethodHow to Set UpBest For
Foundry IQUpload docs to the agent’s knowledge storeQuick setup, small document sets
Azure AI SearchConnect search index to agent toolsLarge, complex document collections
Custom retrievalWrite a function that searches your own databaseProprietary data systems

Function calling patterns

Function calling patterns
FeatureSingle Function CallParallel Function CallsSequential Calls
PatternAgent calls one function per turnAgent calls multiple functions simultaneouslyAgent chains function calls based on results
ExampleLook up order statusCheck inventory + get shipping rates at same timeGet customer ID, then look up their orders
SpeedFast per callFastest for independent callsSlower but necessary for dependent data
ComplexityLowMediumHigher — error handling between steps

Conversation memory implementation

Memory FeatureHow It WorksConfiguration
Thread historyAll messages in a thread are sent with each new requestAutomatic in Responses API
Context window managementOlder messages are summarised or truncated when history exceeds limitsConfigure max history length
Memory extractionAgent extracts key facts from conversations for long-term storageCustom tool or built-in memory feature
Memory retrievalAgent searches stored memories to inform responsesAutomatic or explicit tool call
ℹ️ Real-world example: Atlas Financial's compliance agent

Atlas Financial’s compliance agent combines all three capabilities:

Retrieval: Connected to Azure AI Search with 50,000 indexed regulations

  • When a user asks about compliance requirements, the agent automatically searches the regulation index

Function calling:

  • get_loan_application(app_id) — retrieves application details from the loan system
  • check_credit_score(applicant_ssn) — checks credit history via external API
  • generate_assessment(findings) — creates a formatted compliance assessment
  • flag_for_review(app_id, reason) — escalates to human reviewer

Memory:

  • Thread-based: each loan review is a separate thread
  • The agent remembers all findings within a review session
  • “Show me a summary of all the issues we found” works because the agent has the full thread context
💡 Exam tip: When the agent should NOT call a tool

Not every user message needs a tool call. The agent should:

  • Call a tool when it needs external data or needs to take an action
  • Use memory when the answer is in the conversation history
  • Use its own reasoning when the answer is in the already-retrieved context

Over-calling tools wastes time and tokens. A well-designed agent knows when to reason from existing context.

Key terms

Question

What is the agent orchestration loop?

Click or press Enter to reveal answer

Answer

The cycle where an agent: receives a message, reasons about what to do, optionally calls tools or retrieves data, synthesises a response, updates memory, and waits for the next turn. Managed by the Foundry Responses API.

Click to flip back

Question

What is parallel function calling?

Click or press Enter to reveal answer

Answer

When an agent calls multiple independent functions simultaneously in a single turn. Example: checking inventory and getting shipping rates at the same time. Faster than sequential calls for independent data.

Click to flip back

Question

What is context window management in agents?

Click or press Enter to reveal answer

Answer

The process of handling conversation history when it grows too long for the model's context window. Strategies include summarising older messages, truncating, or using a sliding window of recent messages.

Click to flip back

Knowledge check

Knowledge Check

NeuralMed's patient assistant needs to answer a follow-up question: 'What about the side effects?' The previous message discussed a specific medication. How should the agent handle this?

Knowledge Check

Kai's logistics agent needs to: (1) get the customer's shipping history and (2) get current fuel surcharge rates. These are independent data lookups. What's the most efficient function calling pattern?

🎬 Video coming soon

← Previous

Agent Fundamentals: Roles, Goals & Tools

Next →

Agent Tools & Knowledge Integration

Guided

I learn, I simplify, I share.

A Guide to Cloud YouTube Feedback

© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.