Building Agents with Retrieval & Memory

Agents that know, remember, and act

Simple explanation

An agent with retrieval is like an employee with access to the company library. An agent with memory is like an employee who remembers your last conversation. An agent with function calling is like an employee who can actually do things — not just talk about them.

The best agents combine all three: they search for information, remember context, and take action. This module shows you how to build them.

The agent orchestration loop

Step	What Happens	Example
1. User message	User sends a request	”What’s the late payment fee for commercial accounts?“
2. Agent reasons	Model analyses the question and decides what to do	Determines it needs to search the fee schedule
3. Retrieval	Agent searches knowledge base	Foundry IQ returns the fee schedule document
4. Function call (if needed)	Agent calls a function for additional data	`get_account_type(account_id)` returns “commercial”
5. Synthesise	Model combines retrieved data + function results	Generates answer with specific fee amount and policy reference
6. Memory update	Conversation history is updated	Thread now includes user question + agent response
7. Ready for next turn	Agent remembers context for follow-up	User can ask “What about residential accounts?” — agent remembers context

Retrieval integration

Method	How to Set Up	Best For
Foundry IQ	Upload docs to the agent’s knowledge store	Quick setup, small document sets
Azure AI Search	Connect search index to agent tools	Large, complex document collections
Custom retrieval	Write a function that searches your own database	Proprietary data systems

Function calling patterns

Function calling patterns
Feature	Single Function Call	Parallel Function Calls	Sequential Calls
Pattern	Agent calls one function per turn	Agent calls multiple functions simultaneously	Agent chains function calls based on results
Example	Look up order status	Check inventory + get shipping rates at same time	Get customer ID, then look up their orders
Speed	Fast per call	Fastest for independent calls	Slower but necessary for dependent data
Complexity	Low	Medium	Higher — error handling between steps

Conversation memory implementation

Memory Feature	How It Works	Configuration
Thread history	All messages in a thread are sent with each new request	Automatic in Responses API
Context window management	Older messages are summarised or truncated when history exceeds limits	Configure max history length
Memory extraction	Agent extracts key facts from conversations for long-term storage	Custom tool or built-in memory feature
Memory retrieval	Agent searches stored memories to inform responses	Automatic or explicit tool call

Real-world example: Atlas Financial's compliance agent

Atlas Financial’s compliance agent combines all three capabilities:

Retrieval: Connected to Azure AI Search with 50,000 indexed regulations

When a user asks about compliance requirements, the agent automatically searches the regulation index

Function calling:

get_loan_application(app_id) — retrieves application details from the loan system
check_credit_score(applicant_ssn) — checks credit history via external API
generate_assessment(findings) — creates a formatted compliance assessment
flag_for_review(app_id, reason) — escalates to human reviewer

Memory:

Thread-based: each loan review is a separate thread
The agent remembers all findings within a review session
“Show me a summary of all the issues we found” works because the agent has the full thread context

Exam tip: When the agent should NOT call a tool

Not every user message needs a tool call. The agent should:

Call a tool when it needs external data or needs to take an action
Use memory when the answer is in the conversation history
Use its own reasoning when the answer is in the already-retrieved context

Over-calling tools wastes time and tokens. A well-designed agent knows when to reason from existing context.

Key terms

Question

What is the agent orchestration loop?

Click or press Enter to reveal answer

Answer

The cycle where an agent: receives a message, reasons about what to do, optionally calls tools or retrieves data, synthesises a response, updates memory, and waits for the next turn. Managed by the Foundry Responses API.

Click to flip back

Question

What is parallel function calling?

Click or press Enter to reveal answer

Answer

When an agent calls multiple independent functions simultaneously in a single turn. Example: checking inventory and getting shipping rates at the same time. Faster than sequential calls for independent data.

Click to flip back

Question

What is context window management in agents?

Click or press Enter to reveal answer

Answer

The process of handling conversation history when it grows too long for the model's context window. Strategies include summarising older messages, truncating, or using a sliding window of recent messages.

Click to flip back

Knowledge check

Knowledge Check

NeuralMed's patient assistant needs to answer a follow-up question: 'What about the side effects?' The previous message discussed a specific medication. How should the agent handle this?

Knowledge Check

Kai's logistics agent needs to: (1) get the customer's shipping history and (2) get current fuel surcharge rates. These are independent data lookups. What's the most efficient function calling pattern?