Building Agents with Retrieval & Memory
Agents need to remember conversations and search for information. Learn how to build agents that integrate retrieval (searching your data), function calling (taking actions), and conversation memory.
Agents that know, remember, and act
An agent with retrieval is like an employee with access to the company library. An agent with memory is like an employee who remembers your last conversation. An agent with function calling is like an employee who can actually do things — not just talk about them.
The best agents combine all three: they search for information, remember context, and take action. This module shows you how to build them.
The agent orchestration loop
| Step | What Happens | Example |
|---|---|---|
| 1. User message | User sends a request | ”What’s the late payment fee for commercial accounts?“ |
| 2. Agent reasons | Model analyses the question and decides what to do | Determines it needs to search the fee schedule |
| 3. Retrieval | Agent searches knowledge base | Foundry IQ returns the fee schedule document |
| 4. Function call (if needed) | Agent calls a function for additional data | get_account_type(account_id) returns “commercial” |
| 5. Synthesise | Model combines retrieved data + function results | Generates answer with specific fee amount and policy reference |
| 6. Memory update | Conversation history is updated | Thread now includes user question + agent response |
| 7. Ready for next turn | Agent remembers context for follow-up | User can ask “What about residential accounts?” — agent remembers context |
Retrieval integration
| Method | How to Set Up | Best For |
|---|---|---|
| Foundry IQ | Upload docs to the agent’s knowledge store | Quick setup, small document sets |
| Azure AI Search | Connect search index to agent tools | Large, complex document collections |
| Custom retrieval | Write a function that searches your own database | Proprietary data systems |
Function calling patterns
| Feature | Single Function Call | Parallel Function Calls | Sequential Calls |
|---|---|---|---|
| Pattern | Agent calls one function per turn | Agent calls multiple functions simultaneously | Agent chains function calls based on results |
| Example | Look up order status | Check inventory + get shipping rates at same time | Get customer ID, then look up their orders |
| Speed | Fast per call | Fastest for independent calls | Slower but necessary for dependent data |
| Complexity | Low | Medium | Higher — error handling between steps |
Conversation memory implementation
| Memory Feature | How It Works | Configuration |
|---|---|---|
| Thread history | All messages in a thread are sent with each new request | Automatic in Responses API |
| Context window management | Older messages are summarised or truncated when history exceeds limits | Configure max history length |
| Memory extraction | Agent extracts key facts from conversations for long-term storage | Custom tool or built-in memory feature |
| Memory retrieval | Agent searches stored memories to inform responses | Automatic or explicit tool call |
Real-world example: Atlas Financial's compliance agent
Atlas Financial’s compliance agent combines all three capabilities:
Retrieval: Connected to Azure AI Search with 50,000 indexed regulations
- When a user asks about compliance requirements, the agent automatically searches the regulation index
Function calling:
get_loan_application(app_id)— retrieves application details from the loan systemcheck_credit_score(applicant_ssn)— checks credit history via external APIgenerate_assessment(findings)— creates a formatted compliance assessmentflag_for_review(app_id, reason)— escalates to human reviewer
Memory:
- Thread-based: each loan review is a separate thread
- The agent remembers all findings within a review session
- “Show me a summary of all the issues we found” works because the agent has the full thread context
Exam tip: When the agent should NOT call a tool
Not every user message needs a tool call. The agent should:
- Call a tool when it needs external data or needs to take an action
- Use memory when the answer is in the conversation history
- Use its own reasoning when the answer is in the already-retrieved context
Over-calling tools wastes time and tokens. A well-designed agent knows when to reason from existing context.
Key terms
Knowledge check
NeuralMed's patient assistant needs to answer a follow-up question: 'What about the side effects?' The previous message discussed a specific medication. How should the agent handle this?
Kai's logistics agent needs to: (1) get the customer's shipping history and (2) get current fuel surcharge rates. These are independent data lookups. What's the most efficient function calling pattern?
🎬 Video coming soon