Retrieval, Indexing & Agent Memory
Vector search, hybrid search, semantic search — and how agents remember conversations. Learn the architecture decisions behind retrieval and memory before you build anything.
Choosing a retrieval strategy
Retrieval is how your AI finds the right information before answering a question.
Imagine you’re studying for an exam with 500 pages of notes. You could:
- search for exact words (keyword search),
- search by meaning (semantic search),
- search by “vibes”, finding notes that feel similar even if the words are different (vector search), or
- combine all three (hybrid search).
Each method has trade-offs. The exam tests whether you know when to pick which one.
The four search methods
| Feature | Keyword | Semantic | Vector | Hybrid |
|---|---|---|---|---|
| How it works | Matches exact words (BM25) | Understands meaning using a re-ranker | Compares embeddings in vector space | Combines keyword + vector + re-ranker |
| Strengths | Fast, precise for exact terms | Understands synonyms and intent | Finds conceptually similar content | Combines the strengths of all three |
| Weaknesses | Misses synonyms ('car' won't find 'vehicle') | Slower than keyword alone | Needs embedding pipeline | Most complex to configure |
| Best for | Product codes, error IDs, exact names | Natural language questions | Finding similar documents | Production RAG applications |
| Azure AI Search feature | Full-text search (default) | Semantic ranker (add-on) | Vector search (configure embeddings) | Hybrid search (combine all) |
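The core operation behind the vector column, comparing embeddings, reduces to cosine similarity. A minimal sketch with toy three-dimensional vectors (real embedding models such as text-embedding-3-small output over a thousand dimensions; the vectors here are illustrative):

```python
import math

def cosine_similarity(a, b):
    """Vector search ranks documents by the angle between embeddings:
    1.0 means same direction (same meaning), near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy embeddings: a query about cars, one related doc, one unrelated doc.
query = [0.9, 0.1, 0.0]
doc_vehicle = [0.8, 0.2, 0.1]   # "vehicle" lands near "car" in vector space
doc_recipe = [0.0, 0.1, 0.9]    # "soup recipe" points elsewhere
```

This is why vector search finds “vehicle” for a “car” query even though the words never match: the embeddings, not the tokens, are compared.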
Exam tip: Hybrid search is usually the right answer
When the exam asks “which search method should you use for a RAG application?” and the scenario doesn’t have a specific constraint, hybrid search is almost always correct. It combines the precision of keyword search with the semantic understanding of vector search, plus re-ranking for relevance.
Only pick a single method when the scenario explicitly constrains you (e.g., “exact product SKU lookup” = keyword, “find conceptually similar research papers” = vector).
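Under the hood, Azure AI Search’s hybrid mode merges the keyword and vector result lists with Reciprocal Rank Fusion (RRF). A minimal sketch of the idea; the document IDs are made up, and k=60 is the conventional damping constant:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each list contributes 1/(k + rank) per
    document, so items ranked highly by BOTH lists rise to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Keyword and vector search each return their own ranked list of doc IDs.
keyword_hits = ["doc-7", "doc-2", "doc-9"]   # exact-term matches first
vector_hits = ["doc-2", "doc-5", "doc-7"]    # semantically similar first
fused = rrf_fuse([keyword_hits, vector_hits])
```

Note that doc-2 wins the fused ranking: it appears in both lists, which is exactly the “best of both” behaviour the exam tip describes.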
Indexing strategies
Before you can search, you need to index your content. Key decisions:
| Decision | Options | Impact |
|---|---|---|
| Chunking strategy | Fixed-size, paragraph, semantic, document | Affects retrieval precision — too big and you get noise, too small and you lose context |
| Embedding model | text-embedding-3-small, text-embedding-ada-002, custom | Affects vector search quality and cost |
| Metadata | Title, source URL, date, section headings | Enables filtering and improves citation quality |
| Refresh frequency | Real-time, scheduled, on-change | Balances freshness against indexing cost |
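The first two chunking options in the table can be sketched in a few lines. Sizes here are in characters for simplicity; production pipelines usually count tokens, and the overlap value is illustrative:

```python
def chunk_fixed(text, size=200, overlap=40):
    """Fixed-size chunking with overlap. Overlapping windows preserve
    context that would otherwise be cut at a chunk boundary."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def chunk_paragraphs(text):
    """Paragraph-level chunking: split on blank lines, keeping each
    paragraph as one retrievable unit."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]
```

The trade-off from the table shows up directly in the parameters: a larger `size` pulls in noise, a smaller one strips away context.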
Real-world example: NeuralMed’s indexing strategy
NeuralMed indexes 10,000 medical articles for their patient chatbot:
- Chunking: Paragraph-level (medical information needs context — a sentence alone is often meaningless)
- Embedding: text-embedding-3-small (good accuracy, lower cost than large)
- Metadata: Article title, publication date, medical specialty, source journal
- Refresh: Weekly batch (medical literature doesn’t change hourly)
- Search type: Hybrid (patients ask natural-language questions, but drug names need exact match)
Agent memory and knowledge integration
Agents need three types of memory:
| Memory Type | What It Stores | Scope |
|---|---|---|
| Conversation memory | Chat history within a session | Per-thread (one conversation) |
| Persistent memory | Facts learned across conversations | Per-user or per-agent |
| Knowledge | External data sources the agent can search | Shared across all conversations |
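The three scopes can be sketched as a toy in-memory store. The `AgentMemory` class and its method names are illustrative, not a real Foundry API; the point is which key each kind of data lives under:

```python
from collections import defaultdict

class AgentMemory:
    """Toy illustration of the three scopes: conversation memory keyed
    by thread, persistent memory keyed by user, knowledge shared by all."""
    def __init__(self):
        self.conversations = defaultdict(list)   # thread_id -> messages
        self.persistent = defaultdict(dict)      # user_id -> learned facts
        self.knowledge = []                      # shared, searchable docs

    def add_message(self, thread_id, role, text):
        self.conversations[thread_id].append((role, text))

    def remember(self, user_id, key, value):
        self.persistent[user_id][key] = value

    def add_knowledge(self, doc):
        self.knowledge.append(doc)

memory = AgentMemory()
memory.add_message("thread-1", "user", "I prefer metric units")  # dies with the thread
memory.remember("user-42", "units", "metric")                    # survives across sessions
memory.add_knowledge("Refund policy: 30 days")                   # visible to every conversation
```

The scoping is the exam-relevant part: deleting `thread-1` loses the chat history but not the user’s stored preference or the shared knowledge.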
Tool and knowledge integration for agents
| Integration Type | Service | Use Case |
|---|---|---|
| Knowledge stores | Foundry IQ, Azure AI Search | Agent searches company docs to answer questions |
| Function calling | Custom functions, APIs | Agent calls external systems (CRM, database, calendar) |
| Code interpreter | Built-in Foundry tool | Agent writes and runs Python code to analyse data |
| Web search | Bing grounding | Agent searches the web for current information |
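Function calling in the table above follows a common pattern: the model emits a tool name plus JSON-encoded arguments, and the agent runtime dispatches to a registered function. A minimal sketch; the registry, decorator, and `lookup_customer` stand-in are hypothetical, not a real SDK:

```python
import json

TOOLS = {}

def tool(fn):
    """Register a function the agent is allowed to call by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def lookup_customer(customer_id: str) -> dict:
    # Stand-in for a real CRM call.
    return {"id": customer_id, "tier": "gold"}

def dispatch(tool_call: dict):
    """Execute a model-requested call of the shape
    {"name": ..., "arguments": "<JSON string>"}."""
    fn = TOOLS[tool_call["name"]]
    return fn(**json.loads(tool_call["arguments"]))

# Simulate the model asking to look up a customer.
result = dispatch({"name": "lookup_customer",
                   "arguments": json.dumps({"customer_id": "c-1"})})
```

The same dispatch shape covers the CRM, database, and calendar examples in the table: only the registered functions change.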
Exam tip: Memory vs knowledge
The exam distinguishes between memory (what the agent remembers from conversations) and knowledge (external data the agent can search). A common trap:
- “The agent needs to remember the user’s preferences across sessions” → Persistent memory
- “The agent needs to answer questions about company policies” → Knowledge integration (Foundry IQ or Search)
Memory is about the conversation. Knowledge is about the data.
Knowledge check
Atlas Financial needs to search 50,000 regulatory documents. Compliance officers type natural-language questions like “What are the capital requirements for commercial lending?” but also search for specific regulation numbers like “Basel III Section 4.2”. Which search method should they use?
MediaForge's content agent needs to remember each client's brand guidelines across multiple conversations over weeks. Which type of memory should they implement?