
AI-300 Study Guide

Domain 1: Design and Implement an MLOps Infrastructure

  • ML Workspace: Your AI Control Room Free
  • Data, Environments & Components
  • Compute Targets: Choosing the Right Engine
  • Infrastructure as Code: Provisioning at Scale
  • Git & CI/CD for ML Projects

Domain 2: Implement Machine Learning Model Lifecycle and Operations

  • MLflow: Track Every Experiment Free
  • AutoML & Hyperparameter Tuning
  • Training Pipelines: Automate Everything
  • Distributed Training: Scale to Big Data
  • Model Registration & Versioning
  • Model Approval & Responsible AI Gates
  • Deploying Models: Endpoints in Production
  • Drift, Monitoring & Retraining

Domain 3: Design and Implement a GenAIOps Infrastructure

  • Foundry: Hubs, Projects & Platform Setup Free
  • Network Security & IaC for Foundry
  • Deploying Foundation Models
  • Model Versioning & Production Strategies
  • PromptOps: Design, Compare, Version & Ship

Domain 4: Implement Generative AI Quality Assurance and Observability

  • Evaluation: Datasets, Metrics & Quality Gates Free
  • Safety Evaluations & Custom Metrics
  • Monitoring GenAI in Production
  • Cost Tracking, Logging & Debugging

Domain 5: Optimize Generative AI Systems and Model Performance

  • RAG Optimization: Better Retrieval, Better Answers Free
  • Embeddings & Hybrid Search
  • Fine-Tuning: Methods, Data & Production

Domain 5: Optimize Generative AI Systems and Model Performance (~14 min read)

Embeddings & Hybrid Search

Vector search alone isn't enough. Learn to select embedding models, implement hybrid search combining semantic and keyword retrieval, and optimize for domain-specific accuracy.

How embeddings power search

☕ Simple explanation

Embeddings translate words into coordinates on a map.

Imagine a giant map where every word and sentence has a location. “Dog” and “puppy” are neighbours. “Dog” and “refrigerator” are on opposite sides of the map. “Bank” (money) and “bank” (river) are in completely different neighbourhoods.

When you search, the system finds your query’s location on the map and returns whatever is closest. “What’s the refund policy?” lands near documents about returns, refunds, and exchanges — even if those documents don’t use the exact word “refund.”

That’s the magic: embeddings understand meaning, not just matching words.

An embedding is a dense vector representation of text in a high-dimensional space (typically 256-3072 dimensions). Key properties:

  • Semantic similarity — texts with similar meaning have vectors that are close together (measured by cosine similarity)
  • Fixed dimensions — regardless of input length, the output vector has the same number of dimensions
  • Learned representations — embedding models are trained on massive text corpora to capture meaning

In RAG, embeddings power the retrieval step: documents are embedded at index time, queries are embedded at search time, and the closest vectors are returned as context.
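To make "close together" concrete, here is a minimal cosine-similarity sketch in plain Python. The three-dimensional vectors are toy stand-ins invented for illustration, not real embeddings:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = dot(a, b) / (|a| * |b|); near 1.0 means same direction (similar meaning)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real ones have hundreds or thousands of dimensions)
dog = [0.9, 0.1, 0.0]
puppy = [0.8, 0.2, 0.1]
fridge = [0.0, 0.1, 0.9]

print(cosine_similarity(dog, puppy))   # high: "dog" and "puppy" are neighbours
print(cosine_similarity(dog, fridge))  # low: opposite sides of the map
```

In production the vectors come from the embeddings API rather than being hand-written, but the distance calculation is exactly this; a vector database just performs it at scale over millions of vectors.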

Embedding model selection

Azure OpenAI offers several embedding models with different trade-offs:

Azure OpenAI embedding models

Model | Dimensions | Max input | Relative quality | Cost
text-embedding-3-small | 1536 | 8191 tokens | Good — general purpose | Lowest
text-embedding-3-large | 3072 | 8191 tokens | Best — highest accuracy | ~6x more than small
text-embedding-ada-002 | 1536 | 8191 tokens | Legacy — still works | Between small and large

Choosing the right model

Use case | Recommended model | Why
General-purpose chatbot | text-embedding-3-small | Good quality, lowest cost, fast
High-accuracy domain search | text-embedding-3-large | Best quality, worth the cost for critical apps
Budget-constrained, high volume | text-embedding-3-small (reduced dimensions) | Can reduce to 256 dims for cost savings
Scientific/medical domain | Domain-specific model or fine-tuned | General models miss specialised terminology

Dimensionality trade-off

The text-embedding-3 models support dimension reduction — you can request fewer dimensions to save storage and speed up search:

from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="your-key",
    api_version="2024-06-01",
    azure_endpoint="https://your-resource.openai.azure.com"
)

# Full dimensions (1536)
response_full = client.embeddings.create(
    model="text-embedding-3-small",
    input="What is the refund policy?",
)
# response_full.data[0].embedding → 1536-dim vector

# Reduced dimensions (256) — smaller, faster, slightly less accurate
response_small = client.embeddings.create(
    model="text-embedding-3-small",
    input="What is the refund policy?",
    dimensions=256,
)
# response_small.data[0].embedding → 256-dim vector

What’s happening:

  • The first call uses the model’s default size and returns a 1536-dimensional vector
  • The second call adds dimensions=256, so the same model returns a 256-dimensional vector
  • Fewer dimensions = smaller index, faster search, but slightly lower accuracy
  • For most applications, 512-1024 dimensions provide a good balance

💡 Exam tip: Dimensionality affects quality AND cost

The exam tests the trade-off:

  • Higher dimensions = better quality, larger index, slower search, more storage cost
  • Lower dimensions = slightly lower quality, smaller index, faster search, less storage
  • text-embedding-3-small at 256 dimensions can be sufficient for many use cases while being significantly cheaper to store and search
  • You CANNOT increase dimensions beyond the model’s maximum (1536 for small, 3072 for large)

If a question asks how to reduce search latency or storage cost without changing models, the answer is reduce embedding dimensions.
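The storage saving is easy to estimate. A back-of-envelope sketch, assuming float32 vectors and a hypothetical index of one million chunks (index overhead ignored):

```python
def index_size_gb(num_vectors, dims, bytes_per_float=4):
    # float32 vectors cost dims * 4 bytes each; real indexes add some overhead on top
    return num_vectors * dims * bytes_per_float / 1e9

full = index_size_gb(1_000_000, 1536)
reduced = index_size_gb(1_000_000, 256)
print(f"{full:.2f} GB -> {reduced:.2f} GB")  # 6.14 GB -> 1.02 GB, a 6x reduction
```

The ratio is linear in the dimension count, which is why dropping from 1536 to 256 dimensions cuts both storage cost and per-query distance computations by the same factor.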

Vector search vs keyword search

Neither vector search nor keyword search is universally better — they excel at different things:

Vector search vs keyword search
Approach | Strengths | Weaknesses | Best for
Vector search | Understands meaning and synonyms; finds semantically similar results even with different words | Misses exact terms (product codes, IDs, names); can match unrelated content with surface-level similarity | Natural language questions, conceptual queries
Keyword search (BM25) | Exact term matching; great for codes, names, specific phrases; fast and well understood | Misses synonyms and paraphrases; "car" won't match "automobile" | Specific lookups, product codes, legal citations

Example of where each fails:

Query | Vector search | Keyword search
"How do I return an item?" | Finds documents about refund policy, returns process, exchanges (correct) | Only finds docs containing "return" and "item" (misses docs about "refund process")
"Policy ABC-2024-Q3" | Might return any policy document (wrong) | Finds the exact policy by ID (correct)
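The first failure mode is easy to reproduce with a toy token-overlap scorer. This is far simpler than real BM25, which also weights term rarity and document length, but the blind spot is the same:

```python
def keyword_overlap(query, doc):
    # Fraction of query tokens that literally appear in the document
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q)

# Relevant document, zero shared tokens: pure keyword matching scores it 0
print(keyword_overlap("how do i return an item", "our refund process takes 5 days"))  # 0.0

# Exact identifier: keyword matching nails it
print(keyword_overlap("policy abc-2024-q3", "policy abc-2024-q3 details"))  # 1.0
```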

Hybrid search: the best of both worlds

Hybrid search combines vector (semantic) and keyword (BM25) retrieval, then merges the results. This almost always outperforms either approach alone.

How hybrid search works

  1. Vector search: embed the query, find the top N semantically similar chunks
  2. Keyword search (BM25): run the same query as a text search, find the top N keyword matches
  3. Merge results: combine both result sets using a fusion algorithm
  4. Return top K: the merged, re-ordered results become the context for the LLM

Reciprocal Rank Fusion (RRF)

RRF is the most common algorithm for merging hybrid search results. It scores each document based on its rank in each result set:

RRF score = sum of 1 / (k + rank) for each result set

Where k is a constant (typically 60). Documents that appear high in BOTH result sets get the best combined score.

Example:

Document | Vector rank | Keyword rank | RRF score
Doc A | 1 | 5 | 1/61 + 1/65 = 0.0318
Doc B | 3 | 1 | 1/63 + 1/61 = 0.0323
Doc C | 2 | 8 | 1/62 + 1/68 = 0.0308

Doc B wins — it ranked well in both searches. Doc A was the best vector match but only 5th in keywords. Doc B was the best keyword match AND 3rd in vector search, giving it the highest combined relevance.
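The example above can be reproduced in a few lines. A sketch of the RRF formula with the standard k = 60:

```python
def rrf_score(ranks, k=60):
    # Sum 1 / (k + rank) over each result set the document appears in
    return sum(1 / (k + r) for r in ranks)

# (vector rank, keyword rank) pairs from the example above
docs = {"Doc A": (1, 5), "Doc B": (3, 1), "Doc C": (2, 8)}
for name, ranks in docs.items():
    print(f"{name}: {rrf_score(ranks):.4f}")
# Doc B wins with 0.0323: ranking well in both sets beats being best in only one
```

Note the role of k: it damps the influence of top ranks, so a document at rank 1 in one list cannot crowd out a document that is merely good in both lists.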

Configuring hybrid search in Azure AI Search

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

search_client = SearchClient(
    endpoint="https://your-search.search.windows.net",
    index_name="documents-index",
    credential=AzureKeyCredential("your-admin-key"),
)

# query_embedding: the query text embedded with the same model used at index time
results = search_client.search(
    search_text="What is the refund policy?",     # Keyword search
    vector_queries=[
        VectorizedQuery(
            vector=query_embedding,               # Vector search
            k_nearest_neighbors=10,
            fields="content_vector",
        )
    ],
    query_type="semantic",                        # Enable semantic ranking
    semantic_configuration_name="semantic-config",
    top=5,
)

for result in results:
    print(f"Score: {result['@search.score']:.4f} | {result['title']}")

for result in results:
    print(f"Score: {result['@search.score']:.4f} | {result['title']}")

What’s happening:

  • search_text triggers keyword (BM25) search
  • vector_queries runs vector search using the pre-computed query embedding
  • query_type="semantic" with a semantic configuration enables an additional semantic re-ranking layer
  • Azure AI Search automatically fuses the keyword and vector results using RRF
  • top=5 returns the best 5 combined results

Scenario: Dr. Luca uses domain-specific embeddings for genomics

Dr. Luca Bianchi at GenomeVault is building a RAG system over 50,000 genomics research papers. The general-purpose text-embedding-3-large model misses critical matches:

Problem: Searching for “BRCA1 mutation pathogenicity” returns papers about general cancer genetics but misses papers that use the term “BRCA1 variant of uncertain significance (VUS)” — semantically related but using different terminology.

Root cause: General embeddings don’t understand that “pathogenicity” and “variant of uncertain significance” are closely related in genomics. In general English, these phrases have no connection.

Solution: Dr. Luca evaluates two approaches:

Approach | Relevance score | Latency | Cost
General embeddings (text-embedding-3-large) | 3.4 | 120 ms | Baseline
Hybrid search (general embeddings + BM25) | 4.1 | 150 ms | +25%

Prof. Sarah Lin approves the hybrid approach as the best cost-quality balance. The keyword search catches exact gene names (BRCA1, TP53) that vector search sometimes misses, while vector search catches semantic relationships that keywords miss.

For future work, Dr. Luca may explore training a custom embedding model with sentence-transformers on GenomeVault’s paper corpus — but hybrid search provides immediate improvement without that investment.

Domain-specific embedding optimization

When general embeddings don’t understand your domain’s terminology, you have several options — but fine-tuning Azure OpenAI embedding models directly is NOT one of them.

Azure OpenAI embedding models (text-embedding-3-small, text-embedding-3-large) are pre-trained and NOT fine-tunable. To optimize for your domain, you can: (1) use a larger embedding model for better accuracy, (2) adjust the dimensions parameter to trade accuracy for cost, (3) train a custom embedding model outside Azure OpenAI using frameworks like sentence-transformers.

When to optimize embeddings

Condition | Recommended approach | Details
General domain, standard vocabulary | No optimization needed | Use text-embedding-3-small/large
Specialised terminology (medical, legal, scientific) | Hybrid search first | Combine vector + keyword to catch domain terms
Critical accuracy requirements AND specialised vocab | Custom model (sentence-transformers) | Train outside Azure OpenAI on domain pairs
Limited data (under 1000 examples) | Hybrid search + prompt engineering | Most cost-effective approach

💡 Exam tip: Hybrid search almost always outperforms pure approaches

Key exam takeaway: hybrid search (vector + keyword) almost always outperforms either approach alone. This is well-established in information retrieval research.

If a question asks how to improve retrieval quality, and hybrid search is an option, it’s very likely the correct answer. The only exception is if the question specifically asks about reducing latency or simplifying architecture — in those cases, pure vector search is simpler.

Also remember: embedding dimensionality affects both quality AND cost. Higher dimensions = better quality but more storage and slower search.

Key terms flashcards

Question

What is an embedding?

Answer

A dense vector representation of text in a high-dimensional space (256-3072 dimensions). Texts with similar meaning have vectors that are close together (measured by cosine similarity). Powers the retrieval step in RAG.

Question

What is hybrid search?

Answer

Combining vector search (semantic, understands meaning) with keyword search (BM25, exact matching) and merging results using Reciprocal Rank Fusion (RRF). Almost always outperforms either approach alone.

Question

What is Reciprocal Rank Fusion (RRF)?

Answer

An algorithm for merging results from multiple search methods. Scores each document as sum of 1/(k+rank) across result sets. Documents that rank well in BOTH vector and keyword search get the highest combined score.

Question

When should you optimize embedding models for your domain?

Answer

Azure OpenAI embeddings are NOT fine-tunable. Instead: (1) try hybrid search first (vector + keyword), (2) use a larger model (3-large), (3) adjust dimensions for cost/quality trade-off, (4) for critical domains, train a custom model with sentence-transformers outside Azure OpenAI.

Knowledge check

Dr. Luca's search for 'TP53 loss of function' using pure vector search returns papers about general protein function loss but misses papers about 'TP53 tumour suppressor inactivation.' Adding keyword search fixes this. Why?

Zara needs to reduce the storage cost of Atlas's vector index by 60% without changing the embedding model. What should she do?

Next up: Fine-Tuning: Methods, Data & Production — the last resort when prompting and RAG aren’t enough.


© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.