πŸ”’ Guided

Pre-launch preview. Authorised access only.

Incorrect code

Guided by A Guide to Cloud
Explore AB-900 AI-901
Guided DP-800 Domain 3
Domain 3 β€” Module 5 of 7 71%
26 of 28 overall

DP-800 Study Guide

Domain 1: Design and Develop Database Solutions

  • Table Design: Columns, Types, and Indexes Free
  • Constraints: Protecting Your Data Free
  • Specialised Tables and Graph Queries Free
  • JSON in SQL: Store, Query, and Index
  • Programmability: Views, Functions, Procedures, and Triggers
  • Advanced T-SQL: CTEs, Windows, and Correlated Queries
  • Pattern Power: Regular Expressions in T-SQL
  • Fuzzy String Matching: Finding Similar Text
  • AI-Assisted SQL with GitHub Copilot Free
  • MCP: Connecting AI to Your Database

Domain 2: Secure, Optimize, and Deploy Database Solutions

  • Encryption: Always Encrypted and Column-Level
  • Dynamic Data Masking and Row-Level Security
  • Permissions, Auditing, and Passwordless Access
  • Transaction Isolation and Concurrency
  • Query Performance: Plans, DMVs, and Query Store
  • SQL Database Projects: Build and Validate
  • CI/CD Pipelines for SQL Databases
  • Data API Builder: REST and GraphQL from SQL
  • Securing AI and API Endpoints
  • Change Detection: CES, CDC, and Change Tracking
  • Azure Integration and Monitoring

Domain 3: Implement AI Capabilities in Database Solutions

  • External AI Models: Choose, Create, Manage
  • Embeddings: Design, Chunk, and Generate
  • Choose Your Search Strategy
  • Vector Data: Types, Indexes, and Storage
  • Vector Search: Distance, ANN, and ENN
  • Hybrid Search and Reciprocal Rank Fusion
  • RAG with SQL: Prompt, Process, Respond

DP-800 Study Guide

Domain 1: Design and Develop Database Solutions

  • Table Design: Columns, Types, and Indexes Free
  • Constraints: Protecting Your Data Free
  • Specialised Tables and Graph Queries Free
  • JSON in SQL: Store, Query, and Index
  • Programmability: Views, Functions, Procedures, and Triggers
  • Advanced T-SQL: CTEs, Windows, and Correlated Queries
  • Pattern Power: Regular Expressions in T-SQL
  • Fuzzy String Matching: Finding Similar Text
  • AI-Assisted SQL with GitHub Copilot Free
  • MCP: Connecting AI to Your Database

Domain 2: Secure, Optimize, and Deploy Database Solutions

  • Encryption: Always Encrypted and Column-Level
  • Dynamic Data Masking and Row-Level Security
  • Permissions, Auditing, and Passwordless Access
  • Transaction Isolation and Concurrency
  • Query Performance: Plans, DMVs, and Query Store
  • SQL Database Projects: Build and Validate
  • CI/CD Pipelines for SQL Databases
  • Data API Builder: REST and GraphQL from SQL
  • Securing AI and API Endpoints
  • Change Detection: CES, CDC, and Change Tracking
  • Azure Integration and Monitoring

Domain 3: Implement AI Capabilities in Database Solutions

  • External AI Models: Choose, Create, Manage
  • Embeddings: Design, Chunk, and Generate
  • Choose Your Search Strategy
  • Vector Data: Types, Indexes, and Storage
  • Vector Search: Distance, ANN, and ENN
  • Hybrid Search and Reciprocal Rank Fusion
  • RAG with SQL: Prompt, Process, Respond
Domain 3: Implement AI Capabilities in Database Solutions Premium ⏱ ~13 min read

Vector Search: Distance, ANN, and ENN

Implement vector search with VECTOR_DISTANCE and VECTOR_SEARCH β€” understand approximate vs exact nearest neighbours and evaluate search performance.

Finding the closest vectors

β˜• Simple explanation

Imagine finding the nearest coffee shop.

Exact nearest neighbour (ENN) checks every single coffee shop in the city and guarantees the absolute closest one. Accurate, but slow if there are 100,000 shops. Approximate nearest neighbour (ANN) uses a smart shortcut β€” it checks only nearby neighbourhoods and finds a coffee shop that is very likely the closest (99%+ accuracy) but in a fraction of the time.

For most applications, ANN is the right choice β€” it is dramatically faster and almost always finds the correct answer.

ANN (Approximate Nearest Neighbour) uses index structures like DiskANN to find the top-K most similar vectors without scanning every row. It trades a tiny amount of accuracy for massive speed improvements. ENN (Exact Nearest Neighbour) performs a brute-force scan guaranteeing perfect results but at much higher cost. The VECTOR_SEARCH function supports both modes.

ANN vs ENN

ANN vs ENN β€” ANN is correct for most production scenarios
FeatureANN (Approximate)ENN (Exact)
Accuracy~95-99% recall (may miss a few results)100% β€” guaranteed correct top-K
SpeedMilliseconds (uses vector index)Seconds to minutes (full scan)
Requires index?Yes β€” DiskANN vector indexNo β€” works without index
Scales toBillions of vectorsThousands to low millions
Use whenProduction search, real-time queriesSmall datasets, validation, accuracy-critical analysis

Implementing vector search

VECTOR_SEARCH function

-- ANN search: find top 10 similar products (uses vector index)
SELECT vs.ProductId, vs.Name, vs.distance
FROM VECTOR_SEARCH(
    Products,                    -- table
    'DescriptionEmbedding',     -- vector column
    @queryVector,               -- query vector
    10,                         -- top K results
    'cosine'                    -- distance metric
) AS vs;

-- ENN search: exact nearest neighbours (brute force)
SELECT TOP 10 ProductId, Name,
    VECTOR_DISTANCE('cosine', DescriptionEmbedding, @queryVector) AS Distance
FROM Products
ORDER BY Distance ASC;

Combining vector search with filters

-- Search within a specific category (pre-filter)
SELECT vs.ProductId, vs.Name, vs.distance
FROM VECTOR_SEARCH(
    (SELECT * FROM Products WHERE CategoryId = 5),  -- pre-filtered
    'DescriptionEmbedding',
    @queryVector,
    10,
    'cosine'
) AS vs;

-- Post-filter: search first, then filter
SELECT * FROM (
    SELECT vs.ProductId, vs.Name, vs.distance, p.Price
    FROM VECTOR_SEARCH(Products, 'DescriptionEmbedding', @queryVector, 50, 'cosine') vs
    INNER JOIN Products p ON vs.ProductId = p.ProductId
) filtered
WHERE Price < 100
ORDER BY distance ASC;
πŸ’‘ Exam tip: Pre-filter vs post-filter

Pre-filtering applies WHERE clauses BEFORE vector search β€” the ANN index searches only within the filtered subset. More accurate but may not use the vector index efficiently if the subset is small.

Post-filtering runs vector search on the full dataset and filters results afterward. Simpler and uses the index fully, but you may need to request more results (top 50) to get enough after filtering (top 10).

The exam may present a scenario where vector search returns irrelevant results from the wrong category β€” the fix is adding a pre-filter or post-filter.

Evaluating search performance

Key metrics

MetricWhat It MeasuresTarget
Recall@KProportion of true top-K results found by ANNgreater than 95%
Latency (p50/p95)Query response timeless than 100ms for production
Queries per secondThroughput under loadDepends on workload
Index build timeTime to create/rebuild vector indexHours for millions of vectors

Testing recall

-- Compare ANN results with ENN (ground truth) for a sample
-- 1. Get exact top 10 (ENN)
SELECT TOP 10 ProductId, VECTOR_DISTANCE('cosine', DescriptionEmbedding, @queryVector) AS Distance
INTO #ENN_Results FROM Products ORDER BY Distance ASC;

-- 2. Get approximate top 10 (ANN)
SELECT ProductId, distance INTO #ANN_Results
FROM VECTOR_SEARCH(Products, 'DescriptionEmbedding', @queryVector, 10, 'cosine');

-- 3. Calculate recall
SELECT CAST(COUNT(*) AS FLOAT) / 10 AS Recall
FROM #ANN_Results a
WHERE EXISTS (SELECT 1 FROM #ENN_Results e WHERE e.ProductId = a.ProductId);
πŸ’‘ Scenario: Leo tests search quality at SearchWave

Leo runs a recall test on SearchWave’s product search. For 100 random queries, he compares ANN results (top 20) with ENN ground truth (top 20). Average recall is 97% β€” meaning ANN finds 97 out of 100 of the exact nearest neighbours. The 3 missed results are very close in distance (marginal differences). For an e-commerce search, this is excellent β€” users would not notice the difference.

He also measures p95 latency: ANN = 23ms, ENN = 4,200ms. ANN is 180x faster. The decision is clear: ANN for production.

Question

When should you use ENN instead of ANN?

Click or press Enter to reveal answer

Answer

Use ENN (exact nearest neighbour) for: small datasets (under 10K vectors), validation/testing of ANN accuracy, and scenarios where 100% recall is required (e.g., legal discovery). For everything else, ANN is the right choice β€” it is dramatically faster with negligible accuracy loss.

Click to flip back

Question

What is recall@K in vector search?

Click or press Enter to reveal answer

Answer

The proportion of true top-K nearest neighbours that the ANN algorithm actually returns. Recall@10 = 0.95 means ANN found 9.5 out of 10 exact nearest neighbours on average. Higher recall = more accurate but potentially slower.

Click to flip back

Question

What is the difference between VECTOR_SEARCH and VECTOR_DISTANCE?

Click or press Enter to reveal answer

Answer

VECTOR_SEARCH is a table-valued function that uses the DiskANN index for fast ANN search β€” returns top-K results. VECTOR_DISTANCE is a scalar function that calculates the distance between two specific vectors β€” used for ENN (brute force with ORDER BY) or for scoring individual results.

Click to flip back

Knowledge Check

Priya at Vault Bank has 10 million embeddings and needs sub-50ms search latency. She runs a recall test and gets 98% recall with ANN. The security team asks: 'Can you guarantee you always find the most similar record?' What should Priya explain?

🎬 Video coming soon

Next up: Hybrid Search and Reciprocal Rank Fusion β€” combine keyword and vector search for the best of both worlds.

← Previous

Vector Data: Types, Indexes, and Storage

Next β†’

Hybrid Search and Reciprocal Rank Fusion

Guided

I learn, I simplify, I share.

A Guide to Cloud YouTube Feedback

© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.