πŸ”’ Guided

Pre-launch preview. Authorised access only.

Incorrect code

Guided by A Guide to Cloud
Explore AB-900 AI-901
Guided DP-800 Domain 3
Domain 3 β€” Module 4 of 7 57%
25 of 28 overall

DP-800 Study Guide

Domain 1: Design and Develop Database Solutions

  • Table Design: Columns, Types, and Indexes Free
  • Constraints: Protecting Your Data Free
  • Specialised Tables and Graph Queries Free
  • JSON in SQL: Store, Query, and Index
  • Programmability: Views, Functions, Procedures, and Triggers
  • Advanced T-SQL: CTEs, Windows, and Correlated Queries
  • Pattern Power: Regular Expressions in T-SQL
  • Fuzzy String Matching: Finding Similar Text
  • AI-Assisted SQL with GitHub Copilot Free
  • MCP: Connecting AI to Your Database

Domain 2: Secure, Optimize, and Deploy Database Solutions

  • Encryption: Always Encrypted and Column-Level
  • Dynamic Data Masking and Row-Level Security
  • Permissions, Auditing, and Passwordless Access
  • Transaction Isolation and Concurrency
  • Query Performance: Plans, DMVs, and Query Store
  • SQL Database Projects: Build and Validate
  • CI/CD Pipelines for SQL Databases
  • Data API Builder: REST and GraphQL from SQL
  • Securing AI and API Endpoints
  • Change Detection: CES, CDC, and Change Tracking
  • Azure Integration and Monitoring

Domain 3: Implement AI Capabilities in Database Solutions

  • External AI Models: Choose, Create, Manage
  • Embeddings: Design, Chunk, and Generate
  • Choose Your Search Strategy
  • Vector Data: Types, Indexes, and Storage
  • Vector Search: Distance, ANN, and ENN
  • Hybrid Search and Reciprocal Rank Fusion
  • RAG with SQL: Prompt, Process, Respond

DP-800 Study Guide

Domain 1: Design and Develop Database Solutions

  • Table Design: Columns, Types, and Indexes Free
  • Constraints: Protecting Your Data Free
  • Specialised Tables and Graph Queries Free
  • JSON in SQL: Store, Query, and Index
  • Programmability: Views, Functions, Procedures, and Triggers
  • Advanced T-SQL: CTEs, Windows, and Correlated Queries
  • Pattern Power: Regular Expressions in T-SQL
  • Fuzzy String Matching: Finding Similar Text
  • AI-Assisted SQL with GitHub Copilot Free
  • MCP: Connecting AI to Your Database

Domain 2: Secure, Optimize, and Deploy Database Solutions

  • Encryption: Always Encrypted and Column-Level
  • Dynamic Data Masking and Row-Level Security
  • Permissions, Auditing, and Passwordless Access
  • Transaction Isolation and Concurrency
  • Query Performance: Plans, DMVs, and Query Store
  • SQL Database Projects: Build and Validate
  • CI/CD Pipelines for SQL Databases
  • Data API Builder: REST and GraphQL from SQL
  • Securing AI and API Endpoints
  • Change Detection: CES, CDC, and Change Tracking
  • Azure Integration and Monitoring

Domain 3: Implement AI Capabilities in Database Solutions

  • External AI Models: Choose, Create, Manage
  • Embeddings: Design, Chunk, and Generate
  • Choose Your Search Strategy
  • Vector Data: Types, Indexes, and Storage
  • Vector Search: Distance, ANN, and ENN
  • Hybrid Search and Reciprocal Rank Fusion
  • RAG with SQL: Prompt, Process, Respond
Domain 3: Implement AI Capabilities in Database Solutions Premium ⏱ ~13 min read

Vector Data: Types, Indexes, and Storage

Design your database for vector data β€” understand the VECTOR data type, choose index types and distance metrics, and use VECTOR_NORMALIZE and VECTORPROPERTY.

Storing vectors in SQL

β˜• Simple explanation

Think of GPS coordinates for every piece of data.

Regular data has an address (a primary key). Vector data has GPS coordinates in a high-dimensional space β€” except instead of latitude/longitude (2 dimensions), embeddings have 256 to 3,072 dimensions. SQL Server 2025 adds a native VECTOR data type to store these coordinates efficiently, and vector indexes to find nearby points quickly.

SQL Server 2025 introduces the VECTOR data type for storing fixed-dimension floating-point arrays (embeddings). Combined with vector indexes (DiskANN-based), distance functions (VECTOR_DISTANCE), and search functions (VECTOR_SEARCH), SQL Server becomes a native vector database alongside its relational capabilities.

The VECTOR data type

-- Declare a vector column with specific dimensions
CREATE TABLE Products (
    ProductId INT NOT NULL PRIMARY KEY,
    Name NVARCHAR(200) NOT NULL,
    DescriptionEmbedding VECTOR(1536) NULL  -- 1536-dimension vector
);

-- Insert a vector value (JSON array format)
INSERT INTO Products (ProductId, Name, DescriptionEmbedding)
VALUES (1, 'Wireless Mouse', '[0.0123, -0.0456, 0.0789, ...]');

Choosing vector dimensions

Embedding ModelDimensionsStorage Per VectorQuality
text-embedding-3-small1536~6 KBGood for most use cases
text-embedding-3-large3072~12 KBHigher quality, more storage
text-embedding-ada-0021536~6 KBLegacy, still widely used
Custom/smaller models256-7681-3 KBFaster search, lower quality

Vector functions

VECTOR_NORMALIZE

Normalises a vector to unit length (magnitude = 1). Essential before using cosine distance.

-- Normalise vectors for consistent distance calculations
UPDATE Products
SET DescriptionEmbedding = VECTOR_NORMALIZE(DescriptionEmbedding, 'norm2');

VECTORPROPERTY

Inspects vector metadata:

-- Get the number of dimensions
SELECT VECTORPROPERTY(DescriptionEmbedding, 'Dimensions') AS Dims
FROM Products WHERE ProductId = 1;
-- Returns: 1536

VECTOR_DISTANCE

Calculates the distance between two vectors:

-- Find the 5 most similar products to a query vector
SELECT TOP 5 ProductId, Name,
    VECTOR_DISTANCE('cosine', DescriptionEmbedding, @queryVector) AS Distance
FROM Products
ORDER BY Distance ASC;  -- Lower distance = more similar

Distance metrics

Distance metrics β€” cosine is the default for text embeddings
MetricRangeInterpretationBest For
Cosine0 to 20 = identical direction, 2 = oppositeText embeddings (most common)
Euclidean (L2)0 to infinity0 = same point, larger = fartherWhen magnitude matters (image features)
Dot product-infinity to infinityHigher = more similar (for normalised vectors)Normalised vectors, high-performance search

Vector indexes

Without an index, vector search requires scanning every row (brute force). Vector indexes enable fast approximate nearest neighbour (ANN) search.

-- Create a vector index
CREATE VECTOR INDEX IX_Products_Embedding
ON Products (DescriptionEmbedding)
WITH (
    METRIC = 'cosine',
    TYPE = 'DISKANN'
);

DiskANN index

SQL Server uses DiskANN (Disk-based Approximate Nearest Neighbours) β€” Microsoft Research’s algorithm optimised for SSD-based storage. Key properties:

  • Handles billions of vectors
  • Works with data larger than memory
  • Configurable accuracy vs speed trade-off
πŸ’‘ Exam tip: Vector index sizing

Vector data is large. Plan storage carefully:

  • 1 million rows x 1536 dimensions x 4 bytes = ~6 GB just for vectors
  • Add the DiskANN index overhead (~20-30% additional)
  • Total: ~8 GB for 1M vectors with 1536 dimensions

If the exam asks about storage planning for vector data, remember to account for both the vector column AND the index overhead. Smaller dimensions (256-768) significantly reduce storage requirements.

Question

What is the VECTOR data type in SQL Server 2025?

Click or press Enter to reveal answer

Answer

A native data type that stores fixed-dimension floating-point arrays (embeddings). Declared as VECTOR(N) where N is the number of dimensions (e.g., VECTOR(1536)). Values are stored in an optimised binary format and inserted as JSON arrays.

Click to flip back

Question

Why should you normalise vectors before searching?

Click or press Enter to reveal answer

Answer

VECTOR_NORMALIZE converts vectors to unit length (magnitude = 1). This ensures cosine distance works correctly β€” without normalisation, vectors with different magnitudes may produce misleading distance values even if they point in the same direction.

Click to flip back

Question

What distance metric should you use for text embeddings?

Click or press Enter to reveal answer

Answer

Cosine distance is the standard for text embeddings. It measures the angle between vectors (direction), ignoring magnitude. A cosine distance of 0 means identical direction (most similar), 2 means opposite. Most embedding models are optimised for cosine similarity.

Click to flip back

Question

What is DiskANN?

Click or press Enter to reveal answer

Answer

DiskANN (Disk-based Approximate Nearest Neighbours) is Microsoft Research's vector index algorithm used in SQL Server. It is optimised for SSD storage, handles datasets larger than memory, and provides configurable accuracy vs speed trade-offs. It is the default vector index type.

Click to flip back

Knowledge Check

Priya at Vault Bank is designing a vector search system for 5 million customer support tickets. Each ticket has a 1536-dimension embedding. She needs sub-second search latency. What should she create?

🎬 Video coming soon

Next up: Vector Search: Distance, ANN, and ENN β€” implement vector search with approximate and exact nearest neighbours.

← Previous

Choose Your Search Strategy

Next β†’

Vector Search: Distance, ANN, and ENN

Guided

I learn, I simplify, I share.

A Guide to Cloud YouTube Feedback

© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.