Vector Data: Types, Indexes, and Storage
Design your database for vector data β understand the VECTOR data type, choose index types and distance metrics, and use VECTOR_NORMALIZE and VECTORPROPERTY.
Storing vectors in SQL
Think of GPS coordinates for every piece of data.
Regular data has an address (a primary key). Vector data has GPS coordinates in a high-dimensional space β except instead of latitude/longitude (2 dimensions), embeddings have 256 to 3,072 dimensions. SQL Server 2025 adds a native VECTOR data type to store these coordinates efficiently, and vector indexes to find nearby points quickly.
The VECTOR data type
-- Declare a vector column with specific dimensions
CREATE TABLE Products (
ProductId INT NOT NULL PRIMARY KEY,
Name NVARCHAR(200) NOT NULL,
DescriptionEmbedding VECTOR(1536) NULL -- 1536-dimension vector
);
-- Insert a vector value (JSON array format)
INSERT INTO Products (ProductId, Name, DescriptionEmbedding)
VALUES (1, 'Wireless Mouse', '[0.0123, -0.0456, 0.0789, ...]');
Choosing vector dimensions
| Embedding Model | Dimensions | Storage Per Vector | Quality |
|---|---|---|---|
| text-embedding-3-small | 1536 | ~6 KB | Good for most use cases |
| text-embedding-3-large | 3072 | ~12 KB | Higher quality, more storage |
| text-embedding-ada-002 | 1536 | ~6 KB | Legacy, still widely used |
| Custom/smaller models | 256-768 | 1-3 KB | Faster search, lower quality |
Vector functions
VECTOR_NORMALIZE
Normalises a vector to unit length (magnitude = 1). Essential before using cosine distance.
-- Normalise vectors for consistent distance calculations
UPDATE Products
SET DescriptionEmbedding = VECTOR_NORMALIZE(DescriptionEmbedding, 'norm2');
VECTORPROPERTY
Inspects vector metadata:
-- Get the number of dimensions
SELECT VECTORPROPERTY(DescriptionEmbedding, 'Dimensions') AS Dims
FROM Products WHERE ProductId = 1;
-- Returns: 1536
VECTOR_DISTANCE
Calculates the distance between two vectors:
-- Find the 5 most similar products to a query vector
SELECT TOP 5 ProductId, Name,
VECTOR_DISTANCE('cosine', DescriptionEmbedding, @queryVector) AS Distance
FROM Products
ORDER BY Distance ASC; -- Lower distance = more similar
Distance metrics
| Metric | Range | Interpretation | Best For |
|---|---|---|---|
| Cosine | 0 to 2 | 0 = identical direction, 2 = opposite | Text embeddings (most common) |
| Euclidean (L2) | 0 to infinity | 0 = same point, larger = farther | When magnitude matters (image features) |
| Dot product | -infinity to infinity | Higher = more similar (for normalised vectors) | Normalised vectors, high-performance search |
Vector indexes
Without an index, vector search requires scanning every row (brute force). Vector indexes enable fast approximate nearest neighbour (ANN) search.
-- Create a vector index
CREATE VECTOR INDEX IX_Products_Embedding
ON Products (DescriptionEmbedding)
WITH (
METRIC = 'cosine',
TYPE = 'DISKANN'
);
DiskANN index
SQL Server uses DiskANN (Disk-based Approximate Nearest Neighbours) β Microsoft Researchβs algorithm optimised for SSD-based storage. Key properties:
- Handles billions of vectors
- Works with data larger than memory
- Configurable accuracy vs speed trade-off
Exam tip: Vector index sizing
Vector data is large. Plan storage carefully:
- 1 million rows x 1536 dimensions x 4 bytes = ~6 GB just for vectors
- Add the DiskANN index overhead (~20-30% additional)
- Total: ~8 GB for 1M vectors with 1536 dimensions
If the exam asks about storage planning for vector data, remember to account for both the vector column AND the index overhead. Smaller dimensions (256-768) significantly reduce storage requirements.
Priya at Vault Bank is designing a vector search system for 5 million customer support tickets. Each ticket has a 1536-dimension embedding. She needs sub-second search latency. What should she create?
π¬ Video coming soon
Next up: Vector Search: Distance, ANN, and ENN β implement vector search with approximate and exact nearest neighbours.