🔒 Guided

Pre-launch preview. Authorised access only.

Incorrect code

Guided by A Guide to Cloud
Explore AB-900 AI-901
Guided DP-800 Domain 1
Domain 1 — Module 8 of 10 80%
8 of 28 overall

DP-800 Study Guide

Domain 1: Design and Develop Database Solutions

  • Table Design: Columns, Types, and Indexes Free
  • Constraints: Protecting Your Data Free
  • Specialised Tables and Graph Queries Free
  • JSON in SQL: Store, Query, and Index
  • Programmability: Views, Functions, Procedures, and Triggers
  • Advanced T-SQL: CTEs, Windows, and Correlated Queries
  • Pattern Power: Regular Expressions in T-SQL
  • Fuzzy String Matching: Finding Similar Text
  • AI-Assisted SQL with GitHub Copilot Free
  • MCP: Connecting AI to Your Database

Domain 2: Secure, Optimize, and Deploy Database Solutions

  • Encryption: Always Encrypted and Column-Level
  • Dynamic Data Masking and Row-Level Security
  • Permissions, Auditing, and Passwordless Access
  • Transaction Isolation and Concurrency
  • Query Performance: Plans, DMVs, and Query Store
  • SQL Database Projects: Build and Validate
  • CI/CD Pipelines for SQL Databases
  • Data API Builder: REST and GraphQL from SQL
  • Securing AI and API Endpoints
  • Change Detection: CES, CDC, and Change Tracking
  • Azure Integration and Monitoring

Domain 3: Implement AI Capabilities in Database Solutions

  • External AI Models: Choose, Create, Manage
  • Embeddings: Design, Chunk, and Generate
  • Choose Your Search Strategy
  • Vector Data: Types, Indexes, and Storage
  • Vector Search: Distance, ANN, and ENN
  • Hybrid Search and Reciprocal Rank Fusion
  • RAG with SQL: Prompt, Process, Respond

DP-800 Study Guide

Domain 1: Design and Develop Database Solutions

  • Table Design: Columns, Types, and Indexes Free
  • Constraints: Protecting Your Data Free
  • Specialised Tables and Graph Queries Free
  • JSON in SQL: Store, Query, and Index
  • Programmability: Views, Functions, Procedures, and Triggers
  • Advanced T-SQL: CTEs, Windows, and Correlated Queries
  • Pattern Power: Regular Expressions in T-SQL
  • Fuzzy String Matching: Finding Similar Text
  • AI-Assisted SQL with GitHub Copilot Free
  • MCP: Connecting AI to Your Database

Domain 2: Secure, Optimize, and Deploy Database Solutions

  • Encryption: Always Encrypted and Column-Level
  • Dynamic Data Masking and Row-Level Security
  • Permissions, Auditing, and Passwordless Access
  • Transaction Isolation and Concurrency
  • Query Performance: Plans, DMVs, and Query Store
  • SQL Database Projects: Build and Validate
  • CI/CD Pipelines for SQL Databases
  • Data API Builder: REST and GraphQL from SQL
  • Securing AI and API Endpoints
  • Change Detection: CES, CDC, and Change Tracking
  • Azure Integration and Monitoring

Domain 3: Implement AI Capabilities in Database Solutions

  • External AI Models: Choose, Create, Manage
  • Embeddings: Design, Chunk, and Generate
  • Choose Your Search Strategy
  • Vector Data: Types, Indexes, and Storage
  • Vector Search: Distance, ANN, and ENN
  • Hybrid Search and Reciprocal Rank Fusion
  • RAG with SQL: Prompt, Process, Respond
Domain 1: Design and Develop Database Solutions Premium ⏱ ~10 min read

Fuzzy String Matching: Finding Similar Text

Use EDIT_DISTANCE and JARO_WINKLER functions to find strings that are close but not identical — catch typos, match misspelled names, and deduplicate messy data.

When exact matching is not enough

☕ Simple explanation

Imagine searching for “Jon Smith” in a customer database.

An exact search finds nothing — but the customer exists as “John Smith.” A fuzzy match says: “those two strings are 90% similar — probably the same person.” That is fuzzy string matching: measuring how close two strings are, even when they are not identical.

SQL Server 2025 adds built-in functions that calculate the “distance” between strings — how many edits (insertions, deletions, swaps) it takes to turn one into the other. Fewer edits = more similar.

Fuzzy string matching (approximate string matching) uses algorithms to calculate similarity between strings that may differ due to typos, spelling variations, abbreviations, or data entry errors. SQL Server 2025 introduces native functions based on two algorithms: Damerau-Levenshtein (EDIT_DISTANCE) and Jaro-Winkler (JARO_WINKLER_DISTANCE).

These functions are currently in preview and require the PREVIEW_FEATURES database scoped configuration to be enabled. They support CHAR, NCHAR, VARCHAR, and NVARCHAR (not MAX types).

The fuzzy matching functions

Four fuzzy functions — two algorithms, each with distance and similarity variants
FunctionAlgorithmReturnsBest For
EDIT_DISTANCEDamerau-LevenshteinInteger (number of edits)Measuring exact number of changes needed
EDIT_DISTANCE_SIMILARITYDamerau-Levenshtein0-100 (percentage)Comparing similarity as a normalised score
JARO_WINKLER_DISTANCEJaro-WinklerFloat 0.0-1.0 (distance)Matching names and short strings where prefix matters
JARO_WINKLER_SIMILARITYJaro-Winkler0-100 (percentage)Similarity score favouring strings that match from the start

EDIT_DISTANCE: count the changes

EDIT_DISTANCE calculates the minimum number of insertions, deletions, substitutions, and transpositions needed to transform one string into another.

-- How many edits to go from "Colour" to "Color"?
SELECT EDIT_DISTANCE('Colour', 'Color') AS Distance;
-- Returns: 1 (delete the 'u')

-- Find customers whose name is within 2 edits of a search term
SELECT FullName, Email
FROM Customers
WHERE EDIT_DISTANCE(FullName, 'Jon Smith') <= 2;
-- Matches: "John Smith" (1 edit), "Jon Smyth" (2 edits)

EDIT_DISTANCE_SIMILARITY: normalised score

SELECT EDIT_DISTANCE_SIMILARITY('Colour', 'Color') AS Similarity;
-- Returns: 83  (83% similar)

-- Find products with names at least 80% similar to the search
SELECT Name, EDIT_DISTANCE_SIMILARITY(Name, @SearchTerm) AS Score
FROM Products
WHERE EDIT_DISTANCE_SIMILARITY(Name, @SearchTerm) >= 80
ORDER BY Score DESC;

The formula: (1 - (edit_distance / GREATEST(LEN(string1), LEN(string2)))) * 100

JARO_WINKLER: prefix-weighted matching

Jaro-Winkler gives extra weight to strings that match from the beginning — making it ideal for matching names where the first few characters are usually correct.

-- Compare "Martha" and "Marhta" (transposition)
SELECT JARO_WINKLER_DISTANCE('Martha', 'Marhta') AS Distance;
-- Returns: ~0.039 (very close — low distance = high similarity)

SELECT JARO_WINKLER_SIMILARITY('Martha', 'Marhta') AS Similarity;
-- Returns: 96 (96% similar)

When to use which algorithm?

ScenarioBest AlgorithmWhy
Name matching (people, companies)Jaro-WinklerPrefix weighting helps — “John” vs “Jon” scores higher than “ohnJ” vs “Jon”
General typo detectionEdit DistanceCounts all edits equally — good for any position
Address matchingEdit DistanceAddresses have errors anywhere, not just at the end
Short strings (2-5 chars)Jaro-WinklerMore sensitive to small differences
Long strings (50+ chars)Edit DistanceMore meaningful absolute counts
DeduplicationEDIT_DISTANCE_SIMILARITYNormalised score makes threshold setting easier across varying string lengths
💡 Exam tip: Preview features require configuration

Fuzzy matching functions are currently in preview. To use them:

ALTER DATABASE SCOPED CONFIGURATION SET PREVIEW_FEATURES = ON;

The exam may test whether you know this prerequisite. If a question mentions fuzzy functions failing, check whether PREVIEW_FEATURES is enabled.

💡 Scenario: Priya's customer deduplication at Vault Bank

Vault Bank has 2 million customer records from three merged systems. Many are duplicates with slight name variations: “Robert Chen” / “Rob Chen” / “Robert C Chen.” Priya builds a deduplication query:

SELECT a.CustomerId, a.FullName, b.CustomerId, b.FullName,
       EDIT_DISTANCE_SIMILARITY(a.FullName, b.FullName) AS NameScore,
       JARO_WINKLER_SIMILARITY(a.Email, b.Email) AS EmailScore
FROM Customers a
INNER JOIN Customers b ON a.CustomerId < b.CustomerId
WHERE EDIT_DISTANCE_SIMILARITY(a.FullName, b.FullName) >= 85
  AND JARO_WINKLER_SIMILARITY(a.Email, b.Email) >= 90;

She uses both algorithms together — Edit Distance for name similarity and Jaro-Winkler for email prefix matching — to find likely duplicates with high confidence.

Question

What does EDIT_DISTANCE measure?

Click or press Enter to reveal answer

Answer

The minimum number of insertions, deletions, substitutions, and transpositions needed to transform one string into another (Damerau-Levenshtein algorithm). A distance of 0 means identical strings. Lower distance = more similar.

Click to flip back

Question

Why is Jaro-Winkler better than Edit Distance for name matching?

Click or press Enter to reveal answer

Answer

Jaro-Winkler gives extra weight to strings that match from the beginning (prefix weighting). Since people's names usually have the correct first few characters (even with typos), this makes it more accurate for name matching. 'John' vs 'Jon' scores higher with Jaro-Winkler than with Edit Distance.

Click to flip back

Question

What prerequisite is needed to use fuzzy matching functions in SQL Server 2025?

Click or press Enter to reveal answer

Answer

Fuzzy matching functions are in preview and require: ALTER DATABASE SCOPED CONFIGURATION SET PREVIEW_FEATURES = ON; Without this, the functions will not be available.

Click to flip back

Knowledge Check

Ingrid at Nordic Shield Insurance needs to match incoming claim names against the policyholder database. Names often have minor typos but the first few characters are usually correct. Which function is most appropriate?

🎬 Video coming soon

Next up: AI-Assisted SQL with GitHub Copilot — use AI tools to write, debug, and optimise T-SQL code.

← Previous

Pattern Power: Regular Expressions in T-SQL

Next →

AI-Assisted SQL with GitHub Copilot

Guided

I learn, I simplify, I share.

A Guide to Cloud YouTube Feedback

© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.