Pattern Power: Regular Expressions in T-SQL

Pattern matching finally comes to T-SQL

Simple explanation

Think of LIKE as a basic search, and regex as a super-powered search.

LIKE can find “starts with A” or “contains ‘bank’.” But what if you need: “find all phone numbers in any format,” “extract the domain from an email address,” or “replace all credit card numbers with asterisks”? That is what regular expressions (regex) do — they describe patterns, not specific text.

SQL Server 2025 adds seven REGEXP functions. Before this, you had to use CLR assemblies or export data to another language. Now it is all native T-SQL.

The seven REGEXP functions

Seven REGEXP functions — each returns a different result type
Function	Returns	What It Does	Equivalent To
REGEXP_LIKE	Boolean	Does the pattern match?	LIKE but with full regex
REGEXP_REPLACE	String	Replace matched text	REPLACE but pattern-based
REGEXP_SUBSTR	String	Extract matched text	SUBSTRING with pattern matching
REGEXP_INSTR	Integer	Position of match	CHARINDEX with patterns
REGEXP_COUNT	Integer	Count of matches	No simple equivalent
REGEXP_MATCHES	Table	All captured groups as rows	No simple equivalent
REGEXP_SPLIT_TO_TABLE	Table	Split string by pattern into rows	STRING_SPLIT with regex delimiters

REGEXP_LIKE: does it match?

The simplest function — returns TRUE or FALSE.

-- Find customers with valid email format
SELECT FullName, Email
FROM Customers
WHERE REGEXP_LIKE(Email, '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$');

-- Find phone numbers in any format (with or without dashes/spaces)
SELECT * FROM Contacts
WHERE REGEXP_LIKE(Phone, '^\+?\d[\d\s\-]{7,15}$');

Flags

All REGEXP functions accept an optional flags parameter:

Flag	Meaning	Default
`c`	Case-sensitive	Yes (default)
`i`	Case-insensitive	No
`m`	Multiline (^ and $ match line boundaries)	No
`s`	Dot matches newline	No

-- Case-insensitive search
SELECT * FROM Products
WHERE REGEXP_LIKE(Description, 'wireless|bluetooth', 'i');

REGEXP_REPLACE: find and transform

-- Mask credit card numbers: 4111-2222-3333-4444 -> ****-****-****-4444
SELECT REGEXP_REPLACE(CardNumber, '\d{4}-\d{4}-\d{4}-', '****-****-****-') AS Masked
FROM Payments;

-- Standardise phone numbers: remove all non-digits
SELECT REGEXP_REPLACE(Phone, '[^\d+]', '') AS CleanPhone
FROM Contacts;

-- Clean up extra whitespace
SELECT REGEXP_REPLACE(Description, '\s+', ' ') AS Cleaned
FROM Products;

Backreferences in replacements

REGEXP_REPLACE supports captured groups with \1, \2, etc.:

-- Reformat dates: 2026-04-21 -> 21/04/2026
SELECT REGEXP_REPLACE('2026-04-21', '(\d{4})-(\d{2})-(\d{2})', '\3/\2/\1');
-- Returns: 21/04/2026

REGEXP_SUBSTR: extract matching text

-- Extract domain from email
SELECT REGEXP_SUBSTR(Email, '[^@]+$') AS Domain
FROM Customers;

-- Extract first number from a mixed string
SELECT REGEXP_SUBSTR('Order #12345 - Priority', '\d+') AS OrderNumber;
-- Returns: 12345

REGEXP_INSTR: find the position

-- Find position of first digit in a string
SELECT REGEXP_INSTR('ABC-123-DEF', '\d') AS FirstDigitPosition;
-- Returns: 5

REGEXP_COUNT: how many matches?

-- Count how many words are in a description
SELECT Description, REGEXP_COUNT(Description, '\b\w+\b') AS WordCount
FROM Products;

-- Count email addresses in a text block
SELECT REGEXP_COUNT(Notes, '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}') AS EmailCount
FROM SupportTickets;

REGEXP_MATCHES: extract all captured groups

Returns a table of all captured subgroups — one row per match.

-- Extract all key=value pairs from a config string
SELECT match_value
FROM REGEXP_MATCHES('host=db.local;port=1433;user=admin', '(\w+)=([^;]+)');

REGEXP_SPLIT_TO_TABLE: split by pattern

Like STRING_SPLIT but the delimiter is a regex pattern.

-- Split a CSV with inconsistent delimiters (commas, semicolons, pipes)
SELECT value
FROM REGEXP_SPLIT_TO_TABLE('apple,banana;cherry|date', '[,;|]');
-- Returns 4 rows: apple, banana, cherry, date

Scenario: Priya's data cleaning at Vault Bank

Vault Bank receives transaction descriptions in messy formats from various payment processors. Priya uses REGEXP functions to standardise them:

-- Extract merchant name (everything before the first number or special char)
UPDATE Transactions
SET MerchantName = TRIM(REGEXP_SUBSTR(RawDescription, '^[A-Za-z\s]+'));

-- Flag suspicious descriptions containing known fraud patterns
SELECT * FROM Transactions
WHERE REGEXP_LIKE(RawDescription, '(TEST|FRAUD|STOLEN|XXX)', 'i');

-- Count how many transactions have non-ASCII characters
SELECT COUNT(*) FROM Transactions
WHERE REGEXP_LIKE(RawDescription, '[^\x00-\x7F]');

Exam tip: REGEXP_LIKE requires compatibility level 170

REGEXP_LIKE, REGEXP_MATCHES, and REGEXP_SPLIT_TO_TABLE specifically require database compatibility level 170 or above. The other four REGEXP functions work at all compatibility levels. If the exam mentions a database at compatibility level 160, those three will not be available — but REGEXP_REPLACE, REGEXP_SUBSTR, REGEXP_INSTR, and REGEXP_COUNT will still work.

Check and set compatibility level:

-- Check
SELECT compatibility_level FROM sys.databases WHERE name = DB_NAME();

-- Set to 170
ALTER DATABASE [MyDB] SET COMPATIBILITY_LEVEL = 170;

Question

What is the difference between LIKE and REGEXP_LIKE?

Click or press Enter to reveal answer

Answer

LIKE supports only basic wildcards (% for any characters, _ for one character). REGEXP_LIKE supports full regular expressions — character classes, quantifiers, alternation, anchors, and capture groups. REGEXP_LIKE requires compatibility level 170+.

Click to flip back

Question

Which REGEXP functions return a table instead of a scalar value?

Click or press Enter to reveal answer

Answer

REGEXP_MATCHES returns a table of captured groups (one row per match). REGEXP_SPLIT_TO_TABLE returns a table of substrings split by the pattern. Both are used with FROM or CROSS APPLY.

Click to flip back

Question

How do you use backreferences in REGEXP_REPLACE?

Click or press Enter to reveal answer

Answer

Capture groups with parentheses in the pattern, then reference them in the replacement with \\1, \\2, etc. Example: REGEXP_REPLACE('2026-04-21', '(\\d{4})-(\\d{2})-(\\d{2})', '\\3/\\2/\\1') returns '21/04/2026'.

Click to flip back

Knowledge Check

Ingrid at Nordic Shield Insurance needs to find all policy records where the PolicyNumber field does not match the expected format: two uppercase letters followed by a dash and exactly six digits (e.g., NZ-123456). Which query identifies non-conforming records?

Knowledge Check

Dev at PixelForge Studios needs to extract all hashtags (words starting with #) from user-generated content stored in a Description column. Each row may contain zero or more hashtags. Which approach returns one row per hashtag?

Next up: Fuzzy String Matching: Finding Similar Text — use EDIT_DISTANCE and JARO_WINKLER to find strings that are similar but not identical.