Pattern Power: Regular Expressions in T-SQL
Use the brand-new REGEXP functions in SQL Server 2025 to find, replace, extract, count, and split text with powerful pattern matching — directly in your T-SQL queries.
Pattern matching finally comes to T-SQL
Think of LIKE as a basic search, and regex as a super-powered search.
LIKE can find “starts with A” or “contains ‘bank’.” But what if you need: “find all phone numbers in any format,” “extract the domain from an email address,” or “replace all credit card numbers with asterisks”? That is what regular expressions (regex) do — they describe patterns, not specific text.
SQL Server 2025 adds seven REGEXP functions. Before this, you had to use CLR assemblies or export data to another language. Now it is all native T-SQL.
The seven REGEXP functions
| Function | Returns | What It Does | Equivalent To |
|---|---|---|---|
| REGEXP_LIKE | Boolean | Does the pattern match? | LIKE but with full regex |
| REGEXP_REPLACE | String | Replace matched text | REPLACE but pattern-based |
| REGEXP_SUBSTR | String | Extract matched text | SUBSTRING with pattern matching |
| REGEXP_INSTR | Integer | Position of match | CHARINDEX with patterns |
| REGEXP_COUNT | Integer | Count of matches | No simple equivalent |
| REGEXP_MATCHES | Table | All captured groups as rows | No simple equivalent |
| REGEXP_SPLIT_TO_TABLE | Table | Split string by pattern into rows | STRING_SPLIT with regex delimiters |
REGEXP_LIKE: does it match?
The simplest function — returns TRUE or FALSE.
-- Find customers with valid email format
SELECT FullName, Email
FROM Customers
WHERE REGEXP_LIKE(Email, '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$');
-- Find phone numbers in any format (with or without dashes/spaces)
SELECT * FROM Contacts
WHERE REGEXP_LIKE(Phone, '^\+?\d[\d\s\-]{7,15}$');
Flags
All REGEXP functions accept an optional flags parameter:
| Flag | Meaning | Default |
|---|---|---|
c | Case-sensitive | Yes (default) |
i | Case-insensitive | No |
m | Multiline (^ and $ match line boundaries) | No |
s | Dot matches newline | No |
-- Case-insensitive search
SELECT * FROM Products
WHERE REGEXP_LIKE(Description, 'wireless|bluetooth', 'i');
REGEXP_REPLACE: find and transform
-- Mask credit card numbers: 4111-2222-3333-4444 -> ****-****-****-4444
SELECT REGEXP_REPLACE(CardNumber, '\d{4}-\d{4}-\d{4}-', '****-****-****-') AS Masked
FROM Payments;
-- Standardise phone numbers: remove all non-digits
SELECT REGEXP_REPLACE(Phone, '[^\d+]', '') AS CleanPhone
FROM Contacts;
-- Clean up extra whitespace
SELECT REGEXP_REPLACE(Description, '\s+', ' ') AS Cleaned
FROM Products;
Backreferences in replacements
REGEXP_REPLACE supports captured groups with \1, \2, etc.:
-- Reformat dates: 2026-04-21 -> 21/04/2026
SELECT REGEXP_REPLACE('2026-04-21', '(\d{4})-(\d{2})-(\d{2})', '\3/\2/\1');
-- Returns: 21/04/2026
REGEXP_SUBSTR: extract matching text
-- Extract domain from email
SELECT REGEXP_SUBSTR(Email, '@(.+)$') AS Domain
FROM Customers;
-- Extract first number from a mixed string
SELECT REGEXP_SUBSTR('Order #12345 - Priority', '\d+') AS OrderNumber;
-- Returns: 12345
REGEXP_INSTR: find the position
-- Find position of first digit in a string
SELECT REGEXP_INSTR('ABC-123-DEF', '\d') AS FirstDigitPosition;
-- Returns: 5
REGEXP_COUNT: how many matches?
-- Count how many words are in a description
SELECT Description, REGEXP_COUNT(Description, '\b\w+\b') AS WordCount
FROM Products;
-- Count email addresses in a text block
SELECT REGEXP_COUNT(Notes, '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}') AS EmailCount
FROM SupportTickets;
REGEXP_MATCHES: extract all captured groups
Returns a table of all captured subgroups — one row per match.
-- Extract all key=value pairs from a config string
SELECT match_value
FROM REGEXP_MATCHES('host=db.local;port=1433;user=admin', '(\w+)=([^;]+)');
REGEXP_SPLIT_TO_TABLE: split by pattern
Like STRING_SPLIT but the delimiter is a regex pattern.
-- Split a CSV with inconsistent delimiters (commas, semicolons, pipes)
SELECT value
FROM REGEXP_SPLIT_TO_TABLE('apple,banana;cherry|date', '[,;|]');
-- Returns 4 rows: apple, banana, cherry, date
Scenario: Priya's data cleaning at Vault Bank
Vault Bank receives transaction descriptions in messy formats from various payment processors. Priya uses REGEXP functions to standardise them:
-- Extract merchant name (everything before the first number or special char)
UPDATE Transactions
SET MerchantName = TRIM(REGEXP_SUBSTR(RawDescription, '^[A-Za-z\s]+'));
-- Flag suspicious descriptions containing known fraud patterns
SELECT * FROM Transactions
WHERE REGEXP_LIKE(RawDescription, '(TEST|FRAUD|STOLEN|XXX)', 'i');
-- Count how many transactions have non-ASCII characters
SELECT COUNT(*) FROM Transactions
WHERE REGEXP_LIKE(RawDescription, '[^\x00-\x7F]'); Exam tip: REGEXP_LIKE requires compatibility level 170
REGEXP_LIKE specifically requires database compatibility level 170 or above. The other six REGEXP functions work at all compatibility levels. If the exam mentions a database at compatibility level 160, REGEXP_LIKE will not be available — but REGEXP_REPLACE, REGEXP_SUBSTR, etc. will still work.
Check and set compatibility level:
-- Check
SELECT compatibility_level FROM sys.databases WHERE name = DB_NAME();
-- Set to 170
ALTER DATABASE [MyDB] SET COMPATIBILITY_LEVEL = 170; Ingrid at Nordic Shield Insurance needs to find all policy records where the PolicyNumber field does not match the expected format: two uppercase letters followed by a dash and exactly six digits (e.g., NZ-123456). Which query identifies non-conforming records?
Dev at PixelForge Studios needs to extract all hashtags (words starting with #) from user-generated content stored in a Description column. Each row may contain zero or more hashtags. Which approach returns one row per hashtag?
🎬 Video coming soon
Next up: Fuzzy String Matching: Finding Similar Text — use EDIT_DISTANCE and JARO_WINKLER to find strings that are similar but not identical.