Hello everybody!
Excited for another post ???
In this article I would like to show you how we can use regular expressions (RegExp) in your database through queries, whether to create some monitoring or even to create BI reports and indicators or some system.
Regular expression (or the regex or regexp foreignness, short for regular expression) provides a concise and flexible way to identify strings of interest, such as particular characters, words, or character patterns. In other words, regular expression is a way of simply performing extremely complex string operations that would require various conditions to handle this data.
As you may know, the Regexp dialect is not something that is common sense among programming languages. There are several different implementations and the rules of regular expression are similar, but can change between languages. Ex: A regexp expression that works in .NET may not work in PHP or Java and vice versa.
In SQL Server, we can use 2 dialects to use our regular expressions:
- . NET: Dialect available using SQLCLR (available from SQL Server 2005) and the .NET Framework system.text.regularexpression library
- ECMA: Dialect available using OLE Automation (available from SQL Server 2000) and the VBScript.RegExp module. This dialect is the same as Javascript
For terms of comparison:
- SQL Server native regexp (using LIKE or PATINDEX): Limited (compared to most common dialects) and has the best performance among 3
- .NET: As complete as the ECMA dialect (ie, it has MUCH more features than the native SQL Server), it has performance slightly below the native SQL Server, but MUCH (very much) above ECMA (because of OLE Automation, not from the dialect itself).
- ECMA: As complete as the .NET dialect (ie, it has MUCH more features than the native SQL Server) but has a very poor performance when compared to other 2. Its advantage over .NET is that it does not require the creation of SQLCLR objects and is available in SQL Server 2000.
The purpose of this post is to demonstrate how to use regular expressions in SQL Server and to show some simple examples of this. The focus here is not on becoming a regular expression expert. Regexp was not meant to be easily understood. It requires a lot of effort and study to master this language. For this, I suggest the excellent site Regular-Expressions.info and also the Brazilian website of Aurelio Jargas.
To help you understand the use of regular expressions (.NET and ECMA dialects), follow the image below from the site RegExLib:
Alternative #1: LIKE and PATINDEX
Using this native SQL Server solution, you will be able to use simple but very useful regular expressions in your daily life, and with good execution performance. The advantage of this method is that it does not require any additional permissions on the database, nor does it need to enable any advanced features or create new objects on the database.
View contentCase sensitive
Filtering the first letter
Filtering the first letters + Case sensitive
Applying a specific custom filter
Using the negation operator (^)
Using the escape operator
Identifying Special Characters
Need to remove special characters? Learn more by accessing the post How to remove accent and special characters from a string in SQL Server.
Using numbers
Validating Emails
1 2 3 4 5 6 7 8 9 10 | DECLARE @Teste TABLE ( [Email] VARCHAR(200) ) INSERT INTO @Teste VALUES ('[email protected]'), ('Dirceu'), ('[email protected]'), ('@'), ('@.com'), ('[email protected]'), ('[email protected]'), ('[email protected]'), ('[email protected]') SELECT * FROM @Teste WHERE [Email] NOT LIKE '%[^a-z,0-9,@,.-_]%' |
Retrieving only the numeric part of a string
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | CREATE FUNCTION [dbo].[fncRecupera_Numeros] ( @str VARCHAR(MAX) ) RETURNS VARCHAR(500) BEGIN DECLARE @startingIndex INT SET @startingIndex = 0 WHILE (1 = 1) BEGIN -- Vamos identificar todos os caracteres não-numéricos (não estão no intervalo 0-9) e substituir por '' SET @startingIndex = PATINDEX('%[^0-9]%', @str) IF @startingIndex <> 0 SET @str = REPLACE(@str, SUBSTRING(@str, @startingIndex, 1), '') ELSE BREAK END RETURN @str END |
Want to know more about data validation? Access the posts below:
- Validating CPF, CNPJ, Email, Phone, and Zip Code in SQL Server
- How to validate state registration for all states using C # (CSharp) and SQL Server CLR
- How to validate state registration using T-SQL function in SQL Server
Alternative #2: .NET Dialect with SQLCLR
With this method you will have access to use the .NET Framework regular expressions, which features high performance and all available dialect features using SQLCLR and C #. It is available since SQL Server 2005.
If you would like to know more about SQLCLR, be sure to read the following articles:
- Introduction to SQL Common Language Runtime (CLR) in SQL Server
- SQL Server - Performance Comparison between Scalar Function and CLR Scalar Function
- Other posts about SQLCLR
FncRegex_Match Examples
Identifying repeated words
1 2 3 | SELECT CLR.dbo.fncRegex_Match('Essa frase frase contém palavras repetidas', '\b(\w+)\s+\1\b'), CLR.dbo.fncRegex_Match('Essa frase NÃO contém palavras repetidas', '\b(\w+)\s+\1\b') |
Validating a specific numeric mask (zip code)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | SELECT -- Retorna 1 se a string atende o formato "99999-999" CLR.dbo.fncRegex_Match('29090-270', '[0-9]{5}-[0-9]{3}'), CLR.dbo.fncRegex_Match('29.090-270', '[0-9]{5}-[0-9]{3}'), CLR.dbo.fncRegex_Match('29090270', '[0-9]{5}-[0-9]{3}') SELECT -- Retorna 1 se a string atende o formato "99.999-999" CLR.dbo.fncRegex_Match('29090-270', '[0-9]{2}.[0-9]{3}-[0-9]{3}'), CLR.dbo.fncRegex_Match('29.090-270', '[0-9]{2}.[0-9]{3}-[0-9]{3}'), CLR.dbo.fncRegex_Match('29090270', '[0-9]{2}.[0-9]{3}-[0-9]{3}') SELECT -- Retorna 1 se a string atende o formato "99999999" CLR.dbo.fncRegex_Match('29090-270', '[0-9]{8}'), CLR.dbo.fncRegex_Match('29.090-270', '[0-9]{8}'), CLR.dbo.fncRegex_Match('29090270', '[0-9]{8}') |
Validating CPF and CNPJ (format only, no DV)
1 2 3 4 5 6 7 8 9 10 11 | SELECT -- Valida se o CPF informado atende a máscara "999.999.999-99" CLR.dbo.fncRegex_Match('123.456.789-09', '^\d{3}\.\d{3}\.\d{3}\-\d{2}$'), CLR.dbo.fncRegex_Match('12X.456.789-09', '^\d{3}\.\d{3}\.\d{3}\-\d{2}$'), CLR.dbo.fncRegex_Match('12345678909', '^\d{3}\.\d{3}\.\d{3}\-\d{2}$') SELECT -- Valida se o CNPJ informado atende a máscara "99.999.999/9999-99" CLR.dbo.fncRegex_Match('12.345.678/1234-09', '^\d{2}\.\d{3}\.\d{3}\/\d{4}\-\d{2}$'), CLR.dbo.fncRegex_Match('12.3X5.678/1234-09', '^\d{2}\.\d{3}\.\d{3}\/\d{4}\-\d{2}$'), CLR.dbo.fncRegex_Match('12345678123409', '^\d{2}\.\d{3}\.\d{3}\/\d{4}\-\d{2}$') |
FncRegex_Find Examples
Recover passages where the word “Dirceu” appears in the text
1 2 3 4 5 6 | DECLARE @exemplo VARCHAR(2000) = 'Olá pessoal! Meu nome é Dirceu Resende e quero demonstrar como o Regexp é útil nesse artigo do Dirceu blog Resende do Wordpress' SELECT '...' + Ds_Texto + '...' FROM CLR.dbo.fncRegexp_Find(@exemplo, '\bDirceu(?:\W+\w+){0,3}') |
Finding repeated words over and over
1 2 3 4 | DECLARE @exemplo VARCHAR(2000) = 'nesse exemplo exemplo eu gostaria de eu demonstrar como identificar palavras palavras repetidas repetidas eu' SELECT [Ds_Texto] FROM CLR.dbo.fncRegexp_Find(@exemplo, '\b(\w+)\s+\1\b') |
Returning a resultset with line breaks (split)
1 2 3 4 5 6 7 8 | DECLARE @exemplo VARCHAR(MAX) = ' Segunda linha Terceira linha Quarta linha' SELECT Ds_Texto FROM CLR.dbo.fncRegexp_Find(@exemplo, '[^\r\n]*(?:[\r\n]*)') WHERE LEN(Ds_Texto) > 0 |
Retrieve valid dates only
1 2 3 4 5 6 7 8 9 10 11 12 13 | DECLARE @exemplo VARCHAR(MAX) = ' 12/2/2006 12:30 <> 13/2/2007 32/3/2007 2-4-2007 25.8.2007 1/1/2005 34/2/2104 2/5/2006' SELECT DISTINCT CONVERT(DATETIME, Ds_Texto, 103) FROM CLR.dbo.fncRegexp_Find(@exemplo, '\b(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])[- /.](19|20?[0-9]{2})\b') |
Identifying license plates on a string
1 2 3 4 5 6 7 8 | DECLARE @exemplo VARCHAR(2000) = 'Olá pessoal! Meu nome é Dirceu Resende meu carro é placa AAA-9999 e BBB9999 ou ccc2222', @mascara VARCHAR(MAX) = '[a-zA-Z]{3}-?\d{4}' SELECT * FROM CLR.dbo.fncRegexp_Find(@exemplo, @mascara) |
FncRegex_Replace Examples
Removing Special Characters
1 2 3 4 | DECLARE @examplo VARCHAR(MAX) = '[email protected]%ndo car$act¨eres ()_+especi*%#!ais no meu texto:^{}<>|\' SELECT CLR.dbo.fncRegex_Replace(@examplo, '[^a-zA-Z0-9 ]', '') |
Removing HTML Tags
1 2 3 4 5 | DECLARE @exemplo VARCHAR(MAX) SET @exemplo = '<span class="Teste">Utilizando</span> <strong>RegExp</strong> para <u>remover</u> códigos HTML <!-- (até comentários) -->. Veja mais no <a href="https://www.dirceuresende.com/blog">meu blog</a>' SELECT CLR.dbo.fncRegex_Replace(@exemplo, '<(?:[^>''"]*|([''"]).*?\1)*>', '') |
Convert tabular data to INSERT command in database
1 2 3 4 5 6 7 8 9 10 11 12 | DECLARE @exemplo VARCHAR(MAX) = '1|Dirceu|154 2|Resende|467 3|Blog|1348 4|SQL Server|139 ' PRINT CLR.dbo.fncRegex_Replace( @exemplo, '([^\|\r\n]+)[|\r\n]+([^\|\r\n]+)[|\r\n]+([^\|\r\n]+)[|\r\n]+', 'INSERT INTO Tabela (Coluna1, Coluna2, Coluna3) VALUES ($1, ''$2'', $3); '); |
Remove duplicate words
1 2 3 4 | DECLARE @examplo VARCHAR(MAX) = 'RegExp para remover remover palavras duplicadas duplicadas duplicadas' SELECT CLR.dbo.fncRegex_Replace(@examplo, '\b(\w+)(?:\s+\1\b)+', '$1') |
C # source code of functions
If you want to use these functions in your own SQLCLR project, you can use the source codes provided below:
fncRegex_Replace
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | using System; using System.Data.SqlTypes; using System.Text.RegularExpressions; public partial class UserDefinedFunctions { [Microsoft.SqlServer.Server.SqlFunction] public static SqlString fncRegex_Replace(SqlString Ds_Texto, SqlString Ds_Mascara, SqlString Ds_Substituir) { if (Ds_Texto.IsNull || Ds_Mascara.IsNull || Ds_Substituir.IsNull) return SqlString.Null; try { return Regex.Replace(Ds_Texto.Value, Ds_Mascara.Value, Ds_Substituir.Value); } catch (Exception e) { return SqlString.Null; } } } |
fncRegexp_Match
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 | using System; using System.Data.SqlTypes; using System.Text.RegularExpressions; public partial class UserDefinedFunctions { [Microsoft.SqlServer.Server.SqlFunction] public static SqlString fncRegex_Match(SqlString Ds_Texto, SqlString Ds_Mascara) { if (Ds_Texto.IsNull || Ds_Mascara.IsNull) return SqlString.Null; try { var resultado = Regex.Match(Ds_Texto.Value, Ds_Mascara.Value); return resultado.Success ? resultado.Groups[0].Value : SqlString.Null; } catch (Exception e) { return SqlString.Null; } } } |
fncRegexp_Find
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 | using System.Collections; using System.Data.SqlTypes; using System.Text.RegularExpressions; using Microsoft.SqlServer.Server; public partial class UserDefinedFunctions { private class RegexpFind { public SqlInt32 Nr_Linha; public SqlString Ds_Texto; public RegexpFind(SqlInt32 nrLinha, SqlString dsTexto) { Nr_Linha = nrLinha; Ds_Texto = dsTexto; } } [Microsoft.SqlServer.Server.SqlFunction( FillRowMethodName = "FillRow_Regexp_Find", TableDefinition = "Nr_Linha INT, Ds_Texto NVARCHAR(MAX)", DataAccess = DataAccessKind.Read, SystemDataAccess = SystemDataAccessKind.Read )] public static IEnumerable fncRegexp_Find(string Ds_Texto, string Ds_Mascara) { var regexpFindCollection = new ArrayList(); if (string.IsNullOrEmpty(Ds_Texto) || string.IsNullOrEmpty(Ds_Mascara)) return regexpFindCollection; var contador = 1; var retorno = Regex.Matches(Ds_Texto, Ds_Mascara, RegexOptions.None); foreach (var linha in retorno) { regexpFindCollection.Add(new RegexpFind( contador, linha.ToString() )); contador++; } return regexpFindCollection; } protected static void FillRow_Regexp_Find(object objRegexpFind, out SqlInt32 nrLinha, out SqlString dsTexto) { var regexpFind = (RegexpFind) objRegexpFind; nrLinha = regexpFind.Nr_Linha; dsTexto = regexpFind.Ds_Texto; } } |
T-SQL source code of functions
If you are not very familiar with C # or Visual Studio projects, but still want to use regular expressions in SQL Server, I will make available the T-SQL code below, which will allow you to create the assembly and functions in your database, without much effort, just pressing “F5” in this script:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | CREATE ASSEMBLY [Regexp] FROM 0x4D5A90000300000004000000FFFF0000B800000000000000400000000000000000000000000000000000000000000000000000000000000000000000800000000E1FBA0E00B409CD21B8014CCD21546869732070726F6772616D2063616E6E6F742062652072756E20696E20444F53206D6F64652E0D0D0A2400000000000000504500004C0103008C57115B0000000000000000E00022200B013000000E00000006000000000000BE2C00000020000000400000000000100020000000020000040000000000000006000000000000000080000000020000000000000300608500001000001000000000100000100000000000001000000000000000000000006C2C00004F00000000400000A002000000000000000000000000000000000000006000000C000000342B00001C0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200000080000000000000000000000082000004800000000000000000000002E74657874000000C40C000000200000000E000000020000000000000000000000000000200000602E72737263000000A0020000004000000004000000100000000000000000000000000000400000402E72656C6F6300000C0000000060000000020000001400000000000000000000000000004000004200000000000000000000000000000000A02C00000000000048000000020005004C220000E808000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001B3002006A00000001000011000F00280500000A2D090F01280500000A2B01170A062C087E0600000A0B2B4800000F00280700000A0F01280700000A280800000A0C086F0900000A2D077E0600000A2B16086F0A00000A166F0B00000A6F0C00000A280D00000A0BDE0A0D007E0600000A0BDE00072A000001100000000021003D5E000A090000011B3003005900000002000011000F00280500000A2D120F01280500000A2D090F02280500000A2B01170A062C087E0600000A0B2B2E00000F00280700000A0F01280700000A0F02280700000A280E00000A280D00000A0BDE0A0C007E0600000A0BDE00072A0000000110000000002A00234D000A090000011B3003008D0000000300001100730F00000A0A02281000000A2D0803281000000A2B01170D092C050613042B69170B020316281100000A0C00086F1200000A13052B2D11056F1300000A1306000607281400000A11066F1500000A280D00000A73060000066F1600000A260717580B0011056F1700000A2DCADE1611057510000001130711072C0811076F1800000A00DC0613042B0011042A00000001100000020035003A6F001600000000133002002100000004000011000274030000020A03067B01000004811100000104067B0200000481070000012A2202281900000A002A5E02281900000A000002037D0100000402047D020000042A000042534A4201000100000000000C00000076342E302E33303331390000000005006C000000C8020000237E0000340300002403000023537472696E6773000000005806000004000000235553005C0600001000000023475549440000006C0600007C02000023426C6F620000000000000002000001571502000902000000FA01330016000001000000170000000300000002000000060000000C00000019000000060000000400000001000000030000000100000000005A010100000000000600F00024020600100124020600C70011020F00440200000600DB0279010A00DB00C8010A0038015B020E00540170020600A00179010A007F00C8010A007900C80106009F008F020600FA028F020E00800170020600F1018F020600AB0079010A0001005B020E000D0370020E00BB0170020E00900170020E00B70070020600440179010E00B7027002000000000A00000000000100010001001000A2020000150001000100030010006E000000150001000600060013008D000600AA011E0050200000000096004B0191000100D8200000000096008E009A00030050210000000096005C00A5000600FC210000000094004800AC00080029220000000086180B0206000B0032220000000086180B02B7000B0000000100AA0100000200240000000100AA0100000200240000000300E30100000100AA01000002002400000001006B00020002001C0002000300B301000001001C0000000200B30109000B02010011000B02060019000B020A0031000B020600390065011A0039006B011E0039002E0122009100540126009900CF021A004100C4022D00A10070013200A9002E0122003900E202380091009700460069000B020600B10013035D009100530262007100FD016B007900EE0270008900E2027400290042012200690044007A00790004031A008100BF00060029000B02060020002300F0002E000B00BF002E001300C8002E001B00E70040002300F00060002300F50010003E004D007F00048000000000000000000000000000000000C101000004000000000000000000000084003B000000000004000000000000000000000084002F000000000004000000000000000000000084007901000000000300020000000053716C496E743332003C4D6F64756C653E004E725F4C696E6861006E724C696E68610044735F4D6173636172610053797374656D2E44617461006D73636F726C6962004164640046696C6C526F775F5265676578705F46696E6400666E635265676578705F46696E64006F626A52656765787046696E640053797374656D446174614163636573734B696E6400666E6352656765785F5265706C6163650049456E756D657261626C650049446973706F7361626C65004361707475726500446973706F73650044656275676761626C654174747269627574650053716C46756E6374696F6E41747472696275746500436F6D70696C6174696F6E52656C61786174696F6E734174747269627574650052756E74696D65436F6D7061746962696C697479417474726962757465006765745F56616C75650053716C537472696E6700546F537472696E6700666E6352656765785F4D61746368005265676578702E646C6C006765745F49734E756C6C006765745F4974656D0053797374656D004D61746368436F6C6C656374696F6E0047726F7570436F6C6C656374696F6E00457863657074696F6E0044735F546578746F006473546578746F0047726F757000526567657870004D6963726F736F66742E53716C5365727665722E5365727665720044735F537562737469747569720049456E756D657261746F7200476574456E756D657261746F72002E63746F720053797374656D2E446961676E6F73746963730053797374656D2E52756E74696D652E436F6D70696C6572536572766963657300446562756767696E674D6F646573004D6174636865730053797374656D2E446174612E53716C54797065730053797374656D2E546578742E526567756C617245787072657373696F6E730053797374656D2E436F6C6C656374696F6E730055736572446566696E656446756E6374696F6E730052656765784F7074696F6E73006765745F47726F757073006765745F53756363657373004F626A656374006F705F496D706C69636974006765745F43757272656E740041727261794C697374004D6F76654E6578740052656765780049734E756C6C4F72456D70747900000000000000000C14EAF7F46A2245B903BE327055CEBE0004200101080320000105200101111109070402111D12211225032000020306111D0320000E06000212210E0E0420001251052001124D08050001111D0E07070302111D12250600030E0E0E0E0F07081235081239021231123D1C1241040001020E08000312390E0E115D042000123D0320001C050001114508042001081C040701120C08B77A5C561934E08903061145080002111D111D111D0A0003111D111D111D111D06000212310E0E0A0003011C10114510111D072002011145111D0801000800000000001E01000100540216577261704E6F6E457863657074696F6E5468726F7773010801000701000000000401000000818501000400540E1146696C6C526F774D6574686F644E616D651346696C6C526F775F5265676578705F46696E64540E0F5461626C65446566696E6974696F6E244E725F4C696E686120494E542C2044735F546578746F204E56415243484152284D4158295455794D6963726F736F66742E53716C5365727665722E5365727665722E446174614163636573734B696E642C2053797374656D2E446174612C2056657273696F6E3D342E302E302E302C2043756C747572653D6E65757472616C2C205075626C69634B6579546F6B656E3D623737613563353631393334653038390A446174614163636573730100000054557F4D6963726F736F66742E53716C5365727665722E5365727665722E53797374656D446174614163636573734B696E642C2053797374656D2E446174612C2056657273696F6E3D342E302E302E302C2043756C747572653D6E65757472616C2C205075626C69634B6579546F6B656E3D623737613563353631393334653038391053797374656D4461746141636365737301000000000000008C57115B00000000020000001C010000502B0000500D000052534453A6F053CD17E4E94A923D821619E3D25F01000000433A5C55736572735C646966696C5C446F63756D656E74735C56697375616C2053747564696F20323031375C50726F6A656374735C5265676578705C5265676578705C6F626A5C44656275675C5265676578702E7064620000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000942C00000000000000000000AE2C0000002000000000000000000000000000000000000000000000A02C0000000000000000000000005F436F72446C6C4D61696E006D73636F7265652E646C6C0000000000FF25002000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100100000001800008000000000000000000000000000000100010000003000008000000000000000000000000000000100000000004800000058400000440200000000000000000000440234000000560053005F00560045005200530049004F004E005F0049004E0046004F0000000000BD04EFFE00000100000000000000000000000000000000003F000000000000000400000002000000000000000000000000000000440000000100560061007200460069006C00650049006E0066006F00000000002400040000005400720061006E0073006C006100740069006F006E00000000000000B004A4010000010053007400720069006E006700460069006C00650049006E0066006F0000008001000001003000300030003000300034006200300000002C0002000100460069006C0065004400650073006300720069007000740069006F006E000000000020000000300008000100460069006C006500560065007200730069006F006E000000000030002E0030002E0030002E003000000036000B00010049006E007400650072006E0061006C004E0061006D00650000005200650067006500780070002E0064006C006C00000000002800020001004C006500670061006C0043006F0070007900720069006700680074000000200000003E000B0001004F0072006900670069006E0061006C00460069006C0065006E0061006D00650000005200650067006500780070002E0064006C006C0000000000340008000100500072006F006400750063007400560065007200730069006F006E00000030002E0030002E0030002E003000000038000800010041007300730065006D0062006C0079002000560065007200730069006F006E00000030002E0030002E0030002E00300000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002000000C000000C03C00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 WITH PERMISSION_SET = SAFE GO CREATE FUNCTION [dbo].[fncRegex_Match] (@Ds_Texto [nvarchar](MAX), @Ds_Mascara [nvarchar](MAX)) RETURNS [nvarchar](MAX) AS EXTERNAL NAME [Regexp].[UserDefinedFunctions].[fncRegex_Match]; GO CREATE FUNCTION [dbo].[fncRegex_Replace] (@Ds_Texto [nvarchar](MAX), @Ds_Mascara [nvarchar](MAX), @Ds_Substituir [nvarchar](MAX)) RETURNS [nvarchar](MAX) AS EXTERNAL NAME [Regexp].[UserDefinedFunctions].[fncRegex_Replace]; GO CREATE FUNCTION [dbo].[fncRegexp_Find](@Ds_Texto [nvarchar](max), @Ds_Mascara [nvarchar](max)) RETURNS TABLE ( [Nr_Linha] [int] NULL, [Ds_Texto] [nvarchar](max) NULL ) WITH EXECUTE AS CALLER AS EXTERNAL NAME [Regexp].[UserDefinedFunctions].[fncRegexp_Find] GO |
Alternative #3: ECMA Dialect with OLE Automation and VBScript
Using the OLE Automation method, you will have access to all features of the ECMA dialect in your SQL Server, and can create very complex regular expressions. The cost of this is that using OLE Automation may slow your code execution than the other options, but its implementation is simpler than the alternative using SQLCLR.
Available since the 2000 release, OLE Automation may cause some instability in your instance due to known memory management issues by SQL Server, one of the reasons why SQLCLR was replaced with this feature. The message is: Use sparingly.
Want to know more about OLE Automation? Be sure to read these articles below:
- Enabling OLE Automation via T-SQL on SQL Server
- File Operations Using OLE Automation in SQL Server
- Other posts about OLE Automation
How to enable OLE Automation
To enable OLE Automation on your instance, simply run the following command:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | sp_configure 'show advanced options', 1; GO RECONFIGURE; GO sp_configure 'Ole Automation Procedures', 1; GO RECONFIGURE; GO sp_configure 'Agent XPs', 1; GO RECONFIGURE; GO sp_configure 'show advanced options', 1; GO RECONFIGURE; GO |
To know my other articles about OLE Automation, access this link here.
For this topic, I will use the excellent functions of Phil Factor RegexReplace, RegexMatch and RegexFind (source code after the examples).
RegexMatch Usage Examples
Identify repeated words
1 2 3 | SELECT dbo.RegexMatch('\b(\w+)\s+\1\b','Essa frase frase contém palavras repetidas'), dbo.RegexMatch('\b(\w+)\s+\1\b','Essa frase NÃO contém palavras repetidas') |
Identify nearby words
1 2 3 4 5 6 7 | -- Verifica se na frase, a palavra "Resende" está de 1 até 6 palavras após a palavra "Dirceu" SELECT dbo.RegexMatch('\bDirceu(?:\W+\w+){1,6}?\W+Resende\b', 'Meu nome é Dirceu e meu sobrenome é Resende. Trabalho com BI e SQL Server') -- Verifica se na frase, a palavra "Resende" está de 1 até 2 palavras após a palavra "Dirceu" SELECT dbo.RegexMatch('\bDirceu(?:\W+\w+){1,2}?\W+Resende\b', 'Meu nome é Dirceu e meu sobrenome é Resende. Trabalho com BI e SQL Server') |
Validating a specific numeric mask (zip code)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | SELECT -- Retorna 1 se a string atende o formato "99999-999" dbo.RegexMatch('[0-9]{5}-[0-9]{3}', '29090-270'), dbo.RegexMatch('[0-9]{5}-[0-9]{3}', '29.090-270'), dbo.RegexMatch('[0-9]{5}-[0-9]{3}', '29090270') SELECT -- Retorna 1 se a string atende o formato "99.999-999" dbo.RegexMatch('[0-9]{2}.[0-9]{3}-[0-9]{3}', '29090-270'), dbo.RegexMatch('[0-9]{2}.[0-9]{3}-[0-9]{3}', '29.090-270'), dbo.RegexMatch('[0-9]{2}.[0-9]{3}-[0-9]{3}', '29090270') SELECT -- Retorna 1 se a string atende o formato "99999999" dbo.RegexMatch('[0-9]{8}', '29090-270'), dbo.RegexMatch('[0-9]{8}', '29.090-270'), dbo.RegexMatch('[0-9]{8}', '29090270') |
Validating CPF and CNPJ (format only, no DV)
1 2 3 4 5 6 7 8 9 10 11 | SELECT -- Valida se o CPF informado atende a máscara "999.999.999-99" dbo.RegexMatch('^\d{3}\.\d{3}\.\d{3}\-\d{2}$','123.456.789-09'), dbo.RegexMatch('^\d{3}\.\d{3}\.\d{3}\-\d{2}$','12X.456.789-09'), dbo.RegexMatch('^\d{3}\.\d{3}\.\d{3}\-\d{2}$','12345678909') SELECT -- Valida se o CNPJ informado atende a máscara "99.999.999/9999-99" dbo.RegexMatch('^\d{2}\.\d{3}\.\d{3}\/\d{4}\-\d{2}$','12.345.678/1234-09'), dbo.RegexMatch('^\d{2}\.\d{3}\.\d{3}\/\d{4}\-\d{2}$','12.3X5.678/1234-09'), dbo.RegexMatch('^\d{2}\.\d{3}\.\d{3}\/\d{4}\-\d{2}$','12345678123409') |
RegexFind Usage Examples
Recover passages where the word “Dirceu” appears in the text
1 2 3 4 5 6 | DECLARE @exemplo VARCHAR(2000) = 'Olá pessoal! Meu nome é Dirceu Resende e quero demonstrar como o Regexp é útil nesse artigo do Dirceu Resende blog do Wordpress' SELECT '...' + SUBSTRING(@exemplo, FirstIndex - 8, Length + 16) + '...' FROM dbo.RegexFind('\bDirceu(?:\W+\w+){0,3}', @exemplo, 1, 1) |
Finding repeated words over and over
1 2 3 4 | -- encontrando palavras repetidas seguidamente DECLARE @exemplo VARCHAR(2000) = 'nesse exemplo exemplo eu gostaria de eu demonstrar como identificar palavras palavras repetidas repetidas eu' SELECT [Value] FROM dbo.RegexFind ('\b(\w+)\s+\1\b', @exemplo, 1, 1) |
Returning a resultset with line breaks (split)
1 2 3 4 5 6 7 8 | DECLARE @exemplo VARCHAR(MAX) = ' Segunda linha Terceira linha Quarta linha' SELECT Value FROM dbo.RegexFind('[^\r\n]*(?:[\r\n]*)', @exemplo,1,1) WHERE [Length] > 0 |
Break lines for each word of the sentence (split)
1 2 3 4 | DECLARE @exemplo VARCHAR(MAX) = 'Com essa função, cada palavra vai ficar em uma linha do resultado' SELECT Value FROM dbo.RegexFind ('\b[\w]+\b', @exemplo,1,1) |
Retrieve valid dates only
1 2 3 4 5 6 7 8 9 10 11 12 13 | DECLARE @exemplo VARCHAR(MAX) = ' 12/2/2006 12:30 <> 13/2/2007 32/3/2007 2-4-2007 25.8.2007 1/1/2005 34/2/2104 2/5/2006' SELECT DISTINCT CONVERT(DATETIME, Value, 103) FROM dbo.RegexFind ('\b(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])[- /.](19|20?[0-9]{2})\b', @exemplo, 1, 1) |
Identifying license plates on a string
1 2 3 4 5 6 7 8 | DECLARE @exemplo VARCHAR(2000) = 'Olá pessoal! Meu nome é Dirceu Resende meu carro é placa AAA-9999 e BBB9999 ou ccc2222', @mascara VARCHAR(MAX) = '[a-zA-Z]{3}-?\d{4}' SELECT [Value] FROM dirceuresende.dbo.RegexFind(@mascara, @exemplo, 1, 1) |
RegexReplace Usage Examples
Identifies URL and encapsulates as html link - tag A
1 2 3 4 5 6 | SELECT dbo.RegexReplace( '\b(https?|ftp|file)://([-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|])', '<a href="$2">$2</a>', 'Testando Regexp com o site do https://www.google.com.br', 1, 1 ) |
Remove HTML strings from text
1 2 3 4 5 | DECLARE @exemplo VARCHAR(MAX) SET @exemplo = '<span class="Teste">Utilizando</span> <strong>RegExp</strong> para <u>remover</u> códigos HTML <!-- (até comentários) -->. Veja mais no <a href="https://www.dirceuresende.com/blog">meu blog</a>' SELECT dbo.RegexReplace('<(?:[^>''"]*|([''"]).*?\1)*>', '', @exemplo, 1, 1) |
Convert tabular data to INSERT command in database
1 2 3 4 5 6 7 8 9 10 11 12 13 | DECLARE @exemplo VARCHAR(MAX) = '1|Dirceu|154 2|Resende|467 3|Blog|1348 4|SQL Server|139 ' PRINT dbo.RegexReplace( '([^\|\r\n]+)[|\r\n]+([^\|\r\n]+)[|\r\n]+([^\|\r\n]+)[|\r\n]+', 'INSERT INTO Tabela (Coluna1, Coluna2, Coluna3) VALUES ($1, ''$2'', $3); ', @exemplo, 1, 1 ); |
Remove duplicate words
1 2 3 4 | DECLARE @examplo VARCHAR(MAX) = 'RegExp para remover remover palavras duplicadas duplicadas duplicadas' SELECT dbo.RegexReplace('\b(\w+)(?:\s+\1\b)+', '$1', @examplo, 1, 1) |
Removing Special Characters
1 2 3 4 | DECLARE @examplo VARCHAR(MAX) = '[email protected]%ndo car$act¨eres ()_+especi*%#!ais no meu texto:^{}<>|\' SELECT dbo.RegexReplace('[^a-zA-Z0-9 ]', '', @examplo, 1, 1) |
RegexFind Function Code
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 | CREATE FUNCTION RegexFind ( @pattern VARCHAR(255), @matchstring VARCHAR(MAX), @global BIT = 1, @Multiline bit = 1 ) RETURNS @result TABLE ( Match_ID INT, FirstIndex INT , [Length] INT , [Value] VARCHAR(2000), Submatch_ID INT, SubmatchValue VARCHAR(2000), Error VARCHAR(255) ) AS BEGIN DECLARE @objRegexExp INT, @objErrorObject INT, @objMatch INT, @objSubMatches INT, @strErrorMessage VARCHAR(255), @error VARCHAR(255), @Substituted VARCHAR(8000), @hr INT, @matchcount INT, @SubmatchCount INT, @ii INT, @jj INT, @FirstIndex INT, @length INT, @Value VARCHAR(2000), @SubmatchValue VARCHAR(2000), @objSubmatchValue INT, @command VARCHAR(8000), @Match_ID INT DECLARE @match TABLE ( Match_ID INT IDENTITY(1, 1) NOT NULL, FirstIndex INT NOT NULL, [length] INT NOT NULL, [Value] VARCHAR(2000) ) DECLARE @Submatch TABLE ( Submatch_ID INT IDENTITY(1, 1), match_ID INT NOT NULL, SubmatchNo INT NOT NULL, SubmatchValue VARCHAR(2000) ) SELECT @strErrorMessage = 'creating a regex object',@error='' EXEC @hr= sp_OACreate 'VBScript.RegExp', @objRegexExp OUT IF @hr = 0 SELECT @strErrorMessage = 'Setting the Regex pattern', @objErrorObject = @objRegexExp IF @hr = 0 EXEC @hr= sp_OASetProperty @objRegexExp, 'Pattern', @pattern IF @hr = 0 SELECT @strErrorMessage = 'Specifying a case-insensitive match' IF @hr = 0 EXEC @hr= sp_OASetProperty @objRegexExp, 'IgnoreCase', 1 IF @hr = 0 EXEC @hr= sp_OASetProperty @objRegexExp, 'MultiLine', @Multiline IF @hr = 0 EXEC @hr= sp_OASetProperty @objRegexExp, 'Global', @global IF @hr = 0 SELECT @strErrorMessage = 'Doing a match' IF @hr = 0 EXEC @hr= sp_OAMethod @objRegexExp, 'execute', @objMatch OUT, @matchstring IF @hr = 0 SELECT @strErrorMessage = 'Getting the number of matches' IF @hr = 0 EXEC @hr= sp_OAGetProperty @objmatch, 'count', @matchcount OUT SET @ii = 0 /* The Match object has four read-only properties. The FirstIndex property indicates the number of characters in the string to the left of the match. The Length property of the Match object indicates the number of characters in the match. The Value property returns the text that was matched. */ WHILE (@hr = 0 AND @ii < @Matchcount) BEGIN SELECT @strErrorMessage = 'Getting the FirstIndex property', @command = 'item(' + CAST(@ii AS VARCHAR) + ').FirstIndex' IF @hr = 0 EXEC @hr= sp_OAGetProperty @objmatch, @command, @Firstindex OUT IF @hr = 0 SELECT @strErrorMessage = 'Getting the length property', @command = 'item(' + CAST(@ii AS VARCHAR) + ').Length' IF @hr = 0 EXEC @hr= sp_OAGetProperty @objmatch, @command, @Length OUT IF @hr = 0 SELECT @strErrorMessage = 'Getting the value property', @command = 'item(' + CAST(@ii AS VARCHAR) + ').Value' IF @hr = 0 EXEC @hr= sp_OAGetProperty @objmatch, @command, @Value OUT INSERT INTO @match ( Firstindex, [Length], [Value] ) SELECT @firstindex + 1, @Length, @Value SELECT @Match_ID = @@Identity -- The SubMatches property of the Match object is a collection of strings. It will only hold values if your regular expression has capturing groups. The collection will hold one string for each capturing group. The Count property (returned as SubmatchCount) indicates the number of string in the collection. The Item property takes an index parameter, and returns the text matched by the capturing group. IF @hr = 0 SELECT @strErrorMessage = 'Getting the SubMatches collection', @command = 'item(' + CAST(@ii AS VARCHAR) + ').SubMatches' IF @hr = 0 SELECT @strErrorMessage = 'Getting the number of submatches' IF @hr = 0 EXEC @hr= sp_OAGetProperty @objSubmatches, 'count', @submatchCount OUT SET @jj = 0 WHILE (@hr = 0 AND @jj < @submatchCount) BEGIN IF @hr = 0 SELECT @strErrorMessage = 'Getting the submatch value property', @command = 'item(' + CAST(@jj AS VARCHAR) + ')' ,@submatchValue=NULL IF @hr = 0 EXEC @hr= sp_OAGetProperty @objSubmatches, @command, @SubmatchValue OUT INSERT INTO @Submatch ( Match_ID, SubmatchNo, SubmatchValue ) SELECT @Match_ID, @jj+1, @SubmatchValue SET @jj += 1 END EXEC @hr= sp_OAGetProperty @objmatch, @command, @objSubmatches OUT SET @ii += 1 END IF (@hr <> 0) BEGIN DECLARE @Source VARCHAR(255), @Description VARCHAR(255), @Helpfile VARCHAR(255), @HelpID INT EXECUTE sp_OAGetErrorInfo @objErrorObject, @source OUTPUT, @Description OUTPUT, @Helpfile OUTPUT, @HelpID OUTPUT SET @Error = 'Error whilst ' + COALESCE(@strErrorMessage, 'doing something') + ', ' + COALESCE(@Description, '') END EXEC sp_OADestroy @objRegexExp EXEC sp_OADestroy @objMatch EXEC sp_OADestroy @objSubMatches INSERT INTO @result ( Match_ID, FirstIndex, [Length], [Value], Submatch_ID, SubmatchValue, Error ) SELECT m.[Match_ID], [FirstIndex], [Length], [Value], [SubmatchNo], [SubmatchValue], @error FROM @match m LEFT OUTER JOIN @submatch s ON m.match_ID = s.match_ID IF (@@ROWCOUNT = 0 AND LEN(@error) > 0) INSERT INTO @result(Error) SELECT @error RETURN END GO |
RegexMatch Function Code
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 | CREATE FUNCTION dbo.RegexMatch ( @pattern VARCHAR(2000), @matchstring VARCHAR(MAX)--Varchar(8000) got SQL Server 2000 ) RETURNS INT AS BEGIN DECLARE @objRegexExp INT, @objErrorObject INT, @strErrorMessage VARCHAR(255), @hr INT, @match BIT SELECT @strErrorMessage = 'creating a regex object' EXEC @hr= sp_OACreate 'VBScript.RegExp', @objRegexExp OUT IF @hr = 0 EXEC @hr= sp_OASetProperty @objRegexExp, 'Pattern', @pattern --Specifying a case-insensitive match IF @hr = 0 EXEC @hr= sp_OASetProperty @objRegexExp, 'IgnoreCase', 1 --Doing a Test' IF @hr = 0 EXEC @hr= sp_OAMethod @objRegexExp, 'Test', @match OUT, @matchstring IF @hr <> 0 BEGIN RETURN NULL END EXEC sp_OADestroy @objRegexExp RETURN @match END GO |
RegexReplace Function Code
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 | CREATE FUNCTION dbo.RegexReplace ( @pattern VARCHAR(255), @replacement VARCHAR(255), @Subject VARCHAR(MAX), @global BIT = 1, @Multiline bit =1 ) RETURNS VARCHAR(MAX) AS BEGIN DECLARE @objRegexExp INT, @objErrorObject INT, @strErrorMessage VARCHAR(255), @Substituted VARCHAR(8000), @hr INT, @Replace BIT SELECT @strErrorMessage = 'creating a regex object' EXEC @hr= sp_OACreate 'VBScript.RegExp', @objRegexExp OUT IF @hr = 0 SELECT @strErrorMessage = 'Setting the Regex pattern', @objErrorObject = @objRegexExp IF @hr = 0 EXEC @hr= sp_OASetProperty @objRegexExp, 'Pattern', @pattern IF @hr = 0 /*By default, the regular expression is case sensitive. Set the IgnoreCase property to True to make it case insensitive.*/ SELECT @strErrorMessage = 'Specifying the type of match' IF @hr = 0 EXEC @hr= sp_OASetProperty @objRegexExp, 'IgnoreCase', 1 IF @hr = 0 EXEC @hr= sp_OASetProperty @objRegexExp, 'MultiLine', @Multiline IF @hr = 0 EXEC @hr= sp_OASetProperty @objRegexExp, 'Global', @global IF @hr = 0 SELECT @strErrorMessage = 'Doing a Replacement' IF @hr = 0 EXEC @hr= sp_OAMethod @objRegexExp, 'Replace', @Substituted OUT, @subject, @Replacement /*If the RegExp.Global property is False (the default), Replace will return the @subject string with the first regex match (if any) substituted with the replacement text. If RegExp.Global is true, the @Subject string will be returned with all matches replaced.*/ IF @hr <> 0 BEGIN DECLARE @Source VARCHAR(255), @Description VARCHAR(255), @Helpfile VARCHAR(255), @HelpID INT EXECUTE sp_OAGetErrorInfo @objErrorObject, @source OUTPUT, @Description OUTPUT, @Helpfile OUTPUT, @HelpID OUTPUT SELECT @strErrorMessage = 'Error whilst ' + COALESCE(@strErrorMessage, 'doing something') + ', ' + COALESCE(@Description, '') RETURN @strErrorMessage END EXEC sp_OADestroy @objRegexExp RETURN @Substituted END GO |
In this post, we could see how useful regular expressions (Regexp or Regex) can be of use in our daily lives, whether you are a DBA, BI analyst or developer. I also demonstrated how easy it is to use this powerful feature in SQL Server, either using LIKE or PATINDEX (native DBMS functions) or using other features such as OLE Automation and SQLCLR.
I hope this post has answered some questions about Regexp and has shown a great new feature for those who didn't know it yet.
Regards and see you next post!
Congratulations, Dirceu. Excellent post.
I already used the solution with Like and I confess that it worked very well. Saves a lot of programming time when you master Regex. On the contrary it is easier to do the same routines on the nail.
When it comes to very complex strings, it is best to invest some time and better understand REGEX implementations.
Congratulations!