Hello everybody!
Excited for another post ???
In this article I would like to show you how we can use regular expressions (RegExp) in your database through queries, whether to create some monitoring or even to create BI reports and indicators or some system.
Regular expression (or the regex or regexp foreignness, short for regular expression) provides a concise and flexible way to identify strings of interest, such as particular characters, words, or character patterns. In other words, regular expression is a way of simply performing extremely complex string operations that would require various conditions to handle this data.
As you may know, the Regexp dialect is not something that is common sense among programming languages. There are several different implementations and the rules of regular expression are similar, but can change between languages. Ex: A regexp expression that works in .NET may not work in PHP or Java and vice versa.
In SQL Server, we can use 2 dialects to use our regular expressions:
- . NET: Dialect available using SQLCLR (available from SQL Server 2005) and the .NET Framework system.text.regularexpression library
- ECMA: Dialect available using OLE Automation (available from SQL Server 2000) and the VBScript.RegExp module. This dialect is the same as Javascript
For terms of comparison:
- SQL Server native regexp (using LIKE or PATINDEX): Limited (compared to most common dialects) and has the best performance among 3
- .NET: As complete as the ECMA dialect (ie, it has MUCH more features than the native SQL Server), it has performance slightly below the native SQL Server, but MUCH (very much) above ECMA (because of OLE Automation, not from the dialect itself).
- ECMA: As complete as the .NET dialect (ie, it has MUCH more features than the native SQL Server) but has a very poor performance when compared to other 2. Its advantage over .NET is that it does not require the creation of SQLCLR objects and is available in SQL Server 2000.
The purpose of this post is to demonstrate how to use regular expressions in SQL Server and to show some simple examples of this. The focus here is not on becoming a regular expression expert. Regexp was not meant to be easily understood. It requires a lot of effort and study to master this language. For this, I suggest the excellent site Regular-Expressions.info and also the Brazilian website of Aurelio Jargas.
To help you understand the use of regular expressions (.NET and ECMA dialects), follow the image below from the site RegExLib:
Alternative #1: LIKE and PATINDEX
Using this native SQL Server solution, you will be able to use simple but very useful regular expressions in your daily life, and with good execution performance. The advantage of this method is that it does not require any additional permissions on the database, nor does it need to enable any advanced features or create new objects on the database.
View contentCase sensitive
Filtering the first letter
Filtering the first letters + Case sensitive
Applying a specific custom filter
Using the negation operator (^)
Using the escape operator
Identifying Special Characters
Need to remove special characters? Learn more by accessing the post How to remove accent and special characters from a string in SQL Server.
Using numbers
Validating Emails
1 2 3 4 5 6 7 8 9 10 | DECLARE @Teste TABLE ( [Email] VARCHAR(200) ) INSERT INTO @Teste VALUES ('[email protected]'), ('Dirceu'), ('Dirceu@'), ('@'), ('@.com'), ('[email protected]'), ('[email protected]'), ('[email protected]'), ('[email protected].') SELECT * FROM @Teste WHERE [Email] NOT LIKE '%[^a-z,0-9,@,.-_]%' AND [Email] LIKE '%_@_%_.__%' AND [Email] NOT LIKE '%_@@_%_.__%' |
Retrieving only the numeric part of a string
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | CREATE FUNCTION [dbo].[fncRecupera_Numeros] ( @str VARCHAR(MAX) ) RETURNS VARCHAR(500) BEGIN DECLARE @startingIndex INT SET @startingIndex = 0 WHILE (1 = 1) BEGIN -- Vamos identificar todos os caracteres não-numéricos (não estão no intervalo 0-9) e substituir por '' SET @startingIndex = PATINDEX('%[^0-9]%', @str) IF @startingIndex <> 0 SET @str = REPLACE(@str, SUBSTRING(@str, @startingIndex, 1), '') ELSE BREAK END RETURN @str END |
Want to know more about data validation? Access the posts below:
- Validating CPF, CNPJ, Email, Phone, and Zip Code in SQL Server
- How to validate state registration for all states using C # (CSharp) and SQL Server CLR
- How to validate state registration using T-SQL function in SQL Server
Alternative #2: .NET Dialect with SQLCLR
With this method you will have access to use the .NET Framework regular expressions, which features high performance and all available dialect features using SQLCLR and C #. It is available since SQL Server 2005.
If you would like to know more about SQLCLR, be sure to read the following articles:
- Introduction to SQL Common Language Runtime (CLR) in SQL Server
- SQL Server - Performance Comparison between Scalar Function and CLR Scalar Function
- Other posts about SQLCLR
FncRegex_Match Examples
Identifying repeated words
1 2 3 | SELECT CLR.dbo.fncRegex_Match('Essa frase frase contém palavras repetidas', '\b(\w+)\s+\1\b'), CLR.dbo.fncRegex_Match('Essa frase NÃO contém palavras repetidas', '\b(\w+)\s+\1\b') |
Validating a specific numeric mask (zip code)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | SELECT -- Retorna 1 se a string atende o formato "99999-999" CLR.dbo.fncRegex_Match('29090-270', '[0-9]{5}-[0-9]{3}'), CLR.dbo.fncRegex_Match('29.090-270', '[0-9]{5}-[0-9]{3}'), CLR.dbo.fncRegex_Match('29090270', '[0-9]{5}-[0-9]{3}') SELECT -- Retorna 1 se a string atende o formato "99.999-999" CLR.dbo.fncRegex_Match('29090-270', '[0-9]{2}.[0-9]{3}-[0-9]{3}'), CLR.dbo.fncRegex_Match('29.090-270', '[0-9]{2}.[0-9]{3}-[0-9]{3}'), CLR.dbo.fncRegex_Match('29090270', '[0-9]{2}.[0-9]{3}-[0-9]{3}') SELECT -- Retorna 1 se a string atende o formato "99999999" CLR.dbo.fncRegex_Match('29090-270', '[0-9]{8}'), CLR.dbo.fncRegex_Match('29.090-270', '[0-9]{8}'), CLR.dbo.fncRegex_Match('29090270', '[0-9]{8}') |
Validating CPF and CNPJ (format only, no DV)
1 2 3 4 5 6 7 8 9 10 11 | SELECT -- Valida se o CPF informado atende a máscara "999.999.999-99" CLR.dbo.fncRegex_Match('123.456.789-09', '^\d{3}\.\d{3}\.\d{3}\-\d{2}$'), CLR.dbo.fncRegex_Match('12X.456.789-09', '^\d{3}\.\d{3}\.\d{3}\-\d{2}$'), CLR.dbo.fncRegex_Match('12345678909', '^\d{3}\.\d{3}\.\d{3}\-\d{2}$') SELECT -- Valida se o CNPJ informado atende a máscara "99.999.999/9999-99" CLR.dbo.fncRegex_Match('12.345.678/1234-09', '^\d{2}\.\d{3}\.\d{3}\/\d{4}\-\d{2}$'), CLR.dbo.fncRegex_Match('12.3X5.678/1234-09', '^\d{2}\.\d{3}\.\d{3}\/\d{4}\-\d{2}$'), CLR.dbo.fncRegex_Match('12345678123409', '^\d{2}\.\d{3}\.\d{3}\/\d{4}\-\d{2}$') |
FncRegex_Find Examples
Recover passages where the word “Dirceu” appears in the text
1 2 3 4 5 6 | DECLARE @exemplo VARCHAR(2000) = 'Olá pessoal! Meu nome é Dirceu Resende e quero demonstrar como o Regexp é útil nesse artigo do Dirceu blog Resende do Wordpress' SELECT '...' + Ds_Texto + '...' FROM CLR.dbo.fncRegexp_Find(@exemplo, '\bDirceu(?:\W+\w+){0,3}') |
Finding repeated words over and over
1 2 3 4 | DECLARE @exemplo VARCHAR(2000) = 'nesse exemplo exemplo eu gostaria de eu demonstrar como identificar palavras palavras repetidas repetidas eu' SELECT [Ds_Texto] FROM CLR.dbo.fncRegexp_Find(@exemplo, '\b(\w+)\s+\1\b') |
Returning a resultset with line breaks (split)
1 2 3 4 5 6 7 8 | DECLARE @exemplo VARCHAR(MAX) = ' Segunda linha Terceira linha Quarta linha' SELECT Ds_Texto FROM CLR.dbo.fncRegexp_Find(@exemplo, '[^\r\n]*(?:[\r\n]*)') WHERE LEN(Ds_Texto) > 0 |
Retrieve valid dates only
1 2 3 4 5 6 7 8 9 10 11 12 13 | DECLARE @exemplo VARCHAR(MAX) = ' 12/2/2006 12:30 <> 13/2/2007 32/3/2007 2-4-2007 25.8.2007 1/1/2005 34/2/2104 2/5/2006' SELECT DISTINCT CONVERT(DATETIME, Ds_Texto, 103) FROM CLR.dbo.fncRegexp_Find(@exemplo, '\b(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])[- /.](19|20?[0-9]{2})\b') |
Identifying license plates on a string
1 2 3 4 5 6 7 8 | DECLARE @exemplo VARCHAR(2000) = 'Olá pessoal! Meu nome é Dirceu Resende meu carro é placa AAA-9999 e BBB9999 ou ccc2222', @mascara VARCHAR(MAX) = '[a-zA-Z]{3}-?\d{4}' SELECT * FROM CLR.dbo.fncRegexp_Find(@exemplo, @mascara) |
FncRegex_Replace Examples
Removing Special Characters
1 2 3 4 | DECLARE @examplo VARCHAR(MAX) = '[email protected]%ndo car$act¨eres ()_+especi*%#!ais no meu texto:^{}<>|\' SELECT CLR.dbo.fncRegex_Replace(@examplo, '[^a-zA-Z0-9 ]', '') |
Removing HTML Tags
1 2 3 4 5 | DECLARE @exemplo VARCHAR(MAX) SET @exemplo = '<span class="Teste">Utilizando</span> <strong>RegExp</strong> para <u>remover</u> códigos HTML <!-- (até comentários) -->. Veja mais no <a href="https://dirceuresende.com/blog">meu blog</a>' SELECT CLR.dbo.fncRegex_Replace(@exemplo, '<(?:[^>''"]*|([''"]).*?\1)*>', '') |
Convert tabular data to INSERT command in database
1 2 3 4 5 6 7 8 9 10 11 12 | DECLARE @exemplo VARCHAR(MAX) = '1|Dirceu|154 2|Resende|467 3|Blog|1348 4|SQL Server|139 ' PRINT CLR.dbo.fncRegex_Replace( @exemplo, '([^\|\r\n]+)[|\r\n]+([^\|\r\n]+)[|\r\n]+([^\|\r\n]+)[|\r\n]+', 'INSERT INTO Tabela (Coluna1, Coluna2, Coluna3) VALUES ($1, ''$2'', $3); '); |
Remove duplicate words
1 2 3 4 | DECLARE @examplo VARCHAR(MAX) = 'RegExp para remover remover palavras duplicadas duplicadas duplicadas' SELECT CLR.dbo.fncRegex_Replace(@examplo, '\b(\w+)(?:\s+\1\b)+', '$1') |
C # source code of functions
If you want to use these functions in your own SQLCLR project, you can use the source codes provided below:
fncRegex_Replace
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | using System; using System.Data.SqlTypes; using System.Text.RegularExpressions; public partial class UserDefinedFunctions { [Microsoft.SqlServer.Server.SqlFunction] public static SqlString fncRegex_Replace(SqlString Ds_Texto, SqlString Ds_Mascara, SqlString Ds_Substituir) { if (Ds_Texto.IsNull || Ds_Mascara.IsNull || Ds_Substituir.IsNull) return SqlString.Null; try { return Regex.Replace(Ds_Texto.Value, Ds_Mascara.Value, Ds_Substituir.Value); } catch (Exception e) { return SqlString.Null; } } } |
fncRegexp_Match
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 | using System; using System.Data.SqlTypes; using System.Text.RegularExpressions; public partial class UserDefinedFunctions { [Microsoft.SqlServer.Server.SqlFunction] public static SqlString fncRegex_Match(SqlString Ds_Texto, SqlString Ds_Mascara) { if (Ds_Texto.IsNull || Ds_Mascara.IsNull) return SqlString.Null; try { var resultado = Regex.Match(Ds_Texto.Value, Ds_Mascara.Value); return resultado.Success ? resultado.Groups[0].Value : SqlString.Null; } catch (Exception e) { return SqlString.Null; } } } |
fncRegexp_Find
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 | using System.Collections; using System.Data.SqlTypes; using System.Text.RegularExpressions; using Microsoft.SqlServer.Server; public partial class UserDefinedFunctions { private class RegexpFind { public SqlInt32 Nr_Linha; public SqlString Ds_Texto; public RegexpFind(SqlInt32 nrLinha, SqlString dsTexto) { Nr_Linha = nrLinha; Ds_Texto = dsTexto; } } [Microsoft.SqlServer.Server.SqlFunction( FillRowMethodName = "FillRow_Regexp_Find", TableDefinition = "Nr_Linha INT, Ds_Texto NVARCHAR(MAX)", DataAccess = DataAccessKind.Read, SystemDataAccess = SystemDataAccessKind.Read )] public static IEnumerable fncRegexp_Find(string Ds_Texto, string Ds_Mascara) { var regexpFindCollection = new ArrayList(); if (string.IsNullOrEmpty(Ds_Texto) || string.IsNullOrEmpty(Ds_Mascara)) return regexpFindCollection; var contador = 1; var retorno = Regex.Matches(Ds_Texto, Ds_Mascara, RegexOptions.None); foreach (var linha in retorno) { regexpFindCollection.Add(new RegexpFind( contador, linha.ToString() )); contador++; } return regexpFindCollection; } protected static void FillRow_Regexp_Find(object objRegexpFind, out SqlInt32 nrLinha, out SqlString dsTexto) { var regexpFind = (RegexpFind) objRegexpFind; nrLinha = regexpFind.Nr_Linha; dsTexto = regexpFind.Ds_Texto; } } |
T-SQL source code of functions
If you are not very familiar with C # or Visual Studio projects, but still want to use regular expressions in SQL Server, I will make available the T-SQL code below, which will allow you to create the assembly and functions in your database, without much effort, just pressing “F5” in this script:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | CREATE ASSEMBLY [Regexp] FROM 0x4D5A90000300000004000000FFFF0000B800000000000000400000000000000000000000000000000000000000000000000000000000000000000000800000000E1FBA0E00B409CD21B8014CCD21546869732070726F6772616D2063616E6E6F742062652072756E20696E20444F53206D6F64652E0D0D0A2400000000000000504500004C0103008C57115B0000000000000000E00022200B013000000E00000006000000000000BE2C00000020000000400000000000100020000000020000040000000000000006000000000000000080000000020000000000000300608500001000001000000000100000100000000000001000000000000000000000006C2C00004F00000000400000A002000000000000000000000000000000000000006000000C000000342B00001C0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200000080000000000000000000000082000004800000000000000000000002E74657874000000C40C000000200000000E000000020000000000000000000000000000200000602E72737263000000A0020000004000000004000000100000000000000000000000000000400000402E72656C6F6300000C0000000060000000020000001400000000000000000000000000004000004200000000000000000000000000000000A02C00000000000048000000020005004C220000E808000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001B3002006A00000001000011000F00280500000A2D090F01280500000A2B01170A062C087E0600000A0B2B4800000F00280700000A0F01280700000A280800000A0C086F0900000A2D077E0600000A2B16086F0A00000A166F0B00000A6F0C00000A280D00000A0BDE0A0D007E0600000A0BDE00072A000001100000000021003D5E000A090000011B3003005900000002000011000F00280500000A2D120F01280500000A2D090F02280500000A2B01170A062C087E0600000A0B2B2E00000F00280700000A0F01280700000A0F02280700000A280E00000A280D00000A0BDE0A0C007E0600000A0BDE00072A0000000110000000002A00234D000A090000011B3003008D0000000300001100730F00000A0A02281000000A2D0803281000000A2B01170D092C050613042B69170B020316281100000A0C00086F1200000A13052B2D11056F1300000A1306000607281400000A11066F1500000A280D00000A73060000066F1600000A260717580B0011056F1700000A2DCADE1611057510000001130711072C0811076F1800000A00DC0613042B0011042A00000001100000020035003A6F001600000000133002002100000004000011000274030000020A03067B01000004811100000104067B0200000481070000012A2202281900000A002A5E02281900000A000002037D0100000402047D020000042A000042534A4201000100000000000C00000076342E302E33303331390000000005006C000000C8020000237E0000340300002403000023537472696E6773000000005806000004000000235553005C0600001000000023475549440000006C0600007C02000023426C6F620000000000000002000001571502000902000000FA01330016000001000000170000000300000002000000060000000C00000019000000060000000400000001000000030000000100000000005A010100000000000600F00024020600100124020600C70011020F00440200000600DB0279010A00DB00C8010A0038015B020E00540170020600A00179010A007F00C8010A007900C80106009F008F020600FA028F020E00800170020600F1018F020600AB0079010A0001005B020E000D0370020E00BB0170020E00900170020E00B70070020600440179010E00B7027002000000000A00000000000100010001001000A2020000150001000100030010006E000000150001000600060013008D000600AA011E0050200000000096004B0191000100D8200000000096008E009A00030050210000000096005C00A5000600FC210000000094004800AC00080029220000000086180B0206000B0032220000000086180B02B7000B0000000100AA0100000200240000000100AA0100000200240000000300E30100000100AA01000002002400000001006B00020002001C0002000300B301000001001C0000000200B30109000B02010011000B02060019000B020A0031000B020600390065011A0039006B011E0039002E0122009100540126009900CF021A004100C4022D00A10070013200A9002E0122003900E202380091009700460069000B020600B10013035D009100530262007100FD016B007900EE0270008900E2027400290042012200690044007A00790004031A008100BF00060029000B02060020002300F0002E000B00BF002E001300C8002E001B00E70040002300F00060002300F50010003E004D007F00048000000000000000000000000000000000C101000004000000000000000000000084003B000000000004000000000000000000000084002F000000000004000000000000000000000084007901000000000300020000000053716C496E743332003C4D6F64756C653E004E725F4C696E6861006E724C696E68610044735F4D6173636172610053797374656D2E44617461006D73636F726C6962004164640046696C6C526F775F5265676578705F46696E6400666E635265676578705F46696E64006F626A52656765787046696E640053797374656D446174614163636573734B696E6400666E6352656765785F5265706C6163650049456E756D657261626C650049446973706F7361626C65004361707475726500446973706F73650044656275676761626C654174747269627574650053716C46756E6374696F6E41747472696275746500436F6D70696C6174696F6E52656C61786174696F6E734174747269627574650052756E74696D65436F6D7061746962696C697479417474726962757465006765745F56616C75650053716C537472696E6700546F537472696E6700666E6352656765785F4D61746368005265676578702E646C6C006765745F49734E756C6C006765745F4974656D0053797374656D004D61746368436F6C6C656374696F6E0047726F7570436F6C6C656374696F6E00457863657074696F6E0044735F546578746F006473546578746F0047726F757000526567657870004D6963726F736F66742E53716C5365727665722E5365727665720044735F537562737469747569720049456E756D657261746F7200476574456E756D657261746F72002E63746F720053797374656D2E446961676E6F73746963730053797374656D2E52756E74696D652E436F6D70696C6572536572766963657300446562756767696E674D6F646573004D6174636865730053797374656D2E446174612E53716C54797065730053797374656D2E546578742E526567756C617245787072657373696F6E730053797374656D2E436F6C6C656374696F6E730055736572446566696E656446756E6374696F6E730052656765784F7074696F6E73006765745F47726F757073006765745F53756363657373004F626A656374006F705F496D706C69636974006765745F43757272656E740041727261794C697374004D6F76654E6578740052656765780049734E756C6C4F72456D70747900000000000000000C14EAF7F46A2245B903BE327055CEBE0004200101080320000105200101111109070402111D12211225032000020306111D0320000E06000212210E0E0420001251052001124D08050001111D0E07070302111D12250600030E0E0E0E0F07081235081239021231123D1C1241040001020E08000312390E0E115D042000123D0320001C050001114508042001081C040701120C08B77A5C561934E08903061145080002111D111D111D0A0003111D111D111D111D06000212310E0E0A0003011C10114510111D072002011145111D0801000800000000001E01000100540216577261704E6F6E457863657074696F6E5468726F7773010801000701000000000401000000818501000400540E1146696C6C526F774D6574686F644E616D651346696C6C526F775F5265676578705F46696E64540E0F5461626C65446566696E6974696F6E244E725F4C696E686120494E542C2044735F546578746F204E56415243484152284D4158295455794D6963726F736F66742E53716C5365727665722E5365727665722E446174614163636573734B696E642C2053797374656D2E446174612C2056657273696F6E3D342E302E302E302C2043756C747572653D6E65757472616C2C205075626C69634B6579546F6B656E3D623737613563353631393334653038390A446174614163636573730100000054557F4D6963726F736F66742E53716C5365727665722E5365727665722E53797374656D446174614163636573734B696E642C2053797374656D2E446174612C2056657273696F6E3D342E302E302E302C2043756C747572653D6E65757472616C2C205075626C69634B6579546F6B656E3D623737613563353631393334653038391053797374656D4461746141636365737301000000000000008C57115B00000000020000001C010000502B0000500D000052534453A6F053CD17E4E94A923D821619E3D25F01000000433A5C55736572735C646966696C5C446F63756D656E74735C56697375616C2053747564696F20323031375C50726F6A656374735C5265676578705C5265676578705C6F626A5C44656275675C5265676578702E7064620000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000942C00000000000000000000AE2C0000002000000000000000000000000000000000000000000000A02C0000000000000000000000005F436F72446C6C4D61696E006D73636F7265652E646C6C0000000000FF25002000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100100000001800008000000000000000000000000000000100010000003000008000000000000000000000000000000100000000004800000058400000440200000000000000000000440234000000560053005F00560045005200530049004F004E005F0049004E0046004F0000000000BD04EFFE00000100000000000000000000000000000000003F000000000000000400000002000000000000000000000000000000440000000100560061007200460069006C00650049006E0066006F00000000002400040000005400720061006E0073006C006100740069006F006E00000000000000B004A4010000010053007400720069006E006700460069006C00650049006E0066006F0000008001000001003000300030003000300034006200300000002C0002000100460069006C0065004400650073006300720069007000740069006F006E000000000020000000300008000100460069006C006500560065007200730069006F006E000000000030002E0030002E0030002E003000000036000B00010049006E007400650072006E0061006C004E0061006D00650000005200650067006500780070002E0064006C006C00000000002800020001004C006500670061006C0043006F0070007900720069006700680074000000200000003E000B0001004F0072006900670069006E0061006C00460069006C0065006E0061006D00650000005200650067006500780070002E0064006C006C0000000000340008000100500072006F006400750063007400560065007200730069006F006E00000030002E0030002E0030002E003000000038000800010041007300730065006D0062006C0079002000560065007200730069006F006E00000030002E0030002E0030002E00300000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002000000C000000C03C00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 WITH PERMISSION_SET = SAFE GO CREATE FUNCTION [dbo].[fncRegex_Match] (@Ds_Texto [nvarchar](MAX), @Ds_Mascara [nvarchar](MAX)) RETURNS [nvarchar](MAX) AS EXTERNAL NAME [Regexp].[UserDefinedFunctions].[fncRegex_Match]; GO CREATE FUNCTION [dbo].[fncRegex_Replace] (@Ds_Texto [nvarchar](MAX), @Ds_Mascara [nvarchar](MAX), @Ds_Substituir [nvarchar](MAX)) RETURNS [nvarchar](MAX) AS EXTERNAL NAME [Regexp].[UserDefinedFunctions].[fncRegex_Replace]; GO CREATE FUNCTION [dbo].[fncRegexp_Find](@Ds_Texto [nvarchar](max), @Ds_Mascara [nvarchar](max)) RETURNS TABLE ( [Nr_Linha] [int] NULL, [Ds_Texto] [nvarchar](max) NULL ) WITH EXECUTE AS CALLER AS EXTERNAL NAME [Regexp].[UserDefinedFunctions].[fncRegexp_Find] GO |
Alternative #3: ECMA Dialect with OLE Automation and VBScript
Using the OLE Automation method, you will have access to all features of the ECMA dialect in your SQL Server, and can create very complex regular expressions. The cost of this is that using OLE Automation may slow your code execution than the other options, but its implementation is simpler than the alternative using SQLCLR.
Available since the 2000 release, OLE Automation may cause some instability in your instance due to known memory management issues by SQL Server, one of the reasons why SQLCLR was replaced with this feature. The message is: Use sparingly.
Want to know more about OLE Automation? Be sure to read these articles below:
- Enabling OLE Automation via T-SQL on SQL Server
- File Operations Using OLE Automation in SQL Server
- Other posts about OLE Automation
How to enable OLE Automation
To enable OLE Automation on your instance, simply run the following command:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | sp_configure 'show advanced options', 1; GO RECONFIGURE; GO sp_configure 'Ole Automation Procedures', 1; GO RECONFIGURE; GO sp_configure 'Agent XPs', 1; GO RECONFIGURE; GO sp_configure 'show advanced options', 1; GO RECONFIGURE; GO |
To know my other articles about OLE Automation, access this link here.
For this topic, I will use the excellent functions of Phil Factor RegexReplace, RegexMatch and RegexFind (source code after the examples).
RegexMatch Usage Examples
Identify repeated words
1 2 3 | SELECT dbo.RegexMatch('\b(\w+)\s+\1\b','Essa frase frase contém palavras repetidas'), dbo.RegexMatch('\b(\w+)\s+\1\b','Essa frase NÃO contém palavras repetidas') |
Identify nearby words
1 2 3 4 5 6 7 | -- Verifica se na frase, a palavra "Resende" está de 1 até 6 palavras após a palavra "Dirceu" SELECT dbo.RegexMatch('\bDirceu(?:\W+\w+){1,6}?\W+Resende\b', 'Meu nome é Dirceu e meu sobrenome é Resende. Trabalho com BI e SQL Server') -- Verifica se na frase, a palavra "Resende" está de 1 até 2 palavras após a palavra "Dirceu" SELECT dbo.RegexMatch('\bDirceu(?:\W+\w+){1,2}?\W+Resende\b', 'Meu nome é Dirceu e meu sobrenome é Resende. Trabalho com BI e SQL Server') |
Validating a specific numeric mask (zip code)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | SELECT -- Retorna 1 se a string atende o formato "99999-999" dbo.RegexMatch('[0-9]{5}-[0-9]{3}', '29090-270'), dbo.RegexMatch('[0-9]{5}-[0-9]{3}', '29.090-270'), dbo.RegexMatch('[0-9]{5}-[0-9]{3}', '29090270') SELECT -- Retorna 1 se a string atende o formato "99.999-999" dbo.RegexMatch('[0-9]{2}.[0-9]{3}-[0-9]{3}', '29090-270'), dbo.RegexMatch('[0-9]{2}.[0-9]{3}-[0-9]{3}', '29.090-270'), dbo.RegexMatch('[0-9]{2}.[0-9]{3}-[0-9]{3}', '29090270') SELECT -- Retorna 1 se a string atende o formato "99999999" dbo.RegexMatch('[0-9]{8}', '29090-270'), dbo.RegexMatch('[0-9]{8}', '29.090-270'), dbo.RegexMatch('[0-9]{8}', '29090270') |
Validating CPF and CNPJ (format only, no DV)
1 2 3 4 5 6 7 8 9 10 11 | SELECT -- Valida se o CPF informado atende a máscara "999.999.999-99" dbo.RegexMatch('^\d{3}\.\d{3}\.\d{3}\-\d{2}$','123.456.789-09'), dbo.RegexMatch('^\d{3}\.\d{3}\.\d{3}\-\d{2}$','12X.456.789-09'), dbo.RegexMatch('^\d{3}\.\d{3}\.\d{3}\-\d{2}$','12345678909') SELECT -- Valida se o CNPJ informado atende a máscara "99.999.999/9999-99" dbo.RegexMatch('^\d{2}\.\d{3}\.\d{3}\/\d{4}\-\d{2}$','12.345.678/1234-09'), dbo.RegexMatch('^\d{2}\.\d{3}\.\d{3}\/\d{4}\-\d{2}$','12.3X5.678/1234-09'), dbo.RegexMatch('^\d{2}\.\d{3}\.\d{3}\/\d{4}\-\d{2}$','12345678123409') |
RegexFind Usage Examples
Recover passages where the word “Dirceu” appears in the text
1 2 3 4 5 6 | DECLARE @exemplo VARCHAR(2000) = 'Olá pessoal! Meu nome é Dirceu Resende e quero demonstrar como o Regexp é útil nesse artigo do Dirceu Resende blog do Wordpress' SELECT '...' + SUBSTRING(@exemplo, FirstIndex - 8, Length + 16) + '...' FROM dbo.RegexFind('\bDirceu(?:\W+\w+){0,3}', @exemplo, 1, 1) |
Finding repeated words over and over
1 2 3 4 | -- encontrando palavras repetidas seguidamente DECLARE @exemplo VARCHAR(2000) = 'nesse exemplo exemplo eu gostaria de eu demonstrar como identificar palavras palavras repetidas repetidas eu' SELECT [Value] FROM dbo.RegexFind ('\b(\w+)\s+\1\b', @exemplo, 1, 1) |
Returning a resultset with line breaks (split)
1 2 3 4 5 6 7 8 | DECLARE @exemplo VARCHAR(MAX) = ' Segunda linha Terceira linha Quarta linha' SELECT Value FROM dbo.RegexFind('[^\r\n]*(?:[\r\n]*)', @exemplo,1,1) WHERE [Length] > 0 |
Break lines for each word of the sentence (split)
1 2 3 4 | DECLARE @exemplo VARCHAR(MAX) = 'Com essa função, cada palavra vai ficar em uma linha do resultado' SELECT Value FROM dbo.RegexFind ('\b[\w]+\b', @exemplo,1,1) |
Retrieve valid dates only
1 2 3 4 5 6 7 8 9 10 11 12 13 | DECLARE @exemplo VARCHAR(MAX) = ' 12/2/2006 12:30 <> 13/2/2007 32/3/2007 2-4-2007 25.8.2007 1/1/2005 34/2/2104 2/5/2006' SELECT DISTINCT CONVERT(DATETIME, Value, 103) FROM dbo.RegexFind ('\b(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])[- /.](19|20?[0-9]{2})\b', @exemplo, 1, 1) |
Identifying license plates on a string
1 2 3 4 5 6 7 8 | DECLARE @exemplo VARCHAR(2000) = 'Olá pessoal! Meu nome é Dirceu Resende meu carro é placa AAA-9999 e BBB9999 ou ccc2222', @mascara VARCHAR(MAX) = '[a-zA-Z]{3}-?\d{4}' SELECT [Value] FROM dirceuresende.dbo.RegexFind(@mascara, @exemplo, 1, 1) |
RegexReplace Usage Examples
Identifies URL and encapsulates as html link - tag A
1 2 3 4 5 6 | SELECT dbo.RegexReplace( '\b(https?|ftp|file)://([-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|])', '<a href="$2">$2</a>', 'Testando Regexp com o site do https://www.google.com.br', 1, 1 ) |
Remove HTML strings from text
1 2 3 4 5 | DECLARE @exemplo VARCHAR(MAX) SET @exemplo = '<span class="Teste">Utilizando</span> <strong>RegExp</strong> para <u>remover</u> códigos HTML <!-- (até comentários) -->. Veja mais no <a href="https://dirceuresende.com/blog">meu blog</a>' SELECT dbo.RegexReplace('<(?:[^>''"]*|([''"]).*?\1)*>', '', @exemplo, 1, 1) |
Convert tabular data to INSERT command in database
1 2 3 4 5 6 7 8 9 10 11 12 13 | DECLARE @exemplo VARCHAR(MAX) = '1|Dirceu|154 2|Resende|467 3|Blog|1348 4|SQL Server|139 ' PRINT dbo.RegexReplace( '([^\|\r\n]+)[|\r\n]+([^\|\r\n]+)[|\r\n]+([^\|\r\n]+)[|\r\n]+', 'INSERT INTO Tabela (Coluna1, Coluna2, Coluna3) VALUES ($1, ''$2'', $3); ', @exemplo, 1, 1 ); |
Remove duplicate words
1 2 3 4 | DECLARE @examplo VARCHAR(MAX) = 'RegExp para remover remover palavras duplicadas duplicadas duplicadas' SELECT dbo.RegexReplace('\b(\w+)(?:\s+\1\b)+', '$1', @examplo, 1, 1) |
Removing Special Characters
1 2 3 4 | DECLARE @examplo VARCHAR(MAX) = '[email protected]%ndo car$act¨eres ()_+especi*%#!ais no meu texto:^{}<>|\' SELECT dbo.RegexReplace('[^a-zA-Z0-9 ]', '', @examplo, 1, 1) |
RegexFind Function Code
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 |