Click on the banner to learn about and purchase my database training on Azure

SQL Server - How to use regular expressions (RegExp) in your database

Views: 26.755 views
Reading Time: 15 minutes

Hello everybody!
Excited for another post ???

In this article I would like to show you how we can use regular expressions (RegExp) in your database through queries, whether to create some monitoring or even to create BI reports and indicators or some system.

Regular expression (or the regex or regexp foreignness, short for regular expression) provides a concise and flexible way to identify strings of interest, such as particular characters, words, or character patterns. In other words, regular expression is a way of simply performing extremely complex string operations that would require various conditions to handle this data.

As you may know, the Regexp dialect is not something that is common sense among programming languages. There are several different implementations and the rules of regular expression are similar, but can change between languages. Ex: A regexp expression that works in .NET may not work in PHP or Java and vice versa.

In SQL Server, we can use 2 dialects to use our regular expressions:

  • . NET: Dialect available using SQLCLR (available from SQL Server 2005) and the .NET Framework system.text.regularexpression library
  • ECMA: Dialect available using OLE Automation (available from SQL Server 2000) and the VBScript.RegExp module. This dialect is the same as Javascript

For terms of comparison:

  • SQL Server native regexp (using LIKE or PATINDEX): Limited (compared to most common dialects) and has the best performance among 3
  • .NET: As complete as the ECMA dialect (ie, it has MUCH more features than the native SQL Server), it has performance slightly below the native SQL Server, but MUCH (very much) above ECMA (because of OLE Automation, not from the dialect itself).
  • ECMA: As complete as the .NET dialect (ie, it has MUCH more features than the native SQL Server) but has a very poor performance when compared to other 2. Its advantage over .NET is that it does not require the creation of SQLCLR objects and is available in SQL Server 2000.

The purpose of this post is to demonstrate how to use regular expressions in SQL Server and to show some simple examples of this. The focus here is not on becoming a regular expression expert. Regexp was not meant to be easily understood. It requires a lot of effort and study to master this language. For this, I suggest the excellent site Regular-Expressions.info and also the Brazilian website of Aurelio Jargas.

To help you understand the use of regular expressions (.NET and ECMA dialects), follow the image below from the site RegExLib:

Alternative #1: LIKE and PATINDEX

Using this native SQL Server solution, you will be able to use simple but very useful regular expressions in your daily life, and with good execution performance. The advantage of this method is that it does not require any additional permissions on the database, nor does it need to enable any advanced features or create new objects on the database.

View content
Many people don't know it yet, but SQL Server has native support for using some regular expressions (RegExp) through the LIKE and PATINDEX operators, as I will demonstrate some examples below:

Case sensitive

Filtering the first letter

Filtering the first letters + Case sensitive

Applying a specific custom filter

Using the negation operator (^)

Using the escape operator

Identifying Special Characters

Need to remove special characters? Learn more by accessing the post How to remove accent and special characters from a string in SQL Server.

Using numbers

Validating Emails

Result:

Retrieving only the numeric part of a string

Example:

Want to know more about data validation? Access the posts below:
- Validating CPF, CNPJ, Email, Phone, and Zip Code in SQL Server
- How to validate state registration for all states using C # (CSharp) and SQL Server CLR
- How to validate state registration using T-SQL function in SQL Server

Alternative #2: .NET Dialect with SQLCLR

With this method you will have access to use the .NET Framework regular expressions, which features high performance and all available dialect features using SQLCLR and C #. It is available since SQL Server 2005.

If you would like to know more about SQLCLR, be sure to read the following articles:
- Introduction to SQL Common Language Runtime (CLR) in SQL Server
- SQL Server - Performance Comparison between Scalar Function and CLR Scalar Function
- Other posts about SQLCLR

View content

FncRegex_Match Examples

Identifying repeated words

Validating a specific numeric mask (zip code)

Validating CPF and CNPJ (format only, no DV)

FncRegex_Find Examples

Recover passages where the word “Dirceu” appears in the text

Finding repeated words over and over

Returning a resultset with line breaks (split)

Retrieve valid dates only

Identifying license plates on a string

FncRegex_Replace Examples

Removing Special Characters

Removing HTML Tags

Convert tabular data to INSERT command in database

Remove duplicate words

C # source code of functions

If you want to use these functions in your own SQLCLR project, you can use the source codes provided below:

fncRegex_Replace

fncRegexp_Match

fncRegexp_Find

T-SQL source code of functions

If you are not very familiar with C # or Visual Studio projects, but still want to use regular expressions in SQL Server, I will make available the T-SQL code below, which will allow you to create the assembly and functions in your database, without much effort, just pressing “F5” in this script:

Alternative #3: ECMA Dialect with OLE Automation and VBScript

Using the OLE Automation method, you will have access to all features of the ECMA dialect in your SQL Server, and can create very complex regular expressions. The cost of this is that using OLE Automation may slow your code execution than the other options, but its implementation is simpler than the alternative using SQLCLR.

Available since the 2000 release, OLE Automation may cause some instability in your instance due to known memory management issues by SQL Server, one of the reasons why SQLCLR was replaced with this feature. The message is: Use sparingly.

Want to know more about OLE Automation? Be sure to read these articles below:
- Enabling OLE Automation via T-SQL on SQL Server
- File Operations Using OLE Automation in SQL Server
- Other posts about OLE Automation

View content
An alternative to the SQL Server LIKE operator, which, while offering (relatively) good performance and some cool features, is still somewhat limited and does not support most complex regular expression operators, we can use the ECMA dialect. (same as Javascript) with the help of the OLE Automation feature.

How to enable OLE Automation

To enable OLE Automation on your instance, simply run the following command:

To know my other articles about OLE Automation, access this link here.

For this topic, I will use the excellent functions of Phil Factor RegexReplace, RegexMatch and RegexFind (source code after the examples).

RegexMatch Usage Examples

Identify repeated words

Identify nearby words

Validating a specific numeric mask (zip code)

Validating CPF and CNPJ (format only, no DV)

RegexFind Usage Examples

Recover passages where the word “Dirceu” appears in the text

Finding repeated words over and over

Returning a resultset with line breaks (split)

Break lines for each word of the sentence (split)

Retrieve valid dates only

Identifying license plates on a string

RegexReplace Usage Examples

Identifies URL and encapsulates as html link - tag A

Remove HTML strings from text

Convert tabular data to INSERT command in database

Remove duplicate words

Removing Special Characters

RegexFind Function Code