Speak guys!
In this article, I'm going to share with you a series of options to filter and separate the number and text of a string using T-SQL functions and commands in SQL Server, which is a very common need in the daily lives of those who develop SQL queries several or want to validate the data in some table.
Test base creation
For the examples I'm going to demonstrate in this article, I'll use the following script below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | IF (OBJECT_ID('tempdb..#Teste') IS NOT NULL) DROP TABLE #Teste CREATE TABLE #Teste ( Nr_Documento VARCHAR(50) ) INSERT INTO #Teste VALUES ('12345678909'), ('123.456.789-09'), ('Dirceu12345Resende678909.com'), (' 12345678909 '), ('"12345678909"'), ('d12345678909'), ('12345+6789-09'), ('123456.789'), ('R$ 123456.789'), ('$ 123456.789'), ('+123456.789'), ('-123456.789'), ('Dirceu Resende'), ('Dirceu[Resende]') |
How to filter the numeric part and text part of a string
A very common need too is to filter the numeric part and the text part of a string. Let's learn how to do this?
Example 1 – How to return rows with only numbers (NOT LIKE)
One of the frequent ways I see people trying to achieve this goal is to use the NOT LIKE operator to ignore lines that have certain characters.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | SELECT * FROM #Teste WHERE Nr_Documento IS NOT NULL AND Nr_Documento NOT LIKE '%a%' AND Nr_Documento NOT LIKE '%b%' AND Nr_Documento NOT LIKE '%c%' AND Nr_Documento NOT LIKE '%d%' AND Nr_Documento NOT LIKE '%e%' AND Nr_Documento NOT LIKE '%f%' AND Nr_Documento NOT LIKE '%g%' AND Nr_Documento NOT LIKE '%h%' AND Nr_Documento NOT LIKE '%i%' AND Nr_Documento NOT LIKE '%j%' AND Nr_Documento NOT LIKE '%k%' AND Nr_Documento NOT LIKE '%l%' AND Nr_Documento NOT LIKE '%m%' AND Nr_Documento NOT LIKE '%n%' AND Nr_Documento NOT LIKE '%o%' AND Nr_Documento NOT LIKE '%p%' AND Nr_Documento NOT LIKE '%q%' AND Nr_Documento NOT LIKE '%r%' AND Nr_Documento NOT LIKE '%s%' AND Nr_Documento NOT LIKE '%t%' AND Nr_Documento NOT LIKE '%u%' AND Nr_Documento NOT LIKE '%v%' AND Nr_Documento NOT LIKE '%x%' AND Nr_Documento NOT LIKE '%w%' AND Nr_Documento NOT LIKE '%z%' AND Nr_Documento NOT LIKE '%y%' AND Nr_Documento NOT LIKE '%.%' AND Nr_Documento NOT LIKE '%-%' AND Nr_Documento NOT LIKE '%"%' AND Nr_Documento NOT LIKE '%[%' AND Nr_Documento NOT LIKE '%]%' |
Well.. The result was achieved, but the query turned out to be too large, difficult to maintain, poor performance and may need changes depending on the collation, especially if you are using a case-sensitive collation.
Example 2 – How to return rows with only numbers (ISNUMERIC)
In order to try to deliver a simpler and more effective solution, let's try to use another way to achieve this result through the function INUMERIC(), which will return 0 if the given parameter has any non-numeric character.
1 2 3 | SELECT * FROM #Teste WHERE ISNUMERIC(Nr_Documento) = 1 |
Once again, the desired result was achieved and the query was very simple and objective. But what if I want to return rows that contain only numbers anyway, ignoring even decimals, (+), (-) signs, currency symbols, etc?
Example 3 – How to return rows with only numbers (NOT LIKE and Regular Expression)
Another way to be able to return rows with only numbers is using NOT LIKE and Regular Expressions in SQL Server.
Example:
1 2 3 | SELECT * FROM #Teste WHERE Nr_Documento NOT LIKE '%[A-z]%' |
In this case, we return lines that don't have letters, but are still returning symbols...
Let's change our query a little bit and try again, but this time, I want to return the lines where I DON'T HAVE any characters that are NOT in the range 0 to 9 (only numbers):
1 2 3 | SELECT * FROM #Teste WHERE Nr_Documento NOT LIKE '%[^0-9]%' |
Show!! Now I finally have the result I wanted! Only lines that only have numbers.
Example 4 – How to return lines with only letters
In a scenario now opposite to the previous one, I want to do the opposite. I want to return lines with just letters. So the query is as simple as the one in our last example, right?
Example:
1 2 3 | SELECT * FROM #Teste WHERE Nr_Documento NOT LIKE '%[^A-z]%' |
Hey.. Nothing returned! But there was a string "Dirceu Resende" that should return... Ah, but there's the space... Let's try again:
1 2 3 | SELECT * FROM #Teste WHERE Nr_Documento NOT LIKE '%[^A-z ]%' -- Reparem esse espaço no final |
Ready 🙂
Example 5 – How to return lines that contain special characters
If you want to identify which lines have special characters, it's also pretty easy.
Example:
1 2 3 4 | -- Estou permitindo também, os caracteres (+), (.) e (-), além do espaço SELECT * FROM #Teste WHERE Nr_Documento LIKE '%[^A-z0-9 +.-]%' |
It worked! This is very useful to identify possible errors in data integrations and those invisible and hidden characters in the middle of the string (Read more about this in my article SQL Server - How to identify and replace “strange things” (hidden characters) in strings and tables)
How to separate the numeric part and the text part of a string
Unlike the examples above, here what I really want is to return the numeric part of the lines, where it is equal to “12345678909”, even if there are other characters in the middle of the string.
Example:
1 2 3 4 5 6 7 | SELECT Nr_Documento, dbo.fncRecupera_Numeros(Nr_Documento) FROM #Teste WHERE dbo.fncRecupera_Numeros(Nr_Documento) = '12345678909' |
I can also do the same thing with the textual part, where I'll return the lines whose textual phrase is “Dirceu Resende”:
1 2 3 4 5 6 7 8 | SELECT Nr_Documento, dbo.fncRecupera_Numeros(Nr_Documento) AS Parte_Numerica, dbo.fncRecupera_Letras(Nr_Documento) AS Parte_Textual FROM #Teste WHERE dbo.fncRecupera_Letras(Nr_Documento) IN ('Dirceu Resende', 'DirceuResende') |
Analyzing all the lines of this table, now also using the 2 functions:
1 2 3 4 5 6 | SELECT Nr_Documento, dbo.fncRecupera_Numeros(Nr_Documento) AS Parte_Numerica, dbo.fncRecupera_Letras(Nr_Documento) AS Parte_Textual FROM #Teste |
Another example to help fix the idea:
Liked these functions, right? Below is the source code for both:
Source code of fncRecupera_Letras
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | CREATE FUNCTION [dbo].[fncRecupera_Letras] ( @str VARCHAR(500) ) RETURNS VARCHAR(500) BEGIN DECLARE @startingIndex INT = 0 WHILE (1 = 1) BEGIN SET @startingIndex = PATINDEX('%[^a-Z|^ ]%', @str) IF @startingIndex <> 0 BEGIN SET @str = REPLACE(@str, SUBSTRING(@str, @startingIndex, 1), '') END ELSE BREAK END RETURN @str END |
Source code of fncRecupera_Numbers
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | CREATE FUNCTION [dbo].[fncRecupera_Numeros] ( @str VARCHAR(500) ) RETURNS VARCHAR(500) BEGIN DECLARE @startingIndex INT = 0 WHILE (1 = 1) BEGIN SET @startingIndex = PATINDEX('%[^0-9]%', @str) IF @startingIndex <> 0 BEGIN SET @str = REPLACE(@str, SUBSTRING(@str, @startingIndex, 1), '') END ELSE BREAK END RETURN @str END |
Another tip: In general, user functions (UDF) are slow to run on large tables. Another alternative to greatly improve the performance of functions is to use SQLCLR functions, as I demonstrate the gains in the article SQL Server - Performance Comparison between Scalar Function and CLR Scalar Function.
How to improve query performance using these functions
Another nice tip is that you can even create a calculated, indexed column to make searches much faster:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | -- Crio a tabela física no mesmo database das funções IF (OBJECT_ID('dbo.DadosExemplo') IS NOT NULL) DROP TABLE dbo.DadosExemplo CREATE TABLE dbo.DadosExemplo ( Nr_Documento VARCHAR(50) ) INSERT INTO dbo.DadosExemplo VALUES ('12345678909'), ('123.456.789-09'), ('Dirceu12345Resende678909.com'), (' 12345678909 '), ('"12345678909"'), ('d12345678909'), ('12345+6789-09'), ('123456.789'), ('R$ 123456.789'), ('$ 123456.789'), ('+123456.789'), ('-123456.789'), ('Dirceu Resende'), ('Dirceu[Resende]') |
Now I'm going to try to include the columns calculated using the functions:
1 2 3 | ALTER TABLE dbo.DadosExemplo ADD Parte_Numerica AS (dbo.fncRecupera_Numeros(Nr_Documento)) PERSISTED, Parte_Textual AS (dbo.fncRecupera_Letras(Nr_Documento)) PERSISTED |
And the result is an error message:
Computed column 'Numeric_Part' in table 'ExampleData' cannot be persisted because the column is non-deterministic.
As I explained about this error in the article SQL Server - Using Calculated Columns (or Computed Columns) for Performance Tuning, to use function as calculated column, the function must be deterministic. In this case, to make these 2 functions deterministic, let's add the WITH SCHEMABINDING clause in the CREATE FUNCTION command:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 | IF (OBJECT_ID('dbo.fncRecupera_Letras') IS NOT NULL) DROP FUNCTION dbo.fncRecupera_Letras GO CREATE FUNCTION [dbo].[fncRecupera_Letras] ( @str VARCHAR(500) ) RETURNS VARCHAR(500) WITH SCHEMABINDING BEGIN DECLARE @startingIndex INT = 0 WHILE (1 = 1) BEGIN SET @startingIndex = PATINDEX('%[^a-Z|^ ]%', @str) IF @startingIndex <> 0 BEGIN SET @str = REPLACE(@str, SUBSTRING(@str, @startingIndex, 1), '') END ELSE BREAK END RETURN @str END GO IF (OBJECT_ID('dbo.fncRecupera_Numeros') IS NOT NULL) DROP FUNCTION dbo.fncRecupera_Numeros GO CREATE FUNCTION [dbo].[fncRecupera_Numeros] ( @str VARCHAR(500) ) RETURNS VARCHAR(500) WITH SCHEMABINDING BEGIN DECLARE @startingIndex INT = 0 WHILE (1 = 1) BEGIN SET @startingIndex = PATINDEX('%[^0-9]%', @str) IF @startingIndex <> 0 BEGIN SET @str = REPLACE(@str, SUBSTRING(@str, @startingIndex, 1), '') END ELSE BREAK END RETURN @str END GO |
Now I will try to include again the columns calculated using the functions, which have now been created as deterministic (WITH SCHEMABINDING):
1 2 3 | ALTER TABLE dbo.DadosExemplo ADD Parte_Numerica AS (dbo.fncRecupera_Numeros(Nr_Documento)) PERSISTED, Parte_Textual AS (dbo.fncRecupera_Letras(Nr_Documento)) PERSISTED |
And the command was executed successfully. Let's look at our table again:
And now, we can index these columns normally:
1 2 | CREATE NONCLUSTERED INDEX SK01_DadosExemplo ON dbo.DadosExemplo(Parte_Numerica) INCLUDE(Nr_Documento) WITH(DATA_COMPRESSION=PAGE) CREATE NONCLUSTERED INDEX SK02_DadosExemplo ON dbo.DadosExemplo(Parte_Textual) INCLUDE(Nr_Documento) WITH(DATA_COMPRESSION=PAGE) |
And now when I want to query this calculated column, it will have excellent performance (of course, depending on how the query is done and how the indexes were created):
- SQL Server - Introduction to Performance Tuning Study
- Understanding Index Functioning in SQL Server
- SQL Server - Performance Tuning Tips: Implicit Conversion? NEVER!
- SQL Server - Performance Tuning Tips: What is the Difference Between Seek Predicate and Predicate?
- SQL Server is extremely fast. You who don't know how to use it!
And that's it folks!
I hope you enjoyed this post and see you next time!