Speak guys!
In this article I would like to share a situation that usually occurs from time to time in my day-to-day consultancy, serving several different clients and environments, and which happened again on Friday, which is when there are “strange things” in strings and tables (an allusion to the series “Stranger Things”), which is the occurrence of non-visible characters.
After helping a consultant to identify and resolve this, I thought of creating a practical solution to resolve this more quickly the next time and also help other people who may have this problem in their daily lives, which is usually the case especially in environments who have just undergone migration or work with importing data from various systems, APIs or files.
Simulating the problem
To demonstrate to you this problem happening, I will create a very simple table, and fill this table with random values and a fixed line that I will insert. For this, I will use a function to generate random values, which is fncRand, especially useful for use in functions:
1 2 3 4 5 6 7 8 9 |
CREATE FUNCTION dbo.fncRand( @Numero BIGINT ) RETURNS BIGINT AS BEGIN RETURN (ABS(CHECKSUM(PWDENCRYPT(N''))) / 2147483647.0) * @Numero END GO |
With this function, I will create the table and insert some random values:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 |
IF (OBJECT_ID('tempdb..#Teste') IS NOT NULL) DROP TABLE #Teste CREATE TABLE #Teste ( Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY, Nome VARCHAR(100) NOT NULL ) DECLARE @Contador INT = 1, @Total INT = 100, @Contador2 INT = 1, @Total2 INT = 10, @String VARCHAR(100) WHILE(@Contador <= @Total) BEGIN SET @Contador2 = 1 SET @String = '' WHILE(@Contador2 <= @Total2) BEGIN IF (@Contador2 <= 8) SET @String += CHAR(65 + dbo.fncRand(25)) ELSE SET @String += CHAR(dbo.fncRand(255)) SET @Contador2 += 1 END INSERT INTO #Teste VALUES(@String) SET @Contador += 1 END |
Example of the generated table:
Now that we created the table, I will demonstrate what the customer was trying to do, failed, and triggered us to help understand what was going on:
Ue .. The equality query returned nothing, only using the LIKE operator. I copied and pasted the string returned by the LIKE query and it still didn't work. What is going on?
Identifying the hidden character problem
Well, they probably have hidden characters in the middle of the string. A pretty quick way to figure this out is to count the number of characters in the string and analyze how many characters we can see:
According to the print above, I can see 8 characters, but the string has 10, according to the LEN and DATALENGTH functions, which probably indicates that we have “invisible” characters in our column or string, which can be the control characters of the ASCII table (remembering that the table varies according to language and collation):
To help with this identification, I created the function below that will help us identify the lines that have these control characters:
1 2 3 4 5 6 7 8 |
CREATE FUNCTION [dbo].[fncPossui_Caractere_Oculto]( @String VARCHAR(MAX) ) RETURNS BIT AS BEGIN RETURN (CASE WHEN PATINDEX('%[^ !"#$%&''()*+,-./0123456789:;<=>[email protected]\^_`abcdefghijklmnopqrstuvwxyz|{}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖרÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ[[]%', REPLACE(@String, ']', '')) > 0 THEN 1 ELSE 0 END) END |
Its use is very simple, and returns the records that have “not visible” characters:
Identifying which characters are hidden
We now need to identify what these hidden characters are to evaluate if we will try to replace them or not. To make this task easier, I created the function fncMostra_Character_Hidden, which will take the original string and return the position and what is the ASCII code of each hidden character in the string:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
CREATE FUNCTION [dbo].[fncMostra_Caracteres_Ocultos]( @String VARCHAR(MAX) ) RETURNS VARCHAR(MAX) AS BEGIN DECLARE @Result VARCHAR(MAX) = '', @Contador INT = 1, @Total INT, @AdicionarBarra BIT = 0 SET @Total = LEN(@String) WHILE(@Contador <= @Total) BEGIN IF (PATINDEX('%[^ !"#$%&''()*+,-./0123456789:;<=>[email protected]\^_`abcdefghijklmnopqrstuvwxyz|{}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖרÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ[[]%', SUBSTRING(REPLACE(@String, ']', ''), @Contador, 1)) > 0) BEGIN SET @Result += (CASE WHEN @AdicionarBarra = 1 THEN ' | ' ELSE '' END) + 'Pos ' + CAST(@Contador AS VARCHAR(100)) + ': CHAR(' + CAST(ASCII(SUBSTRING(@String, @Contador, 1)) AS VARCHAR(5)) + ')' SET @AdicionarBarra = 1 END SET @Contador += 1 END RETURN @Result END GO |
Usage example - Analyzing a string
Usage example - Analyzing records from a table
Returning data without hidden characters
With the fncMostra_Character_Ocultos function, I was able to identify which characters of my string that are making my equality select not return the data. I'll test if that's right:
If you want to return the information without the “invisible” characters in an even easier way, you can use the function fncRemove_Caracteres_Ocultos:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
CREATE FUNCTION [dbo].[fncRemove_Caracteres_Ocultos]( @String VARCHAR(MAX) ) RETURNS VARCHAR(MAX) AS BEGIN DECLARE @Result VARCHAR(MAX), @StartingIndex INT = 0 WHILE (1 = 1) BEGIN SET @StartingIndex = PATINDEX('%[^ !"#$%&''()*+,-./0123456789:;<=>[email protected]\^_`abcdefghijklmnopqrstuvwxyz|{}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖרÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ[[]%', REPLACE(@String, ']', '')) IF (@StartingIndex <> 0) SET @String = REPLACE(@String,SUBSTRING(@String, @StartingIndex,1),'') ELSE BREAK END SET @Result = REPLACE(@String,'|','') RETURN @Result END GO |
Example of using the fncRemove_Character_Hidden function:
Important Note: Like every UDF function, these functions can be slow when used in large volumes of data. If you need to use them frequently in large volumes of data, I suggest implementing functions using SQLCLR, as they often deliver much higher performance than UDF T-SQL functions, as I explained in the article. SQL Server - Performance Comparison between Scalar Function and CLR Scalar Function.
Well guys, I hope you enjoyed this tip that I am sharing with you and that these functions can be useful in your daily life.
A big hug and see you in the next article.
You are to be congratulated for the site, lots of nice information and for sure if my company needs I will recommend it. Thank you very much for your help.
In fact, with the two quotation marks in front of the hyphen, he stopped taking all the characters…
Good afternoon…
Great article Dirceu, ball show!
I had the same problem with Alexandre and managed to solve it by adding two single quotes ”in front of the hyphen. As a result, the “-” stopped being flagged as a special character.
I believe this was because the hyphen is part of sql syntax when referring to lists (eg [0-9]). Is that why Dirceu?
But now I would like to add the tab to the allowed list, and this one I haven't gotten yet. Is there how I put by the ASCII code Dirceu?
Article show and functions Dirceu, I'm already using. Thank you.
I only had one question, in Patindex, there is the character “-“, I removed it from the functions but it continues to be returned as the special, why is it?
Send me your function by email that I will see what you have changed and tell you what is missing, beauty?