SQL Server - How to identify and replace “strange things” (hidden characters) in strings and tables

Views: 3.455
Reading Time: 5 minutes

Hey guys!
In this article I would like to share a situation that often occurs from time to time in my day to day consultancy, serving a number of different clients and environments, and which on Friday reoccurs, which is when there are “weird things” in strings and tables (an allusion to the Stranger Things series), which is the occurrence of non-visible characters.

After helping a consultant identify and solve this, I thought I'd come up with a practical solution to resolve this faster next time, and also help others who might have this problem in their daily lives, which is especially true in environments. who have just migrated or are working on importing data from multiple systems, APIs or files.

Simulating the problem

To demonstrate to you this problem happening, I will create a very simple table, and fill this table with random values ​​and a fixed line that I will insert. For this, I will use a function to generate random values, which is fncRand, especially useful for use in functions:

With this function, I will create the table and insert some random values:

Example of the generated table:

Now that we created the table, I will demonstrate what the customer was trying to do, failed, and triggered us to help understand what was going on:

Ue .. The equality query returned nothing, only using the LIKE operator. I copied and pasted the string returned by the LIKE query and it still didn't work. What is going on?

Identifying the hidden character problem

Well, they probably have hidden characters in the middle of the string. A pretty quick way to figure this out is to count the number of characters in the string and analyze how many characters we can see:

As per the above print, I can see 8 characters, but the string has 10, according to the LEN and DATALENGTH functions, which probably indicates that we have “invisible” characters in our column or string, which may be the control characters of the string. ASCII table (remembering that the table varies by language and collation):

To learn more about the DATALENGTH function and its difference to the LEN function, read the article. SQL Server - How to identify occurrences of a specific character in a string or table.

To help with this identification, I created the function below that will help us identify the lines that have these control characters:

To learn more about PATINDEX used in conjunction with regular expressions (RegExp), I suggest reading the article. SQL Server - How to use regular expressions (RegExp) in your database.

Its use is quite simple, and returns the records that have “not visible” characters:

Identifying which characters are hidden

We now need to identify what these hidden characters are to evaluate if we will try to replace them or not. To make this task easier, I created the function fncMostra_Character_Hidden, which will take the original string and return the position and what is the ASCII code of each hidden character in the string:

Example Usage - Parsing a String

Example Usage - Analyzing Records in a Table

Returning data without hidden characters

With the fncMostra_Character_Ocultos function, I was able to identify which characters of my string that are making my equality select not return the data. I'll test if that's right:

If you want to return information without the “invisible” characters even easier, you can use the fncRemove_Character_Ocultos function:

Example of using the fncRemove_Character_Hidden function:

If you want to remove accents and special characters, in addition to hidden characters, I recommend reading the article. How to remove accent and special characters from a string in SQL Server.

Important Note: Like every UDF function, these functions can be slow when used in large volumes of data. If you need to use them frequently in large volumes of data, I suggest implementing functions using SQLCLR, as they often deliver much higher performance than UDF T-SQL functions, as I explained in the article. SQL Server - Performance Comparison between Scalar Function and CLR Scalar Function.

Well guys, I hope you enjoyed this tip that I am sharing with you and that these functions can be useful in your daily life.
A big hug and see you in the next article.