Click on the banner to learn about and purchase my database training on Azure

SQL Server - How to identify and replace “strange things” (hidden characters) in strings and tables

Views: 20.429 views
Reading Time: 5 minutes

Speak guys!
In this article I would like to share a situation that usually occurs from time to time in my day-to-day consultancy, serving several different clients and environments, and which happened again on Friday, which is when there are “strange things” in strings and tables (an allusion to the series “Stranger Things”), which is the occurrence of non-visible characters.

After helping a consultant to identify and resolve this, I thought of creating a practical solution to resolve this more quickly the next time and also help other people who may have this problem in their daily lives, which is usually the case especially in environments who have just undergone migration or work with importing data from various systems, APIs or files.

Simulating the problem

To demonstrate to you this problem happening, I will create a very simple table, and fill this table with random values ​​and a fixed line that I will insert. For this, I will use a function to generate random values, which is fncRand, especially useful for use in functions:

With this function, I will create the table and insert some random values:

Example of the generated table:

Now that we created the table, I will demonstrate what the customer was trying to do, failed, and triggered us to help understand what was going on:

Ue .. The equality query returned nothing, only using the LIKE operator. I copied and pasted the string returned by the LIKE query and it still didn't work. What is going on?

Identifying the hidden character problem

Well, they probably have hidden characters in the middle of the string. A pretty quick way to figure this out is to count the number of characters in the string and analyze how many characters we can see:

According to the print above, I can see 8 characters, but the string has 10, according to the LEN and DATALENGTH functions, which probably indicates that we have “invisible” characters in our column or string, which can be the control characters of the ASCII table (remembering that the table varies according to language and collation):

To learn more about the DATALENGTH function and its difference to the LEN function, read the article. SQL Server - How to identify occurrences of a specific character in a string or table.

To help with this identification, I created the function below that will help us identify the lines that have these control characters:

To learn more about PATINDEX used in conjunction with regular expressions (RegExp), I suggest reading the article. SQL Server - How to use regular expressions (RegExp) in your database.

Its use is very simple, and returns the records that have “not visible” characters:

Identifying which characters are hidden

We now need to identify what these hidden characters are to evaluate if we will try to replace them or not. To make this task easier, I created the function fncMostra_Character_Hidden, which will take the original string and return the position and what is the ASCII code of each hidden character in the string:

Usage example - Analyzing a string

Usage example - Analyzing records from a table

Returning data without hidden characters

With the fncMostra_Character_Ocultos function, I was able to identify which characters of my string that are making my equality select not return the data. I'll test if that's right:

If you want to return the information without the “invisible” characters in an even easier way, you can use the function fncRemove_Caracteres_Ocultos:

Example of using the fncRemove_Character_Hidden function:

If you want to remove accents and special characters, in addition to hidden characters, I recommend reading the article. How to remove accent and special characters from a string in SQL Server.

Important Note: Like every UDF function, these functions can be slow when used in large volumes of data. If you need to use them frequently in large volumes of data, I suggest implementing functions using SQLCLR, as they often deliver much higher performance than UDF T-SQL functions, as I explained in the article. SQL Server - Performance Comparison between Scalar Function and CLR Scalar Function.

Well guys, I hope you enjoyed this tip that I am sharing with you and that these functions can be useful in your daily life.
A big hug and see you in the next article.