Click on the banner to learn about and purchase my database training on Azure

SQL Server - How to identify occurrences of a specific character in a string or table

Views: 12.631 views
Reading Time: 6 minutes

Hey guys!
All right with you ?

In this quick post I will demonstrate to you how to identify the occurrences of a specific character in a string or table, that is, count how many times the character “X” appears in each row of a table.

For the examples below, I will use 2 SP's from the CLR that I demonstrated how to create (in addition to other alternatives, for those who do not want to use CLR) in the articles below:

Today I had a problem importing a CSV file of more than 40 thousand lines and the routine, which has been running daily and without problems, returned the error message below.

Msg 6522, Level 16, State 1, Procedure stpImporta_CSV, Line 0 [Batch Start Line 0] The .NET Framework error occurred during execution of user-defined routine or aggregate “stpImporta_CSV”:
System.ApplicationException: Error: The input array is larger than the number of columns in this table.

ou

Msg 6522, Level 16, State 1, Procedure stpImporta_CSV, Line 0 [Batch Start Line 0] The .NET Framework error occurred during execution of user-defined routine or aggregate “stpImporta_CSV”: System.ApplicationException: Error : Input array is longer than the number of columns in this table.

From the description of the error, it became clear that there was a problem with my CSV (probably a pipe, which is my separator character in the file, in the middle of the strings). I thought of some solutions that I could create to identify the wrong record, such as using cursor, while loop and until came a VERY SIMPLE, fast and extremely effective solution: Our good old SELECT.

Example of imported CSV - 2 pipes separating Name, Age and E-mail

And now, with the query below, we can easily find out which records have different pipes than the rest of the rows.

After that, you just change the file, correct the lines and import again (and that's what I did in my case where only 1 line had problem).

The white space problem

In conversation with Ariel fernandez, he reminded me that when using the LEN () function, SQL Server applies an RTRIM () to the string implicitly, that is, if there are blanks on the right, these spaces will be cut in the calculation. For most cases, this will have no impact, but if the separator character we are looking for is exactly the space "", this will be a problem:

Note that the above string has 5 blanks at the beginning and end of the string, plus 2 in the middle of the string that would be the separators. When applying the LEN () function, the trailing blank 5 characters are removed and the end result will be 7 instead of 12.

To solve this problem, we can use the DATALENGTH function, which returns the number of bytes in a string (LEN returns the number of characters). With this, our query works correctly with the above example:

The DATALENGTH Function and UNICODE Strings

This solution seems to have solved our problem, but DATALENGTH has a peculiarity when using Unicode data (NCHAR, NVARCHAR, etc.), because these data types write the information in double-byte format, that is, 2 bytes are required for each. string character. As a result, the result of the DATALENGTH function ends up doubling for these data types, as shown below:

Using the SQL_VARIANT_PROPERTY Function to Identify the Variable Type

A more definitive solution to this would be to identify the input variable type (or column type) and if unicode type, divide the datalength result by 2. In order to identify the data type of our variable, let's use the SQL_VARIANT_PROPERTY () function:

Now using this function for our needs, we can use it to identify the type of the variable and perform the correct calculation.

Query evaluating a variable string:

Query using a string in a table:

Identifying the record that has more spaces than the other lines:

SQL_VARIANT_PROPERTY function and MAX-sized strings

Even with the above solution, we still have a possible problem. If the string or column is of type VARCHAR (MAX) or NVARCHAR (MAX), the SQL_VARIANT_PROPERTY () function has errors when used. In this case, the identification of the need to divide the result by 2 or not, will have to be done by you manually.

Example:

Result:

Msg 206, Level 16, State 2, Line 3
Operand type clash: nvarchar (max) is incompatible with sql_variant
Msg 206, Level 16, State 2, Line 7
Clash operand type: varchar (max) is incompatible with sql_variant

I hope you enjoyed this very simple and quick post and that this idea may be useful to you someday. If you would like to know more about the differences of the LEN () and DATALENGTH () functions, take a look in this post here. It is in English, but is very explanatory and complete.

Hugs!

How to identify occurrences of a specific character in a string or table count how many character characters string row

How to identify occurrences of a specific character in a string or table count how many character characters string row