Hey guys!
All right with you ?
In this quick post I will demonstrate to you how to identify the occurrences of a specific character in a string or table, that is, count how many times the character “X” appears in each row of a table.
For the examples below, I will use 2 SP's from the CLR that I demonstrated how to create (in addition to other alternatives, for those who do not want to use CLR) in the articles below:
- SQL Server - How to export and import tabular data files (Ex: CSV) using CLR (C #)
- Importing CSV Files into the SQL Server Database
- SQL Server - How to import text files into database (OLE Automation, CLR, BCP, BULK INSERT, OPENROWSET)
Today I had a problem importing a CSV file of more than 40 thousand lines and the routine, which has been running daily and without problems, returned the error message below.
1 2 3 4 5 6 7 8 9 |
IF (OBJECT_ID('tempdb..##Saida') IS NOT NULL) DROP TABLE ##Saida EXEC CLR.dbo.stpImporta_CSV @Ds_Caminho_Arquivo = N'C:\Users\difil\Desktop\Teste.csv', -- nvarchar(max) @Ds_Separador = N'|', -- nvarchar(max) @Fl_Primeira_Linha_Cabecalho = 0, -- bit @Nr_Linha_Inicio = 0, -- int @Nr_Linhas_Retirar_Final = 0, -- int @Ds_Tabela_Destino = N'##Saida', -- nvarchar(max) @Ds_Codificacao = N'utf-8' -- nvarchar(max) |
Msg 6522, Level 16, State 1, Procedure stpImporta_CSV, Line 0 [Batch Start Line 0] The .NET Framework error occurred during execution of user-defined routine or aggregate “stpImporta_CSV”:
System.ApplicationException: Error: The input array is larger than the number of columns in this table.
ou
Msg 6522, Level 16, State 1, Procedure stpImporta_CSV, Line 0 [Batch Start Line 0] The .NET Framework error occurred during execution of user-defined routine or aggregate “stpImporta_CSV”: System.ApplicationException: Error : Input array is longer than the number of columns in this table.
From the description of the error, it became clear that there was a problem with my CSV (probably a pipe, which is my separator character in the file, in the middle of the strings). I thought of some solutions that I could create to identify the wrong record, such as using cursor, while loop and until came a VERY SIMPLE, fast and extremely effective solution: Our good old SELECT.
1 2 3 4 5 6 7 8 9 10 |
CREATE TABLE #dirceuresende ( Ds_Texto VARCHAR(MAX) ) INSERT INTO #dirceuresende EXEC CLR.dbo.stpImporta_Txt @caminho = N'C:\Users\difil\Desktop\Teste.csv' -- nvarchar(max) SELECT * FROM #dirceuresende |
Example of imported CSV - 2 pipes separating Name, Age and E-mail
And now, with the query below, we can easily find out which records have different pipes than the rest of the rows.
1 2 3 |
SELECT *, LEN(Ds_Texto) - LEN(REPLACE(Ds_Texto, '|', '')) AS Qt_Pipes FROM #dirceuresende WHERE LEN(Ds_Texto) - LEN(REPLACE(Ds_Texto, '|', '')) != 2 |
After that, you just change the file, correct the lines and import again (and that's what I did in my case where only 1 line had problem).
The white space problem
In conversation with Ariel fernandez, he reminded me that when using the LEN () function, SQL Server applies an RTRIM () to the string implicitly, that is, if there are blanks on the right, these spaces will be cut in the calculation. For most cases, this will have no impact, but if the separator character we are looking for is exactly the space "", this will be a problem:
1 2 3 |
-- 5 espaços em branco no inicio e no final da string DECLARE @String VARCHAR(100) = N' Dirceu 29 email ' SELECT (LEN(@String) - LEN(REPLACE(@String, ' ', ''))) |
Note that the above string has 5 blanks at the beginning and end of the string, plus 2 in the middle of the string that would be the separators. When applying the LEN () function, the trailing blank 5 characters are removed and the end result will be 7 instead of 12.
To solve this problem, we can use the DATALENGTH function, which returns the number of bytes in a string (LEN returns the number of characters). With this, our query works correctly with the above example:
1 2 3 |
-- 5 espaços em branco no inicio e no final da string DECLARE @String VARCHAR(100) = N' Dirceu 29 email ' SELECT (DATALENGTH(@String) - DATALENGTH(REPLACE(@String, ' ', ''))) |
The DATALENGTH Function and UNICODE Strings
This solution seems to have solved our problem, but DATALENGTH has a peculiarity when using Unicode data (NCHAR, NVARCHAR, etc.), because these data types write the information in double-byte format, that is, 2 bytes are required for each. string character. As a result, the result of the DATALENGTH function ends up doubling for these data types, as shown below:
Using the SQL_VARIANT_PROPERTY Function to Identify the Variable Type
A more definitive solution to this would be to identify the input variable type (or column type) and if unicode type, divide the datalength result by 2. In order to identify the data type of our variable, let's use the SQL_VARIANT_PROPERTY () function:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
DECLARE @String NVARCHAR(100) = N' Dirceu 29 email ' SELECT SQL_VARIANT_PROPERTY(@String, 'BaseType') AS [Base Type], SQL_VARIANT_PROPERTY(@String, 'Precision') AS [Precision], SQL_VARIANT_PROPERTY(@String, 'Scale') AS Scale, SQL_VARIANT_PROPERTY(@String, 'Collation') AS Collation, SQL_VARIANT_PROPERTY(@String, 'MaxLength') AS [MaxLength], SQL_VARIANT_PROPERTY(@String, 'TotalBytes') AS TotalBytes DECLARE @String2 VARCHAR(100) = ' Dirceu 29 email ' SELECT SQL_VARIANT_PROPERTY(@String2, 'BaseType') AS [Base Type], SQL_VARIANT_PROPERTY(@String2, 'Precision') AS [Precision], SQL_VARIANT_PROPERTY(@String2, 'Scale') AS Scale, SQL_VARIANT_PROPERTY(@String2, 'Collation') AS Collation, SQL_VARIANT_PROPERTY(@String2, 'MaxLength') AS [MaxLength], SQL_VARIANT_PROPERTY(@String2, 'TotalBytes') AS TotalBytes |
Now using this function for our needs, we can use it to identify the type of the variable and perform the correct calculation.
Query evaluating a variable string:
1 2 3 4 5 6 7 |
-- 5 espaços em branco no inicio e no final da string UNICODE DECLARE @String NVARCHAR(100) = N' Dirceu 29 email ' SELECT (DATALENGTH(@String) - DATALENGTH(REPLACE(@String, ' ', ''))) / (CASE WHEN SQL_VARIANT_PROPERTY(@String, 'BaseType') = 'nvarchar' THEN 2 ELSE 1 END) -- 5 espaços em branco no inicio e no final da string DECLARE @String2 VARCHAR(100) = ' Dirceu 29 email ' SELECT (DATALENGTH(@String2) - DATALENGTH(REPLACE(@String2, ' ', ''))) / (CASE WHEN SQL_VARIANT_PROPERTY(@String2, 'BaseType') = 'nvarchar' THEN 2 ELSE 1 END) |
Query using a string in a table:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
IF (OBJECT_ID('tempdb..#dirceuresende') IS NOT NULL) DROP TABLE #dirceuresende CREATE TABLE #dirceuresende ( Ds_Texto VARCHAR(4000) ) INSERT INTO #dirceuresende VALUES(' Dirceu 29 email '), (' Teste 30 email2 '), (' Te ste 30 email2 ') SELECT *, (DATALENGTH(Ds_Texto) - DATALENGTH(REPLACE(Ds_Texto, ' ', ''))) / (CASE WHEN SQL_VARIANT_PROPERTY(Ds_Texto, 'BaseType') = 'nvarchar' THEN 2 ELSE 1 END) AS Qt_Pipes FROM #dirceuresende WHERE (DATALENGTH(Ds_Texto) - DATALENGTH(REPLACE(Ds_Texto, ' ', ''))) / (CASE WHEN SQL_VARIANT_PROPERTY(Ds_Texto, 'BaseType') = 'nvarchar' THEN 2 ELSE 1 END) != 13 |
Identifying the record that has more spaces than the other lines:
SQL_VARIANT_PROPERTY function and MAX-sized strings
Even with the above solution, we still have a possible problem. If the string or column is of type VARCHAR (MAX) or NVARCHAR (MAX), the SQL_VARIANT_PROPERTY () function has errors when used. In this case, the identification of the need to divide the result by 2 or not, will have to be done by you manually.
Example:
1 2 3 4 5 6 7 |
-- 5 espaços em branco no inicio e no final da string UNICODE DECLARE @String NVARCHAR(MAX) = N' Dirceu 29 email ' SELECT (DATALENGTH(@String) - DATALENGTH(REPLACE(@String, ' ', ''))) / (CASE WHEN SQL_VARIANT_PROPERTY(@String, 'BaseType') = 'nvarchar' THEN 2 ELSE 1 END) -- 5 espaços em branco no inicio e no final da string DECLARE @String2 VARCHAR(MAX) = ' Dirceu 29 email ' SELECT (DATALENGTH(@String2) - DATALENGTH(REPLACE(@String2, ' ', ''))) / (CASE WHEN SQL_VARIANT_PROPERTY(@String2, 'BaseType') = 'nvarchar' THEN 2 ELSE 1 END) |
Result:
Msg 206, Level 16, State 2, Line 3
Operand type clash: nvarchar (max) is incompatible with sql_variant
Msg 206, Level 16, State 2, Line 7
Clash operand type: varchar (max) is incompatible with sql_variant
I hope you enjoyed this very simple and quick post and that this idea may be useful to you someday. If you would like to know more about the differences of the LEN () and DATALENGTH () functions, take a look in this post here. It is in English, but is very explanatory and complete.
Hugs!
How to identify occurrences of a specific character in a string or table count how many character characters string row
How to identify occurrences of a specific character in a string or table count how many character characters string row