Hello people,
Good afternoon!
In this post I will show you how to quickly and easily solve a problem that although it is simple and the message is very clear, I have seen many analysts not knowing how to solve.
Cannot resolve the collation conflict between “SQL_Latin1_General_CP1_CI_AI” and “SQL_Latin1_General_CP1_CS_AS” in the equal to operation.
What is a COLLATION?What is a COLLATION?
Collation is nothing more than the form of character encoding that a database uses to interpret them.
A Collation is a grouping of these characters in a certain order (each Collation has a different order), where the “A” is a different character than the “a”, if the collation is case-sensitive (case sensitive) and the “A” is different from “á”, if the collation is Accent-Sensitive (accent differentiation).
COLLATION has three levels of hierarchy:
- Server
- Data base
- Column
If the database is created without specifying which collation to use, it will be created with the server's collation (operating system language). When a table is created without specifying the collation of text columns (VARCHAR, NVARCHAR, CHAR, etc.), the database collate will be used as the table collation.
In SQL Server, the name of the Collation follows the following naming pattern:
SQL_CollationDesignator_CaseSensitivity_AccentSensitivity_KanatypeSensitive_WidthSensitivity
Collation Example:
SQL_Latin1_General_CP1_CS_AS
Where:
- CollationDesignator: Specifies the basic grouping rules used by Windows grouping, where sorting rules are based on the alphabet or language.
- CaseSensitivity: CI specifies case-insensitive, CS specifies case-insensitive.
- AccentSensitivity: AI specifies that it does not differentiate accents, AS specifies that it differentiates accent.
- KanatypeSensitive: Omitted specifies not distinguishing kana characters, KS specifies distinguishing kana characters.
- WidthSensitivity: Omitted specifies that it does not distinguish width, WS specifies that it distinguishes width.
If a column is using a COLLATION case sensitive (CS), a query like SELECT * FROM Table WHERE Column LIKE '% Oracle%' will return the “Oracle” record, but it will not return the “oracle” record.
The same thing happens with a column using a COLLATION accent sensitive (AS). A query like SELECT * FROM Table WHERE Column LIKE '% JOÃO%' will return the record "João", but will not return the record "Joao".
For a complete list of collations by region and language, visit this link, remembering that the most used in Portuguese (Brazil) is SQL_Latin1_General_CP1_CI_AI (or SQL_Latin1_General_CP1_CS_AS).
References:
https://docs.microsoft.com/pt-br/sql/t-sql/statements/collations
https://docs.microsoft.com/pt-br/sql/relational-databases/collations/collation-and-unicode-support
Simulating the error
An easy way to simulate this error is by creating a table where two or more columns have different collations and trying to make a WHERE comparison between these columns, concatenating or creating two tables with different collation columns and trying to join between the 2 columns. .
Remember that this error only happens when comparing or manipulating two TEXT columns (VARCHAR, NVARCHAR, CHAR, etc.) of different collations.
1 2 3 4 5 6 7 8 9 10 11 12 |
IF (OBJECT_ID('tempdb..#Teste1') IS NOT NULL) DROP TABLE #Teste1 CREATE TABLE #Teste1 ( Id INT IDENTITY(1,1), Nome1 VARCHAR(100), Nome2 VARCHAR(100) COLLATE SQL_Latin1_General_CP1_CS_AS ) INSERT INTO #Teste1 SELECT 'SQL Server', 'Oracle' SELECT A.Nome2 + ' é melhor que ' + A.Nome1 FROM #Teste1 A |
Performing the query without changing structure
A simple, quick and convenient way to get around this problem provisionally or when you don't have access to change the table / database structure is to use the COLLATE operator in select itself, which will convert all column data to a given collation and then you can compare and work with the data normally.
1 2 |
SELECT A.Nome1 COLLATE SQL_Latin1_General_CP1_CS_AS + ' é melhor que ' + A.Nome2 FROM #Teste1 A |
You can still use the DATABASE_DEFAULT collation on the two columns involved to ensure compatibility between them by converting the two columns to the same bank collation:
1 2 3 |
SELECT * FROM #Teste1 A LEFT JOIN #Teste1 B ON A.Nome1 COLLATE DATABASE_DEFAULT = B.Nome2 COLLATE DATABASE_DEFAULT |
This is a temporary and contour solution that should not be applied to large volumes of data since performance is not optimal as the entire column needs to be read and converted before working with the data.
How to identify database COLLATION
The first step in identifying a collation conflict between databases problem is to first identify the collation of the databases involved to try to understand if the problem is in this post. To identify the collation of a database, we can use the sp_helpdb command:
We can also use the sys.databases catalog view:
1 2 3 |
SELECT name, collation_name, compatibility_level FROM sys.databases WHERE name IN ('master', 'msdb', 'model', 'tempdb') |
And also use the DATABASEPROPERTYEX function:
1 |
SELECT DATABASEPROPERTYEX('master', 'Collation') |
How to identify the COLLATION of a column
After analyzing the collation of the databases involved and finding that they are already using the same encoding, let's now look at the columns involved.
There are several ways to perform this check, such as sys.columns:
1 2 3 4 |
SELECT A.name, A.column_id, B.name, A.max_length, A.[precision], A.scale, A.collation_name FROM msdb.sys.columns A JOIN msdb.sys.types B ON A.user_type_id = B.user_type_id WHERE [object_id] = OBJECT_ID('msdb.dbo.sysjobs') |
We can also get this information by analyzing the views of the INFORMATION_SCHEMA database:
1 2 3 |
SELECT COLUMN_NAME, ORDINAL_POSITION, DATA_TYPE, NUMERIC_PRECISION, NUMERIC_SCALE, COLLATION_NAME FROM msdb.INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'sysjobs' |
How to change column COLLATION
A definitive solution to solving the reported collation problem is to change the column that has the character encoding different from the rest to standardize the table so that all columns use the same encoding.
To do this, simply use the command below:
1 2 |
ALTER TABLE #Teste1 ALTER COLUMN Nome2 VARCHAR(100) COLLATE DATABASE_DEFAULT |
With this command, we are changing the Name2 column of our example to the same database collation. If you need to, you can change the collation to one of your choice, ignoring the default database collation:
1 2 |
ALTER TABLE #Teste1 ALTER COLUMN Nome2 VARCHAR(100) COLLATE SQL_Latin1_General_CP1_CI_AI |
How to change the COLLATION of Database
Another definitive solution to this collation conflict problem is to change the default collation of the database.
This is especially useful when columns have no collation definition (using the database default) and you are joining and / or string manipulating between different database columns with different collations and are experiencing this problem.
Example command to change database collation:
1 |
ALTER DATABASE MeuBanco COLLATE SQL_Latin1_General_CP1_CS_AS |
And that's it folks!
I hope you enjoyed the post!
Any questions, just leave your comment
Hug and see you next time!