Hello everybody!
All quiet?
In this article, I would like to share with you a problem I had recently, which several columns from various tables in a given database used a different collation from the DB standard, causing joins and WHERE conditions between VARCHAR / CHAR / columns. NVARCHAR with different collations, the bank returns us the following error message:
What is a COLLATION?
Collation is nothing more than the form of character encoding that a database uses to interpret them.
A Collation is a grouping of these characters in a certain order (each Collation has a different order), where the “A” is a different character than the “a”, if the collation is case-sensitive (case sensitive) and the “A” is different from “á”, if the collation is Accent-Sensitive (accent differentiation).
COLLATION has three levels of hierarchy:
- Server
- Data base
- Column
If the database is created without specifying which collation to use, it will be created with the server's collation (operating system language). When a table is created without specifying the collation of text columns (VARCHAR, NVARCHAR, CHAR, etc.), the database collate will be used as the table collation.
In SQL Server, the name of the Collation follows the following naming pattern:
SQL_CollationDesignator_CaseSensitivity_AccentSensitivity_KanatypeSensitive_WidthSensitivity
Collation Example:
SQL_Latin1_General_CP1_CS_AS
Where:
- CollationDesignator: Specifies the basic grouping rules used by Windows grouping, where sorting rules are based on the alphabet or language.
- CaseSensitivity: CI specifies case-insensitive, CS specifies case-insensitive.
- AccentSensitivity: AI specifies that it does not differentiate accents, AS specifies that it differentiates accent.
- KanatypeSensitive: Omitted specifies not distinguishing kana characters, KS specifies distinguishing kana characters.
- WidthSensitivity: Omitted specifies that it does not distinguish width, WS specifies that it distinguishes width.
If a column is using a COLLATION case sensitive (CS), a query like SELECT * FROM Table WHERE Column LIKE '% Oracle%' will return the “Oracle” record, but it will not return the “oracle” record.
The same thing happens with a column using a COLLATION accent sensitive (AS). A query like SELECT * FROM Table WHERE Column LIKE '% JOÃO%' will return the record "João", but will not return the record "Joao".
For a complete list of collations by region and language, visit this link, remembering that the most used in Portuguese (Brazil) is SQL_Latin1_General_CP1_CI_AI (or SQL_Latin1_General_CP1_CS_AS).
To learn more about what Collation is, take a look at the article. SQL Server - Cannot resolve the collation conflict between… in the same as operation..
How to identify the default collation?
To identify the default collation of a database, simply use the query below:
1 2 3 4 5 6 | USE [dirceuresende] GO SELECT collation_name FROM sys.databases WHERE database_id = DB_ID() |
To identify your instance's default collation, just use the command below:
1 | SELECT SERVERPROPERTY('Collation') |
Test base generation
To standardize the examples and help with the practical understanding of this article, I will use a simple script to generate sample data (tests):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | ------------------------------------------------------------------------ -- Geração dos dados para teste ------------------------------------------------------------------------ CREATE TABLE dbo.Cliente ( CPF VARCHAR(11) COLLATE Latin1_General_CS_AS NOT NULL PRIMARY KEY CLUSTERED, Nome VARCHAR(50) ) INSERT INTO dbo.Cliente ( CPF, Nome ) VALUES ( '12345678909', 'Joãozinho' ) CREATE NONCLUSTERED INDEX SK01 ON dbo.Cliente(CPF) INCLUDE(Nome) CREATE TABLE dbo.Pedido ( Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED, Dt_Pedido DATETIME DEFAULT GETDATE(), Vl_Pedido MONEY, CPF_Cliente VARCHAR(11) COLLATE Latin1_General_CS_AS CONSTRAINT [FK_Cliente] FOREIGN KEY(CPF_Cliente) REFERENCES dbo.Cliente(CPF) NOT NULL ) INSERT INTO dbo.Pedido ( Vl_Pedido, CPF_Cliente ) VALUES ( 9.99, -- Vl_Pedido - money '12345678909' -- Id_Cliente - int ) CREATE NONCLUSTERED INDEX SK02 ON dbo.Pedido(Id, Dt_Pedido, CPF_Cliente) |
How to standardize the collation of all columns?
If you want to standardize the collation of all columns in a database so that they use a specific collation or database standard, I developed a script that will identify all non-standard columns and execute an ALTER TABLE command. to apply the change.
Since there are dependencies (constraints, indexes, and foreign keys), the script will drop the identified dependent objects, apply the changes, and recreate those objects. Because of this, I strongly recommend that you test well before running this script in production. Preferably make a structure backup first.
Script to change the collation of all columns (and dependencies) that have a different collation than database
View source
Script to change the collation of all columns (and dependencies) to the desired collation:
View source
If the script identifies many columns to change and PRINT output is getting cut off because of the size of the generated strings, you can use the fncSplitTexto function, which I made available in the post. How to wrap a string in a substring table using a delimiter in SQL Server and replace the end of the script with this snippet here:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 | ------------------------------------------------------------------------- -- Executa os scripts ------------------------------------------------------------------------- PRINT '---------- Dropando as contraints FK' IF (NULLIF(LTRIM(RTRIM(@CmdDropFK)), '') IS NOT NULL) SELECT * FROM dirceuresende.dbo.fncSplitTexto(@CmdDropFK, ';') IF (@Fl_Debug = 0) EXEC(@CmdDropFK) PRINT '---------- Desativando as Check Contraints' IF (NULLIF(LTRIM(RTRIM(@CmdDisableCK)), '') IS NOT NULL) SELECT * FROM dirceuresende.dbo.fncSplitTexto(@CmdDisableCK, ';') IF (@Fl_Debug = 0) EXEC(@CmdDisableCK) PRINT '---------- Dropando os índices' IF (NULLIF(LTRIM(RTRIM(@CmdDropIndex)), '') IS NOT NULL) SELECT * FROM dirceuresende.dbo.fncSplitTexto(@CmdDropIndex, ';') IF (@Fl_Debug = 0) EXEC(@CmdDropIndex) PRINT '---------- Dropando os índices em constraints' IF (NULLIF(LTRIM(RTRIM(@CmdDropIndexConstraint)), '') IS NOT NULL) SELECT * FROM dirceuresende.dbo.fncSplitTexto(@CmdDropIndexConstraint, ';') IF (@Fl_Debug = 0) EXEC(@CmdDropIndexConstraint) PRINT '---------- Alterando a tabela' IF (NULLIF(LTRIM(RTRIM(@CmdAlterTable)), '') IS NOT NULL) SELECT * FROM dirceuresende.dbo.fncSplitTexto(@CmdAlterTable, ';') IF (@Fl_Debug = 0) EXEC(@CmdAlterTable) PRINT '---------- Reativando as constraints' IF (NULLIF(LTRIM(RTRIM(@CmdEnableCK)), '') IS NOT NULL) SELECT * FROM dirceuresende.dbo.fncSplitTexto(@CmdEnableCK, ';') IF (@Fl_Debug = 0) EXEC(@CmdEnableCK) PRINT '---------- Recriando os índices' IF (NULLIF(LTRIM(RTRIM(@CmdCreateIndex)), '') IS NOT NULL) SELECT * FROM dirceuresende.dbo.fncSplitTexto(@CmdCreateIndex, ';') IF (@Fl_Debug = 0) EXEC(@CmdCreateIndex) PRINT '---------- Recriando os índices em constraints' IF (NULLIF(LTRIM(RTRIM(@CmdCreateIndexConstraint)), '') IS NOT NULL) SELECT * FROM dirceuresende.dbo.fncSplitTexto(@CmdCreateIndexConstraint, ';') IF (@Fl_Debug = 0) EXEC(@CmdCreateIndexConstraint) PRINT '---------- Recriando as FKs' IF (NULLIF(LTRIM(RTRIM(@CmdCreateFK)), '') IS NOT NULL) SELECT * FROM dirceuresende.dbo.fncSplitTexto(@CmdCreateFK, ';') IF (@Fl_Debug = 0) EXEC(@CmdCreateFK) |