Click on the banner to learn about and purchase my database training on Azure

SQL Server - How to standardize the Collation of all columns in the database

Views: 15.970 views
Reading Time: 15 minutes

Hello everybody!
All quiet?

In this article, I would like to share with you a problem I had recently, which several columns from various tables in a given database used a different collation from the DB standard, causing joins and WHERE conditions between VARCHAR / CHAR / columns. NVARCHAR with different collations, the bank returns us the following error message:

Cannot resolve the collation conflict between “SQL_Latin1_General_CP1_CI_AI” and “SQL_Latin1_General_CP1_CI_AS” in the equal to operation.

What is a COLLATION?

Collation is nothing more than the form of character encoding that a database uses to interpret them.

A Collation is a grouping of these characters in a certain order (each Collation has a different order), where the “A” is a different character than the “a”, if the collation is case-sensitive (case sensitive) and the “A” is different from “á”, if the collation is Accent-Sensitive (accent differentiation).

COLLATION has three levels of hierarchy:
- Server
- Data base
- Column

If the database is created without specifying which collation to use, it will be created with the server's collation (operating system language). When a table is created without specifying the collation of text columns (VARCHAR, NVARCHAR, CHAR, etc.), the database collate will be used as the table collation.

In SQL Server, the name of the Collation follows the following naming pattern:
SQL_CollationDesignator_CaseSensitivity_AccentSensitivity_KanatypeSensitive_WidthSensitivity

Collation Example:
SQL_Latin1_General_CP1_CS_AS

Where:

  • CollationDesignator: Specifies the basic grouping rules used by Windows grouping, where sorting rules are based on the alphabet or language.
  • CaseSensitivity: CI specifies case-insensitive, CS specifies case-insensitive.
  • AccentSensitivity: AI specifies that it does not differentiate accents, AS specifies that it differentiates accent.
  • KanatypeSensitive: Omitted specifies not distinguishing kana characters, KS specifies distinguishing kana characters.
  • WidthSensitivity: Omitted specifies that it does not distinguish width, WS specifies that it distinguishes width.

If a column is using a COLLATION case sensitive (CS), a query like SELECT * FROM Table WHERE Column LIKE '% Oracle%' will return the “Oracle” record, but it will not return the “oracle” record.

The same thing happens with a column using a COLLATION accent sensitive (AS). A query like SELECT * FROM Table WHERE Column LIKE '% JOÃO%' will return the record "João", but will not return the record "Joao".

For a complete list of collations by region and language, visit this link, remembering that the most used in Portuguese (Brazil) is SQL_Latin1_General_CP1_CI_AI (or SQL_Latin1_General_CP1_CS_AS).

To learn more about what Collation is, take a look at the article. SQL Server - Cannot resolve the collation conflict between… in the same as operation..

How to identify the default collation?

To identify the default collation of a database, simply use the query below:

Result:

To identify your instance's default collation, just use the command below:

Result:

Test base generation

To standardize the examples and help with the practical understanding of this article, I will use a simple script to generate sample data (tests):

How to standardize the collation of all columns?

If you want to standardize the collation of all columns in a database so that they use a specific collation or database standard, I developed a script that will identify all non-standard columns and execute an ALTER TABLE command. to apply the change.

Since there are dependencies (constraints, indexes, and foreign keys), the script will drop the identified dependent objects, apply the changes, and recreate those objects. Because of this, I strongly recommend that you test well before running this script in production. Preferably make a structure backup first.

Script to change the collation of all columns (and dependencies) that have a different collation than database
View source

Script to change the collation of all columns (and dependencies) to the desired collation:
View source

Result:

If the script identifies many columns to change and PRINT output is getting cut off because of the size of the generated strings, you can use the fncSplitTexto function, which I made available in the post. How to wrap a string in a substring table using a delimiter in SQL Server and replace the end of the script with this snippet here:

Result:

I hope you enjoyed this post and see you next time!
PS: Remember to back up and test ENOUGH if you plan to apply this script in Production.

If you have questions or problems, leave it here in the comments.