SQL Server - The Danger of Using JOIN Between Columns of Different Data Types

Views: 420
Reading Time: 3 minutes

Hello everybody!
Good night!

Today I came across a critical performance issue in a production environment, where a particular query (which can be executed multiple times per second) was experiencing an already known and old slow-running problem (taking between 21 and 30 seconds per run), but today was the day to settle this once and for all.

For many DBA's (including me) and database modeling enthusiasts, such problems should never occur, but unfortunately in practice this is not the case. Analyzing the query, which by the way was very large, with several CASES, LEFT JOIN'S, OR's, use of function, etc. I could see that there were many cases of implicit conversion and when analyzing the actual execution plan, it was very clear the reason for the slowness. :

SQL Server - JOIN Predicate Columns Different DataTypes 2

Analyzing the readings on disk, we can see that the readings in a specific table are very high, especially considering that the query uses in the WHERE clause an equality term in the main table of the query, informing a code that is primary key and is in the clustered index, that is, it should be an extremely fast and optimized query processing few records.

SQL Server - JOIN Predicate Columns Different DataTypes 3

Fix that there is a warning in select. And by hovering over it, we can see this information:
SQL Server - Differential Datatypes Performance Problems 2

Well, we have the implicit conversion problem going on. This occurs when two columns of different data types are compared and then the bank needs to convert this manually during the query.

Looking at the query JOINS, I could see that the columns involved in the comparison were of different types. This means that all records involved in JOIN must be converted to the same type, and then check whether they should be restricted by clauses in JOIN or not. If the volume of records is too high, this can greatly increase processing time and disk reads (as shown in the example).

A simple ALTER TABLE command on the column that was JOINING the table that had too many readings to match the data types solved the problem:

SQL Server - JOIN Predicate Columns Different DataTypes 1

SQL Server - JOIN Predicate Columns Different DataTypes 4

Another solution would be to create an intermediate table containing a part of the query, and converting the column already to the correct type. With this intermediate table would be JOIN with the rest of the query, and now, with the same data type, there was no longer the implicit conversion problem and the query would be running more optimally.

That's it folks!
Regards and see you next post!