SQL Server - Performance Tuning Tips: Implicit Conversion? NEVER!

This post is part 4 of 8 in the series: Performance Tuning

Hey guys!
All good?

In this article I would like to discuss a query performance issue that we run into all the time in our day-to-day work at Fabrício Lima - Data Solutions, one of the best-known Performance Tuning companies in Brazil. We are talking about something that is usually very simple to fix and yet inexplicably, extremely common: implicit conversion.

For the examples in this article, I will use the following structure for the Orders table:
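The original DDL was not reproduced here. Based on the columns referenced throughout the article (Ds_Order, Quantity, Value), a plausible sketch of the structure would be something like this (the exact names, types, and index are assumptions):

```sql
-- Hypothetical reconstruction of the Orders table used in the examples
CREATE TABLE dbo.Orders (
    Cd_Order INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    Ds_Order VARCHAR(10) NOT NULL,     -- order "number" stored as a string
    Quantity INT NOT NULL,
    Value    NUMERIC(18, 2) NOT NULL
);

-- Index on the column used in the WHERE filters below
CREATE NONCLUSTERED INDEX SK03_Order ON dbo.Orders (Ds_Order);
```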

What is implicit conversion?

Implicit conversion occurs when SQL Server needs to convert the value of one or more columns or variables to another data type in order to compare, concatenate, or otherwise combine it with other columns or variables. SQL Server cannot, for example, compare a varchar column with an int column without first converting one of them so that both have the same data type.

When we talk about conversion, we have 2 types:

  • Explicit: Occurs when the query developer converts the data deliberately, using functions such as CAST, CONVERT, etc.

  • Implicit: Occurs when SQL Server is forced to convert the data type between columns or variables internally because they were declared with different types.

Since this conversion is applied to every record of the columns involved, the operation can be very costly and severely hamper query execution: even if there is an index on these columns, the query optimizer will end up using a Scan operator instead of a Seek, failing to use the index to its full potential. As a result, the runtime and the number of logical reads can increase dramatically.

What is the impact of conversion on my query?

As I mentioned above, when data of different types needs to be compared, either the query developer performs an explicit conversion using CAST / CONVERT, or SQL Server performs the implicit conversion internally to equalize the data types.

But does this really make a significant difference in my query? Let's analyze.

Test 1 - Using the Same Data Type for Column and Value

In this first test, we will write the query the correct way. The column's data type (varchar) is the same as the literal value's (varchar), so no data conversion occurs. As a result, the query runs in 0 ms, with only 6 logical reads.
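The original query appears only as a screenshot in the post. Assuming the structure sketched earlier, it would look something like this:

```sql
-- Matching types: VARCHAR column vs. VARCHAR literal
-- The optimizer can Seek on the Ds_Order index, with no conversion
SELECT Cd_Order, Ds_Order
FROM dbo.Orders
WHERE Ds_Order = '19870';
```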

Test 2 - Implicit Conversion

In the second test, I will use a simple query with no explicit data type conversion. Because the Ds_Order column is of type varchar and the literal value 19870 is an integer, SQL Server will have to convert this data (implicitly).
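Under the same assumed schema, the query would look something like this:

```sql
-- Type mismatch: VARCHAR column vs. INT literal
-- INT has higher precedence, so SQL Server converts Ds_Order to INT
-- for every row, turning the Index Seek into an Index Scan
SELECT Cd_Order, Ds_Order
FROM dbo.Orders
WHERE Ds_Order = 19870;
```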

Test 3 - Converting Data Types (Explicit Conversion)

In this last test, I'll use the CAST function to apply an explicit conversion and evaluate how the query behaves relative to the other two.
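The original query is not reproduced in the post, so it is worth noting that where the cast is applied matters. A sketch of both options, under the assumed schema:

```sql
-- Option A: cast the literal to the column's type
-- The Seek on the Ds_Order index is preserved
SELECT Cd_Order, Ds_Order
FROM dbo.Orders
WHERE Ds_Order = CAST(19870 AS VARCHAR(10));

-- Option B: cast the column itself
-- This forces a Scan on every row, just like the implicit conversion
-- (and will fail if any Ds_Order value is not numeric)
SELECT Cd_Order, Ds_Order
FROM dbo.Orders
WHERE CAST(Ds_Order AS INT) = 19870;
```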

Does implicit conversion only occur between strings and numbers?

Not really. I will demonstrate some examples of conversion between strings and other strings, but it can also occur between strings and dates, uniqueidentifier and strings, etc.

VARCHAR and NVARCHAR

A very common mistake we see in everyday work is implicit conversion between varchar and nvarchar values/columns. This happens a lot in applications that use ORMs, such as Entity Framework.

Many people believe that no conversion is needed between varchar and nvarchar, but the example below shows that it is:
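A sketch of that example, again assuming the hypothetical Orders schema: the N prefix makes the literal an nvarchar, which has higher precedence than the varchar column.

```sql
-- N'...' makes the literal NVARCHAR; the VARCHAR column Ds_Order is
-- converted row by row, which typically produces a Scan and a
-- CONVERT_IMPLICIT warning in the execution plan
SELECT Cd_Order, Ds_Order
FROM dbo.Orders
WHERE Ds_Order = N'19870';
```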

Another example, using a variable of type NVARCHAR(10):
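A sketch of the variable version, under the same assumptions:

```sql
-- Same effect as the N'...' literal: the VARCHAR column is converted
-- to match the NVARCHAR variable
DECLARE @Ds_Order NVARCHAR(10) = N'19870';

SELECT Cd_Order, Ds_Order
FROM dbo.Orders
WHERE Ds_Order = @Ds_Order;
```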

Doesn't implicit conversion occur between numbers of different types?

More or less. Implicit conversion may occur in the Compute Scalar operator, but it does not prevent the use of the Seek operator. Because numbers all belong to the same type family, SQL Server can natively compare numbers of different data types, such as INT vs BIGINT, or BIGINT vs SMALLINT, for example.

Following a tip from José, I became aware that when an expression mixes numbers of different data types, we don't see a warning in the execution plan pointing out the implicit conversion, nor is a Scan operator used instead of a Seek; but implicit conversion does indeed occur, in the Compute Scalar operator, as I will demonstrate below:

Before analyzing the implicit conversion, let's create a new index to avoid the Key Lookup operator, using the technique known as a “covering index”:
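A sketch of such an index, with column names assumed from the article's example (key on the filter column, including the columns the query selects):

```sql
-- Covering index: the query can be answered entirely from the index,
-- eliminating the Key Lookup back to the clustered index
CREATE NONCLUSTERED INDEX SK04_Order_Covering
    ON dbo.Orders (Ds_Order)
    INCLUDE (Quantity, Value);
```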

To learn more about covering indexes and understand why I created the index this way, read the article Understanding Index Functioning in SQL Server.

Looking at the execution plan again, we see that the Key Lookup operator is gone, the new index is being used with a Seek operator, and there is no implicit conversion warning. So where is the conversion?

By viewing the properties of the Compute Scalar operator, we can see that SQL Server applied an implicit conversion in order to multiply the Value (numeric) column by the Quantity (int) column:
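The underlying query, reconstructed under the assumed schema, would be something like this:

```sql
-- Value is NUMERIC and Quantity is INT: the multiplication triggers a
-- CONVERT_IMPLICIT of Quantity to NUMERIC, but it happens inside the
-- Compute Scalar operator, so the Index Seek is preserved
SELECT Ds_Order, Value * Quantity AS Total
FROM dbo.Orders
WHERE Ds_Order = '19870';
```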

In the article Implicit Conversions that cause Index Scans, Jonathan Kehayias ran a series of tests across various data types, and the result of that study is the table below, which shows which combinations of data types cause a Scan rather than a Seek when compared:

In the official SQL Server documentation, we can find the table below, which illustrates which combinations of data types generate implicit conversion, which require explicit conversion, and which conversions are not possible:

Does implicit conversion occur in JOINs too?

Implicit conversion can occur in any operation or comparison between expressions of different data types (according to the rules seen above), whether in SELECT, WHERE, JOIN, CROSS APPLY, etc., as I will demonstrate below.

I created a table named Orders2, with the same structure and data as the Orders table. After that, I ran an ALTER TABLE to change the data type of the Ds_Order column to NVARCHAR(10), which gives us the following example:
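A sketch of that setup and of the JOIN that exposes the problem (table and column names follow the assumed schema):

```sql
-- Orders2 has the same data, but its Ds_Order column is now NVARCHAR
ALTER TABLE dbo.Orders2 ALTER COLUMN Ds_Order NVARCHAR(10) NOT NULL;

-- VARCHAR (Orders) vs NVARCHAR (Orders2): the JOIN forces an implicit
-- conversion of Orders.Ds_Order, producing a Scan on the Orders table
SELECT A.Cd_Order, B.Cd_Order
FROM dbo.Orders AS A
INNER JOIN dbo.Orders2 AS B
    ON A.Ds_Order = B.Ds_Order
WHERE B.Ds_Order = N'19870';
```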

Note that there was an implicit conversion between the Ds_Order columns, since in the Orders table it is of type VARCHAR and in the Orders2 table it is NVARCHAR. Because of this, an Index Scan was used to read the data from the Orders table instead of an Index Seek.

If we look at the reads on the two tables, we see a stark difference caused by the implicit conversion:

Here we can see a SERIOUS data modeling error: two tables that relate to each other were allowed to use different data types. To solve this performance problem and get the most out of this query, let's change the Ds_Order column in the Orders table to nvarchar(10), the same type as in the Orders2 table, since the filter used in the WHERE clause is of type NVARCHAR:
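The attempted change would be something like this (this is the statement that raises the error shown below):

```sql
-- Try to align the types between Orders and Orders2
ALTER TABLE dbo.Orders ALTER COLUMN Ds_Order NVARCHAR(10) NOT NULL;
```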

But then we encounter this error while trying to change the data type of the column:

Msg 5074, Level 16, State 1, Line 8
The index 'SK03_Order' is dependent on column 'Ds_Order'.
Msg 4922, Level 16, State 9, Line 8
ALTER TABLE ALTER COLUMN Ds_Order failed because one or more objects access this column.

NOTICE: Because this column is indexed, we cannot simply change its type. We will have to drop the index, drop any foreign keys, alter the column, and then recreate the index and/or foreign keys. In other words, this kind of problem is not that simple to solve, especially in a production environment, where changing a column's type or creating an index can generate multiple locks.

Not to mention that there may be other relationships between this table and others using this column, which work fine today but may start to suffer from implicit conversion once the data type changes. Like everything else in performance work, making this kind of correction requires validation, analysis, and a lot of testing!

If you need a script to identify and recreate the Foreign Keys that reference a table, read my post How to identify, delete, and recreate Foreign Keys (FK) from a table in SQL Server.

In the case above, let's assume that this column being of type VARCHAR was the modeling error, and fix the problem. To do this, I will use the T-SQL commands below to drop the index, change the type, and create the index again:
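A sketch of those commands, using the assumed index and column names:

```sql
-- 1) Drop the index that depends on the column
DROP INDEX SK03_Order ON dbo.Orders;

-- 2) Change the column type to match Orders2
ALTER TABLE dbo.Orders ALTER COLUMN Ds_Order NVARCHAR(10) NOT NULL;

-- 3) Recreate the index
CREATE NONCLUSTERED INDEX SK03_Order ON dbo.Orders (Ds_Order);
```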

After matching the data type between the two columns, let's repeat the previous query and analyze the results:

Now the execution plan looks excellent. We eliminated the implicit conversion and are using Seek operators on the indexes. Let's see how the runtime and logical reads turned out:

0 ms runtime and only 3 logical reads. Great! Optimized query.

But what about cases where we can't run the ALTER TABLE command because of other relationships that already exist? What alternatives do we have?
A: There are several solutions, but one I really like is using computed, indexed columns. They have a low impact on the application (although they CAN have an impact, especially on INSERT operations) and are often very effective and practical, since whenever the original column changes, the computed column is automatically updated as well (as are any indexes that reference it). But remember: TEST BEFORE IMPLEMENTING!

I will demonstrate how you can implement this solution for the example above:
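A sketch of the computed-column approach (the new column and index names are illustrative): the column exposes Ds_Order already converted to NVARCHAR, and the index on it lets the JOIN/WHERE Seek without touching the original column's type.

```sql
-- Non-persisted computed column: the VARCHAR -> NVARCHAR cast is
-- deterministic and precise, so the column can be indexed
ALTER TABLE dbo.Orders
    ADD Ds_Order_Nvarchar AS CAST(Ds_Order AS NVARCHAR(10));

-- Index on the computed column; this is what enables the Seek
CREATE NONCLUSTERED INDEX SK05_Order_Nvarchar
    ON dbo.Orders (Ds_Order_Nvarchar);
```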

When you create this computed column, it will not take up any space in your database, because it is computed at query time. Only the index created on it will take up space, and it is this index that brings the performance gain to this solution:

And if we analyze the execution plan using the new computed column, we see that there is no implicit conversion and a Seek operation is performed, just as I demonstrated it would be if the columns had the same type:

How does SQL Server choose which type to convert?

This is an excellent question. As we can see on the Data Type Precedence page of the Microsoft documentation, when an operator combines two expressions of different data types, the data type precedence rules specify that the data type with the lower precedence is converted to the data type with the higher precedence. If the conversion is not a supported implicit conversion, an error is returned.

SQL Server uses the following precedence order for data types:

  1. user-defined data types (highest precedence)
  2. sql_variant
  3. xml
  4. datetimeoffset
  5. datetime2
  6. datetime
  7. smalldatetime
  8. date
  9. time
  10. float
  11. real
  12. decimal
  13. money
  14. smallmoney
  15. bigint
  16. int
  17. smallint
  18. tinyint
  19. bit
  20. ntext
  21. text
  22. image
  23. timestamp
  24. uniqueidentifier
  25. nvarchar (including nvarchar (max))
  26. nchar
  27. varchar (including varchar (max))
  28. char
  29. varbinary (including varbinary (max))
  30. binary (lowest level)

That is why, in the example above, when comparing a varchar column with an nvarchar expression, the column was converted to nvarchar instead of the other way around (even though converting the fixed value, rather than an entire column, would have made more sense).

How to identify implicit conversions in your environment

As I said earlier, implicit conversion operations are very common in SQL Server environments, so I will share two ways to identify the occurrence of these events in your environment.

Method 1 - Plan Cache DMVs

Using the script below, you can use SQL Server DMVs to identify CPU-intensive queries that contain implicit conversions. You do not need to enable any options or create any objects, since this data is native and collected automatically by default.
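The original script is not reproduced in the post; one common way to do this (not necessarily the author's exact script) is to search cached plans for the CONVERT_IMPLICIT marker:

```sql
-- Top CPU consumers whose cached plan contains an implicit conversion
SELECT TOP (20)
       qs.total_worker_time / qs.execution_count AS avg_cpu_time,
       qs.execution_count,
       st.text       AS query_text,
       qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE CAST(qp.query_plan AS NVARCHAR(MAX)) LIKE N'%CONVERT_IMPLICIT%'
ORDER BY avg_cpu_time DESC;
```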

Result:

Method 2 - Extended Events (XE)

Using the script below, you can capture implicit conversion events through Extended Events.

The advantage of this solution over the plan cache query is that the data is stored permanently. The plan cache, by contrast, is cleared whenever the SQL Server service restarts, not every query plan is cached, and cached plans can be evicted at any time. Also, if you start the service with the -x parameter, several DMVs, such as dm_exec_query_stats, are not populated at all.

The disadvantage is that you need to create objects in the database (a job, the XE session, a table), which takes considerably more work to set up, and the information is only collected from the moment these controls are created. Events that occurred in the past will not be identified.

XE Script:
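A minimal sketch of such a session, based on the plan_affecting_convert event (the session and file names are illustrative, not the author's):

```sql
-- Capture plan-affecting conversion warnings, with the offending SQL text
CREATE EVENT SESSION [ImplicitConversion] ON SERVER
ADD EVENT sqlserver.plan_affecting_convert
(
    ACTION (sqlserver.sql_text, sqlserver.database_name)
)
ADD TARGET package0.event_file
(
    SET filename = N'ImplicitConversion.xel', max_file_size = (50)
)
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);

ALTER EVENT SESSION [ImplicitConversion] ON SERVER STATE = START;
```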

And after creating this XE session, you can use the script below to collect the data and write it to a history table.
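A sketch of that collection step (the history table and file path are illustrative and assume the session above writes to an event_file target):

```sql
-- Read the .xel files produced by the session and persist the raw events
INSERT INTO dbo.ImplicitConversionHistory (event_data, collection_date)
SELECT CAST(event_data AS XML), GETDATE()
FROM sys.fn_xe_file_target_read_file(N'ImplicitConversion*.xel', NULL, NULL, NULL);
```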

Now just create a job to collect this data periodically. To access the collected data, simply query the newly created table and analyze the implicit conversion occurrences in your environment:

Other articles on Implicit Conversion

Want other views and examples on this topic? Check out some articles by other authors that I picked out for you:

Well guys, I hope you enjoyed this post, really understood the dangers of implicit conversion, and won't let it happen in your queries any more. The DBA thanks you.
Big hug and until next time.