- SQL Server - Introduction to Performance Tuning Study
- Understanding Index Functioning in SQL Server
- SQL Server - How to Identify a Slow or “Heavy” Query in Your Database
- SQL Server - Performance Tuning Tips: Implicit Conversion? NEVER!
- SQL Server - Performance Comparison between Scalar Function and CLR Scalar Function
- SQL Server - Performance Tuning Tips: What is the Difference Between Seek Predicate and Predicate?
- SQL Server - Using Calculated Columns (or Computed Columns) for Performance Tuning
- SQL Server - How to identify and collect time consuming query information using Extended Events (XE)
- SQL Server - How to identify all missing indexes (Missing indexes) of a database
- SQL Server and Azure SQL Database: How to Identify Key Lookup occurrences through the plancache
Hey folks!
Everything good?
In this article I'd like to talk about a query performance problem that we run into all the time in our day-to-day work here at Fabrício Lima - Data Solutions, one of the best and most recognized Performance Tuning companies in Brazil. We are talking about something that is usually very simple to solve and yet, inexplicably, extremely common: implicit conversion.
For the examples in this article, I will use the following structure for the Pedidos table:
CREATE TABLE dbo.Pedidos
(
    Id_Pedido INT IDENTITY(1,1),
    Dt_Pedido DATETIME,
    [Status] INT,
    Quantidade INT,
    Ds_Pedido VARCHAR(10),
    Valor NUMERIC(18, 2)
)

CREATE CLUSTERED INDEX SK01_Pedidos ON dbo.Pedidos(Id_Pedido)

CREATE NONCLUSTERED INDEX SK02_Pedidos ON dbo.Pedidos(Ds_Pedido)
GO
What is implicit conversion?
Implicit conversion occurs when SQL Server needs to convert the value of one or more columns or variables to another data type in order to compare, concatenate, or otherwise combine them with other columns or variables. SQL Server cannot, for example, compare a varchar column with an int column without first converting one of them so that both sides have the same data type.
When we talk about conversion, we have 2 types:
- Explicit: Occurs when the query developer himself converts data using functions such as CAST, CONVERT, etc.
- Implicit: Occurs when SQL Server is forced to convert the data type between columns or variables internally because they were declared with different types.
Since this conversion is applied to every record in the columns involved, the operation can be very costly and severely hurt query execution: even when there is an index on these columns, the query optimizer ends up using the Scan operator instead of Seek, so the index is not used to its full potential. As a result, the runtime and the number of logical reads can increase dramatically.
What is the impact of conversion on my query?
As I commented above, when it is necessary to compare data with different types, either the query developer must do the explicit conversion using CAST / CONVERT, or SQL Server will have to do the implicit conversion internally to equalize the data types.
But will this really make any significant difference in my query? Let's analyze.
Test 1 - Using the same type of data between the column and the variable
In this first test, we will write the query the correct way. The column data type (varchar) is the same as the type of the literal value (varchar), so no data conversion occurs. As a result, the query runs in 0ms, with only 6 logical reads.
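The original query appears only as a screenshot in the source post; a minimal sketch of what Test 1 looks like, assuming the literal value 19870 used in the next test, would be:

SELECT Id_Pedido, Ds_Pedido
FROM dbo.Pedidos
WHERE Ds_Pedido = '19870' -- varchar literal vs varchar column: no conversion, Index Seek on SK02_Pedidos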
Test 2 - Implicit conversion
In the second test, I will use a simple query with no explicit data type conversion. Because the Ds_Pedido column is of type varchar and the literal value 19870 is an integer, SQL Server will have to perform the (implicit) conversion of this data.
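Again the original query is only shown as an image; a sketch of Test 2, assuming the same filter value, would be:

SELECT Id_Pedido, Ds_Pedido
FROM dbo.Pedidos
WHERE Ds_Pedido = 19870 -- int literal vs varchar column: CONVERT_IMPLICIT on the column and an Index Scan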
Test 3 - Converting data types (Explicit conversion)
In this last test, I'll use the CAST function to apply an explicit conversion and evaluate how the query behaves relative to the other two.
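The original screenshot is not reproduced here, so this is just a sketch of the two possible ways of writing the explicit conversion (converting the literal keeps the Index Seek; converting the column still forces a Scan, since the conversion is applied to every row):

-- Option 1: convert the literal (preserves the Seek)
SELECT Id_Pedido, Ds_Pedido
FROM dbo.Pedidos
WHERE Ds_Pedido = CAST(19870 AS VARCHAR(10))

-- Option 2: convert the column (still scans the whole index; also assumes every Ds_Pedido value is numeric)
SELECT Id_Pedido, Ds_Pedido
FROM dbo.Pedidos
WHERE CAST(Ds_Pedido AS INT) = 19870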
Does implicit conversion only occur between strings and numbers?
Not really. I will demonstrate some examples of conversion between strings and strings below, but it can also occur between strings and dates, uniqueidentifier and strings, etc.
VARCHAR and NVARCHAR
A very common mistake that we see on a daily basis is implicit conversion between varchar and nvarchar values / columns. This happens a lot in applications that use ORMs, such as Entity Framework.
Many people think there is no conversion between varchar and nvarchar, but the example below shows that there is:
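The example was shown as a screenshot; a minimal sketch of the situation, assuming the same filter value, compares the varchar column with an N'...' (nvarchar) literal:

SELECT Id_Pedido, Ds_Pedido
FROM dbo.Pedidos
WHERE Ds_Pedido = N'19870' -- nvarchar literal: the varchar column is converted (nvarchar has higher precedence),
                           -- generating a CONVERT_IMPLICIT warning and, depending on the collation, an Index Scan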
Another example, using a variable of type NVARCHAR(10):
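A sketch of this second case, assuming a variable holding the same value:

DECLARE @Ds_Pedido NVARCHAR(10) = N'19870'

SELECT Id_Pedido, Ds_Pedido
FROM dbo.Pedidos
WHERE Ds_Pedido = @Ds_Pedido -- varchar column vs nvarchar variable: implicit conversion again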
Doesn't implicit conversion occur between numbers of different types?
More or less. Implicit conversion may occur in the Compute Scalar operator, but it does not prevent the use of the Seek operator. Because the numeric types all belong to the same family, SQL Server can natively compare numbers with different data types, such as INT vs BIGINT or BIGINT vs SMALLINT, for example.
Following a tip from the great José Diz, I learned that when an expression uses numbers of different data types, we don't see a warning in the execution plan indicating implicit conversion, nor is the Scan operator used instead of Seek, but the implicit conversion does happen, in the Compute Scalar operator, as I will demonstrate below:
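The demonstration below is a sketch of the kind of query involved, assuming a calculation that multiplies Quantidade (int) by Valor (numeric):

SELECT Ds_Pedido, Quantidade * Valor AS Vl_Total
FROM dbo.Pedidos
WHERE Ds_Pedido = '19870'
-- Quantidade is implicitly converted to numeric inside the Compute Scalar operator,
-- but the Seek on Ds_Pedido is not affected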
Before analyzing the implicit conversion, let's create a new index to eliminate the Key Lookup operator, using a technique known as a covering index:
CREATE NONCLUSTERED INDEX SK05_Pedidos ON dbo.Pedidos(Ds_Pedido) INCLUDE(Quantidade, Valor)
Looking at the execution plan again, we see that the Key Lookup operator is gone, the new index is being used with the Seek operator, and there is no implicit conversion warning. So where is the conversion?
By viewing the properties of the Compute Scalar operator, we can see that SQL Server implicitly converted the Quantidade (int) column in order to multiply it by the Valor (numeric) column:
In the article Implicit Conversions that Cause Index Scans, Jonathan Kehayias ran a series of tests across various data types, and the result of that study is the table below, which shows which combinations of data types cause a Scan instead of a Seek when they are compared:
In the official SQL Server documentation, we can find the table below, which shows which combinations of data types generate implicit conversion, which require explicit conversion, and which conversions are not possible:
Does implicit conversion occur in JOIN too?
Implicit conversion can occur in any operation or expression that compares different data types (according to the rules seen above), whether in SELECT, WHERE, JOIN, CROSS APPLY, etc., as I will demonstrate below.
I created a table named Pedidos2, with the same structure and data as the Pedidos table. After that, I ran an ALTER TABLE to change the data type of the Ds_Pedido column to NVARCHAR(10), which gives us the following example:
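The setup script and the query were shown as images in the original post; a sketch of the scenario (the index name SK01_Pedidos2 and the filter value are assumptions for illustration) would be:

SELECT * INTO dbo.Pedidos2 FROM dbo.Pedidos
GO
ALTER TABLE dbo.Pedidos2 ALTER COLUMN Ds_Pedido NVARCHAR(10)
GO
CREATE NONCLUSTERED INDEX SK01_Pedidos2 ON dbo.Pedidos2(Ds_Pedido)
GO

SELECT A.Id_Pedido, A.Ds_Pedido
FROM dbo.Pedidos A
JOIN dbo.Pedidos2 B ON B.Ds_Pedido = A.Ds_Pedido -- nvarchar vs varchar in the JOIN
WHERE B.Ds_Pedido = N'19870'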
Note that there was an implicit conversion between the Ds_Pedido columns, since in the Pedidos table it is of type VARCHAR and in the Pedidos2 table it is of type NVARCHAR. Because of this, an Index Scan was used instead of an Index Seek to read the data from the Pedidos table.
If we look at the reads on the 2 tables, we see a stark difference caused by the implicit conversion:
In this case, we can see a SERIOUS data modeling error, which allowed two related tables to use different data types. To solve this performance problem and get the most out of this query, let's change the type of the Ds_Pedido column in the Pedidos table to nvarchar(10), the same type as in the Pedidos2 table, since the filter used in the WHERE clause is of type NVARCHAR:
ALTER TABLE Pedidos ALTER COLUMN Ds_Pedido nvarchar(10)
But then we encounter this error while trying to change the data type of the column:
Msg 5074, Level 16, State 1, Line 8
The index 'SK03_Pedidos' is dependent on column 'Ds_Pedido'.
Msg 4922, Level 16, State 9, Line 8
ALTER TABLE ALTER COLUMN Ds_Pedido failed because one or more objects access this column.
Not to mention that there may be other relationships between this table and others using this column, which work fine today and may start to have an implicit conversion problem after the data type is changed. Like everything else in performance, making this kind of change requires validation, analysis, and a lot of testing!
If you need a script to identify and recreate the Foreign Keys that reference a table, read my post How to identify, delete, and recreate Foreign Keys (FK) from a table in SQL Server.
In the case above, let's assume that making this column VARCHAR was a modeling error and fix the problem. To do this, I will use the T-SQL commands below to drop the index, change the type, and create the index again:
DROP INDEX SK03_Pedidos ON dbo.Pedidos
GO

ALTER TABLE Pedidos ALTER COLUMN Ds_Pedido nvarchar(10)
GO

CREATE NONCLUSTERED INDEX [SK03_Pedidos] ON [dbo].[Pedidos] ([Ds_Pedido])
GO
After matching the data type between the two columns, let's repeat the previous query and analyze the results:
Now the execution plan looks excellent. We eliminated the implicit conversion and are using Seek operators on the indexes. Let's see how the runtime and logical reads turned out:
0ms runtime and only 3 logical reads. Great!! Optimized query.
But what about cases where we can't perform the ALTER TABLE command due to other relationships that already exist? What alternatives do we have for this?
A: There are several solutions, but one that I really like is the use of computed and indexed columns, which have a low impact on the application (although there IS some impact, especially on INSERT operations) and are often very effective and practical, since whenever the original column changes, the computed column is automatically updated as well (as are any indexes that reference it). But remember: TEST BEFORE IMPLEMENTING!
I will demonstrate how you can implement this solution for the example above:
ALTER TABLE dbo.Pedidos ADD Ds_Pedido_NVARCHAR AS (CONVERT(NVARCHAR(10), Ds_Pedido))
GO

CREATE NONCLUSTERED INDEX SK04_Pedidos ON dbo.Pedidos(Ds_Pedido_NVARCHAR)
GO
When you create this computed column, it does not take up any space in your database, because it is calculated at query time. Only the index created on it takes up space, and it is that index that delivers the performance gain of this solution:
And if we analyze the execution plan of the query using the new computed column, we see that there is no implicit conversion and a Seek operation is being used, just as I demonstrated it would be if the columns were of the same type:
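Since the original plan is shown as an image, here is a sketch of how the join can reference the computed column so that both sides are nvarchar:

SELECT A.Id_Pedido, A.Ds_Pedido
FROM dbo.Pedidos A
JOIN dbo.Pedidos2 B ON B.Ds_Pedido = A.Ds_Pedido_NVARCHAR -- both sides nvarchar: Seek using SK04_Pedidos
WHERE B.Ds_Pedido = N'19870'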
How does SQL Server choose which type to convert?
This is an excellent question. As we can see on the Data Type Precedence page of the Microsoft documentation, when an operator combines two expressions of different data types, the data type precedence rules specify that the data type with the lower precedence is converted to the data type with the higher precedence. If the conversion is not a supported implicit conversion, an error is returned.
SQL Server uses the following precedence order for data types:
- user-defined data types (highest)
- sql_variant
- xml
- datetimeoffset
- datetime2
- datetime
- smalldatetime
- date
- time
- float
- real
- decimal
- money
- smallmoney
- bigint
- int
- smallint
- tinyint
- bit
- ntext
- text
- image
- timestamp
- uniqueidentifier
- nvarchar (including nvarchar (max))
- nchar
- varchar (including varchar (max))
- char
- varbinary (including varbinary (max))
- binary (lowest level)
That is why in the example above, when comparing a varchar column with an nvarchar expression, the column was converted to nvarchar rather than the other way around (even though it would make more sense to convert a single fixed value than an entire column).
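A quick way to see precedence in action (a hypothetical check, not from the original post) is to inspect the resulting type of a mixed expression:

SELECT SQL_VARIANT_PROPERTY(1 + CAST(1.5 AS DECIMAL(2,1)), 'BaseType') AS Tipo_Resultado
-- returns 'decimal': the int operand was converted to decimal, the type with higher precedence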
How to identify implicit conversions in your environment
As I said earlier, implicit conversion operations are very common in SQL Server environments, so I will share two ways to identify the occurrence of these events in your environment.
Method 1 - Plan cache DMV's
Using the script below, you can identify the queries that consumed the most CPU and contain implicit conversions, through SQL Server's plan cache DMV's. There is no need to enable any options or create any objects, as this information is collected natively and automatically by default.
SELECT TOP (100)
    DB_NAME(B.[dbid]) AS [Database],
    B.[text] AS [Consulta],
    A.total_worker_time AS [Total Worker Time],
    A.total_worker_time / A.execution_count AS [Avg Worker Time],
    A.max_worker_time AS [Max Worker Time],
    A.total_elapsed_time / A.execution_count AS [Avg Elapsed Time],
    A.max_elapsed_time AS [Max Elapsed Time],
    A.total_logical_reads / A.execution_count AS [Avg Logical Reads],
    A.max_logical_reads AS [Max Logical Reads],
    A.execution_count AS [Execution Count],
    A.creation_time AS [Creation Time],
    C.query_plan AS [Query Plan]
FROM sys.dm_exec_query_stats AS A WITH (NOLOCK)
CROSS APPLY sys.dm_exec_sql_text(A.plan_handle) AS B
CROSS APPLY sys.dm_exec_query_plan(A.plan_handle) AS C
WHERE CAST(C.query_plan AS NVARCHAR(MAX)) LIKE ('%CONVERT_IMPLICIT%')
    AND B.[dbid] = DB_ID()
    AND B.[text] NOT LIKE '%sys.dm_exec_sql_text%' -- Do not pick up this query itself
ORDER BY A.total_worker_time DESC
Method 2 - Extended Events (XE)
Using the script below, you can capture implicit conversion events generated in your environment through Extended Events (XE).
The advantage of this solution over querying the plan cache is that the data is stored permanently: the plan cache is cleared whenever the SQL Server service is restarted, not every query is kept there, and when one is, the storage is temporary. In addition, if you start the service with the -x parameter, several DMV's, such as dm_exec_query_stats, are not populated.
The disadvantage is that you need to create objects in the database (job, XE session, table), which is considerably more work to get this information, and data will only be collected from the moment these controls are created. Events that occurred in the past will not be identified.
XE Script:
IF (EXISTS(SELECT NULL FROM sys.dm_xe_sessions WHERE [name] = 'Conversão Implícita'))
    DROP EVENT SESSION [Conversão Implícita] ON SERVER
GO

CREATE EVENT SESSION [Conversão Implícita] ON SERVER
ADD EVENT sqlserver.plan_affecting_convert
(
    ACTION
    (
        sqlserver.client_app_name,
        sqlserver.client_hostname,
        sqlserver.database_name,
        sqlserver.username,
        sqlserver.sql_text
    )
    WHERE
    (
        [convert_issue] = 2 -- 1 = Cardinality Estimate / 2 = Seek Plan
    )
)
ADD TARGET package0.event_file
(
    SET filename = N'C:\Traces\Conversão Implícita',
        max_file_size = (50),
        max_rollover_files = (16)
)
GO

ALTER EVENT SESSION [Conversão implícita] ON SERVER STATE = START
GO
And after you create this XE, you can use the script below to collect the data and write it to a history table.
IF (OBJECT_ID('dbo.Historico_Conversao_Implicita') IS NULL)
BEGIN
    -- DROP TABLE dbo.Historico_Conversao_Implicita
    CREATE TABLE dbo.Historico_Conversao_Implicita
    (
        Dt_Evento DATETIME,
        [database_name] VARCHAR(100),
        username VARCHAR(100),
        client_hostname VARCHAR(100),
        client_app_name VARCHAR(100),
        [convert_issue] VARCHAR(50),
        [expression] VARCHAR(MAX),
        sql_text XML
    )

    CREATE CLUSTERED INDEX SK01_Historico_Erros ON dbo.Historico_Conversao_Implicita(Dt_Evento)
END

DECLARE @TimeZone INT = DATEDIFF(HOUR, GETUTCDATE(), GETDATE())
DECLARE @Dt_Ultimo_Evento DATETIME = ISNULL((SELECT MAX(Dt_Evento) FROM dbo.Historico_Conversao_Implicita WITH(NOLOCK)), '1990-01-01')

IF (OBJECT_ID('tempdb..#Eventos') IS NOT NULL)
    DROP TABLE #Eventos

;WITH CTE AS
(
    SELECT CONVERT(XML, event_data) AS event_data
    FROM sys.fn_xe_file_target_read_file(N'C:\Traces\Conversão Implícita*.xel', NULL, NULL, NULL)
)
SELECT
    DATEADD(HOUR, @TimeZone, CTE.event_data.value('(//event/@timestamp)[1]', 'datetime')) AS Dt_Evento,
    CTE.event_data
INTO #Eventos
FROM CTE
WHERE DATEADD(HOUR, @TimeZone, CTE.event_data.value('(//event/@timestamp)[1]', 'datetime')) > @Dt_Ultimo_Evento

SET QUOTED_IDENTIFIER ON

INSERT INTO dbo.Historico_Conversao_Implicita
SELECT
    A.Dt_Evento,
    A.[database_name],
    A.username,
    A.client_hostname,
    A.client_app_name,
    A.convert_issue,
    A.expression,
    TRY_CAST(A.sql_text AS XML) AS sql_text
FROM
(
    SELECT DISTINCT
        A.Dt_Evento,
        xed.event_data.value('(action[@name="database_name"]/value)[1]', 'varchar(100)') AS [database_name],
        xed.event_data.value('(action[@name="username"]/value)[1]', 'varchar(100)') AS [username],
        xed.event_data.value('(action[@name="client_hostname"]/value)[1]', 'varchar(100)') AS [client_hostname],
        xed.event_data.value('(action[@name="client_app_name"]/value)[1]', 'varchar(100)') AS [client_app_name],
        xed.event_data.value('(data[@name="convert_issue"]/text)[1]', 'varchar(100)') AS [convert_issue],
        xed.event_data.value('(data[@name="expression"]/value)[1]', 'varchar(max)') AS [expression],
        xed.event_data.value('(action[@name="sql_text"]/value)[1]', 'varchar(max)') AS [sql_text]
    FROM #Eventos A
    CROSS APPLY A.event_data.nodes('//event') AS xed (event_data)
) A
And now just create a job to run this collection periodically. To access the collected data, simply query the newly created table and analyze the implicit conversion occurrences in your environment:
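A minimal example of such a query (the job itself just needs to run the collection script above on a schedule):

SELECT TOP (100) *
FROM dbo.Historico_Conversao_Implicita WITH (NOLOCK)
ORDER BY Dt_Evento DESC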
Other articles on Implicit Conversion
Want other views and examples on this topic? Check out some articles by other authors that I separated for you:
- https://portosql.wordpress.com/2018/10/25/os-perigos-da-conversao-implicita-1/
- https://portosql.wordpress.com/2018/11/28/os-perigos-da-conversao-implicita-2/
- https://sqlkiwi.blogspot.com/2011/07/join-performance-implicit-conversions-and-residuals.html
- https://www.sqlskills.com/blogs/jonathan/implicit-conversions-that-cause-index-scans/
- https://hackernoon.com/are-implicit-conversions-killing-your-sql-query-performance-70961e547f11
- https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-type-conversion-database-engine?view=sql-server-2017
Well folks, I hope you enjoyed this post, really understood the dangers of explicit and implicit conversions, and won't let them happen in your queries anymore. The DBA thanks you.
Big hug and until next time.
Hehehe... imagine the happiness of someone who just found 3,440 occurrences of this in the database, courtesy of whoever developed the application. Great article! I had already identified this a few days ago and ran some tests to prove it, and the results really match the article. I was looking for something more detailed to show to the project director.
Thank you Bruno.
I hope the article helps show these details to the project director… hehe
Hi Dirceu. How do you get that message with the number of logical reads?
Hey Paul, how's it going?
To view this I/O information, simply run the SET STATISTICS IO ON command in your session.
To also view the time of each operation, use SET STATISTICS TIME ON in your session.
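For example (a minimal sketch using the Pedidos table from the article):

SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT Id_Pedido FROM dbo.Pedidos WHERE Ds_Pedido = '19870'
-- the Messages tab will now show the logical reads and the CPU / elapsed times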
If you have any questions, do not hesitate to ask.
Very good! Congratulations!
Glad you enjoyed it, Carlos 🙂
Awesome! Very well explained and illustrated with examples!
Thanks for the feedback, Leonardo 🙂