- SQL Server - Introduction to Performance Tuning Study
- Understanding Index Functioning in SQL Server
- SQL Server - How to Identify a Slow or “Heavy” Query in Your Database
- SQL Server - Performance Tuning Tips: Implicit Conversion? NEVER!
- SQL Server - Performance Comparison between Scalar Function and CLR Scalar Function
- SQL Server - Performance Tuning Tips: What is the Difference Between Seek Predicate and Predicate?
- SQL Server - Using Calculated Columns (or Computed Columns) for Performance Tuning
- SQL Server - How to identify and collect time consuming query information using Extended Events (XE)
- SQL Server - How to identify all missing indexes (Missing indexes) of a database
- SQL Server and Azure SQL Database: How to Identify Key Lookup occurrences through the plancache
Hey guys!!!
In this article, I would like to share with you something I see a lot in my day-to-day Tuning consulting work: queries that consume a lot of time, I/O, and CPU because they apply functions in WHERE or JOIN clauses on tables with many records, and how we can use a very simple technique of indexing a calculated (or computed) column to solve this problem.
As I explain in the article Understanding Index Functioning in SQL Server, when we use functions in WHERE or JOIN clauses, we break the SARGability of the query, that is, the query can no longer use Seek operations on the indexes, since SQL Server needs to read the entire table, apply the function to each row, compare the values, and only then return the results.
What I want to do in this article is show this scenario happening, how to identify it, and some possible solutions to improve query performance. So let's go!
Creating the demo table for this article
To create an example table similar to mine (the data is random, so your values will be different) and follow along with the scenarios in this article, you can use the script below:
```sql
IF (OBJECT_ID('_Clientes') IS NOT NULL)
    DROP TABLE _Clientes

-- Note: VARCHAR(100) is used here instead of VARCHAR(MAX), since a (MAX)
-- column cannot be used as an index key in the nonclustered index below
CREATE TABLE _Clientes (
    Id_Cliente         INT IDENTITY(1, 1),
    Dados_Serializados VARCHAR(100)
)
GO

INSERT INTO _Clientes (Dados_Serializados)
SELECT
    CONVERT(VARCHAR(19), DATEADD(SECOND, (ABS(CHECKSUM(PWDENCRYPT(N''))) / 2147483647.0) * 199999999, '2015-01-01'), 121) + '|' +
    CONVERT(VARCHAR(20), CONVERT(INT, (ABS(CHECKSUM(PWDENCRYPT(N''))) / 2147483647.0) * 9)) + '|' +
    CONVERT(VARCHAR(20), CONVERT(INT, (ABS(CHECKSUM(PWDENCRYPT(N''))) / 2147483647.0) * 10)) + '|' +
    CONVERT(VARCHAR(20), CONVERT(INT, 0.459485495 * (ABS(CHECKSUM(PWDENCRYPT(N''))) / 2147483647.0)) * 1999)
GO 10000

INSERT INTO _Clientes (Dados_Serializados)
SELECT Dados_Serializados
FROM _Clientes
GO 9

CREATE CLUSTERED INDEX SK01_Pedidos ON _Clientes(Id_Cliente)
CREATE NONCLUSTERED INDEX SK02_Pedidos ON _Clientes(Dados_Serializados)
GO
```
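If you want to confirm how much data the script generated (10,000 initial rows doubled 9 times should yield about 5.12 million rows, assuming all the GO batches completed), a quick check:

```sql
-- Sanity check on the generated volume (assuming all GO batches ran to completion)
SELECT COUNT(*) AS Total_Registros
FROM _Clientes
```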
Demonstration using a native function
To demonstrate how a query can become slow simply by using a function in the WHERE or JOIN clause, I will start with the query below:
```sql
SELECT *
FROM _Clientes
WHERE Dados_Serializados = '2016-11-22 04:49:06|2|0|0'
```
If we look at the execution plan of this query, we see that it is using the Index Seek operator, performing only 6 logical reads to return 512 records. Looking at the CPU and I/O information, we can conclude that 3 ms of CPU (compile) and 3 ms of runtime are quite acceptable:
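The screenshots with the execution plan and the CPU/I-O numbers are not reproduced here. If you want to collect similar measurements on your side, a simple approach (my suggestion, not necessarily how the original numbers were captured) is to enable the session statistics before running each query:

```sql
-- Show logical reads and CPU/elapsed time for each statement in this session
SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT *
FROM _Clientes
WHERE Dados_Serializados = '2016-11-22 04:49:06|2|0|0'
```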
Now let's use a function in this same query:
```sql
SELECT *
FROM _Clientes
WHERE SUBSTRING(Dados_Serializados, 1, 10) = '2016-11-22'
```
As you can see, the result was terrible: high CPU time, high runtime, and far too many logical reads. All because of the function in the WHERE clause, which made the query stop using the Index Seek operator and fall back to an Index Scan.
Solving this case is very simple, because the way this particular function is written (it behaves just like a LEFT) works in our favor: we can safely replace it with LIKE 'text%', and SQL Server will go back to using the Seek operation on the index:
```sql
SELECT *
FROM _Clientes
WHERE Dados_Serializados LIKE '2016-11-22%'
```
We can see that when using LIKE 'text%', the index was used with the Seek operation, making our query fast again.
LIKE and SARGability
IMPORTANT: Unlike LIKE 'text%', if you add the '%' symbol before the text (to filter everything that contains or ends with a certain expression), the index will not be used with a Seek operation, only with a Scan.
To understand why, make an analogy with a dictionary: finding all the words that start with 'test' is very simple, just go to the letter 't', then the letter 'e', then the letter 's', and so on until you reach the words you want. As soon as the next word in the list is greater than 'test', the search can stop.
To identify all the words in the dictionary that contain or end with 'test', however, we would have to look at every single word in the dictionary.
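To see this in practice on our test table, a filter with a leading '%' like the one below (an illustrative query, not part of the original demo) forces an Index Scan, while the LIKE '2016-11-22%' version above gets an Index Seek:

```sql
-- Leading wildcard: SQL Server cannot navigate the index tree,
-- so it scans the whole index instead of seeking
SELECT *
FROM _Clientes
WHERE Dados_Serializados LIKE '%|2|0|0'
```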
Did it make it easier to understand how indexes work? If you still have questions, read my article Understanding Index Functioning in SQL Server.
But what if we needed the RIGHT function, for example? Would our query still be able to use the Seek operation on that same index?
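The test query itself is not reproduced here; based on the calculated column created right below, it was presumably something like this:

```sql
-- Presumed test query (an assumption based on the RIGHT(..., 5) expression used below):
-- the function on the column prevents an Index Seek on SK02_Pedidos
SELECT *
FROM _Clientes
WHERE RIGHT(Dados_Serializados, 5) = '1|4|0'
```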
As we saw above, the result was pretty bad, with a high number of logical reads, high runtime, and high CPU. To solve this problem, let's use the calculated column feature and index that calculated column:
```sql
-- Create the new calculated column
ALTER TABLE _Clientes
ADD Right_5 AS (RIGHT(Dados_Serializados, 5))
GO

-- Create an index on the newly created column
CREATE NONCLUSTERED INDEX SK03_Clientes ON dbo._Clientes(Right_5)
GO

-- Run the new query
SELECT Right_5
FROM _Clientes
WHERE Right_5 = '1|4|0'
```
Wow! The query was much faster now! This is because, when creating the index, SQL Server already calculated this value for every row and stored it sorted in the index. This makes the query much faster than having to compute the function in real time for every row and only then compare the values.
Index Creation Considerations and System Function Determinism
Note 1: Be aware that creating an index consumes disk space, and that adding a column to a table, even a calculated one, should be tested first to make sure it does not break any INSERT statement that does not specify its column list, for example.
Note 2: A very important point is that persisting calculated columns to disk and indexing calculated columns is only possible when the expression uses deterministic functions (see the example right after these notes).
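As an illustration of Note 2, persisting a calculated column to disk uses the PERSISTED keyword. The sketch below reuses the expression from the previous section; the column name is hypothetical:

```sql
-- PERSISTED physically stores the computed value in each row;
-- it is only allowed when the expression is deterministic
ALTER TABLE _Clientes
ADD Right_5_Persistida AS (RIGHT(Dados_Serializados, 5)) PERSISTED
```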
Every function in SQL Server is either deterministic or non-deterministic. The determinism of a function is defined by the data it returns:
- A function is considered deterministic if it always returns the same result set when called with the same set of input values.
- A function is considered non-deterministic if it does not return the same result set when called with the same set of input values.
This may sound a bit complicated, but it really isn't. Take, for example, the DATEDIFF and GETDATE functions. DATEDIFF is deterministic because it always returns the same value when executed with the same input parameters. GETDATE is non-deterministic because it returns a different value every time it runs.
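A quick way to see the difference (illustrative only):

```sql
-- Deterministic: always returns 31 for these same inputs
SELECT DATEDIFF(DAY, '2020-01-01', '2020-02-01') AS Dias

-- Non-deterministic: returns a different value on each execution
SELECT GETDATE() AS Agora
```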
Demo using User Defined Function (UDF)
If using a native function in the WHERE / JOIN already worsens the performance of our queries, with a user-defined function the scenario is even worse. For this post, I will use the scalar function fncSplit (created with schema binding). Its exact definition is not reproduced here, so right below you will find a sketch of what such a function might look like (my own assumption, just so you can follow along), followed by the test query:
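```sql
-- Hypothetical sketch of a schema-bound scalar split function that returns
-- the N-th element of a delimited string (not necessarily the original code)
CREATE FUNCTION dbo.fncSplit (
    @Texto     VARCHAR(8000),
    @Separador CHAR(1),
    @Posicao   INT
)
RETURNS VARCHAR(200)
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @Inicio   INT = 1,
            @Fim      INT,
            @Contador INT = 1

    -- Walk past (@Posicao - 1) separators
    WHILE (@Contador < @Posicao)
    BEGIN
        SET @Inicio = CHARINDEX(@Separador, @Texto, @Inicio)
        IF (@Inicio = 0) RETURN NULL
        SET @Inicio = @Inicio + 1
        SET @Contador = @Contador + 1
    END

    -- Find the end of the requested element
    SET @Fim = CHARINDEX(@Separador, @Texto, @Inicio)
    IF (@Fim = 0) SET @Fim = LEN(@Texto) + 1

    RETURN SUBSTRING(@Texto, @Inicio, @Fim - @Inicio)
END
GO
```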
```sql
SELECT Dados_Serializados
FROM _Clientes
WHERE dbo.fncSplit(Dados_Serializados, '|', 3) = '1'
```
As you can see, this simple query against our test table took about 35 seconds to execute, consuming almost 15 s of CPU. About 240 thousand logical reads were performed, and 610 thousand rows were processed to return the 1,040 rows of the final result. In summary: it's really bad!
To try to improve the performance of this query, let's use the same solution as the previous example, creating a calculated column and indexing this column:
```sql
-- Create the new calculated column
ALTER TABLE _Clientes
ADD Coluna_Teste AS (dbo.fncSplit(Dados_Serializados, '|', 3))
GO

-- Create an index on the newly created column
CREATE NONCLUSTERED INDEX SK04_Clientes ON dbo._Clientes(Coluna_Teste) INCLUDE(Dados_Serializados)
GO

-- Run the new query
SELECT Dados_Serializados
FROM _Clientes
WHERE Coluna_Teste = '1'
```
Before reviewing the results of this test, I need to give a warning about creating disk-persisted calculated columns that use a User Defined Function (UDF) and about creating indexes on these calculated columns:
User Defined Function Determinism (UDF)
Important: When you create a user-defined function (UDF), SQL Server determines whether it is deterministic. The determinism of a user-defined function depends on how you create it. A user-defined function is considered deterministic if all of the following criteria are met:
- The function is schema-bound to the database objects it references.
- Any function called by the user-defined function is deterministic. This includes both user-defined and system functions.
- The function does not reference any database objects outside its own scope. This means that the function cannot reference external tables, variables, or cursors.
When you create a function, SQL Server applies all of these criteria to determine its determinism. If the function fails any of these checks, it is marked as non-deterministic. Sometimes these checks mark a function as non-deterministic even when you expected it to be deterministic.
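You can check how SQL Server classified a function with the OBJECTPROPERTY metadata function (shown here against the fncSplit sketch above):

```sql
-- 1 = deterministic, 0 = non-deterministic
SELECT OBJECTPROPERTY(OBJECT_ID('dbo.fncSplit'), 'IsDeterministic') AS IsDeterministic
```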
In the case of this example, if I had not included the WITH SCHEMABINDING option in the fncSplit definition, we would run into the following error message when trying to create the index:
Column 'Coluna_Teste' in table 'dbo._Clientes' cannot be used in an index or statistics or as a partition key because it is non-deterministic.
Wow!! Our query dropped from about 32 seconds to 8 ms!! The CPU time dropped from 14,974 ms to 8 ms, and the logical reads dropped from 240,061 to 9! This tuning was really effective. I bet that if you do something similar for a client, you'll get some nice compliments 🙂
Before finalizing this article, I would like to leave you one last message:
As I always say: When applying performance tuning techniques, ALWAYS TEST!
Well guys, I hope you enjoyed this article.
A big hug and see you next time!