- SQL Server - Introduction to Performance Tuning Study
- Understanding Index Functioning in SQL Server
- SQL Server - How to Identify a Slow or “Heavy” Query in Your Database
- SQL Server - Performance Tuning Tips: Implicit Conversion? NEVER!
- SQL Server - Performance Comparison between Scalar Function and CLR Scalar Function
- SQL Server - Performance Tuning Tips: What is the Difference Between Seek Predicate and Predicate?
- SQL Server - Using Calculated Columns (or Computed Columns) for Performance Tuning
- SQL Server - How to identify and collect time consuming query information using Extended Events (XE)
- SQL Server - How to identify all missing indexes (Missing indexes) of a database
- SQL Server and Azure SQL Database: How to Identify Key Lookup occurrences through the plancache
Hey guys!
One more Performance Tuning tip for you, where I will explain and comment on the difference between Seek Predicate and Predicate, where they may look the same, but make a big difference in the performance of your queries.
I believe this article should answer a very common question of who is starting in the area now: Does the order of the columns in index creation make a difference? After reading the whole article, comment here below what is your opinion .. rs
Creating the demo base for this article
To create this example table similar to mine (the data is random, right .. rs), to be able to follow the article and simulate these scenarios, you can use the script below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
IF (OBJECT_ID('dbo.Vendas') IS NOT NULL) DROP TABLE dbo.Vendas CREATE TABLE dbo.Vendas ( Id_Pedido INT IDENTITY(1,1), Dt_Pedido DATETIME, [Status] INT, Quantidade INT, Valor NUMERIC(18, 2) ) CREATE CLUSTERED INDEX SK01_Pedidos ON dbo.Vendas(Id_Pedido) CREATE NONCLUSTERED INDEX SK02_Pedidos ON dbo.Vendas ([Status], Dt_Pedido) INCLUDE(Quantidade, Valor) GO INSERT INTO dbo.Vendas ( Dt_Pedido, [Status], Quantidade, Valor ) SELECT DATEADD(SECOND, (ABS(CHECKSUM(PWDENCRYPT(N''))) / 2147483647.0) * 199999999, '2015-01-01'), (ABS(CHECKSUM(PWDENCRYPT(N''))) / 2147483647.0) * 9, (ABS(CHECKSUM(PWDENCRYPT(N''))) / 2147483647.0) * 10, 0.459485495 * (ABS(CHECKSUM(PWDENCRYPT(N''))) / 2147483647.0) * 1999 GO 10000 INSERT INTO dbo.Vendas ( Dt_Pedido, [Status], Quantidade, Valor ) SELECT Dt_Pedido, [Status], Quantidade, Valor FROM dbo.Vendas GO 9 |
What is the difference between Seek Predicate and Predicate?
As I explained in the article Understanding Index Functioning in SQL Server, SQL has 2 basic ways of reading data:
- Seek: Direct read in pointers of data records that meet the search criteria. Since the data is sorted (if there is no fragmentation), it is simple to use fast search algorithms such as QuickSort to read the desired data with few reads on the disk.
- Scan: Scan that can be done using Range Scan (if index but fragmented), reading records that are in certain range (s) or Full Scan if the fragmentation level of the records is too high or not indexes, where SQL Server will have to read the entire table until it finds all records that meet the search criteria
Well, having understood the very simple and brief explanation of these 2 operations, it became quite clear that the Seek operation on Rowstore indexes almost always (not always) outperforms Scan. So let's talk now about Seek Predicate and Predicate, which only occur when the table has indexes that match a particular query.
Seek Predicate It is the first filter that is applied to data when SQL Server executes a query. Because of this, it is ideal that indexes are created to prioritize Seek Predicate on the most selective columns possible (as few records as possible for each column value), so that the first level of filtering returns as few rows as possible. The operation of Predicate occurs after the Seek Predicate. After the first filter performed on the data, SQL Server will apply the remaining query filters to the subset returned by Seek Predicate, that is, the larger the subset in 2 Step 1, the more work for the query optimizer, since in Predicate they can have filters that are heavy and not very selective.
Ah, but how do I force various conditions on the Seek Predicate? It doesn't ... lol .. There is a very limited number of operations that can be done together within the Seek Predicate, such as, for example, a range operation (between or> value and <value) and another of equality (=) can be used together in Seek Predicate, but two equal operations, whether range or equality, do not.
Analyzing Column Selectivity by Histogram
I will demonstrate in the example below, with two range operations in the same query, how to identify how selective the columns of a given index are:
1 2 3 4 5 |
SELECT * FROM dbo.Vendas WHERE Dt_Pedido >= '2019-02-06' AND Dt_Pedido < '2019-02-09' AND [Status] < 5 |
Looking at the execution plan superficially, we identify nothing different from an optimized query. Seek operation, no Keylookup. All right.
But what if we look deeper? Well, it's not that good. To return 2.560 lines, I had to read 2.857.984 lines, more than 1000x.
Another way to see how the query is running in the database is to use the command SET STATISTICS PROFILE ON:
Returning the following analysis in the StmtText field:
1 2 3 4 5 6 7 8 9 10 |
SELECT [Quantidade]*[Valor] FROM [dbo].[Vendas] WHERE [Dt_Pedido]>=@1 AND [Dt_Pedido]<@2 AND [Status]<@3 |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(numeric(10,0),[dirceuresende].[dbo].[Vendas].[Quantidade],0)*[dirceuresende].[dbo].[Vendas].[Valor])) |--Parallelism(Gather Streams) |--Index Seek(OBJECT:([dirceuresende].[dbo].[Vendas].[SK02_Pedidos]), SEEK:([dirceuresende].[dbo].[Vendas].[Status] < CONVERT_IMPLICIT(int,[@3],0)), WHERE:([dirceuresende].[dbo].[Vendas].[Dt_Pedido]>=CONVERT_IMPLICIT(datetime,[@1],0) AND [dirceuresende].[dbo].[Vendas].[Dt_Pedido]<CONVERT_IMPLICIT(datetime,[@2],0)) ORDERED FORWARD) OBJECT:([dirceuresende].[dbo].[Vendas].[SK02_Pedidos]), SEEK:([dirceuresende].[dbo].[Vendas].[Status] < CONVERT_IMPLICIT(int,[@3],0)), WHERE:([dirceuresende].[dbo].[Vendas].[Dt_Pedido]>=CONVERT_IMPLICIT(datetime,[@1],0) AND [dirceuresende].[dbo].[Vendas]. [dirceuresende].[dbo].[Vendas].[Quantidade], [dirceuresende].[dbo].[Vendas].[Valor] |
Where SEEK is Seek Predicate and WHERE is Predicate.
Look at the execution plan again and see the conditions of Seek Predicate (Status filter) and Predicate (OrderDt filter). Is the Status column more selective than the OrderDt column, even more so in the query above? Let's find out by creating statistics for the Status column and analyze the histogram:
1 2 3 4 5 |
CREATE STATISTICS Vendas_Status ON dbo.Vendas(Status) WITH FULLSCAN GO DBCC SHOW_STATISTICS('Vendas', Vendas_Status) GO |
That is, there are only 9 distinct values for the Status column, with an average distribution between 550 to 600 thousand records for each status, as shown in Histogram. Let's now look at the histogram of the OrderDt column to see if it is more selective than the Status column:
1 2 3 4 5 |
CREATE STATISTICS Vendas_DtPedido ON dbo.Vendas(Dt_Pedido) WITH FULLSCAN GO DBCC SHOW_STATISTICS('Vendas', Vendas_DtPedido) GO |
Looking at the histogram of the Dt_Request column, we can see that the density is much higher, with about 29.999 distinct values and an average of 20 to 50 thousand records for box histogram range and an estimate of 512 records per distinct value, which shows that It is a much more selective column than the Status column.
Seek Predicate vs Predicate
Now that I have explained the basic workings of Seek Predicate and Predicate and how to identify column selectivity, I will demonstrate some examples of how we can try to identify and even control Seek Predicate and Predicate operations.
Recalling the indexes of our table:
CREATE CLUSTERED INDEX SK01_Request ON dbo.Sales (Order_id)
CREATE NONCLUSTERED INDEX SK02_Request ON dbo.Sales ([Status], Order_td) INCLUDE (Quantity, Value)
Example 1 - Range and Range
In this first example, I will use a query with 2 range clauses (<and / or>) and we will try to control Seek Predicate and Predicate for these queries:
1 2 3 4 5 |
SELECT Quatidade * Valor FROM dbo.Vendas WHERE Dt_Pedido >= '2019-02-06' AND Dt_Pedido < '2019-02-09' AND [Status] < 5 |
Result of the execution plan analysis:
As I explained in the previous topic, the selectivity of the OrderDt column is much higher than the Status column, and this justifies the huge amount of rows that have been read (2.869.003) to return only 2.577 records per query. Although the read operation is a Seek, it can still be improved by rebuilding an index using a more selective approach.
To do this, let's drop the SK02_Purts index and invert the columns of this index to make the Seek Predicate operation be done in the Dt_Purchase column instead of the Status column.
1 2 3 4 5 |
DROP INDEX SK02_Pedidos ON dbo.Vendas GO CREATE NONCLUSTERED INDEX SK02_Pedidos ON dbo.Vendas (Dt_Pedido, [Status]) INCLUDE(Quantidade, Valor) GO |
And now let's execute the query again and analyze the execution plan:
Wow! What difference! Note that in addition to our query using CPU 0ms (formerly 313ms) and running on just 2ms (were 71ms), the logical reads dropped from 20.900 to 29 and also the number of rows read dropped from 2.869.003 to 5.657. All this with just a change in the order of indexes, which changed the columns that are part of Seek Predicate and Predicate.
Example 2 - Range and Equality
What if instead of status <5 it was status = 5, for example? How would the execution plan look?
New Query:
1 2 3 4 5 |
SELECT Quantidade * Valor FROM dbo.Vendas WHERE Dt_Pedido > '2019-02-06' AND Dt_Pedido < '2019-02-09' AND [Status] = 5 |
Analysis of the execution plan with the new query:
That is, the plan was very similar to the previous plan, even changing the status filter. But now comes the question: Can you further improve this query?
Well, yes .. Let's get the index back to the previous condition .. kkkkkkkk
1 2 3 4 5 |
DROP INDEX SK02_Pedidos ON dbo.Vendas GO CREATE NONCLUSTERED INDEX SK02_Pedidos ON dbo.Vendas ([Status], Dt_Pedido) INCLUDE(Quantidade, Valor) GO |
New plan generated by execution:
And with our old index, our query just got better and more selective! As I always talk about Performance, everything has to be tested and evaluated. In this case, one of the 2 queries will be hampered by the index change, or you can have the 2 indexes created (consuming twice as much disk space) and force the best. index for each situation.
It is very important to note that the creation of indexes must be very well thought out, because you can not be creating index for any query of the bank. Indexes take up space and slow down more complex write operations for SQL Server, so they should be created when needed.
In addition, Performance work requires business focus: Higher levels of the business don't care about logical reads or runtime, but how it adds value to the day to day business. It's no use spending your time improving an SP that runs 1x a day, at dawn, and started running on 1s and before it was 1h, because it's not improving the company's operation at all. Always look for the best opportunities for the company, leaving aside only the technical vision. When you have a normal time and environment, then you try to improve on these heavier routines.
That's it folks!
I hope you enjoyed this post and even more!
References:
Dirceu, creating an artificial search predicate is very useful when we have a WHERE clause filled with non sargable predicates. This way you can turn sequential scan into seek. In item 4 of the article “Building Efficient T-SQL Code: Sargability” there is an example of using this approach to speed up the execution of a query.
Ok thanks for sharing Jose