SQL Server - Performance Tuning Tips: What is the Difference Between Seek Predicate and Predicate?

Views: 1.172
This post is the 6 part of 8 in the series. Performance tuning
Reading Time: 7 minutes

Hey guys!
One more Performance Tuning tip for you, where I will explain and comment on the difference between Seek Predicate and Predicate, where they may look the same, but make a big difference in the performance of your queries.

I believe this article should answer a very common question of who is starting in the area now: Does the order of the columns in index creation make a difference? After reading the whole article, comment here below what is your opinion .. rs

Creating the demo base for this article

To create this example table similar to mine (the data is random, right .. rs), to be able to follow the article and simulate these scenarios, you can use the script below:

What is the difference between Seek Predicate and Predicate?

As I explained in the article Understanding Index Functioning in SQL Server, SQL has 2 basic ways of reading data:

  • Seek: Direct read in pointers of data records that meet the search criteria. Since the data is sorted (if there is no fragmentation), it is simple to use fast search algorithms such as QuickSort to read the desired data with few reads on the disk.
  • Scan: Scan that can be done using Range Scan (if index but fragmented), reading records that are in certain range (s) or Full Scan if the fragmentation level of the records is too high or not indexes, where SQL Server will have to read the entire table until it finds all records that meet the search criteria

Well, having understood the very simple and brief explanation of these 2 operations, it became quite clear that the Seek operation on Rowstore indexes almost always (not always) outperforms Scan. So let's talk now about Seek Predicate and Predicate, which only occur when the table has indexes that match a particular query.

Seek Predicate It is the first filter that is applied to data when SQL Server executes a query. Because of this, it is ideal that indexes are created to prioritize Seek Predicate on the most selective columns possible (as few records as possible for each column value), so that the first level of filtering returns as few rows as possible. The operation of Predicate occurs after the Seek Predicate. After the first filter performed on the data, SQL Server will apply the remaining query filters to the subset returned by Seek Predicate, that is, the larger the subset in 2 Step 1, the more work for the query optimizer, since in Predicate they can have filters that are heavy and not very selective.

Ah, but how do I enforce various conditions in Seek Predicate? There is a very limited number of operations that can be done together within Seek Predicate, such as a range (between or> value and <value) and equality (=) can be used together in Seek Predicate, but two equal operations, whether range or equality, cannot.

Analyzing Column Selectivity by Histogram

I will demonstrate in the example below, with two range operations in the same query, how to identify how selective the columns of a given index are:

Looking at the execution plan superficially, we identify nothing different from an optimized query. Seek operation, no Keylookup. All right.

But what if we look deeper? Well, it's not that good. To return 2.560 lines, I had to read 2.857.984 lines, more than 1000x.

Another way to see how the query is running in the database is to use the command SET STATISTICS PROFILE ON:

Returning the following analysis in the StmtText field:

Where SEEK is Seek Predicate and WHERE is Predicate.

Look at the execution plan again and see the conditions of Seek Predicate (Status filter) and Predicate (OrderDt filter). Is the Status column more selective than the OrderDt column, even more so in the query above? Let's find out by creating statistics for the Status column and analyze the histogram:

Result:

That is, there are only 9 distinct values ​​for the Status column, with an average distribution between 550 to 600 thousand records for each status, as shown in Histogram. Let's now look at the histogram of the OrderDt column to see if it is more selective than the Status column:

Result:

Looking at the histogram of the Dt_Request column, we can see that the density is much higher, with about 29.999 distinct values ​​and an average of 20 to 50 thousand records for box histogram range and an estimate of 512 records per distinct value, which shows that It is a much more selective column than the Status column.

Observation: Be careful when creating statistics on very large tables, especially with the FULLSCAN clause. If you are unsure of using this command to parse in the histogram, you can simply use a query like this SELECT Dt_Order, COUNT (*) FROM Sales GROUP BY Dt_Order to get a good idea of ​​column selectivity.

Seek Predicate vs Predicate

Now that I have explained the basic workings of Seek Predicate and Predicate and how to identify column selectivity, I will demonstrate some examples of how we can try to identify and even control Seek Predicate and Predicate operations.

Recalling the indexes of our table:
CREATE CLUSTERED INDEX SK01_Request ON dbo.Sales (Order_id)
CREATE NONCLUSTERED INDEX SK02_Request ON dbo.Sales ([Status], Order_td) INCLUDE (Quantity, Value)

1 Example - Range and Range

In this first example, I will use a query with 2 range clauses (<and / or>) and try to control Seek Predicate and Predicate for these queries:

Result of the execution plan analysis:

As I explained in the previous topic, the selectivity of the OrderDt column is much higher than the Status column, and this justifies the huge amount of rows that have been read (2.869.003) to return only 2.577 records per query. Although the read operation is a Seek, it can still be improved by rebuilding an index using a more selective approach.

To do this, let's drop the SK02_Purts index and invert the columns of this index to make the Seek Predicate operation be done in the Dt_Purchase column instead of the Status column.

And now let's execute the query again and analyze the execution plan:

Wow! What difference! Note that in addition to our query using CPU 0ms (formerly 313ms) and running on just 2ms (were 71ms), the logical reads dropped from 20.900 to 29 and also the number of rows read dropped from 2.869.003 to 5.657. All this with just a change in the order of indexes, which changed the columns that are part of Seek Predicate and Predicate.

Example 2 - Range and Equality

What if instead of status <5 was status = 5, for example? What would the execution plan look like?

New Query:

Analysis of the execution plan with the new query:

That is, the plan was very similar to the previous plan, even changing the status filter. But now comes the question: Can you further improve this query?
Well, yes .. Let's get the index back to the previous condition .. kkkkkkkk

New plan generated by execution:

And with our old index, our query just got better and more selective! As I always talk about Performance, everything has to be tested and evaluated. In this case, one of the 2 queries will be hampered by the index change, or you can have the 2 indexes created (consuming twice as much disk space) and force the best. index for each situation.

It is very important to note that the creation of indexes must be very well thought out, because you can not be creating index for any query of the bank. Indexes take up space and slow down more complex write operations for SQL Server, so they should be created when needed.

In addition, Performance work requires business focus: Higher levels of the business don't care about logical reads or runtime, but how it adds value to the day to day business. It's no use spending your time improving an SP that runs 1x a day, at dawn, and started running on 1s and before it was 1h, because it's not improving the company's operation at all. Always look for the best opportunities for the company, leaving aside only the technical vision. When you have a normal time and environment, then you try to improve on these heavier routines.

That's it folks!
I hope you enjoyed this post and even more!

References: