
SQL Server - Using Calculated Columns (or Computed Columns) for Performance Tuning

This post is part 7 of 10 in the Performance Tuning series.

Hey guys!!!

In this article, I'd like to share with you something I see all the time in my day-to-day tuning consulting work: time-, I/O- and CPU-consuming queries that apply functions in the WHERE or JOIN clauses of tables with many records, and how we can use a very simple technique, indexing a calculated (or computed) column, to solve this problem.

As I explained in the article Understanding Index Functioning in SQL Server, when we use functions in WHERE or JOIN clauses, we break the query's SARGability; that is, the query can no longer use Seek operations on the indexes, since SQL Server needs to read the entire table, apply the desired function, compare the values, and only then return the results.

What I want to show you in this article is this scenario in practice, how to identify it, and some possible solutions to improve query performance. So, let's go!

Creating the demo base for this article

To create an example table similar to mine (the data is random, of course), so you can follow along with the article and simulate these scenarios, you can use the script below:
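The original script is not reproduced here; a minimal sketch that creates a similar table (the structure, names, and generated data are all assumptions) would be:

-- Demo table (all names and sizes are assumptions)
CREATE TABLE dbo._Clientes
(
    Id           INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    Nome         VARCHAR(100) NOT NULL,
    Email        VARCHAR(100) NOT NULL,
    DataCadastro DATE NOT NULL
);
GO

-- Load ~10,000 rows of generated data
;WITH Numeros AS
(
    SELECT TOP (10000)
           CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT) AS n
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
INSERT INTO dbo._Clientes (Nome, Email, DataCadastro)
SELECT 'Cliente ' + CAST(n AS VARCHAR(10)),
       'cliente' + CAST(n AS VARCHAR(10)) + '@example.com',
       DATEADD(DAY, n % 3650, '2010-01-01')
FROM Numeros;
GO

-- Nonclustered index on the column used in the filters that follow
CREATE NONCLUSTERED INDEX SK01_Clientes_Nome
    ON dbo._Clientes (Nome);
GO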

Demonstration using native function

To demonstrate how a query can become slow simply by using a function in the WHERE or JOIN clause, I will start with the query below:
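The original query is not reproduced here, but a simple sketch along the same lines (the filter value is an assumption) would be:

-- A SARGable filter: a direct comparison on the indexed Nome column
SELECT Id, Nome, Email
FROM dbo._Clientes
WHERE Nome = 'Cliente 512';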

If we look at the execution plan of this query, we see that it is using the Index Seek operator, performing only 6 logical reads to return 512 records. Looking at the CPU and I/O information, we can conclude that 3 ms of CPU time (compile) and 3 ms of elapsed time are quite acceptable:

Now let's use a function in this same query:
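A sketch of that version, assuming the function is LEFT (as the next paragraphs suggest):

-- Wrapping the indexed column in a function breaks SARGability
SELECT Id, Nome, Email
FROM dbo._Clientes
WHERE LEFT(Nome, 11) = 'Cliente 512';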

Execution Analysis:

That is, the result was terrible: Index Scan, high CPU time, high elapsed time, too many logical reads. All because of the function we used, which stopped the query from using the Index Seek operator and forced an Index Scan.

Solving this is very simple, especially because the way this function is put together (a LEFT, which extracts a prefix) works in our favor: in cases like this, we can safely replace the function with LIKE 'text%', because SQL Server will then be able to use the Seek operation on the index:
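A sketch of the rewritten query:

-- A prefix LIKE expresses the same logic as LEFT(Nome, 11) = '...'
-- and lets SQL Server seek on the index
SELECT Id, Nome, Email
FROM dbo._Clientes
WHERE Nome LIKE 'Cliente 512%';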

Execution Analysis:

We can see that when using LIKE 'text%', the index was used with the Seek operation, making our query fast again.

LIKE and SARGability

IMPORTANT: Unlike LIKE 'text%', if you add the '%' symbol before the text, to filter everything that contains or ends with a certain expression, the index will not be used with the Seek operator, but with a Scan.
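To illustrate (the filter value is just an example):

-- A leading wildcard prevents the index tree from being used for
-- navigation, so SQL Server falls back to a Scan
SELECT Id, Nome, Email
FROM dbo._Clientes
WHERE Nome LIKE '%512';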

To understand why, think of a dictionary index as an analogy: finding all the words in the dictionary that start with 'test' is very simple; just go to the letter 't', then the letter 'e', then the letter 's', and so on until you find the words you want. When the next word in the list is greater than 'test', we can end the search.

To identify all the words in the dictionary that contain or end with 'test', however, we would have to look at every word in the dictionary.

Did that analogy make it easier to understand how indexes work? If you still have questions, read my article Understanding Index Functioning in SQL Server.

But what if we used the RIGHT function, for example? Would our query still be able to use the Seek operation on the same index?
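A sketch of that test, assuming a suffix filter with RIGHT (the parameters are assumptions):

-- A suffix filter has no prefix-LIKE rewrite, so the index can only be scanned
SELECT Id, Nome, Email
FROM dbo._Clientes
WHERE RIGHT(Nome, 3) = '512';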

As we saw above, the query performed quite badly, with a high number of logical reads, high elapsed time, and high CPU. To solve this problem, let's use the computed column feature and index this computed column:
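A minimal sketch of that solution (the column and index names are assumptions):

-- Add a computed column with the same expression used in the filter
ALTER TABLE dbo._Clientes
    ADD Nome_Right3 AS RIGHT(Nome, 3) PERSISTED;
GO

-- Index the computed column so the filter can become an Index Seek
CREATE NONCLUSTERED INDEX SK02_Clientes_Nome_Right3
    ON dbo._Clientes (Nome_Right3);
GO

-- Filter on the computed column (the optimizer can often match the
-- original RIGHT(Nome, 3) expression to it automatically)
SELECT Id, Nome, Email
FROM dbo._Clientes
WHERE Nome_Right3 = '512';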

Execution Analysis:

Wow! The query was much faster now! This is because, when creating the index, SQL Server already calculated this expression for the entire column and stored it sorted. This makes queries much faster than having to calculate the expression in real time and then compare the values.

Index Creation Considerations and System Function Determinism

Note 1: Be aware that index creation consumes disk space, and adding a column to a table, even a computed one, should be tested first to ensure that it will not cause errors, for example, in INSERT statements that do not specify the column list.

Note 2: A very important point to note is that creating computed columns persisted on disk and indexing computed columns is only possible when the expression uses deterministic functions.

Every function in SQL Server is either deterministic or non-deterministic. The determinism of a function is defined by the data the function returns:

  • A function is considered deterministic if it always returns the same result set when it is called with the same set of input values.
  • A function is considered non-deterministic if it does not always return the same result set when it is called with the same set of input values.

This may sound a bit complicated, but it really isn't. Take, for example, the DATEDIFF and GETDATE functions. DATEDIFF is deterministic because it always returns the same result when executed with the same input parameters. GETDATE is non-deterministic because it returns a different value every time it runs.
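To see this in practice with the demo table sketched earlier (the column names are assumptions):

-- DATEDIFF is deterministic (same inputs, same output), so a computed
-- column based on it can be persisted; the explicit CONVERT style keeps
-- the date literal deterministic as well
ALTER TABLE dbo._Clientes
    ADD Dias_Desde_Cadastro AS DATEDIFF(DAY, CONVERT(DATETIME, '20200101', 112), DataCadastro) PERSISTED;
GO

-- GETDATE() is non-deterministic, so the statement below fails with an
-- error saying the column cannot be persisted
ALTER TABLE dbo._Clientes
    ADD Data_Atual AS GETDATE() PERSISTED;
GO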

Demo using User Defined Function (UDF)

If using a native function in the WHERE/JOIN already hurts the performance of our queries, with a custom user-defined function the scenario is even worse. For this post, I will use the function fncSplit (with schema binding):
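The original fncSplit is not reproduced here; below is a minimal scalar UDF in the same spirit (the signature and logic are assumptions), along with the kind of query I am talking about:

-- Minimal sketch of a scalar UDF: returns the piece of @Texto before
-- the first @Separador (or the whole string if the separator is absent).
-- WITH SCHEMABINDING is what will matter later for determinism.
CREATE FUNCTION dbo.fncSplit
(
    @Texto VARCHAR(100),
    @Separador CHAR(1)
)
RETURNS VARCHAR(100)
WITH SCHEMABINDING
AS
BEGIN
    RETURN CASE
        WHEN CHARINDEX(@Separador, @Texto) > 0
            THEN LEFT(@Texto, CHARINDEX(@Separador, @Texto) - 1)
        ELSE @Texto
    END;
END
GO

-- The slow query: the UDF runs once per row and kills SARGability
SELECT Id, Nome, Email
FROM dbo._Clientes
WHERE dbo.fncSplit(Email, '@') = 'cliente512';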

Execution Analysis:

As you can see, this simple query on a table of 10,000 records took about 35 seconds to execute, consuming almost 15 s of CPU. About 240 thousand logical reads were performed, and 610 thousand rows were processed to return the 1,040 rows of the final result. In short: it's terrible!

To try to improve the performance of this query, let's apply the same solution as in the previous example, creating a computed column and indexing it:
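A sketch of that step (the column name Test_Column matches the error output quoted further below; the index name and filter are assumptions):

-- Persist the UDF result in a computed column
ALTER TABLE dbo._Clientes
    ADD Test_Column AS dbo.fncSplit(Email, '@') PERSISTED;
GO

-- Index the persisted computed column
CREATE NONCLUSTERED INDEX SK03_Clientes_Test_Column
    ON dbo._Clientes (Test_Column);
GO

-- The filter now reads from the index instead of calling the UDF per row
SELECT Id, Nome, Email
FROM dbo._Clientes
WHERE Test_Column = 'cliente512';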

Before reviewing the execution of this test, I need to give a warning about creating disk-persisted computed columns that use a User Defined Function (UDF) and about creating indexes on these computed columns:

User Defined Function Determinism (UDF)

Important: When you create a user-defined function (UDF), SQL Server determines whether it is deterministic. The determinism of a user-defined function is defined by how you create the function. A user-defined function is considered deterministic if all of the following criteria are met:

  • The function is schema-bound for all database objects it references.
  • Any function called by the user-defined function is deterministic. This includes both user-defined and system functions.
  • The function does not reference any database objects that are outside its scope. This means that the function cannot reference external tables, variables, or cursors.

When you create a function, SQL Server applies all of these criteria to it to determine its determinism. If the function fails any of these checks, it is marked as non-deterministic. Sometimes these checks mark functions as non-deterministic even when you expected them to be deterministic.
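If you want to check how SQL Server classified your function, OBJECTPROPERTY exposes this flag:

-- Returns 1 if SQL Server marked the function as deterministic, 0 otherwise
SELECT OBJECTPROPERTY(OBJECT_ID('dbo.fncSplit'), 'IsDeterministic') AS IsDeterministic;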

In the case of this example, if I do not include the WITH SCHEMABINDING clause in the fncSplit declaration, we run into the following error message:

Msg 2729, Level 16, State 1, Line 47
Column 'Test_Column' in table 'dbo._Clientes' cannot be used in an index or statistics or as a partition key because it is non-deterministic.

Execution Analysis:

Wow!! Our query dropped from 32 seconds to 8 ms!! The CPU time dropped from 14,974 to 8, and the logical reads dropped from 240,061 to 9! This tuning was really very effective. I bet that if you do something similar for a client, you'll get some nice compliments 🙂

Before finalizing this article, I would like to leave you one last message:

This technique of creating a computed column and indexing it is effective for both system functions and user-defined functions (UDFs), whether they appear in your query's JOIN or WHERE clause. If they appear only in your SELECT list, it is not always worthwhile to use this technique, unless they are really hurting performance badly (VERY slow functions or a very large returned dataset).

As I always say: When applying performance tuning techniques, ALWAYS TEST!

Well guys, I hope you enjoyed this article.
A big hug, and see you next time!