- SQL Server - Introduction to Performance Tuning Study
- Understanding Index Functioning in SQL Server
- SQL Server - How to Identify a Slow or “Heavy” Query in Your Database
- SQL Server - Performance Tuning Tips: Implicit Conversion? NEVER!
- SQL Server - Performance Comparison between Scalar Function and CLR Scalar Function
- SQL Server - Performance Tuning Tips: What is the Difference Between Seek Predicate and Predicate?
- SQL Server - Using Calculated Columns (or Computed Columns) for Performance Tuning
- SQL Server - How to identify and collect time consuming query information using Extended Events (XE)
- SQL Server - How to identify all missing indexes (Missing indexes) of a database
- SQL Server and Azure SQL Database: How to Identify Key Lookup occurrences through the plancache
Hello people,
Good Morning!
In this post I will comment on the structure of indexes, which help a lot to optimize queries, reducing IO and CPU and returning information faster. However, be very careful when creating indexes, as they take up a lot of disk space and if not well modeled, may not be as effective.
Introduction
An index is a disk structure associated with a table or view that speeds row retrieval. An index contains keys created from one or more columns and those keys are stored in a structure (tree B) that enables SQL Server to find the row or rows associated with key values quickly and effectively.
With the creation of the index, the database will create an ordered tree structure for easy searching, where the first level is the root, the intermediate levels contain the index trees and the last level contains the data and a doubly linked list. linking the data pages, containing a previous page pointer and next page, as shown below:
Using the index will not always perform well because choosing the wrong index can cause poor performance. Therefore, the task of the query optimizer is to select an index or combination of indexes only when it yields performance improvement and to avoid indexed recovery when it impairs performance.
The data page contains an indexed header, with a double-linked list containing a pointer to the previous and next page, a block with the records (data) and in the footer there are array slots containing the memory addresses of the pages with the data.
Recommendations on Creating an Index
The following tasks make up the recommended strategy for index creation:
- Understand the characteristics of commonly used queries. For example, knowing that a frequently used query joins two or more tables will help you determine the best index type to use.
- Understand the characteristics of the columns used in queries. For example, an index is ideal for columns that have an integer data type as well as unique or non-null columns. For columns that have well-defined subsets of data, you can use a filtered index on SQL Server 2008 and later versions.
- Determine which index options could increase performance when creating or maintaining the index. For example, creating a clustered index on an existing large table would benefit from the ONLINE index option. The ONLINE option allows concurrent activity on underlying data to continue while the index is being created or rebuilt.
- Determine the best storage location for the index. A nonclustered index can be stored in the same filegroup as the underlying table or in a different filegroup. Index store location can improve query performance by increasing disk I / O performance. For example, storing a nonclustered index on a filegroup that is on a different disk than the table filegroup can improve performance because multiple disks can be read at the same time.
- Create nonclustered indexes on columns often used in predicates and JOINS conditions in queries. However, avoid adding unnecessary columns. Appending many index columns will increase disk space and index maintenance performance.
- Covering indexes can improve query performance because all data needed to satisfy query requirements exists within the index itself. That is, only index pages, not table or clustered index data pages, are required to retrieve the requested data, thus reducing overall disk I / O operations. For example, a query for columns a and b in a table that has a composite index created on columns a, b, and 1 can retrieve the specified data from the index only.
- Write queries that insert or modify as many queues as possible in a single statement, rather than using multiple queries to update those same queues. By using only one statement, you can exploit optimal index maintenance.
- Evaluate the type of query and how columns are used in the query. For example, a column used in an exact match query would be a good candidate for a clustered or nonclustered index.
- Keep the index key length short for clustered indexes. In addition, clustered indexes benefit from being created in unique or non-null columns.
- Examine the uniqueness of the column. A unique index instead of a non-unique index in the same column combination provides additional information for the query optimizer, which makes the index more useful.
- Examine the data distribution in the column. Often, a long query is caused by indexing a column with few unique values, or by performing a join on that column. For example, a physical phone book sorted alphabetically by last name will not be quick to locate a person if everyone in the city has Smith or Jones names.
- Consider the column order if the index contains multiple columns. The column that is used in the WHERE clause in a query criterion equal to (=), greater than (>), less than (>), or BETWEEN, or participating in a join, must be placed first. Additional columns should be ordered based on their level of distinction, ie from most distinct to least distinct.
For example, if the index is set to LastName, FirstName the index is useful when the query criteria is WHERE LastName = 'Smith' or WHERE LastName = Smith AND FirstName LIKE 'J%'. However, the query optimizer would not use the index for a query that had searched only on FirstName (WHERE FirstName = 'Jane'). - Specify the fill factor of the index. When an index is created or rebuilt, the fill factor value determines the percentage of space on each sheet-level page to fill with data, reserving the remainder on each page as free space for future growth. For example, specifying A fill factor value of 80 means that 20 percent of each leaf level page will be empty, providing room for index expansion as data is added to the underlying table. A correctly chosen fill factor value can reduce potential page divisions by providing enough room for index expansion as data is added to the underlying table. To learn more about Fill Factor, visit this link.
- Consider indexing the computed columns.
Clustered or NonClustered Index
Clustered Index
The CLUSTERED index is mounted on the table itself, creating the ORDERED tree structure for easy searching. For this reason, only 1 index of this type can be created per table and you cannot use INCLUDE columns in this index type. Each cluster index sheet has all the registry information.
Recommendations for the column that will compose the clustered index:
- Numeric fields (smallint, tinyint, int, bigint)
- Growing Data
- Unique Values
- Non-updated values
- Data that is often used in searches, joins, etc.
- Usually will be created on Primary Key
Classic example: Dictionary (You find the word and along with it you have the definition).
Nonclustered Index
The NONCLUSTERED index is a separate ORDERED structure, which contains only the indexed column (and INCLUDE columns, if any) and a table can have N indexes of this type. If it is necessary to consult some information that is not in the NONCLUSTERED index, the information is located using the clustered index of the table (Key Lookup). If the table does not have a CLUSTERED index, the IAM (Index Allocation Map) will be used to locate the information via RID (RowID - in the RID Lookup operation).
Classic example: Index of a book (You find in the index the page where the chapter is and then go to the page to see the information).
Composite index or with included columns
A very common question in index creation, these 2 forms of index creation work quite differently:
- Composite Index: It is an index that is formed by more than one column. In this case, the index tree structure will have the column information that is part of the index at all levels and searches using that column will be filtered faster.
- Index with column included: It is an index formed by one or more columns and that includes other columns. In this case, the index tree structure will be mounted only with the columns that are part of the index, and only at the last level of the tree (leaf) will the included column information be available (taking up less disk space than the composite index) . This type of index is recommended to avoid Key Lookup operators, including non-index columns that are always fetched, by using this index as well as the clustered index to return other information.
When to use an index with included columns
When there are columns used in SELECT for data display only but not used for filters, it is interesting to add these columns to INCLUDE, as the information will be stored only at the index sheet level (last index), for display only (eg cause this type of index is usually smaller than the composite index).
If these columns are not present in the index, you must either use another index (always in the clustered index or ROWID if the table is not clustered) or have to scan the entire table to find this information. See more details on this link.
Index Creation Example with Included Column
1 |
CREATE NONCLUSTERED INDEX SK01 ON dbo.Cliente(CPF) INCLUDE(RG, Nome) |
When to use composite index
When a recurring query uses more than one column in the WHERE clause, composite index (with more than one column) may be used. In this situation, the query plan uses the column set to filter, so that the most restrictive indices (equality indices) must come first in the definition, before the least restrictive indices (inequalities). The other columns that make up the index are copied to all leaves of the index tree (consuming more space).
Composite Index Creation Example
1 |
CREATE NONCLUSTERED INDEX SK01 ON dbo.Cliente(CPF, RG) |
What is and how to avoid Key Lookup and RID Lookup
When a query is performed on a table, SQL Server Query Optimizer will determine the best method of accessing data according to the statistics collected and choose the one with the lowest cost.
Because the clustered index is the table itself, generating a large amount of data, the lowest cost nonclustered index is often used for the query. This can be a problem because often the query is selecting columns where not all are indexed, so that a non-clustered index is used to search the indexed information (Index Seek NonClustered) and the clustered index is also used to return the remaining information, where the nonclustered index has a pointer to the exact position of the information in the clustered index (or the ROWID if the table has no clustered index).
This operation is called a Key lookup, for tables with clustered index or RID Lookup (RID = Row ID) for tables that do not have clustered index (called HEAP tables) and by generating 2 read operations for a single query, should be avoided whenever possible.
To avoid KeyLookup just use the Covering Index technique, which consists of adding to the NonClustered Index (INCLUDE) the main columns that are used in the table queries. This makes the query optimizer able to get all the information by reading only the chosen index, without having to read the clustered index as well.
However, close attention should be paid to index modeling. It is not recommended to add all table columns to the nonclustered index as it will become so large that it will no longer be effective and the query optimizer may even decide not to use it and prefer the Index Scan operator, which does so. sequential reading of the entire index, impairing query performance.
To avoid RID Lookup, simply create the clustered index on the table and pay attention to any Key Lookup events that may arise.
Unique Indexes (Exclusive)
A unique index ensures that the index key contains no duplicate values, so each row in the table is unique in some way.
Unique multicolumn indexes ensure that each combination of values in the index key is unique. For example, if a unique index is created on a combination of LastName, FirstName, and MiddleName columns, two rows in the table cannot have the same combination of values as those columns.
You cannot create a unique index on a single column if it has NULL on more than one row. Similarly, you cannot create a unique index on multiple columns if the column combination has NULL on more than one row, as this is treated as duplicate values for indexing purposes.
Internally, when you create a PRIMARY KEY constraint, a unique clustered index is automatically created on the column. The big difference between PRIMARY KEY and single index is that a table can have only 1 PRIMARY KEY, but several unique indexes.
Example of single index creation:
1 |
CREATE UNIQUE NONCLUSTERED INDEX SK01 ON dbo.Clientes(CPF) |
When you try to insert a CPF into the existing customer table, an error will appear on the SQL Server screen and execution will be aborted:
Using the IGNORE_DUP_KEY = ON parameter, you can allow the bank to just ignore the duplicate record and display only a warning on the screen:
1 2 |
CREATE UNIQUE NONCLUSTERED INDEX SK01 ON dbo.Clientes(CPF) WITH(IGNORE_DUP_KEY = ON) |
Rebuild / Reorganize Indexes as Needed
The SQL Server Database Engine automatically maintains indexes whenever input, update, or delete operations are performed on underlying data. Over time, these modifications can cause index information to be scattered throughout the database (fragmented). Fragmentation occurs when indexes have pages in which the logical order based on the key value does not match the physical order of the data file. Heavily fragmented indexes can degrade query performance and cause slow application response.
Identifying Index Fragmentation
The first step in choosing which fragmentation method to use is to analyze the index to determine the degree of fragmentation. Using the sys.dm_db_index_physical_stats system function, you can detect fragmentation at a specific index, all indexes in an indexed table or view, all indexes in a database, or all indexes in all databases. . For partitioned indexes, sys.dm_db_index_physical_stats also provides partition fragmentation information.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
SELECT OBJECT_NAME(B.object_id) AS TableName, B.name AS IndexName, A.index_type_desc AS IndexType, A.avg_fragmentation_in_percent FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') A INNER JOIN sys.indexes B WITH(NOLOCK) ON B.object_id = A.object_id AND B.index_id = A.index_id WHERE A.avg_fragmentation_in_percent > 30 AND OBJECT_NAME(B.object_id) NOT LIKE '[_]%' AND A.index_type_desc != 'HEAP' ORDER BY A.avg_fragmentation_in_percent DESC |
Once you have identified the level of index fragmentation, you can choose which method to defragment it:
- REORGANIZE: Used when the level of fragmentation is between 5% and 30%. This method does not cause the index to become unavailable, as the index is not deleted, only reorganized.
- REBUILD: Used when the level of fragmentation is greater than 30%. This method by default causes the index to become unavailable because it erases and recreates the index again. In order not to generate unavailability, you can use the ONLINE parameter when executing REBUILD.
Defragmenting the index
Defragmenting a fragmented index (REORGANIZE)
1 2 |
ALTER INDEX SK01 ON dbo.Logins REORGANIZE |
Defragmenting All Table Indexes (REORGANIZE)
1 2 |
ALTER INDEX ALL ON dbo.Logins REORGANIZE |
Defragmenting All Table Indexes (REBUILD)
1 2 |
ALTER INDEX ALL ON dbo.Logins REBUILD |
Defragmenting all indexes in the table (REBUILD - COMPLETE)
1 2 |
ALTER INDEX ALL ON dbo.Logins REBUILD WITH (FILLFACTOR = 80, SORT_IN_TEMPDB = ON, STATISTICS_NORECOMPUTE = ON, ONLINE = ON) |
To learn more about Rebuild and Reorganize, visit this link.
DMV's and index catalog views
Here is the list of DMV's and index catalog views that can be used to obtain usage information and index statistics:
DMV's
- sys.dm_db_column_store_row_group_physical_stats
- sys.dm_db_missing_index_groups
- sys.dm_db_index_operational_stats
- sys.dm_db_index_physical_stats
- sys.dm_db_index_usage_stats
- sys.dm_db_missing_index_columns
- sys.dm_db_missing_index_details
- sys.dm_db_missing_index_group_stats
Catalog Views
- sys.index_columns
- sys.indexes
- sys.sysindexes
- sys.sysindexkeys
- sys.xml_indexes
Check Index Usage
With the query below, you can identify the use of indexes in the database by displaying readings with Index Seek (index well used), Index Scan (possible problem in index modeling), Lookups and Updates (number of times the index has been updated with new registrations)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
SELECT ObjectName = OBJECT_SCHEMA_NAME(idx.object_id) + '.' + OBJECT_NAME(idx.object_id), IndexName = idx.name, IndexType = CASE WHEN is_unique = 1 THEN 'UNIQUE ' ELSE '' END + idx.type_desc, User_Seeks = us.user_seeks, User_Scans = us.user_scans, User_Lookups = us.user_lookups, User_Updates = us.user_updates FROM sys.indexes idx LEFT JOIN sys.dm_db_index_usage_stats us ON idx.object_id = us.object_id AND idx.index_id = us.index_id AND us.database_id = DB_ID() WHERE OBJECT_SCHEMA_NAME(idx.object_id) != 'sys' ORDER BY us.user_seeks + us.user_scans + us.user_lookups DESC |
Helping to identify the best clustered index candidate
With the query below, you can let SQL help us define the best candidate index to be clustered in the table. Analyzing the dm_db_index_usage_stats DMV, the query identifies the non-clustered index that has more Seek readings than clustered (in the query, I set a percentage of 150%) and the number of Seeks is smaller than clustered index Lookups.
It is not recommended to use only the result of this query for this definition. It should serve to indicate a possible improvement in index swapping and the DBA should do the detailed analysis to confirm this indication.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 |
SELECT TableName = OBJECT_NAME(idx.object_id), NonUsefulClusteredIndex = idx.name, ShouldBeClustered = nc.nonclusteredname, Clustered_User_Seeks = c.user_seeks, NonClustered_User_Seeks = nc.user_seeks, Clustered_User_Lookups = c.user_lookups, DatabaseName = DB_NAME(c.database_id) FROM sys.indexes idx LEFT JOIN sys.dm_db_index_usage_stats c ON idx.object_id = c.object_id AND idx.index_id = c.index_id JOIN ( SELECT idx.object_id, nonclusteredname = idx.name, ius.user_seeks FROM sys.indexes idx JOIN sys.dm_db_index_usage_stats ius ON idx.object_id = ius.object_id AND idx.index_id = ius.index_id WHERE idx.type_desc = 'nonclustered' AND ius.user_seeks = ( SELECT MAX(user_seeks) FROM sys.dm_db_index_usage_stats WHERE object_id = ius.object_id AND type_desc = 'nonclustered' ) GROUP BY idx.object_id, idx.name, ius.user_seeks ) nc ON nc.object_id = idx.object_id WHERE idx.type_desc IN ( 'clustered', 'heap' ) AND nc.user_seeks > ( c.user_seeks * 1.50 ) -- 150% AND nc.user_seeks >= ( c.user_lookups * 0.75 ) -- 75% ORDER BY nc.user_seeks DESC |
Identifying Missing Indexes
One of the daily tasks of a DBA is to identify missing indexes in the database, which may suggest a performance gain for queries that are frequently executed. With the query below, we can make this task a little easier, since by consulting the missing index DMV's, we can quickly identify this data.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
SELECT mid.statement, migs.avg_total_user_cost * ( migs.avg_user_impact / 100.0 ) * ( migs.user_seeks + migs.user_scans ) AS improvement_measure, OBJECT_NAME(mid.object_id), 'CREATE INDEX [missing_index_' + CONVERT (VARCHAR, mig.index_group_handle) + '_' + CONVERT (VARCHAR, mid.index_handle) + '_' + LEFT(PARSENAME(mid.statement, 1), 32) + ']' + ' ON ' + mid.statement + ' (' + ISNULL(mid.equality_columns, '') + CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END + ISNULL(mid.inequality_columns, '') + ')' + ISNULL(' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement, migs.*, mid.database_id, mid.[object_id] FROM sys.dm_db_missing_index_groups mig INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle WHERE migs.avg_total_user_cost * ( migs.avg_user_impact / 100.0 ) * ( migs.user_seeks + migs.user_scans ) > 10 ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * ( migs.user_seeks + migs.user_scans ) DESC |
Analyzing Index Histogram
A very important point to consider after the index is created is to analyze its histogram. Using this feature, you can identify how granular the table data is and how selective our index is, so the more selective the better it will be used.
An example of this is creating an index on the Gender Column in a customer table and distributing the data to 50% for each of the values. In this situation, the index is not being very selective and the bank will have to do many readings to return the information. To look at the number of records for each value, look at the EQ_ROWS column.
To view the index histogram, you can use the DBCC procedure:
1 2 |
-- DBCC SHOW_STATISTICS(Nome_da_Tabela, Nome_do_Indice) DBCC SHOW_STATISTICS(Logins, SK01) |
In the example below, we can see a very selective index case, which has a density of 0,4% and has several distinct information. When this index is used, it will return a very small data volume.
In the example below, we can identify a non-selective index, with a density of 50% (only 2 distinct values), with one of these values ('DBA') representing 98% (232 records out of a total of 236) of the data collected. That is, if this index is used and the query is performed looking for the word 'DBA', the index would need to return 98% of the index data and the SQL optimizer would prefer to perform an Index Scan operation instead of the Index Seek.
That's it folks!
To the next!
Congratulations on the post. Very well explained.
Only spectacular explanation of the DR…
Think of a blog that is helping me get promoted kkkk
Thank you, very helpful.
Thanks for the explanation and knowledge sharing.
Great content, very didactic. Congratulations!
Congratulations,
It was the most complete and didactic content I found on the net
Dirceu, good morning.
Congratulations on the post, very didactic.