SQL Server - How to identify and compress tables and indexes without data compression

Hey guys!
In this post I will share with you a very practical script to identify tables and indexes without data compression and apply page-type data compression to the returned tables.

Using the query that I will share in this article, we will identify the tables and indexes in RowStore mode that are not using data compression

It is worth remembering that until version 2016, this data compression feature was exclusive to the Enterprise version (in addition to Trial and Developer), so if your version is lower than 2016 and in the Standard edition, you will not be able to use the data compression that I describe in this article.

Why use data compression

Data compression offers several benefits. It saves disk space and can help improve performance for certain workloads.

The benefits of data compression come at the cost of increased CPU usage to compress and decompress data, so it is important to understand the characteristics of the workload on a table before deciding on a compression strategy.

Data compression provides flexibility in terms of compression levels (row or page) and the objects you can compress (table, index, partition). This allows you to tune compression based on data and workload characteristics.

Another important advantage of data compression is that it works transparently to applications, and works well with other SQL Server features such as TDE and backup compression.

Compressed pages are kept as compressed on disk and remain compressed when read into memory. Data is decompressed (not the entire page, but just the data values of interest) when it meets one of the following conditions:

It is read to filter, sort, join, as part of a query response.
It is updated by an application.

There is no uncompressed in-memory copy of the compressed page. Data decompression consumes CPU.

However, because compressed data uses fewer pages of data, it also saves:

Physical I/O: Because physical I/O is expensive from a workload perspective, reduced physical I/O generally results in greater savings than the additional CPU cost of compressing and decompressing the data. Note that physical I/O is saved because less data is read from or written to disk and because more data can remain cached in buffer pool memory.
Logical I/O (if the data is in memory): Because logical I/O consumes CPU, reduced logical I/O can sometimes offset the CPU cost of compressing and decompressing the data.

To better understand what data compression is, I suggest you read the article Compressing all tables in a database in SQL Server.

What is Row Compression Level

This compression feature takes into account the type of variable data structures that define a column. Row Compression Level is a compression level that does not use any compression algorithm.

The main objective of Row Compression Level is to reduce the storage of fixed type data, that is, when you are enabling Row compression, you are only changing the physical storage format of data that is associated with a data type.

Row Level Compression extends the vardecimal storage format (integer, decimal, float, datetime, money, etc.) by storing data of all fixed-length types in a variable-length storage format. This type of compression will remove any extra bytes in the fixed data type.

NULL and 0 values, in all data types, are optimized to not take up any bytes.

For example, we have a CHAR(100) column, which is using the Row Compression Level. This column will only use the amount of storage defined by the data. Like this? Let's store the phrase “SQL Server 2008” in the column. This sentence contains only 15 characters and only these 15 characters are stored, as opposed to the 100 that were defined by the column, so you have an 85% saving in storage space.

What is Page Compression Level

In previous versions of SQL Server, each value was stored on the page, regardless of whether the same value had already appeared in the same column for some other rows on a page. In SQL Server 2008, the redundant or duplicate value will be stored only once within the page and will be referenced in all other occurrences, thus we have the Page Compression Level.

Basically, the Page Compression Level is a superset of ROW compression, and takes into account redundant data in one or more rows on a given page. It also uses prefix and dictionary compression.

The Page Compression Level method is smarter because it allows common data to be shared between rows on a given page.
This type of compression uses the following techniques:

ROW COMPRESSION: Already seen above.
PREFIX COMPRESSION: For each column on a page duplicate prefixes are identified. These prefixes are stored in the Information Compression (CI) headers, which reside after the page header. A reference number is assigned to these prefixes and this reference number is used wherever these prefixes are being used, decreasing the amount of bytes used.
DICTIONARY COMPRESSION: Searches for duplicate values outside the page and stores them in the CI. The main difference between Prefix Compression and Dictionary Compression is that Prefix is restricted to just one column whereas Dictionary is applicable for the entire page.
After Prefix Compression has completed, Dictionary Compression is applied and searches for repeated values anywhere on the page and stores them in the CI area. Unlike Prefix Compression, Dictionary Compression is not restricted to a column and can replace repeated values that occur anywhere on a page.

Script to identify tables and indexes without data compression

To identify tables and indexes without data compression and apply data compression, you can use the script below:

SELECT DISTINCT 
    C.[name] AS [Schema],
    A.[name] AS Tabela,
    NULL AS Indice,
    'ALTER TABLE [' + C.[name] + '].[' + A.[name] + '] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE)' AS Comando
FROM 
    sys.tables                   A
    INNER JOIN sys.partitions    B   ON A.[object_id] = B.[object_id]
    INNER JOIN sys.schemas       C   ON A.[schema_id] = C.[schema_id]
WHERE 
    B.data_compression_desc = 'NONE'
    AND B.index_id = 0 -- HEAP
    AND A.[type] = 'U'
    
UNION
 
SELECT DISTINCT 
    C.[name] AS [Schema],
    B.[name] AS Tabela,
    A.[name] AS Indice,
    'ALTER INDEX [' + A.[name] + '] ON [' + C.[name] + '].[' + B.[name] + '] REBUILD PARTITION = ALL WITH ( STATISTICS_NORECOMPUTE = OFF, ONLINE = OFF, SORT_IN_TEMPDB = OFF, DATA_COMPRESSION = PAGE)'
FROM 
    sys.indexes                  A
    INNER JOIN sys.tables        B   ON A.[object_id] = B.[object_id]
    INNER JOIN sys.schemas       C   ON B.[schema_id] = C.[schema_id]
    INNER JOIN sys.partitions    D   ON A.[object_id] = D.[object_id] AND A.index_id = D.index_id
WHERE
    D.data_compression_desc =  'NONE'
    AND D.index_id <> 0
    AND A.[type] IN (1, 2) -- CLUSTERED e NONCLUSTERED (Rowstore)
    AND B.[type] = 'U'
ORDER BY
    Tabela,
    Indice

Result:

Page or Row data compression?

Microsoft documentation (link here) gives us some tips to try to identify whether the index should be created as page compression or row compression.

Run the script below:

SELECT
    [o].[name] AS [Table_Name],
    [x].[name] AS [Index_Name],
    [i].[partition_number] AS [Partition],
    [i].[index_id] AS [Index_ID],
    [x].[type_desc] AS [Index_Type],
    [i].[leaf_update_count] * 100.0 / ([i].[range_scan_count] + [i].[leaf_insert_count] + [i].[leaf_delete_count] + [i].[leaf_update_count] + [i].[leaf_page_merge_count] + [i].[singleton_lookup_count]) AS [Percent_Update],
    [i].[range_scan_count] * 100.0 / ([i].range_scan_count + [i].leaf_insert_count + [i].leaf_delete_count + [i].leaf_update_count + [i].leaf_page_merge_count + [i].singleton_lookup_count) AS [Percent_Scan]
FROM
    [sys].dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS [i]
    JOIN [sys].[objects] AS [o] ON [o].[object_id] = [i].[object_id]
    JOIN [sys].[indexes] AS [x] ON [x].[object_id] = [i].[object_id] AND [x].[index_id] = [i].[index_id]
WHERE
    ([i].[range_scan_count] + [i].[leaf_insert_count] + [i].[leaf_delete_count] + [i].[leaf_update_count] + [i].[leaf_page_merge_count] + [i].[singleton_lookup_count]) <> 0
    AND OBJECTPROPERTY([i].[object_id], 'IsUserTable') = 1
ORDER BY
    [Percent_Update] ASC;

The smaller the value of the “Percent_Update” column or the larger the value of the “Percent_Scan” column, the better candidate this index is for using page compression.

Some information to help you decide:

Page compression looks for repeating patterns, as we saw in how it works. So if your data doesn't have repeating patterns, you won't get much extra benefit from page compression. You'll see extra CPU utilization, but you probably won't see much of a performance gain to make it worth it.
The CPU overhead of row compression is generally minimal (usually less than or equal to 10%). If row compression results in space savings and the system can accommodate a 10% increase in CPU usage, all data must be row compressed.
On average, row compression takes 1.5 times the CPU time used to rebuild an index, while page compression takes 2 to 5 times the CPU time used to rebuild an index.
Examples of tables that are good candidates for page compression are log or audit tables, which are written once and rarely read.
Use page compression when the goal is to reduce disk space, as page compression will generally compress more than line compression
To obtain the maximum compression available, use COLUMNSTORE_ARCHIVE compression, but it does not have good read performance.

That's it, folks!
I hope you liked this tip and see you next time!