In this post, I will talk a little about SQL Server data compression and how we can use this feature to compress all objects in a database. Data compression can be configured for the following database objects:
- For a table that is stored as a heap.
- For a table that is stored as a clustered index.
- For a nonclustered index.
- For an indexed view.
- For partitioned tables and indexes, the compression option can be set per partition, and different partitions of the same object can have different compression settings.
Remember that compression is not available for system objects, and the compression setting is not automatically applied to nonclustered indexes. Therefore, each nonclustered index must be configured individually and manually.
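Since compression can be set per partition and nonclustered indexes must be handled one by one, here is a minimal sketch, with a hypothetical partitioned table name, of compressing a single partition:

```sql
-- Hypothetical example: compress only partition 2 of a partitioned table,
-- leaving the remaining partitions with their current settings
ALTER TABLE dbo.Pedidos
REBUILD PARTITION = 2
WITH (DATA_COMPRESSION = PAGE);
```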
ROW COMPRESSION LEVEL
This feature takes into account the data types that define a column; ROW compression does not actually apply a compression algorithm.
Its main goal is to reduce the storage of fixed-length data types: when you enable row-level compression, you are only changing the physical storage format of the data associated with a data type.
Row-level compression extends the vardecimal storage format by storing all fixed-length data types in a variable-length storage format. It removes any unused bytes from the fixed-length value, and absolutely no changes are required in the application.
For example, a CHAR(100) column using ROW compression will only use the amount of storage the data actually needs. How so? If we store the phrase "SQL Server 2008" in the column, only those 15 characters are stored instead of the 100 defined by the column, so you have an 85% saving in storage space for that value.
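Enabling ROW compression on an existing table takes a single rebuild. A minimal sketch, assuming a hypothetical table dbo.Clientes:

```sql
-- Hypothetical example: rebuild the table so fixed-length types are
-- stored in variable-length format (ROW compression)
ALTER TABLE dbo.Clientes
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = ROW);
```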
PAGE COMPRESSION LEVEL
In previous versions of SQL Server, each value was stored on the page, even when the same value had already appeared in the same column for other rows within that page. In SQL Server 2008, a redundant or duplicate value is stored only once within the page and referenced in all other instances: this is the Page Compression Level.
Basically, PAGE compression is a superset of ROW compression and takes into account redundant data across one or more rows on a given page. It also uses prefix and dictionary compression.
PAGE compression is the more powerful method because it allows common data to be shared between the rows of a given page.
This type of compression uses the following techniques:
- ROW COMPRESSION: Already seen above.
- PREFIX COMPRESSION: For each column on a page, duplicate prefixes are identified. These prefixes are stored in the compression information (CI) structure, which resides just after the page header. A reference number is assigned to each prefix, and that number is used wherever the prefix occurs, reducing the number of bytes stored.
- DICTIONARY COMPRESSION: Searches for duplicate values anywhere on the page and stores them in the CI area. The main difference from prefix compression is that prefix compression is restricted to a single column, while dictionary compression applies to the entire page.
Dictionary compression is applied only after prefix compression completes, and it replaces repeated values wherever they occur on the page, regardless of the column they appear in.
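A minimal sketch of enabling PAGE compression, again assuming hypothetical object names; note the separate ALTER INDEX, since, as mentioned earlier, nonclustered indexes are not compressed automatically:

```sql
-- Hypothetical example: PAGE-compress the table...
ALTER TABLE dbo.Clientes
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE);

-- ...and a nonclustered index, which must be compressed individually
ALTER INDEX IX_Clientes_Nome ON dbo.Clientes
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE);
```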
Estimating Space Gain
In SQL Server 2008 there are two ways to estimate the space savings of compressing tables and indexes: the system stored procedure sp_estimate_data_compression_savings and the Data Compression Wizard.
First, let's use sp_estimate_data_compression_savings, where:
1st parameter is the name of the schema;
2nd parameter is the name of the object;
3rd parameter is the index ID (NULL means all indexes);
4th parameter is the partition number (NULL means all partitions);
5th parameter is the compression type.
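A call would look like this (the table name is hypothetical):

```sql
-- Estimate PAGE-compression savings for a hypothetical table,
-- covering all of its indexes and partitions
EXEC sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'Clientes',
    @index_id = NULL,          -- NULL = all indexes
    @partition_number = NULL,  -- NULL = all partitions
    @data_compression = 'PAGE';
```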
Note the size_with_current_compression_setting (KB) and size_with_requested_compression_setting (KB) columns: they show the current size and the estimated size after compression, so you can tell how much space you would gain by applying it.
Now let's use the wizard.
Right-click the table and choose Storage -> Manage Compression.
The compression wizard screen will open.
On the next screen, you can try each compression type and, before applying anything, calculate the space usage to see which one achieves the best compression rate.
Continuing with the wizard, I chose the PAGE option, which usually gives the best result. The wizard then asks what to do: just generate the script, run the compression immediately, or schedule the change that enables compression.
Finally, a data compression success message is displayed on the screen.
You can check each object's compression level by right-clicking the object (on the Storage tab, the Compression option shows the compression type) or by using the command:
```sql
SELECT A.[partition_id],
       A.[object_id],
       OBJECT_NAME(A.[object_id]) AS [object_name],
       data_compression_desc
FROM sys.partitions A
JOIN sys.objects B ON A.[object_id] = B.[object_id]
WHERE B.is_ms_shipped = 0
```
Applying Compression to an Entire Database
In this post we have already seen how beneficial data compression is, especially for storage, so let's put it into practice. I created a procedure that compresses all objects of a database easily and quickly. The procedure code is below:
```sql
CREATE PROCEDURE [dbo].[stpCompacta_Database] (
    @Ds_Database SYSNAME,
    @Fl_Rodar_Shrink BIT = 1,
    @Fl_Parar_Se_Falhar BIT = 1,
    @Fl_Exibe_Comparacao_Tamanho BIT = 1,
    @Fl_Metodo_Compressao_Page BIT = 1
)
AS
BEGIN

    SET NOCOUNT ON

    DECLARE
        @Ds_Query               VARCHAR(MAX),
        @Ds_Comando_Compactacao VARCHAR(MAX),
        @Ds_Metodo_Compressao   VARCHAR(20) = (CASE WHEN @Fl_Metodo_Compressao_Page = 1 THEN 'PAGE' ELSE 'ROW' END),
        @Nr_Metodo_Compressao   VARCHAR(20) = (CASE WHEN @Fl_Metodo_Compressao_Page = 1 THEN 2 ELSE 1 END)

    IF (OBJECT_ID('tempdb..#Comandos_Compactacao') IS NOT NULL)
        DROP TABLE #Comandos_Compactacao

    CREATE TABLE #Comandos_Compactacao (
        Id      BIGINT IDENTITY(1, 1),
        Tabela  SYSNAME,
        Indice  SYSNAME NULL,
        Comando VARCHAR(MAX)
    )

    -- Total size (in MB) of the user tables BEFORE compression
    IF (@Fl_Exibe_Comparacao_Tamanho = 1)
    BEGIN
        SET @Ds_Query = '
            SELECT (SUM(a.total_pages) / 128) AS Vl_Tamanho_Tabelas_Antes_Compactacao
            FROM [' + @Ds_Database + '].sys.tables t WITH(NOLOCK)
            INNER JOIN [' + @Ds_Database + '].sys.indexes i WITH(NOLOCK) ON t.object_id = i.object_id
            INNER JOIN [' + @Ds_Database + '].sys.partitions p WITH(NOLOCK) ON i.object_id = p.object_id AND i.index_id = p.index_id
            INNER JOIN [' + @Ds_Database + '].sys.allocation_units a WITH(NOLOCK) ON p.partition_id = a.container_id
            WHERE i.object_id > 255'
        EXEC(@Ds_Query)
    END

    -- Generate one rebuild command per heap and one per index that is not
    -- yet using the requested compression level
    SET @Ds_Query = '
        INSERT INTO #Comandos_Compactacao ( Tabela, Indice, Comando )
        SELECT DISTINCT
            A.name AS Tabela,
            NULL AS Indice,
            ''ALTER TABLE ['' + ''' + @Ds_Database + ''' + ''].['' + C.name + ''].['' + A.name + ''] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = ' + @Ds_Metodo_Compressao + ')'' AS Comando
        FROM [' + @Ds_Database + '].sys.tables A
        INNER JOIN [' + @Ds_Database + '].sys.partitions B ON A.object_id = B.object_id
        INNER JOIN [' + @Ds_Database + '].sys.schemas C ON A.schema_id = C.schema_id
        WHERE B.data_compression <> ' + @Nr_Metodo_Compressao + '
          AND B.index_id = 0
          AND A.type = ''U''

        UNION

        SELECT DISTINCT
            B.name AS Tabela,
            A.name AS Indice,
            ''ALTER INDEX ['' + A.name + ''] ON ['' + ''' + @Ds_Database + ''' + ''].['' + C.name + ''].['' + B.name + ''] REBUILD PARTITION = ALL WITH ( STATISTICS_NORECOMPUTE = OFF, ONLINE = OFF, SORT_IN_TEMPDB = OFF, DATA_COMPRESSION = ' + @Ds_Metodo_Compressao + ')''
        FROM [' + @Ds_Database + '].sys.indexes A
        INNER JOIN [' + @Ds_Database + '].sys.tables B ON A.object_id = B.object_id
        INNER JOIN [' + @Ds_Database + '].sys.schemas C ON B.schema_id = C.schema_id
        INNER JOIN [' + @Ds_Database + '].sys.partitions D ON A.object_id = D.object_id AND A.index_id = D.index_id
        WHERE D.data_compression <> ' + @Nr_Metodo_Compressao + '
          AND D.index_id <> 0
          AND B.type = ''U''
        ORDER BY Tabela, Indice'
    EXEC(@Ds_Query)

    DECLARE
        @Qt_Comandos    INT = (SELECT COUNT(*) FROM #Comandos_Compactacao),
        @Contador       INT = 1,
        @Ds_Mensagem    VARCHAR(MAX),
        @Nr_Codigo_Erro INT = (CASE WHEN @Fl_Parar_Se_Falhar = 1 THEN 16 ELSE 10 END)

    -- Execute the generated commands one by one
    WHILE (@Contador <= @Qt_Comandos)
    BEGIN

        SELECT @Ds_Comando_Compactacao = Comando
        FROM #Comandos_Compactacao
        WHERE Id = @Contador

        BEGIN TRY
            SET @Ds_Mensagem = 'Executing command "' + @Ds_Comando_Compactacao + '"... Please wait...'
            RAISERROR(@Ds_Mensagem, 10, 1) WITH NOWAIT
            EXEC(@Ds_Comando_Compactacao)
        END TRY
        BEGIN CATCH
            SELECT
                ERROR_NUMBER()    AS ErrorNumber,
                ERROR_SEVERITY()  AS ErrorSeverity,
                ERROR_STATE()     AS ErrorState,
                ERROR_PROCEDURE() AS ErrorProcedure,
                ERROR_LINE()      AS ErrorLine,
                ERROR_MESSAGE()   AS ErrorMessage;

            SET @Ds_Mensagem = 'Failed to execute the command "' + @Ds_Comando_Compactacao + '"'
            RAISERROR(@Ds_Mensagem, @Nr_Codigo_Erro, 1) WITH NOWAIT

            -- Abort at the first failure only when requested
            IF (@Fl_Parar_Se_Falhar = 1)
                RETURN
        END CATCH

        SET @Contador = @Contador + 1

    END

    -- Total size (in MB) of the user tables AFTER compression
    IF (@Fl_Exibe_Comparacao_Tamanho = 1)
    BEGIN
        SET @Ds_Query = '
            SELECT (SUM(a.total_pages) / 128) AS Vl_Tamanho_Tabelas_Depois_Compactacao
            FROM [' + @Ds_Database + '].sys.tables t WITH(NOLOCK)
            INNER JOIN [' + @Ds_Database + '].sys.indexes i WITH(NOLOCK) ON t.object_id = i.object_id
            INNER JOIN [' + @Ds_Database + '].sys.partitions p WITH(NOLOCK) ON i.object_id = p.object_id AND i.index_id = p.index_id
            INNER JOIN [' + @Ds_Database + '].sys.allocation_units a WITH(NOLOCK) ON p.partition_id = a.container_id
            WHERE i.object_id > 255'
        EXEC(@Ds_Query)
    END

    -- Shrink the data file (file_id = 1) to release the freed space to the OS
    IF (@Fl_Rodar_Shrink = 1)
    BEGIN
        SET @Ds_Query = '
            USE [' + @Ds_Database + ']
            DBCC SHRINKFILE (1, 1) WITH NO_INFOMSGS'
        EXEC(@Ds_Query)
    END

    IF (@Qt_Comandos > 0)
        PRINT 'Database "' + @Ds_Database + '" compressed successfully!'
    ELSE
        PRINT 'No objects to compress in database "' + @Ds_Database + '"'

END
```
Example of use:
```sql
EXEC dbo.stpCompacta_Database
    @Ds_Database = 'Testes',          -- sysname
    @Fl_Rodar_Shrink = 1,             -- bit
    @Fl_Parar_Se_Falhar = 0,          -- bit
    @Fl_Exibe_Comparacao_Tamanho = 1, -- bit
    @Fl_Metodo_Compressao_Page = 1    -- bit
```
Note: I do not recommend using the @Fl_Rodar_Shrink option in production environments.
Hi Dirceu, when you apply PAGE compression to a table, are the table's indexes compressed too?
Congratulations! I performed a migration from SQL Server 2014 to 2016 and applied the procedure to the entire database: an ERP Protheus database of 400 GB. After 16 hours the result was a 96 GB database. I then ran rebuilds, reorganizes and statistics updates. As yesterday was a holiday, I validated the database in the staging environment. I couldn't resist putting it into production! So far so good!
Excellent post.
Great post! Congratulations!
In the script, could you add a CASE that evaluates the ratio of scans to updates, so that objects with more scans are compressed with PAGE and objects with more updates are compressed with ROW?
Dirceu, thanks for sharing.
It really will help me a lot!!!
I have a question about backing up a compressed database. For example, if my database was 500 GB before compression and 200 GB after, will the backup be based on the 200 GB, or will it disregard compression and use 500 GB?
Hi Rodrigo, excellent question. Compression applies to the data pages themselves, so the backup is based on the compressed size; and if you already use backup compression, the data is probably being compressed already anyway.
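For reference, a minimal sketch of a backup with backup compression enabled (database name and path are hypothetical):

```sql
-- Hypothetical example: back up a database using backup compression
BACKUP DATABASE Testes
TO DISK = 'D:\Backup\Testes.bak'
WITH COMPRESSION, STATS = 10;
```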
You may want to change the command to:
DBCC SHRINKFILE (1, 1) WITH NO_INFOMSGS
This command does not take the name of the database, but a file_name or file_id; as Dirceu originally left it, it only works when the logical file name equals the database name!
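To check the logical file names and IDs of the current database, you can query sys.database_files:

```sql
-- List the file_id and logical name of each file in the current database
SELECT file_id, name, physical_name
FROM sys.database_files;
```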
You're right, André. I've corrected the post.
Thanks!
Dirceu, thanks for the post, the explanation is great!
Could I ask you a simple question?
Suppose a column is defined as varchar(10), where all or part of its characters must be padded with blanks until it reaches the maximum size.
Example:
Column -> Name varchar(10)
Data present in the column -> 'Kin' = (3 characters + 7 blank fields)
Would compression affect this in any way, or is it transparent?
I ask this because the final application uses these spaces to compose its import layout: it will always fetch all 10 characters, even if some of them are blank.
Hey Kin!
Excellent question... it's worth a whole post just to explain it 🙂
Hi Dirceu,
Congratulations, I ran a test here on a test database and the compression reached 120%; I was even startled by the result.
I would just like to ask a question: I used the parameter @Fl_Rodar_Shrink = 1, and my database, which was 74 GB with 56 GB of free space to reclaim, did not have its space freed by the procedure alone.
I had to go to Tasks -> Shrink -> Files, and only after that was the space released.
My question is: with that parameter set, shouldn't the space have been released automatically?
Congratulations again for the post.
Paulo, the shrink is what actually frees up disk space, but it has a negative effect on performance (it fragments indexes and data). Therefore, it is recommended to always rebuild all indexes after a shrink (being careful not to slow down production during the rebuild).
Compression frees up space inside the SQL Server data file, meaning you can store more data in the space already allocated; to return that space to the operating system, only a shrink will do.
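A minimal sketch of that sequence, with a hypothetical file_id and table name:

```sql
-- Release unused space in the data file back to the OS (file_id = 1)
DBCC SHRINKFILE (1, 1) WITH NO_INFOMSGS;

-- Rebuild the indexes afterwards, since shrink fragments them heavily
ALTER INDEX ALL ON dbo.Clientes REBUILD;
```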