Hello people,
Good Morning!
In this post I will talk about something that is not new to SQL Server, is present since SQL Server 2008, but I do not see many people using in their queries, which is the grouping (summarization) using ROLLUP, CUBE and GROUPING SETS .
This type of feature is especially useful for generating totals and subtotals without having to create multiple subquerys as it allows you to accomplish this task in a single command, as I will demonstrate below.
To create an AutoSum feature in your dataset, as Excel implements, see more in post SQL Server - How to create an AutoSum (same as Excel) using Window functions
Test Base
For the examples in this post, I will provide the base below so we can create a testing and demonstration environment.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 | IF (OBJECT_ID('tempdb..#Produtos') IS NOT NULL) DROP TABLE #Produtos CREATE TABLE #Produtos ( Codigo INT IDENTITY(1, 1) NOT NULL PRIMARY KEY, Ds_Produto VARCHAR(50) NOT NULL, Ds_Categoria VARCHAR(50) NOT NULL, Preco NUMERIC(18, 2) NOT NULL ) IF (OBJECT_ID('tempdb..#Vendas') IS NOT NULL) DROP TABLE #Vendas CREATE TABLE #Vendas ( Codigo INT IDENTITY(1, 1) NOT NULL PRIMARY KEY, Dt_Venda DATETIME NOT NULL, Cd_Produto INT NOT NULL ) INSERT INTO #Produtos ( Ds_Produto, Ds_Categoria, Preco ) VALUES ( 'Processador i7', 'Informática', 1500.00 ), ( 'Processador i5', 'Informática', 1000.00 ), ( 'Processador i3', 'Informática', 500.00 ), ( 'Placa de Vídeo Nvidia', 'Informática', 2000.00 ), ( 'Placa de Vídeo Radeon', 'Informática', 1500.00 ), ( 'Celular Apple', 'Celulares', 10000.00 ), ( 'Celular Samsung', 'Celulares', 2500.00 ), ( 'Celular Sony', 'Celulares', 4200.00 ), ( 'Celular LG', 'Celulares', 1000.00 ), ( 'Cama', 'Utilidades do Lar', 2000.00 ), ( 'Toalha', 'Utilidades do Lar', 40.00 ), ( 'Lençol', 'Utilidades do Lar', 60.00 ), ( 'Cadeira', 'Utilidades do Lar', 200.00 ), ( 'Mesa', 'Utilidades do Lar', 1000.00 ), ( 'Talheres', 'Utilidades do Lar', 50.00 ) DECLARE @Contador INT = 1, @Total INT = 100 WHILE(@Contador <= @Total) BEGIN INSERT INTO #Vendas ( Cd_Produto, Dt_Venda ) SELECT (SELECT TOP 1 Codigo FROM #Produtos ORDER BY NEWID()) AS Cd_Produto, DATEADD(DAY, (CAST(RAND() * 364 AS INT)), '2017-01-01') AS Dt_Venda SET @Contador += 1 END |
In a simple data grouping, just bringing the quantities filtering by category and product, we can return this view using the query below:
1 2 3 4 5 6 7 8 9 10 11 12 13 | SELECT B.Ds_Categoria, B.Ds_Produto, COUNT(*) AS Qt_Vendas, SUM(B.Preco) AS Vl_Total FROM #Vendas A JOIN #Produtos B ON A.Cd_Produto = B.Codigo GROUP BY B.Ds_Categoria, B.Ds_Produto ORDER BY 1, 2 |
Summarizing the values
But what if you need to group the data by creating a totalizer for each category and at the end and a general totalizer? How would you do it?
Using UNION ALL
A traditional way of meeting this need is to use subquery with UNION ALL, so that you return 3 resultsets containing the analytical data, the totalizers by category, and then the overall sum. The larger the number of filters, the more complex your subquery will be, as you will need more queries at each level.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 | SELECT * FROM ( SELECT B.Ds_Categoria, B.Ds_Produto, COUNT(*) AS Qt_Vendas, SUM(B.Preco) AS Vl_Total FROM #Vendas A JOIN #Produtos B ON A.Cd_Produto = B.Codigo GROUP BY B.Ds_Categoria, B.Ds_Produto UNION ALL SELECT B.Ds_Categoria, 'Subtotal' AS Ds_Produto, COUNT(*) AS Qt_Vendas, SUM(B.Preco) AS Vl_Total FROM #Vendas A JOIN #Produtos B ON A.Cd_Produto = B.Codigo GROUP BY B.Ds_Categoria UNION ALL SELECT 'Total' AS Ds_Categoria, 'Total' AS Ds_Produto, COUNT(*) AS Qt_Vendas, SUM(B.Preco) AS Vl_Total FROM #Vendas A JOIN #Produtos B ON A.Cd_Produto = B.Codigo ) A ORDER BY (CASE WHEN A.Ds_Categoria = 'Total' THEN 1 ELSE 0 END), A.Ds_Categoria, (CASE WHEN A.Ds_Produto = 'Subtotal' THEN 1 ELSE 0 END), A.Ds_Produto |
Using GROUP BY ROLLUP
A very simple and practical way to solve this problem is by using the ROLLUP () function in GROUP BY, which already creates groupings and summaries according to the grouped columns in the function.
Using this function, you will see that it creates the totalizers just below each grouping and the general totalizer on the last line of the resultset.
Example 1:
1 2 3 4 5 6 7 8 9 10 | SELECT B.Ds_Categoria, B.Ds_Produto, COUNT(*) AS Qt_Vendas, SUM(B.Preco) AS Vl_Total FROM #Vendas A JOIN #Produtos B ON A.Cd_Produto = B.Codigo GROUP BY ROLLUP(B.Ds_Categoria, B.Ds_Produto) |
As you may have noticed, rows containing summaries have the value NULL in the grouped columns. In this example, the subtotal by category has the Ds_Product column with a value of NULL, and in the grand total, both columns have a value of NULL. For our query to produce the same result as another query with UNION ALL, we just need to treat these NULL values, as shown below:
1 2 3 4 5 6 7 8 9 10 | SELECT ISNULL(B.Ds_Categoria, 'Total') AS Ds_Categoria, ISNULL(B.Ds_Produto, 'Subtotal') AS Ds_Produto, COUNT(*) AS Qt_Vendas, SUM(B.Preco) AS Vl_Total FROM #Vendas A JOIN #Produtos B ON A.Cd_Produto = B.Codigo GROUP BY ROLLUP(B.Ds_Categoria, B.Ds_Produto) |
Example 2:
1 2 3 4 5 6 7 8 9 10 | SELECT ISNULL(CONVERT(VARCHAR(10), MONTH(A.Dt_Venda)), 'Total') AS Mes_Venda, ISNULL(B.Ds_Categoria, 'Subtotal') AS Ds_Categoria, COUNT(*) AS Qt_Vendas, SUM(B.Preco) AS Vl_Total FROM #Vendas A JOIN #Produtos B ON A.Cd_Produto = B.Codigo GROUP BY ROLLUP(MONTH(A.Dt_Venda), B.Ds_Categoria) |
Using GROUP BY CUBE
Like the ROLLUP () function, the CUBE () function allows you to create grouped totalizers in a very practical and easy way and its syntax is the same as that of the ROLLUP function.
Using this function, you will see that its difference from ROLLUP is that the CUBE function creates the totalizers for each grouping type.
Example:
1 2 3 4 5 6 7 8 9 10 | SELECT ISNULL(CONVERT(VARCHAR(10), MONTH(A.Dt_Venda)), 'Total') AS Mes_Venda, ISNULL(B.Ds_Categoria, 'Subtotal') AS Ds_Categoria, COUNT(*) AS Qt_Vendas, SUM(B.Preco) AS Vl_Total FROM #Vendas A JOIN #Produtos B ON A.Cd_Produto = B.Codigo GROUP BY CUBE(MONTH(A.Dt_Venda), B.Ds_Categoria) |
Note that by using the CUBE () function, I can get the general totalizer, the Mes_Sale column totalizer, and the Category column totalizer. That is, in the CUBE function, it generates all possibilities of combining between the columns used in the function.
Using GROUP BY GROUPING SETS
Using the GROUPING SETS function in GROUP BY enables us to generate totalizers of our data using the columns inserted in the function, so that it generates different totalizers in a single query. Unlike the ROLLUP and CUBE functions, the GROUPING SETS function does not return a grand total.
Example 1:
1 2 3 4 5 6 7 8 9 10 | SELECT ISNULL(B.Ds_Produto, 'Total') AS Ds_Produto, ISNULL(B.Ds_Categoria, 'Subtotal') AS Ds_Categoria, COUNT(*) AS Qt_Vendas, SUM(B.Preco) AS Vl_Total FROM #Vendas A JOIN #Produtos B ON A.Cd_Produto = B.Codigo GROUP BY GROUPING SETS(B.Ds_Categoria, B.Ds_Produto) |
In this example, the GROUPING SETS function returns us the totalizer by product and then the totalizer by category, where you would need to execute 2 queries to show this same view.
Example 2:
1 2 3 4 5 6 7 8 9 10 11 | SELECT MONTH(A.Dt_Venda) AS Mes_Venda, B.Ds_Categoria, B.Ds_Produto, COUNT(*) AS Qt_Vendas, SUM(B.Preco) AS Vl_Total FROM #Vendas A JOIN #Produtos B ON A.Cd_Produto = B.Codigo GROUP BY GROUPING SETS(MONTH(A.Dt_Venda), B.Ds_Categoria, B.Ds_Produto) |
In this example, the GROUPING SETS function returned the sum of the values grouped by the 3 columns I used in the function: Sum by Month_Sale, Sum by Product, and Sum by Category. To reproduce this same result, I would need to create a query with 3 queries. If they were 10 columns, the query would need 10 different queries to bring about this result, which we achieved with a single query using the GROUPING SETS function.
Note: All 3 functions presented in this post accept N columns as parameters, not being limited to only 2 columns.
Performance and Performance
Now that we've seen how 3 functions work and how practical they are, saving many lines of code, let's look at whether they are performative or not.
I will try to use the same query for all cases, although in some cases the output (output) is different because it really has different goals, but only to demonstrate the cost and execution plan of each one.
UNION ALL (Example Query):
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(19 row (s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Produtos'. Scan count 2, logical reads 204, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Sales'. Scan count 3, logical reads 64, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row (s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 79 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
GROUP BY ROLLUP (Query from 1 Example)
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
(19 row (s) affected)
Table '#Sales'. Scan count 1, logical reads 31, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Produtos'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row (s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 57 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
GROUP BY CUBE (Query from 1 Example)
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 6 ms.
(49 row (s) affected)
Table 'Worktable'. Scan count 2, logical reads 71, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Produtos'. Scan count 0, logical reads 200, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Sales'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row (s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 210 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
GROUP BY GROUPING SETS (example query 1)
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 5 ms.
(18 row (s) affected)
Table 'Worktable'. Scan count 2, logical reads 35, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Sales'. Scan count 1, logical reads 31, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Produtos'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row (s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 58 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Running 4 Queries Together:
SQL Sentry Plan Explorer Execution Plan
SQL Server Management Studio Execution Plan
That is, with these tests and using this mass of data, we can say that queries performed with these functions are more performative than using subquery with multiple queries to return the grouped data. This is not to say that this statement will always be true, it will depend greatly on your environment and your mass of data.
I hope you enjoyed this post.
A hug!
Excellent explanation. With the examples, the images and the step by step, it makes understanding a lot easier! Thank you teacher.
Great content.
Very good the post. I didn't know the command. I will use it.
Great post Dirceu! I've always used UNION ALL to do these summaries because I don't know about these commands. I will put it into practice from now on.
Hug.
Luiz Vitor
Thanks for the return, Luiz Vitor. I've also used a lot Union all to do this .. Lol