SQL Server - Grouping Data Using ROLLUP, CUBE, and GROUPING SETS

Views: 6.233
Reading Time: 9 minutes

Hello people,
Good Morning!

In this post I will talk about something that is not new to SQL Server, is present since SQL Server 2008, but I do not see many people using in their queries, which is the grouping (summarization) using ROLLUP, CUBE and GROUPING SETS .

This type of feature is especially useful for generating totals and subtotals without having to create multiple subquerys as it allows you to accomplish this task in a single command, as I will demonstrate below.

To create an AutoSum feature in your dataset, as Excel implements, see more in post SQL Server - How to create an AutoSum (same as Excel) using Window functions

Test Base

For the examples in this post, I will provide the base below so we can create a testing and demonstration environment.

In a simple data grouping, just bringing the quantities filtering by category and product, we can return this view using the query below:

Summarizing the values

But what if you need to group the data by creating a totalizer for each category and at the end and a general totalizer? How would you do it?

Using UNION ALL

A traditional way of meeting this need is to use subquery with UNION ALL, so that you return 3 resultsets containing the analytical data, the totalizers by category, and then the overall sum. The larger the number of filters, the more complex your subquery will be, as you will need more queries at each level.

Using GROUP BY ROLLUP

A very simple and practical way to solve this problem is by using the ROLLUP () function in GROUP BY, which already creates groupings and summaries according to the grouped columns in the function.

Using this function, you will see that it creates the totalizers just below each grouping and the general totalizer on the last line of the resultset.

Example 1:

As you may have noticed, rows containing summaries have the value NULL in the grouped columns. In this example, the subtotal by category has the Ds_Product column with a value of NULL, and in the grand total, both columns have a value of NULL. For our query to produce the same result as another query with UNION ALL, we just need to treat these NULL values, as shown below:

Example 2:

Using GROUP BY CUBE

Like the ROLLUP () function, the CUBE () function allows you to create grouped totalizers in a very practical and easy way and its syntax is the same as that of the ROLLUP function.

Using this function, you will see that its difference from ROLLUP is that the CUBE function creates the totalizers for each grouping type.

Example:

Note that by using the CUBE () function, I can get the general totalizer, the Mes_Sale column totalizer, and the Category column totalizer. That is, in the CUBE function, it generates all possibilities of combining between the columns used in the function.

Using GROUP BY GROUPING SETS

Using the GROUPING SETS function in GROUP BY enables us to generate totalizers of our data using the columns inserted in the function, so that it generates different totalizers in a single query. Unlike the ROLLUP and CUBE functions, the GROUPING SETS function does not return a grand total.

Example 1:

In this example, the GROUPING SETS function returns us the totalizer by product and then the totalizer by category, where you would need to execute 2 queries to show this same view.

Example 2:

In this example, the GROUPING SETS function returned the sum of the values ​​grouped by the 3 columns I used in the function: Sum by Month_Sale, Sum by Product, and Sum by Category. To reproduce this same result, I would need to create a query with 3 queries. If they were 10 columns, the query would need 10 different queries to bring about this result, which we achieved with a single query using the GROUPING SETS function.

Note: All 3 functions presented in this post accept N columns as parameters, not being limited to only 2 columns.

Performance and Performance

Now that we've seen how 3 functions work and how practical they are, saving many lines of code, let's look at whether they are performative or not.

I will try to use the same query for all cases, although in some cases the output (output) is different because it really has different goals, but only to demonstrate the cost and execution plan of each one.

UNION ALL (Example Query):
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

(19 row (s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Produtos'. Scan count 2, logical reads 204, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Sales'. Scan count 3, logical reads 64, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row (s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 79 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

GROUP BY ROLLUP (Query from 1 Example)
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

(19 row (s) affected)
Table '#Sales'. Scan count 1, logical reads 31, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Produtos'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row (s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 57 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

GROUP BY CUBE (Query from 1 Example)
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 6 ms.

(49 row (s) affected)
Table 'Worktable'. Scan count 2, logical reads 71, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Produtos'. Scan count 0, logical reads 200, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Sales'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row (s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 210 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

GROUP BY GROUPING SETS (example query 1)
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 5 ms.

(18 row (s) affected)
Table 'Worktable'. Scan count 2, logical reads 35, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Sales'. Scan count 1, logical reads 31, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Produtos'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row (s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 58 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

Running 4 Queries Together:
SQL Sentry Plan Explorer Execution Plan

SQL Server Management Studio Execution Plan

That is, with these tests and using this mass of data, we can say that queries performed with these functions are more performative than using subquery with multiple queries to return the grouped data. This is not to say that this statement will always be true, it will depend greatly on your environment and your mass of data.

I hope you enjoyed this post.
A hug!