How to identify rows added by GROUP BY ROLLUP?

Using rollup in SQL queries allows you to include extra rows that represent the subtotals.
Example:
select category, sum(sales) as `Annual Sales` from order group by category
+------------------+--------------+
| category | Annual Sales |
+------------------+--------------+
| Appliances | 4600.0 |
| Garden & Outdoor | 12000.0 |
| Home & Kitchen | 15000.0 |
+------------------+--------------+
Adding rollup to the query:
select category, sum(sales) as `Annual Sales` from order group by rollup(category)
+------------------+--------------+
| category | Annual Sales |
+------------------+--------------+
| Appliances | 4600.0 |
| Garden & Outdoor | 12000.0 |
| Home & Kitchen | 15000.0 |
| NULL | 31600.0 |
+------------------+--------------+
The NULL value in the category column specifies the grand total row.
It’s easy in that case to identify the extra row corresponding to the grand total, but what if the order table contains rows with no category (null value):
select category, sum(sales) as `Annual Sales` from order group by category
+------------------+--------------+
| category | Annual Sales |
+------------------+--------------+
| Appliances | 4600.0 |
| Garden & Outdoor | 12000.0 |
| Home & Kitchen | 15000.0 |
| NULL | 4600.0 |
+------------------+--------------+
Adding rollup to the query:
select category, sum(sales) as `Annual Sales` from order group by rollup(category)
+------------------+--------------+
| category | Annual Sales |
+------------------+--------------+
| Appliances | 4600.0 |
| Garden & Outdoor | 12000.0 |
| Home & Kitchen | 15000.0 |
| NULL | 4600.0 |
| NULL | 36200.0 |
+------------------+--------------+
We end up with two rows containing NULL values. In that case, we can’t rely on the value of the cell anymore…
Your database supports GROUPING function
GROUPING is used to distinguish the null values that are returned by ROLLUP, CUBE or GROUPING SETS from standard null values. It returns 1 for aggregates computed by ROLLUP, 0 otherwise.
select category, sum(sales) as `Annual Sales`, grouping(category) as grouping from order group by rollup(category)
+------------------+--------------+--------------+
| category | Annual Sales | grouping |
+------------------+--------------+--------------+
| Appliances | 4600.0 | 0 |
| Garden & Outdoor | 12000.0 | 0 |
| Home & Kitchen | 15000.0 | 0 |
| NULL | 4600.0 | 0 |
| NULL | 36200.0 | 1 |
+------------------+--------------+--------------+
Your database DOES NOT supports GROUPING function
Databases such as BigQuery does not support the GROUPING function (https://issuetracker.google.com/issues/205238172).
The trick here is to change the arguments of the select and group by rollup to replace “real” null values by something else. To do that, we can use COALESCE function. By doing so, null values that do not correspond to rollup aggregates will be replaced by the second argument.
select COALESCE(category, "__null__"), sum(sales) as `Annual Sales` from order group by rollup(COALESCE(category, "__null__"))
+------------------+--------------+
| category | Annual Sales |
+------------------+--------------+
| Appliances | 4600.0 |
| Garden & Outdoor | 12000.0 |
| Home & Kitchen | 15000.0 |
| __null__ | 4600.0 |
| NULL | 36200.0 |
+------------------+--------------+
The NULL value in the category column specifies the grand total row. Notice the type of the second argument of the COALESCE function should be compatible with the type of the first argument i.e if the type of the column is a BIGINT, the second should also be a BIGINT… So you might need to update the above statement if your case is different.
Replace NULL values in rollup with Total
To get a friendlier table result, we can replace the NULL values corresponding to total(s) with ‘Total’.
Without grouping function by using one more time the COALESCE function.
select COALESCE(COALESCE(category, "__null__"), 'Total') as category, sum(sales) as `Annual Sales` from order group by rollup(COALESCE(category, "__null__"))
With grouping function, the solution is cleaner. We can use a CASE WHEN statement
select case when grouping(category) = 1 then 'Total' else category end as category, sum(sales) as `Annual Sales` from order group by rollup(category)
+------------------+--------------+
| category | Annual Sales |
+------------------+--------------+
| Appliances | 4600.0 |
| Garden & Outdoor | 12000.0 |
| Home & Kitchen | 15000.0 |
| NULL | 4600.0 |
| Total | 31600.0 |
+------------------+--------------+
The SQL statement is quite different depending on the underlying database being targeted and can even be more complex when adding multiple columns in the SELECT statement to get multiple totals and subtotals. If we introduce subcategories in our data, we can compute Annual Sales by category and subcategory to get more details:
+------------------+------------------+--------------+
| category | subcategory | Annual Sales |
+------------------+------------------+--------------+
| Grand Total | Grand Total | 36200.0 |
| Appliances | Total | 4600.0 |
| Appliances | Large Appliances | 3000.0 |
| Appliances | Small Appliances | 1600.0 |
| Garden & Outdoor | Total | 12000.0 |
| Garden & Outdoor | Garden | 7000.0 |
| Garden & Outdoor | Outdoor | 5000.0 |
| Home & Kitchen | Total | 15000.0 |
| Home & Kitchen | Home | 5000.0 |
| Home & Kitchen | Kitchen | 10000.0 |
| null | Total | 4600.0 |
| null | null | 4600.0 |
+------------------+------------------+--------------+
I have to confess I didn’t write the corresponding SQL query. The result above has been obtained with SquashQL, the open-source SQL query engine specialized in what-if analysis and multi-dimensional queries, using Apache Spark as database. The query executed by SquashQL is written like this:
const query = from("order")
.select(["category", "subcategory"], [], [sum("Annual Sales", "sales")])
.rollup(["category", "subcategory"])
.build()
How it works?
The query described above is sent to SquashQL. SquashQL generates the right SQL query depending on the underlying database. If BigQuery is used for instance, it can’t use the grouping function and therefore uses the workaround mentioned earlier. The SQL is executed by the database and the result is sent back to SquashQL. Once received, SquashQL identifies the rows corresponding to Totals/Sub-Totals and replaces the corresponding cell values with “Grand Total” and “Total” before returning the result to the client.
Note that the same query is compatible with BigQuery, Snowflake or ClickHouse as well.
Do you want to try it? Check out https://github.com/squashql/squashql and follow the tutorial.
Claps, shares and stars ⭐ are very much appreciated!