Just SQL PartVIII
January 19, 2006
Oracle Grouping with the ROLLUP Operation
Grouping and summing at multiple levels often takes an application to perform. Take a look at the ROLLUP operation within Oracle and perform these operations in a single SQL statement.
Grouping and in particular gathering aggregates across groups often brings confusion to many practitioners. This does not need to be the case if approached from a systematic fashion. This article will approach gathering aggregates from a simple GROUP BY operation and then extend into Oracle's higher level grouping operations. In particular, the ROLLUP operation, which allows us to group and aggregate at different levels in a collection of similar rows. By similar I mean that two or more rows must have a common column or set of columns so that we can group them together for performing some form of aggregate function. In this article, I will use the SUM function for aggregating.
Once again, in this article, we revisit the DOG_ORIGIN table. This time however, I have added a new column entitled 'POPULATION' that holds a fictitious value for the number of a particular breed in their country of origin. These are just made up numbers so please do not hold me to them. To introduce this table again, Table 1. has the values as I have entered them.
Early practitioners of SQL often are only introduced to answering questions like 'How many dogs are in America?' or 'How many dogs are there in the world?'. These two answers can quickly be answered by the following SQL. These two SQL statements simply investigate each row in the DOG_ORIGIN table and add together the population. In the case where country = 'USA' only the dogs in the USA are added together. For the second SQL statement, every row is added together, to give us the grand total of all dogs in the world.
SQL > SELECT SUM(population) FROM dog_origin WHERE country = 'USA'; SUM(POPULATION) --------------- 16000 SQL > SELECT SUM(population) FROM dog_origin; SUM(POPULATION) --------------- 64000
Over time we soon learn that in order to gather aggregate values for each of the groupings we are interested in, in this case the country column, we can use the GROUP BY operation. This allows us to process and aggregate all grouping of a column without having to issue a separate SQL statement for each grouping we are interested in. Most of us have already used this in the past but it is introduced here because the ROLLUP operation is an extension of the GROUP BY statement. In TABLE 2., the simple question and answer of 'How many dogs are there within each country?' has been given. This can quickly be accomplished by using the simple GROUP BY operation on the country column and then adding the SUM function around the population column. This SQL statement groups all the rows within each country and 'rolls' that value up into a single result row that has the accumulated population for that country.
Simple GROUP BY operation SQL> select country,sum(population) from dog_origin group by country; COUNTRY SUM(POPULATION) --------------- --------------- Australia 13000 <- group by country Germany 29000 <- group by country Switzerland 6000 <- group by country USA 16000 <- group by country
Since the ROLLUP operation is just an extension of the GROUP BY operation we should be able to compose a statement that uses the ROLLUP operation and get an answer that resembles our previous simple GROUP BY operation. To understand how we might be able to do this we must first look at the ROLLUP option and its' syntax.
The ROLLUP operation works on a set of columns you want to group. Just like the GROUP BY operation but aggregates a summary row for each group of columns supplied in its clause. From the most detailed to a grand total and has the following basic syntax format of:
GROUP BY ROLLUP ([columns of interest separated by commas])
The best way to explain how this will work is to supply a simple example. If you had 4 columns of interest in you ROLLUP operation say (c1, c2, c3, c4) Oracle will actually produce 5 different groupings (one more than the number of columns you provide). You can almost think of this as Oracle performing 5 different GROUP BY operations on your behalf.
Those grouping would be:
(c1, c2, c3, c4) (c1, c2, c3) (c1, c2) (c1) (t1) where t1 is a GROUP BY across all rows
So if we translate the previous simple GROUP BY operation into a ROLLUP operation the statement becomes one like that in TABLE 3. As you can see we have basically the same result set but the totals were generated by the ROLLUP operation instead of the GROUP BY. This is noted by the 'rollup by country' notation I have added to the result set. The ROLLUP operation also provides a grand total line, which gives us the total dog population across all countries. If we relate this to the column example above we have told Oracle to perform a GROUP BY on (c1)country and (t1)grand total.
SQL> select country,sum(population) from dog_origin group by rollup (country); COUNTRY SUM(POPULATION) ----------- --------------- Australia 13000 <- rollup by country Germany 29000 <- rollup by country Switzerland 6000 <- rollup by country USA 16000 <- rollup by country 64000 <- grand total given
To go to the next level and show the real power of the ROLLUP operation we need to introduce at least one more column in the grouping. To do this we will add in the BREED_SIZE. The question now becomes 'What is the population of dogs given the different breed sizes?'. This is now a very simple SQL statement where all we need to do is add BREED_SIZE into the ROLLUP operation. Table 4. shows the SQL statement and the result set. Oracle now performs GROUP BY operations on (c1,c2)country & breed size, (c1)country, and (t1)grand total. I have added a rollup comment on the end of each row for clarification. So now, we can see the population by dog size in each country as well as the total number of dogs in each country and in the world.
SQL> select country,breed_size,sum(population) from dog_origin group by rollup (country,breed_size); COUNTRY BREED_SIZE SUM(POPULATION) --------------- ---------- --------------- Australia Medium 6000 <- rollup by country & breed_size Australia Small 7000 <- rollup by country & breed_size Australia 13000 <- rollup by country Germany Big 29000 <- rollup by country & breed_size Germany 29000 <- rollup by country Switzerland Big 4000 <- rollup by country & breed_size Switzerland Medium 2000 <- rollup by country & breed_size Switzerland 6000 <- rollup by country USA Big 8000 <- rollup by country & breed_size USA Medium 8000 <- rollup by country & breed_size USA 16000 <- rollup by country 64000 <- grand total line
Another hybrid of the GROUP BY..ROLLUP operation is the ability to do partial rollups. The GROUP BY...ROLLUP clause actually has the following syntax. You can move columns between the GROUP BY list and the ROLLUP list to produce different, or partial, aggregated values.
GROUP BY (column[s]) ROLLUP (column[s])
So if we extend our previous column example where we used (c1,c2,c3,c4) we actually have at our disposal the following options when developing our SQL statement. Notice that as we move columns from the ROLLUP list, in the end, all we have left is a simple GROUP BY operation.
Those grouping would be:
group by rollup (c1,c2,c3,c4) - full rollup option group by c1,rollup (c2,c3,c4) - removes grand total line (t1) group by c1,c2,rollup (c3,c4) - removes (t1) and group by (c1) group by c1,c2,c3,rollup (c4) - removes (t1) and group by (c1) & (c1,c2) group by (c1,c2,c3,c4) - removes (t1) and group by (c1),(c1,c2),&(c1,c2,c3)
For instance if we wanted to remove the grand total line from our dog origin example and only wanted to show detail for the dog population by country and breed size we would have the following SQL in Table 5.
SQL > select country,breed_size,sum(population) from dog_origin group by country, rollup (breed_size); COUNTRY BREED_SIZE SUM(POPULATION) --------------- ---------- --------------- Australia Medium 6000 Australia Small 7000 Australia 13000 Germany Big 29000 Germany 29000 Switzerland Big 4000 Switzerland Medium 2000 Switzerland 6000 USA Big 8000 USA Medium 8000 USA 16000
Moreover, if we wanted to get rid of the total line for the dog population in each of the countries we would have the SQL in Table 6. Here we now just have a simple GROUP BY operation and have in essence wrapped around to the beginning of this article.
SQL > select country,breed_size,sum(population) from dog_origin group by country,breed_size; COUNTRY BREED_SIZE SUM(POPULATION) --------------- ---------- --------------- Australia Medium 6000 Australia Small 7000 Germany Big 29000 Switzerland Big 4000 Switzerland Medium 2000 USA Big 8000 USA Medium 8000
The ROLLUP extension to the GROUP by operation is very powerful in producing various levels of aggregated detail. We can move columns between the GROUP BY list and the ROLLUP list to fine-tune those aggregated rows we wish to see.