SQL for MySQL Developers: A Comprehensive Tutorial and Reference

This excerpt from “SQL for MySQL Developers: A Comprehensive Tutorial and Reference” discusses adding aggregation functions to a select block with the use of a GROUP By clause.


SQL for MySQL Developers: A Comprehensive Tutorial and Reference
By Rick F. van der Lans
Published by Addison Wesley Professional
ISBN-10: 0-13-149735-9
Published: Apr 20, 2007
Dimensions 7×9-1/4
Pages: 1032
Buy this book


Chapter 10: SELECT Statement: The GROUP BY Clause


10.1 Introduction


The GROUP BY clause groups rows on the basis of similarities among them. For example, we could group all the rows in the PLAYERS table on the basis of the place of residence. The result would be one group of players per town. From there, we could query the number of players in each group. The final result answers the question, how many players live in each town? Other examples are: How many matches has each team played, and how much has each player incurred in penalties? In short, the GROUP BY clause is frequently used to formulate questions based on the word per.


By adding aggregation functions, such as COUNT and SUM, to a select block with the use of a GROUP BY clause, data can be aggregated. These functions owe their name to this. Aggregation means that we ask for summations, averages, frequencies, and subtotals instead of individual values.


Definition

<group by clause> ::=
GROUP BY <group by specification list> [ WITH ROLLUP ]
<group by specification list> ::=
<group by specification> [ , <group by specification> ]…
<group by specification> ::=
<group by expression> [ <sort direction> ]
<group by expression> ::= <scalar expression>
<sort direction> ::= ASC | DESC

10.2 Grouping on One Column


The simplest form of the GROUP BY clause is the one in which only one column is grouped. Previous chapters gave several examples of statements with such a GROUP BY clause. For the sake of clarity, we show several other examples in this section.


Example 10.1: Get all the different town names from the PLAYERS table.

SELECT   TOWN
FROM PLAYERS
GROUP BY TOWN

The intermediate result from the GROUP BY clause could look similar to this:

TOWN       PLAYERNO                    NAME
——— ————————– ———————-
Stratford {6, 83, 2, 7, 57, 39, 100} {Parmenter, Hope, …}
Midhurst {28} {Collins}
Inglewood {44, 8} {Baker, Newcastle}
Plymouth {112} {Bailey}
Douglas {95} {Miller}
Eltham {27, 104} {Collins, Moorman}

Explanation: All rows with the same TOWN form one group. Each row in the intermediate result has one value in the TOWN column, whereas all other columns can contain multiple values. To indicate that these columns are special, the values are placed between brackets. We show those columns in this way for illustrative purposes only; MySQL probably would solve this internally in a different way. Furthermore, these two columns cannot be presented like this because a column that is not grouped is completely omitted from the end result. We return to this topic later in the chapter.


The end result of the statement is:

TOWN
———
Stratford
Midhurst
Inglewood
Plymouth
Douglas
Eltham

A frequently used term in this particular context is grouping. The GROUP BY clause in the previous statement has one grouping, which consists of only one column—the TOWN column. In this chapter, we sometimes represent this as follows: The result is grouped by [TOWN]. Later in this chapter, we give examples of groupings with multiple columns and GROUP BY clauses consisting of multiple groupings.


The earlier question could be solved more easily by leaving out the GROUP BY clause and adding DISTINCT to the SELECT clause instead (work this out by yourself). Using the GROUP BY clause becomes interesting when we extend the SELECT clause with aggregation functions.


Example 10.2: For each town, find the number of players.

SELECT  TOWN, COUNT(*)
FROM PLAYERS
GROUP BY TOWN

The result is:

TOWN    COUNT(*)
——— ——–
Stratford 7
Midhurst 1
Inglewood 2
Plymouth 1
Douglas 1
Eltham 2

Explanation: In this statement, the result is grouped by [TOWN]. The COUNT(*) function is now executed against each grouped row (for each town) instead of against all rows.


In this result, the data is clearly aggregated. The individual data of players cannot be displayed anymore, and the data is aggregated by TOWN. The aggregation level of this result is TOWN.


Example 10.3: For each team, get the team number, the number of matches that has been played for that team, and the total number of sets won.

SELECT   TEAMNO, COUNT(*), SUM(WON)
FROM MATCHES
GROUP BY TEAMNO

The result is:

TEAMNO  COUNT(*)  SUM(WON)
—— ——– ——–
1 8 15
2 5 9

Explanation: This statement contains one grouping, consisting of the TEAMNO column.


Example 10.4: For each team that is captained by a player resident in Eltham, get the team number and number of matches that has been played for that team.

SELECT   TEAMNO, COUNT(*)
FROM MATCHES
WHERE TEAMNO IN
(SELECT TEAMNO
FROM TEAMS INNER JOIN PLAYERS
ON TEAMS.PLAYERNO = PLAYERS.PLAYERNO
WHERE TOWN = ‘Eltham’)
GROUP BY TEAMNO

The result is:

TEAMNO  COUNT(*)
—— ——–
2 5

The column on which the result has been grouped might also appear in the SELECT clause as a parameter within an aggregation function. This does not happen often, but it is allowed.


Example 10.5: Get each different penalty amount, followed by the number of times that the amount occurs in the PENALTIES table, and also show the result of that amount multiplied by the number.

SELECT   AMOUNT, COUNT(*), SUM(AMOUNT)
FROM PENALTIES
GROUP BY AMOUNT

The PENALTIES table is grouped on the AMOUNT column first. The intermediate result could be presented as follows:

PAYMENTNO  PLAYERNO  PAYMENT_DATE              AMOUNT
——— ——– ———————— ——
{5, 6} {44, 8} {1980-12-08, 1980-12-08} 25.00
{7} {44} {1982-12-30} 30.00
{4} {104} {1984-12-08} 50.00
{2, 8} {44, 27} {1981-05-05, 1984-11-12} 75.00
{1, 3} {6, 27} {1980-12-08, 1983-09-10} 100.00

Again, the values of the columns that are not grouped are placed between brackets, and the AMOUNT column shows only one value. However, that is not entirely correct. Behind the scenes, MySQL also creates a group for this column. So the intermediate result should, in fact, be presented as follows:

PAYMENTNO  PLAYERNO  PAYMENT_DATE              AMOUNT
——— ——– ———————— —————-
{5, 6} {44, 8} {1980-12-08, 1980-12-08} {25.00, 25.00}
{7} {44} {1982-12-30} {30.00}
{4} {104} {1984-12-08} {50.00}
{2, 8} {44, 27} {1981-05-05, 1984-11-12} {75.00, 75.00}
{1, 3} {6, 27} {1980-12-08, 1983-09-10} {100.00, 100.00}

The values in the AMOUNT column are also represented as a group now. Of course, only equal values appear in each group. And because it is a group, aggregation functions can be used.


The result is:

AMOUNT  COUNT(*)  SUM(AMOUNT)
—— ——– ———–
25.00 2 50.00
30.00 1 30.00
50.00 1 50.00
75.00 2 150.00
100.00 2 200.00

However, this book does not present the values of the grouped columns between brackets.


Exercise 10.1: Show the different years in which players joined the club; use the PLAYERS table.


Exercise 10.2: For each year, show the number of players who joined the club.


Exercise 10.3: For each player who has incurred at least one penalty, give the player number, average penalty amount, and number of penalties.


Exercise 10.4: For each team that has played in the first division, give the team number, number of matches, and total number of sets won.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles