A cross tabulation query, or cross tab for short, is used to display
the calculation of a sum, average, count, or other type of total on two or
more fields. The results are then presented in a table format in which one field
is displayed down the left side of a matrix (row headings) and the other across
the top (column headings).
Funnily enough, most relational databases don’t provide an easy way to
create crosstabs, the exception being good ole Microsoft Access. It features
its own SQL language constructs for creating crosstabs such as TRANSFORM
and PIVOT. Today, I’m going to show you one of several ways to create
crosstabs using regular SQL. Although I will be working in MySQL, the
techniques discussed here could be applied to just about any database.
A Basic Crosstab
One of my many responsibilities of my job is to construct canned
reports for our clients. For each report, there is a stored procedure that
fetches the required data from the database. Every so often I get a request
for a report that necessitates generating a crosstab query. I recognize the
need for a crosstab when I hear requirements such as “I want to display this
field by that field”. Most often, it’s a grouping of a field by dates,
although other combinations come up from time to time. Here is what one of
those crosstab reports might look like:
Cases by Date and |
|||||
Month |
01 |
02 |
03 |
04 |
05 |
April |
13 |
33 |
76 |
2 |
47 |
May |
17 |
55 |
209 |
1 |
143 |
June |
8 |
63 |
221 |
1 |
127 |
July |
13 |
104 |
240 |
6 |
123 |
August |
18 |
121 |
274 |
9 |
111 |
September |
25 |
160 |
239 |
2 |
88 |
October |
9 |
88 |
295 |
2 |
127 |
November |
2 |
86 |
292 |
2 |
120 |
December |
1 |
128 |
232 |
6 |
155 |
The above crosstab displays the count of cases for each Region
by Month. Note that only months that contain records are displayed.
It is also common to display totals for both columns and rows. The
following report sample shows Case totals for each Region broken
down by Travel Mode. The latter is further broken down by the Nationality
of passengers. Sub and grand totals are tabulated for each field:
Crosstab Query Solutions
Now it’s time to delve into the primary reason that you’re reading this
article: how to write SQL that will display data in tabular format. There are
several ways to get the job done; it’s really a matter of deciding what it is
exactly that you need.
Solution #1: Using the GROUP By Clause
The GROUP BY clause groups a selected set of records into a set of
summary rows by the values of one or more columns or expressions. One row is
returned for each group, so aggregate functions in the SELECT list will provide
information about each group rather than individual rows:
SELECT MONTHNAME(CREATION_DATE) AS Month,
COUNT(*) AS CASES
FROM TA_CASES
WHERE MONTH(CREATION_DATE) = 4 AND Year(CREATION_DATE) = 1998
GROUP BY MONTH(CREATION_DATE);
If you leave out the GROUP BY, the above query would throw an error on
execution such as the following:
“You tried to execute a query that does not include the specified expression as part of an aggregate function or grouping.”
That happens because you normally can’t mix aggregate functions with
scalar field values, that is unless you specify the scalar field in the GROUP
BY clause.
Here are the results for the Month of April, as defined by the query above:
Cases for April |
|
Month |
CASES |
April |
171 |
That gives us the basic information that we want for one month. The
next steps are to break down the cases by region and include the other months.
Let’s deal with the regions first.
We can get the counts for each region by adding the REGION_CODE
to the field list:
SELECT MONTHNAME(CREATION_DATE) AS Month,
COUNT(*) AS CASES,
REGION_CODE
FROM TA_CASES
WHERE MONTH(CREATION_DATE) = 4 AND Year(CREATION_DATE) = 1998
GROUP BY MONTH(CREATION_DATE),
REGION_CODE;
Don’t forget to add it to the GROUP BY clause as well, or you’ll get an
error similar to the following:
You tried to execute a query that does not include the specified expression ‘REGION_CODE’ as part of an aggregate function.
The extra field will now cause the counts to be applied to each REGION_CODE:
Cases for April by |
||
Month |
CASES |
REGION_CODE |
April |
13 |
01 |
April |
33 |
02 |
April |
76 |
03 |
April |
2 |
04 |
April |
47 |
05 |
Although our query now captures the counts for each region, they are
displayed in separate rows, rather than columns. That’s because GROUP BY
applies to row data. To show the counts in one row requires adding each REGION_CODE
to the field list. Therefore, where we previously had one Count(*) in
the field list, we will now require one for each of our five regions. The
trick to making this work, is to filter the counts by the same criteria as in
the main WHERE clause:
SELECT MONTHNAME(CREATION_DATE) AS ‘Month’,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’01’
AND MONTH(CREATION_DATE)=4
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 1′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’02’
AND MONTH(CREATION_DATE)=4
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 2′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’03’
AND MONTH(CREATION_DATE)=4
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 3′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’04’
AND MONTH(CREATION_DATE)=4
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 4′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’05’
AND MONTH(CREATION_DATE)=4
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 5’,
Count(*) AS TOTAL
FROM TA_CASES CA
WHERE MONTH(CREATION_DATE)=4 AND YEAR(CREATION_DATE)=1998
GROUP BY MONTH(CREATION_DATE);
Now our query displays all of the counts in one row. We can even
include a TOTAL column by keeping the original Count(*) in the
field list:
Cases for April by |
||||||
Month |
REGION 1 |
REGION 2 |
REGION 3 |
REGION 4 |
REGION 5 |
TOTAL |
April |
13 |
33 |
76 |
2 |
47 |
171 |
To show all months, we must remove the MONTH(CREATION_DATE) = 4
criteria from the main WHERE clause and replace the MONTH(CREATION_DATE) = 4
with the more generic MONTH(CREATION_DATE) = MONTH(CA.CREATION_DATE), where
CA is an alias for the TA_CASES table of the outer query:
SELECT MONTHNAME(CREATION_DATE) AS ‘Month’,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’01’
AND MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 1′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’02’
AND MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 2′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’03’
AND MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 3′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’04’
AND MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 4′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’05’
AND MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 5’,
Count(*) AS TOTAL
FROM TA_CASES CA
WHERE YEAR(CREATION_DATE)=1998
GROUP BY MONTH(CREATION_DATE);
Here is the resulting output from the above query, which shows counts
for every month that contains data:
Cases by Region Code |
||||||
Month |
REGION 1 |
REGION 2 |
REGION 3 |
REGION 4 |
REGION 5 |
TOTAL |
April |
13 |
33 |
76 |
2 |
47 |
171 |
May |
17 |
55 |
209 |
1 |
143 |
425 |
June |
8 |
63 |
221 |
1 |
127 |
420 |
July |
13 |
104 |
240 |
6 |
123 |
486 |
August |
18 |
121 |
274 |
9 |
111 |
533 |
September |
25 |
160 |
239 |
2 |
88 |
514 |
October |
9 |
88 |
295 |
2 |
127 |
521 |
November |
2 |
86 |
292 |
2 |
120 |
502 |
December |
1 |
128 |
232 |
6 |
155 |
522 |
Our final task today will be to add a row that will show the totals for
each Region.
This is not an especially easy thing to accomplish because the GROUP BY
clause is summing the counts by month. You can’t include an aggregate function
to the GROUP BY so we have to come up with another way to add another row. One
way to add rows to a query recordset is to use the UNION statement. It appends
the second query results to the first; the only catch is that they must both possess
the same number of columns. Our second query will be almost identical to the
first, except that we will remove the GROUP BY, so that the counts are now
applied to the entire year:
SELECT MONTHNAME(CREATION_DATE) AS ‘Month’,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’01’
AND MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 1′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’02’
AND MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 2′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’03’
AND MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 3′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’04’
AND MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 4′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’05’
AND MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 5’,
Count(*) AS TOTAL
FROM TA_CASES CA
WHERE YEAR(CREATION_DATE)=1998
GROUP BY MONTH(CREATION_DATE);
UNION
SELECT ‘TOTAL’ AS Month,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’01’
AND YEAR(CREATION_DATE)=1998 AS ‘REGION 1′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’02’
AND YEAR(CREATION_DATE)=1998 AS ‘REGION 2′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’03’
AND YEAR(CREATION_DATE)=1998 AS ‘REGION 3′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’04’
AND YEAR(CREATION_DATE)=1998 AS ‘REGION 4′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’05’
AND YEAR(CREATION_DATE)=1998 AS ‘REGION 5’,
Count(*) AS TOTAL
FROM TA_CASES CA
WHERE YEAR(CREATION_DATE)=1998;
Here is the finished crosstab query, including the Region totals:
Cases by Region Code |
||||||
Month |
REGION 1 |
REGION 2 |
REGION 3 |
REGION 4 |
REGION 5 |
TOTAL |
April |
13 |
33 |
76 |
2 |
47 |
171 |
August |
18 |
121 |
274 |
9 |
111 |
533 |
December |
1 |
128 |
232 |
6 |
155 |
522 |
July |
13 |
104 |
240 |
6 |
123 |
486 |
June |
8 |
63 |
221 |
1 |
127 |
420 |
May |
17 |
55 |
209 |
1 |
143 |
425 |
November |
2 |
86 |
292 |
2 |
120 |
502 |
October |
9 |
88 |
295 |
2 |
127 |
521 |
September |
25 |
160 |
239 |
2 |
88 |
514 |
TOTAL |
106 |
838 |
2078 |
31 |
1041 |
4094 |
The technique that we covered today is ideal for those times that you
know how many columns you’ll be needing. To accommodate an unknown number of columns,
you’ll need something a bit more flexible, such as a stored procedure. We’ll
be looking at that solution next time.