Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MySQL

Posted Dec 29, 2009

All About the Crosstab Query

By Rob Gravelle

A cross tabulation query, or cross tab for short, is used to display the calculation of a sum, average, count, or other type of total on two or more fields. The results are then presented in a table format in which one field is displayed down the left side of a matrix (row headings) and the other across the top (column headings).

Funnily enough, most relational databases don’t provide an easy way to create crosstabs, the exception being good ole Microsoft Access. It features its own SQL language constructs for creating crosstabs such as TRANSFORM and PIVOT. Today, I’m going to show you one of several ways to create crosstabs using regular SQL. Although I will be working in MySQL, the techniques discussed here could be applied to just about any database.

A Basic Crosstab

One of my many responsibilities of my job is to construct canned reports for our clients. For each report, there is a stored procedure that fetches the required data from the database. Every so often I get a request for a report that necessitates generating a crosstab query. I recognize the need for a crosstab when I hear requirements such as “I want to display this field by that field”. Most often, it’s a grouping of a field by dates, although other combinations come up from time to time. Here is what one of those crosstab reports might look like:

Cases by Date and Region Code

Month

01

02

03

04

05

April

13

33

76

2

47

May

17

55

209

1

143

June

8

63

221

1

127

July

13

104

240

6

123

August

18

121

274

9

111

September

25

160

239

2

88

October

9

88

295

2

127

November

2

86

292

2

120

December

1

128

232

6

155

The above crosstab displays the count of cases for each Region by Month. Note that only months that contain records are displayed.

It is also common to display totals for both columns and rows. The following report sample shows Case totals for each Region broken down by Travel Mode. The latter is further broken down by the Nationality of passengers. Sub and grand totals are tabulated for each field:

Totals for each Region broken down by travel mode

Crosstab Query Solutions

Now it’s time to delve into the primary reason that you’re reading this article: how to write SQL that will display data in tabular format. There are several ways to get the job done; it’s really a matter of deciding what it is exactly that you need.

Solution #1: Using the GROUP By Clause

The GROUP BY clause groups a selected set of records into a set of summary rows by the values of one or more columns or expressions. One row is returned for each group, so aggregate functions in the SELECT list will provide information about each group rather than individual rows:

SELECT   MONTHNAME(CREATION_DATE) AS Month, 
         COUNT(*) AS CASES
FROM     TA_CASES
WHERE    MONTH(CREATION_DATE) = 4 AND Year(CREATION_DATE) = 1998
GROUP BY MONTH(CREATION_DATE);

If you leave out the GROUP BY, the above query would throw an error on execution such as the following:

“You tried to execute a query that does not include the specified
expression as part of an aggregate function or grouping.”

That happens because you normally can’t mix aggregate functions with scalar field values, that is unless you specify the scalar field in the GROUP BY clause.

Here are the results for the Month of April, as defined by the query above:

Cases for April

Month

CASES

April

171

That gives us the basic information that we want for one month. The next steps are to break down the cases by region and include the other months. Let’s deal with the regions first.

We can get the counts for each region by adding the REGION_CODE to the field list:

SELECT   MONTHNAME(CREATION_DATE) AS Month, 
COUNT(*) AS CASES,
         REGION_CODE
FROM     TA_CASES
WHERE    MONTH(CREATION_DATE) = 4 AND Year(CREATION_DATE) = 1998
GROUP BY MONTH(CREATION_DATE),
         REGION_CODE;

Don’t forget to add it to the GROUP BY clause as well, or you’ll get an error similar to the following:

You tried to execute a
query that does not include the specified expression ‘REGION_CODE’ as part of
an aggregate function.

The extra field will now cause the counts to be applied to each REGION_CODE:

Cases for April by Region Code

Month

CASES

REGION_CODE

April

13

01

April

33

02

April

76

03

April

2

04

April

47

05

Although our query now captures the counts for each region, they are displayed in separate rows, rather than columns. That’s because GROUP BY applies to row data. To show the counts in one row requires adding each REGION_CODE to the field list. Therefore, where we previously had one Count(*) in the field list, we will now require one for each of our five regions. The trick to making this work, is to filter the counts by the same criteria as in the main WHERE clause:

SELECT MONTHNAME(CREATION_DATE) AS 'Month', 
       (SELECT count(*) 
        FROM   TA_CASES 
        WHERE  REGION_CODE = '01' 
        AND    MONTH(CREATION_DATE)=4 
        AND    YEAR(CREATION_DATE)=1998) AS 'REGION 1',
       (SELECT count(*) 
        FROM   TA_CASES 
        WHERE  REGION_CODE = '02' 
        AND    MONTH(CREATION_DATE)=4 
        AND    YEAR(CREATION_DATE)=1998) AS 'REGION 2', 
       (SELECT count(*) 
        FROM   TA_CASES 
        WHERE  REGION_CODE = '03' 
        AND    MONTH(CREATION_DATE)=4 
        AND    YEAR(CREATION_DATE)=1998) AS 'REGION 3',
       (SELECT count(*) 
        FROM   TA_CASES 
        WHERE  REGION_CODE = '04' 
        AND    MONTH(CREATION_DATE)=4 
        AND    YEAR(CREATION_DATE)=1998) AS 'REGION 4',
       (SELECT count(*) 
        FROM   TA_CASES 
        WHERE  REGION_CODE = '05' 
        AND    MONTH(CREATION_DATE)=4 
        AND    YEAR(CREATION_DATE)=1998) AS 'REGION 5',
        Count(*) AS TOTAL
FROM TA_CASES CA
WHERE MONTH(CREATION_DATE)=4 AND YEAR(CREATION_DATE)=1998
GROUP BY MONTH(CREATION_DATE);

Now our query displays all of the counts in one row. We can even include a TOTAL column by keeping the original Count(*) in the field list:

Cases for April by Region Code

Month

REGION 1

REGION 2

REGION 3

REGION 4

REGION 5

TOTAL

April

13

33

76

2

47

171

To show all months, we must remove the MONTH(CREATION_DATE) = 4 criteria from the main WHERE clause and replace the MONTH(CREATION_DATE) = 4 with the more generic MONTH(CREATION_DATE) = MONTH(CA.CREATION_DATE), where CA is an alias for the TA_CASES table of the outer query:

SELECT MONTHNAME(CREATION_DATE) AS 'Month', 
       (SELECT count(*) 
        FROM   TA_CASES 
        WHERE  REGION_CODE = '01' 
        AND    MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE) 
        AND    YEAR(CREATION_DATE)=1998) AS 'REGION 1',
       (SELECT count(*) 
        FROM   TA_CASES 
        WHERE  REGION_CODE = '02' 
        AND    MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)  
        AND    YEAR(CREATION_DATE)=1998) AS 'REGION 2', 
       (SELECT count(*) 
        FROM   TA_CASES 
        WHERE  REGION_CODE = '03' 
        AND    MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)  
        AND    YEAR(CREATION_DATE)=1998) AS 'REGION 3',
       (SELECT count(*) 
        FROM   TA_CASES 
        WHERE  REGION_CODE = '04' 
        AND    MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)
        AND    YEAR(CREATION_DATE)=1998) AS 'REGION 4',
       (SELECT count(*) 
        FROM   TA_CASES 
        WHERE  REGION_CODE = '05' 
        AND    MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)  
        AND    YEAR(CREATION_DATE)=1998) AS 'REGION 5',
        Count(*) AS TOTAL
FROM TA_CASES CA
WHERE YEAR(CREATION_DATE)=1998
GROUP BY MONTH(CREATION_DATE);

Here is the resulting output from the above query, which shows counts for every month that contains data:

Cases by Region Code and Month

Month

REGION 1

REGION 2

REGION 3

REGION 4

REGION 5

TOTAL

April

13

33

76

2

47

171

May

17

55

209

1

143

425

June

8

63

221

1

127

420

July

13

104

240

6

123

486

August

18

121

274

9

111

533

September

25

160

239

2

88

514

October

9

88

295

2

127

521

November

2

86

292

2

120

502

December

1

128

232

6

155

522

Our final task today will be to add a row that will show the totals for each Region.

This is not an especially easy thing to accomplish because the GROUP BY clause is summing the counts by month. You can’t include an aggregate function to the GROUP BY so we have to come up with another way to add another row. One way to add rows to a query recordset is to use the UNION statement. It appends the second query results to the first; the only catch is that they must both possess the same number of columns. Our second query will be almost identical to the first, except that we will remove the GROUP BY, so that the counts are now applied to the entire year:

SELECT MONTHNAME(CREATION_DATE) AS 'Month', 
       (SELECT count(*) 
        FROM   TA_CASES 
        WHERE  REGION_CODE = '01' 
        AND    MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE) 
        AND    YEAR(CREATION_DATE)=1998) AS 'REGION 1',
       (SELECT count(*) 
        FROM   TA_CASES 
        WHERE  REGION_CODE = '02' 
        AND    MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)  
        AND    YEAR(CREATION_DATE)=1998) AS 'REGION 2', 
       (SELECT count(*) 
        FROM   TA_CASES 
        WHERE  REGION_CODE = '03' 
        AND    MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)  
        AND    YEAR(CREATION_DATE)=1998) AS 'REGION 3',
       (SELECT count(*) 
        FROM   TA_CASES 
        WHERE  REGION_CODE = '04' 
        AND    MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)
        AND    YEAR(CREATION_DATE)=1998) AS 'REGION 4',
       (SELECT count(*) 
        FROM   TA_CASES 
        WHERE  REGION_CODE = '05' 
        AND    MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)  
        AND    YEAR(CREATION_DATE)=1998) AS 'REGION 5',
        Count(*) AS TOTAL
FROM TA_CASES CA
WHERE YEAR(CREATION_DATE)=1998
GROUP BY MONTH(CREATION_DATE);
UNION
SELECT 'TOTAL' AS Month, 
       (SELECT count(*) 
        FROM   TA_CASES 
        WHERE  REGION_CODE = '01' 
        AND    YEAR(CREATION_DATE)=1998 AS 'REGION 1',
       (SELECT count(*) 
        FROM   TA_CASES 
        WHERE  REGION_CODE = '02' 
        AND    YEAR(CREATION_DATE)=1998 AS 'REGION 2', 
       (SELECT count(*) 
        FROM   TA_CASES 
        WHERE  REGION_CODE = '03' 
        AND    YEAR(CREATION_DATE)=1998 AS 'REGION 3',
       (SELECT count(*) 
        FROM   TA_CASES 
        WHERE  REGION_CODE = '04' 
        AND    YEAR(CREATION_DATE)=1998 AS 'REGION 4',  
       (SELECT count(*) 
        FROM   TA_CASES 
        WHERE  REGION_CODE = '05' 
        AND    YEAR(CREATION_DATE)=1998 AS 'REGION 5',
       Count(*) AS TOTAL
FROM TA_CASES CA
WHERE YEAR(CREATION_DATE)=1998; 

Here is the finished crosstab query, including the Region totals:

Cases by Region Code and Month with Totals

Month

REGION 1

REGION 2

REGION 3

REGION 4

REGION 5

TOTAL

April

13

33

76

2

47

171

August

18

121

274

9

111

533

December

1

128

232

6

155

522

July

13

104

240

6

123

486

June

8

63

221

1

127

420

May

17

55

209

1

143

425

November

2

86

292

2

120

502

October

9

88

295

2

127

521

September

25

160

239

2

88

514

TOTAL

106

838

2078

31

1041

4094

The technique that we covered today is ideal for those times that you know how many columns you’ll be needing. To accommodate an unknown number of columns, you’ll need something a bit more flexible, such as a stored procedure. We’ll be looking at that solution next time.

» See All Articles by Columnist Rob Gravelle



MySQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MySQL Forum
Topic By Replies Updated
MySQL in high availability and transction secure banking application klamor 1 August 28th, 10:24 AM
MySQL rollback UAL225 1 August 28th, 10:15 AM
Browsing a DB file that uses MySql finleytech 1 July 26th, 10:51 AM
php cookie won't pass variable to next page ITdevGirl 0 June 22nd, 12:13 PM