Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Tips Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Tips
» Database Forum
» Slideshows
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MySQL

Posted February 9, 2017

WEBINAR: On-demand webcast

How to Boost Database Development Productivity on Linux, Docker, and Kubernetes with Microsoft SQL Server 2017 REGISTER >

Pivot Aggregate Counts on Months in MySQL

By Rob Gravelle

A cross tabulation query, or cross tab/crosstab for short, is used to display the result of an aggregation on two or more fields.  The trick to this type of query is the transformation of rows of data to columns. That presents the data in a table format in which one field is displayed down the left side of a matrix (row headings) and the other across the top (column headings).

One common cross tab query is one that counts or sums some data for a time period, such as a year, quarter, or month.  Aggregating on periods of less than a year presents an extra challenge because we have to be careful to count each period as a distinct entity, e.g., January of 2006’s data must not include that of any other year.  In today’s article, we’ll create a crosstab query for the Sakila sample database that counts the number of movie rentals for each customer aggregated by month.

The Source Database

As usual, I’ll be relying on the ever-dependable Sakila sample database.  A former member of the MySQL AB documentation team named Mike Hillyer created the Sakila database specifically for the purpose of providing a standard schema for use in books, tutorials, and articles just like the one you’re reading.

The database contains a number of related tables themed around the film industry and covers everything from actors and film studios to video rental stores.  The full schema structure can be viewed on the MySQL Dev site.

Please refer to my Generating Reports on MySQL Data tutorial for instructions on downloading and installing the Sakila database.

Starting at the End

In order to better understand what it is we’re trying to achieve, let’s take a look at the final results. These will display the customer names, along with their yearly and monthly video rental totals.  The last column on the right will contain totals for each year, while the last row will provide grand totals for each month. Finally, the cell in the bottom-right will display the total number of movies rented.

The months are ideal candidates for the column data because they are known in advance.  There will always be twelve of them and in exactly the same order.  Compare those to the customers, which can change between query executions.  Whenever you have one known quantity and one unknown, choosing the rows and columns is a no-brainer – the known quantity must take the columns.

Here are the first several rows of the final result set:

Customer                     | Year| Jan| Feb| Mar| Apr| May| Jun| Jul| Aug| Sep| Oct| Nov| Dec| TOTAL
 ---------------------------------------------------------------------------------------------------
 ABNEY, RAFAEL               | 2005|   0|   0|   0|   0|   2|   4|   6|   8|   0|   0|   0|   0| 20
 ABNEY, RAFAEL               | 2006|   0|   1|   0|   0|   0|   0|   0|   0|   0|   0|   0|   0| 1    
 Total for ABNEY, RAFAEL     |     |   0|   1|   0|   0|   2|   4|   6|   8|   0|   0|   0|   0| 21
 ADAM, NATHANIEL             | 2005|   0|   0|   0|   0|   3|   2|  11|  12|   0|   0|   0|   0| 28
 Total for ADAM, NATHANIEL   |     |   0|   0|   0|   0|   3|   2|  11|  12|   0|   0|   0|   0| 28
 ADAMS, KATHLEEN             | 2005|   0|   0|   0|   0|   2|   1|  13|  11|   0|   0|   0|   0| 27
 Total for ADAMS, KATHLEEN   |     |   0|   0|   0|   0|   2|   1|  13|  11|   0|   0|   0|   0| 27
 ALEXANDER, DIANA            | 2005|   0|   0|   0|   0|   0|   7|  12|   8|   0|   0|   0|   0| 27
 Total for ALEXANDER, DIANA  |     |   0|   0|   0|   0|   0|   7|  12|   8|   0|   0|   0|   0| 27
 ALLARD, GORDON              | 2005|   0|   0|   0|   0|   1|   3|  17|  11|   0|   0|   0|   0| 32
 ...
 ---------------------------------------------------------------------------------------------------
 GRAND TOTAL:                |     |   0| 182|   0|   0|1156|2311|6709|5686|   0|   0|   0|   0| 16044

Selecting the Data

Before you try to pivot the data to produce the above result, you should first take a step back and write a standard SELECT statement that fetches all of the pertinent data.  The relevant columns are:

  1. The customer name
  2. The rental date year
  3. The rental date month
  4. A count of movies rented (any non-null field will work)

We’ll need to join the customer table to the rental one in order to access the customers’ names.

As mentioned earlier, we need to include the year in the GROUP BY clause if we don’t want our counts to include every month of that name, i.e. all Januarys. 

Here is the query:

SELECT CONCAT(CU.last_name, ', ', CU.first_name) AS 'Customer',
        YEAR(rental_date)                         AS 'Year',
        monthname(R.rental_date)                  AS 'Month',
        COUNT(rental_date)                        AS 'TOTAL'
 FROM rental R 
       LEFT JOIN sakila.customer CU ON R.customer_id = CU.customer_id
 GROUP BY customer, year, month(R.rental_date)
 WITH ROLLUP;

That produces the following result set:

Customer        | Year | Month    | TOTAL
 -----------------------------------------
 ABNEY, RAFAEL   | 2005 | May      | 2
 ABNEY, RAFAEL   | 2005 | June     | 4
 ABNEY, RAFAEL   | 2005 | July       6
 ABNEY, RAFAEL   | 2005 | August   | 8
 ABNEY, RAFAEL   | 2005 | August   | 20
 ABNEY, RAFAEL   | 2006 | February | 1
 ABNEY, RAFAEL   | 2006 | February | 1
 ABNEY, RAFAEL   |      | February | 21
 ADAM, NATHANIEL | 2005 | May      | 3
 ADAM, NATHANIEL | 2005 | June     | 2
 ADAM, NATHANIEL | 2005 | July     | 11
 ADAM, NATHANIEL | 2005 | August   | 12
 ADAM, NATHANIEL | 2005 | August   | 28
 ADAM, NATHANIEL |      | August   | 28
 ADAMS, KATHLEEN | 2005 | May      | 2
 ADAMS, KATHLEEN | 2005 | June     | 1
 ADAMS, KATHLEEN | 2005 | July     | 13
 ADAMS, KATHLEEN | 2005 | August   | 11
 ADAMS, KATHLEEN | 2005 | August   | 27
 ADAMS, KATHLEEN |      | August   | 27
 ...

It satisfies our requirements very well, in that each customer’s movie rentals are tabulated by year and months in which he/she rented something.  The “with rollup” clause produces an extra row at the end of each year and customer.   In this format, these rows are a little difficult to discern because of duplicate values.  In addition to the grouped column value, there are also duplicated months.  This happens because MySQL permits a column that does not appear in the GROUP BY list to be named in the select list. The server is free to choose any value from this non-aggregated column in summary rows, and this includes the extra rows added by WITH ROLLUP.  In this case, the month name is a non-aggregated column, even though the month index is.  This was done so as to sort the months alphabetically.  Substituting the month() function for monthname() in the select list would replace the duplicated month values with nulls.

Just for the sake of thoroughness, I performed the above query without the GROUP BY to manually tabulate the totals.  We can see that they match: 

customer       | title                 | Year | month | rental_date
 ---------------------------------------------------------------------------
 ABNEY, RAFAEL  | SAGEBRUSH CLUELESS    | 2005 | 5     | 2005-05-26 01:34:28
 ABNE, RAFAEL   | POCUS PULP            | 2005 | 5     | 2005-05-28 19:14:09
 ABNE, RAFAEL   | NIGHTMARE CHILL       | 2005 | 6     | 2005-06-16 20:17:20
 ABNE, RAFAEL   | LEGALLY SECRETAR      | 2005 | 6     | 2005-06-17 03:36:02
 ABNE, RAFAEL   | TRADING PINOCCHIO     | 2005 | 6     | 2005-06-19 18:04:18
 ABNE, RAFAEL   | CONEHEADS SMOOCH      | 2005 | 6     | 2005-06-20 19:41:28
 ABNE, RAFAEL   | DOUBLE WRATH          | 2005 | 7     | 2005-07-07 00:26:43
 ABNE, RAFAEL   | CONQUERER NUTS        | 2005 | 7     | 2005-07-08 02:22:45
 ABNE, RAFAEL   | GOODFELLAS SALUTE     | 2005 | 7     | 2005-07-11 00:16:35
 ABNE, RAFAEL   | WANDA CHAMBER         | 2005 | 7     | 2005-07-11 17:23:33
 ABNE, RAFAEL   | MADNESS ATTACKS       | 2005 | 7     | 2005-07-12 00:27:08
 ABNE, RAFAEL   | SHOCK CABIN           | 2005 | 7     | 2005-07-28 07:15:32
 ABNE, RAFAEL   | JUGGLER HARDL         | 2005 | 8     | 2005-08-01 01:10:33
 ABNE, RAFAEL   | KARATE MOON           | 2005 | 8     | 2005-08-02 01:19:33
 ABNE, RAFAEL   | HOBBIT ALIEN          | 2005 | 8     | 2005-08-02 10:08:40
 ABNE, RAFAEL   | STRICTLY SCARFACE     | 2005 | 8     | 2005-08-17 15:40:47
 ABNE, RAFAEL   | CHOCOLAT HARR         | 2005 | 8     | 2005-08-20 06:22:08
 ABNE, RAFAEL   | BLACKOUT PRIVATE      | 2005 | 8     | 2005-08-21 10:27:21
 ABNE, RAFAEL   | CLASH FREDD           | 2005 | 8     | 2005-08-22 00:48:23
 ABNE, RAFAEL   | FREDDY STORM          | 2005 | 8     | 2005-08-23 00:30:26
 ABNE, RAFAEL   | CONVERSATION DOWNHILL | 2006 | 2     | 2006-02-14 15:16:03
 ---------------------------------------------------------------------------
 TOTAL MOVIES:    21                                
 ---------------------------------------------------------------------------
 ADA, NATHANIEL | KISS GLOR             | 2005 | 5     | 2005-05-25 22:02:30
 ADA, NATHANIEL | GATHERING CALENDAR    | 2005 | 5     | 2005-05-27 21:17:08
 ADA, NATHANIEL | NOON PAPI             | 2005 | 5     | 2005-05-30 00:27:57
 ADA, NATHANIEL | SHEPHERD MIDSUMMER    | 2005 | 6     | 2005-06-19 14:51:55
 ADA, NATHANIEL | GUYS FALCON           | 2005 | 6     | 2005-06-20 05:17:22
 ADA, NATHANIEL | SOMETHING DUCK        | 2005 | 7     | 2005-07-06 10:47:35
 ADA, NATHANIEL | HANGING DEEP          | 2005 | 7     | 2005-07-06 10:49:30
 ADA, NATHANIEL | CHASING FIGHT         | 2005 | 7     | 2005-07-07 18:04:16
 ADA, NATHANIEL | ENDING CROWDS         | 2005 | 7     | 2005-07-08 14:36:51
 ADA, NATHANIEL | NEMO CAMPUS           | 2005 | 7     | 2005-07-09 08:35:05
 ADA, NATHANIEL | TIGHTS DAWN           | 2005 | 7     | 2005-07-27 14:25:17
 ADA, NATHANIEL | DIVORCE SHINING       | 2005 | 7     | 2005-07-27 22:38:05
 ADA, NATHANIEL | JASON TRAP            | 2005 | 7     | 2005-07-28 07:58:27
 ADA, NATHANIEL | SLEUTH ORIENT         | 2005 | 7     | 2005-07-28 10:23:48
 ADA, NATHANIEL | POSEIDON FOREVER      | 2005 | 7     | 2005-07-28 12:51:22
 ADA, NATHANIEL | ROCKY WAR             | 2005 | 7     | 2005-07-29 06:02:11
 ADA, NATHANIEL | TRAMP OTHERS          | 2005 | 8     | 2005-08-01 07:11:27
 ADA, NATHANIEL | AMADEUS HOL           | 2005 | 8     | 2005-08-01 11:25:28
 ADA, NATHANIEL | MENAGERIE RUSHMORE    | 2005 | 8     | 2005-08-17 01:31:04
 ADA, NATHANIEL | DANCING FEVER         | 2005 | 8     | 2005-08-18 23:26:40
 ADA, NATHANIEL | WARDROBE PHANTO       | 2005 | 8     | 2005-08-19 13:36:28
 ADA, NATHANIEL | LUST LOCK             | 2005 | 8     | 2005-08-19 15:05:34
 ADA, NATHANIEL | ANALYZE HOOSIERS      | 2005 | 8     | 2005-08-19 21:46:10
 ADA, NATHANIEL | WAR NOTTING           | 2005 | 8     | 2005-08-20 14:53:43
 ADA, NATHANIEL | BOOGIE AMELIE         | 2005 | 8     | 2005-08-22 08:52:38
 ADA, NATHANIEL | FREDDY STOR           | 2005 | 8     | 2005-08-22 15:05:37
 ADA, NATHANIEL | STRANGERS GRAFFITI    | 2005 | 8     | 2005-08-23 11:17:26
 ADA, NATHANIEL | ORIENT CLOSER         | 2005 | 8     | 2005-08-23 21:44:27
 ---------------------------------------------------------------------------
 TOTAL MOVIES:    28                                
 ---------------------------------------------------------------------------

Pivoting on Months

Now we’re going to go through the process of translating the year and month rows into columns.

The GROUP BY clause will still handle the customers and years, but now we have to count each month separately.  To do that, we can test for the month index and only count that row if it’s that column’s month:

COUNT(CASE WHEN MONTH(rental_date) = 1 THEN rental_date END) AS 'Jan'

Here is the full updated query:

SELECT 
   CONCAT(CU.last_name, ', ', CU.first_name)                     AS 'Customer',  
   YEAR(rental_date)                                             AS 'Year',
   COUNT(CASE WHEN MONTH(rental_date) = 1  THEN rental_date END) AS 'Jan',
   COUNT(CASE WHEN MONTH(rental_date) = 2  THEN rental_date END) AS 'Feb',
   COUNT(CASE WHEN MONTH(rental_date) = 3  THEN rental_date END) AS 'Mar',
   COUNT(CASE WHEN MONTH(rental_date) = 4  THEN rental_date END) AS 'Apr',
   COUNT(CASE WHEN MONTH(rental_date) = 5  THEN rental_date END) AS 'May',
   COUNT(CASE WHEN MONTH(rental_date) = 6  THEN rental_date END) AS 'Jun',
   COUNT(CASE WHEN MONTH(rental_date) = 7  THEN rental_date END) AS 'Jul',
   COUNT(CASE WHEN MONTH(rental_date) = 8  THEN rental_date END) AS 'Aug',
   COUNT(CASE WHEN MONTH(rental_date) = 9  THEN rental_date END) AS 'Sep',
   COUNT(CASE WHEN MONTH(rental_date) = 10 THEN rental_date END) AS 'Oct',
   COUNT(CASE WHEN MONTH(rental_date) = 11 THEN rental_date END) AS 'Nov',
   COUNT(CASE WHEN MONTH(rental_date) = 12 THEN rental_date END) AS 'D',
   COUNT(rental_date)                                            AS 'TOTAL'
 FROM 
   rental R 
       LEFT JOIN sakila.customer CU ON R.customer_id = CU.customer_id
 GROUP BY Customer, Year
 WITH ROLLUP;

Creating Customized Aggregate Labels

You may have noticed that the final result includes custom labels in the aggregate rows, such as “Total for ABNEY, RAFAEL”.  These were produced by placing the above statement within a subquery and selecting from it.  A Null value signifies an aggregate row and is populated with a “Total” label.

SELECT CASE WHEN Customer IS NULL AND Year IS NULL THEN 'GRAND TOTAL:' 
             WHEN Year IS NULL THEN CONCAT('Total for ', Customer)
             ELSE Customer 
        END                                                         AS 'Customer',
        CASE WHEN Year IS NULL THEN ''
             ELSE Year
        END                                                         AS 'Year',
        Jan,  
        -- month fields...
        D                                                           AS 'Dec',
        TOTAL
 FROM  
  ( SELECT 
      CONCAT(CU.last_name, ', ', CU.first_name)                     AS 'Customer',  
      YEAR(rental_date)                                             AS 'Year',
      -- month fields...
      COUNT(CASE WHEN MONTH(rental_date) = 12 THEN rental_date END) AS 'D',
      COUNT(rental_date)                                            AS 'TOTAL'
      FROM 
   rental R 
        LEFT JOIN sakila.customer CU ON R.customer_id = CU.customer_id
 GROUP BY customer, year
 WITH ROLLUP ) as fields;
 
 

Conclusion

Being fixed units, time intervals are the ideal column data in crosstab queries. The only time that you may want to consider assigning them to rows is if the related field is also fixed.  For example, if you were generating a crosstab of sales for each office, the latter could occupy the columns, since you would know exactly which offices’ sales are being measured.

See all articles by Rob Gravelle



MySQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MySQL Forum
Topic By Replies Updated
MySQL in high availability and transction secure banking application klamor 1 August 28th, 10:24 AM
MySQL rollback UAL225 1 August 28th, 10:15 AM
Browsing a DB file that uses MySql finleytech 1 July 26th, 10:51 AM
php cookie won't pass variable to next page ITdevGirl 0 June 22nd, 12:13 PM