MySql View Technique for Grouping Crosstab Column Data
March 12, 2010
Learn a technique to fetch column data for MySQL crosstabs by emulating views. By replacing tables with a subquery subset, the resulting view will contain a limited dataset that has been filtered for that column field.
In the database realm, a view is a virtual or logical table, which contains the results of a SELECT query. Views are generally used to group related columns from multiple tables together or to hide certain columns. One of the best features of views is that they behave essentially the same way as a table in that you can retrieve and update data on them in the same way as with a table. We can mimic some of this behavior to suit our purposes by replacing full table aliases with a subquery subset. Hence, a subquerys resulting view will contain a limited dataset that has been filtered on the select criteria of a crosstab column field.
Table Self Aliasing Explained
Sometimes you need to join a table to itself in order to compare records in a table to other records in that same table. For example, the following query groups animals by species and by male and female pairs:
SELECT a1.name, a1.sex, a2.name, a2.sex, a1.species FROM domestic_animal AS a1 INNER JOIN domestic_animal AS a2 ON a1.species = a2.species AND a1.sex = 'f' AND a2.sex = 'm'; --PRODUCES--: +--------+------+--------+------+---------+ | name | sex | name | sex | species | +--------+------+--------+------+---------+ | Bertha | f | Bucko | m | cow | | Buffy | f | Killer | m | dog | | Babe | f | Bowser | m | pig | +--------+------+--------+------+---------+
Column Fields with Complex Criteria
As we saw in the MySQL CASE Statements in Multi-Aggregated Crosstab Columns article, the useage of CASE statements to select column data can lead to a lot of repetition in the field list, depending on the complexity of the criteria. For instance, take the following code, which only represents one field:
COUNT(CASE WHEN REGION_CODE = '01' AND CITIZENSHIP_CODE = '205' AND (DOB_DT BETWEEN '1980-01-01 00:00:00' AND '1990-12-31 23:59:59' OR CASE_TYPE_CD = 'M-10-A') THEN FEE_NUMBER END) AS Cit_1,
Rather than repeat the same criteria for every citizenship column, we can perform a separate subquery in the FROM clause to obtain all of the filtered data. The various views can be linked together using JOINs, just like regular tables. Here again is the query that retrieves the counts of Indian citizenships and case totals by region code, only this time, the code that fetches records containing Indian citizenships has been moved to the FROM clause:
SELECT CASE WHEN Year IS NULL THEN 'GRAND TOTAL' ELSE Year END AS 'Year', CASE WHEN Month_Num IS NULL THEN CASE WHEN Year IS NULL THEN '' ELSE CONCAT(Year, ' TOTAL') END ELSE Month END AS 'Month', Cit_1 AS 'Indian Cit for Region 1', REGION_1 AS 'REGION 1', Cit_2 AS 'Indian Cit for Region 2', REGION_2 AS 'REGION 2', Cit_3 AS 'Indian Cit for Region 3', REGION_3 AS 'REGION 3', Cit_4 AS 'Indian Cit for Region 4', REGION_4 AS 'REGION 4', Cit_5 AS 'Indian Cit for Region 5', REGION_5 AS 'REGION 5', Cit_TOTAL AS 'Indian Cit Total', REGION_TOTAL AS 'REGION TOTAL' FROM (SELECT YEAR(ca.CREATION_DATE) AS Year, MONTH(ca.CREATION_DATE) AS Month_Num, MONTHNAME(ca.CREATION_DATE) AS Month, COUNT(CASE WHEN cit.REGION_CODE ='01' THEN cit.FEE_NUMBER END) AS Cit_1, COUNT(CASE WHEN ca.REGION_CODE ='01' THEN ca.FEE_NUMBER END) AS REGION_1, COUNT(CASE WHEN cit.REGION_CODE ='02' THEN cit.FEE_NUMBER END) AS Cit_2, COUNT(CASE WHEN ca.REGION_CODE ='02' THEN ca.FEE_NUMBER END) AS REGION_2, COUNT(CASE WHEN cit.REGION_CODE ='03' THEN cit.FEE_NUMBER END) AS Cit_3, COUNT(CASE WHEN ca.REGION_CODE ='03' THEN ca.FEE_NUMBER END) AS REGION_3, COUNT(CASE WHEN cit.REGION_CODE ='04' THEN cit.FEE_NUMBER END) AS Cit_4, COUNT(CASE WHEN ca.REGION_CODE ='04' THEN ca.FEE_NUMBER END) AS REGION_4, COUNT(CASE WHEN cit.REGION_CODE ='05' THEN cit.FEE_NUMBER END) AS Cit_5, COUNT(CASE WHEN ca.REGION_CODE ='05' THEN ca.FEE_NUMBER END) AS REGION_5, COUNT(cit.fee_number) AS Cit_TOTAL, COUNT(ca.fee_number) AS REGION_TOTAL FROM TA_CASES as ca LEFT JOIN (SELECT * FROM TA_CASES WHERE CITIZENSHIP_CODE ='205') as cit ON (ca.fee_number = cit.fee_number) WHERE ca.CREATION_DATE IS NOT NULL GROUP BY Year, Month_Num WITH ROLLUP) as temp;
The Left Join
A LEFT JOIN is used so that all of the records in the left (primary) table are returned in a MySQL result, whether or not there is a matching record in the other (dependent) table that it is being joined to. The following diagram shows two datasets, A and B, where A represents the left table, and B is on the right of the Join:
Here is an example that shows the LEFT Join in action. The following two tables contain music artists and styles:
Using a Left Join, we can select all of the styles regardless of whether or not there are any artists in that particular style:
SELECT artists.name AS artist, styles.description AS style FROM styles LEFT JOIN artists ON styles.id = artists.style ORDER BY styles.description;
The resulting recordset displays NULL for style records that have no associated artist:
A LEFT JOIN can also be used to count all of the records in the TA_CASES table, while still filtering the citizenship columns. (As in the last article, the results were limited to 1998 and the REGION 4 and 5 data were omitted in order to fit the table on the page):
The next stop on our look at crosstab queries is the use of a stored proc to generate crosstabs where the exact number of columns is not known beforehand.