Reporting what's not there
October 16, 2009
I tend to answer a lot of questions at the UtterAccess Discussion Forums (http://www.utteraccess.com) A recent question there got me thinking about the fact that it's easy to write queries that will show you data in the database that matches a criteria. However, if you want details of when there is no data in the database that matches the criteria, it becomes more difficult.
The specific question posed was something like:
Now, the problem with SQL Aggregate Function (such as Count, Avg, Min, Max or Sum) is that they only return values based on the number of rows selected that meet the given criteria. If no rows in the table are returned, then there's no value to be calculated. And while the DLookup Domain Aggregate function will return Null in that case (so that you can use the Nz function to return a value of zero), because nothing is returned by the query, there's nothing for you to grab to try and coerce into a value.
Before I go any further, I want to discuss a pet peeve of mine with Totals queries generated in Access. They're usually not generated correctly!
SELECT fieldlist FROM table WHERE selectcriteria GROUP BY groupfieldlist [ HAVING groupcriteria]
Both WHERE and HAVING affect the details of the records displayed, but there is a fundamental difference between them. WHERE specifies which records are acted upon by the SELECT statement, whereas HAVING specifies which records are displayed. What does this mean when doing Totals queries (i.e. queries that use Aggregate functions)?
In the query posted to UtterAccess, the intent is to limit the results to only those rows that meet the three given criteria. Since we don't want to see any records which don't meet those criteria, they really should be in a WHERE clause. While the difference may be subtle, when you're dealing with large tables, if you can reduce the number of rows that need to be aggregated, you can speed the query up significantly. The HAVING clause should be limited to those cases where the display criteria depends on the results of grouping the records. In other words, whatever is specified in a HAVING clause should include an aggregate function: HAVING Sum(UnitsInStock) > 100, HAVING Count(*) <= 1 and so on.
As well, note that there's no reason to include fields in the GROUP BY clause unless those fields are included in the SELECT field list. In fact, having fields that don't appear in the list can actually cause your query to return unexpected results: additional rows will be returned to account for changes in the fields not displayed.
Finally, using Count(*) is considerably faster than Count([FieldName]). There is, of course, a difference between the two: the Count function doesn't count records that have a Null value in the field when you use a field name, whereas when you use the asterisk wildcard character, the Count function calculates the total number of records, including those that contain Null fields. Since in this particular case a desired value is being specified in the WHERE clause, the SELECT statement will never return a Null value for CABLE_POINT_TYPE (and, as I've already mentioned, the query isn't going to return a row with Null values if no match is found). Besides, we want a zero value, and if Count ignores Null values, that wouldn't give us what we wanted anyhow!
All that being said, let me now rewrite the original query more correctly, as shown in Listing 1.
SELECT COUNT(*) AS [No of 12f Loop Joint] FROM CABLE_POINT WHERE PROJECT_ID=Forms!frmReqForm!txtProject_ID AND CABLE_POINT_TYPE="Break Out Splice Point" AND CAPACITY="12"Listing 1: SQL for a more efficient version of the original query
Now, I believe that the reason why the generated SQL is badly formed is due to how people create such queries. The usual approach is to select the table, drag the fields into the grid, change the query into a Totals query and put criteria under the appropriate fields. The problem is, when you convert a SELECT query to a Totals query, Access assumes a value of GROUP BY in the Total row under each field. Many people don't realize that you can change that GROUP BY to WHERE to get the improvement I discuss above. Figure 1 shows how to do this.
For the particular problem as posed, it's relatively simple to cause a row to be generated whether or not there's data. Since we're dealing with totals, and adding zero to any number results in the same number, what happens if we force a row with a zero value into the mix, and then sum the resultant recordset?
We can create a UNION query (qryS2_UnionQuery, shown in Listing 2) that adds a row having a value of 0:
SELECT COUNT(*) AS CountRows FROM CABLE_POINT WHERE PROJECT_ID=Forms!frmGetTotals!cboProject AND CABLE_POINT_TYPE="Break Out Splice Point" AND CAPACITY="12" UNION SELECT DISTINCT 0 FROM CABLE_POINTListing 2: SQL for qryS2_UnionQuery, a query that will ensure that the recordset returned will always have a least one row.
Figure 2 illustrates what this query will return if there is data that matches on the three criteria, while Figure 3 shows what will be returned when there is no matching data.
Instead of the original query, we now can use:
SELECT Sum(CountRows) AS NumberOfJoints FROM qryS2_UnionQueryListing 3: SQL for qryS2_SumBasedOnUnion, a query sums together the rows returned by qryS2_UnionQuery to return a single row.
Note that it's actually possible to do this as a single query:
SELECT Sum(SubSelect.CountRows) AS NumberOfJoints FROM ( SELECT Count(*) AS CountRows FROM Cable_Point WHERE (Project_ID=Forms![frmGetTotals]![cboProject]) AND (Cable_Point.Cable_Point_Type=Forms![frmGetTotals]![cboType]) AND (Cable_Point.Capacity=Forms![frmGetTotals]![cboCapacity]) UNION SELECT DISTINCT 0 FROM Cable_Point) AS SubSelectListing 4: SQL for qryS2_SingleQuery, a query that combines qryS2_UnionQuery and qryS2_SumBasedOnUnion into a single query.
Incidentally, while what I've written above is syntactically correct, Access may change the bracketing on that to the following:
SELECT Sum(SubSelect.CountRows) AS NumberOfJoints FROM [SELECT Count(*) AS CountRows FROM Cable_Point WHERE (Project_ID=Forms![frmGetTotals]![cboProject]) AND (Cable_Point.Cable_Point_Type=Forms![frmGetTotals]![cboType]) AND (Cable_Point.Capacity=Forms![frmGetTotals]![cboCapacity]) UNION SELECT DISTINCT 0 FROM Cable_Point]. AS SubSelect;Listing 5: SQL for qryS2_SingleQuery, as Access will may render it.
Using Crosstab Queries
Crosstab queries calculate a value (sum, average, count, or other type of total) for data that is grouped by two types of information one down the left side of the datasheet and another across the top. While a crosstab query will generate Null values if there is no value for a particular intersection of the left-side field and the top field, like regular Select queries, it will not return Null values if there are no values for the particular left side field for all top fields or vice versa. To illustrate what I mean, the sample database that accompanies this article has a simplistic example where a particular company has three different products (with the imaginative names Product A, Product B and Product C) that it sells to a total of five different customers. Figure 4 shows the Entity-Relationship Diagram for the tables required to represent the orders placed, while Figure 5 shows the limited data entered for this fictitious company as generated by the SQL presented in Listing 6.
SELECT Customers.CustomerId, OrderDetails.OrderId, OrderDetails.ProductId, Customers.CustomerNm, Products.ProductDs, OrderDetails.QuantityAm, Orders.OrderDt, Format([OrderDt],"mmm, yyyy") AS OrderMonth Format([OrderDt],"yyyy-mm") AS SortMonth FROM Products INNER JOIN ((Customers INNER JOIN Orders ON Customers.CustomerId = Orders.CustomerId) INNER JOIN OrderDetails ON Orders.OrderId = OrderDetails.OrderId) ON Products.ProductId = OrderDetails.ProductIdListing 6: SQL for qryS3_OrderDetails, a query that returns a denormalized view of all of the order details.
To summarize the order details, we can create a Crosstab query based on qryS3_OrderDetails as shown in Listing 7, producing the recordset shown in Figure 6.
TRANSFORM Sum(QuantityAm) AS SumOfQuantityAm SELECT OrderMonth FROM qryS3_OrderDetails GROUP BY OrderMonth, SortMonth ORDER BY SortMonth PIVOT ProductDsListing 7: SQL for qryS3_OrderDetails_Crosstab, a query that summarizes the data returned by qryS3_OrderDetails
Looking at Figure 6, you might notice that there are no orders listed for Product B, nor are there any orders listed for the month of March (nor any month after April...). Usually our users are going to want to see zeroes for all products not ordered in a given month.
We can create a query that returns one row for each month of the current year using SQL such as is shown in Listing 8, resulting in a recordset such as is shown in Figure 7.
SELECT DateSerial(Year(Date()),[MonthNumber],1) AS MonthStart, DateSerial(Year(Date()),[MonthNumber]+1,0) AS MonthEnd, Year(Date()) AS YearNumber, Months.MonthNumber, Format(DateSerial(Year(Date()),[MonthNumber],1),"yyyy-mm") AS SortMonth FROM Months ORDER BY Months.MonthNumber;Listing 8: SQL for qryS3_MonthsThisYear, a query that returns one row for each month in the current year.
(The Month table was described in last month's article Uses for Cartesian Products in MS Access)
If we join qryS3_MonthsThisYear to the Products table, as shown in Listing 9, we can return one row for each product/month combination. (Look closely: this is another Cartesian product, as discussed last month!) Just to summarize that data, I created a crosstab query on qryS3_MonthsThisYear, shown in Figure 8.
SELECT qryS3_MonthsThisYear.MonthStart, qryS3_MonthsThisYear.MonthEnd, Products.ProductDs, 0 AS QuantityAm, qryS3_MonthsThisYear.SortMonth FROM Products, qryS3_MonthsThisYear ORDER BY qryS3_MonthsThisYear.SortMonth, Products.ProductDs;Listing 9: SQL for qryS3_AllProductMonthCombinations, a query that returns one row for each combination of months in the current year and products sold.
Now that we have a query that does have one row for all possible outcomes, it's fairly straightforward to join that table to our Order Details using a Left Join, as shown in Figure 9, which produces the SQL statement shown in Listing 10.
SELECT Format(A.MonthStart,"mmm, yyyy") AS OrderMonth, A.ProductDs, CLng(Nz(B.[QuantityAm],0)) AS QuantityAm, A.SortMonth FROM qryS3_AllProductMonthCombinations AS A LEFT JOIN qryS3_OrderDetails AS B ON (A.SortMonth = B.SortMonth) AND (A.ProductDs = B.ProductDs)Listing 10: SQL for qryS3_OrderDetails_AllProductMonthCombinations, which will have at least one row for each combination of Month and Product.
Now, creating a crosstab based on qryS3_OrderDetails_AllProductMonthCombinations, such as shown in Listing 11, will result in having zeroes for any product not ordered in a particular month, as shown in Figure 10.
TRANSFORM Sum(QuantityAm) AS SumOfQuantityAm SELECT OrderMonth FROM qryS3_OrderDetails_AllProductMonthCombinations GROUP BY OrderMonth, SortMonth ORDER BY SortMonth PIVOT ProductDsListing 11: SQL for qryS3_OrderDetails_AllProductMonthCombinations_Crosstab, which will include zeroes for all combinations not ordered.
In the preceding discussion, I've attempted to show two different scenarios where it's necessary to create data in order to be able to report zero values in queries. Now, I'll acknowledge that the queries involved may be slower than "straight" queries against the data that does exist, but customers often have very specific ideas about what they want to see. Note that since I used Union queries and Cartesian products in order to produce the data, and Union queries and Cartesian products both result in recordsets that are read-only, these techniques are really only appropriate for Reports .