Reporting what’s not there

I tend to answer a lot of questions at the UtterAccess
Discussion Forums (http://www.utteraccess.com) A recent
question there got me thinking about the fact that it’s easy to write queries
that will show you data in the database that matches a criteria. However, if
you want details of when there is no data in the database that matches the
criteria, it becomes more difficult.

The specific
question
posed was something like:

How would i get this query to display a value of zero if
there are no records for CABLE_POINT.CABLE_POINT_TYPE ="Break Out Splice
Point" And CABLE_POINT.CAPACITY="12"

SELECT Count(CABLE_POINT.CABLE_POINT_TYPE) AS
[No of 12f Loop Joint]

FROM CABLE_POINT

GROUP BY CABLE_POINT.PROJECT_ID,
CABLE_POINT.CABLE_POINT_TYPE, CABLE_POINT.CAPACITY

HAVING
(((CABLE_POINT.PROJECT_ID)=Forms!frmReqForm!txtProject_ID) And
((CABLE_POINT.CABLE_POINT_TYPE)="Break Out Splice Point") And
((CABLE_POINT.CAPACITY)="12"));

The query runs from a form where I have a combo box to
select different projects according to their ID. Sometime a project might have
a record of CABLE_POINT.CABLE_POINT_TYPE ="Break Out Splice Point"
And CABLE_POINT.CAPACITY="12" present. Sometimes it won’t. Therefore
if there is no record present I want to the query to display "0"

Now, the problem with SQL Aggregate Function (such as Count, Avg, Min, Max or Sum) is that they only return values based on the number of rows selected that meet the given criteria. If no rows in the
table are returned, then there’s no value to be calculated. And while the DLookup
Domain Aggregate function will return Null in that case (so that you can use
the Nz function to return a value of zero), because nothing is returned by the
query, there’s nothing for you to grab to try and coerce into a value.

Query Basics

Before I go any further, I want to discuss a pet peeve of
mine with Totals queries generated in Access. They’re usually not generated
correctly!


SELECT fieldlist
FROM table
WHERE selectcriteria
GROUP BY groupfieldlist
[ HAVING groupcriteria]

Both WHERE and HAVING affect the details of the records
displayed, but there is a fundamental difference between them. WHERE specifies
which records are acted upon by the SELECT statement, whereas HAVING specifies
which records are displayed. What does this mean when doing Totals queries
(i.e. queries that use Aggregate functions)?

In the query posted to UtterAccess, the intent is to limit
the results to only those rows that meet the three given criteria. Since we
don’t want to see any records which don’t meet those criteria, they really
should be in a WHERE clause. While the difference may be subtle, when you’re dealing
with large tables, if you can reduce the number of rows that need to be
aggregated, you can speed the query up significantly. The HAVING clause should
be limited to those cases where the display criteria depends on the results of
grouping the records. In other words, whatever is specified in a HAVING clause
should include an aggregate function: HAVING Sum(UnitsInStock) > 100, HAVING
Count(*) <= 1 and so on.

As well, note that there’s no reason to include fields in the
GROUP BY clause unless those fields are included in the SELECT field list. In fact, having fields that don’t appear in the list can actually cause your query to return unexpected results: additional rows will be returned to account for changes in the fields not displayed.

Finally, using Count(*) is considerably faster than Count([FieldName]).
There is, of course, a difference between the two: the Count function doesn’t
count records that have a Null value in the field when you use a field name,
whereas when you use the asterisk wildcard character, the Count function calculates
the total number of records, including those that contain Null fields. Since in
this particular case a desired value is being specified in the WHERE clause, the
SELECT statement will never return a Null value for CABLE_POINT_TYPE (and, as I’ve
already mentioned, the query isn’t going to return a row with Null values if no
match is found). Besides, we want a zero value, and if Count ignores Null
values, that wouldn’t give us what we wanted anyhow!

All that being said, let me now rewrite the original query
more correctly, as shown in Listing 1.


SELECT COUNT(*) AS [No of 12f Loop Joint]
FROM CABLE_POINT
WHERE PROJECT_ID=Forms!frmReqForm!txtProject_ID
AND CABLE_POINT_TYPE=”Break Out Splice Point”
AND CAPACITY=”12″


Listing 1: SQL for a more efficient version of the original query

Now, I believe that the reason why the generated SQL is badly
formed is due to how people create such queries. The usual approach is to
select the table, drag the fields into the grid, change the query into a Totals
query and put criteria under the appropriate fields. The problem is, when you
convert a SELECT query to a Totals query, Access assumes a value of GROUP BY in
the Total row under each field. Many people don’t realize that you can change
that GROUP BY to WHERE to get the improvement I discuss above. Figure 1 shows
how to do this.

An illustration of how to generate a better Totals query in Access

Figure 1: An illustration of how to generate a better
Totals query in Access.

Forcing data

For the particular problem as posed, it’s relatively simple
to cause a row to be generated whether or not there’s data. Since we’re dealing
with totals, and adding zero to any number results in the same number, what
happens if we force a row with a zero value into the mix, and then sum the
resultant recordset?

We can create a UNION query (qryS2_UnionQuery, shown in
Listing 2) that adds a row having a value of 0:


SELECT COUNT(*) AS CountRows
FROM CABLE_POINT
WHERE PROJECT_ID=Forms!frmGetTotals!cboProject
AND CABLE_POINT_TYPE=”Break Out Splice Point”
AND CAPACITY=”12″
UNION
SELECT DISTINCT 0
FROM CABLE_POINT

Listing 2: SQL for qryS2_UnionQuery, a query that will ensure that the recordset returned will always have a least one row.

Figure 2 illustrates what this query will return if there is
data that matches on the three criteria, while Figure 3 shows what will be
returned when there is no matching data.

Resultant recordset when matching data exists in the table. Note that two rows are returned

Figure 2: Resultant recordset when matching data exists in the table. Note that two rows are returned.

Resultant recordset when matching data does not exist in the table. Note that only one row is returned.

Figure 3: Resultant recordset when matching data does not exist in the table. Note that only one row is returned.

Instead of the original query, we now can use:


SELECT Sum(CountRows) AS NumberOfJoints
FROM qryS2_UnionQuery

Listing 3: SQL for qryS2_SumBasedOnUnion, a query sums together the rows returned by qryS2_UnionQuery to return a single row.

Note that it’s actually possible to do this as a single
query:


SELECT Sum(SubSelect.CountRows) AS NumberOfJoints
FROM
(
SELECT Count(*) AS CountRows
FROM Cable_Point
WHERE (Project_ID=Forms![frmGetTotals]![cboProject])
AND (Cable_Point.Cable_Point_Type=Forms![frmGetTotals]![cboType])
AND (Cable_Point.Capacity=Forms![frmGetTotals]![cboCapacity])
UNION
SELECT DISTINCT 0
FROM Cable_Point) AS SubSelect

Listing 4: SQL for qryS2_SingleQuery, a query that combines qryS2_UnionQuery and qryS2_SumBasedOnUnion into a single query.

Incidentally, while what I’ve written above is syntactically
correct, Access may change the bracketing on that to the following:


SELECT Sum(SubSelect.CountRows) AS NumberOfJoints
FROM [SELECT Count(*) AS CountRows
FROM Cable_Point
WHERE (Project_ID=Forms![frmGetTotals]![cboProject])
AND (Cable_Point.Cable_Point_Type=Forms![frmGetTotals]![cboType])
AND (Cable_Point.Capacity=Forms![frmGetTotals]![cboCapacity])
UNION
SELECT DISTINCT 0
FROM Cable_Point]. AS SubSelect;

Listing 5: SQL for qryS2_SingleQuery, as Access will may render it.

Using Crosstab Queries

Crosstab queries calculate a value (sum, average, count, or
other type of total) for data that is grouped by two types of information — one
down the left side of the datasheet and another across the top. While a
crosstab query will generate Null values if there is no value for a particular
intersection of the left-side field and the top field, like regular Select
queries, it will not return Null values if there are no values for the
particular left side field for all top fields or vice versa. To illustrate what
I mean, the sample database that accompanies this
article has a simplistic example where a particular company has three different
products (with the imaginative names Product A, Product B and Product C) that it
sells to a total of five different customers. Figure 4 shows the
Entity-Relationship Diagram for the tables required to represent the orders
placed, while Figure 5 shows the limited data entered for this fictitious
company as generated by the SQL presented in Listing 6.

Entity-Relationship Diagram for the Order Taking system

Figure 4: Entity-Relationship Diagram for the Order Taking system.


SELECT Customers.CustomerId,
OrderDetails.OrderId,
OrderDetails.ProductId,
Customers.CustomerNm,
Products.ProductDs,
OrderDetails.QuantityAm,
Orders.OrderDt,
Format([OrderDt],”mmm, yyyy”) AS OrderMonth
Format([OrderDt],”yyyy-mm”) AS SortMonth
FROM Products
INNER JOIN ((Customers
INNER JOIN Orders
ON Customers.CustomerId = Orders.CustomerId)
INNER JOIN OrderDetails
ON Orders.OrderId = OrderDetails.OrderId)
ON Products.ProductId = OrderDetails.ProductId

Listing 6: SQL for qryS3_OrderDetails, a query that returns a denormalized view of all of the order details.

Resultant recordset from qryS3_OrderDetails, showing sample data

Figure 5: Resultant recordset from qryS3_OrderDetails, showing sample data.

To summarize the order details, we can create a Crosstab
query based on qryS3_OrderDetails as shown in Listing 7, producing the recordset
shown in Figure 6.


TRANSFORM Sum(QuantityAm) AS SumOfQuantityAm
SELECT OrderMonth
FROM qryS3_OrderDetails
GROUP BY OrderMonth, SortMonth
ORDER BY SortMonth
PIVOT ProductDs

Listing 7: SQL for qryS3_OrderDetails_Crosstab, a query that summarizes the data returned by qryS3_OrderDetails

Resultant recordset from qryS3_OrderDetails_Crosstab, showing summarized data

Figure 6: Resultant recordset from qryS3_OrderDetails_Crosstab, showing summarized data.

Looking at Figure 6, you might notice that there are no
orders listed for Product B, nor are there any orders listed for the month of
March (nor any month after April…). Usually our users are going to want to
see zeroes for all products not ordered in a given month.

We can create a query that returns one row for each month of
the current year using SQL such as is shown in Listing 8, resulting in a recordset
such as is shown in Figure 7.


SELECT DateSerial(Year(Date()),[MonthNumber],1) AS MonthStart,
DateSerial(Year(Date()),[MonthNumber]+1,0) AS MonthEnd,
Year(Date()) AS YearNumber,
Months.MonthNumber,
Format(DateSerial(Year(Date()),[MonthNumber],1),”yyyy-mm”) AS SortMonth
FROM Months
ORDER BY Months.MonthNumber;

Listing 8: SQL for qryS3_MonthsThisYear, a query that returns one row for each month in the current year.

(The Month table was described in last month’s article Uses
for Cartesian Products in MS Access
)

Resultant recordset from qryS3_MonthsThisYear.

Figure 7: Resultant recordset from qryS3_MonthsThisYear.

If we join qryS3_MonthsThisYear to the Products table, as shown in Listing 9, we can return one row for each product/month combination. (Look closely: this is another Cartesian product, as discussed last month!) Just to summarize that data, I created a crosstab query on qryS3_MonthsThisYear, shown in Figure 8.


SELECT qryS3_MonthsThisYear.MonthStart,
qryS3_MonthsThisYear.MonthEnd,
Products.ProductDs,
0 AS QuantityAm,
qryS3_MonthsThisYear.SortMonth
FROM Products, qryS3_MonthsThisYear
ORDER BY qryS3_MonthsThisYear.SortMonth, Products.ProductDs;

Listing 9: SQL for qryS3_AllProductMonthCombinations, a
query that returns one row for each combination of months in the current year
and products sold.

Resultant recordset from qryS3_AllProductMonthCombinations

Figure 8: Resultant recordset generated by a crosstab query based on query qryS3_AllProductMonthCombinations.

Now that we have a query that does have one row for all
possible outcomes, it’s fairly straightforward to join that table to our Order
Details using a Left Join, as shown in Figure 9, which produces the SQL
statement shown in Listing 10.

SQL for qryS3_AllProductMonthCombinations, a query that returns one row for each combination of months in the current year and products sold

Figure 9: Creating query qryS3_OrderDetails_AllProductMonthCombinations
using a Left Join.


SELECT Format(A.MonthStart,”mmm, yyyy”) AS OrderMonth,
A.ProductDs,
CLng(Nz(B.[QuantityAm],0)) AS QuantityAm,
A.SortMonth
FROM qryS3_AllProductMonthCombinations AS A
LEFT JOIN qryS3_OrderDetails AS B
ON (A.SortMonth = B.SortMonth)
AND (A.ProductDs = B.ProductDs)

Listing 10: SQL for qryS3_OrderDetails_AllProductMonthCombinations,
which will have at least one row for each combination of Month and Product.

Now, creating a crosstab based on
qryS3_OrderDetails_AllProductMonthCombinations, such as shown in Listing 11,
will result in having zeroes for any product not ordered in a particular month,
as shown in Figure 10.


TRANSFORM Sum(QuantityAm) AS SumOfQuantityAm
SELECT OrderMonth
FROM qryS3_OrderDetails_AllProductMonthCombinations
GROUP BY OrderMonth, SortMonth
ORDER BY SortMonth
PIVOT ProductDs

Listing 11: SQL for qryS3_OrderDetails_AllProductMonthCombinations_Crosstab,
which will include zeroes for all combinations not ordered.

 Resultant recordset from qryS3_OrderDetails_AllProductMonthCombinations_Crosstab

Figure 10: Resultant recordset from qryS3_OrderDetails_AllProductMonthCombinations_Crosstab

Conclusion

In the preceding discussion, I’ve attempted to show two
different scenarios where it’s necessary to create data in order to be able to
report zero values in queries. Now, I’ll acknowledge that the queries involved
may be slower than "straight" queries against the data that does
exist, but customers often have very specific ideas about what they want to
see. Note that since I used Union queries and Cartesian products in order to
produce the data, and Union queries and Cartesian products both result in recordsets
that are read-only, these techniques are really only appropriate for Reports .

»


See All Articles by Columnist
Doug Steele

Doug Steele
Doug Steele
Doug Steele has worked with databases, both mainframe and PC, for many years. Microsoft has recognized him as an Access MVP for his contributions to the Microsoft-sponsored newsgroups.

Latest Articles