Working With Columns That Contain Null Values

When a column in a row contains a NULL what does this mean?
So what is a NULL? Is a NULL value a zero, space, or something else? From a
SQL Server perspective a NULL is not a value, it only means that a value was
not provided when the row was created. Special considerations need to be taken
into account when dealing with nulls in SQL Server. This article will discuss
some programming issues to be aware of when working with nulls, and some
different techniques that can be used when dealing with nulls.

Comparing Nulls

When comparing columns that have a NULL value you need to be
aware how SQL Server will process your comparison. Suppose you have the
following CLIENT table and you want to write a query to return all the rows
where the LNAME is NULL:


ID FNAME LNAME
———– ———- ———-
1 Jane Doe
1 Kathy Jones
3 Mary Poppins
4 Doris NULL
5 Melinda NULL

An entry level T-SQL programmer will sometimes assume that a
NULL is just a value and code their query to look like this:


SELECT id, fname, lname from CLIENT
where lname = NULL

This command does not get a syntax error, but it also may not
return any rows, if your database or connection ANSI_NULL setting is set ON.
An ANSI_NULL setting of ON is normally the default settings for databases and
connections. When ANSI_NULL setting is set to ON, the ANSI SQL-92 standard
states that any equal (=) or non equal (<>) statement must equate to
FALSE when compared against a NULL. For this reason the WHERE clause in the
above statement will always be false when the ANSI_NULL setting of a database
or connection is ON. To find all the CLIENTS who have a NULL LNAME you need to
use the IS NULL expression, like so when the ANSI_NULL setting is ON:


SELECT id, fname, lname FROM CLIENT
where lname IS NULL

If you really want to use non-standard ANSI coding then you
can get my first SELECT statement to return rows where the LNAME is NULL by
changing the ANSI_NULL setting to OFF. Doing this will cause both the above
SELECT statements to return the same results. Here is an example of how to
change the connection settings for ANSI_NULL by issuing a SET command:


SET ANSI_NULLS OFF
SELECT id, fname, lname FROM CLIENT
WHERE lname = NULL

Setting Columns to NULL

When you are inserting a row into a table there are two
different ways to set a column to NULL. The first method is to specify a value
of NULL on the INSERT statement like so:

INSERT INTO CLIENT (ID,FNAME, LNAME) values(6,'Heather',NULL)

The second way is to specify the columns to be populated on
the INSERT statement and exclude those columns where you want the column values
to be set to NULL. Here is an example for this method:

INSERT INTO CLIENT (ID,FNAME) values(4,'Doris')

If a column is already set
to a value, and you want to change it to NULL then you would need to
specifically set it to NULL on an UPDATE statement, like so:


UPDATE CLIENT
SET LNAME = NULL
WHERE LNAME = ‘Poppins’

Nulls in Aggregate function

When working with columns that have NULL values you need to
be aware that NULL values may affect the results when these NULL value columns
are used in aggregate functions. Let me show you what I am talking about. Say
you have a table that looks like the following:


color number
—– ———–
Red 7
White 6
Blue 1
Red 7
White NULL
Blue 1

Here I have a single row that has a number with a NULL value.
So let me show you how this NULL value affects the results of the aggregate value
for the color “White.” Suppose I want to calculate the average value for the
number column for each color. I will calculate the average two different ways
to demonstrate how NULL values can affect the aggregate calculations. The first
method will calculate the average using the SUM and COUNT functions, and the
second method will calculate the average using the AVG function. The following
query uses both of these methods to calculate the average for the number column
for each color in the colors table:


SELECT color, sum(number)/count(*) as CAVG, AVG(number) as AVG
from colors group by color

When I run this query I get
the following results:


color CAVG AVG
—– ———– ———–
Blue 1 1
Red 7 7
White 3 6

Warning: Null value is eliminated by an aggregate or other SET operation.

Note that the color “White” has two different values for the average,
plus there is a warning message printed. The value for CAVG, the column I
calculated using the SUM and COUNT functions, is different then the average that
was calculated using the AVG function.

Why do these two different methods calculate different
values? The reason for the different results is that most aggregate functions
ignore columns that contain a NULL value, where as the COUNT function does not
ignore columns that have a NULL value. Because the AVG function ignores
columns with a NULL value, the row where the color was “White” and the number
value was NULL was not considered when calculating the average for the color
“White”. However, since the COUNT function does not ignore columns that
contain a NULL value, the row that contained a NULL value was considered when
the SUM and COUNT functions where used to calculate the average.

You can make the COUNT function behave like the AVG
function, and make it ignore columns that contain a NULL value. This is done
by specifying the actual column name within the COUNT function call. By
changing my original query to look like the query below, the AVG function and
the SUM / COUNT functions calculation produce the same average value:


SELECT color, sum(number)/count(number) as CAVG, AVG(number) as AVG
from colors group by color

Should you want the AVG function to include columns that
contain a NULL value, then you can use the ISNULL function within the AVG
function call to accomplish this. Here is a query that uses the ISNULL
function to make the AVG function calculate the same average as my original SUM
/ COUNT calculation:


SELECT color, sum(number)/count(*) as CAVG, AVG(ISNULL(number,0)) as AVG
from colors group by color

As you can see from these examples aggregating columns that
contain a NULL value might give you different results depending on how you
calculate the aggregate. You need to understand why the different results
occur in order to make sure you are actually getting the calculated results
that you desire.

Conclusion

Understanding how to work with and how columns that contain
a NULL value affect your queries is an important concept. I hope that you now
have a better understanding of how to compare columns that contain a NULL value,
plus make sure you get the correct results when using aggregate functions across
columns that contain a NULL value.

»


See All Articles by Columnist
Gregory A. Larsen

Gregory Larsen
Gregory Larsen
Gregory A. Larsen is a DBA at Washington State Department of Health (DOH). Greg is responsible for maintaining SQL Server and other database management software. Greg works with customers and developers to design and implement database changes, and solve database/application related problems. Greg builds homegrown solutions to simplify and streamline common database management tasks, such as capacity management.

Latest Articles