Working With Columns That Contain Null Values
August 31, 2004
When a column in a row contains a NULL what does this mean? So what is a NULL? Is a NULL value a zero, space, or something else? From a SQL Server perspective a NULL is not a value, it only means that a value was not provided when the row was created. Special considerations need to be taken into account when dealing with nulls in SQL Server. This article will discuss some programming issues to be aware of when working with nulls, and some different techniques that can be used when dealing with nulls.
When comparing columns that have a NULL value you need to be aware how SQL Server will process your comparison. Suppose you have the following CLIENT table and you want to write a query to return all the rows where the LNAME is NULL:
ID FNAME LNAME ----------- ---------- ---------- 1 Jane Doe 1 Kathy Jones 3 Mary Poppins 4 Doris NULL 5 Melinda NULL
An entry level T-SQL programmer will sometimes assume that a NULL is just a value and code their query to look like this:
SELECT id, fname, lname from CLIENT where lname = NULL
This command does not get a syntax error, but it also may not return any rows, if your database or connection ANSI_NULL setting is set ON. An ANSI_NULL setting of ON is normally the default settings for databases and connections. When ANSI_NULL setting is set to ON, the ANSI SQL-92 standard states that any equal (=) or non equal (<>) statement must equate to FALSE when compared against a NULL. For this reason the WHERE clause in the above statement will always be false when the ANSI_NULL setting of a database or connection is ON. To find all the CLIENTS who have a NULL LNAME you need to use the IS NULL expression, like so when the ANSI_NULL setting is ON:
SELECT id, fname, lname FROM CLIENT where lname IS NULL
If you really want to use non-standard ANSI coding then you can get my first SELECT statement to return rows where the LNAME is NULL by changing the ANSI_NULL setting to OFF. Doing this will cause both the above SELECT statements to return the same results. Here is an example of how to change the connection settings for ANSI_NULL by issuing a SET command:
SET ANSI_NULLS OFF SELECT id, fname, lname FROM CLIENT WHERE lname = NULL
Setting Columns to NULL
When you are inserting a row into a table there are two different ways to set a column to NULL. The first method is to specify a value of NULL on the INSERT statement like so:
INSERT INTO CLIENT (ID,FNAME, LNAME) values(6,'Heather',NULL)
The second way is to specify the columns to be populated on the INSERT statement and exclude those columns where you want the column values to be set to NULL. Here is an example for this method:
INSERT INTO CLIENT (ID,FNAME) values(4,'Doris')
If a column is already set to a value, and you want to change it to NULL then you would need to specifically set it to NULL on an UPDATE statement, like so:
UPDATE CLIENT SET LNAME = NULL WHERE LNAME = 'Poppins'
Nulls in Aggregate function
When working with columns that have NULL values you need to be aware that NULL values may affect the results when these NULL value columns are used in aggregate functions. Let me show you what I am talking about. Say you have a table that looks like the following:
color number ----- ----------- Red 7 White 6 Blue 1 Red 7 White NULL Blue 1
Here I have a single row that has a number with a NULL value. So let me show you how this NULL value affects the results of the aggregate value for the color "White." Suppose I want to calculate the average value for the number column for each color. I will calculate the average two different ways to demonstrate how NULL values can affect the aggregate calculations. The first method will calculate the average using the SUM and COUNT functions, and the second method will calculate the average using the AVG function. The following query uses both of these methods to calculate the average for the number column for each color in the colors table:
SELECT color, sum(number)/count(*) as CAVG, AVG(number) as AVG from colors group by color
When I run this query I get the following results:
color CAVG AVG ----- ----------- ----------- Blue 1 1 Red 7 7 White 3 6 Warning: Null value is eliminated by an aggregate or other SET operation.
Note that the color "White" has two different values for the average, plus there is a warning message printed. The value for CAVG, the column I calculated using the SUM and COUNT functions, is different then the average that was calculated using the AVG function.
Why do these two different methods calculate different values? The reason for the different results is that most aggregate functions ignore columns that contain a NULL value, where as the COUNT function does not ignore columns that have a NULL value. Because the AVG function ignores columns with a NULL value, the row where the color was "White" and the number value was NULL was not considered when calculating the average for the color "White". However, since the COUNT function does not ignore columns that contain a NULL value, the row that contained a NULL value was considered when the SUM and COUNT functions where used to calculate the average.
You can make the COUNT function behave like the AVG function, and make it ignore columns that contain a NULL value. This is done by specifying the actual column name within the COUNT function call. By changing my original query to look like the query below, the AVG function and the SUM / COUNT functions calculation produce the same average value:
SELECT color, sum(number)/count(number) as CAVG, AVG(number) as AVG from colors group by color
Should you want the AVG function to include columns that contain a NULL value, then you can use the ISNULL function within the AVG function call to accomplish this. Here is a query that uses the ISNULL function to make the AVG function calculate the same average as my original SUM / COUNT calculation:
SELECT color, sum(number)/count(*) as CAVG, AVG(ISNULL(number,0)) as AVG from colors group by color
As you can see from these examples aggregating columns that contain a NULL value might give you different results depending on how you calculate the aggregate. You need to understand why the different results occur in order to make sure you are actually getting the calculated results that you desire.
Understanding how to work with and how columns that contain a NULL value affect your queries is an important concept. I hope that you now have a better understanding of how to compare columns that contain a NULL value, plus make sure you get the correct results when using aggregate functions across columns that contain a NULL value.