What is a Database NULL Value?

Null and unknown values

What is a Null Value?

In databases a common issue is what value or placeholder do you use to represent a missing values.   In SQL, this is solved with null.  It is used to signify missing or unknown values.  The keyword NULL is used to indicate these values.  NULL really isn’t a specific value as much as it is an indicator.  Don’t think of NULL as similar to zero or blank, it isn’t the same.  Zero (0) and blanks “ “, are values.

In most of our beginning lessons we’ve assumed all tables contained data; however, SQL treats missing values differently.  It is important to understand how missing values are used and their effect on queries and calculations.

Database Tables and Values

In many situations every row and column will contain data, but there cases where it makes sense for some columns to not contain a value.

Customer 

CustomerID Name City State Age Sex
2 Bob Peterson New York NY 55 M
3 Sue Linkle Omaha NE F
4 Chris Green MI
5 Lori Otterman Oslo
6 Acme Plumbing Austin TX

 

Consider the above Customer table.  Several columns contain missing or unknown values.  Reasons why this can happen include a value is:

  • missing – Perhaps a customer, such as Sue, doesn’t divulge her age to your customer service representative.
  • unknown – An employee’s termination date is usually some event in the unforeseen future.
  • doesn’t apply – If the customer is a business, then Sex doesn’t apply.

You could argue that for text values you could use blanks, such as one space ‘ ‘, or even an empty value, which is two single quotes ‘’ to represent a missing value.  Yet this strategy doesn’t work well for numbers or dates.  If the customer’s age is unknown, what numeric value would you use?  Would it make sense to use zero or a negative number?

I think doing that causes more confusion and would make it very easy to skew results.  For instance, if you were going to calculate the average age of your female customers, the zeros, the ones you were using to represent missing values, would cause the actual average age to be lower than you would expect.

SQL reserves the NULL keyword to denote an unknown or missing value.  In SQL NULL’s are treated differently than values.  Special consideration is required when comparing null value or using them in expressions.

NULL Value in Comparisons:

When it isn’t possible to specially code your data using “N/A” you can use the special keyword NULL to denote a missing value.  NULL is tricky.  NULL isn’t a value in the normal sense.  For instance no two NULL are equal to each other.  Surprisingly NULL = NULL is FALSE!

SQL covers this dilemma though.  You can use the IS NULL and IS NOT NULL comparisons to test for NULL values.

For example, the following query from the AdventureWorks2012 database

SELECT ProductID,
       Name,
       Color
  FROM Production.Product
 WHERE Color IS NULL

will return all products whose color is missing; whereas

SELECT ProductID,
       Name,
       Color
  FROM Production.Product
 WHERE Color IS NOT NULL

will return all products having a color value.

NULLS in Expressions

As you can expect nulls have an adverse effect in expressions.  Since NULL denotes an undefined value, its participation in most expressions renders the expression unknown, or NULL, as well.

Check out the following arithmetic expression.  The first line evaluates to a number the others to NULL:

  • 20 + (5 * 4) = 20 + 20 = 40
  • NULL + (5 * 4) = NULL + 20 = NULL
  • 20 + (NULL * 4) = 20 + NULL = NULL

NULL has the same effect on text. Expressions:

  • ‘Happy’ + ‘ ‘ + ‘Holidays’ = ‘Happy Holidays’
  • NULL + ‘ ‘ + ‘Holidays’ = NULL

Try running the following query in the AdventureWorks2012 database:

SELECT ProductID,
       Color + ' ' +Name
  FROM Production.Product

Did you notice you either get value like “Black Chainring” or NULL?  You may expect to see just the product‘s name if the color is missing, but since the color is NULL, the expression is also NULL.

Without help, it would be tough to create expressions, as the NULL’s would “ruin” most results!

In this case you can use the COALESCE function to assist.  This function will return the first non-NULL value from a list of parameters.  For example

COALESCE(NULL, 'A', 'B')

Returns ‘A’, since it is the first non-NULL value found.

We can use COALESCE to replace a NULL with another value.  In this way we can then continue to build an expression that will return a result free of NULL value.

Our improved SQL statement is:

SELECT ProductID,
       COALESCE(Color + ' ','') + Name
FROM   Production.Product

How does the work?  I’ve highlighted some important bits in the statement:

  • If color is the value “Black,” then the COALESCE function will return “Black “ with the training space.  The product name is then appended to form the full value.
  •  If the color is NULL, then COALESCE seeks out the first non-null value.  This happens to be the empty string ”.  Using this trick, it makes it really easy to return either the combination of two columns, or just the second column.

COALESCE Function

 

NULL Values in Where Clauses

Where clauses are used to limit the rows returned from a query.  Generally only rows meeting the where clause are returned.  Rows whose where clause evaluates to FALSE are removed from the result.

In similar fashion if the where clause evaluates to NULL, the row is eliminated.

SELECT ProductID,
       Name,
       Color
FROM   Production.Product
WHERE  LEN(COLOR) < 100

Will return all rows where a color is specified.  LEN is a function that returns the number of characters in a value; LEN(‘Black’) returns 5.  Since this is less than 100 it would be included in the result.

In cases where COLOR is NULL, then LEN(COLOR) returns NULL.  Since NULL < 100 is always false, the row is removed from the result.

 

NULL in Boolean Expressions (Intermediate)

Since NULL represents a missing or unknown value, its effect on Boolean expressions is curious.  It status as an unknown value puts a spin on the outcome.

I think the results for AND are what you would expect, but the results you would expect for OR are not.  In many cases when a value is unknown, the uncertainty of the value translates through the expression.

Boolean AND Operator

The AND operator returns a TRUE only if all conditions are also TRUE.  The following truth table shows all combinations of values the condition (A AND B).  Keep in mind we use NULL to denote unknown values.

Condition A Condition B Result
TRUE FALSE FALSE
TRUE TRUE TRUE
TRUE NULL NULL
FALSE FALSE FALSE
FALSE TRUE FALSE
FALSE NULL NULL
NULL FALSE NULL
NULL TRUE NULL
NULL NULL NULL

The one of the inputs being unknown, there is no possibility the expression can be TRUE, as we can’t be certain both inputs are TRUE.  Conversely, we don’t know whether the unknown value is FALSE.  This leaves to concluding the result, in general is unknown.

Boolean OR Operator

The OR operator returns a TRUE when one or more conditions are also TRUE.  Here is the Truth table for the OR operator.

Condition A Condition B Result
TRUE FALSE TRUE
TRUE TRUE TRUE
TRUE NULL TRUE
FALSE FALSE FALSE
FALSE TRUE TRUE
FALSE NULL NULL
NULL FALSE NULL
NULL TRUE TRUE
NULL NULL NULL

You’ll see that in every case one of the conditions is true, so is the end result.  This is even the case when the other input is NULL.  All that matters is one input is TRUE, the other input, albeit unknown, is irrelevant.

If you have other questions you want answered, then post a comment or tweet me.  I’m here to help you. What other topics would you like to know more about?

Click Here to Leave a Comment Below 0 comments