6

Introduction to Outer Joins

Outer Joins

Outer joins are used to match rows from two tables.  Even if there is no match rows are included.  Rows from one of the tables are always included, for the other, when there are no matches, NULL values are included.

The series starts with the article Introduction to Database Joins.  All the examples for this lesson are based on Microsoft SQL Server Management Studio and the AdventureWorks2012 database.  You can get started using these free tools using my Guide Getting Started Using SQL Server. In this article we are going to cover outer joins.

Types of Outer Joins

There are three types of outer joins:

  • Left Outer Join – All rows from the left table are included, unmatched rows from the right are replaced with NULL values.
  • Right Outer Join – All rows from the right table are included, unmatched rows from the left are replaced with NULL values.
  • Full Outer Join – All rows from both tables are included, NULL values fill unmatched rows.

Let’s dig a deeper and explore the left outer join.

Left Outer Join

Check out the following data model.  This is taken from the AdventureWorks2012 database.  In this model, there is 1 Person to 0 or 1 Employees.

Datamodel Used for Outer Join

To construct a list of all Person LastNames, yet also show JobTitle if the Person is an Employee, we need a way of joining the two tables and include Person rows in the result, even if they don’t match Employee.

This type of join is called a left outer join, as all the rows for the table from the left side of the JOIN keyword are included regardless of the match.  The basic syntax for a left outer join is:

SELECT columnlist
FROM   table
LEFT OUTER JOIN othertable ON join condition

The SQL for the join in the diagram above is:

SELECT person.Person.BusinessEntityID,
       Person.Person.LastName,
       HumanResources.Employee.NationalIDNumber,
       HumanResources.Employee.JobTitle
FROM   person.Person
LEFT OUTER JOIN
      HumanResources.Employee
      ON person.BusinessEntityID = Employee.BusinessEntityID

Here are the first results from the query

Left Outer Join Results

Notice how there are NULL values listed in the second row for NationalIDNumber and JobTitle.  This is because there are no employees matching BusinessEntityID 293.

Right Outer Join

Let’s take another look at the diagram, but this time we are doing a right outer join.  As you may have guessed, there isn’t too much difference in the SQL statement between a left outer join and a right outer join.  The basic syntax for a right outer join is:

SELECT columnlist
FROM   table
RIGHT OUTER JOIN othertable ON join condition

Below is our sample query written as a right outer join:

SELECT person.Person.BusinessEntityID,
       Person.Person.LastName,
       HumanResources.Employee.NationalIDNumber,
       HumanResources.Employee.JobTitle
FROM   person.Person
RIGHT OUTER JOIN
      HumanResources.Employee
      ON person.BusinessEntityID = Employee.BusinessEntityID

The key difference is that now we are going to return all records from the Employee table, which is the table to the right of the join keyword.  If a matching Employee record isn’t found, then NULL will be returned for BusinessEntityID and LastName.

Here are the results from the query.

Right Outer Join REsults

I scrolled through all the results and was surprised to not see any null values.

Do you know why?

The answer lies in the data model.  There is a 0..1 to 1 relationship between Employee and Person.  This means that for every Employee there is one Person.  Given this, for the right join there won’t exist any non-matching rows.  With this type of relationship you could have also used an inner join.

Left versus Right Outer Joins

There is no difference in functionality between a left outer join and a right outer join.

The statement

SELECT person.Person.BusinessEntityID,
       HumanResources.Employee.NationalIDNumber
FROM   person.Person
LEFT OUTER JOIN
       HumanResources.Employee
      ON person.BusinessEntityID = Employee.BusinessEntityID

Returns the same result as

SELECT person.Person.BusinessEntityID,
       HumanResources.Employee.NationalIDNumber
FROM   HumanResources.Employee
RIGHT OUTER JOIN
      person.Person
      ON person.BusinessEntityID = Employee.BusinessEntityID

Of course this wouldn’t be the case if I had only changed the join from LEFT to RIGHT and not switched the table names.

I typically use left outer joins more than I do right outer joins.  I think this is because when I draw relationships I do so left to right.  Also, I traverse tables in my head from left to right.

This the fits in well with SQL as the “left” table is in the FROM statement.

I’m curious to know what you use.  I’m really curious to know whether a right joins seem more intuitive to you if you’re native speaker of Arabic or some other “right to left” language.

Full Outer Join

A full outer join is the combination of results from a left and right outer join.  The results returned from this type of join include all rows from both tables.  Where matches occur, values are related.  Where matched from either table don’t, then NULL are returned instead.

The basic syntax for a full outer join is:

SELECT columnlist
FROM   table
FULL OUTER JOIN othertable ON join condition

Let’s take a look at a different portion of the AdventureWork2012 database.  This time, we are going to focus on the relationships between SalesOrderHeader and CurrencyRate tables.

The model is shown below:

Full Outer Join Model

Suppose we want to know all the currencies we can place orders in and which orders were placed in those currencies?

SELECT sales.SalesOrderHeader.AccountNumber,
       sales.SalesOrderHeader.OrderDate,
       sales.CurrencyRate.ToCurrencyCode,
       sales.CurrencyRate.AverageRate
FROM   sales.SalesOrderHeader
FULL OUTER JOIN
       sales.CurrencyRate
       ON sales.CurrencyRate.CurrencyRateID = 
          sales.SalesOrderHeader.CurrencyRateID

Here is a portion of the results showing where some sales have match to a currency and some that haven’t.  The reason there are sales that don’t match is that these are sales in USD.

Full Outer Join Results 1

Further down in the results you see currencies with no matching sales.  This reflects the fact that no sales were made in those currencies.

Full Outerjoins 2

Note:  I was surprised to see USD listed, see row 42463, since I would think a majority of the sales would be in this currency.  My thought is that rather than reverence the currency rate for these transaction, the SalesOrderHeader vale for CurrencyRateID was set to null for all USD transactions.  I think this is inconsistent, and isn’t the way I would do it, but it isn’t my database…

Advanced Example

So far we’ve looked at the three types of outer joins but haven’t explored some more advanced concepts such as joining multiple table and using more than one condition in our join clauses.

We covered these concepts when we explored inner joins, so what I’ll be showing you, shouldn’t be too new, but I think it still makes sense to review, since in some cases mixing full joins with inner joins may produce unexpected or unintended results.

Let’s turn our focus to the production schema and explore products and categories.  Let’s produce a list of all product categories and the product models contained within.

Product has a one to many relationship with ProductModel and ProductSubcategory.  Since it lies between these two tables, there is an implicit many to many relationship between ProductModel and ProductSubcategory.   Because of this, it is a good candidate for outer joins as there is may be product models with no assigned products and ProductSubcategory entries with no product.

Product Category Datamodel

To overcome this situation we will do an outer join to both the ProductModel and ProductCategory table.

Here is the SQL

SELECT   PC.Name AS Category,
         PSC.Name AS Subcategory,
         PM.Name AS Model,
         P.Name AS Product
FROM     Production.Product AS P
FULL OUTER JOIN
         Production.ProductModel AS PM
         ON PM.ProductModelID = P.ProductModelID
FULL OUTER JOIN
         Production.ProductSubcategory AS PSC
         ON PSC.ProductSubcategoryID = P.ProductSubcategoryID
INNER JOIN
         Production.ProductCategory AS PC
         ON PC.ProductCategoryID = PSC.ProductCategoryID
ORDER BY PC.Name, PSC.Name

There are several items to note:

  • I used table aliases to make the SQL more readable.
  • There is more than one full outer join clause.
  • The ProductCategory table is also part of an outer join

Originally when I wrote the SQL for this query I had an inner join between ProductSubcategory and ProductCategory, but I wasn’t seeing NULL values for unmatched records I would expect.

Once I changed the join to a full outer join I saw the results I expected.  The reason this occurs is subtle.

After checking the data I confirmed that all categories are assigned subcategories.  Given this you would think an inner join would work; however, consider that as the entire statement is executed and rows are returned, the ProductSubcategoryID value is NULL whenever a product fails to match a product subcategory.

Null values, by definition, aren’t equal to one another, so the inner join fails.  Given this, when these values are then matched to ProductCategory they aren’t included in the result unless the join to ProductCategory is an outer join.

In fact, the join doesn’t have to be a full outer join, a left join works just as well:

SELECT   PC.Name AS Category,
PSC.Name AS Subcategory,
PM.Name AS Model,
P.Name AS Product
FROM     Production.Product AS P
FULL OUTER JOIN
Production.ProductModel AS PM
ON PM.ProductModelID = P.ProductModelID
FULL OUTER JOIN
Production.ProductSubcategory AS PSC
ON PSC.ProductSubcategoryID = P.ProductSubcategoryID
LEFT OUTER JOIN
         Production.ProductCategory AS PC
         ON PC.ProductCategoryID = PSC.ProductCategoryID
ORDER BY PC.Name, PSC.Name

Uses for Outer Joins

Because outer joins not only the matching rows but also those that don’t, they are a really good way to find missing entries in tables.  This are great when you need to do diagnosis on your database to determine if there are data integrity issues.

For example suppose we were concerned that we may have some ProductSubcategory entries that don’t match to Categories.  We could test by running the following SQL

SELECT   PSC.Name AS Subcategory
FROM     Production.ProductCategory AS PSC
LEFT OUTER JOIN
         Production.ProductSubcategory AS PC
         ON PC.ProductCategoryID = PSC.ProductCategoryID
WHERE PSC.ProductCategoryID is NULL

The outer join returns the unmatched row values as NULL values.  The where clause filters on the non-null values, leaving only nonmatching Subcategory names for us to review.

Outer joins can also be used to ask questions such as:

“What sales persons have never made a sale?”

“What products aren’t assigned to a product model?”

“Which departments don’t have any assigned employees?”

“List all sales territories not assigned sales people.”

Click Here to Leave a Comment Below 6 comments