Introduction to Outer Joins
Outer joins are used to match rows from two tables. Even if there is no match rows are included. Rows from one of the tables are always included, for the other, when there are no matches, NULL values are included.
The series starts with the article Introduction to Database Joins. All the examples for this lesson are based on Microsoft SQL Server Management Studio and the AdventureWorks2012 database. You can get started using these free tools using my Guide Getting Started Using SQL Server. In this article we are going to cover outer joins.
Types of Outer Joins
There are three types of outer joins:
- Left Outer Join – All rows from the left table are included, unmatched rows from the right are replaced with NULL values.
- Right Outer Join – All rows from the right table are included, unmatched rows from the left are replaced with NULL values.
- Full Outer Join – All rows from both tables are included, NULL values fill unmatched rows.
Let’s dig a deeper and explore the left outer join.
Left Outer Join
Check out the following data model. This is taken from the AdventureWorks2012 database. In this model, there is 1 Person to 0 or 1 Employees.
To construct a list of all Person LastNames, yet also show JobTitle if the Person is an Employee, we need a way of joining the two tables and include Person rows in the result, even if they don’t match Employee.
This type of join is called a left outer join, as all the rows for the table from the left side of the JOIN keyword are included regardless of the match. The basic syntax for a left outer join is:
SELECT columnlist FROM table LEFT OUTER JOIN othertable ON join condition
The SQL for the join in the diagram above is:
SELECT person.Person.BusinessEntityID, Person.Person.LastName, HumanResources.Employee.NationalIDNumber, HumanResources.Employee.JobTitle FROM person.Person LEFT OUTER JOIN HumanResources.Employee ON person.BusinessEntityID = Employee.BusinessEntityID
Here are the first results from the query
Notice how there are NULL values listed in the second row for NationalIDNumber and JobTitle. This is because there are no employees matching BusinessEntityID 293.
Right Outer Join
Let’s take another look at the diagram, but this time we are doing a right outer join. As you may have guessed, there isn’t too much difference in the SQL statement between a left outer join and a right outer join. The basic syntax for a right outer join is:
SELECT columnlist FROM table RIGHT OUTER JOIN othertable ON join condition
Below is our sample query written as a right outer join:
SELECT person.Person.BusinessEntityID, Person.Person.LastName, HumanResources.Employee.NationalIDNumber, HumanResources.Employee.JobTitle FROM person.Person RIGHT OUTER JOIN HumanResources.Employee ON person.BusinessEntityID = Employee.BusinessEntityID
The key difference is that now we are going to return all records from the Employee table, which is the table to the right of the join keyword. If a matching Employee record isn’t found, then NULL will be returned for BusinessEntityID and LastName.
Here are the results from the query.
I scrolled through all the results and was surprised to not see any null values.
Do you know why?
The answer lies in the data model. There is a 0..1 to 1 relationship between Employee and Person. This means that for every Employee there is one Person. Given this, for the right join there won’t exist any non-matching rows. With this type of relationship you could have also used an inner join.
Left versus Right Outer Joins
There is no difference in functionality between a left outer join and a right outer join.
SELECT person.Person.BusinessEntityID, HumanResources.Employee.NationalIDNumber FROM person.Person LEFT OUTER JOIN HumanResources.Employee ON person.BusinessEntityID = Employee.BusinessEntityID
Returns the same result as
SELECT person.Person.BusinessEntityID, HumanResources.Employee.NationalIDNumber FROM HumanResources.Employee RIGHT OUTER JOIN person.Person ON person.BusinessEntityID = Employee.BusinessEntityID
Of course this wouldn’t be the case if I had only changed the join from LEFT to RIGHT and not switched the table names.
I typically use left outer joins more than I do right outer joins. I think this is because when I draw relationships I do so left to right. Also, I traverse tables in my head from left to right.
This the fits in well with SQL as the “left” table is in the FROM statement.
I’m curious to know what you use. I’m really curious to know whether a right joins seem more intuitive to you if you’re native speaker of Arabic or some other “right to left” language.
Full Outer Join
A full outer join is the combination of results from a left and right outer join. The results returned from this type of join include all rows from both tables. Where matches occur, values are related. Where matched from either table don’t, then NULL are returned instead.
The basic syntax for a full outer join is:
SELECT columnlist FROM table FULL OUTER JOIN othertable ON join condition
Let’s take a look at a different portion of the AdventureWork2012 database. This time, we are going to focus on the relationships between SalesOrderHeader and CurrencyRate tables.
The model is shown below:
Suppose we want to know all the currencies we can place orders in and which orders were placed in those currencies?
SELECT sales.SalesOrderHeader.AccountNumber, sales.SalesOrderHeader.OrderDate, sales.CurrencyRate.ToCurrencyCode, sales.CurrencyRate.AverageRate FROM sales.SalesOrderHeader FULL OUTER JOIN sales.CurrencyRate ON sales.CurrencyRate.CurrencyRateID = sales.SalesOrderHeader.CurrencyRateID
Here is a portion of the results showing where some sales have match to a currency and some that haven’t. The reason there are sales that don’t match is that these are sales in USD.
Further down in the results you see currencies with no matching sales. This reflects the fact that no sales were made in those currencies.
Note: I was surprised to see USD listed, see row 42463, since I would think a majority of the sales would be in this currency. My thought is that rather than reverence the currency rate for these transaction, the SalesOrderHeader vale for CurrencyRateID was set to null for all USD transactions. I think this is inconsistent, and isn’t the way I would do it, but it isn’t my database…
So far we’ve looked at the three types of outer joins but haven’t explored some more advanced concepts such as joining multiple table and using more than one condition in our join clauses.
We covered these concepts when we explored inner joins, so what I’ll be showing you, shouldn’t be too new, but I think it still makes sense to review, since in some cases mixing full joins with inner joins may produce unexpected or unintended results.
Let’s turn our focus to the production schema and explore products and categories. Let’s produce a list of all product categories and the product models contained within.
Product has a one to many relationship with ProductModel and ProductSubcategory. Since it lies between these two tables, there is an implicit many to many relationship between ProductModel and ProductSubcategory. Because of this, it is a good candidate for outer joins as there is may be product models with no assigned products and ProductSubcategory entries with no product.
To overcome this situation we will do an outer join to both the ProductModel and ProductCategory table.
Here is the SQL
SELECT PC.Name AS Category, PSC.Name AS Subcategory, PM.Name AS Model, P.Name AS Product FROM Production.Product AS P FULL OUTER JOIN Production.ProductModel AS PM ON PM.ProductModelID = P.ProductModelID FULL OUTER JOIN Production.ProductSubcategory AS PSC ON PSC.ProductSubcategoryID = P.ProductSubcategoryID INNER JOIN Production.ProductCategory AS PC ON PC.ProductCategoryID = PSC.ProductCategoryID ORDER BY PC.Name, PSC.Name
There are several items to note:
- I used table aliases to make the SQL more readable.
- There is more than one full outer join clause.
- The ProductCategory table is also part of an outer join
Originally when I wrote the SQL for this query I had an inner join between ProductSubcategory and ProductCategory, but I wasn’t seeing NULL values for unmatched records I would expect.
Once I changed the join to a full outer join I saw the results I expected. The reason this occurs is subtle.
After checking the data I confirmed that all categories are assigned subcategories. Given this you would think an inner join would work; however, consider that as the entire statement is executed and rows are returned, the ProductSubcategoryID value is NULL whenever a product fails to match a product subcategory.
Null values, by definition, aren’t equal to one another, so the inner join fails. Given this, when these values are then matched to ProductCategory they aren’t included in the result unless the join to ProductCategory is an outer join.
In fact, the join doesn’t have to be a full outer join, a left join works just as well:
SELECT PC.Name AS Category, PSC.Name AS Subcategory, PM.Name AS Model, P.Name AS Product FROM Production.Product AS P FULL OUTER JOIN Production.ProductModel AS PM ON PM.ProductModelID = P.ProductModelID FULL OUTER JOIN Production.ProductSubcategory AS PSC ON PSC.ProductSubcategoryID = P.ProductSubcategoryID LEFT OUTER JOIN Production.ProductCategory AS PC ON PC.ProductCategoryID = PSC.ProductCategoryID ORDER BY PC.Name, PSC.Name
Uses for Outer Joins
Because outer joins not only the matching rows but also those that don’t, they are a really good way to find missing entries in tables. This are great when you need to do diagnosis on your database to determine if there are data integrity issues.
For example suppose we were concerned that we may have some ProductSubcategory entries that don’t match to Categories. We could test by running the following SQL
SELECT PSC.Name AS Subcategory FROM Production.ProductCategory AS PSC LEFT OUTER JOIN Production.ProductSubcategory AS PC ON PC.ProductCategoryID = PSC.ProductCategoryID WHERE PSC.ProductCategoryID is NULL
The outer join returns the unmatched row values as NULL values. The where clause filters on the non-null values, leaving only nonmatching Subcategory names for us to review.
Outer joins can also be used to ask questions such as:
“What sales persons have never made a sale?”
“What products aren’t assigned to a product model?”
“Which departments don’t have any assigned employees?”
“List all sales territories not assigned sales people.”