Inner Join Introduction
An inner join is used when you need to match rows from two tables. Rows that match remain in the result, those that don’t are rejected. The match condition is commonly called the join condition. When the match conditions involve equality, that is matching exactly the contents of one column to another, the join is called an equijoin. You’ll find that most of the joins you’ll use are equijoins. A common situation is where you need to join the primary key of one table to the foreign key of another. This is needed when you are denormalizing data.
Note: The series starts with the article Introduction to Database Joins. All the examples for this lesson are based on Microsoft SQL Server Management Studio and the AdventureWorks2012 database. You can get started using these free tools using my Guide Getting Started Using SQL Server. In this article we are going to cover inner joins.
Basic Structure of INNER JOIN
Below is an example of simple select statement with an INNER JOIN clause.
SELECT columnlist FROM maintable INNER JOIN secondtable ON join condition
Consider if you want to create a directory of people and their phone numbers. To do so you need to combine rows from the Person and PersonPhone tables.
The SQL to do this is:
SELECT Person.FirstName, Person.LastName, PersonPhone.PhoneNumber FROM Person.Person INNER JOIN Person.PersonPhone ON Person.BusinessEntityID = PersonPhone.BusinessEntityID
This type of join is called an equi-join, since we are using an equality for the join condition. It pretty safe to say that the vast majority of the joins you encounter are equi-joins. There are special circumstances where is makes sense to use other comparisons such as not equals or greater than. Those type of joins, called non equi-joins, are explored later in this post. Once the data is joined it can also be filtered and sorted using familiar WHERE and ORDER BY clauses. The order of the clauses is what you might expect:
SELECT columnlist FROM maintable INNER JOIN secondtable ON join condition WHERE filter condition ORDER BY columnlist
The WHERE clause can refer to any of the fields from the joined tables. To be clear, it is best practice to prefix the columns with the table name. The same goes for sorting. Any field can be used; however, it is common to sort by one or more fields you are selecting. Let’s take our phone directory, but only list people, in order of last name, those whose last names start with C. With this requirement in mind, our SQL statement becomes:
SELECT Person.FirstName, Person.LastName, PersonPhone.PhoneNumber FROM Person.Person INNER JOIN Person.PersonPhone ON Person.BusinessEntityID = PersonPhone.BusinessEntityID WHERE Person.LastName LIKE 'C%' ORDER BY Person.LastName
Notice the join condition remains the same. The only changes are the addition of the WHERE and ORDER BY clauses. By now these should be familiar to you.
As you create more complicated SQL statements they can become difficult to read. Much is of this is due to the verbose (wordy) nature of the language. You can cut down on some of this by using another name for your tables in statements. This is done using table aliases. Think of aliases like nicknames. They are easily defined. Just place them after the first reference of the table name. Here is a simple example where we alias the Person table as P:
SELECT P.FirstName, P.LastName FROM Person.Person AS P
In this example we used the letter P. It could really be any number of characters, but I like to keep aliases simple. Where aliases really come in handy is with joins. Let’s take last join we did, but this time write it by aliasing Person as P and PersonPhone as PP. Here is the statement
SELECT P.FirstName, P.LastName, PP.PhoneNumber FROM Person.Person AS P INNER JOIN Person.PersonPhone AS PP ON P.BusinessEntityID = PP.BusinessEntityID WHERE P.LastName LIKE 'C%' ORDER BY P.LastName
Regarding naming, I suppose you could alias your tables, A,B,C, but it can become hard to remember the aliases. My system to keep the aliases short, yet easy to remember, is it use the first letter of the table. If the table has two words, like PersonPhone, I’ll typically use the first letter of each word.
Join with More than One Field
In some cases you may find that you need to join two or more fields together. This happens when a table’s primary key consists of two or more columns. This is easily done using an AND operator and an additional join condition.
Consider an example where we want to know all the days and times each instructor teaches a class. In our example the primary key for the Class table is two fields: ClassName and Instructor. To construct the schedule we need to join Class to Section by matching both ClassName and Instructor.
SELECT C.Instructor, C.ClassName, S.Day, S.Hour FROM Class AS C INNER JOIN Section AS S ON C.ClassName = S.ClassName AND C.Instructor = S.Instructor ORDER BY C.Instructor, S.Day, S.Hour
Note: This example won’t work in the AdventureWorks2012 database
When creating joins it is important to pay attention to your data. In our example, if we mistakenly joined Class with the Section table using only the ClassName column, then each instruction’s class listing would also include schedules from other instructions that teach the same class. A joins will match as many rows as it can between tables. As long as the join condition is satisfied, combinations of rows are included. Because of this, it is important to understand each table’s primary key definition and to know how tables’ are meant to relate to one another.
Joining Three or More Tables
In the AdventureWorks2012 database, you need to join to the PhoneNumberType table to know whether a listed phone number is a home, cell, or office number. In order to create a directory of names, phone number types, and numbers three tables must be joined together. This relationship is shown below:
Including another table in our query is as simple as adding another INNER JOIN clause to our statement.
SELECT P.FirstName, P.LastName, PP.PhoneNumber, PT.Name FROM Person.Person AS P INNER JOIN Person.PersonPhone AS PP ON P.BusinessEntityID = PP.BusinessEntityID INNER JOIN Person.PhoneNumberType AS PT ON PP.PhoneNumberTypeID = PT.PhoneNumberTypeID WHERE P.LastName LIKE 'C%' ORDER BY P.LastName
I don’t think there is any hard fast rule to how you format the join clause, I tend to separate them on their own lines and when specifying the columns, place the “join from” column first, as:
INNER JOIN ToTable On FromTable.PKID = ToTable.FKID
For all the examples in this article, I first wrote the statement and then used TidySQL to automatically format them.
A self-join is when you join a table to itself. When you use a self-join it is important to alias the table. Suppose we wanted to list all departments within a department group. In the AdventureWorks2012 database this is done using the following self-join:
SELECT D1.Name, D2.Name FROM HumanResources.Department AS D1 INNER JOIN HumanResources.Department AS D2 ON d1.GroupName = d2.GroupName
Since the Department table is being joined to itself, we need distinguish the two occurrences of Department. They are aliased as D1 and D2. Other than the tables being the same, you see there is nothing remarkable about this type of join.
A non equi-join is just a fancy way of saying your join condition doesn’t have an equals sign in it. You may be scratching your head wondering why you would want to use a non equi-join. I agree, to be honest, it can be hard to think up examples; however, they do exist, and you’ll appreciate knowing how to use them when you need to do so. One reason to use a non equi-join is when you need to verify that you have clean data. Consider the AdventureWorks2012 Products table. You may want to check that the product name is unique for each product listed. One way to do this is to self-join the product table on Name.
SELECT P1.ProductID, P1.Name FROM Production.Product AS P1 INNER JOIN Production.Product AS P2 ON P1.Name = P2.Name ORDER BY P1.ProductNumber
This query allows us to compare values for each row that matches the Product Name. If Name is unique, then the only matches you should have are when the ProductID’s are equal. So, if we are looking for duplicate Product Name, it follows that we want to find records that match having different ProductID’s. This is shown in the diagram below
The SQL to return the duplicate records is:
SELECT P1.ProductID, P1.Name, P1.ProductNumber FROM Production.Product AS P1 INNER JOIN Production.Product AS P2 ON P1.Name = P2.Name AND P1.ProductID <> P2.ProductID ORDER BY P1.ProductNumber
You may wonder what happens first. Does the computer first match the names and then, once that is complete looks for ProductID’s that don’t match, or does is somehow evaluate both conditions as it builds the result? The beauty of SQL is that, in theory, the way you write your code shouldn’t have an influence on how the database actually retrieves the data. The order of the joins or conditions shouldn’t significantly influence the time a query runs. This will all be explored when we talk about the query optimize and how to review query plans.