6

Inner Join Introduction

An inner join is used when you need to match rows from two tables.  Rows that match remain in the result, those that don’t are rejected. The match condition is commonly called the join condition.  When the match conditions involve equality, that is matching exactly the contents of one column to another, the join is called an equijoin.  You’ll find that most of the joins you’ll use are equijoins.  A common situation is where you need to join the primary key of one table to the foreign key of another.  This is needed when you are denormalizing data.

Note:  The series starts with the article Introduction to Database Joins.  All the examples for this lesson are based on Microsoft SQL Server Management Studio and the AdventureWorks2012 database.  You can get started using these free tools using my Guide Getting Started Using SQL Server. In this article we are going to cover inner joins.

Basic Structure of INNER JOIN

Below is an example of simple select statement with an INNER JOIN clause.

SELECT columnlist
FROM maintable
INNER JOIN secondtable ON join condition

Consider if you want to create a directory of people and their phone numbers.  To do so you need to combine rows from the Person and PersonPhone tables.

Person-To-Person Data Model

The SQL to do this is:

SELECT  Person.FirstName,
        Person.LastName,
        PersonPhone.PhoneNumber
 FROM   Person.Person
        INNER JOIN
        Person.PersonPhone
        ON Person.BusinessEntityID = 
           PersonPhone.BusinessEntityID

This type of join is called an equi-join, since we are using an equality for the join condition.  It pretty safe to say that the vast majority of the joins you encounter are equi-joins.  There are special circumstances where is makes sense to use other comparisons such as not equals or greater than.  Those type of joins, called non equi-joins, are explored later in this post. Once the data is joined it can also be filtered and sorted using familiar WHERE and ORDER BY clauses.   The order of the clauses is what you might expect:

SELECT columnlist
FROM  maintable
INNER JOIN secondtable ON join condition
WHERE filter condition
ORDER BY columnlist

The WHERE clause can refer to any of the fields from the joined tables.  To be clear, it is best practice to prefix the columns with the table name.  The same goes for sorting.  Any field can be used; however, it is common to sort by one or more fields you are selecting. Let’s take our phone directory, but only list people, in order of last name, those whose last names start with C. With this requirement in mind, our SQL statement becomes:

SELECT    Person.FirstName,
          Person.LastName,
          PersonPhone.PhoneNumber
 FROM     Person.Person
          INNER JOIN
          Person.PersonPhone
          ON Person.BusinessEntityID = 
             PersonPhone.BusinessEntityID
 WHERE    Person.LastName LIKE 'C%'
 ORDER BY Person.LastName

Notice the join condition remains the same.  The only changes are the addition of the WHERE and ORDER BY clauses.  By now these should be familiar to you.

Table Aliases

As you create more complicated SQL statements they can become difficult to read.  Much is of this is due to the verbose (wordy) nature of the language. You can cut down on some of this by using another name for your tables in statements.  This is done using table aliases.  Think of aliases like nicknames.  They are easily defined.  Just place them after the first reference of the table name. Here is a simple example where we alias the Person table as P:

SELECT  P.FirstName,
        P.LastName
 FROM   Person.Person AS P

In this example we used the letter P.  It could really be any number of characters, but I like to keep aliases simple.  Where aliases really come in handy is with joins.  Let’s take last join we did, but this time write it by aliasing Person as P and PersonPhone as PP. Here is the statement

SELECT    P.FirstName,
          P.LastName,
          PP.PhoneNumber
 FROM     Person.Person AS P
          INNER JOIN
          Person.PersonPhone AS PP
          ON P.BusinessEntityID = PP.BusinessEntityID
 WHERE    P.LastName LIKE 'C%'
 ORDER BY P.LastName

Regarding naming, I suppose you could alias your tables, A,B,C, but it can become hard to remember the aliases.  My system to keep the aliases short, yet easy to remember, is it use the first letter of the table.  If the table has two words, like PersonPhone, I’ll typically use the first letter of each word.

Join with More than One Field

In some cases you may find that you need to join two or more fields together.  This happens when a table’s primary key consists of two or more columns.  This is easily done using an AND operator and an additional join condition.

Class-To-Section Data Model

Consider an example where we want to know all the days and times each instructor teaches a class.  In our example the primary key for the Class table is two fields:  ClassName and Instructor.  To construct the schedule we need to join Class to Section by matching both ClassName and Instructor.

SELECT    C.Instructor,
          C.ClassName,
          S.Day,
          S.Hour
 FROM     Class AS C
          INNER JOIN
          Section AS S
          ON C.ClassName = S.ClassName
             AND C.Instructor = S.Instructor
          ORDER BY C.Instructor, S.Day, S.Hour

Note:  This example won’t work in the AdventureWorks2012 database

When creating joins it is important to pay attention to your data.  In our example, if we mistakenly joined Class with the Section table using only the ClassName column, then each instruction’s class listing would also include schedules from other instructions that teach the same class. A joins will match as many rows as it can between tables.  As long as the join condition is satisfied, combinations of rows are included.  Because of this, it is important to understand each table’s primary key definition and to know how tables’ are meant to relate to one another.

Joining Three or More Tables

In the AdventureWorks2012 database, you need to join to the PhoneNumberType table to know whether a listed phone number is a home, cell, or office number.  In order to create a directory of names, phone number types, and numbers three tables must be joined together.  This relationship is shown below:

Person-PersonPhone-Type Data Model

Including another table in our query is as simple as adding another INNER JOIN clause to our statement.

SELECT   P.FirstName, P.LastName,
         PP.PhoneNumber, PT.Name
FROM     Person.Person AS P
         INNER JOIN
         Person.PersonPhone AS PP
         ON P.BusinessEntityID = PP.BusinessEntityID
         INNER JOIN
         Person.PhoneNumberType AS PT
         ON PP.PhoneNumberTypeID = PT.PhoneNumberTypeID
WHERE    P.LastName LIKE 'C%'
ORDER BY P.LastName

I don’t think there is any hard fast rule to how you format the join clause, I tend to separate them on their own lines and when specifying the columns, place the “join from” column first, as:

INNER JOIN
ToTable
On FromTable.PKID = ToTable.FKID

For all the examples in this article, I first wrote the statement and then used TidySQL to automatically format them.

Self-Join

A self-join is when you join a table to itself.  When you use a self-join it is important to alias the table.  Suppose we wanted to list all departments within a department group.  In the AdventureWorks2012 database this is done using the following self-join:

SELECT  D1.Name,
        D2.Name
 FROM   HumanResources.Department AS D1
        INNER JOIN
        HumanResources.Department AS D2
        ON d1.GroupName = d2.GroupName

Since the Department table is being joined to itself, we need distinguish the two occurrences of Department.  They are aliased as D1 and D2.  Other than the tables being the same, you see there is nothing remarkable about this type of join.

Non Equi-Joins

A non equi-join is just a fancy way of saying your join condition doesn’t have an equals sign in it. You may be scratching your head wondering why you would want to use a non equi-join.  I agree, to be honest, it can be hard to think up examples; however, they do exist, and you’ll appreciate knowing how to use them when you need to do so. One reason to use a non equi-join is when you need to verify that you have clean data.  Consider the AdventureWorks2012 Products table.  You may want to check that the product name is unique for each product listed.  One way to do this is to self-join the product table on Name.

SELECT    P1.ProductID,
          P1.Name
 FROM     Production.Product AS P1
          INNER JOIN
          Production.Product AS P2
          ON P1.Name = P2.Name
 ORDER BY P1.ProductNumber

This query allows us to compare values for each row that matches the Product Name.  If Name is unique, then the only matches you should have are when the ProductID’s are equal. So, if we are looking for duplicate Product Name, it follows that we want to find records that match having different ProductID’s.  This is shown in the diagram below Self-Join

The SQL to return the duplicate records is:

SELECT   P1.ProductID,
         P1.Name,
         P1.ProductNumber
FROM     Production.Product AS P1
         INNER JOIN
         Production.Product AS P2
         ON P1.Name = P2.Name
         AND P1.ProductID <> P2.ProductID
ORDER BY P1.ProductNumber

You may wonder what happens first.  Does the computer first match the names and then, once that is complete looks for ProductID’s that don’t match, or does is somehow evaluate both conditions as it builds the result? The beauty of SQL is that, in theory, the way you write your code shouldn’t have an influence on how the database actually retrieves the data.  The order of the joins or conditions shouldn’t significantly influence the time a query runs.  This will all be explored when we talk about the query optimize and how to review query plans.

Click Here to Leave a Comment Below 6 comments