What is the difference between a subquery and inner join?

In this puzzle, we’re going to learn how to rewrite a subquery using inner joins.  Knowing about a subquery versus inner join can help you with interview questions and performance issues.  Though subqueries have unique abilities, there are times when it is better to use other SQL constructs such as joins.

By reading this article you’ll learn about several types of subqueries, and how each can be switched to another form, such as a join.

Solving puzzles is a great way to learn SQL.  Nothing beats practicing what you’ve learned.  Once you have figured out the puzzle, post your answer in the comments so we all can learn from one another.  We also discuss puzzle and more in Essential SQL Learning Group on Facebook.  Be sure to find us there!

SQL Puzzle Question

A remedy for query confusion…

A coworker just learned about subqueries and wrote some SQL to retrieve employee names and birthdates from the AdventureWorks database.  The problem is, they want to change it and now it’s hard to read!

Can you help them simplify the following SQL?

SELECT   E.HireDate,
         (SELECT FirstName
          FROM   Person.Person P1
          WHERE  P1.BusinessEntityID = E.BusinessEntityID),
         (SELECT LastName
          FROM   Person.Person P2
          WHERE  P2.BusinessEntityID = E.BusinessEntityID),
         E.BirthDate
FROM     HumanResources.Employee E
WHERE    (SELECT PersonType
          FROM   Person.Person T
          WHERE  T.BusinessEntityID = E.BusinessEntityID) = 'EM'
ORDER BY HireDate,
         (SELECT FirstName
          FROM   Person.Person P1
          WHERE  P1.BusinessEntityID = E.BusinessEntityID)

What Statement would you write to make it easier to read, and perhaps run more efficiently?

Subquery versus Inner Join Answer

Before we begin let’s talk about the existing query… what is it?

You’ll see the query combines data from two different table.  It is doing so using subqueries in both the FROM and WHERE clauses.  These are highlighted below in Blue.

SELECT   E.HireDate,
         (SELECT FirstName
          FROM   Person.Person P1
          WHERE  P1.BusinessEntityID = E.BusinessEntityID),
         (SELECT LastName
          FROM   Person.Person P2
          WHERE  P2.BusinessEntityID = E.BusinessEntityID),
         E.BirthDate
FROM     HumanResources.Employee E
WHERE    (SELECT PersonType
          FROM   Person.Person T
          WHERE  T.BusinessEntityID = E.BusinessEntityID) = 'EM'
ORDER BY HireDate,
         (SELECT FirstName
          FROM   Person.Person P1
          WHERE  P1.BusinessEntityID = E.BusinessEntityID)

Also, you see that each subquery’s WHERE clause is restricting the rows returned to those equal to the Employee.BusinessEntityID.  This is what people call a correlated subquery.

I also want to point out that the queries in the FROM must return a single value (scalar).  If they don’t, then an error is thrown.

As you can imagine, this is dangerous, as it can be difficult to guarantee a query return at most one row.  I know were safe in this case I know this as the matching condition occurs between each table’s primary keys.

Subquery versus Inner Join – Converting the query

If I was writing this query I would use an INNER JOIN.  Here is the query I would write:

SELECT   E.HireDate,
         P.FirstName,
         P.LastName,
         E.BirthDate
FROM     HumanResources.Employee E
         INNER JOIN Person.Person P
         ON P.BusinessEntityID = E.BusinessEntityID
WHERE    P.PersonType  = 'EM'
ORDER BY E.HireDate,
         P.FirstName

Which is Easier to Read?

This shouldn’t much of a debate, the INNER JOIN is much shorter, and I think to the point.  The join clause speaks for itself.  You know it is relating two tables together; whereas, with the subquery, it is so apparent.

The INNER JOIN version is easier to maintain.

Also, with the subquery method, you’ll see much of the code repeated. This may not seem like a big deal, now, but if you ever have to change the query, it is, as now when you make a change you need to be sure to make the same change in several locations.

SubQuery or Inner Join?  Which is More Efficient?

Here is the query plan for the sub query version:

Subquery versus Inner Join Query Plan

I’ve highlighted the effect of the four subqueries.  For this statement, each query results in a nested loop. These aren’t good.

It means if you have ten rows in two tables each, then you need to on average, iterate through the second table 50 times (100/2) for each row in the first to find a match.  This means, instead of a seek taking two or three operations to find a match, the match could take upward of 100 * 50 = 500 seeks to find.

Nested loops are a fact of life, but less is better.

And here is the version for the INNER JOIN:

SET SHOWPLAN_ALL ON
SELECT   E.HireDate,
         P.FirstName,
         P.LastName,
         E.BirthDate
FROM     HumanResources.Employee E
         INNER JOIN Person.Person P
         ON P.BusinessEntityID = E.BusinessEntityID
WHERE    P.PersonType  = 'EM'
ORDER BY E.HireDate,
         P.FirstName

Here you see there is only one nested loop.  For sure that is better than four.

After observing both the SQL and query plans for each set of statements you can see that INNER JOIN is superior in several ways; however, check out that simplified plan!

The query’s true task is to combine columns from two tables; this is what INNER JOINS excel at.  Sure, there are time subqueries make sense, and can be used to do things you can’t with joins, but in this case, it doesn’t make sense to use one.