Do You Want to Learn SQL?

Take our free eight day course, where I'll teach you in simple to understand English all you need to get started learning SQL.

-
DAYS
-
HOURS
-
MINUTES
-
SECONDS

Free Beginners SQL Course!

SQL Subqueries: Exploring Dynamic and Data-Driven Queries

·

·

SQL subqueries make it possible for you to write queries that are more dynamic, and data driven. Think of them as query within another query.  

In this article I’ll introduce you to subqueries and some of their high-level concepts. I will show you how to write a SQL subquery as part of the SELECT statement. Subqueries return two types of results:  either a single value called a scalar value, or a table value, which is akin to a query result. 

As you work through the examples, keep in mind they’re based Microsoft SQL Server Management Studio and the AdventureWorks2012 database.  Get started using these free tools with my Guide Getting Started Using SQL Server.

Subqueries in SQL

Subqueries provide a powerful means to combine data from two tables into a single result.  You can also call these nested queries.  As the name implies, subqueries contain one or more queries, one inside the other.

Subqueries are very versatile and that can make them somewhat hard to understand.  For most cases use them anywhere you can use an expression or table specification.

For example, you can use subqueries in the SELECT, FROM, WHERE, or HAVING clauses.  A subquery may return either a single value or multiple rows.

A single value is also known as a scalar value.

Subquery Overview

Subqueries make it possible for you to write queries that are more dynamic and data driven.  For instance using a subquery you can return all products whose ListPrice is greater than the average ListPrice for all products.

You can do this by having it first calculate the average price and then use this to compare against each product’s price.

example of a subquery in a column list

First Walk Through

Let’s break down this query so you can see how it works.

Step 1:  First let’s run the subquery:

SELECT AVG(ListPrice)
FROM Production.Product

It returns 438.6662 as the average ListPrice

Step 2:  Find products greater than the average price by plugging in the average ListPrice value into our query’s comparison

SELECT ProductID,
       Name,
       ListPrice
FROM   production.Product
WHERE  ListPrice > 438.6662

Huge Benefit of SQL Subqueries

As you can see, by using the subquery we combined the two steps together.  The subquery eliminated the need for us to find the average ListPrice and then plug it into our query.

This is huge!  It means our query automatically adjusts itself to changing data and new averages.

Hopefully you’re seeing a glimpse into how subqueries can make your statements more flexible. In this case, by using a subquery we don’t need to know the value for the average list price.

We let the subquery do the work for us!  The subquery calcuates the average value on-the-fly; there is no need for us to “update” the average value within the query.

Second Walkthrough

Being able to dynamically create the criterion for a query is very handy. Here we use a subquery to list all customers whose territories have sales below $5,000,000.

Sub query example in WHERE clause

We build the list “live” with the IN operator, this avoid us from having to hard code the values.

It may help to see how to execute this query step by step:

Step 1:  Run the subquery to get the list of territories that had year to date sales less than 5,000,000:

SELECT TerritoryID
FROM   Sales.SalesTerritory
WHERE  SalesYTD < 5000000

This returns 2,3,5,7,8 as a list of values.

Step 2: Now that we have a list of values we can plug them into the IN operator:

SELECT DISTINCT CustomerID
FROM   Sales.SalesOrderHeader
WHERE  TerritoryID IN (2,3,5,7,8)

Again, we could have just broken this down into multiple steps, but the disadvantage in doing so is we would have to constantly update the list of territories.

Now that you’ve see how we use a sub query in the SQL statement let’s take a look at using them in the WHERE clause.

Subqueries in the SELECT Clause

Queries used within a column list must return a single value.  In this case, you can think of the subquery as a single value expression.  The result returned is no different than the expression “2 + 2.”  Of course, subqueries can return text as well, but you get the point!

The main SQL statement is called the outer query; it calls the subquery.  Enclose subqueries in parenthesis, its required, but also makes them easier to spot.

Be careful when using subqueries.  They can be fun to use, but as you add more to your query they can start to slow down your query.

Simple Subquery to Calculate Average

Let’s start out with a simple query to show SalesOrderDetail and compare that to the overall average SalesOrderDetail LineTotal.  The SELECT statement we’ll use is:

SELECT SalesOrderID,
LineTotal,
(SELECT AVG(LineTotal)
   FROM Sales.SalesOrderDetail) AS AverageLineTotal
FROM   Sales.SalesOrderDetail;

This query returns results as:

Sub Query Results

The average line total is calculated using the subquery shown in red.

SELECT AVG(LineTotal)
FROM   Sales.SalesOrderDetail

This result is then plugged back into the column list, and the query continues. 

Observations

There are several things I want to point out:

  1. SQL requires that subqueries are enclosed in parenthesis.
  2. Subqueries using in a SELECT can only return one value. This should make sense, simply selecting a column returns one value for a row, and we need to follow the same pattern.
  3. In general, the subquery is run only once for the entire query, and its result reused. This is because the query result does not vary for each row returned.
  4. It is important to use aliases for the column names to improve readability.

Simple Subquery in Expression

A subquery’s result can be used in other expressions.  Building on the previous example let’s use the subquery to determine how much our LineTotal varies from the average.

The variance is simply the LineTotal minus the Average Line total.  In the following subquery, I’ve colored it blue.  Here is the formula for the variance:

LineTotal - (SELECT AVG(LineTotal)                             
               FROM   Sales.SalesOrderDetail)

The SELECT statement enclosed in the parenthesis is the subquery.  Like the earlier example, this query will run once, return a numeric value, which is then subtracted from each LineTotal value.

Here is the query in final form:

SELECT SalesOrderID,
       LineTotal,
       (SELECT AVG(LineTotal)
          FROM   Sales.SalesOrderDetail) AS AverageLineTotal,
       LineTotal - (SELECT AVG(LineTotal)
                    FROM   Sales.SalesOrderDetail) AS Variance
FROM   Sales.SalesOrderDetail

Here is the result:

Suqquery Results in Expression

When working with subqueries in select statements I usually build and test the subquery SQL first. Do this to troubleshoot the pieces before you assemble your complex query! It is best to build them up little by little.  By building and testing the various pieces separately, it really helps with debugging.

Read More: Correlated Subqueries>>

Subqueries in the WHERE clause

Let’s see how to write a subquery in the where clause.  To do this we will use IN operator to test whether a value is part of the result the subquery returns. 

Here is the general format: 

SELECT column1, column2,.. 
FROM table1 
WHERE column2 IN (subquery1) 

An example makes this clearer.  Suppose we want to find all sales orders made by a salesperson having a bonus greater than 5000 dollars. 

To do this we’ll write two queries.  The outer to find the sales orders, and an inner (the SQL subquery) to return SalePersonID for $5000 and above bonus earners. 

I color coded the subquery so you can easily see it: 

SELECT SalesOrderID
               ,OrderDate
               ,AccountNumber
               ,CustomerID
               ,SalesPersonID
               ,TotalDue
FROM Sales.SalesOrderHeader
WHERE SalesPersonID IN (SELECT BusinessEntityID
FROM Sales.SalesPerson
WHERE Bonus > 5000
)

When run as a subquery, it returns a list of ID’s.  A row is included in the result, if the outer query’s SalesPersonID is within this list.  Here is the example you can try:

Read More: Using Where EXISTS in SQL >>, SQL ANY and ALL Operators>>

Subqueries in the FROM Clause

Like all subqueries, enclose subqueries within the FROM clause with parenthesis. Unlike other subqueries though, you need to alias the derived table so that your SQL can reference its results.

In this example, we’re going to select territories and their average bonuses.

SELECT TerritoryID,
       AverageBonus
FROM   (SELECT   TerritoryID,
                 Avg(Bonus) AS AverageBonus
        FROM     Sales.SalesPerson
        GROUP BY TerritoryID) AS TerritorySummary
ORDER BY AverageBonus

The subquery alias name is TerritorySummary and is highlighted in red.

Joining to a Subquery

When this query runs, the subquery is first to run and the results created.  The results are then used in the FROM clause as if it were a table.  When used by themselves these types of queries aren’t very fascinating; however, when used in combination with other tables they are.

Let’s add a join to our example above.

SELECT SP.TerritoryID,
       SP.BusinessEntityID,
       SP.Bonus,
       TerritorySummary.AverageBonus
FROM   (SELECT   TerritoryID,
                 AVG(Bonus) AS AverageBonus
        FROM     Sales.SalesPerson
        GROUP BY TerritoryID) AS TerritorySummary
       INNER JOIN
       Sales.SalesPerson AS SP
       ON SP.TerritoryID = TerritorySummary.TerritoryID

Read More: What is the Difference Between a Join and Subquery?

Table Relationships with the Query

From a data modeling point of view this query looks like the following:

Derived Table in Join

There is a relationship between the TerritorySummary and the joined table SalesPerson.  Of course, TerritorySummary only exists as part of this SQL statement since it is derived.

By using derived tables we are able to summarize using one set of fields and report on another.  In this case, we’re summarizing by TerritoryID but report by each salesperson (BusinessEntityID).

You may think you could simply replicate this query using an INNER JOIN, but you can’t as the final GROUP BY for the query would have to include BusinessEntityID, which would then throw-off the Average calculation.

Read More: Derived Tables >>

Subqueries in the HAVING Clause

You can use sub queries in you SQL’s HAVING clause to filter out groups of records.  Just as the WHERE clause is used to filter rows of records, the HAVING clause is used to filter groups.  Because of this, it becomes very useful in filtering on aggregate values such as averages, summations, and count.

The power of using a subquery in the HAVING clause is now you don’t have to hard-code values within the comparisons. You can rely on the subquery’s results to do so for you.

For example, it is now possible to compare the average  of a group to the overall average.   We’ve always been able to use the average of the group in the HAVING clause, but had no way to compute the overall average.  Now, using subqueries, this is possible.

In this example we’re selecting employee job titles having remaining vacation hours greater than the overall average for all employees.

Here I’ve written the query without using a subquery

SELECT   JobTitle,
         AVG(VacationHours) AS AverageVacationHours
FROM     HumanResources.Employee
GROUP BY JobTitle
HAVING   AVG(VacationHours) > 50

The subquery replaces the text highlighted in red. Now here is the complete statement including the subquery:

SELECT   JobTitle,
         AVG(VacationHours) AS AverageVacationHours
FROM     HumanResources.Employee
GROUP BY JobTitle
HAVING   AVG(VacationHours) > (SELECT AVG(VacationHours)
                               FROM   HumanResources.Employee)

This query is executed as:

  1. Compute the remaining average vacation hours for all employees. (subquery)
  2. Group records by JobTitle and computer the average vacation hours.
  3. Only keep groups whose average vacation hours are greater than the overall average.

Important Points Revisited

  • A subquery is just a SELECT statement inside of another.
  • Enclose subqueries in parenthesis ().
  • Alias subqueries in your column list to make it easier to read and reference the results. 
  • A subquery either returns a single value or table.  Be aware of which type your query returns to ensure it works correctly in the situation. 
  • A subquery that returns more than one value is typically used where a list of values, such as those used in and IN operator.
  • Warning! Subqueries can be very inefficient.  If there are more direct means to achieve the same result, such as using an inner join, you’re better for it.
  • You can nest subqueries up to thirty two levels deep on SQL server. To be honest, I can’t imagine why!   I’ve see maybe four deep at most.  Realistically I think you only go one or two deep.

Don’t worry, if a lot of this seems confusing at the moment.  You still need to learn where the use subqueries within the SELECT statement. Once you see them in action, the above points become clearer.

Conclusion

As you can see subqueries aren’t that intimidating.  The key to understanding them is understanding their role in the SQL statement.  By now you know that the column list in a SELECT statement can only be one value.

It follows that a subquery used in a column list can only return one value.

In other cases, such as when being used with the IN operator, which operates on multiple values, it makes sense then that the subquery can return more than one row within the SQL.

Looking to learn more? Check out these related articles.

12 responses to “SQL Subqueries: Exploring Dynamic and Data-Driven Queries”
  1. Mithilesh Kumar Yadav

    Hi Kris,
    I fond of the way you explain the topics, become easy to visualize and understand whole concepts.
    Thankyou….

  2. jo

    hi ,
    I have a query to find the 5th highest salary without using top or limit.

    select salary from worker w1
    where 4= (select count(distinct(w2.salary)
    from worker w2
    where w2.salary>=w1.salary)

    I would like to know how this works.It would be helpful if anyone explains.
    Thanks
    Jo

  3. bagavathy

    Hi Kris, could you give me a tip to know whether i have to use subquery or normal query by reading the question…iam little confused at what point/situation we have to use subquery…eagerly waiting for your reply

  4. Wilma

    Hi Kris, I have been trying to brush up on my SQL skills. And this series on subqueries has been the most informative and useful to me. Thank you!

    1. My pleasure! I’m glad you found it useful.

  5. Joanna

    Hi Kris, I am hoping you will be able to assist. I came across your article in the hope it will resolve my query issue. I have query whereby I like the results formatted specifically. I had hoped an subquery would help.

    So currently I get, understandable since I’m using a joins to return the results

    Stock# StockTotQty StockQty Price StockValue
    Item1 20 10 20.00 400.00
    Item1 20 10 20.00 400.00

    However, what I am trying to achieve is an output formatted as follows:

    Stock# StockTotQty StockQty Price StockValue
    Item1 20 10 20.00 400.00
    Item1 10 20.00

    I thought I could run a subquery to return StockTotQty per itemcode as well as the StockValue but as I understand it a subquery within a Select statement can only return a single value and my StockTotQty is different values. Same goes for the StockValue.

    This is going beyond my current SQL skills. Any advice would be helpful. Thanks

    1. Kris Wenzel

      I’m not understanding the goal you’re trying to achieve. I think if I saw your two table structure and a more comprehensive example, I would “get it.” :)

  6. Abhilash Urs

    Informative step by step article about subquery

  7. Venkat

    Thank you for explaining about all sub-query placeholders in the query.

  8. Wing

    Wow, simple, effective and straight to the point! Great Job!!!

  9. Yahia

    Correction “By know you know that the column list in a SELECT statement can only be one value.”, it should be “By now” :)
    Very simple & helpful. Very appreciated ;)

    1. Hi,

      I’m glad you liked the article. I just fixed the typo. Thanks for pointing it out to me.

      SQLKris.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

More from the blog


MySQL PostgreSQL SQLite SQL Server