Learn SQL – Introduction to Subqueries
The purpose of this article is to introduce you to subqueries and some of their high-level concepts. There are a lot of details to cover in order to learn sub queries, but you’ll see we cover those in depth in later articles.
All the examples for this lesson are based on Microsoft SQL Server Management Studio and the AdventureWorks2012 database. You can get started using these free tools using my Guide Getting Started Using SQL Server.
Introduction to Subqueries
Subqueries provide a powerful means to combine data from two tables in to a single result. Subqueries are sometimes called nested queries. As the name implies, subqueries contain one or more queries, one inside the other.
Subqueries are very versatile and that can make them somewhat hard to understand. For most cases a subquery can be used anywhere you can use an expression or table specification.
For example, you can use subqueries in the SELECT, FROM, WHERE, or HAVING clauses. Depending on its use, a subquery may return one or more rows, or in other cases, such as when used in a SELECT clause, be restricted to returning a single value.
In later articles we’ll get into the specifics on using subqueries throughout your SELECT statements such as the SELECT, WHERE, FROM, and HAVING clauses. They can be quite slippery and hard to wrap your brain around. Perhaps it would help to see a couple of examples.
Subqueries make it possible for you to write queries that are more dynamic and data driven. For instance using a subquery you can return all products whose ListPrice is greater than the average ListPrice for all products. You can do this by having a subquery first calculate the average price and then use this to compare against each product’s price.
Let’s break down this query so you can see how it works.
Step 1: First let’s run the subquery:
SELECT AVG(ListPrice) FROM Production.Product
It returns 438.6662 as the average ListPrice
Step 2: Find products greater than the average price by plugging in the average ListPrice value into our query’s comparison
SELECT ProductID, Name, ListPrice FROM production.Product WHERE ListPrice > 438.6662
As you can see, by using the subquery we combined the two steps together. The subquery eliminated the need for us to find the average ListPrice and then plug it into our query.
This is huge! It means our query automatically adjusts itself to changing data and new averages.
Hopefully you’re seeing a glimpse into how subqueries can make your statements more flexible. In this case, by using a subquery we don’t need to know the value for the average list price.
We let the subquery do the work for us! Thus, as more orders are received, we don’t need to worry about updating the average value within the query; it is calculated on-the-fly for us.
Being able to dynamically create the criterion for a query is very handy. Here we use a subquery to list all customers whose territories have sales below $5,000,000.
In this example the IN operator could have been hard coded to include territories with low sales, but by using the subquery we build the list “live.”
It may help to see how to execute this query step by step:
Step 1: Run the subquery to get the list of territories that had year to date sales less than 5,000,000:
SELECT TerritoryID FROM Sales.SalesTerritory WHERE SalesYTD < 5000000
This returns 2,3,5,7,8 as a list of values.
Step 2: Now that we have a list of values we can plug them into the IN operator:
SELECT DISTINCT CustomerID FROM Sales.SalesOrderHeader WHERE TerritoryID IN (2,3,5,7,8)
Again, we could have just broken this down into multiple steps, but the disadvantage in doing so is we would have to constantly update the list of territories.
Things to Keep in Mind
- A subquery is just a SELECT statement inside of another.
- Subqueries are always enclosed in parenthesis ().
- A subquery that returns a single value can be used anywhere you would use an expression, such as in a column list or filter expression.
- A subquery that returns more than one value is typically used where a list of values, such as those used in and IN operator.
- Warning! Subqueries can be very inefficient. If there are more direct means to achieve the same result, such as using an inner join, you’re better for it.
- You can nest subqueries up to thirty two levels deep on SQL server. To be honest, I can’t imagine why! I’ve see maybe four deep at most. Realistically I think you only go one or two deep.
Don’t worry, if a lot of this seems confusing at the moment. You still need to learn where subqueries are used within the SELECT statement. Once you see them in action, the above points become clearer.
Where Subqueries Are Used
In future articles we’ll explore the four ways you can use a subquery in a SELECT statement. They are:
- In a SELECT clause as a column expression
- In a WHERE clause as a filter criteria
- In the FROM clause as a table specification
- In a HAVING clause as a group selector (filter criteria)
As you can see subqueries aren’t that intimidating. The key to understanding them is understanding their role in the SQL statement. By now you know that the column list in a SELECT statement can only be one value.
It follows that a subquery used in a column list can only return one value.
In other cases, such as when being used with the IN operator, which operates on multiple values, it makes sense then that the subquery can return more than one row.
Remember! I want to remind you all that if you have other questions you want answered, then post a comment or tweet me. I’m here to help you.