The Common Table Expressions or CTE’s for short are used within SQL Server to simplify complex joins and subqueries, and to provide a means to query hierarchical data such as an organizational chart. In this article, we’ll introduce you to common table expressions, the two types of the SQL CTEs, and their uses. In addition, we’ll introduce CTE’s overall. Once you’re familiar, I highly encourage you to read these articles as well:
Introduction to Common Table Expressions
A CTE (Common Table Expression) is a temporary result set that you can reference within another SELECT, INSERT, UPDATE, or DELETE statement. They were introduced in SQL Server version 2005. They are SQL-compliant and part of the ANSI SQL 99 specification.
A SQL CTE always returns a result set. You can use them to simplify queries. For example, you could use one to eliminate a derived table from the main query body.
Note: All the examples for this lesson are based on Microsoft SQL Server Management Studio and the AdventureWorks2012 database. Start learning SQL Server using my free guide Getting Started Using SQL Server.
What is a SQL CTE or Common Table Expression in SQL Server?
A SQL CTE (Common Table Expression) defines a temporary result set which you can then use in a SELECT statement. It becomes a convenient way to manage complicated queries.
You define Common Table Expressions using the WITH statement. You can define one or more common table expression in this fashion.
Here is a really simple example of one CTE:
WITH Employee_CTE (EmployeeNumber, Title) AS (SELECT NationalIDNumber, JobTitle FROM HumanResources.Employee) SELECT EmployeeNumber, Title FROM Employee_CTE
Let’s break this down a bit.
The blue portion is the CTE. Do you see the contained SQL is a fully formed query? It is the SQL CTE query definition:
SELECT NationalIDNumber, JobTitle FROM HumanResources.Employee
When you run it you see results like:
Notice that when we define the CTE we give the result a name as well its columns. In this way a CTE acts like a VIEW. The result and columns are named differently. This allows you to encapsulate complicated query logic with the common table expression.
Now going back to the SQL CTE, notice that the WITH statement. There you’ll see the name and columns are defined. These columns correspond to the columns returned from the inner query.
Finally notice that our final query references the SQL CTE and columns defined.
From our outer query’s perspective all it “sees” is this definition. It is only concerned with the name and columns, not the inner SQL.
As such, the results from the CTE are:
Notice the column names, they’re based on the those defined in the CTE.
I want to point out that you can define more than one SQL CTE within a WITH statement. This can help you simplify some very complicated queries which are ultimately joined together. Each complicated piece can include in their own CTE which is then referred to and joined outside the WITH clause.
Why Do you need SQL CTE’s?
There are several reasons why you may want to use a SQL CTE over other methods. Some of them include:
- Readability – CTE’s promote readability. Rather than lump all you query logic into one large query, create several CTE’s, which are the combined later in the statement. This lets you get the chunks of data you need and combine them in a final SELECT.
- Substitute for a View – You can substitute a SQL CTE for a view. This is handy if you don’t have permissions to create a view object or you don’t want to create one as it is only used in this one query.
- Recursion – Use CTE’s do create recursive queries, that is queries that can call themselves. This is handy when you need to work on hierarchical data such as organization charts.
- Limitations – Overcome SELECT statement limitations, such as referencing itself (recursion), or performing GROUP BY using non-deterministic functions.
- Ranking – Whenever you want to use ranking function such as ROW_NUMBER(), RANK(), NTILE() etc.
Common Table Expression Types
Common Table Expressions are categorized as: Recursive CTE’s and Non-Recursive CTE’s.
Recursive CTE’s are common table expressions that reference themselves. Recursion can be a pretty difficult topic to grasp, I really didn’t get it until I took a LISP class way back in 1986, but hopefully I can explain it to you.
We’ll go deep into recursive CTE’s in a separate post, but for now let me introduce to you recursion using this diagram:
Here you see picture of opposing mirrors. Due to the reflection, it becomes a picture in a picture.
Recursive queries are like that.
A recursive query repeatedly run on a subset of the data. A recursive query is basically a query that calls itself. At some point there is an end condition, so it doesn’t call itself indefinitely.
In a way when you look into the picture you can imagine each picture in a picture is the picture calling itself. However, unlike the “infinite reflection” in the mirrors, there comes a point where a recursive query encounters the end condition and stop calling itself.
At that point, the recursion starts to unwind, collect and calculate data as it reviews each successive result.
Non-Recursive CTE’s, as the name implies, are don’t use recursion. They don’t reference themselves. They are easier to understand so we’ll look at them first in detail in the next article within this series.
We’ll first take a look at non-recursive CTE’s or CTE’s for short. Let’s dig into them by learning about the WITH clause.
Define Common Table Expression Using the WITH Clause
You start defining the SQL CTE using the WITH clause.
CTEs are table expressions. The are a temporary result that is used in the scope of an SELECT, INSERT, UPDATE, DELETE, or APPLY statement.
Here is a general example:
WITH TableExpressionName (Column1, Column2, …, ColumnN) AS (Query Definition)
The CTE has two parts. The first part defines the name of the CTE and it the columns contained within it. This is the table expression’s definition. The second part is the query definition. This is the SELECT statement used to populate the expression with rows.
You can define more than one CTE in a statement. You see an example of this further along in this article.
When writing the query definition keep in mind that the following cannot be used within it:
- ORDER BY, unless you also use as TOP clause
- OPTION clause with query hints
- FOR BROWSE
For more information regarding the WITH keyword, refer to the MSDN documentation.
Defining and Using Multiple Common Table Expressions
Here is an example using of TWO CTE’s, it’s a simple example, but it shows how two CTE’s are defined, and then used in an INNER JOIN
WITH PersonCTE (BusinessEntityID, FirstName, LastName) AS (SELECT Person.BusinessEntityID, FirstName, LastName FROM Person.Person WHERE LastName LIKE 'C%'), PhoneCTE (BusinessEntityID, PhoneNumber) AS (SELECT BusinessEntityID, PhoneNumber FROM Person.PersonPhone) SELECT FirstName, LastName, PhoneNumber FROM PersonCTE INNER JOIN PhoneCTE ON PersonCTE.BusinessEntityID = PhoneCTE.BusinessEntityID;
The first common table expression is colored green, the second blue. As you can see from the SELECT statement the CTE’s are joined as if they were tables. Hopefully you can see that as your queries become more complicated, CTE’s can become a really useful way to separate operations; therefore, simplify your final query.
We already talked about some reasons to use SQL CTE’s, so let’s, see this with some examples.
Reasons to use CTEs by Example
We’ve already seen there are several good reasons to use SQL CTE’s. Let’s look at each of these, in turn, using examples. By the time we’re done, you’ll see how to use one or more Common Table Expressions in a statement to work with joins. Also, you’ll see how you can replace a correlated subquery with a two non recursive CTEs.
As queries get larger is can become really difficult to understand how they work. In my mind, readability doesn’t mean the query has less lines. Instead, it means that it is understandable to you and others. I think CTEs help improve readability several ways.
They help separate out query logic. If you are joining two complex queries, you can use non recursive CTEs to separate out the complexity of the queries from the actual join. This not only helps in debugging, as you can independently run the query definition to test it, but you can more easily identify the parts used to do the join.
Also, CTEs improve readability by eliminating repeating expressions. There are situations where you want to display an expression and then sort by it. In SQL server, the expression has to be repeated in both the SELECT clause and ORDER BY; however, if you use a CTE, this isn’t the case.
Readability such as Reference the resulting table multiple times in the same statement.
For the first couple of examples, we’re going to assume our original query is the following:
SELECT Substring(Person.LastName,1,1) + ' ' + Person.FirstName as SortValue, Employee.NationalIDNumber, Person.FirstName, Person.LastName, Employee.JobTitle FROM HumanResources.Employee INNER JOIN Person.Person ON HumanResources.Employee.BusinessEntityID = person.BusinessEntityID WHERE Person.LastName >= 'L'AND (Employee.JobTitle Like 'Marketing%' OR Employee.JobTitle Like 'Production%') ORDER BY Substring(Person.LastName,1,1) + ' ' + Person.FirstName
We’ll now take this statement show you how it is easier to read and maintain using CTEs. Of course our example isn’t that complex, so it is still pretty readable, but I think as we work through the solution, it will help you see how CTEs can really help you in real world situations where your queries can be thirty or more lines long!
Example: CTE joined to normal table
The first thing we can do is move the query used to retrieve person rows in to a CTE as so
WITH Employee_CTE (BusinessEntityID, NationalIDNumber, JobTitle) AS (SELECT BusinessEntityID, NationalIDNumber, JobTitle FROM HumanResources.Employee WHERE (Employee.JobTitle LIKE 'Marketing%' OR Employee.JobTitle LIKE 'Production%')) SELECT Substring(Person.LastName, 1, 1) + ' ' + Person.FirstName as SortValue, Employee_CTE.NationalIDNumber, Person.FirstName, Person.LastName, Employee_CTE.JobTitle FROM Employee_CTE INNER JOIN Person.Person ON Employee_CTE.BusinessEntityID = Person.BusinessEntityID WHERE Person.LastName >= 'L' ORDER BY Substring(Person.LastName, 1, 1) + ' ' + Person.FirstName;
Notice that the CTE is defined to return three columns: BusinessEntityID, NationalID, and JobTitle.
Also, you see we’ve pulled in the filtering criterial into the SQL CTEs query definition. The overall query is still a bit messy, but hopefully you’re starting to see how CTEs can help to separate various query operations to make queries easier to read.
The next thing we could do to simplify the query is to create another CTE to handle Person table rows.
Example: CTE joined to another CTE
In the example below are two CTEs. I’ve colored them blue and green respectively. The blue colored CTE is the same as the above example, the green one is newly added, and really helps to simplify the overall query.
WITH Employee_CTE (BusinessEntityID, NationalIDNumber, JobTitle) AS (SELECT BusinessEntityID, NationalIDNumber, JobTitle FROM HumanResources.Employee WHERE (Employee.JobTitle LIKE 'Marketing%' OR Employee.JobTitle LIKE 'Production%')), Person_CTE (BusinessEntityID, FirstName, LastName, SortValue) AS (SELECT BusinessEntityID, FirstName, LastName, Substring(Person.LastName, 1, 1) + ' ' + Person.FirstName FROM Person.Person WHERE Person.LastName >= 'L') SELECT Person_CTE.SortValue, Employee_CTE.NationalIDNumber, Person_CTE.FirstName, Person_CTE.LastName, Employee_CTE.JobTitle FROM Employee_CTE INNER JOIN Person_CTE ON Employee_CTE.BusinessEntityID = Person_CTE.BusinessEntityID ORDER BY Person_CTE.SortValue;
In our original example the SortValue expression, Substring(Person.LastName, 1, 1) + ‘ ‘ + Person.FirstName, was repeated in both the SELECT clause and ORDER BY statement. Now, by placing the SortValue expression within the SQL CTE, we only need to define it once!
Compare the original query to the final one:
I think the original query is harder to read and maintain for these reasons:
- The SortValue expression is listed twice.
- The filtering of both tables are in the same expression.
- Since queries for both the employee and person are in the same statement it is difficult to separate them for debugging purposes.
I like having the statement broke out into a SQL CTE because:
- If I need to verify what rows are being returned by a CTE, it’s as simple as running the query definition in a separate query window.
- Expression, such as that used for SortValue, aren’t repeated.
- The final Query Using CTEs is easier to read. You can focus on the join conditions, and not be distracted by filtering nor expressions.
Example: Join CTE to Self
As you have seen you can join a CTE to another table, or CTE. But did you know you can join a CTE to itself?
Note: a table is joined to itself is a self-join.
Consider the following CTE:
WITH Department_CTE (Name, GroupName) AS (SELECT Name, GroupName FROM HumanResources.Department) SELECT D1.Name, D2.Name FROM Department_CTE AS D1 INNER JOIN Department_CTE AS D2 ON d1.GroupName = d2.GroupName
Here you can see one CTE is defined as Department_CTE and that this is then used twice within the query.
The result of this query is to list combinations of department names within the same department group:
Substitute for a View
As your SQL becomes more complex you’ll find that using views are a great way to hide the inner workings of a query and allow you to just focus on the results. This is especially true when you’re working with data involving multiple joins.
With a view you can encapsulate all that query and join logic into the view. Queries using the view are simpler and easier to read.
But there are times when it may not make sense to create a view. For instance, if you don’t have permissions to create database objects, such as when using a third party database, or when the only time you’ll need to us the view is just once.
In these cases, you can use a common table expression.
When creating a CTE, keep in mind the SQL CTEs query definition is the same as the query used to create the view.
In the diagram below we show a query two ways. In blue you’ll see defined as view, and then used in a query. It is then shown in green as a CTE.
We then take the same query used to define the view and use that for the CTE query definition. Finally, you can see that the CTEs final query is exactly like that used to reference the view.
CTE Can Replace Other SQL Objects
If you find it hard to work with derived tables or correlated subqueries, you can “break” the query apart, using common table expression. This can make it easier for you to get started learning SQL. Once you get more comfortable, you can then tackle the other topics!
CTE versus Derived Table
Derived tables are table results defined in the FROM clause. Given that derived tables return a table expression, it should be no surprise that you can use CTEs in their place.
Consider the following query. The derived tables are color coded in red and green.
SELECT Quota.TerritoryID, Quota.TerritoryQuota, Sales.TerritorySales, Sales.TerritorySales - Quota.TerritoryQuota FROM (SELECT TerritoryID, SUM(SalesQuota) AS TerritoryQuota FROM Sales.SalesPerson GROUP BY TerritoryID) AS Quota INNER JOIN (SELECT SOH.TerritoryID, SUM(SOH.TotalDue) AS TerritorySales FROM Sales.SalesOrderHeader AS SOH GROUP BY SOH.TerritoryID) AS Sales ON Quota.TerritoryID = Sales.TerritoryID
We can make it easier to read the statement by pulling out the table expression into a CTE. Here is the same query using CTEs instead of derived tables:
WITH Quota (territoryid, quota) AS (SELECT territoryid, Sum(salesquota) AS TerritoryQuota FROM sales.salesperson GROUP BY territoryid), Sales (territoryid, sales) AS (SELECT SOH.territoryid, Sum(SOH.totaldue) AS TerritorySales FROM sales.salesorderheader AS SOH GROUP BY SOH.territoryid) SELECT Quota.territoryid, Quota.quota, Sales.sales, Sales.sales - Quota.quota FROM Quota INNER JOIN Sales ON Quota.territoryid = Sales.territoryid;
By using CTEs we move the code used to define the query results for Quota and Sales away from the portion used to combine the table together.
I think this makes it much easier to maintain. For instance, if you need to make changes, it’s easier to know where to make a change. Also, it makes it easier to be able to see what is actually being joined together. The queries for each derived table are cluttering that portion of the SELECT statement.
CTEs versus Subqueries
CTEs and subqueries are similar, but CTEs have capabilities not found with subqueries.
Subqueries and CTEs:
- Are table expressions created at run-time. They are temporary objects.
- Can be created and used within stored procedures, triggers, and views.
- Can be correlated or non-correlated. A CTE can reference a CTE previously defined in the same statement.
- Can be in in SELECT, FROM, WHERE, HAVING, IN, EXISTS clauses.
There are some differences between subqueries and CTEs, notably:
- A subquery is defined within an outer query. It is defined before calling it from within the query.
- A Common Table Expression can reference itself, a subquery cannot.
- They can reference other CTEs within the same WITH clause (Nest). A subquery cannot reference other subqueries.
- A CTE can be referenced multiple times from a calling query. A subquery cannot be referenced.
CTE versus Correlated Subquery
Correlated subqueries can also be replaced with non recursive CTEs. This shouldn’t be a new concept, as we’ve seen in the past it is easy to convert a correlated sub query into a join. Given this, and knowing that joins are easily moved into non recursive CTEs, you can see the possibilities.
Let’s that the following correlated subquery which displays the sales order information. The subquery is used to calculate the average line total for each sales order.
SELECT salesorderid, salesorderdetailid, linetotal, (SELECT Avg(linetotal) FROM sales.salesorderdetail WHERE salesorderid = SOD.salesorderid) AS AverageLineTotal FROM sales.salesorderdetail SOD
To replicate this query using a CTE we first need to create query definition to calculate the average line total for each Sales Order.
This common table expression is then joined to the sales order table to obtain our final result.
WITH linetotal_cte (salesorderid, averagelinetotal) AS (SELECT salesorderid, Avg(linetotal) FROM sales.salesorderdetail GROUP BY salesorderid) SELECT SOD.salesorderid, SOD.salesorderdetailid, SOD.linetotal, LT.averagelinetotal FROM sales.salesorderdetail SOD INNER JOIN linetotal_cte LT ON LT.salesorderid = SOD.salesorderid
Use CTE’s to Overcome SQL Limitations
Use common table expression to overcome SQL limitations such as “enable grouping by a column that is derived from a scalar subselect, or a function that is either not deterministic or has external access.” (TechNet)
Suppose you wanted to know how many departments have the same number of employees. What query could you use?
Finding the number of employees by department name is pretty easy. We can use the following query to do so:
FROM HumanResources.EmployeeDepartmentHistory AS H
WHERE D.DepartmentID = H.DepartmentID
AND H.EndDate IS NULL) AS NumberEmployees
FROM HumanResources.Department AS D;
The number of employees in each department is calculated using a scalar sub select. This is colored red in the above query for easy identification. Here is a sample result:
Now let’s take this once step further and count how many departments have the same number of employees. To do this we should group on NumberEmployees, but this statement is invalid:
SELECT (SELECT Count(1) FROM HumanResources.EmployeeDepartmentHistory AS H WHERE D.DepartmentID = H.DepartmentID AND H.EndDate IS NULL) AS NumberEmployees, Count(Name) SameCount FROM HumanResources.Department AS D GROUP BY NumberEmployees
Due to the grouping restriction. The subquer is a scarlar subselect and we’re trying to group by it. That is a SQL violation! As is
SELECT (SELECT Count(1) FROM HumanResources.EmployeeDepartmentHistory AS H WHERE D.DepartmentID = H.DepartmentID AND H.EndDate IS NULL) AS NumberEmployees, Count(Name) SameCount FROM HumanResources.Department AS D GROUP BY (SELECT Count(1) FROM HumanResources.EmployeeDepartmentHistory AS H WHERE D.DepartmentID = H.DepartmentID AND H.EndDate IS NULL)
To solve this problem, we can define and use a CTE:
WITH Department_CTE (GroupName, Name, NumberEmployees) AS (SELECT GroupName, Name, (SELECT Count(1) FROM HumanResources.EmployeeDepartmentHistory AS H WHERE D.DepartmentID = H.DepartmentID AND H.EndDate IS NULL) AS NumberEmployees FROM HumanResources.Department AS D) SELECT NumberEmployees, Count(Name) SameCount FROM Department_CTE GROUP BY NumberEmployees;
Now we query the CTE and group by the result of the scalar sub select (red text). The results returned are:
From this we can see there are 5 departments with 6 employees.
Use with SQL Ranking functions
You can use ranking functions such as RANK() and NTILE() in conjunction with windowing functions to return the top items within a group.
Suppose we want to return the top sales within each territory. To do this we can RANK sales within each territory as 1,2,…, and so on, and then select those with a rank of 1.
Here is the query you can use to generate sales ranks within territories:
SELECT RANK() OVER(PARTITION BY S.TerritoryID ORDER BY SOH.TOTALDue desc) ,S.TerritoryID ,SOH.SalesOrderNumber ,SOH.TotalDue FROM Sales.SalesPerson S INNER JOIN Sales.SalesOrderHeader SOH ON S.BusinessEntityID = SOH.SalesPersonID WHERE S.TerritoryID is not NULL
Given this, then our job is to pick only those rows whose RANK() is one. These are the top ranked sales within each territory. You may think you could repeat the ranking expression in the WHERE clause to filter, but this isn’t possible. Windows functions, such as partition are not allowed in the WHERE clause and if you try to “wrap” it into a subquery you’ll get an error since the subquery returns more than one row.
It seems we are stuck. But this is where non recursive CTEs come to our aid.
With a CTE we can define a query to return the sales data ranked by territory and then query that to only return those items ranked first in each territory:
WITH SalesRankCTE (SalesPersonID, Name, TerritoryID, SalesRanking, SalesOrderNumber, TotalDue) AS (SELECT SalesPersonID, P.FirstName + ' ' + P.LastName, S.TerritoryID, RANK() OVER (PARTITION BY S.TerritoryID ORDER BY SOH.TOTALDue DESC), SOH.SalesOrderNumber, SOH.TotalDue FROM Sales.SalesPerson AS S INNER JOIN Sales.SalesOrderHeader AS SOH ON S.BusinessEntityID = SOH.SalesPersonID INNER JOIN Person.Person AS P ON P.BusinessEntityID = S.BusinessEntityID WHERE S.TerritoryID IS NOT NULL) SELECT SalesPersonID, Name, TerritoryID, SalesRanking, SalesOrderNumber, TotalDue FROM SalesRankCTE WHERE SalesRanking = 1;
To help you see this, the original query used to generate rankings is highlighted in blue. The filter to only show the top ranked sales in each territory is in red.
Read More: SQL Window Functions [Visual Explanation] >>
A recursive CTE is a CTE that references itself. In doing so, the initial CTE is repeatedly executed, returning subsets of data, until the complete result is returned.
A recursive CTE is set with an WITH, but is set up differently inside. You’ll see that it can call itself!
This recursive CTE consists of three main parts:
- Invocation – This is the statement using the CTE.
- Anchor Member – This portion executes first and is only called once.
- Recursive Member – The portion of the query is repeatedly executed until no rows are returned. The results of each execution are unioned with the prior results.
- Termination Check – The termination check ensures the query stops.
The Anchor Member’s Importance
When this query is run the Anchor Member is run once and its derived rows combined, via the UNION ALL, with the Recursive Member.
This query is repeatedly run until no rows are returned. This is why the termination check is so important.
If we weren’t checking for N < 50, the query would always return a row, thus complete with an error. Here is an example of the error you can expect if the termination check is missing or faulty:
Msg 530, Level 16, State 1, Line 1 The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
Of course, there may be an instance where the maximum recursion should be greater than 100. In this case you can adjust it using the MAXRECURSION query hint. Here we’ve increased the recursion to 200.
AS (SELECT 1 AS n -- anchor member
SELECT n + 1 -- recursive member
WHERE n < 50 -- terminator
OPTION (MAXRECURSION 200);
Recursive CTE Guidelines
When writing a Recursive CTE keep in mind the following guidelines:
- A recursive CTE must contain at least an anchor member and recursive member. Multiple anchor and recursive members can be defined, but all the anchor members should be before the first recursive member.
- The number of columns and their corresponding data types should be the same between the anchor and recursive members. This makes sense, as this is a condition for UNION ALL.
- The FROM clause of the recursive member can only refer to the CTE expression once.
In addition, the following items are not allowed in the recursive member query definition:
- GROUP BY
- LEFT, RIGHT, or OUTER JOIN
- Scalar aggregation
- SELECT DISTINCT
Note: For a more complete list of guidelines and restrictions, please see the MSDN article WITH common_table_expression (Transact-SQL)
Let’s move on to a more comprehensive example.
Retrieve a Bill of Materials using a Recursive Common Table Expression
The Adventure Works company manufactures bicycles, and as you can imagine, they contain a lot of parts! To keep it all straight the production department has put together a bill of materials or BOM for short.
A BOM lists all the parts for each bike. Bikes are made up of many larger parts, called sub-assemblies, which in turn are made up of other sub-assemblies or components.
This creates a hierarchy of sorts with the product at the top. You can think of this as a parent-child relationship. The parent is the sub-assembly, and the parts making up the sub-assembly of the children.
For instance, for a Bike, the wheel would be a sub-assembly, which is composed of a rim and spokes.
Here are the table relationships for the AdventureWorks2012 BillOfMaterials:
As you can see all products (parts) are listed in the Product table. The BillOfMaterials table contains pairs of ProductID numbers:
- ProductAssemblyID – The sub-assembly containing the part (parent)
- ComponentID – Part in the subassembly (child)
These are foreign keys referring to Product.ProductID values.
Given this suppose we want a list of a product sub-assemblies and its constituent parts? How can we do this?
Top Level BOM Parts
Getting a list of the top-level parts is pretty straight forward. They would be products corresponding to BillOfMaterials entries whose ProductAssemblyID is NULL. Here is a query to get those products:
SELECT P.ProductID, P.Name, P.Color FROM Production.Product AS P INNER JOIN Production.BillOfMaterials AS BOM ON BOM.ComponentID = P.ProductID AND BOM.ProductAssemblyID IS NULL AND (BOM.EndDate IS NULL OR BOM.EndDate > GETDATE());
And the resulting products:
So how do we get the products and their sub-assemblies?
One idea is to write another query that gets all products whose ProductAssemblyID is one of the top-level products and use a UNION ALL to combine the results. We can easily do this with a subquery:
SELECT P.ProductID, P.Name, P.Color FROM Production.Product AS P INNER JOIN Production.BillOfMaterials AS BOM ON BOM.ComponentID = P.ProductID AND BOM.ProductAssemblyID IS NULL AND (BOM.EndDate IS NULL OR BOM.EndDate > GETDATE()) UNION ALL SELECT P.ProductID, P.Name, P.Color FROM Production.Product AS P INNER JOIN Production.BillOfMaterials AS BOM ON BOM.ComponentID = P.ProductID AND (BOM.EndDate IS NULL OR BOM.EndDate > GETDATE()) AND BOM.ProductAssemblyID IN (SELECT P.ProductID FROM Production.Product AS P INNER JOIN Production.BillOfMaterials AS BOM ON BOM.ComponentID = P.ProductID AND BOM.ProductAssemblyID IS NULL AND (BOM.EndDate IS NULL OR BOM.EndDate > GETDATE()) );
The blue portion is the query returning the top-level products. The green portion returns the sub-assemblies, and the portion in red is the subquery used to return the set of ProductID’s for the top-level products used to match ProductAssemblyID’s.
But there is an issue with the query. It only returns Products or sub-assemblies used in the top-level products. Sub-assemblies contained within these sub-assemblies aren’t considered.
Of course, we could continue to write queries to “dig” deeper into the hierarchy, but can you imagine? The queries would get really long, complicated, and always limited by the number of queries union together.
This is where recursive CTE’s shine.
By using recursion, we’re able to use the self-referring nature of the CTE to continue to dig into deeper levels of the BOM. Much like the counting example, we showed previously, with the following recursive CTE, the common table expression is repeatedly called until the termination condition is met. In this query’s case, that is once no further sub-assemblies or components are found.
Final Recursive CTE to Try
Here is the recursive CTE you can try:
WITH cte_BOM (ProductID, Name, Color, Quantity, ProductLevel, ProductAssemblyID, Sort) AS (SELECT P.ProductID, CAST (P.Name AS VARCHAR (100)), P.Color, CAST (1 AS DECIMAL (8, 2)), 1, NULL, CAST (P.Name AS VARCHAR (100)) FROM Production.Product AS P INNER JOIN Production.BillOfMaterials AS BOM ON BOM.ComponentID = P.ProductID AND BOM.ProductAssemblyID IS NULL AND (BOM.EndDate IS NULL OR BOM.EndDate > GETDATE()) UNION ALL SELECT P.ProductID, CAST (REPLICATE('|---', cte_BOM.ProductLevel) + P.Name AS VARCHAR (100)), P.Color, BOM.PerAssemblyQty, cte_BOM.ProductLevel + 1, cte_BOM.ProductID, CAST (cte_BOM.Sort + '\' + p.Name AS VARCHAR (100)) FROM cte_BOM INNER JOIN Production.BillOfMaterials AS BOM ON BOM.ProductAssemblyID = cte_BOM.ProductID INNER JOIN Production.Product AS P ON BOM.ComponentID = P.ProductID AND (BOM.EndDate IS NULL OR BOM.EndDate > GETDATE()) ) SELECT ProductID, Name, Color, Quantity, ProductLevel, ProductAssemblyID, Sort FROM cte_BOM ORDER BY Sort;
The anchor member, which is used to retrieve the top-level products is in blue. The recursive member` is colored green. The recursive CTE is invoked by the portion colored black.
Here is the Bill of Materials results:
Some things that important to note with this query are:
- You’ll notice several columns are explicitly cast as VARCHAR(100). This to ensure the data types are the same for corresponding columns within the anchor and recursive query definition.
- The level is incremented by one upon each recursive query definition call. The anchor query starts with 1, and incremented from there.
- The level is used to help indent sub-assemblies within products. We use the REPLICATE string function as a helper.
- A sort field is constructed from using product and sub-assembly names. If this wasn’t done, then there wouldn’t be a convenient way to order the results by sub-assembly.
Sort Key as Aid in Hierarchies
The sort field is a key concept. When a recursive CTE runs remember it first gathers all the top-level product IDs. Then it uses these to collect products used in those assemblies. This process repeats until no further product ID is found.
Because of this, rows are returned one level at a time. This is called a breadth-first search. For a BOM we typically want to drill into a sub-assembly and then its component parts or sub-assemblies. This is called a depth-first search.
In order to simulate this, we construct the sort field. The sort field is constructing the “path” of assemblies (think subfolders on your computer). When these paths are sorted, the items appear in depth-first-search order.
Hopefully you now have an appreciation of what CTE’s are and why we may want to use them. In the next two article we’ll go into much greater detail on CTE’s, and when to use them.
What to Read Next!