Do You Want to Learn SQL?

Take our free eight day course, where I'll teach you in simple to understand English all you need to get started learning SQL.

Common Table Expressions – The Ultimate Guide

·

·

The Common Table Expressions or CTE’s for short are used within SQL Server to simplify complex joins and subqueries, and to provide a means to query hierarchical data such as an organizational chart.  In this article, we’ll introduce you to common table expressions, the two types of the SQL CTEs, and their uses.  In addition, we’ll introduce CTE’s overall.  Once you’re familiar, I highly encourage you to read these articles as well:

Introduction to Common Table Expressions

A CTE (Common Table Expression) is a temporary result set that you can reference within another SELECT, INSERT, UPDATE, or DELETE statement.  They were introduced in SQL Server version 2005.  They are SQL-compliant and part of the ANSI SQL 99 specification.

A SQL CTE always returns a result set.  You can use them to simplify queries. For example, you could use one to eliminate a derived table from the main query body.

Note:  All the examples for this lesson are based on Microsoft SQL Server Management Studio and the AdventureWorks2012 database.  Start learning SQL Server using my free guide Getting Started Using SQL Server.

What is a SQL CTE or Common Table Expression in SQL Server?

A SQL CTE (Common Table Expression) defines a temporary result set which you can then use in a SELECT statement.  It becomes a convenient way to manage complicated queries.

You define Common Table Expressions using the WITH statement.  You can define one or more common table expression in this fashion.

Here is a really simple example of one CTE:

WITH Employee_CTE (EmployeeNumber, Title)
AS
(SELECT NationalIDNumber,
        JobTitle
 FROM   HumanResources.Employee)
SELECT EmployeeNumber,
       Title
FROM   Employee_CTE

Let’s break this down a bit.

Common Table Expression - Parts
CTE Query Definition

The blue portion is the CTE.  Do you see the contained SQL is a fully formed query? It is the SQL CTE query definition:

SELECT NationalIDNumber,
       JobTitle
FROM   HumanResources.Employee

When you run it you see results like:

Common Table Expression - Definition Results
CTE Query Definition Results

Naming Columns

Notice that when we define the CTE we give the result a name as well its columns.  In this way a CTE acts like a VIEW.  The result and columns are named differently.  This allows you to encapsulate complicated query logic with the common table expression.

Now going back to the SQL CTE, notice that the WITH statement.  There you’ll see the name and columns are defined.  These columns correspond to the columns returned from the inner query.

SQL CTE (Common Table Expression) column mapping
CTE Query Definition Column Mappings

Finally notice that our final query references the SQL CTE and columns defined.

From our outer query’s perspective all it “sees” is this definition.  It is only concerned with the name and columns, not the inner SQL.

As such, the results from the CTE are:

Common Table Expressions - Result

Notice the column names, they’re based on the those defined in the CTE.

I want to point out that you can define more than one SQL CTE within a WITH statement.  This can help you simplify some very complicated queries which are ultimately joined together.  Each complicated piece can include in their own CTE which is then referred to and joined outside the WITH clause.

Why Do you need SQL CTE’s?

There are several reasons why you may want to use a SQL CTE over other methods.  Some of them include:

  • Readability – CTE’s promote readability. Rather than lump all you query logic into one large query, create several CTE’s, which are the combined later in the statement.  This lets you get the chunks of data you need and combine them in a final SELECT.
  • Substitute for a View – You can substitute a SQL CTE for a view. This is handy if you don’t have permissions to create a view object or you don’t want to create one as it is only used in this one query.
  • Recursion – Use CTE’s do create recursive queries, that is queries that can call themselves. This is handy when you need to work on hierarchical data such as organization charts.
  • Limitations – Overcome SELECT statement limitations, such as referencing itself (recursion), or performing GROUP BY using non-deterministic functions.
  • Ranking – Whenever you want to use ranking function such as ROW_NUMBER(), RANK(), NTILE() etc.

Common Table Expression Types

Common Table Expressions are categorized as:  Recursive CTE’s and Non-Recursive CTE’s.

Recursive CTE’s are common table expressions that reference themselves.  Recursion can be a pretty difficult topic to grasp, I really didn’t get it until I took a LISP class way back in 1986, but hopefully I can explain it to you.

We’ll go deep into recursive CTE’s in a separate post, but for now let me introduce to you recursion using this diagram:

Common Table Expression - Recursive CTE

Here you see picture of opposing mirrors.  Due to the reflection, it becomes a picture in a picture.

Recursive queries are like that.

What is a Recursive Query?

A recursive query repeatedly run on a subset of the data.  A recursive query is basically a query that calls itself.  At some point there is an end condition, so it doesn’t call itself indefinitely.

In a way when you look into the picture you can imagine each picture in a picture is the picture calling itself.  However, unlike the “infinite reflection” in the mirrors, there comes a point where a recursive query encounters the end condition and stop calling itself.

At that point, the recursion starts to unwind, collect and calculate data as it reviews each successive result.

Non-Recursive CTE’s, as the name implies, are don’t use recursion.  They don’t reference themselves.  They are easier to understand so we’ll look at them first in detail in the next article within this series.

We’ll first take a look at non-recursive CTE’s or CTE’s for short. Let’s dig into them by learning about the WITH clause.

Define Common Table Expression Using the WITH Clause

You start defining the SQL CTE using the WITH clause.

CTEs are table expressions. The are a temporary result that is used in the scope of an SELECT, INSERT, UPDATE, DELETE, or APPLY statement.

Here is a general example:

WITH TableExpressionName (Column1, Column2, …, ColumnN)
AS
(Query Definition)

The CTE has two parts.  The first part defines the name of the CTE and it the columns contained within it.  This is the table expression’s definition.  The second part is the query definition.  This is the SELECT statement used to populate the expression with rows.

You can define more than one CTE in a statement.  You see an example of this further along in this article.

When writing the query definition keep in mind that the following cannot be used within it:

  • ORDER BY, unless you also use as TOP clause
  • INTO
  • OPTION clause with query hints
  • FOR BROWSE

For more information regarding the WITH keyword, refer to the MSDN documentation.

Defining and Using Multiple Common Table Expressions

Here is an example using of TWO CTE’s, it’s a simple example, but it shows how two CTE’s are defined, and then used in an INNER JOIN

WITH   PersonCTE (BusinessEntityID, FirstName, LastName)
AS     (SELECT Person.BusinessEntityID,
               FirstName,
               LastName
        FROM   Person.Person
        WHERE  LastName LIKE 'C%'),
PhoneCTE (BusinessEntityID, PhoneNumber)
AS     (SELECT BusinessEntityID,
               PhoneNumber
        FROM   Person.PersonPhone)
SELECT FirstName,
       LastName,
       PhoneNumber
FROM   PersonCTE
INNER JOIN
PhoneCTE
ON PersonCTE.BusinessEntityID = PhoneCTE.BusinessEntityID;

The first common table expression is colored green, the second blue.  As you can see from the SELECT statement the CTE’s are joined as if they were tables. Hopefully you can see that as your queries become more complicated, CTE’s can become a really useful way to separate operations; therefore, simplify your final query.

We already talked about some reasons to use SQL CTE’s, so let’s, see this with some examples.

Reasons to use CTEs by Example

We’ve already seen there are several good reasons to use SQL CTE’s. Let’s look at each of these, in turn, using examples.  By the time we’re done, you’ll see how to use one or more Common Table Expressions in a statement to work with joins.  Also, you’ll see how you can replace a correlated subquery with a two non recursive CTEs.

Increase Readability

As queries get larger is can become really difficult to understand how they work.  In my mind, readability doesn’t mean the query has less lines.  Instead, it means that it is understandable to you and others.  I think CTEs help improve readability several ways.

They help separate out query logic.  If you are joining two complex queries, you can use non recursive CTEs to separate out the complexity of the queries from the actual join.  This not only helps in debugging, as you can independently run the query definition to test it, but you can more easily identify the parts used to do the join.

Also, CTEs improve readability by eliminating repeating expressions.  There are situations where you want to display an expression and then sort by it.  In SQL server, the expression has to be repeated in both the SELECT clause and ORDER BY; however, if you use a CTE, this isn’t the case.

Readability such as Reference the resulting table multiple times in the same statement.

For the first couple of examples, we’re going to assume our original query is the following:

SELECT   Substring(Person.LastName,1,1) + ' ' + Person.FirstName as SortValue,
         Employee.NationalIDNumber,
         Person.FirstName,
         Person.LastName,
         Employee.JobTitle
FROM     HumanResources.Employee
         INNER JOIN
         Person.Person
         ON HumanResources.Employee.BusinessEntityID = person.BusinessEntityID
WHERE    Person.LastName >= 'L'AND
         (Employee.JobTitle Like 'Marketing%' OR
                                 Employee.JobTitle Like 'Production%')
ORDER BY Substring(Person.LastName,1,1) + ' ' + Person.FirstName

We’ll now take this statement show you how it is easier to read and maintain using CTEs.  Of course our example isn’t that complex, so it is still pretty readable, but I think as we work through the solution, it will help you see how CTEs can really help you in real world situations where your queries can be thirty or more lines long!

Example: CTE joined to normal table

The first thing we can do is move the query used to retrieve person rows in to a CTE as so

WITH Employee_CTE (BusinessEntityID, NationalIDNumber, JobTitle)
AS       (SELECT BusinessEntityID,
                 NationalIDNumber,
                 JobTitle
          FROM   HumanResources.Employee
          WHERE  (Employee.JobTitle LIKE 'Marketing%'
                  OR Employee.JobTitle LIKE 'Production%'))
SELECT  Substring(Person.LastName, 1, 1) + ' ' + Person.FirstName as SortValue,  
         Employee_CTE.NationalIDNumber,
         Person.FirstName,
         Person.LastName,
         Employee_CTE.JobTitle
FROM     Employee_CTE
         INNER JOIN
         Person.Person
         ON Employee_CTE.BusinessEntityID = Person.BusinessEntityID
WHERE    Person.LastName >= 'L'
ORDER BY Substring(Person.LastName, 1, 1) + ' ' + Person.FirstName;

Notice that the CTE is defined to return three columns:  BusinessEntityID, NationalID, and JobTitle.

Also, you see we’ve pulled in the filtering criterial into the SQL CTEs query definition.  The overall query is still a bit messy, but hopefully you’re starting to see how CTEs can help to separate various query operations to make queries easier to read.

The next thing we could do to simplify the query is to create another CTE to handle Person table rows.

Example: CTE joined to another CTE

In the example below are two CTEs.  I’ve colored them blue and green respectively.  The blue colored CTE is the same as the above example, the green one is newly added, and really helps to simplify the overall query.

WITH     Employee_CTE (BusinessEntityID, NationalIDNumber, JobTitle)
AS       (SELECT BusinessEntityID,
                 NationalIDNumber,
                 JobTitle
          FROM   HumanResources.Employee
          WHERE  (Employee.JobTitle LIKE 'Marketing%'
                  OR Employee.JobTitle LIKE 'Production%')),
         Person_CTE (BusinessEntityID, FirstName, LastName, SortValue)
AS       (SELECT BusinessEntityID,
                 FirstName,
                 LastName,
                 Substring(Person.LastName, 1, 1) + ' ' + Person.FirstName
          FROM   Person.Person
          WHERE  Person.LastName >= 'L')
SELECT   Person_CTE.SortValue,
         Employee_CTE.NationalIDNumber,
         Person_CTE.FirstName,
         Person_CTE.LastName,
         Employee_CTE.JobTitle
FROM     Employee_CTE
         INNER JOIN
         Person_CTE
         ON Employee_CTE.BusinessEntityID = Person_CTE.BusinessEntityID
ORDER BY Person_CTE.SortValue;

In our original example the SortValue expression, Substring(Person.LastName, 1, 1) + ‘ ‘ + Person.FirstName, was repeated in both the SELECT clause and ORDER BY statement.  Now, by placing the SortValue expression within the SQL CTE, we only need to define it once!

Compare the original query to the final one:

You can join two common table expressions to themselves
Two CTE’s joined to one another.

I think the original query is harder to read and maintain for these reasons:

  1. The SortValue expression is listed twice.
  2. The filtering of both tables are in the same expression.
  3. Since queries for both the employee and person are in the same statement it is difficult to separate them for debugging purposes.

I like having the statement broke out into a SQL CTE because:

  1. If I need to verify what rows are being returned by a CTE, it’s as simple as running the query definition in a separate query window.
  2. Expression, such as that used for SortValue, aren’t repeated.
  3. The final Query Using CTEs is easier to read. You can focus on the join conditions, and not be distracted by filtering nor expressions.

Example: Join CTE to Self

As you have seen you can join a CTE to another table, or CTE.  But did you know you can join a CTE to itself?

Note: a table is joined to itself is a self-join.

Consider the following CTE:

WITH Department_CTE (Name, GroupName)
AS (SELECT Name,
           GroupName
      FROM HumanResources.Department)
SELECT D1.Name,
       D2.Name
FROM   Department_CTE AS D1
INNER JOIN
Department_CTE AS D2
ON d1.GroupName = d2.GroupName

Here you can see one CTE is defined as Department_CTE and that this is then used twice within the query.

The result of this query is to list combinations of department names within the same department group:

You can self join CTEs
CTE Self Join

Substitute for a View

As your SQL becomes more complex you’ll find that using views are a great way to hide the inner workings of a query and allow you to just focus on the results.  This is especially true when you’re working with data involving multiple joins.

With a view you can encapsulate all that query and join logic into the view.  Queries using the view are simpler and easier to read.

But there are times when it may not make sense to create a view.  For instance, if you don’t have permissions to create database objects, such as when using a third party database, or when the only time you’ll need to us the view is just once.

In these cases, you can use a common table expression.

When creating a CTE, keep in mind the SQL CTEs query definition is the same as the query used to create the view.

In the diagram below we show a query two ways.  In blue you’ll see defined as view, and then used in a query.  It is then shown in green as a CTE.

Substituting a View with a SQL CTE (Common Table Expression)
Substituting a View with a CTE

We then take the same query used to define the view and use that for the CTE query definition.  Finally, you can see that the CTEs final query is exactly like that used to reference the view.

CTE Can Replace Other SQL Objects

If you find it hard to work with derived tables or correlated subqueries, you can “break” the query apart, using common table expression. This can make it easier for you to get started learning SQL. Once you get more comfortable, you can then tackle the other topics!

CTE versus Derived Table

Derived tables are table results defined in the FROM clause.  Given that derived tables return a table expression, it should be no surprise that you can use CTEs in their place.

Consider the following query.  The derived tables are color coded in red and green.

SELECT Quota.TerritoryID,
       Quota.TerritoryQuota,
       Sales.TerritorySales,
       Sales.TerritorySales - Quota.TerritoryQuota
FROM   (SELECT   TerritoryID,
                 SUM(SalesQuota) AS TerritoryQuota
        FROM     Sales.SalesPerson
        GROUP BY TerritoryID) AS Quota
INNER JOIN
       (SELECT   SOH.TerritoryID,
                 SUM(SOH.TotalDue) AS TerritorySales
        FROM     Sales.SalesOrderHeader AS SOH
        GROUP BY SOH.TerritoryID) AS Sales
ON Quota.TerritoryID = Sales.TerritoryID

We can make it easier to read the statement by pulling out the table expression into a CTE.  Here is the same query using CTEs instead of derived tables:

WITH Quota (territoryid, quota)
 AS  (SELECT   territoryid,
               Sum(salesquota) AS TerritoryQuota
        FROM   sales.salesperson
      GROUP BY territoryid),
     Sales (territoryid, sales)
AS  (SELECT   SOH.territoryid,
              Sum(SOH.totaldue) AS TerritorySales
       FROM   sales.salesorderheader AS SOH
     GROUP BY SOH.territoryid)
SELECT Quota.territoryid,
       Quota.quota,
       Sales.sales,
       Sales.sales - Quota.quota
FROM   Quota
       INNER JOIN
       Sales
       ON Quota.territoryid = Sales.territoryid;

By using CTEs we move the code used to define the query results for Quota and Sales away from the portion used to combine the table together.

I think this makes it much easier to maintain.  For instance, if you need to make changes, it’s easier to know where to make a change.  Also, it makes it easier to be able to see what is actually being joined together.  The queries for each derived table are cluttering that portion of the SELECT statement.

CTEs versus Subqueries

CTEs and subqueries are similar, but CTEs have capabilities not found with subqueries.

Subqueries and CTEs:

  • Are table expressions created at run-time. They are temporary objects.
  • Can be created and used within stored procedures, triggers, and views.
  • Can be correlated or non-correlated. A CTE can reference a CTE previously defined in the same statement.
  • Can be in in SELECT, FROM, WHERE, HAVING, IN, EXISTS clauses.

There are some differences between subqueries and CTEs, notably:

  • A subquery is defined within an outer query. It is defined before calling it from within the query.
  • A Common Table Expression can reference itself, a subquery cannot.
  • They can reference other CTEs within the same WITH clause (Nest). A subquery cannot reference other subqueries.
  • A CTE can be referenced multiple times from a calling query. A subquery cannot be referenced.

CTE versus Correlated Subquery

Correlated subqueries can also be replaced with non recursive CTEs.  This shouldn’t be a new concept, as we’ve seen in the past it is easy to convert a correlated sub query into a join.  Given this, and knowing that joins are easily moved into non recursive CTEs, you can see the possibilities.

Let’s that the following correlated subquery which displays the sales order information.  The subquery is used to calculate the average line total for each sales order.

SELECT salesorderid,
       salesorderdetailid,
       linetotal,
       (SELECT Avg(linetotal)
        FROM   sales.salesorderdetail
        WHERE  salesorderid = SOD.salesorderid) AS AverageLineTotal
 FROM   sales.salesorderdetail SOD

To replicate this query using a CTE we first need to create query definition to calculate the average line total for each Sales Order.

This common table expression is then joined to the sales order table to obtain our final result.

WITH linetotal_cte (salesorderid, averagelinetotal)
AS   (SELECT    salesorderid,
                Avg(linetotal)
      FROM      sales.salesorderdetail
      GROUP  BY salesorderid)
 SELECT SOD.salesorderid,
        SOD.salesorderdetailid,
        SOD.linetotal,
        LT.averagelinetotal
 FROM   sales.salesorderdetail SOD
 INNER JOIN linetotal_cte LT
 ON LT.salesorderid = SOD.salesorderid

Use CTE’s to Overcome SQL Limitations

Use common table expression to overcome SQL limitations such as “enable grouping by a column that is derived from a scalar subselect, or a function that is either not deterministic or has external access.” (TechNet)

Suppose you wanted to know how many departments have the same number of employees.  What query could you use?

Finding the number of employees by department name is pretty easy.  We can use the following query to do so:

SELECT GroupName,
       Name,
       (SELECT Count(1)
        FROM   HumanResources.EmployeeDepartmentHistory AS H
        WHERE  D.DepartmentID = H.DepartmentID
               AND H.EndDate IS NULL) AS NumberEmployees
FROM    HumanResources.Department AS D;

The number of employees in each department is calculated using a scalar sub select.  This is colored red in the above query for easy identification.  Here is a sample result:

Results of scalar subselect. We'll group by these results using a CTE.
Scalar sub-select results.

Now let’s take this once step further and count how many departments have the same number of employees.  To do this we should group on NumberEmployees, but this statement is invalid:

SELECT (SELECT Count(1)
          FROM   HumanResources.EmployeeDepartmentHistory AS H
         WHERE  D.DepartmentID = H.DepartmentID
               AND H.EndDate IS NULL) AS NumberEmployees,
        Count(Name) SameCount
FROM   HumanResources.Department AS D
GROUP BY NumberEmployees

Due to the grouping restriction.  The subquer is a scarlar subselect and we’re trying to group by it.  That is a SQL violation!  As is

SELECT  (SELECT Count(1)
  FROM   HumanResources.EmployeeDepartmentHistory AS H
WHERE    D.DepartmentID = H.DepartmentID
        AND H.EndDate IS NULL) AS NumberEmployees,
        Count(Name) SameCount
FROM   HumanResources.Department AS D
GROUP BY  (SELECT Count(1)
           FROM   HumanResources.EmployeeDepartmentHistory AS H
           WHERE  D.DepartmentID = H.DepartmentID
                  AND H.EndDate IS NULL)

To solve this problem, we can define and use a CTE:

WITH     Department_CTE (GroupName, Name, NumberEmployees)
AS       (SELECT GroupName,
                 Name,
                (SELECT Count(1)
                 FROM   HumanResources.EmployeeDepartmentHistory AS H
                 WHERE  D.DepartmentID = H.DepartmentID
                        AND H.EndDate IS NULL) AS NumberEmployees
          FROM   HumanResources.Department AS D)
SELECT   NumberEmployees,
         Count(Name) SameCount
FROM     Department_CTE
GROUP BY NumberEmployees;

Now we query the CTE and group by the result of the scalar sub select (red text).  The results returned are:

You can use a CTE to overcome limitation. These results show how we can group on a scalar subselect.
Final Results of Grouping

From this we can see there are 5 departments with 6 employees.

Use with SQL Ranking functions

You can use ranking functions such as RANK() and NTILE() in conjunction with windowing functions to return the top items within a group.

Suppose we want to return the top sales within each territory.  To do this we can RANK sales within each territory as 1,2,…, and so on, and then select those with a rank of 1.

Here is the query you can use to generate sales ranks within territories:

SELECT RANK() OVER(PARTITION BY S.TerritoryID ORDER BY SOH.TOTALDue desc)
      ,S.TerritoryID
      ,SOH.SalesOrderNumber
      ,SOH.TotalDue
FROM  Sales.SalesPerson S
INNER JOIN Sales.SalesOrderHeader SOH
ON S.BusinessEntityID = SOH.SalesPersonID
WHERE S.TerritoryID is not NULL

Given this, then our job is to pick only those rows whose RANK() is one.  These are the top ranked sales within each territory.  You may think you could repeat the ranking expression in the WHERE clause to filter, but this isn’t possible.  Windows functions, such as partition are not allowed in the WHERE clause and if you try to “wrap” it into a subquery you’ll get an error since the subquery returns more than one row.

It seems we are stuck.  But this is where non recursive CTEs come to our aid.

With a CTE we can define a query to return the sales data ranked by territory and then query that to only return those items ranked first in each territory:

WITH   SalesRankCTE (SalesPersonID, Name, TerritoryID, SalesRanking, SalesOrderNumber, TotalDue)
AS     (SELECT SalesPersonID,
               P.FirstName + ' ' + P.LastName,
               S.TerritoryID,
               RANK() OVER (PARTITION BY S.TerritoryID ORDER BY SOH.TOTALDue DESC),
               SOH.SalesOrderNumber,
               SOH.TotalDue
        FROM   Sales.SalesPerson AS S
        INNER JOIN
        Sales.SalesOrderHeader AS SOH
        ON S.BusinessEntityID = SOH.SalesPersonID
        INNER JOIN
        Person.Person AS P
        ON P.BusinessEntityID = S.BusinessEntityID
        WHERE  S.TerritoryID IS NOT NULL)
SELECT SalesPersonID,
       Name,
       TerritoryID,
       SalesRanking,
       SalesOrderNumber,
       TotalDue
FROM   SalesRankCTE
WHERE  SalesRanking = 1;

To help you see this, the original query used to generate rankings is highlighted in blue.  The filter to only show the top ranked sales in each territory is in red.

Read More: SQL Window Functions [Visual Explanation] >>

Recursive CTEs

A recursive CTE is a CTE that references itself.  In doing so, the initial CTE is repeatedly executed, returning subsets of data, until the complete result is returned.

A recursive CTE is set with an WITH, but is set up differently inside. You’ll see that it can call itself!

Parts of A Recursive Common Table Expression

This recursive CTE consists of three main parts:

  1. Invocation – This is the statement using the CTE.
  2. Anchor Member – This portion executes first and is only called once.
  3. Recursive Member – The portion of the query is repeatedly executed until no rows are returned. The results of each execution are unioned with the prior results.
  4. Termination Check – The termination check ensures the query stops.

Read More: What is a Recursive CTE (Common Table Expression)? >>

The Anchor Member’s Importance

When this query is run the Anchor Member is run once and its derived rows combined, via the UNION ALL, with the Recursive Member.

This query is repeatedly run until no rows are returned.  This is why the termination check is so important.

If we weren’t checking for N < 50, the query would always return a row, thus complete with an error.  Here is an example of the error you can expect if the termination check is missing or faulty:

Msg 530, Level 16, State 1, Line 1
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

Of course, there may be an instance where the maximum recursion should be greater than 100.  In this case you can adjust it using the MAXRECURSION query hint.  Here we’ve increased the recursion to 200.

WITH   cte
AS     (SELECT 1 AS n -- anchor member
        UNION ALL
        SELECT n + 1 -- recursive member
        FROM   cte
        WHERE  n < 50 -- terminator
       )
SELECT n
FROM   cte
OPTION (MAXRECURSION 200);

Recursive CTE Guidelines

When writing a Recursive CTE keep in mind the following guidelines:

  • A recursive CTE must contain at least an anchor member and recursive member. Multiple anchor and recursive members can be defined, but all the anchor members should be before the first recursive member.
  • The number of columns and their corresponding data types should be the same between the anchor and recursive members. This makes sense, as this is a condition for UNION ALL.
  • The FROM clause of the recursive member can only refer to the CTE expression once.

In addition, the following items are not allowed in the recursive member query definition:

  • GROUP BY
  • HAVING
  • LEFT, RIGHT, or OUTER JOIN
  • Scalar aggregation
  • SELECT DISTINCT
  • Subqueries
  • TOP

Note:  For a more complete list of guidelines and restrictions, please see the MSDN article WITH common_table_expression (Transact-SQL)

Let’s move on to a more comprehensive example.

Retrieve a Bill of Materials using a Recursive Common Table Expression

The Adventure Works company manufactures bicycles, and as you can imagine, they contain a lot of parts!  To keep it all straight the production department has put together a bill of materials or BOM for short.

A BOM lists all the parts for each bike.  Bikes are made up of many larger parts, called sub-assemblies, which in turn are made up of other sub-assemblies or components.

This creates a hierarchy of sorts with the product at the top.  You can think of this as a parent-child relationship.  The parent is the sub-assembly, and the parts making up the sub-assembly of the children.

For instance, for a Bike, the wheel would be a sub-assembly, which is composed of a rim and spokes.

Here are the table relationships for the AdventureWorks2012 BillOfMaterials:

Bill of Materials Database Relationships

As you can see all products (parts) are listed in the Product table.  The BillOfMaterials table contains pairs of ProductID numbers:

  • ProductAssemblyID – The sub-assembly containing the part (parent)
  • ComponentID – Part in the subassembly (child)

These are foreign keys referring to Product.ProductID values.

Given this suppose we want a list of a product sub-assemblies and its constituent parts?  How can we do this?

Top Level BOM Parts

Getting a list of the top-level parts is pretty straight forward.  They would be products corresponding to BillOfMaterials entries whose ProductAssemblyID is NULL.  Here is a query to get those products:

SELECT P.ProductID,
       P.Name,
       P.Color
FROM   Production.Product AS P
       INNER JOIN
       Production.BillOfMaterials AS BOM
       ON BOM.ComponentID = P.ProductID
       AND BOM.ProductAssemblyID IS NULL
       AND (BOM.EndDate IS NULL
            OR BOM.EndDate > GETDATE());

And the resulting products:

BOM Results - Top Level Products

So how do we get the products and their sub-assemblies?

One idea is to write another query that gets all products whose ProductAssemblyID is one of the top-level products and use a UNION ALL to combine the results.  We can easily do this with a subquery:

SELECT P.ProductID,
       P.Name,
       P.Color
FROM   Production.Product AS P
       INNER JOIN Production.BillOfMaterials AS BOM
       ON BOM.ComponentID = P.ProductID
       AND BOM.ProductAssemblyID IS NULL
       AND (BOM.EndDate IS NULL
            OR BOM.EndDate > GETDATE())
UNION ALL
SELECT P.ProductID,
       P.Name,
       P.Color
FROM   Production.Product AS P
       INNER JOIN Production.BillOfMaterials AS BOM
       ON BOM.ComponentID = P.ProductID
       AND (BOM.EndDate IS NULL
           OR BOM.EndDate > GETDATE())
       AND BOM.ProductAssemblyID IN
           (SELECT P.ProductID
            FROM   Production.Product AS P
                   INNER JOIN Production.BillOfMaterials AS BOM
                   ON BOM.ComponentID = P.ProductID
                   AND BOM.ProductAssemblyID IS NULL
                   AND (BOM.EndDate IS NULL
                        OR BOM.EndDate > GETDATE())
            );

The blue portion is the query returning the top-level products.  The green portion returns the sub-assemblies, and the portion in red is the subquery used to return the set of ProductID’s for the top-level products used to match ProductAssemblyID’s.

But there is an issue with the query.  It only returns Products or sub-assemblies used in the top-level products.  Sub-assemblies contained within these sub-assemblies aren’t considered.

Of course, we could continue to write queries to “dig” deeper into the hierarchy, but can you imagine?  The queries would get really long, complicated, and always limited by the number of queries union together.

This is where recursive CTE’s shine.

By using recursion, we’re able to use the self-referring nature of the CTE to continue to dig into deeper levels of the BOM.  Much like the counting example, we showed previously, with the following recursive CTE, the common table expression is repeatedly called until the termination condition is met.  In this query’s case, that is once no further sub-assemblies or components are found.

Final Recursive CTE to Try

Here is the recursive CTE you can try:

WITH cte_BOM (ProductID, Name, Color, Quantity, ProductLevel, ProductAssemblyID, Sort)
AS  (SELECT P.ProductID,
            CAST (P.Name AS VARCHAR (100)),
            P.Color,
            CAST (1 AS DECIMAL (8, 2)),
            1,
            NULL,
            CAST (P.Name AS VARCHAR (100))
     FROM   Production.Product AS P
            INNER JOIN
            Production.BillOfMaterials AS BOM
            ON BOM.ComponentID = P.ProductID
            AND BOM.ProductAssemblyID IS NULL
            AND (BOM.EndDate IS NULL
                OR BOM.EndDate > GETDATE())
     UNION ALL
     SELECT P.ProductID,
            CAST (REPLICATE('|---', cte_BOM.ProductLevel) + P.Name AS VARCHAR (100)),
            P.Color,
            BOM.PerAssemblyQty,
            cte_BOM.ProductLevel + 1,
            cte_BOM.ProductID,
            CAST (cte_BOM.Sort + '\' + p.Name AS VARCHAR (100))
     FROM   cte_BOM
            INNER JOIN Production.BillOfMaterials AS BOM
            ON BOM.ProductAssemblyID = cte_BOM.ProductID
            INNER JOIN Production.Product AS P
            ON BOM.ComponentID = P.ProductID
            AND (BOM.EndDate IS NULL
                OR BOM.EndDate > GETDATE())
    )
SELECT   ProductID,
         Name,
         Color,
         Quantity,
         ProductLevel,
         ProductAssemblyID,
         Sort
FROM     cte_BOM
ORDER BY Sort;

The anchor member, which is used to retrieve the top-level products is in blue.  The recursive member` is colored green.  The recursive CTE is invoked by the portion colored black.

Here is the Bill of Materials results:

BOM Final Results

Some things that important to note with this query are:

  1. You’ll notice several columns are explicitly cast as VARCHAR(100). This to ensure the data types are the same for corresponding columns within the anchor and recursive query definition.
  2. The level is incremented by one upon each recursive query definition call. The anchor query starts with 1, and incremented from there.
  3. The level is used to help indent sub-assemblies within products. We use the REPLICATE string function as a helper.
  4. A sort field is constructed from using product and sub-assembly names. If this wasn’t done, then there wouldn’t be a convenient way to order the results by sub-assembly.

Sort Key as Aid in Hierarchies

The sort field is a key concept.  When a recursive CTE runs remember it first gathers all the top-level product IDs.  Then it uses these to collect products used in those assemblies.  This process repeats until no further product ID is found.

Because of this, rows are returned one level at a time.  This is called a breadth-first search.  For a BOM we typically want to drill into a sub-assembly and then its component parts or sub-assemblies.  This is called a depth-first search.

In order to simulate this, we construct the sort field.  The sort field is constructing the “path” of assemblies (think subfolders on your computer).  When these paths are sorted, the items appear in depth-first-search order.

Conclusion

Hopefully you now have an appreciation of what CTE’s are and why we may want to use them.   In the next two article we’ll go into much greater detail on CTE’s, and when to use them.

What to Read Next!

7 responses to “Common Table Expressions – The Ultimate Guide”
  1. […] Jika Anda ingin mengetahui mengenai CTE lebih detail, silahkan klik disini   Common Table Expressions […]

  2. […] the results from the CTE (Common Table Expression), I’m able to compare the ROW_NUMBER and return a […]

  3. […] do we want to get this data?  We can use chained CTEs (Common Table Expression) so we can convert and then perform the calculations as […]

  4. […] you’re unfamiliar with CTEs I would encourage you to read Introduction to Common Table Expressions.  Once you’re familiar, then come back to this article and we’ll dig deeper into the reasons […]

  5. […] article will be discussing recursive Common Table Expressions (CTEs). Recursive CTEs are unique, such that they are allowed to reference their own. With this special […]

  6. […] To make it simpler to write and read, I used CTE’s (Common Table Expressions) to create this query.  If you are not familiar with CTE’s, think of them as views for now.  Note: you can read more about CTEs in my article Introduction to Common Table Expressions. […]

  7. […] Here is a link to another great article on Common Table Expressions: Common Table Expressions (Introduction to CTE’s) […]

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

More from the blog


MySQL PostgreSQL SQLite SQL Server