Introduction to Common Table Expressions


A Common Table Expression (CTEs) is an expression that returns a record set that can be referenced within the scope of a single INSERT, UPDATE, DELETE, INSERT or CREATE VIEW statement. Greg Larsen describes how to define a CTE and then provides some simple examples that demonstrate how to use a CTE.

With the introduction of SQL Server 2005, Microsoft introduced Common Table
Expression, which is commonly called CTE. A CTE is an expression that returns a
record set that can be referenced within the scope of a single INSERT, UPDATE,
DELETE, INSERT or CREATE VIEW statement. They are commonly used to simplify
complicated queries, to make them more readable, and maintainable. A CTE can be
thought of as a derived table, although a CTE can reference itself. In this
article, I will describe how to define a CTE and then provide some simple
examples that demonstrates how to use a CTE.

Defining a CTE

You define a CTE using the WITH clause. Here is the syntax for the WITH
clause:

WITH <expression_name> (Column1, Column2, …) AS (CTE Definition)

Where:

  • "<expression_name>" is the name
    of the CTE
  • "Column 1, Column2,…" are the
    names of the columns returned from the CTE (only required if column names
    are not provided, or are not unique)
  • "CTE Definition" is the set of
    statements that describes the definition of the CTE

Here is a simple CTE declaration followed by a SELECT statement that
references the CTE declaration. This CTE returns information from the
AdventureWorks database:

WITH OrderCountByYear(OrderYear, OrderCount)
AS (
SELECT DATEPART(Year,OrderDate), COUNT(*)
FROM Sales.SalesOrderHeader
GROUP BY DATEPART(Year,OrderDate)
)
SELECT * FROM OrderCountByYear;

The CTE defined in this example calculates the number of
Sales.SalesOrderDetail records per year based on the OrderDate column. This is
a very trivial example. Note how the more complicated SELECT statement that
groups the SalesOrderDetail records by year is contained within the CTE. The
CTE is then referenced in the SELECT statement that following the CTE
definition. Notice how simple that actual SELECT statement is, when all it does
is reference the CTE.

CTEs can also be recursive. When you define a recursive CTE, they require
two different parts in the CTE Definition: An anchor part and a
recursive part. The anchor part defines the initial record set for the CTE, and
its result set feeds the recursive part. The anchor part of a recursive CTE
must be specified first in the CTE Definition. Note there can be multiple
anchor part SELECT statements. The second part or recursive part is joined to
the anchor part by a UNION ALL clause. If there are multiple anchor parts, the
last anchor statement needs to be joined to the recursive part with a UNION ALL
clause. The recursive part must return the same number of columns as the anchor
part, as well as it references the CTE expression name as part of its
statement. A recursive CTE continues until the recursive part returns an empty
set (more on this later).

Here is a simple recursive CTE example that uses the AdventureWorks
database:

WITH EmployeeHierarchy(EmployeeID, Title, OrgLevel)
AS (
SELECT EmployeeID, Title, 1 AS OrgLevel
FROM HumanResources.Employee
WHERE ManagerID IS NULL
UNION ALL
SELECT E.EmployeeID, E.Title, H.OrgLevel + 1 AS OrgLevel
FROM HumanResources.Employee E
JOIN EmployeeHierarchy H
ON H.EmployeeId = E.ManagerID) 
SELECT *
FROM EmployeeHierarchy H
ORDER BY OrgLevel;

The above recursive CTE returns all the EmployeeID’s, their Title, and
OrgLevel. The returned results are ordered by OrgLevel, where the lower the
OrgLevel number the higher up on the management food chain for the given
employee. OrgLevel=1 would will be associated with the president of the
company.

Walking Through a Recursive CTE Example

To help you to better understand how a recursive CTE works, I will walk
through an example, one recursive call at a time. In order to use a common
relationship that we are all familiar with for this example let me use the
common child, parent, grandparent relationship. For this example, I will be
using the following table structure:

CREATE TABLE Person(
ID int NOT NULL 
,FirstName varchar(20) NULL
,LastName varchar(20) NOT Null
,MotherID INT NULL);

Here I have created a Person table that contains the ID of a person, along
with their FirstName, LastName, and the ID of the person’s mother (MotherID). I
will be using this table to recursively use a CTE to display a person’s name,
their mother’s name, and then their grandmother’s name. Alternatively, another
way to look at it is, I’m going to display three generations including the
person I’m interested in. Here is some data to populate my table:

SET NOCOUNT ON;
INSERT INTO Person (ID,FirstName, LastName, MotherID)
VALUES (1,'James','Tyman',NULL);
INSERT INTO Person (ID,FirstName, LastName, MotherID)
VALUES (2,'Mary','Tyman',NULL);
INSERT INTO Person (ID,FirstName, LastName, MotherID)
VALUES (3,'Doris','Smith',NULL);
INSERT INTO Person (ID,FirstName, LastName, MotherID)
VALUES (4,'Francis','Smith',NULL);
INSERT INTO Person (ID,FirstName, LastName, MotherID)
VALUES (5,'Kathy','Jackson',3);
INSERT INTO Person (ID,FirstName, LastName, MotherID)
VALUES (6,'Jim','Jackson',2);
INSERT INTO Person (ID,FirstName, LastName, MotherID)
VALUES (7,'Paul','Poe',NULL);
INSERT INTO Person (ID,FirstName, LastName, MotherID)
VALUES (8,'Heather','Poe',5) 
INSERT INTO Person (ID,FirstName, LastName, MotherID)
VALUES (9,'Sally','Jackson',8);
INSERT INTO Person (ID,FirstName, LastName, MotherID)
VALUES (10,'Joshua','Thomas',8);
INSERT INTO Person (ID,FirstName, LastName, MotherID)
VALUES (11,'Francis','Jackson',9);

If you go through this data, you will find there are five different
generations of family members. Now that you understand the data and the
requirements of my CTE, let me show you the code for my CTE:

-- Set the ID of the child generation we are exploring
DECLARE @PersonID INT;
SET @PersonID = 11;
-- Generation CTE
WITH Generation (
FirstGenFirstName, FirstGenLastName,
SecondGenFirstName, SecondGenLastName,
MotherId,ThirdGenFirstName, ThirdGenLastName,
Generation)
AS
(
-- Anchor Statement
SELECT X.FirstName, X.LastName,
CAST('' AS VARCHAR(20)), CAST('' AS VARCHAR(20)), X.MotherID,
CAST('' AS VARCHAR(20)), CAST('' AS VARCHAR(20)), 1 AS Generation
FROM Person X
-- @PersonID controls the child generation to start at
WHERE @PersonID = X.ID
UNION ALL
-- Recursive Portion
SELECT X.FirstName, X.LastName,
G.FirstGenFirstName, G.FirstGenLastName, X.MotherID,
G.SecondGenFirstName , G.SecondGenLastName, G.Generation + 1 AS Generation
FROM Person X
JOIN Generation G
ON G.MotherID = X.ID
WHERE Generation < 3
)

In the above code, I defined my "Generation" CTE. This code will
use two different SELECT statements to return three generations from the Person
table, starting with the child record first.

Prior to defining my CTE I first declare a local variable named @PersonID,
and set it to ID number 11. This local variable controls which child record
this CTE will use as the anchor for my recursive query. My CTE declaration
starts with the "WITH Generation (…) as …"clause. Within
the parenthesis, I have defined the columns that will be returned from each
iteration of this CTE. If you examine further down in my CTE you will see that
each SELECT statement outputs the same number of columns as I specified using
the WITH clause. For each iteration through this CTE, these columns will be
outputted even though I did not specific column alias on these columns within
my SELECT statements.

In the body of the CTE, the two SELECT statements are joined together by a
UNION ALL clause. This first SELECT statement is known as the anchor statement
of the CTE. The second SELECT statement is known as the recursive part of the
CTE. You can see that the recursive portion of the CTE references the CTE name
("Generation") within the FROM clause in the second SELECT statement.

To call this CTE I execute the following SELECT statement immediately
following the declaration of this CTE:

-- Select from Generation CTE
SELECT * FROM Generation;

When this SELECT statement is executed, it will call the CTE logic. Let me
walk you through each iteration of the CTE, so you can get a better sense for
how a recursive CTE works. When the CTE is executed the first time, the anchor
SELECT statement is executed. This execution returns Person table records where
the ID of the Person record is equal to 11. This SELECT statement only returns
one record, which is represented by the following output:

FirstGenFirstNam  FirstGenLastNam   SecGenFirstNam    SecGenLastNam     MotherId    ThirdGenFirstNam  ThirdGenLastNam   Generation
----------------- ----------------- ----------------- ----------------- ----------- ----------------- ----------------- -----------
Francis           Jackson                                               9                                               1

Here you can see that "Francis Jackson" is returned for the first
generation, or the child generation. I also populated the ID for Francis's
mother, which in this case is 9. This ID column will be used in the recursive
query to find the next generation. Note that I populated the second and third
generation names with the empty string.

Now for each of the following iterations the recursive SELECT statement will
be executed. The results of each recursive execution will be added to our
initial result set by using the "UNION ALL" clause. The second
iteration will return the Person table records where the ID of a Person record
is 9 (or Francis's mother). This is accomplished by using the MotherID of the
first iteration of the "Generation" CTE in the "ON" clause
of the JOIN operation, when joining CTE output with the Person table. After the
second iteration, the following records are part to the result set being built
by this recursive CTE:

FirstGenFirstNam  FirstGenLastNam   SecGenFirstNam    SecGenLastNam     MotherId    ThirdGenFirstNam  ThirdGenLastNam   Generation
----------------- ----------------- ----------------- ----------------- ----------- ----------------- ----------------- -----------
Francis           Jackson                                               9                                               1
Sally             Jackson           Francis           Jackson           8                                               2

If you look at this output, you can now see two generations of Person
records, Francis and Francis's mother, Sally Jackson. Note that in the second
record that is now included in the result set that Francis has moved to the
second generation name fields, and Francis's mothers name information now
occupies the first generation name fields.

For the third iteration of the CTE we will be locating Francis's
grandmother's information by using the MotherID from Francis's mothers records (
ID 8). Here is the set of records returned from the third iteration of this
CTE:

FirstGenFirstNam  FirstGenLastNam   SecGenFirstNam    SecGenLastNam     MotherId    ThirdGenFirstNam  ThirdGenLastNam   Generation
----------------- ----------------- ----------------- ----------------- ----------- ----------------- ----------------- -----------
Francis           Jackson                                               9                                               1
Sally             Jackson           Francis           Jackson           8                                               2
Heather           Poe               Sally             Jackson           5           Francis           Jackson           3

Note that the CTE returned a third record, where "Heather Poe"
(the grandmother) is listed as the first generation, "Sally Jackson"
(the mother) moved into the second generation, and now "Francis
Jackson" (the child) has moved into the third generation.

When the CTE is executed the fourth time no records are returned because the
"WHERE Generation < 3" clause causes this iteration to return an
empty set. This empty set is returned because we already have three different
generation identified. Since the final iteration returns an empty record set,
the recursive CTE is terminated and the CTE returns the three different records
listed above.

If you remove the "WHERE Generation < 3" clause above, then all
generations for "Francis" would be returned (five different generations
for this example). However, this will cause the second and third iteration
information to keep cycling new information into these output columns for each
older generation displayed.

Simplification with CTEs

A CTE provides a way to simplify complicated code to make it more readable.
CTEs also allow you an easy way to iteratively return a set of related records
by recursively calling the defined CTE. Next time you have some complicated
code, or need to reclusively return data try a CTE.

Related Articles

Give the New PIVOT and UNPIVOT Commands in SQL Server 2005 a Whirl

It All Adds Up

»


See All Articles by Columnist

Gregory A. Larsen

Gregory Larsen
Gregory A. Larsen is a DBA at Washington State Department of Health (DOH). Greg is responsible for maintaining SQL Server and other database management software. Greg works with customers and developers to design and implement database changes, and solve database/application related problems. Greg builds homegrown solutions to simplify and streamline common database management tasks, such as capacity management.

Latest Articles