Introduction to Common Table Expressions
September 28, 2010
A Common Table Expression (CTEs) is an expression that returns a record set that can be referenced within the scope of a single INSERT, UPDATE, DELETE, INSERT or CREATE VIEW statement. Greg Larsen describes how to define a CTE and then provides some simple examples that demonstrate how to use a CTE.
With the introduction of SQL Server 2005, Microsoft introduced Common Table Expression, which is commonly called CTE. A CTE is an expression that returns a record set that can be referenced within the scope of a single INSERT, UPDATE, DELETE, INSERT or CREATE VIEW statement. They are commonly used to simplify complicated queries, to make them more readable, and maintainable. A CTE can be thought of as a derived table, although a CTE can reference itself. In this article, I will describe how to define a CTE and then provide some simple examples that demonstrates how to use a CTE.
Defining a CTE
You define a CTE using the WITH clause. Here is the syntax for the WITH clause:
WITH <expression_name> (Column1, Column2, …) AS (CTE Definition)
Here is a simple CTE declaration followed by a SELECT statement that references the CTE declaration. This CTE returns information from the AdventureWorks database:
WITH OrderCountByYear(OrderYear, OrderCount) AS ( SELECT DATEPART(Year,OrderDate), COUNT(*) FROM Sales.SalesOrderHeader GROUP BY DATEPART(Year,OrderDate) ) SELECT * FROM OrderCountByYear;
The CTE defined in this example calculates the number of Sales.SalesOrderDetail records per year based on the OrderDate column. This is a very trivial example. Note how the more complicated SELECT statement that groups the SalesOrderDetail records by year is contained within the CTE. The CTE is then referenced in the SELECT statement that following the CTE definition. Notice how simple that actual SELECT statement is, when all it does is reference the CTE.
CTEs can also be recursive. When you define a recursive CTE, they require two different parts in the CTE Definition: An anchor part and a recursive part. The anchor part defines the initial record set for the CTE, and its result set feeds the recursive part. The anchor part of a recursive CTE must be specified first in the CTE Definition. Note there can be multiple anchor part SELECT statements. The second part or recursive part is joined to the anchor part by a UNION ALL clause. If there are multiple anchor parts, the last anchor statement needs to be joined to the recursive part with a UNION ALL clause. The recursive part must return the same number of columns as the anchor part, as well as it references the CTE expression name as part of its statement. A recursive CTE continues until the recursive part returns an empty set (more on this later).
Here is a simple recursive CTE example that uses the AdventureWorks database:
WITH EmployeeHierarchy(EmployeeID, Title, OrgLevel) AS ( SELECT EmployeeID, Title, 1 AS OrgLevel FROM HumanResources.Employee WHERE ManagerID IS NULL UNION ALL SELECT E.EmployeeID, E.Title, H.OrgLevel + 1 AS OrgLevel FROM HumanResources.Employee E JOIN EmployeeHierarchy H ON H.EmployeeId = E.ManagerID) SELECT * FROM EmployeeHierarchy H ORDER BY OrgLevel;
The above recursive CTE returns all the EmployeeID's, their Title, and OrgLevel. The returned results are ordered by OrgLevel, where the lower the OrgLevel number the higher up on the management food chain for the given employee. OrgLevel=1 would will be associated with the president of the company.
Walking Through a Recursive CTE Example
To help you to better understand how a recursive CTE works, I will walk through an example, one recursive call at a time. In order to use a common relationship that we are all familiar with for this example let me use the common child, parent, grandparent relationship. For this example, I will be using the following table structure:
CREATE TABLE Person( ID int NOT NULL ,FirstName varchar(20) NULL ,LastName varchar(20) NOT Null ,MotherID INT NULL);
Here I have created a Person table that contains the ID of a person, along with their FirstName, LastName, and the ID of the person's mother (MotherID). I will be using this table to recursively use a CTE to display a person's name, their mother's name, and then their grandmother's name. Alternatively, another way to look at it is, I'm going to display three generations including the person I'm interested in. Here is some data to populate my table:
SET NOCOUNT ON; INSERT INTO Person (ID,FirstName, LastName, MotherID) VALUES (1,'James','Tyman',NULL); INSERT INTO Person (ID,FirstName, LastName, MotherID) VALUES (2,'Mary','Tyman',NULL); INSERT INTO Person (ID,FirstName, LastName, MotherID) VALUES (3,'Doris','Smith',NULL); INSERT INTO Person (ID,FirstName, LastName, MotherID) VALUES (4,'Francis','Smith',NULL); INSERT INTO Person (ID,FirstName, LastName, MotherID) VALUES (5,'Kathy','Jackson',3); INSERT INTO Person (ID,FirstName, LastName, MotherID) VALUES (6,'Jim','Jackson',2); INSERT INTO Person (ID,FirstName, LastName, MotherID) VALUES (7,'Paul','Poe',NULL); INSERT INTO Person (ID,FirstName, LastName, MotherID) VALUES (8,'Heather','Poe',5) INSERT INTO Person (ID,FirstName, LastName, MotherID) VALUES (9,'Sally','Jackson',8); INSERT INTO Person (ID,FirstName, LastName, MotherID) VALUES (10,'Joshua','Thomas',8); INSERT INTO Person (ID,FirstName, LastName, MotherID) VALUES (11,'Francis','Jackson',9);
If you go through this data, you will find there are five different generations of family members. Now that you understand the data and the requirements of my CTE, let me show you the code for my CTE:
-- Set the ID of the child generation we are exploring DECLARE @PersonID INT; SET @PersonID = 11; -- Generation CTE WITH Generation ( FirstGenFirstName, FirstGenLastName, SecondGenFirstName, SecondGenLastName, MotherId,ThirdGenFirstName, ThirdGenLastName, Generation) AS ( -- Anchor Statement SELECT X.FirstName, X.LastName, CAST('' AS VARCHAR(20)), CAST('' AS VARCHAR(20)), X.MotherID, CAST('' AS VARCHAR(20)), CAST('' AS VARCHAR(20)), 1 AS Generation FROM Person X -- @PersonID controls the child generation to start at WHERE @PersonID = X.ID UNION ALL -- Recursive Portion SELECT X.FirstName, X.LastName, G.FirstGenFirstName, G.FirstGenLastName, X.MotherID, G.SecondGenFirstName , G.SecondGenLastName, G.Generation + 1 AS Generation FROM Person X JOIN Generation G ON G.MotherID = X.ID WHERE Generation < 3 )
In the above code, I defined my "Generation" CTE. This code will use two different SELECT statements to return three generations from the Person table, starting with the child record first.
Prior to defining my CTE I first declare a local variable named @PersonID, and set it to ID number 11. This local variable controls which child record this CTE will use as the anchor for my recursive query. My CTE declaration starts with the "WITH Generation (…) as …"clause. Within the parenthesis, I have defined the columns that will be returned from each iteration of this CTE. If you examine further down in my CTE you will see that each SELECT statement outputs the same number of columns as I specified using the WITH clause. For each iteration through this CTE, these columns will be outputted even though I did not specific column alias on these columns within my SELECT statements.
In the body of the CTE, the two SELECT statements are joined together by a UNION ALL clause. This first SELECT statement is known as the anchor statement of the CTE. The second SELECT statement is known as the recursive part of the CTE. You can see that the recursive portion of the CTE references the CTE name ("Generation") within the FROM clause in the second SELECT statement.
To call this CTE I execute the following SELECT statement immediately following the declaration of this CTE:
-- Select from Generation CTE SELECT * FROM Generation;
When this SELECT statement is executed, it will call the CTE logic. Let me walk you through each iteration of the CTE, so you can get a better sense for how a recursive CTE works. When the CTE is executed the first time, the anchor SELECT statement is executed. This execution returns Person table records where the ID of the Person record is equal to 11. This SELECT statement only returns one record, which is represented by the following output:
FirstGenFirstNam FirstGenLastNam SecGenFirstNam SecGenLastNam MotherId ThirdGenFirstNam ThirdGenLastNam Generation ----------------- ----------------- ----------------- ----------------- ----------- ----------------- ----------------- ----------- Francis Jackson 9 1
Here you can see that "Francis Jackson" is returned for the first generation, or the child generation. I also populated the ID for Francis's mother, which in this case is 9. This ID column will be used in the recursive query to find the next generation. Note that I populated the second and third generation names with the empty string.
Now for each of the following iterations the recursive SELECT statement will be executed. The results of each recursive execution will be added to our initial result set by using the "UNION ALL" clause. The second iteration will return the Person table records where the ID of a Person record is 9 (or Francis's mother). This is accomplished by using the MotherID of the first iteration of the "Generation" CTE in the "ON" clause of the JOIN operation, when joining CTE output with the Person table. After the second iteration, the following records are part to the result set being built by this recursive CTE:
FirstGenFirstNam FirstGenLastNam SecGenFirstNam SecGenLastNam MotherId ThirdGenFirstNam ThirdGenLastNam Generation ----------------- ----------------- ----------------- ----------------- ----------- ----------------- ----------------- ----------- Francis Jackson 9 1 Sally Jackson Francis Jackson 8 2
If you look at this output, you can now see two generations of Person records, Francis and Francis's mother, Sally Jackson. Note that in the second record that is now included in the result set that Francis has moved to the second generation name fields, and Francis's mothers name information now occupies the first generation name fields.
For the third iteration of the CTE we will be locating Francis's grandmother's information by using the MotherID from Francis's mothers records ( ID 8). Here is the set of records returned from the third iteration of this CTE:
FirstGenFirstNam FirstGenLastNam SecGenFirstNam SecGenLastNam MotherId ThirdGenFirstNam ThirdGenLastNam Generation ----------------- ----------------- ----------------- ----------------- ----------- ----------------- ----------------- ----------- Francis Jackson 9 1 Sally Jackson Francis Jackson 8 2 Heather Poe Sally Jackson 5 Francis Jackson 3
Note that the CTE returned a third record, where "Heather Poe" (the grandmother) is listed as the first generation, "Sally Jackson" (the mother) moved into the second generation, and now "Francis Jackson" (the child) has moved into the third generation.
When the CTE is executed the fourth time no records are returned because the "WHERE Generation < 3" clause causes this iteration to return an empty set. This empty set is returned because we already have three different generation identified. Since the final iteration returns an empty record set, the recursive CTE is terminated and the CTE returns the three different records listed above.
If you remove the "WHERE Generation < 3" clause above, then all generations for "Francis" would be returned (five different generations for this example). However, this will cause the second and third iteration information to keep cycling new information into these output columns for each older generation displayed.
Simplification with CTEs
A CTE provides a way to simplify complicated code to make it more readable. CTEs also allow you an easy way to iteratively return a set of related records by recursively calling the defined CTE. Next time you have some complicated code, or need to reclusively return data try a CTE.