Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MS SQL

Posted Sep 28, 2010

Introduction to Common Table Expressions

By Gregory A. Larsen

A Common Table Expression (CTEs) is an expression that returns a record set that can be referenced within the scope of a single INSERT, UPDATE, DELETE, INSERT or CREATE VIEW statement. Greg Larsen describes how to define a CTE and then provides some simple examples that demonstrate how to use a CTE.

With the introduction of SQL Server 2005, Microsoft introduced Common Table Expression, which is commonly called CTE. A CTE is an expression that returns a record set that can be referenced within the scope of a single INSERT, UPDATE, DELETE, INSERT or CREATE VIEW statement. They are commonly used to simplify complicated queries, to make them more readable, and maintainable. A CTE can be thought of as a derived table, although a CTE can reference itself. In this article, I will describe how to define a CTE and then provide some simple examples that demonstrates how to use a CTE.

Defining a CTE

You define a CTE using the WITH clause. Here is the syntax for the WITH clause:

WITH <expression_name> (Column1, Column2, …) AS (CTE Definition)

Where:

  • "<expression_name>" is the name of the CTE
  • "Column 1, Column2,…" are the names of the columns returned from the CTE (only required if column names are not provided, or are not unique)
  • "CTE Definition" is the set of statements that describes the definition of the CTE

Here is a simple CTE declaration followed by a SELECT statement that references the CTE declaration. This CTE returns information from the AdventureWorks database:

WITH OrderCountByYear(OrderYear, OrderCount)
AS (
SELECT DATEPART(Year,OrderDate), COUNT(*)
FROM Sales.SalesOrderHeader
GROUP BY DATEPART(Year,OrderDate)
)
SELECT * FROM OrderCountByYear;

The CTE defined in this example calculates the number of Sales.SalesOrderDetail records per year based on the OrderDate column. This is a very trivial example. Note how the more complicated SELECT statement that groups the SalesOrderDetail records by year is contained within the CTE. The CTE is then referenced in the SELECT statement that following the CTE definition. Notice how simple that actual SELECT statement is, when all it does is reference the CTE.

CTEs can also be recursive. When you define a recursive CTE, they require two different parts in the CTE Definition: An anchor part and a recursive part. The anchor part defines the initial record set for the CTE, and its result set feeds the recursive part. The anchor part of a recursive CTE must be specified first in the CTE Definition. Note there can be multiple anchor part SELECT statements. The second part or recursive part is joined to the anchor part by a UNION ALL clause. If there are multiple anchor parts, the last anchor statement needs to be joined to the recursive part with a UNION ALL clause. The recursive part must return the same number of columns as the anchor part, as well as it references the CTE expression name as part of its statement. A recursive CTE continues until the recursive part returns an empty set (more on this later).

Here is a simple recursive CTE example that uses the AdventureWorks database:

WITH EmployeeHierarchy(EmployeeID, Title, OrgLevel)
AS (
SELECT EmployeeID, Title, 1 AS OrgLevel
FROM HumanResources.Employee
WHERE ManagerID IS NULL
UNION ALL
SELECT E.EmployeeID, E.Title, H.OrgLevel + 1 AS OrgLevel
FROM HumanResources.Employee E
JOIN EmployeeHierarchy H
ON H.EmployeeId = E.ManagerID) 
SELECT *
FROM EmployeeHierarchy H
ORDER BY OrgLevel;

The above recursive CTE returns all the EmployeeID's, their Title, and OrgLevel. The returned results are ordered by OrgLevel, where the lower the OrgLevel number the higher up on the management food chain for the given employee. OrgLevel=1 would will be associated with the president of the company.

Walking Through a Recursive CTE Example

To help you to better understand how a recursive CTE works, I will walk through an example, one recursive call at a time. In order to use a common relationship that we are all familiar with for this example let me use the common child, parent, grandparent relationship. For this example, I will be using the following table structure:

CREATE TABLE Person(
ID int NOT NULL 
,FirstName varchar(20) NULL
,LastName varchar(20) NOT Null
,MotherID INT NULL);

Here I have created a Person table that contains the ID of a person, along with their FirstName, LastName, and the ID of the person's mother (MotherID). I will be using this table to recursively use a CTE to display a person's name, their mother's name, and then their grandmother's name. Alternatively, another way to look at it is, I'm going to display three generations including the person I'm interested in. Here is some data to populate my table:

SET NOCOUNT ON;
INSERT INTO Person (ID,FirstName, LastName, MotherID)
VALUES (1,'James','Tyman',NULL);
INSERT INTO Person (ID,FirstName, LastName, MotherID)
VALUES (2,'Mary','Tyman',NULL);
INSERT INTO Person (ID,FirstName, LastName, MotherID)
VALUES (3,'Doris','Smith',NULL);
INSERT INTO Person (ID,FirstName, LastName, MotherID)
VALUES (4,'Francis','Smith',NULL);
INSERT INTO Person (ID,FirstName, LastName, MotherID)
VALUES (5,'Kathy','Jackson',3);
INSERT INTO Person (ID,FirstName, LastName, MotherID)
VALUES (6,'Jim','Jackson',2);
INSERT INTO Person (ID,FirstName, LastName, MotherID)
VALUES (7,'Paul','Poe',NULL);
INSERT INTO Person (ID,FirstName, LastName, MotherID)
VALUES (8,'Heather','Poe',5) 
INSERT INTO Person (ID,FirstName, LastName, MotherID)
VALUES (9,'Sally','Jackson',8);
INSERT INTO Person (ID,FirstName, LastName, MotherID)
VALUES (10,'Joshua','Thomas',8);
INSERT INTO Person (ID,FirstName, LastName, MotherID)
VALUES (11,'Francis','Jackson',9);

If you go through this data, you will find there are five different generations of family members. Now that you understand the data and the requirements of my CTE, let me show you the code for my CTE:

-- Set the ID of the child generation we are exploring
DECLARE @PersonID INT;
SET @PersonID = 11;
-- Generation CTE
WITH Generation (
FirstGenFirstName, FirstGenLastName,
SecondGenFirstName, SecondGenLastName,
MotherId,ThirdGenFirstName, ThirdGenLastName,
Generation)
AS
(
-- Anchor Statement
SELECT X.FirstName, X.LastName,
CAST('' AS VARCHAR(20)), CAST('' AS VARCHAR(20)), X.MotherID,
CAST('' AS VARCHAR(20)), CAST('' AS VARCHAR(20)), 1 AS Generation
FROM Person X
-- @PersonID controls the child generation to start at
WHERE @PersonID = X.ID
UNION ALL
-- Recursive Portion
SELECT X.FirstName, X.LastName,
G.FirstGenFirstName, G.FirstGenLastName, X.MotherID,
G.SecondGenFirstName , G.SecondGenLastName, G.Generation + 1 AS Generation
FROM Person X
JOIN Generation G
ON G.MotherID = X.ID
WHERE Generation < 3
)

In the above code, I defined my "Generation" CTE. This code will use two different SELECT statements to return three generations from the Person table, starting with the child record first.

Prior to defining my CTE I first declare a local variable named @PersonID, and set it to ID number 11. This local variable controls which child record this CTE will use as the anchor for my recursive query. My CTE declaration starts with the "WITH Generation (…) as …"clause. Within the parenthesis, I have defined the columns that will be returned from each iteration of this CTE. If you examine further down in my CTE you will see that each SELECT statement outputs the same number of columns as I specified using the WITH clause. For each iteration through this CTE, these columns will be outputted even though I did not specific column alias on these columns within my SELECT statements.

In the body of the CTE, the two SELECT statements are joined together by a UNION ALL clause. This first SELECT statement is known as the anchor statement of the CTE. The second SELECT statement is known as the recursive part of the CTE. You can see that the recursive portion of the CTE references the CTE name ("Generation") within the FROM clause in the second SELECT statement.

To call this CTE I execute the following SELECT statement immediately following the declaration of this CTE:

-- Select from Generation CTE
SELECT * FROM Generation;

When this SELECT statement is executed, it will call the CTE logic. Let me walk you through each iteration of the CTE, so you can get a better sense for how a recursive CTE works. When the CTE is executed the first time, the anchor SELECT statement is executed. This execution returns Person table records where the ID of the Person record is equal to 11. This SELECT statement only returns one record, which is represented by the following output:

FirstGenFirstNam  FirstGenLastNam   SecGenFirstNam    SecGenLastNam     MotherId    ThirdGenFirstNam  ThirdGenLastNam   Generation
----------------- ----------------- ----------------- ----------------- ----------- ----------------- ----------------- -----------
Francis           Jackson                                               9                                               1

Here you can see that "Francis Jackson" is returned for the first generation, or the child generation. I also populated the ID for Francis's mother, which in this case is 9. This ID column will be used in the recursive query to find the next generation. Note that I populated the second and third generation names with the empty string.

Now for each of the following iterations the recursive SELECT statement will be executed. The results of each recursive execution will be added to our initial result set by using the "UNION ALL" clause. The second iteration will return the Person table records where the ID of a Person record is 9 (or Francis's mother). This is accomplished by using the MotherID of the first iteration of the "Generation" CTE in the "ON" clause of the JOIN operation, when joining CTE output with the Person table. After the second iteration, the following records are part to the result set being built by this recursive CTE:

FirstGenFirstNam  FirstGenLastNam   SecGenFirstNam    SecGenLastNam     MotherId    ThirdGenFirstNam  ThirdGenLastNam   Generation
----------------- ----------------- ----------------- ----------------- ----------- ----------------- ----------------- -----------
Francis           Jackson                                               9                                               1
Sally             Jackson           Francis           Jackson           8                                               2

If you look at this output, you can now see two generations of Person records, Francis and Francis's mother, Sally Jackson. Note that in the second record that is now included in the result set that Francis has moved to the second generation name fields, and Francis's mothers name information now occupies the first generation name fields.

For the third iteration of the CTE we will be locating Francis's grandmother's information by using the MotherID from Francis's mothers records ( ID 8). Here is the set of records returned from the third iteration of this CTE:

FirstGenFirstNam  FirstGenLastNam   SecGenFirstNam    SecGenLastNam     MotherId    ThirdGenFirstNam  ThirdGenLastNam   Generation
----------------- ----------------- ----------------- ----------------- ----------- ----------------- ----------------- -----------
Francis           Jackson                                               9                                               1
Sally             Jackson           Francis           Jackson           8                                               2
Heather           Poe               Sally             Jackson           5           Francis           Jackson           3

Note that the CTE returned a third record, where "Heather Poe" (the grandmother) is listed as the first generation, "Sally Jackson" (the mother) moved into the second generation, and now "Francis Jackson" (the child) has moved into the third generation.

When the CTE is executed the fourth time no records are returned because the "WHERE Generation < 3" clause causes this iteration to return an empty set. This empty set is returned because we already have three different generation identified. Since the final iteration returns an empty record set, the recursive CTE is terminated and the CTE returns the three different records listed above.

If you remove the "WHERE Generation < 3" clause above, then all generations for "Francis" would be returned (five different generations for this example). However, this will cause the second and third iteration information to keep cycling new information into these output columns for each older generation displayed.

Simplification with CTEs

A CTE provides a way to simplify complicated code to make it more readable. CTEs also allow you an easy way to iteratively return a set of related records by recursively calling the defined CTE. Next time you have some complicated code, or need to reclusively return data try a CTE.

Related Articles

Give the New PIVOT and UNPIVOT Commands in SQL Server 2005 a Whirl
It All Adds Up

» See All Articles by Columnist Gregory A. Larsen



MS SQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MS SQL Forum
Topic By Replies Updated
SQL 2005: SSIS: Error using SQL Server credentials poverty 3 August 17th, 07:43 AM
Need help changing table contents nkawtg 1 August 17th, 03:02 AM
SQL Server Memory confifuration bhosalenarayan 2 August 14th, 05:33 AM
SQL Server Primary Key and a Unique Key katty.jonh 2 July 25th, 10:36 AM