Introduction
Have you ever been in a situation where you needed to write a query that needed to do comparisons or access data from the subsequent rows along with the data from the current row? This article discusses different ways to write these types of queries and more specifically examines LEAD and LAG analytics functions, which were introduced with SQL Server 2012, and helps you understand how leveraging these functions can aid you in such situations.
Accessing Prior or Subsequent Rows
SQL Server 2012 introduced LAG and LEAD functions for accessing prior or subsequent rows along with the current row but before we go into the details of these functions, let me explain how you can write these queries in earlier versions of SQL Server.
Let’s first create a table and load some sample data with the script below. This table contains customer information along with when a specific plan for the customer was started, assuming when a new plan starts, the older one gets ended automatically.
DECLARE @CustomerPlan TABLE ( CustomerCode VARCHAR(10), PlanCode VARCHAR(10), StartDate DATE ) INSERT INTO @CustomerPlan VALUES ('C00001', 'P00001', '1-Sep-2014') INSERT INTO @CustomerPlan VALUES ('C00001', 'P00002', '1-Oct-2014') INSERT INTO @CustomerPlan VALUES ('C00001', 'P00003', '10-Oct-2014') INSERT INTO @CustomerPlan VALUES ('C00001', 'P00004', '25-Oct-2014') INSERT INTO @CustomerPlan VALUES ('C00002', 'P00001', '1-Oct-2014') INSERT INTO @CustomerPlan VALUES ('C00002', 'P00002', '1-Nov-2014') SELECT * FROM @CustomerPlan;
You can use Common Table Expression (CTE) along with the ROW_NUMBER ranking function to access subsequent rows in the same result set. For example, for a given customer I want to know the expiration date for the current plan based on the activation date of the next plan. Basically, when a new plan is started the previous plan is automatically ended and hence the end date for a previous plan is the start date minus one day of the next plan:
WITH CTE as ( SELECT RN = ROW_NUMBER() OVER (PARTITION BY CustomerCode ORDER BY StartDate ASC), * FROM @CustomerPlan ) SELECT [Current Row].*, ISNULL(DATEADD(DAY, -1, [Next Row].StartDate), '31-Dec-2099') AS EndDate FROM CTE [Current Row] LEFT JOIN CTE [Next Row] ON [Current Row].CustomerCode = [Next Row].CustomerCode AND [Next Row].RN = [Current Row].RN + 1 ORDER BY [Current Row].CustomerCode, [Current Row].RN;
In the image above, you can see the plan P00002 of customer C00001 starts on 1st October 2014 and hence the end date for plan P00001 is the 30th September 2014 (1st October 2014 minus one day), likewise P00003 starts on 10th October and hence the end date for P00002 ends on 9th October 2014.
LEAD Function in SQL Server 2012
Though it was very much possible to write queries as shown above using CTE and ranking function to access prior or subsequent rows in earlier versions of SQL Server, SQL Server 2012 has simplified it further.
SQL Server 2012 introduced LEAD analytic function to access the subsequent row (or columns from the subsequent row) without using self-join or CTE or ranking function.
There are three main parameters for the LEAD function along with the OVER clause, which works in the same way as it works with ranking functions:
- With the first parameter, you can specify a scalar expression or column name whose value from the subsequent row is to be returned.
- With the second parameter, you can specify an offset to access not only the next immediate row but any row after the current row. For example, its default value of 1 accesses the next immediate row whereas a value of 3 accesses the third row from the current row.
- With the third parameter, you can specify the default value to be returned in case the returned value is NULL.
The script below is a re-write of the above script (which uses CTE and ranking function) using the LEAD function, which gives the same result as above, but as you can see this query is more simplified and does not uses self-join.
SELECT *, DATEADD(DAY, -1, LEAD(StartDate, 1,'01-Jan-2100') OVER (PARTITION BY CustomerCode ORDER BY StartDate ASC)) AS EndDate FROM @CustomerPlan
LAG Function in SQL Server 2012
Like the LEAD function, LAG is another analytic function introduced in SQL Server 2012 that has the same three parameters as the LEAD function along with the OVER clause, which works in same way as it works with ranking functions. But unlike the LEAD function, which allows accessing subsequent rows, the LAG function allows accessing previous rows from the same result set, again without using self-join. With the second offset parameter, you can specify the physical offset that comes before the current row.
If you look at the structure of the table we created above, each row contains a customer’s current plan. With the help of the LAG function, we will find out the previous (plan from the last row) as shown in the script below:
SELECT CustomerCode, PlanCode AS CurrentPlanCode, LAG(PlanCode, 1, 'NA') OVER (PARTITION BY CustomerCode ORDER BY StartDate ASC) AS LastPlan FROM @CustomerPlan;
If you notice in the figure below, the second row contains the P00002 as the current plan and P00001 as the last plan (coming the immediate previous row); likewise the third row contains P00003 as the current plan and P00002 as the last plan (again coming from the immediate previous row).
LEAD and LAG functions are flexible to let you specify the number of rows to move backward or forward from the current row with the offset parameter. For example, as you can see in the script below I have specified 2 as offset with the LAG function and hence for the third row where the current plan is P00003, the last plan is P00001 (as it goes two rows back to pick up the value).
SELECT CustomerCode, PlanCode AS CurrentPlanCode, LAG(PlanCode, 2, 'NA') OVER (PARTITION BY CustomerCode ORDER BY StartDate ASC) AS LastPlan FROM @CustomerPlan;
Conclusion
In this article I discussed how you can use CTE and ranking function to access or query data from previous or subsequent rows. Then I also talked about how to leverage LEAD and LAG analytics functions, introduced with SQL Server 2012, to achieve the same without writing self-join query using CTE and ranking function.