In last
month's article, I discussed what and how to use a subquery in a T-SQL
statement. This month I will expand on this subject by discussing correlated
subqueries. I will explain what a correlated subquery is, and show a number of
different examples on how to use a subquery in a T-SQL statement.
What is a Correlated Subquery?
A correlated subquery is a SELECT statement nested inside
another T-SQL statement, which contains a reference to one or more columns in
the outer query. Therefore, the correlated subquery can be said to be
dependent on the outer query. This is the main difference between a correlated
subquery and just a plain subquery. A plain subquery is not dependent on the
outer query, can be run independently of the outer query, and will return a
result set. A correlated subquery, since it is dependent on the outer query
will return a syntax errors if it is run by itself.
A correlated subquery will be executed many times while
processing the T-SQL statement that contains the correlated subquery. The
correlated subquery will be run once for each candidate row selected by the
outer query. The outer query columns, referenced in the correlated subquery,
are replaced with values from the candidate row prior to each execution. Depending
on the results of the execution of the correlated subquery, it will determine
if the row of the outer query is returned in the final result set.
Using a Correlated Subquery in a WHERE Clause
Suppose you want a report of all "OrderID's" where
the customer did not purchase more than 10% of the average quantity sold for a
given product. This way you could review these orders, and possibly contact
the customers, to help determine if there was a reason for the low quantity
order. A correlated subquery in a WHERE clause can help you produce this
report. Here is a SELECT statement that produces the desired list of "OrderID's":
select distinct OrderId
from Northwind.dbo.[Order Details] OD
where
Quantity <l; (select avg(Quantity) * .1
from Northwind.dbo.[Order Details]
where OD.ProductID = ProductID)
The correlated subquery in the above command is contained
within the parenthesis following the greater than sign in the WHERE clause
above. Here you can see this correlated subquery contains a reference to "OD.ProductID".
This reference compares the outer query's "ProductID" with the inner
query's "ProductID". When this query is executed, the SQL engine
will execute the inner query, the correlated subquery, for each "[Order
Details]" record. This inner query will calculate the average "Quantity"
for the particular "ProductID" for the candidate row being processed
in the outer query. This correlated subquery determines if the inner query
returns a value that meets the condition of the WHERE clause. If it does, the
row identified by the outer query is placed in the record set that will be
returned from the complete T-SQL SELECT statement.
The code below is another example that uses a correlated
subquery in the WHERE clause to display the top two customers, based on the
dollar amount associated with their orders, per region. You might want to
perform a query like this so you can reward these customers, since they buy the
most per region.
select CompanyName, ContactName, Address,
City, Country, PostalCode from Northwind.dbo.Customers OuterC
where CustomerID in (
select top 2 InnerC.CustomerId
from Northwind.dbo.[Order Details] OD join Northwind.dbo.Orders O
on OD.OrderId = O.OrderID
join Northwind.dbo.Customers InnerC
on O.CustomerID = InnerC.CustomerId
Where Region = OuterC.Region
group by Region, InnerC.CustomerId
order by sum(UnitPrice * Quantity * (1-Discount)) desc
)
order by Region
Here you can see the inner query is a correlated subquery
because it references "OuterC", which is the table alias for the "Northwind.DBO.Customer"
table in the outer query. This inner query uses the "Region" value
to calculate the top two customers for the region associated with the row being
processed from the outer query. If the "CustomerID" of the outer
query is one of the top two customers, then the record is placed in the record
set to be returned.
Correlated Subquery in the HAVING Clause
Say your organizations wants to run a yearlong incentive
program to increase revenue. Therefore, they advertise to your customers that
if each order they place, during the year, is over $750 you will provide them a
rebate at the end of the year at the rate of $75 per order they place. Below
is an example of how to calculate the rebate amount. This example uses a correlated
subquery in the HAVING clause to identify the customers that qualify to receive
the rebate. Here is my code for this query:
select C.CustomerID, Count(*)*75 Rebate
from Northwind.DBO.Customers C
join
Northwind.DBO.Orders O
on c.CustomerID = O.CustomerID
where Datepart(yy,OrderDate) = '1998'
group by C.CustomerId
having 750 < ALL(select sum(UnitPrice * Quantity * (1-Discount))
from Northwind.DBO.Orders O
join
Northwind.DBO.[Order Details] OD
on O.OrderID = OD.OrderID
where CustomerID = C.CustomerId
and Datepart(yy,OrderDate) = '1998'
group by O.OrderId
)
By reviewing this query, you can see I am using a correlated
query in the HAVING clause to calculate the total order amount for each
customer order. I use the "CustomerID" from the outer query and the
year of the order "Datepart(yy,OrderDate)", to help identify the
Order records associated with each customer, that were placed the year '1998'.
For these associated records I am calculating the total order amount, for each
order, by summing up all the "[Order Details]" records, using the
following formula: sum(UnitPrice * Quantity * (1-Discount)). If each and
every order for a customer, for year 1998 has a total dollar amount greater
than 750, I then calculate the Rebate amount in the outer query using this
formula "Count(*)*75 ".
SQL Server's query engine will only execute the inner correlated
subquery in the HAVING clause for those customer records identified in the
outer query, or basically only those customer that placed orders in "1998".
Performing an Update Statement Using a Correlated Subquery
A correlated subquery can even be used in an update
statement. Here is an example:
create table A(A int, S int)
create table B(A int, B int)
set nocount on
insert into A(A) values(1)
insert into A(A) values(2)
insert into A(A) values(3)
insert into B values(1,1)
insert into B values(2,1)
insert into B values(2,1)
insert into B values(3,1)
insert into B values(3,1)
insert into B values(3,1)
update A
set S = (select sum(B)
from B
where A.A = A group by A)
select * from A
drop table A,B
Here is the result set I get
when I run this query on my machine:
A S
----------- -----------
1 1
2 2
3 3
In my query above, I used the correlated subquery to update
column A in table A with the sum of column B in table B for rows that have the
same value in column A as the row being updated.
Conclusion
Let me summarize. A subquery and a correlated subquery are
SELECT queries coded inside another query, known as the outer query. The
correlated subquery and the subquery help determine the outcome of the result
set returned by the complete query. A subquery, when executed independent of
the outer query, will return a result set, and is therefore not dependent on
the outer query. Where as, a correlated subquery cannot be executed
independently of the outer query because it uses one or more references to
columns in the outer query to determine the result set returned from the
correlated subquery. I hope that you now understand the different of
subqueries and correlated subqueries, and how they can be used in your T-SQL
code.
»
See All Articles by Columnist Gregory A. Larsen