Sometimes the criteria for
determining which set of records will be affected by a SELECT, UPDATE, DELETE
and/or INSERT statement cannot be obtained by hard coding the selection
criteria. Occasionally there is a need to use the results of a SELECT
statement to help determine which records are returned or are affected by a
T-SQL statement. When a SELECT statement is used inside another statement, the
inside SELECT statement is known as a subquery. Subqueries can help to
dynamically control the records affected by an UPDATE, DELETE or INSERT
statement, as well as to help determine the records that will be returned by a
SELECT statement. This article will show different examples of how to use a
subquery to help clarify the records affected or selected by a T-SQL statement.
What is a Subquery?
A subquery is a SELECT statement
that is nested within another T-SQL statement. A subquery SELECT statement if
executed independently of the T-SQL statement, in which it is nested, will
return a result set. Meaning a subquery SELECT statement can standalone and is
not depended on the statement in which it is nested. A subquery SELECT
statement can return any number of values, and can be found in, the column list
of a SELECT statement, a FROM, GROUP BY, HAVING, and/or ORDER BY clauses of a T-SQL
statement. A Subquery can also be used as a parameter to a function call.
Basically a subquery can be used anywhere an expression can be used.
Use of a Subquery in the Column List of a SELECT Statement
Suppose you would like to
see the last OrderID and the OrderDate for the last order that was shipped to Paris.
Along with that information, say you would also like to see the OrderDate for
the last order shipped regardless of the ShipCity. In addition to this, you
would also like to calculate the difference in days between the two different OrderDates.
Here is my T-SQL SELECT statement to accomplish this:
select top 1 OrderId,convert(char(10),
OrderDate,121) Last_Paris_Order,
(select convert(char(10),max(OrderDate),121) from
Northwind.dbo.Orders) Last_OrderDate,
datediff(dd,OrderDate,
(select Max(OrderDate)from Northwind.dbo.Orders)) Day_Diff
from Northwind.dbo.Orders
where ShipCity = ‘Paris’
order by OrderDate desc
The above code contains two
subqueries. The first subquery gets the OrderDate for the last order shipped
regardless of ShipCity, and the second subquery calculates the number of days
between the two different OrderDates. Here I used the first subquery to return
a column value in the final result set. The second subquery was used as a
parameter in a function call. This subquery passed the “max(OrderDate)” date
to the DATEDIFF function.
Use of a Subquery in the WHERE clause
A subquery can be used to control
the records returned from a SELECT by controlling which records pass the
conditions of a WHERE clause. In this case the results of the subquery would be
used on one side of a WHERE clause condition. Here is an example:
select distinct country from Northwind.dbo.Customers
where country not in (select distinct country from Northwind.dbo.Suppliers)
Here I have returned a list
of countries where customers live, but there is no supplier located in that
country. I suppose if you where trying to provide better delivery time to
customers, then you might target these countries to look for additional
suppliers.
Suppose a company would like
to do some targeted marketing. This targeted marketing would contact customers
in the country with the fewest number of orders. It is hoped that this targeted
marketing will increase the overall sales in the targeted country. Here is an
example that uses a subquery to return the customer contact information for the
country with the fewest number of orders:
select Country,CompanyName, ContactName, ContactTitle, Phone
from Northwind.dbo.Customers
where country =
(select top 1 country
from Northwind.dbo.Customers C
join
Northwind.dbo.Orders O
on C.CustomerId = O.CustomerID
group by country
order by count(*))
Here I have written a
subquery that joins the Customer and Orders Tables to determine the total
number of orders for each country. The subquery uses the “TOP 1” clause to
return the country with the fewest number of orders. The country with the
fewest number of orders is then used in the WHERE clause to determine which
Customer Information will be displayed.
Use of a Subquery in the FROM clause
The FROM clause normally
identifies the tables used in the T-SQL statement. You can think of each of
the tables identified in the FROM clause as a set of records. Well, a subquery
is just a set of records, and therefore can be used in the FROM clause just
like a table. Here is an example where a subquery is used in the FROM clause
of a SELECT statement:
select au_lname, au_fname, title from
(select au_lname, au_fname, au_id from pubs.dbo.authors
where state = ‘CA’) as a
join
pubs.dbo.titleauthor ta on a.au_id=ta.au_id
join
pubs.dbo.titles t on ta.title_id = t.title_id
Here I have used a subquery
to select only the author record information, if the author’s record has a
state column equal to “CA.” I have named the set returned from this subquery
with a table alias of “a.” I can then use this alias elsewhere in the T-SQL statement
to refer to the columns from the subquery by prefixing them with an “a”, as I
did in the “ON” clause of the “JOIN” criteria. Sometimes using a subquery in
the FROM clause reduces the size of the set that needs to be joined. Reducing
the number of records that have to be joined enhances the performance of
joining rows, and therefore speeds up the overall execution of a query.
Here is an example where I
used a subquery in the FROM clause of an UPDATE statement:
set nocount on
create table x(i int identity,
a char(1))
insert into x values (‘a’)
insert into x values (‘b’)
insert into x values (‘c’)
insert into x values (‘d’)
select * from xupdate x
set a = b.a
from (select max(a) as a from x) b
where i > 2select * from x
drop table x
Here I created a table named
“x,” that has four rows. Then I proceeded to update the rows where “i” was
greater than 2 with the max value in column “a”. I used a subquery in the FROM
clause of the UPDATE statement to identity the max value of column “a.”
Use of a Subquery in the HAVING clause
In the following example, I
used a subquery to find the number of books a publisher has published where the
publisher is not located in the state of California. To accomplish this I used a subquery in a HAVING
clause. Here is my code:
select pub_name, count(*) bookcnt
from pubs.dbo.titles t
join
pubs.dbo.publishers p
on t.pub_id = p.pub_id
group by pub_name
having p.pub_name in
(select pub_name from pubs.dbo.publishers where state <> ‘CA’)
Here my subquery returns the
pub_name values for all publishers that have a state value not equal to “CA.”
The HAVING condition then checks to see if the pub_name is in the set returned
by my subquery.
Conclusion
The subquery is a valuable
concept to understand. Creative use of a subquery allows the desired results
to be returned in a single T-SQL statement, and helps avoid the use of
temporary tables and cursors to solve complicated selection criteria. In
addition, depending on the query, a subquery might improve performance by
reducing the number of records that SQL Server needs to process. In my next article,
I will expand on the idea of subqueries by looking at correlated subqueries.