T-SQL Best Practices – Don’t Use Scalar Value Functions in Column List or WHERE Clauses

Making sure your queries run as fast as possible is
important to ensure your application gets the kind of performance that users
require or expect. Query performance can be affected by lots of different
factors. In this article I will be introducing another best practice of how
and how not to use a function within your T-SQL statements. Where you place
your function within your T-SQL statements determines how your query will be
processed and can severely impact the query execution plan and the performance
of your query.

Types of Functions

There are different types of functions. Not all functions
are created equal. Some functions read data while others do not. Some
functions return multiple records while others return only a single record.
For the purpose of this article, I will be discussing the use of scalar value
functions and how they affect the performance of your query.

I’m sure most of you already know what a scalar value
function is. But just in case you don’t I’ll provide you a definition, so we
are all on the same page. A scalar value function is a function that returns
a single value.

Keep in mind not all scalar functions are created equal.
Some scalar value functions reference data in tables where others don’t. For
the purpose this article we will be talking about scalar value functions that
accept input parameters, and then use those parameters to read some data from
your database, and return a value back to the calling T-SQL statement that
referenced the function.

Do’s and Don’ts When Using Scalar Functions

There are probably a number of do’s and don’ts related to
scalar functions. I’m going to only discuss a single aspect of using scalar
value functions. In this article, I will be discussing how you reference your
scalar value function within a T-SQL SELECT statement. The placement of a
function will affect how the query is processed and how that can lead to poor
performing queries or well performing queries.

You can place a scalar value function in many different
places within a single SELECT statement. For the purpose of this article I
will be discussing the impacts of placing a scalar value function within a
SELECT column list and/or a WHERE clause. I will then be showing you
alternatives for using a scalar function in these locations.

To demonstrate how placing a scalar value function in a
column list and within the WHERE clause is a bad practice, let’s look at a
couple of coding examples. Here is the first SELECT statement to review:


USE AdventureWorks
GO

CREATE FUNCTION dbo.fn_GetName
(
@CustomerID INT
)
RETURNS VARCHAR(100)
AS
BEGIN
DECLARE @CustomerName VARCHAR(100);
SELECT @CustomerName = PC.LastName + ‘, ‘ + PC.FirstName
FROM Sales.Customer SC
JOIN Sales.Individual SI
ON SC.CustomerID = SI.CustomerID
JOIN Person.Contact PC
ON SI.ContactID = PC.ContactID
WHERE SC.CustomerID = @CustomerID
RETURN @CustomerName
END
GO

SELECT dbo.fn_GetName(CustomerID)
,CustomerType
FROM Sales.Customer
WHERE dbo.fn_GetName(CustomerID) IS NOT NULL
GO

If you look at this code, you can see I’m creating a
function that accepts a CustomerID and then returns the LastName and FirstName
concatenated together with a comma in between the last and first name of the
customer. Now look at the SELECT statement and note that I am calling the
function from within the column list.

If you run the first chunk of code that creates the function
in one batch and than run the SELECT statement in a separate batch you will see
the SELECT statement takes a little while to run. On my machine, it took around
10 seconds. To identify why this SELECT statement takes so long and why I
consider this a bad practice, you need to also have Profiler running and
monitoring the SP:Completed and SQL:BatchCompleted events while you run this
query. Here is a screen print of my Profiler session for the tail end of the
processing associated with the above SELECT statement.

If you look at this output, you can see that the SELECT
statement is being run over and over again. By putting the function call in
the column list and in the WHERE clause, the function is being called multiple
times. You can see this above by noting all the different SP:Completed events.
Also, note the CPU, Reads, Write and Duration on the SQL_BatchCompleted EventClass
line at the bottom of this screenshot. This event captures the true cost of
running this query. Remember these numbers. Later I will show you how to
write this query differently to make it more efficient.

Now look at this code:


SELECT PC.LastName + ‘, ‘ + PC.FirstName [Customer Name]
,SC.CustomerType
FROM Sales.Customer SC
JOIN Sales.Individual SI
ON SC.CustomerID = SI.CustomerID
JOIN Person.Contact PC
ON SI.ContactID = PC.ContactID
WHERE dbo.fn_GetName(SC.CustomerID) = ‘Roy, Luke’

Here I am calling the same function but this time only in
the WHERE clause. If you run this code, you will see it takes a while to
return data. In my case, it took 3 seconds. Once again, if we monitor this
command with Profiler we see the following trace events:

Once again, by looking at this output you can see that the
SELECT statement appears to be executed over and over again. This occurs
because the fn_GetName function once again needs to be executed multiple
times. All these extra executions of this function are causing this query to
be very inefficient. The final accumulated cost for this query once again can
be noted by looking at the CPU, Reads, Writes, and Duration columns on the SQL:BatchCompleted
event at the bottom of the Profiler output.

There are a number of different options for re-writing this
query, like using JOIN logic, an inline table value function or a view. I’ll
go through each one of these options and compare the performance of each one of
these against the original query above.

First, I’m going to re-write the first query above using
JOIN logic. Here is my new code using the JOIN method:


SELECT PC.LastName + ‘, ‘ + PC.FirstName [Customer Name]
,SC.CustomerType
FROM Sales.Customer SC
JOIN Sales.Individual SI
ON SC.CustomerID = SI.CustomerID
JOIN Person.Contact PC
ON SI.ContactID = PC.ContactID
GO

In this code I have now taken the tables used in the
function and brought them into the FROM clause. I’ve also replaced the
function call in the column list and am now just calculating the “Customer
Name” column by concatenating the two Person.Contact columns directly in the
column list. The profiler output below shows the execution statistics for this
re-write:

By looking at the above output, you can see that my re-write
only has one event displayed in Profile. Moreover, by looking at the CPU,
Reads, Writes, and Duration columns you can see this re-write used significantly
less resources. If we only look at the I/O savings of this re-write over the
original query, you will see that I saved 130,983 I/O. That is a significant
savings.

For the next re-write, I will take my original scalar value
function and turn it into an inline table value function. I then will use this
inline table value function in a re-written T-SQL SELECT statement that uses a
CROSS APPLY operator. First, let’s review the inline table value function:


CREATE FUNCTION fn_GetNameTable(@CustomerID int)
RETURNS TABLE
AS
RETURN (
SELECT LastName + ‘, ‘ + FirstName [Customer Name]
FROM Sales.Customer SC
JOIN Sales.Individual SI
ON SC.CustomerID = SI.CustomerID
JOIN Person.Contact PC
ON SI.ContactID = PC.ContactID
WHERE SC.CustomerID = @CustomerID
)

This code is almost exactly the same as my original function
code. The only difference here is this function now returns a TABLE instead of
a VARCHAR column. To use this new inline table value function I will now need
to use a CROSS APPLY operator. Here is my new T-SQL SELECT statement:


SELECT I.[Customer Name]
,SC.CustomerType
FROM Sales.Customer SC
CROSS APPLY fn_GetNameTable(SC.CustomerID) I

Here I referenced my new function within the FROM part of
the statement. I use the CROSS APPLY operator to more or less join the Sales.Customer
information with the results of my table value function based on CustomerID.
My function fn_GetNameTable is evaluated with every CustomerID from the Sales.Customer
table. When the function returns a “Customer Name”, it is joined with the “CustomerType”
column from the Sales.Customer table to produce the final output. Let’s look at
how efficient this re-write method is by reviewing the Profiler output:

Here you can see that the CROSS JOIN option is much more
efficient then using a function call within the column list. The number of I/O’s
used by using a CROSS JOIN is the same as my prior example using the JOIN
option. However, this example uses a little less CPU.

For the last example of re-writing my original poorly
performing query, I will be using a view. Here is the T-SQL code for my view:


CREATE VIEW vw_GetName
AS
SELECT LastName + ‘, ‘ + FirstName [Customer Name]
,SC.CustomerID
FROM Sales.Customer SC
JOIN Sales.Individual SI
ON SC.CustomerID = SI.CustomerID
JOIN Person.Contact PC
ON SI.ContactID = PC.ContactID
GO

The view here looks similar to the code for the function.
This code joins together the Sales.Customer information with the Person.Contact
information, so that the “Customer Name” field can be created. Here is the
code that uses this view to return the same record set as my original example.


— Rewrite using View
SELECT V.[Customer Name]
,CustomerType
FROM Sales.Customer SC
JOIN vw_GetName V
ON SC.CustomerID = V.CustomerID
GO

If we look at the Profiler output below we can determine how
efficient this coding method is compared to the original, JOIN and CROSS JOIN
options. Here is the Profiler output for this example:

Once again using a view is much more efficient than the original
code. Here my view required the same number of I/O’s as the JOIN and CROSS
JOIN option. The view uses slightly more CPU than the other two re-write
options.

Scalar Function Can be Inefficient

You need to be careful how you use a scalar function in your
T-SQL statements. Scalar value functions when used in a column list, or WHERE
clause perform much like a cursor and are called repeatedly to resolve the
query. These repeative evalutions of the function code makes scalar value
functions inefficient when used this way. If you are using scalar value
functions in your column list or WHERE clause then you should consider
re-writing you queries to use one of the methods I demonstrated in this
article.

»


See All Articles by Columnist
Gregory A. Larsen

Gregory Larsen
Gregory Larsen
Gregory A. Larsen is a DBA at Washington State Department of Health (DOH). Greg is responsible for maintaining SQL Server and other database management software. Greg works with customers and developers to design and implement database changes, and solve database/application related problems. Greg builds homegrown solutions to simplify and streamline common database management tasks, such as capacity management.

Latest Articles