T-SQL Best Practices - Don't Use Scalar Value Functions in Column List or WHERE Clauses
October 29, 2009
Making sure your queries run as fast as possible is important to ensure your application gets the kind of performance that users require or expect. Query performance can be affected by lots of different factors. In this article I will be introducing another best practice of how and how not to use a function within your T-SQL statements. Where you place your function within your T-SQL statements determines how your query will be processed and can severely impact the query execution plan and the performance of your query.
Types of Functions
There are different types of functions. Not all functions are created equal. Some functions read data while others do not. Some functions return multiple records while others return only a single record. For the purpose of this article, I will be discussing the use of scalar value functions and how they affect the performance of your query.
Im sure most of you already know what a scalar value function is. But just in case you dont Ill provide you a definition, so we are all on the same page. A scalar value function is a function that returns a single value.
Keep in mind not all scalar functions are created equal. Some scalar value functions reference data in tables where others dont. For the purpose this article we will be talking about scalar value functions that accept input parameters, and then use those parameters to read some data from your database, and return a value back to the calling T-SQL statement that referenced the function.
Dos and Donts When Using Scalar Functions
There are probably a number of dos and donts related to scalar functions. Im going to only discuss a single aspect of using scalar value functions. In this article, I will be discussing how you reference your scalar value function within a T-SQL SELECT statement. The placement of a function will affect how the query is processed and how that can lead to poor performing queries or well performing queries.
You can place a scalar value function in many different places within a single SELECT statement. For the purpose of this article I will be discussing the impacts of placing a scalar value function within a SELECT column list and/or a WHERE clause. I will then be showing you alternatives for using a scalar function in these locations.
To demonstrate how placing a scalar value function in a column list and within the WHERE clause is a bad practice, let's look at a couple of coding examples. Here is the first SELECT statement to review:
USE AdventureWorks GO CREATE FUNCTION dbo.fn_GetName ( @CustomerID INT ) RETURNS VARCHAR(100) AS BEGIN DECLARE @CustomerName VARCHAR(100); SELECT @CustomerName = PC.LastName + ', ' + PC.FirstName FROM Sales.Customer SC JOIN Sales.Individual SI ON SC.CustomerID = SI.CustomerID JOIN Person.Contact PC ON SI.ContactID = PC.ContactID WHERE SC.CustomerID = @CustomerID RETURN @CustomerName END GO SELECT dbo.fn_GetName(CustomerID) ,CustomerType FROM Sales.Customer WHERE dbo.fn_GetName(CustomerID) IS NOT NULL GO
If you look at this code, you can see Im creating a function that accepts a CustomerID and then returns the LastName and FirstName concatenated together with a comma in between the last and first name of the customer. Now look at the SELECT statement and note that I am calling the function from within the column list.
If you run the first chunk of code that creates the function in one batch and than run the SELECT statement in a separate batch you will see the SELECT statement takes a little while to run. On my machine, it took around 10 seconds. To identify why this SELECT statement takes so long and why I consider this a bad practice, you need to also have Profiler running and monitoring the SP:Completed and SQL:BatchCompleted events while you run this query. Here is a screen print of my Profiler session for the tail end of the processing associated with the above SELECT statement.
If you look at this output, you can see that the SELECT statement is being run over and over again. By putting the function call in the column list and in the WHERE clause, the function is being called multiple times. You can see this above by noting all the different SP:Completed events. Also, note the CPU, Reads, Write and Duration on the SQL_BatchCompleted EventClass line at the bottom of this screenshot. This event captures the true cost of running this query. Remember these numbers. Later I will show you how to write this query differently to make it more efficient.
Now look at this code:
SELECT PC.LastName + ', ' + PC.FirstName [Customer Name] ,SC.CustomerType FROM Sales.Customer SC JOIN Sales.Individual SI ON SC.CustomerID = SI.CustomerID JOIN Person.Contact PC ON SI.ContactID = PC.ContactID WHERE dbo.fn_GetName(SC.CustomerID) = 'Roy, Luke'
Here I am calling the same function but this time only in the WHERE clause. If you run this code, you will see it takes a while to return data. In my case, it took 3 seconds. Once again, if we monitor this command with Profiler we see the following trace events:
Once again, by looking at this output you can see that the SELECT statement appears to be executed over and over again. This occurs because the fn_GetName function once again needs to be executed multiple times. All these extra executions of this function are causing this query to be very inefficient. The final accumulated cost for this query once again can be noted by looking at the CPU, Reads, Writes, and Duration columns on the SQL:BatchCompleted event at the bottom of the Profiler output.
There are a number of different options for re-writing this query, like using JOIN logic, an inline table value function or a view. Ill go through each one of these options and compare the performance of each one of these against the original query above.
First, Im going to re-write the first query above using JOIN logic. Here is my new code using the JOIN method:
SELECT PC.LastName + ', ' + PC.FirstName [Customer Name] ,SC.CustomerType FROM Sales.Customer SC JOIN Sales.Individual SI ON SC.CustomerID = SI.CustomerID JOIN Person.Contact PC ON SI.ContactID = PC.ContactID GO
In this code I have now taken the tables used in the function and brought them into the FROM clause. I've also replaced the function call in the column list and am now just calculating the Customer Name column by concatenating the two Person.Contact columns directly in the column list. The profiler output below shows the execution statistics for this re-write:
By looking at the above output, you can see that my re-write only has one event displayed in Profile. Moreover, by looking at the CPU, Reads, Writes, and Duration columns you can see this re-write used significantly less resources. If we only look at the I/O savings of this re-write over the original query, you will see that I saved 130,983 I/O. That is a significant savings.
For the next re-write, I will take my original scalar value function and turn it into an inline table value function. I then will use this inline table value function in a re-written T-SQL SELECT statement that uses a CROSS APPLY operator. First, lets review the inline table value function:
CREATE FUNCTION fn_GetNameTable(@CustomerID int) RETURNS TABLE AS RETURN ( SELECT LastName + ', ' + FirstName [Customer Name] FROM Sales.Customer SC JOIN Sales.Individual SI ON SC.CustomerID = SI.CustomerID JOIN Person.Contact PC ON SI.ContactID = PC.ContactID WHERE SC.CustomerID = @CustomerID )
This code is almost exactly the same as my original function code. The only difference here is this function now returns a TABLE instead of a VARCHAR column. To use this new inline table value function I will now need to use a CROSS APPLY operator. Here is my new T-SQL SELECT statement:
SELECT I.[Customer Name] ,SC.CustomerType FROM Sales.Customer SC CROSS APPLY fn_GetNameTable(SC.CustomerID) I
Here I referenced my new function within the FROM part of the statement. I use the CROSS APPLY operator to more or less join the Sales.Customer information with the results of my table value function based on CustomerID. My function fn_GetNameTable is evaluated with every CustomerID from the Sales.Customer table. When the function returns a Customer Name, it is joined with the CustomerType column from the Sales.Customer table to produce the final output. Let's look at how efficient this re-write method is by reviewing the Profiler output:
Here you can see that the CROSS JOIN option is much more efficient then using a function call within the column list. The number of I/Os used by using a CROSS JOIN is the same as my prior example using the JOIN option. However, this example uses a little less CPU.
For the last example of re-writing my original poorly performing query, I will be using a view. Here is the T-SQL code for my view:
CREATE VIEW vw_GetName AS SELECT LastName + ', ' + FirstName [Customer Name] ,SC.CustomerID FROM Sales.Customer SC JOIN Sales.Individual SI ON SC.CustomerID = SI.CustomerID JOIN Person.Contact PC ON SI.ContactID = PC.ContactID GO
The view here looks similar to the code for the function. This code joins together the Sales.Customer information with the Person.Contact information, so that the Customer Name field can be created. Here is the code that uses this view to return the same record set as my original example.
-- Rewrite using View SELECT V.[Customer Name] ,CustomerType FROM Sales.Customer SC JOIN vw_GetName V ON SC.CustomerID = V.CustomerID GO
If we look at the Profiler output below we can determine how efficient this coding method is compared to the original, JOIN and CROSS JOIN options. Here is the Profiler output for this example:
Once again using a view is much more efficient than the original code. Here my view required the same number of I/Os as the JOIN and CROSS JOIN option. The view uses slightly more CPU than the other two re-write options.
Scalar Function Can be Inefficient
You need to be careful how you use a scalar function in your T-SQL statements. Scalar value functions when used in a column list, or WHERE clause perform much like a cursor and are called repeatedly to resolve the query. These repeative evalutions of the function code makes scalar value functions inefficient when used this way. If you are using scalar value functions in your column list or WHERE clause then you should consider re-writing you queries to use one of the methods I demonstrated in this article.