Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MS SQL

Posted Oct 29, 2009

T-SQL Best Practices - Don't Use Scalar Value Functions in Column List or WHERE Clauses

By Gregory A. Larsen

Making sure your queries run as fast as possible is important to ensure your application gets the kind of performance that users require or expect. Query performance can be affected by lots of different factors. In this article I will be introducing another best practice of how and how not to use a function within your T-SQL statements. Where you place your function within your T-SQL statements determines how your query will be processed and can severely impact the query execution plan and the performance of your query.

Types of Functions

There are different types of functions. Not all functions are created equal. Some functions read data while others do not. Some functions return multiple records while others return only a single record. For the purpose of this article, I will be discussing the use of scalar value functions and how they affect the performance of your query.

I’m sure most of you already know what a scalar value function is. But just in case you don’t I’ll provide you a definition, so we are all on the same page. A scalar value function is a function that returns a single value.

Keep in mind not all scalar functions are created equal. Some scalar value functions reference data in tables where others don’t. For the purpose this article we will be talking about scalar value functions that accept input parameters, and then use those parameters to read some data from your database, and return a value back to the calling T-SQL statement that referenced the function.

Do’s and Don’ts When Using Scalar Functions

There are probably a number of do’s and don’ts related to scalar functions. I’m going to only discuss a single aspect of using scalar value functions. In this article, I will be discussing how you reference your scalar value function within a T-SQL SELECT statement. The placement of a function will affect how the query is processed and how that can lead to poor performing queries or well performing queries.

You can place a scalar value function in many different places within a single SELECT statement. For the purpose of this article I will be discussing the impacts of placing a scalar value function within a SELECT column list and/or a WHERE clause. I will then be showing you alternatives for using a scalar function in these locations.

To demonstrate how placing a scalar value function in a column list and within the WHERE clause is a bad practice, let's look at a couple of coding examples. Here is the first SELECT statement to review:

USE AdventureWorks
GO

CREATE FUNCTION dbo.fn_GetName 
     (
       @CustomerID INT
      ) 
RETURNS VARCHAR(100)
AS
BEGIN
  DECLARE @CustomerName VARCHAR(100);
  SELECT @CustomerName =  PC.LastName + ', ' + PC.FirstName
  FROM Sales.Customer SC
  JOIN Sales.Individual SI
  ON SC.CustomerID = SI.CustomerID
  JOIN Person.Contact PC
  ON SI.ContactID = PC.ContactID 
  WHERE SC.CustomerID = @CustomerID
  RETURN @CustomerName
END
GO

SELECT dbo.fn_GetName(CustomerID) 
      ,CustomerType
FROM Sales.Customer
WHERE dbo.fn_GetName(CustomerID) IS NOT NULL
GO

If you look at this code, you can see I’m creating a function that accepts a CustomerID and then returns the LastName and FirstName concatenated together with a comma in between the last and first name of the customer. Now look at the SELECT statement and note that I am calling the function from within the column list.

If you run the first chunk of code that creates the function in one batch and than run the SELECT statement in a separate batch you will see the SELECT statement takes a little while to run. On my machine, it took around 10 seconds. To identify why this SELECT statement takes so long and why I consider this a bad practice, you need to also have Profiler running and monitoring the SP:Completed and SQL:BatchCompleted events while you run this query. Here is a screen print of my Profiler session for the tail end of the processing associated with the above SELECT statement.

If you look at this output, you can see that the SELECT statement is being run over and over again. By putting the function call in the column list and in the WHERE clause, the function is being called multiple times. You can see this above by noting all the different SP:Completed events. Also, note the CPU, Reads, Write and Duration on the SQL_BatchCompleted EventClass line at the bottom of this screenshot. This event captures the true cost of running this query. Remember these numbers. Later I will show you how to write this query differently to make it more efficient.

Now look at this code:

SELECT PC.LastName + ', ' + PC.FirstName [Customer Name]
      ,SC.CustomerType
FROM Sales.Customer SC
JOIN Sales.Individual SI
ON SC.CustomerID = SI.CustomerID
JOIN Person.Contact PC
ON SI.ContactID = PC.ContactID 
WHERE dbo.fn_GetName(SC.CustomerID) = 'Roy, Luke'

Here I am calling the same function but this time only in the WHERE clause. If you run this code, you will see it takes a while to return data. In my case, it took 3 seconds. Once again, if we monitor this command with Profiler we see the following trace events:

Once again, by looking at this output you can see that the SELECT statement appears to be executed over and over again. This occurs because the fn_GetName function once again needs to be executed multiple times. All these extra executions of this function are causing this query to be very inefficient. The final accumulated cost for this query once again can be noted by looking at the CPU, Reads, Writes, and Duration columns on the SQL:BatchCompleted event at the bottom of the Profiler output.

There are a number of different options for re-writing this query, like using JOIN logic, an inline table value function or a view. I’ll go through each one of these options and compare the performance of each one of these against the original query above.

First, I’m going to re-write the first query above using JOIN logic. Here is my new code using the JOIN method:

SELECT PC.LastName + ', ' + PC.FirstName [Customer Name]
      ,SC.CustomerType
FROM Sales.Customer SC
JOIN Sales.Individual SI
ON SC.CustomerID = SI.CustomerID
JOIN Person.Contact PC
ON SI.ContactID = PC.ContactID 
GO

In this code I have now taken the tables used in the function and brought them into the FROM clause. I've also replaced the function call in the column list and am now just calculating the “Customer Name” column by concatenating the two Person.Contact columns directly in the column list. The profiler output below shows the execution statistics for this re-write:

By looking at the above output, you can see that my re-write only has one event displayed in Profile. Moreover, by looking at the CPU, Reads, Writes, and Duration columns you can see this re-write used significantly less resources. If we only look at the I/O savings of this re-write over the original query, you will see that I saved 130,983 I/O. That is a significant savings.

For the next re-write, I will take my original scalar value function and turn it into an inline table value function. I then will use this inline table value function in a re-written T-SQL SELECT statement that uses a CROSS APPLY operator. First, let’s review the inline table value function:

CREATE FUNCTION fn_GetNameTable(@CustomerID int)
RETURNS TABLE
AS 
RETURN (
  SELECT  LastName + ', ' +  FirstName [Customer Name]
  FROM Sales.Customer SC
  JOIN Sales.Individual SI
  ON SC.CustomerID = SI.CustomerID
  JOIN Person.Contact PC
  ON SI.ContactID = PC.ContactID 
  WHERE SC.CustomerID = @CustomerID
 )

This code is almost exactly the same as my original function code. The only difference here is this function now returns a TABLE instead of a VARCHAR column. To use this new inline table value function I will now need to use a CROSS APPLY operator. Here is my new T-SQL SELECT statement:

SELECT I.[Customer Name]
      ,SC.CustomerType
FROM Sales.Customer SC
CROSS APPLY fn_GetNameTable(SC.CustomerID) I

Here I referenced my new function within the FROM part of the statement. I use the CROSS APPLY operator to more or less join the Sales.Customer information with the results of my table value function based on CustomerID. My function fn_GetNameTable is evaluated with every CustomerID from the Sales.Customer table. When the function returns a “Customer Name”, it is joined with the “CustomerType” column from the Sales.Customer table to produce the final output. Let's look at how efficient this re-write method is by reviewing the Profiler output:

Here you can see that the CROSS JOIN option is much more efficient then using a function call within the column list. The number of I/O’s used by using a CROSS JOIN is the same as my prior example using the JOIN option. However, this example uses a little less CPU.

For the last example of re-writing my original poorly performing query, I will be using a view. Here is the T-SQL code for my view:

CREATE VIEW vw_GetName
AS
  SELECT LastName + ', ' + FirstName [Customer Name]
        ,SC.CustomerID 
  FROM Sales.Customer SC
  JOIN Sales.Individual SI
  ON SC.CustomerID = SI.CustomerID
  JOIN Person.Contact PC
  ON SI.ContactID = PC.ContactID
GO

The view here looks similar to the code for the function. This code joins together the Sales.Customer information with the Person.Contact information, so that the “Customer Name” field can be created. Here is the code that uses this view to return the same record set as my original example.

-- Rewrite using View  
SELECT V.[Customer Name] 
      ,CustomerType
FROM Sales.Customer SC
JOIN vw_GetName V
ON SC.CustomerID = V.CustomerID
GO

If we look at the Profiler output below we can determine how efficient this coding method is compared to the original, JOIN and CROSS JOIN options. Here is the Profiler output for this example:

Once again using a view is much more efficient than the original code. Here my view required the same number of I/O’s as the JOIN and CROSS JOIN option. The view uses slightly more CPU than the other two re-write options.

Scalar Function Can be Inefficient

You need to be careful how you use a scalar function in your T-SQL statements. Scalar value functions when used in a column list, or WHERE clause perform much like a cursor and are called repeatedly to resolve the query. These repeative evalutions of the function code makes scalar value functions inefficient when used this way. If you are using scalar value functions in your column list or WHERE clause then you should consider re-writing you queries to use one of the methods I demonstrated in this article.

» See All Articles by Columnist Gregory A. Larsen



MS SQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MS SQL Forum
Topic By Replies Updated
SQL 2005: SSIS: Error using SQL Server credentials poverty 3 August 17th, 07:43 AM
Need help changing table contents nkawtg 1 August 17th, 03:02 AM
SQL Server Memory confifuration bhosalenarayan 2 August 14th, 05:33 AM
SQL Server Primary Key and a Unique Key katty.jonh 2 July 25th, 10:36 AM


















Thanks for your registration, follow us on our social networks to keep up-to-date