The Effect of Function Invocation on SQL Query Performance in IBM DB2
September 17, 2010
Function use is common in IBM DB2 SQL. However, use and misuse of functions can affect query access paths and performance. Here are a few tips to help you tune IBM DB2 queries and avoid using functions that consume precious CPU resources.
Most of the common functions fall into two groups: scalar functions and aggregate functions.
A scalar function is defined as: "An SQL operation that produces a single value from another value and is expressed as a function name, followed by a list of arguments that are enclosed in parentheses." Some common scalar functions are:
Aggregate functions derive their results by using values from one or more rows. They are also referred to as set functions. Some common aggregate functions include MIN, MAX, SUM, and AVG.
Scalar Function Invocation Effects on Predicate Evaluation
During execution, WHERE clause predicates are evaluated in two stages:
When DB2 can evaluate certain predicates at stage 1, the query that contains the predicate takes less time to run.
Within each stage, predicates are evaluated by type:
Here are some general performance rules that the DBA and developer should know. For more details about predicates, query tuning, database monitoring and database performance tuning see the DB2 Performance and Tuning manual reference at the end of this article.
Another interesting factor is that within each type predicates are evaluated in the order encountered in the SQL statement; so predicate order may affect performance. For multiple predicates of the same stage and the same type you should code the more restrictive predicate first. This allows DB2 to eliminate rows quickly from consideration; in addition, you can reduce unnecessary CPU usage by avoiding complex function invocations. Last, for predicates that are equally restrictive, you can order them so that complex and CPU-intensive functions are executed last (if at all).
Consider the following predicates, both Stage 2, both equals predicates:
SUBSTR(TYPE_CODE,2,1) = '5' AND MONTH(PER_DATE) = MONTH(START_DATE)
You should code the most restrictive predicate first. However, if these two predicates are equally restrictive, then code them in the order shown. In general, a single invocation of the simple SUBSTR function will consume less CPU than the two invocations of MONTH.
Another alternative would be to re-code the first predicate as follows:
TYPE_CODE LIKE '_5%'
This is because the LIKE operator is Indexable and Stage 1.
Re-Coding Predicates in a More Efficient Form
The previous example of re-coding a SUBSTR predicate with LIKE shows that there may be multiple ways of coding the identical predicate logic. Here are a few ideas.
Re-code initial substrings as either LIKE or BETWEEN to remove function invocation overhead and make predicate stage 1:
old SUBSTR(TYPE_CODE,2,1) = '2' new TYPE_CODE LIKE '2%' or, TYPE CODE BETWEEN '2 ' and '2999'
Re-code arithmetic expressions involving column values so that the column name appears alone on the left side of the expression. In the example below, :hv1 is a host variable containing a percentage:
old SALARY + :hv1 * SALARY > 50000 new SALARY > 50000 + 50000 / :hv1 or, SALARY > 50000 / (1 + :hv1)
Although numerically equivalent to the first predicate, the first change above will cause the SQL statement to fail if the value of :hv1 is equal to zero; hence, the second form would be preferred.
old HIRE_DATE - 1 DAYS < CURRENT DATE new HIRE_DATE < CURRENT DATE + 1 DAYS
Here, the date comparison remains the same, but the HIRE_DATE column appears by itself on the left of the "<" operator, making this a stage 1 predicate.
One last point on using SUBSTR. A major disadvantage of using this function in a predicate is that DB2 cannot use data distribution statistics to estimate the number of rows that may qualify. The optimizer will use a default filter factor, perhaps leading to an inefficient access path.
In cases where you are able to develop multiple equivalent predicates, I recommend executing Explains for all of them to determine whether the access path chosen by DB2 is efficient.
Aggregate Function Evaluation
If your query involves aggregate functions such as MAX and SUM, you can improve query performance by coding the SQL so that the functions are evaluated during data access, rather than later after sorting or data retrieval.
Use EXPLAIN to determine when DB2 evaluates the aggregate functions. In the PLAN_TABLE, column COLUMN_FN_EVAL shows when an SQL aggregate function is evaluated:
Code the query so that every aggregate function that it contains meets the following criteria:
Functions provide a convenient way to manipulate data. However, they also consume CPU time and sometimes prevent the optimizer from selecting an efficient access path. Develop alternatives and use the Explain facility to avoid wasting resources.
IBM • DB2 V9.1 Performance Monitoring and Tuning Guide SC18-9851; (see Chapter 16. Tuning your queries)
Sheryl Larsen Top Ten SQL Performance Tips
IBM developerworks Tuning DB2 SQL Access Paths
MyDeveloperConnecton SQL Performance and Tuning
IBM Systems Magazine An Intuitive Approach to DB2 for z/OS SQL Query Tuning