Function use is common in IBM DB2 SQL. However, use and misuse of functions can affect query access paths and performance. Here are a few tips to help you tune IBM DB2 queries and avoid using functions that consume precious CPU resources.
Most of the common functions fall into two groups: scalar functions and
aggregate functions.
A scalar function is defined as: "An SQL operation that produces a
single value from another value and is expressed as a function name, followed
by a list of arguments that are enclosed in parentheses." Some common
scalar functions are:
- Character Manipulation: SUBSTR, LTRIM, RTRIM, LOCATE,
LOWER, UPPER, POSSTR, LEFT, RIGHT, STRIP
- Numeric Conversion: CAST, INTEGER, DECIMAL, CHAR, DIGITS
- Date / Time Calculation: DATE, TIME, TIMESTAMP, HOUR,
MINUTE, DAYOFMONTH
Aggregate functions derive their results by using values from one or more
rows. They are also referred to as set functions. Some common aggregate
functions include MIN, MAX, SUM, and AVG.
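As a brief illustration of the two groups, the sketch below uses a hypothetical EMP table. The table and its columns LASTNAME, WORKDEPT and SALARY are assumptions made for this example only. The first statement applies scalar functions to each row; the second applies aggregate functions across the rows of each group.

```sql
-- Hypothetical table: EMP(LASTNAME, WORKDEPT, SALARY)

-- Scalar functions: one result value per input row
SELECT UPPER(LASTNAME)        AS LAST_NAME_UC,
       SUBSTR(WORKDEPT, 1, 1) AS DEPT_PREFIX
  FROM EMP;

-- Aggregate functions: one result value per group of rows
SELECT WORKDEPT,
       MIN(SALARY) AS MIN_SALARY,
       MAX(SALARY) AS MAX_SALARY,
       AVG(SALARY) AS AVG_SALARY
  FROM EMP
 GROUP BY WORKDEPT;
```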
Scalar Function Invocation Effects on Predicate Evaluation
During execution, DB2 evaluates WHERE clause predicates in two stages:
- Stage 1: indexable predicates (those that can match index entries)
first, then non-indexable stage 1 predicates
- Stage 2: all remaining predicates
When DB2 can evaluate a predicate at stage 1, the query that contains
the predicate usually takes less time to run.
Within each stage, predicates are evaluated by type:
- Equals (has an equal operator and no NOT operator; also
includes predicates like C1 IS NULL)
- Range (contains one of the following operators: >,
>=, <, <=, LIKE, or BETWEEN)
- Other
Here are some general performance rules that the DBA and developer should
know. For more details about predicates, query tuning, database monitoring and
database performance tuning see the DB2 Performance and Tuning manual reference
at the end of this article.
- Use stage 1 predicates whenever possible. Stage 1
predicates are better than stage 2 predicates because they disqualify rows
earlier and reduce the amount of processing that is needed at stage 2. In
terms of resource usage, the earlier a predicate is evaluated the better.
- Write queries to evaluate the most restrictive predicates
first. When the predicates that disqualify the most rows are processed
first, unnecessary rows are screened out as early as possible, which can
reduce processing cost at a later stage. However, a predicate's
restrictiveness only matters among predicates of the same type and at the
same evaluation stage.
Within each type, DB2 evaluates predicates in the order in which they appear
in the SQL statement, so predicate order can affect performance.
For multiple predicates of the same stage and the same type, code the
more restrictive predicate first. This lets DB2 eliminate rows from
consideration quickly, and it avoids invoking the functions in later
predicates for rows that have already been rejected. For predicates that are
equally restrictive, order them so that complex, CPU-intensive functions are
executed last (if at all).
Consider the following predicates, both Stage 2, both equals predicates:
SUBSTR(TYPE_CODE,2,1) = '5'
AND MONTH(PER_DATE) = MONTH(START_DATE)
You should code the most restrictive predicate first. However, if these two
predicates are equally restrictive, then code them in the order shown. In
general, a single invocation of the simple SUBSTR function will consume less
CPU than the two invocations of MONTH.
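Another alternative is to re-code the first predicate as follows:
TYPE_CODE LIKE '_5%'
The LIKE predicate is stage 1, so DB2 evaluates it earlier than the SUBSTR
version. Because this pattern begins with a wildcard character, however, DB2
cannot use it as a matching index predicate.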
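Placed in a complete statement, the ordering advice might look like the sketch below. The table name ORDER_PERIOD and the column ORDER_ID are hypothetical; only TYPE_CODE, PER_DATE and START_DATE come from the example above.

```sql
-- Hypothetical table: ORDER_PERIOD(ORDER_ID, TYPE_CODE, PER_DATE, START_DATE)

-- Both predicates apply functions to columns, so both are stage 2.
-- The cheaper SUBSTR test is coded first, the two MONTH calls last.
SELECT ORDER_ID
  FROM ORDER_PERIOD
 WHERE SUBSTR(TYPE_CODE, 2, 1) = '5'
   AND MONTH(PER_DATE) = MONTH(START_DATE);

-- Alternative: LIKE replaces the SUBSTR call, leaving a single
-- stage 2 predicate (the MONTH comparison).
SELECT ORDER_ID
  FROM ORDER_PERIOD
 WHERE TYPE_CODE LIKE '_5%'
   AND MONTH(PER_DATE) = MONTH(START_DATE);
```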
Re-Coding Predicates in a More Efficient Form
The previous example of re-coding a SUBSTR predicate with LIKE shows that
there may be multiple ways of coding the identical predicate logic. Here are a
few ideas.
Re-code initial substrings as either LIKE or BETWEEN to remove function
invocation overhead and make the predicate stage 1:
old SUBSTR(TYPE_CODE,1,1) = '2'
new TYPE_CODE LIKE '2%'
or, TYPE_CODE BETWEEN '2 ' AND '2999'
Re-code arithmetic expressions involving column values so that the column
name appears alone on the left side of the comparison. In the example below,
:hv1 is a host variable containing a percentage expressed as a fraction (for
example, 0.10 for 10 percent):
old SALARY + :hv1 * SALARY > 50000
new SALARY > 50000 / (1 + :hv1)
The new form follows from factoring the left side as SALARY * (1 + :hv1) and
dividing both sides by (1 + :hv1), which is valid as long as :hv1 is greater
than -1. Take care with the algebra when re-coding: a form such as
SALARY > 50000 + 50000 / :hv1 is not equivalent to the original predicate,
and it causes the SQL statement to fail if the value of :hv1 is zero.
The same technique applies to date arithmetic:
old HIRE_DATE - 1 DAYS < CURRENT DATE
new HIRE_DATE < CURRENT DATE + 1 DAYS
Here, the date comparison remains the same, but the HIRE_DATE column appears
by itself on the left of the "<" operator, making this a stage 1
predicate.
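In complete statements, the column-alone forms might be written as in the sketch below. The EMP table and its EMPNO column are hypothetical, and :hv1 is assumed to hold a decimal fraction greater than -1.

```sql
-- Hypothetical table: EMP(EMPNO, SALARY, HIRE_DATE)
-- :hv1 is a host variable, assumed to be a decimal fraction greater than -1

-- SALARY stands alone on the left; the arithmetic involves no column
-- values and sits on the right
SELECT EMPNO
  FROM EMP
 WHERE SALARY > 50000 / (1 + :hv1);

-- Date arithmetic moved to the non-column side of the comparison
SELECT EMPNO
  FROM EMP
 WHERE HIRE_DATE < CURRENT DATE + 1 DAYS;
```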
One last point on using SUBSTR. A major disadvantage of using this function
in a predicate is that DB2 cannot use data distribution statistics to estimate
the number of rows that may qualify. The optimizer will use a default filter
factor, perhaps leading to an inefficient access path.
In cases where you are able to develop multiple equivalent predicates, I
recommend executing Explains for all of them to determine whether the access
path chosen by DB2 is efficient.
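One way to run such a comparison on DB2 for z/OS is sketched below. The QUERYNO values 101 and 102, the table ORDER_PERIOD and the column ORDER_ID are arbitrary choices for illustration, and the sketch assumes that a PLAN_TABLE already exists under your authorization ID.

```sql
-- Explain each equivalent form under its own query number
EXPLAIN PLAN SET QUERYNO = 101 FOR
  SELECT ORDER_ID
    FROM ORDER_PERIOD
   WHERE SUBSTR(TYPE_CODE, 1, 1) = '2';

EXPLAIN PLAN SET QUERYNO = 102 FOR
  SELECT ORDER_ID
    FROM ORDER_PERIOD
   WHERE TYPE_CODE LIKE '2%';

-- Compare the access paths chosen for the two forms
SELECT QUERYNO, QBLOCKNO, PLANNO, TNAME,
       ACCESSTYPE, MATCHCOLS, ACCESSNAME, INDEXONLY
  FROM PLAN_TABLE
 WHERE QUERYNO IN (101, 102)
 ORDER BY QUERYNO, QBLOCKNO, PLANNO;
```

If an index on TYPE_CODE exists, an ACCESSTYPE of I with a nonzero MATCHCOLS value for one of the query numbers indicates that DB2 chose a matching index scan for that form.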
Aggregate Function Evaluation
If your query involves aggregate functions such as MAX and SUM, you can
improve query performance by coding the SQL so that the functions are evaluated
during data access, rather than later after sorting or data retrieval.
Use EXPLAIN to determine when DB2 evaluates the aggregate functions. In the
PLAN_TABLE, column COLUMN_FN_EVAL shows when an SQL aggregate function is
evaluated:
- R: While the data is being read from the table or index
- S: While performing a sort to satisfy a GROUP BY clause
- blank: After data retrieval and after any sorts
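A minimal sketch of that check follows. The EMP table, its columns, the literal 'D11' and the QUERYNO value 201 are hypothetical, and a PLAN_TABLE is assumed to exist under your authorization ID.

```sql
-- Explain a query that contains an aggregate function
EXPLAIN PLAN SET QUERYNO = 201 FOR
  SELECT MAX(SALARY)
    FROM EMP
   WHERE WORKDEPT = 'D11';

-- COLUMN_FN_EVAL shows R, S, or blank for each step of the plan
SELECT QUERYNO, QBLOCKNO, PLANNO, TNAME,
       COLUMN_FN_EVAL, SORTN_GROUPBY, SORTC_GROUPBY
  FROM PLAN_TABLE
 WHERE QUERYNO = 201
 ORDER BY QBLOCKNO, PLANNO;
```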
Code the query so that every aggregate function that it contains meets the
following criteria:
- No sort is needed for a GROUP BY. Execute an Explain and
check columns SORTN_GROUPBY and SORTC_GROUPBY.
- There are no stage 2 predicates.
- No aggregate function uses DISTINCT (for example,
COUNT(DISTINCT C1)).
- For a join, every aggregate function is on the last table
joined. Execute an Explain and check column PLANNO.
- Each aggregate function is applied to a single column,
with no arithmetic expressions.
- No aggregate function is one of the following:
STDDEV, STDDEV_SAMP, VAR, VAR_SAMP.
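To make the criteria concrete, the sketch below contrasts a query that is a plausible candidate for evaluation during data access with one that breaks several of the rules. The EMP table and its columns are hypothetical, and whether DB2 actually avoids the GROUP BY sort depends on the indexes available, so confirm the result with Explain.

```sql
-- Hypothetical table: EMP(WORKDEPT, JOB, SALARY, BONUS)

-- Plausible candidate: single-column aggregates, a stage 1 predicate,
-- no DISTINCT. Avoiding the GROUP BY sort would also need a suitable
-- index, for example one on WORKDEPT (an assumption).
SELECT WORKDEPT, MAX(SALARY), SUM(SALARY)
  FROM EMP
 WHERE WORKDEPT >= 'D00'
 GROUP BY WORKDEPT;

-- Breaks three of the criteria: arithmetic inside SUM,
-- COUNT(DISTINCT ...), and a stage 2 predicate built on SUBSTR
SELECT WORKDEPT, SUM(SALARY + BONUS), COUNT(DISTINCT JOB)
  FROM EMP
 WHERE SUBSTR(WORKDEPT, 2, 2) = '11'
 GROUP BY WORKDEPT;
```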
Summary
Functions provide a convenient way to manipulate data. However, they also
consume CPU time and sometimes prevent the optimizer from selecting an
efficient access path. Develop alternatives and use the Explain facility to
avoid wasting resources.
Additional Resources
- IBM: DB2 V9.1 Performance Monitoring and Tuning Guide, SC18-9851
(see Chapter 16, "Tuning your queries")
- IBM: IBM Information Center - Performance
- Sheryl Larsen: Top Ten SQL Performance Tips
- IBM developerWorks: Tuning DB2 SQL Access Paths
- MyDeveloperConnecton: SQL Performance and Tuning
- IBM Systems Magazine: An Intuitive Approach to DB2 for z/OS SQL Query
Tuning
Columnist: Lockwood Lyon