Dynamic SQL Performance Tuning Techniques for IBM DB2 z/OS
December 20, 2010
Applications running with dynamic SQL are ever-increasing. Without some methods to contain the onslaught of queries, performance drops and phone calls increase. What can a DBA do to remain sane and avoid a visit from the head of IT?
This article addresses common issues encountered in performance of applications containing dynamic SQL, and how to avoid them. While the primary focus is distributed applications that access IBM DB2 for z/OS on the mainframe, most of these bottlenecks and their solutions arise with other DBMSs as well.
Dynamic SQL and the Distributed Environment
The mainframe-centric DBA tends to think of a distributed application as one executing on an application server (usually in the enterprise's intranet) that accesses DB2 data on the mainframe. The application itself is usually coded in Java, although other languages and environments also exist.
Access to DB2 data can be accomplished in several ways. These different methods vary in their performance characteristics, maintainability, and security, and are usually a function of whether the application is built in-house or purchased. In this article, I will focus on distributed applications that mainly execute dynamic SQL.
Common tuning strategies available to the DBA include:
In a distributed environment, application tuning is the best strategy. The other strategies tend to involve in-depth technical knowledge or advanced performance reporting tools.
Being Proactive About Monitoring
The DBA (or other responsible party) should set up specific active and historical performance monitoring reports for each application. In some cases, this should include detail down to specific SQL statements. Several standard reports are extremely helpful. These top-N reports contain lists of applications and SQL statements (by environment) that consume large amounts of resources:
Simply executing the reports isn't enough. They must be relevant to a business unit that has a vested interest in performance, they should be regularly reviewed, and the process that produces them may need to be updated.
For example, you may find that some of your SQL statements responsible for a large number of I/Os are already optimally tuned. This may occur for analytical queries, or those associated with a data warehouse. These queries that are statistically different from others (or significantly change over time) need to be considered separately.
On the other hand, a query that suddenly shows up as long-running begs the question: Why did this happen? The query that used to run for two seconds and now runs for two hours should be assigned to the programming group for analysis. The query then needs to be documented or marked in some way so it doesn't re-appear on the report.
The next step is to regularly execute these reports and store the results in a repository (such as a database) where they can be queried. Make this information easily available to the business unit so they can check on their own problems that crop up. This allows for convenient historical analysis, especially if you're making multiple performance enhancements and want to correlate their effects.
Application Tuning for Dynamic SQL
The most common performance-related issues encountered using dynamic SQL are an overabundance of locks and poor data access patterns. Here are some points to ponder.
Understand the difference between locks and claims. The DBA needs to understand how and why DB2 uses transaction locks for concurrent control and how claims are used for SQL serialization control.
Carefully consider using uncommitted read access for SELECT statements. It is possible, and sometimes desirable, to SELECT data from a database without acquiring locks on that data. This can be accomplished at the package level with the appropriate bind option, or on a statement by statement basis using the WITH UR parameter. This option lets the application read while acquiring few locks, but at the risk of reading uncommitted data.
Know the lock sizes of the objects and how application SQL will access them. In a distributed environment transaction, elapsed time is often a concern. To avoid deadlocks and timeouts on some tables, the DBA will consider row-level locking rather than page-level locking. The effect of this reduced lock granularity is to minimize the transaction "locking footprint." However, there are several side effects, including additional CPU spent on lock management and an increase in the absolute number of locks the transaction holds. These must be taken into account during database design and application design phases.
Coordinate application transaction execution with required infrastructure jobs. DB2 has evolved to the point where utility execution (copy, reorg, and so forth) rarely contends with application data access. This is primarily due to the Sharelevel Change options added to the utilities, which allow concurrent update to the data during utility execution. One exception occurs during a table space reorganization. At the end of the reorganization process, the utility requires access to the table called a drain that prevents new SQL statement execution and waits for existing claims to be released.
As the DB2 for z/OS Administration Guide, V9 states: "The drain quiesces the applications by allowing each one to reach a commit point, but preventing any of them, or any other applications, from making a new claim. When no more claims exist, the process that drains (the drainer) controls access to the drained object. The applications that were drained can still hold transaction locks on the drained object, but they can't make new claims until the drainer has finished."
This means that to successfully complete, the Reorg utility requires a drain on the table being reorged. This necessitates that applications be designed to commit at appropriate points.
Data Access and Its Effects
It's important to understand how data volatility affects the database. The "hot spot" is a physical place in the database where a lot of data access is happening. The worst situation is when a lot of inserts, updates, deletes, and selects are happening to a set of rows in one location. A flood of inserted rows requires DB2 to find space to store the data, eating up embedded free space. Simultaneous updates and selects to rows can lead to deadlocks and timeouts. Hot spots are common in databases accessed in a distributed environment.
Usually, hot spots can be predicted (and perhaps avoided) during database design. Take the case of an account table where new rows are assigned new, ascending account numbers. If the table is clustered by account number, the physical "end" of the table now becomes a hot spot where all new rows will be inserted.
Another example might be a table keyed by date, where all transactions for a particular date range are clustered. Some hot spots can be avoided, or at least mitigated, during database design. The DBA can embed free space for rows to be inserted, and can schedule frequent reorgs to redistribute rows. It may be possible to physically cluster data in ways that spread new row inserts evenly across the table. The DBA also can coordinate with the data owners to ensure old or unused data is regularly purged or archived.
Tables of ever-increasing size usually result in longer query execution times; this drives the need for a purge or data archive process. There are several database design alternatives that may make this easier. One common possibility is to partition data so you can purge by partition. For example, a table keyed by date where the chronologically oldest data is to be purged can be partitioned by date range. The purge process might then be designed to simply empty the oldest partition, rather than execute costly SQL delete statements to remove the rows. Such designs also lend themselves to data archival processes.
Dynamic SQL in a distributed environment presents several performance challenges to the application team and DBA. Typical issues tend to be due to delays associated with waiting for locks, access path issues, or other resource constraints. Application tuning in the distributed environment should proactively concentrate on good database design to avoid hot spots, monitoring to detect problems, regular performance reporting, and trend analysis.
IBM DB2 V9 for z/OS Administration Guide March 2010 Copr. 2010 IBM