Database Design Techniques
There are two distinct methods for physically partitioning data. Horizontal partitioning is the most common data partitioning option. With this technique, the database designer physically separates the data into multiple datasets. Each dataset is a partition, and each contains rows within a particular key range. Common key ranges are by day or month or year, by geographic location, or simply by a subset of key ranges. This method results in several advantages.
SQL queries that access multiple data rows having a specific key range will now find that required data within a single partition or a few physically adjacent partitions. Without this arrangement, the query would likely have to scan the entire table to find all the required data rows.
Another advantage is the common separation between data loads and application access. One partition can be in the process of being loaded, perhaps with today's data, while other partitions are being queried. Thus, contention between applications and load processes is minimized.
A second common horizontal partitioning scheme is called active/inactive. With this method, the database designer creates two physical partitions. At any one time, only one of the partitions is deemed active, the other passive. All querying is against data in the active partition.
This allows a separate process to load data into the inactive partition without disturbing applications. Once the data load is complete, the load process switches the flag indicating which partition is currently active, thus reversing the roles of the partitions.
The other physical partitioning technique is called vertical partitioning. Here, the database designer analyzes the column data in a table to determine the common data access patterns to specific columns. The table data is then split into two or more separate and distinct tables, each with the original table's key.
One idea is to separate columns by frequency of access. Frequently-used columns are placed in the first tables, infrequently used columns in the second. This can greatly increase query performance, since SQL accessing frequently-used columns now accesses only one of the tables, which reduces total data access. This is usually accompanied by creating an index on the first table that contains all of its columns, so that an index-only access path results.
Another idea is to split off large text or binary data columns (so-called large objects, or LOBs, that may contain image or video data). Such data can be extremely large and if used infrequently is best stored in a separate table.
Avoid Hot Spots
A hot spot is a physical place in the database where a lot of data access is happening. The worst situation is when a lot of inserts, updates, deletes, and selects are happening to a set of rows in one location. A flood of inserted rows requires DB2 to find space to store the data, eating up embedded free space. Simultaneous rows updates and selects can lead to deadlocks and timeouts. Hot spots are common in databases accessed by distributed applications.
During database design there are several useful techniques for avoiding these.
First, take care to analyze how new rows are added to the table. For example, if new rows are assigned ascending key values, the physical end of the table becomes a hot spot as new rows are inserted there.
Next, review table clustering. Clustering means storing data rows with similar keys in nearly adjacent physical areas. A common example is when transaction data is keyed by date, and rows with the same date are then stored near each other. This is an advantage for querying, especially if the SQL accesses transaction rows within a specific date range. However, such clustering also means that an SQL update to a row will result in locking of nearby rows (specifically, rows on adjacent pages assuming that DB2 is doing page locking). Such locking may result in slower data access as an entire key range becomes locked.
To avoid these problems, the database designer can embed free space for rows to be inserted, and can schedule frequent reorgs to re-distribute rows. It may also be possible to specify data partitioning or clustering so that new row inserts will be spread evenly across the physical table. The DBA will also coordinate with the data owners to ensure that old or unused data is regularly purged or archived.
Reorg Utility Scheduling
DB2 has evolved to the point where utility execution (copy, runstats, and so forth) rarely contends with application data access. This is primarily due to the sharelevel change options added to the utilities, which allows concurrent update to the data during utility execution.
One exception to this occurs during a tablespace reorganization. At the end of the reorganization process, the utility requires access to the table called a drain that prevents new SQL statement execution and waits for existing claims to be released.
A reorg using drains waits for each application accessing the shared data to reach a commit point. Once there, each application is quiesced, and cannot make new claims. When all claims are gone, the utility now has exclusive control of the shared data. The quiesced applications continue to hold transaction locks.
This means that in order to complete successfully the Reorg utility requires a drain on the table being reorged. This necessitates that applications be designed to commit at appropriate points.
Of course, if you have a lot of data to load into a table it may take a long while. Some of the partitioning schemes mentioned earlier mitigate this somewhat, as it may be possible to load multiple partitions simultaneously. Still, many IT organizations must support application access during the entire 24-hour day. This means that data load processes must co-exist with application data access.
Application Coding Techniques
DB2 features transactional locks on data as a way of managing concurrent access. Multiple applications can read data (tables, indexes) concurrently by acquiring share locks, while an updating application (inserting, updating, or deleting rows) had to acquire an exclusive lock.
Deadlock occurs when two or more applications contend for resources in such a way where none of them can proceed. The simplest example is when two applications contend for resources.
Application X has updated resource A and is waiting to access resource B.
Meanwhile, Application B has already updated resource B and is waiting to access A.
Each application is on hold waiting for a resource that is already exclusively used by the other. They will hold forever, or until DB2 resolves the conflict by abending and rolling back one of the applications.
The effect on the rolled-back application (called the victim) is immediate. All of its recent updates are rolled back to the most recent commit point, and the application is given an error SQL code. At this point, it is the responsibility of the application to deal with the situation, either by issuing an error message, failing the entire application, or restarting or rerunning the failed work
To prevent this from happening, consider the following application design ideas.
- If possible, consolidate all SQL access to a single time during application execution. Avoid executing some SQL, doing additional in-program work, then more SQL. The longer the period of time a transaction or unit-of-work takes, the greater the chance of locking out another application.
- Review query-only applications and consider using uncommitted read. This option allows the application to read while acquiring few locks, but at the risk of reading uncommitted data.
Commit frequently, based on a completed, consistent unit of work. The COMMIT statement releases most locks and provides a recovery point. The primary reason for commit processing from the application's point of view is to set a logical point for recovery. In these cases, the application design is predicated on the ability of the database management system to return the database to a consistent state by rolling back recent changes to the previous commit point. By keeping transactions short, the time that locks are held is reduced.
Application data contention can lead to slower perceived response times, deadlocks, timeouts, and transaction failures. Consider the techniques presented here to avoid issues and increase application quality.
Dr. Gerardo Pardo-Castellote A Data-Centric Approach to Distributed Application Architecture March 28, 2007
IBM DB2 9 for z/OS Performance Topics 2012
IBM DB2 for z/OS DB2 database design: Principles of peak performance 2012
IBM Managing DB2 Performance 2012
See all articles by Lockwood Lyon