Version 12 of IBM’s Db2 for z/OS (Db2 12 for z/OS) contains many new features and functions for DBAs and application developers. This latest Db2 version emphasizes performance enhancements across a wide variety of areas, including enhanced online transaction performance, improved efficiency for analytical query access paths, removal of some size limitations, and exploitation of operating system and storage hardware characteristics.
Exploiting Large Real Memory
It is not uncommon to have mainframe hardware configured with multiple terabytes of memory, with several hundred gigabytes allocated to each Db2 subsystem. This is due to several factors, including the following.
Reduced costs. Over the last decade the price of real memory has gone down while access speeds have increased, making memory more cost-competitive with hard media such as disk for data storage.
Lower software license charges. Many vendors license their mainframe software under a scheme called the monthly license charge (MLC), wherein one or more characteristics of the system workload are averaged to compute monthly charges. The main component of the MLC is CPU usage. In many cases shops can use additional memory to reduce CPU usage, resulting in a net cost reduction.
Low-cost performance upgrade. In cases where shops need to increase transaction throughput or reduce batch turnaround time, adding memory can be cheaper and easier than purchasing more disk drives or upgrading CPUs.
The largest blocks of Db2 memory are the buffer pools, which contain pages of table and index data read from disk. With the advent of larger available real memory there are several Db2 12 features that can use memory to reduce CPU usage. One of these is in-memory contiguous buffer pools that are marked as non-pageable. This permits Db2 to use a new direct page access algorithm that avoids the hash chains and least-recently-used (LRU) chains that otherwise require CPU resources to create and maintain.
Large or even extra-large buffer pools permit DBAs and developers to arrange for certain data objects (tables and indexes) to be read into memory and remain there while Db2 is up. Such “memory resident” objects are typically reference tables or frequently used indexes that are relatively static and accessed by many applications and transactions. Because these objects are resident in memory, Db2 does not need to read them from disk when queries access them, again saving the CPU required to execute read I/Os.
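As a sketch of how a DBA might set this up (the pool name, sizes, and object names below are hypothetical, and exact parameter values depend on the installation), a buffer pool can be marked non-pageable with PGSTEAL(NONE) and a static reference object assigned to it:

```sql
-- Db2 command (hypothetical pool BP20; sizes are illustrative only):
-- PGSTEAL(NONE) requests a contiguous, non-LRU pool so Db2 can use
-- direct page access; PGFIX(YES) keeps the pool non-pageable
-ALTER BUFFERPOOL(BP20) VPSIZE(524288) PGSTEAL(NONE) PGFIX(YES)

-- SQL DDL (hypothetical names): dedicate the pool to a frequently
-- read, relatively static reference table space
ALTER TABLESPACE REFDB.REFTS BUFFERPOOL BP20;
```

With PGSTEAL(NONE), Db2 attempts to read the object into the pool when it is first accessed and keep it resident, so later queries avoid read I/Os entirely.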
Workloads that do Massive Data Inserts
Another feature in Db2 12 for z/OS is called insert algorithm 2. It comes into play for workloads that use the SQL INSERT statement to add large amounts of data to a table. Byproducts of using SQL to insert large amounts of data include the following.
Increased index maintenance. Index pages contain entries that match index keys to data pages. As rows are inserted into a table, the corresponding index pages require inserts of key data. When an index page fills it must be ‘split’ into two pages, with half of the entries remaining on the original page and half moved to a new page, so that more entries can be inserted easily. While the original page retains its half of the entries, Db2 must spend elapsed time, CPU and sometimes I/Os to find an available empty page in the index dataset to hold the second half. Meanwhile, until the split process completes the index entries are usually locked, preventing other SQL statements from accessing them.
Increased logging of index changes. As tables and indexes change, the changes are logged in the Db2 log datasets. While a change to a single column of a single row generates a single log record, massive inserts generate log records for the new rows inserted into the table, the new keys entered into the index, and all of the changes caused by the page splits mentioned above. As before, this additional work consumes elapsed time, CPU resources and additional I/Os.
Disk mirroring to disaster recovery (DR) site. Tables associated with operational systems must be recoverable in the event of a disaster. One common method of ensuring recoverability (especially with large tables or a big data environment) is to implement physical copying of disk dataset changes to disk hardware at the disaster site. IBM customers may use peer-to-peer remote copy (PPRC) for this, or an alternative method of taking data and index changes at the primary site and sending them to the remote site to be applied to local objects there.
In the case of an application doing massive SQL inserts, not only must table changes be copied to the DR site but index changes as well, including the splits and structure changes noted above. The result is an increase in the amount of data that must be sent to the DR site as well as the increased time needed to do so.
The new insert algorithm 2 applies to universal table spaces created with the MEMBER CLUSTER keyword. It significantly reduces the delays caused by index page splits and the search for available space.
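As an illustration (the database and table space names are hypothetical), the table space attributes involved might look like this; in Db2 12 the INSERT ALGORITHM clause on CREATE or ALTER TABLESPACE selects the insert algorithm, with the DEFAULT_INSERT_ALGORITHM subsystem parameter supplying the default when the clause is omitted:

```sql
-- Hypothetical universal table space created with MEMBER CLUSTER;
-- INSERT ALGORITHM 2 requests the new fast insert algorithm
CREATE TABLESPACE SALESTS IN SALESDB
  MEMBER CLUSTER
  INSERT ALGORITHM 2;
```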
Index Traversal Enhancement
Some SQL queries use index access paths that require searching multiple index pages. This can happen when specifying filters such as data value ranges or when searching for specific data values that are non-unique. If such indexes are frequently used, the I/Os resulting from reading and re-reading index pages can delay transactions and increase CPU usage. A new feature in Db2 12 for z/OS is the index fast traverse block (FTB). This is a memory structure configured for fast index data lookups. FTBs are not stored in the buffer pools; instead, they reside in a separate memory area. Currently they can be used only for unique indexes with key sizes of 64 bytes or less.
Db2 will monitor index usage and automatically decide whether a frequently used index qualifies for an FTB. If so, Db2 asynchronously creates the FTB and populates it. Subsequent queries whose access paths reference that index may make use of the FTB.
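Because FTB creation is automatic, DBAs mostly observe rather than configure it. Db2 12 provides a command for checking which indexes currently have FTBs and how much memory they use; a brief sketch:

```sql
-- Db2 command: display the indexes that currently have fast
-- traverse blocks and the index memory they consume
-DISPLAY STATS(INDEXMEMORYUSAGE)
```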
New SQL Syntax for Pagination
A common application requirement is to display multiple table rows on an application screen. As screens are physically limited (at least as far as the user’s attention is concerned), the application developer must choose how much data should be retrieved from tables, how much should be displayed to the user, and how to handle a user request that ‘scrolls up or down’. One example would be a screen that lists parts with a certain characteristic. If there are thousands of such parts should the application retrieve all of them? If so, and the user only needs to see the first few parts, you have wasted resources. Alternatively, if you only access tables for the number of parts that fit on the screen, when the user scrolls for more parts you must re-access the tables for additional data.
Many application designers take a middle-of-the-road approach by defining the average number of screens that a user would request and fetching data that would populate those screens. If the user scrolls, the data has already been fetched and cached and can be displayed quickly. If a single screen satisfies the user’s needs, only a small amount of resources was wasted retrieving the unneeded data.
Coding SQL statements that retrieve only a certain number of rows can sometimes be complex, particularly if the data has a multi-part key. To address this, Db2 12 for z/OS implements a new form of the SELECT statement that can limit the number of rows fetched. For example, consider an application that lists customers by name in alphabetical order. The first screen of twenty customers has been displayed, beginning with ‘John Smith’, and the user requests the ‘next’ page of customers. A typical SQL statement to fetch the next twenty customers might be:
SELECT LastName, FirstName FROM Customer_TableA
WHERE (LastName, FirstName) >= ('Smith', 'John')
ORDER BY LastName, FirstName
OFFSET 20 ROWS
FETCH FIRST 20 ROWS ONLY
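For comparison, the first screen in this hypothetical application needs no positioning predicate or offset at all; the same row-limiting syntax handles it:

```sql
-- First page of twenty customers, in alphabetical order
SELECT LastName, FirstName FROM Customer_TableA
ORDER BY LastName, FirstName
FETCH FIRST 20 ROWS ONLY
```

The row-value comparison in the ‘next page’ form lets the application resume from the last displayed key without coding the usual nest of AND/OR predicates that a multi-part key would otherwise require.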
DRDA Fast Load
Many IT systems have processes that must quickly load large amounts of data into multiple tables. Big data applications and the data warehouse are prime examples. Data from operational systems must be extracted, transformed and loaded (ETL) on a regular basis. In many cases the final table population step is done with the Db2 Load utility.
The Load utility performs very well. In the z/OS environment it is typically used to load a table from a mainframe dataset co-located with the Db2 subsystem containing the table being loaded. Years ago this was not seen as a limitation; however, with the advent of federated data systems it has become more and more common for operational data to be located remotely. ETL then consists of multiple data copy steps from system to system that require extra disk space and CPU to perform.
Db2 12 for z/OS implements a remote load process using a Db2 client application called DRDA Fast Load. The application runs on the remote system, captures the data there and sends the data directly to the mainframe for processing by the Load utility. This eliminates the CPU and I/Os required for creating and maintaining an extra dataset copy.
IBM’s Db2 12 for z/OS continues a pattern of new versions concentrating on substantial performance improvements coupled with reductions in resource requirements. The advent of relatively low-cost real memory allows IT support to balance resource constraints, perhaps mitigating CPU, I/O or elapsed time issues by increasing real memory. Changes to index internal structures and a new index traversal feature will speed up transactional and analytical queries. New SQL extensions make application coding of list processing easier. Finally, a new table load facility allows simple and direct population of a mainframe table from a remote dataset.
See all articles by Lockwood Lyon