Db2 Version 12, the latest release of IBM’s flagship database management system for the z/OS platform, includes many new and improved features aimed specifically at improving application performance. As today’s development teams are driven to deliver applications at a faster pace, the DBMS must retrieve data quickly while reducing overall resource usage. Here is a detailed look at some modern transactional data processing issues and how Db2 meets these challenges.
The Challenge of the Data Archive
Due to governmental regulations and compliance requirements, the information technology (IT) enterprise must retain much of its transactional, financial and customer-related data for long periods of time. One typical requirement: financial data must be kept for seven years. The data must not only be retained but also remain accessible in case of legal or audit demands.
This places capacity planners and performance tuners in a bind, because keeping this data available in operational systems increases database sizes. This in turn leads to longer query times, larger indexes that take longer to process, and increased elapsed times for utilities such as database reorganizations and backups. IT shops commonly prefer to purge or archive stale data, keeping the sizes of operational tables and indexes manageable and permitting faster application processing. However, given the requirement to retain data, how can the enterprise meet the needs of both regulations and customer expectations?
One best practice is the data archive table. This is a table defined similarly to a base table and used to store purged or archived data. Operational systems access the base table as part of normal batch and on-line processing, while the archive table remains ready for access should the need arise. However, a problem arises when the archived data is to be queried. Since the data in both tables changes over time, how can you construct an SQL query that accesses (say) data within a range of dates, without knowing in which table(s) the data may reside? Enter the Union All operator. This SQL construct logically merges the contents of two or more tables. A simple form of the syntax is as follows:
SELECT <column-list>
  FROM Customer-Table
  WHERE <selection-criteria>
UNION ALL
SELECT <column-list>
  FROM Customer-Table-Archive
  WHERE <selection-criteria>
. . .
ORDER BY <Sort-Criteria>
FETCH FIRST n ROWS . . .
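For example, a query against a hypothetical order table and its archive (the table names, columns and date range below are purely illustrative) might look like this:

-- Hypothetical example: recent orders live in ORDERS, older ones in ORDERS_ARCHIVE
SELECT ORDER_ID, ORDER_DATE, TOTAL_AMT
  FROM ORDERS
  WHERE ORDER_DATE BETWEEN '2012-01-01' AND '2018-12-31'
UNION ALL
SELECT ORDER_ID, ORDER_DATE, TOTAL_AMT
  FROM ORDERS_ARCHIVE
  WHERE ORDER_DATE BETWEEN '2012-01-01' AND '2018-12-31'
ORDER BY ORDER_DATE DESC          -- sort the combined result
FETCH FIRST 100 ROWS ONLY         -- return only the most recent 100 rows

The application does not need to know which table holds any given row; the database manager merges the two result sets before applying the sort and row limit.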
One of the difficulties that arises from such queries is the potential for poor performance. Archive tables may not have the requisite indexes for fast access; after all, they are rarely used. In addition, the requirement that the result set be sorted (using the Order By clause) or that only the first n rows are required (the Fetch First clause) may require the database manager to first retrieve all qualifying rows from both tables, sort them, and from that result set present only n rows to the requestor.
Db2 Version 12 provides some performance enhancements that can speed up such queries. One enhancement is to push the Order By and Fetch First clauses ‘down’ into each of the two subqueries. In the case of Order By, this cuts one big sort into two smaller sorts, requiring fewer resources. Also, if Fetch First is specified, then n rows can be retrieved from each of the tables (after sorting) and merged to form a set from which the final n rows are chosen. By pushing the sort operation down into each of the subqueries, Db2 can take advantage of performance features for each table that favor the sort, such as indexes, data clustering or partitioning.
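Conceptually, the pushdown behaves as if the example query above had been rewritten along the following lines. This is only a sketch of the optimization; the application continues to code the simple Union All form, and Db2 applies the transformation internally when it judges the cost to be lower.

SELECT ORDER_ID, ORDER_DATE, TOTAL_AMT
  FROM (SELECT ORDER_ID, ORDER_DATE, TOTAL_AMT
          FROM ORDERS
          WHERE ORDER_DATE BETWEEN '2012-01-01' AND '2018-12-31'
          ORDER BY ORDER_DATE DESC
          FETCH FIRST 100 ROWS ONLY) AS BASE_ROWS   -- small sort, at most 100 rows
UNION ALL
SELECT ORDER_ID, ORDER_DATE, TOTAL_AMT
  FROM (SELECT ORDER_ID, ORDER_DATE, TOTAL_AMT
          FROM ORDERS_ARCHIVE
          WHERE ORDER_DATE BETWEEN '2012-01-01' AND '2018-12-31'
          ORDER BY ORDER_DATE DESC
          FETCH FIRST 100 ROWS ONLY) AS ARCH_ROWS   -- small sort, at most 100 rows
ORDER BY ORDER_DATE DESC                            -- final merge of at most 200 rows
FETCH FIRST 100 ROWS ONLY

Each branch performs its own smaller sort and contributes at most 100 rows, so the final sort and row selection operate on a much smaller intermediate result.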
Advances in Data Compression
Data compression has been available in Db2 for several decades, beginning with Version 3. Db2 uses the common algorithm called Lempel-Ziv to provide dictionary-based, lossless compression and decompression for table data. (Variants of this algorithm are used for .GIF and .TIFF files, for .PDF files in Adobe Acrobat, in the Unix compress utility, and in archivers such as ARC and PKZIP.) Depending upon column data types, compression ratios of 50% – 80% are possible.
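As a quick sketch of how table compression is turned on (the database and table space names here are illustrative), a DBA specifies COMPRESS YES at the table space level, either at creation time or later via ALTER, with existing rows compressed and the compression dictionary built at the next REORG or LOAD:

-- Illustrative only: enable compression for a new table space
CREATE TABLESPACE CUSTTS
  IN CUSTDB
  COMPRESS YES

-- Or for an existing table space; takes effect at the next REORG or LOAD
ALTER TABLESPACE CUSTDB.CUSTTS
  COMPRESS YES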
Of course, some data, such as video, images and audio clips, usually arrives already compressed. Typically, such data are stored in Large Object (LOB) columns in binary form, and processing takes place in the application, not in the database manager. Over the past few decades a glut of new formats for non-traditional data types has appeared, including bit files from equipment monitoring, web click streams, telephone audio response clips and others related to a sudden increase in new hardware types.
Initially defined in 1998, extensible markup language (XML) provided a method for encoding data that is self-describing and both machine- and human-readable. Although the initial focus was on encoding documents, XML became popular for encoding many kinds of data structures, including web-based data. (For more information, see Reference 1.) Today, more and more data stored in relational databases is formatted as XML and can be stored conveniently in LOBs. Since XML takes the form of a string of characters, data compression becomes feasible. Other types of character data are also commonly stored in LOBs, including HTML for web pages and plain-text files from distributed sources.
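For example, a document table (the names below are hypothetical) might hold XML, HTML or plain text in a CLOB column:

-- Hypothetical table: DOC_TEXT may contain XML, HTML or plain text
CREATE TABLE CUSTOMER_DOC
  (DOC_ID    INTEGER  NOT NULL,
   DOC_TYPE  CHAR(8)  NOT NULL,   -- e.g. 'XML', 'HTML', 'TXT'
   DOC_TEXT  CLOB(10M))

Because the contents of DOC_TEXT are character data rather than already-compressed binary data, they are good candidates for compression.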
Db2 Version 12 provides a new ability to compress LOBs. This is separate from table-level data compression and applies only to the LOB column(s) desired. The feature relies on z/OS hardware compression via the zEnterprise Data Compression (zEDC) card, which uses a different compression algorithm than Lempel-Ziv. There are several hardware prerequisites for taking advantage of it; consult the Db2 performance manual for details.
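Assuming the prerequisites are met, enabling LOB compression is a matter of specifying COMPRESS YES on the LOB table space that backs the LOB column. A minimal sketch, with illustrative object names:

-- Illustrative only: the LOB table space holding the auxiliary table for DOC_TEXT
CREATE LOB TABLESPACE DOCLOBTS
  IN CUSTDB
  COMPRESS YES

An existing LOB table space can similarly be altered to COMPRESS YES.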
Depending upon the data in the LOB, compression ratios can vary from 50 percent to as high as 90 percent or greater. Also, since the compression is hardware-assisted, very few CPU cycles are consumed during compress or decompress, leading to increases in Insert and Select throughput. DBAs can use the DSN1COMP stand-alone utility to estimate the amount of compression prior to implementing it. Note that while the data may be compressed inside the database, when it is externalized it is automatically decompressed. This requires the DBA and application developers to carefully estimate disk dataset sizes, especially for processes that unload a large number of rows.
Index Processing Gets Faster
Indexes have an internal data structure that is ‘tree-like’. The first index block, or root page, contains pointers to subsequent blocks called non-leaf pages. Each of these contains pointers to lower-level non-leaf pages. This continues downward until you reach the final pages, called leaf pages, which contain pointers to the actual table data.
Some queries that access multiple table rows may use an index to retrieve data. Due to the index structure, this may require multiple scans and searches of the index. Each such search requires that the relevant root page, non-leaf pages and leaf pages be read into memory, and repeated scans may mean repeated accesses of these pages, with additional I/O if the pages do not remain in memory. Queries such as these may also compete for resources with queries that insert, update or delete rows. Since those queries change the index tree structure and the pointers on the various pages, our sample query may have to wait until the updating queries complete.
Db2 Version 12 introduces a new process called Index Fast Traversal (IFT). IFT uses a new memory area, called index control, that is managed by the database manager and contains high-performance versions of index pages. When a suitable query arrives from an application, Db2 analyzes it to determine whether there are potential cost savings (CPU, I/O and elapsed time) if IFT is used. If so, Db2 assembles the relevant index pages in index control and directs query processing to this new area.
Index Fast Traversal can provide up to a 20 percent CPU reduction over classic index usage. In its initial implementation, IFT is limited to unique indexes with a maximum 64-byte key length.
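As a simple illustration (the table, index and column names are hypothetical), an equality lookup through a unique index whose key fits comfortably within the 64-byte limit is the kind of access that can benefit:

-- Hypothetical unique index with a short key, a good IFT candidate
CREATE UNIQUE INDEX XACCT01
  ON ACCOUNT (ACCOUNT_ID)

-- Repeated point lookups like this traverse the index from root to leaf each time
SELECT ACCOUNT_ID, BALANCE
  FROM ACCOUNT
  WHERE ACCOUNT_ID = ?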
Summary
The needs of the IT enterprise and modern application developers have created some performance challenges. Among these are the need for extensive data retention, low-cost methods of storing large volumes of data, and faster transaction speeds. Db2 Version 12 delivers several features that DBAs can employ in capacity planning and performance management to mitigate these concerns.
# # #