Big data applications were once limited to hybrid hardware/software platforms. Now, recent advances are allowing applications like these to be integrated with and federated into operational systems. In particular, IBM’s DB2 for z/OS Version 12 (DB2 V12) delivers new features and functions that allow the DBAs to design, define and implement very large databases and business intelligence query platforms that fulfill some big data expectations. Such applications can then provide value to the enterprise without expending money and time on specialized infrastructure, and can also be the first step towards building out one or more true big data applications.
Dynamic SQL Plan Stability
Many operational systems access DB2 through SQL statements that are hard-coded and never change. These are called static SQL statements. During the program preparation process called “bind”, these static SQL statements are read and stored in DB2 along with the optimal data access path. During execution, DB2 uses the pre-determined access path of the SQL statement. The DBA regularly reviews these stored statements and the corresponding access paths in order to do performance analysis.
Contrast this with SQL statements that arrive at DB2 in an ad hoc fashion. These SQL statements are created real-time in applications or by query tools, and are termed dynamic SQL. DB2 is forced to parse these SQL statements on arrival, determine the optimal access path and then execute the query. This leads to several issues. Repeated execution of the same SQL statement or similar statements may result in different access paths at different times, perhaps as a result of one of the following:
- Changes in data distribution statistics;
- Changes in DB2 code due to patches, upgrades, or new releases;
- Changes in DB2 subsystem execution parameters;
- Changes in hardware or software configurations, including memory allocations and CPU availability.
In order to reduce the instability of dynamic SQL access paths, DB2 V12 includes several parameters that can be set to capture and monitor dynamic SQL statements and maintain current access paths despite hardware or software configuration changes.
Dynamic SQL can now be captured, monitored, and reviewed for performance. The DBA can configure DB2 to maintain particular access paths for a set of dynamic SQL statements while allowing others’ performance to change as the system changes. This is extremely useful for applications that construct queries based on a set of user choices.
For example, consider a customer information system that retrieves a set of customer records based on criteria such as customer name, location, account type and account balance. It is possible that only some of the criteria are entered by the user, so the application may construct one of several different SQL statements. As the number of criteria increases the number of possible SQL statements increases geometrically.
With dynamic SQL plan stability, the DBA can capture the various SQL statements along with their access paths and have DB2 continue to use the same access path while making performance changes such as adding memory, adding indexes or updating data distribution statistics. This alleviates the potential problem of these changes potentially influencing DB2 to choose another access path that does not perform as well. In short, DBAs now have the capability to measure whether their performance changes actually result in better access paths before implementing the changes in production.
DRDA Fast Load
Many applications use the LOAD utility to perform large data loads. (Single rows can easily be loaded using the SQL Insert statement.) Some examples of data loads include loading or appending today’s transactions to a Transaction table, or loading new customers to a Customer table. In a data warehouse environment, the LOAD utility is used extensively when extracting daily data from operational systems as part of the extract transform load (ETL) process.
For data originating in remote systems, the ETL process can be quite cumbersome and I/O-intensive. Consider a daily transaction file on a remote system. The file needs to be extracted from the originating system (copy 1), sent to the mainframe (copy 2), transformed and sorted in the proper key sequence (copy 3) then loaded into a DB2 table. The multiple copies and multiple data movements seem redundant.
The data protocol used to communicate between the remote system and mainframe DB2 is called distributed relational database architecture, or DRDA. With the new DRDA Fast Load capability, applications can invoke a remote load process using a local DB2 client application. This application will send the data directly to the mainframe for processing directly by the Load utility.
This has several advantages for the DBA to consider. It reduces mainframe CPU by placing the extraction and transformation logic on the same platform as the originating system. It also bypasses several file creation steps, reducing I/Os, disk space usage and total elapsed time.
Table Partition Limitations Removed
One of the reasons that big data appliances are so popular is that they can store so much data. While this may be obvious, prior versions of DB2 had limits on table sizes that precluded using DB2 tables for very large databases. Thus, big data applications were used to store large tables and the result was integrated into the current DB2 subsystem, usually including the enterprise data warehouse.
Prior to DB2 V12, table partitions were limited as well as the total number of partitions per tablespace. A typical table using 4k pages was limited to 16 terabytes (TB) total size.
DB2 V12 introduces a new type of partitioned table called “partition by range relative page number”. This tablespace type allows each partition to be 1 TB in size, and the total size of a tablespace is increased by a factor of over 100, to 4 petabytes (PB). In terms of rows, the largest number of rows allowed in one of these new tables is approximately 280 trillion.
This massive increase in allowed table size means that DB2 now includes native options for storing large amounts of data. This may be a viable option for shops whose limited budgets preclude leasing a big data appliance, or who wish to implement a big-data-like application in native DB2 first as a proof of concept prior to considering a big data application.
One issue that DBAs and systems support staff have had concerns about in recent years is the complexity of DB2 version upgrades. Many mainframe shops implement multiple DB2 subsystems in a cluster configuration called data sharing. Since each DB2 subsystem is a separate software instance, each must be updated when IBM issues DB2 software updates or a new version. Some of the new version code and system databases are shared by all the subsystems, and some is unique to each subsystem. Thus, the process for upgrading them all can be complex, especially when systems support must take into account undoing the upgrade or falling back to a prior version should issues occur.
With continuous delivery, IBM has simplified their method of software delivery. Upgrades are now provided in smaller pieces that are much easier to implement and faster to test. Shops can choose which subsets of functionality they wish to apply at what time and to what subsystems.
Version 12 of IBM’s mainframe DB2 database management system contains several functions and features that provide the IT enterprise with additional options for processing large amounts of data. Rather than budget for, lease and install a new hybrid hardware/software platform for a big data application, management can consider implementing one or more “small big data” applications on the mainframe server. DB2 V12 allows for much larger tables and much faster processing than in previous versions, so DBAs can install “proof of concept” applications and large databases and try out various analytical query tools.
Once IT management reviews the results of these tests, they are in a much better position to justify making the big step to a full-sized big data application.