A new separately-priced software offering from IBM on z/OS systems uses machine learning and artificial intelligence to assist the Db2 Optimizer in choosing high-performance access paths for SQL statements.
Database Proliferation Affects DBAs
At the enterprise level, it is now normal to expect that production applications issue hundreds of thousands (or even millions) of SQL statements every day. These statements run the gamut of complexity from single-row requests based upon a unique key to multi-table joins to business analytical queries involving scores of tables and potentially running for hours. Yet, despite decades of experience in designing databases and tuning SQL queries, database administrators (DBAs) still spend much of their time doing query performance tuning. Why is this?
One explanation is the growing number of application categories. In the last several decades most businesses have expanded their software suites from simple accounting, reporting and inventory applications to include customer-facing real-time order entry, the enterprise data warehouse, remote data marts, multi-site federated systems and finally big data. Each application category has its own unique set of table types, various flavors of indexes and special performance features such as summary tables.
These have led to new database designs and access paths, such as the star join frequently used in the data warehouse. They also include special-purpose hybrid hardware and software solutions, such as the big data appliance. As each category matured, DBAs tended to specialize in one category or another, mainly because the number of applications, tables and query types required performance tuning tactics unique to that category. The DBA generalist, who could support all applications across the enterprise, is slowly giving way to DBA specialists. These professionals do not have the depth and breadth of experience of the generalist, but they do have first-hand knowledge of their own application areas and databases.
SQL Performance Tuning Today
DBA specialists still spend a significant amount of time doing SQL tuning. Some studies indicate that DBAs may spend up to a quarter of their time on such performance tuning, making it difficult for them to support other areas such as software upgrades, user support and systems maintenance. And, considering that application categories tend to have quite different classes of queries, DBAs find it difficult to assist each other.
Considering that a lot of SQL tuning involves analyzing historical access path information, it is no surprise that IBM has implemented an artificial intelligence solution to do some of this work. Called Db2 AI for z/OS (Db2ZAI), this software product uses a machine learning (ML) service called ML for z/OS. Db2ZAI gathers SQL performance information over time and passes this data to ML for z/OS, which in turn creates models for specific SQL statements, trains the models, and then deploys them. As a result, the Db2 Optimizer, the core Db2 code that chooses optimal access paths for SQL statements, can then use these models as input when deciding upon future access paths alternatives.
As a result, some of the most time-consuming SQL performance analysis can now be automated. This offloads a significant portion of tedious work from the DBA, while also potentially reducing resource consumption of many queries. In particular, Db2ZAI can significantly reduce CPU usage.
The Details
Machine learning software relies on gathering data and finding patterns within the data, patterns that may not be perceived by humans. Another aspect of machine learning is that it often deals with big data. Thus, it deals with consuming enormous amounts of data, analyzing the data for patterns, and developing a summary model of the patterns that can then be used to predict future patterns or trends.
In the case of SQL, each SQL statement that is presented to Db2 must be bound, or prepared. The Db2 Optimizer controls this process, and uses a combination of data distribution statistics (created by the RunStats utility), the actual values of variables or parameters present, and any explicit indication of the number of rows desired from the result. This last item can be coded in the SQL statement in several ways, including the OPTIMIZE FOR n ROWS or FETCH FIRST n ROWS clauses. In normal processing, the Optimizer will analyze the tables accessed, discern what indexes, if any, exist, whether sorting of intermediate or final results is possible. It then determines the costs of the most likely access paths and chooses the one with the lowest cost, where cost is an internal measure of weighted resource usage that includes CPU, I/O, memory usage, and other factors.
When implemented, Db2ZAI gathers this information, along with the number of rows actually retrieved by the application and the resources actually used, and stores all of it in a set of internal tables. Over time these tables accumulate historical results for sets of queries that includes inputs to the Optimizer and the final query results. It passes this information to the machine learning software, in a process called training the model. As training data accumulates, the model eventually returns predictions to the Optimizer for modelled queries, and the Optimizer may then use this information in considering future access paths.
Big Data Implications
One interesting byproduct of this AI solution is the effect it can have on business analytics queries that run against your big data application. IBM shops will usually choose the IBM Db2 Analytics Accelerator (IDAA) as their big data solution. This hybrid hardware and software stores Db2 tables in a proprietary internal format that allows for extremely fast query execution times. Shops have the option of maintaining two synchronized copies of important tables, one copy in Db2 and one in the IDAA. This allows the Optimizer two choices for data access, each with its particular performance profile.
In general, business queries joining a few tables or requesting many rows and/or columns will perform faster accessing the table in Db2. Alternatively, analytics queries with many tables that access only a few columns, access large numbers of rows and do aggregating, will run faster in the IDAA. With Db2ZAI, the Optimizer now has additional information about query history and performance, and can make better decisions about which tables to access in which locations. As customers get more experience with queries and table access, they may find that certain table locations are preferred; thus, they can make decisions about primary table locations, index alternatives, and other performance features.
The Costs
No data-intensive feature is without cost. However, IBM has attempted to minimize the CPU costs by offloading the learning tasks to the System z Integrated Information Processor (zIIP) CPUs engines. These special-purpose CPUs can be installed on your z hardware, and have the advantage of their work not being counted in most software license charges, which are commonly based on CPU usage.
The process of gathering Optimizer input information and query execution performance history is already implemented within the Db2 z software, and Db2ZAI adds a process to externalize and save this data. Therefore, it uses general purpose CPU resources. IBM notes that this typically results in a one to two percent CPU overhead for any SQL statement being analyzed.
Db2ZAI also includes a user interface that manages the machine learning process for each Db2 subsystem. Users can monitor how models are created and trained, whether models are deployed for what categories of SQL statements, and before and after performance data.
Db2ZAI is available for Db2 12 for z/OS only and requires that the Db2 software be at function level 503 or above.
What This Means for the DBA
While early big data solutions allowed few tuning options, the latest have multiple storage and processing options that can affect query performance, including data key clustering and various forms of data partitioning. Users executing business analytics queries were usually unaware of query access paths and big data performance options, and DBAs had little prior notification of what queries were being executed against which objects. As big data applications have become common in the enterprise shop, DBAs and data scientists must closely coordinate how and where queries are executed, and where and how tables are stored. This can be a tedious process.
Db2ZAI removes some of this burden by gathering historical query performance data, modeling it, and making that model available to the Db2 Optimizer. The Optimizer can then make more efficient access path cost estimates, resulting in both faster query execution and a reduction in resource consumption.
What could the future hold? As Db2ZAI gathers more information and improves its model training and retraining abilities, it may be possible to interface this product with other tuning products such as the IBM Query Workload Tuner. The tuner already analyzes SQL statements and gives tuning recommendations such as alternative syntax and index changes. Consider how Db2 Ai for z/OS could pass model information to an advanced workload tuner that could then recommend possible SQL, table or index changes. And, if machine learning models could be combined in some way, the software suite might even make system-wide performance recommendations that reduced the resource usage of a set of application SQL as a whole.
Additional Information
Db2 AI for z/OS – Blog on developerworks