In September 2018, IBM announced a new product, IBM Db2 AI for z/OS. This artificial intelligence engine monitors data access patterns from executing SQL statements, uses machine learning algorithms to decide on optimal patterns and passes this information to the Db2 query optimizer for use by subsequent statements.
Machine Learning on the IBM z Platform
In May of 2018, IBM announced version 1.2 of its Machine Learning for z/OS (MLz) product. This is a hybrid zServer and cloud software suite that ingests performance data, analyzes and builds models that represent the health status of various indicators, monitors them over time and provides real-time scoring capabilities.
Several features of this product offering are aimed at supporting a community of model developers and managers. For example:
- It supports multiple programming languages such as Python, Scala and R. This allows data modelers and scientists to use a language with which they are familiar;
- A graphical user interface called the Visual Model Builder guides model developers without requiring highly-technical programming skills;
- It includes multiple dashboards for monitoring model results and scoring services, as well as controlling the system configuration.
This machine learning suite was initially aimed at zServer-based analytics applications. One of the first obvious choices was zSystem performance monitoring and tuning. System Management Facility (SMF) records that are automatically generated by the operating system provide the raw data for system resource consumption such as central processor usage, I/O processing, memory paging and the like. IBM MLz can collect and store these data over time, and build and train models of system behavior, score those behaviors, identify patterns not readily foreseen by humans, develop key performance indicators (KPIs) and then feed the model results back into the system to affect system configuration changes that can improve performance.
The next step was to implement this suite to analyze Db2 performance data. One solution, called the IBM Db2 IT Operational Analytics (Db2 ITOA) solution template, applies the machine learning technology to Db2 operational data to gain an understanding of Db2 subsystem health. It can dynamically build baselines for key performance indicators, provide a dashboard of these KPIs and give operational staff real-time insight into Db2 operations.
While general Db2 subsystem performance is an important factor in overall application health and performance, IBM estimates that the DBA support staff spends 25% or more of its time, " ... fighting access path problems which cause performance degradation and service impact.". (See Reference 1).
AI Comes to Db2
Consider the plight of modern DBAs in a Db2 environment. In today's IT world they must support one or more big data applications, cloud application and database services, software installation and configuration, Db2 subsystem and application performance tuning, database definition and management, disaster recovery planning, and more. Query tuning has been in existence since the origins of the database, and DBAs are usually tasked with this as well.
The heart of query path analysis in Db2 is the Optimizer. It accepts SQL statements from applications, verifies authority to access the data, reviews the locations of the objects to be accessed and develops a list of candidate data access paths. These access paths can include indexes, table scans, various table join methods and others. In the data warehouse and big data environments there are usually additional choices available. One of these is the existence of summary tables (sometimes called materialized query tables) that contain pre-summarized or aggregated data, thus allowing Db2 to avoid re-aggregation processing. Another option is the starjoin access path, common in the data warehouse, where the order of table joins is changed for performance reasons.
The Optimizer then reviews the candidate access paths and chooses the access path, "with the lowest cost." Cost in this context means a weighted summation of resource usage including CPU, I/O, memory and other resources. Finally, the Optimizer takes the lowest cost access path, stores it in memory (and, optionally, in the Db2 Directory) and begins access path execution.
Big data and data warehouse operations now include software suites that allow the business analyst to use a graphical interface to build and manipulate a miniature data model of the data they wish to analyze. The packages then generate SQL statements based on the users’ requests.
The Problem for the DBA
In order to do good analytics on your multiple data stores you need a good understanding of the data requirements, an understanding of the analytical functions and algorithms available and a high-performance data infrastructure. Regrettably, the number and location of data sources is expanding (both in size and in geography), data sizes are growing, and applications continue to proliferate in number and complexity. How should IT managers support this environment, especially with the most experienced and mature staff nearing retirement?
Understand also that a large part of reducing the total cost of ownership of these systems is to get Db2 applications to run faster and more efficiently. This usually translates into using fewer CPU cycles, doing fewer I/Os and transporting less data across the network. Since it is often difficult to even identify which applications could benefit from performance tuning, one approach is to automate the detection and correction of tuning issues. This is where machine learning and artificial intelligence can be used to great effect.
Db2 12 for z/OS and Artificial Intelligence
Db2 version 12 on z/OS uses the machine learning facilities mentioned above to gather and store SQL query text and access path details, as well as actual performance-related historical information such as CPU time used, elapsed times and result set sizes. This offering, described as Db2 AI for z/OS, analyzes and stores the data in machine learning models, with the model analysis results then being scored and made available to the Db2 Optimizer. The next time a scored SQL statement is encountered, the Optimizer can then use the model scoring data as input to its access path choice algorithm.
The result should be a reduction in CPU consumption as the Optimizer uses model scoring input to choose better access paths. This then lowers CPU costs and speeds application response times. A significant advantage is that using AI software does not require the DBA to have data science skills or deep insights into query tuning methodologies. The Optimizer now chooses the best access paths based not only on SQL query syntax and data distribution statistics but on modelled and scored historical performance.
This can be particularly important if you store data in multiple places. For example, many analytical queries against big data require concurrent access to certain data warehouse tables. These tables are commonly called dimension tables, and they contain the data elements usually used to control subsetting and aggregation. For example, in a retail environment consider a table called StoreLocation that enumerates every store and its location code. Queries against store sales data may wish to aggregate or summarize sales by Location; hence, the StoreLocation table will be used by some big data queries. In this environment it is common to take the dimension tables and copy them regularly to the big data application. In the IBM world this location is the IBM Db2 Analytics Accelerator (IDAA).
Now think about SQL queries from both operational applications, data warehouse users and big data business analysts. From Db2's perspective, all these queries are equal, and are forwarded to the Optimizer. However, in the case of operational queries and warehouse queries they should most likely be directed to access the StoreLocation table in the warehouse. On the other hand, the query from the business analyst against big data tables should probably access the copy of the table there. This results in a proliferations of potential access paths, and more work for the Optimizer. Luckily, Db2 AI for z/OS can provide the Optimizer the information it needs to make smart access path selections.
How It Works
The sequence of events in Db2 AI for z/OS (See Reference 2) is generally the following:
- During a bind, rebind, prepare or explain operation, an SQL statement is passed to the Optimizer;
- The Optimizer chooses the data access path; as the choice is made, Db2 AI captures the SQL syntax, access path choice and query performance statistics (CPU used, etc.) and passes it to a "learning task";
- The learning task, which can be executed on a zIIP processor (a non-general-purpose CPU core that does not factor into software licensing charges), interfaces with the machine learning software (MLz Model Services) to store this information in a model;
As the amount of data in each model grows, the MLz Scoring Service (which also can be executed on a zIIP processor) analyzes the model data and scores the behavior;
- During the next bind, rebind, prepare or explain, the Optimizer now has access to the scoring for SQL models, and makes appropriate changes to access path choices.
There are also various user interfaces that give the administrator visibility to the status of the accumulated SQL statement performance data and model scoring.
IBM's machine learning for zOS (MLz) offering is being used to great effect in Db2 version 12 to improve the performance of analytical queries as well as operational queries and their associated applications. This requires management attention, as you must verify that your business is prepared to consume these ML and AI conclusions. How will you measure the costs and benefits of using machine learning? Which IT support staff must be tasked to reviewing the result of model scoring, and perhaps approving (or overriding) the results? How will you review and justify the assumptions that the software makes about access path choices?
In other words, how well do you know your data, its distribution, its integrity and your current and proposed access paths? This will determine where the DBAs spend their time in supporting analytics and operational application performance.
# # #
John Campbell, IBM Db2 Distinguished Engineer
From "IBM Db2 AI for z/OS: Boost IBM Db2 application performance with machine learning"
Db2 AI for z/OS
See all articles by Lockwood Lyon