The design requirements of an application usually determine the most effective database design and database administration support processes. If, however, the database administrator (DBA) is not present during the requirements definition process, sub-par performance can result.
Essential DBA Choices and Processes
Quite often the DBA must make choices that will affect application performance, database recoverability, or data availability. Some of these choices involve physical data storage such as database partitioning. Others will affect the ability of the application to access data effectively.
Some common concerns and options that DBAs consider include the following.
Database physical partitioning. This is the most common option. The DBA defines multiple physical datasets called partitions. Each partition will contain rows with a common key or key range. One possibility is to have each partition represent a particular day (or month or year). Another is to specify partitions to contain active or inactive data. Partitioning permits multiple processes to access different partitions with minimal contention. For example, Partition A might contain active data that is constantly being updated, while Partition B contains inactive data being purged or archived.
Database reorganization. Some application designs involve volatile data, where table rows are constantly being added and changed. An order entry application is one example. New orders are constantly being inserted, while the order status of others is simultaneously updated. In this environment it is common that tables would be defined with internal free space (for new rows) and clustering (internal row sorting). Most tables with this design lose their free space and clustering after a large number of updates, and this affects performance. The DBA will commonly schedule a database reorganization utility to restore the tables to a ‘clean’ configuration.
Data purge criteria. Many applications operate on current data that is used for operational transactions. What happens to the data when it is no longer used? In our order entry example, once an order is filled, shipped and delivered, it may not be necessary to retain the order information in the current tables. The DBA will usually design a data purge or archive process that reads the current data, determines what data to remove, and stores that data in either an archive area (a ‘filled orders’ table), or perhaps simply writes the data to storage media such as magnetic tape.
Mass insert processes. Some applications have a requirement for mass insert of data. This is common in data warehouse applications. On a regular basis, usually daily, the data warehouse extract / transform / load process reads data from the operational database and stores it in the data warehouse where it can be analyzed. The DBA must consider balancing the performance of a large data load process with that of application access. Options include reading a file into a utility that executes SQL inserts of rows. Another option is the database management system (DBMS) load utility.
Recovery requirements. The DBAs act as stewards of the data for the corporation. They must implement processes that backup and restore data in case of emergency. A rarely-used application with a small database may be backed up once a week. However, a mission-critical application with terabytes of data may have a recovery requirement of “complete data recovery within 2 hours of a disaster”. For this application and its associated data, backup and recovery processes must be high-speed and high-reliability.
The DBA’s “Must Know” List
In a mainframe environment, where you share DB2 with hundreds of other apps, being well-behaved is important. The DBA will ensure this by gathering the following important information during application design.
Recovery requirements. All important applications in the enterprise are rated according to their importance of recovery after a disaster. The higher the rating, the faster the application and its data must be made available. Naturally, fast recovery of a large volume of data requires special techniques. Data may be backed up more often (nightly or even hourly). DB2 for z/OS allows backup and recovery of indexes, thus avoiding the need to recover indexes after table recovery.
Data volume. Sometimes called “speeds and feeds”, the DBA notes where there are requirements for large volumes of data processing. In those cases, the DBA also documents how incoming data is keyed. In cases where data is keyed by a natural key like customer name, the DBA may consider adding special indexes to support fast row inserts. Where data arrives with a time-based key, sometimes the best design is to simply append each new row to the physical end of the table, thus naturally clustering the data in key order.
Data purge criteria. Sometimes the application requires that data be retained for some period of time. This is usually due to regulatory or legal requirements. In this case the database design may involve a partitioning scheme by date. As new data arrives it is placed into partitions at the physical end of the table. Earlier partitions, therefore, contain old data and may be easily purge by emptying a partition.
Application performance. In some cases an application may have a performance requirement called a service-level agreement (SLA). SLAs typically specify how quickly on-line transaction must return data to a user, and how quickly other processes such as batch reporting and data analysis are completed. In these cases, the DBA will most likely consider additional table indexes to help with query performance by providing additional data access paths. Another possible tactic is to recommend that read-only SQL queries include the WITH UR attribute, which bypasses locks to improve performance. Last, the DBA will review application definitions and recommend how often applications issue commits.
Application SQL review. Despite the protests of many developers, a well-designed and documented application definition includes the SQL statements used to access the data. The DBA analyzes the SQL for several reasons:
- Sets of application queries that access data but do not update should be clearly defined as read-only, ensuring that the DBMS takes no unnecessary locks.
- Application queries and transactions must be structured so that they do not interfere with other concurrent queries or database utilities. For example, a query against a set of tables should not interfere with a DBA process that is backing up or reorganizing the table.
- Queries that access partitioned tables should contain specific reference to the partitions accessed. A query that accesses an entire table but only uses data from a single partition will perform poorly.
- Review application transaction commit frequency. Even if the application is doing read-only access to the database, frequent commits will help the DBMS in preventing locking issues such as timeouts and deadlocks. In addition, commits are a must when queries and DBMS utilities such as a load are executing concurrently.
- The DBA executes the Explain process for the queries. This displays and documents query access paths. The DBA can then analyze the complete suite of Explain data to determine if there are potentially poor access paths, or if additional indexes would help.
After Design Comes Testing
Another essential task performed by the DBAs is preparing an environment for the application developers to perform load testing. This environment contains all tables, and if possible contains production-like data and data volumes.
While the developers are testing the application to ensure it executes the business requirements correctly, the DBA reviews these specific indicators:
Query performance. The DBA had previously analyzed the application SQL. Now is the time to see if predictions of query performance are correct. While there are many vendor tools available to capture, display and analyze query performance analysis, it may be sufficient to simply measure and report query elapsed times. Two other important query performance metrics are CPU used and amount of I/So executed.
Data volume issues. The DBA expects that most of the high-volume data requirements were discussed and dealt with during database design. Load testing will show whether this is true or not. There are two common root causes of such performance issues. The first is called the ‘hot spot’. This is a physical location within the database where multiple rows are being accessed simultaneously. To avoid hot spots the DBA must spread activity across the database, which is usually done by re-clustering the tables. The second common cause of issues is lack of a good process to keep tables clustered. To address this, the DBA commonly adds embedded free space to the tables and schedules regular table reorganizations.
Batch processes against entire tables. Some applications have reporting or auditing requirements that must access all rows in some tables. Such queries can sometimes interfere with other queries, updates, or DBMS utilities such as Load. In these cases, the DBA can offer an alternative to these batch processes in the form of access to database backups. Tables can be backed up regularly, and many DBMSs (including DB2) have processes that allow SQL queries to run against the backups rather than against the base tables.
Application development management must include database administrators during application requirements definition and application design. This ensures that potential issues such as performance and data recoverability are addressed as early as possible.