Enterprises are delegating various aspects of application development and support to outside firms in order to shorten implementation schedules. The sooner you implement, the thought goes, the sooner you are in front of customers and making profits. Another advantage is getting your application running before your competition implements their own. This delegation comes at a cost. Application and database definitions are now spread among multiple firms across multiple platforms. Business rules are implemented in (potentially) many places, and many support staff skills are not under your control. This raises the complexity of the application and increases the risks of failure during application changes, maintenance or upgrades.
This article addresses how to measure and maintain the quality of your application when you take advantage of database as a service (DBaaS).
The Decision to Delegate
The decision to delegate (or “outsource”) the management of your application’s database(s) is usually part of a strategy to get the application implemented as quickly as possible with a minimum of local resources. Database management and support require sophisticated and experienced staff; in addition, for complex or data-intensive applications there may be requirements for expensive hardware and software. Partnering with a DBaaS provider means being relieved of these costs and forgoing in-house development of the support staff.
Implementing DBaaS means delegating both the storage and retrieval of data and the hardware and infrastructure that support it. We will look at quality metrics in several categories.
Basic Database Administration
The four most important areas that database administrators (DBAs) must support are:
- Database backup and recovery
- Data availability
- Data security
- Database performance
Each of these areas has its own quality metrics. Application owners can use these metrics to measure their DBaaS provider and indirectly measure their application’s performance.
Database backup and recovery. Backup and recovery are complimentary processes that are essential in today’s business world. Database backups provide a method to recover from hardware, software, and application failures. Hardware issues are the ones most businesses consider. If your primary computer fails or you lose power, staff can recover the database as soon as the hardware or power issue is fixed.
Just as important, but rarely considered, are database software or application software failures. For example, you implement a new version of your on-line product sales application only to discover that the code is generating inaccurate prices. The current database may be inaccurate or even corrupt. Staff can compare the current database with the backup to determine what customers were affected and how.
Another similar issue involves the database definition. Some business rules (such as “customer numbers must be unique”) are enforced by the database management system. If the database administrator implements an incorrect parameter, you may end up with duplicate customer numbers. Again, a database backup is imperative.
There is more to recovery than taking simple backup snapshots of the database. All database changes should be recorded in database log records. These provide a byte-by-byte record of every database change and can be instrumental in analyzing data issues. In addition, the administrator can recover a database by copying in one of the backups, then “replay” the log to re-execute transactions to any point in time. This permits recovery to almost any time without requiring backups being taken every hour or every minute.
The primary metrics for measuring backup and recovery quality involve defining two requirements: the recovery point objective (RPO) and recovery time objective (RTO).
The RPO is the point to which the application owner wishes the database recovered after a disaster or outage occurs. For example, a no-update application such as a data warehouse might have an RPO defined as “recover the database to a point as of 6 am. on the day of the outage.” A high-update application such as on-line product sales might have an RPO defined as “recover the database to the point immediately prior to any in-process transactions.” Clearly, the first example requires simply recovering from the prior nightly backup, while the second involves a more complex mixture of backups and log processing.
The RTO is the time required for the recovery process. Low-priority or non-critical applications may not require immediate recovery. Our warehouse example may have an RTO of “within 48 hours of beginning the recovery process”. On the other hand, the on-line product sales application may have an RTO of “recovery within one hour of disaster mitigation.” While long RTOs can be handled with simple nightly backups, short RTOs may require some creative hardware and software solutions such as parallel database recovery, high-speed disk drives or even a separate recovery site in “hot standby” mode.
Once the application owner has defined the RPO and RTO for their application, quality metrics for measuring the DBaaS provider revolve around executing a disaster recovery plan and practicing their recovery processes. Specification of these processes and quality measures should be included in the DBaaS contract.
Data availability. In addition to “accessible”, availability is defined more broadly. Consider stale or old data, such as customers who have not ordered a product for over a year. The DBaaS provider may wish to purge such customer data from the database. This provides several performance advantages, including:
- Total storage space for the database is smaller, hence cheaper;
- Some queries that potentially access the entire database may complete more quickly;
- Smaller databases may be backed up and recovered more quickly;
- Indexes on data will be smaller and easier to update.
What if the application requires the ability to access this purged data? One option is to store the purged data in a second database that can be queried if required.
Another availability issue involves application and system performance. As the database expands in size, the number of customers grows and the complexity of the application increases, more resources may be required to maintain reasonable transaction times. The usual quality measure is perceived user elapsed time (PUET). This is the total time that the user perceives to elapse between completing a transaction and receiving the results. This includes network transit times, database data storage and retrieval, application code execution and operating system execution.
As average PUET increases, the DBaaS provider may add resources (such as CPUs, disk drives or database indexes) to speed transaction execution. There are also several performance monitoring and tuning choices such as database partitioning.
The maximum PUET is a standard for the application to define prior to implementation. One simple example might be, “95% of all purchase transactions shall execute with a maximum of 0.75 seconds.” Natural quality metrics should be derived from actual application examples.
Data security. Data breaches figure prominently in the news both historically and recently. There are several collections of your data that must be protected, including customer data and application code. Data must be secured against both external and internal threats. Using a DBaaS provider adds another set of people who have access to your data. Quality standards in this area include the following:
- Requiring that DBaaS staff have no (or limited) access to raw data. This can be implemented with a combination of security roles and data encryption.
- Requiring that database backup copies be stored in a secured site;
- Requiring physical access security at the DBaaS hardware site.
One novel approach to security in a DBaaS environment, particularly for data stored in the cloud, is to partner with multiple DBaaS providers. Spreading your data across two or more databases in multiple physical locations diversifies your security risk, as a security breach at one site affects only a portion of your data. Of course, having multiple providers creates complexity in contracts and in application implementation. It may also affect transaction performance due to site-to-site data transfer rates.
Database performance. In general, fast database performance translates to fast queries and low transaction elapsed times. Database performance is only one aspect of total perceived performance, a major portion of which is network speed. One large component of database performance is query parsing, access path choice and execution. Standard practice for the database administrator is to review the data model and sample queries, determine common data access paths, and then define a database that performs to requirements (PUETs).
There are several methods for the database administrator to increase database performance, including physical (data partitioning, data clustering) and logical (query transformations, summary tables, indexes) enhancements. Most of these will be transparent to the application.
One critical success measure is the collection of typical database queries furnished by the application owner. Analyzing these queries, especially those that perform table joins or table unions, is essential for the DBA, as the analysis determines which data access paths are most common and/or costly; hence, where the DBA should spend time implementing performance enhancements.
Measuring database performance directly is the responsibility of the DBaaS provider. Typical performance measures include lists of the top ten worst performing queries and top ten most common queries. This information can be compared to that originally provided by the application owner. In addition, a history of query times can show whether database performance is degrading, providing the DBA with warnings about potential future problems.
Quality measures for the application owner will include the above performance metrics. In addition, ensure that the provider gathers information about database table sizes to forecast growth and storage needs as well as CPU consumption to anticipate required hardware upgrades.
For the application owner, quality measures may seem less important than getting the application up and running. However, as your application grows and your customer base grows, future application performance (and profits) may depend upon how well the database performs. The four areas of recovery, availability, security and performance provide a way to clarify the state of the database being managed by your provider.