Managing Big Data DBAs

Technical support teams usually support familiar hardware and software configurations. Specialization in particular combinations of operating systems and database management software is common, and this allows some team members to gain in-depth experience that is extremely valuable in an enterprise IT setting.

How has big data changed this paradigm?

The Database Support Team

One of the goals of a technical support team is to collaborate with management to prioritize its work. Management presents strategic plans, the team converts these into tasks with estimates of the required time and resources, and the two then work together to prioritize those tasks.

Tasks will fall into three categories:

1.    “Keep the shop running”

2.    Project-related

3.    Discretionary

The first category typically consists of reactive tasks that have a standard or fixed duration and contribute to normal operations. These include taking frequent database backups, running software upgrades, attending regular meetings, filling out status reports, completing documentation, and the like.

Project-related tasks are also typically reactive, and correspond to the work involved in following technical specifications for a defined project. Some examples are designing and creating new databases and tables, copying data from one database to another, reviewing program logic and SQL, and monitoring performance during application testing.

The last category is usually proactive work, and includes disk and CPU capacity planning, designing and executing exception reports that display trends, and analyzing databases and SQL to uncover performance tuning issues.

Database Support with Big Data

Big data applications typically involve:

  • One or more large data sources that need to be stored and analyzed;
  • A hybrid hardware/software solution (an “appliance”) for data storage and high-speed access;
  • Special-purpose data analytics software.

In order to support one or more big data applications, the database administration team needs specialized knowledge of the technical environment, including:

  • The business use cases for the large data sources;
  • Installation, configuration, and monitoring of the appliance;
  • Tools to support performance tuning of the specialized analytical queries.

This specialized knowledge is new to the DB support team. In the beginning, perhaps only a few specialists will have it, or some team members will need to attend special training classes. Eventually, most of the team must be trained and gain experience in managing and controlling a big data environment.

Big Data and DB Team Operations

The DBA’s manager still presents strategic plans to the team, and these plans now include designing and implementing big data solutions. The tasks involved in this support will span all three categories of tasks, although this may not be obvious in the beginning.

The team must now use its new knowledge to develop tactical plans and tasks for the new big data applications across its current hardware and software support space. There will be multiple sets of run-the-shop tasks related to big data, but these will differ significantly from the norm. Some of these are:

  • Database backups. Big data files are, by definition, big, sometimes so big that running conventional database backups is not feasible. There may not be enough disk or tape storage to hold the backups, the cost of the media may be too high, and the backup process itself may take days or even weeks.
  • Software upgrades. Most software upgrades require that the software be shut down during the upgrade. This may be difficult with a big data application, since such applications are usually highly important to the business. Big data infrastructure (disk storage, special hardware, special software, etc.) is costly, and businesses implement it only when a return on their investment seems likely, so extended downtime is hard to justify.
  • Data copying. Big data appliances usually implement proprietary data storage and access methods. For performance and data governance reasons, the DBA team will often store big data in both the database management system and the appliance. Tactically, this translates into loading the data twice: once into DB2 tables and once into the appliance.
  • Capacity planning. Big data appliances are relatively new, and combine storage media, CPU power, and advanced data access channels. The DBA team must now be cognizant of how these resources are used, monitor that usage, and report on usage trends. The team will be responsible for choosing a combination of performance tuning and hardware upgrades to keep the business happy.
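The claim that a backup "may take days, or weeks" can be checked with simple arithmetic before anyone schedules a job. The following is a minimal sketch, not a sizing tool: the function names, the 500 TB data size, and the 200 MB/s sustained throughput are all hypothetical values chosen for illustration, and real throughput depends on the appliance, network, and media.

```python
def backup_hours(data_tb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to copy data_tb terabytes at a sustained throughput."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    total_mb = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return total_mb / throughput_mb_per_s / 3600  # seconds -> hours


def fits_window(data_tb: float, throughput_mb_per_s: float,
                window_hours: float) -> bool:
    """True if a full backup would finish inside the available window."""
    return backup_hours(data_tb, throughput_mb_per_s) <= window_hours


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    hours = backup_hours(data_tb=500, throughput_mb_per_s=200)
    print(f"{hours:.0f} hours, about {hours / 24:.0f} days")
    print("fits an 8-hour window:", fits_window(500, 200, 8))
```

With these assumed numbers the full backup takes roughly 694 hours, or about 29 days, which is why the team may need to consider alternatives to a conventional full backup.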

Trends in Big Data Management

Management will see the knowledge and skills profiles of the DBA team shift over time as they deal with support of big data. Most of this shift will revolve around generalists and specialists.

Generalists are typically DBAs who are new, or who have not yet developed special expertise. The best tasks for the generalist are those defined by standardized procedures. These include such things as:

  • Developing and maintaining database backup and recovery procedures;
  • Developing DBMS-based processes that assist in tuning, such as scripts that gather data distribution statistics;
  • Implementing automation of normal and exception reporting, such as resource capacity usage.
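An automated capacity exception report of the kind listed above could be sketched as follows. This is one possible approach under stated assumptions, not a prescribed method: it fits a straight line (ordinary least squares) to evenly spaced daily usage samples and flags any resource projected to fill within a warning threshold. The resource names, sample values, and the 90-day threshold are hypothetical, and collecting the samples from the DBMS or appliance is left out.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def daily_growth(used_gb: Sequence[float]) -> float:
    """Least-squares slope (GB per day) over evenly spaced daily samples."""
    n = len(used_gb)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(used_gb) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(used_gb))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def days_until_full(used_gb: Sequence[float],
                    capacity_gb: float) -> Optional[float]:
    """Projected days until capacity is reached, or None if not growing."""
    slope = daily_growth(used_gb)
    if slope <= 0:
        return None  # flat or shrinking usage: no projected exhaustion
    return max(0.0, (capacity_gb - used_gb[-1]) / slope)


def exception_report(resources: Dict[str, Tuple[Sequence[float], float]],
                     warn_days: float = 90) -> List[str]:
    """List only the resources projected to fill within warn_days."""
    lines = []
    for name, (samples, capacity) in resources.items():
        remaining = days_until_full(samples, capacity)
        if remaining is not None and remaining <= warn_days:
            lines.append(f"{name}: projected full in {remaining:.0f} days")
    return lines


if __name__ == "__main__":
    # Hypothetical samples: (daily used GB, capacity GB).
    report = exception_report({
        "appliance_disk": ([100, 110, 120, 130], 200),
        "db_disk": ([50, 50, 50], 200),
    })
    print("\n".join(report))
```

Reporting only the exceptions keeps the output short enough to read every day, which is the point of moving this work out of the manual routine.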

This last item (automation) deserves further mention. The advantage of automation isn’t merely speed; automating tasks helps move the DBA away from reactive tasks such as reporting and analysis toward more proactive functions.

Here’s a typical list of processes many DBAs still manually perform that can be replaced by an automated reporting or data gathering process of some kind:

  • Executing an EXPLAIN process for SQL access path analysis
  • Generating performance reports such as System Management Facility (SMF) accounting and statistics reports
  • Verifying that new tables have columns with names and attributes that follow standard conventions and are compatible with the enterprise data model and data dictionary
  • Verifying that access to production data is properly controlled through the correct authority GRANTs
  • Monitoring application thread activity for deadlocks and timeouts
  • Reviewing console logs and DB2 address space logs for error messages or potential issues.
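As an illustration of how one of these checks might be automated, here is a minimal sketch of the naming-convention verification. The rules are entirely hypothetical (uppercase names of at most 30 characters, with `_DT` and `_TS` suffixes for date and timestamp columns); a real shop would substitute its own standards and would read the column definitions from the DBMS catalog rather than from a hard-coded dictionary.

```python
import re
from typing import Dict, List

# Hypothetical standard: uppercase letter first, then letters, digits,
# or underscores, 30 characters at most.
NAME_PATTERN = re.compile(r"^[A-Z][A-Z0-9_]{0,29}$")

# Hypothetical standard: certain data types require a name suffix.
REQUIRED_SUFFIX = {"DATE": "_DT", "TIMESTAMP": "_TS"}


def check_columns(columns: Dict[str, str]) -> List[str]:
    """Return one message per violation for a {column name: data type} map."""
    problems = []
    for name, data_type in columns.items():
        if not NAME_PATTERN.match(name):
            problems.append(f"{name}: name does not match the naming pattern")
        suffix = REQUIRED_SUFFIX.get(data_type.upper())
        if suffix and not name.endswith(suffix):
            problems.append(
                f"{name}: {data_type} columns should end in {suffix}")
    return problems


if __name__ == "__main__":
    # Hypothetical new table definition.
    new_table = {
        "ORDER_ID": "INTEGER",
        "ORDER_DT": "DATE",
        "shipDate": "DATE",
        "CREATED": "TIMESTAMP",
    }
    for problem in check_columns(new_table):
        print(problem)
```

Run against every newly created table, a check like this turns a manual review into an exception list that the DBA reads only when something is wrong.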

Big Data Specialists

Big data provides ample opportunities for gaining and using specialized knowledge. As mentioned earlier, skills in the following areas provide the DBA team with a significantly greater ability to positively affect the business:

  • Business use cases. For a data modeler the first thing to understand is how data will be used. The same is true for the big data DBA. Will old or stale data be archived or purged? Will the heaviest data access be confined to a particular time period? Are certain subsets of the data elements destined to be analyzed the most? Answers to these questions will help the DBA decide on data partitioning schemes, database backup frequency, table and index designs, and more.
  • Appliance management. While most appliance vendors prefer to deliver turnkey solutions, the DBA team will still have responsibility for performance monitoring and tuning. If nothing can be configured or tuned, will the enterprise stay with this vendor? The future will see tunable appliances, and the DBAs must keep up to date on industry trends. You can be sure that your business need for the big data solution is not going away any time soon.
  • Analytical query tuning tools. The hype of big data applications can obscure this basic truth: if a business cannot get usable data in a timely fashion, the costs of the solution will outweigh the benefits. Most vendor solutions include the promise (or prediction) that queries will run extremely quickly. Even if this is true today, will it remain so in the future? What if you implement multiple additional very large data stores? What if you begin accumulating years of historical data for analysis? What if several hundred (or thousand!) additional users begin running new, long, and complex queries? The DBA is an essential technician for query tuning in this environment.

Summary

While DBA teams come in many types, with many gradations of expertise among their members, supporting big data applications will change their tasks, their priorities, and the way they are managed. In the beginning only a few specialists will have the requisite knowledge and skills; however, as the business implements more applications and adds more users into the mix, the entire team must become involved in big data support.

Lockwood Lyon
Lockwood Lyon is a systems and database performance specialist. He has more than 20 years of experience in IT as a database administrator, systems analyst, manager, and consultant. Most recently, he has spent time on DB2 subsystem installation and performance tuning. He is also the author of The MIS Manager's Guide to Performance Appraisal (McGraw-Hill, 1993).
