Staffing Your IT Organization for Big Data

Big data applications are commonplace now. Most large companies have one or more of these applications, which provide fast access to large stores of customer and sales data. As the IT organization grows to install, support and maintain these applications, new job categories and new tasks are added to the mix. These include big data hardware and software support, business analysts who use analytics to probe and explore the data, and managers who must supervise and prioritize job tasks.

Your Current Staff

Consider the following tasks, and see if you can assign each to one of two categories: tasks executed by generalists, or tasks executed by specialists:

  1. Managing database backup and recovery processes;
  2. System and network performance tuning;
  3. Monitoring and measuring disaster recovery readiness;
  4. Software installation and version migration;
  5. Implementing self-analyzing and self-tuning processes (sometimes called autonomics);
  6. Being lead technical support for a mission-critical application;
  7. SQL query tuning, including Explains and access path analysis;
  8. Assisting with or managing data architecture changes and database performance tuning;
  9. Test-to-production object and data migration;
  10. Performing benchmarks for prospective vendor tools;
  11. Monitoring database logs for errors and issues.
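Task 7, for example, typically means running a query through the DB2 Explain facility and inspecting the chosen access path. The sketch below assumes a hypothetical CUSTOMER table and an existing PLAN_TABLE under the current schema; table and column names are illustrative only.

```sql
-- Capture the optimizer's access path for a candidate query
EXPLAIN PLAN SET QUERYNO = 101 FOR
  SELECT c.cust_id, c.cust_name
  FROM   customer c
  WHERE  c.region = 'MIDWEST';

-- Review the result: a tablespace scan (ACCESSTYPE = 'R')
-- against a large table is a common tuning target
SELECT queryno, tname, accesstype, matchcols, indexonly
FROM   plan_table
WHERE  queryno = 101;
```

The PLAN_TABLE columns shown are among those commonly reviewed; the exact set varies by DB2 version and platform.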

The odd-numbered tasks (1, 3, and so on) should be well-documented, basic processes with clearly defined goals and results. Prior to the big data revolution these tasks were perfect for generalists, as they did not require extensive experience or skills. In a big data environment, many of these processes are redundant, unnecessary, or automated. The result: generalists may no longer be needed in the IT organization.

The even-numbered tasks (2, 4, and so on) require special skills, and are best left to specialists. Still, as the organization transitions to managing multiple big data applications, fewer of these tasks will have high priority. The result: fewer specialists will be needed.

In summary, supporting big data applications will no longer be done solely by a cadre of programmer/analysts and database administrators. Several new job categories have appeared that change the IT support paradigm. The primary categories are listed below, with a description of where to find (or grow) these people in your IT organization.

Category: Data Acquisition Support

The data for your big data application usually comes from snapshots of transactional data that flows through your operational systems. Any data stream containing information on customers, prices, costs, products, accounts, and the like is fair game. Generally speaking, the more streams you can acquire, store, and analyze, the more relevant and detailed your analytical results will be.

Data acquisition specialists will be responsible for knowing the various sources of data available across the IT enterprise. In addition, they must also be aware of data sources outside the company that can be purchased. Any and all of this data may be required for analysis.

These specialists must also coordinate and communicate with the business analysts, who will want to know what data is available, or may request specific data. The result is an interesting job description that requires knowledge of data across multiple enterprise applications, customer data, and IT best practices such as data modeling techniques and the enterprise data model.

Where will these specialists come from? Most likely from current database administrators and other generalists who support current operational systems.

Category: Big Data Storage

Initially, big data is stored either in a high-performance database management system (DBMS) such as DB2, or in a big data hybrid hardware/software appliance such as the IBM DB2 Analytics Accelerator (IDAA). Both the DB2 database and IDAA solutions permit complex queries, though they have different performance profiles that depend upon data volume, query complexity, and so forth.

The result is a large data store that must be managed to support high-intensity querying. This is in contrast to normal operational systems, such as order entry, where data changes with high frequency throughout the day. In big data applications, snapshots of large data stores are acquired and loaded into the DBMS. After that, the database administrators (DBAs) are responsible for fast data access. This can be accomplished in a variety of ways, including memory management, adding data indexes, using high-speed disk arrays, or using a hybrid solution (as with IDAA above).
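As one illustration of the indexing option, a DBA can build indexes that match the dominant query predicates against a snapshot table. This is a sketch only, assuming a hypothetical SALES_SNAPSHOT table; names and columns are illustrative.

```sql
-- Snapshot tables are bulk-loaded and then queried heavily,
-- so an index tailored to the most common predicates pays off
CREATE INDEX ix_sales_cust_date
  ON sales_snapshot (cust_id, sale_date)
  CLUSTER;   -- keep each customer's rows physically together

-- After loading and indexing, gather statistics (e.g., with the
-- DB2 RUNSTATS utility) so the optimizer can exploit the index
```

Because the snapshot is read-mostly, the usual operational concern about index maintenance overhead during heavy update activity largely disappears.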

These specialists will come from the ranks of experienced database administrators. They may need to increase their knowledge of internal customers' applications: What data do they need, when do they need it, and who uses the results? Specialists can become subject matter experts in several application areas, increasing their value as internal consultants advising on matters such as query efficiency and advanced analytics methods such as cubes.

Category: Big Data Analytics

The business analyst is usually familiar with querying data, typically data stored in an enterprise data warehouse (EDW). The EDW commonly consists of one or more fact tables containing transactional data, plus dimension tables listing the attribute values used for grouping. This querying expertise will transfer very well to the big data application, with one caveat: big data will eventually (if not already) contain data types unfamiliar to the analyst. In addition, as data volumes increase it becomes essential that queries are written to perform well.
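A typical EDW query of this shape joins a fact table to a dimension table and groups transactional measures by a dimension attribute. The table and column names below are hypothetical, chosen only to show the pattern.

```sql
-- Star-schema query: aggregate fact-table measures
-- grouped by a dimension attribute (names illustrative)
SELECT d.region,
       SUM(f.sale_amount) AS total_sales
FROM   sales_fact   f
JOIN   customer_dim d
       ON f.cust_key = d.cust_key
WHERE  f.sale_date BETWEEN '2013-01-01' AND '2013-12-31'
GROUP  BY d.region
ORDER  BY total_sales DESC;
```

Queries like this are exactly the kind that grow expensive as fact-table volumes climb, which is why analyst-written SQL deserves the DBA review discussed below.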

In the big data environment, this becomes even more critical. While business analytics software packages are designed to generate efficient queries, and big data solutions promise high-performance storage, sometimes there is simply too much data. Consider also that the number and complexity of queries against the big data application will increase rapidly as analysts become familiar with the new software and the data.

More data volume and more frequent queries translate into very heavy use of your big data solution. This is good! It also means that performance may well become a problem. Plan for this by having analysts share queries with the DBAs and collaborate on timing and performance options.

As more and more data is stored, having an organized data dictionary and data model becomes critical. Without knowing what data you have, how can it be efficiently queried?

Specialists, especially database administrators, should already be aware of data modeling concepts and should have knowledge of multiple applications and systems. This knowledge can now be put to use as they assist business analysts with initial analytics definitions and query construction.

Generalists can serve as investigators for applications, listing and cataloging data elements and confirming data attributes and sources.

Managing the Transition to Big Data

Some generalists will initially serve as interfaces to specific internal customers, assisting them with data requirements definitions. This role can then be expanded into more technical work in either analytics execution or results analysis. It may even be worthwhile to consider transferring generalists into the lines of business.

Many specialists will remain to deal with technology-related issues such as performance and tuning. Some may consult internally on advanced analytics options, methods of analyzing new data types, and the like. Management must keep these specialists motivated in their work, or risk losing them to other similar businesses.

Infrastructure support teams successfully survived the advent of big data, and were most likely responsible for the successful implementation of many of these applications. However, the outcome leads to an inevitable decrease in the need for IT support staff. IT specialists and generalists must expand their skill sets by learning internal lines of business and familiarizing themselves with current business data needs. Assuming a customer-facing role may be their only long-term career option.


Since generalists will shift to maintenance and support work, their tasks and processes should be clearly documented. As generalists gain experience they eventually become specialists; clear documentation will then assist management in either automating their former tasks or outsourcing them.

Specialists will have many new tasks to perform. They will need training in new hardware and software solutions, as well as experience with enterprise data and applications. This may take some time; however, it is essential to have a core group of specialists to support big data applications.

Management may consider different organizational structures in order to get the greatest value from the new staff. Managers will continue to set strategic goals and prioritize work, and must also spend time working on expanding job descriptions, hiring procedures, and performance evaluations. One possible structure that may be valuable is to split the teams logically into specialists and generalists, allowing the teams to work together on common projects.



Lockwood Lyon
Lockwood Lyon is a systems and database performance specialist. He has more than 20 years of experience in IT as a database administrator, systems analyst, manager, and consultant. Most recently, he has spent time on DB2 subsystem installation and performance tuning. He is also the author of The MIS Manager's Guide to Performance Appraisal (McGraw-Hill, 1993).
