DB2 management, tutorials, scripts, coding, programming and tips for database administrators
Most large organizations have implemented one or more big data applications. As more data accumulates, internal users and analysts execute more reports and forecasts, which leads to additional queries and analysis, and still more reporting. The cycle continues: data growth fuels better analysis, which generates more reporting. Eventually the big data application swells with so much data and querying that performance suffers. How can you avoid this?
Big data is everywhere, and most large IT enterprises have installed one or more big data applications. These applications provide fast access to large stores of data, usually customer or sales data. The technical roles that support these applications, and the systems that analyze and consume the data, didn't exist ten years ago. Who are these new IT professionals, and how should you manage them?
Your big data repository won't simply add another twelve months of data over the next year. More data is coming, more categories of data will be created, and your analytical environment must expand to fit future needs. But size won't be your only problem. In the rush to accumulate a sufficient amount of valuable data and implement a business analytics environment that can produce usable results, several items may have been ignored, postponed, or simply forgotten. These missing details can make or break your company in the future.
Big data software, hardware, application suites, business analytics solutions ... suddenly, it seems, IT enterprises are deluged with vendor offerings that solve problems they didn't know they had. As you dive into what will most likely be your largest IT project of the year, ensure that you have planned and budgeted for the following items that are unique to big data implementations.
It’s difficult to simply 'drop' big data applications into an existing IT infrastructure and expect them to run smoothly. In addition to the energy and cooling requirements of the new hardware supporting the big data application, other IT areas need to prepare. The major factors that determine whether existing applications will need enhancements include larger data storage needs, greater data transmission capacity, and the demands both will place on existing hardware and software.
Big data implementations are more than just lots of data. Of equal importance is the analytics software used to query that data. Analyzing business data with advanced analytics is common, especially in companies that already have an enterprise data warehouse. It is therefore only natural that your big data application must be integrated with the existing warehouse.
Big data applications do not need the same infrastructure support teams that more conventional applications do. As the enterprise embraces big data, management assumes that staff sizes will decrease. What should be done with those unneeded technologists? One answer: convert them into technology consultants who collaborate and coordinate with the lines of business. In other words, give them customer-facing roles.
Technical support teams usually support familiar hardware and software configurations. Specialization in particular combinations of operating systems and database management software is common, and this allows some team members to gain in-depth experience that is extremely valuable in an enterprise IT setting. How has big data changed this paradigm?
Big data applications are here to stay. The promise of this technology is the ability to quickly and easily analyze large amounts of data and derive from that analysis changes to customer-facing systems. Management believes that the analysis and subsequent changes will drive up customer satisfaction, market share, and profits, hopefully at a reasonable cost.
Many big data application implementations seem to begin with an existing data warehouse, one or more new high-volume data streams, and some specialized hardware and software. The data storage issue is often accommodated by installing a proprietary hardware appliance that can store huge amounts of data while providing extremely fast data access. In these cases, do we really need to worry about database design?
Big data applications and their associated proprietary, high-performance data stores arrived on the scene a few years ago. With promises of incredibly fast queries, many IT shops implemented one or more of these combination hardware and software suites. However, few IT enterprises have implemented metrics that clearly measure the benefits of these systems. The expected monetary gains from big data applications have not yet materialized for many companies, due to inflated expectations. The solution: Measure resource usage, and use these measurements to develop quality metrics.
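One way to act on that advice is to turn raw resource-usage measurements into a single quality metric, such as cost per useful row returned. The sketch below is a minimal, hypothetical illustration: the `QuerySample` fields and the `cpu_rate`/`io_rate` dollar figures are assumptions for the example, not figures from any particular platform.

```python
from dataclasses import dataclass

@dataclass
class QuerySample:
    """One measured query execution (field names are illustrative)."""
    cpu_seconds: float     # CPU time consumed by the query
    gb_scanned: float      # data read from the appliance, in GB
    rows_returned: int     # rows actually delivered to the user

def cost_per_useful_row(samples, cpu_rate=0.05, io_rate=0.01):
    """Estimate dollars spent per row delivered to users.

    cpu_rate and io_rate are assumed unit costs ($ per CPU-second,
    $ per GB scanned). If this metric rises over time, the platform's
    cost is outpacing the value of the answers it returns.
    """
    total_cost = sum(s.cpu_seconds * cpu_rate + s.gb_scanned * io_rate
                     for s in samples)
    total_rows = sum(s.rows_returned for s in samples)
    return total_cost / total_rows if total_rows else float("inf")
```

Feeding this function a rolling window of accounting data (for example, from the database's query history tables) gives management a trend line instead of an anecdote.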
Load tests give the database administrator (DBA) quite a lot of valuable information and may make the difference between poor and acceptable application performance. But what about a big data environment? Are there any gotchas or traps associated with big data?
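A minimal load-test harness of the kind described above can be sketched as follows. This is an illustrative skeleton, not a production tool: `run_query` is a stand-in you would replace with your database driver's execute call, and the concurrency level and query list are assumptions for the example. Reporting percentiles rather than averages matters in big data environments, where long-tail scans are exactly the gotcha you are hunting for.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def run_query(q):
    """Stand-in for a real database call; replace with your driver's
    execute(). Here it just sleeps briefly to simulate work."""
    time.sleep(0.01)
    return q

def load_test(queries, concurrency):
    """Replay `queries` at the given concurrency and report latency stats."""
    latencies = []
    def timed(q):
        start = time.perf_counter()
        run_query(q)
        latencies.append(time.perf_counter() - start)
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(timed, queries))
    latencies.sort()
    return {
        "p50": statistics.median(latencies),
        "p95": latencies[int(len(latencies) * 0.95) - 1],
        "max": max(latencies),
    }
```

Running the harness at steadily increasing concurrency, and watching where the p95 latency bends upward, tells the DBA where the environment stops scaling long before production users find out.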