Top 10 IBM Information Management Trends

Julian Stuhler shares his pick of the most important current trends in the world of IBM Information Management. Some are completely new and some are evolutions of existing technologies, and he’s betting that every one of them will have some sort of impact on data management professionals during the next 12-18 months.

Introduction
1. Living on a Smarter Planet
2. The Information Explosion
3. Hardware Assist
4. Versioned/Temporal Data
5. The Rise of XML and Spatial Data
6. Application Portability
7. Scalability and Availability
8. Stack ’em high…
9. BI on the Mainframe
10. Data Governance
Additional Resources

Introduction

The Greek philosopher Heraclitus is
credited with the saying "Nothing endures but change". Two millennia
later those words still ring true, and nowhere more so than within the IT
industry. Each year brings exciting new technologies, concepts and buzzwords
for us to assimilate. Here is my pick of the most important current trends in
the world of IBM Information Management. Some are completely new and some are
evolutions of existing technologies, but I’m betting that every one of them
will have some sort of impact on data management professionals during the next
12-18 months.

1. Living on a Smarter Planet

You don’t have to be an IT
professional to see that the world around us is getting smarter. Let’s just
take a look at a few examples from the world of motoring: we’ve become used to
our in-car GPS systems giving us real-time traffic updates, signs outside car
parks telling us exactly how many spaces are free, and even the cars themselves
being smart enough to brake individual wheels in order to control a developing skid.
All of these make our lives easier and safer by using real-time data to make
smart decisions.

However, all of this is just the
beginning: everywhere you look the world is getting more
"instrumented", and clever technologies are being adopted to use the
real-time data to make things safer, quicker and greener. Smart electricity
meters in homes are giving consumers the ability to monitor their energy usage
in real time and make informed decisions on how they use it, resulting in an
average reduction of 10% in a recent US study. Sophisticated traffic management
systems in our cities are reducing congestion and improving fuel efficiency,
with an estimated reduction in journey delays of 700,000 hours in another study
covering 439 cities around the world.

All of this has some obvious
implications for the volume of data our systems will have to manage (see trend
#2 below) but the IT impact goes a lot deeper than that. The very
infrastructure that we run our IT systems on is also getting smarter. Virtualization
technologies allow server images to be created on demand as workload increases,
and just as easily torn down again when the demand reduces. More extensive
instrumentation and smarter analysis allow the peaks and troughs in demand to
be more accurately measured and predicted so that capacity can be dynamically
adjusted to cope. With up to 85% of server capacity typically sitting idle on
distributed platforms, the ability to virtualize and consolidate multiple
physical servers can save an enormous amount of power, money and valuable data
center floor space.
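
To make the consolidation argument concrete, here is a rough sizing sketch in Python. The 15% average utilization follows from the "85% idle" figure above; the server count and the 60% target utilization for the virtualized hosts are illustrative assumptions, and the function name is hypothetical. It treats all servers as equal in capacity, which real estates rarely are.

```python
def hosts_needed(server_count: int, avg_util_pct: int, target_util_pct: int) -> int:
    """Estimate how many equally sized hosts are needed after consolidation.

    Utilization figures are whole percentages so the arithmetic stays exact.
    """
    total_load = server_count * avg_util_pct      # load in "percent of one server"
    # Ceiling division: a partly used host still has to be powered on.
    return -(-total_load // target_util_pct)


# Assumed example: 20 physical servers, each 15% busy (85% idle),
# consolidated onto virtualized hosts run at a 60% target utilization.
print(hosts_needed(20, 15, 60))
```

Under these assumptions the same work fits on a quarter of the original machines, which is where the power and floor space savings come from.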

If you live in the mainframe space, virtualization
is an established technology that you’ve been working with for many years. If
not, this might be a new way of thinking about your server environment. Either
way, most of us will be managing our databases on virtual servers running on a
more dynamic infrastructure in the near future.

2. The Information Explosion

As IT becomes ever more prevalent in
nearly every aspect of our lives, the amount of data generated and stored
continues to grow at an astounding rate. According to IBM, worldwide data
volumes are currently doubling every two years. IDC estimates that 45GB of data
currently exists for each person on the planet: that’s a mind-blowing 281
billion gigabytes in total. While a mere 5 percent of that data will end up on
enterprise data servers, it is forecast to grow at a staggering 60 percent per
year, resulting in 14 exabytes of corporate data by 2011.
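
The figures quoted above can be sanity-checked with a few lines of Python. This is a back-of-the-envelope sketch only: it assumes decimal units (1 exabyte = 1 billion gigabytes), and the variable and function names are mine rather than anything from IBM or IDC.

```python
import math

GB_PER_EXABYTE = 10**9          # decimal units assumed

total_gb = 281 * 10**9          # "281 billion gigabytes in total"
per_person_gb = 45              # "45GB of data ... for each person"

implied_population = total_gb / per_person_gb   # people implied by the two figures
total_eb = total_gb / GB_PER_EXABYTE            # the same total, in exabytes
enterprise_eb = total_eb * 0.05                 # the "mere 5 percent" share


def annual_rate_from_doubling(years: float) -> float:
    """Annual growth rate equivalent to doubling every `years` years."""
    return 2 ** (1 / years) - 1


def years_to_double(annual_rate: float) -> float:
    """Time to double at a compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


print(f"Implied population: {implied_population / 1e9:.2f} billion")
print(f"Total data: {total_eb:.0f} EB, 5% share: {enterprise_eb:.2f} EB")
print(f"Doubling every 2 years = {annual_rate_from_doubling(2):.1%} per year")
print(f"60% per year doubles in {years_to_double(0.60):.2f} years")
```

Five percent of the total comes to roughly 14 exabytes, in line with the corporate figure quoted, and growth of 60 percent per year doubles a data store in about a year and a half, noticeably faster than the two-year doubling quoted for data volumes overall.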

Major industry trends such as the move
towards packaged ERP and CRM applications, increased regulatory and audit
requirements, investment in advanced analytics and major company mergers and
acquisitions are all contributing to this explosion of data, and the move
towards instrumenting our planet (see trend #1 above) is only going to make
things worse.

As the custodians of the world’s
corporate data, we are at the sharp end of this particular trend. We’re being
forced to get more inventive with database partitioning schemes to reduce the
performance and operational impact of increased data volumes. Archiving
strategies, usually an afterthought for many new applications, are becoming
increasingly important. The move to a 64-bit memory model on all major
computing platforms allows us to design our systems to hold much more data in
memory rather than on disk, further reducing the performance impact. As volumes
continue to increase and new types of data such as XML and geospatial
information are integrated into our corporate data stores (see trend #5), we’ll
have to get even more inventive.

3. Hardware Assist

OK, so this is not a new trend: some
of the earliest desktop PCs had the option to fit coprocessors to speed up
floating point arithmetic, and the mainframe has used many types of
supplementary hardware over the years to boost specific functions such as sort
and encryption. However, use of special hardware is becoming ever more
important on all of the major computing platforms.

In 2004, IBM introduced the zAAP
(System z Application Assist Processor), a special type of processor aimed at
Java workloads running under z/OS. Two years later, it introduced the zIIP
(System z Integrated Information Processor) which was designed to offload
specific types of data and transaction processing workloads for business
intelligence, ERP and CRM, and network encryption. In both cases, work can be
offloaded from the general-purpose processors to improve overall capacity and
significantly reduce running costs (as most mainframe customers pay according
to how much CPU they burn on their general-purpose processors). These "specialty
coprocessors" have been a critical factor in keeping the mainframe
cost-competitive with other platforms, and allow IBM to easily tweak the
overall TCO proposition for the System z platform. IBM has previewed its Smart
Analytics Optimizer blade for System z (see trend #9) and is about to release
details of the next generation of mainframe servers: we can expect the theme of
workload optimization through dedicated hardware to continue.

On the distributed computing platform,
things have taken a different turn. The GPU (graphics processing unit),
previously only of interest to CAD designers and hard-core gamers, is gradually
establishing itself as a formidable computing platform in its own right. The
capability to run hundreds or thousands of parallel processes is proving
valuable for all sorts of applications, and a new movement called GPGPU
(General-Purpose computation on Graphics Processing Units) is rapidly gaining
ground. It is very early days, but many database operations (including joins,
sorting, data visualization and spatial data access) have already been proven
on GPUs, and the mainstream database vendors won’t be far behind.

4. Versioned/Temporal Data

As the major relational database
technologies continue to mature, it’s getting more and more difficult to
distinguish between them on the basis of pure functionality. In that kind of environment,
it’s a real treat when a vendor comes up with a major feature that is
both fundamentally new and immediately useful. The temporal data capabilities
being delivered as part of DB2 10 for z/OS qualify on both counts.

Many IT systems need to keep some form
of historical information in addition to the current status for a given
business object. For example, a financial institution may need to retain the
previous addresses of a customer as well as the one they are currently living
at, and know what address applied at any given time. Previously, this would
have required the DBA and application developers to spend valuable time
creating the code and database design to support the historical perspective,
while minimizing any performance impact.

The new temporal data support in DB2
10 for z/OS provides this functionality as part of the core database engine.
All you need to do is indicate which tables/columns require temporal support,
and DB2 will automatically maintain the history whenever an update is made to
the data. Elegant SQL support allows the developer to query the database with
an "as of" date, which will return the information that was current
at the specified time.
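
To illustrate the idea of an "as of" query, here is a minimal Python sketch using the standard sqlite3 module. It does not use DB2's own temporal syntax: it hand-codes the history maintenance in the application, which is the kind of work the built-in support is meant to remove. The table, column and function names are all hypothetical.

```python
import sqlite3

FAR_FUTURE = "9999-12-31"   # marks the row that is currently in effect


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("""
        CREATE TABLE customer_address (
            customer_id INTEGER NOT NULL,
            address     TEXT    NOT NULL,
            valid_from  TEXT    NOT NULL,  -- inclusive ISO date
            valid_to    TEXT    NOT NULL   -- exclusive ISO date
        )""")


def set_address(conn, customer_id, address, effective_date):
    """Record a new address, keeping the old one as history."""
    # Close off the row that was current until now...
    conn.execute(
        "UPDATE customer_address SET valid_to = ? "
        "WHERE customer_id = ? AND valid_to = ?",
        (effective_date, customer_id, FAR_FUTURE))
    # ...and add the new current row.
    conn.execute(
        "INSERT INTO customer_address VALUES (?, ?, ?, ?)",
        (customer_id, address, effective_date, FAR_FUTURE))


def address_as_of(conn, customer_id, as_of_date):
    """Return the address that applied on the given date, or None."""
    row = conn.execute(
        "SELECT address FROM customer_address "
        "WHERE customer_id = ? AND valid_from <= ? AND valid_to > ?",
        (customer_id, as_of_date, as_of_date)).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
create_schema(conn)
set_address(conn, 1, "12 High Street", "2005-03-01")
set_address(conn, 1, "7 Mill Lane", "2009-08-15")

print(address_as_of(conn, 1, "2007-01-01"))   # the earlier address
print(address_as_of(conn, 1, "2010-06-30"))   # the current address
```

Every application that needs history has to get the period bookkeeping in `set_address` right. Moving that logic into the database engine is what makes the feature a productivity gain.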

With the ongoing focus on improving
productivity and reducing time-to-market for key new IT systems, you can expect
other databases (both IBM and non-IBM) to implement this feature sooner rather
than later.

5. The Rise of XML and Spatial Data

Most relational databases have been
able to store "unstructured" data such as photographs and scanned
images for a while now, in the form of BLOBs (Binary Large OBjects). This has
proven useful in some situations, but most businesses use specialized
applications such as IBM Content Manager to handle this information more
effectively than a general-purpose database. These kinds of applications
typically do not have to perform any significant processing on the BLOB itself
– they merely store and retrieve it according to externally defined index
metadata.

In contrast, there are some kinds of
non-traditional data that need to be fully understood by the database system so
that it can be integrated with structured data and queried using the full power
of SQL. The two most powerful examples of this are XML and spatial data,
supported as special data types within the latest versions of both DB2 for z/OS
and DB2 for LUW.

More and more organizations are coming
to rely on some form of XML as the primary means of data interchange, both
internally between applications and externally when communicating with
third parties. As the volume of critical XML business documents increases, so
too does the need to properly store and retrieve those documents alongside
other business information. DB2’s pureXML feature allows XML documents to be
stored natively in a specially designed XML data store, which sits alongside
the traditional relational engine. This is not a new feature any more, but the
trend I’ve observed is that more organizations are beginning to actually make
use of pureXML within their systems. The ability to offload some XML parsing
work to a zAAP coprocessor (see trend #3) is certainly helping.

Nearly all of our existing
applications contain a wealth of spatial data (customer addresses, supplier
locations, store locations, etc.): the trouble is we’re unable to use it
properly as it’s in the form of simple text fields. The spatial abilities
within DB2 allow that data to be "geocoded" in a separate column,
so that the full power of SQL can be unleashed. Want to know how many customers
live within a 10-mile radius of your new store? Or if a property you’re about
to insure is within a known flood plain or high crime area? All of this and
much more is possible with simple SQL queries. Again, this is not a brand new
feature but more and more organizations are beginning to see the potential and
design applications to exploit this feature.
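
The "customers within a 10-mile radius" question shows why geocoding matters: once each address carries coordinates, the query reduces to a distance calculation. The Python sketch below shows that calculation using the haversine great-circle formula. It is only an illustration of the concept, not how DB2 implements its spatial functions, and the coordinates, names and Earth radius constant are assumed sample values.

```python
import math

EARTH_RADIUS_MILES = 3958.8   # mean Earth radius, an approximation


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def customers_within(store, customers, radius_miles):
    """Names of customers whose geocoded location lies within the radius."""
    store_lat, store_lon = store
    return [name for name, lat, lon in customers
            if distance_miles(store_lat, store_lon, lat, lon) <= radius_miles]


# Hypothetical geocoded data: (name, latitude, longitude)
new_store = (51.5074, -0.1278)
customers = [
    ("A", 51.5155, -0.0922),   # a mile or two away
    ("B", 51.4545, -2.5879),   # over 100 miles away
    ("C", 51.5560, -0.2795),   # several miles away
]

print(customers_within(new_store, customers, 10))
```

With only a free-text address field, none of this is possible without first resolving each address to coordinates, which is the step the geocoded column performs once, up front.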

6. Application Portability

Despite the relative maturity of the
relational database marketplace, there is still fierce competition for overall
market share between the top three vendors. IBM, Oracle and Microsoft are the
main protagonists, and each company is constantly looking for new ways to tempt
its competitors’ customers to defect. Those brave souls who undertook
migration projects in the past faced a difficult process, often entailing
significant effort and risk to port the database and associated applications to
run on the new platform. This made large-scale migrations relatively rare, even
when there were compelling cost or functionality reasons to move to another
platform.

Two trends are changing this and
making porting projects more common. The first is the rise of the packaged
ERP/CRM solution from companies such as SAP and Siebel. These applications have
been written to be largely database agnostic, with the core business logic
isolated from the underlying database by an "I/O layer". So, while
there may still be good reasons to be on a specific vendor’s database in terms
of functionality or price, the pain of moving from one to another is vastly
reduced and the process is supported by the ERP solution vendor with additional
tooling. Over 100 SAP/Oracle customers are known to have switched to DB2 during
the past 12 months for example, including huge organizations such as Coca-Cola.

The second and more recent trend is
direct support for competitors’ database APIs. DB2 for LUW version 9.7 includes
a host of new Oracle compatibility features that make it possible to run the vast
majority of Oracle applications natively against DB2 with little or no change
required to the code. IBM has also announced the "DB2 SQL Skin" feature,
which provides similar capabilities for Sybase ASE applications to run against
DB2. With these features greatly reducing the cost and risk of changing the
application code to work with a different database, all that is left is to
physically port the database structures and data to the new platform (which is
a relatively straightforward process that is well supported by vendor tooling).
There is a huge amount of excitement about these new features and IBM is
expecting to see a significant number of Oracle customers switch to DB2 in the
coming year. I’m expecting IBM to continue to pursue this strategy by targeting
other databases such as SQL Server, and Oracle and Microsoft may well return
the favor if they begin to lose significant market share as a result.

7. Scalability and Availability

The ability to provide unparalleled
scalability and availability for DB2 databases is not new: high-end mainframe
users have been enjoying the benefits of DB2 Data Sharing and Parallel Sysplex
for more than 15 years. The shared-disk architecture and advanced optimizations
employed in this technology allow customers to run mission-critical systems
with 24×7 availability and no single point of failure, with only a minimal
performance penalty. Major increases in workload can be accommodated by adding
additional members to the data sharing group, providing an easy way to scale.

Two developments have earned this a place
in my top 10 trends list. Firstly, I’m seeing a significant number of
mainframe customers who had not previously taken advantage of data sharing
begin to take the plunge. There are various reasons for this, but we’ve
definitely moved away from the days when DB2 for z/OS data sharing customers
were a minority group huddling together at conferences and speaking a different
language to everyone else.

The second reason that this is set to
be big news over the next year is DB2 pureScale: the implementation of the same
data sharing shared-disk concepts on the DB2 for LUW platform. It’s difficult
to overstate the potential impact this could have on distributed DB2 customers
that run high-volume, mission-critical applications. Before pureScale, those
customers had to rely on features such as HADR to provide failover support to a
separate server (which could require many seconds to take over in the event of
a failure) or go to external suppliers such as Xkoto with their Gridscale
solution (no longer an option since the company was acquired by Teradata and
the product was removed from the market). pureScale brings DB2 for LUW into the
same ballpark as DB2 for z/OS in terms of scalability and availability, and I’m
expecting a lot of customer activity in this area over the next year.

8. Stack ’em high…

For some time now, it has been
possible for organizations to take a "pick and mix" approach to their
IT infrastructure, selecting the best hardware, operating system, database and
even packaged application for their needs. This allowed IT staff to concentrate
on building skills and experience in specific vendors’ products, thereby
reducing support costs.

Recent acquisitions have begun to put
this environment under threat. Oracle’s previous purchase of ERP vendors such
as PeopleSoft, Siebel and JD Edwards had already resulted in big pressure to
use Oracle as the back-end database for those applications (even if DB2 and
other databases are still officially supported). That reinforced SAP’s alliance
with IBM and the push to run their applications on DB2 (again, other databases
are supported but not encouraged).

Two acquisitions during the past 12
months have further eroded the "mix and match" approach, and started
a trend towards single-vendor end-to-end solution "stacks" comprising
hardware, OS, database and application. The first and most significant of these
was Oracle’s acquisition of Sun Microsystems in January 2010. This gave the
company access to Sun’s well-respected server technology and the Solaris OS
that runs on it. At a single stroke, Oracle was able to offer potential
customers a completely integrated hardware/software/application stack.

The jury is still out on the potential
impact of the second acquisition: SAP’s
purchase of Sybase in May 2010. Although the official SAP position is
that the Sybase technology has been purchased for the enhanced mobile and
in-memory computing technologies that Sybase will bring, there is the
possibility that SAP will choose to integrate the Sybase database technology
into the SAP product. That will still leave them dependent on other vendors
such as IBM for the hardware and operating system, but it would be a major step
forward in any integration strategy they may have.

Older readers of this article may see
some startling similarities to the bad old days of vendor lock-in prevalent in
the 1970s and 1980s. IBM’s strategy to support other vendors’ database APIs
(see trend #6) is in direct contrast to this, and it will be interesting to
see how far customers are willing to go down the single vendor route.

9. BI on the Mainframe

The concept of running Business
Intelligence applications on the mainframe is not new: DB2 was originally
marketed as a back-end decision support application for IMS databases. The
ability to build a warehouse within the same environment in which your operational
data resides (and thereby avoid the expensive and time-consuming process of
moving that data to another platform for analysis) is attractive to many
customers.

IBM is making significant efforts to
make this an attractive proposition for more of its mainframe customers. The
Cognos tools have been available for zLinux for a couple of years now, and the
DB2 for z/OS development team have been steadily adding BI-related functions to
the core database engine for years. Significant portions of a typical BI
workload can also be offloaded to a zIIP coprocessor (see trend #3), reducing
the CPU costs.

More recently, IBM unveiled its Smart
Analytics System 9600 – an integrated, workload balanced bundle of hardware,
software and services based on System z and DB2 for z/OS. It has also begun to
talk about the Smart Analytics Optimizer – a high performance appliance-like
blade for System z capable of handling intensive BI query workloads with
minimal impact to CPU.

IBM is serious about BI on the
mainframe, and is building an increasingly compelling cost and functionality
case to support it.

10. Data Governance

Ensuring that sensitive data is
properly secured and audited has always been a concern, but this has received
more attention in recent years due to legislation such as Sarbanes-Oxley, HIPAA
and others. At the same time, there has been an increasing focus on data
quality: bad data can result in bad business decisions, which no one can afford
in today’s competitive markets. There has also been an increasing awareness of
data as both an asset and a potential liability, making archiving and lifecycle
management more important.

All of these disciplines and more are
beginning to come together under the general heading of data governance. As our
database systems get smarter and more self-managing, database professionals are
increasingly morphing from data administrators to data governors. A new
generation of tools is being rolled out to help, including InfoSphere
Information Analyzer, Guardium and the Optim data management products.