New Paradigms to Overcome the Data Bottleneck in Next-Generation Transactional Applications

By Sumit Kundu


The effectiveness of meeting the needs of the next-generation transactional applications will depend on the ability to support the memory-based, data virtualization layer and the level of integration of this layer with the backend systems. Read on to learn the challenges with traditional transactional systems.

The Flash Crash of May 6, 2010 may have been caused by a
trader’s fat fingers, or the real cause may never be known. Either way, it
highlights the need for more effective pre-trade compliance checks. With
algorithmic trading, decimalization of stock quotes and other factors, trading
volume and frequency have increased many times over. The underlying data
management infrastructure that handles fraud and compliance checks needs to
keep up with current levels of trading.

In today’s post-crisis financial markets, regulation has
emerged as an important driver [1]. In this climate, reducing risk is a key
business imperative, and the data management infrastructure that supports it will
be critical. This support may involve giving the risk desk a complete
view of intra-day positions and avoiding the situation where risk
calculations at the end of a day spill over into the following day’s trading
window. Ideally, the infrastructure would support a real-time view of all the
trades that every desk in the company makes, so that enterprise-wide risk can
be understood at any instant and acted on immediately.

Challenges with Traditional Transactional Systems

In general, new classes of applications need to serve data to
a larger number of users with lower response times, or need to perform
temporary operations. Traditional transactional systems are a poor fit for
accessing and storing data in such applications, which can be
found in a variety of markets:

  • Capital markets – Applications that access reference data,
    perform pre-trade compliance checks, order matching, pricing and risk calculations
  • E-commerce – Applications needing personalized content for better
    online experience, temporary storage of page views, and real-time inventory
    views
  • Telco – Real-time billing and pricing applications, location
    register for mobile networks
  • Mobile Marketing – Context aware applications for more targeted
    marketing using handheld devices

These situations point to the need for a high-performance,
scalable data management infrastructure. High-frequency transactions create a
bottleneck at the data server layer due to a number of factors. One constraint on
the performance of traditional OLTP systems is their reliance on the disk
subsystem; the optimizations made to reduce the impact of disk I/O do
not go far enough. Another constraint with a traditional RDBMS is data
contention: horizontal scaling techniques, such as sharding, have to be
employed to ensure that performance does not degrade as more users are added.
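
As a rough illustration of the sharding technique mentioned above, the sketch below routes each key to one of several partitions using a stable hash. The `ShardRouter` class, the account identifiers and the use of plain dictionaries in place of separate database servers are all assumptions made for the example; they do not describe any particular product.

```python
import hashlib


class ShardRouter:
    """Route each key to one of N shards using a stable hash (illustrative only)."""

    def __init__(self, shard_count):
        # Each shard is a plain dict standing in for a separate database server.
        self.shards = [dict() for _ in range(shard_count)]

    def shard_for(self, key):
        # A cryptographic hash gives the same answer in every process,
        # unlike Python's built-in hash(), which is salted per run.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)


# Hypothetical account identifiers, used only to exercise the router.
router = ShardRouter(shard_count=4)
for account_id in ("ACCT-1001", "ACCT-1002", "ACCT-1003"):
    router.put(account_id, {"open_orders": 0})
```

Because each key lives on exactly one shard, users working on different keys no longer contend for the same server. The price is that any operation spanning several keys may have to touch several shards.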

With the increase in web-based applications and growth in
e-commerce, there is a critical need to manage session information, which can
also hold contextual data that is accessed frequently. Consider the world of a
frequent airline traveler with a mobile device as the preferred interface to
the digital world. An airline application could detect when the traveler has
missed his or her connecting flight and provide information on alternate
connecting flights or local hotels instantly when the traveler lands at a
connecting airport. In such situations, much of the computation involves
temporary data, and going to the database to retrieve such
information adds overhead that keeps such systems from scaling and meeting their
performance requirements. Further, as the state information changes with a user
accessing newer sites, the performance problem is compounded by the writes
involved.

In the airline example above, one can expect a spike in
workload when flights are cancelled due to bad weather or a terrorist threat. A
traditional RDBMS may not be able to support the level of user scalability
required in such cases. Traditional OLTP systems strictly adhere to ACID
(atomicity, consistency, isolation, durability) properties. While this makes
life easier for application developers, the need to achieve consistency across
partitioned databases limits the scaling that can be achieved. In developing
web applications with partitioned data, some of the consistency requirements
can be sacrificed to achieve scaling [2]. By following the BASE (basically
available, soft state, eventually consistent) paradigm, web applications can
achieve a higher degree of scaling by knowing that data consistency can be in a
state of flux. However, this imposes on the application developer a level of
understanding of the operations within a transaction to take advantage of this
paradigm.
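
To make the "state of flux" concrete, here is a minimal sketch of eventual consistency between a primary copy and a replica. The `EventuallyConsistentStore` class and the seat-count key are hypothetical, and replication is triggered by hand rather than by a background process, so this is a toy model of the idea and not a description of how any real system implements BASE.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy model: writes land on the primary first and reach the replica later."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # writes not yet applied to the replica

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value
        self.pending.append((key, value))

    def read_replica(self, key):
        # May return stale or missing data until replicate() has run.
        return self.replica.get(key)

    def replicate(self):
        # Apply queued writes in order; afterwards both copies agree.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


store = EventuallyConsistentStore()
store.write("seats_left", 12)        # hypothetical key
stale = store.read_replica("seats_left")
store.replicate()
fresh = store.read_replica("seats_left")
```

Between the write and the call to `replicate()`, a reader of the replica sees the old state. That window is what the application developer has to reason about when giving up strict consistency.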

In-memory to the Rescue

But developers accustomed to developing traditional transactional
applications may insist on the ACID properties being there. So what can be done?
Fortunately, there are a few alternate paradigms that could be considered.

First, to avoid the slow disk subsystem, we can consider in-memory
database systems, which can provide lower latency and increased throughput. If data
is temporary and does not need to be saved, then performance can be
gained by skipping the overhead of saving data to disk [3]. But even when
durability is required, applications can gain performance where workloads are
compute-intensive or involve mostly read transactions. Pre-trade compliance
checks are a good example where in-memory databases can provide a significant performance
boost. The airline traveler or the user of the e-commerce application would
also experience faster response times from such a solution. A key value
proposition here is being able to boost performance at the flip of a switch,
without requiring any changes to an existing application.
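
The following sketch shows the general shape of a pre-trade check against an in-memory database. It uses Python's standard `sqlite3` module with the special `:memory:` database name purely as a stand-in; it is not the Sybase ASE feature cited in [3]. The `position_limits` table, the symbols and the limit values are invented for illustration.

```python
import sqlite3

# ":memory:" tells SQLite to keep the whole database in RAM.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE position_limits (symbol TEXT PRIMARY KEY, max_qty INTEGER NOT NULL)"
)
# Hypothetical reference data for the example.
conn.executemany(
    "INSERT INTO position_limits VALUES (?, ?)",
    [("ABC", 10000), ("XYZ", 500)],
)
conn.commit()


def passes_pre_trade_check(conn, symbol, order_qty):
    """Return True if the order quantity is within the configured limit."""
    row = conn.execute(
        "SELECT max_qty FROM position_limits WHERE symbol = ?", (symbol,)
    ).fetchone()
    # Symbols with no configured limit are rejected in this sketch.
    return row is not None and order_qty <= row[0]
```

In this stand-in, the SQL and the checking function would be identical against a file-backed database; only the name passed to `sqlite3.connect` differs. That loosely mirrors the point that the application itself need not change.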

An alternate way to achieve lower latency and increased
throughput while maintaining ACID is to take advantage of distributed caching platforms
(DCP). Such systems are memory-based and lie at the mid-tier level, beyond the
boundaries of the traditional RDBMS. There are several advantages to this
approach. First, by moving the data closer to the application, we can avoid the
overhead of multiple hops to retrieve the data and significantly improve the
response times. Second, we can enable new application development paradigms,
such as object-oriented programs developed using a Java framework, by avoiding
the programming and translation overheads between object and relational models.
Finally, by avoiding the bottleneck at the RDBMS layer, distributed caching
platforms can provide elastic scaling that is a key value proposition for cloud
computing.

The lookup of information on hotels near a given airport
in the frequent airline traveler example above can be done more efficiently
using DCPs. Commonly referenced data can be stored in the cache once and
referenced by different keys using different indexes from different processes,
avoiding a database lookup for each process. Further,
DCPs can be used to distribute application cached data across multiple
processes and make application development easier where such a distributed
architecture is needed. While the RDBMS can continue to be the system of record,
DCPs complement it by serving as the synchronization layer for transaction control.
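
The idea of storing an object once and reaching it through several indexes can be sketched in a few lines. The `MultiIndexCache` class, the hotel records and the index names are assumptions for the example, and the sketch runs in a single process, whereas a real distributed caching platform would spread the data across nodes.

```python
class MultiIndexCache:
    """Store each object once; look it up through any number of named indexes."""

    def __init__(self):
        self._objects = {}  # object id -> the single stored copy
        self._indexes = {}  # index name -> {index key -> set of object ids}

    def put(self, object_id, obj, **index_keys):
        self._objects[object_id] = obj
        for index_name, index_key in index_keys.items():
            ids = self._indexes.setdefault(index_name, {}).setdefault(index_key, set())
            ids.add(object_id)

    def lookup(self, index_name, index_key):
        ids = self._indexes.get(index_name, {}).get(index_key, set())
        # Sorted by id only to make the result order predictable.
        return [self._objects[i] for i in sorted(ids)]


# Hypothetical hotel records indexed by airport code and by chain.
cache = MultiIndexCache()
cache.put("h1", {"name": "Airport Inn"}, airport="ORD", chain="Acme")
cache.put("h2", {"name": "Runway Suites"}, airport="ORD", chain="Globex")
cache.put("h3", {"name": "Harbor Lodge"}, airport="SFO", chain="Acme")

near_ord = cache.lookup("airport", "ORD")
acme_hotels = cache.lookup("chain", "Acme")
```

Both lookups return references to the same stored objects, so the hotel data is held once no matter how many indexes point at it. This sketch does not clean up old index entries when an object is re-inserted under different keys; a production cache would need to.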

Data Virtualization – Enabling Next-generation Applications

The march toward real-time processing and the need for low-latency
response will be felt across various types of applications. To satisfy this
need, there will be increasing reliance on in-memory systems. Instead
of ad hoc measures, next-generation application development will be enabled by
the concept of a data virtualization layer that has the following attributes:

  • A conceptual layer in the data management stack that presents an
    instance of the data to the consumer without identifying the source. The
    physical data that is the system of record continues to reside in disk-based
    and other permanent storage systems
  • 100% memory-based, incorporating a continuum of solutions from
    distributed caching platforms to in-memory database systems
  • Supports producers and consumers across the enterprise and over a
    WAN. It should not matter where a producer or consumer is located; at any
    instant, a piece of data will appear the same to any entity interfacing with
    the layer
  • Supports a common interface for applications working with objects
    or using the relational paradigm

In general, most enterprise applications will rely on a
disk-based RDBMS as the system of record at the backend. How effectively the
needs of next-generation transactional applications are met will depend
on the ability to support a memory-based data virtualization layer and on the
level of integration of this layer with the backend systems. In this regard, it
will be critical for a data virtualization solution to consider optimizations
in data loading, cache invalidation and synchronization techniques, among other
strategies.
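
One simple way such a layer might load data and invalidate cached copies is sketched below: data is loaded from the backend on first access, and a write updates the backend and then drops the cached entry. The `ReadThroughCache` class, the dictionary standing in for the RDBMS and the flight-status key are hypothetical; real solutions would also have to handle concurrency, expiry and multi-node synchronization.

```python
class ReadThroughCache:
    """Load on first read; invalidate on write so later reads see fresh data."""

    def __init__(self, backend):
        self.backend = backend   # a dict standing in for the system of record
        self.cache = {}
        self.backend_reads = 0   # counts how often the backend is hit

    def get(self, key):
        if key not in self.cache:
            # Cache miss: load from the backend and remember the value.
            self.backend_reads += 1
            self.cache[key] = self.backend[key]
        return self.cache[key]

    def update(self, key, value):
        # Write to the system of record first, then invalidate the cached copy.
        self.backend[key] = value
        self.cache.pop(key, None)


# Hypothetical flight-status record.
layer = ReadThroughCache({"flight:XX100": "on time"})
first = layer.get("flight:XX100")
second = layer.get("flight:XX100")   # served from memory
reads_before_update = layer.backend_reads
layer.update("flight:XX100", "delayed")
third = layer.get("flight:XX100")    # reloaded after invalidation
```

Invalidating on write, and not updating the cached value in place, keeps the backend as the single source of truth at the cost of one extra backend read after each change.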

About the Author

Sumit Kundu

Senior Director of Product
Management, Sybase

Sumit Kundu is a senior director of product management
responsible for Sybase ASE and other data management products. Sumit’s
interests lie in working at the intersection of business and technology. He is
currently involved in leading a project to drive cross product synergies by
bringing together various data management products into a single platform to
address the business and technology needs of Sybase customers.

 

[1] Melanie Rodier, Wall Street Technology, Data Management
a Top Priority for Wall Street Firms, June 9, 2010

[2] Dan Pritchett, ACM Queue, May/June 2008

[3] Peter Dobler, Sybase ASE 15.5 – The Need for Speed,
