
New Paradigms to Overcome the Data Bottleneck in Next-Generation Transactional Applications

September 9, 2010

By Sumit Kundu

Meeting the needs of next-generation transactional applications will depend on the ability to support a memory-based data virtualization layer and on how well that layer is integrated with the backend systems. Read on to learn about the challenges with traditional transactional systems.

The Flash Crash on May 6, 2010 may have been caused by a trader’s fat fingers, or perhaps the real reason may never be known. But it highlights the need for more effective pre-trade compliance checks. With algorithmic trading, decimalization of stock quotes and other factors, trading volume and frequency have increased many-fold. The underlying data management infrastructure that handles fraud and compliance checks needs to be able to keep up with current levels of trading.

In today’s post-crisis financial markets, regulation has emerged as an important driver [1]. In this climate, reducing risk is a key business imperative, and the data management infrastructure that supports it will be critical. This support may involve giving the risk desk a complete position on intra-day trades and avoiding the situation where end-of-day risk calculations spill over into the following day’s trading window. Better still would be a real-time view of all the trades that every desk in the company makes at any instant, so that enterprise-wide risk can be understood at any time and acted on immediately.

Challenges with Traditional Transactional Systems

In general, new classes of applications need to serve data to larger numbers of users with lower response times, or to operate on short-lived data, which makes traditional transactional systems a poor fit for storing and accessing their data. Such applications can be found in a variety of markets:

  • Capital markets – Applications that access reference data, perform pre-trade compliance checks, order matching, pricing and risk calculations
  • E-commerce – Applications needing personalized content for better online experience, temporary storage of page views, and real-time inventory views
  • Telco – Real-time billing and pricing applications, location register for mobile networks
  • Mobile Marketing – Context aware applications for more targeted marketing using handheld devices

These situations point to the need for a high-performance, scalable data management infrastructure. High-frequency transactions create a bottleneck at the data server layer due to a number of factors. One constraint on the performance of traditional OLTP systems is the reliance on the disk subsystem, and the optimizations done to reduce the impact of disk I/O do not go far enough. Another constraint with traditional RDBMS systems is data contention, and horizontal scaling techniques, such as sharding, have to be employed to ensure that performance does not degrade as more users are added.
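
As a rough illustration of the sharding point, the minimal Java sketch below routes each logical key (an account id here) to one of several database shards so that no single server absorbs all the contention. The shard URLs and key format are made up for the example:

    import java.util.List;

    // Minimal sketch of hash-based sharding: each logical key is routed to
    // one of several database shards so no single server absorbs all the
    // reads and writes. The shard URLs are illustrative.
    public class ShardRouter {

        private final List<String> shardUrls;

        public ShardRouter(List<String> shardUrls) {
            this.shardUrls = shardUrls;
        }

        // floorMod keeps the bucket non-negative even for negative hash codes.
        public String shardFor(String accountId) {
            int bucket = Math.floorMod(accountId.hashCode(), shardUrls.size());
            return shardUrls.get(bucket);
        }

        public static void main(String[] args) {
            ShardRouter router = new ShardRouter(
                    List.of("jdbc:db://shard0", "jdbc:db://shard1", "jdbc:db://shard2"));
            // The same account always maps to the same shard.
            System.out.println(router.shardFor("ACCT-10045"));
        }
    }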

With the increase in web-based applications and growth in e-commerce, there is a critical need to manage session information, which can also hold contextual data that is accessed frequently. Consider the world of a frequent airline traveler with a mobile device as the preferred interface to the digital world. An airline application could detect when the traveler has missed his or her connecting flight and provide information on alternate connecting flights or local hotels the instant the traveler lands at a connecting airport. In such situations, much of the computation involves working with temporary data, and a round trip to the database to retrieve it adds overhead that keeps such systems from scaling to meet the performance requirements. Further, as the state information changes while the user accesses new sites, the performance problem compounds because of the writes involved.
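
A minimal sketch of keeping such short-lived session and contextual data out of the database’s critical path is shown below. The session contents, class names and expiry policy are illustrative, and a recent JDK (with record support) is assumed:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Holds short-lived session/context state in memory and simply lets it
    // expire, instead of writing it to the database on every interaction.
    public class SessionStore {

        private record Entry(Map<String, String> attributes, long expiresAtMillis) {}

        private final Map<String, Entry> sessions = new ConcurrentHashMap<>();
        private final long ttlMillis;

        public SessionStore(long ttlMillis) {
            this.ttlMillis = ttlMillis;
        }

        public void put(String sessionId, Map<String, String> attributes) {
            sessions.put(sessionId, new Entry(attributes, System.currentTimeMillis() + ttlMillis));
        }

        // Returns null when the session is unknown or has expired.
        public Map<String, String> get(String sessionId) {
            Entry entry = sessions.get(sessionId);
            if (entry == null || entry.expiresAtMillis() < System.currentTimeMillis()) {
                sessions.remove(sessionId);
                return null;
            }
            return entry.attributes();
        }
    }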

In the airline example above, one can expect a spike in workload when flights are cancelled due to bad weather or a terrorist threat. A traditional RDBMS may not be able to support the level of user scalability required in such cases. Traditional OLTP systems strictly adhere to ACID (atomicity, consistency, isolation, durability) properties. While this makes life easier for application developers, the need to achieve consistency across partitioned databases limits the scaling that can be achieved. In developing web applications with partitioned data, some of the consistency requirements can be sacrificed to achieve scaling [2]. By following the BASE (basically available, soft state, eventually consistent) paradigm, web applications can achieve a higher degree of scaling by accepting that data consistency can be in a state of flux. However, taking advantage of this paradigm requires the application developer to understand the operations within a transaction well enough to know which consistency guarantees can be relaxed.
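
To make the BASE idea concrete, the illustrative Java sketch below acknowledges a write as soon as the local copy is updated and brings a replica up to date asynchronously, so a reader of the replica may briefly see stale data. The wiring is deliberately reduced to two in-process maps:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // A write is acknowledged once the local copy is updated; the replica is
    // updated asynchronously, so reads from it may lag (eventual consistency).
    public class EventuallyConsistentStore {

        private final Map<String, String> local = new ConcurrentHashMap<>();
        private final Map<String, String> replica = new ConcurrentHashMap<>();
        private final ExecutorService replicator = Executors.newSingleThreadExecutor();

        public void put(String key, String value) {
            local.put(key, value);                              // acknowledged immediately
            replicator.submit(() -> replica.put(key, value));   // applied some time later
        }

        public String readFromReplica(String key) {
            return replica.get(key);                            // may briefly be stale
        }

        public void shutdown() {
            replicator.shutdown();
        }
    }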

In-memory to the Rescue

But developers accustomed to developing traditional transactional applications may insist on the ACID properties being there. So what can be done? Fortunately, there are a few alternate paradigms that could be considered.

First, to avoid the slow disk subsystem, we can consider in-memory database systems, which can provide lower latency and increased throughput. If data is temporary and does not need to be saved, then increased performance can be gained without the overhead of saving data to disk [3]. But even when durability is required, applications can gain performance where workloads are compute intensive or involve mostly read transactions. Pre-trade compliance checks are a good example where in-memory databases can provide a significant performance boost. The airline traveler or the user of the e-commerce application would also experience faster response times from such a solution. A key value proposition here is being able to boost performance at the flip of a switch without requiring any changes to an existing application.
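
As an illustration of why pre-trade checks suit an in-memory approach, the sketch below (with made-up limit and position structures) reduces each check to a map lookup and a comparison rather than a disk-bound query:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Per-account limits and positions live in memory, so each pre-trade
    // check is a lookup plus a comparison.
    public class PreTradeCheck {

        private final Map<String, Long> maxPositionByAccount = new ConcurrentHashMap<>();
        private final Map<String, Long> currentPositionByAccount = new ConcurrentHashMap<>();

        public void setLimit(String account, long maxPosition) {
            maxPositionByAccount.put(account, maxPosition);
        }

        public void recordFill(String account, long quantity) {
            currentPositionByAccount.merge(account, quantity, Long::sum);
        }

        // Reject any order that would push the account past its position limit.
        public boolean allow(String account, long orderQuantity) {
            long current = currentPositionByAccount.getOrDefault(account, 0L);
            long limit = maxPositionByAccount.getOrDefault(account, 0L);
            return current + orderQuantity <= limit;
        }
    }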

An alternate way to achieve lower latency and increased throughput while maintaining ACID is to take advantage of distributed caching platforms (DCP). Such systems are memory-based and lie at the mid-tier level, beyond the boundaries of the traditional RDBMS. There are several advantages to this approach. First, by moving the data closer to the application, we can avoid the overhead of multiple hops to retrieve the data and significantly improve the response times. Second, we can enable new application development paradigms, such as object-oriented programs developed using a Java framework, by avoiding the programming and translation overheads between object and relational models. Finally, by avoiding the bottleneck at the RDBMS layer, distributed caching platforms can provide elastic scaling that is a key value proposition for cloud computing.
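
The following sketch suggests the programming model such platforms typically expose: the application puts and gets its own domain objects by key, with no object-to-relational translation on the hot path. A real DCP would partition and replicate the data across nodes; here a single in-process map stands in for that layer, and all names are illustrative:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // A mid-tier object cache: domain objects are stored and retrieved by key,
    // close to the application, without mapping them to relational rows.
    public class ObjectCache<K, V> {

        private final Map<K, V> entries = new ConcurrentHashMap<>();

        public void put(K key, V value) {
            entries.put(key, value);
        }

        public V get(K key) {
            return entries.get(key);
        }

        public static void main(String[] args) {
            record Instrument(String symbol, String exchange) {}   // illustrative domain object

            ObjectCache<String, Instrument> cache = new ObjectCache<>();
            cache.put("IBM", new Instrument("IBM", "NYSE"));
            System.out.println(cache.get("IBM"));                  // served from memory
        }
    }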

The lookup of information on hotels near a certain airport in the frequent airline traveler example above can be done more efficiently using DCPs. Commonly referenced data can be stored in the cache once and referenced by different keys using different indexes from different processes, saving the need to do the lookup in the database for each process. Further, DCPs can be used to distribute an application's cached data across multiple processes and make application development easier where such a distributed architecture is needed. While the RDBMS can continue to be the system of record, DCPs complement it by serving as the synchronization layer for transaction control.
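
A rough sketch of the "store once, reference by several indexes" idea follows, using made-up hotel reference data and assuming a recent JDK: the record is kept a single time, keyed by hotel id, and a secondary map from airport code to hotel ids answers the nearby-hotels lookup without another trip to the database:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // One copy of each hotel record, reachable through a secondary index.
    public class HotelReferenceCache {

        public record Hotel(String hotelId, String name, String airportCode) {}

        private final Map<String, Hotel> byId = new ConcurrentHashMap<>();
        private final Map<String, Set<String>> idsByAirport = new ConcurrentHashMap<>();

        public void add(Hotel hotel) {
            byId.put(hotel.hotelId(), hotel);                      // single copy of the record
            idsByAirport.computeIfAbsent(hotel.airportCode(), k -> ConcurrentHashMap.newKeySet())
                        .add(hotel.hotelId());                     // secondary index holds keys only
        }

        public List<Hotel> nearAirport(String airportCode) {
            List<Hotel> result = new ArrayList<>();
            for (String id : idsByAirport.getOrDefault(airportCode, Set.of())) {
                result.add(byId.get(id));
            }
            return result;
        }
    }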

Data Virtualization – Enabling Next-generation Applications

The march towards real-time processing and the need for low-latency responses will be felt across various types of applications. To satisfy this need, there will be increasing reliance on in-memory systems. Instead of ad-hoc measures, next-generation application development will be enabled by the concept of a data virtualization layer with the following attributes:

  • A conceptual layer in the data management stack that presents an instance of the data to the consumer without identifying the source. The physical data that is the system of record continues to reside in disk-based and other permanent storage systems
  • 100% memory-based, incorporating a continuum of solutions from distributed caching platforms to in-memory database systems
  • Supports producers and consumers across the enterprise and over a WAN. It should not matter where a producer or consumer is located, but at any instant, a piece of data will appear the same to any entity interfacing with the layer
  • Supports a common interface for applications working with objects or using the relational paradigm (a minimal sketch of such an interface follows this list)
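
The sketch below is one illustrative shape such a common interface could take; the interface and method names are assumptions, not a specific product's API:

    import java.util.Map;

    // The same logical record can be retrieved either as a typed object or as
    // a row-like map of column names to values, without the caller knowing
    // whether it came from a mid-tier cache, an in-memory database or the
    // backend RDBMS.
    public interface VirtualizedData {

        // Object view: map the record onto the caller's domain type.
        <T> T getObject(String key, Class<T> type);

        // Relational view: return the same record as column-name/value pairs.
        Map<String, Object> getRow(String key);
    }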

In general, most enterprise applications will continue to rely on a disk-based RDBMS as the system of record at the backend, but how effectively the needs of next-generation transactional applications are met will depend on the ability to support the memory-based data virtualization layer and on the level of integration between this layer and the backend systems. In this regard, it will be critical for a data virtualization solution to consider optimizations in loading data, cache invalidation and synchronization techniques, among other strategies.
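
As a simple illustration of two of those integration concerns, the following sketch combines read-through loading (a cache miss falls back to the system of record) with explicit invalidation when the backend copy changes. The loader function stands in for a real database query; the names are illustrative:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    // Read-through cache over a system of record, with explicit invalidation.
    public class ReadThroughCache<K, V> {

        private final Map<K, V> cache = new ConcurrentHashMap<>();
        private final Function<K, V> loadFromSystemOfRecord;

        public ReadThroughCache(Function<K, V> loadFromSystemOfRecord) {
            this.loadFromSystemOfRecord = loadFromSystemOfRecord;
        }

        public V get(K key) {
            // Load and cache on a miss; subsequent reads are served from memory.
            return cache.computeIfAbsent(key, loadFromSystemOfRecord);
        }

        public void invalidate(K key) {
            cache.remove(key);   // call when the backing row changes
        }
    }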

About the Author

Sumit Kundu

Senior Director of Product Management, Sybase

Sumit Kundu is a senior director of product management responsible for Sybase ASE and other data management products. Sumit's interests lie in working at the intersection of business and technology. He is currently involved in leading a project to drive cross product synergies by bringing together various data management products into a single platform to address the business and technology needs of Sybase customers.

 

[1] Melanie Rodier, Data Management a Top Priority for Wall Street Firms, Wall Street Technology, June 9, 2010

[2] Dan Pritchett, BASE: An ACID Alternative, ACM Queue, May/June 2008

[3] Peter Dobler, Sybase ASE 15.5 – The Need for Speed
