The Guiding Principles for Cloud-scale, Geo-distributed Databases

Today’s world — the digital economy — runs in the cloud. Most consumers, including many business executives, don’t know much about the inner workings of the cloud or its architecture even though they expect a lot from it.

We expect clouds — private, public and hybrid — to perform and deliver information quickly, even if the data centers are scattered around the world. We expect the cloud to be up and running, even if a hurricane or major power outage hits a multi-state area. We expect the data and transactions processed in the cloud to be secure, consistent and durable, especially financial transactions and records. Some governments are also mandating that records accessible via the cloud only reside in-country when on-disk.

These expectations dictate better cloud architectures and applications. These demands also drive the need for databases that scale in the cloud, can be geographically distributed, scale “horizontally” to quickly add compute power without being expensive and have transactional integrity that is common to relational databases. These drivers have given rise to geo-distributed databases.

Briefly, a geo-distributed database is a database spread across two or more geographically distinct locations and runs without degraded transaction performance.

But beyond this base definition of a geo-distributed database, what should business leaders, information technology (IT) managers and others involved with migrating systems to the cloud know about geo-distributed databases?

Here are three key considerations to keep in mind about geographically distributed databases that support cloud-scale systems.

Horizontally Scalable 

To easily accommodate compute-load spikes (or declines), a cloud database must be able to quickly scale horizontally. That way you can add commodity servers to handle more capacity without having to shut down and migrate workloads. Past computing architectures were designed to scale vertically — to increase capacity, add a bigger, more powerful server, even if that meant some downtime. Relational databases can sometimes scale horizontally, but the amount is often limited and it typically means more software, more administration, and costly hardware.

Clouds are typically built using low-cost servers, so when more capacity is needed, you just add more of those commodity boxes to create more capacity. This is true both with public and private cloud architectures. Naturally, this makes it easy to provision more hardware.

Transactional Consistency

Cloud databases also should be transactionally consistent. If a database has been sliced up or “partitioned” as part of its geo-distribution, it is more challenging for cloud applications to maintain transactional integrity.

A better way is to always maintain transactional consistency with a database architecture that separates transaction management from data storage and presents what in effect is a single logical database for applications, even when geographically distributed. Most relational databases are transactionally consistent, which is often referred to as “ACID” compliance, for “atomicity, consistency, isolation & durability”. These properties guarantee that transactions are processed reliably. So two questions to keep in mind with a cloud database are — can it be geographically distributed and is it ACID compliant?

Data Residency

Another benefit of a database that looks like one logical database even when it is geographically distributed is that data residency can be established at a specific location. This property supports compliance with data sovereignty laws cropping up in some countries that mandate that data about a citizen should remain in-country, even if the person or other authorized users can access the data via the cloud outside that country. When you start slicing and dicing a database architecture to support geo-distribution, data residency can be harder to achieve.

We know that today’s global, digital economy requires cloud-scale systems that can be geo-distributed. Keep these three considerations in mind when thinking about the pitfalls and capabilities you need in that world.

In conclusion, if you are looking to move your enterprise application to the cloud, look for a transactional SQL database that will run simultaneously in various global locations, deliver information in milliseconds and provide real-time transactional consistency – no matter where the user is located.

Charles Lamb

Charles Lamb
Chief Architect

With over 30 years of experience in the database industry, Charles joined NuoDB in 2015. Since then he has spent time understanding the issues that customers face in the areas of geo-distributed data management, horizontal scalability, data residency, and cluster management, all with an eye towards future product direction of NuoDB.

Charles’ experience includes DBMS implementation and architecture, log structured storage systems, embedded and distributed key-value databases, and Hadoop. He has presented at Strata, Hadoop Summit, HPTS, Oracle OpenWorld, HPCS, and JavaOne. He is a Hadoop contributor and has authored papers for CACM.

Prior to joining NuoDB, Charles worked at Cloudera on the Transparent Data Encryption feature of the Hadoop Distributed File System (HDFS).  Before then, he was an architect and developer at Oracle for their NoSQL database, which is a horizontally scalable distributed key-value store. 

At Sleepycat (acquired by Oracle in 2006) he was co-architect of the Berkeley DB Java Edition. From 1988 to 1999, Charles was a founder at Object Design, where he helped design and implement ObjectStore, a memory-mapped object database. Prior to that, at Symbolics, he was a developer of Statice, one of the earliest object databases.

Charles has an S.B. and S.M. in Computer Science from Massachusetts Institute of Technology.

Latest Articles