Global distribution of updatable data stores is one of the more challenging goals that software designers and architects have grappled with over the years. Addressing this challenge has become increasingly pressing as the rapid growth of cloud technologies has increased the need for deployments capable of spanning multiple geographical areas. Azure Cosmos DB addresses this need by offering built-in global distribution functionality with a single write region and multiple, automatically synchronized read regions. In addition, the introduction of multi-master support promises to significantly change the approach to building highly available and performant data stores. In this article, we provide an introduction to global distribution-related functionality in Cosmos DB.
Traditionally, the majority of database management systems limited the scope of their geo-replication to cross-region disaster recovery. Azure Cosmos DB offers this capability as part of its core functionality. This eliminates the need to develop custom replication mechanisms, while at the same time ensuring that the resulting implementation provides high availability (99.999% for multi-region database accounts) and complies with Cosmos DB performance guarantees. These guarantees stipulate read latency within the 10 ms range at the 99th percentile for designated database operations (including reads and indexed writes), in accordance with the customer-designated consistency level. Similarly, throughput guarantees apply consistently across all replicas in each region.
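To make the percentile guarantee concrete, here is a minimal sketch of how a nearest-rank 99th percentile can be computed over a latency sample. The numbers below are synthetic, chosen purely for illustration; they are not Azure telemetry:

```python
# Nearest-rank percentile over a synthetic set of read latencies (in ms).
# The sample values are made up to illustrate what a "10 ms at the 99th
# percentile" guarantee means; they are not Azure measurements.

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest value covering pct% of the sample."""
    ordered = sorted(samples)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [2, 3, 3, 4, 4, 4, 5, 5, 6, 9]
print(percentile(latencies_ms, 99))  # 9 -> this sample would meet a 10 ms p99 target
```

In other words, 99 out of every 100 designated operations are expected to complete within the stated bound, with no guarantee about the slowest 1%.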
In addition, since Cosmos DB is considered a so-called foundational service, customers can be assured of its availability in every existing and newly provisioned Azure region. Furthermore, there is no limit on the number of regions with which you can associate a single Azure Cosmos DB account (although obviously, there are pricing implications that we will describe later in this article). Newly designated regions become available for reads within 30 minutes for accounts storing up to 100 TB of data.
Cosmos DB also supports policy-based geo-fencing, facilitating compliance with data governance and residency requirements of its customers. By leveraging these policies, it is possible, for example, to limit the scope of a global distribution to specific regions within national clouds (Azure China, Azure Germany, or Azure Government). This also accounts for scenarios where customers must comply with government tax laws (such as those applicable to business entities utilizing Australia-based regions).
You can enable global distribution in a few easy steps directly from the Azure portal (alternatively, you can configure global distribution by using the REST API). Once you select the Replicate data globally option from the Cosmos DB account blade, you will be presented with the interface from which you can select additional locations to designate as read regions. Designating additional regions not only minimizes the latency of reads originating from users and applications in proximity to those regions, but also allows you to implement disaster recovery by configuring either manual or automatic failover. Automatic failover, configurable from the same interface, is based on the priorities you assign to each read region. There are two main automatic failover scenarios:
- a failure of a read region – the failed read region is placed in the offline state. Data access tier applications that use the Cosmos DB SDK rely on a discovery protocol to determine whether the region is unavailable and, if so, automatically fail over to the next available region on the preferred region list they maintain. Once the failed region is restored to an operational state, the platform automatically recovers the local replica of the Cosmos DB account, resynchronizes its content with the write region, and restores its read region status with the original priority. Data access tier applications detect the change and perform subsequent reads from the newly recovered region.
- a failure of the write region – the write region is placed in the offline state and its role automatically transitions to the read region with the highest priority (that is, the lowest priority value). Data access tier applications that use the Cosmos DB SDK can programmatically detect the change (by using the WriteEndpoint property of the DocumentClient class) and switch to the newly designated write region. Once the failed region is restored to an operational state, the platform automatically recovers the local replica of the Cosmos DB account and configures it as a read region. Any local data that was not replicated to the read regions before the automatic failover is published in the form of a conflict feed. It is the customers' responsibility to remediate any pending conflicts by analyzing the feed's content. Customers also have the option of failing back to the original configuration by performing a manual failover.
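The two failover scenarios above boil down to simple selection rules: reads fall back through a preferred-region list, and the write role moves to the online region with the lowest priority value. The following sketch models that logic in plain Python; it is illustrative only, not SDK code, and the region names and priorities are made up:

```python
# Illustrative model of priority-based automatic failover (not SDK code).
# Region names and failover priorities are invented for this example.

REGIONS = [
    {"name": "West US",      "priority": 0, "online": True},   # write region
    {"name": "East US",      "priority": 1, "online": True},
    {"name": "North Europe", "priority": 2, "online": True},
]

def read_region(preferred, regions):
    """Return the first region on the preferred list that is online."""
    online = {r["name"] for r in regions if r["online"]}
    for name in preferred:
        if name in online:
            return name
    raise RuntimeError("no region available for reads")

def write_region(regions):
    """The online region with the lowest priority value holds the write role."""
    return min((r for r in regions if r["online"]), key=lambda r: r["priority"])["name"]

# Scenario 1: a read region fails -> reads move to the next preferred region.
REGIONS[1]["online"] = False
assert read_region(["East US", "North Europe"], REGIONS) == "North Europe"

# Scenario 2: the write region fails -> the highest-priority read region takes over.
REGIONS[1]["online"] = True
REGIONS[0]["online"] = False
assert write_region(REGIONS) == "East US"
```

The real platform layers replication, the discovery protocol, and the conflict feed on top of this, but the priority ordering is the core of how the failover target is chosen.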
Note that Cosmos DB provides an upper bound on the maximum duration of an automatic failover. This, in turn, helps you determine the Recovery Time Objective (RTO) of a regional failover.
Currently, in order to facilitate multiple write regions for applications that rely on Cosmos DB as their data store, you have to partition your data across separate database accounts (obviously, this workaround relies on designing and implementing a suitable partitioning scheme). For an example illustrating this approach, refer to the Cosmos DB online documentation. However, there is a promise of a solution to this challenge in the form of the Cosmos DB multi-master capability (currently in preview), which will provide support for multiple write regions with globally distributed containers within the same account. For an overview of this functionality, refer to the Microsoft Docs article Multi-master at global scale with Azure Cosmos DB. We will describe its implementation in upcoming articles.
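Until multi-master becomes generally available, the workaround amounts to routing each write to the account whose single write region is closest to the caller. Here is a hypothetical sketch of such routing; the account names, endpoints, and routing table are invented for illustration:

```python
# Hypothetical sketch of the documented workaround: spreading writes across
# several Cosmos DB accounts, each with its own single write region.
# All names and endpoints below are made up for illustration.

ACCOUNTS = {
    "emea": "https://contoso-emea.documents.azure.com",  # write region: North Europe
    "apac": "https://contoso-apac.documents.azure.com",  # write region: Southeast Asia
    "amer": "https://contoso-amer.documents.azure.com",  # write region: East US
}

def account_for(user_region):
    """Route a write to the account whose write region is closest to the user."""
    routing = {"de": "emea", "fr": "emea", "jp": "apac", "au": "apac",
               "us": "amer", "ca": "amer"}
    return ACCOUNTS[routing.get(user_region, "amer")]  # default to the Americas account

print(account_for("jp"))  # the APAC endpoint handles writes from Japan
```

The obvious cost of this approach is that the routing scheme, cross-account consistency, and any cross-partition queries become the application's responsibility, which is precisely what multi-master aims to eliminate.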
Last, but not least, when considering global distribution, you need to take its pricing into account. Charges include the cost of the additional storage allocated to each Cosmos DB container and of the throughput provisioned in each region. Effectively, costs increase linearly with each additional region.
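As a rough illustration of that linear scaling, consider the following back-of-the-envelope calculation. The unit prices are placeholders, not actual Azure rates:

```python
# Back-of-the-envelope sketch of how geo-replication multiplies cost.
# The unit prices below are placeholders, not actual Azure rates.

HOURS_PER_MONTH = 730  # average hours in a billing month

def monthly_cost(regions, storage_gb, provisioned_ru,
                 price_per_gb=0.25, price_per_100ru_hour=0.008):
    """Storage and throughput are billed in every region, so the total
    scales linearly with the number of regions."""
    per_region = (storage_gb * price_per_gb
                  + (provisioned_ru / 100) * price_per_100ru_hour * HOURS_PER_MONTH)
    return regions * per_region

single = monthly_cost(regions=1, storage_gb=100, provisioned_ru=1000)
triple = monthly_cost(regions=3, storage_gb=100, provisioned_ru=1000)
assert triple == 3 * single  # each added region adds one full region's cost
```

The practical takeaway is that replicating to two additional regions roughly triples the storage and throughput bill, so each region should earn its keep in latency or availability terms.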