In a recent article published on this forum, we described a range of disaster recovery capabilities that support business continuity for Azure SQL Database deployments. While business continuity is critically important, high availability is arguably even more relevant in day-to-day operations. Site-wide or region-wide disasters, while undoubtedly more impactful, occur far less frequently than downtime caused by scheduled maintenance events or isolated hardware and software faults. In this article, we provide an overview of the high availability features of Azure SQL Database that mitigate risks affecting services within an individual Azure datacenter, rather than an entire Azure region.
The specifics of how these features are delivered depend on the service tier, which in turn is tied to the purchasing model. At the time of writing, Azure SQL Database is available in the following purchasing models and service tiers (for more information, refer to Azure SQL Database purchasing models and resources):
- Database Transaction Unit (DTU)-based purchasing model, with the following service tiers:
  - Basic
  - Standard
  - Premium
- vCore-based purchasing model, with the following service tiers:
  - General Purpose
  - Business Critical
The DTU-based purchasing model is available only for Azure SQL Database single databases and elastic pools. The vCore-based purchasing model is available for single databases, elastic pools, and Managed Instances (note that Managed Instances are in public preview at the time of writing).
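To make these combinations concrete, here is a minimal Python sketch that records them as a lookup table and checks whether a given model, tier, and deployment option appear in it. The data reflects only what this article lists at the time of writing; the names `PURCHASING_MODELS` and `is_supported` are hypothetical, and the check treats tiers and deployment options independently, so it does not capture any per-deployment tier restrictions.

```python
# Snapshot of the options described in this article, at the time of writing.
# Illustrative data only, not an authoritative or current catalog.
PURCHASING_MODELS = {
    "DTU": {
        "tiers": {"Basic", "Standard", "Premium"},
        "deployments": {"single database", "elastic pool"},
    },
    "vCore": {
        "tiers": {"General Purpose", "Business Critical"},
        "deployments": {"single database", "elastic pool", "managed instance"},
    },
}


def is_supported(model: str, tier: str, deployment: str) -> bool:
    """Return True if the model/tier/deployment combination appears in the snapshot."""
    options = PURCHASING_MODELS.get(model)
    if options is None:
        return False  # unknown purchasing model
    return tier in options["tiers"] and deployment in options["deployments"]


print(is_supported("vCore", "Business Critical", "managed instance"))  # True
print(is_supported("DTU", "Premium", "managed instance"))  # False
```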
Regardless of the purchasing model or service tier, Azure SQL Database is a fully managed service backed by a 99.99% availability Service Level Agreement (SLA). However, the choice of service tier does influence failover time, which is an important factor when evaluating the availability of mission-critical workloads. This difference in failover times results from the two distinct architectural models on which the service tiers are based.
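To put a 99.99% availability target in perspective, the short Python sketch below converts an availability percentage into the maximum downtime it permits over a period. It assumes a simple 30-day month and a 365-day year for illustration; the actual SLA defines its own measurement period and terms, and the function name `max_downtime_minutes` is hypothetical.

```python
def max_downtime_minutes(sla_percent: float, period_days: float) -> float:
    """Maximum downtime (in minutes) permitted by an availability percentage over a period."""
    total_minutes = period_days * 24 * 60
    # The unavailable fraction is whatever the availability percentage does not cover.
    return total_minutes * (1 - sla_percent / 100)


print(round(max_downtime_minutes(99.99, 30), 2))   # 4.32 minutes per 30-day month
print(round(max_downtime_minutes(99.99, 365), 2))  # 52.56 minutes per year
```

At this level, a handful of multi-second failovers consumes only a small part of the monthly budget, which is why failover duration matters more to individual mission-critical workloads than to the SLA figure itself.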
In particular, the Basic and Standard service tiers of the DTU-based purchasing model, as well as the General Purpose service tier of the vCore-based purchasing model, separate the compute and storage components. The compute components are implemented as Azure Service Fabric microservices that are either stateless or provide caching functionality. Storage for database and transaction log files is provided by Azure Premium Storage, which ensures data persistence. With this architecture, a failure of the underlying infrastructure or a maintenance event triggers failover of the stateless compute components to another physical node, resulting in a temporary service interruption (typically lasting no more than a few seconds). This interruption might briefly degrade the performance of workloads that use the affected database.
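Because such a failover can drop open connections for a few seconds, client applications commonly retry transient failures rather than failing outright. The following Python sketch shows generic retry logic with capped exponential backoff and jitter. It does not use any Azure or database driver API: `TransientError` is a hypothetical stand-in for whatever error your driver raises during a failover, and the delay values are illustrative assumptions rather than recommended settings.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a driver error raised while a failover is in progress."""


def with_retries(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying TransientError with capped exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Double the delay after each failure, but never exceed max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter keeps many clients from reconnecting in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")


# Usage: an operation that fails twice (as if during a failover), then succeeds.
calls = {"n": 0}


def flaky_query() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("connection dropped during failover")
    return "ok"


print(with_retries(flaky_query, sleep=lambda seconds: None))  # ok
```

The injectable `sleep` parameter is a design choice that lets the retry policy be exercised without real waiting; in production code you would leave it at its default.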
The architectural model of the Premium service tier of the DTU-based purchasing model and the Business Critical service tier of the vCore-based purchasing model uses the compute and local Solid State Disk (SSD) storage of the same node. To ensure resiliency, this model forms a four-node Always On Availability Group cluster with synchronously replicating secondaries. If the primary node fails, failover completes in the range of milliseconds. Besides practically eliminating downtime, this approach also makes it possible to use read-only secondaries for workloads that do not need to perform any writes (such as analytics or reporting). Such applications indicate their read-only access pattern by setting the ApplicationIntent property in their database connection string. This offloads their queries from the primary replica of the availability group and minimizes its load.
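As an illustration of declaring read-only intent, the Python sketch below builds an ADO.NET-style connection string and sets `ApplicationIntent=ReadOnly` for a reporting workload. The server and database names are hypothetical, credentials are deliberately omitted, and the function `build_connection_string` is an illustrative helper, not part of any library. Other drivers (ODBC, JDBC) expose the same intent through similar but not necessarily identical keyword syntax, so check your driver's documentation.

```python
def build_connection_string(server: str, database: str, read_only: bool) -> str:
    """Build an ADO.NET-style connection string for Azure SQL Database (credentials omitted)."""
    settings = {
        "Server": f"tcp:{server}.database.windows.net,1433",
        "Database": database,
        "Encrypt": "True",
        # ReadOnly lets the service route the session to a read-only secondary replica.
        "ApplicationIntent": "ReadOnly" if read_only else "ReadWrite",
    }
    return "".join(f"{key}={value};" for key, value in settings.items())


# Hypothetical server and database names.
reporting = build_connection_string("contoso-sql", "salesdb", read_only=True)
orders = build_connection_string("contoso-sql", "salesdb", read_only=False)
print(reporting)
# Server=tcp:contoso-sql.database.windows.net,1433;Database=salesdb;Encrypt=True;ApplicationIntent=ReadOnly;
```

Keeping two connection strings, one per workload type, makes the routing decision explicit in application configuration rather than scattered through query code.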
If high availability is your primary concern, you also have the option of placing individual nodes of Premium and Business Critical Azure SQL Database deployments into separate availability zones (this functionality is in public preview at the time of writing). This effectively protects you from issues that affect a single datacenter within a multi-zone Azure region. However, spanning an Always On Availability Group cluster across multiple zones has performance implications. Since the primary node replicates to its secondaries synchronously, the greater network distance between them lengthens the time it takes to commit individual writes.
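The effect of distance on synchronous commits can be illustrated with a deliberately simplified latency model, sketched below in Python. It assumes that each secondary acknowledges after one network round trip plus its own log flush, that the local flush overlaps with replication, and that a commit waits for a given number of acknowledgments (`required_acks`). These are modeling assumptions, not a description of the service's actual commit protocol, and all latency figures are hypothetical.

```python
from typing import Sequence


def commit_latency_ms(
    local_flush_ms: float,
    secondary_rtts_ms: Sequence[float],
    remote_flush_ms: float,
    required_acks: int,
) -> float:
    """Estimate commit latency under a simplified synchronous-replication model."""
    if not 1 <= required_acks <= len(secondary_rtts_ms):
        raise ValueError("required_acks must be between 1 and the number of secondaries")
    # Each secondary acknowledges after one network round trip plus its own log flush.
    ack_times = sorted(rtt + remote_flush_ms for rtt in secondary_rtts_ms)
    # Assume local flush and replication overlap, so the commit waits for the slower of
    # the local flush and the required_acks-th fastest acknowledgment.
    return max(local_flush_ms, ack_times[required_acks - 1])


# Hypothetical round-trip times, chosen only to illustrate the trend.
same_zone = commit_latency_ms(0.5, [0.5, 0.5, 0.5], 0.5, required_acks=1)
cross_zone = commit_latency_ms(0.5, [1.5, 2.0, 2.0], 0.5, required_acks=1)
print(same_zone, cross_zone)  # 1.0 2.0
```

Whatever the exact acknowledgment policy, the network round trip sits on the commit path, so any increase in inter-replica distance shows up directly in write latency.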
This concludes our overview of high availability features of Azure SQL Database. In our upcoming articles, we will continue describing features that contribute to the value of this offering.