In one of our earlier articles published on this forum, we have presented a range of capabilities that enhance business continuity of Azure SQL Database deployments. That article is, at this point, over a year old, which constitutes a rather long time when you consider the rapid rate at which cloud technologies change. Since then, Microsoft not only enhanced the existing functionality, but also introduced a number of new features that facilitate Azure SQL Database business continuity. The purpose of this article is to provide their overview.
From the business continuity standpoint, each of the current pricing tiers of Azure SQL Database still supports the same set of options, including:
- Point in Time Restore from backup – restores leverage automatic Point in Time Backups, which take place weekly (full), hourly (differential), and every five to ten minutes (transaction log). All pricing tiers provide the same restore experience and characteristics. The only distinction is the retention period, which lasts 35 days for all pricing tiers with the exception of Basic, for which the retention period is limited to 7 days.
- Geo-restore from geo-replicated backups – restores leverage built-in replication of full and differential automatic backups to another Azure region. This provides Recovery Point Objective (RPO) within a one-hour window and Estimated Recovery Time (ERT) within a 12-hour window.
- Restore from Azure Backup Vault – restores leverage the ability to schedule backups of databases to Azure Backup Vault in cases where the maximum retention period of Point in Time Backups is not sufficiently long. Since this capability is based on full backups, its RPO is one week.
- Active Geo-Replication allows you to create in arbitrarily chosen Azure regions up to four readable database replicas, which replicate asynchronously from the writable instance in a primary region. This capability offers the best ERT and RPO of 30 seconds and 5 seconds, respectively.
It is worth noting that in addition to previously available Basic, Standard, and Premium pricing tiers, which are part of the DTU-based purchasing model, customers now also have the option of choosing General Purpose and Business Critical pricing tiers, which are part of the new vCore-based purchasing model. However, this does not affect the business continuity options described above.
When using Active Geo-Replication, the processes of creating replicas and performing failover between them are quite straightforward and can be performed directly from the Azure portal. However, with the original approach, such failover requires a manual intervention. In particular, you have to designate one of the secondary replicas to take over the role of the primary. In addition, since replicas reside on different logical servers and each server has a unique endpoint, you have to also update connection strings in your applications in order account for the new writeable endpoint.
To address these limitations, Microsoft introduced auto-failover groups, which as their name indicate, facilitate automatic failover. To implement this new approach, you create a group consisting of two servers in different Azure regions, with each server containing one or more databases that you intend to include in the scope of automatic failover. A server can be a part of multiple failover groups. All primary databases in any given group must be located on the same server. Adding a database residing on the primary server to the group will trigger creation of a readable replica on the secondary server. If the primary database is part of an elastic pool, then the secondary server will place its replica into a pool with the matching name.
The failover group exposes two listeners:
- read-write listener, which is referenced by a CNAME DNS record in the format <failover-group-name>.database.windows.net. The listener is auto-configured to always point to the primary server of the group.
- read-only listener, which is referenced by a CNAME DNS record in the format <failover-group-name>.secondary.database.windows.net. The listener is auto-configured to always point to the secondary server of the group. The availability of read-only listener facilitates scenarios in which you have applications or services that rely on read-only access to SQL Database and can tolerate minor differences between the content of the primary and secondary databases, inherent to the asynchronous nature of their replication.
The group failover behavior is controlled by two policies:
- Automatic failover policy. This policy controls automatic failover of the read-write listener in case of a failure of the primary server. Its GracePeriodWithDataLossHours parameter, referenced in the Azure portal as Read/Write Grace Period (hours), designates the period of time (up to 24 hours) following a failure of the primary after which the automatic failover should take place. By default, this policy is enabled, however, considering that a failover introduces the possibility of data loss, you have the option of disabling it and controlling failovers manually. Such an approach still allows you to benefit from the capabilities offered by group listeners. In addition, when performing manual failover, you can attempt a failover with full data synchronization (also referred to as friendly failover), which eliminates the possibility of data loss. You should note that the friendly failover is also invoked during automatic failover of the primary as long as the writeable database is accessible via data plane.
- Read-only failover policy. This policy controls automatic failover of the read-only listener in case of a failure of the secondary server. By default, this policy is disabled, in order to eliminate potential negative performance impact on the primary following a failover. However, with the read-only policy disabled, the applications and services that you pointed to the read-only listener will lose database access for as long as the secondary server is unavailable. If their availability is important and you can tolerate potential decrease in performance of the primary following a failover, you have the option of enabling this policy.
This concludes our overview of enhanced business continuity functionality of Azure SQL Database. In the upcoming articles, we will cover its latest high availability features.