Introduction to Azure SQL Database Auto-Failover Groups

Azure SQL Database offers a number of resiliency features that deliver a mix of high availability and disaster recovery capabilities. The former is an inherent characteristic of most Azure Platform-as-a-Service offerings, including Azure SQL Database and Azure SQL Managed Instance. The latter was traditionally implemented by using Active Geo-Replication. That approach, however, had several limitations, such as lack of support for automatic and multi-database failover. These limitations have been addressed by Auto-Failover Groups, which will be the topic of this article.

With Active Geo-Replication, you designate up to four readable secondary databases (each residing on a separate logical server in any region supported by your Azure subscription, including the region hosting the primary database), which are automatically configured as replication partners. The read-only secondaries facilitate scenarios in which applications or services do not require write access and can tolerate minor delays in retrieving data updates. Once the replication is established, you can initiate a planned or unplanned failover between the current primary and one of the secondaries. This includes scenarios where the primary becomes unavailable. You also have the ability to configure replication of databases that are part of elastic pools, although this requires configuring and managing each database replication partnership individually.

While Active Geo-Replication is very straightforward to configure (it can be set up directly from the Azure portal by simply designating the desired location of secondary replicas), there are a few drawbacks that limit its usefulness. The first one is the lack of support for simultaneous replication and failover of multiple databases. Another is the need for manual failover, which is not surprising considering that replication is asynchronous, so switching the primary might result in data loss. That manual failover takes place across different logical servers, each with its own network endpoint, which must be accounted for when configuring client applications that require access to the target database.

These limitations have been addressed by Auto-Failover Groups, although with some tradeoffs (in particular, each group supports only a single secondary logical server, which, in addition, must be located in an Azure region different from the one hosting the primary). As their name indicates, one of the primary strengths of this technology is support for automatic failover of multiple databases based on custom groupings that you define. All databases in the same group must be part of the same logical server with the source and target servers residing in two different Azure regions. That server can belong to multiple failover groups. This allows you, for example, to place all databases in an elastic pool into the same auto-failover group, thus ensuring that all of them fail over together. As a matter of fact, adding a single database that is a part of an elastic pool on the source server will automatically result in the secondary replica being added to an elastic pool with the matching name on the target server You still need to ensure that that pool exists and has sufficient capacity to host this and any other secondary databases you decide to add to the auto-failover group.

Another distinction between Active Geo-Replication and Auto-Failover Groups is the latter’s support for read-write and read-only listener endpoints that remain the same following a failover. Read-write listener is referenced by a CNAME DNS record in the format <failover-group-name> and is automatically configured to point to the primary server of the group. Read-only listener is referenced by a CNAME DNS record in the format <failover-group-name> and is automatically configured to point to the secondary server of the group. Effectively, client applications will continue to function without any additional configuration regardless of the location of the primary or secondary replica.

As with Active Geo-Replication, the failover can be performed manually, but there is also support for automating it. Its characteristics are defined by using two policies, named automatic failover policy and read-only policy. These policies correspond to the two types of listener endpoints. The first one, known as automatic failover policy is enabled by default and controls automatic failover of the read-write listener following a failure of the primary server. Its GracePeriodWithDataLossHours parameter designates the period of time (up to 24 hours) following a failure of the primary after that the automatic failover should take place, assuming the platform detects the possibility of data loss. If that is not the case (for example, the platform detects that data plane is operational and there is no expectation of data loss), the failover is invoked immediately.

The other policy, referred to as read-only failover policy is disabled by default. If you enable it, you will have the ability to control automatic failover of the read-only listener in case of a failure of the secondary server. Keeping this policy disabled eliminates the possibility of negative impact on the primary databases in case the secondary replica group fails. On the other hand, if access to the secondary endpoint is important, you might want to consider enabling this option.

When planning for failover, you need to also consider security and access control implications. Security on the network level can be controlled by the server or database-level firewall, as well as by relying on service endpoints. In Auto-Failover Groups scenarios, you might want to consider using database-level firewall rules, which (unlike those defined on the server level) do not need to be modified or maintained, since they are defined within the replicated databases. If you use virtual network service endpoints for connectivity between front-end components running in Azure virtual machines and SQL Database instances, then keep in mind that their scope is limited to the Azure region where they reside. Such arrangement effectively precludes the ability to leverage automatic failover. Instead, you would need to activate failover of the front-end components at the same time when you manually failover the databases they need to access. From the access control perspective, you might want to take advantage of database contained users. This eliminates the need to create and maintain server-level logins and their mappings to database users.

This concludes our introduction to Azure SQL Database Auto-Failover groups. In the upcoming articles, its functionality will be explored in more detail.

# # #

See All Articles by Marcin Policht

Marcin Policht
Marcin Policht
Being a long time reader, I'm proud to join the crowd of technology gurus gathered here. I have a fair share of Microsoft exams behind me, a couple of acronyms after my name - MCSE, MCSD, and MCT, decent familiarity with majority of MS BackOffice products (SMS, SQL, Exchange, IIS), programming and scripting languages (VB, C++, VBScript with wsh) and several years of practical experience with Windows environment administration and engineering. My focus these days is on SQL Server and Windows 2000, and I'll attempt to share the most interesting experiences with these products.

Latest Articles