The majority of Azure SQL Database related features reach their General Availability (GA) stage relatively quickly (even when measured in the Azure hyper-accelerated terms). There are however, some exceptions. The most prominent example in this category is likely Azure SQL Data Sync, which has remained in Preview since its introduction 7 years ago. Fortunately, there are signs that this service might be finally reaching production-ready state. In our article, we will introduce its main characteristics.
Azure SQL Data Sync is an Azure-resident service based on Microsoft Sync Framework specifications. (For more information regarding these specifications, refer to Microsoft Developer Network). Its primary purpose is to provide synchronization functionality across multiple database instances, with support for both SQL Server (residing on-premises or in the cloud) and Azure SQL Database. All databases within the scope of synchronization are members of the same Sync Group, with a SQL Database instance serving the role of the hub and all the remaining ones (referred to as Reference databases) forming spokes in the synchronization topology. The hub also provides the Data Sync Service responsible for coordinating synchronization activities. Each Sync Group is associated with a specific Azure region, which determines the location of the hub and the SQL Data Sync database, containing sync metadata and handling sync activities. You have the option of triggering synchronization on-demand or scheduling it to take place in regular intervals (which duration can range from five minutes to one month). Data flow can be bidirectional or unidirectional (in either direction), allowing you to perform updates of the hub based on changes originating from selected spokes and keep spokes in sync with the hub. As part of the setup, you specify the database, its tables, and columns that you want to synchronize, as well as any filters that should apply to restrict the range of data rows in scope. Conflict resolution is handled via policies. With the Hub wins policy in place, changes to the hub take precedence over updates to equivalent records in spokes. The Client wins policy yields the opposite behavior, resulting in the most recent change in any of the reference databases propagating to all other members of the same sync group.
While support for Azure SQL Data Sync is incorporated directly into Azure SQL Database, if you intend to facilitate the synchronization process across SQL Server instances, you need to deploy Azure SQL Data Sync Client Agent either directly to their host operating system or to Windows systems that have direct connectivity to these instances. The agent installation has a number of prerequisites, including .NET Framework 4.0, Microsoft SQL Server 2008 R2 SP1 System CLR Types (x86), and Microsoft SQL Server 2008 R2 SP1 Shared Management Objects (x86). Data Sync client operates as a Windows service and provides graphical interface for registering SQL Servers with Azure-resident sync groups. The user account in which security context the agent operates must have sufficient permissions to the reference database. For the initial sync setup, these permissions include Create and Alter Table, Alter Database, Create Procedure, Select and Alter Schema, as well as Create User Defined Type. For ongoing operations, you have the option of modifying these permissions by limiting them to Select, Insert, Update, and Delete on tables containing data to be synchronized as well as tables containing sync metadata and tracking tables, Execute on stored procedures created by the sync service, and Execute on user defined table types. During the installation, the agent creates a set of tables in the reference database that are subsequently used to configure and maintain the synchronization process. Once the installation completes, you need to register each SQL Server instance with the client agent. The registration requires the knowledge of the access key, which uniquely identifies the corresponding sync agent.
Besides its long-lasting Preview status, Azure SQL Data Sync is also one of the few remaining services that have not been transitioned to the Azure portal. It is also not accessible via Windows PowerShell or REST API. Instead, if you want to interact with it, you have to resort to using the Azure classic portal.
In addition, note that in its current rendition, Azure SQL Data Sync is subject to a number of limitations that affect primarily its scalability. For example, the maximum number of reference databases across all sync groups cannot exceed 30, with no more than 500 synchronized tables. The number of on-premises reference databases within a single sync group is restricted to 5. There is also a limit of up to 12 filters per table (although it is possible to add an extra filter on the primary column). Interestingly, this limit is not enforced via graphical interface in the Azure classic portal, however, any attempt to provision a database with a larger number of filters will result in a failure. SQL Azure Data Sync also places constraints on data types of synchronized content, which preclude, for example, support for user defined or Common Language Runtime-based data types. These constraints result from the Sync Framework specifications that SQL Azure Data Sync is based on, rather than the limitations of SQL Server and Azure SQL Database that SQL Azure Data Sync interacts with.
According to a recent update on the Azure SQL Database blog, Azure SQL Data Sync will become accessible via the Azure portal, along with support for Windows PowerShell and REST API. In addition, its availability will be extended globally, far beyond the current offering, including even Azure China, Azure Germany, and the Azure Government cloud.
This concludes our introduction to Azure SQL Data Sync. We will explore this topic in more detail in upcoming articles published on this forum.