SQL Server 2005 Part 3 - High Availability and Scalability Enhancements - Failover Clustering
December 10, 2004
In the previous article of this series, we started reviewing high availability and scalability enhancements in SQL Server 2005 (based on its recent Beta 2 release) by discussing database mirroring. In this article, we will continue exploring the same area of functionality, shifting our attention to failover clustering. We will also provide comparison of both features, which should help in recognizing the types of scenarios for which each of them is better suited.
Failover clustering has been supported since SQL Server 7.0 so it is not surprising that its latest implementation is a combination of already familiar (and stable) technology with a number of useful improvements. As before, the solution is based on Microsoft Cluster Services (MSCS), which, starting with Windows 2000 Server became an integral part of Advanced (or Enterprise, in case of Windows 2003 Server) and DataCenter editions. In essence, clustering provides high-availability by combining several physical servers (referred to as nodes) into a single virtual unit, typically by connecting them to a shared set of disk devices. Disks are attached to each server via Fibre Channel or SCSI links (although iSCSI-based configurations become increasingly popular), but access to them is arbitrated so no two servers are permitted to write to (or read from) the same disks simultaneously (in order to prevent data corruption). In the most common scenario, all servers in a cluster are configured the same way (from both a hardware and software perspective), which makes them capable of running the same applications. Clustered software must be installed by running the setup program in a special manner, which allows the application state to be easily transferred across cluster members. This is typically accomplished by separating application binaries, placed on local drives of each individual node, from shared data (such as clustered databases and their transaction logs) residing on shared disks (which implies, by the way, that both local and shared logical drives should be assigned the same drive letters on each node). There is only a single instance of a particular application running at any given time within a cluster (although you can have multiple, distinct instances of different applications operating in parallel). The burden of executing the processes of any of clustered application is handled only by one of the nodes (however, in the case of several clustered applications, they are typically distributed evenly across all nodes, in order to take full advantage of processing resources). If a server hosting the application instance fails, processing tasks are automatically transferred to one of remaining cluster members, which is also given exclusive access to the shared disk hosting this application data. Except for a short failover period, the application continues running, without data loss.
Application operational requirements are satisfied by creating virtual cluster resources, such as disks, IP addresses, network names, services, file shares, etc. (the complete list is much longer). Each set of related resources (typically the ones associated with the same application) is placed into its own resource group, which constitutes a failover unit; (all resources within it are typically interdependent and they all need to failover together in order for a corresponding application to remain functional). If a resource group is accessible as a network entity (which typically implies that it must have at least an IP address, a network name, and a disk among its resources), then it is known as a virtual server. An example of a virtual server could be a single SQL Server 2005 instance, with its associated name, IP address, disks where databases and transaction logs are stored, and services, such as SQL Server, SQL Server Agent, and Full-Text Search (FTS) Service.
The number of nodes supported in the clustered configuration of SQL Server 2005 is dependent on the operating system limits. You can have up to 8 cluster members when using 32-bit version of Windows 2003 Server Datacenter Edition (up to 4 with 64-bit version) and up to 4 cluster members with Windows 2003 Server Enterprise Edition (Windows 2000 Advanced and Datacenter Server SP4 support up to 2 and 4 nodes in a cluster, respectively). Note, that hardware requirements in this case are dictated largely by clustering technology (rather then by SQL Server 2005-specific demands). In order for the installation to be supported by Microsoft, underlying hardware must comply with Windows 2000 Hardware Compatibility List or with the Microsoft Windows 2003 Catalog, (you need to ensure that the hardware is listed in the cluster solution category, since compliance of its individual components does not suffice).
There are some caveats pertinent to setting up SQL Server 2005 on a cluster (besides the already described issues, which apply to operating any cluster-based application, such as satisfying physical and operating system requirements or creating and configuring resources and virtual servers). First, you need to install and configure the Microsoft Distributed Transaction Coordinator (MSDTC) component (responsible for managing distributed queries, two-phase commit transactions, and some of the replication tasks). The steps necessary to accomplish this task on Windows 2003 platform are documented in the Microsoft Knowledge Base Article Q301600 (and the Windows 2000-based procedure is described in the article Q290624. Note that it is recommended to place MSDTC and its associated resources in their own resource group. Before launching the clustered installation of SQL Server 2005, ensure that you are logged on with an account that is not only a member of the local Administrators group on each cluster node, but also has rights to log on as a service and act as a part of operating system. The setup program is able to detect that it is running on a clustered server and indicates this by enabling "Install as virtual server" options for SQL Server and Analysis Services components on the "Components to Install" page of the SQL Server Installation Wizard (make sure that you select these options when installing a clustered instance). When specifying target disks where database and transaction logs will be placed, choose them from the available cluster groups (you need to pick the ones where appropriate disk resources are located). Remember that each instance name must be unique across the entire cluster. Ensure also that the MSCS Cluster Service account is a member of SysAdmin role on clustered instances of SQL Server 2005 (this requirement is satisfied by default because SysAdmin includes BUILTIN/Administrators group). Finally, there are additional considerations that you should keep in mind when incorporating clustered SQL Server 2005 installations as part of your replication environment. In such cases, it is important (from the availability perspective) to create a clustered file share resource and designate it as the default snapshot folder for clustered Distributor or as an alternate snapshot location for individual publications (this is done as part of the publication definition from the Publication Properties dialog box). This provides Subscribers access to snapshots in case of individual server failures.
Scalability of clustered SQL Server is limited due to a couple of factors. First, there is only a single copy of each instance of SQL Server, with its data residing on shared disks (this also has availability implications - even though the disks are typically highly redundant and reliable, failure of a storage unit hosting them results in application downtime). In addition, in your design, you need to take into consideration the worst-case scenario, in which the majority of servers fail. In this case, the remaining ones must be capable of handling the increased level of utilization, which under normal conditions is evenly distributed across the entire cluster. These factors, combined with the failover strategy, affect the number of cluster nodes and their configuration (it is possible to designate specific nodes as potential or preferred owners of particular resource groups, while prohibiting others from participating in their failover).
While the goal of database mirroring and failover clustering is to provide high availability through automatic failover without data loss, their characteristics (and, consequently, the manner in which they accomplish this goal) are very different. When deciding on which one better suits your requirements, you should consider the following factors:
In the next article of our series, we will continue our discussion on high availability and scalability features of SQL Server 2005, such as online restore and indexing.