SQL Server 2005 Part 3 – High Availability and Scalability Enhancements – Failover Clustering

In the previous article of this series, we started reviewing high availability and
scalability enhancements in SQL Server 2005 (based on its recent Beta 2
release) by discussing database mirroring. In this article, we will continue
exploring the same area of functionality, shifting our attention to failover
clustering. We will also provide a comparison of the two features, which should help
in recognizing the types of scenarios for which each of them is better suited.

Failover clustering has been supported since SQL Server 7.0 so it is not
surprising that its latest implementation is a combination of already familiar
(and stable) technology with a number of useful improvements. As before, the
solution is based on Microsoft Cluster Service (MSCS), which, starting with
Windows 2000 Server, became an integral part of the Advanced (or Enterprise, in
the case of Windows 2003 Server) and Datacenter editions. In essence, clustering
provides high-availability by combining several physical servers (referred to
as nodes) into a single virtual unit, typically by connecting them to a shared
set of disk devices. Disks are attached to each server via Fibre Channel or
SCSI links (although iSCSI-based configurations are becoming increasingly popular),
but access to them is arbitrated so no two servers are permitted to write to
(or read from) the same disks simultaneously (in order to prevent data
corruption). In the most common scenario, all servers in a cluster are
configured the same way (from both a hardware and software perspective), which
makes them capable of running the same applications. Clustered software must be
installed by running the setup program in a special manner, which allows the
application state to be easily transferred across cluster members. This is
typically accomplished by separating application binaries, placed on local
drives of each individual node, from shared data (such as clustered databases
and their transaction logs) residing on shared disks (which implies, by the
way, that both local and shared logical drives should be assigned the same
drive letters on each node). There is only a single instance of a particular
application running at any given time within a cluster (although you can have multiple,
distinct instances of different applications operating in parallel). The burden
of executing the processes of any clustered application is handled by only
one of the nodes (however, in the case of several clustered applications, they
are typically distributed evenly across all nodes, in order to take full
advantage of processing resources). If a server hosting the application instance
fails, processing tasks are automatically transferred to one of the remaining
cluster members, which is also given exclusive access to the shared disk
hosting this application data. Except for a short failover period, the
application continues running, without data loss.
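Because clients always connect to the virtual server's name and IP address, which move with the resource group, a simple reconnect loop is enough for an application to ride out the failover window. The following Python sketch simulates that client-side behavior; the `connect` function, the virtual server name `SQLVS1`, and the timings are illustrative assumptions, not SQL Server APIs:

```python
import time

def connect(virtual_server, failover_state):
    # Hypothetical stand-in for a real database connection call; it
    # fails until the surviving node brings the virtual server online.
    if failover_state["remaining"] > 0:
        failover_state["remaining"] -= 1
        raise ConnectionError("virtual server not yet online")
    return f"connected to {virtual_server}"

def connect_with_retry(virtual_server, failover_state, retries=5, delay=0.01):
    # Clients target the virtual server name, not a physical node,
    # so a plain retry loop survives the failover period.
    for _ in range(retries):
        try:
            return connect(virtual_server, failover_state)
        except ConnectionError:
            time.sleep(delay)
    raise ConnectionError("failover took too long")

state = {"remaining": 2}  # simulate two failed attempts during failover
print(connect_with_retry("SQLVS1", state))
```

A real client would wrap its database driver's connect call the same way, possibly with a longer delay between attempts.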

Application operational requirements are satisfied by creating virtual
cluster resources, such as disks, IP addresses, network names, services, file
shares, etc. (the complete list is much longer). Each set of related resources
(typically the ones associated with the same application) is placed into its
own resource group, which constitutes a failover unit (all resources within it
are typically interdependent and need to fail over together in order
for the corresponding application to remain functional). If a resource group is
accessible as a network entity (which typically implies that it must have at
least an IP address, a network name, and a disk among its resources), then it
is known as a virtual server. An example of a virtual server could be a single
SQL Server 2005 instance, with its associated name, IP address, disks where
databases and transaction logs are stored, and services, such as SQL Server,
SQL Server Agent, and Full-Text Search (FTS) Service.
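The failover-unit behavior described above can be modeled in a short Python sketch. The resource and node names (`SQLVS1`, `NODE1`, `NODE2`) are made up for illustration; the point is simply that every resource in the group moves to the new owner together:

```python
class ResourceGroup:
    """A set of interdependent cluster resources that fail over as a unit."""

    def __init__(self, name, resources, owner):
        self.name = name            # e.g. a virtual server name
        self.resources = resources  # IP address, network name, disks, services
        self.owner = owner          # node currently hosting the group

    def failover(self, surviving_node):
        # All resources move to the surviving node at once; moving only
        # some of them would leave the application nonfunctional.
        self.owner = surviving_node
        return [(resource, self.owner) for resource in self.resources]

group = ResourceGroup(
    "SQLVS1",
    ["IP address", "network name", "shared disk", "SQL Server service"],
    owner="NODE1",
)
placements = group.failover("NODE2")
assert all(node == "NODE2" for _, node in placements)
```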

The number of nodes supported in the clustered configuration of SQL Server
2005 is dependent on the operating system limits. You can have up to 8 cluster
members when using the 32-bit version of Windows 2003 Server Datacenter Edition (up
to 4 with the 64-bit version) and up to 4 cluster members with Windows 2003 Server
Enterprise Edition (Windows 2000 Advanced and Datacenter Server SP4 support up
to 2 and 4 nodes in a cluster, respectively). Note that hardware requirements
in this case are dictated largely by clustering technology (rather than by SQL
Server 2005-specific demands). In order for the installation to be supported by
Microsoft, underlying hardware must comply with Windows 2000 Hardware
Compatibility List or with the Microsoft Windows 2003 Catalog (you need to
ensure that the hardware is listed in the cluster solution category, since
compliance of its individual components does not suffice).
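The operating-system limits listed above can be summarized as a small lookup table (the values are taken directly from the paragraph above):

```python
# Maximum cluster nodes supported per operating system edition,
# as described in the text above.
MAX_CLUSTER_NODES = {
    "Windows 2003 Server Datacenter (32-bit)": 8,
    "Windows 2003 Server Datacenter (64-bit)": 4,
    "Windows 2003 Server Enterprise": 4,
    "Windows 2000 Datacenter Server SP4": 4,
    "Windows 2000 Advanced Server SP4": 2,
}

def max_nodes(edition):
    return MAX_CLUSTER_NODES[edition]

assert max_nodes("Windows 2003 Server Datacenter (32-bit)") == 8
```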

There are some caveats pertinent to setting up SQL Server 2005 on a cluster
(besides the already described issues, which apply to operating any
cluster-based application, such as satisfying physical and operating system
requirements or creating and configuring resources and virtual servers). First,
you need to install and configure the Microsoft Distributed Transaction
Coordinator (MSDTC) component (responsible for managing distributed queries,
two-phase commit transactions, and some of the replication tasks). The steps necessary
to accomplish this task on Windows 2003 platform are documented in the
Microsoft Knowledge Base article Q301600 (the Windows 2000-based
procedure is described in article Q290624).
Note that it is recommended to place MSDTC and its associated resources in
their own resource group. Before launching the clustered installation of SQL
Server 2005, ensure that you are logged on with an account that is not only a
member of the local Administrators group on each cluster node, but also has
rights to log on as a service and act as part of the operating system. The setup
program is able to detect that it is running on a clustered server and
indicates this by enabling "Install as virtual server" options for
SQL Server and Analysis Services components on the "Components to Install"
page of the SQL Server Installation Wizard (make sure that you select these
options when installing a clustered instance). When specifying target disks
where databases and transaction logs will be placed, choose them from the available
cluster groups (you need to pick the ones where appropriate disk resources are
located). Remember that each instance name must be unique across the entire
cluster. Ensure also that the MSCS Cluster Service account is a member of the SysAdmin
role on clustered instances of SQL Server 2005 (this requirement is satisfied
by default because SysAdmin includes the BUILTIN\Administrators group). Finally,
there are additional considerations that you should keep in mind when
incorporating clustered SQL Server 2005 installations as part of your
replication environment. In such cases, it is important (from the availability
perspective) to create a clustered file share resource and designate it as the
default snapshot folder for clustered Distributor or as an alternate snapshot
location for individual publications (this is done as part of the publication
definition from the Publication Properties dialog box). This provides
Subscribers access to snapshots in case of individual server failures.

Scalability of clustered SQL Server is limited due to a couple of factors. First,
there is only a single copy of each instance of SQL Server, with its data residing
on shared disks (this also has availability implications – even though the
disks are typically highly redundant and reliable, failure of a storage unit
hosting them results in application downtime). In addition, in your design, you
need to take into consideration the worst-case scenario, in which the majority
of servers fail. In this case, the remaining ones must be capable of handling
the increased level of utilization, which under normal conditions is evenly
distributed across the entire cluster. These factors, combined with the
failover strategy, affect the number of cluster nodes and their configuration
(it is possible to designate specific nodes as potential or preferred owners of
particular resource groups, while prohibiting others from participating in
their failover).
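The worst-case capacity reasoning above can be made concrete with a little arithmetic: if load is spread evenly across N nodes and k of them fail, each survivor must absorb N/(N-k) times its normal load. A minimal sketch (the utilization figures are illustrative assumptions):

```python
def surviving_node_load(n_nodes, failed, normal_load_per_node):
    """Load each surviving node must carry after 'failed' nodes go down,
    assuming load was evenly distributed before the failure."""
    survivors = n_nodes - failed
    if survivors <= 0:
        raise ValueError("no surviving nodes")
    return normal_load_per_node * n_nodes / survivors

# Four nodes, each normally at 40% utilization: if three fail, the lone
# survivor would need to handle 160% -- an overload the design must
# prevent, e.g. by keeping normal utilization at or below 25% per node.
assert surviving_node_load(4, 3, 40) == 160.0
assert surviving_node_load(4, 3, 25) == 100.0
```

This is why the number of nodes, their sizing, and the designated failover owners all have to be planned together.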

While the goal of database mirroring and failover clustering is to provide
high availability through automatic failover without data loss, their
characteristics (and, consequently, the manner in which they accomplish this
goal) are very different. When deciding on which one better suits your
requirements, you should consider the following factors:

  • hardware and software requirements – clustering has much more
    stringent requirements, since every piece of hardware needs to be included in
    the Windows 2000 Hardware Compatibility List or the Microsoft Windows 2003
    Catalog (in addition, it is recommended that each cluster member is set up
    identically). Clustered disk devices typically operate in either a shared SCSI or
    Fibre Channel configuration (iSCSI-based storage is emerging as an attractive
    alternative). Servers must be running Windows 2003 Enterprise (or Windows 2000
    Advanced) or higher. While definite information regarding SQL Server 2005
    versions that will support clustering is not available at this point,
    historically it was necessary to purchase Enterprise Edition in order to be
    able to use it for clustered environments. On the other hand, database
    mirroring does not introduce any additional hardware restrictions, beyond the
    ones expected for standard SQL Server 2005 installations (although it does
    increase storage needs – you need to approximately double the amount of space
    occupied by your highly-available databases, to store their mirrored copies).

  • price – The more stringent hardware and software requirements
    (outlined above) of failover clustering translate into a significantly higher
    price compared with database mirroring. While you might need additional disk
    space when using database mirroring, this should not have a considerable impact
    on the price (disk space is relatively inexpensive), with the exception of
    very large databases.

  • distance limitations – The maximum distance covered by failover
    clustering depends on the type of technology used for attaching shared storage
    devices. In extreme cases (shared SCSI channel), the limitations are very
    stringent, restricting distance between cluster nodes to no more than a few
    meters. While Fibre-Channel-based technology significantly increases this range
    (with distances measured in miles), it still might not be suitable in
    geographically distributed environments, where a high level of availability is
    required. Database mirroring, on the other hand, needs only fast and reliable
    network connectivity (note, however, that iSCSI-based clustering can match this
    level of efficiency).

  • level of resiliency – in case of failover clustering, there is
    only one instance of each database and its transaction logs, residing on shared
    storage devices. While such devices are fault tolerant (typically through
    various levels of hardware-based RAID), only high-end solutions (which offer
    their own independent replication mechanisms) can deliver high-availability on
    the site level (where even a failure of the entire location still provides data
    access). Unfortunately, such solutions impose significant extra cost. Database
    mirroring, on the other hand, allows setting up a copy of a live database in
    either a local or remote location, which becomes available automatically in
    case of the failure of the primary, including site-wide disasters.

  • implementation and management complexity – clustering technology
    is considerably more difficult to implement, manage, and troubleshoot.

  • server-level vs. database-level failover – one of the benefits of
    clustering results from the fact that it fails over on the virtual server
    level. This eliminates some of the issues inherent to database mirroring, which
    we mentioned in the previous article (such as a need for synchronizing mappings
    between SQL Server logins and database users on the mirror or the lack of
    ability to protect system databases). Furthermore, failover is completely
    transparent to clients, regardless of the client version or the content of the
    connection string.

  • uptime expectations – mirroring offers faster failover than
    failover clustering (since the failover takes place at the database level,
    rather than for the entire SQL Server instance). The actual values depend on a
    number of factors (such as the status of transactions on all active databases),
    but they would typically fall within a range of a few seconds in the case of
    database mirroring and about an order of magnitude longer in the case of
    virtual server failover.

In the next article of our series, we will continue our discussion on high
availability and scalability features of SQL Server 2005, such as online
restore and indexing.

See All Articles by Columnist
Marcin Policht
