Scalability and High Availability of Microsoft SQL Server 2000 (Part 3)
March 11, 2003
In the first two articles of the series, I described concepts and implementation details of SQL Server 2000 clustering. Clusters are used primarily to provide high availability through their support for failover. In this article, we will look into Distributed Partitioned Views, which are intended for environments requiring a high degree of scalability. You should note, however, that Distributed Partitioned Views do not offer high availability. In fact, the probability of data access problems increases (compared to traditional, non-partitioned views), since all partitions within the view need to be accessible in order for the view to function. This means that if your goal is to create scalable and highly available configuration, you should consider combining both solutions together.
Distributed Partitioned Views are based on horizontal partitioning of the data. This term means that all rows in a table are divided into distinct sets. Each set contains rows for which a value in one of the key columns belongs to a specific range of values. The choice of these ranges is one of the most critical decisions when developing Distributed Partitioned Views. The ranges need to reflect data access patterns. The goal is to load balance distribution of queries by placing each of the ranges in a separate partition residing on a separate SQL server. Each partition takes the form of a table stored in a database residing on a separate server; (A collection of servers with partitioned data is known as a federation). If ranges are chosen properly, then for the majority of queries, it is sufficient to access a single partition only. For example, a table can contain data that is somehow related to a geographical location. Location code, stored in key column, would be used in this case to determine to which partition a particular row belongs.
Distributed partitioned views are used to provide transparent access to data residing in separate partitions. Such views combine all rows from all partitions and make them appear as a single table. Partitioned views can be read-only or read/write. The read/write type requires SQL Server 2000 Enterprise Edition or SQL Server 2000 Developer Edition (they also need to follow a number of rules defined later in the article). In addition, you should ensure that SQL servers, where the partitions reside, are connected using fast network links (otherwise, latency for operations involving partitioned views might be significant).
Implementation of partitioned data and federated servers should be preceded by careful database design. It is essential to determine optimal placement of data, such that related data is stored together. One way to accomplish this is to determine all foreign keys referencing rows in a local partition and store them on the same server. Another method can be used if you need to partition already existing set of tables. In such case, you can base your decision on observing and analyzing current data access patterns. Tables that do not qualify as good candidates for partitioning can be copied across federated servers and synchronized through the use of triggers or replication.
Once you determined the distribution of the data in a table (or tables) across the partitions, you will need to create replicas on each of the servers. Each table should contain the CHECK constraint, enforcing uniqueness of all partitioned data sets. The criteria used by the CHECK constraint are used to make sure that any of the rows in the table can belong to only a single partition. For example, assuming that we create two partitions for data tracking sales records in two different states ('NJ' and 'NY'), we could create two tables Sales_NJ and Sales_NY, the first one in a databases on New Jersey SQL server, the second one in a database on New York SQL server:
CREATE TABLE Sales_NJ (State CHAR(2) PRIMARY KEY CHECK (STATE = 'NJ'), SalesPerson CHAR(10), Date SMALLDATETIME, Record CHAR(10) ) CREATE TABLE Sales_NY (State CHAR(2) PRIMARY KEY CHECK (STATE = 'NY'), SalesPerson CHAR(10), Date SMALLDATETIME, Record CHAR(10) )
You also need to add each server to all of the others participating in the federation as linked servers. This can be done by using SQL Enterprise Manager (from Security -> Linked Servers container), or by running sp_addlinkedserver stored procedure (on the New York and New Jersey servers, respectively).
EXEC sp_addlinkedserver @server='NJServer' EXEC sp_addlinkedserver @server='NYServer'
If you are not using integrated security (or if credentials used to access both servers are different), you would also need to specify the mappings between local and remote server logins with either Enterprise Manager (on the Security tab of each linked server Properties dialog box) or by running sp_addlinkedsrvlogin stored procedure.
Finally, on each of the federated servers, in the database containing the partitioned table, you would create a distributed partitioned view that uses UNION ALL to combine all rows. In our case, this could take the following form:
CREATE VIEW Sales_All AS SELECT State, SalesPerson, Date, Record FROM NJServer.FederatedDB.dbo.Sales_NJ UNION ALL SELECT State, SalesPerson, Date, Record FROM NYServer.FederatedDB.dbo.Sales_NY
At this point, users can refer to Sales_All the same way they would refer to a regular table (or a view). This provides not only data transparency, (users do not need to be concerned which server contains which data), but also speeds up queries that access the data for a single state only.
Note that there are several requirements that need to be satisfied in order for the Distributed Partitioned Views to work properly. For example, in order for a view to be updateable, each of the SELECT statements, (joined by the UNION ALL statements in the VIEW definition - as in the example above), need to refer to a single table only. For a full list of rules that partitioned views need to satisfy, refer to the SQL Server Books Online.
Even though it might seem that creating partitioned views is fairly straightforward, keep in mind that the primary condition for their efficient operation is the proper distribution of data. You need to be able to recognize the distribution and access patterns before you decide to implement this solution.
In the next article of this series, I will cover log shipping, which can be used to increase SQL Server availability.