Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MS SQL

Posted Dec 2, 2003

Configuring Disks and Managing Space in SQL Server Part 1

By Andrew Novick

Something like fifty percent of system outages are caused by "Out of space" conditions. I kid you not. I run into them all the time. Combine this problem with the need to maintain high availability and peak performance and the consequence is that disk configuration and space allocation become one of the principle tasks for a SQL Server DBA. This is the first of two articles that discuss the configuration of disks and how to allocate space to user databases.

Configuration is the task of choosing the disk drives and setting up RAID configurations among the disks. It involves tradeoffs between performance, availability and cost. The first section of this article discusses RAID configurations and gives a recommended starting point for creating yours. With the cost of most disks so low these days, I am going to come down on the side of maximizing availability and performance.

Properly allocating space on the disks is necessary to minimize out of space considerations but it is also a factor in performance and availability. The latter portion of this article discusses what should go on each RAID drive set. You may have a different combination of disks but the rationale provided should aid in your decisions.

Next month, in Part II of this article, I will show you some code that can be used to track disk usage and establish estimates for database growth. This information is needed to make informed decisions about allocating space to the files used for user tables and indexes as well as planning for future disk purchases.

Configuring Disks

There are always many conflicting factors when configuring disks but the primary factors are:

  • Preservation of the data.
  • Availability also known as uptime.
  • Performance.
  • Cost

A database is not any good if it does not preserve your data. However, there are many ways to accomplish this task. Using SQL Server's backup and transaction log it is possible to insure recovery of a database until the last transaction log was saved and sometimes up to the second that an application stopped working. However, a single disk failure might bring your system to a halt. For most of us, this type of risk is not acceptable and we turn to RAID arrays for better reliability.

Performance is another major consideration and a difficult one to improve. Over the last decade, we have seen spectacular growth in disk size to the point where the major system vendors do not really sell disks with a capacity under 30 gigabytes. The growth in size has not been matched by a growth in the number of input/output operations that a disk can perform in each second. In fact, performance has grown only linearly.

One rule of thumb that has held true since I began building large systems in the 1980's is that the more disk heads (spindles) available, the better the overall I/O performance. This turns out to be one of the best guiding principles when configuring disks.

Putting these considerations together and with disk capacities so large and the need for having more disks heads (spindles) a factor; I have taken the following approach to recommending the purchase of disk for a server that can run with SQL Server standard edition:

  • The system should be configured with 5 sets of RAID 1 pairs.
  • The use of the pairs should be:
    1. For the operating system, SQL Server software, paging file, master, msdb, and transaction log backup files. This disk pair should have two partitions. The first is for the OS and software and it should be 8-12 GB. The second partition should use the remaining space and hold the SQL system databases and transaction log backup files.
    2. tempdb and a file group with low activity indexes.
    3. Database Log files (the LDF files).
    4. A small PRIMARY file group and a file group with all user table data.
    5. A file group with the high activity indexes.
  • Database backups should be made directly off the server, preferably directly to a tape.

There are a zillion variations on the layout of data on the five RAID 1 pairs. The above allocation is a tradeoff between the various practical considerations. The next section discusses the possible RAID choices. After that, the next section, File Placement, discusses which files go on each disk set.

Why RAID 1

Let's take a look at the alternative types of RAID storage. I think it will become apparent why RAID 1 is the preferred choice for data storage in most situations. Table 1 shows the typical RAID types:

Table 1 RAID Types

RAID

Type

How it works

Evaluation

None

Plain disk

Provides no protection. Should be used only if absolutely necessary.

0

Disk striping

Two disks with data distributed across them. The disks act as one larger disk.

High performance but no protection of data. Not suitable for SQL Server

1

Disk Mirrors. Data copied on both disks

A little faster for reads and a little slower for writes. Provides good protection of data at the expense of double the disks.

5

Parity checks performed across three or more disks. Data is stripped across them.

Provides good protection at the cost of performance, particularly for writes.

1+0

Or 10

Mirroring a striped set

Combines performance and protection but expensive.

No RAID and RAID 0 are poor choices because they provide no protection in the case of a single drive failure. Sure, you could go to backups and transaction logs but your system becomes unavailable for the time it takes to recover and there is a risk of data loss.

RAID 1 is preferable because it gives the best tradeoff between performance, protection and cost. Because it is 100 percent redundant, it both protects the data and the availability of your system. Availability is the reason that you need RAID even for tempdb. Failure of a disk drive with tempdb would bring your SQL Server to a halt. Sure, you could bring it back up and move tempdb to another drive. In this case, you will not lose any data but the users will have to live without your system. What is the cost of the down time compared to the cost of the extra disk needed for RAID 1 protection? You will have to answer that for yourself, but it is almost always worth the cost.

RAID 5 is cost effective but generally too slow unless you are working with a read-only application. It is possible to have 5 or more disks in a RAID configuration and still be protected from the failure of a single drive. That makes it less expensive than RAID 1, which uses half the disks to provide redundancy. Because of the parity bits used to protect the data in RAID 5, each write is going to require two physical disk reads and two physical writes to complete. It is a lot of overhead.

Combining RAID 0 and RAID 1 is called RAID 1+0 or alternatively RAID 10. It combines some of the performance benefits of RAID 0 with the protection of RAID 1. It is like having a much bigger RAID 1 disk and because of the striping, performance should be pretty good. However, if you need more space, why not just add another pair of RAID 1 disks. I think the two alternatives are pretty close so I prefer the simpler option of using two RAID 1 sets.

For more information about RAID configurations and their implication for SQL Server, I turn to the book SQL Server 2000 Administrator's Companion published by Microsoft Press. Chapter 5 has a thorough discussion of RAID configurations and how to use them with SQL Server. It will go into the number of physical reads and writes to accomplish each logical read or write.

Now that you have configured a number of RAID pairs, you have to choose which files go on each disk. The next section discusses the rationale behind my choices.

File Placement

The scheme outlined above suggests the placement of specific groups of files on specific drives. I have developed these practies after a little experience and this section tries to communicate the rationale behind the scheme. In all cases, I am assuming that each pair of disks is roughly 30 gigabytes in size. Let's start with the first RAID 1 set, which is devoted to the operating system and transaction log backups.

Windows 2000 server generally does best with a relatively small partition for the operating system. System administrators have explained this to me a few times but I generally just accept it. Along with the OS goes the paging file. While you could have a drive devoted to the paging file, SQL Server is going to manage its own memory and does not page that much unless there are other programs running on the server.

Once the OS and paging file are allocated on the primary partition of the first RAID 1 set, there is a partition with around 20 Gigabytes available. The best use that I have found for this is backup files. The transaction log backups or full database backup files could go here. I prefer to get the full backups off the machine as soon as possible, so there are usually only transaction log backup files on this partition

The second RAID 1 set is for tempdb and a file group that holds low activity indexes. In most systems, tempdb is a high activity database and the whole system benefits from having tempdb on its own drive set.

I generally create tempdb at 500 megabytes and set file growth in increments of 10 megabytes. That is because it is recreated every time that SQL Server is started. The larger its initial allocation, the longer it takes for SQL Server to start. The file growth allows it to be extended in increments that are small enough not to cause most transactions to time out.

Even if tempdb grows to several gigabytes, there is still room on this disk. Since tempdb is so high in activity, the uses for this space must be low in activity. The best choice is a file group with low activity indexes. These are non-clustered indexes on user tables that do not change often. An alternative use of this space is for full database backups, that is, if they are not being done strait to tape.

The third set of disks is devoted to the database log files. As SQL Server commits a transaction it writes modified pages and transaction control information to the log sequentially. Leaving the log file on its own drive allows this process to proceed as fast as possible. Without other uses for the drive, the disk heads never move from the point where new log entries are written. This lets them be completed as quickly as possible, which lets transactions be completed as quickly as possible, which lets locks be released as quickly as possible. This makes for a healthy system.

Another consideration is that the log files should be on a different drive than both the data files and the transaction log backup files. Even though all data is protected by the use of RAID, separating the logs from both the data and their backup files provides an extra measure of protection.

The fourth set of drives is devoted to the file groups with the data. The PRIMARY file group should only have system tables. User tables should be in their own file group, which I usually call USERDATA1. The files for both can be on this drive set. Having just data in the PRIMARY and USERDATA1 file groups allows the quickest possible backup of the data.

Finally, the fifth drive set is devoted to a file group with indexes for user tables and views. Separating indexes from their data on separate drives increases the likelihood that both sets of disks are working to complete data writing. However, depending on your application, indexes may have much more activity than the data portion of the user tables and you may have to adjust where you place the indexes.

The practices that I have outlined here are intended to spread SQL Server's input/output load across as many drives as are available. At the same time, we need to provide protection from single drive failure or something worse.

Conclusion

This article has concentrated on how to configure disks. RAID 1 is the preferred choice for most drive sets because of its full redundancy and reasonable performance. These positive attributes come at the cost of a few dollars.

RAID 1 is disk mirroring so it requires two disks to store the data that could fit on one. For most production SQL Servers the cost is worth the expense because it virtually insures the system against single disk failures and if you have disks that can be replaced without bringing down the system, the users may barely notice any problem.

I have also outlined an approach to spreading the I/O load from SQL Server across the available drives. An additional consideration is keeping files separate from the files that back them up.

Next month's article takes this discussion further by showing how to establish file growth information for file groups, disk drives, and databases. This information is needed to allocate space for the files in the various file groups.

» See All Articles by Columnist Andrew Novick



MS SQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MS SQL Forum
Topic By Replies Updated
SQL 2005: SSIS: Error using SQL Server credentials poverty 3 August 17th, 07:43 AM
Need help changing table contents nkawtg 1 August 17th, 03:02 AM
SQL Server Memory confifuration bhosalenarayan 2 August 14th, 05:33 AM
SQL Server Primary Key and a Unique Key katty.jonh 2 July 25th, 10:36 AM


















Thanks for your registration, follow us on our social networks to keep up-to-date