Configuring Disks and Managing Space in SQL Server Part 1
December 2, 2003
Something like fifty percent of system outages are caused by "Out of space" conditions. I kid you not. I run into them all the time. Combine this problem with the need to maintain high availability and peak performance and the consequence is that disk configuration and space allocation become one of the principle tasks for a SQL Server DBA. This is the first of two articles that discuss the configuration of disks and how to allocate space to user databases.
Configuration is the task of choosing the disk drives and setting up RAID configurations among the disks. It involves tradeoffs between performance, availability and cost. The first section of this article discusses RAID configurations and gives a recommended starting point for creating yours. With the cost of most disks so low these days, I am going to come down on the side of maximizing availability and performance.
Properly allocating space on the disks is necessary to minimize out of space considerations but it is also a factor in performance and availability. The latter portion of this article discusses what should go on each RAID drive set. You may have a different combination of disks but the rationale provided should aid in your decisions.
Next month, in Part II of this article, I will show you some code that can be used to track disk usage and establish estimates for database growth. This information is needed to make informed decisions about allocating space to the files used for user tables and indexes as well as planning for future disk purchases.
There are always many conflicting factors when configuring disks but the primary factors are:
A database is not any good if it does not preserve your data. However, there are many ways to accomplish this task. Using SQL Server's backup and transaction log it is possible to insure recovery of a database until the last transaction log was saved and sometimes up to the second that an application stopped working. However, a single disk failure might bring your system to a halt. For most of us, this type of risk is not acceptable and we turn to RAID arrays for better reliability.
Performance is another major consideration and a difficult one to improve. Over the last decade, we have seen spectacular growth in disk size to the point where the major system vendors do not really sell disks with a capacity under 30 gigabytes. The growth in size has not been matched by a growth in the number of input/output operations that a disk can perform in each second. In fact, performance has grown only linearly.
One rule of thumb that has held true since I began building large systems in the 1980's is that the more disk heads (spindles) available, the better the overall I/O performance. This turns out to be one of the best guiding principles when configuring disks.
Putting these considerations together and with disk capacities so large and the need for having more disks heads (spindles) a factor; I have taken the following approach to recommending the purchase of disk for a server that can run with SQL Server standard edition:
There are a zillion variations on the layout of data on the five RAID 1 pairs. The above allocation is a tradeoff between the various practical considerations. The next section discusses the possible RAID choices. After that, the next section, File Placement, discusses which files go on each disk set.
Why RAID 1
Let's take a look at the alternative types of RAID storage. I think it will become apparent why RAID 1 is the preferred choice for data storage in most situations. Table 1 shows the typical RAID types:
Table 1 RAID Types
No RAID and RAID 0 are poor choices because they provide no protection in the case of a single drive failure. Sure, you could go to backups and transaction logs but your system becomes unavailable for the time it takes to recover and there is a risk of data loss.
RAID 1 is preferable because it gives the best tradeoff between performance, protection and cost. Because it is 100 percent redundant, it both protects the data and the availability of your system. Availability is the reason that you need RAID even for tempdb. Failure of a disk drive with tempdb would bring your SQL Server to a halt. Sure, you could bring it back up and move tempdb to another drive. In this case, you will not lose any data but the users will have to live without your system. What is the cost of the down time compared to the cost of the extra disk needed for RAID 1 protection? You will have to answer that for yourself, but it is almost always worth the cost.
RAID 5 is cost effective but generally too slow unless you are working with a read-only application. It is possible to have 5 or more disks in a RAID configuration and still be protected from the failure of a single drive. That makes it less expensive than RAID 1, which uses half the disks to provide redundancy. Because of the parity bits used to protect the data in RAID 5, each write is going to require two physical disk reads and two physical writes to complete. It is a lot of overhead.
Combining RAID 0 and RAID 1 is called RAID 1+0 or alternatively RAID 10. It combines some of the performance benefits of RAID 0 with the protection of RAID 1. It is like having a much bigger RAID 1 disk and because of the striping, performance should be pretty good. However, if you need more space, why not just add another pair of RAID 1 disks. I think the two alternatives are pretty close so I prefer the simpler option of using two RAID 1 sets.
For more information about RAID configurations and their implication for SQL Server, I turn to the book SQL Server 2000 Administrator's Companion published by Microsoft Press. Chapter 5 has a thorough discussion of RAID configurations and how to use them with SQL Server. It will go into the number of physical reads and writes to accomplish each logical read or write.
Now that you have configured a number of RAID pairs, you have to choose which files go on each disk. The next section discusses the rationale behind my choices.
The scheme outlined above suggests the placement of specific groups of files on specific drives. I have developed these practies after a little experience and this section tries to communicate the rationale behind the scheme. In all cases, I am assuming that each pair of disks is roughly 30 gigabytes in size. Let's start with the first RAID 1 set, which is devoted to the operating system and transaction log backups.
Windows 2000 server generally does best with a relatively small partition for the operating system. System administrators have explained this to me a few times but I generally just accept it. Along with the OS goes the paging file. While you could have a drive devoted to the paging file, SQL Server is going to manage its own memory and does not page that much unless there are other programs running on the server.
Once the OS and paging file are allocated on the primary partition of the first RAID 1 set, there is a partition with around 20 Gigabytes available. The best use that I have found for this is backup files. The transaction log backups or full database backup files could go here. I prefer to get the full backups off the machine as soon as possible, so there are usually only transaction log backup files on this partition
The second RAID 1 set is for tempdb and a file group that holds low activity indexes. In most systems, tempdb is a high activity database and the whole system benefits from having tempdb on its own drive set.
I generally create tempdb at 500 megabytes and set file growth in increments of 10 megabytes. That is because it is recreated every time that SQL Server is started. The larger its initial allocation, the longer it takes for SQL Server to start. The file growth allows it to be extended in increments that are small enough not to cause most transactions to time out.
Even if tempdb grows to several gigabytes, there is still room on this disk. Since tempdb is so high in activity, the uses for this space must be low in activity. The best choice is a file group with low activity indexes. These are non-clustered indexes on user tables that do not change often. An alternative use of this space is for full database backups, that is, if they are not being done strait to tape.
The third set of disks is devoted to the database log files. As SQL Server commits a transaction it writes modified pages and transaction control information to the log sequentially. Leaving the log file on its own drive allows this process to proceed as fast as possible. Without other uses for the drive, the disk heads never move from the point where new log entries are written. This lets them be completed as quickly as possible, which lets transactions be completed as quickly as possible, which lets locks be released as quickly as possible. This makes for a healthy system.
Another consideration is that the log files should be on a different drive than both the data files and the transaction log backup files. Even though all data is protected by the use of RAID, separating the logs from both the data and their backup files provides an extra measure of protection.
The fourth set of drives is devoted to the file groups with the data. The PRIMARY file group should only have system tables. User tables should be in their own file group, which I usually call USERDATA1. The files for both can be on this drive set. Having just data in the PRIMARY and USERDATA1 file groups allows the quickest possible backup of the data.
Finally, the fifth drive set is devoted to a file group with indexes for user tables and views. Separating indexes from their data on separate drives increases the likelihood that both sets of disks are working to complete data writing. However, depending on your application, indexes may have much more activity than the data portion of the user tables and you may have to adjust where you place the indexes.
The practices that I have outlined here are intended to spread SQL Server's input/output load across as many drives as are available. At the same time, we need to provide protection from single drive failure or something worse.
This article has concentrated on how to configure disks. RAID 1 is the preferred choice for most drive sets because of its full redundancy and reasonable performance. These positive attributes come at the cost of a few dollars.
RAID 1 is disk mirroring so it requires two disks to store the data that could fit on one. For most production SQL Servers the cost is worth the expense because it virtually insures the system against single disk failures and if you have disks that can be replaced without bringing down the system, the users may barely notice any problem.
I have also outlined an approach to spreading the I/O load from SQL Server across the available drives. An additional consideration is keeping files separate from the files that back them up.
Next month's article takes this discussion further by showing how to establish file growth information for file groups, disk drives, and databases. This information is needed to allocate space for the files in the various file groups.