Efficient SQL Server Indexing by Design
April 30, 2010
Having a good set of indexes on your SQL Server database is critical to performance. Efficient indexes don't happen by accident; they are designed to be efficient. Greg Larsen discusses whether primary keys should be clustered, when to use filtered indexes and what to consider when using the Fill Factor.
This is the final installment in my Index Guidelines series. If you have been following this series, I have covered many different aspects of indexing, like having an index development lifecycle, to the order in which columns should be placed in your indexes. In this article, I will cover primary key, filtered indexes, and fill factor.
What is a primary key? A primary key is a value that uniquely identifies a record. This key value can be made up of one or more columns. In order to enforce a primary key constraint the SQL Server database engine creates a unique index for the primary key. When creating the primary key, SQL Server will create the primary key as a clustered index, by default, if the table does not already have a clustered index on it and you do not specify it to be created as a non-clustered index.
Should all primary keys be a clustered index? As with many questions when it comes to database design, it depends. As described in my Dos and Donts of Database Indexing article I said clustered indexes should be frequently used for retrieving data and should be highly selective and the index value should be as short as possible. A primary key by definition meets the highly selective criteria. However, a primary key might not be frequently used to retrieve data or be short in length. For example if a GUID is used for a primary key it will be unique, but it isnt short in length and most likely will not be used to retrieve data. If your primary key is an int value and is commonly used to constrain your queries then it would be a good candidate to be a clustered index. Remember the clustered index key value is used in all non-clustered indexes; therefore you do not want to pick a clustered index key that is long in length. This is especially true for tables that contain many rows. Primary keys are ideal for quickly resolving WHERE constraints when you are looking for records that contain a specific value. Therefore when selecting your primary key for a table make sure you create it as a non-clustered index when it is not used to frequently retrieve data, and/or it is quite large and you dont what to bloat the size of all the non-clustered indexes with an excessively large clustered index key value.
Should every table have a primary key? Clearly, there are times when a table requires a primary key, like when you want to enforce referential integrity by using a primary key column in a foreign key relationship, or you want every record to be uniquely identifiable by a single key value. Not all tables have a column or set of columns that make a natural key that uniquely identify each record in the table. In this, should you derive a primary key column, by creating an identity column or some other column to uniquely identify a record, so you can have a primary key column? Well once again, I suppose it depends on how you are going to use that derived primary key column. If you are not going to refer to the primary key in your code, or in a foreign key constraint then I would suggest there is no value in having a primary key column. I am sure there are some purists out there that would disagree with this last statement and say that best practices dictate that all tables should be built with a primary key, even if you have to derive it and it is not used or referenced for anything.
Filtered index is a new concept that was introduced in SQL Server 2008. A filtered index is a special non-clustered index that is defined on a well-known subset of column values. In other words, a filtered index is an index that is created only for a specific set of column values. For an example say you have a table that has a list of employees that live in different countries, which is identified by the Country column. If you created a normal non-clustered index on the Country, the index would contained all the different County values for all Employees, whereas, with a filtered index, you could create an index where the values in the index would only contain a subset of defined country values like, France, Italy and Spain.
Filtered indexes are useful when you have queries that only need to look at a small set of records. Filtered indexes use less disks space because they are only associated with a small set of the records within a table. Take for instance my example above regarding the countries associated with employees. A filtered index would only have clustered index key values for those employees in France, Italy and Spain. Therefore, the filtered index would take less pages to store the subset of country key values and their corresponding clustered index keys, than a normal non-clustered index.
Filtered indexes improve performance. They improve performance a couple of different ways. First off, the filter index only needs to be updated whenever a row value is created or changed that affects the subset of values associated with the filtered index. Going back to my employee country example, we can explore this a little more. If you inserted a new employee into the Employee table and they lived in France then the filtered index would need to be updated to reflect this new employee. However, if a new employee was added and they lived in the United States then the filtered index would not need to be updated, because it only needs to be updated when employees live in France, Italy and Spain. The second way filtered indexes improve performance is when SQL Server reads through the indexes to resolve a query. If you once again think of the employee example, say you had to search for all the employees that lived in Spain. That query would have to do less I/O to process through the filtered index to find employees that live in Spain, than it would have to do if it processed through a normal non-clustered index that contain employees for all countries.
Here is the syntax for creating my country filtered index:
CREATE NONCLUSTERED INDEX FilteredCountries ON HR.Employees (Country) WHERE Country IN ('France','Italy','Spain')
As you can see, it is like creating a normal non-clustered index, but includes a WHERE clause to filter the column values contained in the filtered index.
When creating indexes you need to take into account how the key values in an index will change over time. Will there be new values added in the middle of the sorted list of index values, or only at the end? Will values be removed over time when records are deleted? Your answers to these kinds of questions related to your index values, will determine how you should set your index Fill Factor.
What is the Fill Factor? The index Fill Factor is a value, expressed as a percentage, for how full each index page should be populated when the index is built. You need to set the Fill Factor appropriately based on how your index values will change over time. Setting a Fill Factor appropriately keeps SQL Server from having to create page splits when trying to insert new values into the middle of an index.
What should you choose for a Fill Factor? If you were not expecting to add any values to your table over time then a Fill Factor of 100 would be appropriate and the index pages would be filled completely when the index was populated. If your table has a fair number of updates and inserts of new values happening all the time then you might consider setting a Fill Factor between 50 and 80 to allow more values to be placed in existing pages without causing page splits. If you are usually adding column values to the end of your sorted index values, which might be the case in an identity column index key value, then you might want to pick a Fill Factor between 90-100%. Ideally, you want to identify a Fill Factor that allows your indexes to grow with minimal page splits between index rebuild operations.
Efficient Indexing By Design
Having a good set of indexes on your database is critical to performance. Efficient indexes, that optimize your application performance do not normally happen by accident, but are designed to be efficient. Knowing how your application is querying, loading, and updating your data are critical to designing a good indexing structure. Monitor your indexes over time to make sure they are meeting your needs and modify them, if needed, to improve application performance.
Indexes on Computed Columns: Speed Up Queries, Add Business Rules