Efficient SQL Server Indexing by Design


Having a good set of indexes on your SQL Server database is critical to performance. Efficient indexes don’t happen by accident; they are designed to be efficient. Greg Larsen discusses whether primary keys should be clustered, when to use filtered indexes and what to consider when using the Fill Factor.

This is the
final installment in my Index Guidelines series. If you have been following
this series, I have covered many different aspects of indexing, like having an
index development lifecycle, to the order in which columns should be placed in
your indexes. In this article, I will cover primary key, filtered indexes, and
fill factor.

Primary Key

What is a
primary key? A primary key is a value that uniquely identifies a record.
This key value can be made up of one or more columns. In order to enforce a
primary key constraint the SQL Server database engine creates a unique index
for the primary key. When creating the primary key, SQL Server will create the
primary key as a clustered index, by default, if the table does not already
have a clustered index on it and you do not specify it to be created as a
non-clustered index.

Should all
primary keys be a clustered index? As with many questions when it comes to
database design, “it depends”. As described in my “Do’s and Don’ts of Database Indexing” article I said clustered indexes
should be frequently used for retrieving data and should be highly selective
and the index value should be as short as possible. A primary key by
definition meets the highly selective criteria. However, a primary key might
not be frequently used to retrieve data or be short in length. For example if
a GUID is used for a primary key it will be unique, but it isn’t short in
length and most likely will not be used to retrieve data. If your primary key
is an “int” value and is commonly used to constrain your queries then it would
be a good candidate to be a clustered index. Remember the clustered index key
value is used in all non-clustered indexes; therefore you do not want to pick a
clustered index key that is long in length. This is especially true for tables
that contain many rows. Primary keys are ideal for quickly resolving WHERE
constraints when you are looking for records that contain a specific value. Therefore
when selecting your primary key for a table make sure you create it as a
non-clustered index when it is not used to frequently retrieve data, and/or it
is quite large and you don’t what to bloat the size of all the non-clustered
indexes with an excessively large clustered index key value.

Should every
table have a primary key? Clearly, there are times when a table requires a
primary key, like when you want to enforce referential integrity by using a
primary key column in a foreign key relationship, or you want every record to
be uniquely identifiable by a single key value. Not all tables have a column
or set of columns that make a natural key that uniquely identify each record in
the table. In this, should you derive a primary key column, by creating an
identity column or some other column to uniquely identify a record, so you can
have a primary key column? Well once again, I suppose it depends on how you
are going to use that derived primary key column. If you are not going to
refer to the primary key in your code, or in a foreign key constraint then I
would suggest there is no value in having a primary key column. I am sure
there are some purists out there that would disagree with this last statement
and say that best practices dictate that all tables should be built with a
primary key, even if you have to derive it and it is not used or referenced for
anything.

Filtered Index

Filtered
index is a new concept that was introduced in SQL Server 2008. A filtered
index is a special non-clustered index that is defined on a well-known subset
of column values. In other words, a filtered index is an index that is
created only for a specific set of column values. For an example say you have
a table that has a list of employees that live in different countries, which is
identified by the Country column. If you created a normal non-clustered index
on the Country, the index would contained all the different County values for
all Employees, whereas, with a filtered index, you could create an index where the
values in the index would only contain a subset of defined country values like,
France, Italy and Spain.

Filtered
indexes are useful when you have queries that only need to look at a small set
of records. Filtered indexes use less disks space because they are only
associated with a small set of the records within a table. Take for instance my
example above regarding the countries associated with employees. A filtered
index would only have clustered index key values for those employees in France,
Italy and Spain. Therefore, the filtered index would take less pages to store
the subset of country key values and their corresponding clustered index keys,
than a normal non-clustered index.

Filtered
indexes improve performance. They improve performance a couple of different
ways. First off, the filter index only needs to be updated whenever a row
value is created or changed that affects the subset of values associated with
the filtered index. Going back to my employee country example, we can explore
this a little more. If you inserted a new employee into the Employee table and
they lived in France then the filtered index would need to be updated to
reflect this new employee. However, if a new employee was added and they lived
in the United States then the filtered index would not need to be updated,
because it only needs to be updated when employees live in France, Italy and
Spain. The second way filtered indexes improve performance is when SQL
Server reads through the indexes to resolve a query. If you once again think
of the employee example, say you had to search for all the employees that lived
in Spain. That query would have to do less I/O to process through the filtered
index to find employees that live in Spain, than it would have to do if it
processed through a normal non-clustered index that contain employees for all
countries.

Here is the
syntax for creating my country filtered index:

CREATE NONCLUSTERED INDEX FilteredCountries
    ON HR.Employees (Country)
    WHERE Country IN ('France','Italy','Spain')

As you can
see, it is like creating a normal non-clustered index, but includes a WHERE
clause to filter the column values contained in the filtered index.

Fill Factor

When
creating indexes you need to take into account how the key values in an index
will change over time. Will there be new values added in the middle of the
sorted list of index values, or only at the end? Will values be removed over
time when records are deleted? Your answers to these kinds of questions
related to your index values, will determine how you should set your index Fill
Factor.

What is the
Fill Factor? The index Fill Factor is a value, expressed as a percentage, for
how full each index page should be populated when the index is built. You
need to set the Fill Factor appropriately based on how your index values will
change over time. Setting a Fill Factor appropriately keeps SQL Server from
having to create page splits when trying to insert new values into the middle
of an index.

What should
you choose for a Fill Factor? If you were not expecting to add any values to
your table over time then a Fill Factor of 100 would be appropriate and the
index pages would be filled completely when the index was populated. If your
table has a fair number of updates and inserts of new values happening all the
time then you might consider setting a Fill Factor between 50 and 80 to allow more
values to be placed in existing pages without causing page splits. If you are usually
adding column values to the end of your sorted index values, which might be the
case in an identity column index key value, then you might want to pick a Fill
Factor between 90-100%. Ideally, you want to identify a Fill Factor that
allows your indexes to grow with minimal page splits between index rebuild operations.

Efficient Indexing By Design

Having a
good set of indexes on your database is critical to performance. Efficient
indexes, that optimize your application performance do not normally happen by
accident, but are designed to be efficient. Knowing how your application is
querying, loading, and updating your data are critical to designing a good
indexing structure. Monitor your indexes over time to make sure they are
meeting your needs and modify them, if needed, to improve application performance.

Additional Resources

Indexes on Computed Columns: Speed Up Queries, Add Business Rules

Keep Your SQL Server Indexes Fragmentation Free

Technet Magazine Optimizing SQL Server Query Performance

Jose Barreto’s Blog SQL Server 2008 Indexing for JOINs

Jose Barreto’s Blog SQL Server 2008 Indexing Best Practices

»


See All Articles by Columnist

Gregory A. Larsen

Gregory Larsen
Gregory A. Larsen is a DBA at Washington State Department of Health (DOH). Greg is responsible for maintaining SQL Server and other database management software. Greg works with customers and developers to design and implement database changes, and solve database/application related problems. Greg builds homegrown solutions to simplify and streamline common database management tasks, such as capacity management.

Latest Articles