Proactive Database Index Creation
March 31, 2010
Indexes help your application find your data quickly and provide users with a well performing application, while minimizing server resources. This article discusses indexing guidelines related to join tables and covering indexes.
This is the fourth article in my indexing guideline series. You need to develop indexes to help your application find your data quickly and to provides your users with a well performing application, while minimizing server resources. Adding indexes most of the time seems like an afterthought for most development efforts. I hope that this article and the rest in this series will give you some ideas on how to be more proactive in creating indexes for your database. In this article, I will be discussing indexing guidelines related to join tables and covering indexes.
Building Indexes to Help With Table Joins
In a relational database design, your data is normally broken up in to different tables and related to each other by key values. These key values are used to join tables together to return the desired results from your query. If you place referential integrity constraints in your database, by creating foreign key constraints, SQL Server will use these key values to ensure your referential integrity rules are not broken as you try to INSERT, UPDATE, or DELETE records from a table. By building appropriate indexes, you can speed up the different table join operations.
To build useful indexes you need to review your code and determine what columns are used to join your tables. The columns you identify will be the columns that should be incorporated into one or more indexes. Keep in mind how often the JOIN operation will be performed. If you have a query that is only executed once a year, during some annual nightly batch process, then possibly having an index to support that annual processing will not be that beneficial. Whereas if you have a JOIN operation that is executed every few seconds then an index to support that JOIN would provide big value.
If a single column from each table is used to join two tables, then verify that this column is identified in an index. As a guideline make sure the join column is either in an index that only contains that column or it is at the beginning of a multi-column index.
To demonstrate the value of having an index to support a JOIN operation, run the following example:
USE AdventureWorks; go SET NOCOUNT ON; SELECT * INTO dbo.MySalesOrderDetail FROM Sales.SalesOrderDetail; SELECT * INTO dbo.MySalesOrderHeader FROM Sales.SalesOrderHeader; SET STATISTICS IO ON; SELECT OD.SalesOrderID FROM dbo.MySalesOrderDetail OD JOIN dbo.MySalesOrderHeader OH ON OD.SalesOrderID = OH.SalesOrderID WHERE OD.SalesOrderID > 45660 and OD.SalesOrderID < 46660 SET STATISTICS IO OFF; CREATE INDEX IDX_MySalesOrderHeader_SalesOrderID ON dbo.MySalesOrderHeader(SalesOrderId); CREATE INDEX IDX_MySalesOrderDetail_SalesOrderID ON dbo.MySalesOrderDetail(SalesOrderId); SET STATISTICS IO ON; SELECT OD.SalesOrderID FROM dbo.MySalesOrderDetail OD JOIN dbo.MySalesOrderHeader OH ON OD.SalesOrderID = OH.SalesOrderID WHERE OD.SalesOrderID > 45660 and OD.SalesOrderID < 46660 SET STATISTICS IO OFF; DROP TABLE dbo.MySalesOrderHeader; DROP TABLE dbo.MySalesOrderDetail;
When I run this code against my Adventure works database, I see the first SELECT statement does an INDEX SCAN operation against both tables in the query. It also requires 2,294 logical reads to resolve this SELECT statement, since neither table has an index on the column (SalesOrderID), which is used in the JOIN operation. Whereas the second SELECT statement, which is run after creating two non-clustered indexes, requires only 17 logical reads to resolve this query. That is a significant saving in I/O for the cost of a couple of indexes to support the JOIN operation.
What is a covering index? A covering index is a non-clustered index that contains all the columns needed to resolve a single query. For instance, say you have a really, wide table with say 500 different columns, but you query that table and reference only five columns in your query statement. If you created an index that contained at least those five columns and that index is used by the query engine then it is considered a covering index.
Covering indexes minimize the amount of I/O the query engine needs to perform to resolve a query. Reducing I/O is the name of the game when trying to optimize your queries. I/O operations are slow, so any time you can significantly reduce I/O you are getting the result set quicker and taxing the I/O sub-system less, not to mention the reduced load on the buffer pool.
The concept of a covering index has been around for a long time. However Microsoft introduced the INCLUDE clause in the CREATE INDEX statement with SQL Server 2005 to further reduced the amount of space and I/O required by covering indexes. The INCLUDE clause places the values of the covering index columns only in the leaf level pages. Let me explore how to build covering indexes before the introduction of SQL Server 2005 and then we will look at how to create the same covering index with SQL Server 2005.
Here is a CREATE TABLE and a SELECT statement that we will be using to explore covering index implementation:
CREATE TABLE MyTable( ID int Primary Key, ShortDescription varchar(240), LongDescription varchar(4000), FirstName varchar(100), LastName varchar(100), EmailAddress varchar(100), CreateDate DateTime, ModifiedDate DateTime) SELECT ModifiedDate, ShortDescription, ID FROM dbo.MyTable WHERE ID > '5909127';
By looking at the table definition, you can get a sense of how big a row might be. A row might be very large if each record contained a lengthy LongDescription value. This SELECT statement only uses three different columns (ModifiedDate, ShortComment, ID). To create a covering index all three of these columns will need to be included in the non-clustered index. Here is the syntax to create a covering index in versions of SQL Server prior to SQL Server 2005 (no INCLUDE clause allowed).
CREATE INDEX IDX_MyTable_CoveringIndexPre2005 ON dbo.MyTable (ID, ModifiedDate, ShortDescription);
Here you can see I have included all those the columns used in the above SELECT statement in the index. After creating this index, anytime I run the above SELECT statement, SQL Server will be able use my covering index to resolve the query instead of going to the clustered index to resolve this query. This would greatly reduce the number of I/Os SQL Server would need to perform, by not having to read the larger clustered index records.
The above syntax for creating a covering index is still valid in SQL Server 2005, but creating a covering index in this way places values for the index columns up in the upper index pages of the index, as well as having the values in each row of the leaf pages. In my SELECT statement above I only reference the ID column in my WHERE statement, so the database engine doesnt need to use the values for the ModifiedDate, and ShortComment columns in my upper index pages when trying to determine which leaf pages it need to read to return the result set. This is why Microsoft introduced the INCLUDE clause in SQL Server 2005.
The INCLUDE option, of the CREATE INDEX statement, allows you to create a covering index without putting the values of those columns not used in WHERE or JOIN predicates in the upper index pages. By identifying the non-searchable columns in the INCLUDE column they will only be included in the leaf pages of a non-clustered index. By doing this, the size of the non-clustered covering index using the INCLUDE clause is smaller than the pre-2005 way of creating covering index. A smaller index equates to less I/O when reading the upper index pages of a covering index. To create a covering index using the INCLUDE clause to mirror my non-clustering index (IDX_MyTable_ConveringIndexPre2005) created above, the following code would be used:
CREATE INDEX IDX_MyTable_CoveringIndex2005 ON dbo.MyTable (ID) INCLUDE (ModifiedDate, ShortDescription);
Here you can see I have an INCLUDE clause to identify the columns ModifiedData and ShortDescription. These columns do not need to be included in the upper index pages of this covering index.
If you have already converted your database to SQL Server 2005 or 2008 from some release prior to SQL Server 2005 did you consider converting your covering indexes to use the INCLUDE clause? If not, those covering indexes are going to be less efficient, and take up more disk space. If you didnt recreate your covering index do not fret, from the polls I have done at a few user groups that Ive presented at, most people didnt recreate their covering indexes using the INCLUDE clause when they migrated to SQL Server 2005 or 2008. Heck in some cases people probable do not even remember which indexes are the covering indexes.
Improving Your Application Performance with Well Designed Indexes
Having your application run as fast as possible is always important. By implementing appropriate indexes to support joining tables, you minimize the amount of work SQL Server needs to do to produce your record sets. Using the INCLUDE clause not only helps speed up your covering index queries but also reduces the amount of disk space required when implementing a covering index. By following indexing guidelines, you do your part to optimize your database design.