Proactive Database Index Creation


Indexes help your application find your data quickly and provide users with a well performing application, while minimizing server resources. This article discusses indexing guidelines related to join tables and covering indexes.

This is the fourth article in my indexing guideline series.
You need to develop indexes to help your application find your data quickly and
to provides your users with a well performing application, while minimizing
server resources. Adding indexes most of the time seems like an afterthought
for most development efforts. I hope that this article and the rest in this
series will give you some ideas on how to be more proactive in creating indexes
for your database. In this article, I will be discussing indexing guidelines
related to join tables and covering indexes.

Building Indexes to Help With Table Joins

In a relational database design, your data is normally
broken up in to different tables and related to each other by key values.
These key values are used to join tables together to return the desired results
from your query. If you place referential integrity constraints in your
database, by creating foreign key constraints, SQL Server will use these key
values to ensure your referential integrity rules are not broken as you try to
INSERT, UPDATE, or DELETE records from a table. By building appropriate
indexes, you can speed up the different table join operations.

To build useful indexes you need to review your code and
determine what columns are used to join your tables. The columns you identify
will be the columns that should be incorporated into one or more indexes. Keep
in mind how often the JOIN operation will be performed. If you have a query
that is only executed once a year, during some annual nightly batch process,
then possibly having an index to support that annual processing will not be
that beneficial. Whereas if you have a JOIN operation that is executed every
few seconds then an index to support that JOIN would provide big value.

If a single column from each table is used to join two
tables, then verify that this column is identified in an index. As a guideline
make sure the join column is either in an index that only contains that column
or it is at the beginning of a multi-column index.

To demonstrate the value of having an index to support a
JOIN operation, run the following example:

USE AdventureWorks;
go
SET NOCOUNT ON;
SELECT * INTO dbo.MySalesOrderDetail
    FROM Sales.SalesOrderDetail;
SELECT * INTO dbo.MySalesOrderHeader
    FROM Sales.SalesOrderHeader;
SET STATISTICS IO ON;
SELECT OD.SalesOrderID FROM dbo.MySalesOrderDetail OD
         JOIN dbo.MySalesOrderHeader OH
         ON OD.SalesOrderID = OH.SalesOrderID
         WHERE OD.SalesOrderID > 45660 and OD.SalesOrderID < 46660
SET STATISTICS IO OFF;
CREATE INDEX IDX_MySalesOrderHeader_SalesOrderID ON dbo.MySalesOrderHeader(SalesOrderId);
CREATE INDEX IDX_MySalesOrderDetail_SalesOrderID ON dbo.MySalesOrderDetail(SalesOrderId);
SET STATISTICS IO ON;
SELECT OD.SalesOrderID FROM dbo.MySalesOrderDetail OD
         JOIN dbo.MySalesOrderHeader OH
         ON OD.SalesOrderID = OH.SalesOrderID
         WHERE OD.SalesOrderID > 45660 and OD.SalesOrderID < 46660
SET STATISTICS IO OFF;
DROP TABLE dbo.MySalesOrderHeader;
DROP TABLE dbo.MySalesOrderDetail;

When I run this code against my Adventure works database, I
see the first SELECT statement does an INDEX SCAN operation against both tables
in the query. It also requires 2,294 logical reads to resolve this SELECT
statement, since neither table has an index on the column (SalesOrderID), which
is used in the JOIN operation. Whereas the second SELECT statement, which is
run after creating two non-clustered indexes, requires only 17 logical reads to
resolve this query. That is a significant saving in I/O for the cost of a
couple of indexes to support the JOIN operation.

Covering Index

What is a covering index? A covering index is a
non-clustered index that contains all the columns needed to resolve a single
query. For instance, say you have a really, wide table with say 500 different
columns, but you query that table and reference only five columns in your query
statement. If you created an index that contained at least those five columns
and that index is used by the query engine then it is considered a covering
index.

Covering indexes minimize the amount of I/O the query engine
needs to perform to resolve a query. Reducing I/O is the name of the game when
trying to optimize your queries. I/O operations are slow, so any time you can
significantly reduce I/O you are getting the result set quicker and taxing the
I/O sub-system less, not to mention the reduced load on the buffer pool.

The concept of a covering index has been around for a long
time. However Microsoft introduced the INCLUDE clause in the CREATE INDEX
statement with SQL Server 2005 to further reduced the amount of space and I/O
required by covering indexes. The INCLUDE clause places the values of the
covering index columns only in the leaf level pages. Let me explore how to
build covering indexes before the introduction of SQL Server 2005 and then we
will look at how to create the same covering index with SQL Server 2005.

Here is a CREATE TABLE and a SELECT statement that we will
be using to explore covering index implementation:

CREATE TABLE MyTable(
      ID int Primary Key,
      ShortDescription varchar(240), 
      LongDescription varchar(4000), 
      FirstName varchar(100), 
      LastName varchar(100), 
      EmailAddress varchar(100),
      CreateDate DateTime, 
      ModifiedDate DateTime)
SELECT ModifiedDate, ShortDescription, ID
      FROM dbo.MyTable 
      WHERE ID > '5909127';   

By looking at the table definition, you can get a sense of
how big a row might be. A row might be very large if each record contained a lengthy
LongDescription value. This SELECT statement only uses three different
columns (ModifiedDate, ShortComment, ID). To create a covering index all three
of these columns will need to be included in the non-clustered index. Here is
the syntax to create a covering index in versions of SQL Server prior to SQL
Server 2005 (no INCLUDE clause allowed).

CREATE INDEX IDX_MyTable_CoveringIndexPre2005 
ON dbo.MyTable
      (ID, ModifiedDate, ShortDescription); 

Here you can see I have included all those the columns used in the above SELECT
statement in the index. After creating this index, anytime I run the above
SELECT statement, SQL Server will be able use my covering index to resolve the
query instead of going to the clustered index to resolve this query. This would
greatly reduce the number of I/O’s SQL Server would need to perform, by not
having to read the larger clustered index records.

The above syntax for creating a covering index is still
valid in SQL Server 2005, but creating a covering index in this way places
values for the index columns up in the upper index pages of the index, as well
as having the values in each row of the leaf pages. In my SELECT statement
above I only reference the ID column in my WHERE statement, so the database
engine doesn’t need to use the values for the ModifiedDate, and ShortComment
columns in my upper index pages when trying to determine which leaf pages it
need to read to return the result set. This is why Microsoft introduced the
INCLUDE clause in SQL Server 2005.

The INCLUDE option, of the CREATE INDEX statement, allows
you to create a covering index without putting the values of those columns not
used in WHERE or JOIN predicates in the upper index pages. By identifying the
non-searchable columns in the INCLUDE column they will only be included in the
leaf pages of a non-clustered index. By doing this, the size of the
non-clustered covering index using the INCLUDE clause is smaller than the
pre-2005 way of creating covering index. A smaller index equates to less I/O
when reading the upper index pages of a covering index. To create a covering
index using the INCLUDE clause to mirror my non-clustering index
(IDX_MyTable_ConveringIndexPre2005) created above, the following code would be
used:

CREATE INDEX IDX_MyTable_CoveringIndex2005 
      ON dbo.MyTable
(ID) INCLUDE (ModifiedDate, ShortDescription);

Here you can see I have an “INCLUDE” clause to identify the columns
“ModifiedData” and “ShortDescription”. These columns do not need to be
included in the upper index pages of this covering index.

If you have already converted your database to SQL Server
2005 or 2008 from some release prior to SQL Server 2005 did you consider
converting your covering indexes to use the INCLUDE clause? If not, those covering
indexes are going to be less efficient, and take up more disk space. If you
didn’t recreate your covering index do not fret, from the polls I have done at
a few user groups that I’ve presented at, most people didn’t recreate their
covering indexes using the INCLUDE clause when they migrated to SQL Server 2005
or 2008. Heck in some cases people probable do not even remember which
indexes are the covering indexes.

Improving Your Application Performance with Well Designed Indexes

Having your application run as fast as possible is always
important. By implementing appropriate indexes to support joining tables, you
minimize the amount of work SQL Server needs to do to produce your record
sets. Using the INCLUDE clause not only helps speed up your covering index
queries but also reduces the amount of disk space required when implementing a
covering index. By following indexing guidelines, you do your part to optimize
your database design.

»


See All Articles by Columnist

Gregory A. Larsen

Gregory Larsen
Gregory Larsen
Gregory A. Larsen is a DBA at Washington State Department of Health (DOH). Greg is responsible for maintaining SQL Server and other database management software. Greg works with customers and developers to design and implement database changes, and solve database/application related problems. Greg builds homegrown solutions to simplify and streamline common database management tasks, such as capacity management.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles