Database Indexing Development Lifecycle...Say What?

This is the first article of a series where I’ll be
exploring Indexing Guidelines. Having appropriate indexes on your database is
critical in making sure your application is retrieving records quickly. However,
having too many indexes can also slow down your application. In this series of
articles, I hope to help you understand how to determine what columns you
should consider for indexes, and how those columns should be used in indexes to
optimize performance. In this first article, I will be discussing the indexing
development lifecycle.

What Normally Happens

It has been my experience that designing indexes for a
database always seems to be an afterthought. Now I don’t mean all indexes.
We all know that when people are putting together their database diagram, at a
minimum they do identify the primary keys and the foreign key relationships.
But that doesn’t mean they build any physical indexes from these diagrams.

As developers are building their application of course, they
add indexes, but they seem to add them when they need them. If they should
identify some query is running slow they will look at the execution plan to
determine what is happening then they will build an index to speed up the
query.

Even after they place their application into production,
they might add an index or two. But then again they do this normally only
after someone complains about slow performance.

Is this the best approach for developing an efficient
indexing strategy for a database? I’d like to say in most cases probably not.
Developers and DBAs need to be more proactive when it comes to developing
indexes. Moreover, the index development process should not stop once an
application gets into production. Developers and DBAs should continue to tweak
the indexes long after an application goes into production. I’d like to
suggest that managing indexes for a database should go through a development
cycle just like you do with your code.

Index Development Lifecycle

The notion of building indexes and forgetting about them
should not be used as philosophy when thinking about database indexes. Indexes
need to be well thought out and tweaked over time. You need to develop an
indexing development lifecycle to build and manage your indexes appropriately
over time. I’ll give you some ideas that you can use to establish an indexing
development lifecycle for your environment.

In most IT shops, there are at least three different
environments: Production, Quality Assurance/Test, and Development. Your T-SQL
code migrates through these different environments as your code progresses from
one development phase to another, and so should your indexes. Therefore, why
not have the following lifecycle phases for developing and maintaining indexes:
Design, Development, Acceptance Testing, Production, and Maintenance. Let me
go through each one of these phases and discuss the kinds of tasks you should
consider performing in each phase.

Design Phase:

The design phase for indexes is just like the design phase
for developing code. In this phase, you should look at the data model of your
new database and consider the processing requirements your application will
need to go through to meet the business rules you have defined. Your programs
will need to read data a particular way to build reports. Alternatively, you
will have an online screen that will allow users to enter some search criteria,
so different screens can be displayed. Play attention to these different data
access requirements of the application. If a report needs data to come back in
sorted order based on a column value then this column would be a good candidate
for being in an index. If customers need to enter an ID and date range to
return some customer records on a screen, then the ID and date column
associated with the date range would be additional candidate columns for
indexes.

In the design phase, you need to get a sense for which columns
are being used, and what order those columns will be returned to the
application. You can then use this information to design some best guesses at
what indexes your application will need. By doing this data analysis you will
have the information you need to start identifying some indexes that will most
likely be useful for your application.

Development Phase:

In the development phase, you will review how well those
best guess indexes are meeting the needs of your application. Just like any
other development phase, this is the phase where you will be tweaking those
indexes in your design when you realize they are not meeting your application
data access requirements.

As you find your code is performing poorly you will add more
indexes. Keep in mind indexes don’t come for free. The more indexes you have
the more costly an INSERT, UPDATE and DELETE statement will be if it has to
update a bunch of indexes. Therefore, create indexes for those queries in your
application that are going to be run frequently. If you also have monthly or
yearly queries that take a long time, possibly you can do without an index.
Possibly you can create these indexes once a month, or year for these
monthly/yearly processes. These are the kinds of decisions you need to make
when you are developing indexes. You need to have well balanced indexes so
your application performs acceptable most of the time for the frequently run
queries. You need to make sure you do not have too many indexes that cause the
INSERT, UPDATE, and DELETE statements to take a long time.

Once you are done with developing your application code, but
prior to moving into user acceptance testing, you need to go back and review
the indexes you have. Make sure all the indexes you created during your
development phase are the ones that really made your queries run faster and
they are not ones you created that provided little value. If you removed those
useless indexes as you went along then you probably can omit this step.
Another thing to do is verify that you don’t have any duplicate indexes. If
you have duplicate indexes you are just wasting resources, both disk space and
the cost of maintaining them over time. Here is a script, written
by Paul Neilson, that can be used to identify those duplicates:


— From Paul Neilson: http://sqlblog.com/blogs/paul_nielsen/archive/2008/06/25/find-duplicate-indexes.aspx
— exact duplicates
with indexcols as
(
select object_id as id, index_id as indid, name,
(select case keyno when 0 then NULL else colid end as [data()]
from sys.sysindexkeys as k
where k.id = i.object_id
and k.indid = i.index_id
order by keyno, colid
for xml path(”)) as cols,
(select case keyno when 0 then colid else NULL end as [data()]
from sys.sysindexkeys as k
where k.id = i.object_id
and k.indid = i.index_id
order by colid
for xml path(”)) as inc
from sys.indexes as i
)
select
object_schema_name(c1.id) + ‘.’ + object_name(c1.id) as ‘table’,
c1.name as ‘index’,
c2.name as ‘exactduplicate’
from indexcols as c1
join indexcols as c2
on c1.id = c2.id
and c1.indid < c2.indid
and c1.cols = c2.cols
and c1.inc = c2.inc;

Acceptance Testing Phase:

Now that you have promoted your database and application
code to the next level in the development lifecycle, you need to start
reviewing those indexes you created. Hopefully, in this phase of your
application development cycle you will have users actually trying out your
application. They will be running your application code through its paces by
testing those reports, and search capability you’ve built into your
application. Having your testing staff use your application gives you an
opportunity to now see which indexes are being used by real application usage.

One of the things you should consider doing in this phase is
to monitor your index usage. Look at how often each index is being used. Pay
particular attention to which indexes are being updated heavily. As part of
this analysis, you should determine those indexes that are constantly being
updated, and never used. For those indexes that are not being used but are
being update, you might want to consider whether you need those indexes. In
addition, for indexes that are being used and updated frequently you might want
to review the FILLFACTOR to make sure it is appropriate, so that you don’t get
too many page splits. You can use the Dynamic Management Views (DVM’s) that
became available with SQL Server 2005 to find index usage information. Keep in
mind this dynamic information is only collected since the last time SQL Server
started, so it might not contain all of the statistics associated with indexes,
especially those indexes that are infrequently used. Here is a sample script
that shows you the index usage information for the AdventureWorks database.


USE AdventureWorks;
GO
— Display Index Usage Information
— Written By Gregory A. Larsen
SELECT o.name Object_Name,
       SCHEMA_NAME(o.schema_id) Schema_name,
       i.name Index_name,
       i.Type_Desc,
       CASE WHEN (s.user_seeks > 0
               OR s.user_scans > 0
               OR s.user_lookups > 0)
              AND s.user_updates > 0
            THEN ‘USED AND UPDATED’
            WHEN (s.user_seeks > 0
               OR s.user_scans > 0
               OR s.user_lookups > 0)
              AND s.user_updates = 0
            THEN ‘USED AND NOT UPDATED’
            WHEN  s.user_seeks IS NULL
              AND s.user_scans IS NULL
              AND s.user_lookups IS NULL
              AND s.user_updates IS NULL
            THEN ‘NOT USED AND NOT UPDATED’
            WHEN (s.user_seeks = 0
              AND s.user_scans = 0
              AND s.user_lookups = 0)
              AND s.user_updates > 0
            THEN ‘NOT USED AND UPDATED’
            ELSE ‘NONE OF THE ABOVE’
            END AS Usage_Info,
       COALESCE(s.user_seeks,0) AS user_seeks,
       COALESCE(s.user_scans,0) AS user_scans,
       COALESCE(s.user_lookups,0) AS user_lookups,
       COALESCE(s.user_updates,0) AS user_updates
 FROM sys.objects AS o
     JOIN sys.indexes AS i
 ON o.object_id = i.object_id
     LEFT OUTER JOIN
  sys.dm_db_index_usage_stats AS s
 ON i.object_id = s.object_id
  AND i.index_id = s.index_id
 WHERE o.type = ‘U’
 — Clustered and Non-Clustered indexes
  AND i.type IN (1, 2)
  AND (DB_NAME(s.database_id) = ‘SmarTPH’ or s.database_id IS NULL);

Another thing to look at is missing index statistics. As
the database engine is processing queries, it determines whether a query would
perform better if an additional index were added to the database. This
information is called missing indexes. The missing index information can be
exposed using DMV’s. You can use this missing index information to determine which
new indexes you might need. Keep in mind there might be lots of missing
indexes identified. You should only consider adding those missing indexes that
are identified to have a large number user seeks and scans. These are the
missing indexes that will be used quite frequently. Once again, you need to
keep a good balance of indexes, so don’t add every missing index. Here is a
query that identifies the missing indexes for the AdventureWorks DB, and orders
them by user scans and seeks:


— Build Create Index Statements From Missing Indexes
— Written By Gregory A. Larsen
SELECT DB_NAME(d.database_id) AS DBNAME,
      d.statement AS [ObjectName],
       gs.unique_compiles,
       gs.user_seeks,
       gs.user_scans,
       gs.avg_total_user_cost,
       gs.avg_user_impact,
       ‘CREATE INDEX MissingIndex_’ + rtrim(cast(d.index_handle AS char(100))) +
       ‘ ON ‘ + d.statement + ‘ (‘ +
       CASE WHEN equality_columns IS NOT NULL THEN equality_columns  ELSE ” END +
       CASE WHEN equality_columns IS NOT NULL AND
                 inequality_columns IS NOT NULL THEN ‘, ‘ ELSE ” END +
       CASE WHEN inequality_columns IS NOT NULL THEN inequality_columns ELSE ” END + ‘) ‘ +
       CASE WHEN included_columns IS NOT NULL THEN ‘INCLUDE (‘ + included_columns + ‘)’ ELSE ” END AS MissingIndex
FROM sys.dm_db_missing_index_groups g
	join sys.dm_db_missing_index_group_stats gs ON gs.group_handle = g.index_group_handle
	join sys.dm_db_missing_index_details d ON g.index_handle = d.index_handle
WHERE DB_NAME(d.database_id) = ‘AdventureWorks’
ORDER BY gs.user_scans, gs.user_seeks DESC;

After you have reviewed your index usage statistic and your
missing indexes, you need to determine what index modification you need. Make
any index modification you need first in your development environment, and then
promote them up to your acceptance testing environment. Every time you add
new indexes to your environment, don’t forget to review all of your indexes to
make sure you have not included any new duplicate indexes.

Production Phase:

Once you have promoted your code and your database to the
production environment, are you done developing indexes? No! There is always
index work to consider once you are in the production phase.

Now that your application code is finalized and in production,
you will be able to examine real life usage of your application to determine
how those indexes are really being used and updated. Therefore, you should
start gathering index usage statistics. You need to consider keeping the index
usage statistics over time, as well as the missing index information. Keeping
your index statistics and missing index information over time will allow you to
be able to determine how useful those indexes really are, and how often they
have been updated.

After you have tracked your index usage information for a
week, a month, and/or a year you will really be able to tell which indexes
really have not been used. After you have gathered a sufficient amount of
index usage and missing index information, you can use this information to
manage your indexes. This collected index usage information will help
identify those indexes that are being updated frequently or rarely, if ever
used, as well as those missing indexes that are frequently missing. You can
then use this information to tweak your indexes. Lastly don’t forget to look
for duplicate indexes once and while, especially after you have modified your
indexes and/or added new indexes.

Maintenance Phase:

The last phase of the index development lifecycle is
maintenance. Of course, you have been doing index maintenance all along the
way by added, dropping and modifying indexes. However, this is not the kind of
maintenance I am talking about. I’m talking about running index rebuilding,
index reorganization and updating index statistics.

As your indexes are updated, they will become fragmented.
The more fragmented your indexes the more pages they take up, the more I/O it
takes to traverse the index, and the slower they perform. Periodically you
need to review your index fragmentation to determine how fragmented your
indexes are. By reviewing the fragmentation information, you will be able to
determine if you should identify a new FILL FACTOR for an index. Based on how
fragmented your indexes are you might want to perform an index rebuild verses
an index organization operation. The Microsoft recommendation is to rebuild
an index if the index fragmentation is greater than 30% and reorganize it if
the index fragment is between 5% and 30%. Here is some code that will help you
identify the index fragment information for your indexes:


USE AdventureWorks;
GO
— Display Index Fragmentation
— Written by Gregory A. Larsen
SELECT DB_NAME(ps.[database_id]) AS [database_name],
       OBJECT_NAME(ps.[object_id], DB_ID()) AS [object_name],
       si.[name] AS [index_name], ps.partition_number,
       ps.index_type_desc, ps.alloc_unit_type_desc,
       ps.index_level,
       ps.[avg_fragmentation_in_percent],
       ps.[page_count]
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, ‘LIMITED’) ps
     JOIN sys.sysindexes si
     ON  ps.OBJECT_ID = si.id
     AND ps.index_id = si.indid
WHERE index_type_desc <> ‘HEAP’
ORDER BY OBJECT_NAME(ps.[object_id], DB_ID()), index_id

One last thing to talk about is index statistics. When you
create an index, SQL Server generates index statistics. SQL Server will also
keep these statistics up to date as your index gets updates, provided your
database setting for “AUTO_CREATE_STATISTICS” is on (which is the default).
The database engine keeps these statistics up to date based the percentage of
pages that have been updated. When around 20% or more of the data rows have
been updated, SQL Server will automatically update your index statistics.
Keep in mind when SQL Server automatically creates statistics it does s by only
sampling that data rows. If your tables are quite large, in the millions or
billions of rows, it might take quite a while to update 20% of the rows for the
statistics to get automatically updated. Therefore, because of the sampling
method and the 20% rule you might want to consider routinely updating your
index statistics manually. Doing this will give the database engine updated
statistics and this can help drastically in improving the performance of your
application. Up to date statistics allow the database engine to make the
appropriate choices when selecting an execution plan. Here is a script that
will tell you when the statistics were last updated:


USE AdventureWorks;
GO
— Display Index Statistic Update Date
— Wirtten by Gregory A. Larsen
SELECT  s.name, o.name,i.name, STATS_DATE(i.object_id, i.index_id) StatisticsLastUpdated , i.type_desc
FROM sys.indexes I
JOIN sys.objects o
ON i.object_id = o.object_id
JOIN sys.schemas s
ON o.schema_id = s.schema_id
WHERE o.name NOT like ‘sys%’
  AND STATS_DATE(i.object_id,i.index_id) IS NOT NULL
ORDER BY STATS_DATE(i.object_id, i.index_id)

Good Indexes Are Not a Mistake

Creating good indexes for your application does not happen
by mistake. You need to have a plan for how you will develop your indexes.
Think of developing indexes using an indexing development lifecycle approach.
Using this method gives you the best shot at hitting the mark when it comes to
developing good indexes for your application.

»

See All Articles by ColumnistGregory A. Larsen