Database Indexing Development Lifecycle...Say What?
December 31, 2009
This is the first article of a series where Ill be exploring Indexing Guidelines. Having appropriate indexes on your database is critical in making sure your application is retrieving records quickly. However, having too many indexes can also slow down your application. In this series of articles, I hope to help you understand how to determine what columns you should consider for indexes, and how those columns should be used in indexes to optimize performance. In this first article, I will be discussing the indexing development lifecycle.
What Normally Happens
It has been my experience that designing indexes for a database always seems to be an afterthought. Now I dont mean all indexes. We all know that when people are putting together their database diagram, at a minimum they do identify the primary keys and the foreign key relationships. But that doesnt mean they build any physical indexes from these diagrams.
As developers are building their application of course, they add indexes, but they seem to add them when they need them. If they should identify some query is running slow they will look at the execution plan to determine what is happening then they will build an index to speed up the query.
Even after they place their application into production, they might add an index or two. But then again they do this normally only after someone complains about slow performance.
Is this the best approach for developing an efficient indexing strategy for a database? Id like to say in most cases probably not. Developers and DBAs need to be more proactive when it comes to developing indexes. Moreover, the index development process should not stop once an application gets into production. Developers and DBAs should continue to tweak the indexes long after an application goes into production. Id like to suggest that managing indexes for a database should go through a development cycle just like you do with your code.
Index Development Lifecycle
The notion of building indexes and forgetting about them should not be used as philosophy when thinking about database indexes. Indexes need to be well thought out and tweaked over time. You need to develop an indexing development lifecycle to build and manage your indexes appropriately over time. Ill give you some ideas that you can use to establish an indexing development lifecycle for your environment.
In most IT shops, there are at least three different environments: Production, Quality Assurance/Test, and Development. Your T-SQL code migrates through these different environments as your code progresses from one development phase to another, and so should your indexes. Therefore, why not have the following lifecycle phases for developing and maintaining indexes: Design, Development, Acceptance Testing, Production, and Maintenance. Let me go through each one of these phases and discuss the kinds of tasks you should consider performing in each phase.
The design phase for indexes is just like the design phase for developing code. In this phase, you should look at the data model of your new database and consider the processing requirements your application will need to go through to meet the business rules you have defined. Your programs will need to read data a particular way to build reports. Alternatively, you will have an online screen that will allow users to enter some search criteria, so different screens can be displayed. Play attention to these different data access requirements of the application. If a report needs data to come back in sorted order based on a column value then this column would be a good candidate for being in an index. If customers need to enter an ID and date range to return some customer records on a screen, then the ID and date column associated with the date range would be additional candidate columns for indexes.
In the design phase, you need to get a sense for which columns are being used, and what order those columns will be returned to the application. You can then use this information to design some best guesses at what indexes your application will need. By doing this data analysis you will have the information you need to start identifying some indexes that will most likely be useful for your application.
In the development phase, you will review how well those best guess indexes are meeting the needs of your application. Just like any other development phase, this is the phase where you will be tweaking those indexes in your design when you realize they are not meeting your application data access requirements.
As you find your code is performing poorly you will add more indexes. Keep in mind indexes dont come for free. The more indexes you have the more costly an INSERT, UPDATE and DELETE statement will be if it has to update a bunch of indexes. Therefore, create indexes for those queries in your application that are going to be run frequently. If you also have monthly or yearly queries that take a long time, possibly you can do without an index. Possibly you can create these indexes once a month, or year for these monthly/yearly processes. These are the kinds of decisions you need to make when you are developing indexes. You need to have well balanced indexes so your application performs acceptable most of the time for the frequently run queries. You need to make sure you do not have too many indexes that cause the INSERT, UPDATE, and DELETE statements to take a long time.
Once you are done with developing your application code, but prior to moving into user acceptance testing, you need to go back and review the indexes you have. Make sure all the indexes you created during your development phase are the ones that really made your queries run faster and they are not ones you created that provided little value. If you removed those useless indexes as you went along then you probably can omit this step. Another thing to do is verify that you dont have any duplicate indexes. If you have duplicate indexes you are just wasting resources, both disk space and the cost of maintaining them over time. Here is a script, written by Paul Neilson, that can be used to identify those duplicates:
-- From Paul Neilson: http://sqlblog.com/blogs/paul_nielsen/archive/2008/06/25/find-duplicate-indexes.aspx -- exact duplicates with indexcols as ( select object_id as id, index_id as indid, name, (select case keyno when 0 then NULL else colid end as [data()] from sys.sysindexkeys as k where k.id = i.object_id and k.indid = i.index_id order by keyno, colid for xml path('')) as cols, (select case keyno when 0 then colid else NULL end as [data()] from sys.sysindexkeys as k where k.id = i.object_id and k.indid = i.index_id order by colid for xml path('')) as inc from sys.indexes as i ) select object_schema_name(c1.id) + '.' + object_name(c1.id) as 'table', c1.name as 'index', c2.name as 'exactduplicate' from indexcols as c1 join indexcols as c2 on c1.id = c2.id and c1.indid < c2.indid and c1.cols = c2.cols and c1.inc = c2.inc;
Acceptance Testing Phase:
Now that you have promoted your database and application code to the next level in the development lifecycle, you need to start reviewing those indexes you created. Hopefully, in this phase of your application development cycle you will have users actually trying out your application. They will be running your application code through its paces by testing those reports, and search capability youve built into your application. Having your testing staff use your application gives you an opportunity to now see which indexes are being used by real application usage.
One of the things you should consider doing in this phase is to monitor your index usage. Look at how often each index is being used. Pay particular attention to which indexes are being updated heavily. As part of this analysis, you should determine those indexes that are constantly being updated, and never used. For those indexes that are not being used but are being update, you might want to consider whether you need those indexes. In addition, for indexes that are being used and updated frequently you might want to review the FILLFACTOR to make sure it is appropriate, so that you dont get too many page splits. You can use the Dynamic Management Views (DVMs) that became available with SQL Server 2005 to find index usage information. Keep in mind this dynamic information is only collected since the last time SQL Server started, so it might not contain all of the statistics associated with indexes, especially those indexes that are infrequently used. Here is a sample script that shows you the index usage information for the AdventureWorks database.
USE AdventureWorks; GO -- Display Index Usage Information -- Written By Gregory A. Larsen SELECT o.name Object_Name, SCHEMA_NAME(o.schema_id) Schema_name, i.name Index_name, i.Type_Desc, CASE WHEN (s.user_seeks > 0 OR s.user_scans > 0 OR s.user_lookups > 0) AND s.user_updates > 0 THEN 'USED AND UPDATED' WHEN (s.user_seeks > 0 OR s.user_scans > 0 OR s.user_lookups > 0) AND s.user_updates = 0 THEN 'USED AND NOT UPDATED' WHEN s.user_seeks IS NULL AND s.user_scans IS NULL AND s.user_lookups IS NULL AND s.user_updates IS NULL THEN 'NOT USED AND NOT UPDATED' WHEN (s.user_seeks = 0 AND s.user_scans = 0 AND s.user_lookups = 0) AND s.user_updates > 0 THEN 'NOT USED AND UPDATED' ELSE 'NONE OF THE ABOVE' END AS Usage_Info, COALESCE(s.user_seeks,0) AS user_seeks, COALESCE(s.user_scans,0) AS user_scans, COALESCE(s.user_lookups,0) AS user_lookups, COALESCE(s.user_updates,0) AS user_updates FROM sys.objects AS o JOIN sys.indexes AS i ON o.object_id = i.object_id LEFT OUTER JOIN sys.dm_db_index_usage_stats AS s ON i.object_id = s.object_id AND i.index_id = s.index_id WHERE o.type = 'U' -- Clustered and Non-Clustered indexes AND i.type IN (1, 2) AND (DB_NAME(s.database_id) = 'SmarTPH' or s.database_id IS NULL);
Another thing to look at is missing index statistics. As the database engine is processing queries, it determines whether a query would perform better if an additional index were added to the database. This information is called missing indexes. The missing index information can be exposed using DMVs. You can use this missing index information to determine which new indexes you might need. Keep in mind there might be lots of missing indexes identified. You should only consider adding those missing indexes that are identified to have a large number user seeks and scans. These are the missing indexes that will be used quite frequently. Once again, you need to keep a good balance of indexes, so dont add every missing index. Here is a query that identifies the missing indexes for the AdventureWorks DB, and orders them by user scans and seeks:
-- Build Create Index Statements From Missing Indexes -- Written By Gregory A. Larsen SELECT DB_NAME(d.database_id) AS DBNAME, d.statement AS [ObjectName], gs.unique_compiles, gs.user_seeks, gs.user_scans, gs.avg_total_user_cost, gs.avg_user_impact, 'CREATE INDEX MissingIndex_' + rtrim(cast(d.index_handle AS char(100))) + ' ON ' + d.statement + ' (' + CASE WHEN equality_columns IS NOT NULL THEN equality_columns ELSE '' END + CASE WHEN equality_columns IS NOT NULL AND inequality_columns IS NOT NULL THEN ', ' ELSE '' END + CASE WHEN inequality_columns IS NOT NULL THEN inequality_columns ELSE '' END + ') ' + CASE WHEN included_columns IS NOT NULL THEN 'INCLUDE (' + included_columns + ')' ELSE '' END AS MissingIndex FROM sys.dm_db_missing_index_groups g join sys.dm_db_missing_index_group_stats gs ON gs.group_handle = g.index_group_handle join sys.dm_db_missing_index_details d ON g.index_handle = d.index_handle WHERE DB_NAME(d.database_id) = 'AdventureWorks' ORDER BY gs.user_scans, gs.user_seeks DESC;
After you have reviewed your index usage statistic and your missing indexes, you need to determine what index modification you need. Make any index modification you need first in your development environment, and then promote them up to your acceptance testing environment. Every time you add new indexes to your environment, dont forget to review all of your indexes to make sure you have not included any new duplicate indexes.
Once you have promoted your code and your database to the production environment, are you done developing indexes? No! There is always index work to consider once you are in the production phase.
Now that your application code is finalized and in production, you will be able to examine real life usage of your application to determine how those indexes are really being used and updated. Therefore, you should start gathering index usage statistics. You need to consider keeping the index usage statistics over time, as well as the missing index information. Keeping your index statistics and missing index information over time will allow you to be able to determine how useful those indexes really are, and how often they have been updated.
After you have tracked your index usage information for a week, a month, and/or a year you will really be able to tell which indexes really have not been used. After you have gathered a sufficient amount of index usage and missing index information, you can use this information to manage your indexes. This collected index usage information will help identify those indexes that are being updated frequently or rarely, if ever used, as well as those missing indexes that are frequently missing. You can then use this information to tweak your indexes. Lastly dont forget to look for duplicate indexes once and while, especially after you have modified your indexes and/or added new indexes.
The last phase of the index development lifecycle is maintenance. Of course, you have been doing index maintenance all along the way by added, dropping and modifying indexes. However, this is not the kind of maintenance I am talking about. Im talking about running index rebuilding, index reorganization and updating index statistics.
As your indexes are updated, they will become fragmented. The more fragmented your indexes the more pages they take up, the more I/O it takes to traverse the index, and the slower they perform. Periodically you need to review your index fragmentation to determine how fragmented your indexes are. By reviewing the fragmentation information, you will be able to determine if you should identify a new FILL FACTOR for an index. Based on how fragmented your indexes are you might want to perform an index rebuild verses an index organization operation. The Microsoft recommendation is to rebuild an index if the index fragmentation is greater than 30% and reorganize it if the index fragment is between 5% and 30%. Here is some code that will help you identify the index fragment information for your indexes:
USE AdventureWorks; GO -- Display Index Fragmentation -- Written by Gregory A. Larsen SELECT DB_NAME(ps.[database_id]) AS [database_name], OBJECT_NAME(ps.[object_id], DB_ID()) AS [object_name], si.[name] AS [index_name], ps.partition_number, ps.index_type_desc, ps.alloc_unit_type_desc, ps.index_level, ps.[avg_fragmentation_in_percent], ps.[page_count] FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ps JOIN sys.sysindexes si ON ps.OBJECT_ID = si.id AND ps.index_id = si.indid WHERE index_type_desc <> 'HEAP' ORDER BY OBJECT_NAME(ps.[object_id], DB_ID()), index_id
One last thing to talk about is index statistics. When you create an index, SQL Server generates index statistics. SQL Server will also keep these statistics up to date as your index gets updates, provided your database setting for AUTO_CREATE_STATISTICS is on (which is the default). The database engine keeps these statistics up to date based the percentage of pages that have been updated. When around 20% or more of the data rows have been updated, SQL Server will automatically update your index statistics. Keep in mind when SQL Server automatically creates statistics it does s by only sampling that data rows. If your tables are quite large, in the millions or billions of rows, it might take quite a while to update 20% of the rows for the statistics to get automatically updated. Therefore, because of the sampling method and the 20% rule you might want to consider routinely updating your index statistics manually. Doing this will give the database engine updated statistics and this can help drastically in improving the performance of your application. Up to date statistics allow the database engine to make the appropriate choices when selecting an execution plan. Here is a script that will tell you when the statistics were last updated:
USE AdventureWorks; GO -- Display Index Statistic Update Date -- Wirtten by Gregory A. Larsen SELECT s.name, o.name,i.name, STATS_DATE(i.object_id, i.index_id) StatisticsLastUpdated , i.type_desc FROM sys.indexes I JOIN sys.objects o ON i.object_id = o.object_id JOIN sys.schemas s ON o.schema_id = s.schema_id WHERE o.name NOT like 'sys%' AND STATS_DATE(i.object_id,i.index_id) IS NOT NULL ORDER BY STATS_DATE(i.object_id, i.index_id)
Good Indexes Are Not a Mistake
Creating good indexes for your application does not happen by mistake. You need to have a plan for how you will develop your indexes. Think of developing indexes using an indexing development lifecycle approach. Using this method gives you the best shot at hitting the mark when it comes to developing good indexes for your application.