Posted Aug 6, 2001

SQL Server Performance Tuning : Pt. 2 - Page 2

By Aaron Goldman

Clustered Index

Rarely do I rely on a clustered index for improved performance, since a covered index can easily beat the performance of a clustered index. A clustered index is really just a fatter covered index. The books will tell you there is only one clustered index. That is true. But if you have the disk space and insert/update performance is not a problem, you can repeat all the table columns (up to 16) in as many indexes as you like (up to 249) for the same or better search performance than a clustered index.
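A minimal sketch of the idea, with a table and index names invented for illustration: by repeating in a nonclustered index all the columns a query touches, the index alone can satisfy the query, matching or beating clustered-index performance.

```sql
-- Hypothetical schema; names are my own, not from the original.
CREATE TABLE customer (
    customerID int NOT NULL,
    lastName   varchar(40),
    firstName  varchar(40),
    city       varchar(40)
)

-- Repeat the columns the query needs so the index covers it:
CREATE INDEX ix_customer_lastName_covered
    ON customer (lastName, firstName, city, customerID)

-- This query can be answered entirely from the index pages,
-- never touching the data pages:
SELECT firstName, city
  FROM customer
 WHERE lastName = 'Smith'
```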

Foreign Keys and Indexing

Foreign Keys can be a factor in insert, update and delete statements. Since foreign keys have to be checked, an update or delete in one table can result in a check in another table for the existence of related data.

Create table appointment (apptID int, ...)
Create table appointmentPerson (personID int, apptID int, ...)

When deleting an appointment, a check in the appointmentPerson table will be performed. If there is no index with a first column of apptID, an index or table scan will be performed checking every page in the table. As in previous examples, a useful index can make a huge performance gain in this operation.
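For the schema above, such a supporting index might look like this (the index name is my own choice):

```sql
-- An index with apptID as the first column, so the existence check
-- performed when deleting an appointment is an index seek on
-- appointmentPerson rather than a table scan.
CREATE INDEX ix_appointmentPerson_apptID
    ON appointmentPerson (apptID)
```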

Foreign key fields like this are also good candidates for indexing because of their likelihood of being involved in joins and used as search criteria. SQL Dev Pro, my freeware database utility, has a function to search your database for Foreign Key fields and suggest indexes based on FK's and the rowcounts involved. You can download the tool from www.itdevpros.com.

Move Functions Out of Search Criteria

Oftentimes, SQL functions will stop an index from being used. The getdate() function is a prime example:


Select * 
  from aTable 
  where datediff(day, theDate, getdate()) < 3

Because the function wraps the theDate column, it must be evaluated for every row and no index on theDate can be used. Move the date arithmetic to the constant side so the column stands alone:

Declare @afterDate datetime
Select @afterDate = dateadd(day, -3, getdate())
Select * 
  from aTable 
  where theDate > @afterDate
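Another common case of the same problem (my own example, not from the original) is wrapping the column in year(); a range predicate on the bare column lets an index on theDate be used:

```sql
-- Slow: year() must be evaluated for every row, defeating the index
SELECT * FROM aTable WHERE year(theDate) = 2001

-- Fast: a range on the bare column can seek an index on theDate
SELECT * FROM aTable
 WHERE theDate >= '20010101' AND theDate < '20020101'
```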

Don't be afraid of Over Indexing

With storage becoming cheaper every day and most of the systems I'm working on being more search intensive than update intensive, I have not encountered problems adding extra indexes. While it is true they impede insert, delete, and update performance, usually these are single row operations which complete very quickly, while select operations more commonly require multiple tables and multiple rows. In most situations, the benefits of additional indexes will far outweigh their cost. YMMV.

From BOL:

Indexes not only speed up the retrieval of rows for selects, they also usually increase the speed of updates and deletes. This is because SQL Server must first find a row before it can update or delete the row. The increased efficiency of using the index to locate the row usually offsets the extra overhead needed to update the indexes, unless the table has a lot of indexes.

Then again, the indexes we have been discussing are probably not used for row retrieval in updates.

Reformat Statements, Apply a Different Strategy

I have noticed that some IN clauses can be quite intensive for no good reason. The optimizer sometimes seems to generate the entire result set of the subquery for each row of the outer query. I often rephrase an IN clause as a join (note that if the join column in the second table is not unique, the join can return duplicate rows where IN would not):


Select * 
  from aTable 
  where id in 
    (select id 
       from anotherTable)

Select * 
  from aTable a 
  join anotherTable t on a.id = t.id

NOT INs can be even worse:


Select * 
  from aTable 
  where id not in 
    (select id from anotherTable)

Select * 
  from aTable a 
  left join anotherTable t on a.id = t.id
  where t.id is null

Where possible, use EXISTS instead of COUNT, because EXISTS can stop at the first qualifying row, while COUNT must examine every qualifying row to produce an exact answer.


IF (select count(*) 
      from aTable 
      where criteriaCol = 1) > 0

IF exists (select * 
             from aTable 
             where criteriaCol = 1)

Count(*) can be more efficient than count(colName) for the same reason. Count(colName) eliminates NULLs from the count, which is not always what is wanted, while count(*) is free to count rows by the fastest means available.
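A quick illustration of the difference, using a throwaway table of my own:

```sql
CREATE TABLE t (colName int NULL)
INSERT t VALUES (1)
INSERT t VALUES (NULL)

SELECT count(*) FROM t        -- 2: counts all rows
SELECT count(colName) FROM t  -- 1: the NULL is excluded
```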

Statistics - Update Statistics

This usually doesn't play a part in my optimizations because I like to have statistics updated regularly, daily if possible, on all tables with sp_updatestats run from a job. After that, I recompile all stored procedures and triggers with sp_recompile. This is the code, using my sp__execAllInTable procedure:

exec sp_updatestats

select 'SP_RECOMPILE ' + quotename(name) stmt
  into #ExecMe
  from sysobjects 
  where type in ('P', 'TR')

exec sp__execAllInTable

SQL 2000 automatically updates statistics, but doesn't automatically recompile code. Stored procedures and triggers will be recompiled after the SQL Server is restarted in either version.

The subject of statistics leads to data distribution issues, which I'll save for another article, and to the percentages of qualifying rows I mentioned earlier.

Temp to Perm Table

I have often found that indexes on temporary tables (#tables) are not well used. Sometimes making a permanent user table in the database works better. For single-user (likely batch) processes, you can truncate the table before each use. For multi-user processes, one approach is to add a spid column to the table and delete by spid before each use:

Delete permTempTable 
  where spid = @@spid
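A sketch of the full pattern, with table and column names invented for illustration:

```sql
-- A permanent work table shared by concurrent users; each connection
-- claims its rows by @@spid. All names here are illustrative.
CREATE TABLE permTempTable (
    spid    int NOT NULL,
    rowID   int,
    workVal varchar(100)
)
CREATE INDEX ix_permTempTable_spid ON permTempTable (spid)

-- At the start of each use, clear this connection's old rows:
DELETE permTempTable WHERE spid = @@spid

-- Then populate the connection's working set:
INSERT permTempTable (spid, rowID, workVal)
SELECT @@spid, id, someCol
  FROM sourceTable
```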


Multiple Query Strategy

The need for this is usually indicated by a statement joining too many tables (at least five) or making heavy use of functions like min and max. A multiple query strategy occasionally helps here: break the statement up into two or three statements that each run quickly, possibly connected through temp tables.
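A hypothetical sketch of the idea (schema invented for the example): instead of one large statement that joins many tables and aggregates, stage the aggregate in a temp table first.

```sql
-- Step 1: compute the aggregate into a small intermediate result
SELECT custID, max(orderDate) AS lastOrder
  INTO #lastOrders
  FROM orders
 GROUP BY custID

-- Step 2: join the small intermediate result to the remaining tables
SELECT c.custName, l.lastOrder, r.region
  FROM customer c
  JOIN #lastOrders l ON c.custID = l.custID
  JOIN region r ON c.regionID = r.regionID

DROP TABLE #lastOrders
```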

Denormalization is one of the least desirable courses of action. Denormalizations have to be maintained. This is a judgement call--a small amount of denormalization can tremendously help a difficult situation. Unfortunately, this can be a slippery road leading to a choking morass of code and a maintenance nightmare. In general, if the denormalization cannot be maintained by a single trigger or a single stored procedure, I try to avoid it.
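As an illustration of a denormalization small enough to be maintained by a single trigger (the schema is invented for the example):

```sql
-- Keep a redundant order count on customer, maintained by one trigger,
-- so reports avoid repeated COUNTs against orders. Names are illustrative.
CREATE TABLE customer (custID int PRIMARY KEY, orderCount int DEFAULT 0)
CREATE TABLE orders (orderID int PRIMARY KEY, custID int)

CREATE TRIGGER tr_orders_count ON orders
FOR INSERT, DELETE
AS
BEGIN
    -- Recompute the count for any customer touched by this statement
    UPDATE c
       SET orderCount = (SELECT count(*) FROM orders o
                          WHERE o.custID = c.custID)
      FROM customer c
     WHERE c.custID IN (SELECT custID FROM inserted
                        UNION
                        SELECT custID FROM deleted)
END
```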

I will shortly be doing performance testing on indexed queries in SQL 2000 in place of denormalizations.


Rework the Schema

This is like denormalization, only worse from a labor standpoint. If the schema truly can't support the process at the required performance level and all else fails, a reworking of the basic schema may be in order.

Satisfice, Don't Optimize

What is acceptable performance varies from situation to situation. If I can't achieve orders of magnitude improvement (10, 100, 1000 times faster), I am usually disappointed. But, that amount of improvement will not be needed in every situation.

I used to work for a very smart man who always said "Satisfice, don't optimize." Since we've been attempting optimization all along, maybe we should step back. Our goal was to make things faster, not necessarily as fast as they can possibly be, but just fast enough to make the users happy. While it pains me to let a query go at only 2 or 3 times faster, sometimes that's good enough.

SQL 2000

My recent testing suggests that quite noticeable gains can be had by upgrading to SQL 2000. It may even alleviate the need for some of the above fixes.

Please Comment

For comments or questions about this article, write me at aaron@itdevpros.com. If you have other optimization suggestions, I'd love to hear about them.
