Don’t think that performance tuning your SQL Server applications is something to be relegated to the end of the development process. If you want your SQL Server-based applications to scale and run at their full potential, you must begin considering scalability and performance issues during the early stages of your application’s development.
If you have been a
DBA or SQL developer for very long, then you have probably run
across some slow SQL Server-based applications. And often when
this happens, everybody begins blaming everybody else for the
problem. It’s the network. It’s the server hardware. It’s SQL
Server. It’s the users. It’s the database. And this goes on and
on, but unfortunately, blame doesn’t fix slow applications. The cure for most slow SQL Server-based applications is prevention: careful up-front analysis of users’ needs, thoughtful design, optimal coding, and appropriate implementation.
For any
application, SQL Server or otherwise, scalability and
performance have to be built in from the very beginning. Once
the application is rolled out, it is very difficult and
expensive to resolve most scalability and performance issues.
In this article you are going to learn the fundamentals of how to design, code, and implement scalable, performance-optimized SQL Server applications. You won’t learn everything, as that would take an entire book. The focus of this article is on the minimum you must know in order to produce scalable, performance-tuned SQL Server-based applications. Here’s what you will learn:
- What Every Developer and DBA Must Know About SQL Server Performance Tuning
- How to Optimize Your Server’s Hardware
- How to Optimize SQL Server’s Configuration Settings
- How to Optimize Your Application’s Design
- How to Optimize Your Database’s Design
- How to Optimize Your Application Code for SQL Server
- How to Optimize Your Transact-SQL Code
- How to Select Indexes for Optimal Database Performance
- How to Take Advantage of SQL Server Performance Tuning Tools
At the very least, if you take advantage of the advice and information in this article, you will find that performance tuning your SQL Server-related applications is not as big a mystery as you might think. So let’s get to work.
What Every Developer and DBA Must Know About SQL Server Performance Tuning
As a developer, there are some overriding principles to keep in mind as you proceed. This section introduces these principles. Keep them in mind as you read about the specific performance tuning details discussed later in this article, and whenever you performance tune your SQL Server applications.
Performance
Tuning is Not a Science
SQL Server performance tuning is more art than science. I am
sure you didn’t want to hear that, but this is a fact of life. I
wish I could tell you exactly, step-by-step, what you need to do
to make your applications scale and perform well. The problem,
as you probably already know, is that any modern software
application is a combination of many complex variables.
Unfortunately, no matter how hard you try, you don’t have full
control of your application and the environment it runs under.
For example, here are some (not all) of the factors that affect
an application’s performance:
- SQL Server (the program itself)
- SQL Server’s Configuration Settings
- The Application’s Transact-SQL Code
- The Application’s Non-Transact-SQL Code
- The Database’s Design
- The Operating System (server and client)
- The Middleware (Microsoft Transaction Server, Microsoft Message Queue Server)
- The Hardware (server and client)
- The Network Hardware and Bandwidth (LAN and WAN)
- The Number of Clients
- The Client Usage Patterns
- The Type and Quantity of Data Stored in SQL Server
- Whether the Application is OLTP- or OLAP-based
While it is virtually impossible to control every factor that influences SQL Server’s scalability and performance, what you can do is make the most of what you can control.
Test During All
Stages of Your Application’s Development
Scalability and performance testing should not be done only after the application is written and ready to be rolled out.
Testing should be an integral part of the development process,
beginning with the earliest stages of the application and
database design, and continuing as appropriate throughout the
entire process. Most scalability and performance issues are a
result of poor initial design, and can only be prevented early
in the game. If you wait until after the application is
complete, you will either have to live with performance
problems, or rewrite the application.
When performing tests, always proceed scientifically, changing only one variable at a time. For example, if you suspect that you need to add an index to a table to boost performance, but you are not sure which column, or which type of index, is best, experiment with one change at a time, testing each change individually to see if it produces the results you expect. If you change more than one thing at a time, you won’t know which of your changes worked or didn’t work. This goes for all testing, whether it is adding indexes, making SQL Server configuration changes, or testing various hardware configurations.
Always try to test under realistic conditions. This means using real data, testing against the largest expected data sets, and using hardware similar to the hardware that will be used when the application goes into production. If you don’t, you may be surprised to find that while your application works well for 10 simultaneous users during testing, it fails miserably when 500 users are online.
Not All
Performance Issues are Obvious
If you do much performance tuning, you will soon realize that
many performance tuning and scalability issues are not so
obvious. This is because many performance-related problems are
caused by two or more problems, not a single obvious problem.
This makes it difficult to isolate and fix the problem. While
there are no easy solutions, one of the best approaches to take
is to isolate and correct one problem at a time, until you have
found and fixed them all.
Not All
Performance Tuning Suggestions Work In All Cases
In this article, and from other performance tuning sources, you
will find dozens of ideas and tips on performance tuning. Keep
in mind that a suggestion that works great in one particular situation could actually reduce performance in a different one, because many performance tuning suggestions are situation-specific. As
the person responsible for performance tuning, you will need to
evaluate each tip or suggestion you run across and decide
whether it is applicable to your particular situation. In other
words, don’t blindly proceed with a performance tuning tip. Be
sure you understand its implications before you use it.
SQL Server
Performance Tuning is a Learned Skill
Mastering SQL Server 2000 performance tuning cannot be done overnight. In fact, experience, more than book learning, is how you will master this skill. But in order to take advantage of the experience you gain over time, it is also important to be conversant in the fundamentals of the technologies that affect your application’s performance.
For example, you
need to fully understand the programming language used to write
your application, database design, application design,
Transact-SQL, how SQL Server stores and indexes data, and how
networks and server hardware really work. The better
understanding you have of the basics of the applicable
technologies used to develop and roll out your application, the
better position you will be in to understand what is causing
performance and scalability problems and how to resolve them.
Learn all you can.
How
to Optimize Your Server’s Hardware
When it comes time to blame poor application performance on something, server hardware gets a disproportionate amount of blame. What is ironic is that in most cases the hardware is not the main cause of the problem. In fact, server hardware plays a much smaller role than most people think when it comes to SQL Server-based application performance and scalability.
The reason for this is that most slow applications are slow because of poor up-front design, not because of slow hardware. The reason hardware
is often blamed is because performance problems often don’t show
themselves until after the application is rolled out. And since
the application’s design can’t be changed at this time, about
the only thing you can try to help boost performance is to throw
hardware at it. While hardware can help, it usually doesn’t
fully resolve the problem, and this is why hardware is often
blamed for slow performance. While hardware can sometimes be an
issue, most likely it is not.
In order to
prevent your server hardware from being a drag on your SQL
Server-based application (which it can if it is inappropriately
selected or configured), let’s take a brief look at some of the
most common hardware selection and tuning issues.
Selecting
Hardware
Selecting the optimum hardware for your SQL Server-based
application depends on a variety of factors, such as the size of
the database, the number of users, how the database is used
(OLTP or OLAP), and others. While there is no sure-fire formula
for sizing server hardware, the best way to get a feel for
sizing is to test your application early in the development
stage. Ah, testing is mentioned again. That’s right. While many
experienced DBAs can probably give you a good estimate on the
optimum hardware you need, only through realistic testing will
you know for sure what hardware is required to meet your
application’s needs.
When it comes to server hardware, here are some things to keep in mind:
- CPU: Always purchase a server with the ability to expand its number of CPUs. For example, if testing indicates that a single-CPU server will be adequate, purchase a server with room for at least two CPUs, even if you only use one of the slots. The same goes for larger servers with four or more CPUs. Always leave room for growth.
- Memory: This is probably the most significant piece of hardware affecting SQL Server’s performance. Ideally, your entire database should fit into RAM. Unfortunately, this is not often possible. At the very minimum, try to get enough RAM to hold the largest table you expect to have, and if you can afford it, get all the RAM your server can handle, which is often 2GB or more. There is no such thing as too much RAM.
- I/O Subsystem: After RAM, the I/O subsystem is the most important piece of hardware affecting SQL Server’s performance. At the very minimum, purchase hardware-based RAID for your databases. As a rule of thumb, you will want to purchase more, smaller drives rather than fewer, larger drives for your array. The more disks in an array, the faster I/O will be.
- Network Connection: At the server, have at least one 100Mbps network card, and it should be connected to a switch. Ideally, you should have two network cards in the server, connected to a switch in full-duplex mode.
Tuning the
Server
Even the most expensive server hardware won’t perform well if it is not configured and tuned correctly. I have seen many hardware-related performance problems caused by not using Microsoft Windows NT Server-approved hardware and drivers. Some of these hardware performance-related issues are very difficult to trace and resolve. Ideally, ensure that your hardware, along with NT itself, is correctly installed and configured by a competent technician. Then exercise your application under controlled conditions to check for potential performance issues before it is used in production.
Your operating
system must also be configured correctly. This includes many
things, too many to describe here. Just as with the hardware,
ensure that the operating system is properly configured and
tested before it is put into production.
For best
performance on a server, SQL Server should be the only
application running on the server, other than management
utilities. Don’t try to save a few bucks by putting your IIS or
MTS server on the same server as SQL Server. Not only does this
hurt SQL Server’s performance, but it also makes it more
difficult to performance tune and troubleshoot SQL Server.
How to Optimize SQL Server’s
Configuration Settings
Another common misconception about tuning SQL Server is that you must fine-tune its various configuration settings in order to get optimum performance. While there was some truth to this in earlier versions of SQL Server, it is no longer much of an issue, except on the very largest and busiest of servers.
For the most part, SQL Server is self-tuning. What does this mean? It means that SQL Server observes what is running on it, and automatically makes internal adjustments that, for the most part, keep it running as optimally as possible given the tasks at hand and the hardware it is running on.
When you perform
performance testing on SQL Server, keep in mind that SQL Server
can take some time before it adjusts itself optimally. In other
words, the performance you get immediately after starting the SQL Server
service, and the performance you get a couple of hours later
after a typical workload has been running, can be different.
Always perform your testing after SQL Server has had a chance to
adjust itself to your workload.
There are 36 SQL Server
configuration options that can be changed using either the
Enterprise Manager or the sp_configure stored procedure. Unless
you have a lot of experience tuning SQL Server, I don’t
recommend you change any of SQL Server’s settings. As a novice,
you may make a change that could in fact reduce performance.
This is because when you change a setting, you are "hard-coding" it from then on. SQL Server normally has the ability to change its settings on the fly, based on the current workload, but once you "hard-code" a setting, you partially remove SQL Server’s ability to tune itself.
If after serious
consideration you feel that making a change to one or more SQL
Server configuration settings can boost performance in your
particular environment, then you will want to proceed slowly and
cautiously. Before you make the setting change, you will first
want to get a good baseline on the SQL Server’s performance,
under a typical workload, using a tool such as Performance
Monitor (discussed later). Then make only one change at a time.
Never make more than one change at a time, because if you do,
you won’t know which change, if any of them, made a difference.
Once the one
change is made, again measure SQL Server’s performance under the
same workload to see if performance was actually boosted. If it
wasn’t, which will often be the case, then change back to the
default setting. If there was a performance boost, then continue
to check to see if the boost in performance continues under
other workloads the server experiences over time. Your later
testing may show that your change helps some workloads, but
hinders others. This is why changing most configuration settings
is not recommended.
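If you do decide to experiment, sp_configure is the Transact-SQL way to inspect and change these settings, one at a time. A minimal sketch follows; the option name and value shown are examples only, not recommendations:

```sql
-- List the current and running values of the configuration options.
EXEC sp_configure

-- Expose the advanced options before changing one of them.
EXEC sp_configure 'show advanced options', 1
RECONFIGURE

-- Change a single setting, then re-measure under the same workload.
EXEC sp_configure 'max worker threads', 255
RECONFIGURE
```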
In any event, if
your application is suffering from a performance-related issue,
the odds of a configuration change resolving it are quite low.
How to Optimize Your Application’s
Design
If you are using an n-tier design for your application, and who isn’t for most large-scale applications these days, SQL Server is just one part of a larger application. And, perhaps more than you realize, how you implement your n-tier design affects your application’s performance more than SQL Server itself. Unfortunately, SQL Server often gets more of the blame for poor performance than the application design, even when it is usually the application’s design that is causing most of the performance problems. What I hope to do here is offer a few suggestions that may aid you in your application design, helping to prevent SQL Server from getting all the blame for poor performance. So let’s start.
One of the first decisions you must make when designing an n-tier application is the selection of the logical and physical design. Of the two, the physical design is where most performance-related mistakes are made, because this is where the theory (based on the logical design) has to be implemented in the real world. And just like anything else, you have many choices to make, and many of those choices don’t lend themselves to scalability or high performance.
For example, do
you want to implement a physical two-tier implementation with
fat clients, a physical two-tier implementation with a fat
server, a physical three-tier implementation, an Internet
implementation, or some other implementation? Once you answer this question, you must ask yourself: what development
language will be used, what browser, will you use Microsoft
Transaction Server (MTS), will you use Microsoft Message Queue
Server (MSMQ), and on and on.
Each of these many
decisions can and will affect performance and scalability.
Because there are so many options, it is again important to test
potential designs early in the design stage, using rapid
prototyping, to see which implementation will best meet your users’ needs.
More specifically,
as you design your physical implementation, try to follow these
general recommendations to help ensure scalability and optimal
performance in your application:
- Perform as many data-centered tasks as possible on SQL Server in the form of stored procedures. Avoid manipulating data at the presentation and business services tiers.
- Don’t maintain state (don’t store data from the database) in the business services tier. Maintain state in the database as much as possible.
- Don’t create complex or deep object hierarchies. The creation and use of complex classes, or of a large number of objects used to model complex business rules, can be resource intensive and reduce the performance and scalability of your application, because the memory allocation involved in creating and freeing these objects is costly.
- Consider designing the application to take advantage of database connection pooling and object pooling using Microsoft Transaction Server (MTS). MTS allows both database connections and objects to be pooled, greatly increasing the overall performance and scalability of your application.
- If your application runs queries against SQL Server that are by nature long-running, design the application to run those queries asynchronously, so that one query does not have to wait for the next before it can run. One way to build this functionality into your n-tier application is to use Microsoft Message Queue Server (MSMQ).
While following these suggestions won’t guarantee a scalable and fast-performing application, they are a good first step.
How to Optimize Your Database’s Design
Like application design, database design is very critical to the
scalability and performance of your SQL Server applications. And
also like application design, if you don’t do a good job in the
first place, it is very hard and expensive to make changes after
your application has gone into production. Here are some key
things to keep in mind when designing SQL Server databases for
scalability and performance.
As always, you
will want to test your design as early as possible using
realistic data. This means you will need to develop prototype
databases with sample data, and test the design using the type
of activity you expect to see in the database once production
starts.
One of the first
design decisions you must make is whether the database will be
used for OLTP or OLAP. Notice that I said "or". One of
the biggest mistakes you can make when designing a database is
to try to meet the needs of both OLTP and OLAP. These two types of applications are mutually exclusive if you are interested in any sense of high performance and scalability.
OLTP databases are
generally highly normalized, helping to reduce the amount of
data that has to be stored. The less data you store, the less
I/O SQL Server will have to perform, and the faster database
access will be. Transactions are also kept as short as possible
in order to reduce locking conflicts. And last of all, indexing
is generally minimized to reduce the overhead of high levels of
INSERTs, UPDATEs, and DELETEs.
OLAP databases, on
the other hand, are highly de-normalized. In addition,
transactions are not used, and because the database is
read-only, record locking is not an issue. And of course, heavy
indexing is used in order to meet the wide variety of reporting
needs.
As you can see,
OLTP and OLAP databases serve two completely different purposes,
and it is virtually impossible to design a database to handle
both needs. While OLAP database design is out of this article’s scope, I do want to mention a couple of performance-related suggestions in regard to OLTP database design.
When you go through the normalization process when designing
your OLTP databases, your initial goal should be to fully
normalize it according to the three general principles of
normalization. The next step is to perform some preliminary
performance testing, especially if you foresee having to perform
joins on four or more tables at a time. Be sure to test using
realistic sample data.
If performance is
acceptable, then don’t worry about having to join four or more
tables in a query. But if performance is not acceptable, then
you may want to do some selective de-normalization of the tables
involved in order to reduce the number of joins used in the
query, and to speed performance.
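Here is a minimal sketch of that kind of selective de-normalization, using hypothetical tables; carrying the customer name on Orders removes one table from the reporting query’s join:

```sql
-- Before: reaching CompanyName requires joining through Customers.
SELECT od.OrderID, c.CompanyName
FROM OrderDetails od
INNER JOIN Orders o ON o.OrderID = od.OrderID
INNER JOIN Customers c ON c.CustomerID = o.CustomerID

-- After: CompanyName is carried redundantly on Orders, so the query
-- needs one less join. Test first; redundancy has its own costs.
ALTER TABLE Orders ADD CompanyName VARCHAR(40) NULL

SELECT od.OrderID, o.CompanyName
FROM OrderDetails od
INNER JOIN Orders o ON o.OrderID = od.OrderID
```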
It is much easier
to catch a problem in the early database design stage, rather
than after the finished application has been rolled out.
De-normalization of tables after the application is complete is
nearly impossible. One word of warning. Don’t be tempted to
de-normalize your database without thorough testing. It is very
hard to deduce logically what de-normalization will do to
performance. Only through realistic testing can you know for
sure if de-normalization will gain you anything in regards to
performance.
How to Optimize Your Application Code
for SQL Server
At some point during the development process you will have to
begin coding your application to work with SQL Server. By this
time, the application and database designs should have already
been completed and tested for performance and scalability using
rapid prototyping techniques.
How you code your application has a significant bearing on performance and scalability, just as the database design and the overall application design affect performance and scalability. Sometimes, something as simple as choosing one coding technique over another can make a significant difference. Rarely is there only one way to code a task, but often there is only one way to code a task for optimum performance and scalability.
What I want to do in this section is focus on some essential
techniques that can affect the performance of your application
and SQL Server.
Since I don’t know
what development language you will be using, I am going to
assume here that you will be using Microsoft’s ADO (ActiveX Data Objects) object model to access SQL Server from your
application. The examples I use here should work for most Visual
Basic and ASP developers. So let’s just dive in and look at some
specific techniques you should implement in your application
code when accessing SQL Server data to help ensure high
performance.
Use OLE DB to
Access SQL Server
You can access SQL Server data using either ODBC or OLE DB.
Which method you use depends on how you specify the connection
string when you use ADO to connect to SQL Server. For best
performance, always select OLE DB. OLE DB is used natively by SQL Server,
and is the most effective way to access any SQL Server data.
Along these same lines, when creating an ADO connection to SQL
Server, you can either use a DSN in the connection string, or
you can use a DSN-less connection. For optimal performance, use
DSN-less connections. Using them prevents the need for the OLE
DB driver to look up connection string information in the
registry of the client the application code is running on,
saving some overhead.
Encapsulate Your DML (Data Manipulation Language) in Stored Procedures
ADO allows you three different ways to SELECT, INSERT, UPDATE,
or DELETE data in a SQL Server database. You can use ADO’s
methods, you can use dynamic SQL, or you can use stored
procedures. Let’s take a brief look at each of these.
The easiest way to manipulate data from your application is to
use ADO’s various methods, such as rs.AddNew, rs.Update, or
rs.Delete. While using these methods is easy to learn and
implement, you pay a relatively steep penalty in overhead for
using them. ADO’s methods often create slow cursors and generate
large amounts of network traffic. If your application is very
small, you would never notice the difference. But if your
application has much data at all, your application’s performance
could suffer greatly.
Another way to
manipulate data stored in SQL Server using ADO is to use dynamic
SQL (also sometimes referred to as ad hoc queries). Here, what
you do is send Transact-SQL in the form of strings from ADO in
your application to be run on SQL Server. Using dynamic SQL is
generally much faster than using ADO’s methods, although it does
not offer the greatest performance. When SQL Server receives the
dynamic SQL from your ADO-based application, it has to compile
the Transact-SQL code, create a query plan for it, and then
execute it. Compiling the code and creating the query plan the
first time takes a little overhead. But once the Transact-SQL
code has been compiled and a query plan created, it can be
reused over and over assuming the Transact-SQL code sent later
is nearly identical, which saves overhead.
For optimal performance, you will want to use ADO to call stored procedures on your server to perform all your data manipulation.
The advantages of stored procedures are many. Stored procedures
are already pre-compiled and optimized, so this step doesn’t
have to be repeated every time the stored procedure is run. The
first time a stored procedure is run, a query plan is created
and stored in SQL Server’s memory, so it can be reused, saving
even more time. Another benefit of stored procedures is that
they help reduce network traffic and latency. When your
application’s ADO code calls a stored procedure on SQL Server,
it makes a single network call. Then any required data
processing is performed on SQL Server, where data processing is
most efficiently performed, and then if appropriate, it will
return any results to your application. This greatly reduces
network traffic and increases scalability and performance.
While stored
procedures handle basic data manipulation like a champ, they can
also handle much more very well. Stored procedures can run
virtually any Transact-SQL code, and since Transact-SQL code is
the most efficient way to manipulate data, all of your
application’s data manipulations should be done inside of stored
procedures on SQL Server, not in COM components in the
business-tier or on the client.
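As a minimal sketch (the table, column, and procedure names are hypothetical), encapsulating a simple UPDATE looks like this; the application then issues a single EXEC over the network:

```sql
CREATE PROCEDURE dbo.UpdateCustomerPhone
    @CustomerID INT,
    @Phone VARCHAR(24)
AS
UPDATE dbo.Customers
SET Phone = @Phone
WHERE CustomerID = @CustomerID
GO

-- All the application sends over the network is this one call:
EXEC dbo.UpdateCustomerPhone @CustomerID = 42, @Phone = '555-0123'
```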
When you use ADO
to execute stored procedures on SQL Server, you have two major
ways to proceed. You can use ADO to call the Refresh method of
the Parameters collection in order to save you a little coding.
ADO needs to know what parameters are used by the stored
procedure, and the Refresh method can query the stored procedure
on SQL Server to find out the parameters. But as you might
expect, this produces additional network traffic and overhead.
While it takes a little more coding, a more efficient way to
call a SQL Server stored procedure is to create the parameters
explicitly in your code. This eliminates the extra overhead
caused by the Refresh method and speeds up your application.
Encapsulate
Your ADO Code in COM Components
As part of creating a scalable and optimized n-tier application, put the ADO code that accesses SQL Server data
into COM components. This is true whether your front end is a
Visual Basic application or a web-based ASP application. This
gives you all the standard benefits of COM components, such as
object pooling using MTS. And for ASP-based applications, it
provides greater speed because the ADO code in COM objects is
already compiled, unlike ADO code found in ASP pages. How you
implement your data manipulation code in COM components should
be considered when the application is first designed.
When designing COM objects, make them as stateless as possible, avoiding the use of properties. Instead, use methods to perform your data-related
tasks. This is especially critical if you use MTS, as any
objects that preserve state can significantly reduce MTS’s
ability to scale, while at the same time, increasing overhead
and hurting performance.
For optimum
performance, COM objects should be compiled as in-process DLLs
(which is required if they are to run under MTS). You should
always employ early binding when referencing COM objects, and
create them explicitly, not implicitly.
How
to Optimize Your Transact-SQL Code
Transact-SQL, just like any programming language, offers more
than one way to perform many tasks. And as you might imagine,
some techniques offer better performance than others. In this
section you will learn some of the
"tricks-of-the-trade" when it comes to writing high
performing Transact-SQL code.
Choose the Appropriate Data Types
While you might think that this topic should be under database
design, I have decided to discuss it here because Transact-SQL
is used to create the physical tables that were designed during
the earlier database design stage.
Choosing the appropriate data types can affect how quickly SQL Server can SELECT, INSERT, UPDATE, and DELETE data, and the optimum choice is not always obvious. Here are some suggestions you should implement when creating physical SQL Server tables to help ensure optimum performance; a sketch applying several of them follows the list.
- Always choose the smallest data type that can hold the data you need to store in a column. For example, if all you are going to store in a column are the numbers 1 through 10, then the TINYINT data type is more appropriate than the INT data type. The same goes for CHAR and VARCHAR data types: don’t specify more characters for character columns than you need. This allows SQL Server to store more rows in its data and index pages, reducing the amount of I/O needed to read them. It also reduces the amount of data moved from the server to the client, reducing network traffic and latency.
- If the text data in a column varies greatly in length, use a VARCHAR data type instead of a CHAR data type. Although the VARCHAR data type has slightly more overhead than the CHAR data type, the amount of space saved by using VARCHAR over CHAR on variable-length columns can reduce I/O, improving overall SQL Server performance.
- Don’t use the NVARCHAR or NCHAR data types unless you need to store 16-bit character (Unicode) data. They take up twice as much space as VARCHAR or CHAR data types, increasing server I/O overhead.
- If you need to store large strings of data, and they are less than 8,000 characters long, use a VARCHAR data type instead of a TEXT data type. TEXT data types have extra overhead that drags down performance.
- If you have a column that is designed to hold only numbers, use a numeric data type, such as INTEGER, instead of a VARCHAR or CHAR data type. Numeric data types generally require less space to hold the same numeric value than a character data type does. This helps reduce the size of the columns, and can boost performance when the column is searched (WHERE clause) or joined to another column.
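Here is a brief sketch applying several of these suggestions to a hypothetical lookup table:

```sql
CREATE TABLE dbo.OrderStatus
(
    StatusID    TINYINT      NOT NULL,  -- only values 1-10, so TINYINT, not INT
    StatusCode  CHAR(2)      NOT NULL,  -- always 2 characters, so fixed-length CHAR
    Description VARCHAR(100) NOT NULL,  -- length varies greatly, so VARCHAR, not CHAR
    SortOrder   SMALLINT     NOT NULL   -- numbers stored in a numeric type, not CHAR
)
```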
Use Triggers Cautiously
Triggers can be a powerful tool in Transact-SQL, but since they execute every time a table is the target of an INSERT, UPDATE, or DELETE (depending on how the trigger is created), they can produce a lot of overhead. Here are some tips on how to optimize trigger performance.
- Keep the code in your triggers to the very minimum to reduce overhead. The more code that runs in the trigger, the slower each INSERT, UPDATE, and DELETE that fires it will be.
- Don’t use triggers to perform tasks that can be performed using more efficient techniques. For example, don’t use a trigger to enforce referential integrity if SQL Server’s built-in referential integrity is available to accomplish your goal. The same goes if you have a choice between using a trigger or a CHECK constraint to enforce rules or defaults. You will generally want to choose a CHECK constraint, as they are faster than triggers when performing the same task (see the sketch after this list).
- Try to avoid rolling back transactions from within triggers, because of the overhead involved. Instead of letting the trigger find a problem and roll back a transaction, catch the error before it can get to the trigger (if your code allows it). Catching an error early (before the trigger fires) consumes fewer server resources than letting the trigger roll back.
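For example, here is a sketch (hypothetical tables and rule) of declaring a business rule and referential integrity directly, instead of writing triggers to enforce them:

```sql
-- A CHECK constraint instead of a trigger that validates each row:
ALTER TABLE dbo.Employees
ADD CONSTRAINT CK_Employees_Salary CHECK (Salary >= 0)

-- Built-in referential integrity instead of a trigger that polices it:
ALTER TABLE dbo.Orders
ADD CONSTRAINT FK_Orders_Customers
FOREIGN KEY (CustomerID) REFERENCES dbo.Customers (CustomerID)
```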
Don’t Access
More Data Than You Need
While this suggestion may sound obvious, it must not be, because
this is a common performance-related issue I find over and over
again in many SQL Server-based applications. Here are some ideas
on how to minimize the amount of data that is returned to the
client.
- Don’t return more columns or rows of data to the client than absolutely necessary. This just increases disk I/O on the server and network traffic, both of which hurt performance. In SELECT statements, don’t use SELECT * to return rows; always specify in your SELECT statement exactly which columns need to be returned for the particular query, and not a column more. In most cases, be sure to include a WHERE clause to reduce the rows sent to only those the client needs to perform the task immediately at hand.
- If your application allows users to run queries, but you are unable to easily prevent users from returning hundreds or even thousands of unnecessary rows of data, consider using the TOP operator within the SELECT statement. This way, you can limit how many rows are returned, even if the user doesn’t enter any criteria to help reduce the number of rows returned to the client (see the sketch after this list).
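Putting these suggestions together, a sketch against a hypothetical table:

```sql
-- Instead of: SELECT * FROM dbo.Customers
-- Name only the needed columns, filter the rows, and cap the result.
SELECT TOP 100 CustomerID, CompanyName, Phone
FROM dbo.Customers
WHERE Region = 'TX'
```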
Avoid Using
Cursors
Transact-SQL is designed to work best on result sets, not on
individual records. That’s where cursors come into play. They
allow you to process individual records. The only problem with
individual record processing is that it is slow. Ideally, for
high-performing SQL Server-based applications, cursors should be
avoided.
If you need to
perform row-by-row operations, try to find another method to
perform the task. Some options are to perform row-by-row tasks
at the client instead of the server, using tempdb tables at the
server, or using a correlated sub-query.
Unfortunately, these alternatives are not always feasible, and sometimes you simply have to use a cursor. If you find it impossible to avoid using cursors in your applications, then perhaps one of these suggestions will help.
- SQL Server offers several different types of cursors, each with different performance characteristics. Always select the cursor with the least amount of overhead that still has the features you need to accomplish your goals. The most efficient cursor you can choose is the fast forward-only cursor (see the sketch after this list).
- When using a server-side cursor, always try to fetch as small a result set as possible. This includes fetching only those rows and columns the client needs immediately. The smaller the cursor, no matter what type of server-side cursor it is, the fewer resources it will use, and the better performance will be.
- When you are done using a cursor, don’t just CLOSE it; you must also DEALLOCATE it. Deallocation is required to free up the SQL Server resources used by the cursor. If you only CLOSE the cursor, locks are freed, but SQL Server resources are not. If you don’t DEALLOCATE your cursors, the resources used by the cursor will stay allocated, degrading the performance of your server until they are released.
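If a cursor truly cannot be avoided, here is a minimal sketch of the pattern these tips describe, against a hypothetical table: a FAST_FORWARD cursor over a small result set, with both CLOSE and DEALLOCATE when done:

```sql
DECLARE @CustomerID INT

DECLARE cust_cursor CURSOR FAST_FORWARD FOR
SELECT CustomerID
FROM dbo.Customers
WHERE Region = 'TX'

OPEN cust_cursor
FETCH NEXT FROM cust_cursor INTO @CustomerID

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Row-by-row work goes here.
    FETCH NEXT FROM cust_cursor INTO @CustomerID
END

CLOSE cust_cursor
DEALLOCATE cust_cursor  -- frees the cursor's resources, not just its locks
```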
Use Joins
Appropriately
Table joins can be a big contributor to performance problems, especially if the joins include more than two tables, or if the tables are very large. Unfortunately, joins are a fact of life in relational databases. Because they are so common, you will need to take extra time to help ensure that your joins are as optimal as possible. Here are some tips to help.
- If you have two or more tables that are frequently joined together, then the columns used for the joins should have an appropriate index. If the columns used for the joins are not naturally compact, then consider adding compact surrogate keys to the tables in order to reduce the size of the keys, thus decreasing read I/O during the join process and increasing overall performance (a sketch follows this list). You will learn more about indexing in the next section of this article.
- For best performance, the columns used in joins should be of the same data type. And if possible, they should be numeric data types rather than character types.
- Avoid joining tables based on columns with few unique values. If the columns used for joining aren’t mostly unique, then the SQL Server optimizer may perform a table scan for the join, even if an index exists on the columns. For best performance, joins should be done on columns that have unique indexes.
- If you have to regularly join four or more tables to get the recordset you need, consider denormalizing the tables so that the number of joined tables is reduced. Often, by adding one or two columns from one table to another, the number of joins can be reduced.
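A brief sketch of the first two tips (hypothetical tables; Customers.CustomerID is assumed to already be indexed as the table’s primary key, and both join columns are compact, matching integer types):

```sql
-- Index the foreign key column used on the many side of the join.
CREATE INDEX IX_Orders_CustomerID ON dbo.Orders (CustomerID)

SELECT o.OrderID, c.CompanyName
FROM dbo.Orders o
INNER JOIN dbo.Customers c ON c.CustomerID = o.CustomerID
```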
Encapsulate
Your Code in Stored Procedures
Virtually all of the Transact-SQL used in your SQL Server-based
applications should be encapsulated in stored procedures, not
run as dynamic SQL or scripts. This not only reduces network
traffic (only the EXECUTE or CALL is issued over the network
between the client and SQL Server), but it speeds up the
Transact-SQL because the code in the stored procedure residing
on the server is already pre-compiled. Here are a couple of
things to keep in mind when writing stored procedures for
optimal performance.
When a stored
procedure is first executed (and it does not have the WITH
RECOMPILE option specified), it is optimized and a query plan is
compiled and cached in SQL Server’s memory. If the same stored
procedure is called again, it will use the cached query plan
instead of creating a new one, saving time and boosting
performance. This may or may not be what you want. If the query
in the stored procedure is the same each time, then this is a
good thing. But if the query is dynamic (the WHERE clause changes substantially from one execution of the stored procedure to the next), then this is a bad thing, as the cached plan may not be optimal for the new query, and the performance of the query can suffer.
If you know that
your query will vary each time it is run from the stored
procedure, you will want to add the WITH RECOMPILE option when
you create the stored procedure. This will force the stored
procedure to be re-compiled each time it is run, ensuring the
query is optimized each time it is run.
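A minimal sketch of the syntax (procedure and table names hypothetical):

```sql
CREATE PROCEDURE dbo.SearchOrders
    @StartDate DATETIME,
    @EndDate   DATETIME
WITH RECOMPILE  -- build a fresh query plan on every execution
AS
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE OrderDate BETWEEN @StartDate AND @EndDate
GO
```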
Always include in
your stored procedures the statement, "SET NOCOUNT
ON". If you don’t turn this feature on, then every time a
SQL statement is executed, SQL Server will send a response to
the client indicating the number of rows affected by the
statement. It is rare that the client will ever need this
information. Using this statement helps reduce the traffic
between the server and the client.
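In practice, that means starting each stored procedure as in this sketch (names hypothetical):

```sql
CREATE PROCEDURE dbo.GetCustomer
    @CustomerID INT
AS
SET NOCOUNT ON  -- suppress the "rows affected" message for each statement

SELECT CustomerID, CompanyName
FROM dbo.Customers
WHERE CustomerID = @CustomerID
GO
```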
Deadlocking can
occur within a stored procedure when two user processes have
locks on separate objects and each process is trying to acquire
a lock on the object that the other process has locked. When this happens, SQL Server ends the deadlock by automatically choosing one of the processes and aborting it, allowing the other process to continue. The aborted transaction is rolled back and an error message is sent to the user of the aborted process.
To help avoid deadlocking in your SQL Server application, try to design your application using these suggestions: 1) have the application access server objects in the same order each time; 2) during transactions, don’t allow any user input (collect it before the transaction begins); 3) keep transactions short and within a single batch; and 4) if appropriate, use as low an isolation level as possible for the user connection running the transaction.
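Here is a minimal sketch of suggestions 1 and 3 (hypothetical table): every transaction in the application updates accounts in the same fixed order, and the transaction is short, contained in a single batch, with no user input collected inside it:

```sql
BEGIN TRANSACTION

-- Always touch accounts in ascending AccountID order, in every
-- transaction in the application, so two transactions cannot
-- acquire locks in opposite orders.
UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountID = 1
UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountID = 2

COMMIT TRANSACTION
```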
How
to Select Indexes for Optimal Database Performance
Index selection is a mystery for many SQL Server DBAs and
developers. Sure, we know what they do and how they boost
performance. The problem often is how to select the ideal type
of index (clustered vs. non-clustered), the number of columns to
index (do I need multi-column indexes?), and which columns
should be indexed.
In this section we
will take a brief look at how to answer the above questions.
Unfortunately, there is no absolute answer for every occasion.
Like much of SQL Server performance tuning and optimization, you
may have to do some experimenting to find the ideal indexes. So let’s begin by looking at some general index creation guidelines, then take a more detailed look at selecting clustered and non-clustered indexes.
Is There Such a
Thing as Too Many Indexes?
Yes. Some people think that all you have to do is index everything, and then all of your performance issues will go away. It doesn’t work that way. Just as an index can speed data access, it can also degrade access if it is inappropriately selected. The problem with extra indexes is that SQL Server must maintain them every time a record is INSERTED, UPDATED, or DELETED in a table. While maintaining one or two indexes on a table is not too much overhead for SQL Server to deal with, if you have four, five, or more indexes, they can be a large performance burden on tables. Ideally, you want to have as few indexes as you can get away with. It is often a balancing act to select the ideal number of indexes for a table in order to find optimal performance.
As a general rule
of thumb, don’t automatically add indexes to a table because it
seems like the right thing to do. Only add indexes if you know
that they will be used by the queries run against the table. If
you don’t know what queries will be run against your table, then
don’t add any indexes until you know for sure. It is too easy to
make a guess on what queries will be run, create indexes, and
then later find out your guesses were wrong. You must know the
type of queries that will be run against your data, and then
these need to be analyzed to determine the most appropriate
indexes, and then the indexes must be created and tested to see
if they really help or not.
The problem of
selecting optimal indexes is often difficult for OLTP
applications because they tend to experience high levels of
INSERT, UPDATE, and DELETE activity. While you need good indexes
to quickly locate records that need to be SELECTED, UPDATED, or
DELETED, you don’t want every INSERT, UPDATE, or DELETE to
result in too much overhead because you have too many indexes.
On the other hand, if you have an OLAP application that is
virtually read-only, then adding as many indexes as you need is
not a problem because you don’t have to worry about INSERT,
UPDATE, or DELETE activity. As you can see, how your application
is used makes a large difference in your indexing strategy.
Another thing to
think about when selecting indexes is that the SQL Server Query
Optimizer may not use the indexes you select. If the Query
Optimizer chooses not to use your indexes, then they are a
burden on SQL Server and should be deleted. So how come the SQL
Server Query Optimizer won’t always use an index if one is
available?
This is too large a question to answer in detail here, but suffice it to say, sometimes it is faster for SQL Server to perform a table scan on a table than it is to use an available index to access data in the table. Two reasons this may happen are that the table is small (not many rows), or that the indexed column isn’t at least 95% unique. How do you know if SQL Server won’t use the indexes you create? We will answer this question a little later when we take a look at the SQL Server Query Analyzer.
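One quick way to check, from any query window, is SET SHOWPLAN_TEXT, which returns the estimated execution plan (including whether an index is used or a table scan is performed) without actually running the query. A brief sketch against a hypothetical table:

```sql
SET SHOWPLAN_TEXT ON
GO

-- Not executed; SQL Server returns the estimated plan instead.
SELECT CustomerID, CompanyName
FROM dbo.Customers
WHERE Region = 'TX'
GO

SET SHOWPLAN_TEXT OFF
GO
```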
Tips for
Selecting a Clustered Index
Since you can only create one clustered index per table, take
extra time to carefully consider how it will be used. Consider
the type of queries that will be used against the table, and
make an educated guess as to which query is the most critical,
and if this query will benefit from having a clustered index.
In general, use these rules of thumb when selecting a column for
a possible clustered index.
- The primary key you select for your table should not always be a clustered index. If you create the primary key and don’t specify otherwise, SQL Server automatically makes the primary key a clustered index. Only make the primary key a clustered index if it meets one of the following recommendations.
- Clustered indexes are ideal for queries that select by a range of values or where you need sorted results, because the data is already presorted in the index for you. Examples include queries using BETWEEN, <, >, GROUP BY, ORDER BY, and aggregates such as MAX, MIN, and COUNT (a sketch after this list shows one way to set this up).
- Clustered indexes are good for queries that look up a record with a unique value (such as an employee number) and retrieve most or all of the data in the record, because the query is covered by the index.
- Clustered indexes are good for queries that access columns with a limited number of distinct values, such as a column that holds country or state data. But if the column data has very little distinctiveness, such as a column holding only yes or no, or male or female, then the column should not be indexed at all.
- Clustered indexes are good for queries that use the JOIN or GROUP BY clauses.
- Clustered indexes are good for queries where you want to return a lot of rows, not just a few, because the data is in the index and does not have to be looked up elsewhere.
- Avoid putting a clustered index on columns that increment, such as an identity, date, or similarly incrementing column, if your table is subject to a high level of INSERTs. Since clustered indexes force the data to be physically ordered, a clustered index on an incrementing column forces new data to be inserted on the same page of the table, creating a hot spot, which can create disk I/O bottlenecks. Ideally, find another column or columns to become your clustered index.
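To make the mechanics concrete, here is a sketch (hypothetical table) that explicitly keeps the primary key non-clustered, so the one clustered index per table can be spent on a column that is queried by range and used for sorting:

```sql
CREATE TABLE dbo.Employees
(
    EmployeeID INT         NOT NULL
        CONSTRAINT PK_Employees PRIMARY KEY NONCLUSTERED,
    LastName   VARCHAR(40) NOT NULL
)

-- Supports range queries and ORDER BY on LastName.
CREATE CLUSTERED INDEX IX_Employees_LastName ON dbo.Employees (LastName)
```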
What can be
frustrating about the above advice is that there might be more
than one column that should be clustered. But as we know, we can
only have one clustered index per table. What you have to do is
evaluate all the possibilities (assuming more than one column is
a good candidate for a clustered index) and then select the one
that provides the best overall benefit.
Tips for Selecting Non-Clustered Indexes
Selecting non-clustered indexes is somewhat easier than selecting clustered indexes because you can create as many as are appropriate for your table. Here are some tips for selecting which columns in your tables might be helped by adding non-clustered indexes.
- Non-clustered indexes are best for queries that return few rows (including just one row) and where the index has good selectivity (above 95%).
- If a column in a table is not at least 95% unique, then most likely the SQL Server Query Optimizer will not use a non-clustered index based on that column. Because of this, don’t add non-clustered indexes to columns that aren’t at least 95% unique. For example, a column with "yes" or "no" as the data won’t be at least 95% unique.
- Keep the "width" of your indexes as narrow as possible, especially when creating composite (multi-column) indexes. This reduces the size of the index and reduces the number of reads required to read the index, boosting performance.
- If possible, try to create indexes on columns that have integer values instead of characters. Integer values have less overhead than character values.
- If you know that your application will be performing the same query over and over on the same table, consider creating a covering index on the table. A covering index includes all of the columns referenced in the query. Because of this, the index contains the data you are looking for, and SQL Server doesn’t have to look up the actual data in the table, reducing logical and/or physical I/O. On the other hand, if the index gets too big (too many columns), this can increase I/O and degrade performance.
- An index is only useful to a query if the WHERE clause of the query matches the column(s) that are leftmost in the index. So if you create a composite index, such as "City, State", then a query such as WHERE City = 'Houston' will use the index, but the query WHERE State = 'TX' will not (shown in the sketch after this list).
Generally, if a
table needs only one index, make it a clustered index. If a
table needs more than one index, then you have no choice but to
use non-clustered indexes. By following the above
recommendations, you will be well on your way to selecting the
optimum indexes for your tables.
What to Do
Next?
This article contains a lot of information, but on the other hand, it has barely scratched the surface when it comes to performance tuning and optimization. Reading this article has been your first step. Now it’s time for you to take the next step, and that is to learn all you can about how SQL Server works internally and about the tools described in this article. Remember, SQL Server performance tuning is more of an art than a science, and only through mastery of the basics, and experience, can you become an expert at SQL Server performance tuning.