Do SQL Server Performance Trend Analysis Part 4: Interpreting Performance Monitor Counters

This is part
four of a four-part tutorial. This part discusses how to
interpret the most common NT Server and SQL Server Performance Monitor counters.

 

Introduction

In the previous
three parts of this tutorial, you learned how to use Performance
Monitor to collect SQL Server-related performance data, store it
in SQL Server, and how to perform trend analysis on it. Now it
is time to take a look at how to interpret the data you have
collected.

For
the purposes of this article, the Performance Monitor data can
be in any form, whether it is being viewed with Performance
Monitor itself, Microsoft Excel, or any other program. All you
need is access to the results of your data. How you interpret
your data does not depend on the format in which it is viewed.

While
our main focus on this article is on SQL Server performance, we
must keep in mind that the operating system (whether it is
Windows NT Server 4.0 or Windows 2000) is closely intertwined
with SQL Server, especially when it comes to Performance Monitor
counters. In fact, when monitoring SQL Server performance, I
tend to monitor more operating system performance counters than
SQL Server counters. This is because many of the most important
performance counters are a part of the operating system, not SQL
Server. Keep this in mind as you read about how to interpret the
various performance counters later in this article.

As
you may know, Windows NT Server 4.0 and Widows 2000 have over
350 performance monitor counters available. SQL Server 7.0 and
SQL Server 2000 both have over 100 performance monitor counters available.
As might well imagine, the number of potential
performance-related counters is overwhelming. Fortunately, the
huge number of performance monitor counters available are seldom
used. Generally, I monitor about a dozen counters on a regular
basis, and only use some of the more obscure ones when
researching specific performance-related problems. In this
article, I will focus on the key performance counters only.

 

Focus
on the Big Picture

My goal when I
monitor SQL Server with Performance Monitor is to get the big
picture, not the details. If the big picture indicates that I
need to drill down for more detail, that is OK. But until then,
I have better things to do with my time. The big picture allows
me to see how my SQL Servers are doing, and that is why I only
focus on about a dozen Performance Monitor counters, which are
discussed below.

Like
most people, I use Performance Monitor to help identify
potential performance bottlenecks, which are usually categorized
into one of these five area:

  • CPU:
    SQL Server can’t do its job if it has run out of CPU cycles,
    so monitoring this potential bottleneck is important.

  • Memory:
    If you want maximum SQL Server performance, your server must
    not have a memory bottleneck. Sure, the operating system can
    page if there is not enough physical RAM in the server, but
    do you really want to want that long?

  • I/O: Of
    all the potential performance bottlenecks, disk I/O is the
    probably hardest one to correct. And like lack of physical
    RAM, it can significantly affect SQL Server’s performance.

  • Network:
    This is generally the least bothersome bottleneck, as most
    servers today can hardly fill a 100Mbs network connection,
    no matter how hard they try.

  • SQL Server
    Specific
    : This includes a variety of SQL Server counters
    that you can monitor to help identify several potential SQL
    Server performance-related problems.

In
the next several sections we will be taking a look at each of
the above bottlenecks, and examining the various Performance
Monitor counters we can use to help identify them. We will also
be taking a look at how to interpret their results, and take a
look at how to resolve these various bottlenecks.

 

CPU
Performance Monitor Counters

Measuring the CPU
activity of your SQL Server is a key way to identify potential
CPU bottlenecks. The Process Object: % Processor Time
counter
is available for each CPU (instance), and measures the
utilization of each individual CPU. While viewing CPU activity
for each of the CPUs in your server can be useful, I generally
prefer to monitor the total CPU activity for the server, using
another counter, described in the next paragraph. If your server
has multiple CPUs and you use the above counter to watch each
one, you will notice that how busy each CPU is varies
considerably. Don’t worry about this. While the operating system
does its best to evenly spread the processing load over each
CPU, the reality is that this is an imperfect process and
different CPUs will be busier than others.

The System Object:
% Total Processor Time
counter measures the average of all the
CPUs in your server. This is the key counter to watch for CPU
utilization. If the % Total Processor Time counter exceeds 80%
for continuous periods (over 10 minutes or so), then you may
have a CPU bottleneck on your server. Occasional spikes of 100%
are nothing to worry about and are normal for most SQL Servers.

This counter is
also useful when performing trend analysis. For example, if you
notice that while the % Total Processor Time counter is well
within limits now, but you notice that is increasing each month,
month after month, this is a good clue that you may eventually
run out of CPU cycles on your server. If you recognize this
problem now, then you can better plan for the future.

While the % Total
Processor Time counter is important, I don’t like to rely on
just a single counter to let me know if a server has a
bottleneck or not. Another valuable indicator of CPU performance
is the System Object: Processor Queue Length. If the Processor
Queue Length exceeds 2 per CPU for continuous periods (over 10
minutes or so), then you probably have a CPU bottleneck. For
example, if you have 4 CPUs in your server, the Processor Queue
Length should not exceed a total of 8 for the entire server.

Use both the
Processor Queue Length and the % Total Process Time counters
together to determine if you have a CPU bottleneck. If both
indicators are exceeding their recommended amounts during the
same continuous time periods, you can be assured there is a CPU
bottleneck.

If the Processor
Queue Length regularly exceeds the recommended maximum, but the
CPU utilization is not correspondingly as high (which is
typical), then consider reducing the SQL Server "max worker
threads" configuration setting. It is possible the reason
that the Processor Queue Length is high is because there are an
excess number of worker threads waiting to take their turn. By
reducing the number of "maximum worker threads", what
you are doing is forcing thread pooling to kick in (if it hasn’t
already), or to take greater advantage of thread pooling.

If your SQL Server
is experiencing CPU bottlenecks, consider these possible
solutions:

  • Get faster
    CPUs, or add additional CPUs.

  • Get CPUs with
    a larger L2 cache.

  • Tune your
    application so that it doesn’t have to access the disk as
    often. For example, add indexes so table scans aren’t
    needed, normalize your database to eliminate redundant data,
    etc.)

  • Tune your
    queries to reduce the CPU load.

  • Move some of
    the processing load to another SQL Server.

  • Consider
    turning on Windows NT fibers. [7.0, 2000 only]

  • Be sure that
    both OLTP and OLAP queries are not being run on the same
    server. These different database applications should be
    performed on separate servers.

 

I/O
Performance Counters

If your I/O
subsystem is working efficiently, then each time SQL Server
wants to write or read data, it can without waiting. But if the
load on the server is too great, then reads and writes will have
to wait, each taking their turn. This can significantly reduce
SQL Server’s performance.

The best way to
monitor this is to use the PhysicalDisk Object: Avg. Disk Queue
Length
to monitor each disk array in your server. If the Avg.
Disk Queue Length exceeds 2 for continuous periods (over 10
minutes or so) for each disk drive in an array, then you
probably have an I/O bottleneck for that array. You will need to
calculate this figure because Performance Monitor does not know
how many physical drives are in arrays.

For example, if
you have an array of 6 physical disks, and the Avg. Disk Queue
Length is 10 for a particular array, then the actual Avg. Disk
Queue Length for each drive is .83 (10/12=.83), which is well
within the recommend 2 per physical disk.

The Physical Disk
Object: % Disk Time
counter is another handy tool for several
reasons. This counter measures how busy a physical array is (not
a logical partition or individual disks in an array). It
provides a good relative measure of how busy your arrays are,
and over a period of time, can be used to determine if I/O needs
are your server are increasing, indicating a potential need for
more I/O capacity in the near future.

As a rule of
thumb, the % Disk Time counter should run less than 90%. If this
counter exceeds 90% for continuous periods (over 10 minutes or
so), then your SQL Server may be experiencing an I/O bottleneck.
If you suspect a physical disk bottleneck, you may also want to
monitor the Physical Disk
Object: % Disk Read Time
counter and the Physical Disk
Object: % Disk Write Time

counter in order to help determine if the I/O bottleneck is
being mostly caused by reads or writes.

Also, this counter
is a good indicator of how busy each array on your server is. By
monitoring each array, you can tell how well balanced your I/O
is over each array. Ideally, you want to distribute the I/O load
of SQL Server as evenly as possible over your arrays, and this
counter will tell you how successful you have been doing this.
If you find that one array is much busier than another array,
you may want to consider moving one or more files from the busy
array to the less busy array.

Before using these
counters for Window NT Server 4.0, be sure to manually turn it
on by going to the NT Command Prompt and entering the following:
"diskperf -y", and then rebooting your server. This is
required to turn on the disk counters on for the first time.

If your SQL Server
is experiencing I/O bottlenecks, consider these possible
solutions:

  • Add additional
    physical RAM so that your server can go to RAM instead of
    the I/O system to access data.

  • If you are not
    already, use RAID level 5 or RAID level 10 for your arrays.
    RAID level 10 is the fastest RAID level you can choose that
    supports redundancy.

  • Add more
    physical drives to the current arrays. This helps to boost
    both read and write access times. But don’t add more drives
    to the array than your I/O controller can support.

  • Replace your
    current hard drives with faster drives.

  • Add faster or
    additional I/O controllers. Consider adding more cache (of
    possible) to your current controllers.

  • Tune your
    application so that it doesn’t have to access the disk as
    often. For example, add indexes so table scans aren’t
    needed, normalize your database to eliminate redundant data,
    etc.)

  • Move database
    or transaction log files from busy arrays to less busy
    arrays.

  • Store your
    databases and transaction log files on a SAN (storage area
    network).

  • Used
    partitioned views and federated servers to distribute
    workload (2000 only).

 

Memory
Performance Counters

This section
assumes that your server is dedicated to SQL Server, and perhaps
some related server utilities. If not, and you are having
memory-related performance problems on your SQL Server, then
your first step should be to move all non-SQL Server-related
programs off of the physical server running SQL Server. Once you
have done that, then use the following information to look for
memory-related bottlenecks.

One of the key
counters you should be regularly watching is the Memory Object:
Pages/Sec
. This measures the number of pages per second that are
paged out of memory to disk, or paged into memory from disk.
Assuming that SQL Server is the only major application running
on your server, then this figure should average nearly zero,
except for occasional spikes, which are normal.

Over continuous
periods of time (10 minutes or so) the Pages/Sec should ideally
be near zero (you will see some activity). If this is not the case, this means that NT Server
is having to page data, which it should not be doing. This is
because SQL Server does not use NT Server’s page file, and since
this should be the only application on your server, there should
be little paging going on.

If there is
regular paging going on, this means that you are running other
applications on your server, which is causing NT Server to page,
or you have set the SQL Server Max Server Memory configuration
setting to some other setting other than "Dynamically
configure SQL Server memory". Determine which is the
problem and fix it, as this paging is slowing down SQL Server’s
performance. Ideally, remove the NT applications causing the
paging.

If you have
changed the SQL Server Max Server Memory configuration to some
other value other than "Dynamically configure SQL Server
memory", then change it back to this setting. SQL Server
should be allowed to take as much RAM as it wants for its own
use without having to compete for RAM with other applications.

Another way to
double-check to see if your SQL Server has enough physical RAM
is to check the Memory Object: Available Bytes counter. This
counter can be viewed from Performance Monitor or from the NT
Server or Windows 2000 Task Manager (see the Performance tab).
This value should be greater than 5MB. If not, then your SQL
Server needs more physical RAM. On a server dedicated to SQL
Server, SQL Server attempts to maintain from 4-10MB of free
physical memory. The remaining physical RAM is used by the
operating system and SQL Server. When the amount of available
bytes is less than 4MB, most likely SQL Server is also paging
(which it shouldn’t) and is experiencing a performance hit.

If your SQL Server
is experiencing memory bottlenecks, consider these possible
solutions:

  • Add more
    physical RAM. If you are already at 2GB of physical RAM and
    are still having a memory problem, consider upgrading to SQL
    Server 7 Enterprise Edition (which supports up to 3GB), or
    upgrading to SQL 2000 Enterprise Edition, which in theory
    can support up to 64GB of physical RAM.

  • Ensure that
    SQL Server is the only application running on the server
    (other than server management utilities).

  • Remove or
    disable all unnecessary services.

  • Ensure SQL
    Server is running as a member server (not a domain
    controller).

  • Configure SQL
    Server to allocate memory dynamically (the default), don’t
    hard code how much RAM SQL Server is allocated.

 

Network
Performance Counters

One of the best
ways to monitor if you have a network bottleneck is to watch the

Network Interface Object: Bytes Total/Sec
counter. This counter
measures the number of bytes that are being sent back and forth
between your server and the network.  This includes both
SQL Server and non-SQL Server network traffic. Assuming your
server is a dedicated SQL Server, then the vast majority of the
traffic measured by this counter should be from SQL Server.

There is no hard
and fast "correct" number for this counter as it
measures the actual traffic. To help you decide if your server
has a network bottleneck, one way to use this number is to
compare it with the maximum traffic supported by the network
connection your server is using. Also, this is another important
counter to watch over time. It is important to know if your
network traffic is increasing regularly. If it is, then you can
use this information to help you plan for future hardware needs.

If your SQL Server
is experiencing network bottlenecks, consider these possible
solutions:

  • Add faster
    network cards.

  • Add additional
    network cards.

  • Server network
    card should be attached to switches.

  • Network cards
    should be running in full duplex mode.

  • Tune your
    application so that it does not require unnecessary network
    trips. Do this by returning only the required data and used
    stored procedures.

  • Remove all
    unnecessary network protocols from the server.

  • Use TCP/IP as
    the network library on the client and server.

Before you can use
the network performance counters, the Network Monitor Agent
service must be installed on your server. After installing it,
you will have to reboot. Also, don’t forget to rerun the latest NT
service pack to update the files added during the installation
process.

 

SQL
Server Performance Counters

Up to this point
we have focused on identifying and resolving key
hardware-related bottlenecks. In this section, we will take a
look of some of the Performance Monitor counters you can use to
identify specific performance issues in SQL Server.

One
of the key SQL Server Performance Monitor counters to watch is the
SQL Server Buffer Manager Object: Buffer
Cache Hit Ratio
. This indicates how often SQL Server goes to
the buffer, not the hard disk, to get data. In OLTP
applications, this ratio should exceed 95% on a regular basis. If it doesn’t, then
you should consider adding more RAM to your server to increase performance.
In OLAP
applications, the hit ratio may be much less because of the nature
of how OLAP works. In any case, more physical RAM should increase the
performance of SQL Server running either OLTP or OLAP
applications.

If you want to see
how much physical RAM is devoted to SQL Server’s data cache,
monitor the SQL Server Buffer Manager Object: Cache Size
(pages)
. This number is presented in pages, so you will have to
take this number and multiply it by 8K (8,192) to determine the
amount of RAM in K that is being used. Generally, this
number should almost come close to the total amount of RAM in
your computer, assuming you are devoting your server to SQL
Server. This number should be close to the total amount of RAM
in the server, less the RAM used by NT, SQL Server, and any
utilities you have running on the server. If the amount of
RAM devoted to the data cache is much smaller than you would
expect, then you need to do some investigating to find out why.
Perhaps you aren’t allowing SQL Server to dynamically allocate
RAM. Whatever the cause, you need to find a solution, as the
amount of data cache available to SQL Server can significantly
affect SQL Server’s performance.

Since the number
of users using SQL Server affects its performance, you may want
to keep an eye on the SQL Server General Statistics Object: User
Connections
. This shows the number of user connections, not
the number of users, that currently are connected to SQL Server.
When interpreting
this number, keep in mind that a single user can have multiple
connections open, and also that multiple people can share a
single user connection. Don’t make the assumption that this
number represents actual users. Instead, use it as a relative
measure of how "used" the server is. Watch the number
over time to get a feel if your server is being more used, or
less used. If usage is going up, then you can use this
information to help you better plan future hardware needs.

One
cause of excess I/O on a SQL Server is page splitting. Page
splitting occurs when an index or data page becomes full, and
then is split between the current page and a newly allocated
page. While occasional page splitting is normal, excess page
splitting can cause performance issues. To find out if you
are experiencing a large number of page splits, monitor the SQL
Server Access Methods Object: Page Splits/sec
.
Unfortunately, I don’t have a good figure to recommend as to
what this maximum number should be for this counter. The goal
should be to keep it as low as possible. What you may want to do
is watch it over a period of time. If it is increasing, this may
indicate that you need to rebuild the indexes on the tables in
your databases, and you may want to consider increasing the
fillfactor on the indexes when you rebuild them.

If your users
are complaining that they have to wait for their transactions to
complete, you may want to find out if object locking on the
server is contributing to this problem. To do this, use the SQL
Server Locks Object: Average Wait Time (ms)
. You can use this
counter to measure the average wait time of a variety of locks,
including: database, extent, Key, Page, RID, and table. If you can
identify one or more types of locks causing transaction delays,
then you will want to investigate further to see if you can
identify what specific transactions are causing the locking. The
Profiler is the best tool for this detailed analysis.

While table scans
are a fact of life, and sometimes faster than index seeks, generally
it is better to have fewer table scans than more. To find
out how many table scans your server is performing, use the SQL
Server Access Methods Object: Full Scans/sec
. Note that this
counter is for an entire server, not just a single database. One
thing you will notice with this counter is that there often
appears to a pattern of scans occurring periodically. In many
cases, these are table scans SQL Server is performing on a
regular basis for internal use. What you want to
look for are the random table scans that represent your
application. If you see what you consider to be an inordinate
number of table scans, then break out the Profiler and Index
Tuning Wizard to help you determine exactly what is causing
them, and if adding any indexes can help reduce the table scans.
Of course, SQL may just be doing its job well, and performing
table scans instead of using indexes because it is just plain
more efficient.

If you suspect
that your backup or restore operations are running at
sub-optimal speeds, you can help verify this by using the
SQL Server Backup Device Object: Device Throughput Bytes/sec
.
This counter will give you a good feel for how fast your backups
are performing. You will also want to use the Physical Disk
Object: Avg. Disk Queue Length counter to help collaborate your
suspicions. Most likely, if your are having backup or restore
performance issues, it is because of an I/O bottleneck.

 

Removing
Bottlenecks is Only One Part of Performance Tuning

When many people
think about performance tuning, they only think about how to
remove hardware-related bottlenecks. What they don’t know is
that most SQL Server-related performance problems are not
hardware related, but are mostly caused by poor database and
application design. Because of this, they are often disappointed
that they can’t figure their performance issues by throwing
hardware at the problem. Removing hardware bottlenecks is just
one part of the much larger subject of SQL Server performance
tuning and optimization.

 

 

Latest Articles