Monitoring SQL Server with Performance Monitor Charting and Alerts
Monitoring production systems can alert you to problems before they get out
of hand. Performance Monitor (perfMon) is a good tool for this. You
can find perfMon under Programs\Administrative Tools (Common) in the Windows NT
Start Menu. The exe is in %SystemRoot%\system32\perfmon.exe.
Start
When you start perfMon, you will start with what might be a perplexing blank
screen:
The performance monitor does charting, alerts, log files, and reports.
This article will detail charts and alerts. These options are selectable
from the buttons on the top menu (chart is selected above) and from the view
menu.
Add Counters
In each of the sections the use is similar in the way you add items you want
to see or track. The plus button brings up a list of items to track.
Edit\Add To Chart gives the same function. The following screen will
appear:
Computer: The target will be the local machine until you select
the target machine. Pressing the “…” button will scan
the network for available computers. You can also enter the computer name
into the textbox. You can scan multiple machines in the same chart if you
like.
Object\Counter\Instance: These three items taken together define
a chart line you can monitor. Objects can contain many counters which can
contain many instances. Instances could be processes (exe’s) or processors
(0 – n), etc. For an explanation of a counter, just use the Explain button
on the right.
Chart
The charting section shows current activity for selected counters. To
add counters to your chart, click "Add" on the "Add to
Chart" screen. After you have added several counters, your chart may
look something like this:
It is highly recommended that you save your chart settings (File\Save) at
this point and start the program next time by opening the settings file.
Since there are dozens or hundreds of counters, one of the most helpful
things I can do is point out a few that are most useful in making sure a
production SQL Server system is running well.
Useful Chart Counters:
These are in the form Object–Counter–Instance.
System–Total Processor Time: This is the percentage of time
all processors spend doing work. If this is 100% your system is being
overtaxed. Admin preferences vary, but this should only rarely top
80%. This is usually the single most important determinant of perceived
performance.
SQLServer: SQL Statistics–Batch Requests/sec: This tells how
many requests users are making against SQL Server. This usually moves in
step with processor time. It shows people are actually using your server.
Highlight
To highlight a chart line (makes the line bold white) highlight the
description in the bottom grid and press the backspace. It is also
convenient to move up and down through the list and the highlighted chart line
will be changed as the highlighted item is changed. You can also double
click an item and see its description.
Periodic Update
The other important setting for charts is how often it updates. Too
often, and the polling will tax the server. If not often enough, you will
not get a realistic picture of current activity. Somewhere between one and
five seconds is usually adequate.
Alerts
Alerts are accessed through the second button (from left) or the
View\Alerts menu. Alerts will let you monitor for more rare or
critical items. You may not be around when the computer starts using 100%
of CPU, so you will miss it as the chart overwrites itself. But an alert
can tell you when it happened and even notify operators.
Counter Setup
You add counters to alerts in the same way as charts, with the "+"
button or by Edit\Add to Chart.
When setting up alerts there are two extra things to do.
You must specify the alert threshold, the number the alert must pass to trigger
a message. Optionally, you can also run a program when the alert
fires. As with charting, you should save your alerts to file.
Useful Alert Counters:
I did not include some alerts that normally are 0 or 100 all the time on the
chart settings above, you may wish to chart these as well. These are in
the form Object–Counter–Instance.
System–Total Processor Time: Alert on Over 80(%)
This is the percentage of time all processors spend doing work. If this is
100% your system is being overtaxed. Admin preferences vary, but this
should only rarely top 80%. This is usually the single most important
determinant of perceived performance.
SQLServerLocks – Number of Deadlocks/sec – Database: Alert on Over 0
Deadlocks can compromise data integrity for applications that don’t handle
every operation with correct transactions (any application). They can also
frustrate users depending on how the front ends handle them. Any deadlocks
are a cause for concern.
SQLServer: Buffer Manager – Buffer Cache Hit Ratio: Alert on
Under 97 (%)
Most systems will see their cache hit ratio forever pegged at 100%, this
yields optimal performance, because SQL Server never has to go to disk to
retrieve data. Minimal cache missing will not likely be a problem and is
to be expected.
Memory – Pages\sec: Alert on Over 2
This tells how often the system uses the page file. When more memory
is requested by applications than physical memory available, the disk will be
used. This should always be 0.
Here is an alert screen with two alerts triggered:
Alert Options
Send network Message–Net Send
This will send a popup message to the specified network user. Setting
this is highly recommended.
Alert Interval
As with charting, you set the update interval. Too often, and the
polling will tax the server. If not often enough, you may not catch a
triggering event. Somewhere between one and five seconds is usually
adequate. Also, if a counter is exceeded for a long time, and you have Net
Send or Run Program enabled, you may receive a lot of messages.
Saving Alerts
You can save triggered alerts to a comma or tab separated file from
File\Export Alerts. This will help you spot longer term patterns that will
be lost if you have to stop the performance monitor or reboot the machine.
Here’s how the exported alerts above look when opened in Excel:
Reported on \\WCTR04W119 | |||||||||
Date: 6/25/01 | |||||||||
Time: 3:47:16 PM | |||||||||
Data: Current Activity | |||||||||
Interval: 3.000 seconds | |||||||||
Date | Time | Value | Trigger Condition | Counter | Instance | Parent | Object | Computer | |
6/25/01 | 3:42:23 PM | 85.247 | > 80.000 | % Total Processor Time | System | \\hgn_sql1_nt | |||
6/25/01 | 3:43:07 PM | 80.491 | > 80.000 | % Total Processor Time | System | \\hgn_sql1_nt | |||
Machine Down
Unfortunately, perfMon will not alert you when a system goes down. It
will probably hang. So when you see perfMon hanging, your server is either
thrashing or down, and users are probably similarly suffering. If you are
lucky, one of your alerts will be tripped before a machine goes down.
Otherwise, I would suggest another monitoring program. I have seen a
small SQL application that will usually be adequate. It just queries the Server
and sends an alert if it encounters a timeout.
Further Monitoring
More detailed information about the performance of individual stored
procedures and queries is obtainable through Performance Monitor, the subject of
my next article.