Note: This article assumes knowledge of how to use and interpret Performance Monitor output.
SQL Server optimisation is, at its most basic level, a search for bottlenecks. Windows NT Performance Monitor allows the administrator to hunt down these bottlenecks and so produce performance gains in SQL Server. Because it provides easy access to performance information, it is a great way into SQL optimisation. Performance Monitor does, however, provide a baffling range of counters, and many find it intimidating. (A useful tip is to use the Explain button while looking through the counters.)
This article covers some of the more important counters with reference to SQL Server. One article couldn’t hope to cover all aspects but at least it might get you familiar enough with the basics of Performance Monitor to allow experimentation.
It's important to remember that you need to use these counters in conjunction with each other to get any sensible results. To change SQL Server configuration options, use the sp_configure stored procedure.
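As a sketch of the pattern used throughout this article (exact option names vary slightly between SQL Server versions, so check sp_configure's own listing on your server):

```sql
-- List all current configuration options and their values
EXEC sp_configure

-- Some options are hidden until advanced options are shown
-- (the option is named 'show advanced option' on 6.5)
EXEC sp_configure 'show advanced option', 1
RECONFIGURE WITH OVERRIDE

-- Change an option, then make the change current
EXEC sp_configure 'max async io', 16
RECONFIGURE
```

Dynamic options take effect after RECONFIGURE; others require a restart of SQL Server.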
There are a number of areas that Performance Monitor can help diagnose, these include:
Processor: % Processor Time
This counter monitors CPU usage. If you're using a multiple-processor system you can either set up individual counters for each processor, or you can use System: % Total Processor Time. If you are using multiple counters they should ideally show similar values. If the counter is in the range 80% – 100%, there may be a problem. Spikes up to 100% are expected and are nothing to worry about, but if the counter is constantly above 80% you will need to take action to discover the cause.
You need to determine whether SQL Server is the cause of this high usage, and if it is SQL Server, which spid is causing the trouble. Using SQLServer-Users: CPUtime will give you some idea. Once you know which process is causing the trouble you can examine the query design, index usage and database design. The query may well be getting high cache-hit ratios but still require excessive I/O, which eats CPU cycles; look at the table design and index usage. Take a look at using Processor: % Privileged Time below.
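From within SQL Server you can get a similar per-connection view by querying master..sysprocesses. A sketch — note that the cpu column is cumulative for the life of the connection, so sample it twice and compare the difference:

```sql
-- Cumulative CPU and I/O per connection, busiest first
SELECT spid, status, hostname, program_name, cpu, physical_io
FROM master..sysprocesses
ORDER BY cpu DESC
```

The spid at the top of the list is the first candidate for query and index analysis.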
If this doesn’t help then try adding faster or more processors to your server. If your system has enough memory to run SQL Server, adding processor power is the best way of increasing overall performance.
Odd as it may seem, very low CPU usage is also worth watching for, and can be more of a problem than high usage. Low usage can result from locking contention or from your application stalling. The bottleneck in this case is probably your application and not the processor!
Processor: % Privileged Time
Excessive I/O can be diagnosed by using Processor: % Privileged Time. If this counter is constantly above 20% while Processor: % Processor Time is below 80%, you may well have a system where SQL Server is producing far too many I/O requests. As above, try looking at the database design, but also take a look at your hardware, especially your I/O controller card and NIC. If your system is not a dedicated SQL Server machine, it may also be worth scheduling I/O- and CPU-intensive tasks to run during off-peak hours. A more radical approach (and something which shouldn't be attempted unless you understand all the implications) is to move Tempdb into RAM.
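On SQL Server 6.5 this is done with the 'tempdb in ram (MB)' configuration option. A sketch, assuming you have physical memory to spare beyond SQL Server's own allocation; the change takes effect only after restarting SQL Server, and setting the option back to 0 returns Tempdb to disk:

```sql
-- Place Tempdb in RAM, sized here at 16 MB (an illustrative value)
EXEC sp_configure 'tempdb in ram (MB)', 16
RECONFIGURE WITH OVERRIDE
-- Restart SQL Server for the change to take effect
```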
System: Processor Queue Length
Displays the queue of threads waiting for all processors in the system. If this value is consistently above 2, the CPU is a bottleneck. To cure it, add more processor power or try to reduce CPU-intensive tasks as above; also take a look at System: Context Switches/sec below.
System: Context Switches/sec
It is important to remember that context switching is a normal part of any NT system. It occurs when NT or SQL Server has to switch from executing one thread to another, and it causes CPU overhead. If System: Processor Queue Length indicates that there is a bottleneck at the CPU, try changing the number of threads being used by SQL Server and monitor the System: Context Switches/sec and System: Processor Queue Length values.
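The number of threads SQL Server uses is controlled with the 'max worker threads' configuration option (the default on 6.5 is 255). A sketch — reduce the value in steps and watch the context-switch rate and processor queue after each change:

```sql
-- Lower the worker thread pool to reduce context switching
-- (128 is an illustrative value, not a recommendation)
EXEC sp_configure 'max worker threads', 128
RECONFIGURE
```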
Process counters are mostly run on the SQL Server process instance, although it can be useful to run some of them against other problem applications.
Process: % Processor Time
Used in conjunction with Processor: % Processor Time to check which process is in fact using the most processor time. Normally run on the SQL Server process, although it is best to run it on any process you suspect of eating processor time.
Process: Thread Count
Again, used on the SQL Server process instance. This counter displays the number of threads currently active. Using the worker threads configuration option, it is possible to control this value. Use in conjunction with System: Context Switches/sec to optimise CPU usage.
Process: Virtual Bytes
Displays the total virtual memory being used by a process. Useful for seeing how much memory SQL Server is using, or which other applications are memory hungry.
Process: Working Set
This counter shows the amount of memory that has recently been used by a process. Look out for a total that is below the amount of memory currently allocated to SQL Server. If the Process: Working Set amount is substantially below the allocated memory, SQL Server may have too much memory. Try using the set working set size configuration option to set the working set to its optimal value.
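On 6.5 the relevant advanced option is 'set working set size'. A sketch — when enabled, it reserves physical memory matching SQL Server's memory allocation at start-up, so set the memory option to its final value first:

```sql
-- Reserve physical memory equal to SQL Server's memory allocation
EXEC sp_configure 'show advanced option', 1
RECONFIGURE WITH OVERRIDE
EXEC sp_configure 'set working set size', 1
RECONFIGURE WITH OVERRIDE
-- Takes effect when SQL Server is restarted
```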
Process: Page Faults/sec
If this value is greater than 0 then the SQL Server process is producing soft page faults and, as a result, CPU overhead. Try setting the working set size to be as close as possible to SQL Server's memory allocation.
SQLServer: Cache Hit Ratio
Aim for this to be above the 90% mark, although this might not be possible: the counter is application specific, and if the I/O is random you might not get anywhere near 90%. To check you have the optimal value, try adding more physical memory (and configuring SQL Server to use it!) until the value stops rising or you run out of money! It is also worth testing the 1081 trace flag, which basically allows index pages to remain in the data cache longer than data pages.
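A sketch of turning the flag on for a test: trace flags set with DBCC TRACEON last only until the server is restarted, so to make one permanent you would add it as a -T startup parameter instead.

```sql
-- Enable trace flag 1081 for testing
DBCC TRACEON (1081)

-- To revert
DBCC TRACEOFF (1081)
```

Measure SQLServer: Cache Hit Ratio over a representative workload both with and without the flag before adopting it.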
SQLServer: I/O – Page Reads/sec
Look out for values that approach the limits of your hardware – typically 80 – 90 I/O operations per second per disk. You may need to change your application, making it more efficient through better design, index placement etc. Note: this counter only measures reads, not writes. It is also important to remember that it only monitors SQL Server.
SQLServer: I/O – Batch Writes/sec
It is important to watch the average value for this counter: batch writes only occur during checkpoints, so at any given moment the counter may well display zero. During checkpoints the I/O rate will rise dramatically. Try using the max async IO configuration option to maximise the I/O throughput during checkpoints.
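A sketch of raising the option — on 6.5 the default is 8 outstanding asynchronous requests, and higher values only help if the disk subsystem (for example, a striped RAID set) can actually service them in parallel:

```sql
-- Raise the number of outstanding asynchronous I/O requests
-- (32 is an illustrative value for a striped disk set)
EXEC sp_configure 'max async io', 32
RECONFIGURE
```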
SQLServer: I/O – Transactions/sec
Despite its name, this counter shows the number of 'batches' per second, not true transactions, so it does not distinguish queries run outside an explicit transaction. There is no correct value, but the higher the better.
Memory: Pages/sec / Memory: Page Faults/sec
Use these when the system has settled into a steady state. Ideally the value should be 0, or as close to it as possible. It shows the amount of hard paging that is occurring and is a good measure of the adequacy of the current RAM configuration. Paging incurs a large I/O overhead and should, therefore, be addressed as part of any optimisation plan. Adding or reallocating RAM should help reduce this value. A consistently high Memory: Page Faults/sec can be the result of not enough RAM being left to NT.
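The split of RAM between SQL Server and NT is set with the memory configuration option. A sketch for SQL Server 6.x, where the option is specified in 2 KB units (16384 units = 32 MB, an illustrative figure); leave NT itself enough memory to avoid paging:

```sql
-- Allocate 32 MB to SQL Server (the option is in 2 KB units on 6.x)
EXEC sp_configure 'memory', 16384
RECONFIGURE
-- Restart SQL Server for the new allocation to take effect
```

After the restart, watch Memory: Pages/sec again: if hard paging rises, you have given SQL Server too much and starved NT.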
Disk Subsystem Tuning
When performance tuning the disk subsystem, you should attempt to isolate the disk I/O bottleneck with the SQLServer counters first, then use the PhysicalDisk and LogicalDisk counters for more detailed monitoring. The logical counters monitor the logical NT drives (i.e. the drive letters) while the physical counters monitor what Disk Administrator sees as a single physical device.
To be able to examine low-level disk activity it is necessary to run the diskperf -y command and restart the machine. This does, however, place a load on the system, so it is important to remember to turn it off afterwards using the diskperf -n command.
PhysicalDisk: Disk Queue Length
Any system experiencing sustained disk queuing will slow down due to I/O overheads. You will need either to increase I/O capacity or to reduce the I/O generated by your application.
As you can see, optimisation is very much a work of art, and it involves a lot of trial and error. I hope this article gives you a good introduction to what to look for and to some of the more important counters.
Copyright © 1998-99 G.h.van den Berg. All rights reserved.
This article may not be resold or redistributed without prior written permission from Guy van den Berg