To ensure that queries run as efficiently as possible, the database administrator needs to monitor performance to find inefficient queries. This article explores how a DBA can monitor the I/Os against an instance of SQL Server using Dynamic Management Objects, more commonly known as Dynamic Management Views (DMVs) and Functions (DMFs).
When you are designing an application around a SQL Server database, or any database platform for that matter, how well your queries perform is important. With SQL Server, you may have a single application using a single database, or multiple applications running against multiple databases on the same instance. To make the queries that run against a specific instance of SQL Server run as efficiently as possible, the database administrator needs to monitor performance to find inefficient queries. There are many different things a DBA can monitor: CPU, I/O, memory, and so on. In this article I will focus on exploring how a DBA can monitor the I/Os against an instance of SQL Server, using Dynamic Management Objects, more commonly known as Dynamic Management Views (DMVs) and Functions (DMFs).
Using Dynamic Management Objects for Your Analysis
When you are looking at a
database instance and your goal is to identify what is consuming I/O, where do
you start? There are a number of different ways to go about this, depending on
your situation. If you have a single database running on the instance, then all
you will need to do is peek into that single database and identify queries that
are issuing the bulk of the I/Os. However, if you have multiple databases
running on an instance, you need to determine how much I/O each database is
generating.
Whether you have a single-database instance or a multi-database instance determines which DMVs and DMFs to use. I will explore which ones to use in a minute. First, however, you need to understand how SQL Server collects information for Dynamic Management Objects and the limitations they have.
SQL Server starts collecting the information that is exposed via DMVs and DMFs as soon as the instance starts running, and that information remains available as long as SQL Server is running. Once SQL Server is stopped, all the information collected is lost. Information may also be cleared or reset depending on what happens within SQL Server. For example, statistics are collected on indexes as they are used and updated, but if an index is dropped and recreated, then the statistics collected are dropped along with the index. Therefore, when you are using DMVs and DMFs to analyze your I/O, you need to keep in mind how long SQL Server has been running and whether the statistics could have been reset by various activities.
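If you want a quick sanity check on how long the instance has been up before trusting cumulative DMV numbers, one way to do it is with the sys.dm_os_sys_info DMV, which is not covered further in this article. The following is a minimal sketch, assuming SQL Server 2008 or later, where that DMV exposes the sqlserver_start_time column:

-- Show when the instance started and how long it has been accumulating statistics
-- (sqlserver_start_time requires SQL Server 2008 or later)
SELECT sqlserver_start_time AS [SQL Server Start Time]
     , DATEDIFF(HOUR, sqlserver_start_time, GETDATE()) AS [Hours Since Startup]
FROM sys.dm_os_sys_info;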
Determining I/O Consumption by Database
Before you go looking at a specific database to find those I/O intensive T-SQL statements, you must first identify which database, or databases, on your SQL Server instance are using most of the I/O. To do this you can use the "sys.dm_io_virtual_file_stats" DMF. This DMF returns the I/O statistics associated with the DATA and LOG files for all databases on an instance, and it replaces the "fn_virtualfilestats" function.
This DMF can return I/O information for all databases, or just a specific database, depending on which parameters are passed to the function. Here is the syntax for calling this function:
sys.dm_io_virtual_file_stats ( { database_id | NULL } , { file_id | NULL } )
If a database_id is passed to this function, then it only returns information for the database whose ID matches the value passed. You can also pass just a file_id if you only want to return the I/O information for a specific file_id across all databases. Alternatively, you can pass both a database_id and a file_id if you only want the function to return I/O information for that specific database and file. If both the database_id and file_id parameters are NULL, then this DMF returns I/O information for all databases and files on the instance.
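For example, the following calls illustrate the different parameter combinations. The database name "Examples" is simply a placeholder from my own environment; substitute one of your own databases:

-- All files for all databases on the instance
SELECT * FROM sys.dm_io_virtual_file_stats(NULL, NULL);

-- All files for a single database (here the sample database named Examples)
SELECT * FROM sys.dm_io_virtual_file_stats(DB_ID('Examples'), NULL);

-- A single file in a single database (file_id 2 is typically the log file)
SELECT * FROM sys.dm_io_virtual_file_stats(DB_ID('Examples'), 2);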
The following table identifies
the output returned from this DMF.
Column name          | Data type | Description
---------------------|-----------|-------------------------------------------------------------
database_id          | smallint  | ID of database.
file_id              | smallint  | ID of file.
sample_ms            | int       | Number of milliseconds since the computer was started. This column can be used to compare different outputs from this function.
num_of_reads         | bigint    | Number of reads issued on the file.
num_of_bytes_read    | bigint    | Total number of bytes read on this file.
io_stall_read_ms     | bigint    | Total time, in milliseconds, that the users waited for reads issued on the file.
num_of_writes        | bigint    | Number of writes made on this file.
num_of_bytes_written | bigint    | Total number of bytes written to the file.
io_stall_write_ms    | bigint    | Total time, in milliseconds, that users waited for writes to be completed on the file.
io_stall             | bigint    | Total time, in milliseconds, that users waited for I/O to be completed on the file.
size_on_disk_bytes   | bigint    | Number of bytes used on the disk for this file.
file_handle          | varbinary | Windows file handle for this file.
Now that we understand how to call sys.dm_io_virtual_file_stats and the columns it returns, let me provide you with a couple of sample queries. These sample queries can help you identify different I/O aspects related to your instance.
First, let me identify which databases are contributing the bulk of the I/O for an instance. To do this, I sum and group the number of read I/Os and write I/Os by database, and then order the output so that the database with the highest number of I/Os is displayed first. Below is a query that will show the top five databases from an I/O perspective:
SELECT TOP 5 DB_NAME(database_id) AS [Database Name]
      ,SUM(num_of_reads + num_of_writes) AS [Total I/Os]
FROM sys.dm_io_virtual_file_stats(NULL, NULL)
GROUP BY database_id
ORDER BY SUM(num_of_reads + num_of_writes) DESC
When I run this code on my
laptop, which has a very lightly used SQL Server instance, I get the following
output.
Database Name      Total I/Os
------------------ --------------------
Examples           678
msdb               221
ReportServer       171
Demo               170
master             132
From this output, you can see
that the "Examples" database has done the most I/Os since my SQL Server
started, followed by the "msdb" database. By using this single query,
you could quickly identify which database on your instance has used the most
I/O. This information can then be used to perform additional I/O analysis.
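For instance, one simple next step is to break the same numbers out into reads and writes per database, which often hints at whether the load is query-driven or write-heavy. The following is just a variation on the query above, not a different technique:

-- Same top five databases, but with reads and writes reported separately
SELECT TOP 5 DB_NAME(database_id) AS [Database Name]
      ,SUM(num_of_reads) AS [Total Reads]
      ,SUM(num_of_writes) AS [Total Writes]
      ,SUM(num_of_reads + num_of_writes) AS [Total I/Os]
FROM sys.dm_io_virtual_file_stats(NULL, NULL)
GROUP BY database_id
ORDER BY SUM(num_of_reads + num_of_writes) DESC;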
Another thing you can identify with this DMF is how long the machine hosting SQL Server has been up and running. Each record returned from querying this DMF includes a column named "sample_ms". That column identifies the number of milliseconds since the computer was started. By using this column value, you can calculate the number of days, hours, minutes, seconds, and milliseconds that the computer has been up and running, as the following query demonstrates.
SELECT sample_ms / (1000*60*60*24) AS DAYS
     , sample_ms / (1000*60*60) - sample_ms / (1000*60*60*24) * 24 AS HOURS
     , sample_ms / (1000*60) - sample_ms / (1000*60*60) * 60 AS Minutes
     , sample_ms / (1000) - sample_ms / (1000*60) * 60 AS Seconds
     , sample_ms - sample_ms / (1000) * 1000 AS MilliSeconds
FROM sys.dm_io_virtual_file_stats(1,1);
If SQL Server has been up for a long time, it is not possible to identify when the I/Os occurred against each of these virtual files using this DMF alone. If you are concerned about which database is currently performing all the I/O, you need to compare the virtual file I/O information between two different samples. Here is a script that will identify the top five databases, based on the number of I/Os that occur between two samples taken 10 seconds apart.
DECLARE @Sample TABLE (
    database_id SMALLINT,
    file_id SMALLINT,
    sample_ms INT,
    num_of_reads BIGINT,
    num_of_bytes_read BIGINT,
    io_stall_read_ms BIGINT,
    num_of_writes BIGINT,
    num_of_bytes_written BIGINT,
    io_stall_write_ms BIGINT,
    io_stall BIGINT,
    size_on_disk_bytes BIGINT,
    file_handle VARBINARY(500));

INSERT INTO @Sample
SELECT * FROM sys.dm_io_virtual_file_stats(NULL,NULL);

WAITFOR DELAY '00:00:10';

SELECT TOP 5 DB_NAME(I.database_id) AS 'Database Name'
     , SUM(I.num_of_reads - S.num_of_reads)
     + SUM(I.num_of_writes - S.num_of_writes) AS 'Number of I/Os'
FROM sys.dm_io_virtual_file_stats(NULL,NULL) I
JOIN @Sample S ON I.database_id = S.database_id
              AND I.file_id = S.file_id
GROUP BY I.database_id
ORDER BY SUM(I.num_of_reads - S.num_of_reads)
       + SUM(I.num_of_writes - S.num_of_writes) DESC;
Here I capture the first sample of I/O statistics by inserting the records returned from the sys.dm_io_virtual_file_stats DMF into a table variable named @Sample. I then use the WAITFOR statement to wait 10 seconds, and compare the information from sys.dm_io_virtual_file_stats with what was saved in the table variable. In this comparison, I calculate the number of I/Os that accumulated between the two samples by subtracting the I/O counts of the first sample from the I/O counts of the second sample, by database. I then display the top five databases based on the number of I/Os submitted during this 10-second window.
As you can see, there are a number of different ways you can use the sys.dm_io_virtual_file_stats DMF to determine which database is causing the bulk of the database I/Os on your instance.
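Another angle, if you care about I/O wait time rather than I/O counts, is to rank databases by their accumulated stall time. The query below is a sketch along the same lines as the earlier ones, using the io_stall column described in the table above:

-- Rank databases by the total time users waited on I/O, in milliseconds
SELECT TOP 5 DB_NAME(database_id) AS [Database Name]
      ,SUM(io_stall) AS [Total I/O Stall (ms)]
FROM sys.dm_io_virtual_file_stats(NULL, NULL)
GROUP BY database_id
ORDER BY SUM(io_stall) DESC;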
Which T-SQL Statements Are Issuing those I/Os?
Now that you know which databases are issuing the bulk of the I/O operations, you are probably asking yourself which T-SQL statements within those databases are issuing all the I/Os. When you are doing performance analysis, you want to find the statements that are issuing the majority of the I/Os. You might find that all it takes is fine-tuning a few queries to reduce your database I/Os considerably.
To find the most I/O intensive commands for a specific database you need to use a couple of different DMVs and DMFs. The "sys.dm_exec_query_stats" DMV has the I/O statistics by SQL statement, but not the text of the statement. To identify the text of the SQL statements, you need to join the output of the query stats DMV with the "sys.dm_exec_sql_text" DMF. In addition, to make matters even a little more complicated, you need to review the plan attribute information from the "sys.dm_exec_plan_attributes" DMF to identify the database for ad hoc queries. Below is a T-SQL statement that identifies the 100 most expensive T-SQL statements, from an I/O perspective, that were executed from the "master" database:
SELECT TOP 100
       COALESCE(DB_NAME(st.dbid),
                DB_NAME(CAST(pa.value AS INT)) + '*') AS [Database Name]
       -- find the offset of the actual statement being executed
      ,SUBSTRING(st.text,
                 CASE WHEN qs.statement_start_offset = 0
                        OR qs.statement_start_offset IS NULL
                      THEN 1
                      ELSE qs.statement_start_offset/2 + 1 END,
                 CASE WHEN qs.statement_end_offset = 0
                        OR qs.statement_end_offset = -1
                        OR qs.statement_end_offset IS NULL
                      THEN LEN(st.text)
                      ELSE qs.statement_end_offset/2 END
                 - CASE WHEN qs.statement_start_offset = 0
                          OR qs.statement_start_offset IS NULL
                        THEN 1
                        ELSE qs.statement_start_offset/2 END + 1
                ) AS [Statement]
      ,qs.creation_time AS [Plan Creation Date]
      ,qs.execution_count AS [Execution Count]
      ,(qs.total_logical_reads + qs.total_logical_writes
        + qs.total_physical_reads)/qs.execution_count AS [Average IOs]
      ,qs.total_logical_reads + qs.total_logical_writes
       + qs.total_physical_reads AS [Total IOs]
      ,qs.total_logical_reads/qs.execution_count AS [Avg Logical Reads]
      ,qs.total_logical_reads AS [Total Logical Reads]
      ,qs.total_logical_writes/qs.execution_count AS [Avg Logical Writes]
      ,qs.total_logical_writes AS [Total Logical Writes]
      ,qs.total_physical_reads/qs.execution_count AS [Avg Physical Reads]
      ,qs.total_physical_reads AS [Total Physical Reads]
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.plan_handle) st
OUTER APPLY sys.dm_exec_plan_attributes(qs.plan_handle) pa
WHERE attribute = 'dbid'
  AND COALESCE(DB_NAME(st.dbid),
               DB_NAME(CAST(pa.value AS INT))) = 'master'
ORDER BY (total_logical_reads + total_logical_writes
          + total_physical_reads)/execution_count DESC
Let me go through this query to help explain what it is doing.
First, you can see I am displaying a number of different statistics associated with the "Statement" that was executed. I am displaying totals and average I/Os for the physical and logical reads, as well as the write I/Os. All of these statistics come from the "sys.dm_exec_query_stats" DMV.
Next, I perform a CROSS APPLY with sys.dm_exec_sql_text, based on the plan_handle, to identify the statements associated with the compiled query plan. Since some query plans can contain multiple statements, I need to use the "statement_start_offset" and "statement_end_offset" columns to identify which statement within the query text the statistics are associated with.
Lastly, I perform an OUTER APPLY with the sys.dm_exec_plan_attributes DMF to identify the database ID for ad hoc plans. I need to do this because the sys.dm_exec_sql_text DMF does not contain the database ID for ad hoc query plans. Note that the "pa.value" referenced in the above query is the database ID from which the query was submitted. Therefore, it is possible that the "Database Name" generated from this query is not really the database against which the T-SQL text was run; this can happen if three-part naming is used to identify database objects in an execution plan. You can identify these ad hoc queries and their derived database names by the "*" that follows the database name in the output from this query.
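If you want to see what that plan attribute lookup does on its own, you can run sys.dm_exec_plan_attributes directly against the plan handles in the query stats DMV. The following is a minimal sketch of the same lookup the larger query performs via OUTER APPLY:

-- Show the dbid plan attribute for each cached plan in sys.dm_exec_query_stats
SELECT qs.plan_handle
     , DB_NAME(CAST(pa.value AS INT)) AS [Database Name From Plan Attribute]
FROM sys.dm_exec_query_stats qs
OUTER APPLY sys.dm_exec_plan_attributes(qs.plan_handle) pa
WHERE pa.attribute = 'dbid';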
Once I have all the data together from the different DMVs and DMFs, I calculate the I/Os per command by summing the reads and writes, and then divide that sum by the execution count to calculate the average number of I/Os per statement execution. Finally, I output the statistics ordered by the statements that perform the most I/Os on average.
When reviewing output from this query, it is important to look at the number of times a query has executed (Execution Count) versus the average I/Os per execution. This allows you to focus on those queries that are executed frequently. I say this because, if you have a query that is executed hundreds of thousands of times and you can cut down the average I/Os per execution significantly, then you drastically reduce the total I/Os, since the T-SQL statement is executed so many times. That is not to say you should totally ignore the T-SQL statements that are executed just a few times, especially if they are performing millions of I/Os per execution. It really comes down to balancing your time and the effort related to resolving the I/O issues. Sometimes all it takes is adding a missing index; other times it is a total rework/redesign of your databases.
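If you would rather weight the list toward cumulative cost instead of per-execution cost, you can order by total I/Os instead of average I/Os. The following is a simplified sketch that omits the statement-offset and database-name logic of the earlier query and simply surfaces the cached statements with the highest total I/O:

-- A slimmed-down view of cached statements ordered by total, rather than average, I/O
SELECT TOP 10 st.text AS [Query Text]
      ,qs.execution_count AS [Execution Count]
      ,qs.total_logical_reads + qs.total_logical_writes
       + qs.total_physical_reads AS [Total IOs]
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.plan_handle) st
ORDER BY qs.total_logical_reads + qs.total_logical_writes
       + qs.total_physical_reads DESC;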
Keep in mind that these
statistics are collected only since SQL Server started up, or a query plan was
cached. Therefore, when you are reviewing the output of the above query you
might consider when the plan was created (Plan Creation Date) so you know how
long SQL Server has been keeping statistics for each statement.
Tune Up Your Server by Reducing I/O
I/O is often the resource that T-SQL commands wait on the longest. The more I/Os a query performs, the longer it will take to run. Looking at your instance and identifying the databases that are using the bulk of the I/Os will help you determine where to focus your tuning effort. By drilling into the high-I/O databases and finding the specific T-SQL statements that consume large amounts of I/O, you can quickly identify the problem queries that you should look at from a tuning perspective. By reducing I/O on your biggest I/O-consuming queries and databases, you can help tune up your server from an I/O perspective.
Additional Resources
Technet: Dynamic Management Views and Functions (Transact-SQL)
MSDN Blogs – Best Practices for Dynamic Management Views – George Heynen