Creating Your Own Custom Data Collections
January 28, 2009
Data Collections are new with SQL Server 2008. Ive written a couple of articles about SQL Server Data Collections. The first one (http://www.databasejournal.com/features/mssql/article.php/3771871/SQL-Server-2008-Data-Collections-and-the-Management-Data-Warehouse.htm) discussed Data Collections and configuring the Management Data Warehouse (MDW). The second one (http://www.databasejournal.com/features/mssql/article.php/3779846/Reports-for-SQL-Server-2008-System-Data-Collections.htm) was about using the history reports to view information collected in the MDW by the system Data Collections. And now Im writing this article that will cover how to create your own custom Data Collections.
What is a Data Collection and the MDW?
Just in case you didnt read either one of my first two articles let me digress a little and describe what a Data Collection and the MDW are.
A Data Collection is a process that allows a DBA to easily define some data they want to collect, how often it should be collected, the method that should be used to collect it, and how long the collected data should be retained. A Data Collection utilizes SQL Server Integrated Services and SQL Agent to perform the actual data collecting.
The data that is collected via Data Collections is stored in the MDW. The MDW is just a database. When you set up an instance to collect data via a Data Collection you need to identify the MDW that will be use for storing the collected data. A given MDW database can be used to store data for a single instance or multiple instances.
Defining a Custom Data Collection
In order to create a custom Data Collection you need to use a T-SQL script. For some reason Microsoft didnt build a GUI interface to create a custom Data Collection. Hopefully in future versions a GUI interface will be developed, but for now the only way to create your custom Data Collection will be to execute a T-SQL script.
You can create a T-SQL script by hand to define your custom Data Collection. Alternatively, you can generate a script from one of the three different system Data Collections, and then modify it to meet your needs. This is the method I prefer. It is easier to generate a script of a system Data Collation by using the script task from Object Explorer and modifying it, then keying in a complete script by hand.
In order to demonstrate how to build a custom Data Collection I need to determine what data I want to collect. For this demonstration, I will be collecting index usage statistics that can be gathered using the sys.dm_db_index_usage_stats DMV. By capturing the index usage statistics from the DMV and storing them in the MDW, I will be able to track which indexes are used overtime. Here is the T-SQL script I have that will list the index usage statistics for a database. This script will be incorporated into a custom Data Collection that will capture index usage statistics for all user databases once a day.
SELECT o.name Object_Name, SCHEMA_NAME(o.schema_id) Schema_name, DB_NAME(s.database_id) Database_Name, i.name Index_name, i.Type_Desc, s.user_seeks, s.user_scans, s.user_lookups, s.user_updates, s.system_seeks, s.system_scans, s.system_lookups, getdate() Stat_Date FROM sys.objects AS o JOIN sys.indexes AS i ON o.object_id = i.object_id JOIN sys.dm_db_index_usage_stats AS s ON i.object_id = s.object_id AND i.index_id = s.index_id AND DB_ID() = s.database_id WHERE o.type = 'u' AND i.type IN (1, 2) AND(s.user_seeks > 0 OR s.user_scans > 0 OR s.user_lookups > 0 OR s.system_seeks > 0 OR s.system_scans > 0 OR s.system_lookups > 0);
To build my custom Data Collection script I will first generate a script from an existing system Data Collection and then modify it to use the above script to collect my statistics. Microsoft has provided three different Data Collection types that can be used to collect statistics and they are: T-SQL Query, SQL Trace and Performance Counters. Since I will be collecting data using a T-SQL script, the T-SQL Query collector type is the appropriate type to use for collecting my data. Therefore, I need to generate a script from a system Data Collection that uses this type. The Disk Usage system Data Collection happens to use a T-SQL collection type. Once I generate the Disk Usage Data Collection script, I will need to modify it to meet my needs. Here is my modified script:
Begin Transaction Begin Try Declare @collection_set_id_1 int Declare @collection_set_uid_2 uniqueidentifier -- Begin Section A --------------------------------------- EXEC [msdb].[dbo].[sp_syscollector_create_collection_set] @name=N'Index Usage Statistics', @collection_mode=1, @description=N'Collects DMV Index Usage Satistis', @logging_level=0, @days_until_expiration=180, @schedule_name=N'Daily at Midnight', @collection_set_id=@collection_set_id_1 OUTPUT, @collection_set_uid=@collection_set_uid_2 OUTPUT -- End Section A ---------------------------------------- Declare @collector_type_uid_3 uniqueidentifier -- Begin Section B -------------------------------------- Select @collector_type_uid_3 = collector_type_uid From [msdb].[dbo].[syscollector_collector_types] Where name = N'Generic T-SQL Query Collector Type'; -- End Section B ---------------------------------------- -- Begin Section C -------------------------------------- Declare @collection_item_id_4 int EXEC [msdb].[dbo].[sp_syscollector_create_collection_item] @name=N'Get Index Usage Statistics', @parameters=N'<ns:TSQLQueryCollector xmlns:ns="DataCollectorType"><Query><Value> SELECT o.name Object_Name, SCHEMA_NAME(o.schema_id) Schema_name, i.name Index_name, i.Type_Desc, s.user_seeks, s.user_scans, s.user_lookups, s.user_updates, s.system_seeks, s.system_scans, s.system_lookups, getdate() Stat_Date FROM sys.objects AS o JOIN sys.indexes AS i ON o.object_id = i.object_id JOIN sys.dm_db_index_usage_stats AS s ON i.object_id = s.object_id AND i.index_id = s.index_id AND DB_ID() = s.database_id WHERE o.type = ''u'' AND i.type IN (1, 2) AND(s.user_seeks > 0 OR s.user_scans > 0 OR s.user_lookups > 0 OR s.system_seeks > 0 OR s.system_scans > 0 OR s.system_lookups > 0) </Value><OutputTable>IndexUsage</OutputTable></Query><Databases UseUserDatabases="true" /></ns:TSQLQueryCollector>', @collection_item_id=@collection_item_id_4 OUTPUT, @collection_set_id=@collection_set_id_1, @collector_type_uid=@collector_type_uid_3 -- End Section C ---------------------------------------- Commit Transaction; End Try Begin Catch Rollback Transaction; DECLARE @ErrorMessage NVARCHAR(4000); DECLARE @ErrorSeverity INT; DECLARE @ErrorState INT; DECLARE @ErrorNumber INT; DECLARE @ErrorLine INT; DECLARE @ErrorProcedure NVARCHAR(200); SELECT @ErrorLine = ERROR_LINE(), @ErrorSeverity = ERROR_SEVERITY(), @ErrorState = ERROR_STATE(), @ErrorNumber = ERROR_NUMBER(), @ErrorMessage = ERROR_MESSAGE(), @ErrorProcedure = ISNULL(ERROR_PROCEDURE(), '-'); RAISERROR (14684, @ErrorSeverity, 1 , @ErrorNumber, @ErrorSeverity, @ErrorState, @ErrorProcedure, @ErrorLine, @ErrorMessage); End Catch; GO
I have broken this script into sections to make it easier to go through each section and describe how this script creates a custom Data Collection.
Section A is used to create a collection set. I create a collection set by executing the sp_syscollector_create_collection_set stored procedure. This stored procedure requires a number of different parameters, a few of which are worth noting: @collection_mode, @days_until_expiration, and @schedule_name. The @collection_mode Ive set to 1. This tells SQL Server that it should use a non-cached Data Collection. There are two different caching modes, cached and non-cached collections. A cached collection is one that saves the data in a temporary directory and then periodically uploads the cached information into the MDW, whereas non-cached collections tells SQL Server to collect and upload the collected data into the MDW in one operation. The @days_until_expiration parameter is used to identify how long you want to keep the collected information in the MDW. In my case, I specified 180 days. SQL Server will automatically purge my collection index statistics after they have been in the MDW longer than 180 days. Here I have specified Daily at Midnight for the @schedule_name. Daily at Midnight is a shared SQL agent schedule that I created. This tells the Data Collection process to create a SQL Agent job that is to be based on the shared schedule with this name. This causes my new collection to be run once a day at midnight to collect my missing index statistics.
Section B is just a short section. In this section of my script, I identify the type of Data Collection I want to use to collect my index statistics. Since Im using T-SQL code to collect statistics I need to use the Generic T-SQL Query Collector Type. You can use this collector type any time you plan to use a T-SQL script to collect your statistics.
Section C pulls together some of the output columns from Sections A and B along with my T-SQL script to identify what statistics will be collected. If you look closely at the @parameters parameter value, you will note it contains some XML. This XML contains my T-SQL script as well as some other important stuff. Each element of the XML tells SQL Server something different about this Data Collection. The <Query> element contains a <Value> and <OutputTable> element. The <Value> element contains my T-SQL statement that collects index usage statistics. If you look close at the T-SQL you will see that I had to replace the greater than symbol (>) in my T-SQL code with >. This is because the greater than (>) and the less than symbols (<) are used to begin and end the name of XML elements. If you didnt replace these symbols then the string associated with this parameter would not contain valid XML. I also have to replace the single quote with two single quotes. The <OutputTable> identifies the table within the MDW that will be created, and used to store statistics being gathered. Here I identified IndexUsage for the name of the table that I want my statistics to be stored in. The last element is the <Databases> element. This element has the attribute UseUserDatabases set to true. This tells SQL Server to only collect statistics for user databases, and not system databases.
Note that there are a number of different ways to specify this last element <Databases>. If you want to include both user databases and system databases, you would specify this element like so:
<Databases UseSystemDatabases="true" UseUserDatabases="true" />
If you only wanted to gather statistics for the system databases, you could just specify only the UseSystemDatabases parameter. If you should want to collect statistics for only a couple of databases, you would specify something similar to the following:
<Databases> <Database>MyDB1</Database> <Database>MyDB2</Database> </Databases>
If I used the above for my <Databases> element then I would be telling the data collection process to only collect statistics for the databases MyDB1 and MyDB2. As you can see, there are a number of different ways to identify the databases from which you want to collect statistics.
Once I have run the data collection script above, my custom Data Collection will be created, but this doesnt mean SQL Server will be start collecting my statistics. I must first start the collection in order for it to start collecting information. To start it I can run the following command:
EXEC msdb.dbo.sp_syscollector_start_collection_set @name = 'Number of Rows'
Alternatively, I can start my Data Collection in SSMS by right clicking on the Data Collection item in the Object Explorer pane and selecting the Start Data Collection Set option.
Viewing Information in the MDW
After my Data Collection has run the first time, I can view the information it collected in the MDW. To view the data I can run the following simple T-SQL SELECT statement:
SELECT * FROM MDW.custom_snapshots.IndexUsage
This SELECT statement will display the statistics that my custom Data Collection has gathered and place in my IndexUsage table within custom_snapshots schema in the MDW database. If this table doesnt exist when my custom Data Collection is run the first time, it will be created.
All custom Data Collections create their MDW tables in the custom_snapshots schema. It may take a few moments the first time you run your custom Data Collection for this object to be created in the MDW.
Creating custom Data Collections in SQL Server 2008 allows you to gather and manage information about your SQL Server Instances. The amount of effort it takes to set up a Data Collection is minimal. Microsoft has built in all the SSIS components to gather, populate and purge information from the MDW for you. By using custom Data Collections, DBAs can now easily create a method to gather the kinds of statistics they need to better manage their SQL Server environment.