Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Tips Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS Access
SQL Scripts & Samples
» Database Forum
» Slideshows
» Sitemap
Free Newsletters:

By submitting your information, you agree that databasejournal.com may send you databasejournal offers via email, phone and text message, as well as email offers about other products and services that databasejournal believes may be of interest to you. databasejournal will process your information in accordance with the Quinstreet Privacy Policy.

News Via RSS Feed

Database Journal |DBA Support |SQLCourse |SQLCourse2

Featured Database Articles


Posted Jan 28, 2009

Creating Your Own Custom Data Collections

By Gregory A. Larsen

Data Collections are new with SQL Server 2008. I’ve written a couple of articles about SQL Server Data Collections. The first one (http://www.databasejournal.com/features/mssql/article.php/3771871/SQL-Server-2008-Data-Collections-and-the-Management-Data-Warehouse.htm) discussed Data Collections and configuring the Management Data Warehouse (MDW). The second one (http://www.databasejournal.com/features/mssql/article.php/3779846/Reports-for-SQL-Server-2008-System-Data-Collections.htm) was about using the history reports to view information collected in the MDW by the system Data Collections. And now I’m writing this article that will cover how to create your own custom Data Collections.

What is a Data Collection and the MDW?

Just in case you didn’t read either one of my first two articles let me digress a little and describe what a Data Collection and the MDW are.

A Data Collection is a process that allows a DBA to easily define some data they want to collect, how often it should be collected, the method that should be used to collect it, and how long the collected data should be retained. A Data Collection utilizes SQL Server Integrated Services and SQL Agent to perform the actual data collecting.

The data that is collected via Data Collections is stored in the MDW. The MDW is just a database. When you set up an instance to collect data via a Data Collection you need to identify the MDW that will be use for storing the collected data. A given MDW database can be used to store data for a single instance or multiple instances.

Defining a Custom Data Collection

In order to create a custom Data Collection you need to use a T-SQL script. For some reason Microsoft didn’t build a GUI interface to create a custom Data Collection. Hopefully in future versions a GUI interface will be developed, but for now the only way to create your custom Data Collection will be to execute a T-SQL script.

You can create a T-SQL script by hand to define your custom Data Collection. Alternatively, you can generate a script from one of the three different system Data Collections, and then modify it to meet your needs. This is the method I prefer. It is easier to generate a script of a system Data Collation by using the script task from Object Explorer and modifying it, then keying in a complete script by hand.

In order to demonstrate how to build a custom Data Collection I need to determine what data I want to collect. For this demonstration, I will be collecting index usage statistics that can be gathered using the sys.dm_db_index_usage_stats DMV. By capturing the index usage statistics from the DMV and storing them in the MDW, I will be able to track which indexes are used overtime. Here is the T-SQL script I have that will list the index usage statistics for a database. This script will be incorporated into a custom Data Collection that will capture index usage statistics for all user databases once a day.

SELECT o.name Object_Name,
       SCHEMA_NAME(o.schema_id) Schema_name,
       DB_NAME(s.database_id) Database_Name,
       i.name Index_name, 
       getdate() Stat_Date 
 FROM sys.objects AS o
     JOIN sys.indexes AS i
 ON o.object_id = i.object_id 
  sys.dm_db_index_usage_stats AS s    
 ON i.object_id = s.object_id   
  AND i.index_id = s.index_id
  AND DB_ID() = s.database_id
 WHERE  o.type = 'u'
   AND i.type IN (1, 2) 
   AND(s.user_seeks > 0 OR s.user_scans > 0 OR s.user_lookups > 0 
   OR s.system_seeks > 0 OR s.system_scans > 0 OR s.system_lookups > 0); 

To build my custom Data Collection script I will first generate a script from an existing system Data Collection and then modify it to use the above script to collect my statistics. Microsoft has provided three different Data Collection types that can be used to collect statistics and they are: T-SQL Query, SQL Trace and Performance Counters. Since I will be collecting data using a T-SQL script, the T-SQL Query collector type is the appropriate type to use for collecting my data. Therefore, I need to generate a script from a system Data Collection that uses this type. The “Disk Usage” system Data Collection happens to use a T-SQL collection type. Once I generate the “Disk Usage” Data Collection script, I will need to modify it to meet my needs. Here is my modified script:

Begin Transaction
Begin Try
Declare @collection_set_id_1 int
Declare @collection_set_uid_2 uniqueidentifier
-- Begin Section A ---------------------------------------
EXEC [msdb].[dbo].[sp_syscollector_create_collection_set] 
    @name=N'Index Usage Statistics', 
    @description=N'Collects DMV Index Usage Satistis', 
    @schedule_name=N'Daily at Midnight', 
    @collection_set_id=@collection_set_id_1 OUTPUT, 
    @collection_set_uid=@collection_set_uid_2 OUTPUT
-- End Section A ----------------------------------------
Declare @collector_type_uid_3 uniqueidentifier
-- Begin Section B --------------------------------------
Select @collector_type_uid_3 = collector_type_uid 
  From [msdb].[dbo].[syscollector_collector_types] Where name = N'Generic T-SQL Query Collector Type';
-- End Section B ----------------------------------------
-- Begin Section C --------------------------------------
Declare @collection_item_id_4 int
EXEC [msdb].[dbo].[sp_syscollector_create_collection_item] 
    @name=N'Get Index Usage Statistics', 
    @parameters=N'<ns:TSQLQueryCollector xmlns:ns="DataCollectorType"><Query><Value>
SELECT o.name Object_Name,
       SCHEMA_NAME(o.schema_id) Schema_name,
       i.name Index_name, 
       getdate() Stat_Date 
 FROM sys.objects AS o
     JOIN sys.indexes AS i
 ON o.object_id = i.object_id 
  sys.dm_db_index_usage_stats AS s    
 ON i.object_id = s.object_id   
  AND i.index_id = s.index_id
  AND DB_ID() = s.database_id
 WHERE  o.type = ''u''
   AND i.type IN (1, 2) 
   AND(s.user_seeks > 0 OR s.user_scans > 0 OR s.user_lookups > 0 
   OR s.system_seeks > 0 OR s.system_scans > 0 
   OR s.system_lookups > 0) 
</Value><OutputTable>IndexUsage</OutputTable></Query><Databases UseUserDatabases="true" /></ns:TSQLQueryCollector>', 
    @collection_item_id=@collection_item_id_4 OUTPUT, 
-- End Section C ---------------------------------------- 
Commit Transaction;
End Try
Begin Catch
Rollback Transaction;
DECLARE @ErrorMessage NVARCHAR(4000);
DECLARE @ErrorSeverity INT;
DECLARE @ErrorState INT;
DECLARE @ErrorNumber INT;
DECLARE @ErrorProcedure NVARCHAR(200);
       @ErrorSeverity = ERROR_SEVERITY(),
       @ErrorState = ERROR_STATE(),
       @ErrorNumber = ERROR_NUMBER(),
       @ErrorMessage = ERROR_MESSAGE(),
       @ErrorProcedure = ISNULL(ERROR_PROCEDURE(), '-');
RAISERROR (14684, @ErrorSeverity, 1 , @ErrorNumber, @ErrorSeverity, @ErrorState, @ErrorProcedure, @ErrorLine, @ErrorMessage);
End Catch;

I have broken this script into sections to make it easier to go through each section and describe how this script creates a custom Data Collection.

Section A is used to create a collection set. I create a collection set by executing the sp_syscollector_create_collection_set stored procedure. This stored procedure requires a number of different parameters, a few of which are worth noting: @collection_mode, @days_until_expiration, and @schedule_name. The @collection_mode I’ve set to 1. This tells SQL Server that it should use a non-cached Data Collection. There are two different caching modes, cached and non-cached collections. A cached collection is one that saves the data in a temporary directory and then periodically uploads the cached information into the MDW, whereas non-cached collections tells SQL Server to collect and upload the collected data into the MDW in one operation. The @days_until_expiration parameter is used to identify how long you want to keep the collected information in the MDW. In my case, I specified 180 days. SQL Server will automatically purge my collection index statistics after they have been in the MDW longer than 180 days. Here I have specified “Daily at Midnight” for the @schedule_name. “Daily at Midnight” is a shared SQL agent schedule that I created. This tells the Data Collection process to create a SQL Agent job that is to be based on the shared schedule with this name. This causes my new collection to be run once a day at midnight to collect my missing index statistics.

Section B is just a short section. In this section of my script, I identify the type of Data Collection I want to use to collect my index statistics. Since I’m using T-SQL code to collect statistics I need to use the “Generic T-SQL Query Collector Type”. You can use this collector type any time you plan to use a T-SQL script to collect your statistics.

Section C pulls together some of the output columns from Sections A and B along with my T-SQL script to identify what statistics will be collected. If you look closely at the “@parameters” parameter value, you will note it contains some XML. This XML contains my T-SQL script as well as some other important stuff. Each element of the XML tells SQL Server something different about this Data Collection. The <Query> element contains a <Value> and <OutputTable> element. The <Value> element contains my T-SQL statement that collects index usage statistics. If you look close at the T-SQL you will see that I had to replace the greater than symbol (“>”) in my T-SQL code with “&gt;”. This is because the greater than (“>”) and the less than symbols (“<”) are used to begin and end the name of XML elements. If you didn’t replace these symbols then the string associated with this parameter would not contain valid XML. I also have to replace the single quote with two single quotes. The <OutputTable> identifies the table within the MDW that will be created, and used to store statistics being gathered. Here I identified “IndexUsage” for the name of the table that I want my statistics to be stored in. The last element is the “<Databases>” element. This element has the attribute “UseUserDatabases” set to “true”. This tells SQL Server to only collect statistics for user databases, and not system databases.

Note that there are a number of different ways to specify this last element “<Databases>”. If you want to include both user databases and system databases, you would specify this element like so:

<Databases UseSystemDatabases="true" UseUserDatabases="true" />

If you only wanted to gather statistics for the system databases, you could just specify only the “UseSystemDatabases” parameter. If you should want to collect statistics for only a couple of databases, you would specify something similar to the following:


If I used the above for my <Databases> element then I would be telling the data collection process to only collect statistics for the databases “MyDB1” and “MyDB2”. As you can see, there are a number of different ways to identify the databases from which you want to collect statistics.

Once I have run the data collection script above, my custom Data Collection will be created, but this doesn’t mean SQL Server will be start collecting my statistics. I must first start the collection in order for it to start collecting information. To start it I can run the following command:

EXEC msdb.dbo.sp_syscollector_start_collection_set @name = 'Number of Rows'

Alternatively, I can start my Data Collection in SSMS by right clicking on the Data Collection item in the Object Explorer pane and selecting the “Start Data Collection Set” option.

Viewing Information in the MDW

After my Data Collection has run the first time, I can view the information it collected in the MDW. To view the data I can run the following simple T-SQL SELECT statement:

SELECT * FROM MDW.custom_snapshots.IndexUsage

This SELECT statement will display the statistics that my custom Data Collection has gathered and place in my IndexUsage table within “custom_snapshots” schema in the MDW database. If this table doesn’t exist when my custom Data Collection is run the first time, it will be created.

All custom Data Collections create their MDW tables in the “custom_snapshots” schema. It may take a few moments the first time you run your custom Data Collection for this object to be created in the MDW.

Collecting Information

Creating custom Data Collections in SQL Server 2008 allows you to gather and manage information about your SQL Server Instances. The amount of effort it takes to set up a Data Collection is minimal. Microsoft has built in all the SSIS components to gather, populate and purge information from the MDW for you. By using custom Data Collections, DBAs can now easily create a method to gather the kinds of statistics they need to better manage their SQL Server environment.

» See All Articles by Columnist Gregory A. Larsen

MS SQL Archives

Comment and Contribute


(Maximum characters: 1200). You have characters left.



Latest Forum Threads
MS SQL Forum
Topic By Replies Updated
SQL 2005: SSIS: Error using SQL Server credentials poverty 3 August 17th, 07:43 AM
Need help changing table contents nkawtg 1 August 17th, 03:02 AM
SQL Server Memory confifuration bhosalenarayan 2 August 14th, 05:33 AM
SQL Server Primary Key and a Unique Key katty.jonh 2 July 25th, 10:36 AM