SQL Server 2008 Data Collections and the Management Data Warehouse
September 29, 2008
SQL Server 2008 was finally released in August 2008. This new version is packed with lots of new features. One of those new features is Data Collections. Along with that new feature comes the Management Data Warehouse (MDW). Data Collections are a great tool to allow DBAs to gather different statistics, and the MDW is the repository for storing those statistics. In this article, I will give you an overview of Data Collections and how to configure the MDW.
What is a Data Collection?
A Data Collection container contains a definition for a set of statistical data you would like to collect, when to collect them and how long to keep them. This definition uses the data collector process to gather and store this information in the MDW. The data collector uses SSIS and SQL Agent jobs to accomplish this. .
There are four different data collector types available; they are T-SQL Query, SQL Trace, Performance Counters, and Query Activity. Each of these collector types have a series of SSIS packages to collect and manage the data collected within a data collection. When you define a data collection, you identify the collector type you want to use, the SQL Server instance from which to collect data, the mode it should run in, and how long you want to keep the data. From the data collection definition, one or more SQL Agent jobs are created. These jobs will use the SSIS packages associated with the data collector type to both gather and manage the data collection data that is stored it in the MDW.
Data collections can be run manually, on a schedule, or can be run continually. Manual and schedule data collections collect and upload data into the MDW on the same schedule. These types of data collections are known as non-cached collections. When a data collection runs continually data is cached in a directory, and then periodically uploaded to the MDW. These are known as cached collections. Because the data collection process uses SSIS and SQL Agent, you need to make sure you install SSIS and start SQL Agent if you plan to use data collections.
Configuring the Management Data Warehouse and the Data Collection Component
The data collected by Data Collections needs to be stored somewhere. That somewhere is the MDW. The MDW is just a database located on a local SQL Server 2008 instance. In fact, before you can define and use a Data Collection on a given SQL Server 2008 instance you must have a configured MDW that those Data Collections can point to. The MDW is just a database on a SQL Server 2008 instance. To configure the MDW, expand the Management folder in SQL Server 2008s Management Studio Object Explorer view, and right click on the Data Collection item. Doing so will bring up the following screen:
Here you want to select the Configure Management Data Warehouse item. When you do that, the following welcome screen will be displayed:
Here you just click the next button. Upon doing that, the following screen is displayed:
Here you have two options. One is to configure the MDW and the other is to set up the data collection component. If you plan to collect statistics in a MDW on your instance, you will need to setup a MDW prior to setting up a data collection. The Set up data collection option is to set up the data collection component to write to an existing MDW. Since I am setting up a MDW on my existing instance I select the first option and click on Next>. When I do this, the following screen is displayed:
Here I need to identify a database for my MDW. You are not allowed to change the Server Name on this screen. It is automatically set to the instances you are connected to when you started the configuration wizard. To identify a database I can either select an existing database by using the drop down menu items, or I can click on the New button. In my case, I want to create a new database. When I click on the New button, the standard new database dialog is displayed to allow me to create a database. In my case, I created a database named MDW.
After I have created my new MDW database the Next> button on the above screen becomes available, and the Database Name field will be populated with the database I created MDW. At that point I click on the Next> button to continue with setting up my MDW. Upon doing this, the following screen is displayed:
On this screen, Im allowed to set up some security rights on my MDW. Here I could add logins to my MDW database and provide them with rights. The mdw_admin role gives users read and write access to the MDW and allows them to run purge and cleanup jobs to manage the amount of data in the MDW. The mdw_reader role provides users with read only access. You might give someone these rights if you wanted them to generate some report based on the statistics in the MDW. Lastly, there is the mdw_writer role that allows users with these permissions to write and upload data to the MDW. You dont need to decide on whether or not to give rights to any users at this point if you dont want to, you can always grant access later. Since I have sysadmin permissions on my instance I dont need to add any additional permissions since sysadmin rights will give me all the permissions I need to manage my MDW database. Clicking next takes me to the wizard completion window below:
Here I get a summary of what I selected. Clicking Finish completes the management data warehouse installation process. This process will take a couple of minutes, while SQL Server configures the MDW. Once the configuration is completed, the following screen is displayed:
At this point, all that has been configured is the MDW database. The Data Collection component still needs to be configured before I can start running system data collections or define my own data collections. To start the configuration of the Data Collection component I need to right click on the Data Collection item under the Management item in SQL Server Management studio just as I did when I created the MDW. Once I get past the welcome screen to the Select configuration task display I need to select the Set up data collection radio button as I have done on the following screen shot:
When I click Next> here the following screen is displayed:
On this screen, I first need to identify the location of my MDW. I need to use the ellipse ( ) button to select a Server Name and Database Name. The server name for me would be my local instance since I just installed my MDW on my local instance. Note if you have multiple SQL Server instances you could configure the data collection processes on each of your instances to use a central MDW repository. If you do that then on this screen you would select the instance and database where you configured your consolidated repository. In fact, Microsoft recommends you do this to minimize the impact of the data collection process.
Also on this screen, I need to identify where I want to place my cache directory for my data collections. The cache directory is a file directory where data collection information will be stored temporarily prior to a scheduled uploaded into the MDW. The caching mechanism improves performance of the data collection process for those data collections that are continuously collecting information. Once Ive completed the storage location information my screen looks like this:
Clicking the Next> button takes me to the following screen:
Here I get a summary of how I have configured my data collections. When I click Finish on this screen, the data collection configuration will start. Once the configuration is completed the following screen is displayed:
At this point, I am done configuring my MDW and Data Collection component.
System Data Collection
When you go through the MDW wizard to set up an instance to start using data collections, three different system data collections are created. They are Disk Usage, Query Statistics and Server Activity. The Disk Usage data collection collects disk space usage information related to each database. This data collection runs by default every 6 hours and captures the disk space information for both Data and Log files associated with each database. You can use the information from this data collection to produce disk space trend reports for your database. This kind of data is useful when you are doing capacity planning. Another system data collection is the Query Statistics data collection. This data collection gathers information about the different queries that are run against your server. Note that not all queries are captured. Books Online documents the criteria for when T-SQL queries are captured by this data collection. You can use the query statistics to help identify queries that use large amounts of resources. Having this information helps your performance and tuning efforts focus in on the queries that are having the greatest impact on resource usage. The last system data collection is Server Activity. This data collection collects system related information like CPU, memory and other system related performance counters. You can use the information this data collection gathers to produce trend reports that show resources usage information over time. This information will help identify server resource bottlenecks.
Here is a screen shot that shows you where to find the different system data collections in SQL Server Management Studios Object Explorer:
Value of the Management Data Warehouse and Data Collections
The MDW and the Data Collection containers provide DBAs with an easy mechanism to obtain statistics over time. The data collection process relies on prebuilt SSIS routines and SQL Agent job, which minimize the amount of things the DBA needs to build to collect and maintain a database that contains performance statistics. With the predefined system data collections a DBA has to do very little work at all to start collecting statistics. By implementing the MDW and Data Collections, DBAs now have the tools necessary to monitor performance and do trend analysis for the SQL Server 2008 instances they manage.