SQL Server 2008 Data Collections and the Management Data Warehouse

SQL Server 2008 was finally
released in August 2008. This new version is packed with lots of new
features. One of those new features is Data Collections. Along with that new
feature comes the Management Data Warehouse (MDW). Data Collections are a
great tool to allow DBAs to gather different statistics, and the MDW is the
repository for storing those statistics. In this article, I will give you an overview
of Data Collections and how to configure the MDW.

What is a Data Collection?

A Data Collection container
contains a definition for a set of statistical data you would like to collect,
when to collect them and how long to keep them. This definition uses the data
collector process to gather and store this information in the MDW. The data
collector uses SSIS and SQL Agent jobs to accomplish this. .

There are four different
data collector types available; they are T-SQL Query, SQL Trace, Performance
Counters, and Query Activity. Each of these collector types have a series of
SSIS packages to collect and manage the data collected within a data collection.
When you define a data collection, you identify the collector type you want to
use, the SQL Server instance from which to collect data, the mode it should run
in, and how long you want to keep the data. From the data collection definition,
one or more SQL Agent jobs are created. These jobs will use the SSIS packages
associated with the data collector type to both gather and manage the data
collection data that is stored it in the MDW.

Data collections can be run
manually, on a schedule, or can be run continually. Manual and schedule data
collections collect and upload data into the MDW on the same schedule. These
types of data collections are known as non-cached collections. When a data collection
runs continually data is cached in a directory, and then periodically uploaded
to the MDW. These are known as cached collections. Because the data
collection process uses SSIS and SQL Agent, you need to make sure you install
SSIS and start SQL Agent if you plan to use data collections.

Configuring the Management Data Warehouse and the Data Collection Component

The data collected by Data
Collections needs to be stored somewhere. That somewhere is the MDW. The MDW
is just a database located on a local SQL Server 2008 instance. In fact,
before you can define and use a Data Collection on a given SQL Server 2008
instance you must have a configured MDW that those Data Collections can point
to. The MDW is just a database on a SQL Server 2008 instance. To configure the
MDW, expand the “Management” folder in SQL Server 2008’s Management Studio
Object Explorer view, and right click on the “Data Collection” item. Doing so
will bring up the following screen:

configure the MDW, expand the

Here you want to select the “Configure Management Data Warehouse” item. When you do that, the following welcome
screen will be displayed:

Configure Management Data Warehouse Wizard

Here you just click the next
button. Upon doing that, the following screen is displayed:

Select configuration task

Here you have two options.
One is to configure the MDW and the other is to set up the data collection
component. If you plan to collect statistics in a MDW on your instance, you
will need to setup a MDW prior to setting up a data collection. The “Set up
data collection” option is to set up the data collection component to write to
an existing MDW. Since I am setting up a MDW on my existing instance I select
the first option and click on “Next>”. When I do this, the following screen
is displayed:

Configure Management Data Warehouse Storage

Here I need to identify a
database for my MDW. You are not allowed to change the “Server Name” on this
screen. It is automatically set to the instances you are connected to when you
started the configuration wizard. To identify a database I can either select
an existing database by using the drop down menu items, or I can click on the
“New” button. In my case, I want to create a new database. When I click on the
“New” button, the standard new database dialog is displayed to allow me to
create a database. In my case, I created a database named “MDW”.

After I have created my new
MDW database the “Next>” button on the above screen becomes available, and
the “Database Name” field will be populated with the database I created “MDW”.
At that point I click on the “Next>” button to continue with setting up my
MDW. Upon doing this, the following screen is displayed:

Map Logins and Users

On this screen, I’m allowed
to set up some security rights on my MDW. Here I could add logins to my MDW
database and provide them with rights. The “mdw_admin” role gives users read
and write access to the MDW and allows them to run purge and cleanup jobs to
manage the amount of data in the MDW. The “mdw_reader” role provides users
with read only access. You might give someone these rights if you wanted them
to generate some report based on the statistics in the MDW. Lastly, there is
the “mdw_writer” role that allows users with these permissions to write and
upload data to the MDW. You don’t need to decide on whether or not to give
rights to any users at this point if you don’t want to, you can always grant
access later. Since I have sysadmin permissions on my instance I don’t need to
add any additional permissions since sysadmin rights will give me all the
permissions I need to manage my MDW database. Clicking next takes me to the
wizard completion window below:

Complete the Wizard

Here I get a summary of what
I selected. Clicking “Finish” completes the management data warehouse
installation process. This process will take a couple of minutes, while SQL
Server configures the MDW. Once the configuration is completed, the following
screen is displayed:

Configure Data Collection Wizard Progress

At this point, all that has
been configured is the MDW database. The Data Collection component still needs
to be configured before I can start running system data collections or define
my own data collections. To start the configuration of the Data Collection
component I need to right click on the “Data Collection” item under the
“Management” item in SQL Server Management studio just as I did when I created
the MDW. Once I get past the welcome screen to the “Select configuration task”
display I need to select the “Set up data collection” radio button as I have
done on the following screen shot:

Select configuration task

When I click “Next>” here
the following screen is displayed:

Configure Management Data Warehouse Storage

On this screen, I first need
to identify the location of my MDW. I need to use the ellipse (“…”) button to
select a “Server Name” and “Database Name”. The server name for me would be my
local instance since I just installed my MDW on my local instance. Note if
you have multiple SQL Server instances you could configure the data collection
processes on each of your instances to use a central MDW repository. If you do
that then on this screen you would select the instance and database where you
configured your consolidated repository. In fact, Microsoft recommends you do
this to minimize the impact of the data collection process.

Also on this screen, I need
to identify where I want to place my cache directory for my data collections.
The cache directory is a file directory where data collection information will
be stored temporarily prior to a scheduled uploaded into the MDW. The caching
mechanism improves performance of the data collection process for those data
collections that are continuously collecting information. Once I’ve completed
the storage location information my screen looks like this:

Configure Management Data Warehouse Storage

Clicking the “Next>”
button takes me to the following screen:

Complete the Wizard

Here I get a summary of how
I have configured my data collections. When I click “Finish” on this screen,
the data collection configuration will start. Once the configuration is
completed the following screen is displayed:

Configure Data Collection Wizard Progress

At this point, I am done
configuring my MDW and Data Collection component.

System Data Collection

When you go through the MDW
wizard to set up an instance to start using data collections, three different
system data collections are created. They are “Disk Usage”, “Query Statistics”
and “Server Activity”. The Disk Usage data collection collects disk space
usage information related to each database. This data collection runs by default
every 6 hours and captures the disk space information for both Data and Log
files associated with each database. You can use the information from this
data collection to produce disk space trend reports for your database. This
kind of data is useful when you are doing capacity planning. Another system
data collection is the “Query Statistics” data collection. This data
collection gathers information about the different queries that are run against
your server. Note that not all queries are captured. Books Online documents
the criteria for when T-SQL queries are captured by this data collection. You
can use the query statistics to help identify queries that use large amounts of
resources. Having this information helps your performance and tuning efforts
focus in on the queries that are having the greatest impact on resource usage.
The last system data collection is “Server Activity”. This data collection
collects system related information like CPU, memory and other system related
performance counters. You can use the information this data collection gathers
to produce trend reports that show resources usage information over time. This
information will help identify server resource bottlenecks.

Here is a screen shot that
shows you where to find the different system data collections in SQL Server
Management Studio’s Object Explorer:

where to find the different system data collections in SQL Server Management Studio's Object Explorer

Value of the Management Data Warehouse and Data Collections

The MDW and the Data Collection
containers provide DBAs with an easy mechanism to obtain statistics over time.
The data collection process relies on prebuilt SSIS routines and SQL Agent job,
which minimize the amount of things the DBA needs to build to collect and
maintain a database that contains performance statistics. With the predefined
system data collections a DBA has to do very little work at all to start
collecting statistics. By implementing the MDW and Data Collections, DBAs now
have the tools necessary to monitor performance and do trend analysis for the
SQL Server 2008 instances they manage.


See All Articles by Columnist
Gregory A. Larsen

Gregory Larsen
Gregory Larsen
Gregory A. Larsen is a DBA at Washington State Department of Health (DOH). Greg is responsible for maintaining SQL Server and other database management software. Greg works with customers and developers to design and implement database changes, and solve database/application related problems. Greg builds homegrown solutions to simplify and streamline common database management tasks, such as capacity management.

Latest Articles