By Paul C. Zikopoulos and Roman B. Melnyk
The Version 8.1 release of IBM DB2 Universal Database (DB2 UDB) late last year came with the ability to start a new data stripe on the underlying storage units. In this article, we introduce you to this new feature and the problems that it solves, and take you through a step-by-step example of how it works.
Mapping Data to Storage 101
Before we discuss this new enhancement, let’s review the basics of the DB2 UDB storage model. Although we assume that you have a good handle on this topic already, a quick summary might be helpful.
To be useful, the data in a database must be persistent – that is the whole point of data management. The DB2 UDB object that is used to map the logical schema of the database to disk is called a table space. (If you come from a Microsoft SQL Server background, you would call this object a data file.) A table space is mapped to storage objects using containers. For example, a file, directory or raw device can act as a container for a DB2 UDB table space.
DB2 UDB has two types of table spaces: system-managed space (SMS) table spaces and database-managed space (DMS) table spaces.
- SMS table spaces are managed by the operating system and, for the most part, allocate and deallocate space as needed. This type of table space uses a directory on the operating system to store database objects.
- DMS table spaces are managed by the database manager. The storage objects that are attached to a DMS table space can be files or raw devices. DMS table spaces come with more management overhead, because you have to define the size of the containers.
Generally, those who want to keep the database schema and its maintenance as simple as possible favor SMS table spaces because their storage allocations are automatic and are handled by the underlying operating system. Those who are willing to add a little more complexity to their environment favor DMS table spaces, which have the following advantages:
- They let you add new containers (underlying data storage objects) easily without the need for redirected restore operations.
- They let you place specific database objects on specific containers such as, for example, placing indexes on faster disks, and rarely accessed large objects on slower disks.
- They avoid operating system overhead when accessing storage objects; DMS table spaces are generally thought to deliver 15 to 30% better performance than their SMS counterparts.
Pages in a DMS table space are logically numbered from 0 to n-1, where n is the number of usable pages in the table space. Pages are grouped into extents, and an extent is a container space that is allocated to a single database object. If a DMS table space has several containers, and they are all of equal size, the first extent is located in the first container, the second extent is located in the second container, and so on. After the last container, the process repeats in a round-robin fashion, starting with the first container (see Figure 1). A layer of extents across the set of the containers is called a stripe. If all the stripes in a table space contain the same set of containers, the resulting configuration represents a single range.
Figure 1. A DMS table space with three containers of equal size. Extents are grouped into logical units called stripes. This is a single range table space.
Adding More Storage to a DMS Table Space
You can add more containers to a DMS table space when you need to increase the storage capacity of your database. When new containers are added to a table space, a rebalancing of the table space data might occur. Rebalancing is the moving of table space extents from one location to another in an attempt to keep the data striped (Figure 2). It is a good idea to spread your data across all the disks that have containers, thereby increasing the parallelism that you can get out of DB2 UDB. Even if you are not running in a symmetric multiprocessing (SMP) environment, DB2 UDB will take advantage of this parallelism.
Figure 2. In DB2 UDB Version 7.2, adding a new container to a table space could result in a rebalancing operation to keep the data evenly distributed among the containers.
The issue in DB2 UDB Version 7.2 was that when you added a new container to the table space (unless this was part of a redirected restore operation), the operation often forced a rebalancing of the data across all the containers (including the new one), such that the data was evenly distributed.
What’s so bad about automatically rebalancing data? After all, the rebalance utility operates online. The truth is that rebalancing a large table space can take a very long time, and the process can have a significant negative impact on performance: in highly concurrent environments, the physical movement of data could cause delays that a database administrator may want to avoid. (We should point out that DB2 UDB Version 8.1.2 adds a throttling feature to the rebalance utility. Using this feature, you can scale back a rebalancing operation so that it will not impact business operations during peak times, and allow it to run at “full throttle” during off-hours.)
In DB2 UDB Version 8.1, you can add new containers to a DMS table space without rebalancing. This is done through the creation of stripe sets. Each stripe set is independent of the other stripe sets in the table space. Because all of the table space data can be found in the existing stripe sets, adding a new stripe set does not require rebalancing (see Figure 3).
<- Stripe set 0
Stripe set 1 ->
Figure 3. Adding containers to a DMS table space without rebalancing.
A table space starts with three containers (C0, C1, and C2) that are each three extents in size (stripe set 0, shaded blue). Two new containers (C3 and C4) are added using the BEGIN NEW STRIPE SET option on the ALTER TABLESPACE statement. These new containers are each five extents in size (stripe set 1, shaded yellow). The stripe sets are independent of each another.