Marin Komadina
Procedure for Adding New Database
Partition
This system has performed very well, until we doubled the data volume which
resulted in increased CPU and memory utilization. This can be solved by adding
more memory and new CPUs. After that, we are going to create a new database
partition on added the hardware and redistribute the user data across all
partitions.
Hardware upgrade
We upgraded the hardware to an additional 2 CPUs, 4 GB RAM and additional
disk space for new database partition.
Add
new database partition
There's
both an online and offline method for adding a new database partition to an existing
database. Since, in both cases we need to restart the database to enable full
Read Write access to the new partition, we are going to use the offline method.
When we add new partition, temporary tablespace is automatically created on the
new partition, and we need to decide where we want it to be located. Possibilities
for temporary tablespace are:
-
NO INDICATION
will use the temporary tablespace container definition from node 0
-
LIKE NODE will
use temporary tablespaces with containers similar to what we already have on
another partition
-
WITHOUT
TABLESPACES will create a partition without temporary tablespace containers
NO
INDICATION is the default value and we will use that option. Now we are ready
to add new partition (2):
$ db2stop
02-10-2003 16:51:13 0 0 SQL1064N DB2STOP processing was successful.
02-10-2003 16:51:14 1 0 SQL1064N DB2STOP processing was successful.
SQL1064N DB2STOP processing was successful.
To register the new partition (2) we need to change the partition
configuration file:
$ vi db2nodes.cfg
0 ARTIST0A1 0
1 ARTIST0A1 1
2 ARTIST0A1 2 -> new partition (2)
On virtual node 2, we will separately start and create the partition
(2):
$ export DB2NODE=2
$ db2start nodenum 2
02-10-2003 17:08:59 2 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.
$ db2 add node
DB20000I The ADD NODE command completed successfully.
The newly
created partition has associated only temporary tablespace:
db2 => list tablespaces
Tablespaces for Current Database
Tablespace ID = 1
Name = TEMPSPACE1
Type = System managed space
Contents = System Temporary data
State = 0x0000
Detailed explanation:
db2 => list tablespace containers for 1
Tablespace Containers for Tablespace 1
Container ID = 0
Name = /export/home/db2inst1/db2inst1/NODE0002/SQL00001/SQLT0001.0
Type = Path
Next we will stop partition 2 and start the whole database:
$ db2stop nodenum 2
02-10-2003 17:12:16 2 0 SQL1064N DB2STOP processing was successful.
SQL1064N DB2STOP processing was successful.
$ export DB2NODE=0
$ db2start
02-10-2003 17:12:56 1 0 SQL1063N DB2START processing was successful.
02-10-2003 17:12:57 0 0 SQL1063N DB2START processing was successful.
02-10-2003 17:12:57 2 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.
The next step is to extend node group GLOBAL to a new partition:
db2 => alter nodegroup global add node (2)
SQL1759W Redistribute nodegroup is required to change data partitioning for
objects in nodegroup "GLOBAL" to include some added nodes or exclude some
dropped nodes. SQLSTATE=01618
db2 => select * from sysibm.sysnodegroupdef
NGNAME NODENUM IN_USE
------------------ ------- ------
IBMCATGROUP 0 Y
IBMDEFAULTGROUP 1 Y
IBMDEFAULTGROUP 0 Y
LOCAL 0 Y
GLOBAL 2 A -> active pending status
GLOBAL 1 Y
Node group GLOBAL is distributed over two
partitions, 1 and 2, and has active pending status. Objects in node group
GLOBAL are still only on node 1.
Note: Node group IBMDEFAULTGROUP is the
default node group for new database objects. It is recommended to extend this
node group over all partitions, however, we are not going to do that in our
test.
Redistribute data
Data redistribution over all defined
partitions in the node group is the last step. Data redistribution must be
applied from the catalog node. We are going to distribute data uniformly using the
default hash function. Other methods for data distribution, (using distribution
file or using target partition map), need better application and data knowledge
than we have in this test.
db2 => redistribute nodegroup GLOBAL uniform
DB20000I The REDISTRIBUTE NODEGROUP command completed successfully. db2 =>
In the
case of tables without a partitioning key, distribution will not succeed and the
partitioning key need not be unique.
Redistribution log
is located on path: ~/sqllib/redist
Data Redistribution Utility
___________________________
The following options have been specified:
Nodegroup name : GLOBAL
Data Redistribution option : U
Redistribute Nodegroup : uniformly
No. of nodes to be added : 1
List of nodes to be added : 2
No. of nodes to be dropped : 0
List of nodes to be dropped :
Delete will be done in parallel with the insert.
The execution of the Data Redistribution operation on:
Begun at Ended at Table
17.38.07 DB2INST1.PART_TEST 17.38.08
--All tables in the nodegroup have been successfully redistributed.-
Successfully finished data distribution will reset the status flag for
the node group:
NGNAME NODENUM IN_USE
------------------ ------- ------
IBMCATGROUP 0 Y
IBMDEFAULTGROUP 0 Y
IBMDEFAULTGROUP 1 Y
LOCAL 0 Y
GLOBAL 2 Y
GLOBAL 1 Y
The PART_TEST table has records equally distributed over partitions in
node group:
PARTITION TOTAL
----------- -----------
1 2560
2 2560
Data
distribution can be checked reading the partition map for table DB2INST1.PART_TEST:
>> db2gpmap -d ARTIST -m tab_distribution.txt -t DB2INST1.PART_TEST
Retrieving the partition map ID using table DB2INST1.PART_TEST
--
1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2
1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1
--
For performance reasons, after redistribution IBM suggests statistics
be updated on the redistributed objects:
db2 => runstats on table db2inst1.part_test with distribution and detailed indexes all shrlevel change
DB20000I The RUNSTATS command completed successfully.
Conclusion
The main
advantage of the DB2 UDB EEE database is optimal scalability. When you have more users, more
data, and complex queries, just add additional hardware and DB2 will scale.
Measurements for different
database operations (load
of all tables, index creation, runstats, single and multiple concurrent delete
and insert operations) on duplicate hardware configurations give us an average
scalability factor of 0,98. It is almost ideal, however, there are still some points
that IBM improve on in the future:
-
Implement
memory sharing between multiple partitions on the same physical node
-
Solve problem
of expensive select
or update on non partitioned key
-
Limit
interprocess communication between partitions on multipartition installation
-
Implement partitioning
models other than hash partitioning
-
Provide real
online database expansion (after adding a new partition, full RW access to the newly
added partition is possible only after database restart)
-
Enable non
blocking data redistribution after adding or dropping partitions
»
See All Articles by Columnist Marin Komadina