Tip 5. TSM communication problems
A period of successful TSM usage followed by a series of unsuccessful backup operations raises the standard question: "What went wrong if nothing has changed?" Rechecking the environment and the database configuration parameters led to the conclusion that only the network configuration between the database server and the TSM server had changed. The following error message occurred during a regular online database backup to TSM:
db2 => backup db artist online use tsm
SQL2025N An I/O error "-50" occurred on media "TSM".
Listing 10: Backup error condition
The return codes of the Tivoli Storage Manager API describe this problem as a TCP/IP communications failure:
cat /opt/tivoli/tsm/client/api/bin/sample/dsmrc.h | grep -w -- "-50"
#define DSM_RC_TCPIP_FAILURE -50 /* TCP/IP communications failure */
Listing 11: TSM error code explanation
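A text search with grep is usually all you need. When you want an exact numeric match, so that -50 does not also hit a longer code such as -500, a small Python sketch can do the lookup instead. It assumes each code is defined on one line in the form #define NAME value /* comment */, as in Listing 11; lookup_rc is a hypothetical helper name, and the second sample entry is invented purely for illustration.

```python
import re

# Matches one "#define NAME value /* comment */" line.
# The layout is an assumption based on Listing 11; adjust it if your dsmrc.h differs.
DEFINE_RE = re.compile(
    r"^\s*#define\s+(?P<name>\w+)\s+(?P<code>-?\d+)\s*(?:/\*\s*(?P<desc>.*?)\s*\*/)?"
)


def lookup_rc(header_text: str, rc: int):
    """Return (name, description) for an exact return-code match, or None."""
    for line in header_text.splitlines():
        m = DEFINE_RE.match(line)
        if m and int(m.group("code")) == rc:
            return m.group("name"), m.group("desc") or ""
    return None


if __name__ == "__main__":
    # The first line mirrors Listing 11; the second is hypothetical and only
    # shows that an exact numeric match does not pick up -500.
    sample = (
        "#define DSM_RC_TCPIP_FAILURE   -50  /* TCP/IP communications failure */\n"
        "#define DSM_RC_HYPOTHETICAL   -500  /* hypothetical entry */\n"
    )
    print(lookup_rc(sample, -50))
```

Against the real header you would pass the file contents, for example open("/opt/tivoli/tsm/client/api/bin/sample/dsmrc.h").read(), instead of the inline sample.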
Finding more detailed information about the error requires TSM API tracing. Tracing is enabled with the traceflags and tracefile entries in the dsm.opt configuration file:
# cat dsm.opt
SERVERNAME TESTTSM001
traceflags service api
tracefile /tmp/artist_tracing.log
Listing 12: Enabling TSM API tracing
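Before rerunning a failing backup, it can help to confirm that both tracing entries are actually present in dsm.opt. A minimal Python sketch follows; it assumes one option per line with the option name as the first token, case-insensitive option names, and lines starting with an asterisk as comments. parse_options and tracing_enabled are hypothetical helper names, not part of any TSM tool.

```python
def parse_options(text: str) -> dict:
    """Parse TSM client option-file text into {OPTION_NAME: value}."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (assumed to start with '*').
        if not line or line.startswith("*"):
            continue
        # First token is the option name; the rest of the line is its value.
        parts = line.split(None, 1)
        name = parts[0].upper()  # option names treated as case-insensitive
        value = parts[1].strip() if len(parts) > 1 else ""
        options[name] = value
    return options


def tracing_enabled(text: str) -> bool:
    """True if both TRACEFLAGS and TRACEFILE are set to non-empty values."""
    opts = parse_options(text)
    return bool(opts.get("TRACEFLAGS")) and bool(opts.get("TRACEFILE"))


if __name__ == "__main__":
    # Same entries as Listing 12.
    cfg = (
        "SERVERNAME TESTTSM001\n"
        "traceflags service api\n"
        "tracefile /tmp/artist_tracing.log\n"
    )
    print(tracing_enabled(cfg))
```

Where dsm.opt lives depends on the client installation, so the sketch takes the file contents as a string rather than guessing a path.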
Possible sources of the problem include:
- a problem with some database configuration parameters
- a password problem between the TSM server, the TSM API and DB2
- a problem with the TSM server configuration
- a problem in the network infrastructure connecting the TSM server and the database server
After enabling communication tracing and running a series of connectivity tests, the cause of the problem became clear. There had been infrastructure changes on the network: the DB2 database server had been disconnected from a fast 100 Mbit/s segment and reconnected to a slower 10 Mbit/s segment. As a result, communication between the DB2 database and the TSM server suffered longer delays than before, and the backup failed to finish. Fortunately, TSM offers parameters for fine-tuning communication:
adsm> q opt
Server Option Option Setting
----------------- --------------------
CommTimeOut 900 (-> to 1800 )
Listing 13: Changing the TSM server CommTimeOut parameter
The CommTimeOut parameter was changed: its value was increased from 900 seconds to 1800 seconds. The TSM Administrator's Reference describes the CommTimeOut parameter as follows:
"CommTimeOut - Specifies how long the server waits (in seconds) for an expected client message during an operation that causes a database update. If the length of time exceeds this time-out, the server ends the session with the client. You may want to increase the time-out value to prevent clients from timing out if there is a heavy network load in your environment or client will be backing up large files."
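The CommTimeOut description can be read as a rule about the gap between expected client messages. The Python sketch below is a simplified model of that rule, not TSM's actual implementation: first_timeout is a hypothetical helper, and the message arrival times are invented to show how one long pause on a slow link can end a session at 900 seconds but not at 1800.

```python
def first_timeout(message_times, comm_timeout):
    """Return the time at which the server would end the session, or None.

    message_times: sorted arrival times (seconds) of the client messages the
    server was waiting for. If the gap between two consecutive messages exceeds
    comm_timeout, the modelled server gives up comm_timeout seconds after the
    last message it received.
    """
    for prev, curr in zip(message_times, message_times[1:]):
        if curr - prev > comm_timeout:
            return prev + comm_timeout
    return None


if __name__ == "__main__":
    # Hypothetical arrivals: gaps of 300 s, 950 s and 150 s.
    times = [0, 300, 1250, 1400]
    for timeout in (900, 1800):
        print(timeout, first_timeout(times, timeout))
```

With these invented gaps, a 900-second timeout ends the session at t = 1200 s, while 1800 seconds lets it complete (the function returns None).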
Tip 6. Checking a backup on the TSM server
In DB2 version 7.1, IBM introduced a new utility, db2ckbkp. This utility is used to:
- test the integrity of a backup image and search for possible corruption
- display information that is stored in the backup header
- display information about the objects and the log file header in the backup image
Detecting an unusable backup directly on the TSM server could save precious DBA time. However, the db2ckbkp utility has one limitation: it cannot check a backup that resides on the TSM server. The DBA first has to copy the whole backup image from the TSM server to the local filesystem and then check it with db2ckbkp. Checking a backup image copied from TSM for possible corruption with the db2ckbkp utility:
$ db2ckbkp ARTIST.0.artist.NODE0000.CATN0000.20040125010545.001
[1] Buffers processed: ###############################################################################################
Image Verification Complete - successful.
Listing 14: db2ckbkp system utility
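The image name in Listing 14 follows DB2's pattern alias.type.instance.NODEnnnn.CATNnnnn.timestamp.sequence, where type 0 marks a full database backup (Listing 15 lists the same kind of name as a FULL DATABASE BACKUP image). The 14-digit timestamp field is the value db2adutl expects after TAKEN AT. A minimal Python sketch that splits such a name; parse_image_name and BackupImageName are hypothetical names for illustration only.

```python
from datetime import datetime
from typing import NamedTuple


class BackupImageName(NamedTuple):
    alias: str         # database alias, e.g. ARTIST
    backup_type: str   # '0' for a full database backup
    instance: str      # instance name, e.g. artist
    node: int          # partition number from NODEnnnn
    catalog_node: int  # catalog partition number from CATNnnnn
    timestamp: str     # yyyymmddhhmmss, the value used with TAKEN AT
    sequence: str      # image sequence number, e.g. 001


def parse_image_name(name: str) -> BackupImageName:
    """Split a DB2 backup image file name into its components."""
    parts = name.split(".")
    if len(parts) != 7:
        raise ValueError(f"expected 7 dot-separated fields: {name!r}")
    alias, btype, instance, node, catn, stamp, seq = parts
    if not (node.startswith("NODE") and catn.startswith("CATN")):
        raise ValueError(f"missing NODE/CATN fields: {name!r}")
    # Raises ValueError if the timestamp is not a valid yyyymmddhhmmss value.
    datetime.strptime(stamp, "%Y%m%d%H%M%S")
    return BackupImageName(alias, btype, instance,
                           int(node[4:]), int(catn[4:]), stamp, seq)


if __name__ == "__main__":
    img = parse_image_name("ARTIST.0.artist.NODE0000.CATN0000.20040125010545.001")
    print(img.alias, img.backup_type, img.timestamp)
```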
IBM has confirmed that the db2adutl command should be used to check a database backup on the TSM server. An example of a TSM backup check:
$ db2adutl VERIFY FULL TAKEN AT 20040125010545
Query for database ARTIST
Retrieving FULL DATABASE BACKUP information. Please wait.
FULL DATABASE BACKUP image:
./ARTIST.0.artist.NODE0000.CATN0000.20040125010545.000, Node: 0
Do you wish to verify this image (Y/N)?
Read 4194304 bytes, assuming we are at the end of the image
Image Verification Complete - successful.
Listing 15: Verifying a TSM backup image with db2adutl
From the IBM documentation:
"VERIFY - Performs consistency checking on the backup copy that is on the server. This parameter causes the entire backup image to be transferred over the network."
The whole image is read from the TSM server into a local memory buffer (not all at once, but piece by piece), and only a temporary file is written to the local disk. I have tested this option, which is extremely useful, and found no problems even with large backup files. In my measurements, checking some large backup files (100 GB) took only 10 minutes. After testing with files of several different sizes, I doubt that the entire backup image is actually transferred, as the IBM documentation states. Nevertheless, this method is fully functional.
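The doubt about a full transfer rests on a rate argument. The Python sketch below does the arithmetic, assuming 100 GB means 10^11 bytes and ignoring protocol overhead; the 10 and 100 Mbit/s figures are the link speeds from Tip 5, used only as reference points, and the helper names are hypothetical.

```python
def required_rate_mbit_s(size_bytes: float, seconds: float) -> float:
    """Sustained rate (Mbit/s, decimal units) needed to move size_bytes in seconds."""
    return size_bytes * 8 / seconds / 1e6


def transfer_seconds(size_bytes: float, link_mbit_s: float) -> float:
    """Lower bound on transfer time over a link, ignoring protocol overhead."""
    return size_bytes * 8 / (link_mbit_s * 1e6)


if __name__ == "__main__":
    image = 100e9  # 100 GB, assumed to be decimal gigabytes
    print(f"needed for 10 minutes: {required_rate_mbit_s(image, 600):.0f} Mbit/s")
    for link in (10, 100):
        hours = transfer_seconds(image, link) / 3600
        print(f"{link:>3} Mbit/s link: at least {hours:.1f} h")
```

Reading 100 GB in 600 seconds needs about 1333 Mbit/s sustained, while even an ideal 100 Mbit/s link needs at least 2.2 hours (22.2 hours at 10 Mbit/s). So unless the check ran over a much faster network, a 10-minute verification suggests the image was not fully moved across the wire.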
Conclusion
The situations explained above reflect things that might shorten the TSM learning path. The TSM backup system is very powerful and very well suited to database backups. A DBA needs to test the TSM recovery process so that, if a recovery is ever necessary, it will not be the first time. Half the battle is knowing which features are available; the other half is testing.
See All Articles by Columnist Marin Komadina