IBM Self Help Web Support
Self Help Web Support is IBM's Web knowledge database. It provides
search across the official IBM support Web sites and knowledge databases.
The start page for Self Help Support is at www.ibm.com/software/support.
You do not need to be a registered user to use this
technical knowledge database.
Some of the articles are protected, however, and only registered
users (users with a valid support key) can access them. On this
Web site, we can search among closed APARs and software fixes. In my opinion,
this should be the most important source of technical information for
the everyday troubleshooting routine of any DB2 DBA. Although the IBM support
model offers many options, this is the quickest way to get
self-help.

Call from the Customer
We had a call from a customer who reported a
problem with a production database. The database was DB2 UDB V7.1 EEE on Sun
Solaris, with two database partitions. The customer was in the middle of his work when the application suddenly crashed, and trace information was written
to the database dump directory. The Unix administrators reported that the machine
had restarted because of a system error.
A check of the machine's system log revealed more
information about the crash:
$ cat /var/adm/messages
Jan 01 18:43:34 ARTIST0 SUNW,UltraSPARC-III+: [ID 266074 kern.warning] WARNING: [AFT1]
Uncorrectable system bus (UE) Event detected by CPU2
User Instruction Access at TL=0, errID 0x001c63a6.bf3009e0
Jan 01 18:43:34 ARTIST0 AFSR 0x00000004.00000131 AFAR 0x00000021.f081cb00
Jan 01 18:43:34 ARTIST0 Fault_PC 0x1cb00 Esynd 0x0131 /N0/SB1/P0/B1
Jan 01 18:43:34 ARTIST0 SUNW,UltraSPARC-III+: [ID 357985 kern.notice] [AFT1] errID
0x001c63a6.bf3009e0 More than four Bits were in error and is fatal: will reboot
The machine had a hardware problem and was rebooted. We expected
that the database would need to be recovered because of the abnormal termination of the DB2
processes. In the database log, however, we could not find any information indicating
that the database had restarted or crashed.
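The system log check described above can be approximated with a short script. The following is a minimal sketch in Python, assuming the log has the wrapped layout shown in the excerpt (continuation lines carry no timestamp). The function names and the keyword list are illustrative assumptions, not part of any Solaris or DB2 tool, and a real filter would need a wider set of keywords.

```python
import re

# A syslog entry starts with a timestamp such as "Jan 01 18:43:34"; lines that
# do not are treated as continuations of the previous entry.
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")
# Keywords taken from the excerpt above (assumption: enough for this sketch).
KEYWORDS = ("Uncorrectable", "fatal")


def join_entries(lines):
    """Merge wrapped continuation lines into single log entries."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line.strip()
    return entries


def hardware_errors(lines):
    """Return the entries that mention one of the hardware-error keywords."""
    return [e for e in join_entries(lines) if any(k in e for k in KEYWORDS)]


sample = [
    "Jan 01 18:43:34 ARTIST0 SUNW,UltraSPARC-III+: [ID 266074 kern.warning] WARNING: [AFT1]",
    "Uncorrectable system bus (UE) Event detected by CPU2",
    "Jan 01 18:43:34 ARTIST0 AFSR 0x00000004.00000131 AFAR 0x00000021.f081cb00",
    "Jan 01 18:43:34 ARTIST0 SUNW,UltraSPARC-III+: [ID 357985 kern.notice] [AFT1] errID",
    "0x001c63a6.bf3009e0 More than four Bits were in error and is fatal: will reboot",
]
for entry in hardware_errors(sample):
    print(entry)
```

To run it against a real file, the lines could be read with `open("/var/adm/messages")` and passed to `hardware_errors` in place of `sample`.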
The hardware problem with the machine was soon solved, and
the machine was up and running again. A system check showed that
there was no main DB2 system process, indicating that the DB2 database was not
running. Diagnostic information from the database message log
(db2diag.log) indicated a problem with the automatic database recovery
procedure. Here are the messages extracted from db2diag.log:
Crash Recovery is needed.
-> Crash recovery was started
Crash recovery has been initiated. Lowtran LSN is "0008DB961070", Minbuff LSN is
"0008E02AC2CC".
Using parallel recovery with 3 agents 7 QSets 96 queues and 8 chunks
Forward phase of crash recovery has completed. Next LSN is "0008E257133C".
-> Rollforward finished
2003-01-01-18.43.35.640690 Instance:db2inst1 Node:000
PID:10061(db2loggr 0) Appid:none
data_protection sqlpgarl Probe:120
Bp 13212000, blkOffSet 9911, ReadCount 81000 0000 0000 0034 ffff ffff 0000 0000
.......4........
0000 0000 0054 0050 0053 0005 0000 2001 .....T.P.S.... .
4942 4d4c 4f47 0008 deff 0000 0000 0196 IBMLOG..........
0000 2710 0000 26bc 3db5 4b2e 3d4a 7cec ..'...&.=.K.=J|.
3db5 4b2f 0000 0000 0000 0000 0000 0000 =.K/............
0000 0000 0000 0000 0000 0000 0000 0000 ................
0000 0000 0000 0000 0053 0000 .........S..
2003-01-01-18.43.35.783967 Instance:db2inst1 Node:000
PID:10061(db2loggr 0) Appid:none
data_protection sqlpgarl Probe:120
DIA3806C Unexpected end of file was reached.
ZRC=0xFFFFF609
-> Recovery canceled while reading log files
Error -2551 when reading LSN 0008 E16A B2C7 from log file S0000406.LOG
LSN being undone: 0008 e16a b2c7 ...j..
In-doubt transaction(s) exists at the end of crash recovery.
-> Transactions were not cleared
Crash recovery completed. Return Code = "-2551"
-> Recovery finished unsuccessfully
Recovery started on log file: 5330 3030 3034 3034 2e4c 4f47 S0000404.LOG
-> Last touched log file S0000404.LOG
Restart failed with sqlcode: ffff fbee ....
Dirty BDS CB at agentActivationTerm! Correcting.
BDS CB before cleanup = 1244 4620 0000 0002
Marking the database bad.
-> Database marked as bad
The recovery did not succeed. The database was
rolled forward and then marked bad because of the "Unexpected end of file was
reached" error.
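Several values in the extract are printed as raw hexadecimal, and decoding them makes the messages easier to relate to each other. The sketch below, in Python, reads a 32-bit hex value as a signed (two's complement) integer and turns a hex dump fragment back into ASCII text. The helper names are hypothetical and the input strings are copied from the extract above. This is only an illustration of the arithmetic, not a DB2 utility.

```python
def to_signed32(hex_text):
    """Interpret a 32-bit hex value (e.g. 'ffff fbee' or '0xFFFFF609') as a signed integer."""
    digits = hex_text.replace(" ", "")
    value = int(digits, 16)
    # Two's complement: values with the top bit set represent negative numbers.
    return value - (1 << 32) if value & 0x80000000 else value


def hex_to_ascii(hex_text):
    """Decode space-separated hex groups from a dump line into ASCII text."""
    return bytes.fromhex(hex_text.replace(" ", "")).decode("ascii")


print(to_signed32("0xFFFFF609"))                       # -2551
print(to_signed32("ffff fbee"))                        # -1042
print(hex_to_ascii("5330 3030 3034 3034 2e4c 4f47"))   # S0000404.LOG
```

Read this way, ZRC=0xFFFFF609 is the same -2551 that appears in the "Error -2551" and "Return Code" messages, the restart sqlcode ffff fbee is -1042, and the dumped bytes spell the log file name S0000404.LOG.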