Keep on top of corruption in your Oracle database with
health checks.
The old saying, there are 100
ways to…. And in keeping with that saying, Oracle, in Release 11g, has introduced
database health checks under a framework called health monitor. This health
monitor framework is designed to run diagnostic checks on your Oracle database.
It will look at different components of the database—checking for corruption,
both physical and logical. While not 100 ways to…, Oracle has, as we have
looked at in the last few articles on ADR, also tried to implement and
interface into the health checks with ADR. Nevertheless, as we will see, it
might not be the best way to do this quite yet.
So, do you have datafile, block, undo, redo, dictionary, or xyz corruption
in your database? You might actually be running just fine and not even know it.
As we will see, in the following example(s), I have a fresh install of Oracle
and it has corruption in it. I’d really like to track that down; but that is
for another day. When running health checks, you, as a DBA, can decide to run
them adhoc and in a totally reactive mode where someone might complain about an
error message or you might see an Oracle alert that begs the question for one
of these health checks to be run. You can also run health checks using the
DBMS_HM PL/SQL package. Depending on how these run on your system, how long
they take, and how often you see corruption, you might want to schedule these
as a job. However, please take note that this is a new feature and with all new
features, you should be 100% confident they will not cause corruption or lock
your system. Just test and monitor those runs appropriately. Also note that
these health checks can be run with the database online (OPEN or MOUNT) and
available, or when the database is offline (NOMOUNT). As there are multiple
types of health checks, there are also various health checks that can only be
run with the database offline. For instance, the redo integrity check can only
be run offline, which makes complete sense when you think of it. Without going
into all of the different corruption/integrity checks, you can use the
following SQL to look at them. It gives you a good idea as to which checks can
be run while the database is online or offline.
SQL> SELECT name,internal_check,offline_capable,description FROM v$hm_check;
NAME I O DESCRIPTION
—————————— – – ———————————————-
HM Test Check Y Y Check for HM Functionality
DB Structure Integrity Check N Y Checks integrity of all database files
Data Block Integrity Check N Y Checks integrity of a datafile block
Redo Integrity Check N Y Checks integrity of redo log content
Logical Block Check Y N Checks logical content of a block
Transaction Integrity Check N N Checks a transaction for corruptions
Undo Segment Integrity Check N N Checks integrity of an undo segment
All Control Files Check Y Y Checks all control files in the database
CF Member Check Y Y Checks a multiplexed copy of the control file
All Datafiles Check Y Y Check for all datafiles in the database
Single Datafile Check Y Y Checks a datafile
Log Group Check Y Y Checks all members of a log group
Log Group Member Check Y Y Checks a particular member of a log group
Archived Log Check Y Y Checks an archived log
Redo Revalidation Check Y Y Checks redo log content
IO Revalidation Check Y Y Checks file accessability
Block IO Revalidation Check Y Y Checks file accessability
Txn Revalidation Check Y N Revalidate corrupted txn
Failure Simulation Check Y Y Creates dummy failures
Dictionary Integrity Check N N Checks dictionary integrity
Because some of the health
checks require additional input parameters, for the sake of simplicity, I have
decided to show just one where there are no input parameters required. If you
would like to see the checks that require parameters, you can run the following
SQL I found in the Oracle documentation:
SELECT c.name check_name, p.name parameter_name,
p.type, p.default_value, p.description
FROM v$hm_check_param p, v$hm_check c
WHERE p.check_id = c.id and c.internal_check = ‘N’
ORDER BY c.name;
I don’t know how many times I’ve encountered dictionary problems–everything
from erroneous information in the dictionary tables to locking problems–so I
am semi-excited about the Dictionary Integrity Check. The dictionary integrity
check will take a look at the integrity of core dictionary objects, including: tab$
, clu$
,
fet$
, uet$
, seg$
,
undo$
, ts$
, file$
,
obj$
, ind$
, icol$
,
col$
, user$
, con$
, cdef$
,
ccol$
, bootstrap$
, objauth$
, ugroup$
, tsq$
,
syn$
, view$
, typed_view$
, superobj$
,
seq$
, lob$
, coltype$
,
subcoltype$
, ntab$
, refcon$
, opqtype$
,
dependency$
, access$
, viewcon$
, icoldep$
,
dual$
, sysauth$
, objpriv$
, defrole$
,
and ecol$
. That’s a nice
long list as far as I’m concerned and the check should take a look at the
contents of dictionary entries, constraints, and referential integrity.
To run the dictionary integrity check manually, we need to issue the
following DBMS_HM.RUN_CHECK PL/SQL package—giving the additional name for the
run. In this example, I’ll name my run DICTIONARY_CHECK_11032008.
BEGIN
DBMS_HM.RUN_CHECK(‘Dictionary Integrity Check’,
‘DICTIONARY_CHECK_11032008’);
END;
/
Now this is where
it gets a bit interesting. The DBMS_HM.RUN_CHECK stores its information in ADR
so you can either look at a report using the DBMS_HM PL/SQL package or the
ADRCI utility. To me, this isn’t much of a choice as the DBMS_HM PL/SQL package
will output the report in HTML, XML, or text format. The ADRCI utility only
outputs the report in XML. However, we will attack this from both so that we
can see where the deficiencies in ADR exist and where ADR and DBMS_HM cross.
Since the results of the DBMS_HM.RUN_CHECK stores its information in ADR, we
can easily see the HM_RUN by issuing the following ADRCI command.
adrci> show hm_run
ADR Home = /opt/app/oracle/diag/rdbms/db11fs/db11FS:
*****************************************************
**********************************************************
HM RUN RECORD 3
**********************************************************
RUN_ID 41
RUN_NAME DICTIONARY_CHECK_11032008
CHECK_NAME Dictionary Integrity Check
NAME_ID 24
MODE 0
START_TIME 2008-11-03 18:32:10.819805 -07:00
RESUME_TIME <NULL>
END_TIME 2008-11-03 18:32:18.827387 -07:00
MODIFIED_TIME 2008-11-03 18:32:18.827387 -07:00
TIMEOUT 0
FLAGS 0
STATUS 5
SRC_INCIDENT_ID 0
NUM_INCIDENTS 0
ERR_NUMBER 0
REPORT_FILE <NULL>
The most notable piece of information, or lack of information, is the
REPORT_FILE that is <NULL>. This column is only populated when a report
is created. The creation of a report can be done through the DBMS_HM package or
through the ADRCI CREATE REPORT command.
For the DBMS_HM package use:
SET LONG 100000
SET LONGCHUNKSIZE 1000
SET PAGESIZE 1000
SET LINESIZE 512
SELECT DBMS_HM.GET_RUN_REPORT(‘DICTIONARY_CHECK_11032008’) FROM DUAL;
For the ADRCI CREATE REPORT command use:
CREATE REPORT hm_run DICTIONARY_CHECK_11032008
When using the DBMS_HM package the output looks like:
DBMS_HM.GET_RUN_REPORT(‘DICTIONARY_CHECK_11032008’)
——————————————————————
Basic Run Information
Run Name : DICTIONARY_CHECK_11032008
Run Id : 41
Check Name : Dictionary Integrity Check
Mode : MANUAL
Status : COMPLETED
Start Time : 2008-11-03 18:32:10.819805 -07:00
End Time : 2008-11-03 18:32:18.827387 -07:00
Error Encountered : 0
Source Incident Id : 0
Number of Incidents Created : 0
Input Paramters for the Run
TABLE_NAME=ALL_CORE_TABLES
CHECK_MASK=ALL
Run Findings And Recommendations
Finding
Finding Name : Dictionary Inconsistency
Finding ID : 42
Type : FAILURE
Status : OPEN
Priority : CRITICAL
Message : SQL dictionary health check: file$ pk 42 on object FILE$
failed
Message : Damaged rowid is AAAAARAABAAAAByAAF – description: No further
damage description available
Finding
Finding Name : Dictionary Inconsistency
Finding ID : 45
Type : FAILURE
Status : OPEN
Priority : CRITICAL
Message : SQL dictionary health check: dependency$.dobj# fk 126 on
object DEPENDENCY$ failed
Message : Damaged rowid is AAAABnAABAAAOiHABI – description: No further
damage description available
Finding
Finding Name : Dictionary Inconsistency
Finding ID : 48
Type : FAILURE
Status : OPEN
Priority : CRITICAL
Message : SQL dictionary health check: dependency$.dobj# fk 126 on
object DEPENDENCY$ failed
Message : Damaged rowid is AAAABnAABAAAQXqAA6 – description: No further
damage description available
After running either the DBMS_HM package or the ADRCI CREATE REPORT command,
we can reissue the ADRCI show hm_run command and now the REPORT_FILE
information is populated.
REPORT_FILE /opt/app/oracle/diag/rdbms/db11fs/db11FS/hm/HMREPORT_DICTIONARY_CHECK_11032008.hm
When using the ADRCI utility, you must first run a command to generate the
report file if it does not exist, and then run another command to display its
contents. You can also view the report through ADRCI by issuing the following
command. Because the output is in XML it will not be shown here.
show report hm_run DICTIONARY_CHECK_11032008
Regardless of the method used,
these checks are going to be a vital piece of information to investigate
corruption in an Oracle database. While I have not performed any elaborate
testing on timing, locking, or performance issues with these I will caution
their use. Anytime corruption checks are run, there has to be some form of
locking on the objects. It will only take time to watch these procedures run
and the actual effects they have on a database. But then again, if you have
problems, don’t you want to find them? Just another feather in the DBA hat to
play with.