Oracle Health Checks with Health Monitor; Using PL/SQL or ADRNovember 6, 2008 Keep on top of corruption in your Oracle database with health checks. The old saying, there are 100 ways to.... And in keeping with that saying, Oracle, in Release 11g, has introduced database health checks under a framework called health monitor. This health monitor framework is designed to run diagnostic checks on your Oracle database. It will look at different components of the databasechecking for corruption, both physical and logical. While not 100 ways to..., Oracle has, as we have looked at in the last few articles on ADR, also tried to implement and interface into the health checks with ADR. Nevertheless, as we will see, it might not be the best way to do this quite yet. So, do you have datafile, block, undo, redo, dictionary, or xyz corruption in your database? You might actually be running just fine and not even know it. As we will see, in the following example(s), I have a fresh install of Oracle and it has corruption in it. Id really like to track that down; but that is for another day. When running health checks, you, as a DBA, can decide to run them adhoc and in a totally reactive mode where someone might complain about an error message or you might see an Oracle alert that begs the question for one of these health checks to be run. You can also run health checks using the DBMS_HM PL/SQL package. Depending on how these run on your system, how long they take, and how often you see corruption, you might want to schedule these as a job. However, please take note that this is a new feature and with all new features, you should be 100% confident they will not cause corruption or lock your system. Just test and monitor those runs appropriately. Also note that these health checks can be run with the database online (OPEN or MOUNT) and available, or when the database is offline (NOMOUNT). As there are multiple types of health checks, there are also various health checks that can only be run with the database offline. For instance, the redo integrity check can only be run offline, which makes complete sense when you think of it. Without going into all of the different corruption/integrity checks, you can use the following SQL to look at them. It gives you a good idea as to which checks can be run while the database is online or offline. SQL> SELECT name,internal_check,offline_capable,description FROM v$hm_check; NAME I O DESCRIPTION ------------------------------ - - ---------------------------------------------- HM Test Check Y Y Check for HM Functionality DB Structure Integrity Check N Y Checks integrity of all database files Data Block Integrity Check N Y Checks integrity of a datafile block Redo Integrity Check N Y Checks integrity of redo log content Logical Block Check Y N Checks logical content of a block Transaction Integrity Check N N Checks a transaction for corruptions Undo Segment Integrity Check N N Checks integrity of an undo segment All Control Files Check Y Y Checks all control files in the database CF Member Check Y Y Checks a multiplexed copy of the control file All Datafiles Check Y Y Check for all datafiles in the database Single Datafile Check Y Y Checks a datafile Log Group Check Y Y Checks all members of a log group Log Group Member Check Y Y Checks a particular member of a log group Archived Log Check Y Y Checks an archived log Redo Revalidation Check Y Y Checks redo log content IO Revalidation Check Y Y Checks file accessability Block IO Revalidation Check Y Y Checks file accessability Txn Revalidation Check Y N Revalidate corrupted txn Failure Simulation Check Y Y Creates dummy failures Dictionary Integrity Check N N Checks dictionary integrity Because some of the health checks require additional input parameters, for the sake of simplicity, I have decided to show just one where there are no input parameters required. If you would like to see the checks that require parameters, you can run the following SQL I found in the Oracle documentation:
SELECT c.name check_name, p.name parameter_name,
p.type, p.default_value, p.description
FROM v$hm_check_param p, v$hm_check c
WHERE p.check_id = c.id and c.internal_check = 'N'
ORDER BY c.name;
I dont know how many times Ive encountered dictionary problems--everything
from erroneous information in the dictionary tables to locking problems--so I
am semi-excited about the Dictionary Integrity Check. The dictionary integrity
check will take a look at the integrity of core dictionary objects, including: To run the dictionary integrity check manually, we need to issue the following DBMS_HM.RUN_CHECK PL/SQL packagegiving the additional name for the run. In this example, Ill name my run DICTIONARY_CHECK_11032008.
BEGIN
DBMS_HM.RUN_CHECK('Dictionary Integrity Check',
'DICTIONARY_CHECK_11032008');
END;
/
Now this is where it gets a bit interesting. The DBMS_HM.RUN_CHECK stores its information in ADR so you can either look at a report using the DBMS_HM PL/SQL package or the ADRCI utility. To me, this isnt much of a choice as the DBMS_HM PL/SQL package will output the report in HTML, XML, or text format. The ADRCI utility only outputs the report in XML. However, we will attack this from both so that we can see where the deficiencies in ADR exist and where ADR and DBMS_HM cross. Since the results of the DBMS_HM.RUN_CHECK stores its information in ADR, we can easily see the HM_RUN by issuing the following ADRCI command. adrci> show hm_run ADR Home = /opt/app/oracle/diag/rdbms/db11fs/db11FS: ***************************************************** ********************************************************** HM RUN RECORD 3 ********************************************************** RUN_ID 41 RUN_NAME DICTIONARY_CHECK_11032008 CHECK_NAME Dictionary Integrity Check NAME_ID 24 MODE 0 START_TIME 2008-11-03 18:32:10.819805 -07:00 RESUME_TIME <NULL> END_TIME 2008-11-03 18:32:18.827387 -07:00 MODIFIED_TIME 2008-11-03 18:32:18.827387 -07:00 TIMEOUT 0 FLAGS 0 STATUS 5 SRC_INCIDENT_ID 0 NUM_INCIDENTS 0 ERR_NUMBER 0 REPORT_FILE <NULL> The most notable piece of information, or lack of information, is the REPORT_FILE that is <NULL>. This column is only populated when a report is created. The creation of a report can be done through the DBMS_HM package or through the ADRCI CREATE REPORT command. For the DBMS_HM package use:
SET LONG 100000
SET LONGCHUNKSIZE 1000
SET PAGESIZE 1000
SET LINESIZE 512
SELECT DBMS_HM.GET_RUN_REPORT('DICTIONARY_CHECK_11032008') FROM DUAL;
For the ADRCI CREATE REPORT command use: CREATE REPORT hm_run DICTIONARY_CHECK_11032008 When using the DBMS_HM package the output looks like:
DBMS_HM.GET_RUN_REPORT('DICTIONARY_CHECK_11032008')
------------------------------------------------------------------
Basic Run Information
Run Name : DICTIONARY_CHECK_11032008
Run Id : 41
Check Name : Dictionary Integrity Check
Mode : MANUAL
Status : COMPLETED
Start Time : 2008-11-03 18:32:10.819805 -07:00
End Time : 2008-11-03 18:32:18.827387 -07:00
Error Encountered : 0
Source Incident Id : 0
Number of Incidents Created : 0
Input Paramters for the Run
TABLE_NAME=ALL_CORE_TABLES
CHECK_MASK=ALL
Run Findings And Recommendations
Finding
Finding Name : Dictionary Inconsistency
Finding ID : 42
Type : FAILURE
Status : OPEN
Priority : CRITICAL
Message : SQL dictionary health check: file$ pk 42 on object FILE$
failed
Message : Damaged rowid is AAAAARAABAAAAByAAF - description: No further
damage description available
Finding
Finding Name : Dictionary Inconsistency
Finding ID : 45
Type : FAILURE
Status : OPEN
Priority : CRITICAL
Message : SQL dictionary health check: dependency$.dobj# fk 126 on
object DEPENDENCY$ failed
Message : Damaged rowid is AAAABnAABAAAOiHABI - description: No further
damage description available
Finding
Finding Name : Dictionary Inconsistency
Finding ID : 48
Type : FAILURE
Status : OPEN
Priority : CRITICAL
Message : SQL dictionary health check: dependency$.dobj# fk 126 on
object DEPENDENCY$ failed
Message : Damaged rowid is AAAABnAABAAAQXqAA6 - description: No further
damage description available
After running either the DBMS_HM package or the ADRCI CREATE REPORT command, we can reissue the ADRCI show hm_run command and now the REPORT_FILE information is populated. REPORT_FILE /opt/app/oracle/diag/rdbms/db11fs/db11FS/hm/HMREPORT_DICTIONARY_CHECK_11032008.hm When using the ADRCI utility, you must first run a command to generate the report file if it does not exist, and then run another command to display its contents. You can also view the report through ADRCI by issuing the following command. Because the output is in XML it will not be shown here. show report hm_run DICTIONARY_CHECK_11032008 Regardless of the method used, these checks are going to be a vital piece of information to investigate corruption in an Oracle database. While I have not performed any elaborate testing on timing, locking, or performance issues with these I will caution their use. Anytime corruption checks are run, there has to be some form of locking on the objects. It will only take time to watch these procedures run and the actual effects they have on a database. But then again, if you have problems, dont you want to find them? Just another feather in the DBA hat to play with. |