March 18, 2010 Oracles new ADR with command interface shows promise for database administrators who like to script their own solution for quickly scraping the alert log and automatically looking for errors. Oracles alert log is a chronological log of important messages and errors that are generated using an Oracle database. Often times it is the first place DBAs look to see if there are errors being generated or checking to see if the Oracle database is healthy, or at least not spitting out errors and complaining about something. If you have yet to pry open the hood of Oracle and have never seen an alert log, below is a snippet of some of the errors (ORA-) and messages that Oracle keeps in the alert log. Notice that Ive included two different snippets. One is what most DBAs are familiar with, a pure text file, and one that is new to many, an XML alert log. Both are generated but the prior is, as suggested by Oracle, not being maintained and not receiving the same messaging as the XML alert log. From the text alert log: Wed Dec 09 14:55:16 2009 Errors in file /opt/app/oracle/diag/rdbms/db11/db11/trace/db11_ora_14798.trc: ORA-00313: open failed for members of log group 3 of thread 1 ORA-00312: online log 3 thread 1: '/oradata/db11/redo03.log' ORA-27037: unable to obtain file status Linux Error: 2: No such file or directory Additional information: 3 From the XML alert log: <msg time='2009-12-09T14:55:16.441-07:00' org_id='oracle' comp_id='rdbms' client_id='' type='UNKNOWN' level='16' module='sqlplus@ludwig (TNS V1-V3)' pid='14798'> <txt>Errors in file /opt/app/oracle/diag/rdbms/db11/db11/trace/db11_ora_14798.trc: ORA-00313: open failed for members of log group 3 of thread 1 ORA-00312: online log 3 thread 1: '/oradata/db11/redo03.log' ORA-27037: unable to obtain file status Linux Error: 2: No such file or directory Additional information: 3 </txt> </msg> Clearly, they are very similar except for the XML tags. These XML tags cause some interesting problems for those of us that like to scrape the alert log with system utilities. It isnt as easy as pointing the scripts at a new file out on disk. Thankfully Oracle has provided a utility (ADRCI) that can be used to extract diagnostic information generated from an Oracle database, including the alert log. When used interactively by issuing the adrci command from a system prompt, which is often the way it is used, a user can easily extract alert log detail with very simple commands such as the following: SHOW ALERT TAIL; To display the last 10 lines of the alert log. SHOW ALERT -TAIL 50; To display the last 50 lines of the alert log. SHOW ALERT -P "MESSAGE_TEXT LIKE '%ORA-%'"; search for alert lines that have ORA- in them. However, like many other DBA tasks we clearly hate logging into a database server and issuing commands. What is more expected is to have some set of commands that can be executed through a schedule to check and alert us, such as by email. Additionally if we schedule something to check on a regular interval, say 10 minutes, it becomes a better and more reliable monitoring methodology then having a DBA waste time logging into 10s or 100s of database servers every 10 minutes. Therefore, and to not belabor the point, here are the scripts that Ive started to develop. I hope that you can use them: This solution makes use of a very simple directory structure under the oracle users home directory. Here are the directories/files used:
Below is the script itself. To use, just execute by typing in at the command prompt: [oracle@ludwig alert]$ ./CHKALERT.sh The file you will be interested in looking at, if youve setup the /home/oracle/alert/CHKALERT file properly to look for specific ORA errors, will be the /home/oracle/alert/lst/CHKALERT.log file. This file contains each of the ORA errors found for the last run and looks like the following: CHKALERT.log :::::::::::::: 001:2010:03:04:02:32:29:ORA-00312: online log 1 thread 1: '/oradata/db11/redo01.log' 001:2010:03:04:02:32:29:ORA-00312: online log 2 thread 1: '/oradata/db11/redo02.log' 001:2010:03:04:02:32:29:ORA-00312: online log 3 thread 1: '/oradata/db11/redo03.log' To fully automate the process, not covered in this article, would be to interrogate this file and if it contains something, it should be sent to the appropriate DBAs to figure out what to do with the errors found. Have fun with the script. Change as you see fit and begin to take advantage of Oracle new ADRCI utility for scraping the alert logs.
#!/bin/sh
#--File: CHKALERT.sh
#--setup
PGM="CHKALERT"
ALRT=$HOME/alert
LST=${ALRT}/lst
LOG=${LST}/${PGM}.log
CURR=${ALRT}/${PGM}.curr
#--Unix environment variables
ECHO=echo; export ECHO
CAT=/bin/cat; export CAT
RM=/bin/rm; export RM
TOUCH=/bin/touch; export TOUCH
GREP=/bin/grep; export GREP
AWK=/bin/awk; export AWK
WC=/usr/bin/wc; export WC
TAIL=/usr/bin/tail; export TAIL
HEAD=/usr/bin/head; export HEAD
SUM=/usr/bin/sum; export SUM
#--Oracle environment variables
ORACLE_SID=db11; export ORACLE_SID
ORACLE_HOME=`${GREP} ${ORACLE_SID}: /etc/oratab | ${AWK} -F: '{print $2}'`; export ORACLE_HOME
PATH=$ORACLE_HOME/bin:$PATH; export PATH
#--code
start=`date "+%Y:%m:%d:%H:%M:%S"`
${RM} ${LOG}
${TOUCH} ${LOG}
#--execute SQL to get some diagnostic variables
echo "set echo off" > ${LST}/${PGM}.sql
echo "set feedback off" >> ${LST}/${PGM}.sql
echo "set heading off" >> ${LST}/${PGM}.sql
echo "set linesize 40" >> ${LST}/${PGM}.sql
echo "set pagesize 55" >> ${LST}/${PGM}.sql
echo "set verify off" >> ${LST}/${PGM}.sql
echo "set linesize 300" >> ${LST}/${PGM}.sql
echo "SELECT 'homepath:'||replace(homepath.value,adrbase.value||'/','')" >> ${LST}/${PGM}.sql
echo " FROM v\$diag_info homepath, v\$diag_info adrbase" >> ${LST}/${PGM}.sql
echo " WHERE homepath.name = 'ADR Home'" >> ${LST}/${PGM}.sql
echo " AND adrbase.name = 'ADR Base';" >> ${LST}/${PGM}.sql
echo "SELECT 'day:'||to_char(sysdate ,'yyyy-mm-dd') FROM dual;" >> ${LST}/${PGM}.sql
echo "SELECT 'nextday:'||to_char(sysdate+1,'yyyy-mm-dd') FROM dual;" >> ${LST}/${PGM}.sql
echo "SELECT 'prevday:'||to_char(sysdate-1,'yyyy-mm-dd') FROM dual;" >> ${LST}/${PGM}.sql
echo "exit" >> ${LST}/${PGM}.sql
sqlplus -s '/as sysdba' @${LST}/${PGM}.sql > ${LST}/${PGM}.lst
#-- get diag information variables just queried from the database
homepath=`${GREP} homepath ${LST}/${PGM}.lst | ${AWK} -F":" '{print $2}'`
day=`${GREP} "^day" ${LST}/${PGM}.lst | ${AWK} -F":" '{print $2}'`
nextday=`${GREP} nextday ${LST}/${PGM}.lst | ${AWK} -F":" '{print $2}'`
prevday=`${GREP} prevday ${LST}/${PGM}.lst | ${AWK} -F":" '{print $2}'`
#-- get the timezone from the alert log (safest place to get)
#-- the proper timezone is needed to properly filter the alert log for date ranges you
#-- want to look at
echo "set echo off" > ${LST}/${PGM}.adrci
echo "set termout off" >> ${LST}/${PGM}.adrci
echo "set homepath ${homepath}" >> ${LST}/${PGM}.adrci
echo "spool ${LST}/${PGM}.tmp" >> ${LST}/${PGM}.adrci
echo "show alert -tail 1" >> ${LST}/${PGM}.adrci
echo "spool off" >> ${LST}/${PGM}.adrci
adrci script=${LST}/${PGM}.adrci 1>/dev/null 2>/dev/null
timezone=`${HEAD} -1 ${LST}/${PGM}.tmp | ${AWK} -F" " '{print $3}'`
#-- extract alert log errors for the current day (today) and previous day (yesterday)
#-- previous day alerts will be used if the current file has yesterday's day as last day;
#-- meaning that we have had a switch to a new day and might have errors still to
#-- process from the previous day
echo "set echo off" > ${LST}/${PGM}.adrci
echo "set termout off" >> ${LST}/${PGM}.adrci
echo "set homepath ${homepath}" >> ${LST}/${PGM}.adrci
echo "spool ${LST}/${PGM}.${day}" >> ${LST}/${PGM}.adrci
echo "show alert -P \"ORIGINATING_TIMESTAMP BETWEEN '${day} 00:00:00.000000 ${timezone}' AND
'${nextday} 00:00:00.000000 ${timezone}' AND MESSAGE_TEXT LIKE '%ORA-%'\" -term" >>
${LST}/${PGM}.adrci
echo "spool off" >> ${LST}/${PGM}.adrci
echo "spool ${LST}/${PGM}.${prevday}" >> ${LST}/${PGM}.adrci
echo "show alert -P \"ORIGINATING_TIMESTAMP BETWEEN '${prevday} 00:00:00.000000 ${timezone}' AND
'${day} 00:00:00.000000 ${timezone}' AND MESSAGE_TEXT LIKE '%ORA-%'\" -term" >>
${LST}/${PGM}.adrci
echo "spool off" >> ${LST}/${PGM}.adrci
adrci script=${LST}/${PGM}.adrci 1>/dev/null 2>/dev/null
#-- get current contents of the current file
#-- default to current day if no current file
if [ -r "${CURR}" ]
then
#-- if the current exists then get the information it contains
daychecksum=`${GREP} day ${CURR} | ${AWK} -F":" '{print $2}'`
daylastline=`${GREP} day ${CURR} | ${AWK} -F":" '{print $3}'`
daylastday=`${GREP} day ${CURR} | ${AWK} -F":" '{print $4}'`
else
#-- if the current does not exist then default to today
daychecksum=0
daylastline=3
daylastday=${day}
fi
#-- set the days to search through for alerts
#-- if last day in current file was yesterday then include previous day
#-- if last day in current file is not yesterday then just scan today's alerts
if [ "${daylastday}" = "${prevday}" ]
then
alertdays="${prevday} ${day}"
else
alertdays="${day}"
fi
#-- for each of the days to scan for alerts
for theday in ${alertdays}
do
#-- check alert errors for the last day.
if [ -r "${LST}/${PGM}.${theday}" ]
then
#-- If the checksum generated is DIFFERENT we should start reporting from the top.
#--
#-- If the checksum generated is the SAME we should start reporting from end of
#-- the previously generated output.
new_daychecksum=`${HEAD} -4 ${LST}/${PGM}.${theday} | ${SUM} | ${AWK} '{print $1}'`
if [ ${new_daychecksum} -ne ${daychecksum} ]
then
daychecksum=${new_daychecksum}
daylastline=3
fi
#-- get the number of lines in the generated errors so we can report to the
#-- end of the file and we know where to start next time.
new_daylastline=`${WC} -l ${LST}/${PGM}.${theday} | ${AWK} -F" " '{print $1}'`
#-- if the number of lines in the output is 3 then there are no errors found.
if [ ${new_daylastline} -ne 3 ]
then
#-- if number of lines in extracted alerts is the same as last time then no new alerts
if [ ${new_daylastline} -ne ${daylastline} ]
then
#-- find the line to begin reporting new alerts from
fromline=`expr ${new_daylastline} - ${daylastline}`
#-- produce alert lines for alerts defined in file CHKALERT
${TAIL} -${fromline} ${LST}/${PGM}.${theday} |
while read LINE
do
for ORAS in `${CAT} ${ALRT}/CHKALERT`
do
ora=`${ECHO} ${LINE} | ${GREP} ${ORAS}`
if [ $? -eq 0 ]
then
#-- you might want to do something here
#-- that is specific to certain ORA- errors
err="001"
echo "${err}:${start}:${LINE}" >> ${LOG}
fi
done
done
fi
fi
daylastline=${new_daylastline}
#-- update the current file only if the day being processed is current day
if [ "${theday}" = "${day}" ]
then
${ECHO} "day:"${daychecksum}":"${daylastline}":"${day} > ${CURR}
fi
fi
done
As a DBA, you are responsible for the proper running of your Oracle database. You have a task list one mile long that must be performed every day. In the case of alert log monitoring, you must check this log file a few times a day, if not more, to ensure that users have not experienced any problems and that the database is running without any internal notifications. What makes this worse; most database shops have more than one instance of Oracle running. In order to check the alert log, you must log onto each database box and open the alert log file with an editor to search for errors. The joy of the approach I would suggest gives you access to the alert log without ever having to log onto the database box again. Using a simple package and procedures, which you should modify to suit your tastes, you can have a robust alert log monitoring device and sleep well at nights. Reference for the procedures, packages , and objects usedDirectory ObjectThis object allows you to access and use external Large Objects (LOBS) or Binary Files (BFILEs) in an Oracle Server. The DIRECTORY object is an alias for a physical directory under which there may be files you wish to access. Note: In Oracle documentation, there is a statement that states: Do Not Map DIRECTORY to Directories of Data Files, And So On. A DIRECTORY should not be mapped to physical directories that contain Oracle data files, control files, log files, and other system files. Tampering with these files (accidental or otherwise) could corrupt the database or the server operating system. I have since contacted Oracle support (TAR 266057.996) and have gotten the following response / blessing from them on this method of accessing the alert.log: You can access the alert.log
file through the creation of a directory object. Utl_raw packageThis is a package supplied by Oracle that allows you to manipulate raw data and should be available with a typical install. If it is not there, you can find it in the $ORACLE_HOME\rdbms\admin directory. If you have to install this package yourself just do so as user sys. DBMS_LOB packageUsed to access, search, and manipulate LOB/BFILE information. Bdump_dir_setupThis procedure is an initial setup for use of the package. It determines where your alert log is and creates a directory object to that log file. Build_possible_alert_log_nameWhen supplied with possible SID names from the database, this procedure builds a string of possible alert log file names. Typically, for Windows environments, the alert log name is SIDALRT.LOG and for Unix environments it is alert_SID.log. Check_alertlog_nameThis procedure just loops through the possible alert log file names and checks for its existence. Read_alertlogThe meat of the code. Sets up calls to the build_possible_alert_log_name procedure, calls to check_alertlog_name for the existence of a good alert log file, opens the alert log file and begins reading the alert log file for the lines encountered after first finding the current date in the log. It then prints out the lines to the terminal. To get you going, lets first go through the How to Use procedures to put the simple shell on a database box and see the results. After you are comfortable with the shell, I would suggest you modify it by removing the DBMS_OUTPUT.PUT_LINE with a call to code that will search the strings returned, insert them into a table, and have email or pager notification wrapped around those important error messages. After you get that done, just put the procedures on every database you want to monitor, schedule a task or dbms_job around it and then sleep peacefully. How to Use
SQL> @that_darn_alert_log
SQL> set serveroutput on
SQL> exec that_darn_alert_log.bdump_dir_setup
SQL> exec that_darn_alert_log.read_alertlog --put code here --put code here --put code here --put code here --put code here --put code here --put code here --put code here --put code here --put code here The Oracle alert log contains loads of information. Even so, Oracle has yet to put in place an easy method for DBAs to be notified when errors and important messages do occur. On the flip side, even the best DBA with the best intentions will on occasion forget to view this log from time to time. Do yourself a favor and put something in place to notify yourself so that potential database failure can be averted. I have given you one method to do just that. Please modify to your heart's content and let me know how it works. |