Oracle’s new ADR with command interface shows promise for database administrators who like to script their
own solution for quickly scraping the alert log and automatically looking for errors.
Oracle’s alert log is a chronological log of
important messages and errors that are generated using an Oracle database.
Often times it is the first place DBAs look to see if there are errors being
generated or checking to see if the Oracle database is healthy, or at least not
spitting out errors and complaining about something. If you have yet to pry
open the hood of Oracle and have never seen an alert log, below is a snippet of
some of the errors (ORA-) and messages that Oracle keeps in the alert log.
Notice that I’ve included two different snippets. One is what most DBAs are
familiar with, a pure text file, and one that is new to many, an XML alert log.
Both are generated but the prior is, as suggested by Oracle, not being
maintained and not receiving the same messaging as the XML alert log.
From the text alert log:
Wed Dec 09 14:55:16 2009 Errors in file /opt/app/oracle/diag/rdbms/db11/db11/trace/db11_ora_14798.trc: ORA-00313: open failed for members of log group 3 of thread 1 ORA-00312: online log 3 thread 1: '/oradata/db11/redo03.log' ORA-27037: unable to obtain file status Linux Error: 2: No such file or directory Additional information: 3
From the XML alert log:
<msg time='2009-12-09T14:55:16.441-07:00' org_id='oracle' comp_id='rdbms' client_id='' type='UNKNOWN' level='16' module='sqlplus@ludwig (TNS V1-V3)' pid='14798'> <txt>Errors in file /opt/app/oracle/diag/rdbms/db11/db11/trace/db11_ora_14798.trc: ORA-00313: open failed for members of log group 3 of thread 1 ORA-00312: online log 3 thread 1: '/oradata/db11/redo03.log' ORA-27037: unable to obtain file status Linux Error: 2: No such file or directory Additional information: 3 </txt> </msg>
Clearly, they are very similar except for the
XML tags. These XML tags cause some interesting problems for those of us that
like to scrape the alert log with system utilities. It isn’t as easy as
pointing the scripts at a new file out on disk. Thankfully Oracle has provided
a utility (ADRCI) that can be used to extract diagnostic information generated
from an Oracle database, including the alert log.
When used interactively by issuing the “adrci”
command from a system prompt, which is often the way it is used, a user can
easily extract alert log detail with very simple commands such as the
following:
SHOW ALERT –TAIL; To display the last 10 lines of the alert log. SHOW ALERT -TAIL 50; To display the last 50 lines of the alert log. SHOW ALERT -P "MESSAGE_TEXT LIKE '%ORA-%'"; search for alert lines that have ORA- in them.
However, like many other DBA tasks we clearly
hate logging into a database server and issuing commands. What is more expected
is to have some set of commands that can be executed through a schedule to
check and alert us, such as by email. Additionally if we schedule something to
check on a regular interval, say 10 minutes, it becomes a better and more
reliable monitoring methodology then having a DBA waste time logging into 10s
or 100s of database servers every 10 minutes. Therefore, and to not belabor the
point, here are the scripts that I’ve started to develop. I hope that you can
use them:
This solution makes use of a very simple
directory structure under the oracle user’s home directory. Here are
the directories/files used:
/home/oracle/alert |
This |
/home/oracle/alert/CHKALERT |
This file contains ORA- errors, or any other |
/home/oracle/alert/CHKALERT.sh |
This is the script that will need to be executed |
/home/oracle/alert/CHKALERT.curr |
File that contains information on the last time |
/home/oracle/alert/lst |
The directory where output is generated |
/home/oracle/alert/lst/CHKALERT.adrci |
Is dynamically generated during runtime and will |
/home/oracle/alert/lst/CHKALERT.lst |
Is dynamically generated during runtime and will |
/home/oracle/alert/lst/CHKALERT.sql |
Is the SQL*Plus SQL used to generate the |
/home/oracle/alert/lst/CHKALERT.tmp |
Contains header output from the alert log to get a |
/home/oracle/alert/lst/CHKALERT.yyyy-mm-dd |
will |
Below is the script itself. To use, just execute
by typing in at the command prompt:
[oracle@ludwig alert]$ ./CHKALERT.sh
The file you will be interested in looking at,
if you’ve setup the /home/oracle/alert/CHKALERT file properly to look for
specific ORA errors, will be the /home/oracle/alert/lst/CHKALERT.log file. This
file contains each of the ORA errors found for the last run and looks like the
following:
CHKALERT.log :::::::::::::: 001:2010:03:04:02:32:29:ORA-00312: online log 1 thread 1: '/oradata/db11/redo01.log' 001:2010:03:04:02:32:29:ORA-00312: online log 2 thread 1: '/oradata/db11/redo02.log' 001:2010:03:04:02:32:29:ORA-00312: online log 3 thread 1: '/oradata/db11/redo03.log'
To fully automate the process, not covered in
this article, would be to interrogate this file and if it contains something,
it should be sent to the appropriate DBAs to figure out what to do with the
errors found. Have fun with the script. Change as you see fit and begin to take
advantage of Oracle new ADRCI utility for scraping the alert logs.
#!/bin/sh #--File: CHKALERT.sh #--setup PGM="CHKALERT" ALRT=$HOME/alert LST=${ALRT}/lst LOG=${LST}/${PGM}.log CURR=${ALRT}/${PGM}.curr #--Unix environment variables ECHO=echo; export ECHO CAT=/bin/cat; export CAT RM=/bin/rm; export RM TOUCH=/bin/touch; export TOUCH GREP=/bin/grep; export GREP AWK=/bin/awk; export AWK WC=/usr/bin/wc; export WC TAIL=/usr/bin/tail; export TAIL HEAD=/usr/bin/head; export HEAD SUM=/usr/bin/sum; export SUM #--Oracle environment variables ORACLE_SID=db11; export ORACLE_SID ORACLE_HOME=`${GREP} ${ORACLE_SID}: /etc/oratab | ${AWK} -F: '{print $2}'`; export ORACLE_HOME PATH=$ORACLE_HOME/bin:$PATH; export PATH #--code start=`date "+%Y:%m:%d:%H:%M:%S"` ${RM} ${LOG} ${TOUCH} ${LOG} #--execute SQL to get some diagnostic variables echo "set echo off" > ${LST}/${PGM}.sql echo "set feedback off" >> ${LST}/${PGM}.sql echo "set heading off" >> ${LST}/${PGM}.sql echo "set linesize 40" >> ${LST}/${PGM}.sql echo "set pagesize 55" >> ${LST}/${PGM}.sql echo "set verify off" >> ${LST}/${PGM}.sql echo "set linesize 300" >> ${LST}/${PGM}.sql echo "SELECT 'homepath:'||replace(homepath.value,adrbase.value||'/','')" >> ${LST}/${PGM}.sql echo " FROM v$diag_info homepath, v$diag_info adrbase" >> ${LST}/${PGM}.sql echo " WHERE homepath.name = 'ADR Home'" >> ${LST}/${PGM}.sql echo " AND adrbase.name = 'ADR Base';" >> ${LST}/${PGM}.sql echo "SELECT 'day:'||to_char(sysdate ,'yyyy-mm-dd') FROM dual;" >> ${LST}/${PGM}.sql echo "SELECT 'nextday:'||to_char(sysdate+1,'yyyy-mm-dd') FROM dual;" >> ${LST}/${PGM}.sql echo "SELECT 'prevday:'||to_char(sysdate-1,'yyyy-mm-dd') FROM dual;" >> ${LST}/${PGM}.sql echo "exit" >> ${LST}/${PGM}.sql sqlplus -s '/as sysdba' @${LST}/${PGM}.sql > ${LST}/${PGM}.lst #-- get diag information variables just queried from the database homepath=`${GREP} homepath ${LST}/${PGM}.lst | ${AWK} -F":" '{print $2}'` day=`${GREP} "^day" ${LST}/${PGM}.lst | ${AWK} -F":" '{print $2}'` nextday=`${GREP} nextday ${LST}/${PGM}.lst | ${AWK} -F":" '{print $2}'` prevday=`${GREP} prevday ${LST}/${PGM}.lst | ${AWK} -F":" '{print $2}'` #-- get the timezone from the alert log (safest place to get) #-- the proper timezone is needed to properly filter the alert log for date ranges you #-- want to look at echo "set echo off" > ${LST}/${PGM}.adrci echo "set termout off" >> ${LST}/${PGM}.adrci echo "set homepath ${homepath}" >> ${LST}/${PGM}.adrci echo "spool ${LST}/${PGM}.tmp" >> ${LST}/${PGM}.adrci echo "show alert -tail 1" >> ${LST}/${PGM}.adrci echo "spool off" >> ${LST}/${PGM}.adrci adrci script=${LST}/${PGM}.adrci 1>/dev/null 2>/dev/null timezone=`${HEAD} -1 ${LST}/${PGM}.tmp | ${AWK} -F" " '{print $3}'` #-- extract alert log errors for the current day (today) and previous day (yesterday) #-- previous day alerts will be used if the current file has yesterday's day as last day; #-- meaning that we have had a switch to a new day and might have errors still to #-- process from the previous day echo "set echo off" > ${LST}/${PGM}.adrci echo "set termout off" >> ${LST}/${PGM}.adrci echo "set homepath ${homepath}" >> ${LST}/${PGM}.adrci echo "spool ${LST}/${PGM}.${day}" >> ${LST}/${PGM}.adrci echo "show alert -P "ORIGINATING_TIMESTAMP BETWEEN '${day} 00:00:00.000000 ${timezone}' AND '${nextday} 00:00:00.000000 ${timezone}' AND MESSAGE_TEXT LIKE '%ORA-%'" -term" >> ${LST}/${PGM}.adrci echo "spool off" >> ${LST}/${PGM}.adrci echo "spool ${LST}/${PGM}.${prevday}" >> ${LST}/${PGM}.adrci echo "show alert -P "ORIGINATING_TIMESTAMP BETWEEN '${prevday} 00:00:00.000000 ${timezone}' AND '${day} 00:00:00.000000 ${timezone}' AND MESSAGE_TEXT LIKE '%ORA-%'" -term" >> ${LST}/${PGM}.adrci echo "spool off" >> ${LST}/${PGM}.adrci adrci script=${LST}/${PGM}.adrci 1>/dev/null 2>/dev/null #-- get current contents of the current file #-- default to current day if no current file if [ -r "${CURR}" ] then #-- if the current exists then get the information it contains daychecksum=`${GREP} day ${CURR} | ${AWK} -F":" '{print $2}'` daylastline=`${GREP} day ${CURR} | ${AWK} -F":" '{print $3}'` daylastday=`${GREP} day ${CURR} | ${AWK} -F":" '{print $4}'` else #-- if the current does not exist then default to today daychecksum=0 daylastline=3 daylastday=${day} fi #-- set the days to search through for alerts #-- if last day in current file was yesterday then include previous day #-- if last day in current file is not yesterday then just scan today's alerts if [ "${daylastday}" = "${prevday}" ] then alertdays="${prevday} ${day}" else alertdays="${day}" fi #-- for each of the days to scan for alerts for theday in ${alertdays} do #-- check alert errors for the last day. if [ -r "${LST}/${PGM}.${theday}" ] then #-- If the checksum generated is DIFFERENT we should start reporting from the top. #-- #-- If the checksum generated is the SAME we should start reporting from end of #-- the previously generated output. new_daychecksum=`${HEAD} -4 ${LST}/${PGM}.${theday} | ${SUM} | ${AWK} '{print $1}'` if [ ${new_daychecksum} -ne ${daychecksum} ] then daychecksum=${new_daychecksum} daylastline=3 fi #-- get the number of lines in the generated errors so we can report to the #-- end of the file and we know where to start next time. new_daylastline=`${WC} -l ${LST}/${PGM}.${theday} | ${AWK} -F" " '{print $1}'` #-- if the number of lines in the output is 3 then there are no errors found. if [ ${new_daylastline} -ne 3 ] then #-- if number of lines in extracted alerts is the same as last time then no new alerts if [ ${new_daylastline} -ne ${daylastline} ] then #-- find the line to begin reporting new alerts from fromline=`expr ${new_daylastline} - ${daylastline}` #-- produce alert lines for alerts defined in file CHKALERT ${TAIL} -${fromline} ${LST}/${PGM}.${theday} | while read LINE do for ORAS in `${CAT} ${ALRT}/CHKALERT` do ora=`${ECHO} ${LINE} | ${GREP} ${ORAS}` if [ $? -eq 0 ] then #-- you might want to do something here #-- that is specific to certain ORA- errors err="001" echo "${err}:${start}:${LINE}" >> ${LOG} fi done done fi fi daylastline=${new_daylastline} #-- update the current file only if the day being processed is current day if [ "${theday}" = "${day}" ] then ${ECHO} "day:"${daychecksum}":"${daylastline}":"${day} > ${CURR} fi fi done