Alert Log Scraping with Oracle’s ADRCI Utility


Oracle’s new ADR with command interface shows promise for database administrators who like to script their
own solution for quickly scraping the alert log and automatically looking for errors.

Oracle’s alert log is a chronological log of
important messages and errors that are generated using an Oracle database.
Often times it is the first place DBAs look to see if there are errors being
generated or checking to see if the Oracle database is healthy, or at least not
spitting out errors and complaining about something. If you have yet to pry
open the hood of Oracle and have never seen an alert log, below is a snippet of
some of the errors (ORA-) and messages that Oracle keeps in the alert log.
Notice that I’ve included two different snippets. One is what most DBAs are
familiar with, a pure text file, and one that is new to many, an XML alert log.
Both are generated but the prior is, as suggested by Oracle, not being
maintained and not receiving the same messaging as the XML alert log.

From the text alert log:

Wed Dec 09 14:55:16 2009
Errors in file /opt/app/oracle/diag/rdbms/db11/db11/trace/db11_ora_14798.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/oradata/db11/redo03.log'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3

From the XML alert log:

<msg time='2009-12-09T14:55:16.441-07:00' org_id='oracle' comp_id='rdbms'
 client_id='' type='UNKNOWN' level='16'
 module='[email protected] (TNS V1-V3)' pid='14798'>
 <txt>Errors in file /opt/app/oracle/diag/rdbms/db11/db11/trace/db11_ora_14798.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/oradata/db11/redo03.log'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
 </txt>
</msg>

Clearly, they are very similar except for the
XML tags. These XML tags cause some interesting problems for those of us that
like to scrape the alert log with system utilities. It isn’t as easy as
pointing the scripts at a new file out on disk. Thankfully Oracle has provided
a utility (ADRCI) that can be used to extract diagnostic information generated
from an Oracle database, including the alert log.

When used interactively by issuing the “adrci”
command from a system prompt, which is often the way it is used, a user can
easily extract alert log detail with very simple commands such as the
following:

SHOW ALERT –TAIL; To display the last 10 lines of the alert log.
SHOW ALERT -TAIL 50; To display the last 50 lines of the alert log.
SHOW ALERT -P "MESSAGE_TEXT LIKE '%ORA-%'"; search for alert lines that have ORA- in them.

However, like many other DBA tasks we clearly
hate logging into a database server and issuing commands. What is more expected
is to have some set of commands that can be executed through a schedule to
check and alert us, such as by email. Additionally if we schedule something to
check on a regular interval, say 10 minutes, it becomes a better and more
reliable monitoring methodology then having a DBA waste time logging into 10s
or 100s of database servers every 10 minutes. Therefore, and to not belabor the
point, here are the scripts that I’ve started to develop. I hope that you can
use them:

This solution makes use of a very simple
directory structure under the oracle user’s home directory. Here are
the directories/files used:

/home/oracle/alert

This
is where the main script and supporting files exist

/home/oracle/alert/CHKALERT

This file contains ORA- errors, or any other
errors we are interested in checking for in the alert log

/home/oracle/alert/CHKALERT.sh

This is the script that will need to be executed

/home/oracle/alert/CHKALERT.curr

File that contains information on the last time
the alert log was scanned; containing information about the errors found and
put into the CHKALERT.yyyy-mm-dd files

/home/oracle/alert/lst

The directory where output is generated

/home/oracle/alert/lst/CHKALERT.adrci

Is dynamically generated during runtime and will
be used as a script to the ADRCI utility

/home/oracle/alert/lst/CHKALERT.lst

Is dynamically generated during runtime and will
be used as output to a SQL*Plus call to get some variables to be used when
calling ADRCI

/home/oracle/alert/lst/CHKALERT.sql

Is the SQL*Plus SQL used to generate the
CHKALERT.lst output

/home/oracle/alert/lst/CHKALERT.tmp

Contains header output from the alert log to get a
timezone

/home/oracle/alert/lst/CHKALERT.yyyy-mm-dd

will
be files generated for each day that contains ORA- errors for that day

Below is the script itself. To use, just execute
by typing in at the command prompt:

[[email protected] alert]$ ./CHKALERT.sh

The file you will be interested in looking at,
if you’ve setup the /home/oracle/alert/CHKALERT file properly to look for
specific ORA errors, will be the /home/oracle/alert/lst/CHKALERT.log file. This
file contains each of the ORA errors found for the last run and looks like the
following:

CHKALERT.log
::::::::::::::
001:2010:03:04:02:32:29:ORA-00312: online log 1 thread 1: '/oradata/db11/redo01.log'
001:2010:03:04:02:32:29:ORA-00312: online log 2 thread 1: '/oradata/db11/redo02.log'
001:2010:03:04:02:32:29:ORA-00312: online log 3 thread 1: '/oradata/db11/redo03.log'

To fully automate the process, not covered in
this article, would be to interrogate this file and if it contains something,
it should be sent to the appropriate DBAs to figure out what to do with the
errors found. Have fun with the script. Change as you see fit and begin to take
advantage of Oracle new ADRCI utility for scraping the alert logs.

#!/bin/sh
#--File: CHKALERT.sh

#--setup
PGM="CHKALERT"
ALRT=$HOME/alert
LST=${ALRT}/lst
LOG=${LST}/${PGM}.log
CURR=${ALRT}/${PGM}.curr

#--Unix environment variables
ECHO=echo;          export ECHO
CAT=/bin/cat;       export CAT
RM=/bin/rm;         export RM
TOUCH=/bin/touch;   export TOUCH
GREP=/bin/grep;     export GREP
AWK=/bin/awk;       export AWK
WC=/usr/bin/wc;     export WC
TAIL=/usr/bin/tail; export TAIL
HEAD=/usr/bin/head; export HEAD
SUM=/usr/bin/sum;   export SUM

#--Oracle environment variables
ORACLE_SID=db11;             export ORACLE_SID
ORACLE_HOME=`${GREP} ${ORACLE_SID}: /etc/oratab | ${AWK} -F: '{print $2}'`; export ORACLE_HOME
PATH=$ORACLE_HOME/bin:$PATH; export PATH

#--code
start=`date "+%Y:%m:%d:%H:%M:%S"`
${RM} ${LOG}
${TOUCH} ${LOG}

#--execute SQL to get some diagnostic variables
echo "set echo     off"                                                    > ${LST}/${PGM}.sql
echo "set feedback off"                                                   >> ${LST}/${PGM}.sql
echo "set heading  off"                                                   >> ${LST}/${PGM}.sql
echo "set linesize  40"                                                   >> ${LST}/${PGM}.sql
echo "set pagesize  55"                                                   >> ${LST}/${PGM}.sql
echo "set verify   off"                                                   >> ${LST}/${PGM}.sql
echo "set linesize 300"                                                   >> ${LST}/${PGM}.sql
echo "SELECT 'homepath:'||replace(homepath.value,adrbase.value||'/','')"  >> ${LST}/${PGM}.sql
echo "  FROM v$diag_info homepath, v$diag_info adrbase"                 >> ${LST}/${PGM}.sql
echo " WHERE homepath.name = 'ADR Home'"                                  >> ${LST}/${PGM}.sql
echo "   AND adrbase.name  = 'ADR Base';"                                 >> ${LST}/${PGM}.sql
echo "SELECT     'day:'||to_char(sysdate  ,'yyyy-mm-dd') FROM dual;"      >> ${LST}/${PGM}.sql
echo "SELECT 'nextday:'||to_char(sysdate+1,'yyyy-mm-dd') FROM dual;"      >> ${LST}/${PGM}.sql
echo "SELECT 'prevday:'||to_char(sysdate-1,'yyyy-mm-dd') FROM dual;"      >> ${LST}/${PGM}.sql
echo "exit"                                                               >> ${LST}/${PGM}.sql
sqlplus -s '/as sysdba' @${LST}/${PGM}.sql                         > ${LST}/${PGM}.lst

#-- get diag information variables just queried from the database
homepath=`${GREP} homepath               ${LST}/${PGM}.lst | ${AWK} -F":" '{print $2}'`
     day=`${GREP} "^day"                 ${LST}/${PGM}.lst | ${AWK} -F":" '{print $2}'`
 nextday=`${GREP} nextday                ${LST}/${PGM}.lst | ${AWK} -F":" '{print $2}'`
 prevday=`${GREP} prevday                ${LST}/${PGM}.lst | ${AWK} -F":" '{print $2}'`

#-- get the timezone from the alert log (safest place to get)
#-- the proper timezone is needed to properly filter the alert log for date ranges you
#--   want to look at
echo "set echo off"                             > ${LST}/${PGM}.adrci
echo "set termout off"                         >> ${LST}/${PGM}.adrci
echo "set homepath ${homepath}"                >> ${LST}/${PGM}.adrci
echo "spool ${LST}/${PGM}.tmp"                 >> ${LST}/${PGM}.adrci
echo "show alert -tail 1"                      >> ${LST}/${PGM}.adrci
echo "spool off"                               >> ${LST}/${PGM}.adrci
adrci script=${LST}/${PGM}.adrci       1>/dev/null 2>/dev/null
timezone=`${HEAD} -1 ${LST}/${PGM}.tmp | ${AWK} -F" " '{print $3}'`

#-- extract alert log errors for the current day (today) and previous day (yesterday)
#-- previous day alerts will be used if the current file has yesterday's day as last day;
#--   meaning that we have had a switch to a new day and might have errors still to
#--   process from the previous day
echo "set echo off"                             > ${LST}/${PGM}.adrci
echo "set termout off"                         >> ${LST}/${PGM}.adrci
echo "set homepath ${homepath}"                >> ${LST}/${PGM}.adrci
echo "spool ${LST}/${PGM}.${day}"              >> ${LST}/${PGM}.adrci
echo "show alert -P "ORIGINATING_TIMESTAMP BETWEEN '${day} 00:00:00.000000 ${timezone}' AND 
'${nextday} 00:00:00.000000 ${timezone}' AND MESSAGE_TEXT LIKE '%ORA-%'" -term" >> 
${LST}/${PGM}.adrci
echo "spool off"                               >> ${LST}/${PGM}.adrci
echo "spool ${LST}/${PGM}.${prevday}"  >> ${LST}/${PGM}.adrci
echo "show alert -P "ORIGINATING_TIMESTAMP BETWEEN '${prevday} 00:00:00.000000 ${timezone}' AND 
'${day} 00:00:00.000000 ${timezone}' AND MESSAGE_TEXT LIKE '%ORA-%'" -term" >> 
${LST}/${PGM}.adrci
echo "spool off"                               >> ${LST}/${PGM}.adrci
adrci script=${LST}/${PGM}.adrci       1>/dev/null 2>/dev/null

#-- get current contents of the current file
#-- default to current day if no current file
if [ -r "${CURR}" ]
then
  #-- if the current exists then get the information it contains
  daychecksum=`${GREP} day ${CURR} | ${AWK} -F":" '{print $2}'`
  daylastline=`${GREP} day ${CURR} | ${AWK} -F":" '{print $3}'`
   daylastday=`${GREP} day ${CURR} | ${AWK} -F":" '{print $4}'`
else
  #-- if the current does not exist then default to today
  daychecksum=0
  daylastline=3
   daylastday=${day}
fi

#-- set the days to search through for alerts
#-- if last day in current file was yesterday then include previous day
#-- if last day in current file is not yesterday then just scan today's alerts
if [ "${daylastday}" = "${prevday}" ]
then
  alertdays="${prevday} ${day}"
else
  alertdays="${day}"
fi

#-- for each of the days to scan for alerts
for theday in ${alertdays}
do
 #-- check alert errors for the last day.
 if [ -r "${LST}/${PGM}.${theday}" ]
 then
  #-- If the checksum generated is DIFFERENT we should start reporting from the top.
  #--
  #-- If the checksum generated is the SAME we should start reporting from end of
  #-- the previously generated output.
  new_daychecksum=`${HEAD} -4 ${LST}/${PGM}.${theday} | ${SUM} | ${AWK} '{print $1}'`
  if [ ${new_daychecksum} -ne ${daychecksum} ]
  then
   daychecksum=${new_daychecksum}
   daylastline=3
  fi

  #-- get the number of lines in the generated errors so we can report to the
  #-- end of the file and we know where to start next time.
  new_daylastline=`${WC} -l ${LST}/${PGM}.${theday} | ${AWK} -F" " '{print $1}'`

  #-- if the number of lines in the output is 3 then there are no errors found.
  if [ ${new_daylastline} -ne 3 ]
  then
   #-- if number of lines in extracted alerts is the same as last time then no new alerts
   if [ ${new_daylastline} -ne ${daylastline} ]
   then
    #-- find the line to begin reporting new alerts from
    fromline=`expr ${new_daylastline} - ${daylastline}`
    #-- produce alert lines for alerts defined in file CHKALERT
    ${TAIL} -${fromline} ${LST}/${PGM}.${theday} |
    while read LINE
    do
     for ORAS in `${CAT} ${ALRT}/CHKALERT`
     do
       ora=`${ECHO} ${LINE} | ${GREP} ${ORAS}`
      if [ $? -eq 0 ]
      then
       #-- you might want to do something here
       #--  that is specific to certain ORA- errors
       err="001"
       echo "${err}:${start}:${LINE}"  >> ${LOG}
      fi
     done
    done
   fi
  fi
  daylastline=${new_daylastline}
  #-- update the current file only if the day being processed is current day
  if [ "${theday}" = "${day}" ]
  then
   ${ECHO} "day:"${daychecksum}":"${daylastline}":"${day} > ${CURR}
  fi
fi
done


»


See All Articles by Columnist

James Koopmann

James Koopmann
James Koopmann
James Koopmann has fourteen years of database design, development and performance tuning experience. In addition, he has extensive database administration experience in Oracle and other relational databases in production environments, specializing in performance tuning of database engines and SQL based applications. Koopmann is an accomplished author with several technical papers in various Oracle related publications such as Oracle Magazine, Oracle Professional and SQL>UPDATE_RMOUG. He is a featured author and database expert for DatabaseJournal, a member of the editorial review committee for Select Journal (The Magazine for the International Oracle Users Group), an Oracle Certified Professional DBA and noted speaker at local Oracle User Groups around the country.

Latest Articles