Oracle's new ADR with command interface shows promise for database administrators who like to script their
own solution for quickly scraping the alert log and automatically looking for errors.
Oracles alert log is a chronological log of
important messages and errors that are generated using an Oracle database.
Often times it is the first place DBAs look to see if there are errors being
generated or checking to see if the Oracle database is healthy, or at least not
spitting out errors and complaining about something. If you have yet to pry
open the hood of Oracle and have never seen an alert log, below is a snippet of
some of the errors (ORA-) and messages that Oracle keeps in the alert log.
Notice that Ive included two different snippets. One is what most DBAs are
familiar with, a pure text file, and one that is new to many, an XML alert log.
Both are generated but the prior is, as suggested by Oracle, not being
maintained and not receiving the same messaging as the XML alert log.
From the text alert log:
Wed Dec 09 14:55:16 2009
Errors in file /opt/app/oracle/diag/rdbms/db11/db11/trace/db11_ora_14798.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/oradata/db11/redo03.log'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
From the XML alert log:
<msg time='2009-12-09T14:55:16.441-07:00' org_id='oracle' comp_id='rdbms'
client_id='' type='UNKNOWN' level='16'
module='sqlplus@ludwig (TNS V1-V3)' pid='14798'>
<txt>Errors in file /opt/app/oracle/diag/rdbms/db11/db11/trace/db11_ora_14798.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/oradata/db11/redo03.log'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
</txt>
</msg>
Clearly, they are very similar except for the
XML tags. These XML tags cause some interesting problems for those of us that
like to scrape the alert log with system utilities. It isnt as easy as
pointing the scripts at a new file out on disk. Thankfully Oracle has provided
a utility (ADRCI) that can be used to extract diagnostic information generated
from an Oracle database, including the alert log.
When used interactively by issuing the adrci
command from a system prompt, which is often the way it is used, a user can
easily extract alert log detail with very simple commands such as the
following:
SHOW ALERT TAIL; To display the last 10 lines of the alert log.
SHOW ALERT -TAIL 50; To display the last 50 lines of the alert log.
SHOW ALERT -P "MESSAGE_TEXT LIKE '%ORA-%'"; search for alert lines that have ORA- in them.
However, like many other DBA tasks we clearly
hate logging into a database server and issuing commands. What is more expected
is to have some set of commands that can be executed through a schedule to
check and alert us, such as by email. Additionally if we schedule something to
check on a regular interval, say 10 minutes, it becomes a better and more
reliable monitoring methodology then having a DBA waste time logging into 10s
or 100s of database servers every 10 minutes. Therefore, and to not belabor the
point, here are the scripts that Ive started to develop. I hope that you can
use them:
This solution makes use of a very simple
directory structure under the oracle users home directory. Here are
the directories/files used:
|
/home/oracle/alert
|
This
is where the main script and supporting files exist
|
|
/home/oracle/alert/CHKALERT
|
This file contains ORA- errors, or any other
errors we are interested in checking for in the alert log
|
|
/home/oracle/alert/CHKALERT.sh
|
This is the script that will need to be executed
|
|
/home/oracle/alert/CHKALERT.curr
|
File that contains information on the last time
the alert log was scanned; containing information about the errors found and
put into the CHKALERT.yyyy-mm-dd files
|
|
/home/oracle/alert/lst
|
The directory where output is generated
|
|
/home/oracle/alert/lst/CHKALERT.adrci
|
Is dynamically generated during runtime and will
be used as a script to the ADRCI utility
|
|
/home/oracle/alert/lst/CHKALERT.lst
|
Is dynamically generated during runtime and will
be used as output to a SQL*Plus call to get some variables to be used when
calling ADRCI
|
|
/home/oracle/alert/lst/CHKALERT.sql
|
Is the SQL*Plus SQL used to generate the
CHKALERT.lst output
|
|
/home/oracle/alert/lst/CHKALERT.tmp
|
Contains header output from the alert log to get a
timezone
|
|
/home/oracle/alert/lst/CHKALERT.yyyy-mm-dd
|
will
be files generated for each day that contains ORA- errors for that day
|
Below is the script itself. To use, just execute
by typing in at the command prompt:
[oracle@ludwig alert]$ ./CHKALERT.sh
The file you will be interested in looking at,
if youve setup the /home/oracle/alert/CHKALERT file properly to look for
specific ORA errors, will be the /home/oracle/alert/lst/CHKALERT.log file. This
file contains each of the ORA errors found for the last run and looks like the
following:
CHKALERT.log
::::::::::::::
001:2010:03:04:02:32:29:ORA-00312: online log 1 thread 1: '/oradata/db11/redo01.log'
001:2010:03:04:02:32:29:ORA-00312: online log 2 thread 1: '/oradata/db11/redo02.log'
001:2010:03:04:02:32:29:ORA-00312: online log 3 thread 1: '/oradata/db11/redo03.log'
To fully automate the process, not covered in
this article, would be to interrogate this file and if it contains something,
it should be sent to the appropriate DBAs to figure out what to do with the
errors found. Have fun with the script. Change as you see fit and begin to take
advantage of Oracle new ADRCI utility for scraping the alert logs.
#!/bin/sh
#--File: CHKALERT.sh
#--setup
PGM="CHKALERT"
ALRT=$HOME/alert
LST=${ALRT}/lst
LOG=${LST}/${PGM}.log
CURR=${ALRT}/${PGM}.curr
#--Unix environment variables
ECHO=echo; export ECHO
CAT=/bin/cat; export CAT
RM=/bin/rm; export RM
TOUCH=/bin/touch; export TOUCH
GREP=/bin/grep; export GREP
AWK=/bin/awk; export AWK
WC=/usr/bin/wc; export WC
TAIL=/usr/bin/tail; export TAIL
HEAD=/usr/bin/head; export HEAD
SUM=/usr/bin/sum; export SUM
#--Oracle environment variables
ORACLE_SID=db11; export ORACLE_SID
ORACLE_HOME=`${GREP} ${ORACLE_SID}: /etc/oratab | ${AWK} -F: '{print $2}'`; export ORACLE_HOME
PATH=$ORACLE_HOME/bin:$PATH; export PATH
#--code
start=`date "+%Y:%m:%d:%H:%M:%S"`
${RM} ${LOG}
${TOUCH} ${LOG}
#--execute SQL to get some diagnostic variables
echo "set echo off" > ${LST}/${PGM}.sql
echo "set feedback off" >> ${LST}/${PGM}.sql
echo "set heading off" >> ${LST}/${PGM}.sql
echo "set linesize 40" >> ${LST}/${PGM}.sql
echo "set pagesize 55" >> ${LST}/${PGM}.sql
echo "set verify off" >> ${LST}/${PGM}.sql
echo "set linesize 300" >> ${LST}/${PGM}.sql
echo "SELECT 'homepath:'||replace(homepath.value,adrbase.value||'/','')" >> ${LST}/${PGM}.sql
echo " FROM v\$diag_info homepath, v\$diag_info adrbase" >> ${LST}/${PGM}.sql
echo " WHERE homepath.name = 'ADR Home'" >> ${LST}/${PGM}.sql
echo " AND adrbase.name = 'ADR Base';" >> ${LST}/${PGM}.sql
echo "SELECT 'day:'||to_char(sysdate ,'yyyy-mm-dd') FROM dual;" >> ${LST}/${PGM}.sql
echo "SELECT 'nextday:'||to_char(sysdate+1,'yyyy-mm-dd') FROM dual;" >> ${LST}/${PGM}.sql
echo "SELECT 'prevday:'||to_char(sysdate-1,'yyyy-mm-dd') FROM dual;" >> ${LST}/${PGM}.sql
echo "exit" >> ${LST}/${PGM}.sql
sqlplus -s '/as sysdba' @${LST}/${PGM}.sql > ${LST}/${PGM}.lst
#-- get diag information variables just queried from the database
homepath=`${GREP} homepath ${LST}/${PGM}.lst | ${AWK} -F":" '{print $2}'`
day=`${GREP} "^day" ${LST}/${PGM}.lst | ${AWK} -F":" '{print $2}'`
nextday=`${GREP} nextday ${LST}/${PGM}.lst | ${AWK} -F":" '{print $2}'`
prevday=`${GREP} prevday ${LST}/${PGM}.lst | ${AWK} -F":" '{print $2}'`
#-- get the timezone from the alert log (safest place to get)
#-- the proper timezone is needed to properly filter the alert log for date ranges you
#-- want to look at
echo "set echo off" > ${LST}/${PGM}.adrci
echo "set termout off" >> ${LST}/${PGM}.adrci
echo "set homepath ${homepath}" >> ${LST}/${PGM}.adrci
echo "spool ${LST}/${PGM}.tmp" >> ${LST}/${PGM}.adrci
echo "show alert -tail 1" >> ${LST}/${PGM}.adrci
echo "spool off" >> ${LST}/${PGM}.adrci
adrci script=${LST}/${PGM}.adrci 1>/dev/null 2>/dev/null
timezone=`${HEAD} -1 ${LST}/${PGM}.tmp | ${AWK} -F" " '{print $3}'`
#-- extract alert log errors for the current day (today) and previous day (yesterday)
#-- previous day alerts will be used if the current file has yesterday's day as last day;
#-- meaning that we have had a switch to a new day and might have errors still to
#-- process from the previous day
echo "set echo off" > ${LST}/${PGM}.adrci
echo "set termout off" >> ${LST}/${PGM}.adrci
echo "set homepath ${homepath}" >> ${LST}/${PGM}.adrci
echo "spool ${LST}/${PGM}.${day}" >> ${LST}/${PGM}.adrci
echo "show alert -P \"ORIGINATING_TIMESTAMP BETWEEN '${day} 00:00:00.000000 ${timezone}' AND
'${nextday} 00:00:00.000000 ${timezone}' AND MESSAGE_TEXT LIKE '%ORA-%'\" -term" >>
${LST}/${PGM}.adrci
echo "spool off" >> ${LST}/${PGM}.adrci
echo "spool ${LST}/${PGM}.${prevday}" >> ${LST}/${PGM}.adrci
echo "show alert -P \"ORIGINATING_TIMESTAMP BETWEEN '${prevday} 00:00:00.000000 ${timezone}' AND
'${day} 00:00:00.000000 ${timezone}' AND MESSAGE_TEXT LIKE '%ORA-%'\" -term" >>
${LST}/${PGM}.adrci
echo "spool off" >> ${LST}/${PGM}.adrci
adrci script=${LST}/${PGM}.adrci 1>/dev/null 2>/dev/null
#-- get current contents of the current file
#-- default to current day if no current file
if [ -r "${CURR}" ]
then
#-- if the current exists then get the information it contains
daychecksum=`${GREP} day ${CURR} | ${AWK} -F":" '{print $2}'`
daylastline=`${GREP} day ${CURR} | ${AWK} -F":" '{print $3}'`
daylastday=`${GREP} day ${CURR} | ${AWK} -F":" '{print $4}'`
else
#-- if the current does not exist then default to today
daychecksum=0
daylastline=3
daylastday=${day}
fi
#-- set the days to search through for alerts
#-- if last day in current file was yesterday then include previous day
#-- if last day in current file is not yesterday then just scan today's alerts
if [ "${daylastday}" = "${prevday}" ]
then
alertdays="${prevday} ${day}"
else
alertdays="${day}"
fi
#-- for each of the days to scan for alerts
for theday in ${alertdays}
do
#-- check alert errors for the last day.
if [ -r "${LST}/${PGM}.${theday}" ]
then
#-- If the checksum generated is DIFFERENT we should start reporting from the top.
#--
#-- If the checksum generated is the SAME we should start reporting from end of
#-- the previously generated output.
new_daychecksum=`${HEAD} -4 ${LST}/${PGM}.${theday} | ${SUM} | ${AWK} '{print $1}'`
if [ ${new_daychecksum} -ne ${daychecksum} ]
then
daychecksum=${new_daychecksum}
daylastline=3
fi
#-- get the number of lines in the generated errors so we can report to the
#-- end of the file and we know where to start next time.
new_daylastline=`${WC} -l ${LST}/${PGM}.${theday} | ${AWK} -F" " '{print $1}'`
#-- if the number of lines in the output is 3 then there are no errors found.
if [ ${new_daylastline} -ne 3 ]
then
#-- if number of lines in extracted alerts is the same as last time then no new alerts
if [ ${new_daylastline} -ne ${daylastline} ]
then
#-- find the line to begin reporting new alerts from
fromline=`expr ${new_daylastline} - ${daylastline}`
#-- produce alert lines for alerts defined in file CHKALERT
${TAIL} -${fromline} ${LST}/${PGM}.${theday} |
while read LINE
do
for ORAS in `${CAT} ${ALRT}/CHKALERT`
do
ora=`${ECHO} ${LINE} | ${GREP} ${ORAS}`
if [ $? -eq 0 ]
then
#-- you might want to do something here
#-- that is specific to certain ORA- errors
err="001"
echo "${err}:${start}:${LINE}" >> ${LOG}
fi
done
done
fi
fi
daylastline=${new_daylastline}
#-- update the current file only if the day being processed is current day
if [ "${theday}" = "${day}" ]
then
${ECHO} "day:"${daychecksum}":"${daylastline}":"${day} > ${CURR}
fi
fi
done
»
See All Articles by Columnist
James Koopmann