Troubleshooting ASM problems on VMware ESX 3.x

Brief intro

It is easy and convenient to
assume that everything will go fine with all of the automated features that
most applications are sold with today. Oracle has worked hard toward this
since version 8: Oracle 10g automated many administration tasks, and Oracle 11g
simplified a good number of the tasks that had already been automated. In real
life, though, a lot can still go wrong even when you think that you have
planned carefully.

In this series, we will look
at what I encountered on my ESX 3.x Oracle RAC setup. We will do this in a more
conversational style: we will run into the error and then fix it.

Oracle RAC error

Upon attempting to log on to
my Oracle 10g RAC database, I got the typical “Oracle not available” error.
The cluster resource status showed:

[[email protected] bin]$ crs_stat -t
Name           Type         Target  State    Host
ora.esxrac.db  application  ONLINE  OFFLINE
ora....c1.inst application  ONLINE  OFFLINE
ora....c2.inst application  ONLINE  OFFLINE
ora....serv.cs application  ONLINE  OFFLINE
ora....ac1.srv application  ONLINE  OFFLINE
ora....ac2.srv application  ONLINE  OFFLINE
ora....SM1.asm application  ONLINE  ONLINE   vm01
ora....01.lsnr application  ONLINE  ONLINE   vm01
ora.vm01.gsd   application  ONLINE  ONLINE   vm01
ora.vm01.ons   application  ONLINE  ONLINE   vm01
               application  ONLINE  ONLINE   vm01
ora....SM2.asm application  ONLINE  ONLINE   vm02
ora....02.lsnr application  ONLINE  ONLINE   vm02
ora.vm02.gsd   application  ONLINE  ONLINE   vm02
ora.vm02.ons   application  ONLINE  ONLINE   vm02
               application  ONLINE  ONLINE   vm02

On its own, this listing doesn’t tell
us much, so we tried to start the individual resources, to no avail. The
attempt threw an error that wasn’t really helpful:

CRS-0215: Could not start resource ‘ora.esxrac.esxrac1.inst’.
CRS-0215: Could not start resource ‘ora.esxrac.esxrac1.inst’.
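
On a cluster with more resources, it can help to filter the crs_stat -t listing down to the entries whose target is ONLINE but whose state is not. The following is a minimal sketch, assuming the five-column layout shown above (Name, Type, Target, State, Host) and that every resource has the type "application", as in this listing. The function name offline_resources is made up for illustration and is not an Oracle tool.

```shell
# offline_resources: hypothetical helper, not part of Oracle Clusterware.
# Reads `crs_stat -t` output on stdin and prints the name of every resource
# whose Target is ONLINE but whose State is something else.
# Assumes whitespace-separated columns: Name Type Target State [Host],
# and that the Type column reads "application" (true for the listing above).
offline_resources() {
    awk '$2 == "application" && $3 == "ONLINE" && $4 != "ONLINE" { print $1 }'
}
```

It would typically be fed straight from the command, for example: crs_stat -t | offline_resources. The header line and any separator line are skipped because their second column is not "application".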

Is the cluster healthy?

[[email protected] bin]$ crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
[[email protected] bin]$ crsctl check cssd
CSS appears healthy

Next, we checked our alert log,
alert_esxrac1.log, named after our esxrac1 database instance. The whole file is
too long to show here, so this is only the portion that contains the errors:

ARC0: Becoming the ‘no FAL’ ARCH
ARC0: Becoming the ‘no SRL’ ARCH
Thu Jun 21 12:15:45 2007
ARC2: Archival started
ARC1: Becoming the heartbeat ARCH
ARC2 started with pid=29, OS id=4724
Thu Jun 21 12:15:45 2007
Errors in file /u01/app/oracle/admin/esxrac/udump/esxrac1_ora_4571.trc:
ORA-19504: failed to create file “+FLASH_RECO_AREA/esxrac/1_153_620732678.dbf”
ORA-17502: ksfdcre:4 Failed to create file
ORA-15041: diskgroup space exhausted
Thu Jun 21 12:15:45 2007
ARCH: Error 19504 Creating archive log file to
ARCH: Failed to archive thread 1 sequence 153 (19504)
Thu Jun 21 12:15:48 2007
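
Since the alert log is long, a quick tally of the ORA- codes it contains can point to the relevant portion faster than reading it top to bottom. This is a minimal sketch using only grep, sort, uniq and awk; the function name ora_error_summary is hypothetical, and grep -o is assumed to be available (it is in GNU grep, as shipped with Linux).

```shell
# ora_error_summary: hypothetical helper for illustration only.
# Reads log text on stdin and prints "count code" for each distinct
# ORA-nnnnn code, most frequent first (ties ordered by code).
ora_error_summary() {
    grep -o 'ORA-[0-9][0-9][0-9][0-9][0-9]' |
        sort |
        uniq -c |
        awk '{ print $1, $2 }' |      # normalise the padding uniq -c adds
        sort -k1,1nr -k2,2
}
```

For example: ora_error_summary < alert_esxrac1.log. In the excerpt above, the codes to look for are ORA-19504, ORA-17502 and ORA-15041, the last of which names the actual cause.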

We also checked the trace file
named in the alert log for more detailed information:

Oracle Database 10g Enterprise Edition Release – Production
With the Partitioning, Real Application Clusters, Oracle Label Security,
OLAP and Data Mining Scoring Engine options
ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1
System name: Linux
Node name:
Release: 2.6.9-
Version: #1 SMP Sun Oct 15 14:02:40 PDT 2006
Machine: i686
Instance name: esxrac1
Redo thread mounted by this instance: 1
Oracle process number: 21
Unix process pid: 8878, image: [email protected] (TNS V1-V3)

*** SERVICE NAME:() 2007-06-21 12:45:14.815
*** SESSION ID:(150.3) 2007-06-21 12:45:14.815
Failed to create file ‘+FLASH_RECO_AREA/esxrac/1_153_620732678.dbf’ (file not accessible?)
ORA-19504: failed to create file “+FLASH_RECO_AREA/esxrac/1_153_620732678.dbf”
ORA-17502: ksfdcre:4 Failed to create file +FLASH_RECO_AREA/esxrac/1_153_620732678.dbf
ORA-15041: diskgroup space exhausted
*** 2007-06-21 12:45:14.823 60679 kcrr.c
ARCH: Error 19504 Creating archive log file to ‘+FLASH_RECO_AREA/esxrac/1_153_620732678.dbf’
*** 2007-06-21 12:45:14.824 58941 kcrr.c
kcrrfail: dest:1 err:19504 force:0 blast:1
ORA-16038: log 1 sequence# 153 cannot be archived
ORA-19504: failed to create file “”
ORA-00312: online log 1 thread 1: ‘+ORADATA/esxrac/onlinelog/group_1.257.620732695’
ORA-00312: online log 1 thread 1: ‘+FLASH_RECO_AREA/esxrac/onlinelog/group_1.257.620732699’
ksuitm: waiting for [5] seconds before killing DIAG

Aha, so there was our
culprit. The disk group’s space was exhausted: as you may remember, we created
just one big flash_reco file, which was apparently not big enough for our
Oracle 10g RAC.

Going back and logging into
our ASM instance:

[[email protected] bin]$ sqlplus / as sysdba

SQL*Plus: Release – Production on Thu Jun 21 13:02:00 2007

Copyright (c) 1982, 2005, Oracle. All rights reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release – Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP
and Data Mining Scoring Engine options

SQL> show parameters asmdisk
SQL> show parameter asm_disk

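NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                       string      ORADATA, FLASH_RECO_AREA
asm_diskstring                       string
SQL> desc v$asm_diskgroup
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------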

Now, we execute the
following query:

SQL> select group_number, name, total_mb, free_mb
  2  from v$asm_diskgroup;

GROUP_NUMBER NAME                             TOTAL_MB    FREE_MB
------------ ------------------------------ ---------- ----------
           1 FLASH_RECO_AREA                     10236         30
           2 ORADATA                             20472      16224

So clearly, we need to add
more storage space to our FLASH_RECO_AREA disk group: only 30 MB of its
10236 MB is still free.
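
To put a number on it, 30 MB free out of 10236 MB is roughly 0.3 percent, while ORADATA still has about 79 percent free. The same arithmetic can be scripted over the query output. The sketch below assumes rows in the form "group_number name total_mb free_mb" as returned by the query above; the function name low_diskgroups and the default threshold of 10 percent are arbitrary choices for illustration, not Oracle recommendations.

```shell
# low_diskgroups: hypothetical helper for illustration only.
# Reads rows of "group_number name total_mb free_mb" on stdin and prints the
# name and percentage free of every disk group below the threshold.
# Usage: low_diskgroups [threshold_percent]   (default 10, an assumption)
low_diskgroups() {
    awk -v limit="${1:-10}" '
        # Only data rows qualify: numeric group number, total MB and free MB.
        # Header and separator lines fail these tests and are skipped.
        $1 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $3 > 0 {
            pct = 100 * $4 / $3
            if (pct < limit + 0) printf "%s %.1f%%\n", $2, pct
        }'
}
```

Fed the two data rows from the query above with the default threshold, it prints FLASH_RECO_AREA 0.3% and stays silent about ORADATA.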


In our next article, we will go about
adding an extra vmdk file to the ASM diskgroup and bringing our RAC back online.

This should not normally happen in your
production environment, where you as the DBA are in full command of the
four cores of your RAC infrastructure. By four cores, I mean that you will
want to pay attention to the following:

  • networking:
    • are all the NICs functioning properly?
    • are they teamed?
    • are they redundant?
  • storage:
    • are my disks performing optimally? In our last article we
      tested our disk I/O with the cool Oracle tool called Orion
    • is there enough space on my LUNs (if you are
      using a SAN, for example)?
  • processors:
    • are my CPUs performing well?
  • memory:
    • is there enough memory?
    • is the SGA sized optimally?

We will get into a more predictive mode
when we do capacity planning for our Oracle 11g RAC and inspect all
four cores for RAC readiness, hopefully on ESX 3.5!


See All Articles by Columnist
Tarry Singh
