Troubleshooting ASM problems on VMware ESX 3.x

September 18, 2007


Brief intro


It is easy and convenient to assume that everything will go fine with the automated features that most applications ship with today. Oracle has worked hard toward this goal since version 8: Oracle 10g automated many administration tasks, and Oracle 11g simplified many more. In real life, however, a lot can still go wrong even when you think you have planned carefully.

In this series, we will look at the problems I encountered on my Oracle RAC setup on ESX 3.x. We will do this in a conversational style: we'll run into each error and then fix the problem.


Oracle RAC error


Upon attempting to log on to my Oracle 10g RAC, I got the typical “Oracle not available” error. Checking the status of the cluster resources with crs_stat -t, we saw:


[oracle@vm01 bin]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.esxrac.db  application    ONLINE    OFFLINE
ora....c1.inst application    ONLINE    OFFLINE
ora....c2.inst application    ONLINE    OFFLINE
ora....serv.cs application    ONLINE    OFFLINE
ora....ac1.srv application    ONLINE    OFFLINE
ora....ac2.srv application    ONLINE    OFFLINE
ora....SM1.asm application    ONLINE    ONLINE    vm01
ora....01.lsnr application    ONLINE    ONLINE    vm01
ora.vm01.gsd   application    ONLINE    ONLINE    vm01
ora.vm01.ons   application    ONLINE    ONLINE    vm01
ora.vm01.vip   application    ONLINE    ONLINE    vm01
ora....SM2.asm application    ONLINE    ONLINE    vm02
ora....02.lsnr application    ONLINE    ONLINE    vm02
ora.vm02.gsd   application    ONLINE    ONLINE    vm02
ora.vm02.ons   application    ONLINE    ONLINE    vm02
ora.vm02.vip   application    ONLINE    ONLINE    vm02

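This shows that the database, both instances and the services are OFFLINE, while ASM, the listeners, GSD, ONS and the VIPs on both nodes are ONLINE, but it does not tell us why. We attempted to start the individual resources, to no avail. The error returned wasn’t really helpful: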

CRS-0215: Could not start resource 'ora.esxrac.esxrac1.inst'. 
CRS-0215: Could not start resource 'ora.esxrac.esxrac1.inst'.
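Rather than eyeballing the crs_stat -t table, you can filter it for resources whose target is ONLINE but whose state is not. The sketch below is a hypothetical helper (offline_resources is not an Oracle tool); it runs awk over an excerpt of the listing above, pasted into a heredoc so it runs anywhere. Note that crs_stat -t abbreviates long resource names, so the names it prints are abbreviated too.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print resources whose Target is ONLINE but whose State is not.
# Expects crs_stat -t style columns: Name Type Target State [Host].
offline_resources() {
  # Skip the header and dashed separator lines, then compare Target ($3) with State ($4).
  awk 'NR > 2 && $3 == "ONLINE" && $4 != "ONLINE" { print $1 }'
}

# Excerpt copied from the crs_stat -t output above. On a cluster node you
# would instead run: crs_stat -t | offline_resources
offline_resources <<'EOF'
Name           Type           Target    State     Host
------------------------------------------------------------
ora.esxrac.db  application    ONLINE    OFFLINE
ora....c1.inst application    ONLINE    OFFLINE
ora....c2.inst application    ONLINE    OFFLINE
ora....serv.cs application    ONLINE    OFFLINE
ora....ac1.srv application    ONLINE    OFFLINE
ora....ac2.srv application    ONLINE    OFFLINE
ora....SM1.asm application    ONLINE    ONLINE    vm01
ora....01.lsnr application    ONLINE    ONLINE    vm01
ora.vm01.vip   application    ONLINE    ONLINE    vm01
EOF
```

For this excerpt it prints the six database, instance and service resources, which is exactly the set CRS refused to start.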

Is the cluster healthy?

[oracle@vm01 bin]$ crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
[oracle@vm01 bin]$ crsctl check cssd
CSS appears healthy
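Since the clusterware daemons report healthy while the database resources are down, the problem lies below CRS itself. If you want to script this check, a minimal sketch follows. The helper crs_healthy is hypothetical, and it assumes that a healthy daemon reports a line ending in “appears healthy”, as in the output above; any other non-empty line is treated as a problem.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `crsctl check crs` output on stdin and succeeds
# only if there is at least one line and every non-empty line ends in "appears healthy".
crs_healthy() {
  awk 'NF { n++; if ($0 !~ /appears healthy$/) bad++ }
       END { exit (n == 0 || bad > 0) }'
}

# Output copied from above; on a node you would run: crsctl check crs | crs_healthy
if crs_healthy <<'EOF'
CSS appears healthy
CRS appears healthy
EVM appears healthy
EOF
then
  echo "clusterware daemons healthy"
else
  echo "check the clusterware daemons"
fi
```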

Next, we checked our alert log, alert_esxrac1.log, which is named after our esxrac1 instance. The whole file is too long to reproduce, so here is the portion that contains the error:

ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Thu Jun 21 12:15:45 2007
ARC2: Archival started
ARC1: STARTING ARCH PROCESSES COMPLETE
ARC1: Becoming the heartbeat ARCH
ARC2 started with pid=29, OS id=4724
Thu Jun 21 12:15:45 2007
Errors in file /u01/app/oracle/admin/esxrac/udump/esxrac1_ora_4571.trc:
ORA-19504: failed to create file "+FLASH_RECO_AREA/esxrac/1_153_620732678.dbf"
ORA-17502: ksfdcre:4 Failed to create file +FLASH_RECO_AREA/esxrac/1_153_620732678.dbf
ORA-15041: diskgroup space exhausted
Thu Jun 21 12:15:45 2007
ARCH: Error 19504 Creating archive log file to '+FLASH_RECO_AREA/esxrac/1_153_620732678.dbf'
ARCH: Failed to archive thread 1 sequence 153 (19504)
Thu Jun 21 12:15:48 2007
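In a long alert log, a quick way to find the interesting portion is to list the distinct ORA- error codes it contains. A minimal sketch, assuming only grep, sort and uniq; the helper name ora_errors is hypothetical, and on a real system you would point it at the alert log in the directory given by the background_dump_dest parameter. Here it runs against a temporary copy of the excerpt above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count each distinct ORA- error code in the given files,
# most frequent first. -o prints only the matched code, -h drops file names.
ora_errors() {
  grep -oh 'ORA-[0-9]\{5\}' "$@" | sort | uniq -c | sort -rn
}

# Temporary copy of the alert log excerpt shown above.
log=$(mktemp)
cat > "$log" <<'EOF'
Errors in file /u01/app/oracle/admin/esxrac/udump/esxrac1_ora_4571.trc:
ORA-19504: failed to create file "+FLASH_RECO_AREA/esxrac/1_153_620732678.dbf"
ORA-17502: ksfdcre:4 Failed to create file +FLASH_RECO_AREA/esxrac/1_153_620732678.dbf
ORA-15041: diskgroup space exhausted
ARCH: Error 19504 Creating archive log file to '+FLASH_RECO_AREA/esxrac/1_153_620732678.dbf'
EOF
ora_errors "$log"
rm -f "$log"
```

The "ARCH: Error 19504" line is not counted because it lacks the ORA- prefix; the three codes reported point straight at ORA-15041.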

We also checked the trace file for more detail:

/u01/app/oracle/admin/esxrac/udump/esxrac1_ora_8878.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, Oracle Label Security, 
OLAP and Data Mining Scoring Engine options
ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1
System name:	Linux
Node name:	vm01.hanze.nl
Release:	2.6.9-42.0.0.0.1.ELsmp
Version:	#1 SMP Sun Oct 15 14:02:40 PDT 2006
Machine:	i686
Instance name: esxrac1
Redo thread mounted by this instance: 1
Oracle process number: 21
Unix process pid: 8878, image: oracle@vm01.hanze.nl (TNS V1-V3)

*** SERVICE NAME:() 2007-06-21 12:45:14.815
*** SESSION ID:(150.3) 2007-06-21 12:45:14.815
Failed to create file '+FLASH_RECO_AREA/esxrac/1_153_620732678.dbf' (file not accessible?)
ORA-19504: failed to create file "+FLASH_RECO_AREA/esxrac/1_153_620732678.dbf"
ORA-17502: ksfdcre:4 Failed to create file +FLASH_RECO_AREA/esxrac/1_153_620732678.dbf
ORA-15041: diskgroup space exhausted
*** 2007-06-21 12:45:14.823 60679 kcrr.c
ARCH: Error 19504 Creating archive log file to '+FLASH_RECO_AREA/esxrac/1_153_620732678.dbf'
*** 2007-06-21 12:45:14.824 58941 kcrr.c
kcrrfail: dest:1 err:19504 force:0 blast:1
ORA-16038: log 1 sequence# 153 cannot be archived
ORA-19504: failed to create file ""
ORA-00312: online log 1 thread 1: '+ORADATA/esxrac/onlinelog/group_1.257.620732695'
ORA-00312: online log 1 thread 1: '+FLASH_RECO_AREA/esxrac/onlinelog/group_1.257.620732699'
ksuitm: waiting for [5] seconds before killing DIAG

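Aha, there was our culprit: ORA-15041, diskgroup space exhausted. The archiver could not create the archive log for thread 1, sequence 153 in the FLASH_RECO_AREA disk group, so online log 1 could not be archived (ORA-16038). As you may remember, we created just one big flash_reco file, which was apparently not big enough for our Oracle 10g RAC.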

Next, we logged in to our ASM instance:

[oracle@vm01 bin]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.1.0 - Production on Thu Jun 21 13:02:00 2007

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP
and Data Mining Scoring Engine options

SQL> show parameters asmdisk
SQL> show parameter asm_disk

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                       string      ORADATA, FLASH_RECO_AREA
asm_diskstring                       string
SQL> desc v$asm_diskgroup
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 GROUP_NUMBER                                       NUMBER
 NAME                                               VARCHAR2(30)
 SECTOR_SIZE                                        NUMBER
 BLOCK_SIZE                                         NUMBER
 ALLOCATION_UNIT_SIZE                               NUMBER
 STATE                                              VARCHAR2(11)
 TYPE                                               VARCHAR2(6)
 TOTAL_MB                                           NUMBER
 FREE_MB                                            NUMBER
 REQUIRED_MIRROR_FREE_MB                            NUMBER
 USABLE_FILE_MB                                     NUMBER
 OFFLINE_DISKS                                      NUMBER
 UNBALANCED                                         VARCHAR2(1)
 COMPATIBILITY                                      VARCHAR2(60)
 DATABASE_COMPATIBILITY                             VARCHAR2(60)

Next, we queried v$asm_diskgroup for the total and free space in each disk group:

SQL> select group_number, name, total_mb, free_mb
  2  from v$asm_diskgroup;

GROUP_NUMBER NAME                             TOTAL_MB    FREE_MB
------------ ------------------------------ ---------- ----------
           1 FLASH_RECO_AREA                     10236         30
           2 ORADATA                             20472      16224

FLASH_RECO_AREA has only 30 MB free out of 10,236 MB (less than 0.3%), so we clearly need to add storage to this disk group, which holds our flash recovery area.
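This free-space check is easy to make mechanical. The sketch below is a hypothetical helper, low_space_groups, that reads GROUP_NUMBER, NAME, TOTAL_MB and FREE_MB rows (as returned by the query above) and flags any disk group below a free-space threshold; the 10% default is an arbitrary value chosen for illustration. On a live system you could feed it the same query run through sqlplus -s with headings turned off, but that wiring is left out here and the rows are pasted in instead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: flag ASM disk groups whose free space is below a threshold.
# Input: whitespace-separated GROUP_NUMBER NAME TOTAL_MB FREE_MB rows, no header.
# Argument 1: threshold in percent (defaults to 10, an arbitrary illustrative value).
low_space_groups() {
  local threshold="${1:-10}"
  awk -v t="$threshold" '
    $3 > 0 {
      pct = 100 * $4 / $3
      if (pct < t) printf "%s: %d of %d MB free (%.2f%%)\n", $2, $4, $3, pct
    }'
}

# Rows copied from the v$asm_diskgroup query above.
low_space_groups 10 <<'EOF'
           1 FLASH_RECO_AREA                     10236         30
           2 ORADATA                             20472      16224
EOF
```

For these rows only FLASH_RECO_AREA is reported, at 0.29% free; ORADATA, at roughly 79% free, passes.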

Conclusion

In our next article, we will add an extra vmdk file to the ASM disk group and bring our RAC back online.

This should not normally happen in a production environment, where you as the DBA are in full command of the four cores of your RAC infrastructure. By four cores, I mean the following areas, each of which deserves your attention:

  • networking:
    • are all the NICs functioning properly?
    • are they teamed?
    • are they redundant?
  • storage:
    • are my disks performing optimally? In our last article, we tested our disk I/O with Oracle's Orion tool.
    • is there enough space on my LUNs (if you are using a SAN)?
  • processors:
    • are my CPUs performing well?
  • memory:
    • is there enough memory?
    • is the SGA sized optimally?

We will take a more predictive approach when we do capacity planning for our Oracle 11g RAC, inspecting all four cores for RAC readiness, hopefully on ESX 3.5!

By columnist Tarry Singh