Troubleshooting ASM problems on VMware ESX 3.x
September 18, 2007

Brief intro

It is easy and convenient to assume that everything will go fine with all the automated features that most applications ship with today. Oracle, too, has worked hard toward this since version 8: Oracle 10g automated many administration tasks, and Oracle 11g simplified many more. In real life, however, a lot can go wrong even when you think you have planned carefully. In this series, we will look at the problems I encountered on my ESX 3.x Oracle RAC setup. We will keep it conversational: we'll run into each error and then fix it.

Oracle RAC error

Upon attempting to log on to my Oracle 10g RAC, I got the typical "ORACLE not available" error. Looking at the status of the cluster resources, we saw:

[oracle@vm01 bin]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.esxrac.db  application    ONLINE    OFFLINE
ora....c1.inst application    ONLINE    OFFLINE
ora....c2.inst application    ONLINE    OFFLINE
ora....serv.cs application    ONLINE    OFFLINE
ora....ac1.srv application    ONLINE    OFFLINE
ora....ac2.srv application    ONLINE    OFFLINE
ora....SM1.asm application    ONLINE    ONLINE    vm01
ora....01.lsnr application    ONLINE    ONLINE    vm01
ora.vm01.gsd   application    ONLINE    ONLINE    vm01
ora.vm01.ons   application    ONLINE    ONLINE    vm01
ora.vm01.vip   application    ONLINE    ONLINE    vm01
ora....SM2.asm application    ONLINE    ONLINE    vm02
ora....02.lsnr application    ONLINE    ONLINE    vm02
ora.vm02.gsd   application    ONLINE    ONLINE    vm02
ora.vm02.ons   application    ONLINE    ONLINE    vm02
ora.vm02.vip   application    ONLINE    ONLINE    vm02

This shows that the database, both instances and the services are OFFLINE, while ASM, the listeners, GSD, ONS and the VIPs are ONLINE on both nodes. It doesn't tell us why. We attempted to start the individual resources, to no avail. The error we got wasn't really helpful:

CRS-0215: Could not start resource 'ora.esxrac.esxrac1.inst'.

Is the cluster itself healthy?
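Before answering that, a side note. With sixteen resources, you can spot the OFFLINE rows by eye, but a small filter helps on a bigger cluster. The following is a minimal sketch that assumes the 10g crs_stat -t column layout shown above (Name, Type, Target, State, Host). The name crs_not_running is hypothetical; it is not an Oracle tool. The filter prints every resource whose Target is ONLINE but whose State is not.

```shell
# Sketch: list CRS resources whose Target is ONLINE but whose State is not.
# Assumes the 10g `crs_stat -t` layout: Name  Type  Target  State  Host.
# OFFLINE rows have no Host column, so only fields 1-4 are relied on.
# crs_not_running is a hypothetical helper name, not an Oracle utility.
crs_not_running() {
    awk '
        # Skip the column header and the dashed separator line.
        $1 == "Name" || /^-/ { next }
        # Field 3 is Target, field 4 is State.
        $3 == "ONLINE" && $4 != "ONLINE" { print $1 }
    '
}

# Usage on a cluster node, with the CRS binaries on PATH:
#   crs_stat -t | crs_not_running
```

Fed the listing above, it would print the six OFFLINE resources, from ora.esxrac.db down to ora....ac2.srv. Keep in mind that the -t view truncates long resource names, so the filter can only report names as that view shows them.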
[oracle@vm01 bin]$ crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
[oracle@vm01 bin]$ crsctl check cssd
CSS appears healthy

The clusterware daemons looked fine. Next, we checked the alert log, alert_esxrac1.log, named after our esxrac1 database instance. The whole file is too long to reproduce here, so we have extracted the portion that reported an error:

ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Thu Jun 21 12:15:45 2007
ARC2: Archival started
ARC1: STARTING ARCH PROCESSES COMPLETE
ARC1: Becoming the heartbeat ARCH
ARC2 started with pid=29, OS id=4724
Thu Jun 21 12:15:45 2007
Errors in file /u01/app/oracle/admin/esxrac/udump/esxrac1_ora_4571.trc:
ORA-19504:

We also checked the trace file for more detailed information:

/u01/app/oracle/admin/esxrac/udump/esxrac1_ora_8878.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP and Data Mining Scoring Engine options
ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1
System name: Linux
Node name: vm01.hanze.nl
Release: 2.6.9-42.0.0.0.1.ELsmp
Version: #1 SMP Sun Oct 15 14:02:40 PDT 2006
Machine: i686
Instance name: esxrac1
Redo thread mounted by this instance: 1
Oracle process number: 21
Unix process pid: 8878, image: oracle@vm01.hanze.nl (TNS V1-V3)

*** SERVICE NAME:() 2007-06-21 12:45:14.815
*** SESSION ID:(150.3) 2007-06-21 12:45:14.815
Failed to create file '+FLASH_RECO_AREA/esxrac/1_153_620732678.dbf' (file not accessible?)
ORA-19504: failed to create file "+FLASH_RECO_AREA/esxrac/1_153_620732678.dbf"
ORA-17502: ksfdcre:4 Failed to create file +FLASH_RECO_AREA/esxrac/1_153_620732678.dbf
ORA-15041: diskgroup space exhausted
*** 2007-06-21 12:45:14.823 60679 kcrr.c
ARCH: Error 19504 Creating archive log file to '+FLASH_RECO_AREA/esxrac/1_153_620732678.dbf'
*** 2007-06-21 12:45:14.824 58941 kcrr.c
kcrrfail: dest:1 err:19504 force:0 blast:1
ORA-16038: log 1 sequence# 153 cannot be archived
ORA-19504: failed to create file ""
ORA-00312: online log 1 thread 1: '+ORADATA/esxrac/onlinelog/group_1.257.620732695'
ORA-00312: online log 1 thread 1: '+FLASH_RECO_AREA/esxrac/onlinelog/group_1.257.620732699'
ksuitm: waiting for [5] seconds before killing DIAG

Aha, there was our culprit: ORA-15041, diskgroup space exhausted. Because the FLASH_RECO_AREA diskgroup was full, the archiver could not create the archive log for sequence 153 of thread 1 (ORA-16038). As you may remember, we created just one big flash_reco file, which was apparently not big enough for our Oracle 10g RAC. Going back, we logged on to our ASM instance:

[oracle@vm01 bin]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.1.0 - Production on Thu Jun 21 13:02:00 2007

Copyright (c) 1982, 2005, Oracle. All rights reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP and Data Mining Scoring Engine options

SQL> show parameters asmdisk
SQL> show parameter asm_disk

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                       string      ORADATA, FLASH_RECO_AREA
asm_diskstring                       string
SQL> desc v$asm_diskgroup
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 GROUP_NUMBER                                       NUMBER
 NAME                                               VARCHAR2(30)
 SECTOR_SIZE                                        NUMBER
 BLOCK_SIZE                                         NUMBER
 ALLOCATION_UNIT_SIZE                               NUMBER
 STATE                                              VARCHAR2(11)
 TYPE                                               VARCHAR2(6)
 TOTAL_MB                                           NUMBER
 FREE_MB                                            NUMBER
 REQUIRED_MIRROR_FREE_MB                            NUMBER
 USABLE_FILE_MB                                     NUMBER
 OFFLINE_DISKS                                      NUMBER
 UNBALANCED                                         VARCHAR2(1)
 COMPATIBILITY                                      VARCHAR2(60)
 DATABASE_COMPATIBILITY                             VARCHAR2(60)

TOTAL_MB and FREE_MB are the columns we need. Now, we execute the following query:
SQL> select group_number, name, total_mb, free_mb
  2  from v$asm_diskgroup;

GROUP_NUMBER NAME                             TOTAL_MB    FREE_MB
------------ ------------------------------ ---------- ----------
           1 FLASH_RECO_AREA                     10236         30
           2 ORADATA                             20472      16224
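FLASH_RECO_AREA has 30 MB free out of 10236 MB, roughly 0.3%. ORADATA has about 79% free. A check like this is easy to script, so that a nearly full diskgroup gets noticed before the archiver runs into ORA-15041. The sketch below expects the query output shown above on standard input: one row per diskgroup with group number, name, total MB and free MB. The function name asm_low_space and its default threshold of 10% are hypothetical choices, not Oracle settings.

```shell
# Sketch: flag ASM diskgroups whose free space is below a percentage threshold.
# Input (stdin): the text output of
#   select group_number, name, total_mb, free_mb from v$asm_diskgroup;
# Data rows look like "1 FLASH_RECO_AREA 10236 30". Headers, the dashed
# separator, SQL prompts and "n rows selected." lines are skipped because
# they do not have four fields with numeric columns 1, 3 and 4.
# asm_low_space and its 10% default are hypothetical, not Oracle settings.
asm_low_space() {
    threshold_pct=${1:-10}
    awk -v limit="$threshold_pct" '
        NF == 4 && $1 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $3 > 0 {
            pct = 100 * $4 / $3          # free space as a percentage of total
            if (pct < limit + 0)
                printf "%s %.1f%% free (%d of %d MB)\n", $2, pct, $4, $3
        }
    '
}
```

Run against the output above with the default threshold, it prints a single line: FLASH_RECO_AREA 0.3% free (30 of 10236 MB). For diskgroups with normal or high redundancy, the USABLE_FILE_MB column listed by desc above is likely a better measure than FREE_MB. Part of the raw free space in those groups is needed for mirroring, so switching the query to that column may be worth considering.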
So clearly, FLASH_RECO_AREA is full: only 30 MB of its 10236 MB are free, while ORADATA still has 16224 MB free. We need to add more storage to the ASM diskgroup that holds our flash recovery area.

Conclusion

In our next article, we will add an extra vmdk file to the ASM diskgroup and bring our RAC back online. This should not normally happen in your production environment, where you as the DBA are in full command of the four cores of your RAC infrastructure. By four cores, I mean that you will want to pay attention to the following:
We will take a more predictive approach when we do capacity planning for our Oracle 11g RAC, inspecting all four cores for RAC readiness, all of that hopefully on ESX 3.5!