Clustering Oracle RAC Virtual Machines across physical and ESX hosts


In our previous article, we looked at the clustering possibilities across two or more ESX Servers. In this article, we will take a detailed look at the various ways of building clusters across physical and ESX hosts, since we weren't able to cover that scenario in our last article. In addition, we will take a quick look at upgrading clustered Virtual Machines in all three scenarios. There is a very good chance that you have an Oracle RAC test or development cluster on an ESX 2.5 version and want to move over to the latest ESX 3.x version (the latest being ESX 3.0.2 as of last week).

Clustering Oracle RAC Virtual Machines across physical and ESX hosts

I speak to several clients who are running their production Oracle environments on VMware. The decision to run Oracle RAC differs per organization, but I firmly believe that it is possible to run a DSS (Decision Support System) on Oracle RAC, a workload that typically has large transactions and fewer concurrent users, on ESX Servers in production. In a typical OLTP environment it might not be wise to deploy RAC on ESX without careful planning, but a DSS can certainly run fine. Moreover, there is no reason not to try it on your ESX system; there are already plenty of test, development and staging deployments running on ESX servers.

Now let's take a quick look at the tasks we need to perform to build the cluster:

  • Physical Node: It must have two network adapters (NICs); it must have access to the same storage (SAN LUN volumes) as the ESX server, since both the guest Virtual Machines and the physical machine must be able to see the shared volume; and the OS version and patches must be identical on all nodes, virtual or physical. Also note that no multipathing software should be running on the physical node.
  • Virtual Node: Here the steps are much the same. The ESX host must have at least two physical NICs, although three are advisable (one for the service console, two teamed/bonded NICs for redundancy). The VM must have two vNICs (virtual NICs): one for outbound traffic and one connected to a private VLAN carrying the high-speed interconnect for cache fusion.
  • Adding shared storage: Follow the same steps as in our article on clustering VMs across multiple ESX hosts. On the physical node this is simple: assign your HBA to the mapped SAN LUN and you are done. On the Virtual Machine, click Add Storage and choose Mapped SAN LUN, so that the hard disks point to the LUN using RDM (Raw Device Mapping). In the LUN selection, choose the same LUN (Logical Unit Number) that is being accessed by the physical node(s). Then place the virtual device node on a different SCSI controller, thereby creating a new SCSI controller. Edit the new SCSI controller's (1:0) properties and change the sharing mode to "physical". Carry out the same steps for all the shared disks (OCR.vmdk, VOTINGDISK.vmdk, SPFILEASM.vmdk, ASM01.vmdk and so on); a sketch of the resulting configuration follows this list. Upon clicking Finish, you are done.
  • The final step is obviously to install and
    configure the Oracle RAC clusterware and database.
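
For reference, here is a minimal sketch of what the relevant entries in one node's .vmx file could look like after these steps. The network names and the disk file name are illustrative assumptions, not values taken from this article:

    ethernet0.present = "TRUE"
    ethernet0.networkName = "Public"
    ethernet1.present = "TRUE"
    ethernet1.networkName = "RAC-Interconnect"
    scsi1.present = "TRUE"
    scsi1.virtualDev = "lsilogic"
    scsi1.sharedBus = "physical"
    scsi1:0.present = "TRUE"
    scsi1:0.fileName = "OCR.vmdk"
    scsi1:0.deviceType = "scsi-hardDisk"
    scsi1:0.mode = "independent-persistent"

The two entries that matter most are scsi1.sharedBus = "physical", which allows the disk to be opened from more than one host, and the independent-persistent disk mode, which keeps the shared disks out of snapshots.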

Upgrading your RAC cluster

Upgrading your ESX server or your cluster software is not an easy task. We will not go too deep into the ESX server upgrade itself, as that is out of the scope of this article, but will instead concentrate on several scenarios: upgrading clusters on one ESX server, across ESX hosts, or on a typical heterogeneous cluster (physical and virtual nodes):

  • Upgrading the cluster on one ESX host: Power off your VMs and let your system administrator upgrade your ESX server from 2.5 to 3.x. Then upgrade your VMFS2 volume to VMFS3: open the VI client, select the volume and click "Upgrade to VMFS3". Upgrade the shared RDM files if necessary, then right-click each clustered virtual machine in the inventory panel and click "Upgrade Virtual Hardware". Restart the cluster. Should you run into an error for any reason, try importing the backup vmdks like this:

    vmkfstools -i /vmfs/volumes/vol1/<old-disk>.vmdk /vmfs/volumes/vol2/<RACDir>/<new-disk>.vmdk

    Then rename the old-disk.vmdk, edit the .vmx file to point to the new-disk.vmdk, and restart the cluster; it should now come up successfully. These steps are sketched below.
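
    A hypothetical sequence (the volume, directory and VM names are placeholders carried over from the command above):

      # keep the original disk around as a backup
      mv /vmfs/volumes/vol1/old-disk.vmdk /vmfs/volumes/vol1/old-disk.vmdk.bak
      # repoint the VM at the imported disk by editing its .vmx file,
      # e.g. scsi1:0.fileName = "/vmfs/volumes/vol2/RACDir/new-disk.vmdk"
      vi /vmfs/volumes/vol2/RACDir/vmrac01.vmx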

  • Upgrading the cluster across ESX hosts: You can do this either with shared pass-through RDMs or with files on shared VMFS volumes.

    1. Using shared pass-through RDMs: First upgrade your ESX servers from 2.5 to 3.x. Via the VI client, upgrade your shared pass-through RDM files from VMFS2 to VMFS3, then right-click each clustered VM and select "Upgrade Virtual Hardware". Do the same for the boot disk and you are done. Power on your cluster and verify the upgrade.

    2. Using files in shared (VMFS2) volumes: Do the following before upgrading to VMFS3:

      vmkfstools -L lunreset vmhba<C:T:L>:0
      vmkfstools -F public vmhba<C:T:L:P>

      This makes the shared files public. Then perform the ESX host upgrades from ESX 2.5 to ESX 3.0. Select the first upgraded node, open its Configuration tab and click "Storage"; upgrade the VMFS2 volumes in your cluster by clicking "Upgrade to VMFS3". Next, create LUNs for each of the shared RAC disks, create an RDM for each shared disk, and import the virtual disk to this RDM:

      vmkfstools -i /vmfs/volumes/vol1/<old-disk>.vmdk
      /vmfs/volumes/vol2/<RACDir>/<rdm-for-vmrac01>/<myrdm.vmdk> -d
      rdmp:/vmfs/devices/disks/vmhbax.y.0

      Here:

      old-disk.vmdk: our RAC vmdk, which is to be imported.

      myrdm.vmdk: the new RDM for vmrac01 (our first node).

      vmhba1.2.3: the LUN that backs the RDM.

      Now edit the virtual machine's configuration file (vmrac01.vmx) to point to the RDM instead of the shared file, along these lines:

      scsi<X>:<Y>.fileName = "<rdm-for-vmrac01>/<myrdm.vmdk>"

      Restart the cluster and check that it comes up healthy; a before-and-after sketch of this edit follows.
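
To make that edit concrete, here is a hypothetical before and after for the first shared disk (the paths are illustrative, reusing the placeholder names from the import command above):

    Before (pointing at the shared VMFS2 file):

      scsi1:0.fileName = "/vmfs/volumes/vol1/old-disk.vmdk"

    After (pointing at the new RDM):

      scsi1:0.fileName = "/vmfs/volumes/vol2/RACDir/rdm-for-vmrac01/myrdm.vmdk"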

Conclusion

Although VMware ESX Server offers several models of clustering and HA, we should not forget that some mission-critical applications like Oracle RAC cannot be fully replaced by OS-level or even infrastructure-level clustering and high availability. The whole purpose of demonstrating Oracle RAC on ESX is not only to solidify the business case for a consolidated setup for test and development purposes, but also to show that you as an administrator can have "RAC running under your desk!" The fact that we can set up, run, test and benchmark mission-critical applications on our own premises gives us the power to stay on top of our applications and businesses.
