Oracle RAC Administration : Backing up your RAC with RMAN
March 15, 2007
VMware has this very cute VCB utility, although on the ESX 3.0.1 server where I am currently building my RAC, I just use FTP/SCP to copy the files to a network drive. This is handy, as I need to do all kinds of things: remove a node, simulate rebuilding and recovering RAC files, and just about every other trick you see in the manual. We are determined to break our RAC in the test environment and build it back up, and we always keep our backup. Always! In a regular production environment, however, you would go nuts copying tons of data just to take a backup. Better and faster methods are discovered every day in the storage world, but let's analyze the RMAN utility and how to use it in our Oracle RAC environment.
RMAN in your toolkit
Whether you are a single-node Oracle DBA or a multi-node RAC farm DBA, you will, as I mentioned above, be confronted with all kinds of alternatives. Sticking to RMAN is the best choice. Why? Whether you use raw partitions or cooked file systems (OCFS, NFS), RMAN performs the backup by reading the datafiles through the Oracle server's memory buffers, checking every block as it reads it, so corrupt blocks are caught at backup time rather than silently copied.
Although RMAN does not care whether it is backing up a RAC or a single-node database, there are still a few little things to watch out for in a RAC environment.
Adequate RMAN configuration for RAC
Every backup plan has a goal: a speedy recovery. Depending on your strategy, you need to configure your RAC accordingly. RMAN connects through regular Oracle Net Services like any other client, but it requires a dedicated server connection, and all it needs is one node of the cluster. Moreover, you can allocate channels on each node of your RAC with commands such as these:
allocate channel x1 type sbt connect sys/password@nickrac01;
allocate channel x2 type sbt connect sys/password@nickrac02;
allocate channel x3 type sbt connect sys/password@nickrac03;
allocate channel x4 type sbt connect sys/password@nickrac04;
That way you can distribute the workload across the nodes instead of pushing all of the I/O-intensive work through the one connected node. Again, it depends on your architectural setup and backup policy. It could very well be that you want to run everything through an nth node that has an additional 10 Gbps card connected to your SAN, perhaps a node supporting a typical DSS system that is not under stress during the scheduled job, or, in an OLTP environment, a node that purely services backup. This is just an example scenario, but as I mentioned, discuss it thoroughly with your system admins and SAN admin (should you have a SAN, that is) before settling on your backup strategy. While some environments may respond well to distributed RMAN activity (a typical non-24x7 environment comes to mind), a single node might be best for a heavy 24x7 OLTP environment.
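To make the idea concrete, here is a minimal sketch of a full backup with the work spread across two nodes, reusing the nickrac01/nickrac02 aliases from above and assuming your media management layer is already linked in; RMAN releases the channels automatically at the end of the run block:

run {
  allocate channel x1 type sbt connect sys/password@nickrac01;
  allocate channel x2 type sbt connect sys/password@nickrac02;
  backup database plus archivelog;
}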
What about the snapshot controlfile?
Having the snapshot controlfile on a shared file system, like the 1GB OCFS volumes we created for the OCR, the voting disks or spfileasm, makes more sense if you need to run backups from multiple nodes. Alternatively, you could use the same local file destination on all nodes. To see the snapshot controlfile location:
RMAN> show snapshot controlfile name;
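Should you decide to move it to shared storage, the command is a one-liner; the OCFS path below is just an assumed example, not a location from our build:

RMAN> configure snapshot controlfile name to '/ocfs/oradata/snap_nickrac.scf';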
Backing up datafiles
You may back up your datafiles from any node, since they sit on ASM or OCFS, that is, on shared storage, so the backup should go smoothly regardless of the node where RMAN runs. Be aware of the tape scenario if you still back up to tape: check the configuration of your third-party SBT interface and the media management layer. If you back up to disk, make sure the destination is shareable by all nodes, or lives in a SAN zone to which the HBAs of all nodes are connected, so that a single node failure does not strand your backups. There are several ways of providing backup space.
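For a disk backup that any surviving node can read, a minimal sketch, where the /ocfs/backup destination is an assumption for illustration (%U lets RMAN generate unique piece names):

RMAN> backup device type disk format '/ocfs/backup/%U' database;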
Backing up archivelog files
There are several ways of backing up archivelog files as well. Archive logs are best placed on a shared cluster file system, such as an OCFS volume dedicated to the archive logs of all nodes; that makes it easy for RMAN to connect to the shared volume and back those files up from there. Then, depending on your architectural decision, you can either allocate a channel per node or use one node. The same can be done on NFS volumes, with all of the archive logs backed up by RMAN from there. Alternatively, you can archive to DAS (Direct Attached Storage) or local disks, but as I said, it is recovery we are talking about, not merely backup, and a DAS archive log setup can be extremely cumbersome during recovery. Setting it up looks similar to the strategies above, such as allocating channels at individual nodes to perform the archivelog backup.
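A sketch of the channel-per-node variant, again reusing the earlier aliases; with a channel on each node, RMAN can pair every archive log with a node that can actually read it, and delete input removes the logs once they are safely backed up:

run {
  allocate channel a1 type sbt connect sys/password@nickrac01;
  allocate channel a2 type sbt connect sys/password@nickrac02;
  backup archivelog all delete input;
}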
What about the FRA (Flash Recovery Area)?
In a typical single-node environment a flash recovery area might sound a little overdone, but it is important, no doubt about it. In RAC, it is even more important. When we install our database (we will come to that in the next articles, as many of our readers have asked us about VMware with Oracle Linux installation), we are given the option of choosing a recovery area. It is very wise to choose an OCFS-formatted drive for your flash recovery area; you can also choose ASM. With the FRA, your backups stay reachable from all nodes should the node you are operating from fail. It is this simple design that helps you toward a seamless recovery.
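For illustration, pointing the FRA at shared storage takes two settings from SQL*Plus; the 10G size and the +FRA disk group name here are assumptions, and the values must be the same for every instance (hence sid='*'):

SQL> alter system set db_recovery_file_dest_size=10g scope=both sid='*';
SQL> alter system set db_recovery_file_dest='+FRA' scope=both sid='*';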
Here we have touched on backup in RAC very lightly, going quickly through several scenarios and backup possibilities. One thing to remember is that a successful backup strategy makes no sense if recovery was not thought through. Backup strategies are often misunderstood by many administrators: we should first ask ourselves what our recovery strategy is, and let that force us into a sound, well-thought-out backup strategy. For instance, use unique naming of the per-node log_archive_dest files for easy NFS mounting from any node, or eliminate SPOFs (Single Points Of Failure) when recovering archive files from a given node (which might fail) by cross-copying them to adjacent nodes for redundancy. And stick to RMAN for its intelligent automated tasks.
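As a final sketch of that unique-naming point: the thread number (%t) in log_archive_format keeps each node's archive logs distinguishable no matter where they end up mounted; the format string itself is an assumed example, and the parameter is static, so it goes to the spfile:

SQL> alter system set log_archive_format='%t_%s_%r.arc' scope=spfile sid='*';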