Oracle RAC Administration: Backing up your RAC with RMAN

Brief intro

VMware has this very cute VCB utility. Although on the ESX
3.0.1 server where I am currently building my RAC, I just use FTP/SCP to copy
the files to a network drive. This is handy, as I need to do all kinds of
things: remove a node, simulate rebuilding and recovering RAC files, and
just about every other trick you see in the manual. We are determined to
break our RAC in the test environment and build it back up. In addition, we
always keep our backup. Always! In a regular production environment, however,
you would go nuts copying tons of data just to back up. Better and faster
methods are discovered every day in the storage world, but let's
analyze the RMAN utility and how to use it in our Oracle RAC environment.

RMAN in your toolkit

Whether you are a single-node Oracle DBA or a multi-node
RAC farm DBA, you will be, as I mentioned above, confronted with all kinds of
alternatives. Sticking to RMAN is the best choice. Why? Whether you use raw
partitions or cooked partitions (OCFS, NFS), RMAN performs the backup by
reading the datafiles through Oracle's own server processes, validating each
block as it goes. This ensures that every block is read and checked, so corrupt
blocks are caught during the backup rather than discovered at recovery time.

RMAN itself does not care whether it is backing up a RAC
or a single-node database, but there are still a few small things to watch
out for in a RAC environment.

Adequate RMAN configuration for RAC

Every backup plan has a goal, which is to
provide a speedy recovery, and depending on your strategy, you
need to configure your RAC accordingly. RMAN requires a dedicated
server connection, made like any other client connection through regular
Oracle Net Services, which means all it needs is one
node of the cluster. Moreover, you can allocate channels on each node of your
RAC with commands such as:

allocate channel x1 type sbt connect sys/password@racnode1;
allocate channel x2 type sbt connect sys/password@racnode2;
allocate channel x3 type sbt connect sys/password@racnode3;
allocate channel x4 type sbt connect sys/password@racnode4;

That way you can distribute the workload across
nodes instead of funneling all of the I/O-intensive work through the one
connected node. Again, it depends on your architectural setup and backup
policy. It could very well be that you want to do it via an nth node that
has an additional 10 Gbps card connected to your SAN; perhaps that node
supports a typical DSS system that is idle during the scheduled job, or, in
an OLTP environment, it purely services backups. That is just an example
scenario, but as I mentioned, discuss it thoroughly with your system admins
and SAN admin (should you have a SAN, that is) before working on your backup
strategy. While some environments respond well to distributed RMAN
activity (a typical non-24×7 environment comes to mind), a single node might
be best for a heavy 24×7 OLTP environment.

What about snapshot controlfile?

Having the snapshot controlfile on a shared file system,
like the 1GB OCFS volumes we created for the OCR, voting disks or spfileasm,
makes more sense if you need to run backups from multiple nodes.
Alternatively, you could use the same local destination path on every node.
To see the snapshot controlfile location:

RMAN> SHOW SNAPSHOT CONTROLFILE NAME;
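
If the reported location is node-local, you can point it at a shared path
instead. A minimal sketch, assuming a hypothetical OCFS mount /ocfs/rman
that is visible from all nodes:

RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/ocfs/rman/snapcf_racdb.f';

Run this once from any node; the setting is stored persistently for the
target database, so it applies no matter which instance later starts the backup.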

Backing up datafiles

You may back up your datafiles from any node, since
they live on ASM or OCFS anyway, which means they are on shared
storage. So this is expected to go smoothly regardless of the node from
which RMAN runs. Be aware of the tape scenario if you still back up to tape:
check the configuration of your 3rd party SBT interface and the media
management layer. If backing up to disk, make sure the destination is
shareable by all nodes, or a zone on the SAN to which the HBAs of all nodes
are connected, so that a single node failure does not strand your backups.
There are several ways of providing backup space.
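
Putting the pieces together, a distributed datafile backup might look like
the following sketch. The service names, password and tape channel type are
assumptions; adjust them to your own environment:

run {
  allocate channel x1 type sbt connect sys/password@racnode1;
  allocate channel x2 type sbt connect sys/password@racnode2;
  backup database format 'db_%U';
}

With two channels allocated on two different nodes, RMAN spreads the
datafile reads across both instances automatically.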

Backing up Archivelog files

There are several ways of backing up archivelog
files as well. Archive logs are best placed on a shared cluster volume, such
as an OCFS volume dedicated to the archive logs of all nodes, which makes it
easy for RMAN to connect to the shared volume and back up those files from
there. Then, depending on your architectural decision, you can either
allocate a channel per node or use one node. The same thing can be done on
NFS volumes, with all of the archive logs backed up by RMAN from there.
Alternatively, you can archive the files to DAS (Direct
Attached Storage) or local disks, but as I said, it is the recovery we are
talking about, not merely the backup, and a DAS archive log setup can be
extremely cumbersome during recovery. Setting it up is similar to the
above strategies: allocate channels at individual nodes to perform the
backup of the archivelog files.
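
With a shared archive destination a single node can sweep everything, while
node-local destinations need one channel per node so each instance's logs are
reachable. A sketch of the per-node variant (connect strings are assumptions):

run {
  allocate channel a1 type sbt connect sys/password@racnode1;
  allocate channel a2 type sbt connect sys/password@racnode2;
  backup archivelog all delete input;
}

The delete input clause removes the archive logs once they are safely backed
up, which keeps the archive destination from filling up between runs.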

What about the FRA (Flash
Recovery Area)?

In a
typical single-node environment it might sound a little overdone to have the
flash recovery area, but it is important, no doubt about it. In RAC, it is
even more important. When we go about installing our database (we will come
to that as well in the next articles, as many of our readers have asked us
for VMware with Oracle Linux installation), we are given the option of
choosing a recovery area. It is very wise to choose an OCFS-formatted drive
for your flash recovery area; you can also choose ASM. With the FRA on
shared storage, your backups remain reachable from all nodes should the node
you are operating from fail. It is this simple design that will help you
with a seamless recovery.
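
Configuring the FRA comes down to two instance parameters, which must point
at the same shared location on every instance. A sketch, assuming a
hypothetical shared ASM disk group +FRA (an OCFS path works the same way):

SQL> ALTER SYSTEM SET db_recovery_file_dest_size = 10G SCOPE=both SID='*';
SQL> ALTER SYSTEM SET db_recovery_file_dest = '+FRA' SCOPE=both SID='*';

The SID='*' qualifier applies the setting to all instances, which is exactly
what you want here: every node sees the same recovery area. Note that the
size parameter must be set before the destination itself.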


Here we have touched on backup in RAC very lightly, going
quickly through several scenarios and backup possibilities. One thing to
remember is that a successful backup strategy makes no sense if the recovery
was not thought through. Backup strategies are often misunderstood by many
administrators. We should first ask ourselves what our recovery strategy
needs to be, and let that force us into a sound, well-thought-out backup
strategy. For instance: unique naming of the per-node log_archive_dest
directories for easy NFS mounting from any node, or getting rid of SPOFs
(Single Points Of Failure) by cross-copying archive files from a given node
(which might fail) to adjacent nodes for redundancy. In addition, stick to
RMAN for its intelligent, automated handling of both backup and recovery.
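
The per-node naming idea can be sketched like this; the paths and instance
names are assumptions for illustration only:

SQL> ALTER SYSTEM SET log_archive_dest_1 = 'LOCATION=/u02/arch/racdb1' SID='racdb1';
SQL> ALTER SYSTEM SET log_archive_dest_1 = 'LOCATION=/u02/arch/racdb2' SID='racdb2';

If /u02/arch is NFS-exported, any surviving node can mount both directories
and hand RMAN every instance's archive logs during a recovery.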


See All Articles by Columnist
Tarry Singh

