Brief intro
We will take a little detour from our regular path of
following the manual, some would say, to more practical path. One topic that
most intrigues both an Oracle DBA and their clients (in their own way) is connectivity.
DBAs are sometimes puzzled with the load balancing and failover part of connectivity.
Where is it, on the Client-side OR the Server-side? Is tnsnames.ora
still useful if I deploy the Easy Connect Net Naming method? Let’s brush up on
some basics.
Easy Connect Naming Methods
Easy Connect Naming methods were introduced
in Oracle 10g. The benefits were simple. The need to do a service name lookup
in the tnsnames.ora files for
TCP/IP environment was eliminated, a directory naming system is no longer
needed and clients are connected automatically. Connecting is also fairly
simple:
Syntax:
CONNECT username/password@[//]host[:port][/service_name]
So our typical examples could look like:
CONNECT oe/oe@localhost/nickserv CONNECT scott/tiger@172.22.202.100/examples
Connect-Time Failover
From the clients end, when your connection fails at one
node or service, you can then do a look up from your tnsnames.ora
file and go on seeking a connection with the other available node. Take this
example of our 4-node VMware ESX 3.x Oracle Linux Servers:
FOKERAC = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = nick01.wolga.com)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = nick02.wolga. com)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = brian01.wolga. com)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = brian02.wolga. com)(PORT = 1521)) (CONNECT_DATA = (SERVICE_NAME = fokerac) ) )
Here the first address in the list is tried
at the client’s end. Should the connection to nick01.wolga.nl fail, then the
next address, nick02.wolga.nl, will be tried. This phenomenon is called
connection-time failover. You could very well have a 32-node RAC cluster
monitoring the galactic system at NASA and thus have all those nodes typed in
your tnsnames.ora file. Moreover,
these entries do not necessarily have to be part of the RAC cluster. So
it is possible that you are using Streams, Log Shipping or Advanced Replication
to maintain your HA (High Availability) model. These technologies facilitate
continued processing of the database by such a HA (High Availability) model in
a non-RAC environment. In a RAC environment we know (and
expect) the data to be the same across all nodes since there is only one
database.
TAF (Transparent Application Failover)
Transparent Application Failover actually
refers to a failover that occurs when a node or instance is unavailable due to an
outage or other reason that prohibits a connection to be established on that
node. This can be set to on with the following parameter FAILOVER. Setting it to ON will activate the TAF. It is turned on by
default unless you set it to OFF to disable it. Now, when you turn it on you have two types of
connections available by the means of the FAILOVER_MODE parameter. The type can be session, which is default or select.
When the type is SESSION, if the
instance fails, then the user is automatically connected to the next available
node without the user’s manual intervention. The SQL statements need to be
carried out again on the next node. However, when you set the TYPE to SELECT, then if you are connected and are in the middle of your query,
then your query will be restarted after you have been failed over to the next
available node. Take this example of our tnsnames.ora file, (go to the section beginning with CONNECT_DATA):
(CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = fokerac.wolga.com) (FAILOVER_MODE = (TYPE = SELECT) (METHOD = BASIC) (RETRIES = 180) (DELAY = 5) ) )
This is also demonstrated in this image from my
article on Quest’s soRAC tool.
This is more intelligent in the sense that
should the instance, (where the user has fired a query), go down then the
session will remember the rows in its Global Buffer Cache, restart the query
and pick up from where the user left off when they were disconnected from the
first node.
Compare it to a scenario where the manager is
running the query through a client such as Crystal Reports or Oracle Reports.
Running the monthly query to get the data from his Server Monitoring software, let’s
say NetIQ, to get the statistics of all the servers the DBA needs to draw a
report on monitoring the CPU, memory or disk usage. Now if he ran a query that
had to fetch 5000 rows and he gets cut off at 3000 rows, he still has to
reissue (not manually though) start from the beginning but will only see the
results when he crosses the 3001 mark. Then upon prompt, the results will be
shown. However, this does not apply to the DML (Data Manipulation Language) such
as insert, update, delete. The user is met with an error if node failure
happens in the middle of the processing and all the (uncommitted) transactions
are rolled back. The session eventually fails over without the user having to
connect, however, he does have to issue the query again.
Preconnect Method
Consider our tnsnames.ora file again. Here in our older example, where the METHOD was BASIC, there is some additional time loss in reconnecting the lost
connection to the immediate available node. However, when you set the METHOD to PRECONNECT, a session is connected to all available nodes in the list. The
advantage is that failover is a lot quicker than the BASIC method but the downside is that the other instances will continue
to claim necessary resources such a system memory to keep those connections
alive.
(CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = fokerac.wolga.com) (FAILOVER_MODE = (TYPE = SELECT) (METHOD = PRECONNECT) (RETRIES = 180) (DELAY = 5) ) )
Conclusion
As a DBA, these little things are very crucial in deciding
what you intend to do with the RAC. Is it a DSS or will it be used for a heavy
OLTP environment. On a DSS, you can easily choose a PRECONNECT
method as you expect to have less connections with long durations (batch
processing, etc.) where as a typical OLTP will perform better if the cluster is
optimally configured with a BASIC method.