Oracle RAC Administration - Part 16: Balancing act between Server and Client
March 6, 2007
We will take a little detour from our regular path of following the manual, some would say, to more practical path. One topic that most intrigues both an Oracle DBA and their clients (in their own way) is connectivity. DBAs are sometimes puzzled with the load balancing and failover part of connectivity. Where is it, on the Client-side OR the Server-side? Is tnsnames.ora still useful if I deploy the Easy Connect Net Naming method? Let's brush up on some basics.
Easy Connect Naming Methods
Easy Connect Naming methods were introduced in Oracle 10g. The benefits were simple. The need to do a service name lookup in the tnsnames.ora files for TCP/IP environment was eliminated, a directory naming system is no longer needed and clients are connected automatically. Connecting is also fairly simple:
So our typical examples could look like:
CONNECT oe/oe@localhost/nickserv CONNECT firstname.lastname@example.org/examples
From the clients end, when your connection fails at one node or service, you can then do a look up from your tnsnames.ora file and go on seeking a connection with the other available node. Take this example of our 4-node VMware ESX 3.x Oracle Linux Servers:
FOKERAC = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = nick01.wolga.com)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = nick02.wolga. com)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = brian01.wolga. com)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = brian02.wolga. com)(PORT = 1521)) (CONNECT_DATA = (SERVICE_NAME = fokerac) ) )
Here the first address in the list is tried at the clients end. Should the connection to nick01.wolga.nl fail, then the next address, nick02.wolga.nl, will be tried. This phenomenon is called connection-time failover. You could very well have a 32-node RAC cluster monitoring the galactic system at NASA and thus have all those nodes typed in your tnsnames.ora file. Moreover, these entries do not necessarily have to be part of the RAC cluster. So it is possible that you are using Streams, Log Shipping or Advanced Replication to maintain your HA (High Availability) model. These technologies facilitate continued processing of the database by such a HA (High Availability) model in a non-RAC environment. In a RAC environment we know (and expect) the data to be the same across all nodes since there is only one database.
TAF (Transparent Application Failover)
Transparent Application Failover actually refers to a failover that occurs when a node or instance is unavailable due to an outage or other reason that prohibits a connection to be established on that node. This can be set to on with the following parameter FAILOVER. Setting it to ON will activate the TAF. It is turned on by default unless you set it to OFF to disable it. Now, when you turn it on you have two types of connections available by the means of the FAILOVER_MODE parameter. The type can be session, which is default or select. When the type is SESSION, if the instance fails, then the user is automatically connected to the next available node without the users manual intervention. The SQL statements need to be carried out again on the next node. However, when you set the TYPE to SELECT, then if you are connected and are in the middle of your query, then your query will be restarted after you have been failed over to the next available node. Take this example of our tnsnames.ora file, (go to the section beginning with CONNECT_DATA):
(CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = fokerac.wolga.com) (FAILOVER_MODE = (TYPE = SELECT) (METHOD = BASIC) (RETRIES = 180) (DELAY = 5) ) )
This is also demonstrated in this image from my article on Quests soRAC tool.
This is more intelligent in the sense that should the instance, (where the user has fired a query), go down then the session will remember the rows in its Global Buffer Cache, restart the query and pick up from where the user left off when they were disconnected from the first node.
Compare it to a scenario where the manager is running the query through a client such as Crystal Reports or Oracle Reports. Running the monthly query to get the data from his Server Monitoring software, let's say NetIQ, to get the statistics of all the servers the DBA needs to draw a report on monitoring the CPU, memory or disk usage. Now if he ran a query that had to fetch 5000 rows and he gets cut off at 3000 rows, he still has to reissue (not manually though) start from the beginning but will only see the results when he crosses the 3001 mark. Then upon prompt, the results will be shown. However, this does not apply to the DML (Data Manipulation Language) such as insert, update, delete. The user is met with an error if node failure happens in the middle of the processing and all the (uncommitted) transactions are rolled back. The session eventually fails over without the user having to connect, however, he does have to issue the query again.
Consider our tnsnames.ora file again. Here in our older example, where the METHOD was BASIC, there is some additional time loss in reconnecting the lost connection to the immediate available node. However, when you set the METHOD to PRECONNECT, a session is connected to all available nodes in the list. The advantage is that failover is a lot quicker than the BASIC method but the downside is that the other instances will continue to claim necessary resources such a system memory to keep those connections alive.
(CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = fokerac.wolga.com) (FAILOVER_MODE = (TYPE = SELECT) (METHOD = PRECONNECT) (RETRIES = 180) (DELAY = 5) ) )
As a DBA, these little things are very crucial in deciding what you intend to do with the RAC. Is it a DSS or will it be used for a heavy OLTP environment. On a DSS, you can easily choose a PRECONNECT method as you expect to have less connections with long durations (batch processing, etc.) where as a typical OLTP will perform better if the cluster is optimally configured with a BASIC method.