Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

Oracle

Posted Mar 21, 2007

Oracle 10gR2 RAC Load Balancing Features - Page 2

By Jim Czuprynski

Advanced Load Balancing: The Load Balancing Advisor (LBA)

In a RAC environment, it’s not unlikely that one node may become overwhelmed by application requests for resources. For example, let’s assume that a two-node clustered database currently has 100 user sessions connected across both nodes in the cluster, and that the database is using standard server-side load balancing. If there is a sudden “storm” of 200 additional connections, the listeners on the two nodes will simply distribute them evenly across both nodes, 100 to each node, resulting in 150 connections per node. However, it’s possible that node RACDB2 is actually much “busier” than the other node. As a result, node RACDB2 will most likely be completely overwhelmed by those new connections’ resource demands, while node RACDB1 remains relatively underutilized with plenty of additional resources.

The good news is that Oracle 10gR2 now provides an advanced method to overcome this imbalance: the Load Balancing Advisor (LBA). The LBA will calculate how much work each instance in the clustered database has been asked to do. The LBA will then make sure that any new incoming work is routed to the instance(s) that are least busy. This determination takes place on a service-by-service basis across all instances that are providing resources to each service.

The LBA considers the following factors when determining how to perform this “balancing act”:

  • Are there any differences in processing power between nodes?
  • Are there any sessions that are currently blocked from execution because of waits?
  • Have any failures occurred on a node that might block processing from continuing?
  • Are there any services that are competing for resources, and have those services been granted a different priority to complete their tasks at hand?

Not only does this help to balance out the total workload across all instances in the cluster, it also insures that one node won’t be overwhelmed by requests for service by its primary application(s).

As part of its Fast Application Notification (FAN) event transmission features, Oracle 10g RAC uses Oracle Notification Services (ONS) to communicate the status of cluster resources to all participating node applications in the cluster. In Oracle 10gR2, FAN added a new event type, SERVICE_METRIC, whose event payload contains information about the relative workload of each node in the RAC cluster, and it’s this information that the Load Balancing Advisory uses to determine how to route new connections throughout the clustered database’s instances.

Oracle 10gR2 supplies an upgraded DBMS_SERVICE.MODIFY_SERVICE procedure that modifies an existing RAC service so that it can utilize the Load Balance Advisory for service-by-service load balancing. This procedure provides two crucial parameters, GOAL and CLB_GOAL, that interact to determine what methods (if any) Oracle 10gR2 will utilize to perform load balancing. For example, if I set the GOAL parameter to either the GOAL_SERVICE_TIME or the GOAL_THROUGHPUT enumerated constants for a specific RAC service, Oracle 10gR2 will activate the Load Balancing Advisory for load balancing of that service’s incoming connections. Table 1 explains the difference between these two load balancing targets.

Table 1. Oracle 10gR2 Load Balancing Advisory (LBA) Service Goals

LBA Goal

Explanation

GOAL_NONE

Setting DBMS_SERVICE.GOAL to this value disables the LBA.

GOAL_SERVICE_TIME

The LBA calculates a weighted moving average of the total elapsed time for completed work plus the bandwidth that’s available to the service to calculate the service goodness. This goal is ideal for services whose workload may change dramatically over a short period of time, e.g. an application that services a “clicks and mortar” store that provides customer self-service through an internet-based shopping web site.

GOAL_THROUGHPUT

The LBA calculates a weighted moving average of throughput (i.e. the rate at which work is completed) in addition to the bandwidth available to the service to calculate the service goodness. This goal is best suited for long-duration tasks that are typically queued to run serially, e.g. scheduled jobs that handle large batches of transactions.

In addition, the CLB_GOAL parameter provides yet another load balancing method: It tells the LBA how to perform connection load balancing. See Table 2 for a list of its acceptable enumerated constant values and the impact of setting either.

Table 2. Oracle 10gR2 Connection Load Balancing (CLB) Goals

CLB Goal

Explanation

CLB_GOAL_SHORT

The Load Balancing Advisory will be used for connection load balancing only if it is enabled (i.e. set to other than GOAL_NONE). If the LBA has been disabled, connection load balancing will utilize abridged advice determined by CPU utilization.

CLB_GOAL_LONG

Connection load balancing will be determined by first tallying the total number of connections per instance, and then by counting the number of sessions per each service. Oracle recommends using this setting for services whose applications tend to connect for long periods of time (e.g. Oracle Forms). The Load Balancing Advisory can be used in conjunction with this setting as long as the connection pool has been sized to accommodate “gravitation “ within the pool without adding or subtracting connections. Oracle recommends this option as the most efficient design.

Depending on the settings for these two parameters, the LBA uses the metrics shown in Table 3 to make its determination of which instance should receive the next set of incoming connections:

Table 3. Oracle 10gR2 Load Balancing Advisory (LBA) Metrics

LBA Goal

LBA Metric Used

Explanation

LONG

Session Count By Instance

The LBA uses this metric to evenly distribute sessions across the cluster’s nodes. This works best for services that are distributed uniformly across the cluster’s instances, and the nodes have similar capacities.

LONG

Node Run Queue Length

If the service uses only a subset of the RAC instances, and the nodes have dissimilar capacities, then the LBA will attempt to place more sessions on the node with the least load at the time that the connection was created.

SHORT

Goodness By Service

This method incorporates a ranking for the quality of service that the service is experiencing at an instance level, including whether access has been restricted from an instance. It attempts to prevent a listener from routing all connections to the same instance in between updates to the goodness value because each listener will adjust its local goodness rating – computed as a delta value - as connections are distributed across the cluster. This delta represents the average of resource time connections are consuming when they use that service. Also, to prevent a “storm surge” of logins to the least busy instance, Oracle 10gR2 will compute a threshold delta if the goodness rating delta value is too low.

Proof of Concept: LBA in Oracle 10gR2 RAC

To enable my RAC database environment for LBA, I’ll first activate three different levels of load balancing using DBMS_SERVICE.MODIFY_SERVICE against the three new services I just created:

  • The ADHOC service has been set up with no LBA load balancing (GOAL=GOAL_NONE) and no connection load balancing, either (CLB_GOAL=CLB_GOAL_LONG).
  • The DSS service will use the LBA for load balancing with optimized service time as its goal (GOAL=GOAL_SERVICE_TIME and CLB_GOAL=CLB_GOAL_SHORT).
  • Finally, the OLTP service will use the LBA for load balancing with a goal of throughput (GOAL=GOAL_THROUGHPUT and CLB_GOAL=CLB_GOAL_SHORT).

Listing 5 shows the code I’ve used to activate LBA management for these services, as well as a query against the DBA_SERVICES view plus the resulting output that verifies the proper settings for GOAL and CLB_GOAL for these services.

Now that these three services are configured for LBA, I’ll use a combination of shell scripts to place a load on one of the two nodes for this RAC clustered database and then attempt to connect to both nodes in the cluster using the three RAC services. I’ll first execute the ConstantLoadGenerator.sh shell script to create 100 sessions that execute a SQL script that places a moderately heavy load on the database’s CPU, and then I’ll execute shell script RandomLoadGenerator.sh to simulate 40 additional user connections for one of the three services.

How can I determine what the impact will be of running these tests? Oracle 10gR2’s GV$SERVICEMETRIC global view shows each service’s “goodness” on each instance in the RAC clustered database, the service’s “predicted” goodness (or delta), and summary statistics on how much resources the service consumed during the time period. Listing 6 shows the query I’ve constructed against that view; here’s the result of that query before I started my evaluations:

                                        Current Service-Level Metrics
                                           (From GV$SERVICEMETRIC)
 
                                               Pred-               CPU     Elpsd      # 0f
                                               icted              Time      Time      User
                                               Good-               Per       Per     Calls    DBTime
Service                         Inst    Good    ness              Call      Call       Per       Per
Name     Start Time End Time      ID    ness    Incr   Flags     (mus)     (mus)    Second    Second
-------- ---------- ---------- ----- ------- ------- ------- --------- --------- --------- ---------
ADHOC 15:06:41 15:06:48 1 0 1 0 0 0 0 0 15:06:23 15:06:29 2 0 1 0 0 0 0 0 15:05:02 15:06:02 1 0 1 0 0 0 0 0 15:04:37 15:05:37 2 0 1 0 0 0 0 0 DSS 15:06:41 15:06:48 1 100 100 4 0 0 0 0 15:06:23 15:06:29 2 100 100 4 0 0 0 0 15:05:02 15:06:02 1 100 100 4 0 0 0 0 15:04:37 15:05:37 2 100 100 4 0 0 0 0 OLTP 15:06:41 15:06:48 1 0 100 4 0 0 0 0 15:06:23 15:06:29 2 0 100 4 0 0 0 0 15:05:02 15:06:02 1 0 100 4 0 0 0 0 15:04:37 15:05:37 2 0 100 4 0 0 0 0

Here’s the results of the first test against the ADHOC RAC service. While my tests executed, I noticed that Oracle 10gR2 continued to route connections to each node in the cluster regardless of the load on each instance, and the number of connections remained relatively constant on both nodes as new connections arrived. Here are the results of a query against GVSERVICE_METRIC for that test:

>>> Output after ADHOC unit testing:
                                        Current Service-Level Metrics
                                           (From GV$SERVICEMETRIC)
                                               Pred-               CPU     Elpsd      # 0f
                                               icted              Time      Time      User
                                               Good-               Per       Per     Calls    DBTime
Service                         Inst    Good    ness              Call      Call       Per       Per
Name     Start Time End Time      ID    ness    Incr   Flags     (mus)     (mus)    Second    Second
-------- ---------- ---------- ----- ------- ------- ------- --------- --------- --------- ---------
ADHOC 15:10:48 15:10:52 1 34 1 0 6481 69808 65 454 ADHOC 15:10:36 15:10:42 2 26 1 0 1106 3419 92 31 ADHOC 15:09:02 15:10:02 1 34 1 0 0 0 0 0 ADHOC 15:08:38 15:09:38 2 26 1 0 0 0 0 0
DSS 15:10:48 15:10:52 1 100 100 4 0 0 0 0 DSS 15:10:36 15:10:42 2 100 100 4 0 0 0 0 DSS 15:09:02 15:10:02 1 100 100 4 0 0 0 0 DSS 15:08:38 15:09:38 2 100 100 4 0 0 0 0 OLTP 15:10:48 15:10:52 1 0 100 4 0 0 0 0 OLTP 15:10:36 15:10:42 2 0 100 4 0 0 0 0 OLTP 15:09:02 15:10:02 1 0 100 4 0 0 0 0 OLTP 15:08:38 15:09:38 2 0 100 4 0 0 0 0

For the tests I ran against the DSS and OLTP RAC services, however, I noticed that Oracle 10gR2 definitely tended to route connections to the node that was least loaded, and this was reflected in the query results against the GVSERVICE_METRIC global view for these tests:

>>> Output after DSS unit testing:
                                        Current Service-Level Metrics
                                           (From GV$SERVICEMETRIC)
                                               Pred-               CPU     Elpsd      # 0f
                                               icted              Time      Time      User
                                               Good-               Per       Per     Calls    DBTime
Service                         Inst    Good    ness              Call      Call       Per       Per
Name     Start Time End Time      ID    ness    Incr   Flags     (mus)     (mus)    Second    Second
-------- ---------- ---------- ----- ------- ------- ------- --------- --------- --------- ---------
ADHOC    15:12:48   15:12:53       1       0       1       0         0         0         0         0
ADHOC    15:12:37   15:12:42       2       0       1       0         0         0         0         0
ADHOC    15:11:06   15:12:06       2       0       1       0         0         0         0         0
ADHOC    15:11:02   15:12:02       1       0       1       0     49585    322550         2        54
DSS 15:12:48 15:12:53 1 900 9947 0 26051 200992 14 273 DSS 15:12:37 15:12:42 2 2050 9901 0 4301 11800 56 66 DSS 15:11:06 15:12:06 2 2050 9901 0 885 916 2 0 DSS 15:11:02 15:12:02 1 900 9947 0 0 0 0 0
OLTP 15:12:48 15:12:53 1 0 100 4 0 0 0 0 OLTP 15:12:37 15:12:42 2 0 100 4 0 0 0 0 OLTP 15:11:06 15:12:06 2 0 100 4 0 0 0 0 OLTP 15:11:02 15:12:02 1 0 100 4 0 0 0 0
>>> Output after OLTP unit testing:
 
                                        Current Service-Level Metrics
                                           (From GV$SERVICEMETRIC)
 
                                               Pred-               CPU     Elpsd      # 0f
                                               icted              Time      Time      User
                                               Good-               Per       Per     Calls    DBTime
Service                         Inst    Good    ness              Call      Call       Per       Per
Name     Start Time End Time      ID    ness    Incr   Flags     (mus)     (mus)    Second    Second
-------- ---------- ---------- ----- ------- ------- ------- --------- --------- --------- ---------
ADHOC    15:14:48   15:14:53       1       0       1       0         0         0         0         0
ADHOC    15:14:36   15:14:42       2       0       1       0         0         0         0         0
ADHOC    15:13:06   15:14:06       2       0       1       0         0         0         0         0
ADHOC    15:13:02   15:14:02       1       0       1       0         0         0         0         0
DSS      15:14:48   15:14:53       1     100     100       4         0         0         0         0
DSS      15:14:36   15:14:42       2     100     100       4         0         0         0         0
DSS      15:13:06   15:14:06       2     100     100       4         0         0         0         0
DSS      15:13:02   15:14:02       1     100     100       4     59223    173148         0         3
OLTP 15:14:48 15:14:53 1 5200 645 0 7799 53529 46 246 OLTP 15:14:36 15:14:42 2 3800 524 0 5354 16001 55 88 OLTP 15:13:06 15:14:06 2 3800 524 0 0 0 0 0 OLTP 15:13:02 15:14:02 1 5200 645 0 0 0 0 0

Conclusion

With the advent of the Load Balancing Advisory in Oracle 10gR2, it’s now possible to insure that an unexpectedly dramatic increase in a RAC service’s workload does not overwhelm any single node of an Oracle 10gR2 RAC clustered database. Oracle 10gR2 leverages proven, existing RAC components like FAN and ONS event publishing to implement this enhancement. Finally, several new and enhanced data dictionary views provide excellent feedback to measure the efficiency of the Load Balancing Advisory.

Download the SQL scripts and shell scripts for this article.

References and Additional Reading

Even though I’ve hopefully provided enough technical information in this article to encourage you to explore with these features, I also strongly suggest that you first review the corresponding detailed Oracle documentation before proceeding with any experiments. Actual implementation of these features should commence only after a crystal-clear understanding exists. Please note that I’ve drawn upon the following Oracle 10gR2 documentation for the deeper technical details of this article:

B14197-03 Oracle 10gR2 Clusterware and Real Application Clusters Administration and Deployment Guide

B14203-08 Oracle 10gR2 Clusterware and Real Application Clusters Installation Guide for Linux

B14212-02 Oracle 10gR2 Net Services Administrator’s Guide

B14213-01 Oracle 10gR2 Net Services Reference

B14237-02 Oracle 10gR2 Reference

B14258-01 Oracle 10gR2 PL/SQL Packages and Types Reference

B14210-02 Oracle 10gR2 High Availability Overview

B25159-01 Oracle 10gR2 High Availability Best Practices

And these MetaLink documents are also invaluable to obtaining a deeper understanding of how the Load Balancing Advisory has been implemented in Oracle 10gR2:

226880.1 Configuration of Load Balancing and Transparent Application Failover

» See All Articles by Columnist Jim Czuprynski



Oracle Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
Oracle Forum
Topic By Replies Updated
Oracle Data Mining: Classification jan.hasller 0 July 5th, 07:19 AM
Find duplicates - Unique IDs Lava 5 July 2nd, 08:30 AM
no matching unique or primary key rcanter 1 April 25th, 12:32 PM
Update values of one table based on condition of values in other table using Trigger Gladiator 3 February 29th, 06:01 PM