Oracle 10gR2 RAC Load Balancing Features - Page 2March 21, 2007 Advanced Load Balancing: The Load Balancing Advisor (LBA)In a RAC environment, its not unlikely that one node may become overwhelmed by application requests for resources. For example, lets assume that a two-node clustered database currently has 100 user sessions connected across both nodes in the cluster, and that the database is using standard server-side load balancing. If there is a sudden storm of 200 additional connections, the listeners on the two nodes will simply distribute them evenly across both nodes, 100 to each node, resulting in 150 connections per node. However, its possible that node RACDB2 is actually much busier than the other node. As a result, node RACDB2 will most likely be completely overwhelmed by those new connections resource demands, while node RACDB1 remains relatively underutilized with plenty of additional resources. The good news is that Oracle 10gR2 now provides an advanced method to overcome this imbalance: the Load Balancing Advisor (LBA). The LBA will calculate how much work each instance in the clustered database has been asked to do. The LBA will then make sure that any new incoming work is routed to the instance(s) that are least busy. This determination takes place on a service-by-service basis across all instances that are providing resources to each service. The LBA considers the following factors when determining how to perform this balancing act:
Not only does this help to balance out the total workload across all instances in the cluster, it also insures that one node wont be overwhelmed by requests for service by its primary application(s). As part of its Fast Application Notification (FAN) event transmission features, Oracle 10g RAC uses Oracle Notification Services (ONS) to communicate the status of cluster resources to all participating node applications in the cluster. In Oracle 10gR2, FAN added a new event type, SERVICE_METRIC, whose event payload contains information about the relative workload of each node in the RAC cluster, and its this information that the Load Balancing Advisory uses to determine how to route new connections throughout the clustered databases instances. Oracle 10gR2 supplies an upgraded DBMS_SERVICE.MODIFY_SERVICE procedure that modifies an existing RAC service so that it can utilize the Load Balance Advisory for service-by-service load balancing. This procedure provides two crucial parameters, GOAL and CLB_GOAL, that interact to determine what methods (if any) Oracle 10gR2 will utilize to perform load balancing. For example, if I set the GOAL parameter to either the GOAL_SERVICE_TIME or the GOAL_THROUGHPUT enumerated constants for a specific RAC service, Oracle 10gR2 will activate the Load Balancing Advisory for load balancing of that services incoming connections. Table 1 explains the difference between these two load balancing targets.
In addition, the CLB_GOAL parameter provides yet another load balancing method: It tells the LBA how to perform connection load balancing. See Table 2 for a list of its acceptable enumerated constant values and the impact of setting either.
Depending on the settings for these two parameters, the LBA uses the metrics shown in Table 3 to make its determination of which instance should receive the next set of incoming connections:
Proof of Concept: LBA in Oracle 10gR2 RACTo enable my RAC database environment for LBA, Ill first activate three different levels of load balancing using DBMS_SERVICE.MODIFY_SERVICE against the three new services I just created:
Listing 5 shows the code Ive used to activate LBA management for these services, as well as a query against the DBA_SERVICES view plus the resulting output that verifies the proper settings for GOAL and CLB_GOAL for these services. Now that these three services are configured for LBA, Ill use a combination of shell scripts to place a load on one of the two nodes for this RAC clustered database and then attempt to connect to both nodes in the cluster using the three RAC services. Ill first execute the ConstantLoadGenerator.sh shell script to create 100 sessions that execute a SQL script that places a moderately heavy load on the databases CPU, and then Ill execute shell script RandomLoadGenerator.sh to simulate 40 additional user connections for one of the three services. How can I determine what the impact will be of running these tests? Oracle 10gR2s GV$SERVICEMETRIC global view shows each services goodness on each instance in the RAC clustered database, the services predicted goodness (or delta), and summary statistics on how much resources the service consumed during the time period. Listing 6 shows the query Ive constructed against that view; heres the result of that query before I started my evaluations:
Current Service-Level Metrics
(From GV$SERVICEMETRIC)
Pred- CPU Elpsd # 0f
icted Time Time User
Good- Per Per Calls DBTime
Service Inst Good ness Call Call Per Per
Name Start Time End Time ID ness Incr Flags (mus) (mus) Second Second
-------- ---------- ---------- ----- ------- ------- ------- --------- --------- --------- ---------
Heres the results of the first test against the ADHOC RAC service. While my tests executed, I noticed that Oracle 10gR2 continued to route connections to each node in the cluster regardless of the load on each instance, and the number of connections remained relatively constant on both nodes as new connections arrived. Here are the results of a query against GVSERVICE_METRIC for that test:
>>> Output after ADHOC unit testing:
Current Service-Level Metrics
(From GV$SERVICEMETRIC)
Pred- CPU Elpsd # 0f
icted Time Time User
Good- Per Per Calls DBTime
Service Inst Good ness Call Call Per Per
Name Start Time End Time ID ness Incr Flags (mus) (mus) Second Second
-------- ---------- ---------- ----- ------- ------- ------- --------- --------- --------- ---------
For the tests I ran against the DSS and OLTP RAC services, however, I noticed that Oracle 10gR2 definitely tended to route connections to the node that was least loaded, and this was reflected in the query results against the GVSERVICE_METRIC global view for these tests:
>>> Output after DSS unit testing:
Current Service-Level Metrics
(From GV$SERVICEMETRIC)
Pred- CPU Elpsd # 0f
icted Time Time User
Good- Per Per Calls DBTime
Service Inst Good ness Call Call Per Per
Name Start Time End Time ID ness Incr Flags (mus) (mus) Second Second
-------- ---------- ---------- ----- ------- ------- ------- --------- --------- --------- ---------
ADHOC 15:12:48 15:12:53 1 0 1 0 0 0 0 0
ADHOC 15:12:37 15:12:42 2 0 1 0 0 0 0 0
ADHOC 15:11:06 15:12:06 2 0 1 0 0 0 0 0
ADHOC 15:11:02 15:12:02 1 0 1 0 49585 322550 2 54
>>> Output after OLTP unit testing:
Current Service-Level Metrics
(From GV$SERVICEMETRIC)
Pred- CPU Elpsd # 0f
icted Time Time User
Good- Per Per Calls DBTime
Service Inst Good ness Call Call Per Per
Name Start Time End Time ID ness Incr Flags (mus) (mus) Second Second
-------- ---------- ---------- ----- ------- ------- ------- --------- --------- --------- ---------
ADHOC 15:14:48 15:14:53 1 0 1 0 0 0 0 0
ADHOC 15:14:36 15:14:42 2 0 1 0 0 0 0 0
ADHOC 15:13:06 15:14:06 2 0 1 0 0 0 0 0
ADHOC 15:13:02 15:14:02 1 0 1 0 0 0 0 0
DSS 15:14:48 15:14:53 1 100 100 4 0 0 0 0
DSS 15:14:36 15:14:42 2 100 100 4 0 0 0 0
DSS 15:13:06 15:14:06 2 100 100 4 0 0 0 0
DSS 15:13:02 15:14:02 1 100 100 4 59223 173148 0 3
ConclusionWith the advent of the Load Balancing Advisory in Oracle 10gR2, its now possible to insure that an unexpectedly dramatic increase in a RAC services workload does not overwhelm any single node of an Oracle 10gR2 RAC clustered database. Oracle 10gR2 leverages proven, existing RAC components like FAN and ONS event publishing to implement this enhancement. Finally, several new and enhanced data dictionary views provide excellent feedback to measure the efficiency of the Load Balancing Advisory. Download the SQL scripts and shell scripts for this article. References and Additional ReadingEven though Ive hopefully provided enough technical information in this article to encourage you to explore with these features, I also strongly suggest that you first review the corresponding detailed Oracle documentation before proceeding with any experiments. Actual implementation of these features should commence only after a crystal-clear understanding exists. Please note that Ive drawn upon the following Oracle 10gR2 documentation for the deeper technical details of this article: B14197-03 Oracle 10gR2 Clusterware and Real Application Clusters Administration and Deployment Guide B14203-08 Oracle 10gR2 Clusterware and Real Application Clusters Installation Guide for Linux B14212-02 Oracle 10gR2 Net Services Administrators Guide B14213-01 Oracle 10gR2 Net Services Reference B14237-02 Oracle 10gR2 Reference B14258-01 Oracle 10gR2 PL/SQL Packages and Types Reference B14210-02 Oracle 10gR2 High Availability Overview B25159-01 Oracle 10gR2 High Availability Best Practices And these MetaLink documents are also invaluable to obtaining a deeper understanding of how the Load Balancing Advisory has been implemented in Oracle 10gR2: 226880.1 Configuration of Load Balancing and Transparent Application Failover |
|||||||||||||||||||||||||||||||||