Parallel Processing — Bane or Boon

An oft-used (and oft-abused) execution path is parallel execution, usually ‘instigated’ by some sort of parallel hint. Developers, albeit with the best intentions, misuse and abuse this mechanism because of faulty logic, that ‘logic’ being that if one process executes in X amount of time, then Y parallel processes will execute in X/Y amount of time. In many cases nothing could be further from the truth. Yes, there are opportunities to use parallelism to speed up processing, but in most of the cases I’ve seen outside of Exadata it does more harm than good.

To illustrate this with a ‘real world’ situation, say you’re in a restaurant and the service appears to be slow; you think ‘if I had three waitresses instead of one I’d get my food faster’. Let’s look at that from a different point of view: three waitresses for one table means all three need to communicate with each other to avoid repeating work done by the others (this, of course, takes time). Each waitress must be assigned specific duties but also be available to take over for another if something happens (one breaks her toe, for instance); such an occurrence requires reassigning duties and adjusting the schedule to accommodate the change. Eventually you get your order, but it will take MORE time than if a single waitress had performed all of the necessary tasks.

Parallel processing does more than simply ‘divide and conquer’; it requires several steps most developers may not know about or may ignore entirely. Exadata, DSS and DW systems, running with large numbers of CPUs or, in the case of Exadata, multiple storage servers where processing can be offloaded, can benefit from parallel processing since the load can be distributed among the CPUs, reducing the load on any single processor. OLTP systems, on the other hand, usually involve operations that are quick to begin with, and the overhead of implementing parallel processing is large compared to the overall execution time; it may well take longer to complete the parallel execution than it would to properly tune the query for single-threaded processing and eliminate the parallelism altogether.

So what does parallel processing bring to the table? Obviously the possibility, but not the guarantee, of reduced execution time. What does it cost? That depends on the system, but for OLTP systems it usually creates more work than it saves: a parallel query coordinator process is spawned, along with some number of parallel slaves all ‘talking’ to that coordinator, plus a parallel-to-serial operation performed by the coordinator to assemble the final results. Many OLTP systems aren’t designed for parallel processing (and rightly so), as parallel execution is meant to utilize resources (CPU, memory) that would otherwise remain idle and that exist in sufficient quantity to justify the additional overhead. Queries and statements should be tuned properly (there, I’ve said it again) for speedy execution, and slapping a parallel hint on them isn’t the correct way to go about the tuning process. Looking at an example of the differences between parallel and serial execution might help clear the air.

SQL> select id, txtval, status
  2  from para_tst pt
  3  where id between 9001 and 34001;

        ID TXTVAL                          STATUS
---------- ------------------------------ -------
     20772 /130d52e2_JDK2Sorter             VALID
     20773 /130d52e2_JDK2Sorter             VALID
     20774 /4c28cb16_ToolLogOptions         VALID
     20775 /4c28cb16_ToolLogOptions         VALID
     20776 /cbd9a55f_AbortException         VALID
     20777 /cbd9a55f_AbortException         VALID

591960 rows selected.

Elapsed: 00:02:03.68

Execution Plan
Plan hash value: 350193380

| Id | Operation                | Name     | Rows | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
|  0 | SELECT STATEMENT         |          |  593K|    20M|  1698   (2)| 00:00:21 |       |       |
|  1 |  PARTITION RANGE ITERATOR|          |  593K|    20M|  1698   (2)| 00:00:21 |    19 |    21 |
|* 2 |   TABLE ACCESS FULL      | PARA_TST |  593K|    20M|  1698   (2)| 00:00:21 |    19 |    21 |

Predicate Information (identified by operation id):

   2 - filter("ID">=9001 AND "ID"<=34001)

SQL> select /*+ parallel(pt 2) pq_distribute(pt partition) */
  2        id, txtval, status
  3  from para_tst pt
  4  where id between 9001 and 34001;

        ID TXTVAL                           STATUS
---------- ------------------------------- -------
     18404 /79bc64f9_JvmMemoryMeta           VALID
     18405 /b80019c2_EnumJvmThreadContent    VALID
     18406 /b80019c2_EnumJvmThreadContent    VALID
     18407 /bcfa29b5_EnumJvmThreadCpuTime    VALID
     18408 /bcfa29b5_EnumJvmThreadCpuTime    VALID
     17997 /b3bd73eb_CommonClassObject       VALID
     17998 /b3bd73eb_CommonClassObject       VALID
     17999 /c5a69e17_ServerSchemaObject1     VALID
     20724 /4bd3ef8d_KnownOptions4           VALID
     20725 /4bd3ef8d_KnownOptions4           VALID
     20726 /75d2b0ba_KnownOptions5           VALID
     20727 /75d2b0ba_KnownOptions5           VALID
     20728 /75e9b2d4_KnownOptions            VALID
     20729 /75e9b2d4_KnownOptions            VALID

591960 rows selected.

Elapsed: 00:02:23.59

Execution Plan
Plan hash value: 1393746857

| Id | Operation            | Name     | Rows | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ |IN-OUT| PQ Distr |
|  0 | SELECT STATEMENT     |          |  593K|    20M|   942   (1)| 00:00:12 |       |       |       |      |          |
|  1 |  PX COORDINATOR      |          |      |       |            |          |       |       |       |      |          |
|  2 |   PX SEND QC (RANDOM)| :TQ10000 |  593K|    20M|   942   (1)| 00:00:12 |       |       | Q1,00 | P->S | QC (RAND)|
|  3 |    PX BLOCK ITERATOR |          |  593K|    20M|   942   (1)| 00:00:12 |    19 |    21 | Q1,00 | PCWC |          |
|* 4 |     TABLE ACCESS FULL| PARA_TST |  593K|    20M|   942   (1)| 00:00:12 |    19 |    21 | Q1,00 | PCWP |          |

Predicate Information (identified by operation id):

   4 - filter("ID">=9001 AND "ID"<=34001)

On a conventional system running with absolutely no load the parallel execution takes almost twenty seconds longer; on a heavily used OLTP system the difference will likely be greater, since the overhead to run in parallel exceeds that of the ‘straight’ query. Resources in OLTP configurations are allocated for multiple users performing concurrent processing, so there usually aren’t ‘extra’ CPU cycles or memory to toss at parallel execution. Since the parallel query slaves will be waiting for resources (just like their user session counterparts), this increases the response time and delays processing until those resources are freed. Notice also the execution path, which includes the parallel-to-serial operation (PX SEND QC) that consolidates the output from the parallel query slaves into the final result set. Depending upon the server configuration there could be many more parallel slaves than shown here, and ‘funneling’ all of that data into a single ‘pipe’ consumes time and resources you most likely won’t have in an OLTP environment.
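The coordinator and its slaves can be observed directly while such a statement runs. A hedged sketch, assuming access to the V$ views (V$PX_SESSION is queried from a second session while the statement executes; V$PQ_TQSTAT only reports on the issuing session’s own most recent parallel statement):

```sql
-- From a second session while the parallel query is running:
-- one row for the coordinator (QCSID) plus one per slave.
SELECT qcsid, sid, server_group, server_set, degree, req_degree
FROM   v$px_session
ORDER  BY qcsid, server_set, sid;

-- From the issuing session, immediately after the query completes:
-- rows produced by each slave and consumed by the coordinator (QC).
SELECT dfo_number, tq_id, server_type, process, num_rows, bytes
FROM   v$pq_tqstat
ORDER  BY dfo_number, tq_id, server_type DESC, process;
```

If DEGREE falls short of REQ_DEGREE the operation was downgraded for lack of resources, which on a busy OLTP system is exactly the contention described above.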

When is it good practice to use parallel processing? One situation that comes to mind is creating or populating tables using complex queries that may return faster in parallel than with standard serial processing. An application on which I worked was populating a table with a rather benign join, but the load was taking far too long to complete; the elapsed time exceeded the batch processing window. Using parallelism (along with regularly updated statistics) dramatically reduced the response time from over an hour to less than two minutes, well within the batch window, allowing the rest of the processing to continue. [Note that this was a batch process, run outside of the normal business day, which freed resources normally allocated to user sessions.] There are other situations, outside of data warehousing applications, but they’re exceptions rather than the rule.
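For such a batch load the parallelism has to be requested explicitly; parallel DML in particular is disabled by default at the session level. A minimal sketch of the pattern, with hypothetical table names standing in for the application’s own:

```sql
-- Parallel DML is off by default; enable it for this session only.
ALTER SESSION ENABLE PARALLEL DML;

-- Direct-path, parallel insert; batch_dtl and batch_stage are
-- hypothetical names, and a degree of 4 is just an illustration.
INSERT /*+ APPEND PARALLEL(d 4) */ INTO batch_dtl d
SELECT /*+ PARALLEL(s 4) */ s.id, s.txtval, s.status
FROM   batch_stage s;

-- A direct-path load must be committed before the session
-- can query the table again.
COMMIT;
```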

So, we’ve learned that even though parallelism would appear to make things go faster, in reality that’s often not the case due to the extra overhead of managing additional processes and consolidating the individual results into the final result set. We’ve also learned that simply slapping a /*+ parallel */ hint into a query doesn’t constitute tuning and, outside of Exadata, can make performance worse instead of better. The correct choice is to properly tune the query or queries in question, using resources such as AWR and ASH reports, execution plans and wait statistics to pinpoint the problem area or areas. Parallelism isn’t a silver bullet, was never intended to be, and should be used sparingly, if at all.
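Before reaching for a parallel hint, the serial plan itself will usually reveal the problem. A hedged sketch of that first step, using the example table from above:

```sql
-- Make sure the optimizer is working from current statistics...
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'PARA_TST')

-- ...then examine the serial plan for the offending statement.
EXPLAIN PLAN FOR
SELECT id, txtval, status
FROM   para_tst
WHERE  id BETWEEN 9001 AND 34001;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Full scans where an index or partition pruning was expected, or wildly wrong cardinality estimates, show up here long before parallelism needs to enter the conversation.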


David Fitzjarrell
David Fitzjarrell has more than 20 years of administration experience with various releases of the Oracle DBMS. He has installed the Oracle software on many platforms, including UNIX, Windows and Linux, and monitored and tuned performance in those environments. He is knowledgeable in the traditional tools for performance tuning – the Oracle Wait Interface, Statspack, event 10046 and 10053 traces, tkprof, explain plan and autotrace – and has used these to great advantage at the U.S. Postal Service, American Airlines/SABRE, ConocoPhillips and SiriusXM Radio, among others, to increase throughput and improve the quality of the production system. He has also set up scripts to regularly monitor available space and set thresholds to notify DBAs of impending space shortages before they affect the production environment. These scripts generate data which can also be used to trend database growth over time, aiding in capacity planning. He has used RMAN, Streams, RAC and Data Guard in Oracle installations to ensure full recoverability and failover capabilities as well as high availability, and has configured a 'cascading' set of DR databases using the primary DR databases as the source, managing the archivelog transfers manually and monitoring, through scripts, the health of these secondary DR databases. He has also used ASM, ASMM and ASSM to improve performance and manage storage and shared memory.
