Using multiple jobs
I have used this method in many places for DML
activities, with or without conditions, and found it to be quite fast (most of
the time even faster than the other methods). The processes can be tuned to
work in the available resources. The idea is to break a big job into smaller
units that can be run independently of each other. These smaller jobs can be
executed in groups at proper intervals. In case a job fails, the same can be
re-run and will not require the whole process to be started again. Changes are
independently committed in each job, making them separate transactions.
This process does not require any structural
changes except for the new column addition. The number of jobs can be altered
to moderate the processing. The Oracle provided job scheduler, (DBMS_JOB), is
used to submit and run the jobs.
Steps followed:
1. Add the new column to the table with no
default value.
2. Submit jobs to update the table. Each job
will update a certain number of rows and commit the changes.
3. Once all of the jobs are completed, modify
the column to specify a default value and make it not null.
To break an update on a huge table into smaller
units of work, we require a unique column or a non-unique column that has good
cardinality for the values present. If an index does not exist on such a column, one needs to be
created temporarily. Preferably, it should be a key column such as TRX_ID
(unique) of MTL_TRX table in my case. We need to decide how many jobs we should
generate and what will be the load for each job based on the identified column.
For example, my table, MTL_TRX, has 2.2 million
records; I want to submit 7000 records approximately in each job. Therefore, about
300 threads need to be created. This information can be hardcoded in the
script, but I thought of going a step further and making a generic routine that
could be used for all such processing. The package JNC_SLICE_JOB takes input (minimum
key value, maximum key value and number of threads) from the calling program
and stores the job breakup information in a PL/SQL table of records. This
information is then used by the calling program to submit job requests.
Running the routine from outside produces the
following result. The routine code is present at the end of the article.
SQL> set serverout on
SQL> declare
2 l_status varchar2(200);
3 begin
4 jnc_slice_job.slice(1000, 5000, 8, l_status);
5 dbms_output.put_line(l_status);
6 end;
7 /
1000 : 1499 --the from and to range for each thread.
1500 : 1999
2000 : 2499
2500 : 2999
3000 : 3499
3500 : 3999
4000 : 4499
4500 : 5000
OK
Inputs provided are: minumum value of a key column (1000),
maximum value of the key column (5000),
no of threads to create (8)
Ouput returned is : status of the execution, OK (if successful), oracle error (if failure).
The table SLICE_TAB is populated with the processing range for each job. This
table can then be referred in the calling routine to submit the jobs.
The code mentioned here deals with only numeric
data and can be further modified to work with VARCHAR2 datatypes or even ROWID.
Now, coming back to our original requirement, the job processing parameters are
set as below:
SQL> select name, value
2 from v$parameter
3 where name like 'job%';
NAME VALUE
-------------------------------- -----
job_queue_processes 20
job_queue_interval 30
The above setting means that 20 jobs are to be
run every 30 seconds. I can also modify the count of jobs to be executed on the
fly, based on system load.
alter system set job_queue_processes = 25;
The following script is called to submit Oracle jobs.
set serverout on
declare
l_thread pls_integer := 300;
l_stanum number := 0;
l_endnum number := 0;
l_status varchar2(200);
l_job_id number;
l_job varchar2(4000);
errexc exception;
begin
execute immediate 'alter table mtl_trx add invind varchar2(1)';
select min(trx_id)
into l_stanum
from mtl_trx;
select max(trx_id)
into l_endnum
from mtl_trx;
jnc_slice_job.slice(l_stanum, l_endnum, l_thread, l_status);
if l_status != 'OK' then
dbms_output.put_line(l_status);
raise errexc;
end if;
for i in nvl(jnc_slice_job.slice_tab.first, 0) .. nvl(jnc_slice_job.slice_tab.last, 0) loop
l_job := 'begin update mtl_trx set invind = ''N'' where trx_id between '
|| jnc_slice_job.slice_tab(i).minlmt
|| ' and '
|| jnc_slice_job.slice_tab(i).maxlmt
|| ';'
|| ' commit; end;';
dbms_job.submit(l_job_id, l_job, sysdate, null);
end loop;
commit;
dbms_output.put_line('Please monitor jobs submitted..');
end;
On running the above script, 300 jobs are
submitted and execution is started at once. The job processing can be monitored
from DBA_JOBS and DBA_JOBS_RUNNING tables.
SQL> select job, what from user_jobs;
JOB WHAT
---------- ------------------------------------------------------------------------
14070 begin update mtl_trx set invind = 'N' where trx_id between 22214424 and..
14071 begin update mtl_trx set invind = 'N' where trx_id between 22236581 and..
14072 begin update mtl_trx set invind = 'N' where trx_id between 22258738 and..
14073 begin update mtl_trx set invind = 'N' where trx_id between 22280895 and..
14074 begin update mtl_trx set invind = 'N' where trx_id between 22303052 and..
14075 begin update mtl_trx set invind = 'N' where trx_id between 22325209 and..
14076 begin update mtl_trx set invind = 'N' where trx_id between 22347366 and..
14077 begin update mtl_trx set invind = 'N' where trx_id between 22369523 and..
Upon completion of all the jobs, the column is
marked as not null with a default value.
SQL> alter table mtl_trx modify invind default 'N' not null;
Table altered.
The complete process takes 6-9 minutes
(excluding statistics generation). The execution of jobs in threads is quite
fast as the WHERE condition uses the unique index present on the column
specified. This is acceptable timing without any structural changes being done.
The timing can be further brought down by reducing the time interval or
increasing the number of jobs.
Advantage: Process time is considerably reduced. Commits
are possible at intervals. Each job is a separate transaction and can be run
again independently in case of failure (and avoid re-doing the whole work again).
The option of submitting in chunks gives us the advantage of running it online
also, multiple undo segments are used, no structural changes are required apart
from column addition, and updates based on a logical analysis in the code are possible.
Disadvantage: Requires initial groundwork and proper
planning to have the process in place, initialization parameters are altered
for a short time.
Using Partitions
Partitioning is used in CBO environments. This
feature allows us to break a large table down into smaller segments. If a new
column is being added to an already partitioned table, we can modify the
statement to make use of the partitions and save time.
For this example, I recreated the MTL_TRX table
with partitions on the transaction date column. The table is analyzed and the
update is carried out with optimizer mode set to FIRST_ROWS.
create table mtl_trx_2
(TRX_ID NUMBER,
TRX_DATE DATE,
ITEM_ID NUMBER,
SHOP VARCHAR2(10),
TRX_TYPE_ID NUMBER,
QTY NUMBER,
COST NUMBER)
partition by range (trx_date)
(partition p1 values less than (to_date('01-dec-2003', 'dd-mon-yyyy')),
partition p2 values less than (to_date('01-jan-2004', 'dd-mon-yyyy')),
partition p3 values less than (to_date('01-feb-2004', 'dd-mon-yyyy')),
partition p4 values less than (to_date('01-mar-2004', 'dd-mon-yyyy')),
partition p5 values less than (to_date('01-apr-2004', 'dd-mon-yyyy')),
partition p6 values less than (to_date('01-may-2004', 'dd-mon-yyyy')),
partition p7 values less than (to_date('01-jun-2004', 'dd-mon-yyyy')),
partition p8 values less than (to_date('01-jul-2004', 'dd-mon-yyyy')));
This allows me to split data month-wise in
different segments. I will now create the new column without any default value.
SQL> alter table mtl_trx add stcind varchar2(1);
Table altered.
The update of default value is carried out
partition-wise. This will result in only one segment being updated at a time.
The update time is reduced as a full-table scan is avoided and Oracle
concentrates only on the relevant partition that is referred to by the WHERE
clause condition (partition pruning feature of the optimizer).
SQL> update mtl_trx
2 set stcind = 'A'
3 where trx_date >= '01-apr-04' and trx_date < '01-may-04';
214930 rows updated.
and so on..
I run the partition-wise updates individually
for different segments. The column is finally marked as not null once all of
the updates are complete. The total process takes me around 10-12 minutes.
Advantage: Process time is reduced, commits possible
at intervals (if multiple updates are run separately).
Disadvantage: Table should be partitioned, no other
structural/system change apart from the column addition, an undo space problem
may occur if the partition size is too large.
Conclusion
As per my analysis, the process of running jobs
in smaller units of work has proven to be quite flexible, convenient and fast
in the long run. Other methods can also be used, depending on the size of the
tables being dealt with.
As the database size increases, doing such massive
activities becomes more cumbersome and prone to failure. We need to look out
for other alternatives, as mentioned above, to make things easier and faster.
At all times we need to make sure that existing data is not being put at risk.
»
See All Articles by Columnist Amar Kumar Padhi
---------------------------------------------------------------------------------------
--Script : jnc_slice_job.sql
--Subject: Routine for slicing a given value and placing it in table format.
--Note : The process will pick up the threading information from the calling routine.
---------------------------------------------------------------------------------------
create or replace package jnc_slice_job is
procedure slice(pi_min_id in number,
pi_max_id in number,
pi_threads in number,
pio_status in out nocopy varchar2);
type slice_row is record(minlmt number, maxlmt number);
type slice_typ is table of slice_row
index by binary_integer;
slice_tab slice_typ;
end jnc_slice_job;
/
create or replace package body jnc_slice_job is
procedure slice(pi_min_id in number,
pi_max_id in number,
pi_threads in number,
pio_status in out nocopy varchar2) is
l_min_id number := pi_min_id;
l_max_id number := pi_max_id;
l_threads number := pi_threads;
l_thread_load number := 0;
l_dif number := 0;
l_cnt number := 0;
begin
slice_tab.delete;
pio_status := 'OK';
l_dif := l_max_id - l_min_id;
l_thread_load := round(l_dif/l_threads);
while l_min_id < l_max_id loop
l_cnt := l_cnt + 1;
slice_tab(l_cnt).minlmt := l_min_id;
l_min_id := l_min_id + l_thread_load;
if l_min_id >= l_max_id then
slice_tab(l_cnt).maxlmt := l_min_id;
else
slice_tab(l_cnt).maxlmt := l_min_id - 1;
end if;
end loop;
for i in slice_tab.first .. slice_tab.last loop
dbms_output.put_line(slice_tab(i).minlmt || ' : ' || slice_tab(i).maxlmt);
end loop;
exception
when others then
pio_status := 'Error in slicing. ' || substr(sqlerrm, 1, 150);
end;
end jnc_slice_job;
/
sho err