Posted Jun 10, 2004

Creating NOT NULL Columns in Huge Oracle Tables - Page 3

By Amar Kumar Padhi

Using multiple jobs

I have used this method in many places for DML activities, with or without conditions, and found it to be quite fast (most of the time even faster than the other methods). The processes can be tuned to fit within the available resources. The idea is to break a big job into smaller units that can be run independently of each other. These smaller jobs can be executed in groups at proper intervals. If a job fails, it can be re-run without requiring the whole process to be started again. Changes are committed independently in each job, making them separate transactions.

This process does not require any structural changes apart from the new column addition. The number of jobs can be altered to moderate the processing. The Oracle-provided job scheduler, DBMS_JOB, is used to submit and run the jobs.

Steps followed:

1. Add the new column to the table with no default value.

2. Submit jobs to update the table. Each job will update a certain number of rows and commit the changes.

3. Once all of the jobs are completed, modify the column to specify a default value and make it not null.

To break an update on a huge table into smaller units of work, we require a unique column, or a non-unique column with good cardinality for the values present. If an index does not exist on such a column, one needs to be created temporarily. Preferably, it should be a key column, such as TRX_ID (unique) of the MTL_TRX table in my case. We then need to decide how many jobs to generate and what the load for each job will be, based on the identified column.
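If it is not obvious whether a candidate column qualifies, a quick check of its cardinality (and a temporary index, if none exists) might look roughly like the following sketch; the index name used here is hypothetical, and the index can be dropped once the updates are done.

-- how selective is the candidate column?
select count(*) as total_rows,
       count(distinct trx_id) as distinct_values
from   mtl_trx;

-- create a temporary index if one does not already exist
create index mtl_trx_split_n1 on mtl_trx (trx_id);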

For example, my table, MTL_TRX, has 2.2 million records; I want each job to process approximately 7,000 records. Therefore, about 300 threads need to be created. This information could be hardcoded in the script, but I thought of going a step further and making a generic routine that could be used for all such processing. The package JNC_SLICE_JOB takes input (minimum key value, maximum key value and number of threads) from the calling program and stores the job breakup information in a PL/SQL table of records. This information is then used by the calling program to submit the job requests.

Running the routine on its own produces the following result. The routine code is given at the end of the article.

SQL> set serverout on
SQL> declare
  2    l_status varchar2(200);
  3  begin
  4    jnc_slice_job.slice(1000, 5000, 8, l_status);
  5    dbms_output.put_line(l_status);
  6  end;
  7  /
1000 : 1499       --the from and to range for each thread.
1500 : 1999
2000 : 2499
2500 : 2999
3000 : 3499
3500 : 3999
4000 : 4499
4500 : 5000
OK

Inputs provided are: minimum value of the key column (1000),
                     maximum value of the key column (5000),
                     number of threads to create (8).
Output returned is : status of the execution, OK (if successful) or an Oracle error (if failure).
The table SLICE_TAB is populated with the processing range for each job. This
table can then be referred to in the calling routine to submit the jobs.
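The routine works on numeric key values. As an aside, for a table with no suitable numeric key, a comparable breakup could in principle be derived from ROWID ranges instead; this is not part of the routine, only a minimal sketch (assuming 300 slices of MTL_TRX, and accepting that computing the ranges costs one full scan of the table):

-- derive 300 ROWID ranges by scanning the table once
select grp,
       min(rid) as from_rowid,
       max(rid) as to_rowid
from  (select rowid as rid,
              ntile(300) over (order by rowid) as grp
       from   mtl_trx)
group by grp
order by grp;

Each job would then restrict its UPDATE with "where rowid between :from_rowid and :to_rowid".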

The code mentioned here deals only with numeric data and can be further modified to work with VARCHAR2 datatypes or even ROWID, as sketched above. Now, coming back to our original requirement, the job processing parameters are set as below:

SQL> select name, value 
  2  from   v$parameter
  3  where  name like 'job%';

NAME                             VALUE
-------------------------------- -----
job_queue_processes              20
job_queue_interval               30

The above settings mean that up to 20 job queue processes are available to run the submitted jobs, and the job queue is checked every 30 seconds. I can also modify the number of job processes on the fly, based on the system load.

alter system set job_queue_processes = 25;

The following script is called to submit Oracle jobs.

set serverout on
declare
  l_thread  pls_integer := 300;   -- number of jobs to submit
  l_stanum  number := 0;          -- minimum TRX_ID value
  l_endnum  number := 0;          -- maximum TRX_ID value
  l_status  varchar2(200);
  l_job_id  number;
  l_job     varchar2(4000);
  errexc    exception;
begin
  -- add the new column with no default value
  execute immediate 'alter table mtl_trx add invind varchar2(1)';

  -- establish the key range to be split across the jobs
  select min(trx_id)
  into   l_stanum
  from   mtl_trx;

  select max(trx_id)
  into   l_endnum
  from   mtl_trx;

  -- populate SLICE_TAB with the from/to range for each job
  jnc_slice_job.slice(l_stanum, l_endnum, l_thread, l_status);
  if l_status != 'OK' then
    dbms_output.put_line(l_status);
    raise errexc;
  end if;

  -- submit one update job per slice; each job commits its own changes
  for i in nvl(jnc_slice_job.slice_tab.first, 0) .. nvl(jnc_slice_job.slice_tab.last, 0) loop
    l_job :=  'begin update mtl_trx set invind = ''N'' where  trx_id between '
              || jnc_slice_job.slice_tab(i).minlmt
              || ' and '
              || jnc_slice_job.slice_tab(i).maxlmt
              || ';'
              || '  commit; end;';

    dbms_job.submit(l_job_id, l_job, sysdate, null);
  end loop;
  -- commit so that the submitted jobs become visible and start executing
  commit;
  dbms_output.put_line('Please monitor jobs submitted..');
end;
/

On running the above script, 300 jobs are submitted and execution starts at once. The job processing can be monitored from the DBA_JOBS and DBA_JOBS_RUNNING views.

SQL> select job, what from user_jobs;

       JOB WHAT
---------- ------------------------------------------------------------------------
     14070 begin update mtl_trx set invind = 'N' where  trx_id between 22214424 and..
     14071 begin update mtl_trx set invind = 'N' where  trx_id between 22236581 and..
     14072 begin update mtl_trx set invind = 'N' where  trx_id between 22258738 and..
     14073 begin update mtl_trx set invind = 'N' where  trx_id between 22280895 and..
     14074 begin update mtl_trx set invind = 'N' where  trx_id between 22303052 and..
     14075 begin update mtl_trx set invind = 'N' where  trx_id between 22325209 and..
     14076 begin update mtl_trx set invind = 'N' where  trx_id between 22347366 and..
     14077 begin update mtl_trx set invind = 'N' where  trx_id between 22369523 and..
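Because these jobs are submitted as one-off runs (next date SYSDATE, no interval), they drop out of the queue as they finish, so progress and failures can be tracked with something along these lines (a sketch against the standard USER_JOBS view):

-- jobs still queued or running
select count(*) as pending_jobs
from   user_jobs;

-- jobs that have failed at least once or have been marked broken
select job, failures, broken, what
from   user_jobs
where  failures > 0
or     broken = 'Y';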

Upon completion of all the jobs, the column is marked as not null with a default value.

SQL> alter table mtl_trx modify invind default 'N' not null;

Table altered.

The complete process takes 6-9 minutes (excluding statistics generation). The execution of the jobs is quite fast because the WHERE condition uses the unique index present on the specified column. This is acceptable timing, given that no structural changes are made. The timing can be brought down further by reducing the time interval or increasing the number of job queue processes.

Advantage: Process time is considerably reduced. Commits are possible at intervals. Each job is a separate transaction and can be re-run independently in case of failure, avoiding re-doing the whole work. Submitting the work in chunks also allows the process to be run online; multiple undo segments are used; no structural changes are required apart from the column addition; and updates based on logical conditions in the code are possible.

Disadvantage: Requires initial groundwork and proper planning to put the process in place, and initialization parameters are altered for a short time.

Using Partitions

Partitioning is used in CBO environments. This feature allows us to break a large table down into smaller segments. If a new column is being added to an already partitioned table, we can modify the statement to make use of the partitions and save time.

For this example, I recreated the MTL_TRX table with partitions on the transaction date column. The table is analyzed and the update is carried out with optimizer mode set to FIRST_ROWS.

create table mtl_trx
(TRX_ID       NUMBER,
 TRX_DATE     DATE,
 ITEM_ID      NUMBER,
 SHOP         VARCHAR2(10),
 TRX_TYPE_ID  NUMBER,
 QTY          NUMBER,
 COST         NUMBER)
partition by range (trx_date)
(partition p1 values less than (to_date('01-dec-2003', 'dd-mon-yyyy')),
 partition p2 values less than (to_date('01-jan-2004', 'dd-mon-yyyy')),
 partition p3 values less than (to_date('01-feb-2004', 'dd-mon-yyyy')),
 partition p4 values less than (to_date('01-mar-2004', 'dd-mon-yyyy')),
 partition p5 values less than (to_date('01-apr-2004', 'dd-mon-yyyy')),
 partition p6 values less than (to_date('01-may-2004', 'dd-mon-yyyy')),
 partition p7 values less than (to_date('01-jun-2004', 'dd-mon-yyyy')),
 partition p8 values less than (to_date('01-jul-2004', 'dd-mon-yyyy')));

This allows me to split data month-wise in different segments. I will now create the new column without any default value.

SQL> alter table mtl_trx add stcind varchar2(1);

Table altered.

The update of the default value is carried out partition-wise. This results in only one segment being updated at a time. The update time is reduced, as a full-table scan is avoided and Oracle concentrates only on the relevant partition referred to by the WHERE clause condition (the partition-pruning feature of the optimizer).

SQL> update mtl_trx
  2  set    stcind = 'A'
  3  where  trx_date >= to_date('01-apr-2004', 'dd-mon-yyyy')
  4  and    trx_date <  to_date('01-may-2004', 'dd-mon-yyyy');

214930 rows updated.

...and so on for the remaining partitions.
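Alternatively, when the target partition is known, it can be named explicitly in the statement instead of relying on the optimizer to prune; a sketch, assuming the partition layout from the DDL above (the April 2004 rows fall into partition p6):

update mtl_trx partition (p6)
set    stcind = 'A';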

I run the partition-wise updates individually for different segments. The column is finally marked as not null once all of the updates are complete. The total process takes me around 10-12 minutes.
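The closing statement would mirror the one used in the jobs-based method; for example:

alter table mtl_trx modify stcind default 'A' not null;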

Advantage: Process time is reduced; commits are possible at intervals (if multiple updates are run separately); no other structural/system change is required apart from the column addition.

Disadvantage: The table must already be partitioned, and an undo space problem may occur if the partition size is too large.

Conclusion

As per my analysis, the process of running jobs in smaller units of work has proven to be quite flexible, convenient and fast in the long run. Other methods can also be used, depending on the size of the tables being dealt with.

As the database size increases, doing such massive activities becomes more cumbersome and prone to failure. We need to look out for other alternatives, as mentioned above, to make things easier and faster. At all times we need to make sure that existing data is not being put at risk.

» See All Articles by Columnist Amar Kumar Padhi

---------------------------------------------------------------------------------------

--Script : jnc_slice_job.sql
--Subject: Routine for slicing a given key-value range and storing the breakup in a PL/SQL table.
--Note   : The process will pick up the threading information from the calling routine.
---------------------------------------------------------------------------------------

create or replace package jnc_slice_job is
  procedure slice(pi_min_id in             number,
                  pi_max_id in             number,
                  pi_threads in            number,
                  pio_status in out nocopy varchar2);

  type slice_row is record(minlmt  number, maxlmt number);
  type slice_typ is table of slice_row
      index by binary_integer;

  slice_tab  slice_typ;
end jnc_slice_job;
/

create or replace package body jnc_slice_job is

procedure slice(pi_min_id in             number,
                pi_max_id in             number,
                pi_threads in            number,
                pio_status in out nocopy varchar2) is

  l_min_id   number := pi_min_id;
  l_max_id   number := pi_max_id;
  l_threads  number := pi_threads;
  l_thread_load number := 0;
  l_dif      number := 0;
  l_cnt      number := 0;

begin
  slice_tab.delete;                          -- clear any previous breakup

  pio_status := 'OK';

  l_dif := l_max_id - l_min_id;              -- total range to be split

  l_thread_load := round(l_dif/l_threads);   -- approximate range per thread

  -- build the from/to limits for each thread
  while l_min_id < l_max_id loop
    l_cnt := l_cnt + 1;
    slice_tab(l_cnt).minlmt := l_min_id;
    l_min_id := l_min_id + l_thread_load;

    if l_min_id >= l_max_id then
      slice_tab(l_cnt).maxlmt := l_min_id;   -- last slice; may extend slightly past the maximum
    else
      slice_tab(l_cnt).maxlmt := l_min_id - 1;
    end if;
  end loop;

  -- list the generated ranges
  for i in slice_tab.first .. slice_tab.last loop
    dbms_output.put_line(slice_tab(i).minlmt || ' : ' || slice_tab(i).maxlmt);
  end loop;

exception
  when others then
    pio_status := 'Error in slicing. ' || substr(sqlerrm, 1, 150);
end;

end jnc_slice_job;
/

sho err




