Document Management with Oracle Text - Page 4

June 12, 2003

Synching and Optimising a Text Index

Some potentional problems with the Text index are the index fragmentation, document invalidation and the index out of sync.

Index Sync

The following select will provide the information about out of sync indexes:

SQL> select count(*),PND_INDEX_OWNER from ctx_pending group by PND_INDEX_OWNER; 

--------- ------------------------------
     1000 ARTIST
      698 WEB_ARTIST
      955 WEB_ARTIST

A user ARTIST has only one Text index with the name KB_INDEX Text. This index has 1000 pending, not indexed documents. We will run a manually synchronization of the KB_INDEX:

SQL> alter index ARTIST.KB_INDEX rebuild online parameters('sync memory 12345678'); 
Index altered.

After synchronization, the KB_INDEX should not have any non-indexed documents:

SQL> select count(*), PND_INDEX_OWNER from ctx_pending group by PND_INDEX_OWNER;

--------- ------------------------------
      698 WEB_ARTIST
      955 WEB_ARTIST

The index KB_INDEX is now in the sync with the base table.

Another way to ensure index synchronization is to use the ctx_dll package. With this package, we have a time schedule option for the index synchronization.

Setting index synchronization every 2 hours:

SQL> exec ctx_schedule.startup ( 'web_text_ger', 'SYNC', 120 ); 
Index Optimization

Index Optimization can be fast or full. The fast optimization runs through the whole index table and connects fragmented rows together, to a maximum of 4000 bytes for a single row. This reduces the number of rows in the index table.

Full mode optimization is called garbage collection. It removes the old information left over after document invalidation and connects separate rows together where possible. Both optimizations are non-blocking, providing queries to run on the index during optimization.

Running fast index optimization:

SQL> alter index web_artist.web_text_ger rebuild online parameters ('optimize fast');

Running full index optimization:

SQL> alter index web_text_ger rebuild online parameters ('optimize full');

Running full index optimization every 2 hours:

SQL> exec ctx_schedule.startup ( 'web_text_ger', 'OPTIMIZE FAST', 120 );

Index optimization with "alter index..." is a blocking operation; during index maintainance, no one can use the index. To provide a non-blocking effect, use ctx_dll package, which allows users to use the Text index during rebuild or resync.

An example of making fast optimization:

Ctx_ddl.Sync_Index( idx_name=>'WEB_TEXT_GER', optlevel => Ctx_ddl.Optlevel_fast) ;

Monitoring and Error Logging

The indexing process can log all progress information in a log file.

PL/SQL procedure successfully completed.

Content of the log file should look similar to the following:

Oracle interMedia Text: Release - Production on Tue Oct 17 09:35:46
(c) Copyright 1999 Oracle Corporation.  All rights reserved.

09-35-46 10/17/00 begin logging
09-36-21 10/17/00 populate index: INTERMEDIA.WEB_TEXT_GER
09-36-21 10/17/00 ..opening interMedia Text session
09-36-21 10/17/00 Begin document indexing
09-36-24 10/17/00 100 documents indexed
09-36-25 10/17/00 200 documents indexed
09-36-26 10/17/00 300 documents indexed
09-36-37 10/17/00 Errors reading documents: 0
09-36-37 10/17/00 Index data for 353 documents to be written to database
09-36-37 10/17/00    memory use: 490054
09-36-38 10/17/00    index data written to database.
09-36-38 10/17/00 End of document indexing. 353 documents indexed.

To stop the logging activity use the following syntax:

SQL> ctx_output.end_log;

Processing errors encountered during indexing are logged. The index build will continue reading the next document to index. The logged errors are stored in the database and can be checked with the ctx_user_index_errors view.

Sometimes a create index or alter index fails, leaving the context index in an unusable state. The database believes that the index is in "LOADING" or "INPROGRESS" state and blocks any attempt to recover through alter index. This situation can only be corrected by dropping and recreating the index.

SQL> select owner,index_name, status from dba_indexes where index_name='WEB_*';

OWNER                          INDEX_NAME                     STATUS
------------------------------ ------------------------------ --------
WEB_ARTIST                     WEB_TEXT_GER                   VALID
WEB_ARTIST                     WEB_TEXT                       INPROGRS

SQL> drop index web_artist.web_text;
drop index web_artist.web_text
ERROR at line 1:
ORA-29868: cannot issue DDL on a domain index marked as LOADING

SQL> drop index web_artist.web_text force;
Index dropped.

In Oracle version 9.2.0 we have the package CTX_ADM.MARK_FAILED. This package makes a call to force the index from LOADING to FAILED, thus unblocking the alter index command.


Oracle version 9.2 has several main enhancements for Oracle Text such as multi column datastore, new multi lexers, better partitioning support, enhanced XML features, document classification and many others. In addition, Oracle version 9.2 implements support for create or rebuild index online, providing DML on the base table during an index creation.

Nevertheless, many customers still remember the buggy code with ConText and interMedia, and the poor customer support for this product. Many of them have switched to some other text engine, using an Oracle database as a simple data store.

» See All Articles by Columnist Marin Komadina

The Network for Technology Professionals



Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers