Document Management with Oracle Text - Page 4June 12, 2003 Synching and Optimising a Text IndexSome potentional problems with the Text index are the index fragmentation, document invalidation and the index out of sync. Index SyncThe following select will provide the information about out of sync indexes:
SQL> select count(*),PND_INDEX_OWNER from ctx_pending group by PND_INDEX_OWNER;
COUNT(*) PND_INDEX_OWNER
--------- ------------------------------
1000 ARTIST
698 WEB_ARTIST
955 WEB_ARTIST
A user ARTIST has only one Text index with the name KB_INDEX Text. This index has 1000 pending, not indexed documents. We will run a manually synchronization of the KB_INDEX:
SQL> alter index ARTIST.KB_INDEX rebuild online parameters('sync memory 12345678');
Index altered.
After synchronization, the KB_INDEX should not have any non-indexed documents:
SQL> select count(*), PND_INDEX_OWNER from ctx_pending group by PND_INDEX_OWNER;
COUNT(*) PND_INDEX_OWNER
--------- ------------------------------
698 WEB_ARTIST
955 WEB_ARTIST
The index KB_INDEX is now in the sync with the base table. Another way to ensure index synchronization is to use the ctx_dll package. With this package, we have a time schedule option for the index synchronization. Setting index synchronization every 2 hours: SQL> exec ctx_schedule.startup ( 'web_text_ger', 'SYNC', 120 ); Index OptimizationIndex Optimization can be fast or full. The fast optimization runs through the whole index table and connects fragmented rows together, to a maximum of 4000 bytes for a single row. This reduces the number of rows in the index table. Full mode optimization is called garbage collection. It removes the old information left over after document invalidation and connects separate rows together where possible. Both optimizations are non-blocking, providing queries to run on the index during optimization. Running fast index optimization:
SQL> alter index web_artist.web_text_ger rebuild online parameters ('optimize fast');
Running full index optimization:
SQL> alter index web_text_ger rebuild online parameters ('optimize full');
Running full index optimization every 2 hours: SQL> exec ctx_schedule.startup ( 'web_text_ger', 'OPTIMIZE FAST', 120 ); Index optimization with "alter index..." is a blocking operation; during index maintainance, no one can use the index. To provide a non-blocking effect, use ctx_dll package, which allows users to use the Text index during rebuild or resync. An example of making fast optimization: begin Ctx_ddl.Sync_Index( idx_name=>'WEB_TEXT_GER', optlevel => Ctx_ddl.Optlevel_fast) ; end; / Monitoring and Error LoggingThe indexing process can log all progress information in a log file.
SQL> execute CTX_OUTPUT.START_LOG('$ORACLE_HOME/ctx/log');
PL/SQL procedure successfully completed.
Content of the log file should look similar to the following: Oracle interMedia Text: Release 8.1.7.0.0 - Production on Tue Oct 17 09:35:46 (c) Copyright 1999 Oracle Corporation. All rights reserved. 09-35-46 10/17/00 begin logging 09-36-21 10/17/00 populate index: INTERMEDIA.WEB_TEXT_GER 09-36-21 10/17/00 ..opening interMedia Text session 09-36-21 10/17/00 Begin document indexing 09-36-24 10/17/00 100 documents indexed 09-36-25 10/17/00 200 documents indexed 09-36-26 10/17/00 300 documents indexed 09-36-37 10/17/00 Errors reading documents: 0 09-36-37 10/17/00 Index data for 353 documents to be written to database 09-36-37 10/17/00 memory use: 490054 09-36-38 10/17/00 index data written to database. 09-36-38 10/17/00 End of document indexing. 353 documents indexed. To stop the logging activity use the following syntax: SQL> ctx_output.end_log; Processing errors encountered during indexing are logged. The index build will continue reading the next document to index. The logged errors are stored in the database and can be checked with the ctx_user_index_errors view. Sometimes a create index or alter index fails, leaving the context index in an unusable state. The database believes that the index is in "LOADING" or "INPROGRESS" state and blocks any attempt to recover through alter index. This situation can only be corrected by dropping and recreating the index. SQL> select owner,index_name, status from dba_indexes where index_name='WEB_*'; OWNER INDEX_NAME STATUS ------------------------------ ------------------------------ -------- WEB_ARTIST WEB_TEXT_GER VALID WEB_ARTIST WEB_TEXT INPROGRS SQL> drop index web_artist.web_text; drop index web_artist.web_text * ERROR at line 1: ORA-29868: cannot issue DDL on a domain index marked as LOADING SQL> drop index web_artist.web_text force; Index dropped. In
Oracle version 9.2.0 we have the package ConclusionOracle version 9.2 has several main enhancements for Oracle Text such as multi column datastore, new multi lexers, better partitioning support, enhanced XML features, document classification and many others. In addition, Oracle version 9.2 implements support for create or rebuild index online, providing DML on the base table during an index creation. Nevertheless, many customers still remember the buggy code with ConText and interMedia, and the poor customer support for this product. Many of them have switched to some other text engine, using an Oracle database as a simple data store. |