Document Management with Oracle Text

A few years ago, many projects started using the multimedia capabilities of
the Oracle database. Powerful procedures for handling image, audio, video and
text data in various formats are incorporated in the database kernel code;
even the default database installation has multimedia objects installed.
Storing, retrieving and searching text from documents stored inside the Oracle
database is the most common use of these multimedia capabilities. This feature
was formerly known as ConText, later as interMedia Text (iMT), and as of
database version 9.x it is known as Oracle Text.

This article covers:

  • Document Index Type

  • Oracle Text Architecture (Classes, Objects, Preferences, Attributes)

  • Oracle Text Installation Check

  • Text Indexes Inventory

  • Synching and Optimising a Text Index

  • Monitoring and Error Logging

  • Maintenance Tips

  • Conclusion

Document Index Type

Oracle Text is an extension to the Oracle database that allows searching for
specific words in tables of documents, using standard SQL expressions. Oracle
Text is integrated in a number of Oracle products such as Portal, iFS and
Applications. Supported document types are text, HTML, DOC, XLS, PPT, PDF and
XML documents. Plain text and HTML content is stored in a CLOB column, and
other formatted document content in a BLOB column. The content can also be
stored outside of the database via BFILEs. For any kind of data content, the
text engine is used for indexing and retrieving.
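
The storage options described above can be sketched as follows. This is a minimal illustration, not a recommended schema: the table names, column names, directory object and file path are all hypothetical.

```sql
-- Hypothetical tables; all names and columns are illustrative only.

-- Plain text or HTML documents go into a CLOB column.
CREATE TABLE docs_text (
  id      NUMBER PRIMARY KEY,
  title   VARCHAR2(200),
  content CLOB
);

-- Formatted (binary) documents such as DOC, XLS, PPT or PDF go into a BLOB column.
CREATE TABLE docs_binary (
  id      NUMBER PRIMARY KEY,
  title   VARCHAR2(200),
  content BLOB
);

-- Content kept outside the database: the BFILE column holds only a locator
-- that points to a file under a directory object.
CREATE DIRECTORY doc_dir AS '/u01/app/docs';   -- hypothetical path

CREATE TABLE docs_external (
  id      NUMBER PRIMARY KEY,
  content BFILE
);

INSERT INTO docs_external VALUES (1, BFILENAME('DOC_DIR', 'manual.pdf'));
COMMIT;
```

In all three cases the text index is created on the content column, and the text engine takes care of reading and filtering the document.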

Overview of the Oracle Text product development:

Product Name               Database version    Index Type
ConText 2.x.x              < 8.1.5
interMedia Text            < 8.1.7             Context
Oracle Text for Oracle8i   (V 8.1.x) 8.1.7     Catalog, Context
Oracle Text                9.x                 Ctxxpath, Catalog, Context

There are three different index types: Context, Catalog and Ctxxpath. All of
them are used for document indexing, but each offers different functionality.

A Context index is a "domain" index used for fast retrieval of unstructured
text. DML processing on a Context index is deferred: the actual index updates
do not take place until an index SYNC is performed.
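
The deferred behaviour can be illustrated with a short sketch. The table and index names are hypothetical, and the CTX_DDL.SYNC_INDEX call assumes a 9.x database and a user with execute rights on CTX_DDL.

```sql
-- Hypothetical table and index names, for illustration only.
CREATE TABLE articles (
  id      NUMBER PRIMARY KEY,
  title   VARCHAR2(200),
  content CLOB
);

CREATE INDEX articles_ctx ON articles (content)
  INDEXTYPE IS CTXSYS.CONTEXT;

INSERT INTO articles VALUES (1, 'Intro', 'Oracle Text indexes documents');
COMMIT;

-- The new row is queued for indexing and is not searchable yet;
-- pending rows are listed in the CTX_USER_PENDING view.
SELECT pnd_index_name, pnd_rowid FROM ctx_user_pending;

-- Process the pending rows.
BEGIN
  CTX_DDL.SYNC_INDEX('articles_ctx');
END;
/

-- After the sync, the row can be found with the CONTAINS operator.
SELECT id, title
FROM   articles
WHERE  CONTAINS(content, 'documents') > 0;
```

Running the CONTAINS query before the sync would not return the newly inserted row, which is the practical consequence of deferred DML processing.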

A Catalog (CTXCAT) index is an online "catalog" index, efficient for searching
small, simple text fields in queries that also use some structured criteria
(usually numbers or dates). This index type supports only a basic subset of
the functionality provided by a Context index. A Catalog index has all the
characteristics of a normal database index; in particular, it is maintained as
part of DML, so no separate sync is needed.
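
A minimal sketch of a Catalog index with one structured criterion follows. The table, index set and index names are hypothetical, as is the choice of price as the structured column.

```sql
-- Hypothetical table and names, for illustration only.
CREATE TABLE products (
  id    NUMBER PRIMARY KEY,
  title VARCHAR2(100),
  price NUMBER
);

-- An index set lists the structured columns that may be combined
-- with the text search.
BEGIN
  CTX_DDL.CREATE_INDEX_SET('products_iset');
  CTX_DDL.ADD_INDEX('products_iset', 'price');
END;
/

CREATE INDEX products_cat ON products (title)
  INDEXTYPE IS CTXSYS.CTXCAT
  PARAMETERS ('index set products_iset');

-- CATSEARCH takes the text query and the structured condition together.
SELECT id, title, price
FROM   products
WHERE  CATSEARCH(title, 'camera', 'price < 500') > 0;
```

Note that a Catalog index is queried with CATSEARCH rather than CONTAINS.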

A Ctxxpath index is a special index type installed during an Oracle Text
install. It uses Oracle Text code and can be created only on sys.xmltype
columns. It is used to speed up certain queries that use the existsNode method.
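
As a rough sketch, a Ctxxpath index on an XMLType column might look like this. The table, the index name and the XPath expression are hypothetical, and whether the optimizer actually uses the index depends on the query and the database release.

```sql
-- Hypothetical table and names, for illustration only.
CREATE TABLE xml_docs (
  id  NUMBER PRIMARY KEY,
  doc SYS.XMLTYPE
);

CREATE INDEX xml_docs_xp ON xml_docs (doc)
  INDEXTYPE IS CTXSYS.CTXXPATH;

-- The index is a candidate for existsNode predicates such as this one.
SELECT x.id
FROM   xml_docs x
WHERE  existsNode(x.doc, '/book/author') = 1;
```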

Marin Komadina
Marin was born June 27, 1968 in Zagreb, Croatia. He graduated in 1993 from the Faculty for Electrotechnology and Computer Sciences, University of Zagreb in Croatia. He started his professional career as a system specialist and DBA for the Croatian company Informatika System. His most important project was the development and implementation of an enterprise distributed point-of-sale solution based on Oracle technology. In 1999, Marin became the company CTO, where he played an active role in company development and technical orientation. After Informatika System, Marin worked as an IT Manager Assistant for the Austrian international retail company "Segro," on location in Graz (Austria) and Zagreb (Croatia). He was responsible for the company's technical infrastructure and operational support. Segro used IBM technology, the OS/400 operating system and the DB2 database. In 1998, Marin joined the international telecommunication company VIPNet GSM, which was part of a larger concern, Mobilkom Austria & Western Wireless Int. USA. After one year, Marin took over the IT System Manager position, where he managed many multi-platform telecommunication projects and led the IT system department. In 2001, Marin started to work in Germany as a senior system architect. He is currently working for German banks on different banking projects.
