Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Tips Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS Access
SQL Scripts & Samples
» Database Forum
» Slideshows
Free Newsletters:

News Via RSS Feed

Database Journal |DBA Support |SQLCourse |SQLCourse2

Featured Database Articles


Posted Jun 12, 2003

Document Management with Oracle Text - Page 2

By Marin Komadina

Oracle Text Architecture (Classes, Objects, Preferences, Attributes)

An Oracle text index can be created on the following column types:


Text index will store words and the documents in which these words occur.

The Oracle Text engine uses special preference system for providing this service.

Text indexing is managed in several stages called "classes". Defined classes on the Oracle database are:

SQL> select * from ctx_classes;
CLA_NAME                       CLA_DESCRIPTION
------------------------------ -----------------------------DATASTORE    Data store Class
FILTER                          Filter Class
SECTION_GROUP                   Section Group
LEXER                           Lexer Class
WORDLIST                        Word List Class
STOPLIST                        Stop List Class
STORAGE                         Storage Class
INDEX_SET                       Index Set

Datastore is a document data selector table from which the CTX engine reads column data and returns document data.

Filter takes the document data from the datastore class and filters it to readable text.

Sectioner takes a text format, as input, and generates two outputs (the section boundaries and plaintext).

Lexer gets plaintext from the sectioner and splits it into words (discrete tokens).

Wordlist, Stoplist, and storage classes

STOPLIST Class holds a list of stop words, which are filtered out during indexing WORDLIST Class holds fuzzy and steam expansion settings used at the query time

STORAGE Class holds storage parameters for the underlying index tables and their indexes.

For example, a text index KB_INDEX will consist of the following objects:

DR$KB_INDEX$I - the tokens table, indexed tokens

DR$KB_INDEX$K - the docid mapping table, where text keys occur

DR$KB_INDEX$N - the negative row table, documents marked for deletion

DR$KB_INDEX$R - the rowid mapping table

DR$KB_INDEX$P - the substring index table

Every class has a defined list of objects, which are only templates.

A Class list on the Oracle database version:

SQL> select * from ctx_objects;
---------------   ------------------- -----------------------------------------
DATASTORE         DIRECT_DATASTORE      Documents are stored in the column
DATASTORE         DETAIL_DATASTORE      Documents are split into multiple lines
DATASTORE         FILE_DATASTORE        Documents are stored in files,column is file name
DATASTORE         URL_DATASTORE         Documents are web pages, column is URL
DATASTORE         USER_DATASTORE        Documents are stored in the column
DATASTORE         NESTED_DATASTORE      Documents are stored in a column in the nested table
DATASTORE     MULTI_COLUMN_DATASTORE    Documents are stored in multiple columns
FILTER            NULL_FILTER           Null filter
FILTER            USER_FILTER           User-defined filter
FILTER            CHARSET_FILTER        character set converting filter
FILTER            INSO_FILTER           filter for binary document formats
FILTER            PROCEDURE_FILTER      Procedure filter
SECTION_GROUP     NULL_SECTION_GROUP    null section group
SECTION_GROUP     BASIC_SECTION_GROUP   basic section group
SECTION_GROUP     HTML_SECTION_GROUP    html section group
SECTION_GROUP     XML_SECTION_GROUP     xml section group
SECTION_GROUP     NEWS_SECTION_GROUP    news section group
SECTION_GROUP     AUTO_SECTION_GROUP    auto section group
LEXER             BASIC_LEXER           Lexer for alphabetic languages
LEXER             JAPANESE_VGRAM_LEXER  V-gram lexer for Japanese
LEXER             KOREAN_LEXER          Dictionary-based lexer for Korean
LEXER             CHINESE_VGRAM_LEXER   V-GRAM lexer for Chinese
LEXER             CHINESE_LEXER         Chinese lexer
LEXER             MULTI_LEXER           Multi-language lexer
WORDLIST          BASIC_WORDLIST        basic wordlist
STOPLIST          BASIC_STOPLIST        basic stoplist
STOPLIST          MULTI_STOPLIST        multi-language stoplist
STORAGE           BASIC_STORAGE         text-index storage
INDEX_SET         BASIC_INDEX_SET       basic index set

Instead of directly using objects, we are creating a "preference" from one of the object templates, which is further customized by setting "attributes".

When we create the Oracle Text index, the indexing engine will read the defined preference for the index and invoke the attribute for each indexed document.

Oracle Archives

Comment and Contribute


(Maximum characters: 1200). You have characters left.



Latest Forum Threads
Oracle Forum
Topic By Replies Updated
Oracle Data Mining: Classification jan.hasller 0 July 5th, 07:19 AM
Find duplicates - Unique IDs Lava 5 July 2nd, 08:30 AM
no matching unique or primary key rcanter 1 April 25th, 12:32 PM
Update values of one table based on condition of values in other table using Trigger Gladiator 3 February 29th, 06:01 PM