Setting Up Oracle Database’s Oracle Text

Steve Callan walks through some Oracle Text setup steps and working examples of different index types used by Oracle Text.

Oracle Text – Expanding Your String Searching Capabilities in Oracle Database discussed some of Oracle Text’s features and functionality. In this article, Steve Callan goes through some setup steps and worked examples.

First of all, where do you get Oracle Text? In at least 10g and above, it is installed by default. Where can you see if feature X is installed or not? One place is within DBA_REGISTRY, query on comp_name and status.

SQL> col comp_name for a40
SQL> select comp_name, status from dba_registry;
COMP_NAME STATUS
---------------------------------------- -------
Oracle Database Catalog Views VALID
Oracle Database Packages and Types VALID
Oracle Workspace Manager VALID
JServer JAVA Virtual Machine VALID
Oracle XDK VALID
Oracle Database Java Packages VALID
Oracle Expression Filter VALID
Oracle Data Mining VALID
Oracle Text VALID
Oracle XML Database VALID
Oracle Rules Manager VALID
Oracle interMedia VALID
OLAP Analytic Workspace VALID
Oracle OLAP API VALID
OLAP Catalog VALID
Spatial VALID
Oracle Enterprise Manager VALID

Since Text is a feature, and as we all know, upgrades can be cause for concern in terms of extra steps having to be performed, you’ll be happy to note that the upgrade of Oracle Text from one release to another comes along as part of the overall upgrade process. In other words, you don’t have to do anything.

Second, what does it take for a regular user to be able to use Oracle Text? Not a whole lot. Create a table and insert data as needed, create the appropriate Oracle Text index, and execute a query against the table.

SQL> conn scott/tiger
Connected.
SQL> create table docs (id number primary key, text varchar2(80));
Table created.
SQL> insert into docs values (1, 'first document');
1 row created.
SQL> insert into docs values (2, 'second document');
1 row created.
SQL> commit;
Commit complete.
SQL> create index doc_index on docs(text)
2 indextype is ctxsys.context;
Index created.
SQL> select id, text
2 from docs
3 where contains(text, 'first') > 0;
ID TEXT
---------- ------------------------------
1 first document

In the above example, only two elements are different from what you would expect to see in a regular query. The first is the CREATE INDEX statement. Note the syntax where INDEXTYPE is of CTXSYS.CONTEXT. As mentioned in the introductory article, one of the four index types in Oracle Text is context-based. The second is the syntax in the SELECT statement. Other than those two new aspects, using Oracle Text, albeit in a simple example, is pretty easy to do.

Who is the CTXSYS user? Using Toad for its handy schema browser interface, we can see the following information with respect to what constitutes this schema (ignoring tables and indexes for the time being).

What becomes clear about CTXSYS is that its main driver (what makes it tick) is the CTXAPP role, and looking further into what the role has for grants is a collection of EXECUTE privileges on several packages.

For the simple query I used earlier, the amount of steps shown in a formatted trace file (using TKPROF) is pretty amazing. The actual query appears as the 6th statement or procedure call (one of them is dynamic sampling since the table has not been analyzed yet). Recalling what was mentioned earlier about the indexes in Text being domain indexes? We can see this as a fact via the execution plan.

select id, text from docs
where contains(text,'first')>0
 
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.03 0.03 0 145 0 0
Execute 1 0.01 0.01 0 0 0 0
Fetch 2 0.00 0.00 0 2 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.04 0.05 0 147 0 1
 
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 54
 
Rows Row Source Operation
------- ---------------------------------------------------
1 TABLE ACCESS BY INDEX ROWID DOCS (cr=16 pr=0 pw=0 time=70341 us)
1 DOMAIN INDEX DOC_INDEX (cr=15 pr=0 pw=0 time=70375 us)

When examining the SQL being executed behind the scenes (specifically, DML), you’ll see references to four different items (tables) ending with a distinctive “dollar letter” identifier. These suffixes are:

  • $I
  • $K
  • $R
  • $N

They are further identified as being prefixed with DR, the index name, and then the table identifier (one of the four suffixes). A white paper at OTN on how Text processes DML explains them in more detail.

What this all leads up to is that to use Text, a user needs to have (aside from other normal privileges) been granted the CTXAPP role, some additional EXECUTE grants on other CTXSYS packages, and sufficient storage space for table-like indexes.

A set of grants in the documentation is shown below.

GRANT EXECUTE ON CTXSYS.CTX_CLS TO scott;
GRANT EXECUTE ON CTXSYS.CTX_DDL TO scott;
GRANT EXECUTE ON CTXSYS.CTX_DOC TO scott;
GRANT EXECUTE ON CTXSYS.CTX_OUTPUT TO scott;
GRANT EXECUTE ON CTXSYS.CTX_QUERY TO scott;
GRANT EXECUTE ON CTXSYS.CTX_REPORT TO scott;
GRANT EXECUTE ON CTXSYS.CTX_THES TO scott;
GRANT EXECUTE ON CTXSYS.CTX_ULEXER TO scott;

The user using Text also needs to be aware of how Text-related indexes are maintained. Going back to the simple DOCS table, let’s insert another record and then query for it.

SQL> insert into docs values (3, 'third document');
1 row created.
SQL> commit;
Commit complete.
SQL> select id, text from docs
2 where contains(text,'third')>0;
no rows selected

You can clearly find this record via regular SQL, but the Oracle Text query fails to return the record. Why is that? Again, this type of index requires synchronization after DML, and the user definitely needs EXECUTE privileges on the CTX_DDL package to make that happen. So, let’s sync the index and re-try the query.

SQL> exec ctx_ddl.sync_index('doc_index');
PL/SQL procedure successfully completed.
SQL> select id, text from docs
2 where contains(text,'third')>0;
ID TEXT
---------- ------------------------------
3 third document

Depending on your application, you can begin to see the criticality of this requirement. Suppose you were managing a no-fly list that is updated once every 24 hours. A new person of interest is inserted into the no-fly table, but without the index being updated yet, there will be a window where this person may be allowed to board an aircraft unless some other step is taken (e.g., manually checking a web page or some other external source of information). A failure there is not good either.

In the index sync statement, I took advantage of a default parameter (several, actually). The specification for CTX_DDL is shown below.

PROCEDURE sync_index(
idx_name in varchar2 default NULL,
memory in varchar2 default NULL,
part_name in varchar2 default NULL,
parallel_degree in number default 1
);

Specifying the index name is sufficient, but what comes into play with real world sized data sets is how much memory you can afford to use to cache the index ahead of time. What does the SYNC_INDEX procedure do? We don’t know (exactly) because Oracle wrapped the package body. As no value was given for the memory parameter, does that mean no memory was used? The answer to that is no, and the way to see the default value is to query a CTXSYS table.

The path to the description of Text packages is a redirection from the PL/SQL Packages and Types Reference guide to the Oracle Text Reference guide. The description of CTX_DDL says that the memory parameter uses the system value for DEFAULT_INDEX_MEMORY. Don’t go looking for this via SHOW PARAMETER as “system” does not refer to the instance initialization parameters. Instead, query the CTX_PARAMETERS table, and in my case, the value for this parameter is 12582912 bytes, or about 12MB. The max setting is 1GB.

As another example of using Oracle Text, but this time with a different index type, we can use the catalog search example shown in the Oracle Text Application Developer’s Guide. The example is based on using more than one column for an index, and in this case, supports searching for string and sorting at the same time. The steps are to create a table, populate it, create a sub-index (the additional column we’ll be using in addition to the main index), create the catalog index, and they query from the table.

Create table and populate it
CREATE TABLE auction(
item_id NUMBER,
title VARCHAR2(100),
category_id NUMBER,
price NUMBER,
bid_close DATE);
INSERT INTO AUCTION VALUES
(1, 'NIKON CAMERA', 1, 400, '24-OCT-2002');
INSERT INTO AUCTION VALUES
(2, 'OLYMPUS CAMERA', 1, 300, '25-OCT-2002');
INSERT INTO AUCTION VALUES
(3, 'PENTAX CAMERA', 1, 200, '26-OCT-2002');
INSERT INTO AUCTION VALUES
(4, 'CANON CAMERA', 1, 250, '27-OCT-2002'); 
commit;

 

Create the sub-index
EXEC CTX_DDL.CREATE_INDEX_SET('auction_iset');
EXEC CTX_DDL.ADD_INDEX('auction_iset','price');

 

Create the catalog index
CREATE INDEX auction_titlex ON AUCTION(title) 
INDEXTYPE IS CTXSYS.CTXCAT
PARAMETERS ('index set auction_iset');

And now we’re ready to query.

SQL> COLUMN title FORMAT a40;
SQL> SELECT title, price FROM auction 
2 WHERE CATSEARCH(title, 'CAMERA', 'order by price')> 0;
TITLE PRICE
---------------------------------------- ----------
PENTAX CAMERA 200
CANON CAMERA 250
OLYMPUS CAMERA 300
NIKON CAMERA 400

Overall, pretty easy, except the documentation has a couple of typographical Easter eggs to make some statements fail. The first CTX_DDL call should be CREATE_INDEX_SET, not CREATE.INDEXT_SET. The “sub-index A” comment at the end of an EXEC statement does not work well either, so just omit it as I did above.

Finally, what happens when the base table is updated (or overall, DML is applied against it)? With the catalog index, the index is synchronized for you. Given that something (DML) is taking place on the base table and an index is being updated, is there an implicit commit taking place? The answer is no. You can test this by running the query based on the two new records, and then running the same query in another session. So, one other potential gotcha in the documentation is the absence of a COMMIT statement.

In Closing

So far, we’ve seen two working examples of different index types used by Oracle Text. The simple things are simple to do; and likewise, the harder things are harder too. Next time, we’ll take a look at some of the more complex features and usage of this tool.

Additional Resources

Oracle Text

» See All Articles by Columnist Steve Callan

Steve Callan
Steve Callan
Steve is an Oracle DBA (OCP 8i and 9i)/developer working in Denver. His Oracle experience also includes Forms and Reports, Oracle9iAS and Oracle9iDS.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles