The Globalization of Language in Oracle – Index Requirements

What are the performance implications for the
globalization of your Oracle database? This article takes a look at indexing
requirements when using NLS_SORT <> BINARY and using an ORDER BY clause.

First, I would like to put to bed a myth that I am sure will
sprout up as more and more individuals venture down the implementation of
maintaining multiple languages within their Oracle databases. There is a
statement made that states it is a requirement to create an index for columns
that require linguistic sorting capability. This statement in no way requires an
index on those columns that contain multi lingual data. We all know that
indexes are optional and are not required for any form of data to be inserted
into our databases. This statement only emphasizes that if you want to have
linguistic data in your database and you want to sort the data efficiently then
you should consider adding a linguistic index on those columns. As always, when
we tune SQL statements, it is dependent on the type and amount of data we are
trying to sort. In addition, as Oracle will perform a sort in the background
for the linguistic sort you require, depending on the predicates in the WHERE
clause, it may still be more efficient to NOT have a linguistic index on the
columns.

By default, Oracle databases are
setup with an NLS_SORT set to BINARY. When we do a sort on a column, the sort
is done in BINARY order, which equals the order of the ASCII characters. This
is the order all of us in the English-speaking world have grown to love. You
can see sample data output on this in the second article of this series called The
Globalization of Language in Oracle – The NLS_COMP and NLS_SORT variables
. Look at Table 2 of that
article and the column where NLS_COMP=ANSI and NLS_SORT=BINARY are. This sort
was done on the xGraphic character and as you can see, the order of the Binary
value represents the order of graphic character stored. As soon as we begin to
introduce case insensitivity or accent insensitivity, the order of the binary
value is all over the place in the next two columns in that table. This is
because we have introduced a new sort order defined through the NLS_SORT
parameter that takes precedence over the default BINARY order.

When we design databases we have been taught to put
indexes on those columns that are involved in predicate comparisons of the
WHERE clause or involved in sort functionality of SQL statements, such as the
ORDER BY clause. For instance, Table A executes a simple SQL statement
that queries the char_table we have been working with and ORDERs BY the xgraphic
character. Following is the explain plan for just such a query. I have added
the setting of the NLS_SORT parameter just for clarification that in fact we
are running in "default" mode for binary sorting. As you can see from
the explain output, we in fact used the binary index that was created on the xgraphic
column.

Table A.


ALTER SESSION SET NLS_SORT=BINARY
SELECT xdecimal, xgraphic FROM char_table ORDER BY xgraphic

Explain Output


SELECT STATEMENT Optimizer=ALL_ROWS (Cost=827 Card=33,166 bytes=895,482)
TABLE ACCESS (BY INDEX ROWID) of “CHAR_TABLE” #1 TABLE (Cost=827 Card=33,166 bytes=895,482)
INDEX (FULL SCAN) of “CHAR_TABLE_XGRAPHIC_BINARY_IX” INDEX (Cost=26 Card=33,166 bytes=)

The question now comes into play, since we are talking about
multi linguistic databases, what happens when we are required by our clients to
present the data in the sort order of a particular locale? For this example, we
assume our client is in Germany. We need to change the NLS_SORT session
variable to GERMAN. This in itself will provide a sort order to
the client with a linguistic sort order accustomed to by the German language.
The problem comes into play on how Oracle must get and sort the data to be
presented. Table B shows the exact same SQL statement as in Table A,
but please notice the explain plan output. As you can see we set the NLS_SORT
to GERMAN and then produced the explain output. The access path now does a full
table scan on the char_table and then performs a sort. This sort is actually a
German Linguistic Sort. I think it would be nice for Oracle to indicate that in
the explain plan but am sure it will be in a future release. Anyway, you can
see that now the cost of the SELECT statement also went up considerably. Therefore,
the question becomes ‘how can we now provide acceptable performance for our
German clients.’

Table B.


ALTER SESSION SET NLS_SORT=GERMAN
SELECT xdecimal, xgraphic FROM char_table ORDER BY xgraphic

Explain Output


SELECT STATEMENT Optimizer=ALL_ROWS (Cost=9,115 Card=38,269 bytes=1,033,263)
SORT (ORDER BY) (Cost=9,115 Card=38,269 bytes=1,033,263)
TABLE ACCESS (FULL) of “CHAR_TABLE” #1 TABLE (Cost=8,819 Card=38,269 bytes=1,033,263)

The solution is really quite simple. We need only place an
index (linguistic index) on the column that is required for the sort to be
performed on. This is where the myth I mentioned in the opening paragraph will
come from. Table C gives the session SQL statements executed along with
the explain output on the SELECT statement. For creating the index, you need
only specify the NLSSORT function around the column you would like to put a
linguistic index on. Also, include the language for the function-based index.
Now when we set the NLS_SORT session variable to GERMAN and execute the SELECT
statement you can see that the linguistic index is used. Moreover, access to
the table data is by ROWID just as in the Table A example. Also, note
that the cost of executing this SQL statement also returned to the original
value.

Table C.


CREATE INDEX CHAR_TABLE_XGRAPHIC_GERMAN_IX ON CHAR_TABLE (NLSSORT(XGRAPHIC, ‘NLS_SORT=GERMAN’));
ALTER SESSION SET NLS_SORT=GERMAN
SELECT xdecimal, xgraphic FROM char_table ORDER BY xgraphic

Explain Output


SELECT STATEMENT Optimizer=ALL_ROWS (Cost=827 Card=33,166 bytes=895,482)
TABLE ACCESS (BY INDEX ROWID) of “CHAR_TABLE” #1 TABLE (Cost=827 Card=33,166 bytes=895,482)
INDEX (FULL SCAN) of “CHAR_TABLE_XGRAPHIC_GERMAN_IX” INDEX (Cost=26 Card=33,166 bytes=)

Small Caveat

The one other caveat you need to be aware of is that in
order for the linguistic index to be used the column in the table must be NOT NULLable.
Alternatively, you can add the WHERE <column_name> IS NOT NULL to your
SQL statement.

Providing adequate performance for multi-lingual data is not
difficult. You only need to add linguistic indexes where they make sense on
your tables. Are these indexes required by Oracle? No. Will these indexes
always be used when you issue an ORDER BY clause? No. Remember performance
tuning, especially for SQL statements is dependent on a variety of variables.
You need to be concerned with well formed SQL statements, row count,
cardinality, and selectivity before just placing indexes on any and every
column involved in a SQL statement.

»


See All Articles by Columnist
James Koopmann

James Koopmann
James Koopmann has fourteen years of database design, development and performance tuning experience. In addition, he has extensive database administration experience in Oracle and other relational databases in production environments, specializing in performance tuning of database engines and SQL based applications. Koopmann is an accomplished author with several technical papers in various Oracle related publications such as Oracle Magazine, Oracle Professional and SQL>UPDATE_RMOUG. He is a featured author and database expert for DatabaseJournal, a member of the editorial review committee for Select Journal (The Magazine for the International Oracle Users Group), an Oracle Certified Professional DBA and noted speaker at local Oracle User Groups around the country.

Latest Articles