The Globalization of Language in Oracle – And Case-Insensitivity

How to take advantage of globalization techniques to perform
case insensitivity in Oracle.

Up to now in this series on globalization,
we have discussed how to set particular environment variables or database
parameters. This was done to get our client machines configured in such a way that
they truly represented their true locale and for the proper translation of
character strings when communicating between the client and database server.

This article shifts gears, taking us
past the setting of variables, and diving into an implementation of how we can
use these settings to our benefit–in particular our ability to perform case
insensitivity comparisons.

Over the years, the ability, and
desire of the end use community, to equate ‘a’ with ‘A’ has dumbfounded many a
programmer and DBA. The reason for wanting to perform a case insensitive search
is quite simplistic. An end user, when searching for a person, place or thing,
typically wants to just key in for instance ‘suzy smith’ with no regard for how
another user may have entered the name at data entry time. Hopefully you can
see that not only does case insensitivity help with searching for items but
also during data entry as true duplicates may be found. This of course assumes
there is no difference between ‘suzy smith’, ‘Suzy Smith’ and ‘SUZY SMITH’.

Two articles ago in The
Globalization of Language in Oracle – The NLS_COMP and NLS_SORT variables
,
we
touched on the ability to have case insensitivity and accent insensitivity for
a character. In this article we set NLS_COMP=ANSI and NLS_SORT=BINARY_CI (case
insensitivity) or NLS_SORT=BINARY_AI (accent and case insensitivity). The
difference being that under NLS_SORT=BINARY_CI (case insensitivity) a=A and d=D.
BUT under NLS_SORT=BINARY_AI (accent and case insensitivity) a=A=d=D.

Pre-Oracle 10g

In pre-Oracle10g versions of Oracle,
in order to get this type of equality of characters you needed to use a
combination of the NLS_UPPER and NLS_LOWER functions or use the NLS_SORT
setting of GENERIC_BASELETTER. While these approaches worked they do have the
flaw of having to programmatically code the NLS_UPPER or NLS_LOWER functions
into every select statement. Or when using the GENERIC_BASELETTER you had a
solution that was not a true linguistic solution for character comparison.

Oracle 10g

Now, in Oracle 10g if we have a
table called NAMES and have the following entries in that table:

SQL> select name from names;
NAME
-------------
suzy smith
Suzy Smith
SUZY SMITH

If we wanted to perform a search on
‘Suzy Smith’, under normal setting of case sensitive (NLS_SORT=BINARY)
searching we would get the following:


SQL> select name from names where name = ‘Suzy Smith’;
NAME
————
Suzy Smith

If we then wanted to turn on case
insensitivity (NLS_SORT=BINARY_CI) we would get the following results:


SQL> alter session set NLS_COMP=ANSI;
SQL> alter session set NLS_SORT=BINARY_CI;
SQL> select name from names where name = ‘Suzy Smith’

NAME
————-
suzy smith
Suzy Smith
SUZY SMITH

As you have noticed these examples
use strict equality (where name = ‘Suzy Smith’). Often times, more often than
not, we want to perform a search where we only know part of a name. So under
normal settings of case sensitivity NLS_SORT=BINARY when we perform a search
for ‘Suzy%’ we get the following:


SQL> select name from names where name like ‘Suzy%’;
NAME
————
Suzy Smith

And when we switch to case
insensitivity (NLS_SORT=BINARY_CI) we get this result:


SQL> alter session set NLS_COMP=ANSI;
SQL> alter session set NLS_SORT=BINARY_CI;
SQL> select name from names where name like ‘Suzy%’;

NAME
————
Suzy Smith

This is the exact same results and
thus we gain nothing by using an NLS_SORT setting for case insensitivity. As
you can see case insensitivity ONLY works for straight equality searches.

In order to perform a case
insensitive search that does not use strict equality we must now introduce the
function REGEXP_LIKE. The REGEXP_LIKE function does regular expression matching
instead of the simple pattern matching of the LIKE comparison operator. If you
have ever used any of the various arrays of Unix type scripting languages that
have pattern matching you will feel right at home. If not I would encourage you
to do a bit of research on pattern matching and you will soon see the power
behind the types of searching, should say pattern matching, you can do. For
this article, we will only present the most simplistic use of this function for
finding ‘Suzy%’.

The following is the format for the
REGEXP_LIKE function.

REGEXP_LIKE(source_string, pattern [, match_parameter] )

source_string

the column or characters we are wanting to match against.

pattern

the pattern or regular expression we would like the source
string compared against.

match_parameter

is, for our case insensitivity, is ‘i’ for case
insensitivity or ‘c’ for case sensitivity. If this match parameter is not
supplied then the value of the NLS_SORT parameter is used.


So for our example of searching for ‘Suzy%’ and having an
NLS_SORT for case sensitive searching (NLS_SORT=BINARY) we could issue the
following types of SELECT statements and get the following results:


SQL> select name from names where REGEXP_LIKE(name,’Suzy’);
NAME
———–
Suzy Smith

SQL> select name from names where REGEXP_LIKE(name,’Suzy’,’i’);
NAME
————
suzy smith
Suzy Smith
SUZY SMITH

SQL> select name from names where REGEXP_LIKE(name,’[sS][uU][zZ][yY]’);
NAME
———–
suzy smith
Suzy Smith
SUZY SMITH

If we switched to an NLS_SORT=BINARY_CI for case
insensitivity we now get the following types of results. As you can see we do
not need to specify the match_parameter of ‘i’ for case insensitivity because
it defaults to the NLS_SORT variable.


SQL> select name from names where REGEXP_LIKE(name,’Suzy’);
NAME
————
suzy smith
Suzy Smith
SUZY SMITH

There are literally dozens of ways you can pattern match for
finding character strings with this function that extend the power of normal
searches you can do by strict equality or by using the simplistic LIKE
comparison operator.

While case insensitivity can extend your application and
allow your users to literally find dozens of hidden artifacts, the fact still
remains that there is no direct way to fully convert an application to case
insensitivity unless you do not use the LIKE operator and every SQL statement
is an exact equality match. In order to provide for complete case insensitivity
to your end user community you need to start using the REGEXP_LIKE function.
The only real issue is that you need to change your applications. Maybe in the
next release of Oracle they will be able to extend the functionality of the
LIKE comparison to provide true case insensitivity matching.

»


See All Articles by Columnist
James Koopmann

James Koopmann
James Koopmann
James Koopmann has fourteen years of database design, development and performance tuning experience. In addition, he has extensive database administration experience in Oracle and other relational databases in production environments, specializing in performance tuning of database engines and SQL based applications. Koopmann is an accomplished author with several technical papers in various Oracle related publications such as Oracle Magazine, Oracle Professional and SQL>UPDATE_RMOUG. He is a featured author and database expert for DatabaseJournal, a member of the editorial review committee for Select Journal (The Magazine for the International Oracle Users Group), an Oracle Certified Professional DBA and noted speaker at local Oracle User Groups around the country.

Latest Articles