Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

Oracle

Posted Apr 21, 2005

The Globalization of Language in Oracle - And Case-Insensitivity

By James Koopmann

How to take advantage of globalization techniques to perform case insensitivity in Oracle.

Up to now in this series on globalization, we have discussed how to set particular environment variables or database parameters. This was done to get our client machines configured in such a way that they truly represented their true locale and for the proper translation of character strings when communicating between the client and database server.

This article shifts gears, taking us past the setting of variables, and diving into an implementation of how we can use these settings to our benefit--in particular our ability to perform case insensitivity comparisons.

Over the years, the ability, and desire of the end use community, to equate 'a' with 'A' has dumbfounded many a programmer and DBA. The reason for wanting to perform a case insensitive search is quite simplistic. An end user, when searching for a person, place or thing, typically wants to just key in for instance 'suzy smith' with no regard for how another user may have entered the name at data entry time. Hopefully you can see that not only does case insensitivity help with searching for items but also during data entry as true duplicates may be found. This of course assumes there is no difference between 'suzy smith', 'Suzy Smith' and 'SUZY SMITH'.

Two articles ago in The Globalization of Language in Oracle - The NLS_COMP and NLS_SORT variables, we touched on the ability to have case insensitivity and accent insensitivity for a character. In this article we set NLS_COMP=ANSI and NLS_SORT=BINARY_CI (case insensitivity) or NLS_SORT=BINARY_AI (accent and case insensitivity). The difference being that under NLS_SORT=BINARY_CI (case insensitivity) a=A and d=D. BUT under NLS_SORT=BINARY_AI (accent and case insensitivity) a=A=d=D.

Pre-Oracle 10g

In pre-Oracle10g versions of Oracle, in order to get this type of equality of characters you needed to use a combination of the NLS_UPPER and NLS_LOWER functions or use the NLS_SORT setting of GENERIC_BASELETTER. While these approaches worked they do have the flaw of having to programmatically code the NLS_UPPER or NLS_LOWER functions into every select statement. Or when using the GENERIC_BASELETTER you had a solution that was not a true linguistic solution for character comparison.

Oracle 10g

Now, in Oracle 10g if we have a table called NAMES and have the following entries in that table:

SQL> select name from names;
NAME
-------------
suzy smith
Suzy Smith
SUZY SMITH

If we wanted to perform a search on 'Suzy Smith', under normal setting of case sensitive (NLS_SORT=BINARY) searching we would get the following:

SQL> select name from names where name = 'Suzy Smith';
NAME
------------
Suzy Smith

If we then wanted to turn on case insensitivity (NLS_SORT=BINARY_CI) we would get the following results:

SQL> alter session set NLS_COMP=ANSI;
SQL> alter session set NLS_SORT=BINARY_CI;
SQL> select name from names where name = 'Suzy Smith'
NAME
-------------
suzy smith
Suzy Smith
SUZY SMITH

As you have noticed these examples use strict equality (where name = 'Suzy Smith'). Often times, more often than not, we want to perform a search where we only know part of a name. So under normal settings of case sensitivity NLS_SORT=BINARY when we perform a search for 'Suzy%' we get the following:

SQL> select name from names where name like 'Suzy%';
NAME
------------
Suzy Smith

And when we switch to case insensitivity (NLS_SORT=BINARY_CI) we get this result:

SQL> alter session set NLS_COMP=ANSI;
SQL> alter session set NLS_SORT=BINARY_CI;
SQL> select name from names where name like 'Suzy%';
NAME
------------
Suzy Smith

This is the exact same results and thus we gain nothing by using an NLS_SORT setting for case insensitivity. As you can see case insensitivity ONLY works for straight equality searches.

In order to perform a case insensitive search that does not use strict equality we must now introduce the function REGEXP_LIKE. The REGEXP_LIKE function does regular expression matching instead of the simple pattern matching of the LIKE comparison operator. If you have ever used any of the various arrays of Unix type scripting languages that have pattern matching you will feel right at home. If not I would encourage you to do a bit of research on pattern matching and you will soon see the power behind the types of searching, should say pattern matching, you can do. For this article, we will only present the most simplistic use of this function for finding 'Suzy%'.

The following is the format for the REGEXP_LIKE function.
REGEXP_LIKE(source_string, pattern [, match_parameter] )

source_string

the column or characters we are wanting to match against.

pattern

the pattern or regular expression we would like the source string compared against.

match_parameter

is, for our case insensitivity, is 'i' for case insensitivity or 'c' for case sensitivity. If this match parameter is not supplied then the value of the NLS_SORT parameter is used.


So for our example of searching for 'Suzy%' and having an NLS_SORT for case sensitive searching (NLS_SORT=BINARY) we could issue the following types of SELECT statements and get the following results:

SQL> select name from names where REGEXP_LIKE(name,'Suzy');
NAME
-----------
Suzy Smith

SQL> select name from names where REGEXP_LIKE(name,'Suzy','i');
NAME
------------
suzy smith
Suzy Smith
SUZY SMITH

SQL> select name from names where REGEXP_LIKE(name,'[sS][uU][zZ][yY]');
NAME
-----------
suzy smith
Suzy Smith
SUZY SMITH

If we switched to an NLS_SORT=BINARY_CI for case insensitivity we now get the following types of results. As you can see we do not need to specify the match_parameter of 'i' for case insensitivity because it defaults to the NLS_SORT variable.

SQL> select name from names where REGEXP_LIKE(name,'Suzy');
NAME
------------
suzy smith
Suzy Smith
SUZY SMITH

There are literally dozens of ways you can pattern match for finding character strings with this function that extend the power of normal searches you can do by strict equality or by using the simplistic LIKE comparison operator.

While case insensitivity can extend your application and allow your users to literally find dozens of hidden artifacts, the fact still remains that there is no direct way to fully convert an application to case insensitivity unless you do not use the LIKE operator and every SQL statement is an exact equality match. In order to provide for complete case insensitivity to your end user community you need to start using the REGEXP_LIKE function. The only real issue is that you need to change your applications. Maybe in the next release of Oracle they will be able to extend the functionality of the LIKE comparison to provide true case insensitivity matching.

» See All Articles by Columnist James Koopmann



Oracle Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
Oracle Forum
Topic By Replies Updated
Oracle Data Mining: Classification jan.hasller 0 July 5th, 07:19 AM
Find duplicates - Unique IDs Lava 5 July 2nd, 08:30 AM
no matching unique or primary key rcanter 1 April 25th, 12:32 PM
Update values of one table based on condition of values in other table using Trigger Gladiator 3 February 29th, 06:01 PM


















Thanks for your registration, follow us on our social networks to keep up-to-date