Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

Oracle

Posted Mar 17, 2005

The Globalization of Language in Oracle - The NLS_COMP and NLS_SORT variables

By James Koopmann

How does Oracle compare and sort data. There is more to it than meets the eye.

Ever since I was a newbie to databases I never questioned how sorting occurred in a database. I just knew that there was a predefined order to the character set (English) and I never thought much more of it. We all get too complacent about equating predicates in a WHERE clause or adding an ORDER BY clause to sort our output to put priority to the data we view. If you care to delve into how Oracle determines how to sort, there is a good write up in the Oracle documentation set that discusses primary (first) level, secondary, and tertiary level sorting mechanisms. I would encourage you to search on these sorting levels as the reading gives great insight into the complexities that need to be overcome. We are, in this article, more concerned with the setting of variables to get the desired outcome for the locale we need to present data to.

Well these past few months, yes months, I have been playing around with this globalization stuff and have continually been stumped by what is in Oracle documentation and what I can actually get to work on my local databases. After getting over the hang-up of operating system variables over riding my Oracle client's session variables it is now time to get a bit deeper into what these variables do. In my opinion the two most important variables that can get set for globalization are the NLS_COMP and NLS_SORT variables. These two variables themselves define how a client will experience the data that gets presented to them. They control how characters get compared to each other and how they sort. Table 1 gives a definition of these two variables.

Table 1 NLS_COMP & NLS_SORT Defined

PARAMETER

Descripton

NLS_COMP

The sole purpose of this variable is to determine how the predicates in SQL statements will evaluate to each other when a comparison is required. The typical comparison operator is the WHERE clause but also includes such comparisons as ORDER BY, START WITH, HAVING,...etc...

There are only two settings.

BINARY : All comparisons will be done in accordance to the binary value of the characters.

ANSI : All comparisons will be done linguistically in accordance to the setting of the variable NLS_SORT.

How to set

  1. Set with an operating system environment variable (SET NLS_COMP=<binary|ansi>)
  2. ALTER SESSION SET NLS_COMP=<binary|ansi>;

NLS_SORT

NLS_SORT specifies the collating sequence for ORDER BY queries.

If the value is BINARY, then the collating sequence for ORDER BY queries is based on the numeric value of characters.

If the value is a named linguistic sort, sorting is based on the order of the defined linguistic sort. Most (but not all) languages supported by the NLS_LANGUAGE parameter also support a linguistic sort with the same name.

There are many values.

BINARY : All sorting is done in accordance to the binary value of the characters.

V$NLS_VALID_VALUES : All sorting is done in accordance to a named linguistic definition. This means that Oracle will sort in accordance to a particular locale (GERMAN, POLISH, FRENCH, etc.).These valid named definitions may be obtained by querying the V$NLS_VALID_VALUES where parameter = `SORT'. Just be aware that all the values in this view also have a hybrid definition by adding a suffix of _CI (case insensitivity or _AI (accent-insensitive and case-insensitive). But more on these two hybrids in an upcoming article.

How to set

  1. Set with an operating system environment variable (SET NLS_SORT=<valid_value>)
  2. ALTER SESSION SET NLS_SORT=<valid_value>;

 

Newsflash Setting NLS_LANG environment variable

It has come to my attention that the setting of your NLS_LANG environment variable is of the utmost importance. This little variable, if not set properly, will create havoc within your database. It should be set to a valid setting that takes the form of <nls_language>_<nls_territory>.<nls_characterset> . The concern comes when you have a registry entry on your client machine that is not compatible with the database you are connecting to. In my case it was a difference in character set which caused incompatibilities between clients at different locales. You can find these values from the view V$NLS_VALID_VALUES.

For example my NLS_LANG variable was set to AMERICAN_AMERICA.WE8MSWIN1252


In the first article on globalization in Oracle I presented you with a table on some common characters in the English and German language. Table 2 re-visits this data. You should revisit the first article if you want to create this table and query it. Table 2 gives you a few examples so that you can see how setting the NLS_SORT variable will alter the output of a SELECT ... ORDER BY statement.

Table 2 Example data sorted on different NLS_COMP & NLS_SORT values

select xgraphic from char_table order by xgraphic;

NLS_COMP=ANSI

NLS_SORT=BINARY

NLS_COMP=ANSI

NLS_SORT=BINARY_CI

NLS_COMP=ANSI

NLS_SORT=BINARY_AI

XDECIMAL XBINARY XGRAPHIC

-------- ---------- ---------

65 1000001 A

66 1000010 B

67 1000011 C

68 1000100 D

79 1001111 O

85 1010101 U

97 1100001 a

98 1100010 b

99 1100011 c

100 1100100 d

111 1101111 o

117 1110101 u

196 11000100 Ä

214 11010110 Ö

220 11011100 Ü

228 11100100 ä

246 11110110 ö

252 11111100 ü

XDECIMAL XBINARY XGRAPHIC

-------- ---------- ---------

65 1000001 A

97 1100001 a

66 1000010 B

98 1100010 b

67 1000011 C

99 1100011 c

68 1000100 D

100 1100100 d

79 1001111 O

111 1101111 o

85 1010101 U

117 1110101 u

196 11000100 Ä

228 11100100 ä

214 11010110 Ö

246 11110110 ö

220 11011100 Ü

252 11111100 ü

XDECIMAL XBINARY XGRAPHIC

-------- ---------- --------

65 1000001 A

97 1100001 a

228 11100100 ä

196 11000100 Ä

66 1000010 B

98 1100010 b

67 1000011 C

99 1100011 c

68 1000100 D

100 1100100 d

79 1001111 O

111 1101111 o

214 11010110 Ö

246 11110110 ö

85 1010101 U

117 1110101 u

252 11111100 ü

220 11011100 Ü

To take a look at what is really happing behind the scene here we need to extract the numerical representation of a character to see how it sorts. This can be done by imposing the NLSSORT function around the character string in question. Table 3 presents the output of just such a query. The second column now uses the _CI (case insensitivity) extension. The item to glean from this output is that the upper case letter A has the same NLSSORT value as the lower case letter a and the ä character now has the same nlssort value as Ä. Now when we get to _AI (accent insensitivity & case insensitivity) in column three we have a condition where the nlssort value is the same for a set of characters such that a=A=ä=Ä. WOW! Now I encourage you to run under these three scenarios and issue the following SQL.

SELECT xgraphic from char_table where xgraphic = `a';

Table 3 Example data sorted on different NLS_COMP & NLS_SORT values with NLSSORT

select xdecimal, nlssort(xgraphic) nlssort,xgraphic from char_table order by xgraphic;

NLS_COMP=ANSI

NLS_SORT=BINARY

NLS_COMP=ANSI

NLS_SORT=BINARY_CI

NLS_COMP=ANSI

NLS_SORT=BINARY_AI

XDECIMAL NLSSORT XGRAPHIC

-------- ---------- ---------

65 4100 A

66 4200 B

67 4300 C

68 4400 D

79 4F00 O

85 5500 U

97 6100 a

98 6200 b

99 6300 c

100 6400 d

111 6F00 o

117 7500 u

196 C38400 Ä

214 C39600 Ö

220 C39C00 Ü

228 C3A400 ä

246 C3B600 ö

252 C3BC00 ü

XDECIMAL NLSSORT XGRAPHIC

-------- ---------- --------

65 6100 A

97 6100 a

66 6200 B

98 6200 b

67 6300 C

99 6300 c

68 6400 D

100 6400 d

79 6F00 O

111 6F00 o

85 7500 U

117 7500 u

196 C3A400 Ä

228 C3A400 ä

214 C3B600 Ö

246 C3B600 ö

220 C3BC00 Ü

252 C3BC00 ü

XDECIMAL NLSSORT XGRAPHIC

-------- ---------- --------

65 6100 A

97 6100 a

228 6100 ä

196 6100 Ä

66 6200 B

98 6200 b

67 6300 C

99 6300 c

68 6400 D

100 6400 d

79 6F00 O

111 6F00 o

214 6F00 Ö

246 6F00 ö

85 7500 U

117 7500 u

252 7500 ü

220 7500 Ü

Dealing with a multi lingual database is confusing at best. But when we can get a glimpse at the internal numbers associated with a sort, it starts to give us some insights into how things work and why we are sorting in a particular order. Unfortunately this article has brought up just as many questions as I think I have answered for myself. Hopefully for you also. If there is just one item I would like everyone to get out of this article, it is that you MUST set the NLS_LANG variable properly. DO NOT rely upon past settings and PLEASE verify the setting in your registry. Globalization is a wild ride, please hold on.

» See All Articles by Columnist James Koopmann



Oracle Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
Oracle Forum
Topic By Replies Updated
Oracle Data Mining: Classification jan.hasller 0 July 5th, 07:19 AM
Find duplicates - Unique IDs Lava 5 July 2nd, 08:30 AM
no matching unique or primary key rcanter 1 April 25th, 12:32 PM
Update values of one table based on condition of values in other table using Trigger Gladiator 3 February 29th, 06:01 PM


















Thanks for your registration, follow us on our social networks to keep up-to-date