Beware of Mixing Collation with SQL Server 2000 – Part 1

A few months ago, I was
running a stored procedure (SP) that performed a cross database join on our SQL
Server 2000 box. This particular procedure was processing records from a table
in one database (database A), that was originally created on our SQL Server
2000 box, while also processing records from a table in another database
(database B), that was also on our SQL Server 2000 box, but was created by
restoring a SQL Server 7.0 database backup. The SQL Server 7 machine had a
different character set and sort order than our SQL Server 2000 database. The
SELECT statement the SP was running was joining records from the table in
database A with the table in database B on a varchar key field. Can you imagine
what happened? The SP failed with a “collation error.”

SQL Server 2000 finally
solved the problem of taking database backups from servers with a different
character set / sort order and restoring them, without getting a collation
error. Before SQL Server 2000, backups from one server could only be restored
on a different server if the same character set and sort order where used by
both servers. With SQL Server 2000, you are able to set the collating sequence
at the server, database, and even the column level. Now that you can have
multiple collating sequences on a single database server, there are a few
things you should understand about mixing collations.

When you have character data
stored with different collation sequences, you need to be aware that comparing
columns becomes more involved. Columns with the same collating sequences can
be compared or joined without any additional considerations. Although when
you compare columns that have different collating sequences, it requires some special
coding considerations.

Let’s look at an example
that will demonstrate what I am talking about. The example below tries to join
data from two different tables on columns that have different collating
sequences.


create table #a (char_set1 varchar(50) collate Latin1_General_CI_AS)
create table #b(char_set2 varchar(50)collate Latin1_General_BIN)
insert into #a values (‘collate’)
insert into #b values (‘collate’)
select * from #a join #b on char_set1 = char_set2

When the code above is executed,
the select statement returns the following error:

Cannot
resolve collation conflict for equal to operation.

Why does the SELECT
statement get this error? This error happens because char_set1 column in table
#a has collation of Latin1_General_CI_AS, where as char_set2 column in table #b
uses a collation of Latin1_General_BIN.

Of course, SQL Server allows
you to write code so you can join the above two tables, even though the join
columns each have a different collation setting. By Changing the SELECT statement
so that one of the columns does an explicit collation conversion prior to
joining, resolves the issue shown above. Below char_set2, which has a
collation setting of Latin1_General_BIN is explicitly converted to
Latin1_General_CI_AS using the COLLATE clause:

select * from #a join #b on char_set1 = char_set2 collate Latin1_general_CI_AS

By using this explicit
conversion from one collation to another you can then join both tables. Now if
you really have to store and compare columns with different collation, then SQL
Server is great at providing you this functionality. Although if you have no
business requirements to store data with different collations then by all means
don’t. Storing data of varying collations causes additional unnecessary programming
considerations therefore choose your collation requirements wisely.

In our environment, we have many
contractors doing development. These contractors typically develop our
databases on their own SQL Server box. Their server and/or database collation
setting may or may not be the same as ours. Occasionally a contractor will
send me a database backup and tell me to restore it onto our test server. This
is easily done with SQL Server 2000, regardless of the collation setting on the
developer’s machine. Every so often when I do this, I find that the restored
database has a different collation setting then our server. When this happens
I have to convert the data in the restored database to our standard collation. (This
is another topic that I will discuss in part 2). I do this conversion because
we require all our data to have the same collation setting. If you are
restoring databases from other SQL Server machines, be aware that you might be
restoring data that has a different collation setting then your server.

I’m going to leave you with
one more example of why knowing your collating setting is extremely import.

The TEMPDB database is
created each time the server is rebooted. TEMPDB has the same default
collation sequence as the model DB. If for some reason you have a SQL Server
2000 database that has a different sort sequence than your current TEMPDB
collation setting, then you potentially might have a problem when comparing
data between your application and TEMPDB.

Here is an example of the
TEMPDB comparison I am talking about. This example creates database test with
a collation of Latin1_General_BIN, then creates and populates table b with
data. Next, it builds table #temp in TEMPDB and inserts records into it from
table b. The columns in TEMPDB use the model db collation sequence settings,
which is in my case is Latin1_general_CI_AS.


create database test collate Latin1_General_BIN
go
use test
go
Create table b(char_set1 varchar(50))
insert into b values (‘collate’)
CREATE TABLE #TEMP (CHAR_SET2 VARCHAR(50))
insert into #temp select * from b
select * from #temp
select * from #temp t join b on t.char_set2 = b.char_set1

When I run the above code, I
get the collation conflict error on the second SELECT statement. If the two
databases had the same collation sequence this error would not occur. If I
change the second SELECT statement to read as follows, the join will work
without a collation conflict:


select * from #temp t join b on t.char_set2 collate Latin1_General_BIN = b.char_set1

There are a couple of
different ways to determine your default sort order and character set. The sp_helpsort
SP can be used to determine the character set / sort order of your server. For
determining the database character set / sort order use sp_helpdb. Now if you
also what to know the sort order / character set for each column in a database
then use the information_schema.columns view to display this information.

Conclusion

Being able to restore
databases onto a SQL Server 2000 box, without requiring the target server to
have the same character set and sort order as the source server, is a big
improvement over SQL Server 7.0. Although you need to be aware of the
collation problems that might occur if your SQL Server 2000 machine contains
databases and columns with varying collations. If you really do not have a
business requirement to store data with different collations, then verify
collation information every time you restore a database where you are unsure of
the collation setting on the source server/database. The bottom line is you had
better know your column collation settings in order to avoid the collation
conflict error message.

»


See All Articles by Columnist
Gregory A. Larsen

Gregory Larsen
Gregory Larsen
Gregory A. Larsen is a DBA at Washington State Department of Health (DOH). Greg is responsible for maintaining SQL Server and other database management software. Greg works with customers and developers to design and implement database changes, and solve database/application related problems. Greg builds homegrown solutions to simplify and streamline common database management tasks, such as capacity management.

Latest Articles