Dealing With Upper and Lower Case Data

When working with data in SQL Server you might run across
different needs to change the case of the data. For instance, you might want to
convert all the characters in a varchar column to upper case, lower case, or
change only the first character of a column value to upper case. In addition,
you might want to compare different character strings and either have them be
equal or not equal when the characters are the same but the case is different.
In this article, I will show you different examples of how to deal with
different situations related to the case of character strings.

What is Collation?

Each server, database, or character column has a default
collation. A collation is a definition for how each character is represented,
stored and compared. Part of the collation setting, known as case sensitivity,
determines if an upper case “A” and a low case “a” when compared against either
other are the same or different. If the collation is case-insensitive then a
lower case “a” and an upper case “A” when compared are the same. So when using
case insensitive data the following two strings are equal, “AaAaA” and “aaaaa”.
On the other hand, if your collation setting is case-sensitive then the
preceding two strings of A’s will not be equal. Therefore, when you are working
with data you need to understand what the case sensitivity is for the character
data you are dealing with.

UPPER and LOWER Function

Within SQL Server, two functions are provided to manage
the case of character strings. These functions are UPPER and LOWER. The UPPER
function accepts a character string as a parameter and returns the same
character string where all the characters have been converted to upper case. If
the character string “AbCd1234” is passed to the UPPER function, then this
function returns “ABCD1234”. The LOWER function does just the opposite and
converts all characters it receives to lower case, so if “AbCd1234” is passed
to the LOWER function then it would return “abcd1234”.

Converting the First Character of Each Column to Upper Case

For my first example let me show you how to convert the
first character of each column value to upper case and lower casing the rest of
the column. Here is the T-SQL code for my example:


SET NOCOUNT ON
IF exists (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N’[mytable]’)
and OBJECTPROPERTY(id, N’IsUserTable’) = 1)
DROP TABLE [mytable]
GO
CREATE TABLE [mytable] (
A CHAR(10))
go
INSERT INTO mytable VALUES(‘aAaA’)
INSERT INTO mytable VALUES(‘bBbB’)
INSERT INTO mytable VALUES(‘cCcC’)
INSERT INTO mytable VALUES(‘dDdD’)
INSERT INTO mytable VALUES(‘DDdD’)
— Uppercase first letter of each value in column
SELECT UPPER(SUBSTRING(A,1,1)) + LOWER(SUBSTRING(A,2,LEN(A)))
[Upper Case First Letter Only]
FROM mytable
DROP TABLE mytable

In this example, I created a table “mytable” that
contained a single column. I then insert five different records into this
table. The column in each row contains a character string that has a mixed set
of cases. In the SELECT statement above, I use the substring function to get
the first character of column “A”. I then pass that first character to the UPPER
function. This upper cases the first character of column “A”. I then use the
concatenation operator (+) to append the rest of column “A” to that upper cased
first character. Before the rest of column “A” is concatenated to the first
character, it is lower cased by using the LOWER function on the substring of
column “A” starting at second character of column “A”. Here is the output I get
when I run the above script:


Upper Case First Letter Only
—————————-
Aaaa
Bbbb
Cccc
Dddd
Dddd

Proper Casing a Persons Name

In this example, I will take a column that contains a
persons name and proper case the name. By proper case I mean I will take the
first character of each name (First, Middle, Last, etc.) and upper case it and
then lower case the rest of each name.

The bulk of my proper case example is coded in a T-SQL
function. The function handles proper casing any number of names. It also
handles multiple spaces between names, Irish type names, and with apostrophe
or parentheses within in the name. Here is the code for my proper case
function:


IF OBJECT_ID (N’dbo.ProperCase’, N’FN’) IS NOT NULL
DROP FUNCTION dbo.ProperCase
GO
CREATE FUNCTION ProperCase (@String VARCHAR(8000)) RETURNS VARCHAR(8000) AS BEGIN DECLARE @TempString VARCHAR(8000) DECLARE @PS VARCHAR(8000)
SET @PS = ”
— lower case entire string
SET @TempString = lower(@String)
WHILE patindex(‘%[-( ”]%’,@TempString) > 0 BEGIN
— Check to see if first character of @TempString is whitespace
IF (patindex(‘%[-( ”]%’,SUBSTRING(@TempString,1,1)) > 0)
BEGIN
SET @PS = @PS + SUBSTRING(@TempString,1,1)
END
ELSE — @TempString starts with a Name
BEGIN
IF SUBSTRING(@TempString,1,2) = ‘mc’
BEGIN
SET @PS = @PS + ‘Mc’
SET @TempString = SUBSTRING(@Tempstring,3,LEN(@TempString))
END
IF SUBSTRING(@TempString,1,3) = ‘mac’
BEGIN
SET @PS = @PS + ‘Mac’
SET @TempString = SUBSTRING(@Tempstring,4,LEN(@TempString))
END

— upper case first character and return string up to the next space
SET @PS = @PS + UPPER(SUBSTRING(@TempString,1,1)) +
SUBSTRING(@TempString,2,patindex(‘%[-( ”]%’,@TempString)-1)

END
— truncation string that we have already processed

SET @TempString = SUBSTRING(@TempString,
patindex(‘%[-( ”]%’,@TempString)+1,LEN(@TempString))
— Trim off leading spaces
SET @TempString = LTRIM(@TempString)
END
IF SUBSTRING(@TempString,1,2) = ‘mc’
BEGIN
SET @PS = @PS + ‘Mc’
SET @TempString = SUBSTRING(@Tempstring,3,LEN(@TempString))
END
IF SUBSTRING(@TempString,1,3) = ‘mac’
BEGIN
SET @PS = @PS + ‘Mac’
SET @TempString = SUBSTRING(@Tempstring,4,LEN(@TempString))
END
— proper case last word/name
SET @PS = @PS + UPPER(SUBSTRING(@TempString,1,1)) +
SUBSTRING(@TempString,2,LEN(@TempString))
— check for spaces in front of special characters
SET @PS = Replace(@PS,’ -‘,’-‘)
SET @PS = Replace(@PS,’ ”’,””)

RETURN (@PS)
END

Here is an example of how to execute my ProperCase
function:


CREATE TABLE #temp (
PersonName VARCHAR(100))
INSERT INTO #temp VALUES(‘GREGory A. LARSEN’)
INSERT INTO #temp VALUES(‘TODD JOShUA SMITH’)
INSERT INTO #temp VALUES(‘HARRY JOnes’)
INSERT INTO #temp VALUES(‘Danielle i. RobBins’)
INSERT INTO #temp VALUES(‘mr. john d. johnson’)
INSERT INTO #temp VALUES(‘Jon o”connor’)
INSERT INTO #temp VALUES(‘mARtiN mCGalpan’)
INSERT INTO #temp VALUES(‘sARah maCdonald’)
INSERT INTO #temp VALUES(‘maRy Smith-foster’)
INSERT INTO #temp VALUES(‘Robert (bob) smith’)
SELECT dbo.ProperCase(PersonName) [Proper Case Names] FROM #temp
DROP TABLE #temp

When I run the above code, I get the following output:


Proper Case Names
————————————-
Gregory A. Larsen
Todd Joshua Smith
Harry Jones
Danielle I. Robbins
Mr. John D. Johnson
Jon O’Connor
Martin McGalpan
Sarah MacDonald
Mary Smith-Foster
Robert (Bob) Smith

My code above, that calls my proper case function, first
creates and populates a temporary table that contains a PersonName column where
the names are specified in a mix of upper and lower case characters. In the
SELECT statement, I call the ProperCase function and pass it the PersonName
column. The ProperCase function takes the passed parameter string and first
converts all the characters to lower case using the LOWER function and sets a
variable @TempString to the lower case value. The function then processes
through the @TempString with a WHILE loop looking for special characters in the
name (dash, parentheses, spaces or a quote). Each pass through the WHILE loop,
the @TempString is evaluated to determine if the first character is one of
these special characters, or is it the first character of a name. If a special
character is found the special character is added to the @PS variable. When a
new name is found, (first character is not a special character) then the case
of the first character is converted to upper case, and then both the first
character of the name and the rest of the name are added to the @PS string. The
@TempString is shortened to only include the characters after the first space.
The WHILE loop continues processing until the @TempString contains no more special
character. Once the WHILE loop completes, the final part of the name is then
proper cased and added to the @PS variable. The @PS variable is then returned
to the calling SELECT statement.

The function makes it easy to implement a process to
convert a character string to proper case. Having the proper case code embedded
in a function simplifies the code for the SELECT statement to proper case a
column.

Comparing Columns with Different Case

As stated earlier the collation of a character string will
determine how SQL Server compares character strings. If you store your data
using a case-insensitive format then when comparing the character string “AAAA”
and “aaaa” they will be equal. If you want to be able to compare two
case-insensitive strings and have the comparison be false when the two strings
are the same except for the case of the characters then you need to use the
COLLATE clause when comparing the two strings. To demonstrate look at the
following code:


CREATE TABLE #mytable (
C CHAR(10) COLLATE Latin1_General_CI_AS)
INSERT INTO #mytable VALUES(‘aaaa’)
SELECT C FROM #mytable WHERE C = ‘AAAA’
SELECT C FROM #mytable WHERE C = ‘AAAA’ COLLATE Latin1_General_CS_AS
DROP TABLE #mytable

When I run this code on my machine, I get the following
output.


C
———-
aaaa
(1 row(s) affected)
C
———-
(0 row(s) affected)

Here you can see the WHERE statement in the first SELECT
statement returned the single record from #mytable. The WHERE statement
evaluates to true even though the value of column C contains 4 lower case a’s
and it was being compared to 4 upper case A’s. This WHERE statement was true
because both the column C and the character string “AAAA” have a collation of
Latin1_General_CI_AS, which is a case-insensitive collation. A character string
is assigned the default collation of the database when the string is not
associated with a COLLATE clause. To make sure the 4 upper case A’s don’t match
the 4 lower case a’s in column C I need to specify a case-sensitive collation
on one side of the equal operator. By specifically casting the character string
“AAAA” to a case-sensitive collation using the “COLLATE Latin_General_CS_AS”
clause in the second SELECT statement I forced the sting on the right of the
equal operator to be case sensitive, which caused “aaaa” to not equal “AAAA”.

Conclusion

This article provided you with some examples of how to
physically change the case of some column values and how to set the collation
of a character string. Depending on your application’s needs you might need to
convert or compare character strings that contain upper and lower case
characters. The use of the UPPER and LOWER functions allows you to convert your
character strings to whatever case you need. The COLLATE clause helps you
explicitly convert a string of characters to the desired collation. When
comparing character strings it is important to understand the case-sensitivity
of all strings to make sure the outcome of your comparison is what you expect.

»


See All Articles by Columnist
Gregory A. Larsen

Gregory Larsen
Gregory Larsen
Gregory A. Larsen is a DBA at Washington State Department of Health (DOH). Greg is responsible for maintaining SQL Server and other database management software. Greg works with customers and developers to design and implement database changes, and solve database/application related problems. Greg builds homegrown solutions to simplify and streamline common database management tasks, such as capacity management.

Latest Articles