Dealing With Upper and Lower Case Data
January 31, 2007
When working with data in SQL Server you might run across different needs to change the case of the data. For instance, you might want to convert all the characters in a varchar column to upper case, lower case, or change only the first character of a column value to upper case. In addition, you might want to compare different character strings and either have them be equal or not equal when the characters are the same but the case is different. In this article, I will show you different examples of how to deal with different situations related to the case of character strings.
What is Collation?
Each server, database, or character column has a default collation. A collation is a definition for how each character is represented, stored and compared. Part of the collation setting, known as case sensitivity, determines if an upper case A and a low case a when compared against either other are the same or different. If the collation is case-insensitive then a lower case a and an upper case A when compared are the same. So when using case insensitive data the following two strings are equal, AaAaA and aaaaa. On the other hand, if your collation setting is case-sensitive then the preceding two strings of As will not be equal. Therefore, when you are working with data you need to understand what the case sensitivity is for the character data you are dealing with.
UPPER and LOWER Function
Within SQL Server, two functions are provided to manage the case of character strings. These functions are UPPER and LOWER. The UPPER function accepts a character string as a parameter and returns the same character string where all the characters have been converted to upper case. If the character string AbCd1234 is passed to the UPPER function, then this function returns ABCD1234. The LOWER function does just the opposite and converts all characters it receives to lower case, so if AbCd1234 is passed to the LOWER function then it would return abcd1234.
Converting the First Character of Each Column to Upper Case
For my first example let me show you how to convert the first character of each column value to upper case and lower casing the rest of the column. Here is the T-SQL code for my example:
SET NOCOUNT ON IF exists (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[mytable]') and OBJECTPROPERTY(id, N'IsUserTable') = 1) DROP TABLE [mytable] GO CREATE TABLE [mytable] ( A CHAR(10)) go INSERT INTO mytable VALUES('aAaA') INSERT INTO mytable VALUES('bBbB') INSERT INTO mytable VALUES('cCcC') INSERT INTO mytable VALUES('dDdD') INSERT INTO mytable VALUES('DDdD') -- Uppercase first letter of each value in column SELECT UPPER(SUBSTRING(A,1,1)) + LOWER(SUBSTRING(A,2,LEN(A))) [Upper Case First Letter Only] FROM mytable DROP TABLE mytable
In this example, I created a table mytable that contained a single column. I then insert five different records into this table. The column in each row contains a character string that has a mixed set of cases. In the SELECT statement above, I use the substring function to get the first character of column A. I then pass that first character to the UPPER function. This upper cases the first character of column A. I then use the concatenation operator (+) to append the rest of column A to that upper cased first character. Before the rest of column A is concatenated to the first character, it is lower cased by using the LOWER function on the substring of column A starting at second character of column A. Here is the output I get when I run the above script:
Upper Case First Letter Only ---------------------------- Aaaa Bbbb Cccc Dddd Dddd
Proper Casing a Persons Name
In this example, I will take a column that contains a persons name and proper case the name. By proper case I mean I will take the first character of each name (First, Middle, Last, etc.) and upper case it and then lower case the rest of each name.
The bulk of my proper case example is coded in a T-SQL function. The function handles proper casing any number of names. It also handles multiple spaces between names, Irish type names, and with apostrophe or parentheses within in the name. Here is the code for my proper case function:
IF OBJECT_ID (N'dbo.ProperCase', N'FN') IS NOT NULL DROP FUNCTION dbo.ProperCase GO CREATE FUNCTION ProperCase (@String VARCHAR(8000)) RETURNS VARCHAR(8000) AS BEGIN DECLARE @TempString VARCHAR(8000) DECLARE @PS VARCHAR(8000) SET @PS = '' -- lower case entire string SET @TempString = lower(@String) WHILE patindex('%[-( '']%',@TempString) > 0 BEGIN -- Check to see if first character of @TempString is whitespace IF (patindex('%[-( '']%',SUBSTRING(@TempString,1,1)) > 0) BEGIN SET @PS = @PS + SUBSTRING(@TempString,1,1) END ELSE -- @TempString starts with a Name BEGIN IF SUBSTRING(@TempString,1,2) = 'mc' BEGIN SET @PS = @PS + 'Mc' SET @TempString = SUBSTRING(@Tempstring,3,LEN(@TempString)) END IF SUBSTRING(@TempString,1,3) = 'mac' BEGIN SET @PS = @PS + 'Mac' SET @TempString = SUBSTRING(@Tempstring,4,LEN(@TempString)) END -- upper case first character and return string up to the next space SET @PS = @PS + UPPER(SUBSTRING(@TempString,1,1)) + SUBSTRING(@TempString,2,patindex('%[-( '']%',@TempString)-1) END -- truncation string that we have already processed SET @TempString = SUBSTRING(@TempString, patindex('%[-( '']%',@TempString)+1,LEN(@TempString)) -- Trim off leading spaces SET @TempString = LTRIM(@TempString) END IF SUBSTRING(@TempString,1,2) = 'mc' BEGIN SET @PS = @PS + 'Mc' SET @TempString = SUBSTRING(@Tempstring,3,LEN(@TempString)) END IF SUBSTRING(@TempString,1,3) = 'mac' BEGIN SET @PS = @PS + 'Mac' SET @TempString = SUBSTRING(@Tempstring,4,LEN(@TempString)) END -- proper case last word/name SET @PS = @PS + UPPER(SUBSTRING(@TempString,1,1)) + SUBSTRING(@TempString,2,LEN(@TempString)) -- check for spaces in front of special characters SET @PS = Replace(@PS,' -','-') SET @PS = Replace(@PS,' ''','''') RETURN (@PS) END
Here is an example of how to execute my ProperCase function:
CREATE TABLE #temp ( PersonName VARCHAR(100)) INSERT INTO #temp VALUES('GREGory A. LARSEN') INSERT INTO #temp VALUES('TODD JOShUA SMITH') INSERT INTO #temp VALUES('HARRY JOnes') INSERT INTO #temp VALUES('Danielle i. RobBins') INSERT INTO #temp VALUES('mr. john d. johnson') INSERT INTO #temp VALUES('Jon o''connor') INSERT INTO #temp VALUES('mARtiN mCGalpan') INSERT INTO #temp VALUES('sARah maCdonald') INSERT INTO #temp VALUES('maRy Smith-foster') INSERT INTO #temp VALUES('Robert (bob) smith') SELECT dbo.ProperCase(PersonName) [Proper Case Names] FROM #temp DROP TABLE #temp
When I run the above code, I get the following output:
Proper Case Names ------------------------------------- Gregory A. Larsen Todd Joshua Smith Harry Jones Danielle I. Robbins Mr. John D. Johnson Jon O'Connor Martin McGalpan Sarah MacDonald Mary Smith-Foster Robert (Bob) Smith
My code above, that calls my proper case function, first creates and populates a temporary table that contains a PersonName column where the names are specified in a mix of upper and lower case characters. In the SELECT statement, I call the ProperCase function and pass it the PersonName column. The ProperCase function takes the passed parameter string and first converts all the characters to lower case using the LOWER function and sets a variable @TempString to the lower case value. The function then processes through the @TempString with a WHILE loop looking for special characters in the name (dash, parentheses, spaces or a quote). Each pass through the WHILE loop, the @TempString is evaluated to determine if the first character is one of these special characters, or is it the first character of a name. If a special character is found the special character is added to the @PS variable. When a new name is found, (first character is not a special character) then the case of the first character is converted to upper case, and then both the first character of the name and the rest of the name are added to the @PS string. The @TempString is shortened to only include the characters after the first space. The WHILE loop continues processing until the @TempString contains no more special character. Once the WHILE loop completes, the final part of the name is then proper cased and added to the @PS variable. The @PS variable is then returned to the calling SELECT statement.
The function makes it easy to implement a process to convert a character string to proper case. Having the proper case code embedded in a function simplifies the code for the SELECT statement to proper case a column.
Comparing Columns with Different Case
As stated earlier the collation of a character string will determine how SQL Server compares character strings. If you store your data using a case-insensitive format then when comparing the character string AAAA and aaaa they will be equal. If you want to be able to compare two case-insensitive strings and have the comparison be false when the two strings are the same except for the case of the characters then you need to use the COLLATE clause when comparing the two strings. To demonstrate look at the following code:
CREATE TABLE #mytable ( C CHAR(10) COLLATE Latin1_General_CI_AS) INSERT INTO #mytable VALUES('aaaa') SELECT C FROM #mytable WHERE C = 'AAAA' SELECT C FROM #mytable WHERE C = 'AAAA' COLLATE Latin1_General_CS_AS DROP TABLE #mytable
When I run this code on my machine, I get the following output.
C ---------- aaaa (1 row(s) affected) C ---------- (0 row(s) affected)
Here you can see the WHERE statement in the first SELECT statement returned the single record from #mytable. The WHERE statement evaluates to true even though the value of column C contains 4 lower case as and it was being compared to 4 upper case As. This WHERE statement was true because both the column C and the character string AAAA have a collation of Latin1_General_CI_AS, which is a case-insensitive collation. A character string is assigned the default collation of the database when the string is not associated with a COLLATE clause. To make sure the 4 upper case As dont match the 4 lower case as in column C I need to specify a case-sensitive collation on one side of the equal operator. By specifically casting the character string AAAA to a case-sensitive collation using the COLLATE Latin_General_CS_AS clause in the second SELECT statement I forced the sting on the right of the equal operator to be case sensitive, which caused aaaa to not equal AAAA.
This article provided you with some examples of how to physically change the case of some column values and how to set the collation of a character string. Depending on your applications needs you might need to convert or compare character strings that contain upper and lower case characters. The use of the UPPER and LOWER functions allows you to convert your character strings to whatever case you need. The COLLATE clause helps you explicitly convert a string of characters to the desired collation. When comparing character strings it is important to understand the case-sensitivity of all strings to make sure the outcome of your comparison is what you expect.