How to Convert Data to Proper Case In SQL Server

The challenge you'll run into with these approaches is that you've lost information. Explain to the business users that they've taken a blurry, out-of-focus picture and, despite what they see on TV, there's no way to make it crisp and in focus. There are always going to be situations where these rules won't work, and as long as everyone knows that going in, have at it.

This is HR data, so I'm going to assume we're talking about getting names into a consistent title-case format, because the mainframe stores them as AARON BERTRAND and we want the new system to not yell at them. Aaron is easy (but not cheap). You and Max have already identified the problem with Mc/Mac: the code correctly capitalizes the letter after Mc/Mac, but it's too aggressive with names like Mackey, Maclin, and Mackenzie. Mackenzie is an interesting case, though: look how its popularity has boomed as a baby name.

[Image: Mackenzie]

At some point, there will be a poor child named Mackenzie MacKenzie because people are awful beings.

You're also going to run into lovely things like D'Antoni, where we should cap both letters around the tick mark. Except for d'Autremont, where you only capitalize the letter after the apostrophe. Heaven help you, though, if you send mail to d'Illoni, as their family name is D'illoni.

For the sake of contributing actual code, the following is a CLR method we used on a 2005 instance. It generally relies on ToTitleCase, plus a few regex rules for apostrophes and Mc/Mac. Beyond that we built out a list of exceptions, which is the point where we basically gave up trying to codify the aforementioned cases.

namespace Common.Util
{
    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.Text;
    using System.Text.RegularExpressions;
    using System.Threading;

    /// <summary>
    /// A class that attempts to proper case a word, taking into
    /// consideration some outliers.
    /// </summary>
    public class ProperCase
    {
        /// <summary>
        /// Convert a string into its proper-cased equivalent.  In the general
        /// case it capitalizes the first letter of each word.  Special cases
        /// handled include names with apostrophes (O'Shea) and Scottish/Irish
        /// surnames (MacInnes, McDonalds).  Will fail for Macbeth, Macaroni, etc.
        /// </summary>
        /// <param name="inputText">The data to be recased into initial caps</param>
        /// <returns>The input text recased as proper case</returns>
        public static string Case(string inputText)
        {
            CultureInfo cultureInfo = Thread.CurrentThread.CurrentCulture;
            TextInfo textInfo = cultureInfo.TextInfo;
            string output = null;
            int staticHack = 0;

            Regex expression = null;
            string matchPattern = string.Empty;

            // Should think about maybe matching the first non blank character
            matchPattern = @"
                (?<Apostrophe>'.\B)| # Match things like O'Shea so apostrophe plus one.  Think about white space between ' and next letter.  TODO:  Correct it's from becoming It'S, can't -> CaN'T
                \bMac(?<Mac>.) | # MacInnes, MacGyver, etc.  Will fail for Macbeth
                \bMc(?<Mc>.) # McDonalds
                ";
            expression = new Regex(matchPattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);

            // Handle our funky rules            
            // Using named matches is probably overkill as the
            // same rule applies to all but for future growth, I'm
            // defining it as such.
            // Quirky behaviour---for 2005, the compiler will 
            // make this into a static method which is verboten for 
            // safe assemblies.  
            MatchEvaluator upperCase = delegate(Match match)
            {
                // Based on advice from Chris Hedgate's blog
                // I need to reference a local variable to prevent
                // this from being turned into static
                staticHack = matchPattern.Length;

                if (!string.IsNullOrEmpty(match.Groups["Apostrophe"].Value))
                {
                    return match.Groups["Apostrophe"].Value.ToUpper();
                }

                if (!string.IsNullOrEmpty(match.Groups["Mac"].Value))
                {
                    return string.Format("Mac{0}", match.Groups["Mac"].Value.ToUpper());
                }

                if (!string.IsNullOrEmpty(match.Groups["Mc"].Value))
                {
                    return string.Format("Mc{0}", match.Groups["Mc"].Value.ToUpper());
                }

                return match.Value;
            };

            MatchEvaluator evaluator = new MatchEvaluator(upperCase);

            if (inputText != null)
            {
                // Generally, title casing converts the first character 
                // of a word to uppercase and the rest of the characters 
                // to lowercase. However, a word that is entirely uppercase, 
                // such as an acronym, is not converted.
                // http://msdn.microsoft.com/en-us/library/system.globalization.textinfo.totitlecase(VS.80).aspx
                string temporary = string.Empty;
                temporary = textInfo.ToTitleCase(inputText.ToString().ToLower());
                output = expression.Replace(temporary, evaluator);
            }
            else
            {
                output = string.Empty;
            }

            return output;
        }
    }
}
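
The exception list mentioned above isn't shown in the CLR code. As a rough T-SQL sketch of the idea, assume a hypothetical lookup of known names; the table variables, column names, and sample rows here are all made up for illustration. Exact matches win, and everything else falls through to a rule. A naive first-letter rule stands in for the CLR function:

```sql
-- Hypothetical exception list: names whose casing no rule gets right.
DECLARE @NameCaseException TABLE
(
    UpperName  nvarchar(100) NOT NULL PRIMARY KEY,
    ProperName nvarchar(100) NOT NULL
);

INSERT INTO @NameCaseException (UpperName, ProperName)
VALUES (N'MACKENZIE',    N'Mackenzie'),
       (N'D''AUTREMONT', N'd''Autremont'),
       (N'D''ILLONI',    N'D''illoni');

-- Made-up sample data standing in for the mainframe extract.
DECLARE @Staff TABLE (LastName nvarchar(100) NOT NULL);

INSERT INTO @Staff (LastName)
VALUES (N'MACKENZIE'), (N'D''ILLONI'), (N'BERTRAND');

-- Use the exception when one exists; otherwise fall back to a simple rule.
SELECT s.LastName
    , Cased = COALESCE(e.ProperName,
                       UPPER(LEFT(s.LastName, 1)) + LOWER(SUBSTRING(s.LastName, 2, LEN(s.LastName))))
FROM @Staff AS s
LEFT JOIN @NameCaseException AS e
    ON e.UpperName = s.LastName;
```

The list only ever grows as people report their names, which is rather the point: it records information that the upper-cased source data no longer contains.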

Now that all of that is clear, I'm going to finish this lovely book of poetry by e e cummings.


I realize you've already got a good solution, but I thought I'd add a simpler one using an inline table-valued function, albeit one that relies on the upcoming "vNext" version of SQL Server (released as SQL Server 2017), which includes the STRING_AGG() and STRING_SPLIT() functions:

IF OBJECT_ID('dbo.fn_TitleCase') IS NOT NULL
DROP FUNCTION dbo.fn_TitleCase;
GO
CREATE FUNCTION dbo.fn_TitleCase
(
    @Input nvarchar(1000)
)
RETURNS TABLE
AS
RETURN
SELECT Item = STRING_AGG(splits.Word, ' ')
FROM (
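    -- SUBSTRING tolerates empty tokens (e.g. from double spaces); RIGHT(value, LEN(value) - 1) errors on them
    SELECT Word = UPPER(LEFT(value, 1)) + LOWER(SUBSTRING(value, 2, LEN(value)))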
    FROM STRING_SPLIT(@Input, ' ')
    ) splits(Word);
GO

Testing the function:

SELECT *
FROM dbo.fn_TitleCase('this is a test');

This Is A Test

SELECT *
FROM dbo.fn_TitleCase('THIS IS A TEST');

This Is A Test

See MSDN for documentation on STRING_AGG() and STRING_SPLIT().

Bear in mind that STRING_SPLIT() is not guaranteed to return items in any particular order. This can be most annoying. There is a Microsoft Feedback item requesting that a column be added to the output of STRING_SPLIT() to denote the order of the output. Consider upvoting that here.
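
Later releases did add such an ordering column. As a sketch, assuming SQL Server 2022 or Azure SQL Database, where STRING_SPLIT() accepts a third enable_ordinal argument and returns an ordinal column, the simple function could be made order-safe with the WITHIN GROUP clause of STRING_AGG(). The name dbo.fn_TitleCaseOrdered is made up for this example:

```sql
-- Sketch only: needs STRING_SPLIT's enable_ordinal argument
-- (SQL Server 2022 / Azure SQL Database). The function name is hypothetical.
CREATE OR ALTER FUNCTION dbo.fn_TitleCaseOrdered
(
    @Input nvarchar(1000)
)
RETURNS TABLE
AS
RETURN
SELECT Item = STRING_AGG(splits.Word, N' ') WITHIN GROUP (ORDER BY splits.ordinal)
FROM (
    -- ordinal is the 1-based position of each token in the input
    SELECT Word = UPPER(LEFT(value, 1)) + LOWER(SUBSTRING(value, 2, LEN(value)))
        , ordinal
    FROM STRING_SPLIT(@Input, N' ', 1)
    ) splits(Word, ordinal);
GO

SELECT *
FROM dbo.fn_TitleCaseOrdered(N'THIS IS A TEST');
```

On versions without enable_ordinal there is no documented way to recover the original token order from STRING_SPLIT(), so the caveat above still applies there.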

If you want to live on the edge, and want to use this methodology, it can be expanded to include exceptions. I've constructed an inline-table-valued-function that does just that:

IF OBJECT_ID('dbo.fn_TitleCase') IS NOT NULL
DROP FUNCTION dbo.fn_TitleCase;
GO
CREATE FUNCTION dbo.fn_TitleCase
(
    @Input nvarchar(1000)
    , @SepList nvarchar(1)
)
RETURNS TABLE
AS
RETURN
WITH Exceptions AS (
    SELECT v.ItemToFind
        , v.Replacement
    FROM (VALUES /* add further exceptions to the list below */
          ('mca', 'McA')
        , ('maca','MacA')
        ) v(ItemToFind, Replacement)
)
, Source AS (
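    SELECT Word = UPPER(LEFT(value, 1)) + LOWER(SUBSTRING(value, 2, LEN(value))) -- SUBSTRING tolerates empty tokens; RIGHT(value, LEN(value) - 1) errors on them
        , Num = ROW_NUMBER() OVER (ORDER BY GETDATE()) -- arbitrary numbering: STRING_SPLIT order is not guaranteed
    FROM STRING_SPLIT(@Input, @SepList)
)
SELECT Item = STRING_AGG(splits.Word, @SepList)
FROM (
    SELECT TOP 2147483647 Word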
    FROM (
        SELECT Word = REPLACE(Source.Word, Exceptions.ItemToFind, Exceptions.Replacement)
            , Source.Num
        FROM Source
        CROSS APPLY Exceptions
        WHERE Source.Word LIKE Exceptions.ItemToFind + '%'
        UNION ALL
        SELECT Word = Source.Word
            , Source.Num
        FROM Source
        WHERE NOT EXISTS (
            SELECT 1
            FROM Exceptions
            WHERE Source.Word LIKE Exceptions.ItemToFind + '%'
            )
        ) w
    ORDER BY Num
    ) splits;
GO

Testing this shows how it works:

SELECT *
FROM dbo.fn_TitleCase('THIS IS A TEST MCADAMS MACKENZIE MACADAMS', ' ');

This Is A Test McAdams Mackenzie MacAdams


The best solution I came across can be found here.

I altered the script just a bit: I added LTRIM and RTRIM to the returned value since, in some cases, the script was adding spaces after the value.

Usage example for previewing conversion from UPPERCASE data to Proper Case, with exceptions:

SELECT <column>,[dbo].[fProperCase](<column>,'|APT|HWY|BOX|',NULL)
FROM <table> WHERE <column>=UPPER(<column>)

The really simple yet powerful aspect of this script is the ability to define exceptions within the function call itself.

One note of caution, however:
As currently written the script does not handle Mc[A-Z]%, Mac[A-Z]%, etc. last names correctly. I'm currently working on edits to handle that scenario.

As a workaround, I changed the function's return value to: REPLACE(REPLACE(LTRIM(RTRIM((@ProperCaseText))),'Mcd','McD'),'Mci','McI'), etc. ...

This method obviously requires foreknowledge of the data and is not ideal. I'm sure there's a way to crack this, but I'm in the middle of a conversion and don't currently have the time to dedicate to this one pesky issue.

Here's the code:

CREATE FUNCTION [dbo].[fProperCase](@Value varchar(8000), @Exceptions varchar(8000),@UCASEWordLength tinyint)
returns varchar(8000)
as
/* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Function Purpose: To convert text to Proper Case.
Created By:             David Wiseman
Website:                http://www.wisesoft.co.uk
Created:                2005-10-03
Updated:                2006-06-22
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
INPUTS:

@Value :                This is the text to be converted to Proper Case
@Exceptions:            A list of exceptions to the default Proper Case rules. e.g. |RAM|CPU|HDD|TFT|
                              Without exception list they would display as Ram, Cpu, Hdd and Tft
                              Note the use of the Pipe "|" symbol to separate exceptions.
                              (You can change the @sep variable to something else if you prefer)
@UCASEWordLength: Words of this length or shorter are automatically displayed in UPPERCASE

USAGE1:

Convert text to ProperCase, without any exceptions

select dbo.fProperCase('THIS FUNCTION WAS CREATED BY DAVID WISEMAN',null,null)
>> This Function Was Created By David Wiseman

USAGE2:

Convert text to Proper Case, with exception for WiseSoft

select dbo.fProperCase('THIS FUNCTION WAS CREATED BY DAVID WISEMAN @ WISESOFT','|WiseSoft|',null)
>> This Function Was Created By David Wiseman @ WiseSoft

USAGE3:

Convert text to Proper Case and default words of 3 chars or fewer to UPPERCASE

select dbo.fProperCase('SIMPSON, HJ',null,3)
>> Simpson, HJ

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */
begin
      declare @sep char(1) -- Separator character for exceptions
      declare @i int -- counter
      declare @ProperCaseText varchar(8000) -- Used to build our Proper Case string for Function return (sized to match @Value so long input isn't truncated)
      declare @Word varchar(1000) -- Temporary storage for each word
      declare @IsWhiteSpace as bit -- Used to indicate whitespace character/start of new word
      declare @c char(1) -- Temp storage location for each character

      set @Word = ''
      set @i = 1
      set @IsWhiteSpace = 1
      set @ProperCaseText = ''
      set @sep = '|'

      -- Set default UPPERCASEWord Length
      if @UCASEWordLength is null set @UCASEWordLength = 1
      -- Convert user input to lower case (This function will UPPERCASE words as required)
      set @Value = LOWER(@Value)

      -- Loop once per character, plus one extra pass to flush the final word
      while (@i <= len(@Value)+1)
      begin

            -- Get the current character
            set @c = SUBSTRING(@Value,@i,1)

            -- If start of new word, UPPERCASE character
            if @IsWhiteSpace = 1 set @c = UPPER(@c)

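            -- Check if character is white space/symbol (using ascii values).
            -- Note: as written, the ranges also treat ':' (58), '@' (64), '`' (96) and '{' (123) as word characters.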
            set @IsWhiteSpace = case when (ASCII(@c) between 48 and 58) then 0
                                          when (ASCII(@c) between 64 and 90) then 0
                                          when (ASCII(@c) between 96 and 123) then 0
                                          else 1 end

            if @IsWhiteSpace = 0
            begin
                  -- Append character to temp @Word variable if not whitespace
                  set @Word = @Word + @c
            end
            else
            begin
                  -- Character is white space/punctuation/symbol which marks the end of our current word.
                  -- If word length is less than or equal to the UPPERCASE word length, convert to upper case.
                  -- e.g. you can specify a @UCASEWordLength of 3 to automatically UPPERCASE all 3 letter words.
                  set @Word = case when len(@Word) <= @UCASEWordLength then UPPER(@Word) else @Word end

                  -- Check word against user exceptions list. If exception is found, use the case specified in the exception.
                  -- e.g. WiseSoft, RAM, CPU.
                  -- If word isn't in user exceptions list, check for "known" exceptions.
                  set @Word = case when charindex(@sep + @Word + @sep,@exceptions collate Latin1_General_CI_AS) > 0
                                    then substring(@exceptions,charindex(@sep + @Word + @sep,@exceptions collate Latin1_General_CI_AS)+1,len(@Word))
                                    when @Word = 's' and substring(@Value,@i-2,1) = '''' then 's' -- e.g. Who's
                                    when @Word = 't' and substring(@Value,@i-2,1) = '''' then 't' -- e.g. Don't
                                    when @Word = 'm' and substring(@Value,@i-2,1) = '''' then 'm' -- e.g. I'm
                                    when @Word = 'll' and substring(@Value,@i-3,1) = '''' then 'll' -- e.g. He'll
                                    when @Word = 've' and substring(@Value,@i-3,1) = '''' then 've' -- e.g. Could've
                                    else @Word end

                  -- Append the word to the @ProperCaseText along with the whitespace character
                  set @ProperCaseText = @ProperCaseText + @Word + @c
                  -- Reset the Temp @Word variable, ready for a new word
                  set @Word = ''
            end
            -- Increment the counter
            set @i = @i + 1
      end
      return LTRIM(RTRIM(@ProperCaseText)) -- trimmed because the extra final pass appends a trailing space
end
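
For the Mc last-name gap noted above, one possible generalization of the nested REPLACE workaround is sketched below. It assumes the column holds a single, already proper-cased last name, and it deliberately leaves Mac alone, since names like Mackenzie make that prefix ambiguous. The sample values are made up, and the binary collation is there so that [a-z] matches only lowercase letters:

```sql
-- Sketch only: capitalize the third letter of any name that starts with
-- "Mc" followed by a lowercase letter. Sample names are hypothetical.
SELECT n.LastName
    , Fixed = CASE
                  WHEN n.LastName COLLATE Latin1_General_BIN LIKE 'Mc[a-z]%'
                      THEN STUFF(n.LastName, 3, 1, UPPER(SUBSTRING(n.LastName, 3, 1)))
                  ELSE n.LastName
              END
FROM (VALUES ('Mcdonald'), ('Mcintyre'), ('Mackenzie'), ('McAdams')) n(LastName);
```

This avoids listing every letter by hand, but it is still a guess about the data: a name such as Mcelroy would be recased the same way whether or not its owner writes it as McElroy.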