.NET StringBuilder - check if ends with string

MSDN has a topic on how to search text in a StringBuilder object. The two options it gives you are:

  1. Call ToString and search the returned String object.
  2. Use the Chars property to sequentially search a range of characters.

Since the first option is out of the question, you'll have to go with the Chars property (in C#, the StringBuilder indexer, sb[i]).

public static class StringBuilderExtensions
{
    public static bool EndsWith(this StringBuilder sb, string text)
    {
        if (sb.Length < text.Length)
            return false;

        var sbLength = sb.Length;
        var textLength = text.Length;
        for (int i = 1; i <= textLength; i++)
        {
            if (text[textLength - i] != sb[sbLength - i])
                return false;
        }
        return true;
    }
}
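For reference, here is a self-contained sketch of the same Chars-based approach, with the using directive and argument checks it needs, plus a mirror-image StartsWith (a hypothetical companion, not part of the answer above). Both compare char values ordinally, meaning an exact UTF-16 code-unit match with no culture rules, and neither allocates a string.

```csharp
using System;
using System.Text;

public static class StringBuilderCharExtensions
{
    // Ordinal check: true if the last text.Length chars of sb equal text.
    public static bool EndsWith(this StringBuilder sb, string text)
    {
        if (sb == null) throw new ArgumentNullException(nameof(sb));
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (sb.Length < text.Length) return false;

        int offset = sb.Length - text.Length;
        for (int i = 0; i < text.Length; i++)
        {
            // sb[offset + i] reads through the Chars indexer; no string is built.
            if (sb[offset + i] != text[i]) return false;
        }
        return true;
    }

    // Hypothetical companion: the same scan, anchored at the start instead.
    public static bool StartsWith(this StringBuilder sb, string text)
    {
        if (sb == null) throw new ArgumentNullException(nameof(sb));
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (sb.Length < text.Length) return false;

        for (int i = 0; i < text.Length; i++)
        {
            if (sb[i] != text[i]) return false;
        }
        return true;
    }
}
```

With a builder holding "Hello, world", EndsWith("world") is true, EndsWith("World") is false (ordinal, so case matters), and an empty search string always matches.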

To avoid the performance overhead of generating the full string, you can use the ToString(int, int) overload, which takes a start index and a length and copies only that range.

public static bool EndsWith(this StringBuilder sb, string test)
{
    if (sb.Length < test.Length)
        return false;

    string end = sb.ToString(sb.Length - test.Length, test.Length);
    return end.Equals(test);
}
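A self-contained sketch of the same idea follows. One detail worth knowing: end.Equals(test) is an ordinal comparison, whereas String.EndsWith(string) uses the current culture, so the two can disagree on culture-sensitive input. The helper names Tail and EndsWithOrdinal are hypothetical.

```csharp
using System;
using System.Text;

public static class StringBuilderTailExtensions
{
    // Copies only the last `length` characters of the builder into a new string.
    public static string Tail(this StringBuilder sb, int length)
    {
        if (sb == null) throw new ArgumentNullException(nameof(sb));
        if (length < 0 || length > sb.Length)
            throw new ArgumentOutOfRangeException(nameof(length));
        // ToString(startIndex, length): the copy starts at startIndex.
        return sb.ToString(sb.Length - length, length);
    }

    // Ordinal EndsWith built on Tail; allocates a string of test.Length chars only.
    public static bool EndsWithOrdinal(this StringBuilder sb, string test)
    {
        if (sb == null) throw new ArgumentNullException(nameof(sb));
        if (test == null) throw new ArgumentNullException(nameof(test));
        if (sb.Length < test.Length) return false;
        return string.Equals(sb.Tail(test.Length), test, StringComparison.Ordinal);
    }
}
```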

Edit: It would probably be desirable to define an overload that takes a StringComparison argument:

public static bool EndsWith(this StringBuilder sb, string test)
{
    return EndsWith(sb, test, StringComparison.CurrentCulture);
}

public static bool EndsWith(this StringBuilder sb, string test, 
    StringComparison comparison)
{
    if (sb.Length < test.Length)
        return false;

    string end = sb.ToString(sb.Length - test.Length, test.Length);
    return end.Equals(test, comparison);
}

Edit2: As pointed out by Tim S in the comments, there is a flaw in my answer (and all other answers that assume character-based equality) that affects certain Unicode comparisons. Unicode does not require two (sub)strings to have the same sequence of characters to be considered equal. For example, the precomposed character é should be treated as equal to the character e followed by the combining mark U+0301.

using System;
using System.Globalization;
using System.Text;
using System.Threading;

Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");

string s = "We met at the cafe\u0301";
Console.WriteLine(s.EndsWith("café"));    // True 

StringBuilder sb = new StringBuilder(s);
Console.WriteLine(sb.EndsWith("café"));   // False

If you want to handle these cases correctly, it might be easiest to just call StringBuilder.ToString(), and then use the built-in String.EndsWith.
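A minimal sketch of that fallback, assuming one full-length copy of the builder is acceptable (the name EndsWithViaString is hypothetical). Whether a culture-aware comparison actually treats "e\u0301" and "\u00e9" as equal depends on the runtime's globalization data (in globalization-invariant mode, culture-aware comparisons behave ordinally), so the usage checks stick to ordinal comparisons. These show why the two forms differ at the char level: the decomposed tail "afe\u0301" is not the same sequence of chars as "caf\u00e9".

```csharp
using System;
using System.Text;

public static class StringBuilderStringFallback
{
    // Materializes the builder once, then defers to String.EndsWith, which
    // applies whatever comparison rules the caller asks for.
    public static bool EndsWithViaString(this StringBuilder sb, string value,
        StringComparison comparison = StringComparison.CurrentCulture)
    {
        if (sb == null) throw new ArgumentNullException(nameof(sb));
        if (value == null) throw new ArgumentNullException(nameof(value));
        return sb.ToString().EndsWith(value, comparison);
    }
}
```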


TL;DR

If your goal is to get a piece or the whole of the StringBuilder's contents in a String object, you should use its ToString method. But if you aren't yet done creating your string, it's better to treat the StringBuilder as a character array and operate on it that way than to create a bunch of strings you don't need.

String operations on a character array can be complicated by localization and encoding: text can be encoded in many ways (UTF-8 or UTF-16, for example), but a .NET string's characters (System.Char) are always 16-bit UTF-16 code units.

I've written the following method, which returns the index of a string if it exists within the StringBuilder and -1 otherwise. You can use it to build the other common String methods, such as Contains, StartsWith, and EndsWith. It handles culture-specific casing and does not force you to call ToString on the StringBuilder, though, like the other character-by-character approaches here, it won't treat canonically equivalent sequences (such as é versus e followed by U+0301) as equal. It creates one garbage string if you specify that case should be ignored; to avoid even that, you could lower-case each character with Char.ToLower instead of precomputing the lower-case copy of the search string as the function below does. EDIT: Also, characters outside the Basic Multilingual Plane (which UTF-32 stores as a single unit) occupy two Char values in .NET, a surrogate pair, so casing and other per-character logic has to treat such a pair as one unit rather than looking at one Char at a time.
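To make the surrogate-pair point concrete: .NET strings are always UTF-16 in memory, so a character such as U+1F600 takes two Char values. This small illustration uses only built-in Char methods; the class and member names are hypothetical.

```csharp
using System;

public static class Utf16Demo
{
    // U+1F600 lies outside the Basic Multilingual Plane, so a .NET string
    // stores it as a high surrogate followed by a low surrogate.
    public const string Grinning = "\U0001F600";

    // True if s is exactly one supplementary character (one surrogate pair).
    public static bool IsSingleSurrogatePair(string s) =>
        s != null && s.Length == 2 && char.IsSurrogatePair(s[0], s[1]);
}
```

Grinning.Length is 2, and char.ConvertToUtf32(Grinning, 0) recombines the pair into the single code point 0x1F600.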

You're probably better off using ToString unless you're going to be looping, working with large strings, and doing manipulation or formatting.

using System;
using System.Globalization;
using System.Text;

public static class StringBuilderExtensions
{
    public static int IndexOf(this StringBuilder stringBuilder, string str, int startIndex = 0, int? count = null, CultureInfo culture = null, bool ignoreCase = false)
    {
        if (stringBuilder == null)
            throw new ArgumentNullException("stringBuilder");

        // No string to find.
        if (str == null)
            throw new ArgumentNullException("str");
        if (str.Length == 0)
            return -1;

        // Make sure the start index is valid.
        if (startIndex < 0 || startIndex > stringBuilder.Length)
            throw new ArgumentOutOfRangeException("startIndex", startIndex, "The index must be between 0 and the length of the StringBuilder.");

        // Now that we've validated the parameters, let's figure out how many start positions there are to search.
        // The last position where str still fits is stringBuilder.Length - str.Length.
        var maxPositions = stringBuilder.Length - str.Length - startIndex + 1;
        if (maxPositions <= 0) return -1;

        // If a count argument was supplied, make sure it's within range.
        if (count.HasValue && (count <= 0 || count > maxPositions))
            throw new ArgumentOutOfRangeException("count");

        // Ensure that "count" has a value.
        maxPositions = count ?? maxPositions;

        // If no culture is specified, use the current culture. This is how the string functions behave but
        // in the case that we're working with a StringBuilder, we probably should default to Ordinal.
        culture = culture ?? CultureInfo.CurrentCulture;

        // If we're ignoring case, we need all the characters to be in culture-specific
        // lower case for when we compare to the StringBuilder.
        if (ignoreCase) str = str.ToLower(culture);

        // Where the actual work gets done. Try each candidate start position x in turn.
        for (int y = 0, x = startIndex, endIndex = startIndex + maxPositions; x < endIndex; x++, y = 0)
        {
            // y is reset to 0 for each position and increases while the StringBuilder's
            // characters match the string we're searching for.
            while (y < str.Length && str[y] == (ignoreCase ? Char.ToLower(stringBuilder[x + y], culture) : stringBuilder[x + y]))
                y++;

            // The while loop will stop early if the characters don't match. If it didn't stop
            // early, that means we found a match, so we return the index of where we found the
            // match.
            if (y == str.Length)
                return x;
        }

        // No matches.
        return -1;
    }
}
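The idea of deriving Contains, StartsWith, and EndsWith from one matching routine can be sketched as follows. This is a simplified stand-in, not the method above: it drops the count and culture parameters, lower-cases each character with char.ToLowerInvariant when ignoreCase is true (the per-character alternative mentioned earlier, with invariant casing swapped in for simplicity), and the helper name MatchesAt is hypothetical.

```csharp
using System;
using System.Text;

public static class StringBuilderSearch
{
    // True if value occurs in sb starting exactly at position x.
    private static bool MatchesAt(StringBuilder sb, string value, int x, bool ignoreCase)
    {
        if (x < 0 || x > sb.Length - value.Length) return false;
        for (int y = 0; y < value.Length; y++)
        {
            char a = sb[x + y], b = value[y];
            // Lower-case per character, so no lower-cased copy of value is allocated.
            if (ignoreCase) { a = char.ToLowerInvariant(a); b = char.ToLowerInvariant(b); }
            if (a != b) return false;
        }
        return true;
    }

    // First index >= startIndex where value occurs, or -1.
    public static int IndexOf(this StringBuilder sb, string value, int startIndex = 0, bool ignoreCase = false)
    {
        if (sb == null) throw new ArgumentNullException(nameof(sb));
        if (value == null) throw new ArgumentNullException(nameof(value));
        if (startIndex < 0 || startIndex > sb.Length)
            throw new ArgumentOutOfRangeException(nameof(startIndex));

        for (int x = startIndex; x <= sb.Length - value.Length; x++)
            if (MatchesAt(sb, value, x, ignoreCase)) return x;
        return -1;
    }

    public static bool Contains(this StringBuilder sb, string value, bool ignoreCase = false) =>
        sb.IndexOf(value, 0, ignoreCase) >= 0;

    public static bool StartsWith(this StringBuilder sb, string value, bool ignoreCase = false)
    {
        if (sb == null) throw new ArgumentNullException(nameof(sb));
        if (value == null) throw new ArgumentNullException(nameof(value));
        return MatchesAt(sb, value, 0, ignoreCase);
    }

    public static bool EndsWith(this StringBuilder sb, string value, bool ignoreCase = false)
    {
        if (sb == null) throw new ArgumentNullException(nameof(sb));
        if (value == null) throw new ArgumentNullException(nameof(value));
        return MatchesAt(sb, value, sb.Length - value.Length, ignoreCase);
    }
}
```

StartsWith and EndsWith check a single position instead of scanning, which is why they call MatchesAt directly rather than IndexOf.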

The primary reason to use a StringBuilder object rather than concatenating strings is the memory overhead you incur because strings are immutable. The performance hit you see when you do excessive string manipulation without a StringBuilder is often the cost of collecting all the garbage strings you created along the way.

Take this for example:

string firstString = "1st", 
       secondString = "2nd", 
       thirdString = "3rd", 
       fourthString = "4th";
string all = firstString;
all += " & " + secondString;
all += " &" + thirdString;
all += "& " + fourthString + ".";

If you were to run this and open it up in a memory profiler, you'd find a set of strings that look something like this:

"1st", "2nd", "3rd", "4th", 
" & ", " & 2nd", "1st & 2nd"
" &", "&3rd", "1st & 2nd &3rd"
"& ", "& 4th", "& 4th."
"1st & 2nd &3rd& 4th."

That's fourteen total objects created in that scope, but if you don't realize that every single addition operator creates a whole new string, you might think there are only five. So what happens to the other nine strings? They languish in memory until the garbage collector decides to pick them up.
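For comparison, here is a sketch that produces the same text with a StringBuilder (the ConcatDemo name is hypothetical). The intermediate strings from the += example are never created; the builder's internal buffer may still grow (the default capacity is 16 characters and the result is 20), but ToString produces only the one final string.

```csharp
using System.Text;

public static class ConcatDemo
{
    // Builds the same text as the += example by appending into one growable buffer.
    public static string Build(string first, string second, string third, string fourth)
    {
        var sb = new StringBuilder();
        sb.Append(first)
          .Append(" & ").Append(second)
          .Append(" &").Append(third)
          .Append("& ").Append(fourth).Append('.');
        return sb.ToString();
    }
}
```

ConcatDemo.Build("1st", "2nd", "3rd", "4th") returns "1st & 2nd &3rd& 4th.", the same final string as before.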

So now to my point: if you're trying to find something out about a StringBuilder object and you don't want to call ToString(), it probably means you aren't done building that string yet. And if you're trying to find out whether the builder ends with "Foo", it's wasteful to call sb.ToString(sb.Length - 3, 3) == "Foo", because you're creating another string object that becomes orphaned and obsolete the moment you make the call.

My guess is that you're running a loop aggregating text into your StringBuilder and you want to end the loop or just do something different if the last few characters are some sentinel value you're expecting.
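That scenario might look something like the following sketch. The chunk source, the ReadUntil name, and the "<END>" sentinel are hypothetical. The end-of-builder check reads characters through the indexer, so no temporary string is allocated on each iteration, and a sentinel split across two appended chunks is still detected.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SentinelDemo
{
    // Appends chunks until the builder ends with `sentinel`, then returns
    // the text that came before the sentinel.
    public static string ReadUntil(IEnumerable<string> chunks, string sentinel)
    {
        if (chunks == null) throw new ArgumentNullException(nameof(chunks));
        if (sentinel == null) throw new ArgumentNullException(nameof(sentinel));

        var sb = new StringBuilder();
        foreach (var chunk in chunks)
        {
            sb.Append(chunk);
            if (EndsWithOrdinal(sb, sentinel))
            {
                sb.Length -= sentinel.Length; // drop the sentinel itself
                break;
            }
        }
        return sb.ToString();
    }

    // Char-by-char ordinal check against the end of the builder.
    private static bool EndsWithOrdinal(StringBuilder sb, string text)
    {
        if (sb.Length < text.Length) return false;
        int offset = sb.Length - text.Length;
        for (int i = 0; i < text.Length; i++)
            if (sb[offset + i] != text[i]) return false;
        return true;
    }
}
```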