What is wrong with function `LetterCounts` and other functions that operate on strings?

For short strings, LetterCounts is slower, and I am not sure why; for longer strings the timings are identical. Do you see similar behavior?

randomString[n_] := 
 RandomInteger[{1, 26}, n] /. 
   Thread[Range[26] -> CharacterRange["A", "Z"]] // StringJoin

counts[str_] := 
 KeyMap[FromCharacterCode, 
  Sort[Counts[Partition[ToCharacterCode[str], 2, 1]], Greater]]

<< GeneralUtilities`

BenchmarkPlot[{LetterCounts[#, 2] &, counts[#] &},
 randomString[#] &,
 10^Range[6],
 "IncludeFits" -> True]


LetterCounts[str, 2]

and

KeyMap[FromCharacterCode, 
  Sort[Counts[Partition[ToCharacterCode[str], 2, 1]], Greater]];

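are not equivalent operations: LetterCounts counts only letters, whereas the Partition-based expression counts every character, including spaces and punctuation. Try the inputs found in the LetterCounts documentation and you'll quickly see the differences. So the timing comparison is not very meaningful.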
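To make the difference concrete, here is a small sketch that runs both expressions on the same input. The string `sample` is a made-up example, not one taken from the documentation, and `counts` is simply the question's definition restated so the block runs on its own.

```mathematica
(* The question's n-gram counter, restated so this block is self-contained *)
counts[str_] := KeyMap[FromCharacterCode,
  Sort[Counts[Partition[ToCharacterCode[str], 2, 1]], Greater]]

(* Hypothetical test input containing a comma and a space *)
sample = "ab, ab";

(* Counts every adjacent pair of characters, so pairs such as "b," and ", " show up as keys *)
counts[sample]

(* Built-in: compare its keys with the result above *)
LetterCounts[sample, 2]
```

If the two results differ on an input like this, a benchmark on letters-only random strings only compares the functions on the subset of inputs where they happen to agree.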

edit: To answer the question in the comments, the self-written

myCharacterCounts[str_, n_] := KeyMap[FromCharacterCode,
    Counts @ Partition[ToCharacterCode @ str, n, 1]
]

will run slightly faster than CharacterCounts[str, n], though on my machine the difference is sub-millisecond even for very large strings.
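A rough way to reproduce this comparison on your own machine is sketched below. The test string, its length, and the name `testString` are assumptions chosen for illustration; the timings will vary with hardware and version, so none are quoted here.

```mathematica
(* Restated from above so the block is self-contained *)
myCharacterCounts[str_, n_] := KeyMap[FromCharacterCode,
  Counts @ Partition[ToCharacterCode @ str, n, 1]]

(* Hypothetical test input: 10^6 random uppercase letters *)
testString = StringJoin[RandomChoice[CharacterRange["A", "Z"], 10^6]];

(* RepeatedTiming averages over several runs; First extracts the time in seconds *)
First @ RepeatedTiming[myCharacterCounts[testString, 2];]
First @ RepeatedTiming[CharacterCounts[testString, 2];]

(* Check that both give the same counts once key order is ignored *)
KeySort[myCharacterCounts[testString, 2]] === KeySort[CharacterCounts[testString, 2]]
```

The last line is worth running before trusting any timing, since a speed comparison is only meaningful if the two functions return the same counts on the input being timed.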

But this myCharacterCounts function still does not do everything that CharacterCounts does.

CharacterCounts takes options, as in

In[45]:= CharacterCounts["aAbBcC", IgnoreCase -> True]

Out[45]= <|"c" -> 2, "b" -> 2, "a" -> 2|>

and does argument checking, issuing a message for CharacterCounts[] or CharacterCounts[2]. Argument checking and option handling are generally required for any built-in system function, but are not needed for self-written functions where you know you won't be passing bad arguments or options. This may be enough to account for the timing difference, or CharacterCounts may be inefficient somewhere; I can't say.
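As an illustration of what that extra work looks like, here is a minimal sketch of a self-written counter with one option and a fallback message. The name `myCheckedCounts`, its message text, and the choice to implement IgnoreCase via ToLowerCase are all assumptions made for this example; this is not how CharacterCounts is implemented internally.

```mathematica
ClearAll[myCheckedCounts];

(* Declare the single option this hypothetical function accepts *)
Options[myCheckedCounts] = {IgnoreCase -> False};

(* Hypothetical message issued when the arguments do not match *)
myCheckedCounts::arg = "Expected a string and an optional positive integer in `1`.";

(* Main definition: string, optional n-gram length (default 1), then options *)
myCheckedCounts[str_String, n : (_Integer?Positive) : 1, OptionsPattern[]] :=
  With[{s = If[TrueQ[OptionValue[IgnoreCase]], ToLowerCase[str], str]},
    KeyMap[FromCharacterCode, Counts[Partition[ToCharacterCode[s], n, 1]]]]

(* Fallback: any other argument pattern issues the message and returns $Failed *)
myCheckedCounts[args___] :=
  (Message[myCheckedCounts::arg, HoldForm[myCheckedCounts[args]]]; $Failed)

myCheckedCounts["aAbBcC", IgnoreCase -> True]
(* <|"a" -> 2, "b" -> 2, "c" -> 2|> *)
```

Unlike the built-in, this sketch does not sort its result, so the keys appear in order of first occurrence. Each added layer (pattern tests, OptionValue lookup, the fallback rule) costs a little evaluation time, which is the kind of overhead a bare-bones custom function avoids.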

I will say that it is often, but not always, possible to beat the timing of built-in functions if you focus on a subset of the functionality and neglect error handling. If your application is time-sensitive, it is worthwhile to use the custom function instead.