What is the purpose of the MB_CASE_*_SIMPLE constants?

We can find the corresponding C implementation at https://github.com/php/php-src/blob/master/ext/mbstring/php_unicode.c#L223

The git commit message explains the design:

  • Full case folding is implemented, but case-insensitive mb_* operations continue to use simple case folding. The reason is that full case folding of the haystack string may change the position at which a match occurred. This would have to be mapped back into the position in the original string.

  • mb_convert_case() exposes both the full and the simple case mapping / folding, where full is the default. The constants are:

    • MB_CASE_LOWER (used by mb_strtolower)
    • MB_CASE_UPPER (used by mb_strtoupper)
    • MB_CASE_TITLE
    • MB_CASE_FOLD
    • MB_CASE_LOWER_SIMPLE
    • MB_CASE_UPPER_SIMPLE
    • MB_CASE_TITLE_SIMPLE
    • MB_CASE_FOLD_SIMPLE (used by case-insensitive operations)
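
The position problem mentioned in the first bullet can be seen with a small sketch. It assumes PHP 7.3 or later with the mbstring extension and a UTF-8 source file; the word "Maße" and the variable names are just illustrative choices, not anything from php-src.

```php
<?php
// Illustrative sample word; "ß" full-folds to "ss", simple folding leaves it alone.
$haystack = "Maße";

$full   = mb_convert_case($haystack, MB_CASE_FOLD, "UTF-8");        // "masse"
$simple = mb_convert_case($haystack, MB_CASE_FOLD_SIMPLE, "UTF-8"); // "maße"

// Character offset of "e" in the original and in each folded variant.
$posOriginal = mb_strpos($haystack, "e", 0, "UTF-8"); // 3
$posFull     = mb_strpos($full, "e", 0, "UTF-8");     // 4, shifted by the extra "s"
$posSimple   = mb_strpos($simple, "e", 0, "UTF-8");   // 3, same as the original

printf("original=%d full=%d simple=%d\n", $posOriginal, $posFull, $posSimple);
```

With full folding the offset found in the folded string no longer matches the offset in the original string, which is why the case-insensitive functions stay with simple folding.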

So the constants with the _SIMPLE suffix use Unicode's simple case mapping and folding, where every character maps to exactly one character, and those without the suffix use full case mapping and folding, where a single character may map to several.

That is the whole difference between full and simple case folding.


Here are some examples where it matters:

MB_CASE_UPPER_SIMPLE:

mb_convert_case("ß", MB_CASE_UPPER_SIMPLE); // "ß"
mb_convert_case("ß", MB_CASE_UPPER); // "SS"

MB_CASE_LOWER_SIMPLE:

mb_convert_case("İ", MB_CASE_LOWER_SIMPLE); // "i"
mb_convert_case("İ", MB_CASE_LOWER); // "i\xcc\x87"

MB_CASE_TITLE_SIMPLE differs from MB_CASE_TITLE in the same way that MB_CASE_UPPER_SIMPLE differs from MB_CASE_UPPER: it applies only one-to-one mappings to the first letter of each word.
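
As a rough illustration of the title-case variants, here is a minimal sketch that assumes PHP 7.3 or later with mbstring and a UTF-8 source file. The single-character input "ß" is chosen only because its full mappings expand to two characters.

```php
<?php
// Illustrative input: a lone "ß", which has multi-character full mappings.
$word = "ß";

$upperFull   = mb_convert_case($word, MB_CASE_UPPER, "UTF-8");        // "SS"
$titleFull   = mb_convert_case($word, MB_CASE_TITLE, "UTF-8");        // "Ss"
$upperSimple = mb_convert_case($word, MB_CASE_UPPER_SIMPLE, "UTF-8"); // "ß"
$titleSimple = mb_convert_case($word, MB_CASE_TITLE_SIMPLE, "UTF-8"); // "ß"

var_dump($upperFull, $titleFull, $upperSimple, $titleSimple);
```

The full title-case mapping gives "Ss" rather than "SS", so the title and upper variants are close but not identical, while both simple variants leave the character unchanged.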
