What is the difference between "utf8_unicode_ci" and "utf8_unicode_520_ci"

As you can read here (thanks to user3399549 for the link), there is a problem with sorting and comparing the Polish letter "Ł" (L with stroke; lowercase: "ł"; HTML entities: &#322; and &#321;). Here Peter Gulutzan explains the differences between the collations:

We have these collations and rules for Ł :

utf8_polish_ci      Ł greater than L and less than M
utf8_unicode_ci     Ł greater than L and less than M
utf8_unicode_520_ci Ł equal to L
utf8_general_ci     Ł greater than Z
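
The rules in the table above can be checked directly on a server. The following is a minimal sketch, assuming a MySQL server where the utf8mb4 variants of these collations are installed and a client that sends UTF-8; the column aliases are only illustrative names.

```sql
-- Make sure string literals are interpreted as utf8mb4.
SET NAMES utf8mb4;

-- Each column is 1 if the collation treats the two letters as described, else 0.
-- COLLATE binds more tightly than the comparison operator, so it applies to the
-- right-hand literal and thereby decides the collation used for the comparison.
SELECT
  'Ł' = 'L' COLLATE utf8mb4_polish_ci      AS polish_equal,
  'Ł' = 'L' COLLATE utf8mb4_unicode_ci     AS unicode_equal,
  'Ł' = 'L' COLLATE utf8mb4_unicode_520_ci AS unicode_520_equal,
  'Ł' = 'L' COLLATE utf8mb4_general_ci     AS general_equal,
  'Ł' > 'Z' COLLATE utf8mb4_general_ci     AS general_after_z;
```

If the table above holds for your server version, only unicode_520_equal and general_after_z should come back as 1.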

In Polish, the letter Ł comes after L and before M. For clarity, we can write this as:

L < Ł < M 

and 

L != Ł  

So, to avoid this kind of problem with sorting and comparing, use utf8_unicode_ci (or, better, utf8mb4_unicode_ci).
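
To see the effect on sorting, here is a small sketch that uses a temporary table. The table name demo_names and the sample words are made up for illustration, and it assumes the utf8mb4 collations are available on your server.

```sql
SET NAMES utf8mb4;

-- Hypothetical demo table; the column inherits utf8mb4_unicode_ci from the table default.
CREATE TEMPORARY TABLE demo_names (
  name VARCHAR(32)
) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

INSERT INTO demo_names (name) VALUES ('Zima'), ('Maj'), ('Łódź'), ('Luty');

-- Uses the column collation (utf8mb4_unicode_ci).
SELECT name FROM demo_names ORDER BY name;

-- Override the collation per query to compare the resulting orderings.
SELECT name FROM demo_names ORDER BY name COLLATE utf8mb4_unicode_520_ci;
SELECT name FROM demo_names ORDER BY name COLLATE utf8mb4_general_ci;
```

Going by the rules listed earlier, the first query should place Łódź between Luty and Maj, the second should treat Ł like L (so the following letters decide the order), and the third should place Łódź after Zima.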


As documented under Collation Names:

Unicode collation names may include a version number to indicate the version of the Unicode Collation Algorithm (UCA) on which the collation is based. UCA-based collations without a version number in the name use the version-4.0.0 UCA weight keys: http://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt. A collation name such as utf8_unicode_520_ci is based on UCA 5.2.0 weight keys: http://www.unicode.org/Public/UCA/5.2.0/allkeys.txt.
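
To check which UCA-based collations (and therefore which UCA versions) your own server offers, a query along these lines may help. It is only a sketch; the set of rows returned depends on the server version.

```sql
-- Lists the Unicode collations for utf8mb4; names with a version number
-- such as _520_ identify the UCA version they are based on.
SHOW COLLATION WHERE Charset = 'utf8mb4' AND Collation LIKE '%unicode%';
```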