Impossible to put a zero after an aleph?

Aleph (U+05D0) is a Hebrew letter, and Hebrew is written right-to-left, so Unicode assigns it the "Right-to-Left" bidirectional class. (See Unicode TR9: Bidirectional Algorithm for more details.)

Latin letters are of course "Left-to-Right". However, zero (U+0030) is in the "European Number" bidirectional class, which is a weak class: the digits of a number always run left-to-right among themselves, but the number as a whole takes its position from the preceding strong character, so after a "strong" Right-to-Left character it is laid out as part of the right-to-left run. (See Bidirectional Character Types and Resolving Weak Types in TR9.)

As a result, the directions of before and after are swapped for the entire word – if you put the zero 'before', it will show up to the right; if you write the zero 'after' aleph, it will show up on the left.
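
The bidirectional classes mentioned above can be checked with Python's standard `unicodedata` module. This is only a small sketch: the `CHARS` table and the `bidi_class` helper are names made up for illustration, and the script inspects character properties only, not how any particular renderer lays the text out.

```python
import unicodedata

# Characters discussed above; the labels are only used for display.
CHARS = {
    "HEBREW LETTER ALEF": "\u05D0",
    "DIGIT ZERO": "0",
    "LATIN CAPITAL LETTER A": "A",
}

def bidi_class(ch: str) -> str:
    """Return the Bidi_Class abbreviation (e.g. 'R', 'L', 'EN') of one character."""
    return unicodedata.bidirectional(ch)

for label, ch in CHARS.items():
    print(f"U+{ord(ch):04X} {label}: {bidi_class(ch)}")
```

The aleph reports the strong class `R`, the Latin letter the strong class `L`, and the zero the weak class `EN`.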


'א', 'HEBREW LETTER ALEF' (U+05D0) has the BIDI (bi-directional) class "Right-to-Left [R]", because Hebrew is traditionally written right-to-left. Digits, on the other hand, have only a weak directionality (class "European Number [EN]"), and so the whole chunk of aleph and zero is laid out as right-to-left. In right-to-left text, a character that follows another in the stored text is not displayed to its right, which is why the zero typed after the aleph appears on its left.

You have several options to work around this issue.

  1. You can use 'ℵ', 'ALEF SYMBOL' (U+2135). It's a symbol and has the left-to-right property: ℵ0.

  2. Instead of the usual digit 0, you can use a zero-like character with left-to-right directionality, such as '〇', 'IDEOGRAPHIC NUMBER ZERO' (U+3007).

  3. The cleanest way is to use the 'LEFT-TO-RIGHT MARK' (U+200E) character (Wikipedia) after the aleph: "א‎0". This is an invisible zero-width character that is defined to have left-to-right directionality. Thus, it has the same effect on the bidirectional text layout algorithm as inserting, say, a left-to-right Latin letter after the א, except that no visible letter will appear there.
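
The three workarounds above can be built and inspected as follows. This is a sketch rather than a definitive recipe: the constant names, the `workarounds` table and the `describe` helper are invented for this example, and the script only lists code points and bidi classes. How the strings actually look still depends on the renderer and on the base direction of the surrounding paragraph.

```python
import unicodedata

ALEF = "\u05D0"         # HEBREW LETTER ALEF, class R
ALEF_SYMBOL = "\u2135"  # ALEF SYMBOL, class L
IDEO_ZERO = "\u3007"    # IDEOGRAPHIC NUMBER ZERO, class L
LRM = "\u200E"          # LEFT-TO-RIGHT MARK, invisible, class L

# One candidate string per workaround, in logical (stored) order.
workarounds = {
    "1. alef symbol + digit": ALEF_SYMBOL + "0",
    "2. alef + ideographic zero": ALEF + IDEO_ZERO,
    "3. alef + LRM + digit": ALEF + LRM + "0",
}

def describe(text):
    """List (code point, bidi class) pairs for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]

for label, text in workarounds.items():
    print(label, describe(text))
```

In each case a strong left-to-right character now sits either in place of the aleph or the zero, or between them, which is what keeps the zero out of the right-to-left run.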


Perhaps a better way to achieve this would be to start the string with a RIGHT-TO-LEFT MARK (U+200F) and write the zero before the aleph:

echo -e "\u200F0א"
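
One thing to keep in mind with this approach is that the zero is stored before the aleph, so the text only looks like "aleph, zero" when displayed. The sketch below compares the stored order of this string with the LEFT-TO-RIGHT MARK variant; the variable names and the `names` helper are made up for the example.

```python
import unicodedata

rlm_version = "\u200F0\u05D0"  # what the echo command above emits
lrm_version = "\u05D0\u200E0"  # aleph, LEFT-TO-RIGHT MARK, zero

def names(text):
    """Return the Unicode character names in logical (stored) order."""
    return [unicodedata.name(ch) for ch in text]

print(names(rlm_version))
print(names(lrm_version))
```

Searching, sorting or copying the text works on the stored order, so the two variants may behave differently there even if they look alike on screen.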

And the mandatory xkcd reference https://xkcd.com/1137/

‮LTR
