Valid values for LC_CTYPE?

I didn't get into the details of who's "right or wrong" - but was equally annoyed by the issue. Some solutions to this:

  • Server-side:
    • change/disable AcceptEnv LC_* in /etc/ssh/sshd_config
      • cons: it sets them to the system-default
    • edit .profile
      • cons: single user
    • edit /etc/bash* or /etc/profile
      • cons: may be reversed in updates
  • Client-side:
    • alias ssh="LC_CTYPE=\"${LANG}\" ssh" in .bashrc/.profile/whereEver
      • cons: single user
    • same as server-side in .bashrc/.profile...
    • change/add settings in Terminal
      • con: entire session, be it local or remote

In the end I created mac-locale-fix.sh in /etc/profile.d on the server (Raspbian in my case) with this line in it:

[ "A${LC_CTYPE}" = "AUTF-8" ] && export LC_CTYPE="${LANG}"
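A slightly more defensive variant of the same idea, as a sketch only: the helper name pick_lc_ctype and the fallback to the C locale when LANG is empty are my own assumptions, not something the one-liner above does.

```shell
# Hypothetical helper: print the LC_CTYPE value the session should use.
#   $1 = LC_CTYPE as forwarded by the client
#   $2 = the server's LANG (may be empty)
pick_lc_ctype() {
    case "$1" in
        UTF-8)
            # Bare codeset with no language part: use LANG instead,
            # or the C locale if LANG is empty (assumption).
            printf '%s\n' "${2:-C}"
            ;;
        *)
            # Anything else is passed through unchanged.
            printf '%s\n' "$1"
            ;;
    esac
}

# Only touch LC_CTYPE if the client actually sent one.
if [ -n "${LC_CTYPE:-}" ]; then
    LC_CTYPE=$(pick_lc_ctype "$LC_CTYPE" "${LANG:-}")
    export LC_CTYPE
fi
```

It sticks to plain POSIX sh syntax, so it should not matter which login shell ends up sourcing the file.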

Hope this helps others...


The basic question is:

My primary question is, is this a bug in MacOS? Or is Linux wrong in insisting that the variable needs to be set to a fully specified locale name?

and the POSIX page for environment variables shows the reason why others view the macOS configuration as incorrect:

[XSI] If the locale value has the form:

language[_territory][.codeset]

it refers to an implementation-provided locale, where settings of language, territory, and codeset are implementation-defined.

LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and LC_TIME are defined to accept an additional field @ modifier, which allows the user to select a specific instance of localization data within a single category (for example, for selecting the dictionary as opposed to the character ordering of data). The syntax for these environment variables is thus defined as:

[language[_territory][.codeset][@modifier]]

For example, if a user wanted to interact with the system in French, but required to sort German text files, LANG and LC_COLLATE could be defined as:

LANG=Fr_FR
LC_COLLATE=De_DE

This could be extended to select dictionary collation (say) by use of the @ modifier field; for example:

LC_COLLATE=De_DE@dict

An implementation may support other formats.

If the locale value is not recognized by the implementation, the behavior is unspecified.

That is, they assume that POSIX prescribes a syntax for the locale settings. An unwary reader would assume that POSIX defines the permissible forms for the environment variables, in which the codeset is an optional suffix and cannot stand in for the language. But that last "may" opens up a can of worms, in effect blessing this difference in interpretation. Apple can provide valid locales which don't follow that pattern exactly, if it wants to.

@tripleee suggested that the page on Locale gives better information, but that is almost entirely a discussion of the locale definitions rather than providing guidance for interoperability (i.e., POSIX's ostensible goal).

Neither page addresses differences in the available locale names (such as ".utf8" versus ".UTF-8"). Those are implementation-defined, as noted on the POSIX page. That leaves users to determine for themselves which locale settings are supported on the local and remote systems and, since ssh may forward these variables, how to set them on the remote system "compatibly".
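One way to do that check by hand is to compare a locale name against the output of locale -a on each side. The sketch below is a rough heuristic, not a definitive test: the function names are made up for illustration, and treating ".utf8" and ".UTF-8" as equivalent (by lowercasing and dropping hyphens) is an assumption that happens to match common spellings.

```shell
# Hypothetical helper: reduce a locale name to a comparable form,
# e.g. "en_US.UTF-8" and "en_US.utf8" both become "en_us.utf8".
canon_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# Hypothetical helper: read locale names on stdin (one per line, as
# printed by `locale -a`) and succeed if $1 matches one of them.
locale_in_list() {
    want=$(canon_locale "$1")
    while IFS= read -r have; do
        [ "$(canon_locale "$have")" = "$want" ] && return 0
    done
    return 1
}
```

Usage would look something like this, where "remotehost" is a placeholder:

```shell
locale -a | locale_in_list "$LC_CTYPE" || echo "LC_CTYPE not available locally"
ssh remotehost locale -a | locale_in_list "$LC_CTYPE" || echo "LC_CTYPE not available on remote"
```

A bare "UTF-8" would typically not match anything in a Linux locale -a listing, which is the mismatch described above.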