Why no different languages in passwords?

In general, there is no reason not to use arbitrary characters in a password, unless the system processing the password does something stupid with them (like removes them completely, leaving an empty password).

In the bad old days of language-specific character sets, there was always the risk that a password with non-ASCII characters might stop working when you switched to a different computer, because it was encoding those characters differently. But nowadays everybody's pretty much standardized on Unicode as the universal character set, so that reason rarely applies unless you are dealing with old legacy systems.

(Of course, we still have different ways of encoding Unicode characters into bytes, like UTF-8, UTF-16-LE/BE, UTF-32, etc., but at least those are generally pretty well under the receiving program's control, as opposed to depending on the user's OS and/or terminal settings like charset selection used to. And, honestly, UTF-8 is becoming pretty well established as the standard Unicode encoding for I/O purposes, even if some software may use other encodings internally.)
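To illustrate why the receiving program has to settle on one encoding: the same password turns into different bytes in each encoding, and therefore into a different hash. This is a minimal Python sketch; the sample password is arbitrary, and plain SHA-256 stands in for a real (deliberately slow) password hash only to keep the example short.

```python
import hashlib

password = "пароль"  # Russian for "password"; any non-ASCII string would do

# Explicit byte orders are used so that no byte-order mark gets added.
for encoding in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le"):
    data = password.encode(encoding)
    digest = hashlib.sha256(data).hexdigest()
    # Same text, different bytes, different digest for each encoding.
    print(f"{encoding:10} {len(data):2} bytes  sha256={digest[:16]}...")
```

A system that hashed with one encoding and later verified with another would reject the correct password, which is why the encoding has to be fixed on the receiving side.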

In fact, using characters from a wider pool makes it harder for an attacker to guess your password by brute force. That said, making the password slightly longer generally does that more effectively than sprinkling "weird" characters into your password (obligatory xkcd link), so you should only use non-English characters in your password if you can type and remember them easily, e.g. because you speak a language that uses those characters.

Still, there are some reasons why you might sometimes not want to use non-ASCII characters in your password:

  • They might not be easy to type, if you ever need to log in from a shared / borrowed computer with a different keyboard, or if your own computer's keyboard layout gets switched for some reason.

    (Actually, the same goes for ASCII punctuation too, since a lot of keyboard layouts like to switch those keys around. But at least you usually can type them on any keyboard.)

  • The system processing the password might not accept them. People have funny ideas about what counts as a valid password, and some older systems in particular (especially if they were developed in the US) might simply refuse to accept anything but printable US-ASCII characters in passwords. They might even have sort-of-valid reasons for doing that, e.g. if the passwords are internally passed between old legacy systems that use different character encodings, or that break on non-ASCII data in some way (see below).

  • Even on the client side, some of those old charset issues might still rear their head. For example, if the password is typed into an HTML form on a web page, and if the page does not explicitly specify its character encoding, different browsers might auto-detect different encodings, causing the password (and any other text entered into the form) to be encoded differently.

  • For some writing systems, Unicode normalization could also cause problems. Without going into details, there are several equivalent ways to represent many characters in Unicode. If the password processing system does not explicitly run the password through a Unicode normalization algorithm before hashing it (and many do not), then it's possible that typing the same character on different computers might result in a different sequence of Unicode code points, causing passwords using that character not to match.
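As a concrete case of the normalization problem: "é" can be stored either as one precomposed code point or as "e" followed by a combining accent. The Python sketch below shows that hashing without normalization treats these as different passwords, while normalizing first makes them match. Using NFC here is an illustrative choice, and SHA-256 again stands in for a proper slow password hash.

```python
import hashlib
import unicodedata

composed = "caf\u00e9"     # "café" with precomposed U+00E9
decomposed = "cafe\u0301"  # "café" as "e" + U+0301 COMBINING ACUTE ACCENT

def naive_hash(pw: str) -> str:
    # No normalization: the digest depends on the exact code point sequence.
    return hashlib.sha256(pw.encode("utf-8")).hexdigest()

def normalized_hash(pw: str) -> str:
    # Normalize to NFC first, so canonically equivalent inputs hash the same.
    nfc = unicodedata.normalize("NFC", pw)
    return hashlib.sha256(nfc.encode("utf-8")).hexdigest()

print(composed == decomposed)                                    # False
print(naive_hash(composed) == naive_hash(decomposed))            # False
print(normalized_hash(composed) == normalized_hash(decomposed))  # True
```

Both strings look identical on screen, which is exactly why a mismatch like this is so hard for a user to diagnose.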

If the back-end that handles the passwords was never designed or tested with anything but ASCII characters, it might also simply break when given input it doesn't expect. For example:

  • Different parts might handle non-ASCII characters differently. You might expect that all password processing in a given system would go through the same code, but in practice, there could well be multiple implementations of password hashing (e.g. for different user interfaces to the same back-end data), and they might not agree 100% on unexpected inputs. This could e.g. mean that your password might work over the web, but not with a native client app, or vice versa.

  • The system might strip away non-ASCII characters, or even truncate the password at the first such character. You'd think that would be a crazy thing to do for a password (and it is!), but the system might e.g. be running all input through some generic "input sanitization" function that simply strips away anything it doesn't recognize as "safe". In most cases, that's not a terrible security measure to take (even if it would generally be safer to signal an error instead of silently discarding data); for passwords, it could be disastrous.

  • The system might internally use non-ASCII characters as delimiters, on the assumption that those characters will never appear in real data. That might seem like a silly thing to do, but I've seen it done, including here on Stack Exchange. At best, such a design would force the system not to accept such characters in the delimited data; at worst, using such characters could cause the data to be truncated or garbled. Obviously, that could be really bad for a password.

    (Of course, some systems may do this with ASCII delimiters, too, leading to silly restrictions like "password may not contain @ or %".)

  • The system might limit the password length. A lot of old password hashing schemes (and even some relatively modern and otherwise seemingly decent ones, like bcrypt from 1999) will not accept passwords longer than some number of bytes, and may even silently truncate passwords longer than the limit. This is a potential security problem in general, but it can be exacerbated by the use of a variable-length encoding like UTF-8, where non-ASCII characters take up two or more bytes per character. Thus, for example, a system that limited passwords to 16 bytes of UTF-8 could handle a 16-character ASCII password, but only 8 characters of e.g. Greek or Cyrillic text.
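Two of the failure modes above, silent stripping and byte-length truncation, can be sketched in a few lines of Python. Both functions are hypothetical stand-ins for what a careless back end might do, not any real library's behavior; the 16-byte limit matches the example in the text (bcrypt's own limit is commonly cited as 72 bytes).

```python
def hypothetical_sanitize(pw: str) -> str:
    # Hypothetical "input sanitization": silently drops every character
    # outside printable ASCII instead of rejecting the input.
    return "".join(ch for ch in pw if " " <= ch <= "~")

def truncate_utf8(pw: str, max_bytes: int) -> bytes:
    # Hypothetical byte limit applied after UTF-8 encoding, the way some
    # hash schemes silently ignore everything past their input limit.
    return pw.encode("utf-8")[:max_bytes]

greek = "αβγδεζηθικλμνξοπ"  # 16 Greek letters, 2 UTF-8 bytes each

print(hypothetical_sanitize("p\u00e4ssw\u00f6rd"))  # "pässwörd" -> psswrd
print(repr(hypothetical_sanitize(greek)))           # '' (empty password!)

print(truncate_utf8("correcthorsebatt", 16))        # all 16 ASCII chars kept
print(truncate_utf8(greek, 16).decode("utf-8"))     # only the first 8 letters
# Anything after the 8th Greek letter is ignored, so these collide:
print(truncate_utf8(greek, 16) == truncate_utf8("αβγδεζηθXYZ", 16))  # True
```

Note that a byte limit can also cut a multi-byte character in half, leaving invalid UTF-8 that different components may then handle in different ways.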

The upshot of all this is that, in general, using only printable US-ASCII characters in a password is least likely to trigger software bugs or limitations. Given that password handling bugs can often be difficult to deal with (since the only response you'll often get is "invalid password"), a lot of people may find it easiest and safest to stick to them.

If you do want to use a non-ASCII password (e.g. to make yourself less of an easy target for brute force password cracking, or just because it's easier for you to type and remember), you may want to test that the system really handles your password in a (reasonably) sane way. I would suggest testing at least:

  1. that you can log in with your password (using all available login methods, if the system offers several, and with all browsers or other clients you're likely to use);
  2. that all features linked to your password (like, say, encrypted storage) really work correctly; and
  3. that you cannot log in with simple variants of your password, e.g. with some non-ASCII characters changed or with extra characters appended to the end.

In any case, there's an argument to be made that, at least in 95% of all cases, you should not be choosing your own passwords anyway. Rather, you should be using a secure password manager, and letting it generate random passwords for you. Such randomly generated passwords will typically be long strings of random printable ASCII characters, to maximize entropy while minimizing potential compatibility issues, but it really doesn't matter much since you don't need to type or memorize them yourself.
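For a sense of what such a generator does, here is a minimal Python sketch using the standard `secrets` module. It assumes a pool of printable ASCII letters, digits and punctuation and a length of 20; real password managers have their own (configurable) rules, so treat this as an illustration rather than how any particular one works.

```python
import math
import secrets
import string

# Printable ASCII without whitespace: 52 letters + 10 digits + 32 punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def generate_password(length: int = 20) -> str:
    # secrets.choice draws from a cryptographically secure RNG,
    # unlike random.choice.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def entropy_bits(length: int, pool_size: int) -> float:
    # A uniformly random string carries length * log2(pool_size) bits.
    return length * math.log2(pool_size)

pw = generate_password()
print(pw, len(ALPHABET), round(entropy_bits(len(pw), len(ALPHABET)), 1))
```

With 94 symbols and 20 characters this comes to roughly 131 bits, far beyond brute force, without a single non-ASCII character.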

Of course, you do still need to choose a password for your password manager. But hopefully your password manager, at least, is well written and will handle non-ASCII passwords correctly.


If you randomly choose your password from a larger space of possible passwords then the password will be stronger. So technically speaking the statement is wrong.

Weaknesses might be introduced when passwords with foreign language characters are to be typed on keyboards. But this is not a question of cryptography.


Character length

No one has mentioned anything about the length of passwords yet, so I'll add it here.

It's possible that the person who gave that advice was thinking along the lines of keeping the number of characters high, which is quite important when you restrict yourself to the Latin alphabet. For example, this is how you would calculate the strength of a purely alphabetic (lowercase) password of n characters:

$$26^n \approx 2^{4.7n}$$

Even if you include capitalisation, numbers and punctuation (about 96 characters), length is still very important:

$$96^n \approx 2^{6.6n}$$

Widening the pool to the whole Unicode code space (1,114,112 code points, though far fewer of them are assigned characters) gives you:

$$1114112^n \approx 2^{20.1n}$$

But realistically, if you're just using the 2,000 or so most common Chinese characters:

$$2000^n \approx 2^{11n}$$

So, whilst the person who gave the advice might have been thinking "an eastern password might be shorter due to more words being a single character", each such character is worth more than twice as many bits as a lowercase Latin letter, which is a clear improvement over alpha-only passwords.
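The same arithmetic can be turned around to ask how many characters each pool needs to reach a given strength. The Python sketch below uses the pool sizes from the formulas above and an 80-bit target that is purely an illustrative assumption; the full Unicode figure is an upper bound, since most code points are unassigned or untypable.

```python
import math

# Pool sizes taken from the formulas above.
pools = {
    "lowercase Latin": 26,
    "mixed ASCII": 96,
    "all Unicode code points": 1_114_112,
    "common Chinese characters": 2_000,
}

TARGET_BITS = 80  # hypothetical strength target, for illustration only

for name, size in pools.items():
    bits_per_char = math.log2(size)  # the exponent k in size**n == 2**(k*n)
    needed = math.ceil(TARGET_BITS / bits_per_char)
    print(f"{name:26} {bits_per_char:5.1f} bits/char, "
          f"{needed} chars for {TARGET_BITS} bits")
```

For 80 bits this works out to 18 lowercase letters, 13 mixed ASCII characters, 8 common Chinese characters, or 4 arbitrary code points.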

Characters vs. words

However, recent times have seen a large shift towards "pass-phrases", i.e. using multiple random words to make up a password in order to make it more complex. If each word is drawn from the whole English vocabulary (about 1,025,110 words), each of the m words is worth about as much as one character from the whole Unicode code space:

$$1025110^m \approx 2^{20m}$$

Although, again, if you just take the 3,000 or so most common English words:

$$3000^m \approx 2^{11.6m}$$

I can't conceive how big that set would be if you start to include words from other languages as there's bound to be overlapping words, but given what happened to characters, I think you can safely assume it will be a bigger number still.
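A passphrase only earns those bits if the words really are picked at random. The Python sketch below picks words with the `secrets` module and computes the per-word entropy for the two list sizes above. The six-word, mixed-language list is a hypothetical toy chosen only to show that words from several languages can share one list; a usable list would contain thousands of entries.

```python
import math
import secrets

def passphrase_bits(word_count: int, list_size: int) -> float:
    # Each uniformly random word contributes log2(list_size) bits.
    return word_count * math.log2(list_size)

def make_passphrase(words: list[str], word_count: int) -> str:
    # Words must come from a CSPRNG for the entropy estimate to hold.
    return " ".join(secrets.choice(words) for _ in range(word_count))

# Hypothetical toy word list mixing several languages and scripts.
demo_words = ["apple", "river", "кошка", "猫", "lumière", "stone"]
print(make_passphrase(demo_words, 4))

for size in (3_000, 1_025_110):
    print(size, [round(passphrase_bits(m, size), 1) for m in (3, 4, 5)])
```

Five words from the 3,000-word list give about 58 bits, roughly the same as 12 random lowercase letters; growing the list (for instance with words from other languages) raises the bits per word.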

Caveat

All of this assumes that characters and words are assigned at random. In practice, people's character and word choices are not random: they are weighted by how well known the characters and words are and how likely they are to be used. Patterns also become common, for example "ll" or "qu". All of these typically make passwords less secure.

Making your password secure

Given all of this, there are ways to make your password difficult to guess but fairly easy to remember: select 3 to 5 words that have meaning for you, replace random characters in 1 or 2 of the words with other random characters (or with another piece of information entirely), and use more than one language and character set.

Examples are bad

Some of the best-known example passwords and password-making recipes have been used verbatim by readers, and they are read by hackers just as much, which makes those exact passwords less secure. So take the techniques from these examples and make up your own password.

Summary (and answer to the OP's question)

From this, I conclude that the person who told you this was either misinformed themselves (going by password length) or felt that the internet was not a place for Unicode characters.
