What chars should I not allow in passwords?

From a security/implementation perspective, there shouldn't be any need to disallow characters apart from '\0' (which is hard to type anyway). The more characters you bar, the smaller the total phase space of possible passwords and therefore the quicker it is to brute-force passwords. Of course, most password-guessing actually uses dictionary words rather than systematic searches of the input domain...

From a usability perspective, however, some characters are not typed the same way on different machines. As an example, I have two computers here where Shift-3 produces # on one and £ on the other. When I type a password, both appear as '*', so I don't know whether I got it right. Some sites consider this confusing enough to justify disallowing those characters; I don't think it's worth doing. Most real people access real services from one or maybe two computers, and don't tend to put many extended characters in their passwords.


There can be issues with non-ASCII characters. A password is a sequence of glyphs, but the password processing (hashing) will need a sequence of bits, so there must be a deterministic way to transform glyphs into bits. This is the whole murky swamp of code pages. Even if you stick to Unicode, there is trouble afoot:

  • A single character can have several decompositions as code points. For instance, the "é" character (which is very frequent in French) can be encoded as either a single code point U+00E9, or as the sequence U+0065 U+0301; both sequences are meant to be equivalent. Whether you get one or the other depends on the conventions used by the input device.

  • A Unicode string is a sequence of code points (integers in the range 0 to 1114111, i.e. up to U+10FFFF). There are several standard encodings for converting such a sequence into bytes; the most common are UTF-8, UTF-16 (big-endian), UTF-16 (little-endian), UTF-32 (big-endian) and UTF-32 (little-endian). Any of these may or may not start with a BOM.

Therefore a single "é" can be meaningfully encoded into bytes in at least twenty distinct variants, and that's when sticking to "mainstream Unicode". Latin-1 encoding, or its Microsoft counterpart, is also widespread, so make that 21. Which encoding a given piece of software will use may depend upon a lot of factors, including the locale. It is bothersome when a user can no longer log on to his computer because he switched the configuration from "Canadian - English" to "Canadian - French".
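The ambiguity described above is easy to demonstrate. A short Python sketch (the variable names are mine) showing the two spellings of "é" and a few of its byte encodings:

```python
import unicodedata

# Two Unicode spellings of the same visible character "é":
composed = "\u00e9"     # single code point U+00E9
decomposed = "e\u0301"  # U+0065 followed by combining acute U+0301

# They render identically but compare unequal as strings...
assert composed != decomposed
# ...until normalized to a common form (NFC composes, NFD decomposes):
assert unicodedata.normalize("NFC", decomposed) == composed

# Even the composed form yields different bytes under each encoding:
for enc in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "latin-1"):
    print(enc, composed.encode(enc).hex())
```

If the input device hands the hashing code a different normalization form or encoding than it used at enrollment, the hash will not match even though the user typed the "same" password.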

Experimentally, most problems of that kind are avoided by restricting passwords to the range of printable ASCII characters (those with codes ranging from 32 to 126 -- personally I would exclude the space, so make that 33 to 126) and enforcing a mono-byte encoding (no BOM, one character becomes one byte). Since passwords are meant to be typed on various keyboards with no visual feedback, the list of characters should be restricted even further for optimal usability. I daily battle with Canadian layouts where what is printed on the keyboard does not necessarily match what the machine thinks it is, especially when going through one or two nested RDP connections; the '<', '>' and '\' characters move around most often. With just letters (uppercase and lowercase) and digits, you will be fine.
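The rule above is a one-liner to enforce. A sketch in Python (the function name and examples are mine, not a standard API):

```python
def is_portable_password(pw: str) -> bool:
    """Accept only printable ASCII, excluding the space (codes 33..126)."""
    return len(pw) > 0 and all(33 <= ord(c) <= 126 for c in pw)

# Hypothetical examples:
print(is_portable_password("Tr0ub4dor&3"))   # True
print(is_portable_password("pass word"))     # False: contains a space
print(is_portable_password("caf\u00e9"))     # False: "é" is outside ASCII
```

Rejecting such passwords at enrollment, with a clear error message, is far kinder than letting a user set a password he can only type on one of his keyboards.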

You could say that the user is responsible: he is free to use any characters he wishes as long as he deals with the problem of typing them. But that position is not ultimately tenable: when users have trouble, they call your helpdesk, and you end up absorbing part of the cost of their mistakes.


If you are generating random passwords, it's a good idea to avoid characters that can be confused for others. For example (ignoring symbols):

  • Lowercase: l, o
  • Uppercase: I, O
  • Numbers: 1, 0
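Putting the two recommendations together (letters and digits only, minus the confusable glyphs), a generator might look like this. This is a sketch; the function name, the 16-character default, and the exact exclusion set are my choices:

```python
import secrets
import string

# Letters and digits, minus the easily-confused glyphs listed above.
CONFUSABLE = set("lo" "IO" "10")
ALPHABET = [c for c in string.ascii_letters + string.digits
            if c not in CONFUSABLE]  # 62 - 6 = 56 symbols

def generate_password(length: int = 16) -> str:
    # secrets uses a cryptographically strong RNG, unlike the random module.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(generate_password())
```

With 56 symbols, a 16-character password still carries about 93 bits of entropy (16 × log2(56)), so dropping the six confusable characters costs almost nothing in strength.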