Set of unambiguous looking letters & numbers for user input

I needed a replacement for hexadecimal (base 16) for similar reasons (e.g. for encoding a key). The best I could come up with is the following set of 16 characters:

0 1 2 3 4 5 6 7 8 9 A B C D E F     Hexadecimal
H M N 3 4 P 6 7 R 9 T W C X Y F     Replacement

In the replacement set, we consider the following:

All characters used have major distinguishing features that would only be omitted in a truly awful font.

The vowels A E I O U are omitted to avoid accidentally spelling words.

Sets of characters that could potentially be very similar or identical in some fonts are avoided completely (none of the characters in any set are used at all):

0 O D Q 
1 I L J
8 B 
5 S
2 Z

By avoiding these characters completely, the hope is that the user will enter the correct characters in the first place, so that no attempt to correct mis-entered characters is needed.

For sets of less similar but potentially confusing characters, we only use one character in each set, hopefully the most distinctive:

Y U V 

Here Y is used, since it always has the lower vertical section, and a serif in serif fonts.

C G         

Here C is used, since it seems less likely that a C would be entered as G than vice versa.

X K         

Here X is used, since it is more consistent in most fonts.

F E         

Here F is used, since it is not a vowel.

In the case of these similar sets, entry of any character in the set could be automatically converted to the one that is actually used (the first one listed in each set). Note that E must not be automatically converted to F if hexadecimal input might be used (see below).
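
As a rough sketch of that automatic conversion in Python: the names `ALIASES` and `normalize` and the `convert_e_to_f` switch are made up for illustration, and rejecting every other character with an error is my own assumption about how strict the input handling should be.

```python
REPLACEMENT = "HMN34P67R9TWCXYF"

# Look-alikes that are folded into the character actually used.
ALIASES = {"U": "Y", "V": "Y", "G": "C", "K": "X"}


def normalize(text: str, convert_e_to_f: bool = True) -> str:
    """Upper-case the input and fold look-alikes into the canonical set.

    Pass convert_e_to_f=False if hexadecimal input might also be accepted,
    because E is a hexadecimal digit in its own right.
    """
    aliases = dict(ALIASES)
    if convert_e_to_f:
        aliases["E"] = "F"
    out = []
    for ch in text.upper():
        ch = aliases.get(ch, ch)
        if ch not in REPLACEMENT:
            # Assumption: anything outside the set is an error, not a guess.
            raise ValueError(f"character {ch!r} is not in the replacement set")
        out.append(ch)
    return "".join(out)
```

For example, `normalize("hmvgk")` gives `"HMYCX"`, while `normalize("0")` raises a `ValueError` because 0 is one of the characters avoided completely.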

Note that there are still similar-sounding letters in the replacement set; this is pretty much unavoidable. When reading aloud, a phonetic alphabet should be used.

Where characters that are also present in standard hexadecimal are used in the replacement set, they are used for the same base-16 value. In theory mixed input of hexadecimal and replacement characters could be supported, provided E is not automatically converted to F.
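
A minimal sketch of such mixed input in Python, assuming a single lookup table that accepts both alphabets. The names `VALUE_OF` and `decode_mixed` are hypothetical. E keeps its hexadecimal value of 14 here and is not folded into F.

```python
HEX_DIGITS = "0123456789ABCDEF"
REPLACEMENT = "HMN34P67R9TWCXYF"

# One table for both alphabets. Characters that appear in both rows
# (3 4 6 7 9 C F) sit at the same position, so they cannot conflict.
VALUE_OF = {}
for value, (hex_ch, rep_ch) in enumerate(zip(HEX_DIGITS, REPLACEMENT)):
    for ch in (hex_ch, rep_ch):
        assert VALUE_OF.setdefault(ch, value) == value


def decode_mixed(text: str) -> int:
    """Decode a string of hexadecimal and/or replacement characters."""
    result = 0
    for ch in text.upper():
        if ch not in VALUE_OF:
            raise ValueError(f"unexpected character {ch!r}")
        result = result * 16 + VALUE_OF[ch]
    return result
```

With this table, `decode_mixed("DEAD")`, `decode_mixed("XYTX")` and `decode_mixed("DyTx")` all return the same number, 0xDEAD.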

Since this is just a character replacement, it should be easy to convert to/from hexadecimal.
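
For instance, a sketch in Python using `str.maketrans` and `str.translate` from the standard library. The function names are my own, and the byte helpers assume the key is available as raw bytes.

```python
HEX_DIGITS = "0123456789ABCDEF"
REPLACEMENT = "HMN34P67R9TWCXYF"

# Translation tables built from the two rows of the table above.
_TO_REPLACEMENT = str.maketrans(HEX_DIGITS, REPLACEMENT)
_TO_HEX = str.maketrans(REPLACEMENT, HEX_DIGITS)


def hex_to_replacement(hex_text: str) -> str:
    """Swap each hexadecimal digit for its replacement character."""
    return hex_text.upper().translate(_TO_REPLACEMENT)


def replacement_to_hex(code: str) -> str:
    """Swap each replacement character back to its hexadecimal digit."""
    return code.upper().translate(_TO_HEX)


def encode_bytes(data: bytes) -> str:
    """Encode raw bytes (e.g. a key) in the replacement alphabet."""
    return hex_to_replacement(data.hex())


def decode_bytes(code: str) -> bytes:
    """Decode a replacement-alphabet string back to raw bytes."""
    return bytes.fromhex(replacement_to_hex(code))
```

Characters outside the alphabet pass through `translate` unchanged, so input should be validated separately before decoding.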

Upper case seems best for the "canonical" form for output, although lower case also looks reasonable, except for "h" and "n", which should still be relatively clear in most fonts:

h m n 3 4 p 6 7 r 9 t w c x y f

Input can of course be case-insensitive.

There are several similar systems for base 32; see http://en.wikipedia.org/wiki/Base32. However, these necessarily introduce more similar-looking characters, in return for 25% more information per character (5 bits instead of 4).

Apparently the following set was also used for Windows product keys in base 24, but again has more similar-looking characters:

B C D F G H J K M P Q R T V W X Y 2 3 4 6 7 8 9

Mainly drawing inspiration from this UX thread, mentioned by @rwb:

  • Several programs use similar things. The list in your post seems to be very similar to those used in these programs, and I think it should be enough for most purposes. You can always add redundancy (error correction) to "forgive" minor mistakes; this will require you to space out your codes (see Hamming distance), though.
  • No references as to the particular method used in deriving the lists, except trial and error with humans (which is great for non-OCR use: your users are humans).
  • It may make sense to use character grouping (say, groups of 5) to increase context ("first character in the second of 5 groups").
  • Ambiguity can be eliminated by using complete nouns (from a dictionary with few look-alikes; word-edit-distance may be useful here) instead of characters. People may confuse "1" with "i", but few will confuse "one" with "ice".
  • Another option is to make your code into a (fake) word that can be read out loud. A Markov model may help you there.
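
To make the grouping and Hamming-distance points concrete, here is a small Python sketch. The helper names, the hyphen separator and the sample codes are only illustrative assumptions, and it does not implement an actual error-correcting code.

```python
from itertools import combinations


def group(code: str, size: int = 5, sep: str = "-") -> str:
    """Split a code into fixed-size groups for display."""
    return sep.join(code[i:i + size] for i in range(0, len(code), size))


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(codes) -> int:
    """Smallest Hamming distance between any two issued codes."""
    return min(hamming_distance(a, b) for a, b in combinations(codes, 2))
```

`group("HMN34P67R9TWCXYF")` gives `"HMN34-P67R9-TWCXY-F"`. If `min_distance` over all issued codes is at least 3, a single mistyped character still leaves the input closer to the intended code than to any other, which is what "spacing out" the codes buys you.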