Where did the octal/hex notations come from?

C, the ancestor of C++ and Java, was developed by Dennis Ritchie at Bell Labs in the early 1970s. C itself was written for the 16-bit PDP-11, but its immediate predecessor B ran on the 18-bit PDP-7, where octal was the natural notation: each octal digit represents three bits, so an 18-bit word is exactly six octal digits. DEC carried the octal convention over to the PDP-11 as well, documenting addresses and instruction words in octal because the register and addressing-mode fields of an instruction are three bits wide.

Octal does not map cleanly onto 8-bit bytes: each octal digit represents three bits, and 8 is not a multiple of 3, so three octal digits can encode more values than a byte can hold and the leading digit carries only two bits. An all-ones byte (1111 1111) is 377 in octal, but FF in hex.
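To make the mismatch concrete, here is a small C snippet (added for illustration; it is not part of the original answer) showing the same all-ones byte written both ways:

    #include <stdio.h>

    int main(void)
    {
        unsigned char all_ones = 0377;            /* octal literal: three digits for 8 bits */
        printf("%d %d %d\n", 0377, 0xFF, 255);    /* prints: 255 255 255 */
        printf("%o %X\n", all_ones, all_ones);    /* prints: 377 FF */
        return 0;
    }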

Hex is easier for most people to convert to and from binary in their heads: binary values are usually written in groups of eight bits (the size of a byte), and eight bits is exactly two hex digits. But hex was not the convention on the machines of Dennis's time; their octal heritage (word and field sizes built around multiples of three bits, and DEC manuals written in octal) made a three-bits-per-digit notation the natural choice. Programmers need to think in binary when working with hardware (where each bit typically corresponds to a physical wire) and when working with bit-wise logic (where each bit has a programmer-defined meaning).
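As a small illustration of the bit-wise case, consider flag masks written in hex, where each hex digit corresponds to four bits; the flag names below are invented for this example:

    #include <stdio.h>

    /* Hypothetical status bits; in hex each digit maps cleanly onto four bits. */
    #define FLAG_READY 0x01   /* 0000 0001 */
    #define FLAG_ERROR 0x02   /* 0000 0010 */
    #define FLAG_BUSY  0x80   /* 1000 0000 */

    int main(void)
    {
        unsigned char status = FLAG_READY | FLAG_BUSY;

        if (status & FLAG_BUSY)
            printf("busy\n");
        if (!(status & FLAG_ERROR))
            printf("no error\n");
        return 0;
    }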

I imagine Dennis added the 0 prefix because it was the simplest possible variation on everyday decimal numerals, and the easiest for those early parsers to recognize.

I believe the hex notation 0x__ was added to C slightly later. The lexing logic needed to distinguish 1-9 (the first digit of a decimal constant), 0 (the first, insignificant digit of an octal constant), and 0x (the prefix announcing that hex digits follow) is considerably more complicated than simply treating a leading 0 as the signal to parse the remaining digits as octal rather than decimal.
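A rough sketch of that prefix logic, written as ordinary C rather than any actual compiler's code (the function name and the use of strtol are mine, purely for illustration):

    #include <stdio.h>
    #include <stdlib.h>

    /* Decide the base implied by the first characters of a numeric constant
       and advance *p past the prefix.  Illustrative only. */
    static int constant_base(const char **p)
    {
        if (**p != '0')
            return 10;                   /* 1-9: decimal constant */
        (*p)++;                          /* consume the leading, insignificant 0 */
        if (**p == 'x' || **p == 'X') {
            (*p)++;                      /* consume the x */
            return 16;                   /* 0x...: hexadecimal constant */
        }
        return 8;                        /* 0...: octal constant */
    }

    int main(void)
    {
        const char *samples[] = { "123", "0123", "0x123" };
        for (int i = 0; i < 3; i++) {
            const char *s = samples[i];
            int base = constant_base(&s);
            printf("%-6s base %2d value %ld\n", samples[i], base, strtol(s, NULL, base));
        }
        return 0;
    }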

Why did Dennis design it this way? Contemporary programmers don't appreciate that those early computers were often programmed by toggling instructions in through switches on the CPU's front panel, or fed in from punched cards or paper tape; in all of those environments, saving a few steps or instructions meant saving significant manual labor. Memory was also limited and expensive, so saving even a few instructions had real value.

In summary: 0 for octal because it was efficient to parse, and because octal was the user-friendly notation on the DEC machines of the day (at least for address manipulation).

0x for hex probably because it was a natural, backward-compatible extension of the octal prefix convention and still relatively efficient to parse.


“New” numerals had to start with a digit, to work with existing syntax.

Established practice had variable names and other identifiers starting with a letter (or a few other symbols, perhaps underscore or dollar sign). So “a”, “abc”, and “a04” are all names. Numbers started with a digit. So “3” and “3e5” are numbers.

When you add new things to a programming language, you seek to make them fit into the existing syntax, grammar, and semantics, and you try to make existing code continue working. So, you would not want to change the syntax to make “x34” a hexadecimal number or “o34” an octal number.

So, how do you fit octal numerals into this syntax? Somebody realized that, except for “0”, there is no need for numerals beginning with “0”. Nobody needs to write “0123” for 123. So we use a leading zero to denote octal numerals.
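That choice is still visible in C today; a quick example (added here for illustration) of the leading zero changing a literal's value:

    #include <stdio.h>

    int main(void)
    {
        printf("%d\n", 123);    /* prints 123 */
        printf("%d\n", 0123);   /* prints 83: the leading zero makes it octal */
        return 0;
    }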

What about hexadecimal numerals? You could use a suffix, so that “34x” means 34 in hexadecimal (34₁₆). However, then the parser has to read all the way to the end of the numeral before it knows how to interpret the digits (unless it encounters one of the “a” to “f” digits, which would of course indicate hexadecimal). It is “easier” on the parser to know that the numeral is hexadecimal early. But you still have to start with a digit, and the zero trick has already been used, so we need something else. “x” was picked, and now we have “0x” for hexadecimal.
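To make the parsing argument concrete, here is a sketch of what a scanner for the hypothetical suffix form “34x” would have to do: buffer the digits, decide the base only at the end, then make a second pass over the buffer. The suffix syntax and the helper function are invented for this illustration; they are not real C.

    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical suffix notation: "34x" means hex 34.  The scanner cannot
       pick a base until it has seen the end of the numeral. */
    static long parse_suffix_numeral(const char *s)
    {
        char digits[32];
        size_t n = 0;
        int base = 10;

        while (isalnum((unsigned char)*s)) {
            if (*s == 'x' && !isalnum((unsigned char)s[1]))
                base = 16;                       /* trailing x: it was hex all along */
            else if (n < sizeof digits - 1)
                digits[n++] = *s;
            s++;
        }
        digits[n] = '\0';
        return strtol(digits, NULL, base);       /* second pass over the buffered digits */
    }

    int main(void)
    {
        printf("%ld\n", parse_suffix_numeral("34"));    /* 34 */
        printf("%ld\n", parse_suffix_numeral("34x"));   /* 52, i.e. hex 34 */
        return 0;
    }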

(The above is based on my understanding of parsing and some general history about language development, not on knowledge of specific decisions made by compiler developers or language committees.)


The zero prefix for octal, and 0x for hex, are from the early days of Unix.

Octal's popularity dates back to hardware with 6-bit bytes, which made octal the natural choice: each octal digit represents 3 bits, so a 6-bit byte is exactly two octal digits. The same goes for hex and 8-bit bytes, where a hex digit is 4 bits and thus a byte is exactly two hex digits. Using octal for 8-bit bytes requires 3 octal digits, of which the first can only take the values 0 through 3 (that leading digit is really 'tetral', not octal). There is no reason to go to base 32 unless somebody develops a system in which bytes are ten bits long, so that a ten-bit byte could be represented as two 5-bit "nybbles".
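A quick way to see those digit counts (an illustrative snippet, not part of the original answer) is to print the same 8-bit value in both bases:

    #include <stdio.h>

    int main(void)
    {
        unsigned char byte = 0xB6;    /* 1011 0110 */
        printf("%03o\n", byte);       /* 266: three octal digits, the first limited to 0-3 */
        printf("%02X\n", byte);       /* B6: exactly two hex digits */
        return 0;
    }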


I dunno ...

0 is for 0ctal

0x is for, well, we've already used 0 to mean octal and there's an x in hexadecimal so bung that in there too

as for a natural progression, it's best to look at the latest programming languages, which can affix subscripts, such as

123_27 (interpret _ to mean subscript)

and so on

?

Mark

Tags: C, Hex, Octal