Why are special characters such as "carriage return" represented as "^M"?

I believe that what OP was actually asking about is called Caret Notation.

Caret notation is a way of writing unprintable control characters in ASCII. It consists of a caret (^) followed by a capital letter or one of a few symbols (@, [, \, ], ^, _, ?); the pair stands for the control code whose numerical value equals the letter's position in the alphabet. For example, the EOT character, with a value of 4, is represented as ^D because D is the 4th letter of the alphabet. The NUL character, with a value of 0, is represented as ^@ (@ is the ASCII character just before A). The DEL character, with a value of 127, is usually represented as ^?, because '?' comes just before '@' in ASCII and -1 is the same as 127 when masked to 7 bits. An alternative formulation is that the printed character is found by inverting the 7th bit (0x40) of the control character's ASCII code.
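To make the "invert the 7th bit" rule concrete, here is a minimal Python sketch. The function name `caret` is made up for illustration, and it only handles the 7-bit ASCII control codes (0-31 and 127).

```python
def caret(code: int) -> str:
    """Return caret notation for an ASCII control code (0-31 or 127)."""
    if not (0 <= code <= 31 or code == 127):
        raise ValueError(f"not an ASCII control code: {code}")
    # Flipping bit 0x40 maps 0..31 onto '@'..'_' and 127 onto '?'.
    return "^" + chr(code ^ 0x40)


if __name__ == "__main__":
    for code in (0, 4, 13, 127):
        print(code, caret(code))
```

For instance, `caret(4)` gives `^D` and `caret(127)` gives `^?`, matching the examples above.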

The full list of ASCII control characters along with caret notation can be found here

Regarding vim and other text editors: you'll typically only see ^M if you open a Windows-formatted (CRLF) text file in an editor that expects Unix-style line endings (LF). The 0x0A is rendered as a line break, and the 0x0D right before it is displayed as ^M. Most of the time, an editor's default settings include automatic detection of line endings.
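As a rough illustration of that behaviour (not how vim or any particular editor is actually implemented), the following Python sketch splits a buffer on LF only and shows every remaining control character except tab in caret notation. The function name `render_for_lf_editor` is hypothetical, and the input is assumed to be plain ASCII.

```python
def render_for_lf_editor(data: bytes) -> list:
    """Split on LF only and show other control bytes in caret notation."""
    lines = []
    for raw in data.split(b"\n"):
        out = []
        for b in raw:
            if (b < 32 and b != 9) or b == 127:
                # Control byte (other than tab): display as ^X.
                out.append("^" + chr(b ^ 0x40))
            else:
                out.append(chr(b))
        lines.append("".join(out))
    return lines


if __name__ == "__main__":
    for line in render_for_lf_editor(b"one\r\ntwo\r\n"):
        print(line)
```

With CRLF input, each line keeps its trailing 0x0D, which is why it shows up as `one^M`, `two^M`.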


That is exactly the reason.

ASCII defines characters 0-31 as non-printing control codes. Here's an extract from the ascii(7) manual page from a random Linux system (man ascii), up to and including CR (13):

   Oct   Dec   Hex   Char                       
   ─────────────────────────────────────────────
   000   0     00    NUL '\0'                    
   001   1     01    SOH (start of heading)     
   002   2     02    STX (start of text)         
   003   3     03    ETX (end of text)           
   004   4     04    EOT (end of transmission)   
   005   5     05    ENQ (enquiry)               
   006   6     06    ACK (acknowledge)           
   007   7     07    BEL '\a' (bell)             
   010   8     08    BS  '\b' (backspace)       
   011   9     09    HT  '\t' (horizontal tab)  
   012   10    0A    LF  '\n' (new line)        
   013   11    0B    VT  '\v' (vertical tab)    
   014   12    0C    FF  '\f' (form feed)       
   015   13    0D    CR  '\r' (carriage ret)    

Conventionally these characters are generated with Control and the letter relating to the character required. Teletypes and early terminal keyboards had 'BELL' written above the G key for this reason.

The standards document that defined ASCII is ASA X3.4-1963, which was published by the American Standards Association in 1963. I can't find the original document on their website, but this extract from the original document shows the character table, including the control codes above.


The notation goes back to the earliest ASCII Teletypes (ca. 1963). There was a CTRL key that cleared the 0x40 bit, so that CTRL-M (carriage return) would be 0x0D instead of 0x4D, CTRL-G (bell) would be 0x07 instead of 0x47, and CTRL-L (form feed) would be 0x0C instead of 0x4C.
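The arithmetic can be checked with a small Python sketch. It models the CTRL key as turning off the 0x40 bit, which is a simplification of the hardware; for keys in the '@' to '_' range that bit is always set, so toggling and clearing give the same result. The helper name `ctrl` is made up for illustration.

```python
def ctrl(key: str) -> int:
    """Return the code produced by CTRL plus a key in the range '@'..'_'."""
    code = ord(key.upper())
    if not 0x40 <= code <= 0x5F:
        raise ValueError(f"no control code for key {key!r}")
    # Turn off bit 0x40: 'M' (0x4D) becomes CR (0x0D).
    return code & ~0x40


if __name__ == "__main__":
    for key in "MGL":
        print(f"CTRL-{key}: 0x{ord(key):02X} -> 0x{ctrl(key):02X}")
```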

There was no "design" in assigning particular letters to particular functions. It was just chance that, when the dust settled from assigning ASCII codes, the code for M was one bit different from carriage return, and hence carriage return became CTRL-M.

Here is the best shot I can find of an ASR33 keyboard. As you can see the control character names are printed in small letters on the corresponding alpha keys.

Teletype Model 33 ASR with paper tape punch/reader

Image by Marcin Wichary, User:AlanM1 (Derived (cropped) from File:ASR-33 2.jpg) [CC BY 2.0], via Wikimedia Commons

The M key does not have a notation on it because there is a dedicated "RETURN" key, so CTRL-M is redundant.