How do keyboard input and text output work?

There are several different scenarios; I'll describe the most common ones. The successive macroscopic events are:

  1. Input: the key press event is transmitted from the keyboard hardware to the application.
  2. Processing: the application decides that because the key A was pressed, it must display the character a.
  3. Output: the application gives the order to display a on the screen.

GUI applications

The de facto standard graphical user interface of unix systems is the X Window System, often called X11 because it stabilized in the 11th version of its core protocol between applications and the display server. A program called the X server sits between the operating system kernel and the applications; it provides services including displaying windows on the screen and transmitting key presses to the window that has the focus.

Input

+----------+              +-------------+         +-----+
| keyboard |------------->| motherboard |-------->| CPU |
+----------+              +-------------+         +-----+
             USB, PS/2, …                 PCI, …
             key down/up

First, information about the key press and key release is transmitted from the keyboard to the computer and inside the computer. The details depend on the type of hardware. I won't dwell more on this part because the information remains the same throughout this part of the chain: a certain key was pressed or released.

         +--------+        +----------+          +-------------+
-------->| kernel |------->| X server |--------->| application |
         +--------+        +----------+          +-------------+
interrupt          scancode             keysym
                   =keycode            +modifiers

When a hardware event happens, the CPU triggers an interrupt, which causes some code in the kernel to execute. This code detects that the hardware event is a key press or key release coming from a keyboard and records the scan code which identifies the key.

The X server reads input events through a device file, for example /dev/input/eventNNN on Linux (where NNN is a number). Whenever there is an event, the kernel signals that there is data to read from that device. The device file transmits key up/down events with a scan code, which may or may not be identical to the value transmitted by the hardware (the kernel may translate the scan code from a keyboard-dependent value to a common value, and Linux doesn't retransmit the scan codes that it doesn't know).

X calls the scan code that it reads a keycode. The X server maintains a table that translates key codes into keysyms (short for “key symbol”). Keycodes are numeric, whereas keysyms are names such as A, aacute, F1, KP_Add, Control_L, … The keysym may differ depending on which modifier keys are pressed (Shift, Ctrl, …).

There are two mechanisms to configure the mapping from keycodes to keysyms:

  • xmodmap is the traditional mechanism. It is a simple table mapping keycodes to a list of keysyms (unmodified, shifted, …).
  • XKB is a more powerful, but more complex mechanism with better support for more modifiers, in particular for dual-language configuration, among others.

Applications connect to the X server and receive a notification when a key is pressed while a window of that application has the focus. The notification indicates that a certain keysym was pressed or released as well as what modifiers are currently pressed. You can see keysyms by running the program xev from a terminal. What the application does with the information is up to it; some applications have configurable key bindings.

In a typical configuration, when you press the key labeled A with no modifiers, this sends the keysym a to the application; if the application is in a mode where you're typing text, this inserts the character a.

Relationship of keyboard layout and xmodmap goes into more detail on keyboard input. How do mouse events work in linux? gives an overview of mouse input at the lower levels.

Output

+-------------+        +----------+          +-----+         +---------+
| application |------->| X server |---····-->| GPU |-------->| monitor |
+-------------+        +----------+          +-----+         +---------+
               text or              varies          VGA, DVI,
               image                                HDMI, …

There are two ways to display a character.

  • Server-side rendering: the application tells the X server “draw this string in this font at this position”. The font resides on the X server.
  • Client-side rendering: the application builds an image that represents the character in a font that it chooses, then tells the X server to display that image.

See What are the purposes of the different types of XWindows fonts? for a discussion of client-side and server-side text rendering under X11.

What happens between the X server and the Graphics Processing Unit (the processor on the video card) is very hardware-dependent. Simple systems have the X server draw in a memory region called a framebuffer, which the GPU picks up for display. Advanced systems such as found on any 21st century PC or smartphone allow the GPU to perform some operations directly for better performance. Ultimately, the GPU transmits the screen content pixel by pixel every fraction of a second to the monitor.

Text mode application, running in a terminal

If your text editor is a text mode application running in a terminal, then it is the terminal which is the application for the purpose of the section above. In this section, I explain the interface between the text mode application and the terminal. First I describe the case of a terminal emulator running under X11. What is the exact difference between a 'terminal', a 'shell', a 'tty' and a 'console'? may be useful background here. After reading this, you may want to read the far more detailed What are the responsibilities of each Pseudo-Terminal (PTY) component (software, master side, slave side)?

Input

      +-------------------+               +-------------+
----->| terminal emulator |-------------->| application |
      +-------------------+               +-------------+
keysym                     character or
                           escape sequence

The terminal emulator receives events like “Left was pressed while Shift was down”. The interface between the terminal emulator and the text mode application is a pseudo-terminal (pty), a character device which transmits bytes. When the terminal emulator receives a key press event, it transforms this into one or more bytes which the application gets to read from the pty device.

Printable characters outside the ASCII range are transmitted as one or more byte depending on the character and encoding. For example, in the UTF-8 encoding of the Unicode character set, characters in the ASCII range are encoded as a single bytes, while characters outside that range are encoded as multiple bytes.

Key presses that correspond to a function key or a printable character with modifiers such as Ctrl or Alt are sent as an escape sequence. Escape sequences typically consist of the character escape (byte value 27 = 0x1B = \033, sometimes represented as ^[ or \e) followed by one or more printable characters. A few keys or key combination have a control character corresponding to them in ASCII-based encodings (which is pretty much all of them in use today, including Unicode): Ctrl+letter yields a character value in the range 1–26, Esc is the escape character seen above and is also the same as Ctrl+[, Tab is the same as Ctrl+I, Return is the same as Ctrl+M, etc.

Different terminals send different escape sequences for a given key or key combination. Fortunately, the converse is not true: given a sequence, there is in practice at most one key combination that it encodes. The one exception is the character 127 = 0x7f = \0177 which is often Backspace but sometimes Delete.

In a terminal, if you type Ctrl+V followed by a key combination, this inserts the first byte of the escape sequence from the key combination literally. Since escape sequences normally consist only of printable characters after the first one, this inserts the whole escape sequence literally. See key bindings table? for a discussion of zsh in this context.

The terminal may transmit the same escape sequence for some modifier combinations (e.g. many terminals transmit a space character for both Space and Shift+Space; xterm has a mode to distinguish modifier combinations but terminals based on the popular vte library don't). A few keys are not transmitted at all, for example modifier keys or keys that trigger a binding of the terminal emulator (e.g. a copy or paste command).

It is up to the application to translate escape sequences into symbolic key names if it so desires.

Output

+-------------+               +-------------------+
| application |-------------->| terminal emulator |--->
+-------------+               +-------------------+
               character or
               escape sequence

Output is rather simpler than input. If the application outputs a character to the pty device file, the terminal emulator displays it at the current cursor position. (The terminal emulator maintains a cursor position, and scrolls if the cursor would fall under the bottom of the screen.) The application can also output escape sequences (mostly beginning with ^[ or ^]) to tell the terminal to perform actions such as moving the cursor, changing the text attributes (color, bold, …), or erasing part of the screen.

Escape sequences supported by the terminal emulator are described in the termcap or terminfo database. Most terminal emulator nowadays are fairly closely aligned with xterm. See Documentation on LESS_TERMCAP_* variables? for a longer discussion of terminal capability information databases, and How to stop cursor from blinking and Can I set my local machine's terminal colors to use those of the machine I ssh into? for some usage examples.

Application running in a text console

If the application is running directly in a text console, i.e. a terminal provided by the kernel rather than by a terminal emulator application, the same principles apply. The interface between the terminal and the application is still a byte stream which transmits characters, with special keys and commands encoded as escape sequences.

Remote application, accessed over the network

Remote text application

If you run a program on a remote machine, e.g. over SSH, then the network communication protocol relays data at the pty level.

+-------------+           +------+           +-----+           +----------+
| application |<--------->| sshd |<--------->| ssh |<--------->| terminal |
+-------------+           +------+           +-----+           +----------+
               byte stream        byte stream       byte stream
               (char/seq)         over TCP/…        (char/seq)

This is mostly transparent, except that sometimes the remote terminal database may not know all the capabilities of the local terminal.

Remote X11 application

The communication protocol between applications an the server is itself a byte stream that can be sent over a network protocol such as SSH.

+-------------+            +------+        +-----+            +----------+
| application |<---------->| sshd |<------>| ssh |<---------->| X server |
+-------------+            +------+        +-----+            +----------+
               X11 protocol        X11 over       X11 protocol
                                   TCP/…

This is mostly transparent, except that some acceleration features such as movie decoding and 3D rendering that require direct communication between the application and the display are not available.


If you want to see this in a Unix system that is small enough to be understandable, dig into Xv6. It is more or less the mythical Unix 6th Edition that became the base of John Lion's famous commentary, long circulated as samizdat. Its code was reworked to compile under ANSI C and taking modern developments, like multiprocessors, into account.