bash: Shift+arrow keys make A,B,C,D

This is a keyboard input protocol that goes back to the 1980s, and what your terminal emulator sends is a perfectly valid control sequence. It is your shell, not your "terminal driver" (whatever that is supposed to be) as in M. Vazquez-Abrams's answer, that is not handling it properly.

Background

Terminals emit control sequences for function key and extended key presses. They can emit DECFNK control sequences, which are CSI-introduced control sequences; Linux function key control sequences, which are a different kind of CSI-introduced sequence; SCO console function key control sequences, which are a third kind of CSI-introduced sequence; shifted single characters, prefixed with SS3; or, as in this case, ECMA-48 standard sequences for various things.

(SS3 and CSI are control characters in the C1 range: Single Shift 3 and Control Sequence Introducer, respectively.)

You have two particular keypads on your keyboard (an IBM Model M-alike or similar): a calculator keypad and a cursor keypad. The model employed by DEC VT-style terminal emulators (which is most of the terminal emulators that you are likely to encounter, from the one in your kernel to rxvt-unicode) is that both keypads have separately switchable application and normal modes. Full-screen TUI applications, programs such as your shell that use the libedit or GNU readline libraries (or ZLE), and a few other types of application specify which mode they want. They then listen for control sequences coming from the terminal by reading bursts of characters. The rationale is that a human cannot type a full ECMA-48 control sequence anywhere near as fast as a terminal or terminal emulator sends one, so everything arriving in one burst is what distinguishes a user pressing the Esc key from the terminal emulator sending a control sequence that starts with the ␛ character.
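The burst heuristic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how readline, libedit, or ZLE actually implement it: the input is a hypothetical list of (arrival time, character) pairs, and the 50 ms gap threshold is an arbitrary value chosen for the example.

```python
from typing import List, Optional, Sequence, Tuple

ESC = "\x1b"

# Assumed threshold for this sketch only: characters arriving closer
# together than this are treated as one burst.
BURST_GAP_SECONDS = 0.05


def split_bursts(events: Sequence[Tuple[float, str]],
                 gap: float = BURST_GAP_SECONDS) -> List[str]:
    """Group (timestamp, character) pairs into bursts separated by pauses."""
    bursts: List[str] = []
    last_time: Optional[float] = None
    for when, ch in events:
        if last_time is None or when - last_time > gap:
            bursts.append(ch)      # a pause: this character starts a new burst
        else:
            bursts[-1] += ch       # arrived quickly: same burst as before
        last_time = when
    return bursts


def classify(burst: str) -> str:
    """Tell a lone Esc key press apart from a terminal-sent control sequence."""
    if burst == ESC:
        return "Esc key"
    if burst.startswith(ESC):
        return "control sequence"
    return "typed text"


# A human presses Esc, then 0.3 s later the terminal sends ␛[1;2D in one go.
events = [(0.0, ESC), (0.3, ESC), (0.3001, "["), (0.3002, "1"),
          (0.3003, ";"), (0.3004, "2"), (0.3005, "D")]
print([classify(b) for b in split_bursts(events)])
```

The weakness is built into the premise: a person who presses Esc and then types very quickly would be misread as a control sequence, which is why real libraries make the timeout configurable.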

  • In application mode, the arrow keys on each keypad produce shifted single characters prefixed with SS3. Modifiers cannot really have any effect (despite XTerm having botched this) because ECMA-35 and ECMA-48 define SS2 and SS3 as only acting on a single following character. But, on the flip side, the calculator and cursor keypads generate different SS3-shifted characters, allowing the two keypads to be distinguished from each other.
  • In normal mode, the arrow keys on both keypads produce the same CSI-introduced control sequences: the ones from ECMA-48, with augmentations from DEC VTs. In particular, the cursor keys send the ECMA-48 control sequences CUU, CUD, CUF, and CUB (CUrsor Up, CUrsor Down, CUrsor Forward, and CUrsor Backward; that is, up, down, right, and left). The DEC augmentation is that the control sequence also includes the current modifier state.

So one has a choice between application mode, where one cannot know what modifiers are pressed but can distinguish the two Left Arrow keys, and normal mode, where one cannot distinguish the two Left Arrow keys but can know what modifiers are pressed.
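To make the trade-off concrete, here is a minimal Python sketch of what a terminal emulator following this model might send for the cursor keys. The function name is hypothetical, only the 7-bit encodings are produced (␛ O for SS3, ␛ [ for CSI), and the modifier bit values (Shift 1, Alt 2, Control 4) are an assumption based on the common xterm-style convention rather than something stated above.

```python
from typing import Iterable

ESC = "\x1b"
SS3_7BIT = ESC + "O"   # 7-bit encoding of SS3
CSI_7BIT = ESC + "["   # 7-bit encoding of CSI

# Final characters of CUU, CUD, CUF, and CUB.
FINALS = {"up": "A", "down": "B", "right": "C", "left": "D"}

# Assumed modifier bit values (xterm-style convention).
MODIFIER_BITS = {"shift": 1, "alt": 2, "control": 4}


def encode_cursor_key(direction: str, modifiers: Iterable[str] = (),
                      application_mode: bool = False) -> str:
    """Hypothetical encoder for one cursor-keypad arrow key press."""
    final = FINALS[direction]
    if application_mode:
        # SS3 shifts exactly one following character, so there is
        # nowhere to put the modifier state: it is simply lost.
        return SS3_7BIT + final
    mask = sum(MODIFIER_BITS[m] for m in modifiers)
    if mask == 0:
        # No modifiers: parameters omitted, e.g. ␛[D for Left Arrow.
        return CSI_7BIT + final
    # Occurrence count 1, then the modifier bitflags plus 1, in decimal.
    return f"{CSI_7BIT}1;{mask + 1}{final}"


print(repr(encode_cursor_key("left", {"shift"})))
print(repr(encode_cursor_key("left", {"shift"}, application_mode=True)))
```

In normal mode Shift+Left Arrow comes out as ␛[1;2D, whereas in application mode the same press is indistinguishable from plain Left Arrow.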

In more detail: The DEC augmentations to the ECMA-48 control sequences are that the control sequence has two parameters:

  • The first parameter is analogous to the single parameter that a CUU, CUD, CUF, or CUB can actually have, per ECMA-48. It is the occurrence count, and for a single key press it is always 1.
  • The second parameter is the interesting one. It contains the modifier key state, which (for reasons involving how parameters in CSI-introduced control sequences work when omitted) is a set of bitflags for various modifier keys, plus 1, encoded as a decimal number. For example, Shift alone is sent as 2 and Control alone as 5.
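Going the other way, a receiving program can recover the count and the modifier flags from such a sequence. The sketch below is hypothetical: it accepts only 7-bit cursor-key sequences, treats an omitted parameter as 1 (so an omitted modifier parameter decodes as "no modifiers"), and assumes the same Shift 1, Alt 2, Control 4 bit assignments as xterm-style emulators.

```python
import re
from typing import List, Optional, Tuple

# ␛ [, optional count, optional ";modifiers", then one of A to D.
CURSOR_KEY = re.compile(r"\x1b\[(\d*)(?:;(\d*))?([ABCD])\Z")

DIRECTIONS = {"A": "up", "B": "down", "C": "right", "D": "left"}

# Assumed bit assignments (xterm-style convention).
MODIFIER_BITS = [(1, "shift"), (2, "alt"), (4, "control")]


def decode_cursor_key(seq: str) -> Optional[Tuple[str, int, List[str]]]:
    """Return (direction, count, modifiers), or None if seq does not fit."""
    m = CURSOR_KEY.match(seq)
    if m is None:
        return None
    count_text, modifier_text, final = m.groups()
    # This sketch treats an omitted or empty parameter as 1.
    count = int(count_text) if count_text else 1
    encoded = int(modifier_text) if modifier_text else 1
    flags = encoded - 1                # undo the "plus 1"
    modifiers = [name for bit, name in MODIFIER_BITS if flags & bit]
    return DIRECTIONS[final], count, modifiers


print(decode_cursor_key("\x1b[1;2D"))
```

Note that the unmodified sequence ␛[D and the modified ␛[1;2D both decode through the same code path; only the flags differ.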

This is how DEC VT terminals have been doing things since the 1980s. In recent years, several terminal emulators finally introduced the same functionality (albeit, as mentioned, XTerm got it rather wrong).

What's going on

The problem is that your GNU readline library, libedit, ZLE, and so forth don't really handle the protocol properly. They are not totally to blame. They rely upon the termcap and terminfo systems, which simply aren't up to the job here. termcap and terminfo don't really have the notion of an input control sequence that can vary, let alone multiple-mode keypads.

For that you have to look to the likes of Vim, which can be programmed with special overrides for terminfo to specify control sequences that follow the protocol described above (cf. :help xterm-modifier-keys in Vim), or Neovim, which uses Paul Evans's libtermkey and its CSI driver. libtermkey's CSI driver is how one has to handle keyboard input properly from DEC VT-alike terminal emulators: it is an actual ECMA-48 state-machine parser that decodes control sequences properly.
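The difference between a state-machine parser and fixed-string matching can be shown with a small Python tokenizer that follows the general ECMA-48 syntax for CSI sequences: parameter bytes 0x30 to 0x3F, then intermediate bytes 0x20 to 0x2F, then one final byte 0x40 to 0x7E. This is a simplified sketch, not libtermkey's code (which is written in C). It only reports other escape sequences without interpreting them, and it abandons a control sequence that is malformed or incomplete.

```python
from typing import List, Tuple

ESC = "\x1b"
CSI_8BIT = "\x9b"

# ("char", c) for ordinary input, ("esc", c) for ␛ followed by something
# other than [, and ("csi", parameters, intermediates, final) otherwise.
Token = Tuple[str, ...]


def tokenize(stream: str) -> List[Token]:
    tokens: List[Token] = []
    state = "ground"
    params = inters = ""
    for ch in stream:
        code = ord(ch)
        if state == "ground":
            if ch == CSI_8BIT:
                state, params, inters = "csi", "", ""
            elif ch == ESC:
                state = "escape"
            else:
                tokens.append(("char", ch))
        elif state == "escape":
            if ch == "[":                      # ␛ [ is 7-bit CSI
                state, params, inters = "csi", "", ""
            else:
                tokens.append(("esc", ch))     # not interpreted here
                state = "ground"
        elif state == "csi":
            if 0x30 <= code <= 0x3F and not inters:
                params += ch                   # parameter bytes
            elif 0x20 <= code <= 0x2F:
                inters += ch                   # intermediate bytes
            elif 0x40 <= code <= 0x7E:
                tokens.append(("csi", params, inters, ch))  # final byte
                state = "ground"
            else:
                state = "ground"               # malformed: abandon it
    if state == "escape":
        tokens.append(("char", ESC))           # a lone trailing ␛: the Esc key
    return tokens


print(tokenize("\x1b[1;2Dx"))
```

Because it never consults a table of known sequences, ␛ [ 1 ; 2 D comes out as one complete token with parameters "1;2" and final D, ready to be interpreted, rather than being partially swallowed.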

But what your shell is doing is looking up entries for the arrow keys in terminfo, and only matching those specific control sequences.

Specifically:

  • Your shell is looking up the kcub1 capability for your terminal in its terminfo record. Here's the one from the teken record, for example:
    % tput -T teken kcub1|hexdump -C
    00000000  1b 5b 44                                          |.[D|
    00000003
    %
  • It is only matching that specific input sequence as ← Left Arrow.
  • When you press ⇧ Level 2 Shift+← Left Arrow, your terminal emulator sends the control sequence CSI 1 ; 2 D. More precisely, it uses the 7-bit alternative and sends ␛ [ 1 ; 2 D, where ␛ [ is the way of encoding CSI in 7-bit characters.
  • Your shell fails to match that against any known fixed input sequence from terminfo, and aborts processing. On my Bourne Again shell here, it ends up swallowing ␛ [ 1 and acts as if I have pressed ; 2 D. On your Bourne Again shell, it ends up swallowing ␛ [ 1 ; 2 and acts as if you have pressed D.

    The exact failure mode depends on which set of input sequences the shell is attempting to pattern-match, since that determines how many characters it swallows before it concludes that it has a sequence with no possible matches. That in turn depends on what your terminal's terminfo/termcap record actually contains, and on what terminal type you have told your shell your terminal is.
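The failure mode can be reproduced with a deliberately naive prefix matcher. This Python sketch is hypothetical and is not the actual algorithm of readline, libedit, or ZLE: it keeps reading while the buffer is still a prefix of some entry in a fixed table, and at a dead end it discards the buffer and reprocesses the offending character as ordinary input. The two tables are illustrative assumptions, chosen only to show how the table's contents change how much gets swallowed.

```python
from collections import deque
from typing import Dict, List

# Hypothetical fixed tables of "known" input sequences.
TABLE_A: Dict[str, str] = {
    "\x1b[A": "<up>", "\x1b[B": "<down>",
    "\x1b[C": "<right>", "\x1b[D": "<left>",
    "\x1b[1~": "<home>",
}
TABLE_B: Dict[str, str] = {**TABLE_A, "\x1b[1;2P": "<S-F1>"}


def naive_match(stream: str, table: Dict[str, str]) -> List[str]:
    """Hypothetical fixed-string matcher: key names or literal characters."""
    out: List[str] = []
    buf = ""
    pending = deque(stream)
    while pending:
        ch = pending.popleft()
        candidate = buf + ch
        longer = any(s != candidate and s.startswith(candidate) for s in table)
        if candidate in table and not longer:
            out.append(table[candidate])   # exact, unambiguous match
            buf = ""
        elif longer or candidate in table:
            buf = candidate                # could still match: keep reading
        elif buf:
            # Dead end: throw the buffered prefix away and reprocess ch.
            buf = ""
            pending.appendleft(ch)
        else:
            out.append(ch)                 # ordinary character
    if buf:
        out.append(table[buf]) if buf in table else out.extend(buf)
    return out


print(naive_match("\x1b[1;2D", TABLE_A))   # [';', '2', 'D']
print(naive_match("\x1b[1;2D", TABLE_B))   # ['D']
```

With the first table the matcher gives up after ␛ [ 1 and leaks ; 2 D; with the second, a longer entry keeps it reading until ␛ [ 1 ; 2, and only D leaks, mirroring the two different Bourne Again shell outcomes described above.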

Fixes

The local fix for this sort of thing is to get creative with the keybindings in your shell. It's why, for example, you'll find people doing this sort of thing with the Z shell in their .zshrcs:

bindkey "\e[1;5D" backward-word  # Control+Left Arrow (modifier parameter 5)
bindkey "\e[1;5C" forward-word   # Control+Right Arrow (modifier parameter 5)

Unfortunately, there is no non-local fix. It would involve rearchitecting your shell's input handling quite significantly. Such rearchitecting is long overdue (witness Neovim), but no-one has tackled it yet.

Further reading

  • Character Code Structure and Extension Techniques. ECMA-35. 6th edition. 1994. ECMA International.
  • Control Functions for Coded Character Sets. ECMA-48. 5th edition. 1991. ECMA International.
  • "ANSI, Short ANSI, and PC Keyboard Codes". VT420 Programmer Reference Manual. EK-VT420-RM-002. February 1992. Digital.
  • DECFNK. "ANSI Control Functions". VT510 Terminal Programmer Information. EK-VT510-RM. November 1993. Digital.
  • VT520/VT525 Video Terminal Programmer Information. EK-VT520-RM. July 1994. Digital.
  • https://github.com/fish-shell/fish-shell/issues/2139#issuecomment-388706768
  • https://unix.stackexchange.com/a/238932/5132

Pressing Ctrl+V will cause the next keypress to be input literally. For Shift+Up Arrow this results in "^[[1;2A". The terminal driver consumes the "^[[1;2" as an invalid escape sequence, leaving only the "A".