Terminal escape sequences: why don't terminals report what features they support, instead of relying on terminfo?

It's not as simple as you might suppose. xterm (like the DEC VTxxx terminals starting with VT100) has a number of reports for various features (refer to XTerm Control Sequences). The most generally useful is the one that tells what type of terminal it is:

CSI Ps c  Send Device Attributes (Primary DA).

Not all terminals have that type of response (Sun hardware console has/had none).
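To make the report concrete, here is a minimal Python sketch of an application sending Primary DA and parsing the reply. The function names are invented for illustration, the regex assumes the xterm/VT-style reply shape CSI ? Ps ; Ps ... c, and the half-second timeout is arbitrary. The query itself needs a real terminal on stdin/stdout; the parser works on any bytes.

```python
import os
import re
import select
import sys
import termios
import tty

# Assumed reply shape: CSI ? Ps ; Ps ; ... c  (xterm and DEC VT style)
DA_REPLY = re.compile(rb"\x1b\[\?([0-9;]*)c")


def parse_primary_da(data):
    """Return the numeric parameters of a Primary DA reply, or None if absent."""
    match = DA_REPLY.search(data)
    if match is None:
        return None
    return [int(p) for p in match.group(1).split(b";") if p]


def query_primary_da(timeout=0.5):
    """Send CSI c to the terminal and wait briefly for a reply.

    Sketch only: anything the user typed ahead lands in the same buffer
    and is silently thrown away here.
    """
    fd = sys.stdin.fileno()
    saved = termios.tcgetattr(fd)
    try:
        tty.setcbreak(fd)                  # no echo, no line buffering
        os.write(sys.stdout.fileno(), b"\x1b[c")
        buf = b""
        while parse_primary_da(buf) is None:
            ready, _, _ = select.select([fd], [], [], timeout)
            if not ready:                  # the terminal stayed silent
                return None
            buf += os.read(fd, 64)
        return parse_primary_da(buf)
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
```

Calling query_primary_da() in an interactive session returns a list of integers on terminals that answer and None on those that don't. What the individual numbers mean is terminal-specific.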

But there are more features than reports (for instance, how to tell whether a terminal is really interpreting UTF-8: the accepted route for that is via the locale environment variables, so no need has been established for another control sequence/response).

In practice, while there are a few applications that pay attention to reports (such as vim, which checks the actual values of function keys, the number of colors using DCS + q Pt ST, and even the cursor appearance using DCS $ q Pt ST), the process is unreliable because some developers find it simpler to return a given report-response than to implement the feature. If you read through the source code for various programs, you'll find interesting quirks where someone has customized a response to make it look like some version of xterm.
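For illustration, the termcap/terminfo query can be sketched in Python as below. This assumes the format described in XTerm Control Sequences as I understand it: the capability name is hex-encoded in the request, and a successful reply is DCS 1 + r followed by hex name, "=", and hex value. The helper names are made up, and other emulators may answer differently (or return a canned string without implementing the feature, which is exactly the reliability problem).

```python
import re

ESC = b"\x1b"
DCS = ESC + b"P"      # Device Control String introducer
ST = ESC + b"\\"      # String Terminator

# Assumed reply shapes:
#   DCS 1 + r <hex name> = <hex value> ST   capability known
#   DCS 0 + r ... ST                        capability unknown
REPLY = re.compile(
    rb"\x1bP([01])\+r((?:[0-9A-Fa-f]{2})*)(?:=((?:[0-9A-Fa-f]{2})*))?\x1b\\"
)


def build_request(name):
    """Encode a capability name (e.g. "Co") as a DCS + q request."""
    return DCS + b"+q" + name.encode("ascii").hex().encode("ascii") + ST


def parse_reply(data):
    """Return (name, value) for a successful reply, otherwise None."""
    m = REPLY.search(data)
    if m is None or m.group(1) != b"1":
        return None
    name = bytes.fromhex(m.group(2).decode("ascii")).decode("ascii", "replace")
    value = bytes.fromhex((m.group(3) or b"").decode("ascii"))
    return name, value.decode("ascii", "replace")
```

For example, build_request("Co") produces the bytes ESC P + q 4 3 6 f ESC \, and a terminal claiming 256 colors would answer with a reply that parse_reply turns into ("Co", "256").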


Here's my view on escape sequences that query the terminal emulator.

tl;dr: Due to their asynchronous nature, they are truly problematic for apps to handle. (Not for the terminal emulators; it's a piece of cake there.)


First, for simplicity, let's assume that all terminal emulators are guaranteed to send a response to all such queries. (This would require that queries have a well defined generic structure rather than one-off escape sequences, which doesn't seem to be the case.)

Let's design and implement in our minds a simple utility (such as "ls") as well as a more complex fullscreen app (such as "mc" or "vim") and let's see what problems we face.

  • A standard feature of Unix is that you can type ahead the next command while a previous one is running (e.g. type "sleep 10" Enter, then "mc" Enter, then press F5; about 10 seconds later, "mc" will open with the Copy dialog). One may or may not like this feature, but at least the behavior should be consistent across apps, and this is the behavior of apps that don't dynamically query the terminal emulator. Now imagine that "mc" sends such a querying escape sequence at startup to detect some feature. mc will then receive the escape sequence of F5 before the response. It either ignores it (in which case the behavior is inconsistent with the rest of the apps) or has to stash it somewhere, wait until the response to the query arrives, and only then process the F5. To do this, it can no longer let each component read and process input directly from stdin; it needs a wrapper layer in between. Doable, but it takes quite some effort to implement (the additional work is noticeable even for projects started from scratch, let alone for existing ones that were written without this criterion in mind and have to be heavily refactored).

  • A similar story applies if the app needs to query the terminal for whatever reason during its operation. In that case, dropping the "normal" characters that arrive before the response is absolutely unacceptable.

  • Now, instead of a utility written in C or some similar language, imagine a shell script that needs to read this response as well as other normal input from the user (from stdin). How would you go about it? For each "read" in your code handling such a response, would you manually set the terminator characters to newline or the end of the response escape sequence, and locate and strip off the escape sequence from the rest? How do you keep the escape sequence response from appearing on the screen while the user input still appears? How would you "feed" the normal input that you receive while expecting an escape sequence response to the subsequent "regular" "read" commands that expect the user data? I can't see how it can be done, but even if it can, it's obviously unbearably complex, tedious and error-prone. The only thing I can imagine being reasonably implementable and working reliably is to drop typed-ahead characters (resulting in unusual behavior) and to process such a response escape sequence at the startup of your script only.

  • The roundtrip time between the terminal emulator and the app might become significant if your simple utility (e.g. "ls") is used in a shell script inside a loop. On single-core systems a context switch between the two apps (the utility and the terminal emulator) is required, although it's probably not that bad compared to the fork()+execve()+friends that happens anyway. On multi-core systems I guess this is not necessary, although I'm not sure about the details. The cost (latency), however, might become truly significant if actual network traffic is involved.

  • In the rare case that the app exits without reading the response (e.g. it crashes or gets killed), the response appears at the next command that you begin typing (which I'm sure you've already seen happen, e.g. when you accidentally cat'ed a binary file).
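The wrapper layer mentioned in the first bullet can be sketched roughly as follows, in Python for brevity. Everything here is hypothetical: the class name, the assumption that the only reply ever expected is a Primary DA response (CSI ? ... c), and the policy of holding back a trailing fragment that might be the start of such a reply. It is only meant to show the amount of bookkeeping involved, not how mc or vim actually do it.

```python
import re

# Hypothetical: the only reply we ever expect looks like CSI ? Ps ; ... c
REPLY = re.compile(rb"\x1b\[\?[0-9;]*c")
# A trailing fragment that could still grow into such a reply.
PARTIAL = re.compile(rb"\x1b(?:\[(?:\?[0-9;]*)?)?\Z")


class InputDemux:
    """Sits between stdin and the rest of the app, splitting replies from keys."""

    def __init__(self):
        self.pending = b""   # bytes that cannot be classified yet
        self.keys = b""      # user input, in the order it was typed
        self.replies = []    # terminal replies, in arrival order

    def feed(self, chunk):
        self.pending += chunk
        # Pull out every complete reply; what precedes it is user input.
        while True:
            m = REPLY.search(self.pending)
            if m is None:
                break
            self.keys += self.pending[:m.start()]
            self.replies.append(m.group(0))
            self.pending = self.pending[m.end():]
        # Hold back a tail that may be an unfinished reply, release the rest.
        # A lone ESC keypress is held too; a real app would need a timer here.
        tail = PARTIAL.search(self.pending)
        cut = tail.start() if tail else len(self.pending)
        self.keys += self.pending[:cut]
        self.pending = self.pending[cut:]
```

Every component that used to read stdin directly would now have to consume keys from this object instead, which is the refactoring cost described above.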


Now, let's assume that there are some terminal emulators that don't recognize (or just choose not to respond to) some of the querying escape sequences, including future ones. This is what the current state of things looks like. It makes all the previous bullet points orders of magnitude harder. You cannot risk your app freezing in this case, so it needs a timeout.

  • How long do you wait for the answer? How do you make up an arbitrary timeout? Do you (and if so, how) adjust this timeout according to the network's characteristics (e.g. local terminal emulator vs. ssh to the neighboring building vs. ssh to the other side of the globe)?

  • What if the response doesn't arrive in time (e.g. due to a lag over ssh)? Would your app continue in degraded mode, visible to the user?

  • What if the response arrives after the app has given up? Your app potentially needs to be prepared for such a response arriving even in places where it wouldn't expect it at all. (E.g. in your shell script you initially query some state and wait for the response with a timeout, but every single "read" later on needs to be prepared to be polluted with this delayed response.)

  • The timeout adds a lot to the roundtrip time, probably much more than kernel context switches or even network latency. Putting such commands in a shell script loop would be absolutely unbearable, and the negative effect on the usability of interactive apps would probably be noticeable too.
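A query with a timeout might look something like the Python sketch below. The function name, its parameters, and the idea of returning the non-reply bytes to the caller are my own assumptions for illustration. It works on raw file descriptors and assumes the caller has already put the terminal into a non-canonical, no-echo mode.

```python
import os
import select
import time


def query(fd_in, fd_out, request, reply_re, timeout):
    """Write request to fd_out and wait up to timeout seconds for a reply.

    Returns (reply, other): reply is the matched bytes or None on timeout,
    other is everything else that was read and must still be treated as
    user input by the caller.
    """
    os.write(fd_out, request)
    deadline = time.monotonic() + timeout
    buf = b""
    while True:
        m = reply_re.search(buf)
        if m:
            return m.group(0), buf[:m.start()] + buf[m.end():]
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None, buf               # gave up; a late reply may still come
        ready, _, _ = select.select([fd_in], [], [], remaining)
        if ready:
            chunk = os.read(fd_in, 256)
            if not chunk:                  # end of input
                return None, buf
            buf += chunk
```

Note what this does not solve: on a terminal that never answers, every call costs the full timeout, and a reply that arrives after the deadline will surface in some later, unrelated read.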


It's beyond the scope of this response to show what alternative I would find feasible. The design of the TERM variable also has plenty of limitations, which I'm not going to go into here. For querying features (those that are static for the terminal emulator, rather than current properties such as cursor position) I would probably start in the direction of TERMCAP, where the actual behavior was described. It could even point to a local file, and could be called TERM as now, but ssh-like utilities would be responsible for forwarding its actual contents to the remote site and pointing TERM over there to this file, in a similar manner to .Xauthority. Another, completely different approach could be to have a fourth standard file descriptor for such meta-communication with the terminal emulator.