Removing control chars (including console codes / colours) from script output

The following script should filter out all ANSI/VT100/xterm control sequences (based on ctlseqs). It is only minimally tested, so please report any under- or over-match.

#!/usr/bin/env perl
## uncolor — remove terminal escape sequences such as color changes
while (<>) {
    s/ \e[ #%()*+\-.\/]. |
       \e\[ [ -?]* [@-~] | # CSI ... Cmd
       \e\] .*? (?:\e\\|[\a\x9c]) | # OSC ... (ST|BEL)
       \e[P^_] .*? (?:\e\\|\x9c) | # (DCS|PM|APC) ... ST
       \e. //xg;
    print;
}
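Since the script is only minimally tested, here is a rough way to poke at it. This is a sketch, not part of the original script: the same substitution is wrapped in a sub (the name strip_escapes and the sample inputs are made up for illustration) and run against a few hand-written strings.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: the substitution from the script above, applied to a string.
sub strip_escapes {
    my ($text) = @_;
    $text =~ s/ \e[ #%()*+\-.\/]. |
                \e\[ [ -?]* [@-~] | # CSI ... Cmd
                \e\] .*? (?:\e\\|[\a\x9c]) | # OSC ... (ST|BEL)
                \e[P^_] .*? (?:\e\\|\x9c) | # (DCS|PM|APC) ... ST
                \e. //xg;
    return $text;
}

# Illustrative inputs (assumptions, not taken from a real typescript).
my @cases = (
    [ "\e[1;31mred\e[0m text",        "red text" ],   # SGR colour sequences (CSI ... m)
    [ "\e]0;window title\aprompt\$ ", "prompt\$ " ],  # OSC terminated by BEL
    [ "\e(Bplain",                    "plain"    ],   # character set designation
);

for my $case (@cases) {
    my ($input, $want) = @$case;
    my $got = strip_escapes($input);
    print(($got eq $want ? "ok" : "MISMATCH"), ": $want\n");
}
```

Adding a line to @cases is a cheap way to check a sequence you suspect is under- or over-matched.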

Known issues:

  • Doesn't complain about malformed sequences. That's not what this script is for.
  • Multi-line string arguments to DCS/PM/APC/OSC are not supported.
  • Bytes in the range 128–159 may be parsed as control characters, though this is rarely used. Here's a version which also parses these non-ASCII control characters. Be aware that it will mangle non-ASCII text in some encodings, including UTF-8.

#!/usr/bin/env perl
## uncolor — remove terminal escape sequences such as color changes
while (<>) {
    s/ \e[ #%()*+\-.\/]. |
       (?:\e\[|\x9b) [ -?]* [@-~] | # CSI ... Cmd
       (?:\e\]|\x9d) .*? (?:\e\\|[\a\x9c]) | # OSC ... (ST|BEL)
       (?:\e[P^_]|[\x90\x9e\x9f]) .*? (?:\e\\|\x9c) | # (DCS|PM|APC) ... ST
       \e.|[\x80-\x9f] //xg;
    print;
}
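To see why the 8-bit version mangles UTF-8, here is a small sketch that applies only its last alternative, [\x80-\x9f], to raw UTF-8 bytes. The helper names strip_c1 and hex_dump and the sample text are made up for illustration. The en dash (U+2013) is encoded as the bytes E2 80 93, and two of those fall in the 0x80-0x9F range.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: only the final alternative of the 8-bit script,
# which treats every byte in 0x80-0x9F as a C1 control character.
sub strip_c1 {
    my ($bytes) = @_;
    $bytes =~ s/[\x80-\x9f]//g;
    return $bytes;
}

# Hypothetical helper: show a byte string as space-separated hex.
sub hex_dump {
    my ($bytes) = @_;
    return join(' ', map { sprintf('%02x', ord($_)) } split(//, $bytes));
}

# "é – ok" as raw UTF-8 bytes: é is C3 A9, the en dash is E2 80 93.
my $utf8 = "\xc3\xa9 \xe2\x80\x93 ok";

print "before: ", hex_dump($utf8), "\n";
print "after:  ", hex_dump(strip_c1($utf8)), "\n";
```

The é survives because neither of its bytes is in the range, but the en dash loses its two continuation bytes and leaves a lone E2 behind, which is no longer valid UTF-8.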

Updating Gilles' answer to also remove carriage returns and do backspace-erasing of previous characters, which were both important to me for a typescript generated on Cygwin:

#!/usr/bin/perl

while (<>) {
  s/ \e[ #%()*+\-.\/]. |
    \r | # Remove extra carriage returns also
    (?:\e\[|\x9b) [ -?]* [@-~] | # CSI ... Cmd
    (?:\e\]|\x9d) .*? (?:\e\\|[\a\x9c]) | # OSC ... (ST|BEL)
    (?:\e[P^_]|[\x90\x9e\x9f]) .*? (?:\e\\|\x9c) | # (DCS|PM|APC) ... ST
    \e.|[\x80-\x9f] //xg;
  1 while s/[^\b][\b]//g;  # repeatedly remove each non-backspace character followed by a backspace
  print;
}
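The backspace line is the least obvious part, so here it is in isolation. This is just a sketch; the sub name apply_backspaces and the sample strings are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper wrapping the backspace loop from the script above.
sub apply_backspaces {
    my ($line) = @_;
    # Inside a character class, \b means backspace (0x08), not a word boundary.
    # Each pass deletes "character + backspace" pairs; the loop repeats until
    # a pass finds nothing left to delete, which handles runs of backspaces.
    1 while $line =~ s/[^\b][\b]//g;
    return $line;
}

# Typed "ls -la", pressed backspace twice, then typed "l".
print apply_backspaces("ls -la\b\bl\n");
```

One pass is not enough for consecutive backspaces: the first pass only removes the character directly in front of the first backspace, and the next pass picks up the pair that this exposes. A backspace with nothing in front of it is left in place.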

I would use sed in this case:

sed -e "s/\x1b\[.\{1,5\}m//g" typescript

sed -e "s/search/replace/g" is standard stuff. The regex breaks down as follows:

  • \x1b matches the Escape character preceding the color code (the \x escape is a GNU sed extension).
  • \[ matches the first open bracket.
  • .\{1,5\} matches 1 to 5 of any single character. The curly braces are backslash-escaped because sed uses basic regular expressions by default, where \{m,n\} is the repetition operator and a bare { is a literal brace.
  • m is the last character in the regex; it is the final character of a color code.
  • // is the empty replacement, so every match is deleted.
  • g replaces every match on a line, not just the first.
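For comparison with the Perl answers, here is the same pattern translated to Perl, next to a narrower variant that only allows digits and semicolons between the bracket and the m. This is a sketch: the sub names and test strings are made up, and the narrower pattern is my assumption, not part of the sed answer.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: direct Perl translation of the sed expression above.
sub strip_sed_style {
    my ($text) = @_;
    $text =~ s/\e\[.{1,5}m//g;
    return $text;
}

# Hypothetical helper (an assumption, not from the answer):
# only digits and semicolons may appear between "ESC [" and "m".
sub strip_sgr {
    my ($text) = @_;
    $text =~ s/\e\[[0-9;]*m//g;
    return $text;
}

# Make escape characters visible so the output is safe to print.
sub show {
    my ($text) = @_;
    $text =~ s/\e/\\e/g;
    return qq{"$text"};
}

for my $input ("\e[1;31mred\e[0m", "\e[38;5;196mred\e[m", "\e[1mam") {
    printf "input=%s  sed-style=%s  sgr-only=%s\n",
        show($input), show(strip_sed_style($input)), show(strip_sgr($input));
}
```

The first input is handled identically by both. The second is left untouched by the 1-to-5 pattern, because the 256-color parameter is longer than five characters and the bare \e[m has no parameter at all. The third loses the word "am", because "any character" can swallow the first m and stop at a later one.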