Determine how long tabs '\t' are on a line

The TAB character is a control character which, when sent to a terminal¹, makes the terminal's cursor move to the next tab stop. In most terminals, tab stops are 8 columns apart by default, but that's configurable.

You can also have tab stops at irregular intervals:

$ tabs 3 9 11; printf '\tx\ty\tz\n'
  x     y z

Only the terminal knows how many columns to the right a TAB will move the cursor.

You can get that information by querying the cursor position from the terminal before and after the tab has been sent.

If you want to make that calculation by hand for a given line, assuming that line is printed starting at the first column of the screen, you'll need to:

know where the tab-stops are²
know the display width of every character
know the width of the screen
decide whether you want to handle other control characters like \r (which moves the cursor to the first column) or \b (which moves the cursor back one column)...

It can be simplified if you assume the tab stops are every 8 columns, the line fits in the screen and there are no other control characters or characters (or non-characters) that your terminal cannot display properly.
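Under those simplifying assumptions, the calculation itself is a single pass over the line: a TAB at 0-based column col moves the cursor ts - col % ts columns, where ts is the tab interval. Below is a minimal awk sketch of that, wrapped in a hypothetical shell function tab_widths. It assumes every non-TAB character is exactly one column wide (plain ASCII, for instance), since awk's length() counts bytes or characters, not display columns.

```shell
# Hypothetical helper: for each line on stdin, print the total number of
# columns taken by its TABs, assuming tab stops every $1 columns (default 8),
# that the line starts at the first column, and that every other character
# is one column wide.
tab_widths() {
  awk -v ts="${1:-8}" '{
    col = 0; total = 0              # col: 0-based cursor column
    n = length($0)
    for (i = 1; i <= n; i++) {
      if (substr($0, i, 1) == "\t") {
        w = ts - col % ts           # distance to the next tab stop
        total += w
        col += w
      } else {
        col++                       # assumed single-width character
      }
    }
    print total
  }'
}
```

For example, printf 'abc\tdef\n' | tab_widths prints 5, and tab_widths 10 assumes tab stops every 10 columns instead.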

With GNU wc, if the line is stored in $line:

width=$(printf %s "$line" | wc -L)
width_without_tabs=$(printf %s "$line" | tr -d '\t' | wc -L)
width_of_tabs=$((width - width_without_tabs))

wc -L gives the width of the widest line in its input. It does that by using wcwidth(3) to determine the width of characters and assuming the tab stops are every 8 columns.
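To get one number per line of a file rather than for a single $line, the same two wc -L calls can be run in a loop. This is a sketch, assuming GNU wc and a hypothetical function name tab_widths_wc; since it runs several processes per line, it is only practical for small inputs.

```shell
# Hypothetical helper: for each line on stdin, print the width taken by its
# TABs as computed by GNU wc -L (tab stops every 8 columns, wcwidth() for
# the other characters).
tab_widths_wc() {
  # IFS= keeps leading/trailing TABs; the -n test handles a last line
  # that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    width=$(printf %s "$line" | wc -L)
    width_without_tabs=$(printf %s "$line" | tr -d '\t' | wc -L)
    echo "$((width - width_without_tabs))"
  done
}
```

For instance, tab_widths_wc < file prints one width per line of file.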

For non-GNU systems, and with the same assumptions, see @Kusalananda's approach. It's even better as it lets you specify the tab stops, but unfortunately it currently doesn't work with GNU expand (at least) when the input contains multibyte characters, zero-width characters (like combining characters) or double-width characters.

¹ Note, though, that if you do stty tab3, the tty device line discipline takes over tab processing (it converts each TAB to spaces based on its own idea of where the cursor might be before sending it to the terminal) and implements tab stops every 8 columns. Testing on Linux, it seems to handle CR, LF and BS characters properly, as well as multibyte UTF-8 ones (provided iutf8 is also on), but that's about it. It assumes all other non-control characters (including zero-width and double-width characters) have a width of 1, it (obviously) doesn't handle escape sequences, and it doesn't handle line wrapping properly... That's probably intended for terminals that can't do tab processing themselves.

In any case, the tty line discipline does need to know where the cursor is, and it uses the heuristics above: when using the icanon line editor (as when you enter text for applications like cat that don't implement their own line editor) and you press Tab then Backspace, the line discipline needs to know how many BS characters to send to erase that Tab character on the display. If you change where the tab stops are (as with tabs 12), you'll notice that Tabs are not erased properly. The same happens if you enter double-width characters before pressing Tab then Backspace.

² For that, you could send tab characters and query the cursor position after each one. Something like:

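tabs=$(
  saved_settings=$(stty -g)          # save the current tty settings
  stty -icanon min 1 time 0 -echo    # read replies as they come, without echoing them
  gawk -vRS=R -F';' -vORS= < /dev/tty '
    # each reply to ESC[6n is ESC[row;colR: RS=R splits replies, $NF is the 1-based column
    function out(s) {print s > "/dev/tty"; fflush("/dev/tty")}
    BEGIN{out("\r\t\33[6n")}
    $NF <= prev {out("\r"); exit}     # cursor did not move: no further tab stop
    {print sep ($NF - 1); sep=","; prev = $NF; out("\t\33[6n")}'
  stty "$saved_settings"             # restore the tty settings
)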

Then, you can use that as expand -t "$tabs" using @Kusalananda's solution.
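The comma-separated list can also be used directly, without going through expand. Below is a minimal awk sketch with a hypothetical function name tab_widths_list. It assumes the list holds increasing, 0-based column numbers (as printed above, and as expand -t takes them) and that every non-TAB character is one column wide. What to do with a TAB past the last stop is a choice; this sketch simply counts it as one column.

```shell
# Hypothetical helper: $1 is a comma-separated list of increasing, 0-based
# tab stop columns (e.g. "2,8,10"). For each line on stdin, print the total
# number of columns taken by its TABs.
tab_widths_list() {
  awk -v stops="$1" '
    BEGIN { n = split(stops, stop, ",") }
    {
      col = 0; total = 0            # col: 0-based cursor column
      len = length($0)
      for (i = 1; i <= len; i++) {
        if (substr($0, i, 1) != "\t") { col++; continue }
        w = 1                       # assumption: past the last stop, count 1 column
        # find the first tab stop strictly to the right of the cursor
        for (j = 1; j <= n; j++)
          if (stop[j] + 0 > col) { w = stop[j] - col; break }
        total += w
        col += w
      }
      print total
    }'
}
```

This reproduces the tabs 3 9 11 example from the beginning: tabs uses 1-based columns, so the matching 0-based list is 2,8,10, and printf '\tx\ty\tz\n' | tab_widths_list 2,8,10 prints 8 (2 + 5 + 1).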

$ expand file | awk '{ print gsub(/ /, " ") }'
11
9
9

The POSIX expand utility expands tabs to spaces. In the awk script, gsub() returns the number of substitutions it made, so replacing every space with a space outputs the number of spaces on each line.

To avoid counting any preexisting spaces in the input file:

$ tr ' ' '@' <file | expand | awk '{ print gsub(/ /, " ") }'

where @ is a character that is guaranteed not to exist in the input data.

If you want tab stops every 10 columns instead of the default 8:

$ tr ' ' '@' <file | expand -t 10 | awk '{ print gsub(/ /, " ") }'
9
15
13

With perl:

perl -lpe '@F = split /\t/, $_, -1; pop @F; $c = 0; $c += 8 - length($_) % 8 for @F; $_ = $c' file

Alternatively:

perl -MList::Util=reduce -lpe \
    '@F = split /\t/, $_, -1; pop @F; $_ = reduce { $a + $b } 0, map { 8 - (length) % 8 } @F' file

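You can replace 8 above with another value if you want tab stops at a different interval. Both one-liners count each byte as one column, so they assume single-byte, single-width characters.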
