What is the best way to pipe the output of a command through a pager if (and only if) it is too long?

I’ve got a solution that’s written for POSIX shell compliance, but I’ve tested it only in bash, so I don’t know for sure whether it’s portable.  And I don’t know zsh, so I have made no attempt to make it zsh-friendly.  You pipe your command into it; passing a command as argument(s) to another command is a bad design*.

Of course any solution to this problem needs to know how many rows and columns the terminal has.  In the code below, I’ve assumed that you can rely on the LINES and COLUMNS environment variables (which less looks at).  More reliable methods are:

  • use rows="${LINES:=$(tput lines)}" and cols="${COLUMNS:=$(tput cols)}", as suggested by A.P., or
  • look at the output from stty size.  Note that this command must have the terminal as its standard input, so, if it’s in a script, and you’re piping into the script, you’ll have to say stty size <&1 (in bash) or stty size < /dev/tty.  Capturing its output is even more complicated.
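A minimal sketch of combining the environment variables with stty size. The function name term_size and the 24x80 fallback are my own assumptions for illustration; they are not part of the script below.

```shell
# term_size: print "<rows> <cols>" for the controlling terminal.
# Hypothetical helper; the 24x80 fallback is an assumption.
term_size() {
    ts_rows=${LINES:-} ts_cols=${COLUMNS:-}
    if [ -z "$ts_rows" ] || [ -z "$ts_cols" ]; then
        # stty needs the terminal on its stdin, so read /dev/tty explicitly;
        # the braces also silence a "cannot open /dev/tty" error.
        ts_size=$( { stty size </dev/tty; } 2>/dev/null ) || ts_size=
        ts_rows=${ts_rows:-${ts_size% *}}
        ts_cols=${ts_cols:-${ts_size#* }}
    fi
    printf '%s %s\n' "${ts_rows:-24}" "${ts_cols:-80}"
}

set -- $(term_size)      # split "<rows> <cols>" into $1 and $2
rows=$1 cols=$2
```

Values already present in LINES and COLUMNS take precedence, so the terminal is only queried when one of them is missing.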

The secret ingredient: the fold command will break long lines the way the screen will, so the script can handle long lines correctly.
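To illustrate, here is a small sketch of counting screen rows with fold. The helper name screen_rows is hypothetical, and the sketch assumes single-width characters with no tabs or escape sequences.

```shell
# screen_rows WIDTH: read text on stdin and print how many terminal rows
# it would occupy on a terminal WIDTH columns wide.
screen_rows() {
    # The arithmetic expansion strips the padding some wc versions emit.
    echo $(( $(fold -w "$1" | wc -l) ))
}

# One 25-character line plus one short line, on a 10-column terminal:
printf '%025d\nab\n' 0 | screen_rows 10     # prints 4 (3 wrapped rows + 1)
```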

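#!/bin/sh
# Buffer stdin until it is known whether the output fits on one screen.
buffer=$(mktemp)
rows="$LINES"       # assumes LINES and COLUMNS are exported (see above)
cols="$COLUMNS"
while true
do
      IFS= read -r some_data
      e=$?        # 1 if EOF, 0 if normal, successful read.
      printf "%s" "$some_data" >> "$buffer"
      if [ "$e" = 0 ]
      then
            printf "\n" >> "$buffer"
      fi
      # fold wraps long lines as the terminal would; wc counts screen rows.
      if [ $(fold -w"$cols" "$buffer" | wc -l) -lt "$rows" ]
      then
            if [ "$e" != 0 ]
            then
                  # EOF, and everything fits: print it directly.
                  cat "$buffer"
            else
                  continue
            fi
      else
            if [ "$e" != 0 ]
            then
                  "${PAGER:="less"}" < "$buffer"
                  # The above is equivalent to
                  # cat "$buffer"   | "${PAGER:="less"}"
                  # … but that’s a UUOC.
            else
                  # Too long, with more input pending: page the buffer,
                  # followed by the rest of stdin.
                  cat "$buffer" - | "${PAGER:="less"}"
            fi
      fi
      break
done
rm "$buffer"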

To use this:

  • Put the above into a file; let’s assume you call it mypager.
  • (Optionally) put it into a directory that’s in your search path; e.g., $HOME/bin.
  • Make it executable by typing chmod +x mypager.
  • Use it in commands like ps ax | mypager or ls -la | mypager.
    If you skipped the second step (putting the script into a directory that’s in your search path), you’ll have to do ps ax | path_to_mypager/mypager, where path_to_mypager can be a relative path like “.”.

* Why is passing a command as argument(s) to another command a bad design?

I. Aesthetics / Conformance to Traditions / Unix Philosophy

Unix has a philosophy of Do One Thing and Do It Well.  For example, if a program is going to display data in a certain way (as pagers do), then it shouldn’t also be invoking the mechanism that produces the data.  That’s what pipes are for.

Not many Unix programs execute user-specified commands or programs.  Let’s look at some that do:

  • The shell, as in sh -c "command"
    Well, running user-specified commands is the shell’s job; it’s the One Thing that the shell does.  (Of course I am not saying that the shell is a simple program.)
  • env, nice, nohup, setsid, su, and sudo.  These programs have something in common: they all exist to run a program with a modified execution environment¹.  They have to work the way they do, because Unix generally doesn’t allow you to change the execution environment of another process; you have to change your own process, and then fork and/or exec.
    _______
    ¹ I’m using the phrase execution environment in the broad sense, referring not only to environment variables, but also process attributes such as “nice” value, UID and GIDs, process group, session ID, controlling terminal, open files, working directory, umask value, ulimits, signal dispositions, alarm timer, etc.
  • Programs that allow a “shell escape”.  The only example that springs to mind is vi/vim, although I’m pretty sure that there are others.  These are historical artifacts.  They predate window systems and even job control; if you were editing a file, and you wanted to do something else (like look at a directory listing), you would have had to save your file and exit from the editor to get back to your shell.  Nowadays you can switch to another window, or use Ctrl+Z (or type :suspend) to get back to your shell while keeping your editor alive, so shell escapes are, arguably, obsolete.

I’m not counting programs that execute other (hard-coded) programs so as to leverage their capabilities rather than duplicate them.  For example, some programs may execute diff or sort.  (There are tales that early versions of spell used sort -u to get a list of the words used in a document, and then diff, or perhaps comm, to compare that list to the dictionary word list and identify which words from the document were not in the dictionary.)

II. Timing Issues

The way your script is written, the RET="$($@)" line doesn’t complete until the invoked command completes.  Therefore, your script cannot begin to display data until the command that generates it has completed.  Probably the simplest way to fix that is to make the data-generating command separate from the data-displaying program (although there are other ways).
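The difference can be seen with a deliberately slow producer. This is only a sketch: slow_producer is a made-up stand-in for the data-generating command, and cat stands in for the display program.

```shell
# Hypothetical producer: emits one line, pauses, then emits another.
slow_producer() {
    echo first
    sleep 1
    echo second
}

# Command substitution: nothing is displayed until slow_producer has exited.
RET="$(slow_producer)"
printf '%s\n' "$RET"

# Pipe: "first" is displayed right away, "second" about a second later.
slow_producer | cat
```

Both forms produce the same text; only the moment at which the first line becomes visible differs.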

III. Command History

  1. Suppose you run some command with output processed by your display filter, and you look at the output, and decide that you want to save that output in a file.  If you had typed (as a hypothetical example)

    ps ax | mypager
    

    you can then type

    !:1 > myfile
    

    or press the up-arrow key and edit the line appropriately.  Now, if you had typed

    mypager "ps ax"
    

    you can still go back and edit that command into ps ax > myfile, but it’s not so straightforward.

  2. Or suppose you decide that you want to run ps uax next.  If you had typed ps ax | mypager, you could do

    !:0 u!:*
    

    Again, with mypager "ps ax", it’s still doable, but, arguably, harder.

  3. Also, look at the two commands: ps ax | mypager and mypager "ps ax".  Suppose you run a history listing an hour later.  ISTM that you’d have to look at mypager "ps ax" a little bit harder to see what the command being executed is.

IV. Complex Commands / Quoting

  1. echo {1..10000} is obviously just an example command; ps ax isn’t much better.  What if you want to do something just a little bit more realistic, like ps ax | grep oracle?  If you type

    mypager ps ax | grep oracle
    

    it will run mypager ps ax and pipe the output from that through grep oracle.  So, if the output from ps ax is 30 lines long, mypager will invoke less, even if the output from ps ax | grep oracle is only 3 lines.  There are probably examples that will fail in a more dramatic fashion.

    So you have to do what I was showing earlier:

    mypager "ps ax | grep oracle"
    

    But RET="$($@)" can’t handle that.  There are, of course, ways to handle things like that, but they are discouraged.

  2. What if the command line whose output you want to capture is even more complicated; e.g.,

    command1  "arg1"   |   command2  'arg2'  $'arg3'

    where the arguments contain messy combinations of space, tab, $, |, \, <, >, *, ;, &, [, ], (, ), `, and maybe even ' and ".  A command like that can be hard enough to type directly into the shell correctly.  Now imagine the nightmare of having to quote it to pass it as an argument to mypager.
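The failure described in item 1 is easy to reproduce in sh or bash. In this sketch, naive_run is a hypothetical stand-in for a wrapper built around RET="$($@)", and a harmless echo pipeline takes the place of ps ax | grep oracle.

```shell
# Hypothetical wrapper that runs its arguments the way RET="$($@)" does.
naive_run() {
    RET="$($@)"
    printf '%s\n' "$RET"
}

# The unquoted $@ is split into the words: echo a | tr a b
# The | is just another argument to echo, so no pipeline is ever created.
naive_run "echo a | tr a b"      # prints: a | tr a b

# Piping into the display program leaves the parsing to the shell.
echo a | tr a b | cat            # prints: b
```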


That's what the -F option of less is for, though you'd need to use the -X option as well; otherwise less prints the text to the alternate screen on terminals that have one, which means the text would no longer be visible once less exits. That might change in the future: there's currently an enhancement request to have -X implied when the text fits on one screen with -F (303), and RedHat systems have apparently carried a patch for that since 2008, though it has not made it upstream yet (as of 2017-09-14; I've just sent a mail about that).

So:

cmd | less -RXF

If you still wanted to use the alternate screen if the output is too long, that's where you'd need to get fancy (on systems that don't have the RedHat patch mentioned above):

page() {
  L=${LINES:-$(tput lines)} C=${COLUMNS:-$(tput cols)} \
    perl -Mopen=locale -MText::Tabs -MText::CharWidth=mbswidth -e '
      while(<STDIN>) {
        if ($pager) {
          print $pager $_;
        } else {
          chomp(my $line = $_);
          $line =~ s/\e\[[\d;]*m//g;
          $l += 1 + int(mbswidth(expand($line)) / $ENV{C});
          $buf .= $_;
          if ($l > $ENV{L}) {
            open $pager, "|-", "less", "-R", @ARGV or die "pager: $!";
            print $pager $buf;
          }
        }
      }
      print $buf unless $pager;' -- "$@"
}

To be used as:

cmd | page

or

page < file
page -S < file...

(not page file, it's only meant to page stdin).

We're trying to guess the length of the output by stripping the color escape sequences, expanding the tabs, and computing the display width of each text line, so that we can determine how many terminal lines are needed to display it.

That should work as long as the output doesn't have other escape sequences or control/ill-encoded characters.
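For comparison, here is a rough shell-only approximation of the same estimate, using sed, expand, and fold instead of the Perl modules. It is a sketch under narrower assumptions: the names strip_sgr and visible_rows are made up here, only SGR color sequences are removed, and fold may not account for multibyte or double-width characters the way mbswidth does.

```shell
# strip_sgr: remove SGR (color) escape sequences such as ESC[31m from stdin.
strip_sgr() {
    esc=$(printf '\033')
    sed "s/${esc}\[[0-9;]*m//g"
}

# visible_rows WIDTH: estimate how many terminal rows stdin would occupy.
visible_rows() {
    echo $(( $(strip_sgr | expand | fold -w "$1" | wc -l) ))
}

# Ten visible characters wrapped in color codes fit on one 10-column row.
printf '\033[31m%s\033[0m\n' 0123456789 | visible_rows 10     # prints 1
```

Without the stripping step, the escape bytes would be counted as visible characters and the same line would be reported as wrapping.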

Note also one significant difference from the RedHat patch: for one-screen output, the output doesn't go through the less post-processing (like rendering of control characters as ^X in reverse video, squeezing of empty lines with -s...). While that's closer to what is being asked here, it may be less desirable in practice.

You may have to install the Text::CharWidth module which is not one of the standard ones (libtext-charwidth-perl package on Debian).