AWK: wrap lines to 72 characters

Not using awk

I understand this may just be one part of a larger problem you are trying to solve using awk or simply an attempt to understand awk better, but if you really just want to keep your line length to 72 columns, there is a much better tool.

The fmt tool was designed specifically for this:

fmt --width=72 filename

fmt will also try hard to break the lines in reasonable places, making the output nicer to read. See the info page for more details about what fmt considers "reasonable places."
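As a quick way to see this behavior, here is a minimal sketch that pipes one long line through fmt on standard input. The sample sentence is invented for illustration, and -w 40 is the short form of --width=40; the exact break points depend on your fmt implementation, so no output is shown.

```shell
# Sample text is made up for this example; any long line will do.
text='Awk can split a line at a fixed column, but fmt prefers to break at the spaces between words.'

# Reformat to at most 40 columns, reading from standard input.
printf '%s\n' "$text" | fmt -w 40
```

Each output line should be no wider than 40 columns, with breaks falling between words rather than inside them.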


Here is an AWK script that wraps long lines at a fixed column and re-wraps the remainders together with any short lines:

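awk -v WIDTH=72 '
{
    gsub("\t", " ")                 # treat each tab as a single space
    $0 = line $0                    # prepend the remainder carried over from earlier input
    # Too short to fill a line: keep appending input lines.
    while (length($0) <= WIDTH) {
        line = $0
        more = getline              # 1 = got a line, 0 = end of input, -1 = error
        gsub("\t", " ")
        if (more > 0) {
            $0 = line " " $0
        } else {
            $0 = line
            break
        }
    }
    # Emit full-width chunks; whatever is left over is carried forward.
    while (length($0) > WIDTH) {
        print substr($0, 1, WIDTH)
        $0 = substr($0, WIDTH + 1)
    }
    line = $0 " "
}

END {
    print                           # flush the final partial line
}
'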

There is a Perl script available on CPAN which does a very nice job of reformatting text. It's called paradj (individual files). In order to do hyphenation, you will also need TeX::Hyphen.

SWITCHES
--------
The available switches are:

--width=n (or -w=n or -w n)
    Line width is n chars long

--left (or -l)
    Output is left-justified (default)

--right (or -r)
    Output is right-justified

--centered (or -c)
    Output is centered

--both (or -b)
    Output is both left- and right-justified

--indent=n (or -i=n or -i n)
    Leave n spaces for initial indention (defaults to 0)

--newline (or -n)
    Insert blank lines between paragraphs

--hyphenate (or -h)
    Hyphenate word that doesn't fit on a line

Here is a diff of some changes I made to support a left-margin option:

12c12
< my ($indent, $newline);
---
> my ($indent, $margin, $newline);
15a16
>   "margin:i" => \$margin,
21a23
> $margin = 0 if (!$margin);
149a152
>     print " " x $margin;
187a191,193
>   print "--margin=n (or -m=n or -m n)  Add a left margin of n ";
>   print "spaces\n";
>   print "                                (defaults to 0)\n";

Awk is a Turing-complete language, and not a particularly obfuscated one, so it's easy enough to split lines at a fixed width. Here's a straightforward imperative version.

awk -v WIDTH=72 '
{
    while (length>WIDTH) {
        print substr($0,1,WIDTH);
        $0=substr($0,WIDTH+1);
    }
    print;
}
'

If you want to break lines between words, you can code it up in awk, but recognizing words is non-trivial (for reasons having more to do with natural languages than with algorithmic difficulty). Many systems have a utility called fmt that does just that.
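For what it's worth, coding it up in awk might look like the following greedy sketch. It sidesteps the hard part by assuming a "word" is simply whatever awk's default field splitting yields (a run of non-whitespace characters), and it treats blank lines as paragraph breaks. The function name wrap_words is made up for this example.

```shell
# wrap_words WIDTH: greedy word wrap of standard input.
# (wrap_words is a hypothetical helper name, not a standard utility.)
wrap_words() {
    awk -v WIDTH="${1:-72}" '
    {
        if (NF == 0) {                  # blank line: paragraph break
            if (out != "") print out
            print ""
            out = ""
            next
        }
        for (i = 1; i <= NF; i++) {     # fields are whitespace-separated words
            if (out == "") {
                out = $i                # first word on a fresh output line
            } else if (length(out) + 1 + length($i) <= WIDTH) {
                out = out " " $i        # the word still fits on this line
            } else {
                print out               # line is full: emit it, start a new one
                out = $i
            }
        }
    }
    END { if (out != "") print out }    # flush the last partial line
    '
}

printf '%s\n' 'The quick brown fox jumps over the lazy dog.' | wrap_words 20
# prints:
#   The quick brown fox
#   jumps over the lazy
#   dog.
```

A word longer than WIDTH is left unbroken on a line of its own, and nothing here knows about hyphenation or sentence spacing, which is where fmt does better.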