Why no vertical-mode Knuth-Plass?

This was addressed by Knuth in a Q&A session in St. Petersburg, Florida, published in TUGboat as "TUG'95: Questions and Answers with Prof. Donald E. Knuth", pp. 18–20 (starting at the bottom of column 2 on p. 18). The session was republished in Digital Typography, with the relevant question starting on p. 594.

The page-breaking problem was also the subject of Michael Plass's dissertation, Optimal Pagination Techniques for Automatic Typesetting Systems, which is posted on the TUG web site.


Plass proved that some formulations of the page-breaking problem are NP-complete. Computers at the time were roughly 10^4 times slower than today's, so this was a real obstacle.


Your assumption isn't quite correct: TeX doesn't simply use a greedy algorithm to fill pages. Each page break is chosen so that it is optimal at that point. To achieve this, TeX keeps adding material to the current page until the page is overfull or a break is forced. It then goes back to the breakpoint with the lowest cost, which combines the page's badness with the penalty for breaking there. For example, if there is vertical space that can be stretched or shrunk, widows and orphans can be avoided in most cases. Likewise, TeX can use the available shrink to fit not only a section heading but also the first two lines of that section onto the current page.
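To make this "local" strategy concrete, here is a minimal Python sketch of a page builder in the spirit of TeX's. It keeps adding lines, computes a cost at each legal breakpoint, and as soon as the page overflows (or a break is forced) it commits to the cheapest breakpoint seen so far. The cost rules follow those described in The TeXbook (chapter 15), with insertions ignored. The page model (fixed-height lines separated by identical glue) and the names badness, page_cost and local_breaks are hypothetical simplifications, not TeX's actual data structures.

```python
import math

INF = math.inf


def badness(natural, stretch, shrink, goal):
    """Approximate TeX badness: about 100 * r**3, capped at 10000.

    A page that would need more shrink than it has is overfull: infinitely bad.
    """
    if natural < goal:
        if stretch <= 0:
            return 10000
        return min(10000, 100 * ((goal - natural) / stretch) ** 3)
    if natural > goal:
        if natural - goal > shrink:
            return INF
        return 100 * ((natural - goal) / shrink) ** 3
    return 0


def page_cost(b, p):
    """Cost of breaking with badness b and penalty p (no insertions)."""
    if b == INF:
        return INF
    if p <= -10000:
        return p          # forced break: badness is ignored
    if b < 10000:
        return b + p
    return 100000         # hopelessly underfull, but still a legal break


def local_breaks(heights, penalties, goal, glue):
    """Page-at-a-time breaking in the spirit of TeX's page builder (a sketch).

    heights[k]    height of line k
    penalties[k]  penalty for breaking after line k (>= 10000 forbids it);
                  the break after the last line is always forced
    glue          (natural, stretch, shrink) of the space between two lines
    Returns the index of the last line on each page.
    """
    g, st, sh = glue
    n = len(heights)
    breaks, start = [], 0
    while start < n:
        best_cost, best_break = INF, None
        for k in range(start, n):
            p = -10000 if k == n - 1 else penalties[k]
            if p >= 10000:
                continue                    # not a legal breakpoint
            gaps = k - start                # glue items between lines on this page
            natural = sum(heights[start:k + 1]) + gaps * g
            c = page_cost(badness(natural, gaps * st, gaps * sh, goal), p)
            if c <= best_cost:              # ties favor the later breakpoint
                best_cost, best_break = c, k
            if c == INF or p <= -10000:
                break                       # page is full or the break is forced
        breaks.append(best_break)
        start = best_break + 1
    return breaks


# Eleven one-unit lines, page goal 4, inter-line glue that can stretch by 1
# but not shrink; breaks after the 7th and 8th lines are forbidden.
pens = [0] * 11
pens[6] = pens[7] = 10000
print(local_breaks([1.0] * 11, pens, goal=4.0, glue=(0.0, 1.0, 0.0)))
# -> [3, 5, 9, 10]
```

Page 1 ends after its fourth line (index 3), where the page is exactly full and the cost is 0. The builder never examines what that choice does to page 2, which here ends up with only two lines and badness 800.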

What TeX does not do is optimize over the whole set of page breaks. So it can happen that the first page break is excellent, but only at the price of a lousy second one; TeX doesn't look ahead as far as the next page break. The main reason for this seems to have been the memory constraints of computers at the time TeX was designed. From The TeXbook, page 110:

TeX breaks lists of lines into pages by computing badness ratings and penalties, more or less as it does when breaking paragraphs into lines. But pages are made up one at a time and removed from TeX’s memory; there is no looking ahead to see how one page break will affect the next one. In other words, TeX uses a special method to find the optimum breakpoints for the lines in an entire paragraph, but it doesn’t attempt to find the optimum breakpoints for the pages in an entire document. The computer doesn’t have enough high-speed memory capacity to remember the contents of several pages, so TeX simply chooses each page break as best it can, by a process of “local” rather than “global” optimization.
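The difference between local and global optimization can be sketched by treating page breaking the way Knuth-Plass treats line breaking: a dynamic program over all legal breakpoints that minimizes a total cost. The Python sketch below assumes a hypothetical simplified model of fixed-height lines and uniform inter-line glue, with TeXbook-style per-page costs. The objective (the sum of those per-page costs), the name global_breaks, and the decision to ignore floats and insertions are all illustrative assumptions. Without floats, this simple version runs in polynomial time. It does not address the harder formulations that Plass showed to be NP-complete.

```python
import math

INF = math.inf


def badness(natural, stretch, shrink, goal):
    # Approximate TeX badness (about 100 * r**3, capped at 10000); overfull is infinite.
    if natural < goal:
        return 10000 if stretch <= 0 else min(10000, 100 * ((goal - natural) / stretch) ** 3)
    if natural > goal:
        return INF if natural - goal > shrink else 100 * ((natural - goal) / shrink) ** 3
    return 0


def page_cost(b, p):
    # TeXbook-style cost of a page break with badness b and penalty p (no insertions).
    if b == INF:
        return INF
    if p <= -10000:
        return p
    return b + p if b < 10000 else 100000


def global_breaks(heights, penalties, goal, glue):
    """Choose all page breaks at once by dynamic programming (a sketch).

    heights[k] is the height of line k; penalties[k] is the penalty for
    breaking after line k (>= 10000 forbids it); the final break is forced.
    Assumption: the objective is the sum of per-page costs, and interior
    forced breaks (penalty <= -10000) are not enforced.
    Returns (last line of each page, total cost), or None if every
    possible breaking contains an overfull page.
    """
    g, st, sh = glue
    n = len(heights)
    best = [INF] * (n + 1)   # best[j]: least total cost of setting lines 0..j-1
    prev = [None] * (n + 1)  # prev[j]: first line of the last page in that solution
    best[0] = 0
    for j in range(1, n + 1):
        k = j - 1                                   # a page ending with line k
        p = -10000 if k == n - 1 else penalties[k]
        if p >= 10000:
            continue                                # illegal breakpoint
        for i in range(j):                          # candidate page = lines i..k
            if best[i] == INF:
                continue
            gaps = k - i
            natural = sum(heights[i:j]) + gaps * g
            c = page_cost(badness(natural, gaps * st, gaps * sh, goal), p)
            if best[i] + c < best[j]:
                best[j], prev[j] = best[i] + c, i
    if best[n] == INF:
        return None
    breaks, j = [], n
    while j > 0:                                    # walk back through the chosen pages
        breaks.append(j - 1)
        j = prev[j]
    return breaks[::-1], best[n]


pens = [0] * 11
pens[6] = pens[7] = 10000   # no breaks after the 7th and 8th lines
print(global_breaks([1.0] * 11, pens, goal=4.0, glue=(0.0, 1.0, 0.0)))
# -> ([2, 5, 9, 10], -9975.0)
```

On this input, the dynamic program accepts two slightly stretched three-line pages (badness 12.5 each) and reaches a total cost of -9975. Choosing each break locally with the same costs instead ends page 1 after its fourth line (badness 0). That choice forces a two-line second page (badness 800), for a total of -9200. This is exactly the situation described above, where a great first break is paid for with a lousy second one.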

It appears that the typesetting system Lout does some optimization of page breaks over several pages:

Lout uses Knuth's (the author of TeX, on which LaTeX is based) optimal line breaking algorithm, and has extended it to paragraph breaking across pages.