Any progress on Knuth-Plass algorithm?

What Plass proved in his PhD thesis was that paginating documents is NP-complete for certain optimisation functions, but also that for other functions this is not the case and dynamic programming applies. Nevertheless, Knuth felt that there was no method (i.e., no reasonable goal function to minimize) that would both work in practice with the computing power available at the time and provide reasonable results. Thus TeX ended up using a simple first-fit algorithm for page breaking.
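To make the contrast concrete, here is a minimal sketch of greedy first-fit page breaking (my illustration, not TeX's actual code; real TeX additionally weighs penalties, glue stretchability, and inserts). The point is that each page is committed as soon as the next block no longer fits, and earlier decisions are never revisited:

    def first_fit_pages(block_heights, page_height):
        """Greedy first-fit page breaking (illustrative sketch only).

        Commits to a page break as soon as the next block no longer
        fits; earlier decisions are never revisited.
        """
        pages, current, used = [], [], 0.0
        for h in block_heights:
            if current and used + h > page_height:
                pages.append(current)   # page is full: break here
                current, used = [], 0.0
            current.append(h)
            used += h
        if current:
            pages.append(current)
        return pages

    # blocks of varying height on pages of height 10
    print(first_fit_pages([4, 3, 5, 2, 6, 1], 10))  # [[4, 3], [5, 2], [6, 1]]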

Basically we have to remind ourselves that there is no such thing as solving "the line-breaking problem" or "the page-breaking problem". It all depends on which criteria you build into your algorithm and which functions you choose to optimize (i.e., which functions define your "quality"); each combination of those results in quite a different problem being solved. Knuth's line-breaking algorithm, for example, is only optimal with respect to the goal function it minimizes (which for most practical situations provides a useful definition of quality), but it clearly has no idea of rivers in a paragraph and will happily produce them. The demerits it minimizes are carefully restricted so that you do not need to keep track of how you reached a certain breakpoint (other than knowing whether the previous line was set loosely or tightly, but not what happened earlier). River detection would destroy that property and would probably make the line-breaking problem NP-complete.
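To illustrate why that restriction matters, here is a minimal dynamic-programming line breaker (my own sketch with a crude badness formula, not TeX's actual demerit computation, and it assumes no single word is wider than the line). Because a line's demerits depend only on the line itself, one best cost per breakpoint is all the state required; something like river detection would make the cost depend on several earlier lines at once and destroy exactly this property:

    import math

    def best_breaks(word_widths, line_width, space=1.0):
        """Line breaking as dynamic programming (illustrative sketch).

        cost[j] is the cheapest total demerits for the first j words.
        One number per breakpoint suffices because a line's cost
        depends only on that line (real TeX also remembers the
        previous line's fitness class, but nothing older).
        """
        n = len(word_widths)
        cost = [math.inf] * (n + 1)
        back = [0] * (n + 1)
        cost[0] = 0.0
        for j in range(1, n + 1):              # break after word j
            for i in range(j):                 # line holds words i+1 .. j
                natural = sum(word_widths[i:j]) + space * (j - i - 1)
                if natural > line_width:
                    continue                   # overfull line: disallowed
                demerits = (line_width - natural) ** 3  # crude "badness"
                if j == n:
                    demerits = 0.0             # last line may be short for free
                if cost[i] + demerits < cost[j]:
                    cost[j] = cost[i] + demerits
                    back[j] = i
        lines, j = [], n                       # recover chosen breakpoints
        while j > 0:
            lines.append((back[j], j))
            j = back[j]
        return list(reversed(lines)), cost[n]

    breaks, total = best_breaks([3, 2, 4, 3, 2], line_width=8)
    print(breaks)  # [(0, 2), (2, 4), (4, 5)]: words per line as index ranges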

Wohlfeil in his PhD thesis (the paper with Anne you cite is an earlier version of this work) looks at the following problem:

  • the document model consists of text and figures
  • figures can float but preserve their order
  • figures never appear before their main reference (or are at least visible from there, i.e., they may float slightly ahead of the main reference)
  • the quality function is based on how many pages one has to turn to see a figure from its main reference (sketched in code below)

(there are a few other variations and extensions, such as footnotes or double spreads that may be larger than other pages, but the above is the gist of the page model considered)
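As a rough illustration of the kind of goal function this quality measure leads to (my reading; the names and the exact weighting are illustrative, not Wohlfeil's):

    def page_turn_cost(figure_pages, reference_pages):
        """Sketch of a "pages to turn" quality measure (illustrative).

        For each figure, count the pages a reader must turn to get
        from the figure's main reference to the figure itself; a
        pagination is better when this total is smaller, and a figure
        on the same page as its reference costs nothing.  Wohlfeil's
        actual function differs in detail.
        """
        return sum(abs(fig - ref)
                   for fig, ref in zip(figure_pages, reference_pages))

    # figure 1 on page 3, referenced on page 3 (cost 0);
    # figure 2 on page 7, referenced on page 5 (cost 2)
    print(page_turn_cost([3, 7], [3, 5]))  # 2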

For that model he has "solved" the pagination problem; the question, however, is whether that is sufficient in practical terms to be used in real-life production. In his PhD thesis he documents a prototype implementation called X-Formatter, but as far as I know that system never appeared in the wild.

In my opinion it is not a generally usable solution (though a clear step towards such a system), as it doesn't cover, for example:

  • use of more complex layouts with more than one column
  • more than a single figure stream, where different types of floats can overtake each other while competing for space.

I do think, however, that he has clearly shown that there are goal functions that measure "quality" in a practically useful way and that are usable with a dynamic programming approach, and that, given today's computers, this could perhaps be generalized to provide a system that produces useful paginations in acceptable time.