Why, in 2017, does LaTeX not use tree-like structures?

The short answer: this is just the way TeX is written, and no one has written anything better yet, as concisely described in the answer by @percusse.

For a longer answer, clearing up some likely misconceptions in “Why, in 2017, does LaTeX…” (about what we're talking about, what is and isn't being developed, and what has been done or is on the horizon), see below. This answer grew too long, so it needs a table of contents. :-)

  1. The stability of TeX
  2. LaTeX: macros for document authors
  3. TeX extensions and other systems
  4. Summary

1. The stability of TeX

“LaTeX has been created nearly 35 years ago”

TeX (the program, not LaTeX the macro package) was designed in 1977, became popular, and was rewritten in 1982, 35 years ago. At that point, Knuth declared TeX stable: he would only fix bugs, not make further changes or add features (he needed to get back to his real work), though others were free to take his code and write new systems.

I cannot right now find a reference for Knuth declaring TeX stable in 1982, but you can see some remarks from when he recounted its history in 1986. He did make one further change, from 7-bit to 8-bit, in 1989, and at that time you can see him say that

For more than five years I held firm to my conviction that a stable system was far better than a system that continues to evolve.

and in 1990 that

My work on developing TeX, METAFONT, and Computer Modern has come to an end. I will make no further changes except to correct extremely serious bugs. […] I strongly believe that an unchanging system has great value, even though it is axiomatic that any complex system can be improved. Therefore I believe that it is unwise to make further "improvements" to the systems called TeX and METAFONT. Let us regard these systems as fixed points, which should give the same results 100 years from now that they produce today.

Also, on others building better systems:

I have put these systems into the public domain so that people everywhere can use the ideas freely if they wish. […] anybody can make use of my programs in whatever way they wish, as long as they do not use the names TeX, METAFONT, or Computer Modern. In particular, any person or group who wants to produce a program superior to mine is free to do so. […] Of course I do not claim to have found the best solution to every problem. I simply claim that it is a great advantage to have a fixed point as a building block. […] I welcome continued research that will lead to alternative systems that can typeset documents better than TeX is able to do.

He had exhaustively documented both the program behaviour (via The TeXbook) and the program itself (via a method he called literate programming, and published in book form as TeX: The Program), and given talks on the internals of the TeX program. Later he even taught a course with the program's source code as the textbook. As a result of all this, TeX is arguably the most documented program of its size, ever, and it was ripe for others to build new systems on top of it.

However, for various reasons (such as, perhaps, the fact that the whole program is monolithic, and that the documentation assumes you want to understand the program in its entirety down to the smallest detail: that's just the way Knuth thinks), fewer people have extended or modified the program than Knuth imagined. See this exchange from 1996:

Fred: I heard you say you expected more people to extend TeX than have done so.

DEK: Yeah, absolutely. I expected extensions whenever someone had a special-purpose important project, like the Encyclopedia Britannica, or making an Arabic–Chinese dictionary, or whatever—a large project. I never expected that one tool would be able to handle everybody’s exotic projects. So I built a lot of hooks into the code so that it should be fairly easy for a computer science graduate to set up a new program for special occasions in a week or so. That was my thought. But I don’t think people have done that very much.

It’s certainly what I would have done! […] Rewriting a typesetting system is fairly easy. [laughter]

I tried to show how to do it, by implementing several of the features of TeX as if they were added on after, just to show how to use the hooks, as a demo. But that didn’t get things going. So, many more people are working with TeX at the macro level. Of course, the big advantage is that then you can share your output with others—you can assume it’s going to work on everybody else’s systems. But still, I thought special projects would lead to a lot of custom versions of the program. That hasn’t happened.

And again:

?: I'd like to ask about using parts of the TeX source. You made clear that the programmers were free to incorporate parts of the TeX source into their own programs. […]

DEK: I thought it would be fairly common to have special versions of TeX. I designed TeX so that it has many hooks inside […]

A macro language is Turing-complete—it can do anything—but it's certainly silly to try to do everything in a high-level language when it's so easy to do it at the lower level. Therefore I built in hooks to TeX and I implemented parts of TeX as demonstrations of these hooks, so that a person who read the code could see how to extend TeX to other things. We anticipated certain kinds of things for chemistry or for making changebars that would be done in the machine language for special applications.

That was what I thought would occur. And certainly, there was a point in the middle 80s when there were more than a thousand people in the world that knew the TeX program, that knew the intricacies of the TeX program quite well. They had read it, and they would have been able to make any of these extensions if they wanted. Now I would say that the number of people with a working knowledge of TeX's innards is probably less than a thousand, more than a hundred. It hasn't developed to the extent that I expected.

So part of the answer to your question is that not enough others have built sufficiently newer systems (we'll see later about the few that exist); you're basically still using a program that was designed in 1977, not 2017. Knuth evidently overestimated the readability of his code, or the willingness and ability of others to read and extend his pseudo-Pascal source code.

2. LaTeX: macros for document authors

When the need for TeX arose, Knuth already had the content prepared and polished (it was the second edition of Volume 2 of The Art of Computer Programming), and he had the first edition as a reference for the book design. He only needed typesetting, and built TeX as a tool for doing by computer what, under hot-metal typesetting, a type compositor (a typesetter) did by hand or with a special machine like a Monotype. So it has a lot of the capabilities needed for printing fine books. Its primitives are things like picking up characters from different fonts and placing them at different points on the page, raising and lowering characters, leaving a certain amount of blank space here, breaking a page there, and, most interestingly, breaking paragraphs into lines for a pleasing result.
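To make this concrete, here is a tiny plain-TeX sketch at that level of operation. (The kern and lowering values in the second line are the ones plain TeX itself uses for the “TeX” logo; the rest is invented for illustration.)

\font\bigrm=cmr10 at 14pt            % pick up characters from another font
\hbox{T\kern-.1667em\lower.5ex\hbox{E}\kern-.125em X}  % hand-placed letters
\vskip 12pt                          % leave blank vertical space here
{\bigrm A line in a larger font.}
\eject                               % break the page there
Next page.
\bye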

These are things that you care about when you're thinking like a type compositor, which makes sense for Knuth: he does the writing and polishing of content by hand, so when he approaches the computer to type things into TeX, it is for typesetting and controlling the appearance, which is only the very last tiny fraction of his total work on a book. But most authors (who are likely to type into a computer from the very beginning) need a system that can assist them throughout the entire process of document preparation, and for most of that duration what an author needs to think about is their content and its structure, not typesetting or appearance. Moreover, they don't want to have to bother with designing the appearance either.

At around this time, there were other systems for document production by computer, though none of them were good enough (as used with the devices of the time) to produce what Knuth thought of as “real” books. At Bell Labs there was troff, whose eqn system for mathematics (published 1975) was in fact an inspiration for TeX's math syntax. There was PUB, which Knuth had used. And there was Scribe: see this (10MB) PowerPoint presentation, a retrospective from 1998. It had syntax like:

@Chapter(Introduction)
@Section(Running Scribe)
@Begin(Quotation)
    Let's start at the very beginning, a very good place to start
@End(Quotation)

and so on. (Around this time, and in parallel to Scribe, GML and then SGML were developed, eventually leading to HTML and XML.) Leslie Lamport, who had used Scribe, brought these ideas to TeX. With the macro package he wrote (LaTeX), users could get both TeX's fine typesetting, and Scribe's ease-of-use, with its structured markup, separation of form and content, logical structure, etc. You can read more about this philosophy in Lamport's Document Production: Visual or Logical?, which indicates that one of the goals of LaTeX was to make formatting harder, and logical structure easier.
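For comparison, the LaTeX version of that same Scribe fragment is nearly a one-to-one transliteration:

\chapter{Introduction}
\section{Running Scribe}
\begin{quotation}
    Let's start at the very beginning, a very good place to start
\end{quotation}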

In my (2017) opinion, LaTeX was/is simultaneously a truly great idea (because it matches how authors think or should think) and a mistake (because it is written in TeX macros, instead of a more reasonable programming language).

One could imagine an alternative universe where LaTeX was written in a higher-level language: it would take your input file, represent it as a logical structure (something like an abstract syntax tree), then perform transformations on that tree to produce raw TeX markup, which would finally be fed to the TeX engine, which simply does the typesetting. If you wanted to change how it worked, you would have options to tweak the transformations that happen at the various stages. I think something like this is what you're asking for, when you write:

we can imagine that all the functions take in as a parameter a document tree, and then do a tree-rewriting on it, producing another document tree, that can be then processed by another function, and so on.

Indeed, we can imagine. The fact that LaTeX is not written like this is one reason why, despite such structured markup, it's hard to get reliable translation from LaTeX into other formats like HTML: even though the markup is structured, the meaning of any individual string of markup depends on various details of the TeX typesetting engine, on what packages have been loaded and macros defined, and on various bits of “state”.
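Purely as a hypothetical sketch of what such a pipeline's back end might emit (the dimensions and penalties below are invented for illustration, loosely modelled on what article.cls does internally), a logical node like \section{Introduction} could be compiled down to raw typesetting commands along these lines:

\vskip 3.5ex plus 1ex minus .2ex     % space above the heading
\penalty-300                         % encourage a page break before, not after
\noindent{\Large\bfseries 1\quad Introduction}\par
\nobreak                             % forbid a break right after the heading
\vskip 2.3ex plus .2ex               % space before the first paragraph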

Instead, LaTeX is written in the TeX macro “language” (which wasn't even originally intended as a programming language, merely as a way of doing simple text substitutions and shortcuts to save typing), using a bunch of clever tricks (see some of their precursors in Some TeX programming hacks by Lamport in 1982) to emulate high-level programming constructs. There were good reasons at the time: there were no widespread standard programming languages (every computer came with its own OS and its own set of supported compilers and languages), TeX macros were guaranteed to work everywhere, and people had already started pushing macros farther than (IMO) they should have, even Knuth in his very first design. So LaTeX's implementation, too, reflects the constraints of its time. Lamport himself declared “no more new features” with LaTeX 2.09 in 1985, though later there was LaTeX2ε. Later work on LaTeX, such as expl3, only takes this further, making the macros more and more complicated underneath while providing cleaner and cleaner interfaces to the user: a philosophy that makes sense from a certain point of view, and not from others.
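To illustrate both halves of that, here is a hedged sketch: first a loop emulated in the macro-trick style (tail recursion inside a conditional, with a carefully placed \expandafter), then the same loop through expl3's cleaner interface. The command names \countdown and \CountDown are invented for the example.

\documentclass{article}
\makeatletter
\newcount\my@n
\def\countdown#1{\my@n=#1\relax\countdown@step}
\def\countdown@step{%
  \ifnum\my@n>0
    \the\my@n\space
    \advance\my@n by -1
    \expandafter\countdown@step % \expandafter removes the \fi before recursing
  \fi}
\makeatother
\ExplSyntaxOn
% The same loop via expl3, which hides the token-level machinery:
\NewDocumentCommand{\CountDown}{m}
  { \int_step_inline:nnnn { #1 } { -1 } { 1 } { ##1 ~ } }
\ExplSyntaxOff
\begin{document}
\countdown{5}  % prints: 5 4 3 2 1
\CountDown{5}  % prints: 5 4 3 2 1
\end{document}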

And once there's a large body of well-tested LaTeX code (with decades of experience in usage), and a goal of not breaking users' existing documents, it gets harder and harder to completely rewrite the system, even if anyone wanted to. Even though there exists a LaTeX development team, they're focused on LaTeX3 and making things better for users, not on rewriting everything and risking breakage. Most users of LaTeX are simply users (not programmers), and it makes sense to optimize for their needs rather than for cleanliness of implementation.

I think a couple of the problems you mention stem merely from repeating the same mistake/misunderstanding:

why, in 2017, is LaTeX still sooo complicated to program?

1) Computing is hard to do […] multiply two floats […]

2) The use of tokens […] your code will contain plenty of \expandafter\expandafter\expandafter, \noexpand... that make it completely unreadable

Well, why are you trying to write programs on top of TeX/LaTeX? It is a system for typesetting, with a package for document authoring; it is not a programming environment with high-quality warnings, a debugger, and so on. If you are aware of these problems and don't mind them, or even enjoy the challenge (there are many such people on this site), that's one thing. But for everyone else, nothing stops you from doing all your programming elsewhere (in a “real” programming language) and feeding the output into (La)TeX. That, in fact, is what I'd recommend. Complaining that LaTeX is not a good environment for programming does not make sense when it isn't intended as one: it is only intended as a system for document authoring and typesetting, and that it can do well.
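(For readers who haven't met \expandafter: here is a minimal illustration of the expansion-ordering problem behind those chains, with macro names invented for the example. \def grabs its body unexpanded, so storing the current content of a macro, rather than the macro token itself, requires forcing expansion by hand.)

\documentclass{article}
\begin{document}
\def\content{old}
\def\alias{\content}           % \alias holds the token \content
\expandafter\def\expandafter\snapshot\expandafter{\content}
                               % \snapshot holds the text "old" itself
\def\content{new}
\alias\ vs.\ \snapshot         % prints "new vs. old"
\end{document}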

Also, when you say

3) When you want to do advanced presentation, 99% of the time you will need to use crazy hacks to achieve what you want, that need a very good understanding of how the elements are coded. For example, I'm not even sure that it's possible to add a given code at the end of a page, for example to put a big stamp in the middle of all pages (but I may be wrong).

this is simply a symptom of not having learned the typesetting system (TeX) in the first place. Both of those things are straightforward in TeX (well, the “stamp” probably involves a DVI or PDF special, which you'll need to know how to write, but placing it on the page is not a problem). It is certainly possible that in LaTeX you would have to fight with and understand layers of macros, but that was the whole point of LaTeX: to give you structure rather than control over formatting.
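That said, the “stamp on every page” example is by now reasonably easy at the LaTeX level too. Here is a hedged sketch using the eso-pic package, which hooks into the shipout routine; the DRAFT text and sizes are invented for the example:

\documentclass{article}
\usepackage{eso-pic,graphicx,xcolor}
\AddToShipoutPictureBG{%  executed for every page shipped out
  \AtPageCenter{%
    \makebox(0,0){\rotatebox{45}{%
      \textcolor{lightgray}{\fontsize{60}{72}\selectfont DRAFT}}}}}
\begin{document}
Page one. \newpage Page two.
\end{document}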

3. TeX extensions and other systems

Let's return to the point from earlier, about writing new programs that build on TeX (either extending it, or using its ideas to write an entirely new program).

It has been tried, even done. In the late 1970s, when there were very few cross-platform compilers (and therefore languages), it was common for programmers to take someone's program, read it entirely, and “port” it to a new language or system. People did read and understand the TeX program, and rewrote it in new languages like C. (I'm not talking of automated translation like web2c, but of implementations like CommonTeX that were written by hand.) In later decades the practice generally declined: most programmers today would rather use someone else's library and write their own code on top of it than read and rewrite it.

Still, a few people did modify the TeX program, and if you look at the ones that survive, most of them solved specific user needs, rather than being written from the perspective of improving the implementation or making it easier to program:

  • Peter Breitenlohner extended TeX to eTeX (more registers, etc.). (From his obituary: “Once DEK commented that probably Peter knew the TeX code better than he himself.”)
  • Hàn Thế Thành extended TeX to pdfTeX, so as to produce PDF output directly rather than DVI output. pdfTeX later incorporated the eTeX extensions as well. Today, in a typical distribution like TeX Live, when you run LaTeX (even if you run it as latex for DVI output rather than as pdflatex), the program that actually runs is pdfTeX.
  • Jonathan Kew extended TeX to XeTeX, so that the system can natively use Unicode and system fonts for typography.

All of these people edited tex.web to produce their extensions. To get an idea of what this involves, you can read the code of tex.web (which is available neatly formatted, and even as a book, thanks to Knuth's literate programming), and maybe etex.ch or pdftex.web or xetex.web, and try your hand at modifying it yourself, so that you understand the complexity. (Just be sure to give it a new name if you ever distribute it as a program.)

There was, however, one system that did try to rewrite TeX purely for the sake of improving the implementation, so that it would use more modern programming practices, be easier to modify, etc. This was NTS, the New Typesetting System, a complete rewrite of TeX in Java. According to the Wikipedia page, the project started in 1992, the coding started in 1998, and it was finished in 2000. Although at some point it was presented as a success story, in practice it was a failure: it was too slow, and it didn't run LaTeX documents (it didn't have the eTeX and pdfTeX extensions either), so no one started using it. The expected benefits of Java-ization did not materialize. (Actually, NTS turns out to be usable today, as computers have become faster, but… there are no clear reasons for using it. Some have said it faithfully reproduces most of the same problems that TeX has, just wrapped in Java classes. See some discussion here, though be warned there are a lot of Usenet flamewars in there, and more heat than light.)

There are also some other typesetting systems that use no TeX source code at all, but borrow some of its ideas (some more than others): Lout, Patoline, SILE. I suspect, though, that any typesetting system that does not work with the bulk of existing LaTeX packages doesn't have much hope of gaining adoption within the already very small audience (academia, etc.) for technically involved typesetting.

Which brings us to the only other major implementation: LuaTeX. This started with a hand-translated C version of TeX (CXTeX; see also LuaTeX says goodbye to Pascal), and it has progressed towards precisely the kind of things mentioned in the question:

assign each function to a given step. And if you need to be more precise, you can just say "to run my function F, I need to have the result of the function G, and I need to run before the function H". More or less like in the linux init program systemd (well, at least I think it works like this). I'm pretty sure that it would be more natural to program complicated stuff in LaTeX!

Well, I cannot say that that's exactly how LuaTeX works, but LuaTeX does have hooks for the various stages of typesetting, which allow things to be done more elegantly. (Plus, Lua is a more conventional language, compared to a macro / token-rewriting system like the one in TeX.) Here are some examples where I found it helpful:

  • parsing HH:MM time: using Lua avoids having to worry about token expansion
  • converting between encodings: I'm sure the task (writing a UTF-8 decoder, reading a table from a file, etc.) can be done in TeX macros too, but it's more natural in Lua
  • avoiding short words at line breaks: easier to influence line-breaking with the appropriate hook (pre_linebreak_filter) than with TeX macros; a minimal sketch follows this list
  • find permutation: some programming with word-matching
  • looping over characters: to look up catcodes (don't really need Lua)
  • generate digits of pi: for a pretty picture with TikZ
  • fit section if possible: simple subtraction of parameters (don't really need Lua)
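As one concrete taste of the hook-based style (the sketch promised in the line-breaking item above), here is a minimal LuaLaTeX example that registers a pre_linebreak_filter callback. It merely logs how many glyph nodes each paragraph contains and leaves the list untouched; the callback name is real LuaTeX, while the rest is invented for the example. (Lua's -- comments can't be used directly inside \directlua, since TeX turns line ends into spaces, so the body below is comment-free.)

\documentclass{article}
\directlua{
  local GLYPH = node.id("glyph")
  luatexbase.add_to_callback("pre_linebreak_filter",
    function(head)
      local count = 0
      for n in node.traverse_id(GLYPH, head) do
        count = count + 1
      end
      texio.write_nl("pre_linebreak_filter saw " .. count .. " glyphs")
      return true
    end,
    "my.count.glyphs")
}
\begin{document}
Some sample text for the filter to inspect.
\end{document}

Returning true tells LuaTeX to keep the node list unchanged; a real application would modify the list (for example, inserting penalties after short words) before handing it back.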

There are many better examples by others, and as more people get used to LuaTeX and its possibilities, I believe we'll evolve towards more elegant and readable code that does the right thing at the right time (in the right hook), instead of figuring out how to set things up with macros so that, when TeX eventually reaches the phase of operation we care about, it does what we want without ill effects on earlier or later phases (most of the token discomfort comes from this). Already you can, in principle, generate documents with LuaTeX completely programmatically: see TeX without TeX.

But LuaTeX is still in development (see its roadmap), and we haven't seen the end of it yet (I hope).

4. Summary

  • The design of the TeX program you use is substantially the same as it was in 1977.
  • LaTeX is written in TeX macros, which are not a great programming environment for doing fancy things like tree transformations of the entire document.
  • LaTeX today is built on its version from the mid-1980s, and development has been mostly accumulative, rather than involving radical rewrites or refactoring.
  • Some people have tried to rewrite the TeX program, but few have succeeded.
  • LuaTeX is one of the “new” programs that already lets you do things more elegantly, hook into transformations, and write code that rarely has to worry about token expansions.
  • The situation is not yet ideal and things may improve.
  • You can do your programming “outside” and feed the result into (La)TeX; this way you can use tools as powerful as you wish.

Because TeX is designed that way. Everybody would love to have a more modern language, but somebody has to write it in a modern way.


TeX was designed 35 years ago. LaTeX is only a macro package that uses TeX. Everything you mention in your question concerns TeX, not LaTeX. Please remove the prefix “La” from all occurrences in your question.

And why is there no better typographical system today? Because nobody has created one. Attempts (with genuinely new concepts for programming the typographical process) have existed, but without any success. Only conservative extensions exist today: LuaTeX, XeTeX.