How many lines of code does the original TeX contain?

  • The tex.web file contains 24985 lines at the moment. This is in the literate-programming style, so has a lot of what you might call comments.

  • When passed through tangle (e.g. tangle tex.web), you get Pascal code, but this is comparable to the code produced by a JavaScript minifier: statements are all jammed together on the line and there are line breaks at essentially ad-hoc places. (There are about 6115 lines in this file but that does not really mean anything.)

  • If you run this tex.p through a Pascal pretty-printer, that may be a better indication of the size of the program: I ran ptop -l 10000 tex.p tex.pretty.p and the resulting file was 20619 lines long. Of course this depends on the prettifier etc.

  • Instead if you run tex.web through weave, you get a typeset listing of the program include the “comments”; this is the way the program is intended to be read, and the form in which it has been published as a book. At standard settings, the resulting PDF has 1379 modules followed by the index, printed over about 500 pages.

  • This tex.web is in some sense not the “full” program but a putative “common denominator” program, intended to be easily portable to any Pascal compiler that was around at the time (1982). When ported to one such computer installation (OS, file conventions, etc) it would be accompanied by a (hopefully small) “change-file”. (In fact such a changefile was used even at the original TeX installation at Stanford, where the local editor had text files containing pages, the keyboard included quite a few non-ASCII characters, etc.) So you may want to take such a changefile into account too.

Finally, the image below (via here), in its middle region (i.e. if you ignore the four big boxes on the outside) shows the rough relative sizes (in terms of lines of code / number of modules) of the different parts of the program:

Knuth drawing


Edit: The above was just a quick answer I posted in the morning before going out, but in case there's some further interest:

  • Knuth's 1989 paper The Errors of TeX (DOI 10.1002/spe.4380190702), an updated version of which is reprinted in his collection Literate Programming, contains a definitive and detailed account of TeX's development. His own count, from the paper:

    Now TeX82 is in its third and final phase.* It has grown from the original 4600 statements in SAIL to 1376 modules in WEB, representing about 14,000 statements in Pascal.

    (*A footnote added in 1991, and thus only in the book, mentions the (7-bit→8-bit) “major changes in 1989 that can be said to have inaugurated “Phase 4” of TeX82”.)

  • For more than you want to know: I have started compiling a (very incomplete right now) web page about the program and my attempts to read it. :-)

  • In particular, to get a sense of the size of the program, you may want to look at Richard Sandberg's manual translation of tex.web into C++, which keeps only the code and almost no comments: .cpp (17426 lines (16045 sloc)), .h (3068 lines (2675 sloc))

  • Almost all usage of TeX today is via huge macro packages like LaTeX running on extended TeX programs, so (even not counting the size of the macro packages, more lines than TeX itself!) you may want to consider eTeX, pdfTeX, and XeTeX which are roughly 15%, 50–60%, and (relying heavily on system libraries) 35% bigger than Knuth TeX, respectively.


To supplement ShreevatsaR excellent answer, here is the beginning of tex.p, the “tangled” form of tex.web

{4:}{9:}{$C-,A+,D-}{[$C+,D+]}{:9}program TEX;label{6:}1,9998,9999;
{:6}const{11:}memmax=30000;memmin=0;bufsize=500;errorline=72;
halferrorline=42;maxprintline=79;stacksize=200;maxinopen=6;fontmax=75;
fontmemsize=20000;paramsize=60;nestsize=40;maxstrings=3000;
stringvacancies=8000;poolsize=32000;savesize=600;triesize=8000;
trieopsize=500;dvibufsize=800;filenamesize=40;
poolname='TeXformats:TEX.POOL                     ';

which corresponds to this part of tex.dvi, the “weaved form” of tex.web:

enter image description here

The listing of tex.web is

@ The program begins with a normal \PASCAL\ program heading, whose
components will be filled in later, using the conventions of \.{WEB}.
@.WEB@>
For example, the portion of the program called `\X\glob:Global
variables\X' below will be replaced by a sequence of variable declarations
that starts in $\section\glob$ of this documentation. In this way, we are able
to define each individual global variable when we are prepared to
understand what it means; we do not have to define all of the globals at
once.  Cross references in $\section\glob$, where it says ``See also
sections \gglob, \dots,'' also make it possible to look at the set of
all global variables, if desired.  Similar remarks apply to the other
portions of the program heading.

Actually the heading shown here is not quite normal: The |program| line
does not mention any |output| file, because \ph\ would ask the \TeX\ user
to specify a file name if |output| were specified here.
@:PASCAL H}{\ph@>
@^system dependencies@>

@d mtype==t@&y@&p@&e {this is a \.{WEB} coding trick:}
@f mtype==type {`\&{mtype}' will be equivalent to `\&{type}'}
@f type==true {but `|type|' will not be treated as a reserved word}

@p @t\4@>@<Compiler directives@>@/
program TEX; {all file names are defined dynamically}
label @<Labels in the outer block@>@/
const @<Constants in the outer block@>@/
mtype @<Types in the outer block@>@/
var @<Global variables@>@/
@#
procedure initialize; {this procedure gets things started properly}
  var @<Local variables for initialization@>@/
  begin @<Initialize whatever \TeX\ might access@>@;
  end;@#
@t\4@>@<Basic printing procedures@>@/
@t\4@>@<Error handling procedures@>@/

Clearly, the number of code lines in tex.p is a very rough indication of the complexity of the program. The name tangle of the program that extracts the Pascal source from the Web file exactly expresses what it does: it makes a “literate program” into something very compact and nor really human readable.

Examining the C version used for compiling TeX in TeX Live is not conclusive either, because also the C source is quite tangled.

Tags:

Web

Line

Tex Core